27/11/2016
The Five-Step Process of System Development - Richard Weissman
One of the most popular topics I speak on is “mechanical trading systems”, and with good reason: such systems are among the most powerful tools for dampening trader emotionalism, which is especially useful when traders are enduring drawdowns in account equity. That stated, the path to implementing mechanical trading systems is fraught with potential obstacles, including:
- Cherry Picking - Only taking some of the mechanical trading system’s signals
- Not taking entry signals - Due to lack of confidence in the model’s efficacy
- Not managing positions according to the model’s rules
- Deviating from the model’s exit criteria
- Overleveraging - Abandonment of prudent rules of risk management
- Abandonment of the model during drawdowns
To help with these obstacles, I have developed what I call the five-step process of system development.
The steps are as follows:
1.) In-Sample Back Testing
2.) Out-Of-Sample Back Testing
3.) Paper Trading
4.) Underleveraged Trading
5.) Full-Production
Let’s examine each step in this process in detail to ensure that we understand its significance:
1. In-Sample Back Testing
In-Sample Back Testing is the first step in the system development process. Without it, how do we know our method enjoys positive expectancy? The key to successful in-sample backtesting is ensuring that our testing is done on a large enough data population that its results are statistically significant, so that we have confidence that our model will continue to behave well in an unknown future, as long as that future is somewhat similar to the known past. This idea of the “…unknown future being somewhat similar to the known past” is key to all successful models and offers insight regarding model development.
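As a rough illustration of what “positive expectancy” means in practice, the sketch below computes the average profit or loss per trade from a list of back-tested trade results. The function name `expectancy` and the sample figures are hypothetical, chosen only for illustration; they do not come from any particular system.

```python
from statistics import mean


def expectancy(trade_pnls):
    """Average profit or loss per trade, built from win rate and average win/loss."""
    if not trade_pnls:
        raise ValueError("need at least one trade")
    wins = [p for p in trade_pnls if p > 0]
    losses = [p for p in trade_pnls if p <= 0]
    win_rate = len(wins) / len(trade_pnls)
    avg_win = mean(wins) if wins else 0.0
    avg_loss = mean(losses) if losses else 0.0  # zero or negative
    return win_rate * avg_win + (1 - win_rate) * avg_loss


# Hypothetical back-test: two winners and three losers.
pnls = [120.0, -50.0, -50.0, 200.0, -50.0]
print(expectancy(pnls))
```

Here the method wins only 40 percent of the time, yet the expectancy is positive (about 34 per trade) because the average win is larger than the average loss. Five trades is of course far too few to be statistically significant.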
We want to know why our model should continue to perform well in an unknown future, and therefore need to develop models based on basic concepts of human behavior such as assets becoming undervalued during panics and overvalued during bubbles, seasonal tendencies (e.g. heat waves, cold waves, droughts, freezes, etc.), tendencies for assets to experience fat tails (aka “trends”), as well as the cyclical nature of volatility (e.g. the tendency for volatility to cycle from periods of low volatility to high volatility and vice versa).
In addition, we want to ensure that our data includes all kinds of market environments (bullish, bearish, trending and choppy) and that we test on a wide variety of low-correlated assets over a statistically large data sample.
How large is large enough? A lot of this will depend on our trading timeframe. In other words, if we are testing a relatively long-term moving average crossover system (e.g. a 9 and 26-day simple moving average crossover system), we probably need to test on forty different assets for thirty years in order to get a statistically significant sample size.
By contrast, if we are testing an intraday model in which trades are triggered on five-minute bars, a two-year backtest might still be statistically significant. As a rule of thumb, I would be highly suspicious of back-tests on fewer than one thousand data points.
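To see why a long-term crossover system needs so much history, it can help to count how few signals it produces. The sketch below is a minimal, assumed implementation of a 9 and 26-period simple moving average crossover that is always either long or short; the function names (`sma`, `crossover_signals`, `count_reversals`) and the toy price series are hypothetical, and counting position reversals is just one possible way of measuring sample size.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    averages = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            running_total -= values[i - window]  # drop the value leaving the window
        averages.append(running_total / window if i >= window - 1 else None)
    return averages


def crossover_signals(closes, fast=9, slow=26):
    """+1 when the fast average is above the slow one, -1 otherwise, 0 during warm-up."""
    signals = []
    for fast_avg, slow_avg in zip(sma(closes, fast), sma(closes, slow)):
        if fast_avg is None or slow_avg is None:
            signals.append(0)
        else:
            signals.append(1 if fast_avg > slow_avg else -1)
    return signals


def count_reversals(signals):
    """Number of times the position flips between long and short."""
    active = [s for s in signals if s != 0]
    return sum(1 for prev, cur in zip(active, active[1:]) if prev != cur)


# Toy price series: a steady rise followed by a steady fall (illustrative only).
closes = list(range(1, 61)) + list(range(60, 0, -1))
signals = crossover_signals(closes)
print(count_reversals(signals))
```

On this toy series of 120 bars the model reverses only once, which illustrates how slowly a long-term model accumulates observations on a single asset compared with a model triggered on five-minute bars.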
Lots of questions are typically asked about back-testing, optimization and curve-fitting. As a general rule of thumb, I explain the difference between optimization and curve-fitting as follows: “optimization good, curve-fitting bad.” What’s the difference?
Optimization is the process of refining an arbitrarily derived trading system by adjusting that system’s parameters (e.g. number of days, stops as a percentage of an asset’s value, etc.) and/or parameter sets (e.g. a two-moving-average crossover with 7 and 29-period parameters).
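Optimization in this sense amounts to scoring a collection of candidate parameter sets against the in-sample data. The sketch below shows one simple way to do that, an exhaustive sweep over fast and slow moving average periods. The names `optimize` and `toy_score` are hypothetical, and `toy_score` is only a stand-in: in real use the scoring function would run the in-sample back-test and return a performance figure.

```python
from itertools import product


def optimize(score_fn, fast_values, slow_values):
    """Score every (fast, slow) parameter set and return them best-first."""
    results = []
    for fast, slow in product(fast_values, slow_values):
        if fast >= slow:
            continue  # a crossover needs the fast period shorter than the slow one
        results.append(((fast, slow), score_fn(fast, slow)))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Stand-in scoring function (hypothetical): it simply peaks at 7 and 29 periods.
def toy_score(fast, slow):
    return -abs(fast - 7) - abs(slow - 29)


ranked = optimize(toy_score, fast_values=[5, 7, 9], slow_values=[20, 26, 29])
best_params, best_score = ranked[0]
print(best_params, best_score)
```

The more candidate values are swept, the more likely it becomes that the best-ranked set merely fits the specific history being tested.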
By contrast, curve-fitting is overfitting the parameters and/or parameter sets to a specific data history, so that the model works well when applied to the specific historical data in question and not at all when applied to an unknown future data set. The other way I explain it is as follows: “Too much optimization results in curve-fitting.” How do we prevent over-optimization? The answer to this question leads to our second step in the five-step process.
2. Out-Of-Sample Testing
Out-Of-Sample testing requires that at the inception of our back testing process we artificially divide the data into two subsets: the larger in-sample portion discussed in step one, and a second, smaller and more recent historical data set. The basic idea of out-of-sample testing is that we have been manipulating the in-sample data in order to develop something that will work if the unknown future behaves like the known past. But what if it doesn’t? By withholding the most recent data from the in-sample backtest we can determine if the model fails due to curve-fitting or insufficient in-sample testing without needlessly sacrificing real (and finite) capital resources.
The only question remaining is how large the out-of-sample test should be. Our answer will depend on the size of the in-sample test. Typical starting points are two years out-of-sample and eighteen years in-sample, three months out-of-sample and nine months in-sample, etc. In general, the out-of-sample portions that I’ve seen commonly used by developers range from as low as five percent to as high as twenty-five percent of the data.
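The split itself is mechanical. The following minimal sketch withholds the most recent fraction of a chronologically ordered data set as the out-of-sample portion; the function name `split_in_out` and the default of ten percent are assumptions made for illustration, matching the two-years-of-twenty example.

```python
def split_in_out(bars, out_fraction=0.10):
    """Split chronologically ordered data into (in_sample, out_of_sample).

    The most recent `out_fraction` of the data is withheld as out-of-sample.
    """
    if not 0.0 < out_fraction < 1.0:
        raise ValueError("out_fraction must be between 0 and 1")
    out_size = round(len(bars) * out_fraction)
    if out_size == 0 or out_size == len(bars):
        raise ValueError("not enough data for the requested split")
    return bars[:-out_size], bars[-out_size:]


# Twenty (hypothetical) yearly labels: eighteen in-sample, two out-of-sample.
years = list(range(1995, 2015))
in_sample, out_of_sample = split_in_out(years, out_fraction=0.10)
print(len(in_sample), len(out_of_sample))
```

The important design point is that the split is by time, not random: the withheld data must be the most recent, and it must be set aside before any optimization begins.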
Obviously, we are looking for a strong positive correlation in performance between in-sample and out-of-sample back-tests. Once we have achieved such robust out-of-sample results, we are ready for step three, Paper Trading.
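One possible way to put a number on this comparison, offered here only as an assumption about how it might be measured, is to compute the Pearson correlation between per-asset results in the two periods. The function name `pearson` and the sample figures are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-asset results (e.g. average profit per trade) in each period.
in_sample_results = [40.0, 25.0, 10.0, -5.0, 30.0]
out_sample_results = [35.0, 20.0, 12.0, -8.0, 22.0]
print(round(pearson(in_sample_results, out_sample_results), 2))
```

A value close to one would suggest that assets which performed well in-sample also performed well out-of-sample; a value near zero or below would be a warning sign of curve-fitting.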
3. Paper Trading
Paper trading has a bad reputation in the speculative trading world because many traders misuse it as a crutch to avoid putting capital at risk in the markets. Another argument against paper trading is that it is counterproductive because it eliminates the biggest problem in real trading, namely emotional reactions arising during the decision-making processes of trade implementation. Needless to add, these arguments are absolutely right (assuming this is why paper trading is being done).
These disclaimers aside, there is a valid argument for paper trading, namely that it is better to learn real-time implementation of the model on our broker’s trading platform in a test environment as opposed to when real money is on the line. Once we have practiced order entry and trade management on the broker’s platform via paper trading we are ready to dedicate capital to our trading model.
4. Underleveraged Trading
At this stage most system developers transition to “full production”, or the dedication of maximum capital exposures for the model… often with mixed results (some good, many bad). Instead, my recommendation for the fourth step of system development is “underleveraged trading”. If we use the simple rule of risk management of dedicating one percent of assets under management to any single trading idea, then I would argue that our model’s first foray into the real world of trading should be with less than what we will dedicate to the model once it has experienced a statistically significant real-time trading history.
For this stage of underleveraged trading I typically risk around one-tenth of what will be risked once I transition to the final stage of “full production”. So, if using the one percent rule of risk management, at this stage I will only risk one-tenth of one percent of assets under management. Once traders hear this, they argue, “Why bother? With this amount of capital at risk you might as well just keep paper trading.” Having done both, I can assure readers that there is a huge difference and that even though the monetary losses are tiny, underleveraged trading is still an emotional endeavor when contrasted with paper trading, where emotional reactions to profits and losses are rare.
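The arithmetic behind these two risk levels can be sketched as follows. The function name `risk_per_trade` and the account size are hypothetical; the one percent and one-tenth figures are the ones given above.

```python
def risk_per_trade(assets_under_management, full_risk_fraction=0.01, underleverage_factor=0.1):
    """Return (underleveraged_risk, full_production_risk) in currency units."""
    full_risk = assets_under_management * full_risk_fraction
    return full_risk * underleverage_factor, full_risk


# Hypothetical account of 1,000,000 under management.
underleveraged, full_production = risk_per_trade(1_000_000)
print(underleveraged, full_production)
```

For this account the underleveraged stage risks roughly 1,000 per trading idea, against roughly 10,000 once the model moves to full production.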
5. Full Production
The final step in model development is full production. At this stage the model has been fully integrated into our real-time trading with maximum risk exposures. Throughout this final stage we need to maintain emotional equanimity to avoid the problems outlined at the article’s inception (e.g. cherry-picking, not taking entry signals, failures in position management, overleveraging and abandonment of the model). At the same time, we must remain vigilant in our analysis of model results to ensure that performance does not significantly degrade due to a “paradigm shift”, or long-term shift in the dynamics of the assets we are trading (e.g. the paradigm shift in Brent crude oil since 2004, triggered by an increase in emerging market demand).
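How such vigilance is implemented is left open here. Purely as an illustrative sketch, one could compare the average result of the most recent live trades with the back-tested expectancy and flag a large shortfall. The function name `performance_degraded`, the 30-trade window and the 50 percent tolerance are all arbitrary assumptions, not recommendations.

```python
from statistics import mean


def performance_degraded(live_pnls, backtest_expectancy, window=30, tolerance=0.5):
    """True if the recent live average per trade falls below a fraction of the back-tested figure.

    `window` and `tolerance` are arbitrary illustrative settings.
    """
    if backtest_expectancy <= 0:
        raise ValueError("the back-tested expectancy is assumed to be positive")
    if len(live_pnls) < window:
        return False  # not enough live trades yet to judge
    recent_average = mean(live_pnls[-window:])
    return recent_average < tolerance * backtest_expectancy


# Hypothetical live results: thirty trades averaging 10 against a back-tested 34.
print(performance_degraded([10.0] * 30, backtest_expectancy=34.0))
```

A flag from a check like this is only a prompt for closer analysis; it cannot by itself distinguish an ordinary drawdown from a genuine paradigm shift.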
© 2014 Richard L. Weissman.