Stress Testing: Monte Carlo and Beyond

Will your strategy survive a black swan?

Your strategy survives your backtest. The metrics look good across thousands of trades and multiple walk-forward windows. But will it survive a black swan event?

Backtests show you one possible path through history. Reality could have unfolded differently. Monte Carlo simulation and other stress testing techniques show you the range of possible outcomes, including the bad ones that didn't happen in your specific historical data.

What Is Monte Carlo Simulation?

Monte Carlo simulation is a technique for understanding uncertainty: you run many randomized trials of the same process and study the distribution of results rather than a single outcome.

In trading, the most common application is randomizing trade order. Your backtest produced a sequence of trades that happened in a specific order. But that order was partly arbitrary. What if the losing trades had clustered differently? What if you'd hit your worst drawdown earlier when your account was smaller?

Monte Carlo simulation shuffles your trade sequence thousands of times and calculates metrics for each shuffle. Instead of one equity curve, you get thousands of possible equity curves, all using your actual trades but in different orders.

This distribution of outcomes shows you what could have happened, not just what did happen.

Randomizing Trade Order

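The simplest Monte Carlo approach takes your list of trades and shuffles them randomly. Run the shuffled sequence through your equity curve logic. Record the maximum drawdown and other path-dependent metrics, such as time spent underwater. Repeat 1,000+ times. Note that a pure shuffle leaves ending equity unchanged when each trade's profit or loss is fixed, because the total is the same in any order. Ending equity only varies if you resample trades with replacement or stop a run at a ruin threshold.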

Now you have a distribution of maximum drawdowns rather than a single number. Maybe your backtest showed 18% maximum drawdown, but 5% of Monte Carlo runs showed drawdowns exceeding 35%. That tells you a 35% drawdown was entirely possible given your actual edge.

This is crucial for position sizing. If you sized positions for an 18% maximum drawdown and actually experienced 35%, you might blow up. Monte Carlo shows you the tail risks.

The assumption here is that trades are independent: the outcome of one trade doesn't affect the next. For many strategies this is approximately true. For strategies with serial correlation in trades, shuffling destroys the clustering of wins and losses, so you need a more sophisticated simulation that preserves it.
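The shuffle procedure described in this section can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes each trade is a fixed dollar profit or loss, and the function names and the `example_pnls` values are hypothetical.

```python
import random

def max_drawdown(pnls, start_equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    equity = peak = start_equity
    worst = 0.0
    for pnl in pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst

def shuffle_drawdowns(pnls, start_equity=10_000.0, runs=1_000, seed=0):
    """Max drawdown for each of `runs` random orderings of the same trades."""
    rng = random.Random(seed)  # seeded so results are reproducible
    order = list(pnls)         # copy, so the caller's list is not reordered
    results = []
    for _ in range(runs):
        rng.shuffle(order)
        results.append(max_drawdown(order, start_equity))
    return results

# Hypothetical per-trade P&L in dollars, for illustration only.
example_pnls = [120, -80, 200, -150, 90, -60, 310, -220, 75, -40]
drawdowns = shuffle_drawdowns(example_pnls)
```

The result is a list of 1,000 maximum drawdowns, one per ordering, which is the raw material for the distribution discussed here.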

Drawdown Probability Distributions

Monte Carlo results let you answer questions like: what's the probability of a 30% drawdown? How about 40%? 50%?

Plot the distribution of maximum drawdowns from your simulations. The 95th percentile drawdown tells you the worst case you should reasonably plan for. If 95% of simulations had a maximum drawdown no deeper than 25% and only 5% went deeper, you can set a 25% drawdown as your risk planning threshold.

This is far more useful than a single backtest drawdown number. Real trading will face different trade orderings than your backtest showed. Planning for the worst 5% of orderings is prudent risk management.
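As a rough sketch of how these questions could be answered from simulation output, the snippet below computes an exceedance probability and a nearest-rank percentile. The helper names and the `simulated_drawdowns` values are hypothetical, and nearest-rank is only one of several percentile conventions.

```python
import math

def exceedance_probability(drawdowns, threshold):
    """Share of simulations whose max drawdown was deeper than `threshold`."""
    return sum(d > threshold for d in drawdowns) / len(drawdowns)

def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical max drawdowns (as fractions) from ten simulation runs.
simulated_drawdowns = [0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.27, 0.30, 0.35, 0.41]

p_over_30 = exceedance_probability(simulated_drawdowns, 0.30)
planning_drawdown = nearest_rank_percentile(simulated_drawdowns, 95)
```

With only ten runs the 95th percentile is simply the worst run; with thousands of runs it becomes a meaningful planning number.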

You can also examine drawdown duration distributions. How long were you underwater in each simulation? Some strategies have shallow drawdowns that last forever, others have deep drawdowns that recover quickly. Understanding your strategy's drawdown profile helps you psychologically prepare.
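One simple way to measure drawdown duration is to count how many consecutive trades the equity curve spends below its previous peak. The sketch below assumes fixed dollar P&L per trade and measures duration in trades rather than calendar time; the function name is hypothetical.

```python
def longest_underwater(pnls, start_equity=10_000.0):
    """Longest run of consecutive trades spent below the prior equity peak."""
    equity = peak = start_equity
    current = longest = 0
    for pnl in pnls:
        equity += pnl
        if equity >= peak:
            peak = equity   # new high (or back to the old one): drawdown over
            current = 0
        else:
            current += 1    # still underwater after this trade
            longest = max(longest, current)
    return longest

# Hypothetical trade sequence: underwater for two trades, then one.
underwater_trades = longest_underwater([100, -50, 20, 40, -10, 30])
```

Applying this to every shuffled sequence gives a duration distribution alongside the depth distribution.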

Regime Stress Testing

Monte Carlo tests what could have happened given your actual trades. Regime stress testing asks a different question: what happens if market conditions change?

Your strategy might have been backtested primarily in bull markets or primarily in ranging markets. How does it perform in conditions outside your backtest period?

One approach is to identify historical periods with extreme characteristics (severe crashes, melt-ups, volatility spikes) and see how your strategy performed during those specific times. If your backtest period didn't include a March 2020-style crash, manually test against that period.

Another approach is to synthetically create extreme conditions. Take your backtest trades and ask: what if every losing trade had been 50% worse? What if winning trades had been 30% smaller? This bounds your performance under adverse scenarios.
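The synthetic adjustment above amounts to scaling losers and winners by different factors. A minimal sketch, assuming a list of dollar P&L values; the function name, parameter names, and sample trades are hypothetical:

```python
def stress_trades(pnls, loss_mult=1.5, win_mult=0.7):
    """Make every loser 50% worse and every winner 30% smaller by default."""
    return [p * loss_mult if p < 0 else p * win_mult for p in pnls]

# Hypothetical trades, for illustration only.
base_trades = [200, -100, 150, -50, 300]
stressed_trades = stress_trades(base_trades)

base_net = sum(base_trades)          # net P&L as backtested
stressed_net = sum(stressed_trades)  # net P&L under the adverse scenario
```

Comparing `base_net` with `stressed_net` shows how much of the edge survives the adverse assumptions; the stressed list can also be fed into a Monte Carlo shuffle.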

Our regime testing focuses on BTC regime filtering. Strategies are evaluated separately in bull regimes, bear regimes, and sideways regimes. A strategy that only works in bull markets isn't invalid, but you need to know its limitations.

When to Kill a Strategy

Stress testing helps answer the hardest question in quantitative trading: when do you abandon a strategy?

Every strategy has losing periods. The question is whether those losses fall within expected parameters or signal that the edge has disappeared.

If your Monte Carlo simulation showed 5% probability of a 30% drawdown, and you experience a 30% drawdown, that's within expectations. You keep trading. If you experience a 50% drawdown that only 0.1% of simulations predicted, something has likely changed. The strategy may have stopped working.

We set explicit kill criteria before deploying any strategy. A drawdown beyond the 99th percentile of Monte Carlo simulations triggers review. Performance that stays below the 10th percentile of expected outcomes for an extended period triggers review. A run of consecutive losing trades more than 3 standard deviations longer than expected triggers review.
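Two of these criteria can be expressed as a small check against simulation output. This is a sketch under stated assumptions, not our production logic: it reads "3 standard deviations from expected" as the mean plus three standard deviations of the simulated longest losing streaks, and all names and sample values are hypothetical.

```python
import math
import statistics

def nearest_rank(values, pct):
    """Nearest-rank percentile of a non-empty list, with pct in (0, 100]."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(pct * len(ordered) / 100)) - 1]

def review_triggers(live_drawdown, live_losing_streak,
                    sim_drawdowns, sim_losing_streaks):
    """Return the list of pre-defined criteria the live results have breached."""
    triggers = []
    if live_drawdown > nearest_rank(sim_drawdowns, 99):
        triggers.append("drawdown beyond 99th percentile")
    streak_limit = (statistics.mean(sim_losing_streaks)
                    + 3 * statistics.pstdev(sim_losing_streaks))
    if live_losing_streak > streak_limit:
        triggers.append("losing streak beyond 3 standard deviations")
    return triggers

# Hypothetical simulation output and live results.
sim_dd = [0.10, 0.15, 0.20, 0.25, 0.30]
sim_streaks = [3, 4, 5, 4, 3, 5, 4, 4]
breached = review_triggers(0.35, 7, sim_dd, sim_streaks)
```

Fixing the thresholds before deployment means the check is a lookup, not a judgment call made mid-drawdown.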

Having pre-defined criteria removes emotion from the decision. When you're in a drawdown, you can compare current performance to expected parameters rather than panicking or hoping.

Practical Stress Testing Implementation

Here's a simplified Monte Carlo process:

Export your backtest trades as a list with the profit or loss for each trade. Write a script that shuffles this list randomly, then simulates equity growth from your starting capital (for example, $10,000). Calculate the maximum drawdown for each run, plus ending equity if you resample trades rather than only reordering them. Repeat 5,000 times. Plot the distribution of outcomes.

More sophisticated approaches include bootstrapping (resampling with replacement), block bootstrapping (preserving some trade clustering), and parametric simulation (modeling trade returns as a distribution and sampling from it).
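The first two variants differ from a plain shuffle only in how the trade sequence is drawn. A minimal sketch, assuming a list of per-trade P&L values; the function names, the block size, and the sample trades are hypothetical, and this block bootstrap is the simple fixed-length version:

```python
import random

def bootstrap_sample(pnls, rng):
    """Resample with replacement: same length, trades may repeat or drop out."""
    return rng.choices(pnls, k=len(pnls))

def block_bootstrap_sample(pnls, block_size, rng):
    """Resample contiguous blocks with replacement to keep local clustering.

    Assumes 1 <= block_size <= len(pnls).
    """
    n = len(pnls)
    sample = []
    while len(sample) < n:
        start = rng.randrange(n - block_size + 1)
        sample.extend(pnls[start:start + block_size])
    return sample[:n]  # trim the final block to the original length

rng = random.Random(42)
trades = [120, -80, 200, -150, 90, -60, 310, -220, 75, -40]  # hypothetical
resampled = bootstrap_sample(trades, rng)
blocked = block_bootstrap_sample(trades, block_size=3, rng=rng)
```

Because trades can repeat or be left out, ending equity now varies between runs, unlike a pure reordering.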

For regime testing, segment your trades by market condition and analyze performance separately. Then stress test by reducing winner sizes or increasing loser sizes to see sensitivity.
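Segmenting by regime can be as simple as grouping trades by a label and summarizing each group. The sketch below assumes each trade already carries a regime label; how that label is assigned is outside its scope, and the names and sample data are hypothetical.

```python
from collections import defaultdict

def summarize_by_regime(trades):
    """Summarize (regime_label, pnl) pairs into per-regime statistics."""
    buckets = defaultdict(list)
    for regime, pnl in trades:
        buckets[regime].append(pnl)
    return {
        regime: {
            "trades": len(pnls),
            "net_pnl": sum(pnls),
            "win_rate": sum(p > 0 for p in pnls) / len(pnls),
        }
        for regime, pnls in buckets.items()
    }

# Hypothetical labeled trades, for illustration only.
labeled_trades = [
    ("bull", 200), ("bull", -50), ("bull", 150), ("bull", 100),
    ("bear", -120), ("bear", 40),
    ("sideways", -30), ("sideways", 20),
]
regime_summary = summarize_by_regime(labeled_trades)
```

A strategy whose net P&L comes almost entirely from one regime is the kind of limitation this breakdown is meant to expose.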

Key Takeaways

Monte Carlo simulation randomizes trade order to show the range of possible outcomes, not just what happened in your backtest.

Drawdown probability distributions reveal tail risks that single backtest numbers hide.

Plan for the 95th percentile drawdown, not the backtest drawdown.

Regime stress testing examines performance under conditions outside your primary backtest period.

Set explicit kill criteria before deployment so you can evaluate drawdowns objectively.

This completes the Validation level. You now understand why backtests lie, how lookahead bias and overfitting create fake edges, how walk-forward validation reveals truth, why sample size matters, which metrics to trust, and how to stress test for tail risks. Next, we move to Discovery, where you'll learn how to systematically find edges that pass all these validation filters.