Kieran Duff
Frameworks · Letter 002 · 22 May 2026

Don't trust your backtest: the stress test most traders skip.

Walk-forward, Monte Carlo and parameter sensitivity are necessary. None of them test what your strategy actually does when the market breaks. Historical-break testing is the layer most managers skip.

The short version

Many systematic books are tested with walk-forward analysis, Monte Carlo bootstrap and parameter sensitivity. All three are necessary. None of them tests what your strategy actually does when the market breaks.

Historical-break testing is the layer most retail systematic managers skip, and it's the layer that decides whether your live book survives the next regime crack.

Walk-forward tests robustness across different time windows. Monte Carlo tests robustness across alternative trade orderings. Parameter sensitivity tests robustness across the knobs you tuned. None of those three asks the question that matters most: when the market does the thing it has done before and will do again, where does the strategy sit?
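To make the gap concrete, here is a minimal sketch of the trade-reordering flavour of Monte Carlo, assuming a hypothetical list of per-trade P&Ls (the numbers and function names are illustrative, not taken from any live book). It shows why reordering cannot answer the break question: every simulated path is built from trades that already happened, so the worst case it can produce is bounded by the losses already in the sample.

```python
import random

def max_drawdown(pnls):
    """Largest peak-to-trough drop of the cumulative P&L curve."""
    equity = peak = worst = 0.0
    for pnl in pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst

def reordered_drawdowns(pnls, n_paths=1000, seed=0):
    """Shuffle the same trades many times and record each path's max drawdown."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_paths):
        path = list(pnls)
        rng.shuffle(path)
        results.append(max_drawdown(path))
    return results

# Hypothetical per-trade P&L, for illustration only.
trades = [120.0, -80.0, 60.0, -150.0, 200.0, -40.0, 90.0, -110.0]
drawdowns = reordered_drawdowns(trades)

# No ordering can lose more than every losing trade placed back to back.
ceiling = sum(-p for p in trades if p < 0)
print(max(drawdowns), ceiling)
```

A gap fill that never appeared in the trade list cannot appear in any shuffled path either, which is the blind spot the historical replay is meant to cover.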

The four breaks every FX, indices and metals book should be replayed against

Swiss franc unpeg, 15 January 2015. The SNB removed the EURCHF floor and EURCHF moved roughly 30% in minutes. This is the canonical test for FX strategies that hold positions through low-vol regimes. What position was the strategy holding into the event? What was the realised slippage at execution? Did the stop trigger and fill, or did it gap through and fill 200 pips deeper? Did the portfolio risk engine cut size on adjacent CHF crosses fast enough to contain the cascade? If your strategy has no answer to any of those four, it hasn't been stress-tested. It's been backtested.
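The stop question is the one a naive backtest most often gets wrong, because it assumes the stop fills at the stop price. A rough sketch of a gap-aware fill for a long position follows. The tick values are made up for illustration (they are not the real EURCHF tape), and `stop_fill_long` and `slippage_pips` are hypothetical helper names.

```python
def stop_fill_long(ticks, stop):
    """Fill a long position's stop at the first tradable price at or below it.

    Returns None if the stop never triggers.
    """
    for price in ticks:
        if price <= stop:
            return price
    return None

def slippage_pips(stop, fill, pip=0.0001):
    """Adverse slippage on a long stop, in pips (positive = worse than the stop)."""
    return (stop - fill) / pip

# Illustrative tape: the market trades above the stop, then gaps straight through it.
ticks = [1.2010, 1.2005, 1.1750, 1.1600]
stop = 1.1950

fill = stop_fill_long(ticks, stop)
print(fill, round(slippage_pips(stop, fill)))
```

The point of the sketch is the fill rule: the stop executes at the next price that actually traded, not at the level you asked for.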

PBOC CNY devaluation, 11 August 2015. The PBOC changed the daily fix and the yuan fell nearly 2% on the first day and weakened further on the second, dragging AUD, NZD, copper, and risk indices with it. Even if you don't trade CNY, what did your AUD/JPY, copper, or DAX position do during the move? Strategies that look uncorrelated in calm regimes can reveal a shared USD-bloc or risk-on factor exposure under contagion. If you don't have a replay against this period, you don't know your cross-asset behaviour.
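One simple way to look for that hidden shared factor is to compare the correlation of two strategies' daily returns in a calm window against the event window. The sketch below assumes hypothetical daily returns in percent; the series are invented for illustration, and plain Pearson correlation is one choice among several.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length return series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical daily returns (%) for two strategies, illustration only.
calm_a = [0.10, -0.20, 0.15, -0.05, 0.00, 0.10]
calm_b = [-0.10, 0.05, 0.10, -0.15, 0.20, -0.05]
event_a = [-1.2, -0.8, 0.3, -1.5]
event_b = [-0.9, -1.1, 0.2, -1.3]

calm_corr = pearson(calm_a, calm_b)
event_corr = pearson(event_a, event_b)
print(round(calm_corr, 2), round(event_corr, 2))
```

If the event-window figure jumps while the calm-window figure sits near zero, the two strategies are probably one position under stress, and portfolio risk should be sized accordingly.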

COVID volatility spike, 11-23 March 2020. VIX hit 82.69 on 16 March 2020, SPX dropped roughly 12% in a single session that day, and FX vol surfaces lifted across the dollar bloc. Did vol-targeted sizing cut position size fast enough? What happened to mean-reversion strategies that assume the prior regime persists? Did your CFD broker widen spreads to a point where your strategy economics broke down?
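Whether vol-targeted sizing cuts fast enough depends heavily on the lookback used to estimate realised vol. The sketch below assumes a simple rule (scale = target vol divided by trailing realised vol, capped at 1.0) and a synthetic return series with a calm stretch followed by a shock. The rule, the thresholds and the data are all assumptions for illustration, not the sizing logic of any particular book.

```python
import statistics

def vol_target_scale(returns, lookback, target_vol, max_scale=1.0):
    """Position scale = target vol / trailing realised vol, capped at max_scale."""
    realised = statistics.pstdev(returns[-lookback:])
    if realised == 0:
        return max_scale
    return min(max_scale, target_vol / realised)

def sessions_to_cut(returns, shock_start, lookback, target_vol, threshold=0.5):
    """Sessions after the shock begins until the scale falls to the threshold."""
    for day in range(shock_start, len(returns)):
        # Sizing for the next session only sees returns up to and including `day`.
        scale = vol_target_scale(returns[: day + 1], lookback, target_vol)
        if scale <= threshold:
            return day - shock_start + 1
    return None  # never cut within the sample

# Synthetic daily returns: 60 calm sessions, then 10 shock sessions.
calm = [0.005, -0.005] * 30
shock = [-0.04, 0.04] * 5
returns = calm + shock

fast = sessions_to_cut(returns, len(calm), lookback=5, target_vol=0.005)
slow = sessions_to_cut(returns, len(calm), lookback=60, target_vol=0.005)
print(fast, slow)
```

On this synthetic series the short lookback halves size sooner than the long one. The same comparison on the March 2020 tape tells you how many sessions of full-size exposure your own lookback would have cost.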

US election overnight, 8-9 November 2016. Spot moves of 3% in major FX pairs within 60 minutes, index futures hitting their limit-down circuit breakers, gold spiking then collapsing. What did your strategy do between the close on Tuesday and the open on Wednesday, with no ability to manage positions in the intervening hours?

How the replay works

For each event, you reconstruct what your live strategy would have done across the event window using the same parameters, the same risk rules, the same execution assumptions you're using today. Not optimised for the event, not curve-fit to make the result look good. The strategy as it stands, run cold through the event tape.

You're looking for four outputs per event: maximum intraday drawdown during the event; total event-window P&L; maximum slippage taken on any single trade; and risk-engine response time, measured as the lag between the vol event starting and your sizing cutting in response.
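As a rough sketch, the four outputs can be computed from a replayed equity curve, the list of fills, and two timestamps. The `Fill` record, the field names and the sample numbers below are hypothetical; a real replay would pull them from your own execution log.

```python
from dataclasses import dataclass

@dataclass
class Fill:
    intended: float  # price the strategy asked for
    executed: float  # price it actually got
    side: int        # +1 for a buy, -1 for a sell

def event_report(equity, fills, vol_start_min, size_cut_min):
    """Summarise one event replay into the four outputs described above."""
    peak = equity[0]
    max_dd = 0.0
    for value in equity:
        peak = max(peak, value)
        max_dd = max(max_dd, peak - value)
    # Adverse slippage: buying above, or selling below, the intended price.
    max_slip = max((f.side * (f.executed - f.intended) for f in fills), default=0.0)
    return {
        "max_intraday_drawdown": max_dd,
        "event_pnl": equity[-1] - equity[0],
        "max_slippage": max_slip,
        "risk_response_min": size_cut_min - vol_start_min,
    }

# Hypothetical intraday equity samples and fills for one event window.
equity = [100.0, 98.0, 103.0, 91.0, 95.0]
fills = [Fill(1.2000, 1.1990, -1), Fill(1.1900, 1.1905, +1)]
report = event_report(equity, fills, vol_start_min=2, size_cut_min=7)
print(report)
```

Keeping the report as one small record per event makes it easy to line up all four events side by side for each strategy.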

The strategies that have a chance at five-year survival are the ones that come through the replay with a maximum event drawdown inside the strategy's 99th-percentile backtested drawdown, and a risk response time shorter than the duration of the vol expansion itself.
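The two survival conditions translate into a short pass/fail check. This sketch assumes the 99th-percentile backtested drawdown is taken from a distribution of simulated drawdowns (for example, Monte Carlo paths) and that both times are measured in minutes. Those choices, and the name `passes_replay`, are assumptions rather than a fixed definition.

```python
import statistics

def passes_replay(event_dd, backtest_dds, response_min, expansion_min):
    """True if the event drawdown is inside the 99th-percentile backtested
    drawdown and the risk engine reacted before the vol expansion ended."""
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    dd_limit = statistics.quantiles(backtest_dds, n=100, method="inclusive")[98]
    return event_dd <= dd_limit and response_min < expansion_min

# Hypothetical distribution of backtested drawdowns, illustration only.
backtest_dds = [float(x) for x in range(1, 102)]

ok = passes_replay(90.0, backtest_dds, response_min=5, expansion_min=20)
too_deep = passes_replay(100.5, backtest_dds, response_min=5, expansion_min=20)
too_slow = passes_replay(90.0, backtest_dds, response_min=30, expansion_min=20)
print(ok, too_deep, too_slow)
```

Both conditions have to hold: a shallow drawdown with a slow risk response passed by luck, not by design.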

For the XAQP book, every strategy added to the portfolio is replayed against these four events before live deployment, in addition to walk-forward and Monte Carlo. The strategies that fail the replay don't go live. The replay is the cheapest test in the suite and it surfaces the most failure modes.

Kieran Duff runs XAQP, a systematic strategy live since April 2025 with $3.7M+ in capital through Darwinex. He writes about how a systematic book is actually managed.

Capital at Risk. Past performance is not indicative of future results. Nothing in this letter constitutes investment advice, a solicitation, or an offer to buy or sell any financial instrument.

Performance figures are before fees (gross), denominated in USD, and reflect the live track record of XAQP since inception on 28 April 2025, as managed under Darwinex (Tradeslide Technologies Ltd). Returns are gross of costs; actual investor returns will be lower after fees.
