The Gap Between Backtest and Live
You've found an edge. Walk-forward validation passed. Robustness checks passed. Economic rationale makes sense.
Now what?
The worst mistake is jumping straight to live trading with full size. Backtests, no matter how rigorous, are simulations. Live markets have frictions, latencies, and surprises that backtests don't capture.
This lesson covers our promotion pipeline—the systematic process that takes edges from discovery through paper trading to live production.
The Promotion Pipeline
Our pipeline has four stages:
Stage 1: Discovery. The edge passes statistical validation and exists in our database as a "candidate."
Stage 2: Paper Trading. The edge runs in production systems but executes no real trades. Signals fire and would-be P&L is tracked, but no capital is at risk.
Stage 3: Small Live. The edge trades with minimal position sizes. Real money is at risk, but exposure is limited. This is the learning phase.
Stage 4: Full Live. The edge trades at intended position sizes with full production status.
Each stage has entry criteria and exit criteria. Edges can be promoted forward or demoted back.
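The forward and backward movement between stages can be sketched as a small state machine. This is a minimal illustration, not our production code: the `Stage`, `promote`, and `demote` names are hypothetical, and it assumes a demotion moves an edge back exactly one stage, which the pipeline description does not specify.

```python
from enum import Enum


class Stage(Enum):
    DISCOVERY = 1
    PAPER = 2
    SMALL_LIVE = 3
    FULL_LIVE = 4


def promote(stage: Stage) -> Stage:
    """Move one stage forward; FULL_LIVE is the final stage."""
    return Stage(min(stage.value + 1, Stage.FULL_LIVE.value))


def demote(stage: Stage) -> Stage:
    """Move one stage back (assumption: one step at a time)."""
    return Stage(max(stage.value - 1, Stage.DISCOVERY.value))
```

Keeping the transitions in one place makes it harder for an edge to skip a stage by accident.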
Stage 1: Discovery Exit Criteria
To exit discovery and enter paper trading, an edge must pass:
- Walk-forward validation: In-sample and out-of-sample performance both acceptable
- Sample size: Minimum 500 events (preferably 1,000+)
- Win rate: Above threshold for its type (e.g., >60% for mean reversion)
- Profit factor: Above 1.5
- Parameter stability: Performance degrades gracefully with parameter changes
- Economic rationale: Explainable mechanism
- Regime documentation: Clear about which market conditions it requires
- No redundancy: Not duplicating existing live edges
Passing edges get assigned unique IDs and enter the paper trading queue.
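The quantitative criteria in the list above (sample size, win rate, profit factor) lend themselves to an automated gate. The sketch below is illustrative only: the `Candidate` class and `failed_quant_checks` function are hypothetical names, and only the mean-reversion win-rate threshold comes from this lesson. The qualitative criteria (economic rationale, regime documentation, redundancy) still need human review.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    edge_type: str         # e.g. "mean_reversion"
    n_events: int          # number of historical events in the backtest
    win_rate: float        # fraction between 0 and 1
    profit_factor: float   # gross profit divided by gross loss


MIN_EVENTS = 500
MIN_PROFIT_FACTOR = 1.5
# Per-type win-rate thresholds; only the mean-reversion value is from the text.
MIN_WIN_RATE = {"mean_reversion": 0.60}


def failed_quant_checks(c: Candidate) -> list[str]:
    """Return the names of the quantitative criteria the candidate fails."""
    failures = []
    if c.n_events < MIN_EVENTS:
        failures.append("sample_size")
    threshold = MIN_WIN_RATE.get(c.edge_type)
    if threshold is None:
        failures.append("unknown_edge_type")
    elif c.win_rate <= threshold:
        failures.append("win_rate")
    if c.profit_factor <= MIN_PROFIT_FACTOR:
        failures.append("profit_factor")
    return failures
```

Returning the list of failed checks, rather than a single boolean, leaves a record of why a candidate was rejected.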
Stage 2: Paper Trading
Paper trading is production infrastructure without real execution.
What Happens:
- Edge loaded into signal scanner
- Signals fire according to edge rules
- Hypothetical entries and exits recorded
- P&L calculated as if trades executed
- Latency and timing tracked
Why It Matters:
- Validates that production code matches backtest logic
- Catches data pipeline issues (stale data, missing feeds)
- Reveals operational problems (signal timing, execution windows)
- Provides forward performance data
Duration: Minimum 30 days or 30 signals, whichever takes longer. Some edges signal frequently and reach 30 signals in two weeks, so the 30-day floor applies. Others are rare and need months.
Exit Criteria to Promote:
- Forward performance within 20% of backtest expectations
- No operational issues
- Signal quality maintained
- At least 30 signals observed
Exit Criteria to Demote:
- Forward performance significantly worse than backtest
- Operational issues that can't be resolved
- Changing market conditions invalidating the edge
Paper trading is cheap—no money at risk. Stay here until confident.
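The measurable parts of the promotion decision can be sketched as a single check. This is a hedged illustration: `paper_ready` is a hypothetical helper, it uses the 30-day floor and the 30-signal promotion criterion, and it assumes "within 20%" means the relative deviation of one chosen metric (win rate here) from its backtest value. Operational issues and signal quality still need manual sign-off.

```python
def paper_ready(days: int, signals: int, forward: float, backtest: float,
                min_days: int = 30, min_signals: int = 30,
                tolerance: float = 0.20) -> bool:
    """True when duration, signal count, and forward-vs-backtest deviation all pass."""
    if backtest == 0:
        raise ValueError("backtest metric must be non-zero")
    if days < min_days or signals < min_signals:
        return False
    relative_deviation = abs(forward - backtest) / abs(backtest)
    return relative_deviation <= tolerance
```

For instance, a 65% forward win rate against a 68% backtest is a relative deviation of about 4.4%, well inside the tolerance.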
Stage 3: Small Live
First real money. Position sizes are 15-25% of intended final size.
Purpose:
- Validate execution quality (slippage, fill rates)
- Experience real P&L psychology
- Catch issues that only appear with real orders
What We Track:
- Actual vs expected entry prices
- Actual vs expected exit prices
- Slippage cost per trade
- Fill rate (what % of signals actually execute)
- Real P&L vs paper P&L
Duration: Minimum 30 days or 30 real trades.
Position Sizing Logic: If the intended final size is 2% of portfolio per signal, small live uses 0.3-0.5% per signal (15-25% of target). That is enough to be real, yet small enough to be survivable if the edge fails.
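The arithmetic behind that range is simple enough to write down. In this sketch, `small_live_range` is a hypothetical helper, and the 0.15 and 0.25 fractions are inferred from the 2% target and the 0.3-0.5% small-live figures.

```python
def small_live_range(target_pct: float, low_frac: float = 0.15,
                     high_frac: float = 0.25) -> tuple[float, float]:
    """Return the (low, high) small-live size in percent of portfolio per signal."""
    if target_pct <= 0:
        raise ValueError("target size must be positive")
    return target_pct * low_frac, target_pct * high_frac


low, high = small_live_range(2.0)
print(f"small live size: {low:.2f}% to {high:.2f}% of portfolio per signal")
```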
Exit Criteria to Promote:
- Real performance tracks paper performance (within 25%)
- Slippage acceptable (<0.2% typically)
- Execution quality stable
- No negative surprises
Exit Criteria to Demote:
- Significant underperformance vs paper
- Execution issues (poor fills, high slippage)
- Edge showing decay
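Slippage and fill rate, two of the tracked metrics, can be computed from a log of signals and fills. The sketch below makes several assumptions: the `SignalFill` record and both function names are hypothetical, slippage is measured as a percentage of the expected price, and a positive value always means a worse fill than expected (paying more on a buy, receiving less on a sell).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class SignalFill:
    side: str                      # "buy" or "sell"
    expected_price: float          # price assumed by the paper model
    actual_price: Optional[float]  # None when the signal never filled


def slippage_pct(fill: SignalFill) -> float:
    """Slippage as a percent of expected price; positive means a worse fill."""
    direction = 1.0 if fill.side == "buy" else -1.0
    return direction * (fill.actual_price - fill.expected_price) / fill.expected_price * 100.0


def execution_summary(fills: list[SignalFill]) -> dict:
    """Fill rate and average slippage across a batch of signals."""
    if not fills:
        raise ValueError("need at least one signal")
    executed = [f for f in fills if f.actual_price is not None]
    return {
        "fill_rate": len(executed) / len(fills),
        "avg_slippage_pct": mean(slippage_pct(f) for f in executed) if executed else None,
    }
```

The average slippage from this summary is the number to compare against the 0.2% promotion threshold.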
Stage 4: Full Live
Edge runs at intended position sizes with full confidence.
Ongoing Monitoring:
- Rolling performance vs historical expectations
- Regime alignment (is edge in its valid regime?)
- Signal frequency vs expected
- Drawdown tracking
Red Flags:
- Performance degrading over time (edge decay)
- Signal frequency dropping (market conditions changing)
- Unexpected correlation with other edges (concentration risk)
Even full-live edges face regular review. No edge runs forever without scrutiny.
Kill Criteria: When to Stop Trading an Edge
Edges die. Markets change. What worked stops working.
Kill criteria (any one of these triggers suspension and review):
Performance-Based:
- Rolling 30-day win rate drops below 50%
- Drawdown exceeds the edge's historical max drawdown by 20%
- Three consecutive months of negative performance
Statistical:
- Recent performance statistically worse than backtest (>2 standard deviations below expected)
- Win rate confidence interval no longer contains backtest win rate
Operational:
- Regime requirements no longer met for extended period
- Data source becomes unavailable or unreliable
- Exchange changes invalidate execution assumptions
Qualitative:
- Market structure change that invalidates edge thesis
- Regulatory changes affecting the edge's mechanism
- Competitor activity suggests edge is now crowded
When a kill criterion triggers, the edge moves to "suspended" status. It stops trading immediately but can be re-evaluated later.
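Some of the performance-based and statistical criteria above can be checked mechanically. The following is a sketch under stated assumptions, not our exact test: `win_rate_z` and `kill_flags` are hypothetical names, and the statistical check treats each trade as an independent win/loss with the backtest win rate as its probability (a normal approximation to the binomial), flagging a recent win rate more than 2 standard errors below the backtest value.

```python
import math


def win_rate_z(wins: int, n: int, backtest_win_rate: float) -> float:
    """Z-score of the recent win rate against the backtest win rate."""
    if n <= 0 or not 0.0 < backtest_win_rate < 1.0:
        raise ValueError("need n > 0 and a backtest win rate strictly between 0 and 1")
    standard_error = math.sqrt(backtest_win_rate * (1.0 - backtest_win_rate) / n)
    return (wins / n - backtest_win_rate) / standard_error


def kill_flags(recent_wins: int, recent_n: int, backtest_win_rate: float,
               monthly_pnl: list[float]) -> list[str]:
    """Return the mechanical kill criteria that currently trigger."""
    flags = []
    # Performance-based: rolling-window win rate below 50%.
    if recent_wins / recent_n < 0.50:
        flags.append("win_rate_below_50pct")
    # Performance-based: the three most recent months all negative.
    if len(monthly_pnl) >= 3 and all(p < 0 for p in monthly_pnl[-3:]):
        flags.append("three_negative_months")
    # Statistical: more than 2 standard errors below the backtest win rate.
    if win_rate_z(recent_wins, recent_n, backtest_win_rate) < -2.0:
        flags.append("stat_underperformance")
    return flags
```

With 50 recent trades and a 68% backtest win rate, a 54% recent win rate already sits a little more than 2 standard errors below expectation, even though it is still above the 50% floor. The operational and qualitative criteria are not covered here and need human judgment.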
Position Sizing During Transitions
Smart position sizing manages promotion risk:
Discovery → Paper: No position (paper only)
Paper → Small Live: 15-25% of target size
Small Live → Full Live: Graduated increase
- Week 1-2: 25% of target
- Week 3-4: 50% of target
- Week 5-6: 75% of target
- Week 7+: 100% of target
This graduated approach limits damage if something goes wrong during transition.
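The graduated schedule maps directly to a lookup by week number. In this sketch, `ramp_fraction` is a hypothetical helper and weeks are counted from 1 at the start of the Small Live → Full Live transition.

```python
def ramp_fraction(week: int) -> float:
    """Fraction of target position size allowed in a given week of the ramp."""
    if week < 1:
        raise ValueError("week numbering starts at 1")
    if week <= 2:
        return 0.25
    if week <= 4:
        return 0.50
    if week <= 6:
        return 0.75
    return 1.00


# Example: position size per signal for a 2% target over the first eight weeks.
for week in range(1, 9):
    print(week, f"{2.0 * ramp_fraction(week):.2f}%")
```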
The Promotion Database
Every edge's journey is tracked:
| Field | Description |
|---|---|
| edge_id | Unique identifier |
| current_stage | DISCOVERY/PAPER/SMALL_LIVE/FULL_LIVE/SUSPENDED |
| stage_entry_date | When current stage started |
| paper_start_date | When paper trading began |
| paper_signals | Count of paper signals |
| paper_pnl | Cumulative paper P&L |
| live_start_date | When real trading began |
| live_signals | Count of real signals |
| live_pnl | Cumulative real P&L |
| notes | Observations, issues, decisions |
This history enables analysis: How long do edges typically survive? What predicts edge failure? Which discovery methods produce durable edges?
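One way to hold these fields is a single relational table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name `edge_promotions`, the column types, and the sample row (`edge-001`, its date) are assumptions for illustration; only the field names and stage values come from the table above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE edge_promotions (
    edge_id          TEXT PRIMARY KEY,
    current_stage    TEXT NOT NULL CHECK (current_stage IN
                     ('DISCOVERY', 'PAPER', 'SMALL_LIVE', 'FULL_LIVE', 'SUSPENDED')),
    stage_entry_date TEXT NOT NULL,      -- ISO 8601 date string
    paper_start_date TEXT,
    paper_signals    INTEGER DEFAULT 0,
    paper_pnl        REAL DEFAULT 0.0,
    live_start_date  TEXT,
    live_signals     INTEGER DEFAULT 0,
    live_pnl         REAL DEFAULT 0.0,
    notes            TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Hypothetical sample row: an edge that has just entered paper trading.
conn.execute(
    "INSERT INTO edge_promotions (edge_id, current_stage, stage_entry_date, paper_start_date) "
    "VALUES (?, ?, ?, ?)",
    ("edge-001", "PAPER", "2024-01-15", "2024-01-15"),
)

# Example analysis query: how many edges sit in each stage?
stage_counts = dict(
    conn.execute(
        "SELECT current_stage, COUNT(*) FROM edge_promotions GROUP BY current_stage"
    ).fetchall()
)
```

The `CHECK` constraint rejects any stage value outside the five listed, which keeps typos out of later survival analysis.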
Example: Edge Promotion Timeline
Real example (anonymized):
Day 0: Edge discovered. Funding + OI divergence in bull regime. 68% backtest win rate, 1,247 samples.
Day 1: Passed validation review. Entered paper trading queue.
Day 14: Paper trading started after queue cleared.
Day 45: Paper trading complete. 37 signals, 65% win rate, 3 percentage points below backtest. Acceptable variance.
Day 46: Promoted to small live at 20% size.
Day 78: Small live complete. 24 real trades, 63% win rate, slippage 0.08%. Real P&L 94% of paper expectation.
Day 79: Began graduated promotion to full size.
Day 107: Full live at 100% size.
Day 240: Still running. Rolling 90-day: 61% win rate, slight underperformance but within bounds.
Day 312: Win rate dropped to 54% over 60-day window. Yellow flag raised.
Day 350: Win rate continued decline. Suspended for review.
Day 380: Post-mortem: Market regime shifted, edge no longer valid in new conditions. Retired permanently.
Total lifespan: ~11 months. This is typical—edges don't last forever.
Continuous Improvement Without Overfitting
Edges can be improved, but carefully:
Allowed:
- Regime filters tightened based on live observation
- Exit timing adjusted based on execution data
- Position sizing refined based on realized volatility
Dangerous:
- Threshold parameters changed to improve recent performance
- New factors added to filter recent bad signals
- Logic changed to explain away losses
The difference: changes based on generalizable insights vs changes that curve-fit recent data.
When in doubt, don't modify. Deploy a new edge version and test it fresh through the pipeline rather than patching a live edge.
Key Takeaways
- Never go straight from backtest to full-size live trading
- Pipeline: Discovery → Paper → Small Live → Full Live
- Each stage has entry and exit criteria with minimum durations
- Position sizing increases gradually during transitions
- Kill criteria define when to stop trading an edge
- Track every edge's journey for learning and analysis
- Edges have finite lifespans: continuous monitoring required
- Improve carefully to avoid re-fitting to recent noise
The promotion pipeline is your risk management for new strategies. It costs time and opportunity but prevents disasters. A disciplined pipeline is what separates professional operations from gambling with backtest results.