The Gap Between Backtest and Live

You've found an edge. Walk-forward validation passed. Robustness checks passed. Economic rationale makes sense.

Now what?

The worst mistake is jumping straight to live trading with full size. Backtests, no matter how rigorous, are simulations. Live markets have frictions, latencies, and surprises that backtests don't capture.

This lesson covers our promotion pipeline—the systematic process that takes edges from discovery through paper trading to live production.

The Promotion Pipeline

Our pipeline has four stages:

Stage 1: Discovery. The edge passes statistical validation and exists in our database as a "candidate."

Stage 2: Paper Trading. The edge runs in production systems but executes no real trades. Signals fire and would-be P&L is tracked, but no capital is at risk.

Stage 3: Small Live. The edge trades with minimal position sizes. Real money is at risk, but exposure is limited. This is the learning phase.

Stage 4: Full Live. The edge trades at intended position sizes. Full production status.

Each stage has entry criteria and exit criteria. Edges can be promoted forward or demoted back.
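One way to make "promoted forward or demoted back" concrete is a small transition map. The sketch below is illustrative only: the stage names match the statuses used in our promotion database, but the exact set of allowed moves (one step forward, one step back, or suspension) is an assumption, not a description of the production system.

```python
from enum import Enum


class Stage(Enum):
    DISCOVERY = "DISCOVERY"
    PAPER = "PAPER"
    SMALL_LIVE = "SMALL_LIVE"
    FULL_LIVE = "FULL_LIVE"
    SUSPENDED = "SUSPENDED"


# Assumed rules: an edge moves one step forward, one step back, or is suspended.
ALLOWED_TRANSITIONS = {
    Stage.DISCOVERY: {Stage.PAPER},
    Stage.PAPER: {Stage.SMALL_LIVE, Stage.DISCOVERY, Stage.SUSPENDED},
    Stage.SMALL_LIVE: {Stage.FULL_LIVE, Stage.PAPER, Stage.SUSPENDED},
    Stage.FULL_LIVE: {Stage.SMALL_LIVE, Stage.SUSPENDED},
    # Assumption: a re-evaluated edge restarts from discovery.
    Stage.SUSPENDED: {Stage.DISCOVERY},
}


def can_transition(current: Stage, target: Stage) -> bool:
    """Return True if moving from `current` to `target` is an allowed step."""
    return target in ALLOWED_TRANSITIONS[current]
```

Under these assumed rules, skipping a stage (for example DISCOVERY straight to FULL_LIVE) is rejected by construction.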

Stage 1: Discovery Exit Criteria

To exit discovery and enter paper trading, an edge must pass:

•Walk-forward validation: In-sample and out-of-sample performance both acceptable
•Sample size: Minimum 500 events (preferably 1,000+)
•Win rate: Above threshold for its type (e.g., >60% for mean reversion)
•Profit factor: Above 1.5
•Parameter stability: Performance degrades gracefully with parameter changes
•Economic rationale: Explainable mechanism
•Regime documentation: Clear about which market conditions it requires
•No redundancy: Not duplicating existing live edges

Passing edges get assigned unique IDs and enter the paper trading queue.
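The checklist above can be expressed as a simple gate. This is a minimal sketch, assuming the qualitative checks (rationale, regime documentation, redundancy) have already been reduced to yes/no flags by a reviewer; `CandidateSummary` and `discovery_failures` are hypothetical names, and the 60% win-rate default is the mean-reversion example from the list.

```python
from dataclasses import dataclass


@dataclass
class CandidateSummary:
    """Hypothetical summary of a candidate edge's validation results."""
    walk_forward_ok: bool
    n_events: int
    win_rate: float        # fraction, e.g. 0.68
    profit_factor: float   # gross profit / gross loss
    params_stable: bool
    has_rationale: bool
    regime_documented: bool
    is_redundant: bool


def discovery_failures(c: CandidateSummary, min_win_rate: float = 0.60) -> list[str]:
    """Return the names of failed checks; an empty list means the edge passes."""
    checks = {
        "walk-forward validation": c.walk_forward_ok,
        "sample size": c.n_events >= 500,
        "win rate": c.win_rate > min_win_rate,
        "profit factor": c.profit_factor > 1.5,
        "parameter stability": c.params_stable,
        "economic rationale": c.has_rationale,
        "regime documentation": c.regime_documented,
        "no redundancy": not c.is_redundant,
    }
    return [name for name, passed in checks.items() if not passed]
```

Returning the list of failed checks, rather than a single boolean, keeps the reason for a rejection on record.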

Stage 2: Paper Trading

Paper trading is production infrastructure without real execution.

What Happens:

•Edge loaded into signal scanner
•Signals fire according to edge rules
•Hypothetical entries and exits recorded
•P&L calculated as if trades executed
•Latency and timing tracked

Why It Matters:

•Validates that production code matches backtest logic
•Catches data pipeline issues (stale data, missing feeds)
•Reveals operational problems (signal timing, execution windows)
•Provides forward performance data

Duration: Minimum 30 days or 30 signals, whichever takes longer. Some edges signal frequently and hit 30 signals in two weeks; they still run the full 30 days. Others are rare and need months.

Exit Criteria to Promote:

•Forward performance within 20% of backtest expectations
•No operational issues
•Signal quality maintained
•At least 30 signals observed

Exit Criteria to Demote:

•Forward performance significantly worse than backtest
•Operational issues that can't be resolved
•Changing market conditions invalidating the edge

Paper trading is cheap—no money at risk. Stay here until confident.
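The duration rule and the 20% promotion tolerance can be sketched as two small checks. The function names are hypothetical, the minimum signal count is left as a parameter (defaulting to the 30 signals in the promotion criteria), and applying the tolerance to win rate as a relative shortfall is an assumption; the same check could be applied to any performance metric.

```python
def paper_stage_complete(days_in_stage: int, signals: int,
                         min_days: int = 30, min_signals: int = 30) -> bool:
    """'Whichever takes longer' means both minimums must be met."""
    return days_in_stage >= min_days and signals >= min_signals


def within_tolerance(forward: float, backtest: float, tolerance: float = 0.20) -> bool:
    """True if the forward metric falls short of backtest by at most `tolerance`.

    Assumption: the shortfall is measured relative to the backtest value,
    and outperforming the backtest never fails the check.
    """
    if backtest <= 0:
        raise ValueError("backtest metric must be positive")
    shortfall = (backtest - forward) / backtest
    return shortfall <= tolerance
```

For instance, a 65% paper win rate against a 68% backtest is a relative shortfall of roughly 4%, comfortably inside the tolerance.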

Stage 3: Small Live

First real money. Position sizes at 15-25% of intended final size.

Purpose:

•Validate execution quality (slippage, fill rates)
•Experience real P&L psychology
•Catch issues that only appear with real orders

What We Track:

•Actual vs expected entry prices
•Actual vs expected exit prices
•Slippage cost per trade
•Fill rate (what % of signals actually execute)
•Real P&L vs paper P&L
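As a rough illustration of how the execution metrics above might be computed, the sketch below derives entry slippage and fill rate from a list of signal records. `SignalRecord` and `execution_summary` are hypothetical names, slippage is measured on entries only, and the 0.2% ceiling is taken from the typical slippage threshold used for promotion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignalRecord:
    side: str                    # "long" or "short"
    expected_price: float        # price the signal assumed
    fill_price: Optional[float]  # None if the order never filled


def slippage_pct(rec: SignalRecord) -> float:
    """Entry slippage in percent; positive means the fill was worse than expected."""
    if rec.fill_price is None:
        raise ValueError("record has no fill")
    sign = 1.0 if rec.side == "long" else -1.0
    return sign * (rec.fill_price - rec.expected_price) / rec.expected_price * 100.0


def execution_summary(records: list[SignalRecord], max_slippage_pct: float = 0.2) -> dict:
    """Fill rate and average slippage across all signals."""
    if not records:
        raise ValueError("no signals to summarize")
    filled = [r for r in records if r.fill_price is not None]
    avg = sum(slippage_pct(r) for r in filled) / len(filled) if filled else 0.0
    return {
        "fill_rate": len(filled) / len(records),
        "avg_slippage_pct": avg,
        "slippage_ok": avg < max_slippage_pct,
    }
```

A long that fills above the expected price and a short that fills below it both count as positive (costly) slippage here.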

Duration: Minimum 30 days or 30 real trades.

Position Sizing Logic: If intended final size is 2% of portfolio per signal, small live uses 0.3-0.5% per signal. Enough to be real, small enough to be survivable if edge fails.

Exit Criteria to Promote:

•Real performance tracks paper performance (within 25%)
•Slippage acceptable (<0.2% typically)
•Execution quality stable
•No negative surprises

Exit Criteria to Demote:

•Significant underperformance vs paper
•Execution issues (poor fills, high slippage)
•Edge showing decay

Stage 4: Full Live

Edge runs at intended position sizes with full confidence.

Ongoing Monitoring:

•Rolling performance vs historical expectations
•Regime alignment (is edge in its valid regime?)
•Signal frequency vs expected
•Drawdown tracking

Red Flags:

•Performance degrading over time (edge decay)
•Signal frequency dropping (market conditions changing)
•Unexpected correlation with other edges (concentration risk)
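The correlation red flag can be checked with a plain Pearson correlation on the daily P&L of two edges. This is a sketch under stated assumptions: the inputs are aligned daily P&L series of equal length, and the 0.7 threshold is an illustrative value, not a figure from our monitoring rules.

```python
import math


def pearson_correlation(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def is_concentration_risk(pnl_a: list[float], pnl_b: list[float],
                          threshold: float = 0.7) -> bool:
    """Flag two edges whose daily P&L moves together (threshold is an assumption)."""
    return pearson_correlation(pnl_a, pnl_b) >= threshold
```

Only positive correlation is flagged, since two edges that lose on the same days concentrate risk while negatively correlated edges offset each other.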

Even full-live edges face regular review. No edge runs forever without scrutiny.

Kill Criteria: When to Stop Trading an Edge

Edges die. Markets change. What worked stops working.

Kill criteria (any one of these triggers a stop-loss review):

Performance-Based:

•Rolling 30-day win rate drops below 50%
•Drawdown exceeds the edge's historical max drawdown by more than 20%
•Three consecutive months of negative performance
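Two of these checks are easy to automate. The sketch below assumes trades are stored as (close date, P&L) pairs and monthly P&L as a list ordered oldest to newest; the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def rolling_win_rate(trades: list[tuple[date, float]], as_of: date,
                     window_days: int = 30) -> Optional[float]:
    """Win rate over trades closed in the last `window_days` days, or None if none."""
    cutoff = as_of - timedelta(days=window_days)
    recent = [pnl for closed, pnl in trades if cutoff < closed <= as_of]
    if not recent:
        return None
    return sum(1 for pnl in recent if pnl > 0) / len(recent)


def consecutive_negative_months(monthly_pnl: list[float], n: int = 3) -> bool:
    """True if the most recent `n` months all had negative P&L."""
    return len(monthly_pnl) >= n and all(p < 0 for p in monthly_pnl[-n:])


def performance_kill_triggered(trades: list[tuple[date, float]],
                               monthly_pnl: list[float], as_of: date) -> bool:
    """Either the 30-day win rate is below 50% or three straight months lost money."""
    win_rate = rolling_win_rate(trades, as_of)
    low_win_rate = win_rate is not None and win_rate < 0.50
    return low_win_rate or consecutive_negative_months(monthly_pnl)
```

Returning None when the window is empty avoids treating a quiet month as a 0% win rate.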

Statistical:

•Recent performance statistically worse than backtest (>2 standard deviations below expected)
•Confidence interval around the live win rate no longer contains the backtest win rate
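Both statistical checks can be sketched with standard formulas for a binomial proportion. The choice of a normal-approximation z-score and a Wilson score interval is an assumption made for illustration; the lesson does not specify which test is used in practice.

```python
import math


def win_rate_z_score(wins: int, n: int, backtest_win_rate: float) -> float:
    """Standard deviations between the live win rate and the backtest win rate.

    Uses the normal approximation with the backtest rate as the null hypothesis.
    """
    if n <= 0 or not 0.0 < backtest_win_rate < 1.0:
        raise ValueError("need n > 0 and a backtest win rate strictly between 0 and 1")
    std_err = math.sqrt(backtest_win_rate * (1.0 - backtest_win_rate) / n)
    return (wins / n - backtest_win_rate) / std_err


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the live win rate (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("need n > 0")
    p = wins / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def statistical_kill_triggered(wins: int, n: int, backtest_win_rate: float) -> bool:
    """Trigger if live is >2 std devs below backtest or the interval excludes it."""
    low, high = wilson_interval(wins, n)
    below_two_sigma = win_rate_z_score(wins, n, backtest_win_rate) < -2.0
    return below_two_sigma or not (low <= backtest_win_rate <= high)
```

With small samples the interval is wide, so these checks rarely fire on a handful of trades; that is by design, since a few losses are not statistical evidence of decay.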

Operational:

•Regime requirements no longer met for extended period
•Data source becomes unavailable or unreliable
•Exchange changes invalidate execution assumptions

Qualitative:

•Market structure change that invalidates edge thesis
•Regulatory changes affecting the edge's mechanism
•Competitor activity suggests edge is now crowded

When kill criteria trigger, edge moves to "suspended" status. Can be re-evaluated later, but stops trading immediately.

Position Sizing During Transitions

Smart position sizing manages promotion risk:

Discovery → Paper: No position (paper only)

Paper → Small Live: 15-25% of target size

Small Live → Full Live: Graduated increase

•Week 1-2: 25% of target
•Week 3-4: 50% of target
•Week 5-6: 75% of target
•Week 7+: 100% of target

This graduated approach limits damage if something goes wrong during transition.
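The ramp schedule translates directly into a lookup. A minimal sketch, assuming weeks are counted from 1 starting at the promotion to full live, and using a hypothetical 2% target size as the worked example:

```python
def target_fraction(week: int) -> float:
    """Fraction of target size allowed in a given week of the ramp (week 1 = first week)."""
    if week < 1:
        raise ValueError("week numbering starts at 1")
    if week <= 2:
        return 0.25
    if week <= 4:
        return 0.50
    if week <= 6:
        return 0.75
    return 1.0


def position_size_pct(target_pct: float, week: int) -> float:
    """Position size as a percent of portfolio for the given ramp week."""
    return target_pct * target_fraction(week)


# Worked example with a 2% target: weeks 1, 3, 5 and 7 of the ramp.
schedule = [position_size_pct(2.0, w) for w in (1, 3, 5, 7)]
```

For a 2% target, the schedule steps through 0.5%, 1.0%, 1.5% and 2.0% of portfolio per signal.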

The Promotion Database

Every edge's journey is tracked:

Field	Description
edge_id	Unique identifier
current_stage	DISCOVERY/PAPER/SMALL_LIVE/FULL_LIVE/SUSPENDED
stage_entry_date	When current stage started
paper_start_date	When paper trading began
paper_signals	Count of paper signals
paper_pnl	Cumulative paper P&L
live_start_date	When real trading began
live_signals	Count of real signals
live_pnl	Cumulative real P&L
notes	Observations, issues, decisions

This history enables analysis: How long do edges typically survive? What predicts edge failure? Which discovery methods produce durable edges?
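For illustration, the fields above map naturally onto a single table. The sketch below uses SQLite from the Python standard library; the table name `edge_promotions`, the column types, and the helper functions are assumptions, and only the field names and stage values come from the table above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS edge_promotions (
    edge_id          TEXT PRIMARY KEY,
    current_stage    TEXT NOT NULL CHECK (current_stage IN
        ('DISCOVERY', 'PAPER', 'SMALL_LIVE', 'FULL_LIVE', 'SUSPENDED')),
    stage_entry_date TEXT NOT NULL,       -- ISO date, e.g. '2024-01-15'
    paper_start_date TEXT,
    paper_signals    INTEGER DEFAULT 0,
    paper_pnl        REAL DEFAULT 0.0,
    live_start_date  TEXT,
    live_signals     INTEGER DEFAULT 0,
    live_pnl         REAL DEFAULT 0.0,
    notes            TEXT
)
"""


def open_promotion_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the promotion database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def stage_counts(conn: sqlite3.Connection) -> dict:
    """Number of edges currently in each stage."""
    rows = conn.execute(
        "SELECT current_stage, COUNT(*) FROM edge_promotions GROUP BY current_stage"
    ).fetchall()
    return dict(rows)
```

The CHECK constraint rejects any stage value outside the five listed, so a typo in a status update fails loudly instead of creating an untracked state.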

Example: Edge Promotion Timeline

Real example (anonymized):

Day 0: Edge discovered. Funding + OI divergence in bull regime. 68% backtest win rate, 1,247 samples.

Day 1: Passed validation review. Entered paper trading queue.

Day 14: Paper trading started after queue cleared.

Day 45: Paper trading complete. 37 signals, 65% win rate, 3 percentage points below backtest. Acceptable variance.

Day 46: Promoted to small live at 20% size.

Day 78: Small live complete. 24 real trades, 63% win rate, slippage 0.08%. Real P&L 94% of paper expectation.

Day 79: Began graduated promotion to full size.

Day 107: Full live at 100% size.

Day 240: Still running. Rolling 90-day: 61% win rate, slight underperformance but within bounds.

Day 312: Win rate dropped to 54% over 60-day window. Yellow flag raised.

Day 350: Win rate continued decline. Suspended for review.

Day 380: Post-mortem: Market regime shifted, edge no longer valid in new conditions. Retired permanently.

Total lifespan: ~11 months. This is typical—edges don't last forever.

Continuous Improvement Without Overfitting

Edges can be improved, but carefully:

Allowed:

•Regime filters tightened based on live observation
•Exit timing adjusted based on execution data
•Position sizing refined based on realized volatility

Dangerous:

•Threshold parameters changed to improve recent performance
•New factors added to filter recent bad signals
•Logic changed to explain away losses

The difference: changes based on generalizable insights vs changes that curve-fit recent data.

When in doubt, don't modify. Deploy a new edge version and test it fresh through the pipeline rather than patching a live edge.

Key Takeaways

•Never go straight from backtest to full-size live trading
•Pipeline: Discovery → Paper → Small Live → Full Live
•Each stage has entry and exit criteria with minimum durations
•Position sizing increases gradually during transitions
•Kill criteria define when to stop trading an edge
•Track every edge's journey for learning and analysis
•Edges have finite lifespans—continuous monitoring required
•Improve carefully to avoid re-fitting to recent noise

The promotion pipeline is your risk management for new strategies. It costs time and opportunity but prevents disasters. A disciplined pipeline is what separates professional operations from gambling with backtest results.

From Discovery to Production: The Promotion Pipeline