Backtesting Best Practices¶
This document provides comprehensive guidelines for conducting robust, reliable backtests that accurately reflect real-world trading conditions.
Table of Contents¶
- Common Pitfalls and Solutions
- Validation Checklist
- Real-World Examples
- Avoiding Overfitting
- Walk-Forward Analysis Guidelines
- Corporate Actions Handling
- Bias Prevention Strategies
- Academic References
Common Pitfalls and Solutions¶
1. Look-Ahead Bias¶
Problem: Using information that would not have been available at the time of the trade.
Examples:
- Using adjusted prices without considering when the adjustment occurred
- Calculating indicators using future data points
- Rebalancing portfolios based on end-of-period rankings
Solution:
from src.backtesting.engine import EnhancedBacktestEngine
from src.backtesting.bias_checker import BiasChecker

# Enable automatic bias checking
engine = EnhancedBacktestEngine(
    symbol_list=['BTCUSDT'],
    strategy_class=MyStrategy,
    enable_bias_checking=True,  # Critical!
    bias_check_strict_mode=True
)
results = engine.run()
if not results['bias_check']['passed']:
    print("WARNING: Look-ahead bias detected!")
    for issue in results['bias_check']['issues']:
        print(f"- {issue['description']}")
Best Practices:
- Always use point-in-time data
- Implement event-driven architecture
- Use the BiasChecker module before production
- Validate all indicator calculations for proper time alignment
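To make "proper time alignment" concrete, here is a minimal sketch of a moving-average signal that can only see bars that have already closed. The function name `lagged_sma_signals` and the sample prices are illustrative assumptions, not part of the backtesting package described here.

```python
from typing import List, Optional


def lagged_sma_signals(closes: List[float], window: int) -> List[Optional[int]]:
    """Return +1/-1 signals where the signal for bar t uses closes[:t] only."""
    signals: List[Optional[int]] = []
    for t in range(len(closes)):
        if t < window:
            signals.append(None)  # not enough closed bars yet
            continue
        history = closes[t - window:t]  # the slice excludes bar t itself
        sma = sum(history) / window
        # Compare the last *closed* bar with the average of closed bars
        signals.append(1 if closes[t - 1] > sma else -1)
    return signals


# Illustrative prices, not real market data
closes = [10.0, 11.0, 12.0, 11.0, 10.0, 9.0]
signals = lagged_sma_signals(closes, window=3)
print(signals)
```

A quick sanity check for any indicator is to change a later bar and confirm that earlier signals do not move. If they do, the calculation is reading the future.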
2. Survivorship Bias¶
Problem: Only testing on assets that survived to the present, ignoring delisted or bankrupt companies.
Example: Testing a stock strategy only on current S&P 500 constituents.
Solution:
- Include delisted securities in your dataset
- Use survivorship-bias-free data providers
- Consider the full historical universe at each point in time
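-- Example: point-in-time universe for a backtest starting 2023-01-01.
-- Securities delisted after the start date stay in the universe;
-- filtering on a later date would silently drop them.
SELECT * FROM market_data
WHERE symbol IN (
    SELECT symbol FROM securities
    WHERE listed_date <= '2023-01-01'
      AND (delisted_date IS NULL OR delisted_date > '2023-01-01')
)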
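The same point-in-time membership rule can be sketched in Python. The `Security` record and the three sample symbols are hypothetical; the field names mirror the `listed_date` and `delisted_date` columns in the query above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Security:
    symbol: str
    listed_date: date
    delisted_date: Optional[date] = None  # None means still listed


def universe_as_of(securities: List[Security], as_of: date) -> List[str]:
    """Symbols that were tradable on `as_of`, including later delistings."""
    return sorted(
        s.symbol
        for s in securities
        if s.listed_date <= as_of
        and (s.delisted_date is None or s.delisted_date > as_of)
    )


# Hypothetical securities for illustration
securities = [
    Security("AAA", date(2010, 1, 4)),
    Security("BBB", date(2015, 6, 1), date(2023, 9, 15)),  # delisted later
    Security("CCC", date(2023, 5, 2)),
]
print(universe_as_of(securities, date(2023, 1, 1)))
print(universe_as_of(securities, date(2024, 1, 1)))
```

Rebuilding the universe at every rebalance date, rather than once from today's listings, is what keeps the later-delisted symbol in the early part of the test.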
3. Data Quality Issues¶
Problem: Incorrect, missing, or unrealistic data leading to misleading results.
Common Issues:
- Missing bars creating artificial gaps
- Extreme prices due to data errors
- Incorrect volume figures
- Wrong timezone handling
Solution:
from src.backtesting.data_validator import DataValidator

validator = DataValidator()
issues = validator.validate_ohlcv_data(
    data=df,
    checks=['missing_bars', 'price_spikes', 'negative_values', 'volume_anomalies']
)
if issues:
    print(f"Found {len(issues)} data quality issues:")
    for issue in issues:
        print(f" - {issue}")
Best Practices:
- Validate data before backtesting
- Check for realistic bid-ask spreads
- Verify timezone consistency
- Cross-reference multiple data sources
- Handle corporate actions properly
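For readers who want to see what two of these checks might look like, the sketch below flags missing bars and price spikes using only the standard library. It is not the `DataValidator` implementation; the function names and the 20% spike threshold are assumptions, and the gap check as written suits continuously traded markets (an exchange calendar would be needed for markets with sessions).

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def find_missing_bars(
    timestamps: List[datetime], interval: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Return (previous, next) timestamp pairs separated by more than one interval."""
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > interval:
            gaps.append((prev, curr))
    return gaps


def find_price_spikes(closes: List[float], max_abs_return: float = 0.2) -> List[int]:
    """Return indices whose bar-to-bar return exceeds the threshold in absolute value."""
    spikes = []
    for i in range(1, len(closes)):
        if closes[i - 1] > 0 and abs(closes[i] / closes[i - 1] - 1.0) > max_abs_return:
            spikes.append(i)
    return spikes


# Illustrative hourly timestamps with bars missing between 02:00 and 05:00
start = datetime(2024, 1, 1)
timestamps = [start + timedelta(hours=h) for h in (0, 1, 2, 5, 6)]
gaps = find_missing_bars(timestamps, timedelta(hours=1))
spikes = find_price_spikes([100.0, 101.0, 150.0, 102.0])
print(gaps, spikes)
```

A bad print usually shows up as two consecutive flagged indices: the jump to the bad value and the jump back.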
4. Overfitting¶
Problem: Strategy performs well on historical data but fails in live trading.
Warning Signs:
- Too many parameters (>5-7 is suspicious)
- Perfect or near-perfect historical performance
- Large performance gap between in-sample and out-of-sample
- Strategy logic overly complex or highly specific
Solution: See Avoiding Overfitting section below.
5. Transaction Cost Underestimation¶
Problem: Not accounting for all costs of trading.
Hidden Costs:
- Slippage (market impact)
- Bid-ask spread
- Commission fees
- Exchange fees
- Financing costs (overnight positions)
- Tax implications
Solution:
from decimal import Decimal
engine = EnhancedBacktestEngine(
symbol_list=['BTCUSDT'],
strategy_class=MyStrategy,
# Conservative cost estimates
commission_pct=Decimal('0.001'), # 0.1% per trade
slippage_pct=Decimal('0.0005'), # 0.05% slippage
)
# For higher frequency strategies, increase costs:
# commission_pct=Decimal('0.002') # Account for market impact
Best Practices:
- Use realistic commission rates
- Model slippage based on order size and liquidity
- Consider bid-ask spread explicitly
- Test with conservative cost assumptions (1.5x expected)
- Account for price improvement in limit orders
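The effect of the 1.5x stress test is easy to see with a little arithmetic. The sketch below is a simplified assumption: it charges commission and slippage once on entry and once on exit and subtracts them from the gross return, ignoring compounding. The function name and the 0.4% gross return are illustrative.

```python
from decimal import Decimal


def net_return_after_costs(
    gross_return: Decimal,
    commission_pct: Decimal,
    slippage_pct: Decimal,
    cost_multiplier: Decimal = Decimal("1"),
) -> Decimal:
    """Approximate round-trip net return: costs are paid on entry and on exit."""
    per_side = (commission_pct + slippage_pct) * cost_multiplier
    return gross_return - 2 * per_side


gross = Decimal("0.004")  # hypothetical 0.4% gross return on one trade
expected = net_return_after_costs(gross, Decimal("0.001"), Decimal("0.0005"))
stressed = net_return_after_costs(
    gross, Decimal("0.001"), Decimal("0.0005"), cost_multiplier=Decimal("1.5")
)
print(expected, stressed)
```

With the cost figures used above, the trade nets 0.1% at expected costs and loses 0.05% at 1.5x costs. A strategy whose average trade sits this close to its cost floor deserves suspicion.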
6. Incorrect Position Sizing¶
Problem: Using position sizes that couldn't be achieved in practice.
Issues:
- Positions larger than available capital
- Not accounting for margin requirements
- Fractional shares when not supported
- Ignoring minimum lot sizes
Solution:
from src.backtesting.portfolio import PositionSizingMethod
from decimal import Decimal
engine = EnhancedBacktestEngine(
position_sizing=PositionSizingMethod.PERCENT_CAPITAL,
position_size_value=Decimal('0.1'), # 10% per position
# Or use fixed risk per trade
# position_sizing=PositionSizingMethod.FIXED_RISK,
# position_size_value=Decimal('0.02'), # 2% risk per trade
)
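Percent-of-capital sizing still has to be rounded to something an exchange will accept. The following sketch rounds a target position down to a whole number of lots. The function name, prices, and lot sizes are illustrative assumptions and do not describe how `PositionSizingMethod` behaves internally.

```python
from decimal import Decimal, ROUND_DOWN


def feasible_quantity(
    capital: Decimal, fraction: Decimal, price: Decimal, lot_size: Decimal
) -> Decimal:
    """Largest quantity that is a multiple of lot_size and fits the capital fraction."""
    target_notional = capital * fraction
    raw_quantity = target_notional / price
    # Round down so the order never exceeds the intended allocation
    lots = (raw_quantity / lot_size).to_integral_value(rounding=ROUND_DOWN)
    return lots * lot_size


# 10% of 100,000 at a hypothetical price of 43,250.5 with a 0.001 lot size
crypto_qty = feasible_quantity(
    Decimal("100000"), Decimal("0.1"), Decimal("43250.5"), Decimal("0.001")
)
# Same allocation at a hypothetical price of 333.33 with a 100-share round lot
round_lot_qty = feasible_quantity(
    Decimal("100000"), Decimal("0.1"), Decimal("333.33"), Decimal("100")
)
print(crypto_qty, round_lot_qty)
```

The second case returns zero: a 10% allocation cannot buy a single 100-share lot, so a backtest that assumed a 30-share fill would be trading a position that could not exist.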
7. Data Mining Bias¶
Problem: Testing many strategies and only reporting the best one.
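Example: Testing 100 random strategies, finding 5 that work, and publishing those. At a 5% significance level, roughly 5 of 100 strategies with no real edge are expected to pass by chance alone.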
Solution:
- Pre-register hypotheses before testing
- Apply multiple testing corrections (Bonferroni, False Discovery Rate)
- Use out-of-sample validation
- Report all strategies tested, not just winners
Statistical Adjustment:
# Bonferroni correction for multiple tests
alpha = 0.05
num_strategies_tested = 20
adjusted_alpha = alpha / num_strategies_tested # 0.0025
# Strategy must exceed this threshold
required_sharpe = compute_sharpe_threshold(adjusted_alpha)
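The snippet above does not define `compute_sharpe_threshold`. One possible way to turn an adjusted significance level into a Sharpe hurdle is sketched below, under the simplifying assumption that returns are independent and roughly normal, so the t-statistic of an annualized Sharpe ratio is approximately the Sharpe ratio times the square root of the number of years. The function name `required_annual_sharpe` and the 4-year sample length are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def required_annual_sharpe(adjusted_alpha: float, years: float) -> float:
    """Annualized Sharpe needed for one-sided significance at adjusted_alpha.

    Assumes i.i.d., approximately normal returns (a rough approximation).
    """
    z = NormalDist().inv_cdf(1.0 - adjusted_alpha)
    return z / sqrt(years)


alpha = 0.05
num_strategies_tested = 20
adjusted_alpha = alpha / num_strategies_tested  # Bonferroni: 0.0025

single_test = required_annual_sharpe(alpha, years=4.0)
corrected = required_annual_sharpe(adjusted_alpha, years=4.0)
print(f"single test: {single_test:.2f}, after correction: {corrected:.2f}")
```

Under these assumptions the hurdle over four years rises from roughly 0.8 for a single test to roughly 1.4 after correcting for 20 tests. Fat tails and autocorrelation in real returns would push the hurdle higher still.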
8. Regime Changes¶
Problem: Market conditions change over time, invalidating historical patterns.
Examples:
- Low volatility strategy in a high-vol regime
- Mean reversion during strong trends
- Ignoring structural market changes (HFT, decimalization)
Solution:
# Test across different market regimes
regimes = {
    'bull_market': ('2020-01-01', '2021-12-31'),
    'bear_market': ('2022-01-01', '2022-12-31'),
    'sideways': ('2023-01-01', '2023-12-31'),
}
for regime_name, (start, end) in regimes.items():
    results = engine.run(start_date=start, end_date=end)
    print(f"{regime_name}: Sharpe = {results['sharpe_ratio']:.2f}")
Validation Checklist¶
Use this checklist before deploying any strategy:
Data Validation¶
- Data quality checked (no gaps, spikes, or errors)
- Survivorship bias eliminated
- Corporate actions properly adjusted
- Timezone consistency verified
- Point-in-time data confirmed
Bias Prevention¶
- Look-ahead bias check passed
- Event-driven architecture implemented
- No future data leakage in indicators
- BiasChecker module executed successfully
- Manual code review for timing issues
Transaction Costs¶
- Realistic commission rates applied
- Slippage modeled appropriately
- Bid-ask spread considered
- Market impact estimated for large orders
- Conservative cost assumptions used
Robustness Testing¶
- Walk-forward analysis performed
- Out-of-sample validation completed
- Multiple time periods tested
- Different market regimes analyzed
- Sensitivity analysis conducted
Parameter Optimization¶
- Parameter count justified (<7 recommended)
- Optimization methodology documented
- In-sample/out-of-sample split defined
- Overfitting metrics checked
- Parameter stability verified
Risk Management¶
- Maximum drawdown acceptable
- Position sizing rules defined
- Risk per trade specified
- Correlation with other strategies assessed
- Tail risk evaluated
Statistical Validation¶
- Sufficient trade count (>100 recommended)
- Statistical significance verified
- Sharpe ratio realistic (0.5-2.0 achievable)
- Win rate not suspiciously high
- Performance consistent across periods
Documentation¶
- Strategy logic fully documented
- Assumptions clearly stated
- All parameters explained
- Known limitations identified
- Expected performance ranges specified
Real-World Examples¶
Example 1: Moving Average Crossover with Proper Validation¶
Bad Implementation:
# ❌ Common mistakes
def bad_backtest():
    # Using entire dataset for optimization
    best_params = optimize_params(data)
    # Testing on same data used for optimization
    results = backtest(data, best_params)
    # Not accounting for costs
    # No bias checking
    # No regime analysis
    return results
Good Implementation:
# ✅ Proper validation
from src.backtesting.engine import EnhancedBacktestEngine
from src.backtesting.optimization.grid_search import GridSearchOptimizer
from src.backtesting.optimization.walk_forward import WalkForwardAnalyzer
from datetime import datetime
from decimal import Decimal

def proper_backtest(data):
    # 1. Define parameters with economic rationale
    param_grid = {
        'fast_period': [10, 20, 30],  # Short-term trend
        'slow_period': [50, 100, 200],  # Long-term trend
    }
    # Build the optimizer from the grid (previously the grid was never used)
    optimizer = GridSearchOptimizer(
        strategy_class=SMACrossover,
        param_grid=param_grid,
        optimization_metric='sharpe_ratio'
    )
    # 2. Use walk-forward analysis
    analyzer = WalkForwardAnalyzer(
        optimizer=optimizer,
        in_sample_days=252,  # 1 year training
        out_sample_days=63,  # 3 months testing
        step_days=21,  # Roll monthly
        anchored=False  # Rolling window
    )
    # 3. Enable all safeguards
    engine = EnhancedBacktestEngine(
        symbol_list=['BTCUSDT'],
        strategy_class=SMACrossover,
        start_date=datetime(2020, 1, 1),
        end_date=datetime(2024, 1, 1),
        initial_capital=Decimal('100000'),
        commission_pct=Decimal('0.001'),
        slippage_pct=Decimal('0.0005'),
        enable_corporate_actions=True,
        enable_bias_checking=True,
        bias_check_strict_mode=True
    )
    # 4. Run walk-forward analysis
    wf_results = analyzer.run(
        start_date=datetime(2020, 1, 1),
        end_date=datetime(2024, 1, 1)
    )
    # 5. Validate results
    if wf_results.mean_degradation_pct > 30:
        print("WARNING: High performance degradation (>30%)")
        print("Strategy likely overfit")
        return None
    # 6. Test across regimes
    regimes = analyze_market_regimes(data)
    regime_results = {}
    for regime in regimes:
        regime_results[regime.name] = engine.run(
            start_date=regime.start,
            end_date=regime.end
        )
    return {
        'walk_forward': wf_results,
        'regime_analysis': regime_results,
        'final_validation': 'passed'
    }
Example 2: Mean Reversion Strategy with Volume Filters¶
from src.backtesting.strategies.base import Strategy
from src.backtesting.events import SignalEvent
import pandas as pd

class MeanReversionStrategy(Strategy):
    """
    Mean reversion with proper implementation:
    - Volume confirmation
    - Volatility filters
    - Risk management
    """
    def __init__(self, events, lookback=20, entry_std=2.0, exit_std=0.5):
        super().__init__(events)
        self.lookback = lookback
        self.entry_std = entry_std
        self.exit_std = exit_std
        self.price_history = {}
        self.volume_history = {}

    def calculate_signals(self, event):
        """Generate signals with proper validation"""
        symbol = event.symbol
        # Update history (only using past data!)
        if symbol not in self.price_history:
            self.price_history[symbol] = []
            self.volume_history[symbol] = []
        self.price_history[symbol].append(event.close)
        self.volume_history[symbol].append(event.volume)
        # Need enough data
        if len(self.price_history[symbol]) < self.lookback:
            return
        # Calculate indicators using ONLY historical data
        prices = pd.Series(self.price_history[symbol][-self.lookback:])
        volumes = pd.Series(self.volume_history[symbol][-self.lookback:])
        mean_price = prices.mean()
        std_price = prices.std()
        current_price = event.close
        # A flat window has zero volatility, so the z-score is undefined
        if std_price == 0:
            return
        # Z-score
        zscore = (current_price - mean_price) / std_price
        # Volume filter (confirm with above-average volume)
        avg_volume = volumes.mean()
        volume_confirmed = event.volume > avg_volume * 1.2
        # Entry signals
        if zscore < -self.entry_std and volume_confirmed:
            # Oversold - buy signal
            self.events.put(SignalEvent(
                symbol=symbol,
                datetime=event.datetime,
                signal_type='LONG',
                strength=min(abs(zscore) / 3.0, 1.0)
            ))
        elif zscore > self.entry_std and volume_confirmed:
            # Overbought - sell signal
            self.events.put(SignalEvent(
                symbol=symbol,
                datetime=event.datetime,
                signal_type='SHORT',
                strength=min(abs(zscore) / 3.0, 1.0)
            ))
        # Exit signals
        elif abs(zscore) < self.exit_std:
            # Return to mean - exit
            self.events.put(SignalEvent(
                symbol=symbol,
                datetime=event.datetime,
                signal_type='EXIT',
                strength=1.0
            ))
Example 3: Handling Corporate Actions¶
from src.backtesting.corporate_actions import CorporateActionsEngine, CorporateAction, CorporateActionType
from datetime import datetime
# Load corporate actions
ca_engine = CorporateActionsEngine()
# Example: Tesla 3-for-1 split on Aug 25, 2022
tesla_split = CorporateAction(
symbol='TSLA',
action_type=CorporateActionType.STOCK_SPLIT,
ex_date=datetime(2022, 8, 25),
ratio=3.0,
description='3-for-1 stock split'
)
ca_engine.add_action(tesla_split)
# Backtest with corporate actions
engine = EnhancedBacktestEngine(
symbol_list=['TSLA'],
strategy_class=MyStrategy,
enable_corporate_actions=True,
corporate_actions_csv='data/corporate_actions.csv'
)
results = engine.run()
Avoiding Overfitting¶
Understanding Overfitting¶
Overfitting occurs when a strategy learns the noise in historical data rather than genuine patterns. It manifests as excellent backtest performance but poor live trading results.
Key Indicators of Overfitting¶
- Too Many Parameters: >7 parameters is a red flag
- Perfect or Near-Perfect Results: Sharpe > 3.0, Win rate > 70%
- High In-Sample/Out-of-Sample Gap: >30% degradation
- Complex Logic: Many special cases and exceptions
- Narrow Applicability: Only works for specific stocks/periods
Prevention Strategies¶
1. Limit Parameter Count¶
Rule of Thumb: Number of parameters ≤ log₁₀(number of trades)
# Good: 3 parameters, simple logic
class SimpleStrategy:
    def __init__(self, fast_ma=10, slow_ma=50, volume_threshold=1.5):
        self.fast_ma = fast_ma
        self.slow_ma = slow_ma
        self.volume_threshold = volume_threshold

# Bad: 12 parameters, likely overfit
class ComplexStrategy:
    def __init__(self, ma1=10, ma2=20, ma3=50, ma4=100, ma5=200,
                 rsi_period=14, rsi_upper=70, rsi_lower=30,
                 volume_ma=20, volume_std=2, atr_period=14, atr_multiplier=2):
        # Too many parameters!
        pass
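The rule of thumb translates directly into a small helper. This is only a sketch of the heuristic stated above; the function name `max_parameters` is an assumption.

```python
import math


def max_parameters(num_trades: int) -> int:
    """Parameter budget under the rule: parameters <= log10(number of trades)."""
    if num_trades < 1:
        raise ValueError("num_trades must be at least 1")
    return math.floor(math.log10(num_trades))


for trades in (99, 500, 5000):
    print(f"{trades} trades -> at most {max_parameters(trades)} parameter(s)")
```

Read in reverse, the rule says a 3-parameter strategy wants on the order of 1,000 trades, and the 12-parameter example would need on the order of 10^12, which no backtest provides.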
2. Use Information Criteria¶
from scipy import stats
import numpy as np

def gaussian_log_likelihood(returns):
    """Log-likelihood of the returns under a normal distribution fitted to them"""
    returns = np.asarray(returns)
    # Fit loc/scale to the sample; the default standard normal (0, 1) would be wrong
    return np.sum(stats.norm.logpdf(returns, loc=returns.mean(), scale=returns.std()))

def calculate_aic(returns, num_parameters, num_trades):
    """
    Calculate Akaike Information Criterion
    Lower is better
    (num_trades is unused; kept so both functions share a signature)
    """
    log_likelihood = gaussian_log_likelihood(returns)
    aic = 2 * num_parameters - 2 * log_likelihood
    return aic

def calculate_bic(returns, num_parameters, num_trades):
    """
    Calculate Bayesian Information Criterion
    Lower is better, penalizes parameters more than AIC
    """
    log_likelihood = gaussian_log_likelihood(returns)
    bic = num_parameters * np.log(num_trades) - 2 * log_likelihood
    return bic

# Compare strategies
strategy_a_bic = calculate_bic(returns_a, 3, 500)
strategy_b_bic = calculate_bic(returns_b, 10, 500)
if strategy_b_bic > strategy_a_bic:
    print("Strategy A is better (simpler and similar performance)")
3. Cross-Validation¶
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

def time_series_cross_validation(data, n_splits=5):
    """
    Perform time series cross-validation
    """
    tscv = TimeSeriesSplit(n_splits=n_splits)
    results = []
    for fold, (train_idx, test_idx) in enumerate(tscv.split(data)):
        train_data = data.iloc[train_idx]
        test_data = data.iloc[test_idx]
        # Optimize on train
        best_params = optimize(train_data)
        # In-sample performance with the chosen parameters
        train_result = backtest(train_data, best_params)
        # Test on test
        test_result = backtest(test_data, best_params)
        results.append({
            'fold': fold,
            'train_sharpe': train_result.sharpe,
            'test_sharpe': test_result.sharpe,
            'degradation': train_result.sharpe - test_result.sharpe
        })
    return pd.DataFrame(results)
4. Regularization¶
import numpy as np

def penalized_sharpe(returns, num_parameters, penalty_factor=0.1):
    """
    Calculate Sharpe ratio with complexity penalty
    """
    sharpe = returns.mean() / returns.std() * np.sqrt(252)
    penalty = penalty_factor * num_parameters
    return sharpe - penalty

# Prefer simpler strategies
strategy_a_score = penalized_sharpe(returns_a, num_parameters=3)
strategy_b_score = penalized_sharpe(returns_b, num_parameters=10)
5. Economic Rationale¶
Every parameter should have a clear economic justification:
# ✅ Good: Economic rationale
class TrendFollowing:
    def __init__(self,
                 short_ma=50,  # 2-3 month trend
                 long_ma=200,  # 1-year trend
                 volume_confirm=1.5):  # Above-average activity
        """
        Trend following based on:
        - 50-day MA captures intermediate trend (academic support)
        - 200-day MA is widely watched institutional level
        - Volume confirms genuine momentum vs noise
        """
        pass

# ❌ Bad: Arbitrary parameters
class MysteryStrategy:
    def __init__(self,
                 threshold_1=73.24,  # Why 73.24?
                 threshold_2=0.0043,  # Suspiciously precise
                 lag_periods=17):  # Why 17 days?
        # No economic rationale!
        pass
Walk-Forward Analysis Guidelines¶
Walk-forward analysis (WFA) is widely treated as the gold standard for strategy validation. It simulates realistic strategy development and deployment: parameters are fitted on past data and judged only on the data that follows.
Methodology¶
- Divide data into sequential windows
- Optimize parameters on in-sample (IS) data
- Test optimized parameters on out-of-sample (OOS) data
- Roll forward and repeat
- Analyze IS vs OOS performance degradation
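The windowing step can be sketched with plain bar indices. The `Window` tuple and `walk_forward_windows` generator below are illustrative assumptions and are not the package's `WalkForwardWindow` class; window ends are exclusive.

```python
from typing import Iterator, NamedTuple


class Window(NamedTuple):
    is_start: int   # first in-sample bar
    is_end: int     # one past the last in-sample bar
    oos_start: int  # first out-of-sample bar
    oos_end: int    # one past the last out-of-sample bar


def walk_forward_windows(
    n_bars: int, in_sample: int, out_sample: int, step: int, anchored: bool = False
) -> Iterator[Window]:
    """Yield sequential IS/OOS windows; OOS always starts where IS ends."""
    if min(in_sample, out_sample, step) <= 0:
        raise ValueError("in_sample, out_sample and step must be positive")
    start = 0
    while start + in_sample + out_sample <= n_bars:
        is_end = start + in_sample
        # Anchored: in-sample always begins at bar 0 and grows over time
        yield Window(0 if anchored else start, is_end, is_end, is_end + out_sample)
        start += step


rolling = list(walk_forward_windows(400, in_sample=252, out_sample=63, step=21))
anchored = list(
    walk_forward_windows(400, in_sample=252, out_sample=63, step=21, anchored=True)
)
print(len(rolling), rolling[0], rolling[-1], anchored[-1])
```

Note that with a step shorter than the out-of-sample length, as here, consecutive out-of-sample windows overlap, so their results are not independent observations.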
Implementation¶
from src.backtesting.optimization.walk_forward import WalkForwardAnalyzer, WalkForwardWindow
from src.backtesting.optimization.grid_search import GridSearchOptimizer
from datetime import datetime
# Define parameter search space
param_grid = {
'fast_period': range(5, 30, 5),
'slow_period': range(30, 200, 20),
}
# Create optimizer
optimizer = GridSearchOptimizer(
strategy_class=SMACrossover,
param_grid=param_grid,
optimization_metric='sharpe_ratio'
)
# Configure walk-forward analysis
wf_analyzer = WalkForwardAnalyzer(
optimizer=optimizer,
in_sample_days=252, # 1 year for optimization
out_sample_days=63, # 3 months for validation
step_days=21, # Roll every month
anchored=False, # Rolling window (not expanding)
min_trades=30 # Require minimum trades
)
# Run analysis
results = wf_analyzer.run(
symbol_list=['BTCUSDT'],
start_date=datetime(2020, 1, 1),
end_date=datetime(2024, 1, 1),
initial_capital=100000
)
# Evaluate results
print(f"Mean IS Sharpe: {results.mean_in_sample:.2f}")
print(f"Mean OOS Sharpe: {results.mean_out_sample:.2f}")
print(f"Performance Degradation: {results.mean_degradation_pct:.1f}%")
# Check for overfitting
if results.mean_degradation_pct > 30:
    print("⚠️ HIGH DEGRADATION - Likely Overfit")
elif results.mean_degradation_pct > 20:
    print("⚠️ MODERATE DEGRADATION - Use Caution")
else:
    print("✓ LOW DEGRADATION - Strategy Appears Robust")
# Visualize
wf_analyzer.plot_results(results, save_path='walk_forward_results.png')
Interpretation Guidelines¶
Performance Degradation:
- < 20%: Excellent - Strategy is robust
- 20-30%: Acceptable - Some overfitting but manageable
- 30-50%: Concerning - Significant overfitting
- > 50%: Severe - Strategy unlikely to work live

Consistency:
- Check OOS performance across windows
- Look for consistent parameter sets
- Verify no extreme outlier windows

Minimum Requirements:
- At least 5 walk-forward windows
- Minimum 30 trades per OOS window
- Positive OOS performance in >60% of windows
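These thresholds can be applied mechanically. The sketch below assumes degradation is measured as the relative drop from the in-sample metric to the out-of-sample metric, which this document does not define explicitly; all function names and sample numbers are hypothetical.

```python
from typing import List


def degradation_pct(is_metric: float, oos_metric: float) -> float:
    """Relative drop from in-sample to out-of-sample, in percent (assumed definition)."""
    return (is_metric - oos_metric) / abs(is_metric) * 100.0


def classify_degradation(pct: float) -> str:
    """Map a degradation percentage onto the bands listed above."""
    if pct < 20:
        return "Excellent"
    if pct <= 30:
        return "Acceptable"
    if pct <= 50:
        return "Concerning"
    return "Severe"


def meets_minimum_requirements(
    oos_sharpes: List[float], oos_trade_counts: List[int]
) -> bool:
    """At least 5 windows, at least 30 trades in each, more than 60% positive."""
    if not oos_sharpes or len(oos_sharpes) != len(oos_trade_counts):
        return False
    enough_windows = len(oos_sharpes) >= 5
    enough_trades = all(n >= 30 for n in oos_trade_counts)
    positive_share = sum(s > 0 for s in oos_sharpes) / len(oos_sharpes)
    return enough_windows and enough_trades and positive_share > 0.6


pct = degradation_pct(is_metric=2.0, oos_metric=1.5)
print(pct, classify_degradation(pct))
print(meets_minimum_requirements([0.8, 1.1, -0.2, 0.5, 0.9], [35, 40, 31, 50, 33]))
```

An in-sample Sharpe of 2.0 that falls to 1.5 out of sample is a 25% degradation, which lands in the acceptable band.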
Advanced: Anchored vs. Rolling¶
# Anchored Walk-Forward (expanding window)
# Use when: Building long-term models, more data improves results
wf_anchored = WalkForwardAnalyzer(
anchored=True, # Window grows over time
in_sample_days=252,
out_sample_days=63
)
# Rolling Walk-Forward (sliding window)
# Use when: Market conditions change, recent data more relevant
wf_rolling = WalkForwardAnalyzer(
anchored=False, # Window slides forward
in_sample_days=252,
out_sample_days=63
)
Corporate Actions Handling¶
Corporate actions (splits, dividends, etc.) can significantly impact backtesting accuracy if not handled properly.
Types of Corporate Actions¶
- Stock Splits: N-for-1 split (e.g., 2-for-1, 3-for-1)
- Reverse Splits: 1-for-N split (e.g., 1-for-10)
- Cash Dividends: Cash payment to shareholders
- Stock Dividends: Additional shares distributed
- Rights Issues: Right to buy additional shares
- Bonus Issues: Free additional shares
Adjustment Methodology¶
from src.backtesting.corporate_actions import CorporateActionsEngine, CorporateAction, CorporateActionType
from datetime import datetime
import pandas as pd
# Initialize engine
ca_engine = CorporateActionsEngine()
# Example 1: Stock Split
split = CorporateAction(
symbol='AAPL',
action_type=CorporateActionType.STOCK_SPLIT,
ex_date=datetime(2020, 8, 31),
ratio=4.0, # 4-for-1 split
description='Apple 4-for-1 stock split'
)
ca_engine.add_action(split)
# Example 2: Dividend
dividend = CorporateAction(
symbol='AAPL',
action_type=CorporateActionType.CASH_DIVIDEND,
ex_date=datetime(2024, 2, 9),
amount=0.24, # $0.24 per share
description='Quarterly dividend'
)
ca_engine.add_action(dividend)
# Load price data
price_data = load_price_data('AAPL')
# Apply adjustments
adjusted_data = ca_engine.adjust_prices(
symbol='AAPL',
data=price_data,
adjust_price=True,
adjust_volume=True
)
Best Practices¶
1. Always Use Adjusted Prices¶
# ❌ Wrong: Using unadjusted prices
engine = BacktestEngine(
enable_corporate_actions=False # Don't do this!
)
# ✅ Correct: Using adjusted prices
engine = EnhancedBacktestEngine(
enable_corporate_actions=True,
corporate_actions_csv='data/corporate_actions.csv'
)
2. Verify Adjustment Factors¶
from datetime import timedelta

def verify_adjustments(symbol, ex_date):
    """Verify corporate action adjustments are correct"""
    # If the day before ex_date is not a trading day, use the previous session instead
    pre_split = get_price(symbol, ex_date - timedelta(days=1))
    post_split = get_price(symbol, ex_date)
    # For 2-for-1 split, expect ~50% price
    ratio = post_split / pre_split
    print(f"Price ratio: {ratio:.2f}")
    # Verify adjusted historical prices
    historical = get_historical_adjusted(symbol)
    # All pre-split prices should be divided by split ratio
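The back-adjustment rule in the last comment can be sketched as follows. This is not the `CorporateActionsEngine.adjust_prices` implementation; the `Bar` tuple, the function name, and the prices are made up for illustration, using the 4-for-1 ratio and ex-date from the Apple example.

```python
from datetime import date
from typing import List, Tuple

Bar = Tuple[date, float, float]  # (bar date, close, volume)


def back_adjust_for_split(bars: List[Bar], ex_date: date, ratio: float) -> List[Bar]:
    """Divide pre-split prices by the ratio and multiply pre-split volume by it."""
    adjusted = []
    for bar_date, close, volume in bars:
        if bar_date < ex_date:
            adjusted.append((bar_date, close / ratio, volume * ratio))
        else:
            adjusted.append((bar_date, close, volume))  # already on the new basis
    return adjusted


# Made-up prices around a 4-for-1 split with ex-date 2020-08-31
bars = [
    (date(2020, 8, 28), 500.0, 1000.0),
    (date(2020, 8, 31), 129.0, 4200.0),
]
adjusted_bars = back_adjust_for_split(bars, ex_date=date(2020, 8, 31), ratio=4.0)
print(adjusted_bars)
```

After adjustment the series moves from 125.0 to 129.0 across the ex-date instead of showing a 74% one-day collapse, and price times volume is unchanged on every bar.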
3. Handle Dividends Carefully¶
For total return calculations:
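def calculate_total_return(prices, dividends):
    """
    Calculate total return including reinvested dividends

    prices: pandas Series of prices indexed by date
    dividends: dict mapping ex-date -> dividend per share
    """
    # Track the position in shares, starting with one share
    shares = 1.0
    for div_date, div_amount in sorted(dividends.items()):
        price_at_div = prices[div_date]
        # Reinvest the cash received on all shares currently held
        shares += shares * div_amount / price_at_div
    # Growth in position value relative to the initial price
    # (the earlier version added a dollar amount to a fractional return)
    total_return = shares * prices.iloc[-1] / prices.iloc[0] - 1
    return total_return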
4. Cross-Reference Data Sources¶
import yfinance as yf
# Verify adjustments match market data
ticker = yf.Ticker('AAPL')
# Check splits
splits = ticker.splits
print("Recorded splits:")
print(splits)
# Check dividends
dividends = ticker.dividends
print("\nRecorded dividends:")
print(dividends)
Common Errors¶
Error 1: Mixing Adjusted and Unadjusted Data
# ❌ Wrong
adjusted_prices = load_adjusted_prices()
unadjusted_volume = load_unadjusted_volume() # Inconsistent!
# ✅ Correct
adjusted_prices = load_adjusted_prices()
adjusted_volume = load_adjusted_volume() # Both adjusted
Error 2: Not Adjusting Technical Indicators
# ❌ Wrong - indicators on unadjusted prices
sma = calculate_sma(unadjusted_prices)
# ✅ Correct - indicators on adjusted prices
sma = calculate_sma(adjusted_prices)
Bias Prevention Strategies¶
Types of Biases¶
- Look-Ahead Bias: Using future information
- Survivorship Bias: Only testing survivors
- Selection Bias: Cherry-picking favorable periods
- Data-Mining Bias: Testing too many strategies
- Optimization Bias: Over-optimizing parameters
Detection and Prevention¶
1. Automated Bias Detection¶
from src.backtesting.bias_checker import BiasChecker
# Configure bias checker
checker = BiasChecker(
strict_mode=True,
tolerance=0.01
)
# Run comprehensive checks
bias_results = checker.run_full_check(
strategy=strategy,
data_handler=data_handler,
portfolio=portfolio
)
# Review results
if bias_results.has_errors():
    print("CRITICAL: Bias detected!")
    for issue in bias_results.get_issues_by_severity('ERROR'):
        print(f"- {issue.description}")
    exit(1)
if bias_results.has_warnings():
    print("WARNING: Potential bias issues")
    for issue in bias_results.get_issues_by_severity('WARNING'):
        print(f"- {issue.description}")
2. Point-in-Time Data¶
class PointInTimeDataHandler:
    """Ensure no look-ahead bias in data access"""
    def __init__(self, data):
        self.data = data
        self.current_idx = 0

    def get_latest_bars(self, symbol, n=1):
        """
        Get latest N bars - ONLY historical data
        Returns:
            DataFrame with bars up to current_idx (not beyond!)
        """
        if self.current_idx < n:
            return None
        # Critical: Only return data UP TO current point
        return self.data.iloc[self.current_idx - n:self.current_idx]

    def update_bars(self):
        """Move forward one bar"""
        self.current_idx += 1

    def get_current_bar(self, symbol):
        """Get current bar (point-in-time)"""
        return self.data.iloc[self.current_idx]
3. Timestamp Validation¶
def validate_timestamps(strategy_signals, market_data):
    """
    Ensure signals only use data available at signal time
    """
    for signal in strategy_signals:
        signal_time = signal.timestamp
        # Check: Are all input data timestamps before signal?
        for data_point in signal.input_data:
            if data_point.timestamp >= signal_time:
                raise ValueError(
                    f"Look-ahead bias: Signal at {signal_time} "
                    f"uses data from {data_point.timestamp}"
                )
4. Randomization Tests¶
import numpy as np

def run_randomization_test(strategy, n_trials=1000):
    """
    Test if strategy performance is due to skill or luck
    """
    actual_sharpe = strategy.backtest().sharpe_ratio
    random_sharpes = []
    for _ in range(n_trials):
        # Randomize entry/exit signals
        random_strategy = randomize_signals(strategy)
        random_sharpe = random_strategy.backtest().sharpe_ratio
        random_sharpes.append(random_sharpe)
    # Calculate p-value
    p_value = np.mean(np.array(random_sharpes) >= actual_sharpe)
    if p_value < 0.05:
        print(f"✓ Strategy is statistically significant (p={p_value:.4f})")
    else:
        print(f"✗ Strategy not significant (p={p_value:.4f})")
    return p_value
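The function above depends on a `randomize_signals` helper that is not shown. A self-contained variant of the same idea, assuming "randomizing" means shuffling the order of the strategy's positions against fixed asset returns, might look like the sketch below. The names, the synthetic returns, and the unannualized Sharpe are all illustrative assumptions.

```python
import random
from statistics import mean, stdev
from typing import List


def sharpe(returns: List[float]) -> float:
    """Per-period (unannualized) Sharpe ratio; 0.0 if returns have no variance."""
    sd = stdev(returns)
    return mean(returns) / sd if sd > 0 else 0.0


def permutation_p_value(
    positions: List[int], asset_returns: List[float], n_trials: int = 1000, seed: int = 0
) -> float:
    """Share of shuffled-position runs whose Sharpe matches or beats the actual one."""
    rng = random.Random(seed)  # seeded so the test is reproducible
    actual = sharpe([p * r for p, r in zip(positions, asset_returns)])
    shuffled = list(positions)
    at_least_as_good = 0
    for _ in range(n_trials):
        rng.shuffle(shuffled)
        if sharpe([p * r for p, r in zip(shuffled, asset_returns)]) >= actual:
            at_least_as_good += 1
    # The +1 terms keep the estimate away from an impossible p-value of zero
    return (at_least_as_good + 1) / (n_trials + 1)


# Synthetic returns and a deliberately "perfect" strategy that is always on the right side
asset_returns = [((-1) ** i) * (0.005 + 0.001 * (i % 5)) for i in range(40)]
positions = [1 if r > 0 else -1 for r in asset_returns]
p_value = permutation_p_value(positions, asset_returns)
print(f"p = {p_value:.4f}")
```

Shuffling keeps the number of long and short bars fixed, so the test asks only whether the timing of the positions matters. A perfectly timed strategy like this one should come out far below 0.05.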
Academic References¶
Foundational Papers¶
- Bailey, D. H., & López de Prado, M. (2014)
  - "The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality"
  - Journal of Portfolio Management, 40(5), 94-107
  - Key contribution: Statistical methods to detect overfitting
- Harvey, C. R., Liu, Y., & Zhu, H. (2016)
  - "... and the Cross-Section of Expected Returns"
  - Review of Financial Studies, 29(1), 5-68
  - Key contribution: Multiple testing in strategy evaluation
- Bailey, D. H., Borwein, J., López de Prado, M., & Zhu, Q. J. (2014)
  - "Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance"
  - Notices of the AMS, 61(5), 458-471
  - Key contribution: Probability of backtest overfitting (PBO)
- Pardo, R. (2008)
  - "The Evaluation and Optimization of Trading Strategies" (2nd ed.)
  - Wiley Trading
  - Key contribution: Walk-forward analysis methodology
Testing and Validation¶
- White, H. (2000)
  - "A Reality Check for Data Snooping"
  - Econometrica, 68(5), 1097-1126
  - Key contribution: Bootstrap methods for multiple testing
- Hansen, P. R. (2005)
  - "A Test for Superior Predictive Ability"
  - Journal of Business & Economic Statistics, 23(4), 365-380
  - Key contribution: SPA test for strategy comparison
- Romano, J. P., & Wolf, M. (2005)
  - "Stepwise Multiple Testing as Formalized Data Snooping"
  - Econometrica, 73(4), 1237-1282
  - Key contribution: Controlling false discoveries
Transaction Costs and Market Microstructure¶
- Keim, D. B., & Madhavan, A. (1997)
  - "Transactions Costs and Investment Style: An Inter-Exchange Analysis of Institutional Equity Trades"
  - Journal of Financial Economics, 46(3), 265-292
  - Key contribution: Realistic transaction cost modeling
- Hasbrouck, J. (2009)
  - "Trading Costs and Returns for U.S. Equities: Estimating Effective Costs from Daily Data"
  - Journal of Finance, 64(3), 1445-1477
  - Key contribution: Estimating implicit costs
Risk Management¶
- Cornish, E. A., & Fisher, R. A. (1938)
  - "Moments and Cumulants in the Specification of Distributions"
  - Revue de l'Institut International de Statistique, 5(4), 307-320
  - Key contribution: Higher moments in return distributions
- Sortino, F. A., & Van der Meer, R. (1991)
  - "Downside Risk"
  - Journal of Portfolio Management, 17(4), 27-31
  - Key contribution: Sortino ratio and downside risk
Best Practices Books¶
- Aronson, D. (2006)
  - "Evidence-Based Technical Analysis"
  - Wiley
  - Comprehensive guide to rigorous backtesting
- López de Prado, M. (2018)
  - "Advances in Financial Machine Learning"
  - Wiley
  - Modern techniques for strategy development and validation
- Chan, E. P. (2013)
  - "Algorithmic Trading: Winning Strategies and Their Rationale"
  - Wiley
  - Practical guide to strategy development
Online Resources¶
- SSRN Financial Markets
  - https://www.ssrn.com/
  - Working papers on quantitative finance
- Journal of Portfolio Management
  - https://jpm.pm-research.com/
  - Practitioner-focused research
- QuantStart Blog
  - https://www.quantstart.com/
  - Tutorials on backtesting best practices
Summary¶
Successful backtesting requires:
- Clean, Point-in-Time Data: No survivorship bias, proper corporate actions
- Realistic Costs: Conservative estimates of commissions, slippage, spread
- Bias Prevention: Automated checks, manual review, timestamp validation
- Robust Validation: Walk-forward analysis, multiple regimes, cross-validation
- Statistical Rigor: Multiple testing corrections, sufficient sample size
- Economic Rationale: Every parameter and rule justified
- Documentation: Full transparency about methodology and assumptions
Remember: The goal is not to create a perfect backtest, but to create an honest assessment of expected performance.
Last updated: 2024-01-27
For questions or contributions: see CONTRIBUTING.md