Backtesting Best Practices

This document provides comprehensive guidelines for conducting robust, reliable backtests that accurately reflect real-world trading conditions.

Table of Contents

  1. Common Pitfalls and Solutions
  2. Validation Checklist
  3. Real-World Examples
  4. Avoiding Overfitting
  5. Walk-Forward Analysis Guidelines
  6. Corporate Actions Handling
  7. Bias Prevention Strategies
  8. Academic References

Common Pitfalls and Solutions

1. Look-Ahead Bias

Problem: Using information that would not have been available at the time of the trade.

Examples:

  • Using adjusted prices without considering when the adjustment occurred
  • Calculating indicators using future data points
  • Rebalancing portfolios based on end-of-period rankings

Solution:

from src.backtesting.engine import EnhancedBacktestEngine
from src.backtesting.bias_checker import BiasChecker

# Enable automatic bias checking
engine = EnhancedBacktestEngine(
    symbol_list=['BTCUSDT'],
    strategy_class=MyStrategy,
    enable_bias_checking=True,  # Critical!
    bias_check_strict_mode=True
)

results = engine.run()
if not results['bias_check']['passed']:
    print("WARNING: Look-ahead bias detected!")
    for issue in results['bias_check']['issues']:
        print(f"- {issue['description']}")

Best Practices:

  • Always use point-in-time data
  • Implement event-driven architecture
  • Use the BiasChecker module before production
  • Validate all indicator calculations for proper time alignment
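To make "proper time alignment" concrete, here is a minimal sketch of an indicator that can only see bars strictly before the one being traded. The function name `lagged_sma_signals` and its signal convention are illustrative assumptions, not part of the project's API.

```python
from typing import List, Optional


def lagged_sma_signals(closes: List[float], window: int) -> List[Optional[int]]:
    """Return one signal per bar, computed only from bars strictly before it.

    +1 if the previous close is above the SMA of the previous `window` closes,
    -1 if below, 0 if equal, and None while there is not enough history.
    """
    signals: List[Optional[int]] = []
    for t in range(len(closes)):
        history = closes[:t]  # bars 0..t-1 only; bar t is not yet known
        if len(history) < window:
            signals.append(None)
            continue
        sma = sum(history[-window:]) / window
        last = history[-1]
        signals.append((last > sma) - (last < sma))
    return signals


closes = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0]
print(lagged_sma_signals(closes, window=3))
```

A quick self-check for any indicator is to change a future bar and confirm that earlier signals do not move; if they do, the calculation is leaking future data.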

2. Survivorship Bias

Problem: Only testing on assets that survived to the present, ignoring delisted or bankrupt companies.

Example: Testing a stock strategy only on current S&P 500 constituents.

Solution:

  • Include delisted securities in your dataset
  • Use survivorship-bias-free data providers
  • Consider the full historical universe at each point in time

-- Example: point-in-time universe as of the backtest start date,
-- including securities that were delisted later
SELECT * FROM market_data
WHERE symbol IN (
    SELECT symbol FROM securities
    WHERE listed_date <= '2023-01-01'
    AND (delisted_date IS NULL OR delisted_date > '2023-01-01')
)
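The same point-in-time filter can be sketched in Python. The `Security` record and `universe_as_of` helper below are hypothetical names used only for illustration; they mirror the `listed_date` and `delisted_date` columns from the query.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Security:
    symbol: str
    listed_date: date
    delisted_date: Optional[date] = None  # None means still listed


def universe_as_of(securities: List[Security], as_of: date) -> List[str]:
    """Symbols that were tradable on `as_of`, including ones delisted later."""
    return sorted(
        s.symbol
        for s in securities
        if s.listed_date <= as_of
        and (s.delisted_date is None or s.delisted_date > as_of)
    )


securities = [
    Security("AAA", date(2010, 1, 4)),
    Security("BBB", date(2015, 6, 1), delisted_date=date(2023, 6, 30)),
    Security("CCC", date(2023, 3, 1)),
]
print(universe_as_of(securities, date(2023, 1, 1)))
```

Rebuilding the universe at each rebalance date, rather than once from today's listings, is what keeps delisted names such as "BBB" in the test.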

3. Data Quality Issues

Problem: Incorrect, missing, or unrealistic data leading to misleading results.

Common Issues:

  • Missing bars creating artificial gaps
  • Extreme prices due to data errors
  • Incorrect volume figures
  • Wrong timezone handling

Solution:

from src.backtesting.data_validator import DataValidator

validator = DataValidator()
issues = validator.validate_ohlcv_data(
    data=df,
    checks=['missing_bars', 'price_spikes', 'negative_values', 'volume_anomalies']
)

if issues:
    print(f"Found {len(issues)} data quality issues:")
    for issue in issues:
        print(f"  - {issue}")

Best Practices:

  • Validate data before backtesting
  • Check for realistic bid-ask spreads
  • Verify timezone consistency
  • Cross-reference multiple data sources
  • Handle corporate actions properly
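For readers who want to see what two of these checks involve, the following is a rough standalone sketch, not the DataValidator implementation. It assumes a continuously traded market (such as BTCUSDT) with a fixed bar interval; for exchange-traded assets a trading calendar would be needed instead. The function names and the 20% spike threshold are assumptions.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Bar = Tuple[datetime, float]  # (timestamp, close)


def find_missing_bars(bars: List[Bar], interval: timedelta) -> List[datetime]:
    """Return the timestamps that are absent between consecutive bars."""
    missing = []
    for (prev_ts, _), (next_ts, _) in zip(bars, bars[1:]):
        expected = prev_ts + interval
        while expected < next_ts:
            missing.append(expected)
            expected += interval
    return missing


def find_price_spikes(bars: List[Bar], max_move: float = 0.2) -> List[datetime]:
    """Return timestamps whose close moved more than `max_move` versus the prior close."""
    spikes = []
    for (_, prev_close), (ts, close) in zip(bars, bars[1:]):
        if prev_close > 0 and abs(close / prev_close - 1.0) > max_move:
            spikes.append(ts)
    return spikes


t0 = datetime(2024, 1, 1, 0, 0)
hour = timedelta(hours=1)
bars = [(t0, 100.0), (t0 + hour, 101.0), (t0 + 3 * hour, 150.0)]
print(find_missing_bars(bars, hour))
print(find_price_spikes(bars))
```

A flagged spike is a prompt to cross-check another data source, not proof of an error; genuine moves of that size do occur.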

4. Overfitting

Problem: Strategy performs well on historical data but fails in live trading.

Warning Signs:

  • Too many parameters (>5-7 is suspicious)
  • Perfect or near-perfect historical performance
  • Large performance gap between in-sample and out-of-sample
  • Strategy logic overly complex or highly specific

Solution: See Avoiding Overfitting section below.

5. Transaction Cost Underestimation

Problem: Not accounting for all costs of trading.

Hidden Costs:

  • Slippage (market impact)
  • Bid-ask spread
  • Commission fees
  • Exchange fees
  • Financing costs (overnight positions)
  • Tax implications

Solution:

from decimal import Decimal

engine = EnhancedBacktestEngine(
    symbol_list=['BTCUSDT'],
    strategy_class=MyStrategy,
    # Conservative cost estimates
    commission_pct=Decimal('0.001'),  # 0.1% per trade
    slippage_pct=Decimal('0.0005'),   # 0.05% slippage
)

# For higher frequency strategies, increase costs:
# commission_pct=Decimal('0.002')  # Account for market impact

Best Practices:

  • Use realistic commission rates
  • Model slippage based on order size and liquidity
  • Consider bid-ask spread explicitly
  • Test with conservative cost assumptions (1.5x expected)
  • Account for price improvement in limit orders
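To see how quickly these costs erode an edge, the sketch below nets a single round trip against per-side commission and slippage, with an optional stress multiplier for the 1.5x conservative case. The helper name and the simple proportional-cost model are assumptions for illustration; the engine's own cost model may differ.

```python
from decimal import Decimal


def net_round_trip_return(
    gross_return: Decimal,
    commission_pct: Decimal,
    slippage_pct: Decimal,
    stress: Decimal = Decimal("1"),
) -> Decimal:
    """Net return of one round trip after proportional costs on entry and exit."""
    cost_per_side = (commission_pct + slippage_pct) * stress
    keep = Decimal("1") - cost_per_side
    # Capital shrinks by the cost on entry, moves with the trade, then shrinks again on exit.
    growth = (Decimal("1") + gross_return) * keep * keep
    return growth - Decimal("1")


gross = Decimal("0.01")  # a 1% gross move
base = net_round_trip_return(gross, Decimal("0.001"), Decimal("0.0005"))
stressed = net_round_trip_return(
    gross, Decimal("0.001"), Decimal("0.0005"), stress=Decimal("1.5")
)
print(f"net: {base:.4%}, stressed: {stressed:.4%}")
```

With the cost settings above, a trade that breaks even before costs loses roughly 0.3% per round trip, which is why high-turnover strategies are the most sensitive to these assumptions.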

6. Incorrect Position Sizing

Problem: Using position sizes that couldn't be achieved in practice.

Issues:

  • Positions larger than available capital
  • Not accounting for margin requirements
  • Fractional shares when not supported
  • Ignoring minimum lot sizes

Solution:

from src.backtesting.portfolio import PositionSizingMethod
from decimal import Decimal

engine = EnhancedBacktestEngine(
    position_sizing=PositionSizingMethod.PERCENT_CAPITAL,
    position_size_value=Decimal('0.1'),  # 10% per position
    # Or use fixed risk per trade
    # position_sizing=PositionSizingMethod.FIXED_RISK,
    # position_size_value=Decimal('0.02'),  # 2% risk per trade
)
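As a rough illustration of what fixed-risk sizing has to respect, the sketch below caps the quantity by both the risk budget and the available capital, then rounds down to the lot size. It is a standalone example under the assumption of no leverage; `fixed_risk_position_size` is a hypothetical helper, not the FIXED_RISK implementation.

```python
from decimal import Decimal, ROUND_DOWN


def fixed_risk_position_size(
    capital: Decimal,
    risk_fraction: Decimal,
    entry_price: Decimal,
    stop_price: Decimal,
    lot_size: Decimal = Decimal("1"),
) -> Decimal:
    """Largest multiple of `lot_size` that risks at most `risk_fraction` of capital
    and whose notional value does not exceed capital."""
    risk_per_unit = abs(entry_price - stop_price)
    if risk_per_unit == 0:
        raise ValueError("entry_price and stop_price must differ")
    by_risk = (capital * risk_fraction) / risk_per_unit
    by_capital = capital / entry_price
    raw_qty = min(by_risk, by_capital)
    lots = (raw_qty / lot_size).to_integral_value(rounding=ROUND_DOWN)
    return lots * lot_size


qty = fixed_risk_position_size(
    capital=Decimal("100000"),
    risk_fraction=Decimal("0.02"),
    entry_price=Decimal("50"),
    stop_price=Decimal("48"),
)
print(qty)
```

A very tight stop makes the risk-based quantity large, so the capital cap is what prevents the backtest from taking positions that could not be funded.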

7. Data Mining Bias

Problem: Testing many strategies and only reporting the best one.

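Example: Testing 100 random strategies, finding 5 that work, and publishing those. At a 5% significance level, about 5 of 100 strategies with no real edge are expected to pass by chance alone.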

Solution:

  • Pre-register hypotheses before testing
  • Apply multiple testing corrections (Bonferroni, False Discovery Rate)
  • Use out-of-sample validation
  • Report all strategies tested, not just winners

Statistical Adjustment:

# Bonferroni correction for multiple tests
alpha = 0.05
num_strategies_tested = 20
adjusted_alpha = alpha / num_strategies_tested  # 0.0025

# Strategy must exceed this threshold
required_sharpe = compute_sharpe_threshold(adjusted_alpha)
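The snippet above leaves `compute_sharpe_threshold` undefined. One possible sketch, under the simplifying assumption of independent, normally distributed returns, treats the estimated annualized Sharpe ratio over T years as having a standard error of about 1/sqrt(T), so the one-sided threshold is the normal quantile divided by sqrt(T). The `years_of_data` parameter and its default are assumptions added for illustration.

```python
from math import sqrt
from statistics import NormalDist


def compute_sharpe_threshold(adjusted_alpha: float, years_of_data: float = 4.0) -> float:
    """Annualized Sharpe ratio needed to reject 'no edge' at `adjusted_alpha`.

    Assumes i.i.d. normal returns, so the annualized Sharpe estimate over
    `years_of_data` years has a standard error of roughly 1 / sqrt(years_of_data).
    """
    if not 0.0 < adjusted_alpha < 1.0:
        raise ValueError("adjusted_alpha must be between 0 and 1")
    if years_of_data <= 0:
        raise ValueError("years_of_data must be positive")
    z = NormalDist().inv_cdf(1.0 - adjusted_alpha)  # one-sided critical value
    return z / sqrt(years_of_data)


single_test = compute_sharpe_threshold(0.05)
after_bonferroni = compute_sharpe_threshold(0.05 / 20)
print(f"{single_test:.2f} -> {after_bonferroni:.2f}")
```

Under these assumptions, testing 20 strategies on four years of data raises the required Sharpe from roughly 0.8 to roughly 1.4. Real returns are fat-tailed and autocorrelated, so treat this as a lower bound rather than a precise hurdle.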

8. Regime Changes

Problem: Market conditions change over time, invalidating historical patterns.

Examples:

  • Low volatility strategy in a high-vol regime
  • Mean reversion during strong trends
  • Ignoring structural market changes (HFT, decimalization)

Solution:

# Test across different market regimes
regimes = {
    'bull_market': ('2020-01-01', '2021-12-31'),
    'bear_market': ('2022-01-01', '2022-12-31'),
    'sideways': ('2023-01-01', '2023-12-31'),
}

for regime_name, (start, end) in regimes.items():
    results = engine.run(start_date=start, end_date=end)
    print(f"{regime_name}: Sharpe = {results['sharpe_ratio']:.2f}")


Validation Checklist

Use this checklist before deploying any strategy:

Data Validation

  • Data quality checked (no gaps, spikes, or errors)
  • Survivorship bias eliminated
  • Corporate actions properly adjusted
  • Timezone consistency verified
  • Point-in-time data confirmed

Bias Prevention

  • Look-ahead bias check passed
  • Event-driven architecture implemented
  • No future data leakage in indicators
  • BiasChecker module executed successfully
  • Manual code review for timing issues

Transaction Costs

  • Realistic commission rates applied
  • Slippage modeled appropriately
  • Bid-ask spread considered
  • Market impact estimated for large orders
  • Conservative cost assumptions used

Robustness Testing

  • Walk-forward analysis performed
  • Out-of-sample validation completed
  • Multiple time periods tested
  • Different market regimes analyzed
  • Sensitivity analysis conducted

Parameter Optimization

  • Parameter count justified (<7 recommended)
  • Optimization methodology documented
  • In-sample/out-of-sample split defined
  • Overfitting metrics checked
  • Parameter stability verified

Risk Management

  • Maximum drawdown acceptable
  • Position sizing rules defined
  • Risk per trade specified
  • Correlation with other strategies assessed
  • Tail risk evaluated

Statistical Validation

  • Sufficient trade count (>100 recommended)
  • Statistical significance verified
  • Sharpe ratio realistic (0.5-2.0 achievable)
  • Win rate not suspiciously high
  • Performance consistent across periods

Documentation

  • Strategy logic fully documented
  • Assumptions clearly stated
  • All parameters explained
  • Known limitations identified
  • Expected performance ranges specified

Real-World Examples

Example 1: Moving Average Crossover with Proper Validation

Bad Implementation:

# ❌ Common mistakes
def bad_backtest():
    # Using entire dataset for optimization
    best_params = optimize_params(data)

    # Testing on same data used for optimization
    results = backtest(data, best_params)

    # Not accounting for costs
    # No bias checking
    # No regime analysis
    return results

Good Implementation:

# ✅ Proper validation
from src.backtesting.engine import EnhancedBacktestEngine
from src.backtesting.optimization.grid_search import GridSearchOptimizer
from src.backtesting.optimization.walk_forward import WalkForwardAnalyzer
from datetime import datetime
from decimal import Decimal

def proper_backtest(data):
    # 1. Define parameters with economic rationale
    param_grid = {
        'fast_period': [10, 20, 30],  # Short-term trend
        'slow_period': [50, 100, 200],  # Long-term trend
    }

    # Build the optimizer that the walk-forward analyzer will drive
    optimizer = GridSearchOptimizer(
        strategy_class=SMACrossover,
        param_grid=param_grid,
        optimization_metric='sharpe_ratio'
    )

    # 2. Use walk-forward analysis
    analyzer = WalkForwardAnalyzer(
        optimizer=optimizer,
        in_sample_days=252,  # 1 year training
        out_sample_days=63,  # 3 months testing
        step_days=21,  # Roll monthly
        anchored=False  # Rolling window
    )

    # 3. Enable all safeguards
    engine = EnhancedBacktestEngine(
        symbol_list=['BTCUSDT'],
        strategy_class=SMACrossover,
        start_date=datetime(2020, 1, 1),
        end_date=datetime(2024, 1, 1),
        initial_capital=Decimal('100000'),
        commission_pct=Decimal('0.001'),
        slippage_pct=Decimal('0.0005'),
        enable_corporate_actions=True,
        enable_bias_checking=True,
        bias_check_strict_mode=True
    )

    # 4. Run walk-forward analysis
    wf_results = analyzer.run(
        start_date=datetime(2020, 1, 1),
        end_date=datetime(2024, 1, 1)
    )

    # 5. Validate results
    if wf_results.mean_degradation_pct > 30:
        print("WARNING: High performance degradation (>30%)")
        print("Strategy likely overfit")
        return None

    # 6. Test across regimes
    regimes = analyze_market_regimes(data)
    regime_results = {}
    for regime in regimes:
        regime_results[regime.name] = engine.run(
            start_date=regime.start,
            end_date=regime.end
        )

    return {
        'walk_forward': wf_results,
        'regime_analysis': regime_results,
        'final_validation': 'passed'
    }

Example 2: Mean Reversion Strategy with Volume Filters

from src.backtesting.strategies.base import Strategy
from src.backtesting.events import SignalEvent
import pandas as pd

class MeanReversionStrategy(Strategy):
    """
    Mean reversion with proper implementation:
    - Volume confirmation
    - Volatility filters
    - Risk management
    """

    def __init__(self, events, lookback=20, entry_std=2.0, exit_std=0.5):
        super().__init__(events)
        self.lookback = lookback
        self.entry_std = entry_std
        self.exit_std = exit_std
        self.price_history = {}
        self.volume_history = {}

    def calculate_signals(self, event):
        """Generate signals with proper validation"""
        symbol = event.symbol

        # Update history (only using past data!)
        if symbol not in self.price_history:
            self.price_history[symbol] = []
            self.volume_history[symbol] = []

        self.price_history[symbol].append(event.close)
        self.volume_history[symbol].append(event.volume)

        # Need enough data
        if len(self.price_history[symbol]) < self.lookback:
            return

        # Calculate indicators using ONLY historical data
        prices = pd.Series(self.price_history[symbol][-self.lookback:])
        volumes = pd.Series(self.volume_history[symbol][-self.lookback:])

        mean_price = prices.mean()
        std_price = prices.std()
        current_price = event.close

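        # Z-score (skip the bar if the window has no price variation,
        # which would otherwise cause a division by zero)
        if not std_price > 0:
            return
        zscore = (current_price - mean_price) / std_price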

        # Volume filter (confirm with above-average volume)
        avg_volume = volumes.mean()
        volume_confirmed = event.volume > avg_volume * 1.2

        # Entry signals
        if zscore < -self.entry_std and volume_confirmed:
            # Oversold - buy signal
            self.events.put(SignalEvent(
                symbol=symbol,
                datetime=event.datetime,
                signal_type='LONG',
                strength=min(abs(zscore) / 3.0, 1.0)
            ))

        elif zscore > self.entry_std and volume_confirmed:
            # Overbought - sell signal
            self.events.put(SignalEvent(
                symbol=symbol,
                datetime=event.datetime,
                signal_type='SHORT',
                strength=min(abs(zscore) / 3.0, 1.0)
            ))

        # Exit signals
        elif abs(zscore) < self.exit_std:
            # Return to mean - exit
            self.events.put(SignalEvent(
                symbol=symbol,
                datetime=event.datetime,
                signal_type='EXIT',
                strength=1.0
            ))

Example 3: Handling Corporate Actions

from src.backtesting.corporate_actions import CorporateActionsEngine, CorporateAction, CorporateActionType
from datetime import datetime

# Load corporate actions
ca_engine = CorporateActionsEngine()

# Example: Tesla 3-for-1 split on Aug 25, 2022
tesla_split = CorporateAction(
    symbol='TSLA',
    action_type=CorporateActionType.STOCK_SPLIT,
    ex_date=datetime(2022, 8, 25),
    ratio=3.0,
    description='3-for-1 stock split'
)
ca_engine.add_action(tesla_split)

# Backtest with corporate actions
engine = EnhancedBacktestEngine(
    symbol_list=['TSLA'],
    strategy_class=MyStrategy,
    enable_corporate_actions=True,
    corporate_actions_csv='data/corporate_actions.csv'
)

results = engine.run()

Avoiding Overfitting

Understanding Overfitting

Overfitting occurs when a strategy learns the noise in historical data rather than genuine patterns. It manifests as excellent backtest performance but poor live trading results.

Key Indicators of Overfitting

  1. Too Many Parameters: >7 parameters is a red flag
  2. Perfect or Near-Perfect Results: Sharpe > 3.0, Win rate > 70%
  3. High In-Sample/Out-of-Sample Gap: >30% degradation
  4. Complex Logic: Many special cases and exceptions
  5. Narrow Applicability: Only works for specific stocks/periods

Prevention Strategies

1. Limit Parameter Count

Rule of Thumb: Number of parameters ≤ log₁₀(number of trades)

# Good: 3 parameters, simple logic
class SimpleStrategy:
    def __init__(self, fast_ma=10, slow_ma=50, volume_threshold=1.5):
        self.fast_ma = fast_ma
        self.slow_ma = slow_ma
        self.volume_threshold = volume_threshold

# Bad: 12 parameters, likely overfit
class ComplexStrategy:
    def __init__(self, ma1=10, ma2=20, ma3=50, ma4=100, ma5=200,
                 rsi_period=14, rsi_upper=70, rsi_lower=30,
                 volume_ma=20, volume_std=2, atr_period=14, atr_multiplier=2):
        # Too many parameters!
        pass

2. Use Information Criteria

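from scipy import stats
import numpy as np

def gaussian_log_likelihood(returns):
    """
    Log-likelihood of returns under a normal distribution
    whose mean and standard deviation are fitted to the data
    """
    returns = np.asarray(returns, dtype=float)
    return np.sum(stats.norm.logpdf(returns, loc=returns.mean(), scale=returns.std()))

def calculate_aic(returns, num_parameters, num_trades):
    """
    Calculate Akaike Information Criterion
    Lower is better
    (num_trades is unused; kept so the signature matches calculate_bic)
    """
    log_likelihood = gaussian_log_likelihood(returns)
    aic = 2 * num_parameters - 2 * log_likelihood
    return aic

def calculate_bic(returns, num_parameters, num_trades):
    """
    Calculate Bayesian Information Criterion
    Lower is better, penalizes parameters more than AIC
    (num_trades is the sample size and should match len(returns))
    """
    log_likelihood = gaussian_log_likelihood(returns)
    bic = num_parameters * np.log(num_trades) - 2 * log_likelihood
    return bic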

# Compare strategies
strategy_a_bic = calculate_bic(returns_a, 3, 500)
strategy_b_bic = calculate_bic(returns_b, 10, 500)

if strategy_b_bic > strategy_a_bic:
    print("Strategy A is better (simpler and similar performance)")

3. Cross-Validation

import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

def time_series_cross_validation(data, n_splits=5):
    """
    Perform time series cross-validation

    optimize() and backtest() are placeholders for your own routines;
    optimize() is assumed to return an object with .best_params and .sharpe
    """
    tscv = TimeSeriesSplit(n_splits=n_splits)
    results = []

    for fold, (train_idx, test_idx) in enumerate(tscv.split(data)):
        train_data = data.iloc[train_idx]
        test_data = data.iloc[test_idx]

        # Optimize on train
        optimize_result = optimize(train_data)

        # Test on test
        test_result = backtest(test_data, optimize_result.best_params)
        results.append({
            'fold': fold,
            'train_sharpe': optimize_result.sharpe,
            'test_sharpe': test_result.sharpe,
            'degradation': optimize_result.sharpe - test_result.sharpe
        })

    return pd.DataFrame(results)

4. Regularization

import numpy as np

def penalized_sharpe(returns, num_parameters, penalty_factor=0.1):
    """
    Calculate annualized Sharpe ratio (from daily returns) with complexity penalty
    """
    sharpe = returns.mean() / returns.std() * np.sqrt(252)
    penalty = penalty_factor * num_parameters
    return sharpe - penalty

# Prefer simpler strategies
strategy_a_score = penalized_sharpe(returns_a, num_parameters=3)
strategy_b_score = penalized_sharpe(returns_b, num_parameters=10)

5. Economic Rationale

Every parameter should have a clear economic justification:

# ✅ Good: Economic rationale
class TrendFollowing:
    def __init__(self, 
                 short_ma=50,    # 2-3 month trend
                 long_ma=200,    # 1-year trend
                 volume_confirm=1.5):  # Above-average activity
        """
        Trend following based on:
        - 50-day MA captures intermediate trend (academic support)
        - 200-day MA is widely watched institutional level
        - Volume confirms genuine momentum vs noise
        """
        pass

# ❌ Bad: Arbitrary parameters
class MysteryStrategy:
    def __init__(self, 
                 threshold_1=73.24,  # Why 73.24?
                 threshold_2=0.0043,  # Suspiciously precise
                 lag_periods=17):     # Why 17 days?
        # No economic rationale!
        pass

Walk-Forward Analysis Guidelines

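Walk-forward analysis (WFA) is widely regarded as one of the most rigorous methods for strategy validation because it mirrors how a strategy is actually developed and deployed: parameters are fitted on past data, then traded on data the optimizer has never seen.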

Methodology

  1. Divide data into sequential windows
  2. Optimize parameters on in-sample (IS) data
  3. Test optimized parameters on out-of-sample (OOS) data
  4. Roll forward and repeat
  5. Analyze IS vs OOS performance degradation
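The windowing in steps 1 and 4 can be sketched with plain index arithmetic. The helper below is an illustrative assumption about how such windows may be laid out (it is not the WalkForwardAnalyzer source); indices are bar positions and the end of each range is exclusive.

```python
from typing import List, Tuple

Window = Tuple[int, int, int, int]  # (is_start, is_end, oos_start, oos_end)


def walk_forward_windows(
    n_bars: int,
    in_sample: int,
    out_sample: int,
    step: int,
    anchored: bool = False,
) -> List[Window]:
    """Generate sequential in-sample / out-of-sample index windows.

    Rolling (anchored=False): the in-sample window slides forward.
    Anchored (anchored=True): the in-sample window always starts at bar 0.
    """
    if min(in_sample, out_sample, step) <= 0:
        raise ValueError("in_sample, out_sample and step must be positive")
    windows = []
    is_end = in_sample
    while is_end + out_sample <= n_bars:
        is_start = 0 if anchored else is_end - in_sample
        # Out-of-sample data always begins where in-sample data ends.
        windows.append((is_start, is_end, is_end, is_end + out_sample))
        is_end += step
    return windows


for window in walk_forward_windows(1000, in_sample=252, out_sample=63, step=21)[:3]:
    print(window)
```

Note that when `step` is smaller than `out_sample`, consecutive out-of-sample windows overlap, so their results are not independent observations.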

Implementation

from src.backtesting.optimization.walk_forward import WalkForwardAnalyzer, WalkForwardWindow
from src.backtesting.optimization.grid_search import GridSearchOptimizer
from datetime import datetime

# Define parameter search space
param_grid = {
    'fast_period': range(5, 30, 5),
    'slow_period': range(30, 200, 20),
}

# Create optimizer
optimizer = GridSearchOptimizer(
    strategy_class=SMACrossover,
    param_grid=param_grid,
    optimization_metric='sharpe_ratio'
)

# Configure walk-forward analysis
wf_analyzer = WalkForwardAnalyzer(
    optimizer=optimizer,
    in_sample_days=252,      # 1 year for optimization
    out_sample_days=63,      # 3 months for validation
    step_days=21,            # Roll every month
    anchored=False,          # Rolling window (not expanding)
    min_trades=30            # Require minimum trades
)

# Run analysis
results = wf_analyzer.run(
    symbol_list=['BTCUSDT'],
    start_date=datetime(2020, 1, 1),
    end_date=datetime(2024, 1, 1),
    initial_capital=100000
)

# Evaluate results
print(f"Mean IS Sharpe: {results.mean_in_sample:.2f}")
print(f"Mean OOS Sharpe: {results.mean_out_sample:.2f}")
print(f"Performance Degradation: {results.mean_degradation_pct:.1f}%")

# Check for overfitting
if results.mean_degradation_pct > 30:
    print("⚠️  HIGH DEGRADATION - Likely Overfit")
elif results.mean_degradation_pct > 20:
    print("⚠️  MODERATE DEGRADATION - Use Caution")
else:
    print("✓ LOW DEGRADATION - Strategy Appears Robust")

# Visualize
wf_analyzer.plot_results(results, save_path='walk_forward_results.png')

Interpretation Guidelines

Performance Degradation:

  • < 20%: Excellent (strategy is robust)
  • 20-30%: Acceptable (some overfitting, but manageable)
  • 30-50%: Concerning (significant overfitting)
  • > 50%: Severe (strategy unlikely to work live)

Consistency:

  • Check OOS performance across windows
  • Look for consistent parameter sets
  • Verify no extreme outlier windows

Minimum Requirements:

  • At least 5 walk-forward windows
  • Minimum 30 trades per OOS window
  • Positive OOS performance in >60% of windows
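These thresholds are easy to encode as a small check. The sketch below assumes degradation is measured as the relative drop from in-sample to out-of-sample Sharpe; how `mean_degradation_pct` is actually computed by the analyzer is not shown in this document, so treat the formula and function names as assumptions.

```python
from typing import Sequence


def degradation_pct(is_sharpe: float, oos_sharpe: float) -> float:
    """Relative drop from in-sample to out-of-sample Sharpe, in percent."""
    if is_sharpe <= 0:
        raise ValueError("in-sample Sharpe must be positive to measure degradation")
    return (is_sharpe - oos_sharpe) / is_sharpe * 100.0


def classify_degradation(pct: float) -> str:
    """Map a degradation percentage onto the bands listed above."""
    if pct < 20:
        return "excellent"
    if pct <= 30:
        return "acceptable"
    if pct <= 50:
        return "concerning"
    return "severe"


def meets_minimums(oos_sharpes: Sequence[float], oos_trade_counts: Sequence[int]) -> bool:
    """At least 5 windows, at least 30 trades in each, and >60% of windows positive."""
    if len(oos_sharpes) < 5 or len(oos_sharpes) != len(oos_trade_counts):
        return False
    if min(oos_trade_counts) < 30:
        return False
    positive_share = sum(1 for s in oos_sharpes if s > 0) / len(oos_sharpes)
    return positive_share > 0.6


pct = degradation_pct(is_sharpe=2.0, oos_sharpe=1.5)
print(pct, classify_degradation(pct))
```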

Advanced: Anchored vs. Rolling

# Anchored Walk-Forward (expanding window)
# Use when: Building long-term models, more data improves results
wf_anchored = WalkForwardAnalyzer(
    anchored=True,  # Window grows over time
    in_sample_days=252,
    out_sample_days=63
)

# Rolling Walk-Forward (sliding window)
# Use when: Market conditions change, recent data more relevant
wf_rolling = WalkForwardAnalyzer(
    anchored=False,  # Window slides forward
    in_sample_days=252,
    out_sample_days=63
)

Corporate Actions Handling

Corporate actions (splits, dividends, etc.) can significantly impact backtesting accuracy if not handled properly.

Types of Corporate Actions

  1. Stock Splits: N-for-1 split (e.g., 2-for-1, 3-for-1)
  2. Reverse Splits: 1-for-N split (e.g., 1-for-10)
  3. Cash Dividends: Cash payment to shareholders
  4. Stock Dividends: Additional shares distributed
  5. Rights Issues: Right to buy additional shares
  6. Bonus Issues: Free additional shares

Adjustment Methodology

from src.backtesting.corporate_actions import CorporateActionsEngine, CorporateAction, CorporateActionType
from datetime import datetime
import pandas as pd

# Initialize engine
ca_engine = CorporateActionsEngine()

# Example 1: Stock Split
split = CorporateAction(
    symbol='AAPL',
    action_type=CorporateActionType.STOCK_SPLIT,
    ex_date=datetime(2020, 8, 31),
    ratio=4.0,  # 4-for-1 split
    description='Apple 4-for-1 stock split'
)
ca_engine.add_action(split)

# Example 2: Dividend
dividend = CorporateAction(
    symbol='AAPL',
    action_type=CorporateActionType.CASH_DIVIDEND,
    ex_date=datetime(2024, 2, 9),
    amount=0.24,  # $0.24 per share
    description='Quarterly dividend'
)
ca_engine.add_action(dividend)

# Load price data
price_data = load_price_data('AAPL')

# Apply adjustments
adjusted_data = ca_engine.adjust_prices(
    symbol='AAPL',
    data=price_data,
    adjust_price=True,
    adjust_volume=True
)
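For intuition about what a split adjustment does to the series, here is a minimal standalone sketch of back-adjustment: bars before the ex-date have prices divided by the split ratio and volumes multiplied by it, so the series is continuous across the split. This illustrates the general technique only; the internals of `adjust_prices` are not shown in this document, and the bar values below are made up.

```python
from datetime import date
from typing import List, Tuple

Bar = Tuple[date, float, float]  # (day, close, volume)


def back_adjust_for_split(bars: List[Bar], ex_date: date, ratio: float) -> List[Bar]:
    """Rescale bars before `ex_date` so they are comparable with post-split bars."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    adjusted = []
    for day, close, volume in bars:
        if day < ex_date:
            # Pre-split bar: price per share falls, share count rises.
            adjusted.append((day, close / ratio, volume * ratio))
        else:
            adjusted.append((day, close, volume))
    return adjusted


# Illustrative values around a 4-for-1 split
bars = [
    (date(2020, 8, 28), 400.0, 100.0),
    (date(2020, 8, 31), 101.0, 400.0),
]
for bar in back_adjust_for_split(bars, ex_date=date(2020, 8, 31), ratio=4.0):
    print(bar)
```

Without the adjustment, the unadjusted series would show a one-day drop of about 75% that no investor actually experienced.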

Best Practices

1. Always Use Adjusted Prices

# ❌ Wrong: Using unadjusted prices
engine = BacktestEngine(
    enable_corporate_actions=False  # Don't do this!
)

# ✅ Correct: Using adjusted prices
engine = EnhancedBacktestEngine(
    enable_corporate_actions=True,
    corporate_actions_csv='data/corporate_actions.csv'
)

2. Verify Adjustment Factors

from datetime import timedelta

def verify_adjustments(symbol, ex_date):
    """Verify corporate action adjustments are correct"""
    # Unadjusted prices on either side of the ex-date
    # (step back further if the previous calendar day was not a trading day)
    pre_split = get_price(symbol, ex_date - timedelta(days=1))
    post_split = get_price(symbol, ex_date)

    # For 2-for-1 split, expect the price to roughly halve (ratio ≈ 0.50)
    ratio = post_split / pre_split
    print(f"Price ratio: {ratio:.2f}")

    # Verify adjusted historical prices
    historical = get_historical_adjusted(symbol)
    # All pre-split prices should be divided by split ratio

3. Handle Dividends Carefully

For total return calculations:

def calculate_total_return(prices, dividends):
    """
    Calculate total return including reinvested dividends

    prices: pandas Series of closing prices indexed by date
            (not already dividend-adjusted)
    dividends: mapping of ex-date -> cash dividend per share
    """
    # Start with one share and reinvest each dividend at that day's price
    shares = 1.0
    for div_date, div_amount in sorted(dividends.items()):
        price_at_div = prices.loc[div_date]
        shares += shares * div_amount / price_at_div

    # Growth in position value relative to the initial share price
    total_return = shares * prices.iloc[-1] / prices.iloc[0] - 1
    return total_return

4. Cross-Reference Data Sources

import yfinance as yf

# Verify adjustments match market data
ticker = yf.Ticker('AAPL')

# Check splits
splits = ticker.splits
print("Recorded splits:")
print(splits)

# Check dividends
dividends = ticker.dividends
print("\nRecorded dividends:")
print(dividends)

Common Errors

Error 1: Mixing Adjusted and Unadjusted Data

# ❌ Wrong
adjusted_prices = load_adjusted_prices()
unadjusted_volume = load_unadjusted_volume()  # Inconsistent!

# ✅ Correct
adjusted_prices = load_adjusted_prices()
adjusted_volume = load_adjusted_volume()  # Both adjusted

Error 2: Not Adjusting Technical Indicators

# ❌ Wrong - indicators on unadjusted prices
sma = calculate_sma(unadjusted_prices)

# ✅ Correct - indicators on adjusted prices
sma = calculate_sma(adjusted_prices)


Bias Prevention Strategies

Types of Biases

  1. Look-Ahead Bias: Using future information
  2. Survivorship Bias: Only testing survivors
  3. Selection Bias: Cherry-picking favorable periods
  4. Data-Mining Bias: Testing too many strategies
  5. Optimization Bias: Over-optimizing parameters

Detection and Prevention

1. Automated Bias Detection

from src.backtesting.bias_checker import BiasChecker

# Configure bias checker
checker = BiasChecker(
    strict_mode=True,
    tolerance=0.01
)

# Run comprehensive checks
bias_results = checker.run_full_check(
    strategy=strategy,
    data_handler=data_handler,
    portfolio=portfolio
)

# Review results
if bias_results.has_errors():
    print("CRITICAL: Bias detected!")
    for issue in bias_results.get_issues_by_severity('ERROR'):
        print(f"- {issue.description}")
    exit(1)

if bias_results.has_warnings():
    print("WARNING: Potential bias issues")
    for issue in bias_results.get_issues_by_severity('WARNING'):
        print(f"- {issue.description}")

2. Point-in-Time Data

class PointInTimeDataHandler:
    """Ensure no look-ahead bias in data access"""

    def __init__(self, data):
        self.data = data
        self.current_idx = 0

    def get_latest_bars(self, symbol, n=1):
        """
        Get latest N bars - ONLY historical data

        Returns:
            DataFrame with bars up to current_idx (not beyond!)
        """
        if self.current_idx < n:
            return None

        # Critical: Only return data UP TO current point
        return self.data.iloc[self.current_idx - n:self.current_idx]

    def update_bars(self):
        """Move forward one bar"""
        self.current_idx += 1

    def get_current_bar(self, symbol):
        """Get current bar (point-in-time)"""
        return self.data.iloc[self.current_idx]

3. Timestamp Validation

def validate_timestamps(strategy_signals, market_data):
    """
    Ensure signals only use data available at signal time
    """
    for signal in strategy_signals:
        signal_time = signal.timestamp

        # Check: Are all input data timestamps before signal?
        for data_point in signal.input_data:
            if data_point.timestamp >= signal_time:
                raise ValueError(
                    f"Look-ahead bias: Signal at {signal_time} "
                    f"uses data from {data_point.timestamp}"
                )

4. Randomization Tests

import numpy as np

def run_randomization_test(strategy, n_trials=1000):
    """
    Test if strategy performance is due to skill or luck
    """
    actual_sharpe = strategy.backtest().sharpe_ratio
    random_sharpes = []

    for _ in range(n_trials):
        # Randomize entry/exit signals
        random_strategy = randomize_signals(strategy)
        random_sharpe = random_strategy.backtest().sharpe_ratio
        random_sharpes.append(random_sharpe)

    # Calculate p-value
    p_value = np.mean(np.array(random_sharpes) >= actual_sharpe)

    if p_value < 0.05:
        print(f"✓ Strategy is statistically significant (p={p_value:.4f})")
    else:
        print(f"✗ Strategy not significant (p={p_value:.4f})")

    return p_value
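The function above relies on `strategy.backtest()` and `randomize_signals`, which are not defined here. As a self-contained sketch of the same idea, the example below shuffles a strategy's positions against the asset's returns, which destroys timing while preserving exposure, and reports how often a shuffled version matches or beats the real Sharpe ratio. The function names, the per-bar (non-annualized) Sharpe, and the add-one p-value correction are assumptions for illustration.

```python
import random
from statistics import mean, pstdev
from typing import List, Sequence


def sharpe(returns: Sequence[float]) -> float:
    """Per-bar Sharpe ratio; 0.0 when returns have no variation."""
    sd = pstdev(returns)
    return mean(returns) / sd if sd > 0 else 0.0


def permutation_p_value(
    asset_returns: Sequence[float],
    positions: Sequence[int],
    n_trials: int = 1000,
    seed: int = 0,
) -> float:
    """Share of shuffled-position trials whose Sharpe is >= the actual Sharpe."""
    rng = random.Random(seed)  # seeded so the test is reproducible
    actual = sharpe([p * r for p, r in zip(positions, asset_returns)])
    shuffled: List[int] = list(positions)
    at_least_as_good = 0
    for _ in range(n_trials):
        rng.shuffle(shuffled)
        trial = sharpe([p * r for p, r in zip(shuffled, asset_returns)])
        if trial >= actual:
            at_least_as_good += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_good + 1) / (n_trials + 1)


asset_returns = [0.01 * ((i % 7) - 3) for i in range(98)]
perfect_timing = [1 if r > 0 else -1 for r in asset_returns]
print(permutation_p_value(asset_returns, perfect_timing, n_trials=200))
```

Shuffling positions assumes bars are exchangeable; for strongly autocorrelated signals, a block-based shuffle is a more conservative choice.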

Academic References

Foundational Papers

  1. Bailey, D. H., & López de Prado, M. (2014)

    • "The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality"
    • Journal of Portfolio Management, 40(5), 94-107
    • Key contribution: Statistical methods to detect overfitting
  2. Harvey, C. R., Liu, Y., & Zhu, H. (2016)

    • "... and the Cross-Section of Expected Returns"
    • Review of Financial Studies, 29(1), 5-68
    • Key contribution: Multiple testing in strategy evaluation
  3. Bailey, D. H., Borwein, J., López de Prado, M., & Zhu, Q. J. (2014)

    • "Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance"
    • Notices of the AMS, 61(5), 458-471
    • Key contribution: Probability of backtest overfitting (PBO)
  4. Pardo, R. (2008)

    • "The Evaluation and Optimization of Trading Strategies" (2nd ed.)
    • Wiley Trading
    • Key contribution: Walk-forward analysis methodology

Testing and Validation

  1. White, H. (2000)

    • "A Reality Check for Data Snooping"
    • Econometrica, 68(5), 1097-1126
    • Key contribution: Bootstrap methods for multiple testing
  2. Hansen, P. R. (2005)

    • "A Test for Superior Predictive Ability"
    • Journal of Business & Economic Statistics, 23(4), 365-380
    • Key contribution: SPA test for strategy comparison
  3. Romano, J. P., & Wolf, M. (2005)

    • "Stepwise Multiple Testing as Formalized Data Snooping"
    • Econometrica, 73(4), 1237-1282
    • Key contribution: Controlling false discoveries

Transaction Costs and Market Microstructure

  1. Keim, D. B., & Madhavan, A. (1997)

    • "Transactions Costs and Investment Style: An Inter-Exchange Analysis of Institutional Equity Trades"
    • Journal of Financial Economics, 46(3), 265-292
    • Key contribution: Realistic transaction cost modeling
  2. Hasbrouck, J. (2009)

    • "Trading Costs and Returns for U.S. Equities: Estimating Effective Costs from Daily Data"
    • Journal of Finance, 64(3), 1445-1477
    • Key contribution: Estimating implicit costs

Risk Management

  1. Cornish, E. A., & Fisher, R. A. (1938)

    • "Moments and Cumulants in the Specification of Distributions"
    • Revue de l'Institut International de Statistique, 5(4), 307-320
    • Key contribution: Higher moments in return distributions
  2. Sortino, F. A., & Van der Meer, R. (1991)

    • "Downside Risk"
    • Journal of Portfolio Management, 17(4), 27-31
    • Key contribution: Sortino ratio and downside risk

Best Practices Books

  1. Aronson, D. (2006)

    • "Evidence-Based Technical Analysis"
    • Wiley
    • Comprehensive guide to rigorous backtesting
  2. López de Prado, M. (2018)

    • "Advances in Financial Machine Learning"
    • Wiley
    • Modern techniques for strategy development and validation
  3. Chan, E. P. (2013)

    • "Algorithmic Trading: Winning Strategies and Their Rationale"
    • Wiley
    • Practical guide to strategy development

Online Resources

  1. SSRN Financial Markets

    • https://www.ssrn.com/
    • Working papers on quantitative finance
  2. Journal of Portfolio Management

    • https://jpm.pm-research.com/
    • Practitioner-focused research
  3. QuantStart Blog

    • https://www.quantstart.com/
    • Tutorials on backtesting best practices

Summary

Successful backtesting requires:

  1. Clean, Point-in-Time Data: No survivorship bias, proper corporate actions
  2. Realistic Costs: Conservative estimates of commissions, slippage, spread
  3. Bias Prevention: Automated checks, manual review, timestamp validation
  4. Robust Validation: Walk-forward analysis, multiple regimes, cross-validation
  5. Statistical Rigor: Multiple testing corrections, sufficient sample size
  6. Economic Rationale: Every parameter and rule justified
  7. Documentation: Full transparency about methodology and assumptions

Remember: The goal is not to create a perfect backtest, but to create an honest assessment of expected performance.


Last updated: 2024-01-27

For questions or contributions: see CONTRIBUTING.md