Backtesting Guide

This guide explains how to run backtests, interpret results, and optimize trading strategies using the DSTA backtesting engine.

Overview

The DSTA backtesting engine is an event-driven system that simulates realistic trading conditions on historical data. It helps you evaluate strategy performance before risking real capital.

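Why Event-Driven?

✅ Less Lookahead Bias: Strategies only see bars that have already been emitted, so future information is hard to use by accident
✅ Realistic: Bars are processed one at a time, in the order they would arrive in live trading
✅ Testable: Each component can be tested independently
✅ Production-Ready: The same strategy code can be reused for live trading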

Quick Start

Running Your First Backtest

from backtesting.backtest import Backtest
from backtesting.strategies.sma_crossover import SMACrossoverStrategy
from datetime import datetime

# Configure backtest
backtest = Backtest(
    symbol_list=['BTCUSDT'],
    initial_capital=100000,
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31),
    data_handler='DatabaseDataHandler',
    execution_handler='SimulatedExecutionHandler',
    strategy=SMACrossoverStrategy,
    strategy_params={
        'fast_period': 50,
        'slow_period': 200
    }
)

# Run backtest
results = backtest.run()

# Display results
print(results)

Understanding the Configuration

Required Parameters:

symbol_list: List of trading pairs to trade (e.g., ['BTCUSDT', 'ETHUSDT'])
initial_capital: Starting capital in dollars (e.g., 100000)
start_date: Backtest start date (e.g., datetime(2023, 1, 1))
end_date: Backtest end date (e.g., datetime(2023, 12, 31))
strategy: Strategy class to test

Optional Parameters:

data_handler: Data source (default: 'DatabaseDataHandler')
execution_handler: Order execution model (default: 'SimulatedExecutionHandler')
strategy_params: Dictionary of strategy parameters
commission: Commission per trade as a fraction of trade value (default: 0.001 = 0.1%)
slippage: Slippage per trade as a fraction of trade value (default: 0.0005 = 0.05%)

Interpreting Results

Performance Metrics

backtest.run() returns a dictionary of performance metrics, for example (sample values):

{
    # Returns
    'total_return': 15.43,              # Total return (%)
    'annualized_return': 15.89,         # Annualized return (%)
    'benchmark_return': 8.50,           # Buy-and-hold return (%)

    # Risk-Adjusted Metrics
    'sharpe_ratio': 0.87,               # Sharpe ratio (higher is better)
    'sortino_ratio': 1.28,              # Sortino ratio (downside deviation)
    'calmar_ratio': 1.23,               # Return / max drawdown

    # Risk Metrics
    'max_drawdown': -12.54,             # Maximum drawdown (%)
    'max_drawdown_duration': 45,        # Days to recover from max drawdown
    'volatility': 18.23,                # Annualized volatility (%)
    'downside_deviation': 12.45,        # Downside volatility (%)

    # Trade Statistics
    'total_trades': 47,                 # Number of round-trip trades
    'winning_trades': 28,               # Number of winning trades
    'losing_trades': 19,                # Number of losing trades
    'win_rate': 59.57,                  # Winning trades / total trades (%)
    'avg_win': 842.50,                  # Average winning trade ($)
    'avg_loss': -456.30,                # Average losing trade ($)
    'largest_win': 3250.00,             # Largest winning trade ($)
    'largest_loss': -1850.00,           # Largest losing trade ($)
    'avg_trade_duration': 5.3,          # Average days in trade

    # Risk-Reward
    'profit_factor': 2.72,              # Gross profit / gross loss
    'expectancy': 317.45,               # Expected value per trade ($)
    'risk_reward_ratio': 1.85,          # Avg win / avg loss

    # Execution Quality
    'avg_slippage': 0.042,              # Average slippage (%)
    'total_commission': 1250.50,        # Total commission paid ($)

    # Equity Curve
    'equity_curve': [...],              # Equity over time (records with 'timestamp' and 'equity')
    'drawdown_curve': [...],            # Drawdown values over time

    # Trade Log
    'trades': [...]                     # Detailed trade records
}

Key Metrics Explained

Total Return

Definition: Percentage gain/loss from start to end of backtest.

Formula: (Final Equity - Initial Capital) / Initial Capital * 100

Interpretation:
- Positive = Profitable strategy
- Compare to benchmark (buy-and-hold) to assess if strategy adds value
- Consider in context of risk (volatility, drawdown)

Example:

Initial Capital: $100,000
Final Equity: $115,430
Total Return: 15.43%

Sharpe Ratio

Definition: Risk-adjusted return metric.

Formula: (Annualized Return - Risk-Free Rate) / Annualized Volatility

Interpretation:
- < 1.0: Poor risk-adjusted performance
- 1.0 - 2.0: Good performance
- > 2.0: Excellent performance
- > 3.0: Exceptional (verify for errors or overfitting!)

Considerations:
- Its interpretation assumes roughly normally distributed returns (often not true)
- Penalizes both upside and downside volatility
- Use the Sortino ratio for strategies with asymmetric returns

Example:

Annual Return: 15.89%
Risk-Free Rate: 0%
Volatility: 18.23%
Sharpe Ratio: 15.89 / 18.23 = 0.87 (below 1.0, needs improvement)
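The same ratio can be computed from a series of periodic returns. This is a minimal sketch that assumes daily bars on a 24/7 crypto market (365 periods per year) and annualizes the per-period ratio by multiplying by the square root of that number; the function name and conventions are illustrative and not necessarily what src/backtesting/performance.py does.

```python
import math
import statistics


# Illustrative helper (not part of DSTA)
def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=365):
    """Annualized Sharpe ratio from a list of periodic simple returns.

    `risk_free_rate` is annual and is spread evenly across periods.
    `periods_per_year=365` assumes daily bars on a 24/7 market (252 is common for stocks).
    """
    rf_per_period = risk_free_rate / periods_per_year
    excess = [r - rf_per_period for r in returns]
    std_excess = statistics.stdev(excess)  # sample standard deviation
    if std_excess == 0:
        raise ValueError("returns have zero volatility; Sharpe ratio is undefined")
    # Per-period mean/std scaled by sqrt(periods) == annualized mean / annualized std
    return statistics.mean(excess) / std_excess * math.sqrt(periods_per_year)


# Worked example above (inputs already annualized, 0% risk-free rate)
print(round(15.89 / 18.23, 2))  # 0.87
```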

Sortino Ratio

Definition: Like Sharpe, but only penalizes downside volatility.

Formula: (Return - Risk-Free Rate) / Downside Deviation

Interpretation:
- Better measure for strategies with asymmetric returns
- Higher values indicate better downside risk management
- Compare to Sharpe: if Sortino is much higher than Sharpe, most of the volatility comes from gains, i.e. the strategy limits losses well

Example:

Annual Return: 15.89%
Downside Deviation: 12.45%
Sortino Ratio: 15.89 / 12.45 = 1.28 (better than Sharpe of 0.87)
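A minimal sketch of downside deviation and the Sortino ratio from periodic returns. It follows one common convention: squared shortfalls below a target return (0% here) are averaged over all periods, including those with no shortfall. Some libraries divide only by the number of below-target periods, which gives a larger deviation. The function names and the 365-period annualization are assumptions.

```python
import math
import statistics


# Illustrative helpers (not part of DSTA)
def downside_deviation(returns, target=0.0):
    """Root-mean-square of shortfalls below `target`, averaged over ALL periods.

    Returns at or above the target contribute zero, so only losses add risk.
    """
    shortfalls = [min(r - target, 0.0) ** 2 for r in returns]
    return math.sqrt(sum(shortfalls) / len(returns))


def sortino_ratio(returns, target=0.0, periods_per_year=365):
    """Annualized Sortino ratio; periods_per_year=365 assumes daily crypto bars."""
    dd = downside_deviation(returns, target)
    if dd == 0:
        raise ValueError("no returns below target; Sortino ratio is undefined")
    excess_mean = statistics.mean(returns) - target
    return excess_mean / dd * math.sqrt(periods_per_year)


# Worked example above (inputs already annualized)
print(round(15.89 / 12.45, 2))  # 1.28
```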

Maximum Drawdown

Definition: Largest peak-to-trough decline in equity.

Interpretation:
- Intuitive measure of downside risk: the worst loss from a prior high that an investor would have experienced during the backtest (future drawdowns can be larger)
- < 10%: Low risk
- 10-20%: Moderate risk
- 20-30%: High risk
- > 30%: Very high risk (may be unacceptable for many investors)

Example:

Peak Equity: $120,000
Trough Equity: $104,952
Max Drawdown: -12.54%
Recovery Time: 45 days
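A minimal sketch of how maximum drawdown and recovery time can be computed from an equity series: track the running peak, measure each point's decline from it, and keep the worst. Recovery is counted in periods (days, for daily bars) from the trough to the first value at or above the preceding peak. The function is illustrative, not DSTA's implementation.

```python
# Illustrative helper (not part of DSTA)
def max_drawdown(equity):
    """Return (max drawdown in %, periods from trough to recovery).

    Drawdown at each point is equity / running peak - 1; the maximum drawdown is
    the most negative value. Recovery is None if equity never regains the prior peak.
    """
    running_peak = equity[0]
    worst, worst_peak, trough_idx = 0.0, equity[0], None
    for i, value in enumerate(equity):
        running_peak = max(running_peak, value)
        drawdown = value / running_peak - 1.0
        if drawdown < worst:
            worst, worst_peak, trough_idx = drawdown, running_peak, i
    if trough_idx is None:
        return 0.0, 0  # equity never fell below a previous peak
    # Recovery: first point after the trough that is back at or above the old peak
    for j in range(trough_idx + 1, len(equity)):
        if equity[j] >= worst_peak:
            return worst * 100, j - trough_idx
    return worst * 100, None


# Peak and trough from the example above (one value per period, e.g. per day)
equity = [100_000, 120_000, 104_952, 112_000, 121_000]
mdd, recovery = max_drawdown(equity)
print(f"{mdd:.2f}%, recovered after {recovery} periods")  # -12.54%, recovered after 2 periods
```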

Win Rate

Definition: Percentage of profitable trades.

Formula: Winning Trades / Total Trades * 100

Interpretation:
- Not the most important metric!
- Can be profitable with a low win rate if avg_win >> |avg_loss|
- Can be unprofitable with a high win rate if avg_win << |avg_loss|
- Many profitable strategies have win rates of roughly 40-60%

Example:

Total Trades: 47
Winning Trades: 28
Win Rate: 59.57%

Profit Factor

Definition: Ratio of gross profit to gross loss.

Formula: Sum(Winning Trades) / Abs(Sum(Losing Trades))

Interpretation:
- < 1.0: Losing strategy (gross losses exceed gross profits)
- 1.0 - 1.5: Marginally profitable
- 1.5 - 2.0: Good profitability
- > 2.0: Excellent (verify for overfitting!)

Example:

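Gross Profit: $23,590 (28 winning trades * $842.50)
Gross Loss: -$8,669.70 (19 losing trades * -$456.30)
Profit Factor: 23,590 / 8,669.70 = 2.72 (excellent by the scale above; check for overfitting)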

Expectancy

Definition: Expected profit per trade.

Formula: (Win Rate * Avg Win) - (Loss Rate * Abs(Avg Loss))

Interpretation:
- Positive = profitable strategy on average
- Higher is better
- Multiply by expected trades per year for annual expectation

Example:

Win Rate: 59.57%
Avg Win: $842.50
Loss Rate: 40.43%
Avg Loss: $456.30
Expectancy: (0.5957 * 842.50) - (0.4043 * 456.30) ≈ $317.40 (the unrounded rates 28/47 and 19/47 give $317.45)
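Profit factor and expectancy can both be computed directly from a list of per-trade P&L values. This sketch uses a hypothetical trade list matching the sample results (28 wins of $842.50, 19 losses of $456.30). Note that expectancy, computed with the formula above, always equals total P&L divided by the number of trades.

```python
# Illustrative helper (not part of DSTA)
def trade_stats(pnls):
    """Profit factor and expectancy from per-trade P&L values ($)."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]
    gross_profit = sum(wins)
    gross_loss = abs(sum(losses))
    n = len(pnls)
    avg_win = gross_profit / len(wins) if wins else 0.0
    avg_loss = gross_loss / len(losses) if losses else 0.0  # positive number
    win_rate, loss_rate = len(wins) / n, len(losses) / n
    return {
        # No losing trades -> profit factor is unbounded
        'profit_factor': gross_profit / gross_loss if gross_loss else float('inf'),
        # (Win Rate * Avg Win) - (Loss Rate * |Avg Loss|), equal to mean P&L per trade
        'expectancy': win_rate * avg_win - loss_rate * avg_loss,
    }


# Hypothetical trade list matching the sample statistics
pnls = [842.50] * 28 + [-456.30] * 19
stats = trade_stats(pnls)
print({k: round(v, 2) for k, v in stats.items()})  # {'profit_factor': 2.72, 'expectancy': 317.45}
```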

Common Pitfalls and How to Avoid Them

1. Lookahead Bias

What: Using information not available at the time of trading.

Examples:

# ❌ Bad - Looks at future data
def calculate_signals(self, event):
    all_data = self.bars.get_all_bars(symbol)
    future_high = all_data[-1]['high']  # Last bar of the whole dataset: not known yet!

# ✅ Good - Only past data
def calculate_signals(self, event):
    historical = self.bars.get_latest_bars(symbol, N=20)
    current = self.bars.get_latest_bar(symbol)

Prevention:
- Use event-driven architecture (DSTA does this)
- Only access data via get_latest_bar() or get_latest_bars()
- Never use negative shifts such as shift(-1) in pandas when building signals; they pull future values into the current row
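The shift(-1) trap is easy to reproduce in pandas. In this small sketch with made-up closing prices, the leaky signal compares each bar with the next bar's close, which does not exist yet when the bar closes; the safe version compares with the previous close instead.

```python
import pandas as pd

# Made-up closing prices for five consecutive bars
closes = pd.Series([100.0, 102.0, 101.0, 105.0, 104.0])

# ❌ Lookahead: shift(-1) moves the NEXT bar's close into the current row,
# so the "signal" already knows where price goes next
leaky_signal = closes.shift(-1) > closes

# ✅ No lookahead: shift(1) brings in the PREVIOUS close, which is known
# by the time the current bar closes (comparisons with NaN evaluate to False)
safe_signal = closes > closes.shift(1)

print(pd.DataFrame({'close': closes, 'leaky': leaky_signal, 'safe': safe_signal}))
```

The leaky column is just the safe column moved one bar earlier, which is why it behaves like perfect foresight in a backtest.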

2. Survivorship Bias

What: Only testing on assets that still exist today.

Impact: Overestimates performance by excluding failed assets.

Example:
- Testing crypto strategies only on current top 100 coins
- Ignores coins that went to zero or got delisted

Prevention:
- Include delisted/defunct assets in backtest universe
- Use point-in-time data (what was top 100 then, not now)
- Test on multiple market conditions (bull, bear, sideways)

3. Overfitting

What: Optimizing strategy parameters to perfectly fit historical data.

Signs:
- Sharpe ratio > 3.0
- Win rate > 70%
- Works great in-sample, fails out-of-sample
- Many complex conditions/parameters

Example:

# ❌ Overfitted - Too many parameters
if (rsi > 31.47 and rsi < 31.53 and
    sma_10 > sma_11 * 1.0023 and
    volume > prev_volume * 1.347 and
    hour_of_day in [10, 14, 15]):
    buy()

Prevention:
- Use walk-forward optimization (see Optimization Techniques)
- Test on out-of-sample data
- Keep strategies simple
- Limit number of parameters (< 5 recommended)
- Use time-series cross-validation (keep folds in chronological order; don't shuffle)

4. Data Mining Bias

What: Testing many strategies/parameters and reporting only the winners.

Impact: The "winning" strategies may simply have gotten lucky.

Example:
- Test 100 different strategies
- 5 appear profitable by random chance
- Report only those 5

Prevention:
- Document all tests performed
- Use a Bonferroni correction for multiple tests
- Require a minimum number of trades (> 30 recommended)
- Test on different time periods
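A Bonferroni correction is simple to apply: with m tests, a result only counts as significant if its p-value is at or below alpha / m. This sketch uses hypothetical p-values for 100 strategy variants; how each p-value is obtained (for example, from a t-test on trade P&L) is up to you.

```python
# Illustrative helper (not part of DSTA)
def bonferroni_significant(p_values, alpha=0.05):
    """Return True/False per test: significant after Bonferroni correction?

    Each p-value is compared against alpha / m (m = number of tests), which keeps
    the probability of at least one false positive across all m tests <= alpha.
    """
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values from testing 100 strategy variants
p_values = [0.0001, 0.02, 0.04] + [0.5] * 97
print(bonferroni_significant(p_values)[:3])  # [True, False, False]
```

All three leading variants would pass at 5% on their own, but only the first survives the corrected threshold of 0.0005. The correction is conservative, so with many tests it can also reject genuinely useful strategies.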

5. Ignoring Transaction Costs

What: Not accounting for commissions, slippage, and the bid-ask spread.

Impact: Overstates profitability, often drastically for high-turnover (e.g., high-frequency) strategies.

Example:

# ❌ Bad - No costs
backtest = Backtest(
    ...,
    commission=0.0,
    slippage=0.0
)

# ✅ Good - Realistic costs
backtest = Backtest(
    ...,
    commission=0.001,  # 0.1% per trade
    slippage=0.0005    # 0.05% slippage
)

Realistic Values:
- Commission: 0.1% - 0.2% per trade (maker/taker fees)
- Slippage: 0.05% - 0.2% (depends on liquidity, trade size)
- Spread: Use actual bid-ask spread from order book data
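A rough way to see why costs matter: if the whole account is traded on each round trip and commission and slippage are each charged on both entry and exit, the losses compound across trades. This sketch assumes that per-side model and the default rates from the configuration section; the engine's actual fill model may differ.

```python
# Illustrative helper (not part of DSTA); assumes costs on both entry and exit
def cost_drag(num_round_trips, commission=0.001, slippage=0.0005):
    """Fraction of capital lost to costs alone over `num_round_trips`.

    Assumes the full account is traded on each round trip, so each trip keeps
    (1 - 2 * (commission + slippage)) of the capital, compounding across trips.
    """
    per_round_trip = 2 * (commission + slippage)
    return 1 - (1 - per_round_trip) ** num_round_trips


# 47 round trips, as in the sample results, at the default rates
print(f"{cost_drag(47):.1%}")  # 13.2%
```

A high-turnover strategy pays this drag many times over, so its gross edge must be large to survive it.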

6. Unrealistic Position Sizing

What: Position sizes too large for available liquidity.

Example:
- Backtesting $1M positions in low-liquidity altcoins
- Wouldn't be able to fill those orders in reality

Prevention:
- Consider market depth and daily volume
- Limit position size to % of daily volume (e.g., < 1%)
- Include slippage that increases with position size
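A minimal sketch of the last two prevention points: cap each order at a fraction of the asset's daily traded value, and let slippage grow with the order's share of that volume. The square-root shape is a simplified form of common market-impact models; `base` and `impact_coeff` are hypothetical values you would calibrate from your own fills, and neither function is part of DSTA.

```python
import math


# Illustrative helpers (not part of DSTA)
def capped_order_value(desired_value, daily_volume_value, max_fraction=0.01):
    """Limit an order to `max_fraction` (default 1%) of the asset's daily traded value."""
    return min(desired_value, max_fraction * daily_volume_value)


def size_dependent_slippage(order_value, daily_volume_value, base=0.0005, impact_coeff=0.1):
    """Slippage (fraction of price) that grows with order size.

    Simplified square-root impact: base + impact_coeff * sqrt(order / daily volume).
    `base` and `impact_coeff` are hypothetical; calibrate them from real fills.
    """
    participation = order_value / daily_volume_value
    return base + impact_coeff * math.sqrt(participation)


# Hypothetical altcoin trading $5M per day; the strategy wants a $1M position
print(capped_order_value(1_000_000, 5_000_000))                # 50000.0
print(f"{size_dependent_slippage(50_000, 5_000_000):.2%}")     # 1.05%
print(f"{size_dependent_slippage(1_000_000, 5_000_000):.2%}")  # 4.52%
```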

Optimization Techniques

1. Grid Search Optimization

When to Use: Systematic exploration of parameter space.

How it Works: Test all combinations of parameters.

Example:

from backtesting.optimization import GridSearchOptimizer

# Define parameter grid
param_grid = {
    'fast_period': [10, 20, 30, 50],
    'slow_period': [50, 100, 150, 200],
    'rsi_oversold': [20, 25, 30, 35]
}

# Run optimization
optimizer = GridSearchOptimizer(
    strategy=SMACrossoverStrategy,
    param_grid=param_grid,
    symbol_list=['BTCUSDT'],
    initial_capital=100000,
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31),
    metric='sharpe_ratio'  # Optimize for Sharpe ratio
)

results = optimizer.optimize()

# Best parameters
print(f"Best params: {results['best_params']}")
print(f"Best Sharpe: {results['best_score']:.2f}")

# All results
for result in results['all_results']:
    print(f"Params: {result['params']}, Sharpe: {result['sharpe_ratio']:.2f}")

Pros:
- Systematic and complete
- Easy to implement
- No randomness

Cons:
- Slow for many parameters (combinatorial explosion)
- Can overfit if not validated properly
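To make the combinatorial explosion concrete, this sketch expands the grid from the example above with itertools.product: 4 × 4 × 4 = 64 backtests, and each additional 4-value parameter multiplies that by 4. It also filters out combinations where the fast period is not shorter than the slow one; whether GridSearchOptimizer skips such combinations itself is not covered in this guide, so treat the filter as an assumption.

```python
from itertools import product


# Illustrative helper (not part of DSTA)
def expand_grid(param_grid, constraint=None):
    """Yield every parameter combination in `param_grid` as a dict.

    `constraint` is an optional predicate used to skip invalid combinations.
    """
    names = list(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        if constraint is None or constraint(params):
            yield params


param_grid = {
    'fast_period': [10, 20, 30, 50],
    'slow_period': [50, 100, 150, 200],
    'rsi_oversold': [20, 25, 30, 35],
}
all_combos = list(expand_grid(param_grid))
# Hypothetical validity rule: the fast average must be faster than the slow one
valid = list(expand_grid(param_grid, lambda p: p['fast_period'] < p['slow_period']))
print(len(all_combos), len(valid))  # 64 60
```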

2. Walk-Forward Optimization

When to Use: To avoid overfitting and validate strategy robustness.

How it Works:
1. Split data into in-sample and out-of-sample periods
2. Optimize on in-sample
3. Test on out-of-sample
4. Roll forward and repeat

Example:

from backtesting.optimization import WalkForwardOptimizer

optimizer = WalkForwardOptimizer(
    strategy=SMACrossoverStrategy,
    param_grid=param_grid,
    symbol_list=['BTCUSDT'],
    initial_capital=100000,
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31),
    in_sample_period=180,   # 6 months optimization
    out_sample_period=60,   # 2 months testing
    step_size=30            # Roll forward by 1 month
)

results = optimizer.optimize()

# Results by period
for period_result in results['periods']:
    print(f"Period: {period_result['out_sample_start']} to {period_result['out_sample_end']}")
    print(f"  In-sample Sharpe: {period_result['in_sample_sharpe']:.2f}")
    print(f"  Out-sample Sharpe: {period_result['out_sample_sharpe']:.2f}")
    print(f"  Best params: {period_result['best_params']}")

# Aggregate statistics
print(f"\nAverage out-sample Sharpe: {results['avg_out_sample_sharpe']:.2f}")

Advantages:
- More realistic estimate of future performance than a single in-sample backtest
- Detects overfitting
- Shows parameter stability over time

Interpretation:
- Good: Out-sample Sharpe ≥ 80% of in-sample Sharpe
- Warning: Out-sample Sharpe < 50% of in-sample (likely overfit)
- Excellent: Parameters stable across periods
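The rolling schedule can be sketched as a function that generates window dates. This version treats windows as half-open ranges (the out-of-sample period starts where the in-sample period ends) and drops any window that would run past the end date; WalkForwardOptimizer's exact boundary handling may differ. With the settings from the example (180/60/30 days over 2023) it yields five windows. Because the 30-day step is shorter than the 60-day out-of-sample period, consecutive test periods overlap; setting the step equal to the out-of-sample length counts each out-of-sample day once.

```python
from datetime import datetime, timedelta


# Illustrative helper (not part of DSTA)
def walk_forward_windows(start, end, in_sample_days, out_sample_days, step_days):
    """Return (is_start, is_end, oos_start, oos_end) tuples rolling forward in time.

    Windows are half-open: in-sample covers [is_start, is_end) and out-of-sample
    covers [is_end, oos_end). Windows whose out-of-sample period ends after `end`
    are dropped.
    """
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    windows = []
    is_start = start
    while True:
        is_end = is_start + timedelta(days=in_sample_days)
        oos_end = is_end + timedelta(days=out_sample_days)
        if oos_end > end:
            break
        windows.append((is_start, is_end, is_end, oos_end))
        is_start += timedelta(days=step_days)
    return windows


windows = walk_forward_windows(datetime(2023, 1, 1), datetime(2023, 12, 31),
                               in_sample_days=180, out_sample_days=60, step_days=30)
for is_start, is_end, oos_start, oos_end in windows:
    print(f"optimize {is_start:%Y-%m-%d} to {is_end:%Y-%m-%d}, "
          f"test {oos_start:%Y-%m-%d} to {oos_end:%Y-%m-%d}")
```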

3. Genetic Algorithm Optimization

When to Use: Large parameter space, many parameters.

How it Works: Evolutionary approach - best performers "breed" to create new parameter combinations.

Example:

from backtesting.optimization import GeneticOptimizer

optimizer = GeneticOptimizer(
    strategy=SMACrossoverStrategy,
    param_ranges={
        'fast_period': (10, 100),      # Min, max
        'slow_period': (50, 300),
        'rsi_oversold': (20, 40)
    },
    symbol_list=['BTCUSDT'],
    initial_capital=100000,
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31),
    population_size=50,     # Number of combinations per generation
    generations=20,         # Number of iterations
    mutation_rate=0.1,      # Probability of random change
    metric='sharpe_ratio'
)

results = optimizer.optimize()

# Best individual
print(f"Best params: {results['best_params']}")
print(f"Best Sharpe: {results['best_score']:.2f}")

# Evolution history
import matplotlib.pyplot as plt
plt.plot(results['generation_best_scores'])
plt.xlabel('Generation')
plt.ylabel('Best Sharpe Ratio')
plt.title('Optimization Progress')
plt.show()

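Pros:
- Handles large parameter spaces
- Explores the space broadly, so it is less likely to get stuck in a local optimum (a global optimum is still not guaranteed)
- Evaluates far fewer combinations than grid search when there are many parameters

Cons:
- Non-deterministic (results can differ between runs)
- Requires tuning (population size, mutation rate)
- Can still overfit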
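A compact sketch of the evolutionary loop for integer parameter ranges, using tournament selection, uniform crossover, random-reset mutation, and elitism (the best individual survives unchanged, so the best score never decreases). The fitness function is a hypothetical stand-in for running a backtest and returning its Sharpe ratio; GeneticOptimizer's actual operators are not documented here and may differ.

```python
import random


# Illustrative helper (not part of DSTA)
def genetic_search(fitness, param_ranges, population_size=50, generations=20,
                   mutation_rate=0.1, seed=None):
    """Maximize fitness(params) over inclusive integer (min, max) ranges."""
    if population_size < 3:
        raise ValueError("population_size must be at least 3 for tournaments of 3")
    rng = random.Random(seed)
    names = list(param_ranges)

    def random_individual():
        return {n: rng.randint(*param_ranges[n]) for n in names}

    def tournament(scored):
        # Pick 3 random individuals and keep the fittest
        return max(rng.sample(scored, 3), key=lambda s: s[0])[1]

    population = [random_individual() for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        best_score, best = max(scored, key=lambda s: s[0])
        best_per_generation.append(best_score)
        next_population = [dict(best)]  # elitism: carry the best forward unchanged
        while len(next_population) < population_size:
            mother, father = tournament(scored), tournament(scored)
            # Uniform crossover: each gene comes from one parent at random
            child = {n: rng.choice((mother[n], father[n])) for n in names}
            # Mutation: redraw each gene with probability mutation_rate
            for n in names:
                if rng.random() < mutation_rate:
                    child[n] = rng.randint(*param_ranges[n])
            next_population.append(child)
        population = next_population

    scored = [(fitness(ind), ind) for ind in population]
    best_score, best = max(scored, key=lambda s: s[0])
    return best, best_score, best_per_generation


# Hypothetical stand-in for "run a backtest and return its Sharpe ratio"
def toy_fitness(p):
    return -((p['fast_period'] - 30) ** 2
             + (p['slow_period'] - 150) ** 2 / 10
             + (p['rsi_oversold'] - 30) ** 2)


ranges = {'fast_period': (10, 100), 'slow_period': (50, 300), 'rsi_oversold': (20, 40)}
best, score, history = genetic_search(toy_fitness, ranges, seed=42)
print(best, round(score, 2))
```

Fixing `seed` makes this sketch reproducible, which helps when comparing settings such as population size or mutation rate.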

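4. Monte Carlo Simulation

When to Use: Assess strategy robustness and expected range of outcomes.

How it Works: Resample the backtest's trades (or returns) to generate a distribution of possible outcomes. Shuffling trade order only changes the path (e.g., drawdowns), because the same trades produce the same final result in any order; to get a spread of final returns, trades must be drawn with replacement (bootstrapping).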

Example:

from backtesting.monte_carlo import MonteCarloSimulator

# Run backtest first
backtest_results = backtest.run()

# Monte Carlo simulation
mc = MonteCarloSimulator(
    trades=backtest_results['trades'],
    initial_capital=100000,
    num_simulations=1000
)

mc_results = mc.run()

# Statistics
print(f"Expected Return: {mc_results['mean_return']:.2f}%")
print(f"Std Dev Return: {mc_results['std_return']:.2f}%")
print(f"5th Percentile Return: {mc_results['percentile_5']:.2f}%")
print(f"95th Percentile Return: {mc_results['percentile_95']:.2f}%")
print(f"Probability of Loss: {mc_results['prob_loss']:.2f}%")

# Plot distribution
import matplotlib.pyplot as plt
plt.hist(mc_results['all_returns'], bins=50)
plt.axvline(mc_results['mean_return'], color='r', label='Mean')
plt.axvline(mc_results['percentile_5'], color='orange', label='5th %ile')
plt.axvline(mc_results['percentile_95'], color='orange', label='95th %ile')
plt.xlabel('Total Return (%)')
plt.ylabel('Frequency')
plt.legend()
plt.show()

Interpretation:
- Wide distribution = high uncertainty
- Negative 5th percentile = risk of significant loss
- Compare to the backtest: if the backtest result is well above the simulated mean, it may have gotten lucky
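A minimal NumPy sketch of both resampling schemes, assuming each trade's P&L is a fixed dollar amount: bootstrapping (drawing trades with replacement) produces a spread of total returns, while shuffling the order leaves the total unchanged but produces a spread of maximum drawdowns. The output keys loosely mirror the MonteCarloSimulator results shown above, but this is not its implementation.

```python
import numpy as np


# Illustrative helper (not part of DSTA); assumes fixed-dollar P&L per trade
def monte_carlo(trade_pnls, initial_capital, num_simulations=1000, seed=None):
    rng = np.random.default_rng(seed)
    pnls = np.asarray(trade_pnls, dtype=float)
    n = len(pnls)

    # Bootstrap: n trades drawn WITH replacement per simulation -> spread of total return (%)
    boot = rng.choice(pnls, size=(num_simulations, n), replace=True)
    total_returns = boot.sum(axis=1) / initial_capital * 100

    # Shuffle: same trades in random order -> same total, different max drawdown (%)
    max_drawdowns = np.empty(num_simulations)
    for i in range(num_simulations):
        equity = initial_capital + np.concatenate(([0.0], np.cumsum(rng.permutation(pnls))))
        peaks = np.maximum.accumulate(equity)
        max_drawdowns[i] = (equity / peaks - 1).min() * 100

    return {
        'mean_return': total_returns.mean(),
        'percentile_5': np.percentile(total_returns, 5),
        'percentile_95': np.percentile(total_returns, 95),
        'prob_loss': (total_returns < 0).mean() * 100,
        'median_max_drawdown': np.median(max_drawdowns),
        'all_returns': total_returns,
    }


# Hypothetical trade list matching the sample statistics (28 wins, 19 losses)
mc = monte_carlo([842.50] * 28 + [-456.30] * 19, initial_capital=100_000, seed=0)
print(f"5th-95th percentile return: {mc['percentile_5']:.1f}% to {mc['percentile_95']:.1f}%")
print(f"Median max drawdown (shuffled order): {mc['median_max_drawdown']:.1f}%")
```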

Visualization

Equity Curve

import matplotlib.pyplot as plt
import pandas as pd

# Get equity curve from backtest
equity_curve = results['equity_curve']

# Convert to pandas for easy plotting
df = pd.DataFrame(equity_curve)
df['timestamp'] = pd.to_datetime(df['timestamp'])
df.set_index('timestamp', inplace=True)

# Plot
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

# Equity curve
ax1.plot(df.index, df['equity'], label='Strategy', linewidth=2)
ax1.axhline(y=100000, color='gray', linestyle='--', label='Initial Capital')
ax1.set_ylabel('Equity ($)')
ax1.set_title('Equity Curve')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Drawdown
drawdowns = (df['equity'] / df['equity'].cummax() - 1) * 100
ax2.fill_between(df.index, drawdowns, 0, color='red', alpha=0.3)
ax2.set_ylabel('Drawdown (%)')
ax2.set_xlabel('Date')
ax2.set_title('Drawdown')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

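Monthly Returns Heatmap

import seaborn as sns

# Calculate monthly returns by compounding per-bar returns within each month
df['returns'] = df['equity'].pct_change()
# 'M' = month-end; pandas >= 2.2 deprecates this alias in favor of 'ME'
monthly_returns = df['returns'].resample('M').apply(lambda x: (1 + x).prod() - 1) * 100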

# Pivot for heatmap
monthly_pivot = monthly_returns.to_frame()
monthly_pivot['year'] = monthly_pivot.index.year
monthly_pivot['month'] = monthly_pivot.index.month
heatmap_data = monthly_pivot.pivot(index='year', columns='month', values='returns')

# Plot
plt.figure(figsize=(12, 6))
sns.heatmap(
    heatmap_data,
    annot=True,
    fmt='.1f',
    cmap='RdYlGn',
    center=0,
    cbar_kws={'label': 'Return (%)'}
)
plt.title('Monthly Returns Heatmap')
plt.xlabel('Month')
plt.ylabel('Year')
plt.show()

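Trade Analysis

# Get trade details; parse dates so .dt works and sort for the cumulative plot
trades = pd.DataFrame(results['trades'])
trades['entry_date'] = pd.to_datetime(trades['entry_date'])
trades['exit_date'] = pd.to_datetime(trades['exit_date'])
trades = trades.sort_values('exit_date')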

# Plot trade distribution
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Trade P&L distribution
axes[0, 0].hist(trades['pnl'], bins=30, edgecolor='black')
axes[0, 0].axvline(x=0, color='red', linestyle='--')
axes[0, 0].set_xlabel('P&L ($)')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].set_title('Trade P&L Distribution')

# Cumulative P&L
axes[0, 1].plot(trades['exit_date'], trades['pnl'].cumsum())
axes[0, 1].set_xlabel('Date')
axes[0, 1].set_ylabel('Cumulative P&L ($)')
axes[0, 1].set_title('Cumulative P&L Over Time')
axes[0, 1].grid(True, alpha=0.3)

# Trade duration distribution
trade_durations = (trades['exit_date'] - trades['entry_date']).dt.total_seconds() / 3600
axes[1, 0].hist(trade_durations, bins=30, edgecolor='black')
axes[1, 0].set_xlabel('Duration (hours)')
axes[1, 0].set_ylabel('Frequency')
axes[1, 0].set_title('Trade Duration Distribution')

# Win/Loss by symbol
win_loss = trades.groupby(['symbol', trades['pnl'] > 0]).size().unstack(fill_value=0)
win_loss.plot(kind='bar', ax=axes[1, 1], color=['red', 'green'])
axes[1, 1].set_xlabel('Symbol')
axes[1, 1].set_ylabel('Number of Trades')
axes[1, 1].set_title('Wins vs Losses by Symbol')
axes[1, 1].legend(['Loss', 'Win'])

plt.tight_layout()
plt.show()

Best Practices

1. Always Use Out-of-Sample Testing

# ❌ Bad - Test on same data used for development
backtest_all = Backtest(
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31),
    # ...other parameters (symbol_list, initial_capital, strategy, etc.)
)

# ✅ Good - Reserve out-of-sample period
# Develop on 2023
backtest_insample = Backtest(
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31),
    # ...other parameters
)

# Test on H1 2024 (haven't seen this data)
backtest_outsample = Backtest(
    start_date=datetime(2024, 1, 1),
    end_date=datetime(2024, 6, 30),
    # ...other parameters
)

2. Test on Multiple Time Periods

# Test on different market conditions
# (regime labels describe BTCUSDT in these windows; check them for other symbols)
periods = [
    ('Bear Market', datetime(2022, 1, 1), datetime(2022, 12, 31)),
    ('Bull Market', datetime(2023, 1, 1), datetime(2023, 6, 30)),
    ('Sideways', datetime(2023, 7, 1), datetime(2023, 9, 30))
]

for name, start, end in periods:
    backtest = Backtest(start_date=start, end_date=end)  # plus symbol_list, strategy, etc.
    results = backtest.run()
    print(f"{name}: Return={results['total_return']:.2f}%, Sharpe={results['sharpe_ratio']:.2f}")

3. Require Minimum Sample Size

results = backtest.run()

# Check minimum number of trades
if results['total_trades'] < 30:
    print("WARNING: Insufficient trades for statistical significance")
    print(f"Only {results['total_trades']} trades. Need at least 30.")
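Trade count alone does not establish significance. A quick complementary check is the t-statistic of the mean trade P&L, t = mean / (standard deviation / sqrt(n)); as a rough guide, values above about 2 suggest the average trade is unlikely to be positive by chance alone. This sketch assumes trades are independent, which real strategies may violate, and uses a hypothetical trade list.

```python
import math
import statistics


# Illustrative helper (not part of DSTA); assumes independent trades
def trade_t_statistic(pnls):
    """t-statistic of the mean trade P&L: mean / (sample std / sqrt(n))."""
    n = len(pnls)
    if n < 2:
        raise ValueError("need at least 2 trades")
    std = statistics.stdev(pnls)
    if std == 0:
        raise ValueError("all trades have identical P&L; t-statistic is undefined")
    return statistics.mean(pnls) / (std / math.sqrt(n))


# Hypothetical trade list matching the sample results; with a real run you could use
# [t['pnl'] for t in results['trades']] (the 'pnl' field used in Trade Analysis)
pnls = [842.50] * 28 + [-456.30] * 19
t = trade_t_statistic(pnls)
print(f"t = {t:.2f}")
if t < 2:
    print("WARNING: average trade P&L is not clearly above zero")
```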

4. Compare to Benchmark

# Strategy results
strategy_return = results['total_return']

# Benchmark: Buy and hold
benchmark_return = results['benchmark_return']

# Compare
excess_return = strategy_return - benchmark_return
print(f"Strategy Return: {strategy_return:.2f}%")
print(f"Benchmark Return: {benchmark_return:.2f}%")
print(f"Excess Return: {excess_return:.2f}%")

if excess_return < 0:
    print("WARNING: Strategy underperforms buy-and-hold!")

5. Document Everything

Keep a research journal:

# Strategy Research Log

## 2024-01-15: SMA Crossover Initial Test
- Strategy: SMA Crossover (50/200)
- Period: 2023-01-01 to 2023-12-31
- Capital: $100,000
- Results:
  - Total Return: 15.43%
  - Sharpe Ratio: 0.87
  - Max Drawdown: -12.54%
  - Total Trades: 47
- Observations: Works well in trending markets, struggles in sideways
- Next Steps: Test on 2024 data, optimize parameters

## 2024-01-16: Parameter Optimization
- Tested fast_period: [20, 30, 50, 100]
- Tested slow_period: [100, 150, 200, 300]
- Best params: fast=30, slow=150
- In-sample Sharpe: 2.15
- Out-sample Sharpe: 1.82 (~85% of in-sample, above the 80% guideline; no strong sign of overfitting)
- Proceeding with these parameters

Common Questions

Q: What's a good Sharpe ratio?
A: Below 1.0 is poor, 1.0-2.0 is good, and above 2.0 is excellent. Above 3.0 is exceptional enough that you should check for errors or overfitting.

Q: How many trades do I need for statistical significance?
A: Minimum 30, preferably 50+. More is better.

Q: Should I optimize for total return or Sharpe ratio?
A: Sharpe ratio (risk-adjusted). High returns with huge drawdowns are not sustainable.

Q: What if my strategy works in backtest but fails in live trading?
A: Common causes: lookahead bias, overfitting, transaction costs underestimated, different market regime.

Q: How do I know if I'm overfitting?
A: Use walk-forward optimization. If out-of-sample performance is far below in-sample (e.g., out-sample Sharpe under 50% of in-sample), you're likely overfitting.

Resources

Strategy Development: See docs/STRATEGY_DEVELOPMENT.md
Risk Management: See docs/RISK_MANAGEMENT.md
Architecture: See docs/BACKTESTING_ARCHITECTURE.md
Performance Metrics: See src/backtesting/performance.py
Example Notebooks: See notebooks/backtesting_examples.ipynb

Support

For questions about backtesting:

Check this documentation
Review example backtests in notebooks/
Examine test files in tests/backtesting/
Open an issue with the backtesting label