Backtesting Guide

This guide explains how to run backtests, interpret results, and optimize trading strategies using the DSTA backtesting engine.

Overview

The DSTA backtesting engine is an event-driven system that simulates realistic trading conditions on historical data. It helps you evaluate strategy performance before risking real capital.

Why Event-Driven?

  • No Lookahead Bias: The strategy only receives bars up to the current event, so it is hard to use future information by accident
  • Realistic: Matches how real trading works
  • Testable: Each component can be tested independently
  • Production-Ready: Same code works for live trading

Quick Start

Running Your First Backtest

from backtesting.backtest import Backtest
from backtesting.strategies.sma_crossover import SMACrossoverStrategy
from datetime import datetime

# Configure backtest
backtest = Backtest(
    symbol_list=['BTCUSDT'],
    initial_capital=100000,
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31),
    data_handler='DatabaseDataHandler',
    execution_handler='SimulatedExecutionHandler',
    strategy=SMACrossoverStrategy,
    strategy_params={
        'fast_period': 50,
        'slow_period': 200
    }
)

# Run backtest
results = backtest.run()

# Display results
print(results)

Understanding the Configuration

Required Parameters:

  • symbol_list: List of trading pairs to trade (e.g., ['BTCUSDT', 'ETHUSDT'])
  • initial_capital: Starting capital in dollars (e.g., 100000)
  • start_date: Backtest start date (e.g., datetime(2023, 1, 1))
  • end_date: Backtest end date (e.g., datetime(2023, 12, 31))
  • strategy: Strategy class to test

Optional Parameters:

  • data_handler: Data source (default: 'DatabaseDataHandler')
  • execution_handler: Order execution model (default: 'SimulatedExecutionHandler')
  • strategy_params: Dictionary of strategy parameters
  • commission: Commission per trade as a fraction of trade value (default: 0.001 = 0.1%)
  • slippage: Slippage per trade as a fraction of price (default: 0.0005 = 0.05%)

Interpreting Results

Performance Metrics

The backtest returns a comprehensive set of metrics:

{
    # Returns
    'total_return': 15.43,              # Total return (%)
    'annualized_return': 15.89,         # Annualized return (%)
    'benchmark_return': 8.50,            # Buy-and-hold return (%)

    # Risk-Adjusted Metrics
    'sharpe_ratio': 0.87,               # Sharpe ratio (higher is better)
    'sortino_ratio': 1.28,              # Sortino ratio (downside deviation)
    'calmar_ratio': 1.23,               # Return / max drawdown

    # Risk Metrics
    'max_drawdown': -12.54,             # Maximum drawdown (%)
    'max_drawdown_duration': 45,        # Days to recover from max drawdown
    'volatility': 18.23,                # Annualized volatility (%)
    'downside_deviation': 12.45,        # Downside volatility (%)

    # Trade Statistics
    'total_trades': 47,                 # Number of round-trip trades
    'winning_trades': 28,               # Number of winning trades
    'losing_trades': 19,                # Number of losing trades
    'win_rate': 59.57,                  # Winning trades / total trades (%)
    'avg_win': 842.50,                  # Average winning trade ($)
    'avg_loss': -456.30,                # Average losing trade ($)
    'largest_win': 3250.00,             # Largest winning trade ($)
    'largest_loss': -1850.00,           # Largest losing trade ($)
    'avg_trade_duration': 5.3,          # Average days in trade

    # Risk-Reward
    'profit_factor': 1.84,              # Gross profit / gross loss
    'expectancy': 317.40,               # Expected value per trade ($)
    'risk_reward_ratio': 1.85,          # Avg win / avg loss

    # Execution Quality
    'avg_slippage': 0.042,              # Average slippage (%)
    'total_commission': 1250.50,        # Total commission paid ($)

    # Equity Curve
    'equity_curve': [...],              # Records with 'timestamp' and 'equity' over time
    'drawdown_curve': [...],            # Drawdown values over time

    # Trade Log
    'trades': [...]                     # Detailed trade records
}

Key Metrics Explained

Total Return

Definition: Percentage gain/loss from start to end of backtest.

Formula: (Final Equity - Initial Capital) / Initial Capital * 100

Interpretation:

  • Positive = Profitable strategy
  • Compare to benchmark (buy-and-hold) to assess if strategy adds value
  • Consider in context of risk (volatility, drawdown)

Example:

Initial Capital: $100,000
Final Equity: $115,430
Total Return: 15.43%

Sharpe Ratio

Definition: Risk-adjusted return metric.

Formula: (Return - Risk-Free Rate) / Volatility

Interpretation:

  • < 1.0: Poor risk-adjusted performance
  • 1.0 - 2.0: Good performance
  • > 2.0: Excellent performance
  • > 3.0: Exceptional (verify for errors!)

Considerations:

  • Assumes returns are normally distributed (often not true)
  • Penalizes both upside and downside volatility
  • Use Sortino ratio for asymmetric strategies

Example:

Annual Return: 15.89%
Risk-Free Rate: 0%
Volatility: 18.23%
Sharpe Ratio: 15.89 / 18.23 = 0.87 (below 1.0, needs improvement)
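The same ratio can be computed from a series of periodic returns instead of annual summary figures. The following is a minimal sketch, not DSTA's implementation: the function name `sharpe_ratio`, the default of 365 periods per year (daily bars on a market that trades every day), and the sample returns are all assumptions made for illustration.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=365):
    """Annualized Sharpe ratio from per-period simple returns.

    risk_free_rate is an annual rate; periods_per_year=365 is an assumption
    (daily bars, market open every day) and should match your data.
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    rf_per_period = risk_free_rate / periods_per_year
    excess = [r - rf_per_period for r in returns]
    stdev = statistics.stdev(excess)  # sample standard deviation
    if stdev == 0:
        return float("nan")  # undefined when returns never vary
    return statistics.mean(excess) / stdev * math.sqrt(periods_per_year)


# Hypothetical daily returns, for illustration only
daily_returns = [0.010, -0.005, 0.007, 0.002, -0.012, 0.009, 0.004]
print(f"Sharpe: {sharpe_ratio(daily_returns):.2f}")
```

Annualizing with the square root of the number of periods assumes returns are independent from one period to the next, which is a simplification.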

Sortino Ratio

Definition: Like Sharpe but only penalizes downside volatility.

Formula: (Return - Risk-Free Rate) / Downside Deviation

Interpretation:

  • Better measure for strategies with asymmetric returns
  • Higher values indicate better downside risk management
  • Compare to Sharpe: if Sortino >> Sharpe, strategy limits losses well

Example:

Annual Return: 15.89%
Downside Deviation: 12.45%
Sortino Ratio: 15.89 / 12.45 = 1.28 (better than Sharpe of 0.87)
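Downside deviation has more than one definition in common use. The sketch below assumes one of them: the root-mean-square of returns below a target, with returns above the target counted as zero. The function names, the target of 0, and the 365-period annualization are illustrative assumptions and may differ from what DSTA computes.

```python
import math


def downside_deviation(returns, target=0.0):
    """Root-mean-square of shortfalls below `target` (gains count as zero)."""
    if not returns:
        raise ValueError("returns must not be empty")
    squared_shortfalls = [min(r - target, 0.0) ** 2 for r in returns]
    return math.sqrt(sum(squared_shortfalls) / len(returns))


def sortino_ratio(returns, target=0.0, periods_per_year=365):
    """Annualized Sortino ratio from per-period simple returns."""
    dd = downside_deviation(returns, target)
    mean_excess = sum(r - target for r in returns) / len(returns)
    if dd == 0:
        # No return fell below the target in this sample
        return float("inf") if mean_excess > 0 else float("nan")
    return mean_excess / dd * math.sqrt(periods_per_year)


# Hypothetical per-period returns, for illustration only
sample = [0.02, -0.01, 0.03, -0.02]
print(f"Sortino: {sortino_ratio(sample, periods_per_year=1):.3f}")
```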

Maximum Drawdown

Definition: Largest peak-to-trough decline in equity.

Interpretation:

  • A practical measure of downside risk, because it reflects a loss the equity curve actually went through
  • Represents the worst peak-to-trough loss an investor would have experienced during the backtest period
  • < 10%: Low risk
  • 10-20%: Moderate risk
  • 20-30%: High risk
  • > 30%: Very high risk (may be unacceptable for many investors)

Example:

Peak Equity: $120,000
Trough Equity: $104,952
Max Drawdown: -12.54%
Recovery Time: 45 days
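As a rough sketch of how this figure is derived, the function below walks an equity series once, tracks the running peak, and records the deepest decline from it. The name `max_drawdown` and the sample equity values are assumptions for illustration; the peak and trough match the example above.

```python
def max_drawdown(equity):
    """Return (max drawdown in %, peak index, trough index) for an equity list."""
    if not equity:
        raise ValueError("equity must not be empty")
    peak_value, peak_idx = equity[0], 0
    worst, worst_peak_idx, worst_trough_idx = 0.0, 0, 0
    for i, value in enumerate(equity):
        if value > peak_value:
            peak_value, peak_idx = value, i          # new running peak
        drawdown = (value - peak_value) / peak_value * 100
        if drawdown < worst:                         # deeper than anything so far
            worst, worst_peak_idx, worst_trough_idx = drawdown, peak_idx, i
    return worst, worst_peak_idx, worst_trough_idx


equity = [100_000, 110_000, 120_000, 112_000, 104_952, 109_000, 121_000]
dd, peak_i, trough_i = max_drawdown(equity)
print(f"Max drawdown: {dd:.2f}% (peak index {peak_i}, trough index {trough_i})")
# Max drawdown: -12.54% (peak index 2, trough index 4)
```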

Win Rate

Definition: Percentage of profitable trades.

Formula: Winning Trades / Total Trades * 100

Interpretation:

  • Not the most important metric!
  • Can be profitable with low win rate if avg_win >> avg_loss
  • Can be unprofitable with high win rate if avg_win << avg_loss
  • Many profitable strategies have win rates of roughly 40-60%

Example:

Total Trades: 47
Winning Trades: 28
Win Rate: 59.57%

Profit Factor

Definition: Ratio of gross profit to gross loss.

Formula: Sum(Winning Trades) / Abs(Sum(Losing Trades))

Interpretation:

  • < 1.0: Losing strategy (gross losses exceed gross profits)
  • 1.0 - 1.5: Marginally profitable
  • 1.5 - 2.0: Good profitability
  • > 2.0: Excellent (verify for overfitting!)

Example:

Gross Profit: $23,590
Gross Loss: $-12,830
Profit Factor: 23,590 / 12,830 = 1.84 (good)

Expectancy

Definition: Expected profit per trade.

Formula: (Win Rate * Avg Win) - (Loss Rate * Abs(Avg Loss))

Interpretation:

  • Positive = profitable strategy on average
  • Higher is better
  • Multiply by expected trades per year for annual expectation

Example:

Win Rate: 59.57%
Avg Win: $842.50
Loss Rate: 40.43%
Avg Loss: $456.30
Expectancy: (0.5957 * 842.50) - (0.4043 * 456.30) = 501.88 - 184.48 = $317.40
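Win rate, profit factor, and expectancy can all be derived from the same list of per-trade P&L values. The sketch below applies the formulas from this section; the function name `trade_statistics` and the five sample trades are hypothetical.

```python
def trade_statistics(pnls):
    """Win rate (%), profit factor, and expectancy ($) from per-trade P&Ls."""
    if not pnls:
        raise ValueError("pnls must not be empty")
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]
    gross_profit = sum(wins)
    gross_loss = abs(sum(losses))
    win_rate = len(wins) / len(pnls)
    loss_rate = len(losses) / len(pnls)
    avg_win = gross_profit / len(wins) if wins else 0.0
    avg_loss = gross_loss / len(losses) if losses else 0.0
    return {
        "win_rate": win_rate * 100,
        # Undefined without losing trades; reported as infinity here
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
        "expectancy": win_rate * avg_win - loss_rate * avg_loss,
    }


# Hypothetical trades, for illustration only
stats = trade_statistics([500.0, -200.0, 300.0, -100.0, 400.0])
print(stats)
```

When there are no breakeven trades, expectancy equals the plain average P&L per trade, which is a quick way to sanity-check the result.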

Common Pitfalls and How to Avoid Them

1. Lookahead Bias

What: Using information not available at the time of trading.

Examples:

# ❌ Bad - Looks at future data
def calculate_signals(self, event):
    all_data = self.bars.get_all_bars(symbol)
    tomorrow_high = all_data[-1]['high']  # This is tomorrow!

# ✅ Good - Only past data
def calculate_signals(self, event):
    historical = self.bars.get_latest_bars(symbol, N=20)
    current = self.bars.get_latest_bar(symbol)

Prevention:

  • Use event-driven architecture (DSTA does this)
  • Only access data via get_latest_bar() or get_latest_bars()
  • Never use future-peeking functions like shift(-1) in pandas
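To make the idea concrete outside of DSTA, here is a small standalone sketch of a bar-by-bar loop in which each decision uses only the bars seen so far. The function name, the short 3/5 periods, and the price list are assumptions for illustration and are not part of the DSTA API.

```python
def sma_crossover_signals(closes, fast=3, slow=5):
    """Yield (index, signal) bar by bar, using only data up to that bar."""
    if fast >= slow:
        raise ValueError("fast must be smaller than slow")
    history = []          # everything the strategy has "seen" so far
    position = "OUT"
    for i, close in enumerate(closes):
        history.append(close)            # the new bar arrives
        if len(history) < slow:
            continue                     # not enough past data yet
        fast_sma = sum(history[-fast:]) / fast
        slow_sma = sum(history[-slow:]) / slow
        if fast_sma > slow_sma and position == "OUT":
            position = "LONG"
            yield i, "BUY"
        elif fast_sma < slow_sma and position == "LONG":
            position = "OUT"
            yield i, "SELL"


closes = [10, 10, 10, 10, 10, 11, 12, 13, 12, 10, 8, 7]
signals = list(sma_crossover_signals(closes))
print(signals)
# [(5, 'BUY'), (10, 'SELL')]
```

A useful check for lookahead bias is to truncate the input: the signals produced for the first N bars should not change when later bars are added.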

2. Survivorship Bias

What: Only testing on assets that still exist today.

Impact: Overestimates performance by excluding failed assets.

Example:

  • Testing crypto strategies only on current top 100 coins
  • Ignores coins that went to zero or got delisted

Prevention:

  • Include delisted/defunct assets in backtest universe
  • Use point-in-time data (what was top 100 then, not now)
  • Test on multiple market conditions (bull, bear, sideways)

3. Overfitting

What: Optimizing strategy parameters to perfectly fit historical data.

Signs:

  • Sharpe ratio > 3.0
  • Win rate > 70%
  • Works great in-sample, fails out-of-sample
  • Many complex conditions/parameters

Example:

# ❌ Overfitted - Too many parameters
if (rsi > 31.47 and rsi < 31.53 and
    sma_10 > sma_11 * 1.0023 and
    volume > prev_volume * 1.347 and
    hour_of_day in [10, 14, 15]):
    buy()

Prevention:

  • Use walk-forward optimization (see Optimization section)
  • Test on out-of-sample data
  • Keep strategies simple
  • Limit number of parameters (< 5 recommended)
  • Use cross-validation

4. Data Mining Bias

What: Testing many strategies/parameters, only reporting winners.

Impact: The "winning" strategy likely got lucky.

Example:

  • Test 100 different strategies
  • 5 appear profitable by random chance
  • Report only those 5

Prevention:

  • Document all tests performed
  • Use Bonferroni correction for multiple tests
  • Require minimum number of trades (> 30 recommended)
  • Test on different time periods

5. Ignoring Transaction Costs

What: Not accounting for commissions, slippage, spread.

Impact: Drastically overstates profitability, especially for high-frequency strategies.

Example:

# ❌ Bad - No costs
backtest = Backtest(
    ...,
    commission=0.0,
    slippage=0.0
)

# ✅ Good - Realistic costs
backtest = Backtest(
    ...,
    commission=0.001,  # 0.1% per trade
    slippage=0.0005    # 0.05% slippage
)

Realistic Values:

  • Commission: 0.1% - 0.2% per trade (maker/taker fees)
  • Slippage: 0.05% - 0.2% (depends on liquidity, trade size)
  • Spread: Use actual bid-ask spread from order book data
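To see how much these costs matter, the sketch below applies commission and slippage to both sides of a single long trade. It is an illustrative cost model, not necessarily the one used by SimulatedExecutionHandler, and the function name is hypothetical.

```python
def round_trip_net_return(entry_price, exit_price, commission=0.001, slippage=0.0005):
    """Net return of a long round trip after per-side commission and slippage.

    Both costs are fractions (0.001 = 0.1%) and are applied on entry and
    again on exit.
    """
    fill_in = entry_price * (1 + slippage)    # buy slightly above the quote
    fill_out = exit_price * (1 - slippage)    # sell slightly below the quote
    cost_in = fill_in * (1 + commission)      # cash paid per unit
    proceeds = fill_out * (1 - commission)    # cash received per unit
    return proceeds / cost_in - 1


gross = 101.0 / 100.0 - 1
net = round_trip_net_return(100.0, 101.0)
print(f"Gross: {gross:.4%}, net: {net:.4%}")
```

With the default values, a 1% gross move loses roughly 0.3 percentage points to costs, which is why strategies with many small trades are especially sensitive to these settings.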

6. Unrealistic Position Sizing

What: Position sizes too large for available liquidity.

Example:

  • Backtesting $1M positions in low-liquidity altcoins
  • Wouldn't be able to fill those orders in reality

Prevention:

  • Consider market depth and daily volume
  • Limit position size to % of daily volume (e.g., < 1%)
  • Include slippage that increases with position size
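The last two points can be sketched as a volume cap plus a slippage estimate that grows with order size. Both functions are hypothetical helpers, and the linear impact model with its `base` and `impact` values is a toy assumption rather than a calibrated model.

```python
def cap_position_size(desired_notional, avg_daily_volume, max_volume_fraction=0.01):
    """Limit an order's notional value to a fraction of average daily volume."""
    return min(desired_notional, avg_daily_volume * max_volume_fraction)


def size_dependent_slippage(notional, avg_daily_volume, base=0.0005, impact=0.1):
    """Toy linear impact model: slippage grows with the share of daily volume.

    `base` and `impact` are illustrative values, not calibrated parameters.
    """
    return base + impact * (notional / avg_daily_volume)


# Hypothetical altcoin with $5M average daily volume
adv = 5_000_000
order = cap_position_size(1_000_000, adv)        # capped at 1% of daily volume
slip = size_dependent_slippage(order, adv)
print(f"Order size: ${order:,.0f}, estimated slippage: {slip:.2%}")
```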

Optimization Techniques

1. Grid Search Optimization

When to Use: Systematic exploration of parameter space.

How it Works: Test all combinations of parameters.

Example:

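from datetime import datetime

from backtesting.optimization import GridSearchOptimizer
from backtesting.strategies.sma_crossover import SMACrossoverStrategy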

# Define parameter grid
param_grid = {
    'fast_period': [10, 20, 30, 50],
    'slow_period': [50, 100, 150, 200],
    'rsi_oversold': [20, 25, 30, 35]
}

# Run optimization
optimizer = GridSearchOptimizer(
    strategy=SMACrossoverStrategy,
    param_grid=param_grid,
    symbol_list=['BTCUSDT'],
    initial_capital=100000,
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31),
    metric='sharpe_ratio'  # Optimize for Sharpe ratio
)

results = optimizer.optimize()

# Best parameters
print(f"Best params: {results['best_params']}")
print(f"Best Sharpe: {results['best_score']:.2f}")

# All results
for result in results['all_results']:
    print(f"Params: {result['params']}, Sharpe: {result['sharpe_ratio']:.2f}")

Pros:

  • Systematic and complete
  • Easy to implement
  • No randomness

Cons:

  • Slow for many parameters (combinatorial explosion)
  • Can overfit if not validated properly
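The combinatorial growth is easy to see by expanding the grid by hand. The sketch below enumerates every combination with `itertools.product`; the helper `expand_grid` is hypothetical and says nothing about how GridSearchOptimizer works internally.

```python
from itertools import product


def expand_grid(param_grid):
    """Return every parameter combination in the grid as a list of dicts."""
    names = list(param_grid)
    return [dict(zip(names, values)) for values in product(*param_grid.values())]


param_grid = {
    'fast_period': [10, 20, 30, 50],
    'slow_period': [50, 100, 150, 200],
    'rsi_oversold': [20, 25, 30, 35],
}

combos = expand_grid(param_grid)
# Drop combinations that make no sense for a crossover (fast must be < slow)
valid = [c for c in combos if c['fast_period'] < c['slow_period']]
print(len(combos), len(valid))
# 64 60
```

Three parameters with four values each already require 64 backtests, and adding a fourth parameter with four values would raise that to 256.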

2. Walk-Forward Optimization

When to Use: To avoid overfitting and validate strategy robustness.

How it Works:

  1. Split data into in-sample and out-of-sample periods
  2. Optimize on in-sample
  3. Test on out-of-sample
  4. Roll forward and repeat

Example:

from backtesting.optimization import WalkForwardOptimizer

optimizer = WalkForwardOptimizer(
    strategy=SMACrossoverStrategy,
    param_grid=param_grid,
    symbol_list=['BTCUSDT'],
    initial_capital=100000,
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31),
    in_sample_period=180,   # 6 months optimization
    out_sample_period=60,   # 2 months testing
    step_size=30            # Roll forward by 1 month
)

results = optimizer.optimize()

# Results by period
for period_result in results['periods']:
    print(f"Period: {period_result['out_sample_start']} to {period_result['out_sample_end']}")
    print(f"  In-sample Sharpe: {period_result['in_sample_sharpe']:.2f}")
    print(f"  Out-sample Sharpe: {period_result['out_sample_sharpe']:.2f}")
    print(f"  Best params: {period_result['best_params']}")

# Aggregate statistics
print(f"\nAverage out-sample Sharpe: {results['avg_out_sample_sharpe']:.2f}")

Advantages:

  • Realistic assessment of future performance
  • Detects overfitting
  • Shows parameter stability over time

Interpretation:

  • Good: Out-sample Sharpe ≥ 80% of in-sample Sharpe
  • Warning: Out-sample Sharpe < 50% of in-sample (likely overfit)
  • Excellent: Parameters stable across periods
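The rolling split behind walk-forward optimization can be sketched with plain date arithmetic. This is one plausible way to slice the windows, using the same 180/60/30-day settings as the example above; the generator name and the exact boundary handling are assumptions, and WalkForwardOptimizer may slice differently.

```python
from datetime import date, timedelta


def walk_forward_windows(start, end, in_sample_days, out_sample_days, step_days):
    """Yield (in_start, in_end, out_start, out_end) tuples of dates.

    Each out-of-sample window starts the day after its in-sample window ends.
    Windows whose out-of-sample end would pass `end` are not generated.
    """
    in_start = start
    while True:
        in_end = in_start + timedelta(days=in_sample_days - 1)
        out_start = in_end + timedelta(days=1)
        out_end = out_start + timedelta(days=out_sample_days - 1)
        if out_end > end:
            break
        yield in_start, in_end, out_start, out_end
        in_start += timedelta(days=step_days)


windows = list(walk_forward_windows(date(2023, 1, 1), date(2023, 12, 31), 180, 60, 30))
for in_start, in_end, out_start, out_end in windows:
    print(f"optimize {in_start}..{in_end} -> test {out_start}..{out_end}")
```

With one year of data these settings produce five windows. The out-of-sample windows overlap because the step (30 days) is shorter than the test period (60 days), so the per-period results are not fully independent.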

3. Genetic Algorithm Optimization

When to Use: Large parameter space, many parameters.

How it Works: Evolutionary approach - best performers "breed" to create new parameter combinations.

Example:

from backtesting.optimization import GeneticOptimizer

optimizer = GeneticOptimizer(
    strategy=SMACrossoverStrategy,
    param_ranges={
        'fast_period': (10, 100),      # Min, max
        'slow_period': (50, 300),
        'rsi_oversold': (20, 40)
    },
    symbol_list=['BTCUSDT'],
    initial_capital=100000,
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31),
    population_size=50,     # Number of combinations per generation
    generations=20,         # Number of iterations
    mutation_rate=0.1,      # Probability of random change
    metric='sharpe_ratio'
)

results = optimizer.optimize()

# Best individual
print(f"Best params: {results['best_params']}")
print(f"Best Sharpe: {results['best_score']:.2f}")

# Evolution history
import matplotlib.pyplot as plt
plt.plot(results['generation_best_scores'])
plt.xlabel('Generation')
plt.ylabel('Best Sharpe Ratio')
plt.title('Optimization Progress')
plt.show()

Pros:

  • Handles large parameter spaces
  • Can escape local optima, although finding the global optimum is not guaranteed
  • Faster than grid search for many parameters

Cons:

  • Non-deterministic (different results each run)
  • Requires tuning (population size, mutation rate)
  • Can still overfit
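For intuition about selection, crossover, and mutation, here is a toy genetic search in plain Python. It is a sketch under stated assumptions and not the GeneticOptimizer implementation: `toy_score` is a made-up objective standing in for a backtest metric, and keeping the better half as parents with one elite survivor is just one of many possible schemes.

```python
import random


def toy_score(params):
    """Stand-in for a backtest metric; peaks at fast=30, slow=150 by construction."""
    return -((params['fast_period'] - 30) ** 2 + (params['slow_period'] - 150) ** 2)


def random_individual(ranges, rng):
    return {name: rng.randint(lo, hi) for name, (lo, hi) in ranges.items()}


def crossover(a, b, rng):
    """Child takes each parameter from one of the two parents at random."""
    return {name: rng.choice((a[name], b[name])) for name in a}


def mutate(ind, ranges, rate, rng):
    """With probability `rate`, replace a parameter with a fresh random value."""
    return {
        name: rng.randint(*ranges[name]) if rng.random() < rate else value
        for name, value in ind.items()
    }


def genetic_search(score, ranges, population_size=50, generations=20,
                   mutation_rate=0.1, seed=42):
    rng = random.Random(seed)          # fixed seed makes the run repeatable
    population = [random_individual(ranges, rng) for _ in range(population_size)]
    history = []                       # best score at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        history.append(score(ranked[0]))
        parents = ranked[: population_size // 2]     # keep the better half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                   ranges, mutation_rate, rng)
            for _ in range(population_size - 1)
        ]
        population = [ranked[0]] + children          # elitism: best survives
    best = max(population, key=score)
    return best, score(best), history


ranges = {'fast_period': (10, 100), 'slow_period': (50, 300)}
best, best_score, history = genetic_search(toy_score, ranges)
print(best, best_score)
```

Fixing the seed addresses the reproducibility concern for a single run, but different seeds can still settle on different parameters, so it is worth repeating the search with several seeds.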

4. Monte Carlo Simulation

When to Use: Assess strategy robustness and expected range of outcomes.

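How it Works: Randomly reorder or resample (with replacement) the trade sequence or returns to generate a distribution of possible outcomes. Note that a pure reshuffle of fixed-dollar trade P&Ls leaves the final return unchanged and only varies the path, and therefore the drawdown.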

Example:

from backtesting.monte_carlo import MonteCarloSimulator

# Run backtest first
backtest_results = backtest.run()

# Monte Carlo simulation
mc = MonteCarloSimulator(
    trades=backtest_results['trades'],
    initial_capital=100000,
    num_simulations=1000
)

mc_results = mc.run()

# Statistics
print(f"Expected Return: {mc_results['mean_return']:.2f}%")
print(f"Std Dev Return: {mc_results['std_return']:.2f}%")
print(f"5th Percentile Return: {mc_results['percentile_5']:.2f}%")
print(f"95th Percentile Return: {mc_results['percentile_95']:.2f}%")
print(f"Probability of Loss: {mc_results['prob_loss']:.2f}%")

# Plot distribution
import matplotlib.pyplot as plt
plt.hist(mc_results['all_returns'], bins=50)
plt.axvline(mc_results['mean_return'], color='r', label='Mean')
plt.axvline(mc_results['percentile_5'], color='orange', label='5th %ile')
plt.axvline(mc_results['percentile_95'], color='orange', label='95th %ile')
plt.xlabel('Total Return (%)')
plt.ylabel('Frequency')
plt.legend()
plt.show()

Interpretation:

  • Wide distribution = high uncertainty
  • Negative 5th percentile = risk of significant loss
  • Compare to backtest: if backtest >> mean, may have gotten lucky
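The resampling idea can be sketched without the simulator class. The version below draws trades with replacement (a bootstrap) and summarizes the resulting total returns; the function names, the nearest-rank percentile, and the eight sample P&L values are assumptions for illustration and may not match how MonteCarloSimulator works.

```python
import math
import random
import statistics


def bootstrap_returns(pnls, initial_capital, num_simulations=1000, seed=0):
    """Resample trades with replacement and return total returns (%) per run."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_simulations):
        sample = rng.choices(pnls, k=len(pnls))     # sampling WITH replacement
        results.append(sum(sample) / initial_capital * 100)
    return results


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


# Hypothetical per-trade P&Ls in dollars
pnls = [842.5, -456.3, 1200.0, -300.0, 650.0, -900.0, 400.0, 150.0]
sims = sorted(bootstrap_returns(pnls, 100_000))
mean_return = statistics.mean(sims)
p5, p95 = percentile(sims, 5), percentile(sims, 95)
prob_loss = sum(r < 0 for r in sims) / len(sims) * 100
print(f"mean={mean_return:.2f}%  5th={p5:.2f}%  95th={p95:.2f}%  P(loss)={prob_loss:.1f}%")
```

Resampling individual trades treats them as independent, so streaks of wins or losses that exist in the real trade sequence are not preserved.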

Visualization

Equity Curve

import matplotlib.pyplot as plt
import pandas as pd

# Get equity curve from backtest
equity_curve = results['equity_curve']

# Convert to pandas for easy plotting
df = pd.DataFrame(equity_curve)
df['timestamp'] = pd.to_datetime(df['timestamp'])
df.set_index('timestamp', inplace=True)

# Plot
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

# Equity curve
ax1.plot(df.index, df['equity'], label='Strategy', linewidth=2)
ax1.axhline(y=100000, color='gray', linestyle='--', label='Initial Capital')
ax1.set_ylabel('Equity ($)')
ax1.set_title('Equity Curve')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Drawdown
drawdowns = (df['equity'] / df['equity'].cummax() - 1) * 100
ax2.fill_between(df.index, drawdowns, 0, color='red', alpha=0.3)
ax2.set_ylabel('Drawdown (%)')
ax2.set_xlabel('Date')
ax2.set_title('Drawdown')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Monthly Returns Heatmap

import seaborn as sns

# Calculate monthly returns
df['returns'] = df['equity'].pct_change()
# 'M' groups by month end; pandas 2.2+ prefers the alias 'ME'
monthly_returns = df['returns'].resample('M').apply(lambda x: (1 + x).prod() - 1) * 100

# Pivot for heatmap
monthly_pivot = monthly_returns.to_frame()
monthly_pivot['year'] = monthly_pivot.index.year
monthly_pivot['month'] = monthly_pivot.index.month
heatmap_data = monthly_pivot.pivot(index='year', columns='month', values='returns')

# Plot
plt.figure(figsize=(12, 6))
sns.heatmap(
    heatmap_data,
    annot=True,
    fmt='.1f',
    cmap='RdYlGn',
    center=0,
    cbar_kws={'label': 'Return (%)'}
)
plt.title('Monthly Returns Heatmap')
plt.xlabel('Month')
plt.ylabel('Year')
plt.show()

Trade Analysis

# Get trade details
trades = pd.DataFrame(results['trades'])

# Plot trade distribution
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Trade P&L distribution
axes[0, 0].hist(trades['pnl'], bins=30, edgecolor='black')
axes[0, 0].axvline(x=0, color='red', linestyle='--')
axes[0, 0].set_xlabel('P&L ($)')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].set_title('Trade P&L Distribution')

# Cumulative P&L
axes[0, 1].plot(trades['exit_date'], trades['pnl'].cumsum())
axes[0, 1].set_xlabel('Date')
axes[0, 1].set_ylabel('Cumulative P&L ($)')
axes[0, 1].set_title('Cumulative P&L Over Time')
axes[0, 1].grid(True, alpha=0.3)

# Trade duration distribution
trade_durations = (trades['exit_date'] - trades['entry_date']).dt.total_seconds() / 3600
axes[1, 0].hist(trade_durations, bins=30, edgecolor='black')
axes[1, 0].set_xlabel('Duration (hours)')
axes[1, 0].set_ylabel('Frequency')
axes[1, 0].set_title('Trade Duration Distribution')

# Win/Loss by symbol
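win_loss = trades.groupby(['symbol', trades['pnl'] > 0]).size().unstack(fill_value=0)
win_loss = win_loss.reindex(columns=[False, True], fill_value=0)  # fixed Loss/Win column order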
win_loss.plot(kind='bar', ax=axes[1, 1], color=['red', 'green'])
axes[1, 1].set_xlabel('Symbol')
axes[1, 1].set_ylabel('Number of Trades')
axes[1, 1].set_title('Wins vs Losses by Symbol')
axes[1, 1].legend(['Loss', 'Win'])

plt.tight_layout()
plt.show()

Best Practices

1. Always Use Out-of-Sample Testing

# ❌ Bad - Test on same data used for development
backtest_all = Backtest(
    ...,
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31)
)

# ✅ Good - Reserve out-of-sample period
# Develop on 2023
backtest_insample = Backtest(
    ...,
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31)
)

# Test on 2024 (data the strategy has not seen)
backtest_outsample = Backtest(
    ...,
    start_date=datetime(2024, 1, 1),
    end_date=datetime(2024, 6, 30)
)

2. Test on Multiple Time Periods

# Test on different market conditions
periods = [
    ('Bull Market', datetime(2023, 1, 1), datetime(2023, 6, 30)),
    ('Bear Market', datetime(2023, 7, 1), datetime(2023, 12, 31)),
    ('Sideways', datetime(2024, 1, 1), datetime(2024, 3, 31))
]

for name, start, end in periods:
    backtest = Backtest(..., start_date=start, end_date=end)
    results = backtest.run()
    print(f"{name}: Return={results['total_return']:.2f}%, Sharpe={results['sharpe_ratio']:.2f}")

3. Require Minimum Sample Size

results = backtest.run()

# Check minimum number of trades
if results['total_trades'] < 30:
    print("WARNING: Insufficient trades for statistical significance")
    print(f"Only {results['total_trades']} trades. Need at least 30.")

4. Compare to Benchmark

# Strategy results
strategy_return = results['total_return']

# Benchmark: Buy and hold
benchmark_return = results['benchmark_return']

# Compare
excess_return = strategy_return - benchmark_return
print(f"Strategy Return: {strategy_return:.2f}%")
print(f"Benchmark Return: {benchmark_return:.2f}%")
print(f"Excess Return: {excess_return:.2f}%")

if excess_return < 0:
    print("WARNING: Strategy underperforms buy-and-hold!")

5. Document Everything

Keep a research journal:

# Strategy Research Log

## 2024-01-15: SMA Crossover Initial Test
- Strategy: SMA Crossover (50/200)
- Period: 2023-01-01 to 2023-12-31
- Capital: $100,000
- Results:
  - Total Return: 15.43%
  - Sharpe Ratio: 0.87
  - Max Drawdown: -12.54%
  - Total Trades: 47
- Observations: Works well in trending markets, struggles in sideways
- Next Steps: Test on 2024 data, optimize parameters

## 2024-01-16: Parameter Optimization
- Tested fast_period: [20, 30, 50, 100]
- Tested slow_period: [100, 150, 200, 300]
- Best params: fast=30, slow=150
- In-sample Sharpe: 2.15
- Out-sample Sharpe: 1.82 (good - no overfitting)
- Proceeding with these parameters

Common Questions

Q: What's a good Sharpe ratio?
A: Using the thresholds above: below 1.0 is poor, 1.0 - 2.0 is good, above 2.0 is excellent, and above 3.0 is exceptional enough that you should verify for errors or overfitting.

Q: How many trades do I need for statistical significance?
A: Minimum 30, preferably 50+. More is better.

Q: Should I optimize for total return or Sharpe ratio?
A: Sharpe ratio (risk-adjusted). High returns with huge drawdowns are not sustainable.

Q: What if my strategy works in backtest but fails in live trading?
A: Common causes: lookahead bias, overfitting, transaction costs underestimated, different market regime.

Q: How do I know if I'm overfitting?
A: Use walk-forward optimization. If out-sample performance << in-sample, you're overfitting.

Resources

  • Strategy Development: See docs/STRATEGY_DEVELOPMENT.md
  • Risk Management: See docs/RISK_MANAGEMENT.md
  • Architecture: See docs/BACKTESTING_ARCHITECTURE.md
  • Performance Metrics: See src/backtesting/performance.py
  • Example Notebooks: See notebooks/backtesting_examples.ipynb

Support

For questions about backtesting:

  1. Check this documentation
  2. Review example backtests in notebooks/
  3. Examine test files in tests/backtesting/
  4. Open an issue with the backtesting label