Backtesting Guide¶
This guide explains how to run backtests, interpret results, and optimize trading strategies using the DSTA backtesting engine.
Overview¶
The DSTA backtesting engine is an event-driven system that simulates realistic trading conditions on historical data. It helps you evaluate strategy performance before risking real capital.
Why Event-Driven?¶
- ✅ No Lookahead Bias: Can't accidentally use future information
- ✅ Realistic: Matches how real trading works
- ✅ Testable: Each component can be tested independently
- ✅ Production-Ready: Same code works for live trading
Quick Start¶
Running Your First Backtest¶
from backtesting.backtest import Backtest
from backtesting.strategies.sma_crossover import SMACrossoverStrategy
from datetime import datetime
# Configure backtest
backtest = Backtest(
    symbol_list=['BTCUSDT'],
    initial_capital=100000,
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31),
    data_handler='DatabaseDataHandler',
    execution_handler='SimulatedExecutionHandler',
    strategy=SMACrossoverStrategy,
    strategy_params={
        'fast_period': 50,
        'slow_period': 200
    }
)
# Run backtest
results = backtest.run()
# Display results
print(results)
Understanding the Configuration¶
Required Parameters:
- symbol_list: List of trading pairs to trade (e.g., ['BTCUSDT', 'ETHUSDT'])
- initial_capital: Starting capital in dollars (e.g., 100000)
- start_date: Backtest start date (e.g., datetime(2023, 1, 1))
- end_date: Backtest end date (e.g., datetime(2023, 12, 31))
- strategy: Strategy class to test
Optional Parameters:
- data_handler: Data source (default: 'DatabaseDataHandler')
- execution_handler: Order execution model (default: 'SimulatedExecutionHandler')
- strategy_params: Dictionary of strategy parameters
- commission: Commission per trade as percentage (default: 0.001 = 0.1%)
- slippage: Slippage per trade as percentage (default: 0.0005 = 0.05%)
Interpreting Results¶
Performance Metrics¶
The backtest returns a comprehensive set of metrics:
{
    # Returns
    'total_return': 15.43,  # Total return (%)
    'annualized_return': 15.89,  # Annualized return (%)
    'benchmark_return': 8.50,  # Buy-and-hold return (%)
    # Risk-Adjusted Metrics
    'sharpe_ratio': 1.85,  # Sharpe ratio (higher is better)
    'sortino_ratio': 2.31,  # Sortino ratio (downside deviation)
    'calmar_ratio': 1.23,  # Return / max drawdown
    # Risk Metrics
    'max_drawdown': -12.54,  # Maximum drawdown (%)
    'max_drawdown_duration': 45,  # Days to recover from max drawdown
    'volatility': 18.23,  # Annualized volatility (%)
    'downside_deviation': 12.45,  # Downside volatility (%)
    # Trade Statistics
    'total_trades': 47,  # Number of round-trip trades
    'winning_trades': 28,  # Number of winning trades
    'losing_trades': 19,  # Number of losing trades
    'win_rate': 59.57,  # Winning trades / total trades (%)
    'avg_win': 842.50,  # Average winning trade ($)
    'avg_loss': -456.30,  # Average losing trade ($)
    'largest_win': 3250.00,  # Largest winning trade ($)
    'largest_loss': -1850.00,  # Largest losing trade ($)
    'avg_trade_duration': 5.3,  # Average days in trade
    # Risk-Reward
    'profit_factor': 1.84,  # Gross profit / gross loss
    'expectancy': 317.40,  # Expected value per trade ($)
    'risk_reward_ratio': 1.85,  # Avg win / avg loss
    # Execution Quality
    'avg_slippage': 0.042,  # Average slippage (%)
    'total_commission': 1250.50,  # Total commission paid ($)
    # Equity Curve
    'equity_curve': [...],  # List of equity values over time
    'drawdown_curve': [...],  # Drawdown values over time
    # Trade Log
    'trades': [...]  # Detailed trade records
}
Key Metrics Explained¶
Total Return¶
Definition: Percentage gain/loss from start to end of backtest.
Formula: (Final Equity - Initial Capital) / Initial Capital * 100
Interpretation:
- Positive = Profitable strategy
- Compare to benchmark (buy-and-hold) to assess if strategy adds value
- Consider in context of risk (volatility, drawdown)
Example:
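A minimal sketch of the formula in Python. The final equity of $115,430 is a hypothetical figure, chosen so the result matches the sample total_return of 15.43% on $100,000 of starting capital. The helper total_return_pct is illustrative and not part of the DSTA API.

```python
def total_return_pct(initial_capital: float, final_equity: float) -> float:
    """Percentage gain or loss from start to end of the backtest."""
    return (final_equity - initial_capital) / initial_capital * 100


# Hypothetical final equity, chosen to match the sample metrics
result = total_return_pct(100_000, 115_430)
print(f"Total return: {result:.2f}%")
```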
Sharpe Ratio¶
Definition: Risk-adjusted return metric.
Formula: (Return - Risk-Free Rate) / Volatility
Interpretation:
- < 1.0: Poor risk-adjusted performance
- 1.0 - 2.0: Good performance
- > 2.0: Excellent performance
- > 3.0: Exceptional (verify for errors!)
Considerations:
- Assumes returns are normally distributed (often not true)
- Penalizes both upside and downside volatility
- Use Sortino ratio for asymmetric strategies
Example:
Annual Return: 15.89%
Risk-Free Rate: 0%
Volatility: 18.23%
Sharpe Ratio: 15.89 / 18.23 = 0.87 (below 1.0, needs improvement)
Sortino Ratio¶
Definition: Like Sharpe but only penalizes downside volatility.
Formula: (Return - Risk-Free Rate) / Downside Deviation
Interpretation:
- Better measure for strategies with asymmetric returns
- Higher values indicate better downside risk management
- Compare to Sharpe: if Sortino >> Sharpe, strategy limits losses well
Example:
Annual Return: 15.89%
Downside Deviation: 12.45%
Sortino Ratio: 15.89 / 12.45 = 1.28 (better than Sharpe of 0.87)
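The two ratios can also be computed directly from a series of per-period returns. The sketch below uses only the standard library and makes some assumptions that may differ from the engine's own implementation in src/backtesting/performance.py: the Sharpe denominator is the sample standard deviation, downside deviation is the root mean square of negative excess returns, and the return series is hypothetical. Function names are illustrative.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=1):
    """Mean excess return divided by the sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


def sortino_ratio(returns, risk_free=0.0, periods_per_year=1):
    """Mean excess return divided by downside deviation (losses only)."""
    excess = [r - risk_free for r in returns]
    downside_sq = [min(0.0, e) ** 2 for e in excess]
    downside_dev = math.sqrt(sum(downside_sq) / len(excess))
    return statistics.mean(excess) / downside_dev * math.sqrt(periods_per_year)


# Hypothetical per-period returns: several gains, two small losses
returns = [0.02, -0.01, 0.03, -0.005, 0.015, 0.01]
print(f"Sharpe:  {sharpe_ratio(returns):.2f}")
print(f"Sortino: {sortino_ratio(returns):.2f}")
```

Because the losses in this series are small relative to the gains, the Sortino value comes out well above the Sharpe value. Note that sortino_ratio divides by zero if the series has no negative excess returns, so a real implementation would need to guard that case.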
Maximum Drawdown¶
Definition: Largest peak-to-trough decline in equity.
Interpretation:
- Most realistic measure of downside risk
- Represents worst-case loss an investor would have experienced
- < 10%: Low risk
- 10-20%: Moderate risk
- 20-30%: High risk
- > 30%: Very high risk (may be unacceptable for many investors)
Example:
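A minimal sketch of the calculation on a hypothetical equity path. It tracks the running peak and records the deepest percentage decline from it, the same idea as the cummax-based drawdown used in the Visualization section. The function name and the equity values are illustrative.

```python
def max_drawdown_pct(equity):
    """Largest peak-to-trough decline, as a negative percentage."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, (value / peak - 1) * 100)
    return worst


# Hypothetical equity path: peak at 120,000, trough at 105,000
equity = [100_000, 110_000, 120_000, 108_000, 105_000, 118_000, 125_000]
print(f"Max drawdown: {max_drawdown_pct(equity):.2f}%")
```

Here the decline from 120,000 to 105,000 is a 12.5% drawdown, which would fall in the moderate-risk band.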
Win Rate¶
Definition: Percentage of profitable trades.
Formula: Winning Trades / Total Trades * 100
Interpretation:
- Not the most important metric!
- Can be profitable with low win rate if avg_win >> avg_loss
- Can be unprofitable with high win rate if avg_win << avg_loss
- Many profitable strategies have win rates in the 40-60% range
Example:
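In the sample metrics, 28 winning trades out of 47 gives 28 / 47 * 100 = 59.57%. The sketch below illustrates why win rate alone says little about profitability, using two hypothetical sets of trade P&Ls (in dollars). The helper name is illustrative.

```python
def win_rate_pct(pnls):
    """Share of trades with positive P&L, in percent."""
    wins = sum(1 for p in pnls if p > 0)
    return wins / len(pnls) * 100


# Hypothetical strategy A: wins rarely, but wins are large
low_win_rate = [900, -200, -200, 1000, -200, -200, -200, -200, -200, -200]
# Hypothetical strategy B: wins often, but one loss wipes out the gains
high_win_rate = [100, 100, 100, 100, 100, 100, 100, 100, -1000, 100]

for name, pnls in [("A", low_win_rate), ("B", high_win_rate)]:
    print(f"{name}: win rate {win_rate_pct(pnls):.0f}%, net P&L {sum(pnls)}")
```

Strategy A wins only 2 trades in 10 and still ends with a net gain, while strategy B wins 9 in 10 and ends with a net loss.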
Profit Factor¶
Definition: Ratio of gross profit to gross loss.
Formula: Sum(Winning Trades) / Abs(Sum(Losing Trades))
Interpretation:
- < 1.0: Losing strategy (gross losses exceed gross profits)
- 1.0 - 1.5: Marginally profitable
- 1.5 - 2.0: Good profitability
- > 2.0: Excellent (verify for overfitting!)
Example:
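A minimal sketch of the formula on hypothetical trade P&Ls. The function name is illustrative, and returning infinity when there are no losing trades is one possible convention, not necessarily the one the engine uses.

```python
def profit_factor(pnls):
    """Gross profit divided by the absolute value of gross loss."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = abs(sum(p for p in pnls if p < 0))
    if gross_loss == 0:
        return float("inf")  # no losing trades in the sample
    return gross_profit / gross_loss


# Hypothetical trade P&Ls in dollars
pnls = [500, -300, 800, -200, 400, -500]
print(f"Profit factor: {profit_factor(pnls):.2f}")
```

Gross profit is $1,700 and gross loss is $1,000, so the profit factor is 1.70, which sits in the "good profitability" band.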
Expectancy¶
Definition: Expected profit per trade.
Formula: (Win Rate * Avg Win) - (Loss Rate * Abs(Avg Loss))
Interpretation:
- Positive = profitable strategy on average
- Higher is better
- Multiply by expected trades per year for annual expectation
Example:
Win Rate: 59.57%
Avg Win: $842.50
Loss Rate: 40.43%
Avg Loss: $456.30
Expectancy: (0.5957 * 842.50) - (0.4043 * 456.30) = 501.88 - 184.48 ≈ $317.40
Common Pitfalls and How to Avoid Them¶
1. Lookahead Bias¶
What: Using information not available at the time of trading.
Examples:
# ❌ Bad - Looks at future data
def calculate_signals(self, event):
    all_data = self.bars.get_all_bars(symbol)
    tomorrow_high = all_data[-1]['high']  # This is tomorrow!
# ✅ Good - Only past data
def calculate_signals(self, event):
    historical = self.bars.get_latest_bars(symbol, N=20)
    current = self.bars.get_latest_bar(symbol)
Prevention:
- Use event-driven architecture (DSTA does this)
- Only access data via get_latest_bar() or get_latest_bars()
- Never use future-peeking functions like shift(-1) in pandas
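The idea behind restricting access to get_latest_bars() can be sketched with a small bar feed that releases one bar at a time. The class below is a hypothetical illustration, not DSTA's actual data handler: it keeps the full history internally but only ever returns bars up to the current step, so strategy code has no way to read ahead.

```python
class PointInTimeBars:
    """Hypothetical bar feed that only exposes bars up to the current step."""

    def __init__(self, bars):
        self._bars = list(bars)
        self._cursor = 0  # number of bars released so far

    def advance(self):
        """Release the next bar; return False when history is exhausted."""
        if self._cursor >= len(self._bars):
            return False
        self._cursor += 1
        return True

    def get_latest_bars(self, n=1):
        """Return up to n most recent released bars, oldest first."""
        visible = self._bars[: self._cursor]
        return visible[-n:]


closes = [100, 102, 101, 105, 107]
feed = PointInTimeBars(closes)
while feed.advance():
    window = feed.get_latest_bars(n=3)
    # The strategy can only average what it has seen so far
    print(f"seen={window}, sma={sum(window) / len(window):.2f}")
```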
2. Survivorship Bias¶
What: Only testing on assets that still exist today.
Impact: Overestimates performance by excluding failed assets.
Example:
- Testing crypto strategies only on current top 100 coins
- Ignores coins that went to zero or got delisted
Prevention:
- Include delisted/defunct assets in backtest universe
- Use point-in-time data (what was top 100 then, not now)
- Test on multiple market conditions (bull, bear, sideways)
3. Overfitting¶
What: Optimizing strategy parameters to perfectly fit historical data.
Signs:
- Sharpe ratio > 3.0
- Win rate > 70%
- Works great in-sample, fails out-of-sample
- Many complex conditions/parameters
Example:
# ❌ Overfitted - Too many parameters
if (rsi > 31.47 and rsi < 31.53 and
        sma_10 > sma_11 * 1.0023 and
        volume > prev_volume * 1.347 and
        hour_of_day in [10, 14, 15]):
    buy()
Prevention:
- Use walk-forward optimization (see Optimization section)
- Test on out-of-sample data
- Keep strategies simple
- Limit number of parameters (< 5 recommended)
- Use cross-validation
4. Data Mining Bias¶
What: Testing many strategies/parameters, only reporting winners.
Impact: The "winning" strategy likely got lucky.
Example:
- Test 100 different strategies
- 5 appear profitable by random chance
- Report only those 5
Prevention:
- Document all tests performed
- Use Bonferroni correction for multiple tests
- Require minimum number of trades (> 30 recommended)
- Test on different time periods
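The arithmetic behind the example and the Bonferroni correction is short enough to sketch. Assuming each strategy is judged at a 5% significance level, 100 tests of strategies with no real edge are expected to produce about 5 false winners, and the Bonferroni correction divides the threshold by the number of tests. The function names and the 5% level are illustrative assumptions.

```python
def bonferroni_alpha(alpha, num_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / num_tests


def expected_false_positives(alpha, num_tests):
    """Expected number of 'winners' if no strategy has a real edge."""
    return alpha * num_tests


alpha, num_tests = 0.05, 100
print(f"Expected lucky winners: {expected_false_positives(alpha, num_tests):.0f}")
print(f"Corrected threshold:    {bonferroni_alpha(alpha, num_tests):.4f}")
```

With the corrected threshold, a strategy has to clear a p-value of 0.0005 rather than 0.05 before it counts as significant.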
5. Ignoring Transaction Costs¶
What: Not accounting for commissions, slippage, spread.
Impact: Drastically overstates profitability, especially for high-frequency strategies.
Example:
# ❌ Bad - No costs
backtest = Backtest(
    ...,
    commission=0.0,
    slippage=0.0
)
# ✅ Good - Realistic costs
backtest = Backtest(
    ...,
    commission=0.001,  # 0.1% per trade
    slippage=0.0005  # 0.05% slippage
)
Realistic Values:
- Commission: 0.1% - 0.2% per trade (maker/taker fees)
- Slippage: 0.05% - 0.2% (depends on liquidity, trade size)
- Spread: Use actual bid-ask spread from order book data
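To see why costs matter more as trade count rises, here is a rough sketch using the default commission and slippage values. It assumes each round trip pays both costs twice (entry and exit) on a position roughly equal to account equity and ignores compounding, so it is an approximation rather than how the engine accounts for costs. The function name and the 20% gross return are hypothetical.

```python
def net_return_pct(gross_return_pct, num_round_trips, commission=0.001, slippage=0.0005):
    """Approximate net return after per-side commission and slippage.

    Assumes each round trip pays costs twice (entry and exit) on a
    position roughly equal to account equity, and ignores compounding.
    """
    cost_per_round_trip_pct = 2 * (commission + slippage) * 100
    return gross_return_pct - num_round_trips * cost_per_round_trip_pct


# Hypothetical strategies with the same 20% gross return
for trades in (10, 50, 200):
    print(f"{trades:>3} round trips -> net {net_return_pct(20.0, trades):.1f}%")
```

At 0.3% per round trip, the same gross return shrinks to 17% with 10 round trips, 5% with 50, and turns into a 40% loss with 200.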
6. Unrealistic Position Sizing¶
What: Position sizes too large for available liquidity.
Example:
- Backtesting $1M positions in low-liquidity altcoins
- Wouldn't be able to fill those orders in reality
Prevention:
- Consider market depth and daily volume
- Limit position size to % of daily volume (e.g., < 1%)
- Include slippage that increases with position size
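A minimal sketch of the last two points. The volume cap follows the 1% guideline directly. The slippage model, a base rate plus a penalty proportional to the order's share of daily volume, is a hypothetical simplification chosen for illustration, and the impact coefficient of 0.1 is an arbitrary assumption rather than a calibrated value.

```python
def cap_position(desired_usd, daily_volume_usd, max_volume_fraction=0.01):
    """Limit an order to a fraction of the asset's daily traded volume."""
    return min(desired_usd, daily_volume_usd * max_volume_fraction)


def size_scaled_slippage(order_usd, daily_volume_usd, base=0.0005, impact=0.1):
    """Hypothetical slippage model: base rate plus a linear size penalty."""
    return base + impact * (order_usd / daily_volume_usd)


# Hypothetical altcoin with $5M of daily volume
order = cap_position(1_000_000, 5_000_000)
print(f"Capped order: ${order:,.0f}")
print(f"Slippage uncapped: {size_scaled_slippage(1_000_000, 5_000_000):.2%}")
print(f"Slippage capped:   {size_scaled_slippage(order, 5_000_000):.2%}")
```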
Optimization Techniques¶
1. Grid Search Optimization¶
When to Use: Systematic exploration of parameter space.
How it Works: Test all combinations of parameters.
Example:
from backtesting.optimization import GridSearchOptimizer
# Define parameter grid
param_grid = {
    'fast_period': [10, 20, 30, 50],
    'slow_period': [50, 100, 150, 200],
    'rsi_oversold': [20, 25, 30, 35]
}
# Run optimization
optimizer = GridSearchOptimizer(
    strategy=SMACrossoverStrategy,
    param_grid=param_grid,
    symbol_list=['BTCUSDT'],
    initial_capital=100000,
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31),
    metric='sharpe_ratio'  # Optimize for Sharpe ratio
)
results = optimizer.optimize()
# Best parameters
print(f"Best params: {results['best_params']}")
print(f"Best Sharpe: {results['best_score']:.2f}")
# All results
for result in results['all_results']:
    print(f"Params: {result['params']}, Sharpe: {result['sharpe_ratio']:.2f}")
Pros:
- Systematic and complete
- Easy to implement
- No randomness
Cons:
- Slow for many parameters (combinatorial explosion)
- Can overfit if not validated properly
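To make the combinatorial cost concrete, the sketch below enumerates the example grid with itertools.product. This is only an illustration of how the search space grows, not how GridSearchOptimizer is implemented. The filter that drops combinations where the fast period is not shorter than the slow period is an added assumption about what a sensible crossover setup looks like.

```python
from itertools import product

param_grid = {
    "fast_period": [10, 20, 30, 50],
    "slow_period": [50, 100, 150, 200],
    "rsi_oversold": [20, 25, 30, 35],
}

names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

# Drop combinations where the fast average is not faster than the slow one
valid = [c for c in combinations if c["fast_period"] < c["slow_period"]]

print(f"{len(combinations)} combinations, {len(valid)} valid")
```

Three parameters with four values each already require 64 backtests. A fourth parameter with four values would raise that to 256.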
2. Walk-Forward Optimization¶
When to Use: To avoid overfitting and validate strategy robustness.
How it Works:
1. Split data into in-sample and out-of-sample periods
2. Optimize on in-sample
3. Test on out-of-sample
4. Roll forward and repeat
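The rolling split in these steps can be sketched as a generator of date windows. This is an illustration of the windowing logic only, under the assumption that each out-of-sample period starts where its in-sample period ends and that incomplete trailing windows are discarded. The function name and tuple layout are hypothetical, not the internals of WalkForwardOptimizer.

```python
from datetime import datetime, timedelta


def walk_forward_windows(start, end, in_sample_days, out_sample_days, step_days):
    """Yield (in_start, in_end, out_end) tuples; out-of-sample starts at in_end."""
    in_start = start
    while True:
        in_end = in_start + timedelta(days=in_sample_days)
        out_end = in_end + timedelta(days=out_sample_days)
        if out_end > end:
            break  # not enough data left for a full out-of-sample period
        yield in_start, in_end, out_end
        in_start += timedelta(days=step_days)


windows = list(walk_forward_windows(
    datetime(2023, 1, 1), datetime(2023, 12, 31),
    in_sample_days=180, out_sample_days=60, step_days=30,
))
for in_start, in_end, out_end in windows:
    print(f"optimize {in_start:%Y-%m-%d}..{in_end:%Y-%m-%d}, "
          f"test {in_end:%Y-%m-%d}..{out_end:%Y-%m-%d}")
```

With 180 days in-sample, 60 days out-of-sample, and a 30-day step, one year of data yields five complete windows.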
Example:
from backtesting.optimization import WalkForwardOptimizer
optimizer = WalkForwardOptimizer(
    strategy=SMACrossoverStrategy,
    param_grid=param_grid,
    symbol_list=['BTCUSDT'],
    initial_capital=100000,
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31),
    in_sample_period=180,  # 6 months optimization
    out_sample_period=60,  # 2 months testing
    step_size=30  # Roll forward by 1 month
)
results = optimizer.optimize()
# Results by period
for period_result in results['periods']:
    print(f"Period: {period_result['out_sample_start']} to {period_result['out_sample_end']}")
    print(f" In-sample Sharpe: {period_result['in_sample_sharpe']:.2f}")
    print(f" Out-sample Sharpe: {period_result['out_sample_sharpe']:.2f}")
    print(f" Best params: {period_result['best_params']}")
# Aggregate statistics
print(f"\nAverage out-sample Sharpe: {results['avg_out_sample_sharpe']:.2f}")
Advantages:
- Realistic assessment of future performance
- Detects overfitting
- Shows parameter stability over time
Interpretation:
- Good: Out-sample Sharpe ≥ 80% of in-sample Sharpe
- Warning: Out-sample Sharpe < 50% of in-sample (likely overfit)
- Excellent: Parameters stable across periods
3. Genetic Algorithm Optimization¶
When to Use: Large parameter space, many parameters.
How it Works: Evolutionary approach - best performers "breed" to create new parameter combinations.
Example:
from backtesting.optimization import GeneticOptimizer
optimizer = GeneticOptimizer(
    strategy=SMACrossoverStrategy,
    param_ranges={
        'fast_period': (10, 100),  # Min, max
        'slow_period': (50, 300),
        'rsi_oversold': (20, 40)
    },
    symbol_list=['BTCUSDT'],
    initial_capital=100000,
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31),
    population_size=50,  # Number of combinations per generation
    generations=20,  # Number of iterations
    mutation_rate=0.1,  # Probability of random change
    metric='sharpe_ratio'
)
results = optimizer.optimize()
# Best individual
print(f"Best params: {results['best_params']}")
print(f"Best Sharpe: {results['best_score']:.2f}")
# Evolution history
import matplotlib.pyplot as plt
plt.plot(results['generation_best_scores'])
plt.xlabel('Generation')
plt.ylabel('Best Sharpe Ratio')
plt.title('Optimization Progress')
plt.show()
Pros:
- Handles large parameter spaces
- Can find global optima (though this is not guaranteed)
- Faster than grid search for many parameters
Cons:
- Non-deterministic (different results each run)
- Requires tuning (population size, mutation rate)
- Can still overfit
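The "breeding" idea can be sketched in a few dozen lines. The example below is a toy illustration, not the GeneticOptimizer implementation: the fitness function is a made-up stand-in for a backtest score that peaks at fast=30 and slow=150, selection keeps the top 20% of each generation, and crossover picks each parameter from one of two parents at random. Fixing the random seed is one way to work around the non-determinism noted above.

```python
import random


def fitness(params):
    """Toy stand-in for a backtest score; peaks at fast=30, slow=150."""
    return -((params["fast"] - 30) ** 2) - ((params["slow"] - 150) ** 2) / 4


def random_individual(ranges, rng):
    return {name: rng.randint(lo, hi) for name, (lo, hi) in ranges.items()}


def breed(parent_a, parent_b, ranges, mutation_rate, rng):
    """Uniform crossover, then occasional random mutation within range."""
    child = {}
    for name, (lo, hi) in ranges.items():
        child[name] = rng.choice([parent_a[name], parent_b[name]])
        if rng.random() < mutation_rate:
            child[name] = rng.randint(lo, hi)
    return child


def evolve(ranges, population_size=50, generations=20, mutation_rate=0.1, seed=42):
    rng = random.Random(seed)  # fixed seed makes the run repeatable
    population = [random_individual(ranges, rng) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: population_size // 5]  # keep the top 20%
        children = [
            breed(rng.choice(parents), rng.choice(parents), ranges, mutation_rate, rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children  # elitism: parents survive unchanged
    best = max(population, key=fitness)
    return best, history


ranges = {"fast": (10, 100), "slow": (50, 300)}
best, history = evolve(ranges)
print(f"Best params: {best}, score: {fitness(best):.1f}")
```

Because the best individuals are carried over unchanged, the best score per generation never decreases. In a real run, each fitness call would be a full backtest, so 50 individuals over 20 generations means up to 1,000 backtests.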
4. Monte Carlo Simulation¶
When to Use: Assess strategy robustness and expected range of outcomes.
How it Works: Randomly shuffle trade order or returns to generate distribution of possible outcomes.
Example:
from backtesting.monte_carlo import MonteCarloSimulator
# Run backtest first
backtest_results = backtest.run()
# Monte Carlo simulation
mc = MonteCarloSimulator(
    trades=backtest_results['trades'],
    initial_capital=100000,
    num_simulations=1000
)
mc_results = mc.run()
# Statistics
print(f"Expected Return: {mc_results['mean_return']:.2f}%")
print(f"Std Dev Return: {mc_results['std_return']:.2f}%")
print(f"5th Percentile Return: {mc_results['percentile_5']:.2f}%")
print(f"95th Percentile Return: {mc_results['percentile_95']:.2f}%")
print(f"Probability of Loss: {mc_results['prob_loss']:.2f}%")
# Plot distribution
import matplotlib.pyplot as plt
plt.hist(mc_results['all_returns'], bins=50)
plt.axvline(mc_results['mean_return'], color='r', label='Mean')
plt.axvline(mc_results['percentile_5'], color='orange', label='5th %ile')
plt.axvline(mc_results['percentile_95'], color='orange', label='95th %ile')
plt.xlabel('Total Return (%)')
plt.ylabel('Frequency')
plt.legend()
plt.show()
Interpretation:
- Wide distribution = high uncertainty
- Negative 5th percentile = risk of significant loss
- Compare to backtest: if backtest >> mean, may have gotten lucky
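For intuition about where such a distribution comes from, here is a standard-library sketch that resamples trades with replacement (a bootstrap). This is an assumption about one reasonable approach, not a description of MonteCarloSimulator. Resampling is used instead of a pure shuffle because reordering a fixed set of dollar P&Ls changes the path and drawdown but not the final total. The trade values and helper names are hypothetical.

```python
import random


def bootstrap_returns(trade_pnls, initial_capital, num_simulations=1000, seed=0):
    """Resample trades with replacement and return sorted total returns (%)."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(num_simulations):
        sample = rng.choices(trade_pnls, k=len(trade_pnls))
        outcomes.append(sum(sample) / initial_capital * 100)
    return sorted(outcomes)


def percentile(sorted_values, pct):
    """Simple percentile of an already sorted list, by rounded index."""
    index = round(pct / 100 * (len(sorted_values) - 1))
    return sorted_values[index]


# Hypothetical trade P&Ls in dollars
trade_pnls = [850, -450, 1200, -300, 600, -700, 400, 950, -500, 300]
returns = bootstrap_returns(trade_pnls, initial_capital=100_000)

mean_return = sum(returns) / len(returns)
prob_loss = sum(1 for r in returns if r < 0) / len(returns) * 100
print(f"Mean return:     {mean_return:.2f}%")
print(f"5th percentile:  {percentile(returns, 5):.2f}%")
print(f"95th percentile: {percentile(returns, 95):.2f}%")
print(f"Prob. of loss:   {prob_loss:.1f}%")
```

The original sequence of these ten trades returns 2.35%. The simulated mean should land close to that figure, and the spread between the 5th and 95th percentiles shows how much of the result depends on which trades happened to occur.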
Visualization¶
Equity Curve¶
import matplotlib.pyplot as plt
import pandas as pd
# Get equity curve from backtest
equity_curve = results['equity_curve']
# Convert to pandas for easy plotting
df = pd.DataFrame(equity_curve)
df['timestamp'] = pd.to_datetime(df['timestamp'])
df.set_index('timestamp', inplace=True)
# Plot
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8), sharex=True)
# Equity curve
ax1.plot(df.index, df['equity'], label='Strategy', linewidth=2)
ax1.axhline(y=100000, color='gray', linestyle='--', label='Initial Capital')
ax1.set_ylabel('Equity ($)')
ax1.set_title('Equity Curve')
ax1.legend()
ax1.grid(True, alpha=0.3)
# Drawdown
drawdowns = (df['equity'] / df['equity'].cummax() - 1) * 100
ax2.fill_between(df.index, drawdowns, 0, color='red', alpha=0.3)
ax2.set_ylabel('Drawdown (%)')
ax2.set_xlabel('Date')
ax2.set_title('Drawdown')
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Monthly Returns Heatmap¶
import seaborn as sns
# Calculate monthly returns
df['returns'] = df['equity'].pct_change()
monthly_returns = df['returns'].resample('M').apply(lambda x: (1 + x).prod() - 1) * 100
# Pivot for heatmap
monthly_pivot = monthly_returns.to_frame()
monthly_pivot['year'] = monthly_pivot.index.year
monthly_pivot['month'] = monthly_pivot.index.month
heatmap_data = monthly_pivot.pivot(index='year', columns='month', values='returns')
# Plot
plt.figure(figsize=(12, 6))
sns.heatmap(
    heatmap_data,
    annot=True,
    fmt='.1f',
    cmap='RdYlGn',
    center=0,
    cbar_kws={'label': 'Return (%)'}
)
plt.title('Monthly Returns Heatmap')
plt.xlabel('Month')
plt.ylabel('Year')
plt.show()
Trade Analysis¶
# Get trade details
trades = pd.DataFrame(results['trades'])
# Plot trade distribution
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Trade P&L distribution
axes[0, 0].hist(trades['pnl'], bins=30, edgecolor='black')
axes[0, 0].axvline(x=0, color='red', linestyle='--')
axes[0, 0].set_xlabel('P&L ($)')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].set_title('Trade P&L Distribution')
# Cumulative P&L
axes[0, 1].plot(trades['exit_date'], trades['pnl'].cumsum())
axes[0, 1].set_xlabel('Date')
axes[0, 1].set_ylabel('Cumulative P&L ($)')
axes[0, 1].set_title('Cumulative P&L Over Time')
axes[0, 1].grid(True, alpha=0.3)
# Trade duration distribution
trade_durations = (trades['exit_date'] - trades['entry_date']).dt.total_seconds() / 3600
axes[1, 0].hist(trade_durations, bins=30, edgecolor='black')
axes[1, 0].set_xlabel('Duration (hours)')
axes[1, 0].set_ylabel('Frequency')
axes[1, 0].set_title('Trade Duration Distribution')
# Win/Loss by symbol
win_loss = trades.groupby(['symbol', trades['pnl'] > 0]).size().unstack(fill_value=0)
win_loss.plot(kind='bar', ax=axes[1, 1], color=['red', 'green'])
axes[1, 1].set_xlabel('Symbol')
axes[1, 1].set_ylabel('Number of Trades')
axes[1, 1].set_title('Wins vs Losses by Symbol')
axes[1, 1].legend(['Loss', 'Win'])
plt.tight_layout()
plt.show()
Best Practices¶
1. Always Use Out-of-Sample Testing¶
# ❌ Bad - Test on same data used for development
backtest_all = Backtest(
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31),
    # ... remaining parameters
)
# ✅ Good - Reserve out-of-sample period
# Develop on 2023
backtest_insample = Backtest(
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 12, 31),
    # ... remaining parameters
)
# Test on 2024 (haven't seen this data)
backtest_outsample = Backtest(
    start_date=datetime(2024, 1, 1),
    end_date=datetime(2024, 6, 30),
    # ... remaining parameters
)
2. Test on Multiple Time Periods¶
# Test on different market conditions
periods = [
    ('Bull Market', datetime(2023, 1, 1), datetime(2023, 6, 30)),
    ('Bear Market', datetime(2023, 7, 1), datetime(2023, 12, 31)),
    ('Sideways', datetime(2024, 1, 1), datetime(2024, 3, 31))
]
for name, start, end in periods:
    backtest = Backtest(
        start_date=start,
        end_date=end,
        # ... remaining parameters
    )
    results = backtest.run()
    print(f"{name}: Return={results['total_return']:.2f}%, Sharpe={results['sharpe_ratio']:.2f}")
3. Require Minimum Sample Size¶
results = backtest.run()
# Check minimum number of trades
if results['total_trades'] < 30:
    print("WARNING: Insufficient trades for statistical significance")
    print(f"Only {results['total_trades']} trades. Need at least 30.")
4. Compare to Benchmark¶
# Strategy results
strategy_return = results['total_return']
# Benchmark: Buy and hold
benchmark_return = results['benchmark_return']
# Compare
excess_return = strategy_return - benchmark_return
print(f"Strategy Return: {strategy_return:.2f}%")
print(f"Benchmark Return: {benchmark_return:.2f}%")
print(f"Excess Return: {excess_return:.2f}%")
if excess_return < 0:
    print("WARNING: Strategy underperforms buy-and-hold!")
5. Document Everything¶
Keep a research journal:
# Strategy Research Log
## 2024-01-15: SMA Crossover Initial Test
- Strategy: SMA Crossover (50/200)
- Period: 2023-01-01 to 2023-12-31
- Capital: $100,000
- Results:
  - Total Return: 15.43%
  - Sharpe Ratio: 1.85
  - Max Drawdown: -12.54%
  - Total Trades: 47
- Observations: Works well in trending markets, struggles in sideways
- Next Steps: Test on 2024 data, optimize parameters
## 2024-01-16: Parameter Optimization
- Tested fast_period: [20, 30, 50, 100]
- Tested slow_period: [100, 150, 200, 300]
- Best params: fast=30, slow=150
- In-sample Sharpe: 2.15
- Out-sample Sharpe: 1.82 (good - no overfitting)
- Proceeding with these parameters
Common Questions¶
Q: What's a good Sharpe ratio?
A: 1.0 - 2.0 is good, > 2.0 is excellent, and > 3.0 is exceptional (but verify for errors and overfitting). Below 1.0 indicates poor risk-adjusted performance.
Q: How many trades do I need for statistical significance?
A: Minimum 30, preferably 50+. More is better.
Q: Should I optimize for total return or Sharpe ratio?
A: Sharpe ratio (risk-adjusted). High returns with huge drawdowns are not sustainable.
Q: What if my strategy works in backtest but fails in live trading?
A: Common causes: lookahead bias, overfitting, transaction costs underestimated, different market regime.
Q: How do I know if I'm overfitting?
A: Use walk-forward optimization. If out-sample performance << in-sample, you're overfitting.
Resources¶
- Strategy Development: See docs/STRATEGY_DEVELOPMENT.md
- Risk Management: See docs/RISK_MANAGEMENT.md
- Architecture: See docs/BACKTESTING_ARCHITECTURE.md
- Performance Metrics: See src/backtesting/performance.py
- Example Notebooks: See notebooks/backtesting_examples.ipynb
Support¶
For questions about backtesting:
- Check this documentation
- Review example backtests in notebooks/
- Examine test files in tests/backtesting/
- Open an issue with the backtesting label