ML Features Documentation¶
Overview¶
The ML Features module provides feature engineering for cryptocurrency trading data. It generates 50+ features spanning technical indicators, price patterns, volume features, and time-based features, with optional order book features.
Quick Start¶
from ml.features import FeatureEngineer
import pandas as pd
# Load your OHLCV data
df = pd.read_csv('ohlcv_data.csv')
# Initialize feature engineer
fe = FeatureEngineer(normalize=True, handle_missing='ffill')
# Generate features
features_df = fe.fit_transform(df)
print(f"Generated {len(fe.get_feature_names())} features")
Feature Categories¶
1. Technical Indicators (30+ features)¶
Moving Averages:
- Simple Moving Average (SMA): 5, 10, 20, 50, 100, 200 periods
- Exponential Moving Average (EMA): 5, 10, 20, 50, 100, 200 periods
- Price to MA ratios
- MA crossovers
Momentum Indicators:
- RSI (Relative Strength Index): 7, 14, 21 periods
- ROC (Rate of Change): 5, 10, 20 periods
- Stochastic Oscillator (K and D)
- Williams %R
Trend Indicators:
- MACD (Moving Average Convergence Divergence)
- ADX (Average Directional Index)
- CCI (Commodity Channel Index)
Volatility Indicators:
- Bollinger Bands (upper, middle, lower, width, position)
- ATR (Average True Range): 14, 21 periods
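To make the indicator list above more concrete, here is a minimal pandas sketch of three of them. It is not the module's implementation: the function names sma, ema, and rsi are hypothetical, and the RSI uses Wilder-style exponential smoothing, which is one common variant among several.

```python
import pandas as pd

def sma(close: pd.Series, period: int) -> pd.Series:
    """Simple moving average over a fixed window (NaN until the window fills)."""
    return close.rolling(window=period).mean()

def ema(close: pd.Series, period: int) -> pd.Series:
    """Exponential moving average with span equal to the period."""
    return close.ewm(span=period, adjust=False).mean()

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI with Wilder-style smoothing (alpha = 1 / period) of gains and losses."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)      # positive moves, zero otherwise
    loss = -delta.clip(upper=0.0)     # magnitude of negative moves
    avg_gain = gain.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

# Toy closing prices, for illustration only
close = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 13.0, 15.0])
sma_3 = sma(close, 3)
ema_3 = ema(close, 3)
rsi_3 = rsi(close, 3)
```

Every rolling indicator produces NaN values during its warm-up period, which is why the handle_missing option matters downstream.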
2. Price Patterns (10+ features)¶
Candlestick Patterns:
- Doji
- Hammer/Hanging Man
- Shooting Star
- Bullish/Bearish Engulfing
Chart Patterns:
- Higher highs/Lower lows
- Consecutive up/down moves
- Gap detection
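Pattern definitions vary between sources, so the following is only a sketch of how such flags might be derived from OHLC columns. The function name add_candle_patterns, the output column names, and the 10% doji threshold are all assumptions made for illustration, not the module's actual rules.

```python
import pandas as pd

def add_candle_patterns(df: pd.DataFrame, doji_ratio: float = 0.1) -> pd.DataFrame:
    """Flag a few patterns from open/high/low/close columns (illustrative rules)."""
    out = df.copy()
    body = (out["close"] - out["open"]).abs()
    candle_range = out["high"] - out["low"]
    # Doji: the body is small relative to the full range; zero-range candles are skipped
    out["is_doji"] = (candle_range > 0) & (body <= doji_ratio * candle_range)
    prev_open = out["open"].shift(1)
    prev_close = out["close"].shift(1)
    # Bullish engulfing: bearish candle followed by a bullish body that covers it
    out["bullish_engulfing"] = (
        (prev_close < prev_open)
        & (out["close"] > out["open"])
        & (out["open"] <= prev_close)
        & (out["close"] >= prev_open)
    )
    # Gap up: the current low is above the previous high
    out["gap_up"] = out["low"] > out["high"].shift(1)
    return out

# Toy candles, for illustration only
candles = pd.DataFrame({
    "open":  [10.0, 9.0, 10.0, 12.5],
    "high":  [10.5, 10.6, 10.6, 13.0],
    "low":   [8.8, 8.9, 9.4, 12.4],
    "close": [9.0, 10.2, 10.01, 12.9],
})
patterns = add_candle_patterns(candles)
```

Comparisons against the shifted (previous) candle evaluate to False on the first row, so no pattern that needs history is flagged there.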
3. Volume Features (8+ features)¶
- Volume moving averages
- Volume ratios
- OBV (On-Balance Volume)
- VWAP (Volume-Weighted Average Price)
- Volume momentum
- MFI (Money Flow Index)
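As a rough illustration of two of the volume features listed above, the sketch below computes OBV and a cumulative VWAP with pandas. The function names are hypothetical, and VWAP is accumulated over the whole series here, whereas a production version might reset per session or use a rolling window.

```python
import numpy as np
import pandas as pd

def on_balance_volume(close: pd.Series, volume: pd.Series) -> pd.Series:
    """Add volume on up-closes, subtract it on down-closes, ignore unchanged closes."""
    direction = np.sign(close.diff()).fillna(0.0)
    return (direction * volume).cumsum()

def cumulative_vwap(high: pd.Series, low: pd.Series,
                    close: pd.Series, volume: pd.Series) -> pd.Series:
    """Volume-weighted average of the typical price, accumulated from the first row."""
    typical = (high + low + close) / 3.0
    return (typical * volume).cumsum() / volume.cumsum()

# Toy bars, for illustration only
bars = pd.DataFrame({
    "high":   [10.5, 11.5, 11.0, 11.0, 12.5],
    "low":    [9.5, 10.5, 10.0, 10.0, 11.5],
    "close":  [10.0, 11.0, 10.5, 10.5, 12.0],
    "volume": [100, 200, 150, 50, 300],
})
obv = on_balance_volume(bars["close"], bars["volume"])
vwap = cumulative_vwap(bars["high"], bars["low"], bars["close"], bars["volume"])
```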
4. Time Features (12+ features)¶
- Hour, day, month, quarter
- Cyclical encoding (sin/cos transformations)
- Weekend indicator
- Month start/end indicators
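Cyclical encoding maps a periodic value such as the hour onto a circle so that hour 23 and hour 0 end up close together. The sketch below shows one way to build these time features from a DatetimeIndex; the function name add_time_features and the column names are assumptions, not the module's output schema.

```python
import numpy as np
import pandas as pd

def add_time_features(index: pd.DatetimeIndex) -> pd.DataFrame:
    """Build calendar and cyclical features from timestamps (illustrative names)."""
    hour = np.asarray(index.hour)
    day_of_week = np.asarray(index.dayofweek)   # Monday = 0, Sunday = 6
    out = pd.DataFrame(index=index)
    out["hour"] = hour
    out["day_of_week"] = day_of_week
    # Sin/cos pair places each hour on the unit circle with a 24-hour period
    out["hour_sin"] = np.sin(2.0 * np.pi * hour / 24.0)
    out["hour_cos"] = np.cos(2.0 * np.pi * hour / 24.0)
    out["is_weekend"] = (day_of_week >= 5).astype(int)
    out["is_month_start"] = np.asarray(index.is_month_start).astype(int)
    out["is_month_end"] = np.asarray(index.is_month_end).astype(int)
    return out

# A Monday at month start, a Saturday, and a Wednesday at month end
timestamps = pd.to_datetime(["2024-01-01 00:00", "2024-01-06 06:00", "2024-01-31 18:00"])
time_feats = add_time_features(timestamps)
```

Both the sine and the cosine are needed: either one alone maps two different hours to the same value.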
5. Order Book Features (Optional)¶
- Bid-ask spread
- Order book imbalance
- Book depth
- Weighted mid price
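The order book quantities above have several definitions in common use. The sketch below uses widely seen ones: absolute spread, normalized volume imbalance, and a size-weighted mid price. The function names and the (price, size) level format are assumptions for illustration and may differ from what OrderBookFeatures computes.

```python
from typing import List, Tuple

def bid_ask_spread(bid: float, ask: float) -> float:
    """Absolute spread between best ask and best bid."""
    return ask - bid

def order_imbalance(bid_volume: float, ask_volume: float) -> float:
    """Imbalance in [-1, 1]; positive means more resting bid volume."""
    total = bid_volume + ask_volume
    return (bid_volume - ask_volume) / total if total else 0.0

def weighted_mid_price(bid: float, ask: float,
                       bid_volume: float, ask_volume: float) -> float:
    """Mid price weighted by opposite-side size, so it leans toward the thinner side."""
    total = bid_volume + ask_volume
    if not total:
        return (bid + ask) / 2.0
    return (bid * ask_volume + ask * bid_volume) / total

def book_depth(levels: List[Tuple[float, float]], n_levels: int = 10) -> float:
    """Total size across the first n_levels (price, size) entries of one side."""
    return sum(size for _price, size in levels[:n_levels])

spread = bid_ask_spread(50000.0, 50010.0)
imbalance = order_imbalance(100.0, 80.0)
wmid = weighted_mid_price(50000.0, 50010.0, 100.0, 80.0)
depth = book_depth([(50000.0, 1.5), (49990.0, 2.0), (49980.0, 0.5)], n_levels=2)
```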
Usage Examples¶
Basic Feature Engineering¶
from ml.features import FeatureEngineer
# Create feature engineer
fe = FeatureEngineer(normalize=False)
# Generate features
features = fe.fit_transform(df)
# Get feature names
feature_names = fe.get_feature_names()
print(f"Features: {feature_names[:10]}")
Feature Selection¶
from ml.feature_selector import FeatureSelector
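# Prepare data
X = features[fe.get_feature_names()]
# Target: next-period return
y = features['close'].pct_change().shift(-1)
# The last row has no next-period return, so drop it to keep X and y aligned
valid = y.notna()
X, y = X[valid], y[valid]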
# Select top features by importance
selector = FeatureSelector(task='regression', n_features=30)
X_selected = selector.select_by_importance(X, y)
# View importance
importance_df = selector.get_feature_importance_df()
print(importance_df.head(10))
Remove Correlated Features¶
# Remove highly correlated features
selector = FeatureSelector()
X_uncorr = selector.remove_correlated_features(X, threshold=0.95)
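For intuition, a correlation filter of this kind can be sketched in a few lines of pandas. This is an assumption about the general technique, not the body of remove_correlated_features: for every pair whose absolute correlation exceeds the threshold, the later column is dropped. The function name drop_correlated is hypothetical.

```python
import numpy as np
import pandas as pd

def drop_correlated(X: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop the later column of every pair with |correlation| above the threshold."""
    corr = X.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)

# Synthetic data: "b" is almost a rescaled copy of "a", "c" is independent
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo_X = pd.DataFrame({
    "a": a,
    "b": 2.0 * a + 0.001 * rng.normal(size=200),
    "c": rng.normal(size=200),
})
demo_uncorr = drop_correlated(demo_X, threshold=0.95)
```

Which member of a correlated pair survives depends on column order, so it can be worth ordering columns by preference before filtering.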
Multi-Method Analysis¶
from ml.feature_selector import analyze_feature_importance
# Analyze using multiple methods
results = analyze_feature_importance(
    X, y,
    task='regression',
    methods=['rf', 'corr', 'kbest']
)
# Compare methods
print("Random Forest top 5:")
print(results['random_forest'].head())
print("\nCorrelation top 5:")
print(results['correlation'].head())
Feature Engineering Options¶
Normalization¶
# Z-score normalization
fe = FeatureEngineer(normalize=True)
features = fe.fit_transform(df)
# Features are normalized with mean=0, std=1
Missing Value Handling¶
# Forward fill
fe = FeatureEngineer(handle_missing='ffill')
# Backward fill
fe = FeatureEngineer(handle_missing='bfill')
# Fill with zeros
fe = FeatureEngineer(handle_missing='zero')
# Drop rows with missing values
fe = FeatureEngineer(handle_missing='drop')
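The four strategies behave quite differently on the warm-up NaN values that rolling indicators produce. The sketch below mimics them with plain pandas calls; the helper fill_missing is hypothetical and only assumes that each option maps to the obvious pandas operation.

```python
import numpy as np
import pandas as pd

def fill_missing(df: pd.DataFrame, method: str) -> pd.DataFrame:
    """Apply one missing-value strategy using plain pandas (illustrative mapping)."""
    if method == "ffill":
        return df.ffill()        # leading NaNs stay, since nothing precedes them
    if method == "bfill":
        return df.bfill()        # uses later values, which leaks future information
    if method == "zero":
        return df.fillna(0.0)
    if method == "drop":
        return df.dropna()
    raise ValueError(f"Unknown method: {method!r}")

# Toy feature frame with warm-up gaps, for illustration only
frame = pd.DataFrame({
    "sma_3": [np.nan, np.nan, 11.0, 12.0],
    "roc_1": [np.nan, 0.1, np.nan, 0.2],
})
remaining_after_ffill = int(fill_missing(frame, "ffill").isna().sum().sum())
remaining_after_bfill = int(fill_missing(frame, "bfill").isna().sum().sum())
rows_after_drop = len(fill_missing(frame, "drop"))
```

Forward fill cannot repair NaN values at the start of a series, and backward fill pulls in values from the future, so dropping the warm-up rows is often the safer choice for model training.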
Order Book Features¶
from ml.features import OrderBookFeatures
obf = OrderBookFeatures()
# Calculate spread
spread = obf.calculate_spread(bid=50000, ask=50010)
# Calculate imbalance
imbalance = obf.calculate_imbalance(bid_volume=100, ask_volume=80)
# Calculate depth
depth = obf.calculate_depth(order_book, levels=10)
# Add to dataframe
df['spread'] = ...
df['imbalance'] = ...
features = fe.add_order_book_features(
df,
bid_ask_spread=df['spread'],
order_imbalance=df['imbalance']
)
Feature Selection Methods¶
1. Random Forest Importance¶
selector = FeatureSelector(task='regression', n_features=30)
X_selected = selector.select_by_importance(X, y, threshold=0.01)
2. Correlation-Based¶
selector = FeatureSelector(task='regression')
X_selected = selector.select_by_correlation(X, y, threshold=0.05)
3. Recursive Feature Elimination¶
selector = FeatureSelector(task='regression')
X_selected = selector.select_by_rfe(X, y, n_features=50)
4. Statistical Tests (K-Best)¶
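The API reference lists select_k_best(X, y, k) for this method. As a sketch of the underlying idea for a regression target, the code below scores each feature with the univariate F-statistic derived from its Pearson correlation with the target and keeps the k highest. The function name k_best_by_f_score is hypothetical, and the module may use a different statistical test.

```python
import numpy as np
import pandas as pd

def k_best_by_f_score(X: pd.DataFrame, y: pd.Series, k: int):
    """Keep the k features with the largest univariate regression F-statistic."""
    n = len(y)
    r = X.corrwith(y)                           # Pearson correlation per feature
    f_scores = (r ** 2 / (1.0 - r ** 2)) * (n - 2)
    top = f_scores.sort_values(ascending=False).head(k).index
    return X[list(top)], f_scores

# Synthetic data: one strong, one weak, and one irrelevant feature
rng = np.random.default_rng(1)
signal = rng.normal(size=300)
kbest_X = pd.DataFrame({
    "strong": signal + 0.1 * rng.normal(size=300),
    "weak": signal + 2.0 * rng.normal(size=300),
    "noise": rng.normal(size=300),
})
kbest_y = pd.Series(signal)
X_kbest, f_scores = k_best_by_f_score(kbest_X, kbest_y, k=2)
```

Because the F-statistic is a monotone function of the squared correlation, this ranking only captures linear relationships with the target.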
Integration with ML Models¶
from ml.features import FeatureEngineer
from ml.feature_selector import FeatureSelector
from ml.models.price_predictor import LSTMPricePredictor
# Generate features
fe = FeatureEngineer(normalize=True)
features = fe.fit_transform(df)
# Select important features
selector = FeatureSelector(n_features=30)
X = selector.select_by_importance(
    features[fe.get_feature_names()],
    features['close'].pct_change().shift(-1)
)
# Use with LSTM model
model = LSTMPricePredictor(
    input_features=len(selector.selected_features_),
    hidden_size=128,
    num_layers=2
)
Performance Considerations¶
- Feature Generation: Vectorized operations using pandas/numpy for speed
- Memory: Use df.copy() to avoid modifying original data
- Normalization: Store scaler parameters for consistent transform
- Missing Values: Handle early to avoid propagation
Best Practices¶
- Always split data before feature selection: Avoid look-ahead bias
- Use separate fit/transform: Fit on training, transform on test
- Remove correlated features: Reduce multicollinearity
- Select relevant features: More isn't always better
- Normalize for neural networks: Critical for LSTM/RL models
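The first two practices can be sketched with plain pandas: split chronologically, compute normalization statistics on the training slice only, and reuse them on the test slice. With the module itself this presumably corresponds to calling fit_transform on the training data and transform on the test data. The helper chronological_split and the synthetic feature columns are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def chronological_split(df: pd.DataFrame, train_frac: float = 0.8):
    """Split time-ordered rows without shuffling, so the test set is strictly later."""
    cut = int(len(df) * train_frac)
    return df.iloc[:cut], df.iloc[cut:]

# Synthetic feature frame, for illustration only
rng = np.random.default_rng(42)
dates = pd.date_range("2024-01-01", periods=100, freq="D")
feature_frame = pd.DataFrame({
    "rsi_14": rng.uniform(0.0, 100.0, size=100),
    "roc_5": rng.normal(0.0, 0.02, size=100),
}, index=dates)

train_part, test_part = chronological_split(feature_frame, train_frac=0.8)

# Fit: statistics come from the training slice only
train_mean = train_part.mean()
train_std = train_part.std(ddof=0).replace(0.0, 1.0)   # guard against constant columns

# Transform: the same stored parameters are applied to both slices
train_scaled = (train_part - train_mean) / train_std
test_scaled = (test_part - train_mean) / train_std
```

The scaled test slice will generally not have exactly zero mean and unit variance, which is expected: it is measured against the training distribution.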
Examples¶
See notebooks/ml_feature_engineering.ipynb for comprehensive examples including:
- Feature generation
- Visualization
- Feature selection
- Integration with models
API Reference¶
FeatureEngineer¶
Methods:
- fit_transform(df): Generate and fit features
- transform(df): Transform using fitted parameters
- get_feature_names(): Get list of feature names
- add_order_book_features(df, ...): Add order book features
Parameters:
- normalize (bool): Whether to normalize features
- handle_missing (str): Method to handle missing values
FeatureSelector¶
Methods:
- select_by_importance(X, y, threshold): Random Forest importance
- select_by_correlation(X, y, threshold): Correlation with target
- select_by_rfe(X, y, n_features): Recursive Feature Elimination
- select_k_best(X, y, k): Statistical test selection
- remove_correlated_features(X, threshold): Remove highly correlated features
- get_feature_importance_df(): Get importance DataFrame
- get_selected_features(): Get selected feature names
Parameters:
- task (str): 'regression' or 'classification'
- n_features (int): Number of features to select
Testing¶
# Run tests
pytest tests/ml/test_features.py -v
# With coverage
pytest tests/ml/test_features.py --cov=src/ml/features --cov-report=html
See Also¶
- Price Prediction Model
- Regime Detection
- RL Trading
- Tutorial notebook: notebooks/ml_feature_engineering.ipynb