ML Features Documentation

Overview

The ML Features module provides feature engineering for cryptocurrency trading data. It generates 50+ features from OHLCV input, covering technical indicators, price patterns, volume features, and time-based features, plus optional order book features.

Quick Start

from ml.features import FeatureEngineer
import pandas as pd

# Load your OHLCV data
df = pd.read_csv('ohlcv_data.csv')

# Initialize feature engineer
fe = FeatureEngineer(normalize=True, handle_missing='ffill')

# Generate features
features_df = fe.fit_transform(df)

print(f"Generated {len(fe.get_feature_names())} features")

Feature Categories

1. Technical Indicators (30+ features)

Moving Averages:
- Simple Moving Average (SMA): 5, 10, 20, 50, 100, 200 periods
- Exponential Moving Average (EMA): 5, 10, 20, 50, 100, 200 periods
- Price to MA ratios
- MA crossovers

Momentum Indicators:
- RSI (Relative Strength Index): 7, 14, 21 periods
- ROC (Rate of Change): 5, 10, 20 periods
- Stochastic Oscillator (K and D)
- Williams %R

Trend Indicators:
- MACD (Moving Average Convergence Divergence)
- ADX (Average Directional Index)
- CCI (Commodity Channel Index)

Volatility Indicators:
- Bollinger Bands (upper, middle, lower, width, position)
- ATR (Average True Range): 14, 21 periods
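To make a few of the indicators listed above concrete, here is a minimal pandas sketch of SMA, EMA, RSI, and a price-to-MA ratio. The function names (sma, ema, rsi) and column names are illustrative assumptions, not the module's API, and the module's own formulas (for example its RSI smoothing method) may differ.

```python
import pandas as pd


def sma(close: pd.Series, period: int) -> pd.Series:
    """Simple moving average over a fixed window (hypothetical helper)."""
    return close.rolling(window=period).mean()


def ema(close: pd.Series, period: int) -> pd.Series:
    """Exponential moving average with span equal to the period."""
    return close.ewm(span=period, adjust=False).mean()


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI with Wilder-style smoothing (an EMA with alpha = 1 / period)."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)
    loss = -delta.clip(upper=0.0)
    avg_gain = gain.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


# Toy closing prices; a short period keeps the warm-up window small.
close = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 108.0])
indicators = pd.DataFrame({
    "sma_3": sma(close, 3),
    "ema_3": ema(close, 3),
    "rsi_3": rsi(close, 3),
    "price_to_sma_3": close / sma(close, 3),
})
print(indicators.round(3))
```

The leading rows of each rolling indicator are NaN until the window fills, which is why missing-value handling matters downstream.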

2. Price Patterns (10+ features)

Candlestick Patterns:
- Doji
- Hammer/Hanging Man
- Shooting Star
- Bullish/Bearish Engulfing

Chart Patterns:
- Higher highs/Lower lows
- Consecutive up/down moves
- Gap detection
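As an illustration of how such patterns can be turned into binary features, the sketch below flags doji, engulfing candles, and up-gaps. The helper name candlestick_flags, the doji_ratio threshold, and the exact pattern rules are assumptions chosen for this example; the module may use different definitions.

```python
import pandas as pd


def candlestick_flags(df: pd.DataFrame, doji_ratio: float = 0.1) -> pd.DataFrame:
    """Return 0/1 pattern flags from open/high/low/close columns (hypothetical helper)."""
    body = (df["close"] - df["open"]).abs()
    candle_range = df["high"] - df["low"]
    prev_open, prev_close = df["open"].shift(1), df["close"].shift(1)

    out = pd.DataFrame(index=df.index)
    # Doji: the body is small relative to the full high-low range.
    out["doji"] = (candle_range > 0) & (body <= doji_ratio * candle_range)
    # Bullish engulfing: a down candle followed by an up candle whose body covers it.
    out["bullish_engulfing"] = (
        (prev_close < prev_open) & (df["close"] > df["open"])
        & (df["open"] <= prev_close) & (df["close"] >= prev_open)
    )
    # Bearish engulfing: the mirror image of the bullish case.
    out["bearish_engulfing"] = (
        (prev_close > prev_open) & (df["close"] < df["open"])
        & (df["open"] >= prev_close) & (df["close"] <= prev_open)
    )
    # Gap up: the current low is above the previous high.
    out["gap_up"] = df["low"] > df["high"].shift(1)
    return out.astype(int)


candles = pd.DataFrame({
    "open": [100, 104, 101, 110],
    "high": [105, 104.5, 106, 112],
    "low": [99, 100, 100, 109],
    "close": [104, 101, 105, 110.1],
})
flags = candlestick_flags(candles)
print(flags)
```

Comparisons against the shifted (previous) candle evaluate to False on the first row, so no pattern is flagged before a previous candle exists.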

3. Volume Features (8+ features)

- Volume moving averages
- Volume ratios
- OBV (On-Balance Volume)
- VWAP (Volume-Weighted Average Price)
- Volume momentum
- MFI (Money Flow Index)
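A rough sketch of three of the volume features above, using their textbook definitions. The function names are hypothetical, and the VWAP here accumulates over the whole frame; whether the module resets VWAP per session or uses a rolling window is not stated in this document.

```python
import numpy as np
import pandas as pd


def on_balance_volume(close: pd.Series, volume: pd.Series) -> pd.Series:
    """Add volume on up moves, subtract it on down moves, ignore flat moves."""
    direction = np.sign(close.diff()).fillna(0.0)
    return (direction * volume).cumsum()


def cumulative_vwap(df: pd.DataFrame) -> pd.Series:
    """Volume-weighted average of the typical price, accumulated from the first row."""
    typical_price = (df["high"] + df["low"] + df["close"]) / 3.0
    return (typical_price * df["volume"]).cumsum() / df["volume"].cumsum()


ohlcv = pd.DataFrame({
    "high": [10.5, 11.5, 11.0, 10.8, 12.2],
    "low": [9.5, 10.5, 10.0, 10.2, 11.8],
    "close": [10.0, 11.0, 10.5, 10.5, 12.0],
    "volume": [100, 200, 150, 50, 300],
})
volume_features = pd.DataFrame({
    "obv": on_balance_volume(ohlcv["close"], ohlcv["volume"]),
    "vwap": cumulative_vwap(ohlcv),
    # Volume ratio: current volume relative to its 3-period moving average.
    "volume_ratio_3": ohlcv["volume"] / ohlcv["volume"].rolling(3).mean(),
})
print(volume_features.round(3))
```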

4. Time Features (12+ features)

- Hour, day, month, quarter
- Cyclical encoding (sin/cos transformations)
- Weekend indicator
- Month start/end indicators
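Cyclical encoding maps a periodic value such as the hour onto a circle, so that 23:00 and 00:00 end up close together instead of 23 units apart. The sketch below shows one way to build these features from a DatetimeIndex; the function and column names are assumptions for illustration, not the module's actual output names.

```python
import numpy as np
import pandas as pd


def add_time_features(index: pd.DatetimeIndex) -> pd.DataFrame:
    """Build calendar and cyclical features from timestamps (hypothetical helper)."""
    hour = index.hour.to_numpy()
    day_of_week = index.dayofweek.to_numpy()

    out = pd.DataFrame(index=index)
    out["hour"] = hour
    out["day_of_week"] = day_of_week
    # Map the 24-hour cycle onto the unit circle.
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    out["is_weekend"] = (day_of_week >= 5).astype(int)
    out["is_month_start"] = np.asarray(index.is_month_start).astype(int)
    out["is_month_end"] = np.asarray(index.is_month_end).astype(int)
    return out


timestamps = pd.DatetimeIndex(
    ["2024-01-31 23:00", "2024-02-01 00:00", "2024-02-03 06:00"]
)
time_features = add_time_features(timestamps)
print(time_features.round(3))
```

Crypto markets trade around the clock, so the weekend flag captures a change in participation rather than a market closure.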

5. Order Book Features (Optional)

- Bid-ask spread
- Order book imbalance
- Book depth
- Weighted mid price
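The four order book features above have commonly used definitions, sketched below as plain functions. These are assumptions about what the features mean: the formulas used by OrderBookFeatures are not given in this document, so treat the function names and the normalization of the imbalance as illustrative only.

```python
from typing import Sequence, Tuple


def bid_ask_spread(bid: float, ask: float) -> float:
    """Absolute spread between the best ask and the best bid."""
    return ask - bid


def order_imbalance(bid_volume: float, ask_volume: float) -> float:
    """Normalized imbalance in [-1, 1]; positive means more resting bid volume."""
    total = bid_volume + ask_volume
    return (bid_volume - ask_volume) / total if total else 0.0


def weighted_mid_price(bid: float, ask: float, bid_volume: float, ask_volume: float) -> float:
    """Mid price where each quote is weighted by the volume on the opposite side."""
    total = bid_volume + ask_volume
    if not total:
        return (bid + ask) / 2.0
    return (bid * ask_volume + ask * bid_volume) / total


def book_depth(levels: Sequence[Tuple[float, float]], n_levels: int) -> float:
    """Total size across the top n (price, size) levels of one side of the book."""
    return sum(size for _, size in levels[:n_levels])


spread = bid_ask_spread(bid=50000, ask=50010)
imbalance = order_imbalance(bid_volume=100, ask_volume=80)
weighted_mid = weighted_mid_price(50000, 50010, bid_volume=100, ask_volume=80)
bid_depth = book_depth([(50000, 1.5), (49990, 2.0), (49980, 0.5)], n_levels=2)
print(spread, round(imbalance, 4), round(weighted_mid, 2), bid_depth)
```

With more volume on the bid side, the weighted mid sits above the plain mid of 50005, reflecting buying pressure.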

Usage Examples

Basic Feature Engineering

from ml.features import FeatureEngineer

# Create feature engineer
fe = FeatureEngineer(normalize=False)

# Generate features
features = fe.fit_transform(df)

# Get feature names
feature_names = fe.get_feature_names()
print(f"Features: {feature_names[:10]}")

Feature Selection

from ml.feature_selector import FeatureSelector

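# Prepare data: feature matrix and next-period return as the target
X = features[fe.get_feature_names()]
y = features['close'].pct_change().shift(-1)

# The last row has no next-period return, so drop it from both X and y
valid = y.notna()
X, y = X[valid], y[valid]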

# Select top features by importance
selector = FeatureSelector(task='regression', n_features=30)
X_selected = selector.select_by_importance(X, y)

# View importance
importance_df = selector.get_feature_importance_df()
print(importance_df.head(10))

Remove Correlated Features

# Remove highly correlated features
selector = FeatureSelector()
X_uncorr = selector.remove_correlated_features(X, threshold=0.95)
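For intuition, a correlation filter of this kind is often implemented by scanning the upper triangle of the absolute correlation matrix and dropping the later feature of each highly correlated pair. The sketch below shows that common approach; it is an assumption about the technique, not the actual body of remove_correlated_features, and the column names are made up.

```python
import numpy as np
import pandas as pd


def drop_correlated(X: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop the second feature of every pair whose |correlation| exceeds threshold."""
    corr = X.corr().abs()
    # Keep only the strict upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)


rng = np.random.default_rng(0)
base = rng.normal(size=200)
X_demo = pd.DataFrame({
    "sma_5": base,
    "ema_5": base + 0.01 * rng.normal(size=200),  # nearly a copy of sma_5
    "rsi_14": rng.normal(size=200),               # independent of the others
})
X_demo_uncorr = drop_correlated(X_demo, threshold=0.95)
print(list(X_demo_uncorr.columns))
```

Which member of a pair survives depends on column order, so place the features you prefer to keep first.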

Multi-Method Analysis

from ml.feature_selector import analyze_feature_importance

# Analyze using multiple methods
results = analyze_feature_importance(
    X, y,
    task='regression',
    methods=['rf', 'corr', 'kbest']
)

# Compare methods
print("Random Forest top 5:")
print(results['random_forest'].head())

print("\nCorrelation top 5:")
print(results['correlation'].head())

Feature Engineering Options

Normalization

# Z-score normalization
fe = FeatureEngineer(normalize=True)
features = fe.fit_transform(df)

# Features are normalized with mean=0, std=1
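Z-score normalization only stays consistent if the mean and standard deviation learned on the training data are reused on later data. The sketch below illustrates that idea with a small hypothetical ZScoreScaler class; the module's internal scaler, including its handling of constant columns and its choice of ddof, is not documented here and may differ.

```python
import pandas as pd


class ZScoreScaler:
    """Minimal z-score scaler that stores its fitted parameters (illustrative only)."""

    def fit(self, X: pd.DataFrame) -> "ZScoreScaler":
        self.mean_ = X.mean()
        # Replace zero std with 1 so constant columns do not cause division by zero.
        self.std_ = X.std(ddof=0).replace(0.0, 1.0)
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Always use the stored training statistics, never those of X itself.
        return (X - self.mean_) / self.std_


train = pd.DataFrame({"rsi_14": [30.0, 50.0, 70.0], "volume_ratio": [1.0, 1.0, 1.0]})
test = pd.DataFrame({"rsi_14": [90.0], "volume_ratio": [2.0]})

scaler = ZScoreScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)
print(train_scaled.round(3))
print(test_scaled.round(3))
```

Scaled test values can fall well outside the training range, which is expected when the market regime shifts.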

Missing Value Handling

# Forward fill
fe = FeatureEngineer(handle_missing='ffill')

# Backward fill
fe = FeatureEngineer(handle_missing='bfill')

# Fill with zeros
fe = FeatureEngineer(handle_missing='zero')

# Drop rows with missing values
fe = FeatureEngineer(handle_missing='drop')
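The four options plausibly correspond to the pandas operations sketched below. This mapping is an assumption made for illustration (the handle_missing function here is hypothetical), but it shows two trade-offs worth knowing: forward fill cannot fill leading NaNs from indicator warm-up, and backward fill copies future values into earlier rows.

```python
import numpy as np
import pandas as pd


def handle_missing(df: pd.DataFrame, method: str) -> pd.DataFrame:
    """Apply one missing-value strategy using plain pandas (illustrative mapping)."""
    if method == "ffill":
        return df.ffill()       # carry the last known value forward
    if method == "bfill":
        return df.bfill()       # pull the next known value backward (uses future data)
    if method == "zero":
        return df.fillna(0.0)   # replace every gap with zero
    if method == "drop":
        return df.dropna()      # remove any row containing a gap
    raise ValueError(f"unknown method: {method!r}")


frame = pd.DataFrame({"sma_3": [np.nan, 1.0, np.nan, 3.0]})
filled = {m: handle_missing(frame, m) for m in ("ffill", "bfill", "zero", "drop")}
for name, result in filled.items():
    print(name, result["sma_3"].tolist())
```

For time series modeling, forward fill or drop is usually the safer choice because neither looks ahead in time.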

Order Book Features

from ml.features import OrderBookFeatures

obf = OrderBookFeatures()

# Calculate spread
spread = obf.calculate_spread(bid=50000, ask=50010)

# Calculate imbalance
imbalance = obf.calculate_imbalance(bid_volume=100, ask_volume=80)

# Calculate depth (order_book is your order book snapshot, loaded elsewhere)
depth = obf.calculate_depth(order_book, levels=10)

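# Add to dataframe: replace ... with your per-row spread and imbalance values
# (fe is a FeatureEngineer instance created as in the earlier examples)
df['spread'] = ...
df['imbalance'] = ...
features = fe.add_order_book_features(
    df,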
    bid_ask_spread=df['spread'],
    order_imbalance=df['imbalance']
)

Feature Selection Methods

1. Random Forest Importance

selector = FeatureSelector(task='regression', n_features=30)
X_selected = selector.select_by_importance(X, y, threshold=0.01)

2. Correlation-Based

selector = FeatureSelector(task='regression')
X_selected = selector.select_by_correlation(X, y, threshold=0.05)
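Conceptually, correlation-based selection keeps the features whose absolute correlation with the target reaches the threshold. The sketch below shows that idea in plain pandas; the helper name and the use of Pearson correlation are assumptions, since the document does not say which correlation measure select_by_correlation uses. A higher threshold than 0.05 is used here only to make the toy data separate cleanly.

```python
import numpy as np
import pandas as pd


def select_by_target_correlation(
    X: pd.DataFrame, y: pd.Series, threshold: float = 0.05
) -> pd.DataFrame:
    """Keep columns whose |Pearson correlation| with y is at least threshold."""
    corr = X.corrwith(y).abs()
    selected = corr[corr >= threshold].sort_values(ascending=False)
    return X[selected.index]


rng = np.random.default_rng(0)
signal = rng.normal(size=500)
X_toy = pd.DataFrame({
    "noise": rng.normal(size=500),  # unrelated to the target
    "useful": signal,               # drives the target
})
y_toy = pd.Series(signal + 0.1 * rng.normal(size=500))

X_toy_selected = select_by_target_correlation(X_toy, y_toy, threshold=0.3)
print(list(X_toy_selected.columns))
```

Linear correlation misses non-linear relationships, which is one reason to compare it against tree-based importance.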

3. Recursive Feature Elimination

selector = FeatureSelector(task='regression')
X_selected = selector.select_by_rfe(X, y, n_features=50)

4. Statistical Tests (K-Best)

selector = FeatureSelector(task='regression')
X_selected = selector.select_k_best(X, y, k=30)

Integration with ML Models

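from ml.features import FeatureEngineer
from ml.feature_selector import FeatureSelector
from ml.models.price_predictor import LSTMPricePredictor

# Generate features
fe = FeatureEngineer(normalize=True)
features = fe.fit_transform(df)

# Build the next-period return target; the last row has no target, so mask it out.
# If normalization also rescales 'close', compute the target from raw prices instead.
X_all = features[fe.get_feature_names()]
y = features['close'].pct_change().shift(-1)
valid = y.notna()

# Select important features
selector = FeatureSelector(n_features=30)
X = selector.select_by_importance(X_all[valid], y[valid])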

# Use with LSTM model
model = LSTMPricePredictor(
    input_features=len(selector.get_selected_features()),
    hidden_size=128,
    num_layers=2
)

Performance Considerations

- Feature Generation: Use vectorized pandas/numpy operations for speed
- Data Safety: Work on df.copy() to avoid modifying the original data, at the cost of extra memory
- Normalization: Store the fitted scaler parameters so transform() applies the same scaling to new data
- Missing Values: Handle them early so NaNs do not propagate into derived features

Best Practices

- Always split data before feature selection: Avoid look-ahead bias
- Use separate fit/transform: Fit on training, transform on test
- Remove correlated features: Reduce multicollinearity
- Select relevant features: More isn't always better
- Normalize for neural networks: Critical for LSTM/RL models
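The first two practices can be sketched as a chronological split: build the target, drop the row without one, then cut the data by time so the test set lies strictly after the training set. The helper names below are hypothetical, and the split fraction is an arbitrary choice for the example.

```python
import pandas as pd


def make_target(close: pd.Series) -> pd.Series:
    """Next-period return: the value at row t is the return from t to t + 1."""
    return close.pct_change().shift(-1)


def chronological_split(df: pd.DataFrame, test_fraction: float = 0.2):
    """Split without shuffling so every test row comes after every training row."""
    split_at = int(len(df) * (1 - test_fraction))
    return df.iloc[:split_at], df.iloc[split_at:]


prices = pd.DataFrame(
    {"close": [100.0, 101.0, 99.0, 102.0, 104.0, 103.0, 105.0, 108.0, 107.0, 110.0]}
)
prices["target"] = make_target(prices["close"])
prices = prices.dropna(subset=["target"])  # the last row has no next-period return

train, test = chronological_split(prices, test_fraction=0.2)
print(len(train), len(test))
# Fit scalers and run feature selection on `train` only, then apply them to `test`.
```

A random shuffle would mix future rows into the training set, so any selection or scaling fitted on it would already have seen the test period.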

Examples

See notebooks/ml_feature_engineering.ipynb for comprehensive examples including:
- Feature generation
- Visualization
- Feature selection
- Integration with models

API Reference

FeatureEngineer

Methods:
- fit_transform(df): Generate and fit features
- transform(df): Transform using fitted parameters
- get_feature_names(): Get list of feature names
- add_order_book_features(df, ...): Add order book features

Parameters:
- normalize (bool): Whether to normalize features
- handle_missing (str): Method to handle missing values

FeatureSelector

Methods:
- select_by_importance(X, y, threshold): Random Forest importance
- select_by_correlation(X, y, threshold): Correlation with target
- select_by_rfe(X, y, n_features): Recursive Feature Elimination
- select_k_best(X, y, k): Statistical test selection
- remove_correlated_features(X, threshold): Remove correlations
- get_feature_importance_df(): Get importance DataFrame
- get_selected_features(): Get selected feature names

Parameters:
- task (str): 'regression' or 'classification'
- n_features (int): Number of features to select

Testing

# Run tests
pytest tests/ml/test_features.py -v

# With coverage
pytest tests/ml/test_features.py --cov=src/ml/features --cov-report=html