Product Requirements Document (PRD)

DSTA: Dr. Strange Trading Analysis

Version: 2.0
Last Updated: 2026-05-20
Status: Active Development

1. Executive Summary

1.1 Vision

DSTA (Dr. Strange Trading Analysis) is a personal, end-to-end cryptocurrency trading platform built as a microservices monorepo. It covers the full lifecycle from raw market data ingestion through strategy backtesting, ML-driven signal generation, live order execution, and portfolio monitoring — all running on self-hosted infrastructure.

The platform is evolving across four sequential phases:

Phase	Theme	Timeline
1	Close the Loop — first live trade end-to-end	Weeks 1–4
2	Intelligence Upgrade — SOTA ML signals	Weeks 5–10
3	Alternative Data & Portfolio	Weeks 10–18
4	LLM Agent Layer	Month 5+

1.2 Objectives

Deliver a live trading pipeline where a strategy running in dsta-trading-svc places real orders on Binance/Huobi/Gate.io through dsta-exchange-svc.
Validate every strategy against walk-forward backtests before live deployment.
Replace heuristic-only signals with an ensemble that combines technical indicators, SOTA time-series ML models, and sentiment analysis.
Extend to alternative data sources (funding rates, on-chain metrics, options flow, order book depth) and portfolio-level optimization.
Build a MarketAnalystAgent using the Claude API that autonomously discovers, backtests, and proposes strategies — with human approval before execution.

1.3 Target Users

This is a personal research platform. The primary user is a quantitative developer who:
- Understands Python, Docker, REST/WebSocket APIs, and basic quantitative finance.
- Wants full transparency and control over every layer of the stack.
- Is willing to iterate on strategy logic, ML models, and infra in parallel.

2. Current State of the Codebase (as of 2026-05-20)

Understanding what already exists is required before planning new work.

2.1 Services

Service	Port	Status	Notes
`dsta-core-svc`	8001	Partial	JWT auth works; Telegram notifications not wired
`dsta-trading-svc`	8002	Partial	Backtesting engine works; live execution is dry-run only; strategy registry is in-memory
`dsta-data-svc`	8003	Partial	Historical download works; WebSocket ingestion to TimescaleDB not implemented
`dsta-ml-svc`	8004	Partial	LSTM predictor and PPO RL agent exist; not wired to trading-svc
`dsta-exchange-svc`	8005	Partial	Binance + Huobi adapters work; Gate.io adapter incomplete; circuit breaker missing
`dsta-cli`	—	Skeleton	Commands exist but no end-to-end smoke test
`dsta-qa`	—	Partial	Quant analytics scripts exist; not wired to live data
`dsta-web`	—	Shell	React/Vite shell; no dashboard pages implemented

2.2 What Works

Backtesting engine (event-driven, supports slippage + fees)
Binance and Huobi exchange adapters (REST + WebSocket)
Feature engineering pipeline
LSTM price predictor (training + inference)
PPO reinforcement learning agent
CLI skeleton
JWT authentication in core-svc

2.3 What Does Not Work Yet

Live end-to-end trading pipeline (strategy → exchange order)
Gate.io adapter
ML models connected to trading signal flow
WebSocket OHLCV ingestion persisted to TimescaleDB
Web dashboard (UI is an empty shell)
Telegram notifications
Docker Compose full-stack deployment
All services deployed simultaneously with API Gateway

3. Market Context

3.1 Opportunity

Cryptocurrency markets operate 24/7 with high volatility and thin institutional participation in altcoin pairs, creating exploitable inefficiencies for systematic strategies.

Pain points with existing solutions:

Commercial bots (3Commas, Cryptohopper): opaque signal logic, expensive subscriptions, no ML customization.
Open-source backtesting libraries (Backtrader, VectorBT): backtest only, no live execution path.
Full platforms (Freqtrade, Hummingbot): rigid strategy DSLs, limited ML integration surface.

3.2 DSTA Differentiators

Full vertical stack: data ingestion → feature engineering → ML training → live execution → portfolio monitoring.
Bring-your-own model: plug in any PyTorch model for signal generation.
Alternative data first-class: funding rates, on-chain, options flow, sentiment built into the feature store.
LLM-assisted strategy discovery: automated hypothesis generation and backtesting via Claude API tool-use.
Self-hosted: no SaaS fees, no data leaving the machine, reproducible experiments via MLflow.

4. User Requirements

4.1 Data & Research

As a developer, I want WebSocket OHLCV data ingested continuously into TimescaleDB so I can build features from recent candles without manual downloads.
As a developer, I want historical OHLCV data available for at least 2 years at 1m resolution for all tracked pairs, so backtests cover multiple market regimes.
As a developer, I want a feature store (Feast + Redis) that serves consistent features at both training time and inference time, so there is no train/serve skew.

4.2 Strategy & Backtesting

As a developer, I want to run walk-forward backtests on any registered strategy and get a structured report (Sharpe, Calmar, max drawdown, win rate, trade log) before deploying live.
As a developer, I want strategy parameters optimized with Optuna walk-forward search rather than grid search, so that parameter selection is free of look-ahead bias.
As a developer, I want to compare multiple strategy results in a single CLI command.

4.3 Live Trading

As a developer, I want trading-svc to call exchange-svc's order API so that a BUY/SELL signal from a strategy results in a real order on the exchange.
As a developer, I want the strategy registry persisted in the database so running strategies survive a service restart.
As a developer, I want per-strategy risk controls (max position size, max drawdown circuit breaker) enforced before any order is submitted.

4.4 ML & Signals

As a developer, I want an ensemble signal aggregator that combines technical indicators, ML price predictions, and sentiment scores into a single confidence-weighted signal.
As a developer, I want live market regime detection (HMM/BOCPD) so strategies can adapt their parameters to trending vs. mean-reverting conditions.
As a developer, I want MLflow experiment tracking so I can compare model versions and reproduce any training run.

4.5 Alternative Data

As a developer, I want funding rate data from Binance and Gate.io ingested and exposed as a feature, so I can use leverage imbalance as a signal.
As a developer, I want on-chain metrics (SOPR, MVRV, whale flows) from Glassnode free tier / CryptoQuant ingested on a daily schedule.
As a developer, I want L2 order book imbalance computed from WebSocket depth streams and stored as a feature.

4.6 Portfolio Management

As a developer, I want simultaneous multi-asset positions managed by a portfolio layer that tracks correlation and enforces Kelly Criterion sizing.
As a developer, I want CVaR-constrained portfolio optimization (PyPortfolioOpt + CVXPY) to compute target weights.
As a developer, I want auto-rebalancing triggered when actual weights drift beyond a configurable threshold from target weights.

4.7 Notifications & Monitoring

As a developer, I want Telegram notifications on: trade fill, PnL alert (daily summary), max drawdown breach, and service health failure.
As a developer, I want a Prometheus/Grafana stack where each service exposes /metrics and dashboards are version-controlled.

4.8 LLM Agent

As a developer, I want a MarketAnalystAgent that can autonomously generate strategy hypotheses, run backtests, and present ranked results — but requires my explicit approval before placing any live order.
As a developer, I want the agent to query a RAG knowledge base built from my research notebooks and past backtest results.

5. Functional Requirements

5.1 Data Collection (dsta-data-svc)

5.1.1 Historical Data

REQ-DATA-001: Fetch OHLCV candlestick data from Binance, Huobi, and Gate.io REST APIs.
REQ-DATA-002: Support timeframes: 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 12h, 1d.
REQ-DATA-003: Implement rate-limit-aware download with exponential backoff and resume capability.
REQ-DATA-004: Store candlesticks in TimescaleDB hypertables partitioned by time and symbol.
REQ-DATA-005: Validate candles on ingest (gap detection, outlier rejection, OHLC consistency checks).
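
The ingest-time checks in REQ-DATA-005 could look roughly like the sketch below. It covers only OHLC consistency and gap detection; outlier rejection is left out because the PRD does not define a rule for it. The `Candle` dataclass and both function names are illustrative assumptions, not existing data-svc code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Candle:
    """Hypothetical in-memory candle; field names mirror the Candlestick model."""
    timestamp: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float


def is_ohlc_consistent(c: Candle) -> bool:
    """High must bound open/close from above, low from below; volume is non-negative."""
    return (
        c.low <= min(c.open, c.close)
        and c.high >= max(c.open, c.close)
        and c.volume >= 0
    )


def find_gaps(candles: list[Candle], interval: timedelta) -> list[tuple[datetime, datetime]]:
    """Return (first_missing_timestamp, next_seen_timestamp) for each hole.

    Assumes `candles` is sorted by timestamp and belongs to one symbol/timeframe.
    """
    gaps = []
    for prev, curr in zip(candles, candles[1:]):
        expected = prev.timestamp + interval
        if curr.timestamp > expected:
            gaps.append((expected, curr.timestamp))
    return gaps
```

Each returned gap can be handed to a backfill job as a (start, end) download range.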

5.1.2 Real-Time Ingestion Pipeline

REQ-DATA-006: Maintain persistent WebSocket connections to Binance, Huobi, and Gate.io for kline/candlestick streams.
REQ-DATA-007: Write arriving candles to TimescaleDB within 500ms of receipt.
REQ-DATA-008: Reconnect automatically on WebSocket drop; log gap intervals.
REQ-DATA-009: Expose an HTTP SSE endpoint (/stream/ohlcv) so other services can subscribe to the live candle feed.

5.1.3 Order Book Data

REQ-DATA-010: Subscribe to L2 order book depth streams (top-20 levels) from WebSocket.
REQ-DATA-011: Compute and persist order book imbalance, defined as bid volume / (bid volume + ask volume) over the subscribed levels, as a time-series feature at 1-second resolution.

5.1.4 Alternative Data

REQ-DATA-012: Ingest funding rates from Binance Futures and Gate.io Futures on a 1-hour schedule via REST.
REQ-DATA-013: Ingest Glassnode free-tier metrics (SOPR, MVRV, exchange flows) on a daily schedule via REST.
REQ-DATA-014: Ingest CryptoPanic RSS feed and score each article with FinBERT; store score + timestamp per token.
REQ-DATA-015: Fetch Google Trends weekly data for top-10 tracked tokens via pytrends.
REQ-DATA-016: Pull Deribit options data (put/call ratio, gamma exposure) for BTC and ETH on an hourly schedule.

5.1.5 Data Export

REQ-DATA-017: Provide a CLI command and REST endpoint to export any symbol/timeframe slice to CSV, JSON, or Parquet.

5.2 Exchange Adapters (dsta-exchange-svc)

REQ-EXCH-001: Implement a unified ExchangeAdapter interface with methods: get_ticker, get_orderbook, get_ohlcv, place_order, cancel_order, get_position, get_balance.
REQ-EXCH-002: Complete the Gate.io adapter implementing all ExchangeAdapter methods.
REQ-EXCH-003: Implement circuit breaker per adapter: after 5 consecutive failures within 60 seconds, open the circuit for 30 seconds before retrying.
REQ-EXCH-004: Expose a WebSocket proxy endpoint so other services can subscribe to real-time exchange feeds without holding their own exchange connections.
REQ-EXCH-005: All order placement calls must be idempotent (client-order-id based) to prevent double-submission on retry.
REQ-EXCH-006: Log every order request and response (masked API keys) to an audit table.
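
One possible reading of the circuit breaker in REQ-EXCH-003 is sketched below. It treats "5 consecutive failures within 60 seconds" as five failures with no success in between, all inside a rolling 60-second window. The class names and the injectable clock are assumptions made for illustration and testing.

```python
import time
from collections import deque
from typing import Callable


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        max_failures: int = 5,
        window_s: float = 60.0,
        open_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.max_failures = max_failures
        self.window_s = window_s
        self.open_s = open_s
        self._clock = clock
        self._failures: deque[float] = deque()
        self._opened_at: float | None = None

    def call(self, fn, *args, **kwargs):
        now = self._clock()
        if self._opened_at is not None:
            if now - self._opened_at < self.open_s:
                raise CircuitOpenError("circuit open; retry later")
            # Cool-down elapsed: close the circuit and let this call through.
            self._opened_at = None
            self._failures.clear()
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure(now)
            raise
        self._failures.clear()  # a success ends the consecutive-failure run
        return result

    def _record_failure(self, now: float) -> None:
        self._failures.append(now)
        # Forget failures that have fallen out of the rolling window.
        while self._failures and now - self._failures[0] > self.window_s:
            self._failures.popleft()
        if len(self._failures) >= self.max_failures:
            self._opened_at = now
```

One breaker instance per adapter keeps a failing exchange from blocking calls to the others.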

5.3 Backtesting Engine (dsta-trading-svc)

REQ-BT-001: Event-driven architecture: data events → strategy signal → order event → fill event → position update.
REQ-BT-002: Simulate realistic fills: market orders fill at next-candle open with configurable slippage model; limit orders fill when price crosses limit.
REQ-BT-003: Apply per-exchange fee schedules (maker/taker).
REQ-BT-004: Support long, short, and leveraged positions.
REQ-BT-005: Support portfolio-level backtesting across multiple symbols simultaneously.
REQ-BT-006: Walk-forward validation: split data into rolling in-sample/out-of-sample windows; report out-of-sample performance only.
REQ-BT-007: Output metrics: total return, annualized return, Sharpe ratio, Calmar ratio, Sortino ratio, max drawdown, win rate, profit factor, average trade duration, trade count.
REQ-BT-008: Generate equity curve (JSON + optional PNG) and per-trade log (CSV).

5.4 Strategy Framework (dsta-trading-svc)

REQ-STRAT-001: Provide a BaseStrategy abstract class with lifecycle hooks: on_candle, on_fill, on_stop.
REQ-STRAT-002: Bundle the following reference strategies: SMA crossover, RSI mean-reversion, Bollinger Band breakout.
REQ-STRAT-003: Persist strategy registry in PostgreSQL. Each row stores: strategy_id, name, class_path, parameters (JSONB), status (enabled/disabled/paper), exchange, symbol, created_at.
REQ-STRAT-004: Support hot-reload: mark a strategy as disabled in DB; the execution scheduler detects the change within one candle interval and stops the strategy without restart.
REQ-STRAT-005: Optuna walk-forward hyperparameter optimization: given a strategy class and parameter search space, run N walk-forward trials and return the best parameter set.
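
A rough sketch of how the BaseStrategy hooks from REQ-STRAT-001 and an SMA crossover from REQ-STRAT-002 might fit together. Only the hook names come from the requirements; the `Signal` enum, the dict-shaped candle, and the constructor parameters are assumptions.

```python
from abc import ABC, abstractmethod
from collections import deque
from enum import Enum


class Signal(Enum):
    BUY = "BUY"
    SELL = "SELL"
    HOLD = "HOLD"


class BaseStrategy(ABC):
    @abstractmethod
    def on_candle(self, candle: dict) -> Signal:
        """Called once per closed candle; returns the strategy's signal."""

    def on_fill(self, fill: dict) -> None:
        """Optional hook: called when an order placed for this strategy fills."""

    def on_stop(self) -> None:
        """Optional hook: called when the strategy is disabled or halted."""


class SmaCrossover(BaseStrategy):
    def __init__(self, fast: int = 10, slow: int = 30):
        if fast >= slow:
            raise ValueError("fast period must be shorter than slow period")
        self.fast, self.slow = fast, slow
        # One extra close lets us compare the previous and current SMA spread.
        self._closes: deque[float] = deque(maxlen=slow + 1)

    def _spread(self, window: list[float]) -> float:
        """Fast SMA minus slow SMA over a window of exactly `slow` closes."""
        return sum(window[-self.fast:]) / self.fast - sum(window) / self.slow

    def on_candle(self, candle: dict) -> Signal:
        self._closes.append(candle["close"])
        if len(self._closes) <= self.slow:
            return Signal.HOLD  # not enough history yet
        closes = list(self._closes)
        prev, curr = self._spread(closes[:-1]), self._spread(closes[1:])
        if prev <= 0 < curr:
            return Signal.BUY   # fast SMA crossed above slow SMA
        if prev >= 0 > curr:
            return Signal.SELL  # fast SMA crossed below slow SMA
        return Signal.HOLD
```

Because the strategy keeps only a bounded deque of closes, the same class can be driven by the backtester and by the live candle feed.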

5.5 Live Execution (dsta-trading-svc)

REQ-EXEC-001: The execution engine subscribes to the live candle feed from data-svc and delivers candles to each enabled strategy.
REQ-EXEC-002: When a strategy emits a signal, the execution engine validates risk controls (position size limit, max drawdown circuit breaker, open order count) before calling exchange-svc.
REQ-EXEC-003: On order fill confirmation from exchange-svc, update the position tracker and emit a fill event to core-svc for notification.
REQ-EXEC-004: Maintain a paper trading mode where signals are executed against a virtual account at mid-price; no exchange API calls are made.
REQ-EXEC-005: Emergency stop: a CLI command or API call halts all strategies and cancels all open orders on all exchanges within 5 seconds.
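
The pre-trade validation in REQ-EXEC-002 might be structured as a pure function that returns the list of violated controls, as in this sketch. The dataclass fields, the drawdown definition (decline from peak equity), and the violation labels are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskLimits:
    max_position_size: float  # absolute position cap, in base-asset units
    max_drawdown: float       # fraction of peak equity, e.g. 0.2 for 20%
    max_open_orders: int


@dataclass(frozen=True)
class StrategyState:
    position_size: float      # signed: positive long, negative short
    equity: float
    peak_equity: float
    open_orders: int


def check_order(limits: RiskLimits, state: StrategyState, order_qty: float) -> list[str]:
    """Return the names of violated controls; an empty list means the order may proceed.

    `order_qty` is signed (positive buys, negative sells).
    """
    violations = []
    if abs(state.position_size + order_qty) > limits.max_position_size:
        violations.append("position_size")
    drawdown = 0.0 if state.peak_equity <= 0 else 1.0 - state.equity / state.peak_equity
    if drawdown >= limits.max_drawdown:
        violations.append("max_drawdown")
    if state.open_orders >= limits.max_open_orders:
        violations.append("open_order_count")
    return violations
```

Returning every violation rather than stopping at the first makes the rejection reason easier to log and to surface in notifications.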

5.6 Ensemble Signal Aggregator (dsta-trading-svc)

REQ-ENS-001: Aggregate signals from three sources per symbol: (a) technical indicator strategy output, (b) ml-svc price prediction confidence score, (c) sentiment pipeline score from data-svc.
REQ-ENS-002: Apply confidence weighting: each signal source has a configurable weight; weights sum to 1.0.
REQ-ENS-003: Emit a composite signal (BUY/SELL/HOLD) with a confidence value [0, 1] to the execution engine.
REQ-ENS-004: Log each component signal and the composite result per candle for post-hoc analysis.
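
A minimal sketch of the confidence weighting in REQ-ENS-002 and REQ-ENS-003, under the assumption that every source is first normalized to a score in [-1, 1] (negative bearish, positive bullish). The 0.2 decision threshold and the use of the absolute composite score as the confidence value are illustrative choices.

```python
import math


def aggregate(
    scores: dict[str, float],
    weights: dict[str, float],
    threshold: float = 0.2,
) -> tuple[str, float]:
    """Combine per-source scores into (signal, confidence).

    The composite is the weighted sum of clamped scores; its sign picks the
    direction and its magnitude, which lies in [0, 1], is the confidence.
    """
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same sources")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.0")
    composite = sum(weights[k] * max(-1.0, min(1.0, scores[k])) for k in scores)
    confidence = abs(composite)
    if composite >= threshold:
        return "BUY", confidence
    if composite <= -threshold:
        return "SELL", confidence
    return "HOLD", confidence
```

Logging `scores`, `weights`, and the returned tuple per candle would satisfy the post-hoc analysis requirement.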

5.7 ML Models (dsta-ml-svc)

5.7.1 Price Prediction

REQ-ML-001: Retain the existing LSTM model as a baseline; add PatchTST and iTransformer as alternative architectures selectable via config.
REQ-ML-002: Integrate Chronos (Amazon, 2024) as a zero-shot inference option — no fine-tuning required, inference only.
REQ-ML-003: Expose a REST endpoint POST /predict accepting {symbol, features: [...], horizon: int}, returning {point_estimate, confidence_interval, model_id}.
REQ-ML-004: Track all training runs in MLflow (self-hosted): log hyperparameters, validation metrics, and the serialized model artifact.

5.7.2 Market Regime Detection

REQ-ML-005: Implement a Hidden Markov Model (HMM) regime detector with states: trending-up, trending-down, mean-reverting, high-volatility.
REQ-ML-006: Expose GET /regime/{symbol} returning current regime and posterior probabilities.
REQ-ML-007: Store regime transitions in the database so trading-svc can filter strategy signals by regime.
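
To illustrate what "posterior probabilities" in REQ-ML-006 means, here is one step of the standard HMM forward (filtering) recursion in plain Python. It is a teaching sketch, not the planned hmmlearn-based implementation; the function name is hypothetical, and the transition matrix and per-state observation likelihoods are assumed to come from an already fitted model.

```python
def filter_step(
    prior: list[float],
    transition: list[list[float]],
    likelihood: list[float],
) -> list[float]:
    """One HMM filtering update.

    prior[i]         : P(state_{t-1} = i | observations up to t-1)
    transition[i][j] : P(state_t = j | state_{t-1} = i)
    likelihood[j]    : p(observation_t | state_t = j)
    Returns P(state_t = j | observations up to t) for every state j.
    """
    n = len(prior)
    # Predict: push the previous posterior through the transition matrix.
    predicted = [sum(prior[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Update: weight by how well each state explains the new observation.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero likelihood under every state")
    return [u / total for u in unnormalized]
```

The current regime would be the state with the largest posterior, and a transition row is written whenever that state changes.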

5.7.3 Feature Store

REQ-ML-008: Deploy Feast with Redis online store and TimescaleDB offline store.
REQ-ML-009: Define feature views for: OHLCV-derived technical features, order book imbalance, funding rate, on-chain metrics, sentiment score.
REQ-ML-010: Training pipelines read features from the Feast offline store; inference pipelines read from the Redis online store.

5.7.4 Reinforcement Learning

REQ-ML-011: The existing PPO agent is retrained against the backtesting environment after each new month of data is available.
REQ-ML-012: The PPO agent's action (buy/sell/hold + position fraction) is exposed as a signal source via the same POST /predict interface.

5.8 Core Services (dsta-core-svc)

REQ-CORE-001: Issue and validate JWTs for all API requests; token rotation supported without service restart.
REQ-CORE-002: Telegram notification dispatcher: receive NotificationEvent messages from RabbitMQ/Kafka and send to configured chat ID.
REQ-CORE-003: Notification event types: TRADE_FILL, PNL_DAILY_SUMMARY, DRAWDOWN_ALERT, SERVICE_DOWN, STRATEGY_STOPPED.
REQ-CORE-004: Rate-limit outgoing Telegram messages to stay within Bot API limits (about 30 messages/second per bot overall; roughly 1 message/second to any single chat).
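
The outgoing rate limit in REQ-CORE-004 could be enforced with a token bucket, sketched below. The class name, the non-blocking `try_acquire` method, and the injectable clock are assumptions; the rate and burst capacity are parameters so they can be set to whatever limit applies.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of `rate_per_s`."""

    def __init__(
        self,
        rate_per_s: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.rate = rate_per_s
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when the caller must wait."""
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

The dispatcher would call `try_acquire()` before each send and requeue the notification when it returns False.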

5.9 Portfolio Management (dsta-trading-svc)

REQ-PORT-001: Track simultaneous positions across multiple symbols; maintain a correlation matrix updated daily.
REQ-PORT-002: Implement Kelly Criterion position sizing: given win rate and payoff ratio estimated from recent trades, compute the fraction of capital to deploy.
REQ-PORT-003: CVaR-constrained optimization: use PyPortfolioOpt + CVXPY to compute target portfolio weights given expected returns and covariance.
REQ-PORT-004: Auto-rebalancing: when any asset weight deviates from target by more than a configurable threshold (default 5%), trigger a rebalance order.
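
Two of the portfolio rules above reduce to short formulas. The sketch uses the classic Kelly fraction f* = p - (1 - p) / b, where p is the win rate and b the payoff ratio (average win divided by average loss), and it reads the 5% rebalance threshold as an absolute difference in portfolio weight. Both function names and that reading of the threshold are assumptions.

```python
def kelly_fraction(win_rate: float, payoff_ratio: float) -> float:
    """Kelly-optimal fraction of capital; floored at 0 when the edge is negative."""
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be in [0, 1]")
    if payoff_ratio <= 0:
        raise ValueError("payoff_ratio must be positive")
    return max(0.0, win_rate - (1.0 - win_rate) / payoff_ratio)


def needs_rebalance(
    actual: dict[str, float],
    target: dict[str, float],
    threshold: float = 0.05,
) -> bool:
    """True when any target asset's actual weight drifts beyond `threshold`."""
    return any(abs(actual.get(asset, 0.0) - w) > threshold for asset, w in target.items())
```

In practice the full Kelly fraction is often scaled down (for example, half Kelly) because the win rate and payoff ratio are themselves noisy estimates.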

5.10 LLM Agent Layer (dsta-ml-svc or standalone dsta-agent-svc)

REQ-AGENT-001: Implement MarketAnalystAgent using the Claude API (tool-use pattern). Tools: fetch_news, query_onchain_metrics, run_backtest, get_portfolio_state, recommend_trade.
REQ-AGENT-002: Human-in-the-loop gate: the agent's recommend_trade tool call produces a pending recommendation; the user must confirm via CLI or Telegram before execution.
REQ-AGENT-003: RAG knowledge base: index research notebooks, backtest result JSON files, and strategy documentation using LlamaIndex with pgvector. The agent can query this via a search_knowledge_base tool.
REQ-AGENT-004: Autonomous strategy discovery loop: agent generates a hypothesis, calls run_backtest, evaluates Sharpe/Calmar against a threshold, and either proposes deployment or discards.
REQ-AGENT-005: Use FinBERT for domain-specific financial NLP inside the sentiment pipeline; expose as a callable function to the agent.
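
The human-in-the-loop gate in REQ-AGENT-002 can be sketched without any LLM code at all: the agent's recommend_trade tool only records a pending item, and a separate confirm path, reachable from the CLI or Telegram handler, is the only one that can trigger execution. All class and field names here are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Recommendation:
    symbol: str
    side: str
    quantity: float
    rationale: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: Status = Status.PENDING


class ApprovalGate:
    def __init__(self, execute: Callable[[Recommendation], None]):
        # Only the gate holds the execution callback; the agent never sees it.
        self._execute = execute
        self._pending: dict[str, Recommendation] = {}

    def recommend_trade(self, symbol: str, side: str, quantity: float, rationale: str) -> str:
        """Tool handler exposed to the agent: records a recommendation, never executes."""
        rec = Recommendation(symbol, side, quantity, rationale)
        self._pending[rec.id] = rec
        return rec.id

    def confirm(self, rec_id: str, approved: bool) -> Recommendation:
        """Called from the human channel only; raises KeyError for unknown ids."""
        rec = self._pending.pop(rec_id)
        if approved:
            rec.status = Status.APPROVED
            self._execute(rec)
        else:
            rec.status = Status.REJECTED
        return rec
```

Because a recommendation is removed from the pending set on confirmation, the same id cannot be approved twice.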

5.11 Web Dashboard (dsta-web)

REQ-WEB-001: Dashboard page: real-time portfolio value, unrealized and realized PnL, positions table, open orders table.
REQ-WEB-002: Strategy management page: list strategies with status, parameters, and last backtest summary; enable/disable toggle.
REQ-WEB-003: Market data page: interactive candlestick chart (with overlaid technical indicators), order book heatmap, funding rate chart.
REQ-WEB-004: Backtesting page: trigger a backtest run, view progress, and download report.
REQ-WEB-005: Alerts page: list all notification events with acknowledge functionality.
REQ-WEB-006: Authenticate using JWT from core-svc; all API calls pass the token in the Authorization header.
REQ-WEB-007: Real-time price and position updates via WebSocket connection to exchange-svc proxy.

5.12 CLI (dsta-cli)

REQ-CLI-001: Commands: backtest run, strategy list, strategy enable/disable, order place, order list, position list, data download, data status, health check.
REQ-CLI-002: Each command targets a specific service endpoint; authentication via stored JWT (login command).
REQ-CLI-003: Output format: human-readable table by default; --json flag for machine-readable output.

6. Non-Functional Requirements

6.1 Performance

REQ-PERF-001: WebSocket candle ingestion to DB write: < 500ms end-to-end.
REQ-PERF-002: ML inference latency (/predict): < 200ms for LSTM/PatchTST; < 1s for Chronos.
REQ-PERF-003: Order placement to confirmation: < 2s on Binance; < 5s on Gate.io.
REQ-PERF-004: Backtesting 1 year of 1m data for a single symbol: < 60 seconds.
REQ-PERF-005: Ensemble signal computation per candle: < 50ms.

6.2 Reliability

REQ-REL-001: Exchange WebSocket connections recover automatically within 10 seconds of disconnect.
REQ-REL-002: No trade is placed without a corresponding audit log entry.
REQ-REL-003: Database migrations run automatically on service startup via Alembic/Django migrations.
REQ-REL-004: Each service exposes GET /health and GET /metrics (Prometheus format).

6.3 Security

REQ-SEC-001: Exchange API keys stored encrypted at rest (Fernet or similar); never logged.
REQ-SEC-002: All inter-service communication over HTTPS/WSS in production.
REQ-SEC-003: API Gateway enforces JWT validation before routing to downstream services.
REQ-SEC-004: Secrets managed via environment variables or a secrets manager; never committed to the repository.

6.4 Maintainability

REQ-MAINT-001: Each service has its own Makefile with standard targets: install, test, lint, run, docker-build.
REQ-MAINT-002: Unit test coverage > 80% for all new code in trading-svc, data-svc, and ml-svc.
REQ-MAINT-003: All services follow the same structured logging format (JSON, with service, level, timestamp, trace_id fields).
REQ-MAINT-004: OpenAPI specs generated from code and committed to docs/openapi/.
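
The shared log format in REQ-MAINT-003 can be produced with the standard library alone. The sketch below emits the four required fields plus a message; the formatter class name and the convention of passing trace_id through `extra=` are assumptions.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "service": self.service,
            "level": record.levelname,
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            # trace_id is expected via logger.info(..., extra={"trace_id": ...}).
            "trace_id": getattr(record, "trace_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(payload)
```

A service would attach it with `handler.setFormatter(JsonFormatter("dsta-trading-svc"))` on its root handler.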

6.5 Deployment

REQ-DEPLOY-001: A single docker compose up in the deploy/ directory starts the full stack.
REQ-DEPLOY-002: Environment-specific configs managed via .env files; a .env.example is committed.
REQ-DEPLOY-003: TimescaleDB hypertable migrations run as a one-shot init container.

7. AI/ML Architecture

7.1 Signal Pipeline

[TimescaleDB OHLCV]  [Alt Data sources]  [Sentiment pipeline]
        |                    |                     |
        v                    v                     v
  [Feast offline store] --> [Feature Engineering] --> [Feast online store (Redis)]
                                                              |
                         ┌────────────────────────────────────┤
                         |              |                      |
                    [Technical      [ML Price           [Sentiment
                    indicators]     predictor]           score]
                         |              |                      |
                         └────────────────────────────────────┘
                                        |
                               [Ensemble aggregator]
                                        |
                               [Regime filter (HMM)]
                                        |
                               [Risk controls]
                                        |
                              [Order execution]

7.2 ML Model Stack

Model	Purpose	Architecture	Notes
LSTM	Price prediction (baseline)	2-layer LSTM	Existing; kept for comparison
PatchTST	Price prediction (SOTA)	Transformer on patches	Replace LSTM in Phase 2
iTransformer	Price prediction (SOTA)	Inverted transformer	Alternative to PatchTST
Chronos	Zero-shot forecasting	Foundation model	Amazon 2024; inference only
PPO RL agent	Position sizing signal	Proximal Policy Optimization	Existing; monthly retraining
HMM / BOCPD	Regime detection	Hidden Markov Model	Online inference; state stored in DB
FinBERT	Sentiment scoring	Domain-adapted BERT	Inference only; weights from HuggingFace

7.3 Feature Store Design (Feast + Redis)

Feature Views:
- ohlcv_features: returns, log-returns, rolling volatility (5/20/60 periods), RSI, MACD, ATR, Bollinger width.
- orderbook_features: bid/ask imbalance, spread, depth ratio.
- funding_rate_features: current rate, 24h rolling mean, deviation from neutral.
- onchain_features: SOPR 7d MA, MVRV Z-score, exchange net flows.
- sentiment_features: FinBERT score 24h rolling mean, momentum (score delta 1h).

Entities: symbol (BTC/USDT, ETH/USDT, etc.)

Offline store: TimescaleDB (training data reads)
Online store: Redis (sub-10ms feature retrieval at inference)

7.4 MLflow Tracking

Self-hosted MLflow server (mlflow-svc) with PostgreSQL backend store and S3-compatible (MinIO) artifact store.
Every training run logs: architecture name, hyperparameters, train/val loss curve, validation Sharpe, model artifact.
Registered model versions are tagged staging or production.
dsta-ml-svc loads the model version tagged production at startup.

7.5 LLM Agent Design (Phase 4)

User query / automated trigger
          |
          v
  MarketAnalystAgent (Claude API, tool-use)
          |
    ┌─────┴─────────────────────────────┐
    |     |          |         |        |
 fetch  query     run      get      recommend
 news  onchain  backtest  portfolio  trade
    |     |          |         |        |
    └──── RAG KB ────┤         └── Human approval gate
         (LlamaIndex           (Telegram confirm / CLI confirm)
          + pgvector)                   |
                                        v
                                  [Execution engine]

8. Technology Stack

8.1 Core Runtime

Component	Technology	Notes
Backend services	Python 3.12+, FastAPI (new) / Django 4.x (core-svc)	exchange-svc already migrated to FastAPI
OHLCV storage	TimescaleDB (PostgreSQL extension)	Hypertables for time-series
Relational storage	PostgreSQL 17	Strategies, orders, positions, configs
Cache / pub-sub	Redis 8	Online feature store + session cache
Message broker	RabbitMQ (Phase 1–2), Kafka (Phase 3+)	Kafka adds replay capability
Workflow orchestration	Prefect (Phase 3+)	Ingestion pipelines, retraining jobs
Feature store	Feast + Redis	Phase 2+
Experiment tracking	MLflow (self-hosted)	Phase 2+
Hyperparameter optimization	Optuna	Phase 2+
Containerization	Docker, Docker Compose	All services
API Gateway	Traefik or Kong	Phase 1

8.2 ML / Data Science

Component	Technology
Data manipulation	Pandas, NumPy, Polars (performance-critical paths)
ML framework	PyTorch 2.x
Time-series models	PatchTST, iTransformer, Chronos
RL	Stable-Baselines3 (PPO)
NLP	HuggingFace Transformers (FinBERT)
Portfolio optimization	PyPortfolioOpt + CVXPY
Regime detection	hmmlearn
RAG	LlamaIndex + pgvector
LLM agent	Anthropic Claude API (tool-use)

8.3 Exchange Connectivity

Exchange	REST	WebSocket	Status
Binance	python-binance	native	Working
Huobi	custom client	custom	Working
Gate.io	gate-api SDK	gate-api SDK	Incomplete — Phase 1
Deribit	deribit-api	—	Phase 3 (options data only)

8.4 Observability

Component	Technology
Metrics	Prometheus
Dashboards	Grafana
Logs	Loki + Promtail
Tracing	Jaeger (OpenTelemetry)

9. Development Roadmap

Phase 1: Close the Loop (Weeks 1–4)

Goal: The first live trade is placed by a strategy and confirmed on the exchange.

WebSocket OHLCV ingestion pipeline writing to TimescaleDB (data-svc).
Gate.io adapter completing the exchange-svc adapter set.
trading-svc execution engine wired to exchange-svc live order API (no more dry-run).
Strategy registry migrated from in-memory to PostgreSQL.
Telegram notifications on TRADE_FILL, DRAWDOWN_ALERT, PNL_DAILY_SUMMARY (core-svc).
Full-stack Docker Compose deployment with API Gateway.
Walk-forward backtest validation of SMA crossover and RSI mean-reversion before going live.

Exit criteria: A paper-mode strategy running in trading-svc generates signals from live TimescaleDB data, calls exchange-svc, and core-svc sends a Telegram fill notification. All services start with docker compose up.

Phase 2: Intelligence Upgrade (Weeks 5–10)

Goal: Replace LSTM-only signals with a SOTA ensemble; introduce feature store and MLflow.

PatchTST and iTransformer added to ml-svc as selectable architectures.
Chronos integrated as zero-shot inference option.
Live regime detection endpoint (HMM) in ml-svc.
Sentiment pipeline: CryptoPanic RSS → FinBERT → score stored in TimescaleDB.
Ensemble signal aggregator in trading-svc combining technical + ML + sentiment.
Optuna walk-forward hyperparameter optimization for strategy parameters.
Feast + Redis feature store with defined feature views for all data sources.
MLflow self-hosted for experiment tracking.

Exit criteria: The ensemble signal aggregator is live; at least one trading strategy uses a PatchTST or iTransformer signal; all training runs are logged in MLflow.

Phase 3: Alternative Data & Portfolio (Weeks 10–18)

Goal: Richer signals and true portfolio-level management.

Funding rate ingestion (Binance + Gate.io) as a feature.
On-chain data ingestion (Glassnode / CryptoQuant): SOPR, MVRV, whale flows.
Deribit options flow ingestion: put/call ratio, gamma exposure (BTC + ETH).
L2 order book imbalance feature from WebSocket depth.
Multi-asset simultaneous positions with correlation tracking.
Kelly Criterion position sizing.
CVaR portfolio optimization (PyPortfolioOpt + CVXPY).
Auto-rebalancing on weight drift.
Kafka streaming to replace/augment RabbitMQ (enables ML training replay).
Prefect for scheduling ingestion, retraining, and reporting pipelines.

Exit criteria: The platform runs at least 3 simultaneous strategies across 5+ symbols; portfolio weights are rebalanced automatically; funding rate is an active feature in at least one live strategy.

Phase 4: LLM Agent Layer (Month 5+)

Goal: Autonomous strategy discovery with human-in-the-loop execution.

MarketAnalystAgent using Claude API with 5 tools: fetch_news, query_onchain_metrics, run_backtest, get_portfolio_state, recommend_trade.
Human-in-the-loop approval gate (Telegram confirm or CLI confirm).
RAG knowledge base over research notebooks + backtest results (LlamaIndex + pgvector).
Autonomous strategy discovery: agent generates → backtests → ranks by Sharpe/Calmar → proposes deployment.
FinBERT / FinGPT integration for financial NLP tasks inside the agent tool chain.

Exit criteria: Agent can autonomously generate a strategy hypothesis, backtest it, and present a ranked recommendation requiring only human approval to deploy.

10. Data Models

10.1 Market Data (TimescaleDB hypertables)

Candlestick
  symbol       TEXT     NOT NULL
  exchange     TEXT     NOT NULL
  timeframe    TEXT     NOT NULL   -- '1m', '1h', etc.
  timestamp    TIMESTAMPTZ NOT NULL
  open         NUMERIC(24,8)
  high         NUMERIC(24,8)
  low          NUMERIC(24,8)
  close        NUMERIC(24,8)
  volume       NUMERIC(28,8)
  PRIMARY KEY (symbol, exchange, timeframe, timestamp)
  -- Hypertable partitioned on timestamp

OrderBookImbalance
  symbol       TEXT
  exchange     TEXT
  timestamp    TIMESTAMPTZ
  imbalance    FLOAT8         -- bid_vol / (bid_vol + ask_vol)
  spread_bps   FLOAT8
  PRIMARY KEY (symbol, exchange, timestamp)

FundingRate
  symbol       TEXT
  exchange     TEXT
  timestamp    TIMESTAMPTZ
  rate         NUMERIC(18,8)
  PRIMARY KEY (symbol, exchange, timestamp)

SentimentScore
  symbol       TEXT
  source       TEXT           -- 'cryptopanic', 'reddit'
  timestamp    TIMESTAMPTZ
  score        FLOAT4         -- FinBERT output [-1, 1]
  article_url  TEXT

10.2 Trading (PostgreSQL)

Strategy
  id           UUID   PRIMARY KEY
  name         TEXT   NOT NULL
  class_path   TEXT   NOT NULL
  parameters   JSONB
  status       TEXT   CHECK (status IN ('enabled','disabled','paper'))
  exchange     TEXT
  symbol       TEXT
  created_at   TIMESTAMPTZ
  updated_at   TIMESTAMPTZ

Order
  id                UUID   PRIMARY KEY
  exchange_order_id TEXT
  strategy_id       UUID   REFERENCES Strategy
  symbol            TEXT
  exchange          TEXT
  side              TEXT   -- 'buy' | 'sell'
  order_type        TEXT   -- 'market' | 'limit'
  quantity          NUMERIC(24,8)
  price             NUMERIC(24,8)
  status            TEXT   -- 'open' | 'filled' | 'canceled' | 'partial'
  created_at        TIMESTAMPTZ
  filled_at         TIMESTAMPTZ
  canceled_at       TIMESTAMPTZ
  client_order_id   TEXT   UNIQUE

Position
  id            UUID   PRIMARY KEY
  strategy_id   UUID   REFERENCES Strategy
  symbol        TEXT
  exchange      TEXT
  side          TEXT
  quantity      NUMERIC(24,8)
  entry_price   NUMERIC(24,8)
  current_price NUMERIC(24,8)
  unrealized_pnl NUMERIC(24,8)
  opened_at     TIMESTAMPTZ
  closed_at     TIMESTAMPTZ

TradeHistory
  id          UUID   PRIMARY KEY
  order_id    UUID   REFERENCES Order
  timestamp   TIMESTAMPTZ
  symbol      TEXT
  exchange    TEXT
  side        TEXT
  quantity    NUMERIC(24,8)
  price       NUMERIC(24,8)
  fee         NUMERIC(18,8)
  fee_asset   TEXT
  pnl         NUMERIC(24,8)
  strategy_id UUID

10.3 System / Config (PostgreSQL)

ExchangeAccount
  id          UUID   PRIMARY KEY
  exchange    TEXT
  api_key     TEXT   -- encrypted
  api_secret  TEXT   -- encrypted
  label       TEXT
  is_active   BOOLEAN
  created_at  TIMESTAMPTZ

NotificationEvent
  id          UUID   PRIMARY KEY
  event_type  TEXT   -- 'TRADE_FILL' | 'PNL_DAILY_SUMMARY' | ...
  payload     JSONB
  sent_at     TIMESTAMPTZ
  acknowledged BOOLEAN DEFAULT FALSE

11. Success Metrics

11.1 Development

Phase 1 exit: first real (or paper) order placed end-to-end by Week 4.
Phase 2 exit: ensemble signal live in production by Week 10.
Test coverage > 80% on all new code.
Zero broken builds on main branch.

11.2 Performance

WebSocket ingestion lag < 500ms (P99).
ML inference < 200ms (P95).
Backtest 1 year of 1m data: < 60 seconds.
Order-to-fill confirmation round-trip: < 2s (Binance).

11.3 Trading (Post-Phase 1 live)

Strategy Sharpe ratio > 1.5 (out-of-sample walk-forward).
Maximum drawdown < 20%.
Win rate > 52% on live trades.
Live Sharpe ratio within 15% of the backtest Sharpe ratio.

12. Risk Management

12.1 Technical Risks

Risk	Impact	Mitigation
Exchange API changes	High	Unified adapter interface; versioned client libs; smoke tests against testnet
Data gaps in TimescaleDB	High	Gap detection on ingest; backfill job; alert on missing candles
ML model degradation	High	MLflow model versioning; shadow mode comparison; automatic rollback
Service downtime during live trading	High	Health checks, auto-restart policies, emergency stop command
Train/serve feature skew	Medium	Feast enforces consistent feature definitions across offline/online
Double order submission on retry	High	Idempotent client-order-id on all order calls

12.2 Trading Risks

Risk	Impact	Mitigation
Strategy overfitting	High	Mandatory walk-forward validation before live deployment
Regime change making model stale	High	Regime detector gates strategy signals; monthly retraining
Flash crash	High	Circuit breaker in exchange-svc; position size limits; emergency stop
Runaway losses	Critical	Max drawdown circuit breaker per strategy; portfolio-level hard stop
Execution slippage vs. backtest	Medium	Conservative slippage model in backtest; post-live slippage report

12.3 Operational Risks

Risk	Impact	Mitigation
API key compromise	Critical	Encrypted at rest, never logged, rotation procedure documented
LLM agent runaway execution	Critical	Hard human-approval gate before any order; agent cannot call order API directly
Infrastructure cost blowup	Low	Self-hosted stack; resource limits in Docker Compose

13. Compliance & Legal

DSTA is a personal research platform; no trading advice is given to third parties.
Users trade at their own risk; no guarantee of profit or performance.
Users are responsible for their own tax reporting and local regulatory compliance.
API keys are the user's responsibility; the platform stores them encrypted but provides no custodial guarantees.
Open source under MIT License.

14. Appendices

14.1 Glossary

Term	Definition
OHLCV	Open, High, Low, Close, Volume — standard candlestick format
Sharpe Ratio	Annualized risk-adjusted return: (mean return − risk-free rate) / std dev of return
Calmar Ratio	Annualized return / max drawdown
Slippage	Difference between expected fill price and actual fill price
Walk-forward	Rolling in-sample/out-of-sample validation; avoids look-ahead bias
Feature skew	Discrepancy between features computed at training vs. inference time
HMM	Hidden Markov Model — used for regime detection
BOCPD	Bayesian Online Change Point Detection
CVaR	Conditional Value at Risk — tail-loss measure for portfolio optimization
Kelly Criterion	Formula for optimal bet size given edge and odds
PatchTST	2023 transformer architecture that patches time series before self-attention
iTransformer	2024 transformer that inverts attention to the feature (variate) dimension
Chronos	Amazon 2024 zero-shot time-series foundation model
Feast	Open-source feature store for ML
RAG	Retrieval-Augmented Generation — LLM answers grounded in retrieved documents
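
The Sharpe, Calmar, and drawdown definitions above can be made concrete with a few lines of standard-library Python. This is a sketch under stated assumptions: returns are simple per-period returns, the equity curve is strictly positive, and 365 periods per year reflects daily bars on a 24/7 market. None of these function names come from the codebase.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(
    returns: list[float],
    risk_free_per_period: float = 0.0,
    periods_per_year: int = 365,
) -> float:
    """Annualized Sharpe: mean excess return / sample std dev, scaled by sqrt(periods)."""
    excess = [r - risk_free_per_period for r in returns]
    sd = stdev(excess)  # requires at least two returns
    if sd == 0:
        return float("nan")
    return mean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, 1.0 - value / peak)
    return worst


def calmar_ratio(annualized_return: float, max_dd: float) -> float:
    """Annualized return divided by max drawdown; infinite when there was no drawdown."""
    return annualized_return / max_dd if max_dd > 0 else float("inf")
```

For example, an equity curve of 100, 120, 90, 130 has a maximum drawdown of 25%, measured from the 120 peak to the 90 trough.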

14.2 References

"Advances in Financial Machine Learning" — Marcos López de Prado
"Algorithmic Trading: Winning Strategies" — Ernest P. Chan
PatchTST: https://arxiv.org/abs/2211.14730
iTransformer: https://arxiv.org/abs/2310.06625
Chronos: https://arxiv.org/abs/2403.07815
Feast documentation: https://docs.feast.dev
Claude API tool-use: https://docs.anthropic.com/en/docs/tool-use

TASKS.md: Phased task breakdown (Phase 1–4)
docs/CHANGELOG.md: Service-level change history
deploy/docker-compose.yaml: Full-stack deployment spec
docs/openapi/: Per-service OpenAPI specs

Document Control

Version	Date	Author	Changes
1.0	2025-11-21	AI Assistant	Initial PRD creation
2.0	2026-05-20	minhdqdev	Full rewrite: 4-phase roadmap, ML architecture, alt data, agent layer