DSTA Microservices Architecture Plan
Version: 1.0
Date: 2026-01-28
Author: DSTA Engineering Team
Status: Planning & Design Phase
Table of Contents
- Executive Summary
- Current State Analysis
- Target Microservices Architecture
- Service Catalog
- Technology Stack
- Data Architecture
- Communication Patterns
- Migration Strategy
- Development Workflow
- Deployment & CI/CD
- Monitoring & Observability
- Timeline & Milestones
1. Executive Summary
1.1 Purpose
This document outlines the comprehensive plan to transform DSTA (Dr. Strange Trading Analysis) from a monolithic Django application into a modern microservices architecture. The decomposition will enable independent scaling, deployment, and development of trading system components while maintaining system reliability and performance.
1.2 Key Objectives
- Scalability: Enable independent scaling of data ingestion, trading execution, and analytics
- Reliability: Isolate failures and improve system resilience through service boundaries
- Development Velocity: Allow parallel development by multiple teams
- Technology Flexibility: Enable use of optimal tools per service (Django, FastAPI, etc.)
- Deployment Agility: Independent deployment with zero-downtime updates
1.3 Architectural Principles
- Domain-Driven Design: Services aligned with business domains
- API-First: Well-defined contracts using OpenAPI/AsyncAPI
- Event-Driven: Asynchronous communication via message broker
- Database Per Service: Each service owns its data
- Observability: Comprehensive logging, metrics, and tracing
- Security: Zero-trust architecture with mTLS and API authentication
2. Current State Analysis
2.1 Existing Architecture
┌─────────────────────────────────────────────┐
│ Django Monolith (src/) │
├─────────────────────────────────────────────┤
│ ├─ api/ (REST API + WebSockets) │
│ ├─ trading/ (Bot execution logic) │
│ ├─ backtesting/ (Backtesting engine) │
│ ├─ analytics/ (Reports & metrics) │
│ ├─ ml/ (ML models & training) │
│ ├─ data/ (Data management) │
│ ├─ huobi/ (Exchange integration) │
│ └─ dsta/ (Django core settings) │
├─────────────────────────────────────────────┤
│ External Dependencies: │
│ - PostgreSQL (primary database) │
│ - Redis (cache + Celery broker) │
│ - Celery (async tasks) │
└─────────────────────────────────────────────┘
2.2 Key Components (413 Python files)
API Layer (src/api/):
- Django REST Framework endpoints
- WebSocket connections for real-time data
- Data validation and serialization
- Exchange client wrappers (Binance, Gate.io, Huobi)
Trading Engine (src/trading/):
- Order execution and position management
- Risk management and circuit breakers
- Multi-strategy coordination
- Stop-loss and alert mechanisms
Backtesting (src/backtesting/):
- Event-driven backtesting engine
- Performance analytics and reporting
- Monte Carlo simulation
- Strategy optimization
Analytics (src/analytics/):
- Performance metrics and reports
- Correlation analysis
- Transaction cost modeling
- Visualization and charts
Machine Learning (src/ml/):
- Feature engineering
- Model training and deployment
- Regime detection strategies
- Reinforcement learning agents
Data Management (src/data/):
- Market data ingestion
- Data validation and quality checks
- Gap detection and filling
2.3 Pain Points
- Scalability Limitations
  - Cannot scale data ingestion independently from trading logic
  - Single PostgreSQL database becomes bottleneck
  - Celery workers handle mixed workload types
- Deployment Challenges
  - Monolithic deployment requires full system restart
  - High risk of system-wide failures
  - Long deployment windows
- Development Friction
  - 413 Python files in single codebase
  - Tight coupling between modules
  - Shared dependencies create conflicts
  - Difficult to test components in isolation
- Operational Complexity
  - Single failure point can cascade
  - Difficult to identify performance bottlenecks
  - Limited ability to optimize specific components
3. Target Microservices Architecture
3.1 High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│ API Gateway (Kong/Traefik) │
│ Authentication │ Rate Limiting │ Routing │ Load Balancing │
└─────────────────────────────────────────────────────────────┘
│
┌──────────────────────┼──────────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ dsta-core-svc │ │dsta-trading-svc │ │ dsta-data-svc │
├──────────────────┤ ├──────────────────┤ ├──────────────────┤
│ • Auth │ │ • Strategies │ │ • Market Data │
│ • Config │ │ • Execution │ │ • Analytics │
│ • Notifications │ │ • Positions │ │ • Backtesting │
│ • User Mgmt │ │ • Risk Mgmt │ │ • Reporting │
└──────────────────┘ └──────────────────┘ └──────────────────┘
│ │ │
└──────────────────────┼──────────────────────┘
│
▼
┌──────────────────────────────────────────────┐
│ Message Broker (RabbitMQ/Kafka) │
│ Events: Orders │ Fills │ Market Data │
└──────────────────────────────────────────────┘
│
┌──────────────────────┼──────────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ dsta-ml-svc │ │dsta-exchange-svc │ │ Storage Layer │
├──────────────────┤ ├──────────────────┤ ├──────────────────┤
│ • Feature Eng │ │ • Binance │ │ • PostgreSQL │
│ • Model Training │ │ • Gate.io │ │ • TimescaleDB │
│ • Predictions │ │ • Huobi │ │ • Redis │
│ • RL Agents │ │ • WebSocket Hub │ │ • S3/MinIO │
└──────────────────┘ └──────────────────┘ └──────────────────┘
3.2 Service Boundaries & Domains
The system is decomposed into 5 domain-based microservices:
- Core Service - User management, authentication, configuration, notifications
- Trading Service - Strategies, execution, position management, risk controls
- Data Service - Market data, analytics, backtesting
- ML Service - Machine learning models, predictions, feature engineering
- Exchange Service - Multi-exchange gateway, WebSocket connections
4. Service Catalog
4.1 Core Service (dsta-core-svc)
Responsibility: Platform foundation and cross-cutting concerns
Capabilities:
- Authentication & Authorization
  - JWT token generation and validation
  - API key management
  - OAuth2 integration
  - Role-based access control (RBAC)
  - Session management
- Configuration Management
  - Exchange credentials storage (encrypted)
  - Strategy parameters versioning
  - Feature flags management
  - Environment-specific settings
  - Real-time configuration updates
- Notification System
  - Telegram bot integration
  - Email notifications (SMTP)
  - Webhook delivery
  - Alert prioritization and deduplication
  - Notification templates
- User Management
  - User registration and profiles
  - Preferences and settings
  - Audit logging
Technology Stack:
- Framework: Django 4.x + Django REST Framework
- Database: PostgreSQL (users, config, notifications)
- Cache: Redis (sessions, config cache, notification queue)
- Encryption: Fernet/AWS KMS
- Tools: Ruff, Pylance, uv
API Endpoints:
Auth:
POST /api/v1/auth/login
POST /api/v1/auth/refresh
POST /api/v1/auth/logout
GET /api/v1/auth/validate
POST /api/v1/auth/api-keys
Config:
GET /api/v1/config/{key}
PUT /api/v1/config/{key}
GET /api/v1/config/secrets/{key}
WS /api/v1/config/watch
Notifications:
POST /api/v1/notifications/send
GET /api/v1/notifications/history
POST /api/v1/notifications/templates
Users:
GET /api/v1/users/profile
PUT /api/v1/users/preferences
GET /api/v1/users/audit-log
Data Models:
- User, Role, Permission, Session, APIKey
- ConfigSchema, ConfigValue, SecretStore
- Notification, Template, Recipient, DeliveryLog
Integration:
- Publishes: Config updates, notification events
- Provides: Authentication for all services
Scaling: Horizontal (stateless), read replicas for auth validation
Source Code (from monolith):
- src/dsta/ (Django settings, middleware)
- src/api/admin.py (user management)
- src/dsta/config.py (configuration)
- Telegram integration logic
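To make the Core Service's "JWT token generation and validation" capability concrete, here is a minimal HS256 sketch built only on the standard library. The function names, the claim set (sub, exp), and the 15-minute lifetime are illustrative assumptions, not the plan's actual design; a real service would more likely rely on a maintained library such as PyJWT.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(user_id: str, secret: bytes, ttl_seconds: int = 900) -> str:
    """Create a signed token carrying the user id and an expiry time."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def validate_token(token: str, secret: bytes) -> Optional[dict]:
    """Return the payload if the signature is valid and unexpired, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signature = _b64url_decode(signature_b64)
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
        ).digest()
        # Constant-time comparison avoids leaking signature bytes via timing.
        if not hmac.compare_digest(expected, signature):
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        # Covers malformed structure, bad base64, and invalid JSON.
        return None
    if payload.get("exp", 0) < time.time():
        return None
    return payload
```

The validator always verifies with HMAC-SHA256 and ignores the header's alg field, which sidesteps algorithm-confusion issues in this simplified setting.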
4.2 Trading Service (dsta-trading-svc)
Responsibility: Core trading engine and execution
Capabilities:
- Strategy Management
  - Strategy registration and versioning
  - Parameter optimization
  - Signal generation coordination
  - Multi-strategy orchestration
  - Performance tracking
- Order Execution
  - Order creation and submission
  - Order status tracking and updates
  - Partial fill handling
  - Smart order routing
  - Execution algorithms (TWAP, VWAP)
- Position Management
  - Real-time position calculation
  - P&L tracking (realized/unrealized)
  - Portfolio aggregation
  - Position reconciliation
  - Margin calculation
- Risk Management
  - Pre-trade risk checks
  - Position limit enforcement
  - Drawdown monitoring
  - Circuit breaker logic
  - Stop-loss and alerts
  - Exposure calculation
Technology Stack:
- Framework: Django 4.x + Django REST Framework
- Database: PostgreSQL (strategies, orders, positions, risk)
- Cache: Redis (order status, positions, risk limits)
- Libraries: NumPy, Pandas, TA-Lib
- Tools: Ruff, Pylance, uv
API Endpoints:
Strategies:
POST /api/v1/strategies
GET /api/v1/strategies/{id}
PUT /api/v1/strategies/{id}/parameters
GET /api/v1/strategies/{id}/signals
GET /api/v1/strategies/{id}/performance
Orders:
POST /api/v1/orders
GET /api/v1/orders/{id}
PUT /api/v1/orders/{id}/cancel
GET /api/v1/orders/active
GET /api/v1/fills
Positions:
GET /api/v1/positions
GET /api/v1/positions/{symbol}
GET /api/v1/portfolio/pnl
POST /api/v1/positions/reconcile
Risk:
POST /api/v1/risk/check
GET /api/v1/risk/limits
PUT /api/v1/risk/limits
GET /api/v1/risk/violations
POST /api/v1/risk/circuit-breaker
Data Models:
- Strategy, StrategyVersion, Signal, StrategyParameter
- Order, Fill, ExecutionReport, OrderStatus
- Position, Portfolio, PnLSnapshot, PositionHistory
- RiskLimit, Exposure, Violation, CircuitBreaker
Integration:
- Subscribes: Market data events (from Data Service)
- Publishes: Order events, fill events, position updates, risk violations
- Calls: Exchange Service (for order submission)
- Calls: Core Service (for auth, config)
Scaling:
- Vertical for low-latency execution path
- Horizontal for strategy processing
- Redis for real-time state
Source Code (from monolith):
- src/trading/ (entire directory)
- src/backtesting/strategy*.py (strategy framework)
- Risk management logic
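Of the execution algorithms listed for this service, TWAP is the simplest to illustrate: a parent order is split into equal child orders spread evenly over a time window. The sketch below is only an illustration; the function name, the four-decimal lot precision, and the choice to put the rounding remainder on the last slice are assumptions rather than the existing engine's behavior.

```python
from decimal import Decimal


def twap_slices(total_qty: Decimal, num_slices: int, duration_seconds: int):
    """Return (offset_seconds, quantity) pairs splitting an order evenly over time."""
    if num_slices <= 0:
        raise ValueError("num_slices must be positive")
    # Hypothetical lot precision of 0.0001; a real exchange defines its own step size.
    base = (total_qty / num_slices).quantize(Decimal("0.0001"))
    interval = duration_seconds // num_slices
    slices = [(i * interval, base) for i in range(num_slices)]
    # Put the rounding remainder on the last slice so quantities sum to total_qty.
    remainder = total_qty - base * num_slices
    last_offset, last_qty = slices[-1]
    slices[-1] = (last_offset, last_qty + remainder)
    return slices


schedule = twap_slices(Decimal("1"), num_slices=3, duration_seconds=300)
```

Using Decimal rather than float keeps the child quantities summing exactly to the parent quantity, which matters when reconciling fills against the original order.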
4.3 Data Service (dsta-data-svc)
Responsibility: Market data, analytics, and backtesting
Capabilities:
- Market Data Management
  - Exchange data ingestion (REST + WebSocket)
  - OHLCV data storage and retrieval
  - Order book management
  - Data quality validation
  - Gap detection and filling
- Analytics & Reporting
  - Performance metrics calculation
  - Equity curve generation
  - Trade analysis and attribution
  - Correlation analysis
  - Sharpe ratio, drawdown, etc.
  - Report generation and scheduling
- Backtesting Engine
  - Event-driven backtesting
  - Monte Carlo simulation
  - Parameter optimization
  - Walk-forward analysis
  - Bias detection and snooping prevention
Technology Stack:
- Framework: Django 4.x + Django REST Framework
- Database: TimescaleDB (time-series OHLCV data)
- Database: PostgreSQL (backtest results, reports)
- Cache: Redis (recent data, order books)
- Storage: S3/MinIO (charts, PDFs, detailed reports)
- Queue: Celery (async ingestion, report generation)
- Libraries: Matplotlib, Pandas, NumPy, SciPy, TA-Lib
- Tools: Ruff, Pylance, uv
API Endpoints:
Market Data:
GET /api/v1/market-data/ohlcv/{symbol}
GET /api/v1/market-data/orderbook/{symbol}
WS /api/v1/market-data/stream/{symbol}
POST /api/v1/market-data/sync
GET /api/v1/market-data/gaps
Analytics:
GET /api/v1/analytics/performance
GET /api/v1/analytics/equity-curve
GET /api/v1/analytics/reports/{id}
POST /api/v1/analytics/reports/generate
GET /api/v1/analytics/correlation
GET /api/v1/analytics/sharpe
Backtesting:
POST /api/v1/backtest/run
GET /api/v1/backtest/{id}/results
POST /api/v1/backtest/optimize
GET /api/v1/backtest/{id}/report
POST /api/v1/backtest/monte-carlo
Data Models:
- Candlestick, OrderBook, Ticker, DataGap
- PerformanceMetric, Report, EquityCurve, AnalysisJob
- BacktestJob, BacktestResult, OptimizationRun, Trade
Integration:
- Publishes: Market data events (to Trading Service, ML Service)
- Subscribes: Fill events, position updates (for analytics)
- Calls: Exchange Service (for data ingestion)
- Calls: Core Service (for auth, config)
Scaling:
- Horizontal ingestion workers
- TimescaleDB compression for historical data
- Parallel backtest execution
- Celery workers for report generation
Source Code (from monolith):
- src/api/data_*.py (data management)
- src/analytics/ (entire directory)
- src/backtesting/ (entire directory)
- src/data/ (data utilities)
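Gap detection, which backs the GET /api/v1/market-data/gaps endpoint, can be pictured as a scan over candle open times for jumps larger than the candle interval. The following is a rough sketch under that assumption; find_gaps and its return shape are hypothetical and do not describe the existing src/data/ code.

```python
from datetime import datetime, timedelta, timezone


def find_gaps(timestamps, interval):
    """Return (first_missing, last_missing) open times for each gap in a candle series."""
    gaps = []
    ordered = sorted(timestamps)
    for prev, curr in zip(ordered, ordered[1:]):
        # A jump larger than one interval means at least one candle is missing.
        if curr - prev > interval:
            gaps.append((prev + interval, curr - interval))
    return gaps


# Usage: one-minute candles with the 00:03 and 00:04 candles missing.
start = datetime(2026, 1, 1, tzinfo=timezone.utc)
minute = timedelta(minutes=1)
candles = [start, start + minute, start + 2 * minute, start + 5 * minute]
missing = find_gaps(candles, minute)
```

Each returned pair gives the range that a backfill job would need to request from the Exchange Service.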
4.4 ML Service (dsta-ml-svc)
Responsibility: Machine learning and AI-driven predictions
Capabilities:
- Feature Engineering
  - Technical indicator calculation
  - Feature extraction and selection
  - Feature storage and versioning
- Model Training
  - Regime detection models
  - Reinforcement learning agents
  - Price prediction models
  - Model versioning and deployment
- Prediction Serving
  - Real-time predictions
  - Batch predictions
  - Model performance monitoring
  - A/B testing framework
- ML Environments
  - Training environments
  - Simulation environments
  - Model evaluation
Technology Stack:
- Framework: Django 4.x + Django REST Framework
- Database: PostgreSQL (model metadata, features)
- Storage: S3/MinIO (trained models, datasets)
- ML Libraries: scikit-learn, TensorFlow/PyTorch, Stable-Baselines3
- Tools: Ruff, Pylance, uv
- Hardware: GPU support for training
API Endpoints:
Models:
POST /api/v1/ml/models
GET /api/v1/ml/models
GET /api/v1/ml/models/{id}
DELETE /api/v1/ml/models/{id}
Training:
POST /api/v1/ml/train
GET /api/v1/ml/training/{id}/status
GET /api/v1/ml/training/{id}/metrics
Predictions:
POST /api/v1/ml/predict
POST /api/v1/ml/predict/batch
GET /api/v1/ml/predictions/{id}
Features:
GET /api/v1/ml/features/{symbol}
POST /api/v1/ml/features/extract
GET /api/v1/ml/features/importance
Data Models:
- MLModel, ModelVersion, ModelMetrics
- Feature, FeatureSet, FeatureImportance
- Prediction, TrainingJob, Environment
Integration:
- Subscribes: Market data events (for feature extraction)
- Publishes: Prediction signals (to Trading Service)
- Reads: Historical data from Data Service
- Calls: Core Service (for auth, config)
Scaling:
- GPU workers for model training
- Horizontal inference serving
- Feature caching in Redis
- Async training jobs via Celery
Source Code (from monolith):
- src/ml/ (entire directory)
- Feature engineering logic
- RL environments
4.5 Exchange Service (dsta-exchange-svc)
Responsibility: Unified exchange integration layer
Capabilities:
- Multi-Exchange Adapters
  - Binance integration
  - Gate.io integration
  - Huobi integration
  - Unified API interface
- Connection Management
  - REST API client pooling
  - WebSocket connection management
  - Connection health monitoring
  - Automatic reconnection
- Rate Limiting
  - Per-exchange rate limiting
  - Request throttling
  - Queue management
- Resilience
  - Circuit breaker pattern
  - Retry logic with exponential backoff
  - Failover handling
  - Error normalization
Technology Stack:
- Framework: Django 4.x + Django REST Framework
- Cache: Redis (rate limits, connection state)
- Libraries: httpx, aiohttp, websockets
- Tools: Ruff, Pylance, uv
API Endpoints:
Orders:
POST /api/v1/exchange/{name}/order
GET /api/v1/exchange/{name}/order/{id}
POST /api/v1/exchange/{name}/cancel
POST /api/v1/exchange/{name}/batch-cancel
Account:
GET /api/v1/exchange/{name}/balance
GET /api/v1/exchange/{name}/account
Market Data:
GET /api/v1/exchange/{name}/ticker/{symbol}
GET /api/v1/exchange/{name}/orderbook/{symbol}
WS /api/v1/exchange/{name}/stream
Health:
GET /api/v1/exchange/{name}/status
GET /api/v1/exchange/health
Data Models:
- ExchangeConfig, ExchangeCredentials
- RateLimit, ConnectionStatus
- APIRequest, APIResponse (logging)
Integration:
- Publishes: Market data, order updates, exchange events
- External: Binance, Gate.io, Huobi APIs
- Calls: Core Service (for credentials, config)
Scaling:
- Horizontal scaling with connection pooling
- Redis for distributed rate limiting
- WebSocket connection per instance
Source Code (from monolith):
- src/api/exchanges/ (exchange clients)
- src/huobi/ (entire Huobi integration)
- src/api/orderbook.py
- src/api/websockets/
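The "retry logic with exponential backoff" resilience item can be sketched as follows. This is a generic illustration rather than the adapters' actual code: the helper names, the default base delay and cap, the "full jitter" strategy, and the set of retryable exceptions are all assumptions.

```python
import random
import time


def backoff_delays(count, base=0.5, cap=30.0, jitter=True):
    """Yield `count` delays that double each time, capped at `cap` seconds."""
    for attempt in range(count):
        delay = min(cap, base * (2 ** attempt))
        # Full jitter spreads retries out so many clients do not retry in lockstep.
        yield random.uniform(0, delay) if jitter else delay


def call_with_retry(func, max_attempts=5,
                    retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func(), retrying on transient errors and re-raising the final one."""
    delays = list(backoff_delays(max_attempts - 1))
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            sleep(delays[attempt])
```

The sleep parameter is injectable so the retry loop can be exercised in tests without real waiting. Retrying order submissions this way is only safe when the request is idempotent, for example by sending a client-generated order ID.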
5. Technology Stack
5.1 Core Technology Choices
Programming & Frameworks
- Language: Python 3.12+
- Web Framework: Django 4.x + Django REST Framework
- ASGI Server: Uvicorn (for WebSocket support)
- Task Queue: Celery 5.x
- Linting: Ruff (fast, comprehensive)
- Type Checking: Pylance (VSCode) + MyPy
- Package Manager: uv (ultra-fast, reliable)
Databases
- Relational: PostgreSQL 15+ (users, config, strategies)
- Time-Series: TimescaleDB (market data, metrics)
- Cache: Redis 7.x (sessions, queues, pub/sub)
- Object Storage: MinIO / S3 (reports, models, logs)
Message Broker
- Primary: RabbitMQ 3.x (reliable, Django-friendly)
- Alternative: Apache Kafka (high-throughput option)
API Gateway
- Option 1: Kong (preferred, rich plugin ecosystem)
- Option 2: Traefik (cloud-native, simple)
Observability
- Logging: Loki + Promtail
- Metrics: Prometheus + Grafana
- Tracing: Jaeger / Tempo
- APM: Django Debug Toolbar (dev), Sentry (prod)
CI/CD
- Platform: GitHub Actions
- Container Registry: GitHub Container Registry (GHCR)
- Deployment: Docker Compose (dev), Kubernetes (prod)
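5.2 Development Tools
# pyproject.toml
[tool.ruff]
line-length = 120
target-version = "py312"

# Rule selection and isort settings live under [tool.ruff.lint] in current Ruff releases
[tool.ruff.lint]
select = ["E", "F", "I", "N", "W", "UP", "B", "C4", "DTZ", "T20", "RET", "SIM"]

[tool.ruff.lint.isort]
known-first-party = ["api", "dsta", "auth", "config", "notification"]
known-third-party = ["django", "rest_framework", "celery"]

# Pylance reads Pyright settings; there is no [tool.pylance] table
[tool.pyright]
typeCheckingMode = "basic"
reportMissingTypeStubs = false

# Declared as an extra so that `uv pip install -e ".[dev]"` resolves it;
# pytest-cov is needed for the `pytest --cov` invocations used in this plan
[project.optional-dependencies]
dev = ["ruff>=0.2.0", "mypy>=1.8", "pytest>=7.4", "pytest-django>=4.7", "pytest-cov>=4.1"]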
5.3 Service Template Structure
Each microservice follows this standardized structure:
service-name/
├── src/
│ ├── api/ # API endpoints
│ ├── models/ # Database models
│ ├── services/ # Business logic
│ ├── tasks/ # Celery tasks
│ ├── serializers/ # DRF serializers
│ └── config/ # Service configuration
├── tests/
│ ├── unit/
│ ├── integration/
│ └── e2e/
├── deploy/
│ ├── Dockerfile
│ ├── docker-compose.yml
│ └── k8s/
├── .github/
│ └── workflows/
│ ├── ci.yml
│ ├── deploy.yml
│ └── security.yml
├── pyproject.toml # uv + ruff + mypy config
├── requirements.txt # Generated by uv
└── README.md
6. Data Architecture
6.1 Database Per Service Pattern
Each service owns its database schema:
┌─────────────────┐ ┌──────────────────┐
│ auth-service │────▶│ PostgreSQL (DB1) │
└─────────────────┘ │ - users │
│ - roles │
│ - permissions │
└──────────────────┘
┌─────────────────┐ ┌──────────────────┐
│strategy-service │────▶│ PostgreSQL (DB2) │
└─────────────────┘ │ - strategies │
│ - signals │
│ - parameters │
└──────────────────┘
┌─────────────────┐ ┌──────────────────┐
│market-data-svc │────▶│ TimescaleDB │
└─────────────────┘ │ - candlesticks │
│ - order_books │
│ - tickers │
└──────────────────┘
6.2 Data Consistency Strategies
Strong Consistency: Within service boundaries (ACID transactions)
Eventual Consistency: Cross-service (event-driven updates)
Saga Pattern: Distributed transactions (e.g., order placement)
# Example: Order placement saga
1. Risk Service → Pre-trade check
2. Execution Service → Create order (pending)
3. Exchange Gateway → Submit to exchange
4. [Success] Execution Service → Update order (active)
5. [Success] Position Service → Update position
6. [Failure] Compensation: Cancel order, revert state
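The saga steps above can be sketched as a small orchestrator that runs each step in order and, on failure, invokes the compensations of the steps that already completed, in reverse. This is a simplified in-process illustration; run_saga, SagaFailed, and the (name, action, compensation) step shape are hypothetical, and in the target architecture the steps would be cross-service calls or events rather than local functions.

```python
class SagaFailed(Exception):
    """Raised after a step fails and completed steps have been compensated."""


def run_saga(steps, context):
    """Run (name, action, compensation) steps; undo completed ones on failure."""
    completed = []
    for name, action, compensate in steps:
        try:
            action(context)
        except Exception as exc:
            # Compensate in reverse order so later effects are undone first.
            for _done_name, undo in reversed(completed):
                undo(context)
            raise SagaFailed(f"step {name!r} failed: {exc}") from exc
        completed.append((name, compensate))
    return context


def create_order(ctx):
    ctx["status"] = "pending"


def cancel_order(ctx):
    ctx["status"] = "cancelled"


def submit_to_exchange(ctx):
    # Simulated failure standing in for an exchange rejection.
    raise RuntimeError("exchange rejected order")


order_steps = [
    ("risk_check", lambda ctx: ctx.update(risk_ok=True), lambda ctx: None),
    ("create_order", create_order, cancel_order),
    ("submit_to_exchange", submit_to_exchange, lambda ctx: None),
]
```

Running run_saga(order_steps, {}) raises SagaFailed and leaves the order marked as cancelled, mirroring step 6. The sketch does not handle a compensation that itself fails, which a production saga would need to retry or escalate.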
6.3 Data Migration Strategy
Phase 1: Logical separation (same DB, different schemas)
Phase 2: Physical separation (different DB instances)
- Use Django's database routing
- Foreign key relationships become API calls
- Implement eventual consistency
Phase 3: Polyglot persistence (optimal DB per service)
- TimescaleDB for time-series data
- PostgreSQL for relational data
- Redis for hot data
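For Phase 2, Django's database routing is driven by a router class listed in the DATABASE_ROUTERS setting. A minimal sketch follows; the router method names are Django's documented interface, while the class name, the app labels, and the database aliases in APP_TO_DB are placeholder assumptions for illustration.

```python
class ServiceRouter:
    """Route each Django app's models to the database owned by its service."""

    # Hypothetical mapping of app labels to aliases defined in settings.DATABASES.
    APP_TO_DB = {
        "trading": "trading_db",
        "analytics": "data_db",
        "ml": "ml_db",
    }

    def db_for_read(self, model, **hints):
        # Returning None lets Django fall through to the default database.
        return self.APP_TO_DB.get(model._meta.app_label)

    def db_for_write(self, model, **hints):
        return self.APP_TO_DB.get(model._meta.app_label)

    def allow_relation(self, obj1, obj2, **hints):
        db1 = self.APP_TO_DB.get(obj1._meta.app_label)
        db2 = self.APP_TO_DB.get(obj2._meta.app_label)
        if db1 and db2:
            # Cross-database foreign keys are not allowed; they become API calls.
            return db1 == db2
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        target = self.APP_TO_DB.get(app_label)
        if target is None:
            return None  # No opinion for apps this router does not manage.
        return db == target
```

With this in place, a setting such as DATABASE_ROUTERS = ["dsta.routers.ServiceRouter"] (module path assumed) would activate it, and migrations for each mapped app would only apply to its own database.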
7. Communication Patterns
7.1 Synchronous Communication (REST)
Use Cases:
- Request-response patterns
- Data retrieval
- Pre-trade validation
Implementation:
- Django REST Framework
- OpenAPI specification
- Retries via tenacity; circuit breaker pattern via a library such as pybreaker or a custom implementation
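# Example: Risk check before order
import requests

# order_data is the order payload assembled by the caller.
# The path follows the Risk endpoints listed in Section 4.2.
response = requests.post(
    "http://risk-service/api/v1/risk/check",
    json={"order": order_data},
    timeout=2.0,
)
if response.status_code == 200 and response.json()["approved"]:
    pass  # Proceed with order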
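A custom circuit breaker for calls like the risk check above might look like the sketch below. It is a deliberately small, single-threaded illustration; the class name, thresholds, and half-open behavior are assumptions, and a production version would also need locking and metrics.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed.

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open if the trial call failed.
            if self._failures >= self.failure_threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap the HTTP request, for example breaker.call(requests.post, url, json=payload, timeout=2.0), and treat CircuitOpenError as an immediate rejection instead of waiting for another timeout.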
7.2 Asynchronous Communication (Events)
Use Cases:
- Market data distribution
- Order fill notifications
- Position updates
Implementation:
- RabbitMQ with topic exchanges
- Celery for task processing
- Event schema versioning
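# Example: Publish fill event
from kombu import Connection, Exchange, Queue

fill_exchange = Exchange('fills', type='topic')
fill_queue = Queue('position-service.fills', exchange=fill_exchange, routing_key='fill.#')

# Publisher (execution-service); BROKER_URL is an assumed setting such as amqp://rabbitmq:5672//
with Connection(BROKER_URL) as conn:
    producer = conn.Producer(serializer='json')
    producer.publish(
        {'order_id': '123', 'qty': 1.5, 'price': 50000},
        exchange=fill_exchange,
        routing_key='fill.BTCUSDT',
        declare=[fill_queue],
    )

# Consumer (position-service). A plain Celery task does not receive raw
# messages from a custom exchange, so this uses a kombu consumer instead.
def handle_fill(body, message):
    update_position(body)
    message.ack()

with Connection(BROKER_URL) as conn:
    with conn.Consumer(fill_queue, callbacks=[handle_fill]):
        while True:
            conn.drain_events()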
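Event schema versioning, listed under Implementation above, is often handled by wrapping each payload in an envelope that carries a version number and upgrading older payloads on the consumer side. The sketch below illustrates that idea only; the envelope fields, the UPGRADERS registry, and the hypothetical version 2 "fee" field are assumptions, not part of the plan.

```python
import uuid
from datetime import datetime, timezone


def make_event(event_type, payload, schema_version=1):
    """Wrap a payload in an envelope that records its schema version."""
    return {
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,
        "schema_version": schema_version,
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }


# Upgraders keyed by (event_type, from_version); each returns the next version's payload.
UPGRADERS = {
    # Hypothetical change: version 2 of "fill" adds a "fee" field defaulting to 0.0.
    ("fill", 1): lambda payload: {**payload, "fee": payload.get("fee", 0.0)},
}
LATEST_VERSION = {"fill": 2}


def upgrade_event(event):
    """Return a copy of the event upgraded step by step to the latest schema."""
    version = event["schema_version"]
    payload = event["payload"]
    latest = LATEST_VERSION.get(event["event_type"], version)
    while version < latest:
        payload = UPGRADERS[(event["event_type"], version)](payload)
        version += 1
    return {**event, "schema_version": version, "payload": payload}


old_event = make_event("fill", {"order_id": "123", "qty": 1.5, "price": 50000})
current_event = upgrade_event(old_event)
```

Consumers that call upgrade_event first only ever handle the latest payload shape, so publishers and consumers can be deployed independently.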
7.3 WebSocket Communication
Use Cases:
- Real-time market data streaming
- Live trading updates
- Dashboard updates
Implementation:
- Django Channels (ASGI)
- Redis as channel layer
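# WebSocket consumer
import json

from channels.generic.websocket import AsyncWebsocketConsumer

class MarketDataConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        await self.channel_layer.group_add("market_data", self.channel_name)
        await self.accept()

    async def disconnect(self, close_code):
        # Leave the group so closed sockets stop receiving updates
        await self.channel_layer.group_discard("market_data", self.channel_name)

    # Handles group messages sent with type "market.update"
    async def market_update(self, event):
        await self.send(text_data=json.dumps(event['data']))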
7.4 API Contracts
All inter-service communication uses versioned contracts:
# openapi.yaml (excerpt)
paths:
  /api/v1/orders:
    post:
      summary: Create order
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/OrderRequest'
      responses:
        '201':
          description: Order created
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/OrderResponse'
8. Migration Strategy
8.1 Migration Approach: Strangler Fig Pattern
Gradually replace monolith components with microservices while maintaining system operation.
Phase 1: Extract peripheral services
Phase 2: Extract data services
Phase 3: Extract core trading services
Phase 4: Decommission monolith
8.2 Step-by-Step Migration Plan
Phase 1: Infrastructure Setup (Weeks 1-2)
- Set up API Gateway (Kong/Traefik)
- Deploy RabbitMQ/Kafka message broker
- Set up TimescaleDB for time-series data
- Configure Redis cluster
- Set up monitoring stack (Prometheus, Grafana, Loki, Jaeger)
- Create service templates and CI/CD pipelines
- Establish branching strategy and deployment process
Why First? Foundation for all services
Phase 2: Extract Core Service (Weeks 3-5)
- Create dsta-core-svc repository and structure
- Migrate authentication and user models from Django auth
- Migrate configuration management from src/dsta/config.py
- Migrate notification logic (Telegram, email)
- Implement JWT token generation and validation
- Set up API Gateway authentication
- Deploy to staging and validate
- Update monolith to use Core Service for auth
Source Files:
src/dsta/settings.py → dsta-core-svc/src/auth/
src/dsta/config.py → dsta-core-svc/src/config/
Telegram integration → dsta-core-svc/src/notifications/
Why Second? Provides authentication/config foundation for all other services
Validation:
- ✓ Users can log in via Core Service
- ✓ All services can validate tokens
- ✓ Configuration updates propagate in real-time
- ✓ Notifications sent successfully
Phase 3: Extract Exchange Service (Weeks 6-8)
- Create dsta-exchange-svc repository
- Consolidate exchange clients (Binance, Gate.io, Huobi)
- Implement unified exchange API interface
- Add connection pooling and rate limiting
- Implement circuit breaker for exchange failures
- Set up WebSocket connection management
- Deploy and validate
Source Files:
src/api/exchanges/ → dsta-exchange-svc/src/adapters/
src/huobi/ → dsta-exchange-svc/src/adapters/huobi/
src/api/websockets/ → dsta-exchange-svc/src/websockets/
src/api/orderbook.py → dsta-exchange-svc/src/orderbook/
Why Third? Isolates external dependencies, shared by Data and Trading services
Validation:
- ✓ Can place/cancel orders on all exchanges
- ✓ WebSocket streams operational
- ✓ Rate limiting working correctly
- ✓ Circuit breaker triggers on exchange failures
Phase 4: Extract Data Service (Weeks 9-12)
- Create dsta-data-svc repository
- Migrate to TimescaleDB for OHLCV data
- Migrate market data ingestion logic
- Migrate analytics and reporting modules
- Migrate backtesting engine
- Set up S3/MinIO for report storage
- Implement market data event publishing
- Deploy and validate
Source Files:
src/api/data_*.py → dsta-data-svc/src/market_data/
src/api/models.py (Candlestick) → dsta-data-svc/src/models/
src/analytics/ → dsta-data-svc/src/analytics/
src/backtesting/ → dsta-data-svc/src/backtesting/
src/data/ → dsta-data-svc/src/data_management/
Why Fourth? High data volume benefits from isolation, needed by Trading and ML
Validation:
- ✓ Historical data accessible via API
- ✓ Real-time WebSocket streams working
- ✓ Backtest execution successful
- ✓ Reports generated and stored
- ✓ Analytics metrics calculated correctly
Phase 5: Extract Trading Service (Weeks 13-17)
- Create dsta-trading-svc repository
- Migrate strategy framework and registry
- Migrate order execution logic
- Migrate position management
- Migrate risk management (limits, circuit breakers)
- Implement event-driven architecture (subscribe to market data)
- Implement saga pattern for order placement
- Deploy and validate with dry-run mode
Source Files:
src/trading/ → dsta-trading-svc/src/
src/backtesting/strategy*.py → dsta-trading-svc/src/strategies/
Why Fifth? Core business logic, depends on Data and Exchange services
Validation:
- ✓ Strategies generate signals correctly
- ✓ Orders placed and tracked successfully
- ✓ Positions calculated in real-time
- ✓ Risk checks blocking invalid orders
- ✓ Circuit breakers trigger correctly
- ✓ Dry-run mode working
Phase 6: Extract ML Service (Weeks 18-21)
- Create dsta-ml-svc repository
- Migrate feature engineering modules
- Migrate ML model training logic
- Set up model versioning and storage
- Implement prediction API
- Add GPU worker support
- Deploy and validate
Source Files:
src/ml/ → dsta-ml-svc/src/
Feature engineering → dsta-ml-svc/src/features/
Model trainers → dsta-ml-svc/src/trainers/
RL environments → dsta-ml-svc/src/environments/
Why Last? Specialized workload, less integrated, can run independently
Validation:
- ✓ Features extracted correctly
- ✓ Models training successfully
- ✓ Predictions serving via API
- ✓ GPU utilization working
Phase 7: Decommission Monolith (Week 22)
- Verify all functionality migrated
- Run parallel testing (monolith vs microservices)
- Switch 100% traffic to microservices
- Monitor for 1 week
- Archive monolith codebase
- Celebrate! 🎉
Final Checks:
- ✓ All API endpoints responding
- ✓ All background jobs running
- ✓ Data consistency verified
- ✓ Performance metrics acceptable
- ✓ No errors in logs
8.3 Risk Mitigation
- Dual-Write Strategy: Write to both monolith and microservice during migration
- Feature Flags: Toggle between old and new implementation
- Canary Deployment: Route 10% traffic to new service, monitor, then scale
- Automated Testing: Maintain test coverage >80% during migration
- Rollback Plan: Keep monolith running for quick rollback
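One common way to combine feature flags with a canary rollout is to hash a stable identifier into a bucket from 0 to 99 and send the request to the new service when the bucket falls below the rollout percentage. The sketch below illustrates this under assumed names; routes_to_new_service and the flag name are hypothetical, and in practice the API gateway may perform this split instead.

```python
import hashlib


def routes_to_new_service(user_id: str, rollout_percent: int,
                          flag_name: str = "use-trading-svc") -> bool:
    """Deterministically assign a user to the canary based on a hash bucket."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent


# Usage: roughly 10% of users are routed to the new service.
canary_users = [uid for uid in (f"user-{i}" for i in range(1000))
                if routes_to_new_service(uid, rollout_percent=10)]
```

Because the bucket depends only on the flag name and user ID, a given user stays on the same side as the percentage grows, and setting the percentage to 0 acts as an immediate rollback.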
9. Development Workflow
9.1 Local Development Setup
Each developer runs services locally using Docker Compose:
# docker-compose.dev.yml
version: '3.9'
services:
  api-gateway:
    image: kong:3.4
    ports: ["8000:8000"]
  auth-service:
    build: ./services/auth-service
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@postgres:5432/auth
      - REDIS_URL=redis://redis:6379/0
  market-data-service:
    build: ./services/market-data-service
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@timescaledb:5432/market_data
  rabbitmq:
    image: rabbitmq:3.12-management
    ports: ["5672:5672", "15672:15672"]
  postgres:
    image: postgres:15
    # The image refuses to start without a password; values match DATABASE_URL above
    environment:
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=auth
  timescaledb:
    image: timescale/timescaledb:latest-pg15
    environment:
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=market_data
  redis:
    image: redis:7-alpine
Start development environment:
# Clone repositories
git clone https://github.com/org/auth-service.git
git clone https://github.com/org/market-data-service.git
# Start all services
docker-compose -f docker-compose.dev.yml up -d
# Run migrations
./scripts/migrate-all.sh
# Access services
curl http://localhost:8000/auth/health
curl http://localhost:8000/market-data/health
9.2 Service Development Workflow
# 1. Create feature branch
git checkout -b feature/add-websocket-endpoint
# 2. Install dependencies with uv
uv pip install -e ".[dev]"
# 3. Run linting and type checking
ruff check src/
mypy src/
# 4. Run tests
pytest tests/ -v --cov
# 5. Commit with conventional commits
git commit -m "feat: add WebSocket endpoint for real-time data"
# 6. Push and create PR
git push origin feature/add-websocket-endpoint
9.3 Code Review Standards
- All tests pass (unit, integration, E2E)
- Code coverage ≥ 80%
- Ruff linting passes (no warnings)
- MyPy type checking passes
- API contract documented (OpenAPI)
- README updated if needed
- Changelog updated
- Performance regression check
10. Deployment & CI/CD
10.1 GitHub Actions Workflows
Each service repository includes standardized workflows:
Workflow 1: CI Pipeline (.github/workflows/ci.yml)
name: CI
on:
  pull_request:
    branches: [main, develop]
  push:
    branches: [main, develop]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies
        # --system installs into the runner's Python; uv otherwise expects a virtualenv
        run: uv pip install --system -e ".[dev]"
      - name: Run Ruff
        run: ruff check src/ tests/
      - name: Run Mypy
        run: mypy src/
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: test_db
        # Publish the port so steps running on the runner can reach localhost:5432
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: uv pip install --system -e ".[dev]"
      - name: Run migrations
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/test_db
        run: python manage.py migrate
      - name: Run tests
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/test_db
          REDIS_URL: redis://localhost:6379/0
        run: pytest tests/ -v --cov --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Run Bandit
        run: |
          uv pip install --system bandit
          bandit -r src/ -f json -o bandit-report.json
      - name: Run Safety
        run: |
          uv pip install --system safety
          safety check --json
Workflow 2: Deploy Pipeline (.github/workflows/deploy.yml)
name: Deploy
on:
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      environment:
        description: 'Environment to deploy'
        required: true
        type: choice
        options: [staging, production]
jobs:
  build:
    runs-on: ubuntu-latest
    # GITHUB_TOKEN needs package write access to push to GHCR
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ghcr.io/${{ github.repository }}:${{ github.sha }}
            ghcr.io/${{ github.repository }}:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - name: Deploy to staging
        run: |
          # Update docker-compose or K8s deployment
          ssh deploy@staging-server "docker pull ghcr.io/${{ github.repository }}:${{ github.sha }}"
          ssh deploy@staging-server "docker-compose up -d auth-service"
  deploy-production:
    needs: deploy-staging
    if: github.event_name == 'workflow_dispatch' && github.event.inputs.environment == 'production'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Deploy to production
        run: |
          # Updates the image, which triggers a rollout; the blue-green traffic switch is not shown here
          ssh deploy@prod-server "kubectl set image deployment/auth-service auth-service=ghcr.io/${{ github.repository }}:${{ github.sha }}"
10.2 Deployment Strategy¶
Staging: Auto-deploy on merge to main
Production: Manual approval + blue-green deployment
main branch
│
├─▶ Build & Test
│
├─▶ Deploy to Staging
│
├─▶ Automated Tests (E2E, smoke)
│
├─▶ Manual Approval
│
└─▶ Deploy to Production (blue-green)
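The final two stages of this flow can be sketched as a small cutover routine. This is only an illustration under stated assumptions: `deploy`, `smoke_test`, and `switch_traffic` are hypothetical hooks standing in for the team's real deployment scripts, and the two environments are simply named "blue" and "green".

```python
from typing import Callable


def blue_green_release(
    active: str,
    image: str,
    deploy: Callable[[str, str], None],      # hypothetical hook: deploy(color, image)
    smoke_test: Callable[[str], bool],       # hypothetical hook: smoke_test(color) -> passed?
    switch_traffic: Callable[[str], None],   # hypothetical hook: switch_traffic(color)
) -> str:
    """Deploy `image` to the idle color and cut traffic over only if smoke tests pass.

    Returns the color that serves traffic afterwards.
    """
    if active not in ("blue", "green"):
        raise ValueError(f"unknown color: {active!r}")
    idle = "green" if active == "blue" else "blue"

    deploy(idle, image)
    if not smoke_test(idle):
        # Leave traffic on the current color; the idle stack can be inspected or torn down.
        return active

    switch_traffic(idle)
    return idle
```

Because the previously active color is left untouched, rolling back amounts to calling `switch_traffic` with the old color.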
10.3 Service Dependencies¶
Deployment order based on dependencies:
1. Infrastructure (PostgreSQL, Redis, RabbitMQ, API Gateway)
2. Config Service (all services depend on it)
3. Auth Service (authentication for all)
4. Notification Service (independent)
5. Exchange Gateway (external dependencies)
6. Market Data Service (depends on Exchange Gateway)
7. Risk Service (independent business logic)
8. Execution Service (depends on Risk, Exchange Gateway)
9. Position Service (depends on Execution events)
10. Strategy Service (depends on Market Data)
11. Analytics Service (depends on Position, Execution)
12. Backtest Service (depends on Market Data, Strategy)
13. ML Service (depends on Market Data)
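A deployment order like this can be derived mechanically from the dependencies rather than maintained by hand. The sketch below encodes the dependencies stated in the list (the short component names are illustrative shorthand, not identifiers from the codebase) and uses the standard library's `graphlib` to produce one valid order.

```python
from graphlib import TopologicalSorter

# Each key maps to the components that must be deployed before it.
# "config" is added everywhere because all services depend on the Config Service.
DEPENDS_ON = {
    "infrastructure": set(),
    "config": {"infrastructure"},
    "auth": {"config"},
    "notification": {"config"},
    "exchange-gateway": {"config"},
    "market-data": {"config", "exchange-gateway"},
    "risk": {"config"},
    "execution": {"config", "risk", "exchange-gateway"},
    "position": {"config", "execution"},
    "strategy": {"config", "market-data"},
    "analytics": {"config", "position", "execution"},
    "backtest": {"config", "market-data", "strategy"},
    "ml": {"config", "market-data"},
}


def deployment_order(depends_on):
    """Return one valid deployment order; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = deployment_order(DEPENDS_ON)
print(" -> ".join(order))
```

Several orders can be valid at once (for example, Risk and Notification have no mutual dependency), so the output may differ from the numbered list while still respecting every dependency.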
11. Monitoring & Observability¶
11.1 Observability Stack¶
   ┌────────────────────────────────────────┐
   │           Grafana Dashboards           │
   │        (Unified visualization)         │
   └────────────────────────────────────────┘
         │              │              │
         ▼              ▼              ▼
┌──────────────┐ ┌─────────────┐ ┌──────────────┐
│  Prometheus  │ │    Loki     │ │    Jaeger    │
│  (Metrics)   │ │   (Logs)    │ │   (Traces)   │
└──────────────┘ └─────────────┘ └──────────────┘
         │              │              │
         └──────────────┼──────────────┘
                        │
         ┌──────────────┴──────────────┐
         │                             │
   Microservices                  API Gateway
  (instrumented)
11.2 Metrics Collection¶
Each service exposes Prometheus metrics:
# src/monitoring.py
from prometheus_client import Counter, Histogram, Gauge
# Request metrics
http_requests_total = Counter(
'http_requests_total',
'Total HTTP requests',
['method', 'endpoint', 'status']
)
http_request_duration = Histogram(
'http_request_duration_seconds',
'HTTP request duration',
['method', 'endpoint']
)
# Business metrics
orders_created = Counter('orders_created_total', 'Orders created')
orders_filled = Counter('orders_filled_total', 'Orders filled')
positions_value = Gauge('positions_value_usd', 'Total position value USD')
Expose at /metrics endpoint:
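# urls.py
from django.http import HttpResponse
from django.urls import path
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest


def metrics_view(request):
    # make_asgi_app() returns an ASGI app, not a Django view, so render the registry directly
    return HttpResponse(generate_latest(), content_type=CONTENT_TYPE_LATEST)


urlpatterns = [
    # ... other paths
    path('metrics/', metrics_view),
]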
11.3 Logging Strategy¶
Structured logging with JSON format:
import structlog
logger = structlog.get_logger()
logger.info(
"order_created",
order_id="123",
symbol="BTCUSDT",
quantity=1.5,
price=50000,
user_id="user-456"
)
Log aggregation with Loki:
# promtail-config.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: services
    static_configs:
      - targets:
          - localhost
        labels:
          job: dsta-services
          __path__: /var/log/services/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            service: service
            message: message
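The Promtail pipeline above expects every log line to be a JSON object with `level`, `service`, and `message` keys. As a minimal, standard-library-only sketch of a formatter that emits that shape (the `JsonFormatter` class, the `build_logger` helper, and the `context` convention for extra fields are illustrative assumptions, not existing DSTA code):

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with level, service, and message keys."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname.lower(),
            "service": self.service,
            "message": record.getMessage(),
        }
        # Optional structured fields passed as extra={"context": {...}} are merged in.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def build_logger(service: str) -> logging.Logger:
    """Return a logger that writes one JSON line per event to stderr."""
    logger = logging.getLogger(service)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = build_logger("dsta-trading-svc")
logger.info("order_created", extra={"context": {"order_id": "123", "symbol": "BTCUSDT"}})
```

Whichever logging library a service uses, the key names it emits need to match the `expressions` in the Promtail `json` stage, otherwise the labels are silently left empty.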
11.4 Distributed Tracing¶
OpenTelemetry integration:
# src/tracing.py
from opentelemetry import trace
# Note: the Jaeger Thrift exporter is deprecated upstream in favor of OTLP exporters
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider()
jaeger_exporter = JaegerExporter(
    agent_host_name="jaeger",
    agent_port=6831,
)
provider.add_span_processor(BatchSpanProcessor(jaeger_exporter))
trace.set_tracer_provider(provider)

# Usage
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("process_order"):
    # Process order logic
    with tracer.start_as_current_span("check_risk"):
        # Risk check
        pass
    with tracer.start_as_current_span("submit_to_exchange"):
        # Exchange submission
        pass
11.5 Alerting Rules¶
Prometheus alerting rules:
# alerts.yml
groups:
  - name: service_health
    rules:
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.job }} is down"
      - alert: HighErrorRate
        # Share of 5xx responses among all responses, per scrape job (5% threshold)
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
            / sum by (job) (rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate on {{ $labels.job }}"
      - alert: OrderExecutionSlow
        expr: histogram_quantile(0.95, rate(order_execution_duration_seconds_bucket[5m])) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Order execution is slow (p95 > 2s)"
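To make the `OrderExecutionSlow` threshold concrete, the sketch below shows how a p95 can be estimated from cumulative histogram buckets using linear interpolation inside the bucket that contains the target rank. It is a simplified illustration of the idea behind `histogram_quantile`, not Prometheus's implementation, and the bucket counts are made-up sample values.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile `q` from cumulative buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, ending with an upper bound of math.inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Falls in the +Inf bucket (or an empty one): report the last known bound.
                return prev_bound
            # Linear interpolation between the bucket's lower and upper bounds.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical 5-minute snapshot: (le, cumulative count of executions)
observed = [(0.5, 50), (1.0, 80), (2.0, 90), (5.0, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, observed)
should_alert = p95 > 2
print(f"p95 ~ {p95:.2f}s, alert condition met: {should_alert}")
```

The estimate is only as fine as the bucket boundaries, so a bucket edge at or near the 2-second threshold keeps the alert meaningful.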
12. Timeline & Milestones¶
12.1 Project Timeline (22 Weeks)¶
Weeks 1-2: Infrastructure Setup
Weeks 3-5: Extract Core Service (Auth, Config, Notifications)
Weeks 6-8: Extract Exchange Service
Weeks 9-12: Extract Data Service (Market Data, Analytics, Backtesting)
Weeks 13-17: Extract Trading Service (Strategies, Execution, Risk, Positions)
Weeks 18-21: Extract ML Service
Week 22: Decommission Monolith & Go Live
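As a quick sanity check on the arithmetic, the phases can be verified to be contiguous and to add up to the stated 22 weeks. The helper name `check_timeline` is illustrative.

```python
# (phase, first week, last week) as listed in the timeline
PHASES = [
    ("Infrastructure Setup", 1, 2),
    ("Extract Core Service", 3, 5),
    ("Extract Exchange Service", 6, 8),
    ("Extract Data Service", 9, 12),
    ("Extract Trading Service", 13, 17),
    ("Extract ML Service", 18, 21),
    ("Decommission Monolith & Go Live", 22, 22),
]


def check_timeline(phases, total_weeks):
    """Return per-phase durations after checking the phases are contiguous and span total_weeks."""
    expected_start = 1
    durations = {}
    for name, start, end in phases:
        if start != expected_start or end < start:
            raise ValueError(f"gap or overlap before {name!r}")
        durations[name] = end - start + 1
        expected_start = end + 1
    if expected_start - 1 != total_weeks:
        raise ValueError("phases do not cover the full timeline")
    return durations


durations = check_timeline(PHASES, total_weeks=22)
for name, weeks in durations.items():
    print(f"{weeks:>2} weeks  {name}")
```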
12.2 Key Milestones¶
| Milestone | Week | Deliverables | Success Criteria |
|---|---|---|---|
| M1: Foundation | 2 | Infrastructure, CI/CD, Monitoring | All tools deployed, templates ready |
| M2: Core Service | 5 | Auth, Config, Notifications | Authentication working, config centralized |
| M3: Exchange Service | 8 | Multi-exchange gateway | All exchanges accessible, rate limiting working |
| M4: Data Service | 12 | Market data, Analytics, Backtesting | Data flowing, backtests running |
| M5: Trading Service | 17 | Strategies, Execution, Risk, Positions | Live trading operational (dry-run) |
| M6: ML Service | 21 | ML models, Predictions | Models deployed, predictions serving |
| M7: Go Live | 22 | Monolith decommissioned | 100% traffic on microservices |
12.3 Resource Requirements¶
Team Composition:
- 2 Backend Engineers (Python/Django)
- 1 DevOps Engineer (Docker, K8s, CI/CD)
- 1 QA Engineer (Testing, automation)
- 1 Tech Lead (Architecture, code review)
Part-time:
- 1 Security Engineer (weeks 1-6, 28-30)
- 1 Data Engineer (weeks 7-11, 27-29)
Appendix A: Service Communication Matrix¶
| Service | Calls (Sync) | Publishes (Async) | Subscribes (Async) |
|---|---|---|---|
| dsta-core-svc | - | Config updates, User events | - |
| dsta-trading-svc | Core (auth, config), Exchange (orders), Data (market data API) | Order events, Fill events, Position updates, Risk violations | Market data events |
| dsta-data-svc | Core (auth, config), Exchange (market data) | Market data events, Report completion | Fill events, Position updates |
| dsta-ml-svc | Core (auth, config), Data (historical data) | Prediction signals | Market data events |
| dsta-exchange-svc | Core (credentials, config) | Market data events, Order updates, Exchange status | - |
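A matrix like this is easy to keep honest if it is also held as data and checked automatically. The sketch below transcribes the asynchronous columns of the table and reports subscriptions that no other service publishes, as well as published events that nobody subscribes to. The function names are illustrative.

```python
MATRIX = {
    "dsta-core-svc": {
        "publishes": {"Config updates", "User events"},
        "subscribes": set(),
    },
    "dsta-trading-svc": {
        "publishes": {"Order events", "Fill events", "Position updates", "Risk violations"},
        "subscribes": {"Market data events"},
    },
    "dsta-data-svc": {
        "publishes": {"Market data events", "Report completion"},
        "subscribes": {"Fill events", "Position updates"},
    },
    "dsta-ml-svc": {
        "publishes": {"Prediction signals"},
        "subscribes": {"Market data events"},
    },
    "dsta-exchange-svc": {
        "publishes": {"Market data events", "Order updates", "Exchange status"},
        "subscribes": set(),
    },
}


def orphaned_subscriptions(matrix):
    """Return {service: events} for subscriptions that no other service publishes."""
    orphans = {}
    for service, row in matrix.items():
        published_elsewhere = set().union(
            *(other["publishes"] for name, other in matrix.items() if name != service)
        )
        missing = row["subscribes"] - published_elsewhere
        if missing:
            orphans[service] = missing
    return orphans


def unconsumed_events(matrix):
    """Return events that are published but have no subscriber in the matrix."""
    published = set().union(*(row["publishes"] for row in matrix.values()))
    subscribed = set().union(*(row["subscribes"] for row in matrix.values()))
    return published - subscribed


print("orphaned subscriptions:", orphaned_subscriptions(MATRIX))
print("unconsumed events:", sorted(unconsumed_events(MATRIX)))
```

Events without a subscriber are not necessarily a mistake (they may feed external consumers or audit logs), but listing them makes that a deliberate decision.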
Appendix B: Database Schema Migration¶
Example: Candlestick Model Migration¶
Before (Monolith):
# src/api/models.py
from django.db import models


class Candlestick(models.Model):
    exchange = models.CharField(max_length=20)
    symbol = models.CharField(max_length=20)
    timestamp = models.DateTimeField()
    # ... OHLCV fields
After (Market Data Service):
# services/market-data-service/src/models.py
from django.db import models


class Candlestick(models.Model):
    exchange = models.CharField(max_length=20)
    symbol = models.CharField(max_length=20)
    timestamp = models.DateTimeField()
    # ... OHLCV fields

    class Meta:
        db_table = 'candlesticks'  # Same table name
        managed = True
Migration SQL:
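-- Create new database (assumes the candlesticks table is then created there by the service's Django migrations)
CREATE DATABASE market_data_service;
-- Copy data. PostgreSQL cannot address another database as db.table, so copy
-- from a shell with pg_dump/psql (postgres_fdw or dblink are alternatives):
--   pg_dump --data-only -t candlesticks dsta | psql market_data_service
-- Verify: run the same count in each database
SELECT COUNT(*) FROM candlesticks; -- connected to market_data_service
SELECT COUNT(*) FROM candlesticks; -- connected to dsta
-- Should match
-- After verification, drop from monolith (connected to dsta)
DROP TABLE candlesticks;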
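The verification step in this migration can be scripted so the destructive `DROP TABLE` only runs after the counts agree. The sketch below works with any two DB-API connections; in-memory SQLite stands in for the two PostgreSQL databases purely so the example is runnable, and `row_counts_match` is an illustrative helper name.

```python
import re
import sqlite3


def row_counts_match(source_conn, target_conn, table):
    """Compare COUNT(*) for `table` on two DB-API connections.

    Returns (match, source_count, target_count).
    """
    # Table names cannot be bound as parameters, so allow plain identifiers only.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unsafe table name: {table!r}")
    query = f"SELECT COUNT(*) FROM {table}"
    counts = []
    for conn in (source_conn, target_conn):
        cur = conn.cursor()
        cur.execute(query)
        counts.append(cur.fetchone()[0])
        cur.close()
    return counts[0] == counts[1], counts[0], counts[1]


# Demo: SQLite stands in for the monolith (source) and the new service database (target).
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (source, target):
    conn.execute("CREATE TABLE candlesticks (symbol TEXT, ts TEXT)")
    conn.executemany(
        "INSERT INTO candlesticks VALUES (?, ?)",
        [("BTCUSDT", str(i)) for i in range(3)],
    )

ok, n_source, n_target = row_counts_match(source, target, "candlesticks")
print(f"match={ok} source={n_source} target={n_target}")
```

Matching counts are a necessary check rather than a sufficient one; comparing checksums or a sample of rows gives stronger assurance before the source table is dropped.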
Appendix C: Example Service Implementation¶
Minimal Django Service Structure (dsta-core-svc)¶
dsta-core-svc/
├── src/
│ ├── core_service/
│ │ ├── __init__.py
│ │ ├── settings.py
│ │ ├── urls.py
│ │ ├── wsgi.py
│ │ └── asgi.py
│ ├── auth/
│ │ ├── __init__.py
│ │ ├── models.py
│ │ ├── views.py
│ │ ├── serializers.py
│ │ ├── services.py
│ │ └── urls.py
│ ├── config/
│ │ ├── __init__.py
│ │ ├── models.py
│ │ ├── views.py
│ │ ├── serializers.py
│ │ └── urls.py
│ ├── notifications/
│ │ ├── __init__.py
│ │ ├── models.py
│ │ ├── tasks.py
│ │ ├── telegram.py
│ │ └── urls.py
│ └── manage.py
├── tests/
│ ├── test_auth.py
│ ├── test_config.py
│ └── test_notifications.py
├── deploy/
│ ├── Dockerfile
│ ├── docker-compose.yml
│ └── k8s/
├── .github/
│ └── workflows/
│ ├── ci.yml
│ └── deploy.yml
├── pyproject.toml
├── requirements.txt
└── README.md
pyproject.toml:
[project]
name = "dsta-core-svc"
version = "1.0.0"
requires-python = ">=3.12"
dependencies = [
"django>=4.2",
"djangorestframework>=3.14",
"psycopg2-binary>=2.9",
"redis>=5.0",
"celery>=5.3",
"pyjwt>=2.8",
"cryptography>=41.0",
]
[tool.ruff]
line-length = 120
target-version = "py312"
select = ["E", "F", "I", "W", "UP", "B", "C4", "DTZ"]
[tool.ruff.isort]
known-first-party = ["core_service", "auth", "config", "notifications"]
known-third-party = ["django", "rest_framework", "celery"]
[project.optional-dependencies]
dev = [
"ruff>=0.1",
"mypy>=1.8",
"pytest>=7.4",
"pytest-django>=4.7",
"pytest-cov>=4.1",
]
Dockerfile:
FROM python:3.12-slim
WORKDIR /app
# Install uv
RUN pip install uv
# Copy dependencies
COPY requirements.txt .
RUN uv pip install --system -r requirements.txt
# Copy application
COPY src/ ./src/
# Expose port
EXPOSE 8000
# Run migrations and start server
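# The Django project lives in /app/src, so point gunicorn there to make core_service importable
CMD ["sh", "-c", "python src/manage.py migrate && gunicorn --chdir src -w 4 -b 0.0.0.0:8000 core_service.wsgi:application"]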
docker-compose.yml (for local development):
version: '3.9'
services:
  core-service:
    build: .
    ports:
      - "8001:8000"
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@postgres:5432/core_db
      - REDIS_URL=redis://redis:6379/0
      - SECRET_KEY=dev-secret-key-change-in-production
      - DEBUG=True
    depends_on:
      - postgres
      - redis
    volumes:
      - ./src:/app/src
  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=core_db
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    volumes:
      - postgres_data:/var/lib/postgresql/data
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
  celery-worker:
    build: .
    # --workdir makes the core_service package under /app/src importable
    command: celery --workdir src -A core_service worker -l info
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@postgres:5432/core_db
      - REDIS_URL=redis://redis:6379/0
    depends_on:
      - postgres
      - redis
    volumes:
      - ./src:/app/src
volumes:
  postgres_data:
Appendix D: Testing Strategy¶
Testing Pyramid¶
        /\
       /  \         E2E Tests (10%)
      /────\        - Full workflow tests
     /      \       - Cross-service integration
    /────────\      Integration Tests (30%)
   /          \     - API contract tests
  /────────────\    - Database integration
 /              \   Unit Tests (60%)
/________________\  - Business logic
                    - Utilities, helpers
Test Coverage Requirements¶
- Unit Tests: ≥80% coverage per service
- Integration Tests: All API endpoints covered
- E2E Tests: Critical user journeys (order placement, risk check)
Example Test¶
# tests/test_order_execution.py
import pytest
from django.test import TestCase
# Assumption: Position lives next to Order, and the risk exception in trading.exceptions;
# adjust both imports to wherever dsta-trading-svc actually defines them.
from trading.exceptions import RiskViolationError
from trading.models import Order, Position
from trading.services import OrderExecutionService


@pytest.mark.django_db
class TestOrderExecution(TestCase):
    def test_create_order_success(self):
        """Test successful order creation within dsta-trading-svc"""
        service = OrderExecutionService()
        order = service.create_order(
            symbol="BTCUSDT",
            side="BUY",
            quantity=1.0,
            price=50000
        )
        assert order.status == "PENDING"
        assert Order.objects.filter(id=order.id).exists()

    def test_create_order_fails_risk_check(self):
        """Test order rejection by risk management"""
        service = OrderExecutionService()
        with pytest.raises(RiskViolationError):
            service.create_order(
                symbol="BTCUSDT",
                side="BUY",
                quantity=1000000,  # Exceeds limit
                price=50000
            )

    def test_position_update_on_fill(self):
        """Test position calculation after order fill"""
        # Create and fill order
        order = OrderExecutionService().create_order(
            symbol="BTCUSDT", side="BUY", quantity=1.0, price=50000
        )
        order.status = "FILLED"
        order.save()
        # Verify position updated
        position = Position.objects.get(symbol="BTCUSDT")
        assert position.quantity == 1.0
        assert position.avg_price == 50000
Conclusion¶
This microservices plan provides a practical roadmap for transforming DSTA from a monolithic Django application into 5 domain-based microservices. By using larger services that encapsulate complete domains, we achieve a balance between modularity and operational simplicity.
Key Benefits:
- Simpler Operations: 5 services instead of 12+ means easier monitoring and deployment
- Domain Cohesion: Each service contains related functionality (auth + config + notifications in Core)
- Reduced Communication: Fewer inter-service calls compared to fine-grained microservices
- Easier Development: Developers can work within a domain without crossing many service boundaries
Architecture Summary:
1. dsta-core-svc: Auth, Config, Notifications, User Management
2. dsta-trading-svc: Strategies, Execution, Positions, Risk Management
3. dsta-data-svc: Market Data, Analytics, Backtesting, Reporting
4. dsta-ml-svc: Feature Engineering, Model Training, Predictions
5. dsta-exchange-svc: Multi-Exchange Gateway, WebSocket Connections
Migration Timeline: 22 weeks (5.5 months)
Technology Stack: Django 4.x, Ruff, Pylance, uv, GitHub Actions, RabbitMQ, PostgreSQL, TimescaleDB, Redis
Next Steps:
1. Review and approve this plan
2. Set up infrastructure (Weeks 1-2)
3. Begin with Core Service extraction (Weeks 3-5)
4. Follow the phased migration approach
5. Monitor and iterate
Success Metrics:
- Zero downtime during migration
- 99.9% uptime post-migration
- Deployment frequency: 2-3x per week per service
- Lead time for changes: <2 days
- Mean time to recovery: <30 minutes
- Test coverage: ≥80% per service
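The 99.9% uptime target translates into a concrete downtime budget, which is useful when judging it against the 30-minute recovery target. A short worked calculation (the 30-day and 365-day windows are assumptions chosen for illustration):

```python
def allowed_downtime_minutes(availability, period_days):
    """Minutes of downtime permitted by an availability target over a period."""
    return (1 - availability) * period_days * 24 * 60


monthly = allowed_downtime_minutes(0.999, 30)
yearly = allowed_downtime_minutes(0.999, 365)
print(f"{monthly:.1f} min per 30 days, {yearly / 60:.2f} h per year")
# prints "43.2 min per 30 days, 8.76 h per year"
```

With a recovery time close to 30 minutes, a single incident consumes most of a 30-day budget, so the two targets together leave room for roughly one full-length outage per month.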
Document Version: 1.0
Last Updated: 2026-01-28
Status: Ready for Review
Author: DSTA Engineering Team