DSTA Microservices Architecture Plan

Version: 1.0
Date: 2026-01-28
Author: DSTA Engineering Team
Status: Planning & Design Phase


Table of Contents

  1. Executive Summary
  2. Current State Analysis
  3. Target Microservices Architecture
  4. Service Catalog
  5. Technology Stack
  6. Data Architecture
  7. Communication Patterns
  8. Migration Strategy
  9. Development Workflow
  10. Deployment & CI/CD
  11. Monitoring & Observability
  12. Timeline & Milestones

1. Executive Summary

1.1 Purpose

This document outlines the comprehensive plan to transform DSTA (Dr. Strange Trading Analysis) from a monolithic Django application into a modern microservices architecture. The decomposition will enable independent scaling, deployment, and development of trading system components while maintaining system reliability and performance.

1.2 Key Objectives

  • Scalability: Enable independent scaling of data ingestion, trading execution, and analytics
  • Reliability: Isolate failures and improve system resilience through service boundaries
  • Development Velocity: Allow parallel development by multiple teams
  • Technology Flexibility: Enable use of optimal tools per service (Django, FastAPI, etc.)
  • Deployment Agility: Independent deployment with zero-downtime updates

1.3 Architectural Principles

  1. Domain-Driven Design: Services aligned with business domains
  2. API-First: Well-defined contracts using OpenAPI/AsyncAPI
  3. Event-Driven: Asynchronous communication via message broker
  4. Database Per Service: Each service owns its data
  5. Observability: Comprehensive logging, metrics, and tracing
  6. Security: Zero-trust architecture with mTLS and API authentication

2. Current State Analysis

2.1 Existing Architecture

┌─────────────────────────────────────────────┐
│          Django Monolith (src/)             │
├─────────────────────────────────────────────┤
│  ├─ api/          (REST API + WebSockets)   │
│  ├─ trading/      (Bot execution logic)     │
│  ├─ backtesting/  (Backtesting engine)      │
│  ├─ analytics/    (Reports & metrics)       │
│  ├─ ml/           (ML models & training)    │
│  ├─ data/         (Data management)         │
│  ├─ huobi/        (Exchange integration)    │
│  └─ dsta/         (Django core settings)    │
├─────────────────────────────────────────────┤
│  External Dependencies:                     │
│  - PostgreSQL (primary database)            │
│  - Redis (cache + Celery broker)            │
│  - Celery (async tasks)                     │
└─────────────────────────────────────────────┘

2.2 Key Components (413 Python files)

API Layer (src/api/):

  • Django REST Framework endpoints
  • WebSocket connections for real-time data
  • Data validation and serialization
  • Exchange client wrappers (Binance, Gate.io, Huobi)

Trading Engine (src/trading/):

  • Order execution and position management
  • Risk management and circuit breakers
  • Multi-strategy coordination
  • Stop-loss and alert mechanisms

Backtesting (src/backtesting/):

  • Event-driven backtesting engine
  • Performance analytics and reporting
  • Monte Carlo simulation
  • Strategy optimization

Analytics (src/analytics/):

  • Performance metrics and reports
  • Correlation analysis
  • Transaction cost modeling
  • Visualization and charts

Machine Learning (src/ml/):

  • Feature engineering
  • Model training and deployment
  • Regime detection strategies
  • Reinforcement learning agents

Data Management (src/data/):

  • Market data ingestion
  • Data validation and quality checks
  • Gap detection and filling

2.3 Pain Points

  1. Scalability Limitations
      • Cannot scale data ingestion independently from trading logic
      • Single PostgreSQL database becomes bottleneck
      • Celery workers handle mixed workload types

  2. Deployment Challenges
      • Monolithic deployment requires full system restart
      • High risk of system-wide failures
      • Long deployment windows

  3. Development Friction
      • 413 Python files in single codebase
      • Tight coupling between modules
      • Shared dependencies create conflicts
      • Difficult to test components in isolation

  4. Operational Complexity
      • Single failure point can cascade
      • Difficult to identify performance bottlenecks
      • Limited ability to optimize specific components

3. Target Microservices Architecture

3.1 High-Level Architecture

┌─────────────────────────────────────────────────────────────┐
│                    API Gateway (Kong/Traefik)               │
│   Authentication │ Rate Limiting │ Routing │ Load Balancing │
└─────────────────────────────────────────────────────────────┘
        ┌──────────────────────┼──────────────────────┐
        │                      │                      │
        ▼                      ▼                      ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│ dsta-core-svc    │  │dsta-trading-svc  │  │ dsta-data-svc    │
├──────────────────┤  ├──────────────────┤  ├──────────────────┤
│ • Auth           │  │ • Strategies     │  │ • Market Data    │
│ • Config         │  │ • Execution      │  │ • Analytics      │
│ • Notifications  │  │ • Positions      │  │ • Backtesting    │
│ • User Mgmt      │  │ • Risk Mgmt      │  │ • Reporting      │
└──────────────────┘  └──────────────────┘  └──────────────────┘
        │                      │                      │
        └──────────────────────┼──────────────────────┘
        ┌──────────────────────────────────────────────┐
        │     Message Broker (RabbitMQ/Kafka)          │
        │   Events: Orders │ Fills │ Market Data       │
        └──────────────────────────────────────────────┘
        ┌──────────────────────┼──────────────────────┐
        │                      │                      │
        ▼                      ▼                      ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│ dsta-ml-svc      │  │dsta-exchange-svc │  │ Storage Layer    │
├──────────────────┤  ├──────────────────┤  ├──────────────────┤
│ • Feature Eng    │  │ • Binance        │  │ • PostgreSQL     │
│ • Model Training │  │ • Gate.io        │  │ • TimescaleDB    │
│ • Predictions    │  │ • Huobi          │  │ • Redis          │
│ • RL Agents      │  │ • WebSocket Hub  │  │ • S3/MinIO       │
└──────────────────┘  └──────────────────┘  └──────────────────┘

3.2 Service Boundaries & Domains

The system is decomposed into 5 domain-based microservices:

  1. Core Service - User management, authentication, configuration, notifications
  2. Trading Service - Strategies, execution, position management, risk controls
  3. Data Service - Market data, analytics, backtesting
  4. ML Service - Machine learning models, predictions, feature engineering
  5. Exchange Service - Multi-exchange gateway, WebSocket connections

4. Service Catalog

4.1 Core Service (dsta-core-svc)

Responsibility: Platform foundation and cross-cutting concerns

Capabilities:

  • Authentication & Authorization
      • JWT token generation and validation
      • API key management
      • OAuth2 integration
      • Role-based access control (RBAC)
      • Session management

  • Configuration Management
      • Exchange credentials storage (encrypted)
      • Strategy parameters versioning
      • Feature flags management
      • Environment-specific settings
      • Real-time configuration updates

  • Notification System
      • Telegram bot integration
      • Email notifications (SMTP)
      • Webhook delivery
      • Alert prioritization and deduplication
      • Notification templates

  • User Management
      • User registration and profiles
      • Preferences and settings
      • Audit logging

Technology Stack:

  • Framework: Django 4.x + Django REST Framework
  • Database: PostgreSQL (users, config, notifications)
  • Cache: Redis (sessions, config cache, notification queue)
  • Encryption: Fernet/AWS KMS
  • Tools: Ruff, Pylance, uv

API Endpoints:

Auth:
  POST   /api/v1/auth/login
  POST   /api/v1/auth/refresh
  POST   /api/v1/auth/logout
  GET    /api/v1/auth/validate
  POST   /api/v1/auth/api-keys

Config:
  GET    /api/v1/config/{key}
  PUT    /api/v1/config/{key}
  GET    /api/v1/config/secrets/{key}
  WS     /api/v1/config/watch

Notifications:
  POST   /api/v1/notifications/send
  GET    /api/v1/notifications/history
  POST   /api/v1/notifications/templates

Users:
  GET    /api/v1/users/profile
  PUT    /api/v1/users/preferences
  GET    /api/v1/users/audit-log

Data Models:

  • User, Role, Permission, Session, APIKey
  • ConfigSchema, ConfigValue, SecretStore
  • Notification, Template, Recipient, DeliveryLog

Integration:

  • Publishes: Config updates, notification events
  • Provides: Authentication for all services

Scaling: Horizontal (stateless), read replicas for auth validation

Source Code (from monolith):

  • src/dsta/ (Django settings, middleware)
  • src/api/admin.py (user management)
  • src/dsta/config.py (configuration)
  • Telegram integration logic


4.2 Trading Service (dsta-trading-svc)

Responsibility: Core trading engine and execution

Capabilities:

  • Strategy Management
      • Strategy registration and versioning
      • Parameter optimization
      • Signal generation coordination
      • Multi-strategy orchestration
      • Performance tracking

  • Order Execution
      • Order creation and submission
      • Order status tracking and updates
      • Partial fill handling
      • Smart order routing
      • Execution algorithms (TWAP, VWAP)

  • Position Management
      • Real-time position calculation
      • P&L tracking (realized/unrealized)
      • Portfolio aggregation
      • Position reconciliation
      • Margin calculation

  • Risk Management
      • Pre-trade risk checks
      • Position limit enforcement
      • Drawdown monitoring
      • Circuit breaker logic
      • Stop-loss and alerts
      • Exposure calculation
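
The risk management items above (pre-trade checks, position limits, drawdown-based circuit breaking) could fit together roughly as follows. This is a minimal sketch, not the existing DSTA logic: `RiskLimits`, `check_order`, the field names, and the rejection messages are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskLimits:
    """Hypothetical per-symbol limits; the field names are assumptions."""

    max_position: float        # maximum absolute position size, in base units
    max_order_notional: float  # maximum value of one order, in quote currency
    max_drawdown: float        # 0.2 means trading halts at a 20% drawdown


def check_order(side, qty, price, current_position, current_drawdown, limits):
    """Return (approved, reason) for a proposed order."""
    if side not in ("buy", "sell"):
        return False, "unknown side"
    if qty <= 0 or price <= 0:
        return False, "non-positive quantity or price"
    # Circuit breaker: reject every order once the drawdown limit is breached.
    if current_drawdown >= limits.max_drawdown:
        return False, "circuit breaker open: drawdown limit reached"
    if qty * price > limits.max_order_notional:
        return False, "order notional exceeds limit"
    # Position limit is checked against the position that would result.
    signed_qty = qty if side == "buy" else -qty
    if abs(current_position + signed_qty) > limits.max_position:
        return False, "position limit exceeded"
    return True, "ok"
```

A check like this would sit behind `POST /api/v1/risk/check`; keeping it a pure function makes it easy to unit test and to reuse inside the backtesting path.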

Technology Stack:

  • Framework: Django 4.x + Django REST Framework
  • Database: PostgreSQL (strategies, orders, positions, risk)
  • Cache: Redis (order status, positions, risk limits)
  • Libraries: NumPy, Pandas, TA-Lib
  • Tools: Ruff, Pylance, uv

API Endpoints:

Strategies:
  POST   /api/v1/strategies
  GET    /api/v1/strategies/{id}
  PUT    /api/v1/strategies/{id}/parameters
  GET    /api/v1/strategies/{id}/signals
  GET    /api/v1/strategies/{id}/performance

Orders:
  POST   /api/v1/orders
  GET    /api/v1/orders/{id}
  PUT    /api/v1/orders/{id}/cancel
  GET    /api/v1/orders/active
  GET    /api/v1/fills

Positions:
  GET    /api/v1/positions
  GET    /api/v1/positions/{symbol}
  GET    /api/v1/portfolio/pnl
  POST   /api/v1/positions/reconcile

Risk:
  POST   /api/v1/risk/check
  GET    /api/v1/risk/limits
  PUT    /api/v1/risk/limits
  GET    /api/v1/risk/violations
  POST   /api/v1/risk/circuit-breaker

Data Models:

  • Strategy, StrategyVersion, Signal, StrategyParameter
  • Order, Fill, ExecutionReport, OrderStatus
  • Position, Portfolio, PnLSnapshot, PositionHistory
  • RiskLimit, Exposure, Violation, CircuitBreaker

Integration:

  • Subscribes: Market data events (from Data Service)
  • Publishes: Order events, fill events, position updates, risk violations
  • Calls: Exchange Service (for order submission)
  • Calls: Core Service (for auth, config)

Scaling:

  • Vertical for low-latency execution path
  • Horizontal for strategy processing
  • Redis for real-time state

Source Code (from monolith):

  • src/trading/ (entire directory)
  • src/backtesting/strategy*.py (strategy framework)
  • Risk management logic


4.3 Data Service (dsta-data-svc)

Responsibility: Market data, analytics, and backtesting

Capabilities:

  • Market Data Management
      • Exchange data ingestion (REST + WebSocket)
      • OHLCV data storage and retrieval
      • Order book management
      • Data quality validation
      • Gap detection and filling

  • Analytics & Reporting
      • Performance metrics calculation
      • Equity curve generation
      • Trade analysis and attribution
      • Correlation analysis
      • Sharpe ratio, drawdown, etc.
      • Report generation and scheduling

  • Backtesting Engine
      • Event-driven backtesting
      • Monte Carlo simulation
      • Parameter optimization
      • Walk-forward analysis
      • Bias detection and snooping prevention
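
Two of the analytics metrics named above, the Sharpe ratio and maximum drawdown, can be sketched with the standard library alone. The function names and the default of 365 periods per year (daily returns on a market that trades every day) are assumptions for illustration; the production code would more likely use NumPy or Pandas.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=365):
    """Annualized Sharpe ratio from per-period simple returns."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    # Convert the annual risk-free rate to a per-period rate.
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    sd = stdev(excess)  # sample standard deviation
    if sd == 0:
        return 0.0
    return mean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(equity_curve):
    """Largest peak-to-trough decline as a fraction of the peak (0.25 is 25%)."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```

For example, an equity curve of 100, 120, 90, 130, 117 peaks at 120 before falling to 90, so its maximum drawdown is 30 / 120 = 0.25.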

Technology Stack:

  • Framework: Django 4.x + Django REST Framework
  • Database: TimescaleDB (time-series OHLCV data)
  • Database: PostgreSQL (backtest results, reports)
  • Cache: Redis (recent data, order books)
  • Storage: S3/MinIO (charts, PDFs, detailed reports)
  • Queue: Celery (async ingestion, report generation)
  • Libraries: Matplotlib, Pandas, NumPy, SciPy, TA-Lib
  • Tools: Ruff, Pylance, uv

API Endpoints:

Market Data:
  GET    /api/v1/market-data/ohlcv/{symbol}
  GET    /api/v1/market-data/orderbook/{symbol}
  WS     /api/v1/market-data/stream/{symbol}
  POST   /api/v1/market-data/sync
  GET    /api/v1/market-data/gaps

Analytics:
  GET    /api/v1/analytics/performance
  GET    /api/v1/analytics/equity-curve
  GET    /api/v1/analytics/reports/{id}
  POST   /api/v1/analytics/reports/generate
  GET    /api/v1/analytics/correlation
  GET    /api/v1/analytics/sharpe

Backtesting:
  POST   /api/v1/backtest/run
  GET    /api/v1/backtest/{id}/results
  POST   /api/v1/backtest/optimize
  GET    /api/v1/backtest/{id}/report
  POST   /api/v1/backtest/monte-carlo

Data Models:

  • Candlestick, OrderBook, Ticker, DataGap
  • PerformanceMetric, Report, EquityCurve, AnalysisJob
  • BacktestJob, BacktestResult, OptimizationRun, Trade

Integration:

  • Publishes: Market data events (to Trading Service, ML Service)
  • Subscribes: Fill events, position updates (for analytics)
  • Calls: Exchange Service (for data ingestion)
  • Calls: Core Service (for auth, config)

Scaling:

  • Horizontal ingestion workers
  • TimescaleDB compression for historical data
  • Parallel backtest execution
  • Celery workers for report generation

Source Code (from monolith):

  • src/api/data_*.py (data management)
  • src/analytics/ (entire directory)
  • src/backtesting/ (backtesting engine; strategy*.py moves to the Trading Service, see 4.2)
  • src/data/ (data utilities)


4.4 ML Service (dsta-ml-svc)

Responsibility: Machine learning and AI-driven predictions

Capabilities:

  • Feature Engineering
      • Technical indicator calculation
      • Feature extraction and selection
      • Feature storage and versioning

  • Model Training
      • Regime detection models
      • Reinforcement learning agents
      • Price prediction models
      • Model versioning and deployment

  • Prediction Serving
      • Real-time predictions
      • Batch predictions
      • Model performance monitoring
      • A/B testing framework

  • ML Environments
      • Training environments
      • Simulation environments
      • Model evaluation

Technology Stack:

  • Framework: Django 4.x + Django REST Framework
  • Database: PostgreSQL (model metadata, features)
  • Storage: S3/MinIO (trained models, datasets)
  • ML Libraries: scikit-learn, TensorFlow/PyTorch, Stable-Baselines3
  • Tools: Ruff, Pylance, uv
  • Hardware: GPU support for training

API Endpoints:

Models:
  POST   /api/v1/ml/models
  GET    /api/v1/ml/models
  GET    /api/v1/ml/models/{id}
  DELETE /api/v1/ml/models/{id}

Training:
  POST   /api/v1/ml/train
  GET    /api/v1/ml/training/{id}/status
  GET    /api/v1/ml/training/{id}/metrics

Predictions:
  POST   /api/v1/ml/predict
  POST   /api/v1/ml/predict/batch
  GET    /api/v1/ml/predictions/{id}

Features:
  GET    /api/v1/ml/features/{symbol}
  POST   /api/v1/ml/features/extract
  GET    /api/v1/ml/features/importance

Data Models:

  • MLModel, ModelVersion, ModelMetrics
  • Feature, FeatureSet, FeatureImportance
  • Prediction, TrainingJob, Environment

Integration:

  • Subscribes: Market data events (for feature extraction)
  • Publishes: Prediction signals (to Trading Service)
  • Reads: Historical data from Data Service
  • Calls: Core Service (for auth, config)

Scaling:

  • GPU workers for model training
  • Horizontal inference serving
  • Feature caching in Redis
  • Async training jobs via Celery

Source Code (from monolith):

  • src/ml/ (entire directory)
  • Feature engineering logic
  • RL environments


4.5 Exchange Service (dsta-exchange-svc)

Responsibility: Unified exchange integration layer

Capabilities:

  • Multi-Exchange Adapters
      • Binance integration
      • Gate.io integration
      • Huobi integration
      • Unified API interface

  • Connection Management
      • REST API client pooling
      • WebSocket connection management
      • Connection health monitoring
      • Automatic reconnection

  • Rate Limiting
      • Per-exchange rate limiting
      • Request throttling
      • Queue management

  • Resilience
      • Circuit breaker pattern
      • Retry logic with exponential backoff
      • Failover handling
      • Error normalization
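
The "retry logic with exponential backoff" item above might look like the following sketch. It uses the common "full jitter" variant so that many clients do not retry in lockstep. The names `backoff_delays` and `call_with_retry`, the defaults, and the choice of retryable exceptions are assumptions for illustration.

```python
import random
import time


def backoff_delays(max_retries, base=0.5, cap=30.0, rng=random.random):
    """Yield full-jitter delays: uniform in [0, min(cap, base * 2**attempt)]."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * (2 ** attempt))


def call_with_retry(func, max_retries=5,
                    retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func(); on a retryable error wait and retry, re-raising once retries run out."""
    delays = backoff_delays(max_retries)
    while True:
        try:
            return func()
        except retry_on:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the last error
            sleep(delay)
```

Only idempotent calls (balance or order-status queries, for example) are safe to retry blindly; order submission normally needs a client order ID so that a retried request cannot create a duplicate order.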

Technology Stack:

  • Framework: Django 4.x + Django REST Framework
  • Cache: Redis (rate limits, connection state)
  • Libraries: httpx, aiohttp, websockets
  • Tools: Ruff, Pylance, uv

API Endpoints:

Orders:
  POST   /api/v1/exchange/{name}/order
  GET    /api/v1/exchange/{name}/order/{id}
  POST   /api/v1/exchange/{name}/cancel
  POST   /api/v1/exchange/{name}/batch-cancel

Account:
  GET    /api/v1/exchange/{name}/balance
  GET    /api/v1/exchange/{name}/account

Market Data:
  GET    /api/v1/exchange/{name}/ticker/{symbol}
  GET    /api/v1/exchange/{name}/orderbook/{symbol}
  WS     /api/v1/exchange/{name}/stream

Health:
  GET    /api/v1/exchange/{name}/status
  GET    /api/v1/exchange/health

Data Models:

  • ExchangeConfig, ExchangeCredentials
  • RateLimit, ConnectionStatus
  • APIRequest, APIResponse (logging)

Integration:

  • Publishes: Market data, order updates, exchange events
  • External: Binance, Gate.io, Huobi APIs
  • Calls: Core Service (for credentials, config)

Scaling:

  • Horizontal scaling with connection pooling
  • Redis for distributed rate limiting
  • WebSocket connection per instance

Source Code (from monolith):

  • src/api/exchanges/ (exchange clients)
  • src/huobi/ (entire Huobi integration)
  • src/api/orderbook.py
  • src/api/websockets/


5. Technology Stack

5.1 Core Technology Choices

Programming & Frameworks

  • Language: Python 3.12+
  • Web Framework: Django 4.x + Django REST Framework
  • ASGI Server: Uvicorn (for WebSocket support)
  • Task Queue: Celery 5.x
  • Linting: Ruff (fast, comprehensive)
  • Type Checking: Pylance (VSCode) + MyPy
  • Package Manager: uv (ultra-fast, reliable)

Databases

  • Relational: PostgreSQL 15+ (users, config, strategies)
  • Time-Series: TimescaleDB (market data, metrics)
  • Cache: Redis 7.x (sessions, queues, pub/sub)
  • Object Storage: MinIO / S3 (reports, models, logs)

Message Broker

  • Primary: RabbitMQ 3.x (reliable, Django-friendly)
  • Alternative: Apache Kafka (high-throughput option)

API Gateway

  • Option 1: Kong (preferred, rich plugin ecosystem)
  • Option 2: Traefik (cloud-native, simple)

Observability

  • Logging: Loki + Promtail
  • Metrics: Prometheus + Grafana
  • Tracing: Jaeger / Tempo
  • APM: Django Debug Toolbar (dev), Sentry (prod)

CI/CD

  • Platform: GitHub Actions
  • Container Registry: GitHub Container Registry (GHCR)
  • Deployment: Docker Compose (dev), Kubernetes (prod)

5.2 Development Tools

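[tool.ruff]
line-length = 120
target-version = "py312"

# Lint rules live under [tool.ruff.lint]; the top-level form is deprecated
[tool.ruff.lint]
select = ["E", "F", "I", "N", "W", "UP", "B", "C4", "DTZ", "T20", "RET", "SIM"]

[tool.ruff.lint.isort]
known-first-party = ["api", "dsta", "auth", "config", "notification"]
known-third-party = ["django", "rest_framework", "celery"]

# Pylance is built on Pyright and reads its settings from [tool.pyright]
[tool.pyright]
typeCheckingMode = "basic"
reportMissingTypeStubs = false

# Declared as an extra so that `uv pip install -e ".[dev]"` installs it
[project.optional-dependencies]
dev = ["ruff>=0.2.0", "mypy>=1.8", "pytest>=7.4", "pytest-cov>=4.1", "pytest-django>=4.7"]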

5.3 Service Template Structure

Each microservice follows this standardized structure:

service-name/
├── src/
│   ├── api/              # API endpoints
│   ├── models/           # Database models
│   ├── services/         # Business logic
│   ├── tasks/            # Celery tasks
│   ├── serializers/      # DRF serializers
│   └── config/           # Service configuration
├── tests/
│   ├── unit/
│   ├── integration/
│   └── e2e/
├── deploy/
│   ├── Dockerfile
│   ├── docker-compose.yml
│   └── k8s/
├── .github/
│   └── workflows/
│       ├── ci.yml
│       ├── deploy.yml
│       └── security.yml
├── pyproject.toml        # uv + ruff + mypy config
├── requirements.txt      # Generated by uv
└── README.md

6. Data Architecture

6.1 Database Per Service Pattern

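Each service owns its database schema. The three examples below (auth, strategy, market data) correspond to data owned by the Core, Trading, and Data services from Section 4: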

┌─────────────────┐     ┌──────────────────┐
│  auth-service   │────▶│ PostgreSQL (DB1) │
└─────────────────┘     │ - users          │
                        │ - roles          │
                        │ - permissions    │
                        └──────────────────┘

┌─────────────────┐     ┌──────────────────┐
│strategy-service │────▶│ PostgreSQL (DB2) │
└─────────────────┘     │ - strategies     │
                        │ - signals        │
                        │ - parameters     │
                        └──────────────────┘

┌─────────────────┐     ┌──────────────────┐
│market-data-svc  │────▶│ TimescaleDB      │
└─────────────────┘     │ - candlesticks   │
                        │ - order_books    │
                        │ - tickers        │
                        └──────────────────┘

6.2 Data Consistency Strategies

Strong Consistency: Within service boundaries (ACID transactions)

Eventual Consistency: Cross-service (event-driven updates)

Saga Pattern: Distributed transactions (e.g., order placement)

# Example: Order placement saga
1. Risk Service → Pre-trade check
2. Execution Service → Create order (pending)
3. Exchange Gateway → Submit to exchange
4. [Success] Execution Service → Update order (active)
5. [Success] Position Service → Update position
6. [Failure] Compensation: Cancel order, revert state
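
The saga steps above can be sketched as a list of (name, action, compensation) triples run by a small orchestrator. This is an in-process illustration only: `run_saga`, `SagaFailed`, and the step functions are hypothetical names, and a real implementation would persist saga state and drive the steps through events between services.

```python
class SagaFailed(Exception):
    """Raised after compensation has run; the message is the failed step name."""


def run_saga(steps, ctx):
    """Run steps in order; on failure, undo the completed steps in reverse."""
    completed = []
    for name, action, compensate in steps:
        try:
            action(ctx)
        except Exception as exc:
            for undo in reversed(completed):
                undo(ctx)
            raise SagaFailed(name) from exc
        completed.append(compensate)
    return ctx


# Hypothetical step functions that only mutate a shared context dict.
def create_order(ctx):
    ctx["status"] = "pending"


def cancel_order(ctx):
    ctx["status"] = "cancelled"


def submit_to_exchange(ctx):
    if ctx.get("exchange_down"):
        raise ConnectionError("exchange unavailable")
    ctx["status"] = "active"


def nothing_to_undo(ctx):
    pass


ORDER_SAGA = [
    ("create_order", create_order, cancel_order),
    ("submit_to_exchange", submit_to_exchange, nothing_to_undo),
]
```

Calling `run_saga(ORDER_SAGA, {"order_id": "123"})` leaves the order active; if the exchange step raises, the pending order is cancelled before `SagaFailed` propagates.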

6.3 Data Migration Strategy

Phase 1: Logical separation (same DB, different schemas)

CREATE SCHEMA auth_service;
CREATE SCHEMA strategy_service;
CREATE SCHEMA market_data_service;

Phase 2: Physical separation (different DB instances)

  • Use Django's database routing
  • Foreign key relationships become API calls
  • Implement eventual consistency

Phase 3: Polyglot persistence (optimal DB per service)

  • TimescaleDB for time-series data
  • PostgreSQL for relational data
  • Redis for hot data
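
For the Phase 2 step that relies on Django's database routing, a router could look roughly like this. The method names and signatures are Django's documented router interface; the app labels and database aliases in `APP_TO_DB` are assumptions that mirror the Phase 1 schema names.

```python
class ServiceRouter:
    """Route each Django app to the database of the service that owns it."""

    # Hypothetical mapping from app label to an alias defined in DATABASES.
    APP_TO_DB = {
        "auth_service": "auth_db",
        "strategy_service": "strategy_db",
        "market_data_service": "market_data_db",
    }

    def _db_for(self, app_label):
        return self.APP_TO_DB.get(app_label)

    def db_for_read(self, model, **hints):
        return self._db_for(model._meta.app_label)

    def db_for_write(self, model, **hints):
        return self._db_for(model._meta.app_label)

    def allow_relation(self, obj1, obj2, **hints):
        db1 = self._db_for(obj1._meta.app_label)
        db2 = self._db_for(obj2._meta.app_label)
        if db1 or db2:
            # Forbid relations that would span two databases.
            return db1 == db2
        return None  # no opinion about unmapped apps

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        target = self._db_for(app_label)
        if target is None:
            # Keep unmapped apps out of the service databases.
            return False if db in self.APP_TO_DB.values() else None
        return db == target
```

The router would be enabled by listing its dotted path in the `DATABASE_ROUTERS` setting. Returning `False` from `allow_relation` is what forces cross-service foreign keys to be replaced by API calls.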


7. Communication Patterns

7.1 Synchronous Communication (REST)

Use Cases:

  • Request-response patterns
  • Data retrieval
  • Pre-trade validation

Implementation:

  • Django REST Framework
  • OpenAPI specification
  • Retries via tenacity, plus a circuit breaker (custom, or a library such as pybreaker; tenacity itself only handles retries)

# Example: Risk check before order
import requests

# order_data is the order payload (a dict) built by the caller
response = requests.post(
    "http://risk-service/api/v1/risk/check",
    json={"order": order_data},
    timeout=2.0,
)
if response.status_code == 200 and response.json()["approved"]:
    pass  # Proceed with order
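
Section 7.1 lists a circuit breaker among the implementation items. A custom one could be as small as the sketch below: it opens after a number of consecutive failures, rejects calls while open, and lets a single trial call through once the timeout has elapsed. The class name, thresholds, and injectable clock are assumptions, and the sketch is not thread-safe.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: half-open, so one trial call is allowed through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping the risk-check request as `breaker.call(requests.post, url, json=payload, timeout=2.0)` would make callers fail fast while the downstream service is unhealthy, instead of waiting out a timeout on every order.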

7.2 Asynchronous Communication (Events)

Use Cases:

  • Market data distribution
  • Order fill notifications
  • Position updates

Implementation:

  • RabbitMQ with topic exchanges
  • Celery for task processing
  • Event schema versioning

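# Example: Publish fill event
from kombu import Connection, Exchange, Queue

fill_exchange = Exchange('fills', type='topic')
fill_queue = Queue('position-service.fills', exchange=fill_exchange, routing_key='fill.#')

# Publisher (execution-service); the broker URL is the local dev default
with Connection('amqp://guest:guest@rabbitmq:5672//') as conn:
    producer = conn.Producer(serializer='json')
    producer.publish(
        {'order_id': '123', 'qty': 1.5, 'price': 50000},
        exchange=fill_exchange,
        routing_key='fill.BTCUSDT',
        declare=[fill_queue],  # make sure the exchange and queue exist
    )

# Consumer (position-service); app is the service's Celery application and
# handle_fill receives the decoded fill payload
@app.task
def handle_fill(message):
    update_position(message)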
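
"Event schema versioning" in the list above is not spelled out. One possible approach, shown here as a sketch, wraps every payload in an envelope that carries a `schema_version` and upgrades old payloads step by step on the consumer side. The envelope fields, the v1 to v2 change (renaming `qty` to `quantity` and adding `symbol`), and all function names are illustrative assumptions.

```python
import json
import uuid
from datetime import datetime, timezone

CURRENT_VERSIONS = {"fill": 2}  # latest schema version per event type


def make_event(event_type, payload, version=None):
    """Wrap a payload in a versioned envelope."""
    return {
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,
        "schema_version": version or CURRENT_VERSIONS[event_type],
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }


def upgrade_fill_v1_to_v2(payload):
    """Hypothetical migration: v1 used 'qty'; v2 uses 'quantity' and adds 'symbol'."""
    upgraded = dict(payload)
    upgraded["quantity"] = upgraded.pop("qty")
    upgraded.setdefault("symbol", "UNKNOWN")
    return upgraded


UPGRADERS = {("fill", 1): upgrade_fill_v1_to_v2}


def decode_event(raw):
    """Parse an envelope and upgrade its payload to the current schema version."""
    event = json.loads(raw)
    event_type = event["event_type"]
    version = event["schema_version"]
    latest = CURRENT_VERSIONS[event_type]
    if version > latest:
        raise ValueError(f"{event_type} v{version} is newer than supported v{latest}")
    while version < latest:
        event["payload"] = UPGRADERS[(event_type, version)](event["payload"])
        version += 1
    event["schema_version"] = version
    return event
```

With this shape, producers can roll out a new version before every consumer has been redeployed, because each consumer upgrades older payloads itself and rejects only versions it has never seen.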

7.3 WebSocket Communication

Use Cases:

  • Real-time market data streaming
  • Live trading updates
  • Dashboard updates

Implementation:

  • Django Channels (ASGI)
  • Redis as channel layer

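# WebSocket consumer
import json

from channels.generic.websocket import AsyncWebsocketConsumer

class MarketDataConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        await self.channel_layer.group_add("market_data", self.channel_name)
        await self.accept()

    async def disconnect(self, close_code):
        await self.channel_layer.group_discard("market_data", self.channel_name)

    async def market_update(self, event):
        # Handles group messages sent with {"type": "market.update", "data": ...}
        await self.send(text_data=json.dumps(event['data']))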

7.4 API Contracts

All inter-service communication uses versioned contracts:

# openapi.yaml (excerpt)
/api/v1/orders:
  post:
    summary: Create order
    requestBody:
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/OrderRequest'
    responses:
      '201':
        description: Order created
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/OrderResponse'

8. Migration Strategy

8.1 Migration Approach: Strangler Fig Pattern

Gradually replace monolith components with microservices while maintaining system operation.

Phase 1: Extract peripheral services
Phase 2: Extract data services
Phase 3: Extract core trading services
Phase 4: Decommission monolith

8.2 Step-by-Step Migration Plan

Phase 1: Infrastructure Setup (Weeks 1-2)

  • Set up API Gateway (Kong/Traefik)
  • Deploy RabbitMQ/Kafka message broker
  • Set up TimescaleDB for time-series data
  • Configure Redis cluster
  • Set up monitoring stack (Prometheus, Grafana, Loki, Jaeger)
  • Create service templates and CI/CD pipelines
  • Establish branching strategy and deployment process

Why First? Foundation for all services


Phase 2: Extract Core Service (Weeks 3-5)

  • Create dsta-core-svc repository and structure
  • Migrate authentication and user models from Django auth
  • Migrate configuration management from src/dsta/config.py
  • Migrate notification logic (Telegram, email)
  • Implement JWT token generation and validation
  • Set up API Gateway authentication
  • Deploy to staging and validate
  • Update monolith to use Core Service for auth

Source Files:

src/dsta/settings.py → dsta-core-svc/src/auth/
src/dsta/config.py → dsta-core-svc/src/config/
Telegram integration → dsta-core-svc/src/notifications/

Why Second? Provides authentication/config foundation for all other services

Validation:

  • ✓ Users can log in via Core Service
  • ✓ All services can validate tokens
  • ✓ Configuration updates propagate in real-time
  • ✓ Notifications sent successfully


Phase 3: Extract Exchange Service (Weeks 6-8)

  • Create dsta-exchange-svc repository
  • Consolidate exchange clients (Binance, Gate.io, Huobi)
  • Implement unified exchange API interface
  • Add connection pooling and rate limiting
  • Implement circuit breaker for exchange failures
  • Set up WebSocket connection management
  • Deploy and validate

Source Files:

src/api/exchanges/ → dsta-exchange-svc/src/adapters/
src/huobi/ → dsta-exchange-svc/src/adapters/huobi/
src/api/websockets/ → dsta-exchange-svc/src/websockets/
src/api/orderbook.py → dsta-exchange-svc/src/orderbook/

Why Third? Isolates external dependencies, shared by Data and Trading services

Validation:

  • ✓ Can place/cancel orders on all exchanges
  • ✓ WebSocket streams operational
  • ✓ Rate limiting working correctly
  • ✓ Circuit breaker triggers on exchange failures


Phase 4: Extract Data Service (Weeks 9-12)

  • Create dsta-data-svc repository
  • Migrate to TimescaleDB for OHLCV data
  • Migrate market data ingestion logic
  • Migrate analytics and reporting modules
  • Migrate backtesting engine
  • Set up S3/MinIO for report storage
  • Implement market data event publishing
  • Deploy and validate

Source Files:

src/api/data_*.py → dsta-data-svc/src/market_data/
src/api/models.py (Candlestick) → dsta-data-svc/src/models/
src/analytics/ → dsta-data-svc/src/analytics/
src/backtesting/ → dsta-data-svc/src/backtesting/
src/data/ → dsta-data-svc/src/data_management/

Why Fourth? High data volume benefits from isolation, needed by Trading and ML

Validation:

  • ✓ Historical data accessible via API
  • ✓ Real-time WebSocket streams working
  • ✓ Backtest execution successful
  • ✓ Reports generated and stored
  • ✓ Analytics metrics calculated correctly


Phase 5: Extract Trading Service (Weeks 13-17)

  • Create dsta-trading-svc repository
  • Migrate strategy framework and registry
  • Migrate order execution logic
  • Migrate position management
  • Migrate risk management (limits, circuit breakers)
  • Implement event-driven architecture (subscribe to market data)
  • Implement saga pattern for order placement
  • Deploy and validate with dry-run mode

Source Files:

src/trading/ → dsta-trading-svc/src/
src/backtesting/strategy*.py → dsta-trading-svc/src/strategies/

Why Fifth? Core business logic, depends on Data and Exchange services

Validation:

  • ✓ Strategies generate signals correctly
  • ✓ Orders placed and tracked successfully
  • ✓ Positions calculated in real-time
  • ✓ Risk checks blocking invalid orders
  • ✓ Circuit breakers trigger correctly
  • ✓ Dry-run mode working


Phase 6: Extract ML Service (Weeks 18-21)

  • Create dsta-ml-svc repository
  • Migrate feature engineering modules
  • Migrate ML model training logic
  • Set up model versioning and storage
  • Implement prediction API
  • Add GPU worker support
  • Deploy and validate

Source Files:

src/ml/ → dsta-ml-svc/src/
Feature engineering → dsta-ml-svc/src/features/
Model trainers → dsta-ml-svc/src/trainers/
RL environments → dsta-ml-svc/src/environments/

Why Last? Specialized workload, less integrated, can run independently

Validation:

  • ✓ Features extracted correctly
  • ✓ Models training successfully
  • ✓ Predictions serving via API
  • ✓ GPU utilization working


Phase 7: Decommission Monolith (Week 22)

  • Verify all functionality migrated
  • Run parallel testing (monolith vs microservices)
  • Switch 100% traffic to microservices
  • Monitor for 1 week
  • Archive monolith codebase
  • Celebrate! 🎉

Final Checks:

  • ✓ All API endpoints responding
  • ✓ All background jobs running
  • ✓ Data consistency verified
  • ✓ Performance metrics acceptable
  • ✓ No errors in logs

8.3 Risk Mitigation

  • Dual-Write Strategy: Write to both monolith and microservice during migration
  • Feature Flags: Toggle between old and new implementation
  • Canary Deployment: Route 10% of traffic to the new service, monitor, then scale up
  • Automated Testing: Maintain test coverage ≥ 80% during migration
  • Rollback Plan: Keep the monolith running for quick rollback
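
The canary split is normally configured in the API gateway, but the underlying idea is simple enough to sketch: hash a stable key (a user or API key ID) into one of 100 buckets and send the lowest buckets to the new service. The function names and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib


def canary_bucket(key, buckets=100):
    """Map a stable key to a bucket in [0, buckets) using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route(key, canary_percent):
    """Return which backend should serve this key at the given canary percentage."""
    return "microservice" if canary_bucket(key) < canary_percent else "monolith"
```

Because the bucket depends only on the key, a given user always lands on the same backend, and raising the percentage from 10 to 50 only ever moves users from the monolith to the microservice, never back.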


9. Development Workflow

9.1 Local Development Setup

Each developer runs services locally using Docker Compose:

# docker-compose.dev.yml
version: '3.9'
services:
  api-gateway:
    image: kong:3.4
    ports: ["8000:8000"]

  auth-service:
    build: ./services/auth-service
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@postgres:5432/auth
      - REDIS_URL=redis://redis:6379/0

  market-data-service:
    build: ./services/market-data-service
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@timescaledb:5432/market_data

  rabbitmq:
    image: rabbitmq:3.12-management
    ports: ["5672:5672", "15672:15672"]

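  postgres:
    image: postgres:15
    environment:
      # The postgres image refuses to start without a superuser password
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=auth

  timescaledb:
    image: timescale/timescaledb:latest-pg15
    environment:
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=market_data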

  redis:
    image: redis:7-alpine

Start development environment:

# Clone repositories
git clone https://github.com/org/auth-service.git
git clone https://github.com/org/market-data-service.git

# Start all services
docker-compose -f docker-compose.dev.yml up -d

# Run migrations
./scripts/migrate-all.sh

# Access services
curl http://localhost:8000/auth/health
curl http://localhost:8000/market-data/health

9.2 Service Development Workflow

# 1. Create feature branch
git checkout -b feature/add-websocket-endpoint

# 2. Install dependencies with uv
uv pip install -e ".[dev]"

# 3. Run linting and type checking
ruff check src/
mypy src/

# 4. Run tests
pytest tests/ -v --cov

# 5. Commit with conventional commits
git commit -m "feat: add WebSocket endpoint for real-time data"

# 6. Push and create PR
git push origin feature/add-websocket-endpoint

9.3 Code Review Standards

  • All tests pass (unit, integration, E2E)
  • Code coverage ≥ 80%
  • Ruff linting passes (no warnings)
  • MyPy type checking passes
  • API contract documented (OpenAPI)
  • README updated if needed
  • Changelog updated
  • Performance regression check

10. Deployment & CI/CD

10.1 GitHub Actions Workflows

Each service repository includes standardized workflows:

Workflow 1: CI Pipeline (.github/workflows/ci.yml)

name: CI

on:
  pull_request:
    branches: [main, develop]
  push:
    branches: [main, develop]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: uv pip install --system -e ".[dev]"

      - name: Run Ruff
        run: ruff check src/ tests/

      - name: Run Mypy
        run: mypy src/

  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: test_db
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: uv pip install --system -e ".[dev]"

      - name: Run migrations
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/test_db
        run: python src/manage.py migrate

      - name: Run tests
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/test_db
          REDIS_URL: redis://localhost:6379/0
        run: pytest tests/ -v --cov --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml

  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Run Bandit
        run: |
          uv pip install --system bandit
          bandit -r src/ -f json -o bandit-report.json

      - name: Run Safety
        run: |
          uv pip install --system safety
          safety check --json

Workflow 2: Deploy Pipeline (.github/workflows/deploy.yml)

name: Deploy

on:
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      environment:
        description: 'Environment to deploy'
        required: true
        type: choice
        options: [staging, production]

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write  # required to push images to ghcr.io with GITHUB_TOKEN
    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ghcr.io/${{ github.repository }}:${{ github.sha }}
            ghcr.io/${{ github.repository }}:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - name: Deploy to staging
        run: |
          # Update docker-compose or K8s deployment
          ssh deploy@staging-server "docker pull ghcr.io/${{ github.repository }}:${{ github.sha }}"
          ssh deploy@staging-server "docker-compose up -d auth-service"

  deploy-production:
    needs: deploy-staging
    if: github.event_name == 'workflow_dispatch' && github.event.inputs.environment == 'production'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Deploy to production
        run: |
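          # Placeholder: kubectl set image performs a rolling update; a blue-green
          # release also needs a second Deployment and a Service selector switch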
          ssh deploy@prod-server "kubectl set image deployment/auth-service auth-service=ghcr.io/${{ github.repository }}:${{ github.sha }}"

10.2 Deployment Strategy

  • Staging: Auto-deploy on merge to main
  • Production: Manual approval + blue-green deployment

main branch
    ├─▶ Build & Test
    ├─▶ Deploy to Staging
    ├─▶ Automated Tests (E2E, smoke)
    ├─▶ Manual Approval
    └─▶ Deploy to Production (blue-green)

10.3 Service Dependencies

Deployment order based on dependencies:

1. Infrastructure (PostgreSQL, Redis, RabbitMQ, API Gateway)
2. Config Service (all services depend on it)
3. Auth Service (authentication for all)
4. Notification Service (independent)
5. Exchange Gateway (external dependencies)
6. Market Data Service (depends on Exchange Gateway)
7. Risk Service (independent business logic)
8. Execution Service (depends on Risk, Exchange Gateway)
9. Position Service (depends on Execution events)
10. Strategy Service (depends on Market Data)
11. Analytics Service (depends on Position, Execution)
12. Backtest Service (depends on Market Data, Strategy)
13. ML Service (depends on Market Data)
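
The ordering above is a topological sort of the dependency graph, so it can be derived rather than maintained by hand. The sketch below uses the standard-library `graphlib` module. The short service keys and the helper names are hypothetical; the edges encode only the dependencies stated in the list, plus the assumption that every service depends on the Config Service. Infrastructure is left out because it always comes first.

```python
from graphlib import TopologicalSorter

# Service -> services it depends on, as stated in the list above.
DEPENDS_ON: dict[str, set[str]] = {
    "config": set(),
    "auth": set(),
    "notification": set(),
    "exchange-gateway": set(),
    "market-data": {"exchange-gateway"},
    "risk": set(),
    "execution": {"risk", "exchange-gateway"},
    "position": {"execution"},
    "strategy": {"market-data"},
    "analytics": {"position", "execution"},
    "backtest": {"market-data", "strategy"},
    "ml": {"market-data"},
}


def with_config(depends_on: dict[str, set[str]]) -> dict[str, set[str]]:
    """Add the implicit dependency on the Config Service to every other service."""
    return {
        service: set(deps) | (set() if service == "config" else {"config"})
        for service, deps in depends_on.items()
    }


def deployment_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return one valid order; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(with_config(depends_on)).static_order())


def deployment_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group services into waves whose members can be deployed in parallel."""
    sorter = TopologicalSorter(with_config(depends_on))
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

`deployment_waves(DEPENDS_ON)` makes the available parallelism explicit: services in the same wave have no dependency on each other. A cycle in the graph fails fast with `CycleError` instead of surfacing during a rollout.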

11. Monitoring & Observability

11.1 Observability Stack

┌────────────────────────────────────────┐
│          Grafana Dashboards            │
│  (Unified visualization)               │
└────────────────────────────────────────┘
         │              │              │
         ▼              ▼              ▼
┌──────────────┐ ┌─────────────┐ ┌──────────────┐
│  Prometheus  │ │    Loki     │ │    Jaeger    │
│  (Metrics)   │ │  (Logs)     │ │  (Traces)    │
└──────────────┘ └─────────────┘ └──────────────┘
         │              │              │
         └──────────────┼──────────────┘
         ┌──────────────┴──────────────┐
         │                             │
    Microservices                 API Gateway
    (instrumented)

11.2 Metrics Collection

Each service exposes Prometheus metrics:

# src/monitoring.py
from prometheus_client import Counter, Histogram, Gauge

# Request metrics
http_requests_total = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status']
)

http_request_duration = Histogram(
    'http_request_duration_seconds',
    'HTTP request duration',
    ['method', 'endpoint']
)

# Business metrics
orders_created = Counter('orders_created_total', 'Orders created')
orders_filled = Counter('orders_filled_total', 'Orders filled')
positions_value = Gauge('positions_value_usd', 'Total position value USD')

Expose at /metrics endpoint:

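# urls.py
from django.http import HttpResponse
from django.urls import path
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest

def metrics_view(request):
    """Render the default registry in the Prometheus text format."""
    return HttpResponse(generate_latest(), content_type=CONTENT_TYPE_LATEST)

urlpatterns = [
    # ... other paths
    path('metrics/', metrics_view),
]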

11.3 Logging Strategy

Structured logging with JSON format:

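import structlog

# Render each event as one JSON line; structlog's default renderer is a
# human-readable console format, not JSON.
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)

logger = structlog.get_logger()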

logger.info(
    "order_created",
    order_id="123",
    symbol="BTCUSDT",
    quantity=1.5,
    price=50000,
    user_id="user-456"
)

Log aggregation with Loki:

# promtail-config.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: services
    static_configs:
      - targets:
          - localhost
        labels:
          job: dsta-services
          __path__: /var/log/services/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            service: service
            message: event  # structlog stores the message under the "event" key

11.4 Distributed Tracing

OpenTelemetry integration:

# src/tracing.py
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Jaeger ingests OTLP natively; the dedicated Jaeger exporter is deprecated
# and no longer shipped with current OpenTelemetry Python releases.
provider = TracerProvider()
otlp_exporter = OTLPSpanExporter(
    endpoint="http://jaeger:4317",  # OTLP/gRPC port
    insecure=True,
)
provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
trace.set_tracer_provider(provider)

# Usage
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("process_order"):
    # Process order logic
    with tracer.start_as_current_span("check_risk"):
        # Risk check
        pass
    with tracer.start_as_current_span("submit_to_exchange"):
        # Exchange submission
        pass
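
For spans from different services to join into one trace, the trace context has to travel with each request. OpenTelemetry's default propagator uses the W3C `traceparent` HTTP header for this, and instrumentation libraries normally handle it automatically. The sketch below only illustrates the header layout (`version-traceid-spanid-flags`); `TraceContext`, `parse_traceparent`, `format_traceparent`, and `child_context` are hypothetical helpers, not part of the OpenTelemetry API.

```python
import re
import secrets
from typing import NamedTuple, Optional


class TraceContext(NamedTuple):
    trace_id: str   # 32 lowercase hex characters, shared by the whole trace
    span_id: str    # 16 lowercase hex characters, unique per span
    sampled: bool   # lowest bit of the flags byte


_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def format_traceparent(ctx: TraceContext) -> str:
    """Build the header value sent to the next service."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Return the context, or None if the header is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero identifiers are invalid in the W3C format.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_context(parent: TraceContext) -> TraceContext:
    """Start a new span in the same trace: same trace_id, fresh span_id."""
    return TraceContext(parent.trace_id, secrets.token_hex(8), parent.sampled)
```

In the order flow above, the `check_risk` and `submit_to_exchange` spans would share one `trace_id`, which is what allows Jaeger to display them as a single request.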

11.5 Alerting Rules

Prometheus alerting rules:

# alerts.yml
groups:
  - name: service_health
    rules:
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.job }} is down"

      - alert: HighErrorRate
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
            / sum by (job) (rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate on {{ $labels.job }} (over 5% of requests)"

      - alert: OrderExecutionSlow
        expr: histogram_quantile(0.95, sum by (le) (rate(order_execution_duration_seconds_bucket[5m]))) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Order execution is slow (p95 > 2s)"
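
The `OrderExecutionSlow` rule depends on how a quantile is estimated from histogram buckets. The function below is a simplified re-implementation of the linear interpolation that Prometheus documents for classic histograms. It is written for illustration only, is not Prometheus code, and the bucket values in the usage note are made up.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative buckets.

    buckets holds (upper_bound, cumulative_count) pairs sorted by upper bound,
    and the last upper bound must be +Inf, as in a Prometheus histogram.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last one being +Inf")

    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls into the +Inf bucket: report the highest
                # finite upper bound instead.
                return buckets[-2][0]
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan
```

For example, with cumulative buckets `[(0.5, 60), (1.0, 85), (2.0, 96), (5.0, 99), (math.inf, 100)]`, the 95th percentile lands in the 1s to 2s bucket and interpolates to `1 + 10/11`, about 1.91s, so the 2s threshold would not be crossed. The estimate is only as precise as the bucket boundaries, which is worth keeping in mind when choosing buckets around the alert threshold.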

12. Timeline & Milestones

12.1 Project Timeline (22 Weeks)

Weeks 1-2:   Infrastructure Setup
Weeks 3-5:   Extract Core Service (Auth, Config, Notifications)
Weeks 6-8:   Extract Exchange Service
Weeks 9-12:  Extract Data Service (Market Data, Analytics, Backtesting)
Weeks 13-17: Extract Trading Service (Strategies, Execution, Risk, Positions)
Weeks 18-21: Extract ML Service
Week 22:     Decommission Monolith & Go Live

12.2 Key Milestones

| Milestone | Week | Deliverables | Success Criteria |
|---|---|---|---|
| M1: Foundation | 2 | Infrastructure, CI/CD, Monitoring | All tools deployed, templates ready |
| M2: Core Service | 5 | Auth, Config, Notifications | Authentication working, config centralized |
| M3: Exchange Service | 8 | Multi-exchange gateway | All exchanges accessible, rate limiting working |
| M4: Data Service | 12 | Market data, Analytics, Backtesting | Data flowing, backtests running |
| M5: Trading Service | 17 | Strategies, Execution, Risk, Positions | Live trading operational (dry-run) |
| M6: ML Service | 21 | ML models, Predictions | Models deployed, predictions serving |
| M7: Go Live | 22 | Monolith decommissioned | 100% traffic on microservices |
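
Each milestone week should equal the last week of the corresponding phase in the timeline. A small consistency check like the following can catch drift when phases are rescheduled. The durations are derived from the week ranges in section 12.1, and the variable and function names are illustrative.

```python
from itertools import accumulate

# (phase, duration in weeks), derived from the week ranges in 12.1.
PHASES = [
    ("Infrastructure Setup", 2),             # weeks 1-2
    ("Core Service", 3),                     # weeks 3-5
    ("Exchange Service", 3),                 # weeks 6-8
    ("Data Service", 4),                     # weeks 9-12
    ("Trading Service", 5),                  # weeks 13-17
    ("ML Service", 4),                       # weeks 18-21
    ("Decommission Monolith & Go Live", 1),  # week 22
]

# Milestone weeks M1..M7 from 12.2.
MILESTONE_WEEKS = [2, 5, 8, 12, 17, 21, 22]


def phase_end_weeks(phases: list[tuple[str, int]]) -> list[int]:
    """Running total of durations, i.e. the final week of each phase."""
    return list(accumulate(weeks for _, weeks in phases))
```

Here `phase_end_weeks(PHASES)` equals `MILESTONE_WEEKS`, and the durations sum to the stated 22 weeks.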

12.3 Resource Requirements

Team Composition:

  • 2 Backend Engineers (Python/Django)
  • 1 DevOps Engineer (Docker, K8s, CI/CD)
  • 1 QA Engineer (Testing, automation)
  • 1 Tech Lead (Architecture, code review)

Part-time:

  • 1 Security Engineer (weeks 1-6, 28-30)
  • 1 Data Engineer (weeks 7-11, 27-29)


Appendix A: Service Communication Matrix

| Service | Calls (Sync) | Publishes (Async) | Subscribes (Async) |
|---|---|---|---|
| dsta-core-svc | - | Config updates, User events | - |
| dsta-trading-svc | Core (auth, config), Exchange (orders), Data (market data API) | Order events, Fill events, Position updates, Risk violations | Market data events |
| dsta-data-svc | Core (auth, config), Exchange (market data) | Market data events, Report completion | Fill events, Position updates |
| dsta-ml-svc | Core (auth, config), Data (historical data) | Prediction signals | Market data events |
| dsta-exchange-svc | Core (credentials, config) | Market data events, Order updates, Exchange status | - |
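
Keeping the asynchronous part of this matrix as data makes it possible to check it mechanically, for instance that every subscription has at least one publisher. The sketch below transcribes the Publishes and Subscribes columns; the helper names are hypothetical.

```python
# Event names transcribed from the Publishes (Async) column.
PUBLISHES: dict[str, set[str]] = {
    "dsta-core-svc": {"Config updates", "User events"},
    "dsta-trading-svc": {"Order events", "Fill events", "Position updates", "Risk violations"},
    "dsta-data-svc": {"Market data events", "Report completion"},
    "dsta-ml-svc": {"Prediction signals"},
    "dsta-exchange-svc": {"Market data events", "Order updates", "Exchange status"},
}

# Event names transcribed from the Subscribes (Async) column.
SUBSCRIBES: dict[str, set[str]] = {
    "dsta-core-svc": set(),
    "dsta-trading-svc": {"Market data events"},
    "dsta-data-svc": {"Fill events", "Position updates"},
    "dsta-ml-svc": {"Market data events"},
    "dsta-exchange-svc": set(),
}


def orphan_subscriptions(publishes, subscribes) -> dict[str, set[str]]:
    """Subscriptions to events that no service publishes."""
    published = set().union(*publishes.values())
    return {
        service: events - published
        for service, events in subscribes.items()
        if events - published
    }


def unconsumed_events(publishes, subscribes) -> dict[str, set[str]]:
    """Published events that no service in the matrix subscribes to."""
    consumed = set().union(*subscribes.values())
    return {
        service: events - consumed
        for service, events in publishes.items()
        if events - consumed
    }
```

For the matrix as written, `orphan_subscriptions` returns an empty dict. `unconsumed_events` lists several events, such as `Prediction signals` and `Order updates`, that have no subscriber in the table; whether those are consumed outside the matrix or are gaps is a question for design review.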

Appendix B: Database Schema Migration

Example: Candlestick Model Migration

Before (Monolith):

# src/api/models.py
class Candlestick(models.Model):
    exchange = models.CharField(max_length=20)
    symbol = models.CharField(max_length=20)
    timestamp = models.DateTimeField()
    # ... OHLCV fields

After (Market Data Service):

# services/market-data-service/src/models.py
class Candlestick(models.Model):
    exchange = models.CharField(max_length=20)
    symbol = models.CharField(max_length=20)
    timestamp = models.DateTimeField()
    # ... OHLCV fields

    class Meta:
        db_table = 'candlesticks'  # Same table name
        managed = True

Migration SQL:

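-- Create new database
CREATE DATABASE market_data_service;

-- Copy data. PostgreSQL cannot query across databases in one statement,
-- so dump and restore the table from a shell instead:
--   pg_dump -d dsta -t candlesticks | psql -d market_data_service

-- Verify (run each count while connected to the respective database)
\c market_data_service
SELECT COUNT(*) FROM candlesticks;
\c dsta
SELECT COUNT(*) FROM candlesticks;
-- Counts should match

-- After verification, drop from monolith (still connected to dsta)
DROP TABLE candlesticks;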
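
The copy, verify, drop sequence is safest when the drop is gated on the verification in code rather than done by hand. The sketch below shows that control flow, with SQLite from the standard library standing in for PostgreSQL so that it runs anywhere. The function names are hypothetical, the target table is assumed to exist already, and the table name is assumed to come from trusted code because it is interpolated into the SQL.

```python
import sqlite3


def row_count(conn: sqlite3.Connection, table: str) -> int:
    """Number of rows currently in the table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def copy_table(source: sqlite3.Connection, target: sqlite3.Connection, table: str) -> None:
    """Copy every row into the same-named, already created table in target."""
    rows = source.execute(f"SELECT * FROM {table}").fetchall()
    if rows:
        placeholders = ", ".join("?" for _ in rows[0])
        target.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    target.commit()


def migrate_table(source: sqlite3.Connection, target: sqlite3.Connection, table: str) -> bool:
    """Copy, verify, and drop the source table only if the row counts match."""
    copy_table(source, target, table)
    if row_count(source, table) != row_count(target, table):
        return False  # leave the source untouched for investigation
    source.execute(f"DROP TABLE {table}")
    source.commit()
    return True
```

A row count is the same check the SQL above performs. For a production migration, a stronger comparison (for example per-column checksums) and a pause on writes during the copy would be reasonable additions.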


Appendix C: Example Service Implementation

Minimal Django Service Structure (dsta-core-svc)

dsta-core-svc/
├── src/
│   ├── core_service/
│   │   ├── __init__.py
│   │   ├── settings.py
│   │   ├── urls.py
│   │   ├── wsgi.py
│   │   └── asgi.py
│   ├── auth/
│   │   ├── __init__.py
│   │   ├── models.py
│   │   ├── views.py
│   │   ├── serializers.py
│   │   ├── services.py
│   │   └── urls.py
│   ├── config/
│   │   ├── __init__.py
│   │   ├── models.py
│   │   ├── views.py
│   │   ├── serializers.py
│   │   └── urls.py
│   ├── notifications/
│   │   ├── __init__.py
│   │   ├── models.py
│   │   ├── tasks.py
│   │   ├── telegram.py
│   │   └── urls.py
│   └── manage.py
├── tests/
│   ├── test_auth.py
│   ├── test_config.py
│   └── test_notifications.py
├── deploy/
│   ├── Dockerfile
│   ├── docker-compose.yml
│   └── k8s/
├── .github/
│   └── workflows/
│       ├── ci.yml
│       └── deploy.yml
├── pyproject.toml
├── requirements.txt
└── README.md

pyproject.toml:

[project]
name = "dsta-core-svc"
version = "1.0.0"
requires-python = ">=3.12"
dependencies = [
    "django>=4.2",
    "djangorestframework>=3.14",
    "psycopg2-binary>=2.9",
    "redis>=5.0",
    "celery>=5.3",
    "pyjwt>=2.8",
    "cryptography>=41.0",
    "gunicorn>=21.2",
]

[tool.ruff]
line-length = 120
target-version = "py312"

[tool.ruff.lint]
select = ["E", "F", "I", "W", "UP", "B", "C4", "DTZ"]

[tool.ruff.lint.isort]
known-first-party = ["core_service", "auth", "config", "notifications"]
known-third-party = ["django", "rest_framework", "celery"]

[project.optional-dependencies]
dev = [
    "ruff>=0.1",
    "mypy>=1.8",
    "pytest>=7.4",
    "pytest-django>=4.7",
    "pytest-cov>=4.1",
]

Dockerfile:

FROM python:3.12-slim

WORKDIR /app

# Install uv
RUN pip install uv

# Copy dependencies
COPY requirements.txt .
RUN uv pip install --system -r requirements.txt

# Copy application
COPY src/ ./src/

# Expose port
EXPOSE 8000

# Run migrations and start server
CMD ["sh", "-c", "python src/manage.py migrate && gunicorn --chdir src -w 4 -b 0.0.0.0:8000 core_service.wsgi:application"]

docker-compose.yml (for local development):

version: '3.9'

services:
  core-service:
    build: .
    ports:
      - "8001:8000"
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@postgres:5432/core_db
      - REDIS_URL=redis://redis:6379/0
      - SECRET_KEY=dev-secret-key-change-in-production
      - DEBUG=True
    depends_on:
      - postgres
      - redis
    volumes:
      - ./src:/app/src

  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=core_db
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  celery-worker:
    build: .
    working_dir: /app/src
    command: celery -A core_service worker -l info
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@postgres:5432/core_db
      - REDIS_URL=redis://redis:6379/0
    depends_on:
      - postgres
      - redis
    volumes:
      - ./src:/app/src

volumes:
  postgres_data:


Appendix D: Testing Strategy

Testing Pyramid

        /\
       /  \      E2E Tests (10%)
      /────\     - Full workflow tests
     /      \    - Cross-service integration
    /────────\   Integration Tests (30%)
   /          \  - API contract tests
  /────────────\ - Database integration
 /              \ Unit Tests (60%)
/________________\- Business logic
                  - Utilities, helpers

Test Coverage Requirements

  • Unit Tests: ≥80% coverage per service
  • Integration Tests: All API endpoints covered
  • E2E Tests: Critical user journeys (order placement, risk check)

Example Test

# tests/test_order_execution.py
import pytest
from django.test import TestCase
from trading.exceptions import RiskViolationError  # assumed module path
from trading.models import Order, Position
from trading.services import OrderExecutionService

@pytest.mark.django_db
class TestOrderExecution(TestCase):
    def test_create_order_success(self):
        """Test successful order creation within dsta-trading-svc"""
        service = OrderExecutionService()
        order = service.create_order(
            symbol="BTCUSDT",
            side="BUY",
            quantity=1.0,
            price=50000
        )
        assert order.status == "PENDING"
        assert Order.objects.filter(id=order.id).exists()

    def test_create_order_fails_risk_check(self):
        """Test order rejection by risk management"""
        service = OrderExecutionService()
        # Only the call expected to raise belongs inside the context manager
        with pytest.raises(RiskViolationError):
            service.create_order(
                symbol="BTCUSDT",
                side="BUY",
                quantity=1000000,  # Exceeds limit
                price=50000
            )

    def test_position_update_on_fill(self):
        """Test position calculation after order fill"""
        # Create and fill order
        order = OrderExecutionService().create_order(
            symbol="BTCUSDT", side="BUY", quantity=1.0, price=50000
        )
        order.status = "FILLED"
        order.save()

        # Verify position updated
        position = Position.objects.get(symbol="BTCUSDT")
        assert position.quantity == 1.0
        assert position.avg_price == 50000

Conclusion

This microservices plan provides a practical roadmap for transforming DSTA from a monolithic Django application into 5 domain-based microservices. By using larger services that encapsulate complete domains, we achieve a balance between modularity and operational simplicity.

Key Benefits:

  • Simpler Operations: 5 services instead of 12+ means easier monitoring and deployment
  • Domain Cohesion: Each service contains related functionality (auth + config + notifications in Core)
  • Reduced Communication: Fewer inter-service calls compared to fine-grained microservices
  • Easier Development: Developers can work within a domain without crossing many service boundaries

Architecture Summary:

  1. dsta-core-svc: Auth, Config, Notifications, User Management
  2. dsta-trading-svc: Strategies, Execution, Positions, Risk Management
  3. dsta-data-svc: Market Data, Analytics, Backtesting, Reporting
  4. dsta-ml-svc: Feature Engineering, Model Training, Predictions
  5. dsta-exchange-svc: Multi-Exchange Gateway, WebSocket Connections

Migration Timeline: 22 weeks (approximately 5 months)

Technology Stack: Django 4.x, Ruff, Pylance, uv, GitHub Actions, RabbitMQ, PostgreSQL, TimescaleDB, Redis

Next Steps:

  1. Review and approve this plan
  2. Set up infrastructure (Weeks 1-2)
  3. Begin with Core Service extraction (Weeks 3-5)
  4. Follow the phased migration approach
  5. Monitor and iterate

Success Metrics:

  • Zero downtime during migration
  • 99.9% uptime post-migration
  • Deployment frequency: 2-3x per week per service
  • Lead time for changes: <2 days
  • Mean time to recovery: <30 minutes
  • Test coverage: ≥80% per service
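
The uptime and recovery targets interact: 99.9% availability leaves only a small downtime budget, and a 30-minute recovery consumes most of a month's allowance. The arithmetic is sketched below; the function names are illustrative, and the 30-day window is an assumption about how uptime would be measured.

```python
import math


def downtime_budget_minutes(availability_percent: float, period_days: float) -> float:
    """Minutes of allowed downtime in the period for a given availability target."""
    return (1 - availability_percent / 100) * period_days * 24 * 60


def incidents_within_budget(budget_minutes: float, recovery_minutes: float) -> int:
    """How many incidents of the given length fit into the budget."""
    return math.floor(budget_minutes / recovery_minutes)
```

For 99.9% over 30 days the budget is about 43.2 minutes, and about 525.6 minutes (8.76 hours) over 365 days. At the full 30-minute recovery time, only one incident fits into a 30-day budget, so meeting both targets requires either rare incidents or recoveries well under the 30-minute ceiling.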


Document Version: 1.0
Last Updated: 2026-01-28
Status: Ready for Review
Author: DSTA Engineering Team