Docker Production Deployment Guide

This document describes the production Docker setup for the DSTA trading system.

Overview

The production deployment uses multi-stage Docker builds with security hardening and consists of:

  • API Server: Django/ASGI application server (2 replicas)
  • Celery Worker: Background task processing (3 replicas)
  • Celery Beat: Task scheduler (1 replica)
  • PostgreSQL: Primary database
  • Redis: Cache and message broker
  • Nginx: Reverse proxy and load balancer

Architecture

┌──────────┐
│  Client  │
└────┬─────┘
     │
     ▼
┌─────────────┐
│   Nginx     │ ◄── Reverse Proxy, SSL, Static Files
│  (Port 80)  │
└──────┬──────┘
       │
       ▼
┌───────────────────┐
│   API Server      │ ◄── Django/ASGI Application
│   (2 replicas)    │
└────┬──────────────┘
     │
     ├──► PostgreSQL  ◄── Database
     └──► Redis  ◄──► Celery Worker (3 replicas)
                      Celery Beat (1 replica)

Production Images

Multi-Stage Builds

All production Dockerfiles use multi-stage builds:

  1. Builder Stage: Compiles dependencies (TA-Lib, Python packages)
  2. Runtime Stage: Minimal runtime image with only necessary components

Security Features

  • ✅ Non-root user (dsta:dsta, UID/GID 1000)
  • ✅ Alpine/Slim base images for minimal attack surface
  • ✅ No development tools in runtime images
  • ✅ Read-only filesystem where possible
  • ✅ Health checks for all services
  • ✅ Resource limits (CPU/Memory)

Image Sizes (Approximate)

  • dsta-api:latest: ~300MB
  • dsta-worker:latest: ~250MB
  • dsta-scheduler:latest: ~250MB

Quick Start

Prerequisites

  • Docker 24.0+
  • Docker Compose 2.20+
  • 8GB RAM minimum
  • 50GB disk space

Initial Setup

  1. Copy environment template:

    cp .env_files/prod.env.template .env_files/prod.env
    

  2. Edit production environment:

    vim .env_files/prod.env
    

    Critical values to change:

    • SECRET_KEY: Generate with python -c 'from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())'
    • POSTGRES_PASSWORD: Strong password
    • ALLOWED_HOSTS: Your domain names
    • API keys and credentials

  3. Build production images:

    cd deploy
    docker-compose -f docker-compose.prod.yml build
    

  4. Start services:

    docker-compose -f docker-compose.prod.yml up -d
    

  5. Check service health:

    docker-compose -f docker-compose.prod.yml ps
    

  6. View logs:

    docker-compose -f docker-compose.prod.yml logs -f api-server
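
Before building, the critical values from step 2 can be checked mechanically. The following is a minimal sketch, assuming the env file uses plain KEY=value lines; the function name check_prod_env is illustrative and not part of the repository.

```shell
#!/bin/sh
# Sketch: verify that the critical keys in an env file have non-empty values.
# Assumes simple KEY=value lines; the key names come from step 2 above.
check_prod_env() {
    env_file="$1"
    missing=""
    for key in SECRET_KEY POSTGRES_PASSWORD ALLOWED_HOSTS; do
        # A key counts as set when a line "KEY=<at least one character>" exists.
        if ! grep -Eq "^${key}=.+" "$env_file"; then
            missing="$missing $key"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing or empty:$missing"
        return 1
    fi
    echo "OK"
}
```

Called as check_prod_env .env_files/prod.env, it prints OK or lists the keys that still need a value and returns a non-zero status. It does not judge whether a value is strong enough.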
    

Service Configuration

API Server

  • Replicas: 2 (for high availability)
  • Port: 8000 (internal)
  • Workers: 4 Uvicorn workers
  • Health Check: GET /health/ every 30s
  • Resource Limits: 2 CPU, 2GB RAM

Celery Worker

  • Replicas: 3 (for parallel task processing)
  • Concurrency: 4 workers per container
  • Max Tasks per Child: 1000 (prevents memory leaks)
  • Resource Limits: 2 CPU, 2GB RAM

Celery Beat

  • Replicas: 1 (only one scheduler needed)
  • Scheduler: django-celery-beat (database-backed)
  • Resource Limits: 0.5 CPU, 512MB RAM

PostgreSQL

  • Version: PostgreSQL 17 Alpine
  • Encoding: UTF-8
  • Persistence: Named volume postgres_data
  • Health Check: pg_isready every 10s
  • Resource Limits: 2 CPU, 2GB RAM

Redis

  • Version: Redis 8 Alpine
  • Persistence: AOF + RDB snapshots
  • Max Memory: 1GB (LRU eviction)
  • Save Policies:
      • After 900s if 1 key changed
      • After 300s if 10 keys changed
      • After 60s if 10000 keys changed
  • Resource Limits: 1 CPU, 1GB RAM

Nginx

  • Version: Nginx Alpine
  • Features:
      • HTTP/2 support
      • Gzip compression
      • Static file caching (30 days)
      • Rate limiting (100 req/s per IP)
      • WebSocket support
      • Security headers
  • Resource Limits: 0.5 CPU, 256MB RAM
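
As a sanity check on host sizing, the per-service limits in this section can be summed across replicas. The sketch below assumes one replica each for PostgreSQL, Redis, and Nginx (the guide does not state a count for them); the function name sum_limits is illustrative.

```shell
#!/bin/sh
# Sketch: sum CPU and memory limits across replicas.
# Input columns: service, replicas, CPU limit, memory limit in GB.
sum_limits() {
    awk '{ cpu += $2 * $3; mem += $2 * $4 }
         END { printf "cpu=%.2f mem_gb=%.2f\n", cpu, mem }'
}

# Values copied from the Service Configuration section
# (512MB = 0.5GB, 256MB = 0.25GB; single replicas for the last three are assumed).
sum_limits <<'EOF'
api-server    2 2   2
celery-worker 3 2   2
celery-beat   1 0.5 0.5
postgres      1 2   2
redis         1 1   1
nginx         1 0.5 0.25
EOF
```

With these inputs the script prints cpu=14.00 mem_gb=13.75. Limits are ceilings rather than reservations, so the total does not have to fit within the 8GB minimum, but it shows how far combined usage could grow under full load.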

Networking

All services communicate via bridge network dsta-network (172.20.0.0/16).

Service URLs (internal):

  • PostgreSQL: postgres:5432
  • Redis: redis:6379
  • API Server: api-server:8000

Volumes

Production uses named volumes for data persistence:

Volume               Purpose            Backup Priority
-------------------  -----------------  ------------------
postgres_data        Database           Critical ⚠️
redis_data           Cache persistence  Medium
api_logs             Application logs   Low
worker_logs          Worker logs        Low
scheduler_logs       Scheduler logs     Low
static_files         Static assets      Low (regenerable)
media_files          User uploads       High
celerybeat_schedule  Task schedule      Medium

Backup Strategy

Database Backups

Automated daily backups:

# Backup script (add to cron)
docker exec dsta-postgres-prod pg_dump -U dsta dsta > backup-$(date +%Y%m%d).sql
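
A daily cron job accumulates dump files unless old ones are pruned. The sketch below wraps the dump command above and adds a simple retention step; the function names run_backup and prune_backups and the keep-newest-N policy are assumptions, not part of the existing setup.

```shell
#!/bin/sh
# Sketch: date-stamped dump plus retention. Function names are illustrative.

# Dump to DIR/backup-YYYYMMDD.sql using the container, user, and database named above.
run_backup() {
    dir="$1"
    out="$dir/backup-$(date +%Y%m%d).sql"
    # Write to a temporary name first so a failed dump never looks like a valid backup.
    docker exec dsta-postgres-prod pg_dump -U dsta dsta > "$out.tmp" && mv "$out.tmp" "$out"
}

# Keep only the newest KEEP dumps in DIR. The %Y%m%d names sort chronologically,
# so the first files matched by the glob are the oldest.
prune_backups() {
    dir="$1"
    keep="$2"
    total=0
    for f in "$dir"/backup-*.sql; do
        if [ -e "$f" ]; then
            total=$((total + 1))
        fi
    done
    excess=$((total - keep))
    for f in "$dir"/backup-*.sql; do
        [ "$excess" -gt 0 ] || break
        [ -e "$f" ] || continue
        rm -f "$f"
        echo "removed $(basename "$f")"
        excess=$((excess - 1))
    done
}
```

A cron entry could then call run_backup /path/to/backups followed by prune_backups /path/to/backups 14; both the directory and the retention count are placeholders to adapt.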

Volume Backups

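# Backup volumes
# Compose may prefix volume names with the project name; confirm with: docker volume ls
# Stop postgres first if the archive needs to be consistent.
docker run --rm \
  -v postgres_data:/data \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/postgres_data.tar.gz /data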

Scaling

Horizontal Scaling

Scale specific services:

# Scale API servers
docker-compose -f docker-compose.prod.yml up -d --scale api-server=4

# Scale workers
docker-compose -f docker-compose.prod.yml up -d --scale celery-worker=5

Resource Limits

Adjust in docker-compose.prod.yml:

deploy:
  resources:
    limits:
      cpus: '4'
      memory: 4G

SSL/TLS Configuration

Using Let's Encrypt

  1. Install certbot:

    apt-get install certbot
    

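  2. Generate certificates (--standalone binds port 80 itself, so nothing else may be listening on that host port while it runs):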

    certbot certonly --standalone -d your-domain.com
    

  3. Copy certificates:

    mkdir -p deploy/ssl
    cp /etc/letsencrypt/live/your-domain.com/fullchain.pem deploy/ssl/cert.pem
    cp /etc/letsencrypt/live/your-domain.com/privkey.pem deploy/ssl/key.pem
    

  4. Update nginx config: Uncomment HTTPS server block in deploy/nginx/conf.d/dsta.conf

  5. Restart nginx:

    docker-compose -f docker-compose.prod.yml restart nginx
    

Certificate Renewal

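Add to crontab. The job runs monthly; certbot only renews certificates that are close to expiry. Because step 3 copies the certificate files into deploy/ssl, the copies are refreshed before nginx restarts:

0 0 1 * * certbot renew --quiet && cp /etc/letsencrypt/live/your-domain.com/fullchain.pem /path/to/deploy/ssl/cert.pem && cp /etc/letsencrypt/live/your-domain.com/privkey.pem /path/to/deploy/ssl/key.pem && docker-compose -f /path/to/deploy/docker-compose.prod.yml restart nginx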
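
Step 3 copies the certificates into deploy/ssl, so those copies go stale whenever certbot renews. One way to refresh them is a certbot deploy hook, which runs only after a successful renewal. The sketch below assumes the script is saved as an executable file and passed to certbot renew via --deploy-hook; the function name install_certs is illustrative, and /path/to is the same placeholder used in the crontab line.

```shell
#!/bin/sh
# Sketch of a certbot deploy hook. install_certs is an illustrative name.

# Copy the renewed certificate and key to the file names nginx is configured to read.
install_certs() {
    live_dir="$1"
    ssl_dir="$2"
    mkdir -p "$ssl_dir"
    cp "$live_dir/fullchain.pem" "$ssl_dir/cert.pem"
    cp "$live_dir/privkey.pem" "$ssl_dir/key.pem"
    echo "installed certificates from $live_dir"
}

# certbot sets RENEWED_LINEAGE to the live directory of the renewed certificate
# when it runs a deploy hook; outside of certbot this branch is skipped.
if [ -n "${RENEWED_LINEAGE:-}" ]; then
    install_certs "$RENEWED_LINEAGE" /path/to/deploy/ssl
    docker-compose -f /path/to/deploy/docker-compose.prod.yml restart nginx
fi
```

With a hook like this, nginx is restarted only when a certificate actually changed, rather than on every scheduled run.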

Monitoring

Health Checks

# Check all services
curl http://localhost/health/

# Check specific service
docker-compose -f docker-compose.prod.yml ps
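
After a deploy or restart, the health endpoint may take a few seconds to respond. The following sketch polls a probe command until it succeeds or a retry budget runs out; the function name wait_for_health and the retry values are assumptions to tune.

```shell
#!/bin/sh
# Sketch: retry a probe command until it succeeds.
# Usage: wait_for_health PROBE_COMMAND MAX_ATTEMPTS DELAY_SECONDS
wait_for_health() {
    probe="$1"
    max_attempts="$2"
    delay="$3"
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # The probe runs in its own shell; only its exit status matters.
        if sh -c "$probe" >/dev/null 2>&1; then
            echo "healthy after $attempt attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    echo "unhealthy after $max_attempts attempt(s)"
    return 1
}

# Example (not executed here):
#   wait_for_health "curl -fsS http://localhost/health/" 10 3
```

The non-zero return on failure makes the function usable as a gate in deploy scripts.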

Logs

# All services
docker-compose -f docker-compose.prod.yml logs -f

# Specific service
docker-compose -f docker-compose.prod.yml logs -f api-server

# Last 100 lines
docker-compose -f docker-compose.prod.yml logs --tail=100 celery-worker

Resource Usage

# Container stats
docker stats

# Stats for this project's containers only
docker stats $(docker-compose -f docker-compose.prod.yml ps -q)

Maintenance

Update Images

# Pull latest code
git pull origin main

# Rebuild images
docker-compose -f docker-compose.prod.yml build

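# Rebuild and recreate only api-server, leaving its dependencies untouched
# (containers are recreated, so brief interruptions are possible)
docker-compose -f docker-compose.prod.yml up -d --no-deps --build api-server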

Database Migrations

# Run migrations
docker-compose -f docker-compose.prod.yml exec api-server python manage.py migrate

# Check migration status
docker-compose -f docker-compose.prod.yml exec api-server python manage.py showmigrations

Clear Cache

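# Clear Redis cache
# Caution: Redis is also the Celery broker; if cache and broker share a database, FLUSHDB drops queued tasks too
docker-compose -f docker-compose.prod.yml exec redis redis-cli FLUSHDB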

Troubleshooting

Container Won't Start

# Check logs
docker-compose -f docker-compose.prod.yml logs [service-name]

# Check service status
docker-compose -f docker-compose.prod.yml ps

# Restart service
docker-compose -f docker-compose.prod.yml restart [service-name]

Database Connection Issues

# Check PostgreSQL health
docker-compose -f docker-compose.prod.yml exec postgres pg_isready -U dsta

# Check connection from API
docker-compose -f docker-compose.prod.yml exec api-server python manage.py dbshell

High Memory Usage

# Check container stats
docker stats

# Adjust resource limits in docker-compose.prod.yml
# Restart affected service
docker-compose -f docker-compose.prod.yml restart [service-name]

Worker Tasks Not Processing

# Check worker logs
docker-compose -f docker-compose.prod.yml logs -f celery-worker

# Check Redis connection
docker-compose -f docker-compose.prod.yml exec redis redis-cli PING

# Restart workers
docker-compose -f docker-compose.prod.yml restart celery-worker celery-beat

Security Best Practices

  1. Environment Variables: Never commit .env_files/prod.env to git
  2. Secret Rotation: Rotate secrets regularly (90 days)
  3. Updates: Keep base images updated
  4. Network Isolation: Use Docker networks for service isolation
  5. Access Control: Restrict nginx /metrics endpoint
  6. Firewall: Only expose necessary ports (80, 443)
  7. Backups: Encrypted backups to off-site storage
  8. Monitoring: Set up alerts for security events

Performance Tuning

PostgreSQL

Edit docker-compose.prod.yml:

postgres:
  command: >
    postgres
    -c shared_buffers=256MB
    -c effective_cache_size=1GB
    -c max_connections=100
    -c work_mem=4MB

Redis

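Adjust the memory cap and eviction policy. The Redis container is limited to 1GB RAM under Service Configuration, so raise that limit as well if maxmemory goes beyond it: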

redis:
  command: >
    redis-server
    --maxmemory 2gb
    --maxmemory-policy allkeys-lru

Nginx

Increase worker connections:

events {
    worker_connections 4096;
}

Disaster Recovery

Full System Recovery

  1. Restore database:

    docker-compose -f docker-compose.prod.yml exec -T postgres psql -U dsta dsta < backup.sql
    

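  2. Restore volumes (when restoring postgres_data itself from an archive, stop the postgres service first and skip step 1):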

    docker run --rm -v postgres_data:/data -v $(pwd):/backup alpine tar xzf /backup/postgres_data.tar.gz -C /
    

  3. Restart services:

    docker-compose -f docker-compose.prod.yml restart
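
Before step 1, it can help to confirm that the dump file is not truncated. The sketch below relies on the completion comment that plain-format pg_dump output normally ends with; treat the exact marker text as an assumption and confirm it against one of your own dumps. The function name verify_dump is illustrative.

```shell
#!/bin/sh
# Sketch: basic sanity check of a plain-format SQL dump before restoring it.
verify_dump() {
    dump_file="$1"
    # -s is true only for an existing file with a size greater than zero.
    if [ ! -s "$dump_file" ]; then
        echo "empty or missing: $dump_file"
        return 1
    fi
    # Assumed marker: pg_dump writes this comment once the dump has finished.
    if ! grep -q "PostgreSQL database dump complete" "$dump_file"; then
        echo "no completion marker: $dump_file"
        return 1
    fi
    echo "looks complete: $dump_file"
}
```

A passing check only indicates that the dump finished writing; it does not validate the data inside it.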
    

Cost Optimization

Resource Allocation

  • Start with minimal resources
  • Monitor usage with docker stats
  • Scale up based on actual needs
  • Use resource reservations for guaranteed performance

Image Optimization

  • Use .dockerignore to exclude unnecessary files
  • Multi-stage builds to reduce image size
  • Regular cleanup: docker system prune -a (removes all unused images, not only dangling ones, so expect a full rebuild or pull afterwards)

CI/CD Integration

Production images are designed for CI/CD pipelines. See docs/DEPLOYMENT_AUTOMATION.md for integration with GitHub Actions, GitLab CI, or Jenkins.

Support

For issues or questions:

  • Check logs first
  • Review troubleshooting section
  • Open issue on GitHub
  • Contact DevOps team

Version History

  • 1.0.0 (2025-01-27): Initial production Docker setup