Docker Production Deployment Guide¶
This document describes the production Docker setup for the DSTA trading system.
Overview¶
The production deployment uses multi-stage Docker builds with security hardening and consists of:
- API Server: Django/ASGI application server (2 replicas)
- Celery Worker: Background task processing (3 replicas)
- Celery Beat: Task scheduler (1 replica)
- PostgreSQL: Primary database
- Redis: Cache and message broker
- Nginx: Reverse proxy and load balancer
Architecture¶
     ┌──────────┐
     │  Client  │
     └────┬─────┘
          │
          ▼
   ┌─────────────┐
   │    Nginx    │ ◄── Reverse Proxy, SSL, Static Files
   │  (Port 80)  │
   └──────┬──────┘
          │
          ▼
┌───────────────────┐
│    API Server     │ ◄── Django/ASGI Application
│   (2 replicas)    │
└────┬──────────────┘
     │
     ├──► PostgreSQL ◄── Database
     │
     └──► Redis ◄──► Celery Worker (3 replicas)
                     Celery Beat (1 replica)
Production Images¶
Multi-Stage Builds¶
All production Dockerfiles use multi-stage builds:
- Builder Stage: Compiles dependencies (TA-Lib, Python packages)
- Runtime Stage: Minimal runtime image with only necessary components
Security Features¶
- ✅ Non-root user (dsta:dsta, UID/GID 1000)
- ✅ Alpine/Slim base images for minimal attack surface
- ✅ No development tools in runtime images
- ✅ Read-only filesystem where possible
- ✅ Health checks for all services
- ✅ Resource limits (CPU/Memory)
Image Sizes (Approximate)¶
- dsta-api:latest: ~300MB
- dsta-worker:latest: ~250MB
- dsta-scheduler:latest: ~250MB
Quick Start¶
Prerequisites¶
- Docker 24.0+
- Docker Compose 2.20+
- 8GB RAM minimum
- 50GB disk space
Initial Setup¶
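1. Copy the environment template to .env.prod.
2. Edit the production environment file. Critical values to change:
   - SECRET_KEY: Generate with python -c 'from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())'
   - POSTGRES_PASSWORD: Strong password
   - ALLOWED_HOSTS: Your domain names
   - API keys and credentials
3. Build production images:
   docker-compose -f docker-compose.prod.yml build
4. Start services:
   docker-compose -f docker-compose.prod.yml up -d
5. Check service health:
   docker-compose -f docker-compose.prod.yml ps
6. View logs:
   docker-compose -f docker-compose.prod.yml logs -f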
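The critical values listed under Initial Setup can be checked before anything is built. The following is a minimal sketch, assuming the environment file is named `.env.prod` and holds plain `KEY=value` lines; the `check_env` function is illustrative and not part of the project.

```shell
# Report any critical variable that is missing or empty in the env file.
# Assumes plain KEY=value lines; .env.prod is the default file name.
check_env() {
    env_file="${1:-.env.prod}"
    missing=0
    for key in SECRET_KEY POSTGRES_PASSWORD ALLOWED_HOSTS; do
        # Match "KEY=" followed by at least one character at the start of a line
        if ! grep -Eq "^${key}=.+" "$env_file"; then
            echo "missing or empty: ${key}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_env && docker-compose -f docker-compose.prod.yml build` stops the build early when a value is still unset. The check only confirms that a value is present, not that it is strong or correct.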
Service Configuration¶
API Server¶
- Replicas: 2 (for high availability)
- Port: 8000 (internal)
- Workers: 4 Uvicorn workers
- Health Check: GET /health/ every 30s
- Resource Limits: 2 CPU, 2GB RAM
Celery Worker¶
- Replicas: 3 (for parallel task processing)
- Concurrency: 4 workers per container
- Max Tasks per Child: 1000 (each worker process is recycled after 1000 tasks, which bounds the effect of memory leaks)
- Resource Limits: 2 CPU, 2GB RAM
Celery Beat¶
- Replicas: 1 (only one scheduler needed)
- Scheduler: django-celery-beat (database-backed)
- Resource Limits: 0.5 CPU, 512MB RAM
PostgreSQL¶
- Version: PostgreSQL 17 Alpine
- Encoding: UTF-8
- Persistence: Named volume postgres_data
- Health Check: pg_isready every 10s
- Resource Limits: 2 CPU, 2GB RAM
Redis¶
- Version: Redis 8 Alpine
- Persistence: AOF + RDB snapshots
- Max Memory: 1GB (LRU eviction)
- Save Policies:
- After 900s if 1 key changed
- After 300s if 10 keys changed
- After 60s if 10000 keys changed
- Resource Limits: 1 CPU, 1GB RAM
Nginx¶
- Version: Nginx Alpine
- Features:
- HTTP/2 support
- Gzip compression
- Static file caching (30 days)
- Rate limiting (100 req/s per IP)
- WebSocket support
- Security headers
- Resource Limits: 0.5 CPU, 256MB RAM
Networking¶
All services communicate via the bridge network dsta-network (172.20.0.0/16).
Service URLs (internal):
- PostgreSQL: postgres:5432
- Redis: redis:6379
- API Server: api-server:8000
Volumes¶
Production uses named volumes for data persistence:
| Volume | Purpose | Backup Priority |
|---|---|---|
| postgres_data | Database | Critical ⚠️ |
| redis_data | Cache persistence | Medium |
| api_logs | Application logs | Low |
| worker_logs | Worker logs | Low |
| scheduler_logs | Scheduler logs | Low |
| static_files | Static assets | Low (regenerable) |
| media_files | User uploads | High |
| celerybeat_schedule | Task schedule | Medium |
Backup Strategy¶
Database Backups¶
Automated daily backups:
# Backup script (add to cron)
docker exec dsta-postgres-prod pg_dump -U dsta dsta > backup-$(date +%Y%m%d).sql
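The one-line backup above can be extended into a script suitable for cron. The following is a sketch only: `BACKUP_DIR`, `RETENTION_DAYS`, and the `DOCKER` override are illustrative names, and the 14-day retention is an assumed value rather than a project setting.

```shell
# Docker binary to call; overridable so the function can be dry-run.
DOCKER="${DOCKER:-docker}"

# Dump the database to a dated, gzip-compressed file and prune old dumps.
# BACKUP_DIR and RETENTION_DAYS are illustrative defaults.
backup_database() {
    backup_dir="${BACKUP_DIR:-./backups}"
    retention_days="${RETENTION_DAYS:-14}"
    dump_file="${backup_dir}/backup-$(date +%Y%m%d).sql"

    mkdir -p "$backup_dir" || return 1
    # Write the dump first so a failed pg_dump is detected before compression
    if ! $DOCKER exec dsta-postgres-prod pg_dump -U dsta dsta > "$dump_file"; then
        rm -f "$dump_file"
        return 1
    fi
    gzip -f "$dump_file" || return 1
    # Delete compressed dumps older than the retention window
    find "$backup_dir" -name 'backup-*.sql.gz' -mtime "+${retention_days}" -delete
}
```

Writing the dump to a file before compressing it means a failed `pg_dump` leaves no truncated archive behind, which a plain `pg_dump | gzip` pipeline would not guarantee.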
Volume Backups¶
# Backup volumes (quote the working directory so paths with spaces work)
docker run --rm \
  -v postgres_data:/data \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/postgres_data.tar.gz /data
Scaling¶
Horizontal Scaling¶
Scale specific services:
# Scale API servers
docker-compose -f docker-compose.prod.yml up -d --scale api-server=4
# Scale workers
docker-compose -f docker-compose.prod.yml up -d --scale celery-worker=5
Resource Limits¶
Adjust in docker-compose.prod.yml:
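As an alternative to editing the file in place, limits can be layered on with a second Compose file. The sketch below assumes the services use Compose's `deploy.resources` keys; the override file name `docker-compose.limits.yml` and the values shown are illustrative.

```shell
# Write a Compose override that raises the api-server limits.
# The file name and the CPU/memory values are illustrative.
write_limits_override() {
    override_file="${1:-docker-compose.limits.yml}"
    cat > "$override_file" <<'EOF'
services:
  api-server:
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: 4G
        reservations:
          cpus: "1"
          memory: 1G
EOF
}
```

The override is applied by passing both files, for example `docker-compose -f docker-compose.prod.yml -f docker-compose.limits.yml up -d api-server`. Later files take precedence over earlier ones for the keys they define.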
SSL/TLS Configuration¶
Using Let's Encrypt¶
1. Install certbot.
2. Generate certificates.
3. Copy certificates.
4. Update nginx config: uncomment the HTTPS server block in deploy/nginx/conf.d/dsta.conf.
5. Restart nginx:
   docker-compose -f docker-compose.prod.yml restart nginx
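The certificate copy step can be sketched as follows. It relies on certbot's default layout under `/etc/letsencrypt/live/<domain>/`; the destination directory `deploy/nginx/ssl` and the `copy_certs` helper are assumptions, so match the destination to the `ssl_certificate` paths in deploy/nginx/conf.d/dsta.conf.

```shell
# Issue a certificate first, for example (port 80 must be free for --standalone):
#   certbot certonly --standalone -d example.com

# Directory where certbot keeps issued certificates (certbot's default layout).
LE_LIVE="${LE_LIVE:-/etc/letsencrypt/live}"

# Copy the certificate chain and private key for a domain into the directory
# that Nginx reads. The default destination is an assumption.
copy_certs() {
    domain="$1"
    dest="${2:-deploy/nginx/ssl}"
    mkdir -p "$dest" || return 1
    # -L follows the symlinks certbot keeps in live/<domain>/
    cp -L "${LE_LIVE}/${domain}/fullchain.pem" "${dest}/fullchain.pem" || return 1
    cp -L "${LE_LIVE}/${domain}/privkey.pem" "${dest}/privkey.pem" || return 1
    # Keep the private key readable by its owner only
    # (assumes the Nginx master process runs as root, the official image default)
    chmod 600 "${dest}/privkey.pem"
}
```

Because the certificates are copied rather than read directly from `/etc/letsencrypt`, a renewal needs the same copy step to run before Nginx is restarted.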
Certificate Renewal¶
Add to crontab:
0 0 1 * * certbot renew --quiet && docker-compose -f /path/to/deploy/docker-compose.prod.yml restart nginx
Monitoring¶
Health Checks¶
# Check all services
curl http://localhost/health/
# Check specific service
docker-compose -f docker-compose.prod.yml ps
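After a deployment or restart, the health endpoint may take a few seconds to respond. The sketch below polls it with retries; the `wait_healthy` helper, its defaults, and the `CURL` override are illustrative.

```shell
# curl binary to call; overridable so the function can be dry-run.
CURL="${CURL:-curl}"

# Poll the health endpoint until it succeeds or the attempts run out.
# Usage: wait_healthy [url] [attempts] [delay_seconds]
wait_healthy() {
    url="${1:-http://localhost/health/}"
    attempts="${2:-10}"
    delay="${3:-3}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl exit non-zero on HTTP error responses (4xx/5xx)
        if $CURL -fsS -o /dev/null "$url"; then
            echo "healthy after ${i} attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "not healthy after ${attempts} attempts" >&2
    return 1
}
```

The non-zero exit status on failure lets a deployment script abort or roll back instead of continuing against an unhealthy stack.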
Logs¶
# All services
docker-compose -f docker-compose.prod.yml logs -f
# Specific service
docker-compose -f docker-compose.prod.yml logs -f api-server
# Last 100 lines
docker-compose -f docker-compose.prod.yml logs --tail=100 celery-worker
Resource Usage¶
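Per-container usage can be read with `docker stats`, the same command this guide uses under Troubleshooting. The following sketch prints a single snapshot instead of a live view; the `resource_snapshot` helper and the `DOCKER` override are illustrative.

```shell
# Docker binary to call; overridable so the function can be dry-run.
DOCKER="${DOCKER:-docker}"

# Print a one-off table of CPU and memory use per running container.
resource_snapshot() {
    $DOCKER stats --no-stream \
        --format 'table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}'
}
```

Comparing the memory column against the limits listed under Service Configuration shows which services are close to their ceiling.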
Maintenance¶
Update Images¶
# Pull latest code
git pull origin main
# Rebuild images
docker-compose -f docker-compose.prod.yml build
# Rebuild and recreate only api-server, leaving its dependencies running
docker-compose -f docker-compose.prod.yml up -d --no-deps --build api-server
Database Migrations¶
# Run migrations
docker-compose -f docker-compose.prod.yml exec api-server python manage.py migrate
# Check migration status
docker-compose -f docker-compose.prod.yml exec api-server python manage.py showmigrations
Clear Cache¶
Troubleshooting¶
Container Won't Start¶
# Check logs
docker-compose -f docker-compose.prod.yml logs [service-name]
# Check service status
docker-compose -f docker-compose.prod.yml ps
# Restart service
docker-compose -f docker-compose.prod.yml restart [service-name]
Database Connection Issues¶
# Check PostgreSQL health
docker-compose -f docker-compose.prod.yml exec postgres pg_isready -U dsta
# Check connection from API
docker-compose -f docker-compose.prod.yml exec api-server python manage.py dbshell
High Memory Usage¶
# Check container stats
docker stats
# Adjust resource limits in docker-compose.prod.yml
# Restart affected service
docker-compose -f docker-compose.prod.yml restart [service-name]
Worker Tasks Not Processing¶
# Check worker logs
docker-compose -f docker-compose.prod.yml logs -f celery-worker
# Check Redis connection
docker-compose -f docker-compose.prod.yml exec redis redis-cli PING
# Restart workers
docker-compose -f docker-compose.prod.yml restart celery-worker celery-beat
Security Best Practices¶
- Environment Variables: Never commit .env.prod to git
- Secret Rotation: Rotate secrets regularly (90 days)
- Updates: Keep base images updated
- Network Isolation: Use Docker networks for service isolation
- Access Control: Restrict nginx /metrics endpoint
- Firewall: Only expose necessary ports (80, 443)
- Backups: Encrypted backups to off-site storage
- Monitoring: Set up alerts for security events
Performance Tuning¶
PostgreSQL¶
Edit docker-compose.prod.yml:
postgres:
  command: >
    postgres
    -c shared_buffers=256MB
    -c effective_cache_size=1GB
    -c max_connections=100
    -c work_mem=4MB
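After the container is recreated, it is worth confirming that PostgreSQL picked up the new values. The sketch below queries each setting with `SHOW`; the `show_pg_settings` helper and the `COMPOSE` override are illustrative, while the `postgres` service name and the `dsta` user and database come from the commands elsewhere in this guide.

```shell
# Compose invocation used in this guide; overridable so the function can be dry-run.
COMPOSE="${COMPOSE:-docker-compose -f docker-compose.prod.yml}"

# Print the live value of each tuned setting as "name = value".
show_pg_settings() {
    for setting in shared_buffers effective_cache_size max_connections work_mem; do
        # -T disables TTY allocation; -tA makes psql print the bare value only
        value=$($COMPOSE exec -T postgres psql -U dsta -d dsta -tA -c "SHOW ${setting};") || return 1
        echo "${setting} = ${value}"
    done
}
```

If a value still shows the old setting, the container was most likely not recreated; `up -d` recreates it when the `command` in the Compose file has changed.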
Redis¶
Adjust maxmemory policy:
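One way to change the policy on a running instance is `CONFIG SET`, sketched below. The `set_redis_policy` helper and the `COMPOSE` override are illustrative, and `allkeys-lru` is only an example value.

```shell
# Compose invocation used in this guide; overridable so the function can be dry-run.
COMPOSE="${COMPOSE:-docker-compose -f docker-compose.prod.yml}"

# Switch the eviction policy on the running Redis instance and read it back.
# CONFIG SET does not survive a restart unless the same setting is also
# part of the service's start command or configuration file.
set_redis_policy() {
    policy="${1:-allkeys-lru}"
    $COMPOSE exec -T redis redis-cli CONFIG SET maxmemory-policy "$policy" || return 1
    $COMPOSE exec -T redis redis-cli CONFIG GET maxmemory-policy
}
```

Since Redis also serves as the message broker here, the choice matters: `allkeys-lru` may evict any key under memory pressure, whereas `volatile-lru` only evicts keys that have an expiry set.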
Nginx¶
Increase worker connections:
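The `worker_connections` directive lives in the `events` block of the Nginx configuration. Once it has been edited, the change can be validated and applied as sketched below; the `reload_nginx` helper and the `COMPOSE` override are illustrative, and the sketch assumes the configuration is mounted into the container rather than baked into the image.

```shell
# Compose invocation used in this guide; overridable so the function can be dry-run.
COMPOSE="${COMPOSE:-docker-compose -f docker-compose.prod.yml}"

# Validate the edited configuration, then reload it in the running container.
reload_nginx() {
    # nginx -t exits non-zero on a syntax error, so a bad config is never loaded
    $COMPOSE exec -T nginx nginx -t || return 1
    $COMPOSE exec -T nginx nginx -s reload
}
```

A reload starts new worker processes with the new settings and lets the old ones finish their open connections, so it is gentler than restarting the container.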
Disaster Recovery¶
Full System Recovery¶
1. Restore the database from the latest SQL dump.
2. Restore the named volumes from their archives.
3. Restart the services.
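The recovery steps can be sketched as the inverse of the backup commands in the Backup Strategy section. The helper names and the `DOCKER` and `COMPOSE` overrides are illustrative. The sketch assumes the dump is plain SQL loaded into an empty database, and that archives were created with `tar czf ... /data` as shown earlier.

```shell
# Binaries to call; overridable so the functions can be dry-run.
DOCKER="${DOCKER:-docker}"
COMPOSE="${COMPOSE:-docker-compose -f docker-compose.prod.yml}"

# Load a plain-SQL dump produced by pg_dump into the running database.
restore_database() {
    dump_file="$1"
    [ -r "$dump_file" ] || { echo "cannot read ${dump_file}" >&2; return 1; }
    # -i keeps stdin open so psql can read the dump
    $DOCKER exec -i dsta-postgres-prod psql -U dsta -d dsta < "$dump_file"
}

# Unpack a volume archive back into the named volume.
# Run this while the service that uses the volume is stopped.
restore_volume() {
    volume="$1"
    archive="$2"
    archive_dir=$(cd "$(dirname "$archive")" && pwd) || return 1
    # The archive stores paths as data/..., so extracting at / fills /data
    $DOCKER run --rm \
        -v "${volume}:/data" \
        -v "${archive_dir}:/backup:ro" \
        alpine tar xzf "/backup/$(basename "$archive")" -C /
}

# Bring the stack back up once the data is in place.
restart_stack() {
    $COMPOSE up -d
}
```

A typical sequence would be `restore_volume media_files ./media_files.tar.gz`, then `restore_database backup-20250127.sql`, then `restart_stack`; the file names here are examples. Compose usually prefixes volume names with the project name, so confirm the real name with `docker volume ls` first.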
Cost Optimization¶
Resource Allocation¶
- Start with minimal resources
- Monitor usage with docker stats
- Scale up based on actual needs
- Use resource reservations for guaranteed performance
Image Optimization¶
- Use .dockerignore to exclude unnecessary files
- Multi-stage builds to reduce image size
- Regular cleanup: docker system prune -a (removes all unused images, not only dangling ones)
CI/CD Integration¶
Production images are designed for CI/CD pipelines. See docs/DEPLOYMENT_AUTOMATION.md for integration with GitHub Actions, GitLab CI, or Jenkins.
Support¶
For issues or questions:
- Check logs first
- Review troubleshooting section
- Open issue on GitHub
- Contact DevOps team
Version History¶
- 1.0.0 (2025-01-27): Initial production Docker setup