Skip to content

Database Initialization Guide

Overview

This guide explains how to initialize and manage the DSTA database, including schema creation, migrations, and test data generation.

Quick Start

Automatic Initialization

The easiest way to initialize the database is using the provided script:

# Basic initialization
./scripts/init_db.sh

# With sample data
./scripts/init_db.sh --with-sample-data

# With tests
./scripts/init_db.sh --with-tests

# Custom database settings
./scripts/init_db.sh --db-host localhost --db-name dsta_dev --with-sample-data

Manual Initialization

If you prefer manual control:

cd src

# 1. Run migrations
python manage.py migrate

# 2. Create superuser
python manage.py createsuperuser

# 3. (Optional) Load sample data
python manage.py loaddata api/fixtures/sample_data.json

# 4. (Optional) Generate test data
python manage.py generate_test_data --days 30

Database Schema

Tables

The DSTA database consists of four main tables:

1. Candlesticks

Stores OHLCV (Open, High, Low, Close, Volume) candlestick data.

Fields: - exchange: Exchange name (binance, gateio, huobi) - symbol: Trading pair (e.g., BTCUSDT) - interval: Timeframe (1m, 5m, 15m, 30m, 1h, 4h, 1d, 7d) - timestamp: Candlestick opening time - open_price: Opening price - high_price: Highest price - low_price: Lowest price - close_price: Closing price - volume: Trading volume (base currency) - quote_volume: Trading volume (quote currency) - trades_count: Number of trades

Indexes: - Composite: (exchange, symbol, interval, timestamp) - (symbol, timestamp) - (exchange, timestamp)

Constraints: - Unique: (exchange, symbol, interval, timestamp)

2. Trades

Stores individual trade execution records.

Fields: - exchange: Exchange name - symbol: Trading pair - trade_id: Unique trade ID from exchange - timestamp: Trade execution time - price: Trade price - quantity: Trade quantity (base currency) - quote_quantity: Trade quantity (quote currency) - is_buyer_maker: Whether buyer was maker

Indexes: - (exchange, symbol, timestamp) - (symbol, timestamp)

Constraints: - Unique: (exchange, symbol, trade_id)

3. Order Books

Stores order book snapshot data.

Fields: - exchange: Exchange name - symbol: Trading pair - timestamp: Snapshot time - bids: JSON array of [price, quantity] bid levels - asks: JSON array of [price, quantity] ask levels - last_update_id: Exchange update ID

Indexes: - (exchange, symbol, timestamp) - (symbol, timestamp) - (timestamp)

4. Market Metrics

Stores market-wide indicators.

Fields: - metric_type: Type of metric (btc_dominance, total_market_cap, fear_greed_index, other) - timestamp: Metric timestamp - value: Metric value - metadata: Additional JSON metadata

Indexes: - (metric_type, timestamp)

Constraints: - Unique: (metric_type, timestamp)

Migrations

Creating Migrations

# After modifying models.py
python manage.py makemigrations

# With custom name
python manage.py makemigrations --name add_new_field

Applying Migrations

# Apply all pending migrations
python manage.py migrate

# Apply specific app migrations
python manage.py migrate api

# Show migration status
python manage.py showmigrations

# See SQL for a migration
python manage.py sqlmigrate api 0001

Rolling Back Migrations

# Rollback to specific migration
python manage.py migrate api 0001_initial

# Rollback all migrations for an app
python manage.py migrate api zero

Test Data Generation

Using Django Command

# Generate 30 days of data
python manage.py generate_test_data --days 30

# Multiple symbols
python manage.py generate_test_data --days 30 --symbols BTCUSDT ETHUSDT BNBUSDT

# Different exchange
python manage.py generate_test_data --exchange gateio --days 7

# Quiet mode
python manage.py generate_test_data --quiet

Using Python

from api.fixtures.test_data_generator import generate_test_data

# Generate test data
stats = generate_test_data(
    days=30,
    symbols=['BTCUSDT', 'ETHUSDT', 'BNBUSDT'],
    exchange='binance',
    verbose=True
)

print(f"Created {stats['candlesticks']} candlesticks")
print(f"Created {stats['trades']} trades")
print(f"Created {stats['market_metrics']} market metrics")

Generated Data Characteristics

  • Candlesticks: Random walk with realistic OHLC patterns
  • Trades: Random price movements with varying quantities
  • Market Metrics: Daily BTC dominance and Fear & Greed index

Sample Data

Loading Sample Data

# Load from fixture file
python manage.py loaddata api/fixtures/sample_data.json

# Export current data as fixture
python manage.py dumpdata api.Candlestick api.Trade --indent 2 > my_data.json

Sample Data Contents

The sample_data.json includes: - 2 candlesticks (BTCUSDT, ETHUSDT) - 2 market metrics (BTC dominance, Fear & Greed)

Database Management

Backup Database

# Using pg_dump
pg_dump -h localhost -U dsta -d dsta_dev -F c -f backup.dump

# Plain SQL format
pg_dump -h localhost -U dsta -d dsta_dev > backup.sql

Restore Database

# From custom format
pg_restore -h localhost -U dsta -d dsta_dev backup.dump

# From SQL file
psql -h localhost -U dsta -d dsta_dev < backup.sql

Reset Database

# Using Django
python manage.py flush

# Or drop and recreate
dropdb dsta_dev
createdb dsta_dev
python manage.py migrate

Verification

Check Schema

# List all tables
psql -h localhost -U dsta -d dsta_dev -c "\dt"

# Describe table
psql -h localhost -U dsta -d dsta_dev -c "\d candlesticks"

# Show indexes
psql -h localhost -U dsta -d dsta_dev -c "\di"

Check Data

# Count rows
psql -h localhost -U dsta -d dsta_dev -c "SELECT 
    (SELECT COUNT(*) FROM candlesticks) as candlesticks,
    (SELECT COUNT(*) FROM trades) as trades,
    (SELECT COUNT(*) FROM order_books) as order_books,
    (SELECT COUNT(*) FROM market_metrics) as market_metrics;"

# Sample data
psql -h localhost -U dsta -d dsta_dev -c "SELECT * FROM candlesticks LIMIT 5;"

Using Django Shell

python manage.py shell

>>> from api.models import Candlestick, Trade, MarketMetric
>>> 
>>> # Count records
>>> print(f"Candlesticks: {Candlestick.objects.count()}")
>>> print(f"Trades: {Trade.objects.count()}")
>>> print(f"Market Metrics: {MarketMetric.objects.count()}")
>>> 
>>> # Latest candlestick
>>> latest = Candlestick.objects.latest('timestamp')
>>> print(f"{latest.symbol} @ {latest.timestamp}: ${latest.close_price}")
>>> 
>>> # Query examples
>>> btc_candles = Candlestick.objects.filter(symbol='BTCUSDT', interval='1h')
>>> print(f"BTC hourly candles: {btc_candles.count()}")

Performance Optimization

Analyze Queries

-- Enable query timing
\timing on

-- Analyze query plan
EXPLAIN ANALYZE SELECT * FROM candlesticks 
WHERE exchange = 'binance' AND symbol = 'BTCUSDT' AND interval = '1h'
ORDER BY timestamp DESC LIMIT 100;

Add Indexes

# In models.py
class Meta:
    indexes = [
        models.Index(fields=['exchange', 'symbol', 'timestamp']),
    ]

# Create migration
python manage.py makemigrations
python manage.py migrate

Vacuum and Analyze

# Vacuum database
psql -h localhost -U dsta -d dsta_dev -c "VACUUM ANALYZE;"

# Specific table
psql -h localhost -U dsta -d dsta_dev -c "VACUUM ANALYZE candlesticks;"

Troubleshooting

Migration Conflicts

# Show migration status
python manage.py showmigrations

# Merge conflicting migrations
python manage.py makemigrations --merge

# Fake a migration (if manually applied)
python manage.py migrate --fake api 0001_initial

Connection Issues

# Test database connection
psql -h localhost -U dsta -d dsta_dev -c "SELECT 1;"

# Check if PostgreSQL is running
docker-compose ps postgres

# View logs
docker-compose logs postgres

Data Issues

# Check for duplicate entries
psql -h localhost -U dsta -d dsta_dev -c "
SELECT exchange, symbol, interval, timestamp, COUNT(*)
FROM candlesticks
GROUP BY exchange, symbol, interval, timestamp
HAVING COUNT(*) > 1;"

# Remove duplicates (backup first!)
python manage.py shell
>>> from api.models import Candlestick
>>> # Identify and remove duplicates...

Environment-Specific Settings

Development

DB_NAME=dsta_dev
DB_USER=dsta
DB_PASSWORD=dsta
DB_HOST=localhost
DB_PORT=5432

Testing

DB_NAME=dsta_test
DB_USER=dsta_test
DB_PASSWORD=test
DB_HOST=localhost
DB_PORT=5432

Production

DB_NAME=dsta_prod
DB_USER=dsta_prod
DB_PASSWORD=<secure_password>
DB_HOST=<db_server>
DB_PORT=5432

Best Practices

  1. Always backup before migrations in production
  2. Test migrations on a copy of production data
  3. Use transactions for bulk operations
  4. Monitor query performance regularly
  5. Archive old data to keep tables manageable
  6. Use database constraints to ensure data integrity
  7. Index frequently queried fields
  8. Document schema changes in migrations

Additional Resources