# Deployment Guide

## Deployment Options

POTE can be deployed in several ways depending on your needs:

1. **Local Development** (SQLite) - What you have now ✅
2. **Single Server** (PostgreSQL + cron jobs)
3. **Docker** (Containerized, easy to move)
4. **Cloud** (AWS/GCP/Azure with managed DB)

---

## Option 1: Local Development (Current Setup) ✅

**You're already running this!**

```bash
# Setup (done)
make install
source venv/bin/activate
make migrate

# Ingest data
python scripts/ingest_from_fixtures.py                  # Offline
python scripts/fetch_congressional_trades.py --days 30  # With internet

# Query
python
>>> from pote.db import SessionLocal
>>> from pote.db.models import Official
>>> with SessionLocal() as session:
...     officials = session.query(Official).all()
...     print(f"Total officials: {len(officials)}")
```

**Pros**: Simple, fast, no costs
**Cons**: Local only; SQLite is limited for concurrent and heavy queries

---

## Option 2: Single Server with PostgreSQL

### Setup PostgreSQL

```bash
# Install PostgreSQL (Ubuntu/Debian)
sudo apt update
sudo apt install postgresql postgresql-contrib

# Create database
sudo -u postgres psql
postgres=# CREATE DATABASE pote;
postgres=# CREATE USER poteuser WITH PASSWORD 'your_secure_password';
postgres=# GRANT ALL PRIVILEGES ON DATABASE pote TO poteuser;
postgres=# \q
```

### Update Configuration

```bash
# Edit .env
DATABASE_URL=postgresql://poteuser:your_secure_password@localhost:5432/pote

# Run migrations
source venv/bin/activate
make migrate
```

### Schedule Regular Ingestion

```bash
# Add to crontab:
crontab -e

# Fetch trades daily at 6 AM
0 6 * * * cd /path/to/pote && /path/to/pote/venv/bin/python scripts/fetch_congressional_trades.py --days 7 >> /var/log/pote/trades.log 2>&1

# Enrich securities weekly on Sunday at 3 AM
0 3 * * 0 cd /path/to/pote && /path/to/pote/venv/bin/python scripts/enrich_securities.py >> /var/log/pote/enrich.log 2>&1

# Fetch prices for all tickers daily at 7 AM
0 7 * * * cd /path/to/pote && /path/to/pote/venv/bin/python scripts/update_all_prices.py >> /var/log/pote/prices.log 2>&1
```
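Cron runs with a minimal environment, so a broken or mistyped `DATABASE_URL` often surfaces only in the logs the next morning. A stdlib-only sanity check can catch that earlier; the `check_database_url` helper below is hypothetical, not part of POTE:

```python
# Hypothetical helper (not part of POTE): sanity-check DATABASE_URL
# before the scheduled jobs above depend on it.
from urllib.parse import urlparse

def check_database_url(url: str) -> bool:
    """Return True if the URL looks like a usable PostgreSQL DSN."""
    parsed = urlparse(url)
    return (
        parsed.scheme in ("postgresql", "postgresql+psycopg2")
        and bool(parsed.hostname)                 # host is present
        and bool(parsed.path.lstrip("/"))         # database name is present
    )

print(check_database_url("postgresql://poteuser:your_secure_password@localhost:5432/pote"))  # True
```

Running it once at the top of each ingestion script turns a silent cron failure into an explicit error message.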
**Pros**: Production-ready, full SQL features, scheduled jobs
**Cons**: Requires server management and PostgreSQL setup

---

## Option 3: Docker Deployment

### Create Dockerfile

```dockerfile
# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*

# Copy project files
COPY pyproject.toml .
COPY src/ src/
COPY alembic/ alembic/
COPY alembic.ini .
COPY scripts/ scripts/

# Install Python dependencies
RUN pip install --no-cache-dir -e .

# Run migrations on startup
CMD ["sh", "-c", "alembic upgrade head && python scripts/fetch_congressional_trades.py --days 30"]
```

### Docker Compose Setup

```yaml
# docker-compose.yml
version: '3.8'

services:
  db:
    image: postgres:15
    environment:
      POSTGRES_DB: pote
      POSTGRES_USER: poteuser
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  pote:
    build: .
    environment:
      DATABASE_URL: postgresql://poteuser:${POSTGRES_PASSWORD}@db:5432/pote
      QUIVERQUANT_API_KEY: ${QUIVERQUANT_API_KEY}
      FMP_API_KEY: ${FMP_API_KEY}
    depends_on:
      - db
    volumes:
      - ./logs:/app/logs

  # Optional: FastAPI backend (Phase 3)
  api:
    build: .
    command: uvicorn pote.api.main:app --host 0.0.0.0 --port 8000
    environment:
      DATABASE_URL: postgresql://poteuser:${POSTGRES_PASSWORD}@db:5432/pote
    depends_on:
      - db
    ports:
      - "8000:8000"

volumes:
  postgres_data:
```
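Note that `depends_on` as written only orders container startup; it does not wait for Postgres to actually accept connections, so the first migration run can race the database. The Compose Specification's `healthcheck` plus `condition: service_healthy` closes that gap. A sketch of just the keys to merge into the services above (interval and retry values are arbitrary starting points):

```yaml
services:
  db:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U poteuser -d pote"]
      interval: 5s
      timeout: 3s
      retries: 10

  pote:
    depends_on:
      db:
        condition: service_healthy
```

The long-form `depends_on` requires a reasonably recent `docker compose`; older v3-file tooling ignores the condition.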
### Deploy with Docker

```bash
# Create .env file
cat > .env << EOF
POSTGRES_PASSWORD=your_secure_password
DATABASE_URL=postgresql://poteuser:your_secure_password@db:5432/pote
QUIVERQUANT_API_KEY=
FMP_API_KEY=
EOF

# Build and run
docker-compose up -d

# Run migrations
docker-compose exec pote alembic upgrade head

# Ingest data
docker-compose exec pote python scripts/fetch_congressional_trades.py --days 30

# View logs
docker-compose logs -f pote
```

**Pros**: Portable, isolated, easy to deploy anywhere
**Cons**: Requires Docker knowledge, slightly more complex

---

## Option 4: Cloud Deployment (AWS Example)

### AWS Architecture

```
┌─────────────────┐
│  EC2 Instance   │
│  - Python app   │
│  - Cron jobs    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  RDS (Postgres) │
│  - Managed DB   │
└─────────────────┘
```

### Setup Steps

1. **Create RDS PostgreSQL Instance**
   - Go to AWS RDS Console
   - Create PostgreSQL 15 database
   - Note endpoint: `pote-db.xxxxx.us-east-1.rds.amazonaws.com`
   - Security group: Allow port 5432 from EC2

2. **Launch EC2 Instance**

   ```bash
   # SSH into EC2
   ssh -i your-key.pem ubuntu@your-ec2-ip

   # Install dependencies
   sudo apt update
   sudo apt install python3.11 python3-pip git

   # Clone repo
   git clone <your-repo-url>
   cd pote

   # Setup
   python3 -m venv venv
   source venv/bin/activate
   pip install -e .

   # Configure
   cat > .env << EOF
   DATABASE_URL=postgresql://poteuser:password@pote-db.xxxxx.us-east-1.rds.amazonaws.com:5432/pote
   EOF

   # Run migrations
   alembic upgrade head

   # Setup cron jobs
   crontab -e
   # (Add the cron jobs from Option 2)
   ```
3. **Optional: Use AWS Lambda for scheduled jobs**
   - Package app as Lambda function
   - Use EventBridge to trigger daily
   - Cheaper for infrequent jobs

**Pros**: Scalable, managed database, reliable
**Cons**: Costs money (~$20-50/mo for a small RDS + EC2)

---

## Option 5: Fly.io / Railway / Render (Easiest Cloud)

### Fly.io Example

```bash
# Install flyctl
curl -L https://fly.io/install.sh | sh

# Login
flyctl auth login

# Create fly.toml
cat > fly.toml << EOF
app = "pote-research"

[build]
  builder = "paketobuildpacks/builder:base"

[env]
  PORT = "8080"

[[services]]
  internal_port = 8080
  protocol = "tcp"

  [[services.ports]]
    port = 80
EOF

# Create Postgres and attach it to the app
flyctl postgres create --name pote-db
flyctl postgres attach pote-db

# Deploy
flyctl deploy

# Set secrets
flyctl secrets set DATABASE_URL="postgres://..."
```

**Pros**: Simple, cheap ($5-10/mo), automated deployments
**Cons**: Limited control, may need to adapt code

---

## Production Checklist

Before deploying to production:

### Security
- [ ] Change all default passwords
- [ ] Use environment variables for secrets (never commit `.env`)
- [ ] Enable SSL for database connections
- [ ] Set up firewall rules (only allow necessary ports)
- [ ] Use HTTPS if exposing API/dashboard

### Reliability
- [ ] Set up database backups (daily)
- [ ] Configure logging (centralized if possible)
- [ ] Monitor disk space (especially for SQLite)
- [ ] Set up error alerts (email/Slack on failures)
- [ ] Test recovery from backup

### Performance
- [ ] Index frequently queried columns (already done in models)
- [ ] Use connection pooling for PostgreSQL
- [ ] Cache frequently accessed data
- [ ] Limit API rate if exposing publicly

### Compliance
- [ ] Review data retention policy
- [ ] Add disclaimers to any UI ("research only, not advice")
- [ ] Document data sources and update frequency
- [ ] Keep audit logs of data ingestion

---
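On the connection-pooling item: SQLAlchemy already does this for PostgreSQL (its default `QueuePool` keeps connections open between uses), but the concept is worth seeing in miniature. A stdlib-only sketch, with an illustrative `ConnectionPool` class that is not POTE code, using sqlite3 so it runs anywhere:

```python
# Illustrative only: a tiny pool in the spirit of SQLAlchemy's QueuePool.
import queue
import sqlite3

class ConnectionPool:
    """Hand out pre-opened connections instead of reconnecting per query."""

    def __init__(self, db_path: str, size: int = 5):
        self._pool: queue.Queue = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(sqlite3.connect(db_path, check_same_thread=False))

    def acquire(self) -> sqlite3.Connection:
        return self._pool.get()  # blocks if all connections are checked out

    def release(self, conn: sqlite3.Connection) -> None:
        self._pool.put(conn)     # return the connection for reuse

pool = ConnectionPool(":memory:", size=2)
conn = pool.acquire()
result = conn.execute("SELECT 1").fetchone()[0]
pool.release(conn)
print(result)  # 1
```

With PostgreSQL the equivalent is passing `pool_size` and `max_overflow` to SQLAlchemy's `create_engine` rather than writing a pool yourself.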
## Monitoring & Logs

### Basic Logging Setup

```python
# Add to scripts/fetch_congressional_trades.py
import logging
import os
from logging.handlers import RotatingFileHandler

# Create logs directory
os.makedirs("logs", exist_ok=True)

# Configure logging
handler = RotatingFileHandler(
    "logs/ingestion.log",
    maxBytes=10_000_000,  # 10 MB
    backupCount=5
)
handler.setFormatter(logging.Formatter(
    '%(asctime)s [%(levelname)s] %(name)s: %(message)s'
))

logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.addHandler(handler)
```

### Health Check Endpoint (Optional)

```python
# Add to pote/api/main.py (when building API)
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health_check():
    from pote.db import SessionLocal
    from sqlalchemy import text

    try:
        with SessionLocal() as session:
            session.execute(text("SELECT 1"))
        return {"status": "ok", "database": "connected"}
    except Exception as e:
        return {"status": "error", "message": str(e)}
```

---

## Cost Estimates (Monthly)

| Option | Cost | Notes |
|--------|------|-------|
| **Local Dev** | $0 | SQLite, your machine |
| **VPS (DigitalOcean, Linode)** | $5-12 | Small droplet + managed Postgres |
| **AWS (small)** | $20-50 | t3.micro EC2 + db.t3.micro RDS |
| **Fly.io / Railway** | $5-15 | Hobby tier, managed |
| **Docker on VPS** | $10-20 | One droplet, Docker Compose |

**Free tier options**:
- Railway: Free tier available (limited hours)
- Fly.io: Free tier available (limited resources)
- Oracle Cloud: Always-free tier (ARM instances)

---

## Next Steps After Deployment

1. **Verify ingestion**: Check logs after first cron run
2. **Test queries**: Ensure data is accessible
3. **Monitor growth**: Database size, query performance
4. **Plan backups**: Set up automated DB dumps
5. **Document access**: How to query, who has access

For Phase 2 (Analytics), you'll add:
- Scheduled jobs for computing returns
- Clustering jobs (weekly/monthly)
- Optional dashboard deployment

---

## Quick Deploy (Railway Example)

Railway is probably the easiest for personal projects:

```bash
# Install Railway CLI
npm install -g @railway/cli

# Login
railway login

# Initialize
railway init

# Add PostgreSQL
railway add --database postgres

# Deploy
railway up

# Add environment variables via dashboard
# DATABASE_URL is auto-configured
```

**Cost**: ~$5/mo, scales automatically

---

See `docs/05_dev_setup.md` for local development details.
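As a starting point for the "Plan backups" step above, automated dumps can be a pair of cron entries; paths, credentials, and the 7-day retention window here are placeholders:

```bash
# Daily dump at 2 AM, compressed (note: % must be escaped as \% inside crontab)
0 2 * * * pg_dump "postgresql://poteuser:your_secure_password@localhost:5432/pote" | gzip > /var/backups/pote/pote-$(date +\%F).sql.gz

# Delete dumps older than 7 days at 4 AM
0 4 * * * find /var/backups/pote -name '*.sql.gz' -mtime +7 -delete
```

Restoring with `gunzip -c <dump> | psql` into a scratch database also exercises the "Test recovery from backup" checklist item.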