# Deployment Guide

## Deployment Options

POTE can be deployed in several ways depending on your needs:

1. **Local Development** (SQLite) - What you have now ✅
2. **Single Server** (PostgreSQL + cron jobs)
3. **Docker** (Containerized, easy to move)
4. **Cloud** (AWS/GCP/Azure with managed DB)

---

## Option 1: Local Development (Current Setup) ✅

**You're already running this!**
```bash
# Setup (done)
make install
source venv/bin/activate
make migrate

# Ingest data
python scripts/ingest_from_fixtures.py                  # Offline
python scripts/fetch_congressional_trades.py --days 30  # With internet

# Query
python
>>> from pote.db import SessionLocal
>>> from pote.db.models import Official
>>> with SessionLocal() as session:
...     officials = session.query(Official).all()
...     print(f"Total officials: {len(officials)}")
```
**Pros**: Simple, fast, no costs
**Cons**: Local only, SQLite limitations for heavy queries

---

## Option 2: Single Server with PostgreSQL

### Setup PostgreSQL
```bash
# Install PostgreSQL (Ubuntu/Debian)
sudo apt update
sudo apt install postgresql postgresql-contrib

# Create database
sudo -u postgres psql
postgres=# CREATE DATABASE pote;
postgres=# CREATE USER poteuser WITH PASSWORD 'your_secure_password';
postgres=# GRANT ALL PRIVILEGES ON DATABASE pote TO poteuser;
postgres=# \q
```
### Update Configuration

```bash
# Edit .env
DATABASE_URL=postgresql://poteuser:your_secure_password@localhost:5432/pote

# Run migrations
source venv/bin/activate
make migrate
```
### Schedule Regular Ingestion
```bash
# Add to crontab: crontab -e

# Fetch trades daily at 6 AM
0 6 * * * cd /path/to/pote && /path/to/pote/venv/bin/python scripts/fetch_congressional_trades.py --days 7 >> /var/log/pote/trades.log 2>&1

# Enrich securities weekly on Sunday at 3 AM
0 3 * * 0 cd /path/to/pote && /path/to/pote/venv/bin/python scripts/enrich_securities.py >> /var/log/pote/enrich.log 2>&1

# Fetch prices for all tickers daily at 7 AM
0 7 * * * cd /path/to/pote && /path/to/pote/venv/bin/python scripts/update_all_prices.py >> /var/log/pote/prices.log 2>&1
```
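If an ingestion run ever outlasts its schedule interval, cron will happily start a second overlapping copy. A common guard is `flock(1)`; a sketch of the trades job wrapped that way (the lockfile path is an assumption):

```bash
# Skip the run if the previous one still holds the lock (sketch)
0 6 * * * flock -n /tmp/pote-trades.lock -c 'cd /path/to/pote && venv/bin/python scripts/fetch_congressional_trades.py --days 7' >> /var/log/pote/trades.log 2>&1
```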
**Pros**: Production-ready, full SQL features, scheduled jobs
**Cons**: Requires server management, PostgreSQL setup

---

## Option 3: Docker Deployment

### Create Dockerfile
```dockerfile
# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*

# Copy project files
COPY pyproject.toml .
COPY src/ src/
COPY alembic/ alembic/
COPY alembic.ini .
COPY scripts/ scripts/

# Install Python dependencies
RUN pip install --no-cache-dir -e .

# Run migrations on startup, then do an initial ingest
CMD ["sh", "-c", "alembic upgrade head && python scripts/fetch_congressional_trades.py --days 30"]
```

### Docker Compose Setup
```yaml
# docker-compose.yml
version: '3.8'

services:
  db:
    image: postgres:15
    environment:
      POSTGRES_DB: pote
      POSTGRES_USER: poteuser
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  pote:
    build: .
    environment:
      DATABASE_URL: postgresql://poteuser:${POSTGRES_PASSWORD}@db:5432/pote
      QUIVERQUANT_API_KEY: ${QUIVERQUANT_API_KEY}
      FMP_API_KEY: ${FMP_API_KEY}
    depends_on:
      - db
    volumes:
      - ./logs:/app/logs

  # Optional: FastAPI backend (Phase 3)
  api:
    build: .
    command: uvicorn pote.api.main:app --host 0.0.0.0 --port 8000
    environment:
      DATABASE_URL: postgresql://poteuser:${POSTGRES_PASSWORD}@db:5432/pote
    depends_on:
      - db
    ports:
      - "8000:8000"

volumes:
  postgres_data:
```
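One caveat with the file above: `depends_on` only orders container startup, it does not wait for Postgres to accept connections, so the `alembic upgrade head` in the image's CMD can race the database. A sketch of a health-gated startup (service names match the compose file above; interval values are illustrative, and `condition: service_healthy` requires the Docker Compose v2 CLI):

```yaml
# Fragment to merge into docker-compose.yml (sketch)
services:
  db:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U poteuser -d pote"]
      interval: 5s
      timeout: 3s
      retries: 5
  pote:
    depends_on:
      db:
        condition: service_healthy
```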

### Deploy with Docker
```bash
# Create .env file
cat > .env << EOF
POSTGRES_PASSWORD=your_secure_password
DATABASE_URL=postgresql://poteuser:your_secure_password@db:5432/pote
QUIVERQUANT_API_KEY=
FMP_API_KEY=
EOF

# Build and run
docker-compose up -d

# Run migrations
docker-compose exec pote alembic upgrade head

# Ingest data
docker-compose exec pote python scripts/fetch_congressional_trades.py --days 30

# View logs
docker-compose logs -f pote
```
**Pros**: Portable, isolated, easy to deploy anywhere
**Cons**: Requires Docker knowledge, slightly more complex

---

## Option 4: Cloud Deployment (AWS Example)

### AWS Architecture
```
┌─────────────────┐
│  EC2 Instance   │
│  - Python app   │
│  - Cron jobs    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  RDS (Postgres) │
│  - Managed DB   │
└─────────────────┘
```

### Setup Steps
1. **Create RDS PostgreSQL Instance**
   - Go to the AWS RDS Console
   - Create a PostgreSQL 15 database
   - Note the endpoint: `pote-db.xxxxx.us-east-1.rds.amazonaws.com`
   - Security group: allow port 5432 from the EC2 instance

2. **Launch EC2 Instance**
```bash
# SSH into EC2
ssh -i your-key.pem ubuntu@your-ec2-ip

# Install dependencies
sudo apt update
sudo apt install python3.11 python3-pip git

# Clone repo
git clone <your-repo>
cd pote

# Setup
python3 -m venv venv
source venv/bin/activate
pip install -e .

# Configure
cat > .env << EOF
DATABASE_URL=postgresql://poteuser:password@pote-db.xxxxx.us-east-1.rds.amazonaws.com:5432/pote
EOF

# Run migrations
alembic upgrade head

# Setup cron jobs
crontab -e
# (Add the cron jobs from Option 2)
```
3. **Optional: Use AWS Lambda for scheduled jobs**
   - Package the app as a Lambda function
   - Use EventBridge to trigger it daily
   - Cheaper for infrequent jobs
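A minimal sketch of what such a Lambda entry point could look like; the imported module and function names are assumptions, not existing POTE code:

```python
# Hypothetical Lambda handler (sketch). EventBridge invokes handler() on a
# schedule; the ingestion import below is an assumed name, adapt to your layout.
def handler(event, context):
    from scripts.fetch_congressional_trades import main  # assumption

    main(days=7)
    return {"statusCode": 200, "body": "ingestion complete"}
```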
**Pros**: Scalable, managed database, reliable
**Cons**: Costs money (~$20-50/mo for small RDS + EC2)

---

## Option 5: Fly.io / Railway / Render (Easiest Cloud)

### Fly.io Example
```bash
# Install flyctl
curl -L https://fly.io/install.sh | sh

# Login
flyctl auth login

# Create fly.toml
cat > fly.toml << EOF
app = "pote-research"

[build]
  builder = "paketobuildpacks/builder:base"

[env]
  PORT = "8080"

[[services]]
  internal_port = 8080
  protocol = "tcp"

[[services.ports]]
  port = 80
EOF

# Create Postgres and attach it to the app (sets DATABASE_URL as a secret)
flyctl postgres create --name pote-db
flyctl postgres attach pote-db

# Deploy
flyctl deploy

# Or set the connection string manually
flyctl secrets set DATABASE_URL="postgres://..."
```
**Pros**: Simple, cheap ($5-10/mo), automated deployments
**Cons**: Limited control, may need to adapt code

---

## Production Checklist

Before deploying to production:

### Security
- [ ] Change all default passwords
- [ ] Use environment variables for secrets (never commit `.env`)
- [ ] Enable SSL for database connections
- [ ] Set up firewall rules (only allow necessary ports)
- [ ] Use HTTPS if exposing API/dashboard
### Reliability
- [ ] Set up database backups (daily)
- [ ] Configure logging (centralized if possible)
- [ ] Monitor disk space (especially for SQLite)
- [ ] Set up error alerts (email/Slack on failures)
- [ ] Test recovery from backup
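For the backup item, one low-effort approach on the PostgreSQL setups above is a nightly `pg_dump` from cron. The paths and 14-day retention below are assumptions; note that `%` must be escaped in crontab entries, and cron's environment will not include `DATABASE_URL` unless you set it in the crontab itself:

```bash
# In crontab: nightly compressed dump at 2 AM, pruned after 14 days (sketch)
DATABASE_URL=postgresql://poteuser:your_secure_password@localhost:5432/pote
0 2 * * * pg_dump "$DATABASE_URL" | gzip > /var/backups/pote/pote-$(date +\%F).sql.gz
30 2 * * * find /var/backups/pote -name 'pote-*.sql.gz' -mtime +14 -delete
```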
### Performance
- [ ] Index frequently queried columns (already done in models)
- [ ] Use connection pooling for PostgreSQL
- [ ] Cache frequently accessed data
- [ ] Limit API rate if exposing publicly
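For the pooling item, a sketch of an explicitly pooled SQLAlchemy engine (POTE already uses SQLAlchemy; the pool numbers are illustrative, not tuned values):

```python
# Sketch: an explicitly pooled engine for PostgreSQL. Sizes are illustrative.
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool


def make_pooled_engine(url: str):
    return create_engine(
        url,
        poolclass=QueuePool,  # explicit, though it is SQLAlchemy's default for PostgreSQL
        pool_size=5,          # persistent connections kept open
        max_overflow=10,      # extra connections allowed under burst load
        pool_pre_ping=True,   # discard dead connections before reuse
        pool_recycle=1800,    # recycle connections older than 30 minutes
    )
```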
### Compliance
- [ ] Review data retention policy
- [ ] Add disclaimers to any UI ("research only, not advice")
- [ ] Document data sources and update frequency
- [ ] Keep audit logs of data ingestion
---

## Monitoring & Logs

### Basic Logging Setup
```python
# Add to scripts/fetch_congressional_trades.py
import logging
import os
from logging.handlers import RotatingFileHandler

# Create logs directory
os.makedirs("logs", exist_ok=True)

# Configure logging
handler = RotatingFileHandler(
    "logs/ingestion.log",
    maxBytes=10_000_000,  # 10 MB
    backupCount=5,
)
handler.setFormatter(logging.Formatter(
    '%(asctime)s [%(levelname)s] %(name)s: %(message)s'
))
logger = logging.getLogger()
logger.setLevel(logging.INFO)  # root logger defaults to WARNING otherwise
logger.addHandler(handler)
```
### Health Check Endpoint (Optional)
```python
# Add to pote/api/main.py (when building API)
from fastapi import FastAPI

app = FastAPI()


@app.get("/health")
def health_check():
    from pote.db import SessionLocal
    from sqlalchemy import text

    try:
        with SessionLocal() as session:
            session.execute(text("SELECT 1"))
        return {"status": "ok", "database": "connected"}
    except Exception as e:
        return {"status": "error", "message": str(e)}
```
---

## Cost Estimates (Monthly)

| Option | Cost | Notes |
|--------|------|-------|
| **Local Dev** | $0 | SQLite, your machine |
| **VPS (DigitalOcean, Linode)** | $5-12 | Small droplet + managed Postgres |
| **AWS (small)** | $20-50 | t3.micro EC2 + db.t3.micro RDS |
| **Fly.io / Railway** | $5-15 | Hobby tier, managed |
| **Docker on VPS** | $10-20 | One droplet, Docker Compose |

**Free tier options**:
- Railway: free tier available (limited hours)
- Fly.io: free tier available (limited resources)
- Oracle Cloud: always-free tier (ARM instances)
---

## Next Steps After Deployment

1. **Verify ingestion**: Check logs after the first cron run
2. **Test queries**: Ensure data is accessible
3. **Monitor growth**: Database size, query performance
4. **Plan backups**: Set up automated DB dumps
5. **Document access**: How to query, who has access

For Phase 2 (Analytics), you'll add:
- Scheduled jobs for computing returns
- Clustering jobs (weekly/monthly)
- Optional dashboard deployment

---

## Quick Deploy (Railway Example)

Railway is probably the easiest for personal projects:
```bash
# Install Railway CLI
npm install -g @railway/cli

# Login
railway login

# Initialize
railway init

# Add PostgreSQL
railway add --database postgres

# Deploy
railway up

# Add environment variables via dashboard
# DATABASE_URL is auto-configured
```
**Cost**: ~$5/mo, scales automatically

---

See `docs/05_dev_setup.md` for local development details.