
# Deployment Guide
## Deployment Options
POTE can be deployed in several ways depending on your needs:
1. **Local Development** (SQLite) - What you have now ✅
2. **Single Server** (PostgreSQL + cron jobs)
3. **Docker** (Containerized, easy to move)
4. **Cloud** (AWS/GCP/Azure with managed DB)
---
## Option 1: Local Development (Current Setup) ✅
**You're already running this!**
```bash
# Setup (done)
make install
source venv/bin/activate
make migrate
# Ingest data
python scripts/ingest_from_fixtures.py # Offline
python scripts/fetch_congressional_trades.py --days 30 # With internet
# Query
python
>>> from pote.db import SessionLocal
>>> from pote.db.models import Official
>>> with SessionLocal() as session:
...     officials = session.query(Official).all()
...     print(f"Total officials: {len(officials)}")
```
**Pros**: Simple, fast, no costs
**Cons**: Local only, SQLite limitations for heavy queries
---
## Option 2: Single Server with PostgreSQL
### Setup PostgreSQL
```bash
# Install PostgreSQL (Ubuntu/Debian)
sudo apt update
sudo apt install postgresql postgresql-contrib
# Create database
sudo -u postgres psql
postgres=# CREATE DATABASE pote;
postgres=# CREATE USER poteuser WITH PASSWORD 'your_secure_password';
postgres=# GRANT ALL PRIVILEGES ON DATABASE pote TO poteuser;
postgres=# \q
```
### Update Configuration
```bash
# Edit .env
DATABASE_URL=postgresql://poteuser:your_secure_password@localhost:5432/pote
# Run migrations
source venv/bin/activate
make migrate
```
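Before wiring up cron, it's worth confirming the new URL actually connects. A minimal sketch using SQLAlchemy (which the project already depends on); the script name and the SQLite fallback are illustrative, not part of the project:

```python
# check_db.py - hypothetical sanity check for the configured database.
import os

from sqlalchemy import create_engine, text

# Fall back to a local SQLite file so the check runs even without Postgres.
url = os.environ.get("DATABASE_URL", "sqlite:///pote.db")
engine = create_engine(url)

with engine.connect() as conn:
    # SELECT 1 round-trips to the server without touching any tables.
    assert conn.execute(text("SELECT 1")).scalar() == 1
    print(f"Connected OK: {engine.dialect.name}")
```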
### Schedule Regular Ingestion
```bash
# Add to crontab: crontab -e
# Fetch trades daily at 6 AM
0 6 * * * cd /path/to/pote && /path/to/pote/venv/bin/python scripts/fetch_congressional_trades.py --days 7 >> /var/log/pote/trades.log 2>&1
# Enrich securities weekly on Sunday at 3 AM
0 3 * * 0 cd /path/to/pote && /path/to/pote/venv/bin/python scripts/enrich_securities.py >> /var/log/pote/enrich.log 2>&1
# Fetch prices for all tickers daily at 7 AM
0 7 * * * cd /path/to/pote && /path/to/pote/venv/bin/python scripts/update_all_prices.py >> /var/log/pote/prices.log 2>&1
```
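One failure mode with the schedule above: a slow API day can make one fetch overlap the next scheduled run. Wrapping the command in `flock(1)` (from util-linux, present on most Linux distros) serializes runs; a sketch of the daily-trades entry using the same placeholder paths:

```shell
# crontab sketch: flock holds a lock file for the duration of the job, and -n
# makes an overlapping run exit immediately instead of piling up.
0 6 * * * flock -n /tmp/pote-trades.lock sh -c 'cd /path/to/pote && venv/bin/python scripts/fetch_congressional_trades.py --days 7' >> /var/log/pote/trades.log 2>&1
```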
**Pros**: Production-ready, full SQL features, scheduled jobs
**Cons**: Requires server management, PostgreSQL setup
---
## Option 3: Docker Deployment
### Create Dockerfile
```dockerfile
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
        gcc \
        postgresql-client \
    && rm -rf /var/lib/apt/lists/*
# Copy project files
COPY pyproject.toml .
COPY src/ src/
COPY alembic/ alembic/
COPY alembic.ini .
COPY scripts/ scripts/
# Install Python dependencies
RUN pip install --no-cache-dir -e .
# Run migrations on startup
CMD ["sh", "-c", "alembic upgrade head && python scripts/fetch_congressional_trades.py --days 30"]
```
### Docker Compose Setup
```yaml
# docker-compose.yml
version: '3.8'

services:
  db:
    image: postgres:15
    environment:
      POSTGRES_DB: pote
      POSTGRES_USER: poteuser
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  pote:
    build: .
    environment:
      DATABASE_URL: postgresql://poteuser:${POSTGRES_PASSWORD}@db:5432/pote
      QUIVERQUANT_API_KEY: ${QUIVERQUANT_API_KEY}
      FMP_API_KEY: ${FMP_API_KEY}
    depends_on:
      - db
    volumes:
      - ./logs:/app/logs

  # Optional: FastAPI backend (Phase 3)
  api:
    build: .
    command: uvicorn pote.api.main:app --host 0.0.0.0 --port 8000
    environment:
      DATABASE_URL: postgresql://poteuser:${POSTGRES_PASSWORD}@db:5432/pote
    depends_on:
      - db
    ports:
      - "8000:8000"

volumes:
  postgres_data:
```
### Deploy with Docker
```bash
# Create .env file
cat > .env << EOF
POSTGRES_PASSWORD=your_secure_password
DATABASE_URL=postgresql://poteuser:your_secure_password@db:5432/pote
QUIVERQUANT_API_KEY=
FMP_API_KEY=
EOF
# Build and run
docker-compose up -d
# Run migrations
docker-compose exec pote alembic upgrade head
# Ingest data
docker-compose exec pote python scripts/fetch_congressional_trades.py --days 30
# View logs
docker-compose logs -f pote
```
**Pros**: Portable, isolated, easy to deploy anywhere
**Cons**: Requires Docker knowledge, slightly more complex
---
## Option 4: Cloud Deployment (AWS Example)
### AWS Architecture
```
┌─────────────────┐
│  EC2 Instance   │
│  - Python app   │
│  - Cron jobs    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  RDS (Postgres) │
│  - Managed DB   │
└─────────────────┘
```
### Setup Steps
1. **Create RDS PostgreSQL Instance**
- Go to AWS RDS Console
- Create PostgreSQL 15 database
- Note endpoint: `pote-db.xxxxx.us-east-1.rds.amazonaws.com`
- Security group: Allow port 5432 from EC2
2. **Launch EC2 Instance**
```bash
# SSH into EC2
ssh -i your-key.pem ubuntu@your-ec2-ip
# Install dependencies
sudo apt update
sudo apt install python3.11 python3-pip git
# Clone repo
git clone <your-repo>
cd pote
# Setup
python3 -m venv venv
source venv/bin/activate
pip install -e .
# Configure
cat > .env << EOF
DATABASE_URL=postgresql://poteuser:password@pote-db.xxxxx.us-east-1.rds.amazonaws.com:5432/pote
EOF
# Run migrations
alembic upgrade head
# Setup cron jobs
crontab -e
# (Add the cron jobs from Option 2)
```
3. **Optional: Use AWS Lambda for scheduled jobs**
- Package app as Lambda function
- Use EventBridge to trigger daily
- Cheaper for infrequent jobs
**Pros**: Scalable, managed database, reliable
**Cons**: Costs money (~$20-50/mo for small RDS + EC2)
---
## Option 5: Fly.io / Railway / Render (Easiest Cloud)
### Fly.io Example
```bash
# Install flyctl
curl -L https://fly.io/install.sh | sh
# Login
flyctl auth login
# Create fly.toml
cat > fly.toml << EOF
app = "pote-research"

[build]
  builder = "paketobuildpacks/builder:base"

[env]
  PORT = "8080"

[[services]]
  internal_port = 8080
  protocol = "tcp"

  [[services.ports]]
    port = 80
EOF
# Create Postgres
flyctl postgres create --name pote-db
# Deploy
flyctl deploy
# Set secrets
flyctl secrets set DATABASE_URL="postgres://..."
```
**Pros**: Simple, cheap ($5-10/mo), automated deployments
**Cons**: Limited control, may need to adapt code
---
## Production Checklist
Before deploying to production:
### Security
- [ ] Change all default passwords
- [ ] Use environment variables for secrets (never commit `.env`)
- [ ] Enable SSL for database connections
- [ ] Set up firewall rules (only allow necessary ports)
- [ ] Use HTTPS if exposing API/dashboard
### Reliability
- [ ] Set up database backups (daily)
- [ ] Configure logging (centralized if possible)
- [ ] Monitor disk space (especially for SQLite)
- [ ] Set up error alerts (email/Slack on failures)
- [ ] Test recovery from backup
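For the backup item, a nightly `pg_dump` crontab entry is usually enough at this scale. A sketch (assumes a `~/.pgpass` entry for `poteuser`; note that `%` is special in crontab and must be escaped as `\%`):

```shell
# Nightly compressed dump at 2 AM, one file per day in /var/backups/pote.
0 2 * * * pg_dump -U poteuser -h localhost pote | gzip > /var/backups/pote/pote-$(date +\%F).sql.gz
```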
### Performance
- [ ] Index frequently queried columns (already done in models)
- [ ] Use connection pooling for PostgreSQL
- [ ] Cache frequently accessed data
- [ ] Limit API rate if exposing publicly
### Compliance
- [ ] Review data retention policy
- [ ] Add disclaimers to any UI ("research only, not advice")
- [ ] Document data sources and update frequency
- [ ] Keep audit logs of data ingestion
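For the audit-log item, even an append-only JSON-lines file per ingestion run goes a long way. A hypothetical helper (`record_ingestion` and the file path are made-up names, not part of the project):

```python
# Hypothetical audit helper: append one JSON line per ingestion run.
import json
import os
from datetime import datetime, timezone

AUDIT_PATH = "logs/ingestion_audit.jsonl"

def record_ingestion(source: str, rows: int, path: str = AUDIT_PATH) -> dict:
    """Append a timestamped record of an ingestion run and return it."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "rows": rows,
    }
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

entry = record_ingestion("house_stock_watcher", 1234)
```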
---
## Monitoring & Logs
### Basic Logging Setup
```python
# Add to scripts/fetch_congressional_trades.py
import logging
import os
from logging.handlers import RotatingFileHandler

# Create logs directory
os.makedirs("logs", exist_ok=True)

# Configure a size-rotated log file
handler = RotatingFileHandler(
    "logs/ingestion.log",
    maxBytes=10_000_000,  # 10 MB
    backupCount=5,
)
handler.setFormatter(logging.Formatter(
    "%(asctime)s [%(levelname)s] %(name)s: %(message)s"
))

logger = logging.getLogger()
logger.addHandler(handler)
logger.setLevel(logging.INFO)  # root logger defaults to WARNING
```
### Health Check Endpoint (Optional)
```python
# Add to pote/api/main.py (when building API)
from fastapi import FastAPI
from sqlalchemy import text

from pote.db import SessionLocal

app = FastAPI()

@app.get("/health")
def health_check():
    try:
        with SessionLocal() as session:
            session.execute(text("SELECT 1"))
        return {"status": "ok", "database": "connected"}
    except Exception as e:
        return {"status": "error", "message": str(e)}
```
---
## Cost Estimates (Monthly)
| Option | Cost | Notes |
|--------|------|-------|
| **Local Dev** | $0 | SQLite, your machine |
| **VPS (DigitalOcean, Linode)** | $5-12 | Small droplet + managed Postgres |
| **AWS (small)** | $20-50 | t3.micro EC2 + db.t3.micro RDS |
| **Fly.io / Railway** | $5-15 | Hobby tier, managed |
| **Docker on VPS** | $10-20 | One droplet, Docker Compose |
**Free tier options**:
- Railway: Free tier available (limited hours)
- Fly.io: Free tier available (limited resources)
- Oracle Cloud: Always-free tier (ARM instances)
---
## Next Steps After Deployment
1. **Verify ingestion**: Check logs after first cron run
2. **Test queries**: Ensure data is accessible
3. **Monitor growth**: Database size, query performance
4. **Plan backups**: Set up automated DB dumps
5. **Document access**: How to query, who has access
For Phase 2 (Analytics), you'll add:
- Scheduled jobs for computing returns
- Clustering jobs (weekly/monthly)
- Optional dashboard deployment
---
## Quick Deploy (Railway Example)
Railway is probably the easiest for personal projects:
```bash
# Install Railway CLI
npm install -g @railway/cli
# Login
railway login
# Initialize
railway init
# Add PostgreSQL
railway add --database postgres
# Deploy
railway up
# Add environment variables via dashboard
# DATABASE_URL is auto-configured
```
**Cost**: ~$5/mo, scales automatically
---
See `docs/05_dev_setup.md` for local development details.