
POTE Automation Guide

Automated Data Collection & Updates


Understanding Disclosure Timing

Reality Check: No Real-Time Data Exists

Federal Law (STOCK Act):

  • 📅 Congress members have 30-45 days to disclose trades
  • 📅 Disclosures are filed as Periodic Transaction Reports (PTRs)
  • 📅 Public databases update after filing (usually next day)
  • 📅 No real-time feed exists by design

Example Timeline:

Jan 15, 2024  →  Senator buys NVDA
Feb 15, 2024  →  Disclosure filed (30 days later)
Feb 16, 2024  →  Appears on House Stock Watcher
Feb 17, 2024  →  Your system fetches it
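
The timeline above can be sanity-checked in a few lines of Python, using the 45-day outer filing limit mentioned earlier (dates are from the example; variable names are illustrative):

```python
from datetime import date, timedelta

# Example from the timeline above: purchase on Jan 15, filed on Feb 15
purchase = date(2024, 1, 15)
filed = date(2024, 2, 15)

# STOCK Act outer limit: a PTR must be filed within 45 days of the transaction
deadline = purchase + timedelta(days=45)

lag_days = (filed - purchase).days
print(f"filed after {lag_days} days; deadline was {deadline}")
print("on time" if filed <= deadline else "late")
```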

Best Practice: Daily Updates

Since trades appear in batches (not continuously), running once per day is optimal:

  • ✅ Daily (7 AM) - catches overnight filings
  • ✅ After market close - prices are final
  • ✅ Low server load - off-peak hours
  • ❌ Hourly - wasteful, no new data between filings
  • ❌ Real-time - impossible; disclosures lag by design


🤖 Automated Setup Options

Option 1: Cron Job (Proxmox Container)

# SSH to your container
ssh poteapp@10.0.10.95

# Edit crontab
crontab -e

# Add this line (runs daily at 7 AM):
0 7 * * * /home/poteapp/pote/scripts/daily_fetch.sh

# Or run twice daily (7 AM and 7 PM):
0 7,19 * * * /home/poteapp/pote/scripts/daily_fetch.sh

# Save and exit

What it does:

  • Fetches new congressional trades (last 7 days)
  • Enriches any new securities (name, sector, industry)
  • Updates price data for all securities
  • Logs everything to logs/daily_fetch_YYYYMMDD.log
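
As a rough sketch, that orchestration amounts to something like the following (a simplified Python illustration; the actual contents of daily_fetch.sh may differ):

```python
import subprocess
import sys
from datetime import date
from pathlib import Path

# Step scripts are taken from the list above; daily_fetch.sh itself may differ.
STEPS = [
    [sys.executable, "scripts/fetch_congressional_trades.py"],  # new trades (last 7 days)
    [sys.executable, "scripts/enrich_securities.py"],           # enrich new tickers
    [sys.executable, "scripts/fetch_sample_prices.py"],         # update price data
]

def run_daily(log_dir: str = "logs") -> int:
    """Run each step, appending output to logs/daily_fetch_YYYYMMDD.log.

    Failures are logged but do not stop later steps; returns the failure count.
    """
    log_path = Path(log_dir) / f"daily_fetch_{date.today():%Y%m%d}.log"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    failures = 0
    with log_path.open("a") as log:
        for cmd in STEPS:
            result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
            if result.returncode != 0:
                failures += 1
                log.write(f"WARNING: {' '.join(cmd)} exited {result.returncode}, continuing\n")
    return failures
```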

Check logs:

tail -f ~/pote/logs/daily_fetch_$(date +%Y%m%d).log

Option 2: Systemd Timer (More Advanced)

For better logging and service management:

Create Service File

sudo nano /etc/systemd/system/pote-fetch.service
[Unit]
Description=POTE Daily Data Fetch
After=network.target postgresql.service

[Service]
Type=oneshot
User=poteapp
WorkingDirectory=/home/poteapp/pote
ExecStart=/home/poteapp/pote/scripts/daily_fetch.sh
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

Create Timer File

sudo nano /etc/systemd/system/pote-fetch.timer
[Unit]
Description=POTE Daily Data Fetch Timer
Requires=pote-fetch.service

[Timer]
OnCalendar=*-*-* 07:00:00
Persistent=true

[Install]
WantedBy=timers.target

Enable and Start

sudo systemctl daemon-reload
sudo systemctl enable pote-fetch.timer
sudo systemctl start pote-fetch.timer

# Check status
sudo systemctl status pote-fetch.timer
sudo systemctl list-timers

# View logs
sudo journalctl -u pote-fetch.service -f

Option 3: Manual Script (For Testing)

Run manually whenever you want:

cd /home/user/Documents/code/pote
./scripts/daily_fetch.sh

Or from anywhere:

/home/user/Documents/code/pote/scripts/daily_fetch.sh

📊 What Gets Updated?

1. Congressional Trades

Script: fetch_congressional_trades.py
Frequency: Daily
Fetches: Last 7 days (catches late filings)
API: House Stock Watcher (when available)

Alternative sources:

  • Manual CSV import
  • QuiverQuant API (paid)
  • Capitol Trades (paid)

2. Security Enrichment

Script: enrich_securities.py
Frequency: Daily (only updates new tickers)
Fetches: Company name, sector, industry
API: yfinance (free)

3. Price Data

Script: fetch_sample_prices.py
Frequency: Daily
Fetches: Historical prices for all securities
API: yfinance (free)
Smart: Only fetches missing date ranges (efficient)
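
The missing-range logic can be sketched like this (a simplified illustration with an invented helper, missing_ranges; the real script's implementation may differ):

```python
from datetime import date, timedelta

def missing_ranges(have: set[date], start: date, end: date) -> list[tuple[date, date]]:
    """Contiguous sub-ranges of [start, end] with no price rows yet."""
    ranges, gap_start = [], None
    day = start
    while day <= end:
        if day not in have and gap_start is None:
            gap_start = day                      # a gap begins
        elif day in have and gap_start is not None:
            ranges.append((gap_start, day - timedelta(days=1)))  # gap ends
            gap_start = None
        day += timedelta(days=1)
    if gap_start is not None:
        ranges.append((gap_start, end))          # gap runs to the end
    return ranges

# Example: prices already exist for Jan 2-3, so only the gaps get fetched
have = {date(2024, 1, 2), date(2024, 1, 3)}
print(missing_ranges(have, date(2024, 1, 1), date(2024, 1, 5)))
```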

4. Analytics (Optional)

Script: calculate_all_returns.py
Frequency: Daily (or on-demand)
Calculates: Returns, alpha, performance metrics
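
For a concrete sense of the alpha metric, here is a minimal sketch assuming the simplest definition (security return minus benchmark return over the same window; the real script may compute it differently):

```python
def simple_return(buy_price: float, current_price: float) -> float:
    """Percentage return since purchase."""
    return (current_price - buy_price) / buy_price * 100

def alpha(security_return: float, benchmark_return: float) -> float:
    """Excess return over a benchmark (e.g. SPY) for the same period."""
    return security_return - benchmark_return

# Hypothetical example: bought at $500, now $650; benchmark up 8% in the same window
r = simple_return(500.0, 650.0)   # 30.0
print(alpha(r, 8.0))              # 22.0
```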


⚙️ Customizing the Schedule

Different Frequencies

# Every 6 hours
0 */6 * * * /home/poteapp/pote/scripts/daily_fetch.sh

# Twice daily (morning and evening)
0 7,19 * * * /home/poteapp/pote/scripts/daily_fetch.sh

# Weekdays only (business days)
0 7 * * 1-5 /home/poteapp/pote/scripts/daily_fetch.sh

# Once per week (Sunday at midnight)
0 0 * * 0 /home/poteapp/pote/scripts/daily_fetch.sh

Best Practice Recommendations

For Active Research:

  • Daily at 7 AM (catches overnight filings)
  • Weekdays only (Congress rarely files on weekends)

For Casual Tracking:

  • Weekly (Sunday night)
  • Bi-weekly (1st and 15th)

For Development:

  • Manual runs (on-demand testing)

📧 Email Notifications (Optional)

Setup Email Alerts

Add to your cron job:

# Install mail utility
sudo apt install mailutils

# Add to crontab with email
MAILTO=your-email@example.com
0 7 * * * /home/poteapp/pote/scripts/daily_fetch.sh

Custom Email Script

Create scripts/email_summary.py:

#!/usr/bin/env python
"""Email daily summary of new trades."""

import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from datetime import date, timedelta
from sqlalchemy import text
from pote.db import engine

def get_new_trades(days=1):
    """Get trades from last N days."""
    since = date.today() - timedelta(days=days)
    
    with engine.connect() as conn:
        result = conn.execute(text("""
            SELECT o.name, s.ticker, t.side, t.transaction_date, t.value_min, t.value_max
            FROM trades t
            JOIN officials o ON t.official_id = o.id
            JOIN securities s ON t.security_id = s.id
            WHERE t.created_at >= :since
            ORDER BY t.transaction_date DESC
        """), {"since": since})
        
        return result.fetchall()

def send_email(to_email, trades):
    """Send email summary."""
    if not trades:
        print("No new trades to report")
        return
    
    # Compose email
    subject = f"POTE: {len(trades)} New Congressional Trades"
    
    body = f"<h2>New Trades ({len(trades)})</h2>\n<table>"
    body += "<tr><th>Official</th><th>Ticker</th><th>Side</th><th>Date</th><th>Value</th></tr>"
    
    for trade in trades:
        # renamed to tx_date to avoid shadowing the imported `date`
        name, ticker, side, tx_date, vmin, vmax = trade
        value = f"${vmin:,.0f}-${vmax:,.0f}" if vmax else f"${vmin:,.0f}+"
        body += f"<tr><td>{name}</td><td>{ticker}</td><td>{side}</td><td>{tx_date}</td><td>{value}</td></tr>"
    
    body += "</table>"
    
    # Send email (configure SMTP settings)
    msg = MIMEMultipart()
    msg['From'] = "pote@yourserver.com"
    msg['To'] = to_email
    msg['Subject'] = subject
    msg.attach(MIMEText(body, 'html'))
    
    # Configure your SMTP server (Gmail requires an app password, not your account password)
    # server = smtplib.SMTP('smtp.gmail.com', 587)
    # server.starttls()
    # server.login("your-email@gmail.com", "your-password")
    # server.send_message(msg)
    # server.quit()
    
    print(f"Would send email to {to_email}")

if __name__ == "__main__":
    trades = get_new_trades(days=1)
    send_email("your-email@example.com", trades)

Then add to daily_fetch.sh:

# At the end of daily_fetch.sh
python scripts/email_summary.py

🔍 Monitoring & Logging

Check Cron Job Status

# View cron jobs
crontab -l

# Check if cron is running
sudo systemctl status cron

# View cron logs
grep CRON /var/log/syslog | tail -20

Check POTE Logs

# Today's log
tail -f ~/pote/logs/daily_fetch_$(date +%Y%m%d).log

# All logs
ls -lh ~/pote/logs/

# Last 100 lines of the most recent log
tail -100 "$(ls -t ~/pote/logs/daily_fetch_*.log | head -1)"

Log Rotation (Keep Disk Space Clean)

Add to /etc/logrotate.d/pote:

/home/poteapp/pote/logs/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
}

🚨 Handling Failures

What If House Stock Watcher Is Down?

The script is designed to continue even if one step fails:

# Script continues and logs warnings
⚠️  WARNING: Failed to fetch congressional trades
   This is likely because House Stock Watcher API is down
   Continuing with other steps...

Fallback options:

  1. Manual import: Use CSV import when API is down
  2. Alternative APIs: QuiverQuant, Capitol Trades
  3. Check logs: Review what failed and why
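
The manual CSV fallback can be sketched as follows (the column names and helper here are assumptions for illustration, not the project's actual import format):

```python
import csv
from datetime import datetime

def load_trades_csv(path: str) -> list[dict]:
    """Parse a manually downloaded trades CSV into dicts ready for insertion."""
    trades = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            trades.append({
                "official": row["official"],
                "ticker": row["ticker"].upper(),          # normalize ticker case
                "side": row["side"].lower(),              # normalize buy/sell
                "transaction_date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
            })
    return trades
```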

Automatic Retry Logic

Edit scripts/fetch_congressional_trades.py to add retries:

import time
from requests.exceptions import RequestException

# `client` and `logger` are already defined earlier in the script

MAX_RETRIES = 3
RETRY_DELAY = 300  # 5 minutes

for attempt in range(MAX_RETRIES):
    try:
        trades = client.fetch_recent_transactions(days=7)
        break
    except RequestException as e:
        if attempt < MAX_RETRIES - 1:
            logger.warning(f"Attempt {attempt+1} failed, retrying in {RETRY_DELAY}s...")
            time.sleep(RETRY_DELAY)
        else:
            logger.error("All retry attempts failed")
            raise

📈 Performance Optimization

Batch Processing

For large datasets, fetch in batches:

# Fetch trades in smaller date ranges
python scripts/fetch_congressional_trades.py --start-date 2024-01-01 --end-date 2024-01-31
python scripts/fetch_congressional_trades.py --start-date 2024-02-01 --end-date 2024-02-29
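
Those month boundaries can be generated programmatically, which avoids off-by-one mistakes such as leap-year February (the helper name is illustrative):

```python
import calendar

def month_ranges(year: int, months: range) -> list[tuple[str, str]]:
    """(start, end) ISO date pairs per month, suitable for --start-date/--end-date."""
    out = []
    for m in months:
        last = calendar.monthrange(year, m)[1]  # number of days; handles leap years
        out.append((f"{year}-{m:02d}-01", f"{year}-{m:02d}-{last:02d}"))
    return out

for start, end in month_ranges(2024, range(1, 3)):
    print(f"python scripts/fetch_congressional_trades.py --start-date {start} --end-date {end}")
```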

Parallel Processing

Use GNU Parallel for faster price fetching:

# Install parallel
sudo apt install parallel

# Fetch prices in parallel (4 at a time)
python -c "from pote.db import get_session; from pote.db.models import Security; session = next(get_session()); print('\n'.join(s.ticker for s in session.query(Security).all()))" | \
parallel -j 4 python scripts/fetch_prices_single.py {}

Database Indexing

Ensure indexes are created (already in migrations):

CREATE INDEX IF NOT EXISTS ix_trades_transaction_date ON trades(transaction_date);
CREATE INDEX IF NOT EXISTS ix_prices_date ON prices(date);
CREATE INDEX IF NOT EXISTS ix_prices_security_id ON prices(security_id);

✅ Recommended Setup

For Proxmox Production:

# 1. Setup daily cron job
crontab -e
# Add: 0 7 * * * /home/poteapp/pote/scripts/daily_fetch.sh

# 2. Enable log rotation
sudo nano /etc/logrotate.d/pote
# Add log rotation config

# 3. Setup monitoring (optional)
python scripts/email_summary.py

# 4. Test manually first
./scripts/daily_fetch.sh

For Local Development:

# Run manually when needed
./scripts/daily_fetch.sh

# Or setup quick alias
echo "alias pote-update='~/Documents/code/pote/scripts/daily_fetch.sh'" >> ~/.bashrc
source ~/.bashrc

# Then just run:
pote-update

📝 Summary

Key Points:

  1. No real-time data exists - Congressional trades have 30-45 day lag by law
  2. Daily updates are optimal - Running hourly is wasteful
  3. Automated via cron - Set it and forget it
  4. Handles failures gracefully - Continues even if one API is down
  5. Logs everything - Easy to monitor and debug

Quick Setup:

# On Proxmox
crontab -e
# Add: 0 7 * * * /home/poteapp/pote/scripts/daily_fetch.sh

# Test it
./scripts/daily_fetch.sh

# Check logs
tail -f logs/daily_fetch_*.log

Data Freshness Expectations:

  • Best case: Trades from yesterday (if official filed overnight)
  • Typical: Trades from 30-45 days ago
  • Worst case: Official filed late or hasn't filed yet

This is normal and expected - you're working with disclosure data, not market data.