
Data Updates & Maintenance

Adding More Representatives

Method 1: Manual Entry (Python Script)

# Edit the script to add your representatives
nano scripts/add_custom_trades.py

# Run it
python scripts/add_custom_trades.py

Example:

add_trade(
    session,
    official_name="Your Representative",
    party="Democrat",  # or "Republican", "Independent"
    chamber="House",   # or "Senate"
    state="CA",
    ticker="NVDA",
    company_name="NVIDIA Corporation",
    side="buy",  # or "sell"
    value_min=15001,
    value_max=50000,
    transaction_date="2024-12-01",
    disclosure_date="2024-12-15",
)

Method 2: CSV Import

# Create a template
python scripts/scrape_alternative_sources.py template

# Edit trades_template.csv with your data
nano trades_template.csv

# Import it
python scripts/scrape_alternative_sources.py import trades_template.csv

CSV format:

name,party,chamber,state,district,ticker,side,value_min,value_max,transaction_date,disclosure_date
Bernie Sanders,Independent,Senate,VT,,COIN,sell,15001,50000,2024-12-01,2024-12-15
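
Before importing, it can help to check the file against the column layout above. The sketch below is not part of scrape_alternative_sources.py: validate_rows and its rules are hypothetical, and they assume only the header shown above, the buy/sell and House/Senate values used in Method 1, ISO dates, and value_min <= value_max.

```python
import csv
import io
from datetime import date

# Column layout taken from the CSV format above
EXPECTED_COLUMNS = [
    "name", "party", "chamber", "state", "district", "ticker", "side",
    "value_min", "value_max", "transaction_date", "disclosure_date",
]


def validate_rows(csv_text):
    """Return a list of (line_number, message) problems found in csv_text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [(1, f"unexpected header: {reader.fieldnames}")]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["side"] not in ("buy", "sell"):
            problems.append((line_no, f"side must be buy or sell, got {row['side']!r}"))
        if row["chamber"] not in ("House", "Senate"):
            problems.append((line_no, f"chamber must be House or Senate, got {row['chamber']!r}"))
        try:
            if int(row["value_min"]) > int(row["value_max"]):
                problems.append((line_no, "value_min is greater than value_max"))
        except (TypeError, ValueError):
            problems.append((line_no, "value_min and value_max must be integers"))
        for column in ("transaction_date", "disclosure_date"):
            try:
                date.fromisoformat(row[column])
            except (TypeError, ValueError):
                problems.append((line_no, f"{column} is not a valid ISO date"))
    return problems
```

Calling validate_rows(open("trades_template.csv").read()) returns an empty list when every row passes these checks. The import script may apply different or stricter rules.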

Method 3: Automatic Updates (when the API is available)

# Fetch latest trades
python scripts/fetch_congressional_trades.py --days 30

Setting Up Automatic Updates

Option A: Cron

# Make script executable
chmod +x ~/pote/scripts/daily_update.sh

# Add to cron (runs daily at 6 AM)
crontab -e

# Add this line:
0 6 * * * /home/poteapp/pote/scripts/daily_update.sh

# Or for testing (runs every hour):
0 * * * * /home/poteapp/pote/scripts/daily_update.sh

View logs:

ls -lh ~/logs/daily_update_*.log
tail -f ~/logs/daily_update_$(date +%Y%m%d).log
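
If today's run has not happened yet, the tail command above has no file to open. As a rough sketch, the newest log can be located by name instead. This assumes only the daily_update_YYYYMMDD.log naming used above; latest_log and tail are hypothetical helpers, not part of POTE.

```python
from collections import deque
from pathlib import Path


def latest_log(log_dir):
    """Return the newest daily_update_*.log in log_dir, or None if there is none."""
    # The YYYYMMDD suffix sorts chronologically, so the last name is the latest run
    logs = sorted(Path(log_dir).glob("daily_update_*.log"))
    return logs[-1] if logs else None


def tail(path, lines=20):
    """Return the last `lines` lines of a text file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        # deque with maxlen keeps only the final lines while streaming the file
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]
```

For example, latest_log(Path.home() / "logs") gives the most recent log file, and passing that path to tail returns its final lines.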

Option B: Systemd Timer

Create /etc/systemd/system/pote-update.service:

[Unit]
Description=POTE Daily Data Update
After=network.target postgresql.service

[Service]
Type=oneshot
User=poteapp
WorkingDirectory=/home/poteapp/pote
ExecStart=/home/poteapp/pote/scripts/daily_update.sh
StandardOutput=append:/home/poteapp/logs/pote-update.log
StandardError=append:/home/poteapp/logs/pote-update.log

Create /etc/systemd/system/pote-update.timer:

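[Unit]
Description=Run POTE update daily
# The timer activates pote-update.service by its matching name. Adding
# Requires= here would also start the service whenever the timer is started.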

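[Timer]
# Run once a day at 06:00. Each OnCalendar= line adds a trigger, and
# "daily" on its own means 00:00, so only the 06:00 entry is listed.
OnCalendar=06:00
Persistent=true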

[Install]
WantedBy=timers.target

Enable it:

sudo systemctl daemon-reload
sudo systemctl enable --now pote-update.timer
sudo systemctl status pote-update.timer

Manual Update Workflow

# 1. Fetch new trades (when the API is available)
python scripts/fetch_congressional_trades.py

# 2. Enrich new securities
python scripts/enrich_securities.py

# 3. Update prices
python scripts/fetch_sample_prices.py

# 4. Check status
~/status.sh
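
The first three steps above can also be chained so that a failure stops the run before later steps touch the data. This is a minimal sketch, not an existing POTE script: run_steps and STEPS are hypothetical, and the relative paths assume the commands are run from the repository root.

```python
import subprocess
import sys

# Steps 1-3 from the workflow above, in order (paths relative to the repo root)
STEPS = [
    [sys.executable, "scripts/fetch_congressional_trades.py"],
    [sys.executable, "scripts/enrich_securities.py"],
    [sys.executable, "scripts/fetch_sample_prices.py"],
]


def run_steps(steps):
    """Run each command in order; return the first failing command, or None."""
    for command in steps:
        print("Running:", " ".join(command))
        completed = subprocess.run(command)
        if completed.returncode != 0:
            # Stop here so later steps do not run on a partial update
            return command
    return None
```

run_steps(STEPS) returns None when every step exits with status 0. While the trade API is down, the first step may fail and stop the chain, so that entry can be dropped from STEPS to run only the enrichment and price steps.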

Data Sources

Currently Working:

  • yfinance (prices, company info)
  • Manual entry
  • CSV import
  • Fixture files (testing)

Currently Down:

  • House Stock Watcher API (domain issues)

Future Options:

  • QuiverQuant (requires $30/month subscription)
  • Senate Stock Watcher (check if available)
  • Capitol Trades (web scraping)
  • Financial Modeling Prep (requires API key)

Monitoring Updates

Check Recent Activity

from datetime import date, timedelta

from sqlalchemy import text

from pote.db import engine

# Trades added in the last 7 days
week_ago = date.today() - timedelta(days=7)

# Bind the cutoff as a parameter instead of formatting it into the SQL string
query = text("""
    SELECT o.name, s.ticker, t.side, t.transaction_date
    FROM trades t
    JOIN officials o ON t.official_id = o.id
    JOIN securities s ON t.security_id = s.id
    WHERE t.created_at >= :week_ago
    ORDER BY t.created_at DESC
""")

with engine.connect() as conn:
    result = conn.execute(query, {"week_ago": week_ago})

    print("Recent trades:")
    for row in result:
        print(f"  {row.name} {row.side} {row.ticker} on {row.transaction_date}")

Database Growth

# Show the current database size and row counts (run regularly to track growth)
psql -h localhost -U poteuser -d pote -c "
SELECT 
    pg_size_pretty(pg_database_size('pote')) as db_size,
    (SELECT COUNT(*) FROM officials) as officials,
    (SELECT COUNT(*) FROM trades) as trades,
    (SELECT COUNT(*) FROM prices) as prices;
"

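The query above reports a single point in time. To see growth over time, each measurement has to be stored somewhere. The sketch below appends one row per measurement to a CSV file. append_snapshot and the column names are hypothetical, and the values are assumed to come from a query like the one above, with pg_database_size('pote') used directly so that the size is stored in bytes.

```python
import csv
from datetime import datetime
from pathlib import Path

FIELDS = ["recorded_at", "db_size_bytes", "officials", "trades", "prices"]


def append_snapshot(csv_path, db_size_bytes, officials, trades, prices, now=None):
    """Append one measurement to csv_path, writing the header on first use."""
    path = Path(csv_path)
    write_header = not path.exists() or path.stat().st_size == 0
    recorded_at = (now or datetime.now()).isoformat(timespec="seconds")
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if write_header:
            writer.writerow(FIELDS)
        writer.writerow([recorded_at, db_size_bytes, officials, trades, prices])
```

Storing raw bytes rather than the pg_size_pretty text keeps the history easy to sort and plot.
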
Backup Before Updates

# Back up before major updates (creates ~/backups if it does not exist yet)
mkdir -p ~/backups
pg_dump -h localhost -U poteuser pote > ~/backups/pote_$(date +%Y%m%d_%H%M%S).sql

Troubleshooting

API Not Working

  • Use manual entry or CSV import
  • Check if alternative sources are available
  • Wait for House Stock Watcher to come back online

Duplicate Trades

The system automatically deduplicates by:

  • source + external_id (for API data)
  • Official + Security + Transaction Date (for manual data)
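
The two rules above amount to choosing a different key depending on where a record came from. The following is only an illustration of that idea, not POTE's actual implementation: the dictionary field names and both helper functions are hypothetical.

```python
def dedup_key(trade):
    """Build the key used to decide whether two trade records are the same."""
    if trade.get("source") and trade.get("external_id"):
        # API data: the source's own identifier distinguishes records
        return ("api", trade["source"], trade["external_id"])
    # Manual data: fall back to who traded what, and when
    return ("manual", trade["official"], trade["ticker"], trade["transaction_date"])


def drop_duplicates(trades):
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for trade in trades:
        key = dedup_key(trade)
        if key not in seen:
            seen.add(key)
            unique.append(trade)
    return unique
```

One consequence of the manual rule as stated: two separate trades by the same official in the same security on the same date would be treated as one record, so entries of that kind are worth checking after a manual or CSV import.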

Missing Company Info

# Re-enrich all securities
python scripts/enrich_securities.py --force

Price Data Gaps

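# Fetch a specific date range
python << 'EOF'
from pote.ingestion.prices import PriceLoader
from pote.db import get_session

# Keep a reference to the generator so any cleanup in get_session()
# runs after the load has finished, not while the session is still in use
session_gen = get_session()
try:
    loader = PriceLoader(next(session_gen))
    loader.fetch_and_store_prices("NVDA", "2024-01-01", "2024-12-31")
finally:
    session_gen.close()
EOF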