Add comprehensive automation system

New Scripts:
- scripts/daily_fetch.sh: Automated daily data updates
  * Fetches congressional trades (last 7 days)
  * Enriches securities (name, sector, industry)
  * Updates price data for all securities
  * Calculates returns and metrics
  * Logs everything to logs/ directory

- scripts/setup_automation.sh: Interactive automation setup
  * Makes scripts executable
  * Creates log directories
  * Configures cron jobs (multiple schedule options)
  * Guides user through setup

Documentation:
- docs/10_automation.md: Complete automation guide
  * Explains disclosure timing (30-45 day legal lag)
  * Why daily updates are optimal (not hourly/real-time)
  * Cron job setup instructions
  * Systemd timer alternative
  * Email notifications (optional)
  * Monitoring and logging
  * Failure handling
  * Performance optimization

Key Insights:
- No real-time data possible (STOCK Act = 30-45 day lag)
- Daily updates are optimal
- Automated via cron jobs
- Handles API failures gracefully
- Logs everything for debugging
ilia 2025-12-15 14:55:05 -05:00
parent 77bd69b85c
commit 3a89c1e6d2
3 changed files with 777 additions and 0 deletions

docs/10_automation.md Normal file

@@ -0,0 +1,509 @@
# POTE Automation Guide
**Automated Data Collection & Updates**
---
## ⏰ Understanding Disclosure Timing
### **Reality Check: No Real-Time Data Exists**
**Federal Law (STOCK Act):**
- 📅 Congress members have **30-45 days** to disclose trades
- 📅 Disclosures are filed as **Periodic Transaction Reports (PTRs)**
- 📅 Public databases update **after** filing (usually next day)
- 📅 **No real-time feed exists by design**
**Example Timeline:**
```
Jan 15, 2024 → Senator buys NVDA
Feb 15, 2024 → Disclosure filed (30 days later)
Feb 16, 2024 → Appears on House Stock Watcher
Feb 17, 2024 → Your system fetches it
```
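The lag in the timeline above can be sanity-checked in a few lines of Python (the dates are illustrative; the 45-day figure is the STOCK Act's outer filing deadline):

```python
from datetime import date, timedelta

trade_date = date(2024, 1, 15)                      # the NVDA purchase above
filing_deadline = trade_date + timedelta(days=45)   # STOCK Act outer limit

print(filing_deadline)  # 2024-02-29 — well after the trade itself
```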
### **Best Practice: Daily Updates**
Since trades appear in batches (not continuously), **running once per day is optimal**:
- ✅ **Daily (7 AM)** - Catches overnight filings
- ✅ **After market close** - Prices are final
- ✅ **Low server load** - Off-peak hours
- ❌ **Hourly** - Wasteful, no new data
- ❌ **Real-time** - Impossible, not how disclosures work
---
## 🤖 Automated Setup Options
### **Option 1: Cron Job (Linux/Proxmox) - Recommended**
#### **Setup on Proxmox Container**
```bash
# SSH to your container
ssh poteapp@10.0.10.95
# Edit crontab
crontab -e
# Add this line (runs daily at 7 AM):
0 7 * * * /home/poteapp/pote/scripts/daily_fetch.sh
# Or run twice daily (7 AM and 7 PM):
0 7,19 * * * /home/poteapp/pote/scripts/daily_fetch.sh
# Save and exit
```
**What it does:**
- Fetches new congressional trades (last 7 days)
- Enriches any new securities (name, sector, industry)
- Updates price data for all securities
- Logs everything to `logs/daily_fetch_YYYYMMDD.log`
**Check logs:**
```bash
tail -f ~/pote/logs/daily_fetch_$(date +%Y%m%d).log
```
---
### **Option 2: Systemd Timer (More Advanced)**
For better logging and service management:
#### **Create Service File**
```bash
sudo nano /etc/systemd/system/pote-fetch.service
```
```ini
[Unit]
Description=POTE Daily Data Fetch
After=network.target postgresql.service
[Service]
Type=oneshot
User=poteapp
WorkingDirectory=/home/poteapp/pote
ExecStart=/home/poteapp/pote/scripts/daily_fetch.sh
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
```
#### **Create Timer File**
```bash
sudo nano /etc/systemd/system/pote-fetch.timer
```
```ini
[Unit]
Description=POTE Daily Data Fetch Timer
Requires=pote-fetch.service
[Timer]
OnCalendar=*-*-* 07:00:00
Persistent=true
[Install]
WantedBy=timers.target
```
#### **Enable and Start**
```bash
sudo systemctl daemon-reload
sudo systemctl enable pote-fetch.timer
sudo systemctl start pote-fetch.timer
# Check status
sudo systemctl status pote-fetch.timer
sudo systemctl list-timers
# View logs
sudo journalctl -u pote-fetch.service -f
```
---
### **Option 3: Manual Script (For Testing)**
Run manually whenever you want:
```bash
cd /home/user/Documents/code/pote
./scripts/daily_fetch.sh
```
Or from anywhere:
```bash
/home/user/Documents/code/pote/scripts/daily_fetch.sh
```
---
## 📊 What Gets Updated?
### **1. Congressional Trades**
**Script:** `fetch_congressional_trades.py`
**Frequency:** Daily
**Fetches:** Last 7 days (catches late filings)
**API:** House Stock Watcher (when available)
**Alternative sources:**
- Manual CSV import
- QuiverQuant API (paid)
- Capitol Trades (paid)
### **2. Security Enrichment**
**Script:** `enrich_securities.py`
**Frequency:** Daily (only updates new tickers)
**Fetches:** Company name, sector, industry
**API:** yfinance (free)
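The enrichment step boils down to picking three fields out of the metadata dict that yfinance returns from `yf.Ticker(t).info`. A minimal sketch of that extraction (the helper name and defaults are hypothetical, not the actual `enrich_securities.py` code):

```python
def pick_enrichment_fields(info):
    """Keep only the fields the securities table stores.

    `info` is the kind of dict yfinance returns from `yf.Ticker(t).info`;
    missing keys simply come back as None.
    """
    return {
        "name": info.get("longName"),
        "sector": info.get("sector"),
        "industry": info.get("industry"),
    }

sample = {"longName": "NVIDIA Corporation", "sector": "Technology",
          "industry": "Semiconductors", "marketCap": 0}
print(pick_enrichment_fields(sample))
```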
### **3. Price Data**
**Script:** `fetch_sample_prices.py`
**Frequency:** Daily
**Fetches:** Historical prices for all securities
**API:** yfinance (free)
**Smart:** Only fetches missing date ranges (efficient)
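The "only fetches missing date ranges" behavior reduces to: find the last stored price date, then fetch from the next day to today. A hypothetical sketch of that logic (the real `fetch_sample_prices.py` may differ in details; `default_start` is an assumed backfill floor):

```python
from datetime import date, timedelta

def missing_range(last_stored, today, default_start=date(2020, 1, 1)):
    """Return the (start, end) window still to fetch, or None if up to date."""
    start = default_start if last_stored is None else last_stored + timedelta(days=1)
    if start > today:
        return None  # nothing new to fetch
    return (start, today)

# Only nine days get re-fetched, not the full price history
print(missing_range(date(2024, 3, 1), date(2024, 3, 10)))
```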
### **4. Analytics (Optional)**
**Script:** `calculate_all_returns.py`
**Frequency:** Daily (or on-demand)
**Calculates:** Returns, alpha, performance metrics
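At its core, the alpha calculation compares the security's return over the window with a benchmark's return over the same window. A minimal sketch of that arithmetic (hypothetical helper; the real `calculate_all_returns.py` pulls prices from the database and handles missing data):

```python
def window_alpha(stock_start, stock_end, bench_start, bench_end):
    """Excess return vs. a benchmark over the same window (as fractions)."""
    stock_ret = (stock_end - stock_start) / stock_start
    bench_ret = (bench_end - bench_start) / bench_start
    return stock_ret - bench_ret

# Stock up 20%, benchmark up 10% over the same 90 days → 10% alpha
print(round(window_alpha(100.0, 120.0, 400.0, 440.0), 4))  # 0.1
```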
---
## ⚙️ Customizing the Schedule
### **Different Frequencies**
```bash
# Every 6 hours
0 */6 * * * /home/poteapp/pote/scripts/daily_fetch.sh
# Twice daily (morning and evening)
0 7,19 * * * /home/poteapp/pote/scripts/daily_fetch.sh
# Weekdays only (business days)
0 7 * * 1-5 /home/poteapp/pote/scripts/daily_fetch.sh
# Once per week (Sunday at midnight)
0 0 * * 0 /home/poteapp/pote/scripts/daily_fetch.sh
```
### **Best Practice Recommendations**
**For Active Research:**
- **Daily at 7 AM** (catches overnight filings)
- **Weekdays only** (Congress rarely files on weekends)
**For Casual Tracking:**
- **Weekly** (Sunday night)
- **Bi-weekly** (1st and 15th)
**For Development:**
- **Manual runs** (on-demand testing)
---
## 📧 Email Notifications (Optional)
### **Setup Email Alerts**
Add to your cron job:
```bash
# Install mail utility
sudo apt install mailutils
# Add to crontab with email
MAILTO=your-email@example.com
0 7 * * * /home/poteapp/pote/scripts/daily_fetch.sh
```
### **Custom Email Script**
Create `scripts/email_summary.py`:
```python
#!/usr/bin/env python
"""Email daily summary of new trades."""
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from datetime import date, timedelta

from sqlalchemy import text

from pote.db import engine


def get_new_trades(days=1):
    """Get trades from last N days."""
    since = date.today() - timedelta(days=days)
    with engine.connect() as conn:
        result = conn.execute(text("""
            SELECT o.name, s.ticker, t.side, t.transaction_date, t.value_min, t.value_max
            FROM trades t
            JOIN officials o ON t.official_id = o.id
            JOIN securities s ON t.security_id = s.id
            WHERE t.created_at >= :since
            ORDER BY t.transaction_date DESC
        """), {"since": since})
        return result.fetchall()


def send_email(to_email, trades):
    """Send email summary."""
    if not trades:
        print("No new trades to report")
        return

    # Compose email
    subject = f"POTE: {len(trades)} New Congressional Trades"
    body = f"<h2>New Trades ({len(trades)})</h2>\n<table>"
    body += "<tr><th>Official</th><th>Ticker</th><th>Side</th><th>Date</th><th>Value</th></tr>"
    for trade in trades:
        # 'trade_date' avoids shadowing the imported 'date'
        name, ticker, side, trade_date, vmin, vmax = trade
        value = f"${vmin:,.0f}-${vmax:,.0f}" if vmax else f"${vmin:,.0f}+"
        body += f"<tr><td>{name}</td><td>{ticker}</td><td>{side}</td><td>{trade_date}</td><td>{value}</td></tr>"
    body += "</table>"

    # Send email (configure SMTP settings)
    msg = MIMEMultipart()
    msg['From'] = "pote@yourserver.com"
    msg['To'] = to_email
    msg['Subject'] = subject
    msg.attach(MIMEText(body, 'html'))

    # Configure your SMTP server
    # server = smtplib.SMTP('smtp.gmail.com', 587)
    # server.starttls()
    # server.login("your-email@gmail.com", "your-password")
    # server.send_message(msg)
    # server.quit()
    print(f"Would send email to {to_email}")


if __name__ == "__main__":
    trades = get_new_trades(days=1)
    send_email("your-email@example.com", trades)
```
Then add to `daily_fetch.sh`:
```bash
# At the end of daily_fetch.sh
python scripts/email_summary.py
```
---
## 🔍 Monitoring & Logging
### **Check Cron Job Status**
```bash
# View cron jobs
crontab -l
# Check if cron is running
sudo systemctl status cron
# View cron logs
grep CRON /var/log/syslog | tail -20
```
### **Check POTE Logs**
```bash
# Today's log
tail -f ~/pote/logs/daily_fetch_$(date +%Y%m%d).log
# All logs
ls -lh ~/pote/logs/
# Last 100 lines of latest log
tail -100 "$(ls -t ~/pote/logs/daily_fetch_*.log | head -1)"
```
### **Log Rotation (Keep Disk Space Clean)**
Add to `/etc/logrotate.d/pote`:
```
/home/poteapp/pote/logs/*.log {
daily
rotate 30
compress
delaycompress
missingok
notifempty
}
```
---
## 🚨 Handling Failures
### **What If House Stock Watcher Is Down?**
The script is designed to continue even if one step fails:
```bash
# Script continues and logs warnings
⚠️ WARNING: Failed to fetch congressional trades
This is likely because House Stock Watcher API is down
Continuing with other steps...
```
**Fallback options:**
1. **Manual import:** Use CSV import when API is down
2. **Alternative APIs:** QuiverQuant, Capitol Trades
3. **Check logs:** Review what failed and why
### **Automatic Retry Logic**
Edit `scripts/fetch_congressional_trades.py` to add retries:
```python
import time

from requests.exceptions import RequestException

MAX_RETRIES = 3
RETRY_DELAY = 300  # 5 minutes

for attempt in range(MAX_RETRIES):
    try:
        trades = client.fetch_recent_transactions(days=7)
        break
    except RequestException as e:
        if attempt < MAX_RETRIES - 1:
            logger.warning(f"Attempt {attempt+1} failed, retrying in {RETRY_DELAY}s...")
            time.sleep(RETRY_DELAY)
        else:
            logger.error("All retry attempts failed")
            raise
```
---
## 📈 Performance Optimization
### **Batch Processing**
For large datasets, fetch in batches:
```bash
# Fetch trades in smaller date ranges
python scripts/fetch_congressional_trades.py --start-date 2024-01-01 --end-date 2024-01-31
python scripts/fetch_congressional_trades.py --start-date 2024-02-01 --end-date 2024-02-29
```
### **Parallel Processing**
Use GNU Parallel for faster price fetching:
```bash
# Install parallel
sudo apt install parallel
# Fetch prices in parallel (4 at a time)
python -c "from pote.db import get_session; from pote.db.models import Security;
session = next(get_session());
tickers = [s.ticker for s in session.query(Security).all()];
print('\n'.join(tickers))" | \
parallel -j 4 python scripts/fetch_prices_single.py {}
```
### **Database Indexing**
Ensure indexes are created (already in migrations):
```sql
CREATE INDEX IF NOT EXISTS ix_trades_transaction_date ON trades(transaction_date);
CREATE INDEX IF NOT EXISTS ix_prices_date ON prices(date);
CREATE INDEX IF NOT EXISTS ix_prices_security_id ON prices(security_id);
```
---
## 🎯 Recommended Setup
### **For Proxmox Production:**
```bash
# 1. Setup daily cron job
crontab -e
# Add: 0 7 * * * /home/poteapp/pote/scripts/daily_fetch.sh
# 2. Enable log rotation
sudo nano /etc/logrotate.d/pote
# Add log rotation config
# 3. Setup monitoring (optional)
python scripts/email_summary.py
# 4. Test manually first
./scripts/daily_fetch.sh
```
### **For Local Development:**
```bash
# Run manually when needed
./scripts/daily_fetch.sh
# Or setup quick alias
echo "alias pote-update='~/Documents/code/pote/scripts/daily_fetch.sh'" >> ~/.bashrc
source ~/.bashrc
# Then just run:
pote-update
```
---
## 📝 Summary
### **Key Points:**
1. **No real-time data exists** - Congressional trades have 30-45 day lag by law
2. **Daily updates are optimal** - Running hourly is wasteful
3. **Automated via cron** - Set it and forget it
4. **Handles failures gracefully** - Continues even if one API is down
5. **Logs everything** - Easy to monitor and debug
### **Quick Setup:**
```bash
# On Proxmox
crontab -e
# Add: 0 7 * * * /home/poteapp/pote/scripts/daily_fetch.sh
# Test it
./scripts/daily_fetch.sh
# Check logs
tail -f logs/daily_fetch_*.log
```
### **Data Freshness Expectations:**
- **Best case:** Trades from yesterday (if official filed overnight)
- **Typical:** Trades from 30-45 days ago
- **Worst case:** Official filed late or hasn't filed yet
**This is normal and expected** - you're working with disclosure data, not market data.

scripts/daily_fetch.sh Executable file

@@ -0,0 +1,118 @@
#!/bin/bash
# Daily POTE Data Update Script
# Run this once per day to fetch new trades and prices
# Recommended: 7 AM daily (after markets close and disclosures are filed)
# NOTE: deliberately no 'set -e' — each step below checks its own exit code,
# so the remaining steps still run even if one fails
# --- Configuration ---
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_DIR="$(dirname "$SCRIPT_DIR")"
LOG_DIR="${PROJECT_DIR}/logs"
LOG_FILE="${LOG_DIR}/daily_fetch_$(date +%Y%m%d).log"
# Ensure log directory exists
mkdir -p "$LOG_DIR"
# Redirect all output to log file
exec > >(tee -a "$LOG_FILE") 2>&1
echo "=========================================="
echo " POTE Daily Data Fetch"
echo " $(date)"
echo "=========================================="
# Activate virtual environment
cd "$PROJECT_DIR"
source venv/bin/activate
# --- Step 1: Fetch Congressional Trades ---
echo ""
echo "--- Step 1: Fetching Congressional Trades ---"
# Fetch last 7 days (to catch any late filings)
python scripts/fetch_congressional_trades.py --days 7
TRADES_EXIT=$?
if [ $TRADES_EXIT -ne 0 ]; then
    echo "⚠️ WARNING: Failed to fetch congressional trades"
    echo " This is likely because House Stock Watcher API is down"
    echo " Continuing with other steps..."
fi
# --- Step 2: Enrich Securities ---
echo ""
echo "--- Step 2: Enriching Securities ---"
# Add company names, sectors, industries for any new tickers
python scripts/enrich_securities.py
ENRICH_EXIT=$?
if [ $ENRICH_EXIT -ne 0 ]; then
    echo "⚠️ WARNING: Failed to enrich securities"
fi
# --- Step 3: Fetch Price Data ---
echo ""
echo "--- Step 3: Fetching Price Data ---"
# Fetch prices for all securities
python scripts/fetch_sample_prices.py
PRICES_EXIT=$?
if [ $PRICES_EXIT -ne 0 ]; then
    echo "⚠️ WARNING: Failed to fetch price data"
fi
# --- Step 4: Calculate Returns (Optional) ---
echo ""
echo "--- Step 4: Calculating Returns ---"
python scripts/calculate_all_returns.py --window 90 --limit 100
CALC_EXIT=$?
if [ $CALC_EXIT -ne 0 ]; then
    echo "⚠️ WARNING: Failed to calculate returns"
fi
# --- Summary ---
echo ""
echo "=========================================="
echo " Daily Fetch Complete"
echo " $(date)"
echo "=========================================="
# Show quick stats
python << 'PYEOF'
from sqlalchemy import text

from pote.db import engine

print("\n📊 Current Database Stats:")
with engine.connect() as conn:
    officials = conn.execute(text("SELECT COUNT(*) FROM officials")).scalar()
    trades = conn.execute(text("SELECT COUNT(*) FROM trades")).scalar()
    securities = conn.execute(text("SELECT COUNT(*) FROM securities")).scalar()
    prices = conn.execute(text("SELECT COUNT(*) FROM prices")).scalar()

    print(f"  Officials:  {officials:,}")
    print(f"  Securities: {securities:,}")
    print(f"  Trades:     {trades:,}")
    print(f"  Prices:     {prices:,}")

    # Show most recent trade
    result = conn.execute(text("""
        SELECT o.name, s.ticker, t.side, t.transaction_date
        FROM trades t
        JOIN officials o ON t.official_id = o.id
        JOIN securities s ON t.security_id = s.id
        ORDER BY t.transaction_date DESC
        LIMIT 1
    """)).fetchone()

    if result:
        print("\n📈 Most Recent Trade:")
        print(f"  {result[0]} - {result[2].upper()} {result[1]} on {result[3]}")

print()
PYEOF
# Exit with success (even if some steps warned)
exit 0

scripts/setup_automation.sh Executable file

@@ -0,0 +1,150 @@
#!/bin/bash
# Setup Automation for POTE
# Run this once on your Proxmox container to enable daily updates
set -e
echo "=========================================="
echo " POTE Automation Setup"
echo "=========================================="
# Detect if we're root or regular user
if [ "$EUID" -eq 0 ]; then
    echo "⚠️ Running as root. Will set up for the poteapp user."
    TARGET_USER="poteapp"
    TARGET_HOME="/home/poteapp"
else
    TARGET_USER="$USER"
    TARGET_HOME="$HOME"
fi
POTE_DIR="${TARGET_HOME}/pote"
# Check if POTE directory exists
if [ ! -d "$POTE_DIR" ]; then
    echo "❌ Error: POTE directory not found at $POTE_DIR"
    echo "   Please clone the repository first."
    exit 1
fi
echo "✅ Found POTE at: $POTE_DIR"
# Make scripts executable
echo ""
echo "Making scripts executable..."
chmod +x "${POTE_DIR}/scripts/daily_fetch.sh"
chmod +x "${POTE_DIR}/scripts/fetch_congressional_trades.py"
chmod +x "${POTE_DIR}/scripts/enrich_securities.py"
chmod +x "${POTE_DIR}/scripts/fetch_sample_prices.py"
# Create logs directory
echo "Creating logs directory..."
mkdir -p "${POTE_DIR}/logs"
# Test the daily fetch script
echo ""
echo "Testing daily fetch script (dry run)..."
echo "This may take a few minutes..."
cd "$POTE_DIR"
if [ "$EUID" -eq 0 ]; then
    su - "$TARGET_USER" -c "cd ${POTE_DIR} && source venv/bin/activate && python --version"
else
    source venv/bin/activate
    python --version
fi
# Setup cron job
echo ""
echo "=========================================="
echo " Cron Job Setup"
echo "=========================================="
echo ""
echo "Choose schedule:"
echo " 1) Daily at 7 AM (recommended)"
echo " 2) Twice daily (7 AM and 7 PM)"
echo " 3) Weekdays only at 7 AM"
echo " 4) Custom (I'll help you configure)"
echo " 5) Skip (manual setup)"
echo ""
read -p "Enter choice [1-5]: " choice
CRON_LINE=""
case $choice in
    1)
        CRON_LINE="0 7 * * * ${POTE_DIR}/scripts/daily_fetch.sh"
        ;;
    2)
        CRON_LINE="0 7,19 * * * ${POTE_DIR}/scripts/daily_fetch.sh"
        ;;
    3)
        CRON_LINE="0 7 * * 1-5 ${POTE_DIR}/scripts/daily_fetch.sh"
        ;;
    4)
        echo ""
        echo "Cron format: MIN HOUR DAY MONTH WEEKDAY"
        echo "Examples:"
        echo "  0 7 * * *   = Daily at 7 AM"
        echo "  0 */6 * * * = Every 6 hours"
        echo "  0 0 * * 0   = Weekly on Sunday"
        read -p "Enter cron schedule: " custom_schedule
        CRON_LINE="${custom_schedule} ${POTE_DIR}/scripts/daily_fetch.sh"
        ;;
    5)
        echo "Skipping cron setup. You can add it manually with:"
        echo "  crontab -e"
        echo "  Add: 0 7 * * * ${POTE_DIR}/scripts/daily_fetch.sh"
        CRON_LINE=""
        ;;
    *)
        echo "Invalid choice. Skipping cron setup."
        CRON_LINE=""
        ;;
esac
if [ -n "$CRON_LINE" ]; then
    echo ""
    echo "Adding to crontab: $CRON_LINE"
    if [ "$EUID" -eq 0 ]; then
        # Add as target user
        (su - "$TARGET_USER" -c "crontab -l" 2>/dev/null || true; echo "$CRON_LINE") | \
            su - "$TARGET_USER" -c "crontab -"
    else
        # Add as current user
        (crontab -l 2>/dev/null || true; echo "$CRON_LINE") | crontab -
    fi
    echo "✅ Cron job added!"
    echo ""
    echo "View with: crontab -l"
fi
# Summary
echo ""
echo "=========================================="
echo " Setup Complete!"
echo "=========================================="
echo ""
echo "📝 What was configured:"
echo " ✅ Scripts made executable"
echo " ✅ Logs directory created: ${POTE_DIR}/logs"
if [ -n "$CRON_LINE" ]; then
    echo " ✅ Cron job scheduled"
fi
echo ""
echo "🧪 Test manually:"
echo " ${POTE_DIR}/scripts/daily_fetch.sh"
echo ""
echo "📊 View logs:"
echo " tail -f ${POTE_DIR}/logs/daily_fetch_\$(date +%Y%m%d).log"
echo ""
echo "⚙️ Manage cron:"
echo " crontab -l # View cron jobs"
echo " crontab -e # Edit cron jobs"
echo ""
echo "📚 Documentation:"
echo " ${POTE_DIR}/docs/10_automation.md"
echo ""