POTE/docs/09_data_updates.md
ilia 02c10c85d6 Add data update tools and Phase 2 plan
- scripts/add_custom_trades.py: Manual trade entry
- scripts/scrape_alternative_sources.py: CSV import
- scripts/daily_update.sh: Automated daily updates
- docs/09_data_updates.md: Complete update guide
- docs/PR4_PLAN.md: Phase 2 analytics plan

Enables users to add representatives and set up auto-updates
2025-12-15 10:39:18 -05:00

# Data Updates & Maintenance
## Adding More Representatives
### Method 1: Manual Entry (Python Script)
```bash
# Edit the script to add your representatives
nano scripts/add_custom_trades.py
# Run it
python scripts/add_custom_trades.py
```
Example:
```python
add_trade(
    session,
    official_name="Your Representative",
    party="Democrat",  # or "Republican", "Independent"
    chamber="House",  # or "Senate"
    state="CA",
    ticker="NVDA",
    company_name="NVIDIA Corporation",
    side="buy",  # or "sell"
    value_min=15001,
    value_max=50000,
    transaction_date="2024-12-01",
    disclosure_date="2024-12-15",
)
```
### Method 2: CSV Import
```bash
# Create a template
python scripts/scrape_alternative_sources.py template
# Edit trades_template.csv with your data
nano trades_template.csv
# Import it
python scripts/scrape_alternative_sources.py import trades_template.csv
```
CSV format:
```csv
name,party,chamber,state,district,ticker,side,value_min,value_max,transaction_date,disclosure_date
Bernie Sanders,Independent,Senate,VT,,COIN,sell,15001,50000,2024-12-01,2024-12-15
```
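Checking a CSV file before importing it can catch typos early. The following is a minimal sketch, not part of POTE: `validate_row` and `validate_csv` are hypothetical helpers, the column list is copied from the header above, and the allowed values for `party`, `chamber`, and `side` are taken from the comments in the Method 1 example. Whether the import script enforces the same rules is an assumption.

```python
import csv
import io
from datetime import date

# Column order copied from the CSV header shown above.
EXPECTED_COLUMNS = [
    "name", "party", "chamber", "state", "district", "ticker", "side",
    "value_min", "value_max", "transaction_date", "disclosure_date",
]
# Allowed values taken from the comments in the Method 1 example;
# whether the importer enforces them is an assumption.
PARTIES = {"Democrat", "Republican", "Independent"}
CHAMBERS = {"House", "Senate"}
SIDES = {"buy", "sell"}


def validate_row(row):
    """Return a list of problems found in one CSV row (empty if none)."""
    problems = []
    for field, allowed in (("party", PARTIES), ("chamber", CHAMBERS), ("side", SIDES)):
        if row[field] not in allowed:
            problems.append(f"unexpected {field}: {row[field]!r}")
    try:
        if int(row["value_min"]) > int(row["value_max"]):
            problems.append("value_min exceeds value_max")
    except (TypeError, ValueError):
        problems.append("value_min and value_max must be integers")
    try:
        traded = date.fromisoformat(row["transaction_date"])
        disclosed = date.fromisoformat(row["disclosure_date"])
        if disclosed < traded:
            problems.append("disclosure_date precedes transaction_date")
    except (TypeError, ValueError):
        problems.append("dates must use YYYY-MM-DD")
    return problems


def validate_csv(text):
    """Map file line numbers to problems; the header is line 1."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    report = {}
    for line_number, row in enumerate(reader, start=2):
        problems = validate_row(row)
        if problems:
            report[line_number] = problems
    return report


# First data row is the example above; the second is made up to trigger errors.
sample = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "Bernie Sanders,Independent,Senate,VT,,COIN,sell,15001,50000,2024-12-01,2024-12-15\n"
    "Placeholder Name,Democrat,House,CA,12,NVDA,hold,50000,15001,2024-12-15,2024-12-01\n"
)
for line_number, problems in validate_csv(sample).items():
    print(f"line {line_number}: {'; '.join(problems)}")
```

To check a real file, pass its contents, for example `validate_csv(open("trades_template.csv").read())`.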
### Method 3: Automatic Updates (When API is available)
```bash
# Fetch latest trades
python scripts/fetch_congressional_trades.py --days 30
```
## Setting Up Automatic Updates
### Option A: Cron Job (Recommended)
```bash
# Make script executable
chmod +x ~/pote/scripts/daily_update.sh
# Add to cron (runs daily at 6 AM)
crontab -e
# Add this line:
0 6 * * * /home/poteapp/pote/scripts/daily_update.sh
# Or, while testing, use this line instead (runs every hour):
0 * * * * /home/poteapp/pote/scripts/daily_update.sh
```
View logs:
```bash
ls -lh ~/logs/daily_update_*.log
tail -f ~/logs/daily_update_$(date +%Y%m%d).log
```
### Option B: Systemd Timer
Create `/etc/systemd/system/pote-update.service`:
```ini
[Unit]
Description=POTE Daily Data Update
After=network.target postgresql.service
[Service]
Type=oneshot
User=poteapp
WorkingDirectory=/home/poteapp/pote
ExecStart=/home/poteapp/pote/scripts/daily_update.sh
StandardOutput=append:/home/poteapp/logs/pote-update.log
StandardError=append:/home/poteapp/logs/pote-update.log
```
Create `/etc/systemd/system/pote-update.timer`:
```ini
[Unit]
Description=Run POTE update daily
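# No Requires= here: the timer activates pote-update.service by name,
# and Requires= would also start the service whenever the timer starts.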
[Timer]
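# A single OnCalendar entry: "daily" means midnight, so listing both
# "daily" and "06:00" would run the update twice a day.
OnCalendar=*-*-* 06:00:00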
Persistent=true
[Install]
WantedBy=timers.target
```
Enable it:
```bash
sudo systemctl enable --now pote-update.timer
sudo systemctl status pote-update.timer
```
## Manual Update Workflow
```bash
# 1. Fetch new trades (when API works)
python scripts/fetch_congressional_trades.py
# 2. Enrich new securities
python scripts/enrich_securities.py
# 3. Update prices
python scripts/fetch_sample_prices.py
# 4. Check status
~/status.sh
```
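The same sequence could be driven from Python so that a failed fetch (for example, while the API is down) does not block enrichment and price updates. This is a sketch under assumptions: `run_workflow`, `run_subprocess`, and the "required" flags are hypothetical and not part of POTE; only the script paths come from the steps above.

```python
import subprocess
import sys

# (label, command, required). Script paths are the ones listed above.
# Marking the fetch step as optional is an assumption, made because the
# trade API may be unavailable.
STEPS = [
    ("fetch trades", [sys.executable, "scripts/fetch_congressional_trades.py"], False),
    ("enrich securities", [sys.executable, "scripts/enrich_securities.py"], True),
    ("update prices", [sys.executable, "scripts/fetch_sample_prices.py"], True),
]


def run_subprocess(command):
    """Run one command and return its exit code."""
    return subprocess.run(command, check=False).returncode


def run_workflow(steps, run):
    """Run steps in order; stop at the first required step that fails."""
    results = []
    for label, command, required in steps:
        if run(command) == 0:
            results.append((label, "ok"))
        elif required:
            results.append((label, "failed"))
            break
        else:
            # An optional step failed: record it and carry on.
            results.append((label, "skipped"))
    return results


def simulate_api_down(command):
    """Stand-in runner for the demo: only the fetch script fails."""
    return 1 if command[-1].endswith("fetch_congressional_trades.py") else 0


for label, status in run_workflow(STEPS, simulate_api_down):
    print(f"{label}: {status}")
```

For a real run from the repository root, call `run_workflow(STEPS, run_subprocess)` instead of passing the stand-in runner.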
## Data Sources
### Currently Working:
- ✅ yfinance (prices, company info)
- ✅ Manual entry
- ✅ CSV import
- ✅ Fixture files (testing)
### Currently Down:
- ❌ House Stock Watcher API (domain issues)
### Future Options:
- QuiverQuant (requires $30/month subscription)
- Senate Stock Watcher (check if available)
- Capitol Trades (web scraping)
- Financial Modeling Prep (requires API key)
## Monitoring Updates
### Check Recent Activity
```python
from sqlalchemy import text
from pote.db import engine
from datetime import datetime, timedelta
# Trades added in the last 7 days
week_ago = (datetime.now() - timedelta(days=7)).date()

with engine.connect() as conn:
    # Bind the cutoff as a parameter instead of formatting it into the SQL
    result = conn.execute(
        text("""
            SELECT o.name, s.ticker, t.side, t.transaction_date
            FROM trades t
            JOIN officials o ON t.official_id = o.id
            JOIN securities s ON t.security_id = s.id
            WHERE t.created_at >= :week_ago
            ORDER BY t.created_at DESC
        """),
        {"week_ago": week_ago},
    )
    print("Recent trades:")
    for row in result:
        print(f"  {row.name} {row.side} {row.ticker} on {row.transaction_date}")
```
### Database Growth
```bash
# Track database size over time
psql -h localhost -U poteuser -d pote -c "
  SELECT
    pg_size_pretty(pg_database_size('pote')) AS db_size,
    (SELECT COUNT(*) FROM officials) AS officials,
    (SELECT COUNT(*) FROM trades) AS trades,
    (SELECT COUNT(*) FROM prices) AS prices;
"
```
## Backup Before Updates
```bash
# Back up before major updates (create the target directory first)
mkdir -p ~/backups
pg_dump -h localhost -U poteuser pote > ~/backups/pote_$(date +%Y%m%d_%H%M%S).sql
```
## Troubleshooting
### API Not Working
- Use manual entry or CSV import
- Check if alternative sources are available
- Wait for House Stock Watcher to come back online
### Duplicate Trades
The system automatically deduplicates by:
- `source` + `external_id` (for API data)
- Official + Security + Transaction Date (for manual data)
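
The two rules above can be pictured as choosing a different key depending on where a trade came from. The sketch below is an illustration only, assuming the rules apply exactly as listed: `TradeRecord`, `dedup_key`, and `deduplicate` are hypothetical names, and the actual POTE implementation may differ (for instance, by using database constraints).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TradeRecord:
    """Hypothetical in-memory trade, used only for this illustration."""
    official: str
    ticker: str
    transaction_date: date
    source: Optional[str] = None       # set for API data
    external_id: Optional[str] = None  # ID assigned by that source


def dedup_key(trade):
    """Pick the key described above: API identity if present, else the triple."""
    if trade.source and trade.external_id:
        return ("api", trade.source, trade.external_id)
    return ("manual", trade.official, trade.ticker, trade.transaction_date)


def deduplicate(trades):
    """Keep the first trade seen for each key, preserving input order."""
    seen = set()
    unique = []
    for trade in trades:
        key = dedup_key(trade)
        if key not in seen:
            seen.add(key)
            unique.append(trade)
    return unique


# Made-up records: the second manual entry repeats the first.
manual = TradeRecord("Your Representative", "NVDA", date(2024, 12, 1))
repeat = TradeRecord("Your Representative", "NVDA", date(2024, 12, 1))
from_api = TradeRecord("Your Representative", "NVDA", date(2024, 12, 1),
                       source="example_api", external_id="abc-123")
print(len(deduplicate([manual, repeat, from_api])))
```

One consequence of the manual-data key, if it works as sketched here: two genuinely separate manual trades by the same official in the same security on the same date would be treated as one.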
### Missing Company Info
```bash
# Re-enrich all securities
python scripts/enrich_securities.py --force
```
### Price Data Gaps
```bash
# Fetch specific date range
python << 'EOF'
from pote.ingestion.prices import PriceLoader
from pote.db import get_session
loader = PriceLoader(next(get_session()))
loader.fetch_and_store_prices("NVDA", "2024-01-01", "2024-12-31")
EOF
```