POTE/scripts/scrape_alternative_sources.py
ilia 0d8d85adc1 Add complete automation, reporting, and CI/CD system
Features Added:
==============

📧 EMAIL REPORTING SYSTEM:
- EmailReporter: Send reports via SMTP (Gmail, SendGrid, custom)
- ReportGenerator: Generate daily/weekly summaries with HTML/text formatting
- Configurable via .env (SMTP_HOST, SMTP_PORT, etc.)
- Scripts: send_daily_report.py, send_weekly_report.py
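
A minimal sketch of the SMTP path the `EmailReporter` could take with Python's standard library (the class name is from this commit; the function names and parameters below are illustrative, not the actual API):

```python
import smtplib
from email.message import EmailMessage

def build_report_email(sender: str, recipient: str, subject: str,
                       text_body: str, html_body: str) -> EmailMessage:
    """Build a multipart email with plain-text and HTML alternatives."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(text_body)                      # plain-text fallback
    msg.add_alternative(html_body, subtype="html")  # HTML version
    return msg

def send_report(msg: EmailMessage, host: str, port: int,
                user: str, password: str) -> None:
    """Send via SMTP with STARTTLS (host/port/credentials come from .env)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

The same code works for Gmail, SendGrid, or a custom relay; only the host, port, and credentials in `.env` change.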

🤖 AUTOMATED RUNS:
- automated_daily_run.sh: Full daily ETL pipeline + reporting
- automated_weekly_run.sh: Weekly pattern analysis + reports
- setup_cron.sh: Interactive cron job setup (5-minute setup)
- Logs saved to ~/logs/ with automatic cleanup

🔍 HEALTH CHECKS:
- health_check.py: System health monitoring
- Checks: DB connection, data freshness, counts, recent alerts
- JSON output for programmatic use
- Exit codes for monitoring integration
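
The checks above roughly compose like this (a sketch, not the actual `health_check.py`; function names and thresholds are assumptions):

```python
import json
from datetime import datetime, timedelta, timezone

def evaluate_health(db_ok: bool, last_trade_at: datetime,
                    trade_count: int, max_staleness_hours: int = 48) -> dict:
    """Combine individual checks into a single health report dict."""
    stale = datetime.now(timezone.utc) - last_trade_at > timedelta(hours=max_staleness_hours)
    checks = {
        "db_connection": db_ok,
        "data_fresh": not stale,
        "has_trades": trade_count > 0,
    }
    # "healthy" only if every individual check passes
    return {"healthy": all(checks.values()), "checks": checks}

def report_and_exit_code(report: dict) -> int:
    """Print JSON for programmatic use; return 0/1 for monitors."""
    print(json.dumps(report, indent=2))
    return 0 if report["healthy"] else 1
```

In the real script the inputs come from the database; a cron job or monitoring agent can key off the exit code.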

🚀 CI/CD PIPELINE:
- .github/workflows/ci.yml: Full CI/CD pipeline
- GitHub Actions / Gitea Actions compatible
- Jobs: lint & test, security scan, dependency scan, Docker build
- PostgreSQL service for integration tests
- 93 tests passing in CI
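
The shape of such a workflow, for orientation (job and step names here are illustrative; the actual `ci.yml` in this commit may differ):

```yaml
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:                  # service container for integration tests
        image: postgres:16
        env:
          POSTGRES_PASSWORD: postgres
        ports: ["5432:5432"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest -q           # the test suite runs here
```

Gitea Actions accepts the same workflow syntax, which is what makes the pipeline portable between the two.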

📚 COMPREHENSIVE DOCUMENTATION:
- AUTOMATION_QUICKSTART.md: 5-minute email setup guide
- docs/12_automation_and_reporting.md: Full automation guide
- Updated README.md with automation links
- Deployment → Production workflow guide

🛠️ IMPROVEMENTS:
- All shell scripts made executable
- Environment variable examples in .env.example
- Report logs saved with timestamps
- 30-day log retention with auto-cleanup
- Health checks can be scheduled via cron
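
The 30-day retention rule amounts to deleting log files whose modification time is past the cutoff; a sketch in Python (the shipped cleanup lives in the shell scripts, so this is an equivalent illustration, not the actual code):

```python
import time
from pathlib import Path

def cleanup_old_logs(log_dir: Path, retention_days: int = 30) -> int:
    """Delete *.log files older than retention_days; return count removed."""
    cutoff = time.time() - retention_days * 86400
    removed = 0
    for log_file in log_dir.glob("*.log"):
        if log_file.stat().st_mtime < cutoff:
            log_file.unlink()
            removed += 1
    return removed
```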

WHAT THIS ENABLES:
==================
After deployment, users can:
1. Set up automated daily/weekly email reports (5 min)
2. Receive HTML+text emails with:
   - New trades, market alerts, suspicious timing
   - Weekly patterns, rankings, repeat offenders
3. Monitor system health automatically
4. Run full CI/CD pipeline on every commit
5. Deploy with confidence (tests + security scans)

USAGE:
======
# One-time setup (on deployed server)
./scripts/setup_cron.sh

# Or manually send reports
python scripts/send_daily_report.py --to user@example.com
python scripts/send_weekly_report.py --to user@example.com

# Check system health
python scripts/health_check.py

See AUTOMATION_QUICKSTART.md for full instructions.

93 tests passing | Full CI/CD | Email reports ready
2025-12-15 15:34:31 -05:00


#!/usr/bin/env python3
"""
Scrape congressional trades from alternative sources.
Options:
1. Senate Stock Watcher (if available)
2. QuiverQuant (requires API key)
3. Capitol Trades (web scraping - be careful)
4. Manual CSV import
"""
import csv
import logging
from datetime import datetime
from pathlib import Path

from pote.db import get_session
from pote.ingestion.trade_loader import TradeLoader

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def import_from_csv(csv_path: str):
    """
    Import trades from a CSV file.

    Expected CSV columns:
    name,party,chamber,state,district,ticker,side,value_min,value_max,transaction_date,disclosure_date
    """
    logger.info(f"Reading trades from {csv_path}")
    side_map = {"buy": "Purchase", "sell": "Sale"}
    transactions = []
    with open(csv_path, "r", newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            # Convert a CSV row to the loader's transaction format
            txn = {
                "representative": row["name"],
                "party": row["party"],
                "house": row["chamber"],  # "House" or "Senate"
                "state": row.get("state", ""),
                "district": row.get("district", ""),
                "ticker": row["ticker"],
                # Loader expects "Purchase"/"Sale"; map raw "buy"/"sell" accordingly
                "transaction": side_map.get(row["side"].strip().lower(), row["side"].capitalize()),
                "amount": f"${row['value_min']} - ${row['value_max']}",
                "transaction_date": row["transaction_date"],
                "disclosure_date": row.get("disclosure_date", row["transaction_date"]),
            }
            transactions.append(txn)
    logger.info(f"Loaded {len(transactions)} transactions from CSV")

    # Ingest into the database
    with next(get_session()) as session:
        loader = TradeLoader(session)
        stats = loader.ingest_transactions(transactions, source="csv_import")
        logger.info(
            f"✅ Ingested: {stats['officials_created']} officials, "
            f"{stats['securities_created']} securities, "
            f"{stats['trades_ingested']} trades"
        )


def create_sample_csv(output_path: str = "trades_template.csv"):
    """Create a template CSV file for manual entry."""
    template_data = [
        {
            "name": "Bernie Sanders",
            "party": "Independent",
            "chamber": "Senate",
            "state": "VT",
            "district": "",
            "ticker": "COIN",
            "side": "sell",
            "value_min": "15001",
            "value_max": "50000",
            "transaction_date": "2024-12-01",
            "disclosure_date": "2024-12-15",
        },
        {
            "name": "Alexandria Ocasio-Cortez",
            "party": "Democrat",
            "chamber": "House",
            "state": "NY",
            "district": "NY-14",
            "ticker": "PLTR",
            "side": "buy",
            "value_min": "1001",
            "value_max": "15000",
            "transaction_date": "2024-11-15",
            "disclosure_date": "2024-12-01",
        },
    ]
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=template_data[0].keys())
        writer.writeheader()
        writer.writerows(template_data)
    logger.info(f"✅ Created template CSV: {output_path}")
    logger.info("Edit this file and run: python scripts/scrape_alternative_sources.py import <file>")


def main():
    """Main entry point."""
    import sys

    if len(sys.argv) < 2:
        print("Usage:")
        print("  python scripts/scrape_alternative_sources.py template          # Create CSV template")
        print("  python scripts/scrape_alternative_sources.py import <csv_file> # Import from CSV")
        sys.exit(1)

    command = sys.argv[1]
    if command == "template":
        create_sample_csv()
    elif command == "import":
        if len(sys.argv) < 3:
            print("Error: Please specify CSV file to import")
            sys.exit(1)
        import_from_csv(sys.argv[2])
    else:
        print(f"Unknown command: {command}")
        sys.exit(1)


if __name__ == "__main__":
    main()