# Architecture (target shape for Phase 1)

This is an intentionally simple architecture optimized for **clarity, idempotency, and testability**.

## High-level flow

1. **Ingest disclosures** (public source API) → normalize → upsert to DB (`officials`, `securities`, `trades`)
2. **Load market data** (daily prices) → upsert to DB (`prices`)
3. **Compute metrics** (returns, benchmarks, aggregates) → write to DB (`metrics_trade`, `metrics_official`)
4. **Query/report** via CLI (later: read-only API/dashboard)

## Proposed module layout (to be created)

```
src/pote/
  __init__.py
  config.py              # settings loader (.env), constants
  db/
    __init__.py
    session.py           # engine + sessionmaker
    models.py            # SQLAlchemy ORM models
    migrations/          # Alembic (added once models stabilize)
  clients/
    __init__.py
    quiver.py            # QuiverQuant client (optional)
    fmp.py               # Financial Modeling Prep client (optional)
    market_data.py       # yfinance wrapper / other provider interface
  etl/
    __init__.py
    congress_trades.py   # disclosure ingestion + upsert
    prices.py            # price ingestion + upsert + caching
  analytics/
    __init__.py
    returns.py           # return & abnormal return calculations
    signals.py           # rule-based “flags” (transparent, caveated)
    aggregations.py      # per-official summaries
  cli/
    __init__.py
    main.py              # entrypoint for research queries
tests/
  ...
```

## Design constraints (non-negotiable)

- **Public data only**: every record must store `source` and enough IDs to trace it back to the original disclosure.
- **No advice**: outputs and docs must avoid prescriptive language and include disclaimers.
- **Idempotency**: ETL and metrics jobs must be safe to rerun.
- **Separation of concerns**:
  - clients fetch raw data
  - etl normalizes + writes
  - analytics reads normalized data and writes derived tables

## Operational conventions

- Logging: structured-ish logs with counts (fetched/inserted/updated/skipped).
- Rate limits: conservative defaults; provide `--sleep`/`--max-requests` config as needed.
- Config: one settings object with env var support; `.env.example` committed, `.env` ignored.
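A minimal sketch of the single settings object, assuming a stdlib-only `config.py`; the project may instead use a library such as pydantic-settings or python-dotenv. The `Settings` fields, the `POTE_*` variable names, and `load_settings` are hypothetical. Real environment variables override values from `.env`, so a CI job can inject settings without a file.

```python
"""Hypothetical sketch of src/pote/config.py: one settings object with env var support."""
from __future__ import annotations

import os
from dataclasses import dataclass
from pathlib import Path


def _read_dotenv(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (no escaping or variable expansion)."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values  # a missing .env is fine; env vars and defaults still apply
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


@dataclass(frozen=True)
class Settings:
    # Field names and defaults are illustrative assumptions, not the project's real config.
    database_url: str = "sqlite:///pote.db"
    request_sleep_seconds: float = 1.0
    max_requests: int = 100


def load_settings(env_file: str = ".env") -> Settings:
    """Build Settings; real environment variables take precedence over .env values."""
    merged = {**_read_dotenv(Path(env_file)), **os.environ}
    defaults = Settings()
    return Settings(
        database_url=merged.get("POTE_DATABASE_URL", defaults.database_url),
        request_sleep_seconds=float(merged.get("POTE_SLEEP", defaults.request_sleep_seconds)),
        max_requests=int(merged.get("POTE_MAX_REQUESTS", defaults.max_requests)),
    )
```

Keeping `Settings` frozen means the CLI can build it once at startup and pass it around without worrying that a job mutates shared configuration mid-run.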
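To illustrate the idempotency constraint together with the inserted/updated/skipped log counts, here is a sketch of a price upsert keyed on a natural key. It swaps in stdlib `sqlite3` for the planned SQLAlchemy layer; the `prices` columns, the `(ticker, date)` key, and `upsert_prices` are assumptions for illustration only.

```python
"""Sketch of an idempotent price upsert using stdlib sqlite3 (stand-in for SQLAlchemy)."""
from __future__ import annotations

import sqlite3
from collections import Counter

# Hypothetical schema: the natural key (ticker, date) is what makes reruns safe.
SCHEMA = """
CREATE TABLE IF NOT EXISTS prices (
    ticker TEXT NOT NULL,
    date   TEXT NOT NULL,
    close  REAL NOT NULL,
    source TEXT NOT NULL,
    PRIMARY KEY (ticker, date)
)
"""


def upsert_prices(conn: sqlite3.Connection, rows: list[dict]) -> Counter:
    """Insert new rows, update changed ones, skip identical ones; return the counts."""
    counts: Counter = Counter()
    with conn:  # one transaction: commits on success, rolls back on error
        for row in rows:
            existing = conn.execute(
                "SELECT close, source FROM prices WHERE ticker = ? AND date = ?",
                (row["ticker"], row["date"]),
            ).fetchone()
            if existing is None:
                conn.execute(
                    "INSERT INTO prices (ticker, date, close, source) VALUES (?, ?, ?, ?)",
                    (row["ticker"], row["date"], row["close"], row["source"]),
                )
                counts["inserted"] += 1
            elif existing != (row["close"], row["source"]):
                conn.execute(
                    "UPDATE prices SET close = ?, source = ? WHERE ticker = ? AND date = ?",
                    (row["close"], row["source"], row["ticker"], row["date"]),
                )
                counts["updated"] += 1
            else:
                counts["skipped"] += 1  # identical row already stored; nothing to do
    return counts
```

Running the same batch twice leaves the table unchanged and reports every row as skipped on the second run. With SQLAlchemy, a similar effect is commonly achieved with the dialect-specific insert constructs (for example `on_conflict_do_update` for PostgreSQL or SQLite), though that form does not report inserted vs. updated counts as directly.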