POTE/docs/01_architecture.md
ilia 204cd0e75b Initial commit: POTE Phase 1 complete
- PR1: Project scaffold, DB models, price loader
- PR2: Congressional trade ingestion (House Stock Watcher)
- PR3: Security enrichment + deployment infrastructure
- 37 passing tests, 87%+ coverage
- Docker + Proxmox deployment ready
- Complete documentation
- Works 100% offline with fixtures
2025-12-14 20:45:34 -05:00

Architecture (target shape for Phase 1)

This is an intentionally simple architecture optimized for clarity, idempotency, and testability.

High-level flow

  1. Ingest disclosures (public source API) → normalize → upsert to DB (officials, securities, trades)
  2. Load market data (daily prices) → upsert to DB (prices)
  3. Compute metrics (returns, benchmarks, aggregates) → write to DB (metrics_trade, metrics_official)
  4. Query/report via CLI (later: read-only API/dashboard)
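
The "normalize" part of step 1 can be sketched as follows. This is a minimal illustration, not the project's actual code: the raw field names (`id`, `representative`, `transaction_date`, `type`), the `NormalizedTrade` class, and the `example_feed` source label are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class NormalizedTrade:
    """Hypothetical normalized row; the real schema lives in db/models.py."""
    source: str          # which public feed the record came from
    source_id: str       # identifier within that feed, kept for traceability
    official_name: str
    ticker: str
    trade_date: date
    side: str            # "buy" or "sell"


def normalize_trade(raw: dict, source: str) -> NormalizedTrade:
    """Map one raw disclosure record (assumed field names) to a clean row."""
    side_map = {"purchase": "buy", "sale": "sell"}
    raw_side = raw["type"].strip().lower()
    return NormalizedTrade(
        source=source,
        source_id=str(raw["id"]),
        # Collapse repeated whitespace in names so upserts match reliably.
        official_name=" ".join(raw["representative"].split()),
        ticker=raw["ticker"].strip().upper(),
        trade_date=datetime.strptime(raw["transaction_date"], "%Y-%m-%d").date(),
        side=side_map.get(raw_side, raw_side),
    )


raw_record = {
    "id": 101,
    "representative": "  Jane   Doe ",
    "ticker": " abc ",
    "transaction_date": "2024-03-15",
    "type": "Purchase",
}
trade = normalize_trade(raw_record, source="example_feed")
print(trade)
```

Keeping normalization a pure function of the raw record makes it easy to test offline against fixtures, before any database is involved.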

Proposed module layout (to be created)

src/pote/
  __init__.py
  config.py               # settings loader (.env), constants
  db/
    __init__.py
    session.py            # engine + sessionmaker
    models.py             # SQLAlchemy ORM models
    migrations/           # Alembic (added once models stabilize)
  clients/
    __init__.py
    quiver.py             # QuiverQuant client (optional)
    fmp.py                # Financial Modeling Prep client (optional)
    market_data.py        # yfinance wrapper / other provider interface
  etl/
    __init__.py
    congress_trades.py    # disclosure ingestion + upsert
    prices.py             # price ingestion + upsert + caching
  analytics/
    __init__.py
    returns.py            # return & abnormal return calculations
    signals.py            # rule-based “flags” (transparent, caveated)
    aggregations.py       # per-official summaries
  cli/
    __init__.py
    main.py               # entrypoint for research queries
tests/
  ...
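
As a rough idea of what `analytics/returns.py` might contain, the sketch below computes a simple return and a market-adjusted abnormal return (security return minus benchmark return over the same window). The function names and the market-adjusted definition are assumptions for illustration; the document does not specify which abnormal-return model is used.

```python
from typing import Sequence


def simple_return(start_price: float, end_price: float) -> float:
    """Fractional price change between two points in time."""
    if start_price <= 0:
        raise ValueError("start_price must be positive")
    return end_price / start_price - 1.0


def abnormal_return(
    security_prices: Sequence[float], benchmark_prices: Sequence[float]
) -> float:
    """Market-adjusted abnormal return over one window (assumed definition).

    Both sequences are expected to cover the same dates; only the first
    and last prices are used.
    """
    if len(security_prices) < 2 or len(benchmark_prices) < 2:
        raise ValueError("need at least two prices per series")
    sec = simple_return(security_prices[0], security_prices[-1])
    bench = simple_return(benchmark_prices[0], benchmark_prices[-1])
    return sec - bench


# Illustrative prices only, not real market data.
security = [100.0, 104.0, 110.0]
benchmark = [400.0, 404.0, 412.0]
ar = abnormal_return(security, benchmark)
print(f"abnormal return: {ar:.4f}")
```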

Design constraints (non-negotiable)

  • Public data only: every record must store its source and enough identifiers to trace it back to the original public record.
  • No advice: outputs and docs must avoid prescriptive language and include disclaimers.
  • Idempotency: ETL and metrics jobs must be safe to rerun.
  • Separation of concerns:
    • clients fetch raw data
    • etl normalizes + writes
    • analytics reads normalized data and writes derived tables
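
The idempotency constraint can be illustrated with an upsert keyed on the source identifiers. The sketch below uses the standard-library `sqlite3` module rather than the SQLAlchemy models the layout proposes, purely to stay self-contained; the `trades` table, its columns, and the `(source, source_id)` key are assumptions.

```python
import sqlite3

# Hypothetical table: the composite key is what makes reruns safe.
SCHEMA = """
CREATE TABLE IF NOT EXISTS trades (
    source     TEXT NOT NULL,
    source_id  TEXT NOT NULL,
    ticker     TEXT NOT NULL,
    amount_usd INTEGER,
    PRIMARY KEY (source, source_id)
)
"""

# Insert new rows; overwrite existing rows that share the same key.
UPSERT = """
INSERT INTO trades (source, source_id, ticker, amount_usd)
VALUES (:source, :source_id, :ticker, :amount_usd)
ON CONFLICT (source, source_id) DO UPDATE SET
    ticker = excluded.ticker,
    amount_usd = excluded.amount_usd
"""


def upsert_trades(conn: sqlite3.Connection, rows: list[dict]) -> int:
    """Upsert rows in one transaction and return the table's row count."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)
    return conn.execute("SELECT COUNT(*) FROM trades").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
rows = [
    {"source": "example_feed", "source_id": "101", "ticker": "ABC", "amount_usd": 15000},
    {"source": "example_feed", "source_id": "102", "ticker": "XYZ", "amount_usd": 50000},
]
first = upsert_trades(conn, rows)
second = upsert_trades(conn, rows)  # rerun with identical input
print(first, second)
```

Running the same batch twice leaves the row count unchanged, which is the property the ETL and metrics jobs need in order to be safe to rerun.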

Operational conventions

  • Logging: lightly structured logs that report per-run counts (fetched/inserted/updated/skipped).
  • Rate limits: conservative defaults; provide --sleep/--max-requests config as needed.
  • Config: one settings object with env var support; .env.example committed, .env ignored.
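
A single settings object with env var support might look like the sketch below. The `Settings` class, the `POTE_*` variable names, and the default values are hypothetical, and loading the `.env` file itself is left out (a dotenv-style loader would populate the environment before `from_env` is called).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; names and defaults are illustrative."""
    database_url: str
    request_sleep_seconds: float  # pause between requests (rate limiting)
    max_requests: int             # per-run request cap

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        # Accept an explicit mapping so tests never touch the real environment.
        env = os.environ if env is None else env
        return cls(
            database_url=env.get("POTE_DATABASE_URL", "sqlite:///pote.db"),
            request_sleep_seconds=float(env.get("POTE_SLEEP_SECONDS", "1.0")),
            max_requests=int(env.get("POTE_MAX_REQUESTS", "100")),
        )


defaults = Settings.from_env({})
overridden = Settings.from_env(
    {"POTE_SLEEP_SECONDS": "2.5", "POTE_MAX_REQUESTS": "10"}
)
print(defaults)
print(overridden)
```

Command-line options such as --sleep and --max-requests could then override the corresponding fields for a single run.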