
# Architecture (target shape for Phase 1)

This is an intentionally simple architecture optimized for **clarity, idempotency, and testability**.

## High-level flow
1. **Ingest disclosures** from a public source API → normalize → upsert to DB (`officials`, `securities`, `trades`)
2. **Load market data** (daily prices) → upsert to DB (`prices`)
3. **Compute metrics** (returns, benchmarks, aggregates) → upsert to DB (`metrics_trade`, `metrics_official`)
4. **Query/report** via CLI (later: read-only API/dashboard)
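Step 3 can be illustrated with a minimal sketch of one per-trade metric. It assumes simple (not log) returns, a market-adjusted abnormal return (stock return minus benchmark return over the same window), and two price series that share one trading calendar. `TradeReturn`, `simple_return`, and `trade_metrics` are hypothetical names, and the choice of benchmark is left to the caller.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional

# Hypothetical names for illustration only; not the project's actual API.


@dataclass(frozen=True)
class TradeReturn:
    """Derived metrics for one trade (a possible shape for a `metrics_trade` row)."""
    trade_id: int
    horizon: int            # holding window, in trading days
    stock_return: float
    benchmark_return: float

    @property
    def abnormal_return(self) -> float:
        # Market-adjusted abnormal return: stock return minus benchmark return.
        return self.stock_return - self.benchmark_return


def simple_return(closes: Dict[date, float], start: date, horizon: int) -> Optional[float]:
    """Simple return from the first close on/after `start` to `horizon` trading days later.

    `closes` maps trading date -> close price. Returns None if the window
    extends past the available data.
    """
    days = sorted(d for d in closes if d >= start)
    if len(days) <= horizon:
        return None
    p0, p1 = closes[days[0]], closes[days[horizon]]
    return p1 / p0 - 1.0


def trade_metrics(
    trade_id: int,
    trade_date: date,
    stock_closes: Dict[date, float],
    bench_closes: Dict[date, float],
    horizon: int,
) -> Optional[TradeReturn]:
    """Compute stock and benchmark returns over the same window; None if data is missing."""
    r_stock = simple_return(stock_closes, trade_date, horizon)
    r_bench = simple_return(bench_closes, trade_date, horizon)
    if r_stock is None or r_bench is None:
        return None
    return TradeReturn(trade_id, horizon, r_stock, r_bench)
```

With made-up closes of 100 → 105 for the stock and 400 → 404 for the benchmark over a two-day horizon, `stock_return` is 0.05 and `abnormal_return` is about 0.04. Returning `None` when the window runs past the available data keeps a rerun from writing partial values.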

## Proposed module layout (to be created)

```
src/pote/
  __init__.py
  config.py               # settings loader (.env), constants
  db/
    __init__.py
    session.py            # engine + sessionmaker
    models.py             # SQLAlchemy ORM models
    migrations/           # Alembic (added once models stabilize)
  clients/
    __init__.py
    quiver.py             # QuiverQuant client (optional)
    fmp.py                # Financial Modeling Prep client (optional)
    market_data.py        # yfinance wrapper / other provider interface
  etl/
    __init__.py
    congress_trades.py    # disclosure ingestion + upsert
    prices.py             # price ingestion + upsert + caching
  analytics/
    __init__.py
    returns.py            # return & abnormal return calculations
    signals.py            # rule-based “flags” (transparent, caveated)
    aggregations.py       # per-official summaries
  cli/
    __init__.py
    main.py               # entrypoint for research queries
tests/
  ...
```
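One way to keep `clients/market_data.py` swappable (yfinance today, another provider later) is to have ETL code depend on a small interface rather than a concrete client. The sketch below assumes a single method, `get_daily_closes`, and uses `typing.Protocol`; `MarketDataProvider`, `InMemoryProvider`, and `fetch_prices` are hypothetical names, and no yfinance-backed implementation is shown.

```python
from datetime import date
from typing import Dict, Iterable, Protocol

# Hypothetical interface for clients/market_data.py; names are illustrative only.


class MarketDataProvider(Protocol):
    """Anything that can return daily closes for a ticker over a date range."""

    def get_daily_closes(self, ticker: str, start: date, end: date) -> Dict[date, float]:
        ...


class InMemoryProvider:
    """Fake provider for unit tests: serves canned prices, makes no network calls."""

    def __init__(self, data: Dict[str, Dict[date, float]]) -> None:
        self._data = data
        self.calls = 0  # lets tests check how many requests the ETL made

    def get_daily_closes(self, ticker: str, start: date, end: date) -> Dict[date, float]:
        self.calls += 1
        series = self._data.get(ticker, {})
        return {d: p for d, p in series.items() if start <= d <= end}


def fetch_prices(
    provider: MarketDataProvider, tickers: Iterable[str], start: date, end: date
) -> Dict[str, Dict[date, float]]:
    """ETL-side helper: depends only on the interface, not on a concrete client."""
    return {t: provider.get_daily_closes(t, start, end) for t in tickers}
```

Because `Protocol` uses structural typing, a real client only needs a matching `get_daily_closes` method; it does not have to inherit from anything, which keeps tests for `etl/prices.py` offline.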

## Design constraints (non-negotiable)
- **Public data only**: every record must store `source` and enough identifiers to trace it back to the original source record.
- **No advice**: outputs and docs must avoid prescriptive language and include disclaimers.
- **Idempotency**: ETL and metrics jobs must be safe to rerun.
- **Separation of concerns**:
  - clients fetch raw data
  - etl normalizes + writes
  - analytics reads normalized data and writes derived tables
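The traceability, idempotency, and separation-of-concerns constraints can be seen together in a minimal sketch of the ETL write path. It uses stdlib `sqlite3` for brevity (the layout above plans SQLAlchemy), keys each trade on `(source, source_id)`, and relies on SQLite's `INSERT ... ON CONFLICT DO UPDATE` (SQLite 3.24+). The raw field names (`id`, `ticker`, `date`, `type`), the column set, and the `normalize` / `upsert_trades` helpers are hypothetical.

```python
import sqlite3
from typing import Any, Dict, Iterable

# Hypothetical schema and helpers for illustration; not the project's real models.
SCHEMA = """
CREATE TABLE IF NOT EXISTS trades (
    source      TEXT NOT NULL,   -- where the record came from (traceability)
    source_id   TEXT NOT NULL,   -- the provider's own identifier for the record
    ticker      TEXT NOT NULL,
    trade_date  TEXT NOT NULL,   -- ISO-8601 date
    side        TEXT NOT NULL,   -- e.g. 'purchase' / 'sale'
    PRIMARY KEY (source, source_id)
)
"""


def normalize(raw: Dict[str, Any], source: str) -> Dict[str, str]:
    """etl layer: map a raw client payload (hypothetical field names) to the schema."""
    return {
        "source": source,
        "source_id": str(raw["id"]),
        "ticker": raw["ticker"].strip().upper(),
        "trade_date": raw["date"],
        "side": raw["type"].lower(),
    }


def upsert_trades(conn: sqlite3.Connection, rows: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Insert new rows, update changed ones, skip identical ones; safe to rerun."""
    counts = {"inserted": 0, "updated": 0, "skipped": 0}
    for row in rows:
        existing = conn.execute(
            "SELECT ticker, trade_date, side FROM trades WHERE source = ? AND source_id = ?",
            (row["source"], row["source_id"]),
        ).fetchone()
        if existing is None:
            counts["inserted"] += 1
        elif tuple(existing) == (row["ticker"], row["trade_date"], row["side"]):
            counts["skipped"] += 1
            continue  # nothing changed; leave the row alone
        else:
            counts["updated"] += 1
        conn.execute(
            """
            INSERT INTO trades (source, source_id, ticker, trade_date, side)
            VALUES (:source, :source_id, :ticker, :trade_date, :side)
            ON CONFLICT (source, source_id) DO UPDATE SET
                ticker = excluded.ticker,
                trade_date = excluded.trade_date,
                side = excluded.side
            """,
            row,
        )
    conn.commit()
    return counts
```

Running `upsert_trades` twice on the same normalized rows inserts on the first run and only skips on the second, so the table never grows from reruns. With SQLAlchemy, the dialect-specific `insert(...).on_conflict_do_update(...)` construct (PostgreSQL and SQLite dialects) could play the same role.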

## Operational conventions
- Logging: structured-ish logs with counts (fetched/inserted/updated/skipped).
- Rate limits: conservative defaults; provide `--sleep`/`--max-requests` config as needed.
- Config: one settings object with env var support; `.env.example` committed, `.env` ignored.
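The three conventions above can be combined in a minimal sketch: one frozen settings object read from environment variables, and a fetch loop that honours a sleep interval and a request cap and logs counts. The `POTE_*` variable names, the defaults, and `fetch_all` are hypothetical; loading `.env` into `os.environ` (for example with python-dotenv's `load_dotenv()`) is assumed to happen before `Settings.from_env()` is called.

```python
import logging
import os
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Mapping, Optional

log = logging.getLogger("pote.etl")


@dataclass(frozen=True)
class Settings:
    """Single settings object; field and env var names are hypothetical."""
    database_url: str = "sqlite:///pote.db"
    request_sleep_s: float = 1.0   # conservative default pause between API calls
    max_requests: int = 100        # hard cap on requests per run

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        # Read overrides from the environment, falling back to the defaults above.
        env = os.environ if env is None else env
        return cls(
            database_url=env.get("POTE_DATABASE_URL", cls.database_url),
            request_sleep_s=float(env.get("POTE_REQUEST_SLEEP_S", cls.request_sleep_s)),
            max_requests=int(env.get("POTE_MAX_REQUESTS", cls.max_requests)),
        )


def fetch_all(
    pages: Iterable[str],
    fetch: Callable[[str], List[object]],
    settings: Settings,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests don't wait
) -> Dict[str, int]:
    """Fetch pages politely and report counts (hypothetical helper)."""
    counts = {"requests": 0, "fetched": 0}
    for page in pages:
        if counts["requests"] >= settings.max_requests:
            log.warning("max_requests=%d reached; stopping early", settings.max_requests)
            break
        if counts["requests"]:
            sleep(settings.request_sleep_s)  # pause between calls, not before the first
        records = fetch(page)
        counts["requests"] += 1
        counts["fetched"] += len(records)
    log.info("fetch done: requests=%d fetched=%d", counts["requests"], counts["fetched"])
    return counts
```

A CLI flag such as `--sleep` or `--max-requests` would then only need to override the corresponding field, so every job reads its limits from the same object.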