POTE/docs/01_architecture.md
ilia 204cd0e75b Initial commit: POTE Phase 1 complete
- PR1: Project scaffold, DB models, price loader
- PR2: Congressional trade ingestion (House Stock Watcher)
- PR3: Security enrichment + deployment infrastructure
- 37 passing tests, 87%+ coverage
- Docker + Proxmox deployment ready
- Complete documentation
- Works 100% offline with fixtures
2025-12-14 20:45:34 -05:00

# Architecture (target shape for Phase 1)
This is an intentionally simple architecture optimized for **clarity, idempotency, and testability**.
## High-level flow
1. **Ingest disclosures** (public source API) → normalize → upsert to DB (`officials`, `securities`, `trades`)
2. **Load market data** (daily prices) → upsert to DB (`prices`)
3. **Compute metrics** (returns, benchmarks, aggregates) → write to DB (`metrics_trade`, `metrics_official`)
4. **Query/report** via CLI (later: read-only API/dashboard)
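
Step 3 can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes an abnormal return is defined as the security's simple return minus the benchmark's simple return over the same window. The names `TradeReturn`, `simple_return`, and `trade_return` are hypothetical, and the prices are made-up values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeReturn:
    """Derived metrics for one trade over a fixed window (hypothetical shape)."""

    security_return: float
    benchmark_return: float
    abnormal_return: float


def simple_return(start_price: float, end_price: float) -> float:
    """Fractional price change between two closing prices."""
    if start_price <= 0:
        raise ValueError("start_price must be positive")
    return end_price / start_price - 1.0


def trade_return(
    sec_start: float, sec_end: float, bench_start: float, bench_end: float
) -> TradeReturn:
    """Assumed definition: abnormal return = security return - benchmark return."""
    sec = simple_return(sec_start, sec_end)
    bench = simple_return(bench_start, bench_end)
    return TradeReturn(sec, bench, sec - bench)


# Made-up prices: the security moves 100 -> 110, the benchmark 400 -> 420.
result = trade_return(100.0, 110.0, 400.0, 420.0)
print(round(result.abnormal_return, 4))
```

Keeping the calculation a pure function of prices means it can be rerun over the same rows and produce the same derived values, which is what makes rewriting `metrics_trade` safe.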
## Proposed module layout (to be created)
```
src/pote/
  __init__.py
  config.py              # settings loader (.env), constants
  db/
    __init__.py
    session.py           # engine + sessionmaker
    models.py            # SQLAlchemy ORM models
    migrations/          # Alembic (added once models stabilize)
  clients/
    __init__.py
    quiver.py            # QuiverQuant client (optional)
    fmp.py               # Financial Modeling Prep client (optional)
    market_data.py       # yfinance wrapper / other provider interface
  etl/
    __init__.py
    congress_trades.py   # disclosure ingestion + upsert
    prices.py            # price ingestion + upsert + caching
  analytics/
    __init__.py
    returns.py           # return & abnormal return calculations
    signals.py           # rule-based “flags” (transparent, caveated)
    aggregations.py      # per-official summaries
  cli/
    __init__.py
    main.py              # entrypoint for research queries
tests/
  ...
```
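
The "provider interface" noted for `market_data.py` could take the shape below. This is a sketch under assumptions: `PriceBar`, `MarketDataProvider`, `FixtureProvider`, and `count_bars` are hypothetical names, and the bars are made-up data. It uses `typing.Protocol` so that a yfinance-backed class and an offline fixture class can be swapped without inheritance.

```python
from dataclasses import dataclass
from datetime import date
from typing import Protocol


@dataclass(frozen=True)
class PriceBar:
    """One daily close for one ticker (hypothetical record shape)."""

    ticker: str
    day: date
    close: float


class MarketDataProvider(Protocol):
    """Anything that can return daily bars for a ticker and date range."""

    def daily_prices(self, ticker: str, start: date, end: date) -> list[PriceBar]: ...


class FixtureProvider:
    """Offline provider backed by in-memory bars, intended for tests."""

    def __init__(self, bars: list[PriceBar]) -> None:
        self._bars = bars

    def daily_prices(self, ticker: str, start: date, end: date) -> list[PriceBar]:
        # Inclusive date range filter on the stored bars.
        return [b for b in self._bars if b.ticker == ticker and start <= b.day <= end]


def count_bars(provider: MarketDataProvider, ticker: str, start: date, end: date) -> int:
    """ETL-side code depends only on the protocol, not on a concrete client."""
    return len(provider.daily_prices(ticker, start, end))


# Made-up bars for two hypothetical tickers.
bars = [
    PriceBar("XYZ", date(2024, 1, 2), 10.0),
    PriceBar("XYZ", date(2024, 1, 3), 10.5),
    PriceBar("ABC", date(2024, 1, 2), 20.0),
]
provider = FixtureProvider(bars)
n_bars = count_bars(provider, "XYZ", date(2024, 1, 1), date(2024, 1, 31))
```

With this shape, the code under `etl/` never imports a specific vendor library, so tests can run against fixtures only.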
## Design constraints (non-negotiable)
- **Public data only**: every record must store `source` and enough IDs to trace it back to the original public record.
- **No advice**: outputs and docs must avoid prescriptive language and include disclaimers.
- **Idempotency**: ETL and metrics jobs must be safe to rerun.
- **Separation of concerns**:
  - clients fetch raw data
  - etl normalizes + writes
  - analytics reads normalized data and writes derived tables
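
The idempotency constraint can be illustrated with an upsert keyed on a natural key. This sketch deliberately swaps in the standard-library `sqlite3` module instead of SQLAlchemy so that it runs on its own. The `prices` columns, the `(ticker, day)` key, and the helper names are assumptions, and the rows are made-up data.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory DB with an assumed `prices` schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE prices (
            ticker TEXT NOT NULL,
            day    TEXT NOT NULL,
            close  REAL NOT NULL,
            source TEXT NOT NULL,
            PRIMARY KEY (ticker, day)
        )
        """
    )
    return conn


def upsert_prices(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Insert new rows; overwrite close/source when (ticker, day) already exists."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            """
            INSERT INTO prices (ticker, day, close, source)
            VALUES (?, ?, ?, ?)
            ON CONFLICT (ticker, day) DO UPDATE SET
                close = excluded.close,
                source = excluded.source
            """,
            rows,
        )


conn = make_db()
rows = [("XYZ", "2024-01-02", 10.0, "fixture"), ("XYZ", "2024-01-03", 10.5, "fixture")]
upsert_prices(conn, rows)
upsert_prices(conn, rows)  # rerun: no duplicates, no error
upsert_prices(conn, [("XYZ", "2024-01-02", 10.25, "fixture")])  # corrected value

row_count = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
latest_close = conn.execute(
    "SELECT close FROM prices WHERE ticker = ? AND day = ?", ("XYZ", "2024-01-02")
).fetchone()[0]
```

Rerunning the job leaves two rows in the table, and a later correction replaces the stored value rather than adding a third row. `ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer.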
## Operational conventions
- Logging: structured-ish logs that report per-run counts (fetched/inserted/updated/skipped).
- Rate limits: conservative defaults; provide `--sleep`/`--max-requests` config as needed.
- Config: one settings object with env var support; `.env.example` committed, `.env` ignored.
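
The conventions above might look like this in code. It is a rough sketch: the environment variable names (`POTE_DATABASE_URL`, `POTE_SLEEP`, `POTE_MAX_REQUESTS`), the default values, and the `Settings` and `format_counts` names are all assumptions. It also assumes the variables from `.env` have already been loaded into the process environment by some other means.

```python
import logging
import os
from collections import Counter
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Single settings object; field names and defaults are illustrative."""

    database_url: str
    request_sleep_seconds: float
    max_requests: int

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        # Accept an explicit mapping so tests do not depend on os.environ.
        env = os.environ if env is None else env
        return cls(
            database_url=env.get("POTE_DATABASE_URL", "sqlite:///pote.db"),
            request_sleep_seconds=float(env.get("POTE_SLEEP", "1.0")),
            max_requests=int(env.get("POTE_MAX_REQUESTS", "100")),
        )


def format_counts(job: str, counts: Counter) -> str:
    """Render one key=value log line with the four per-run counters."""
    keys = ("fetched", "inserted", "updated", "skipped")
    parts = " ".join(f"{k}={counts[k]}" for k in keys)
    return f"job={job} {parts}"


settings = Settings.from_env({"POTE_MAX_REQUESTS": "25"})
counts = Counter(fetched=10, inserted=7, updated=2, skipped=1)
line = format_counts("prices", counts)
logging.getLogger("pote").info(line)
```

A `Counter` returns zero for counters that were never incremented, so every log line carries all four fields and stays easy to grep.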