# MVP (Phase 1): US Congress prototype

This document defines a **minimal viable research system** for ingesting U.S. Congress trade disclosures, storing them in a relational DB, joining them to daily price data, and computing a small set of descriptive metrics.

## Non-goals (explicit)

- No trade execution, brokerage integration, “buy/sell” alerts, or portfolio automation.
- No claims of insider information.
- No promises of alpha; all outputs are descriptive analytics with caveats.

## MVP definition (what “done” means)

The MVP is “done” when a researcher can:

- Ingest recent U.S. Congress trade disclosures from at least **one** public source (e.g., QuiverQuant or FMP) into the DB.
- Ingest daily prices for the traded tickers (e.g., via yfinance) into the DB.
- Run a query/report that shows, for a given official and date range:
  - trades (buy/sell, transaction and filing dates, amount/value range when available)
  - post-trade returns over fixed windows (e.g., 1M/3M/6M), minus the return of a simple benchmark (e.g., SPY) over the same window, giving the **abnormal return**
- Compute and store a small set of **risk/ethics flags** (rule-based, transparent, caveated).
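The return and flag metrics above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the planned implementation. 1M/3M/6M are approximated as 21/63/126 trading days. The entry price is the close on the first trading day on or after the transaction date. Abnormal return takes the simple market-adjusted form: stock return minus benchmark return over the same window. The function names and the 45-day default for the filing-gap flag are hypothetical.

```python
from bisect import bisect_left
from datetime import date, timedelta
from typing import List, Optional, Sequence, Tuple

# Assumption: calendar windows approximated as trading-day counts.
WINDOWS = {"1M": 21, "3M": 63, "6M": 126}

# (trading day, close) pairs, sorted by date ascending.
PriceSeries = Sequence[Tuple[date, float]]


def forward_return(prices: PriceSeries, start: date, horizon: int) -> Optional[float]:
    """Simple return from the first trading day on/after `start` to `horizon` trading days later."""
    days: List[date] = [d for d, _ in prices]
    i = bisect_left(days, start)  # index of the first trading day >= start
    j = i + horizon
    if j >= len(prices):
        return None  # window not complete yet: leave the metric empty rather than guess
    return prices[j][1] / prices[i][1] - 1.0


def abnormal_return(
    stock: PriceSeries, benchmark: PriceSeries, start: date, horizon: int
) -> Optional[float]:
    """Market-adjusted abnormal return: stock return minus benchmark (e.g. SPY) return."""
    r_stock = forward_return(stock, start, horizon)
    r_bench = forward_return(benchmark, start, horizon)
    if r_stock is None or r_bench is None:
        return None
    return r_stock - r_bench


def filing_gap_flag(transaction_date: date, filing_date: date, max_days: int = 45) -> bool:
    """Hypothetical descriptive flag: True if filed more than `max_days` after the trade."""
    return (filing_date - transaction_date).days > max_days


if __name__ == "__main__":
    # Synthetic series; consecutive calendar days stand in for trading days.
    days = [date(2024, 1, 1) + timedelta(days=k) for k in range(200)]
    stock = [(d, 100.0 + k) for k, d in enumerate(days)]
    spy = [(d, 100.0 + 0.5 * k) for k, d in enumerate(days)]
    for label, horizon in WINDOWS.items():
        ar = abnormal_return(stock, spy, date(2024, 1, 1), horizon)
        print(label, None if ar is None else round(ar, 4))
    print(filing_gap_flag(date(2024, 1, 1), date(2024, 3, 1)))  # 60-day gap
```

Returning `None` for incomplete windows keeps recent trades out of aggregates until their full window has elapsed. This also fits the plan's requirement that metric steps can be re-run safely as new prices arrive.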
## PR-sized rollout plan (sequence)

### PR 1: Project scaffold + tooling (small, boring, reliable)

- Create the `src/` + `tests/` layout
- Add `pyproject.toml` with formatting/lint/test tooling
- Add `.env.example` + a settings loader
- Update the `README`: how to run tests and configure the DB

### PR 2: Database + schema (SQLAlchemy + Alembic)

- SQLAlchemy models for:
  - `officials`
  - `securities`
  - `trades`
  - `prices`
  - `metrics_trade` (derived metrics per trade)
  - `metrics_official` (aggregates)
- Alembic migration + SQLite dev default
- Tests: model constraints + simple insert/query smoke tests

### PR 3: API client for Congress trade disclosures (one source)

- Implement a small client module (requests/httpx)
- Add retry/backoff + basic rate limiting
- Normalize raw payloads → internal dataclasses/pydantic models
- Tests: unit tests with mocked HTTP responses

### PR 4: ETL upsert of officials/securities/trades

- Idempotent ETL job:
  - fetch recent disclosures
  - normalize
  - upsert into the DB
- Log counts (new/updated/skipped)
- Tests: idempotency and upsert behavior with SQLite

### PR 5: Price loader (daily bars)

- Given tickers + a date range: fetch prices (e.g., yfinance) and upsert
- Basic caching:
  - don’t refetch days already present unless forced
  - fetch missing ranges only
- Tests: caching behavior (mock provider)

### PR 6: Metrics + first “research signals” (non-advice)

- Compute per trade:
  - forward returns (1M/3M/6M)
  - benchmark returns (SPY) and abnormal returns
- Store to `metrics_trade`
- Aggregate to `metrics_official`
- Add **transparent flags** (examples):
  - `watch_large_trade`: disclosed value range above a configurable threshold
  - `watch_fast_filing_gap`: unusually long or otherwise atypical gap between transaction and filing dates (descriptive only)
  - `watch_sensitive_sector`: sector in a configurable list (research-only heuristic)
- Tests: deterministic calculations on synthetic price series

### PR 7: CLI / query helpers (research workflow)

- CLI commands:
  - “show trades for official”
  - “top officials by average abnormal return (with sample size)”
  - “sector interest trend”
- All outputs include: **“research only, not investment advice”**

## Key MVP decisions (defaults)

- **DB**: SQLite by default for dev; Postgres supported via environment configuration.
- **Time**: store all dates in ISO 8601 format; use timezone-aware datetimes where needed.
- **Idempotency**: every ingestion and metric step can be re-run safely.
- **Reproducibility**: record the data source and raw identifiers for traceability.
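A minimal sketch can show how the idempotency, time, and reproducibility defaults work together. To stay self-contained it uses the standard-library `sqlite3` module rather than the planned SQLAlchemy models. The `trades` columns, the `(source, source_id)` uniqueness key, the `TradeRecord` type, and the `upsert_trades` helper are hypothetical. Dates are stored as ISO 8601 text, and each row keeps its upstream identifier. Re-running the same batch changes nothing and is counted as skipped, which is the kind of behavior PR 4's idempotency tests would check.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional

# Hypothetical minimal schema: ISO 8601 text dates; (source, source_id) keeps the
# raw upstream identifier so every row can be traced back to where it came from.
SCHEMA = """
CREATE TABLE IF NOT EXISTS trades (
    id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,
    source_id TEXT NOT NULL,
    official TEXT NOT NULL,
    ticker TEXT NOT NULL,
    side TEXT NOT NULL CHECK (side IN ('buy', 'sell')),
    transaction_date TEXT NOT NULL,
    filing_date TEXT NOT NULL,
    amount_range TEXT,
    UNIQUE (source, source_id)
);
"""

FIELDS = ("official", "ticker", "side", "transaction_date", "filing_date", "amount_range")


@dataclass(frozen=True)
class TradeRecord:
    source: str
    source_id: str
    official: str
    ticker: str
    side: str
    transaction_date: date
    filing_date: date
    amount_range: Optional[str] = None

    def values(self) -> tuple:
        """Non-key fields in FIELDS order, with dates serialized as ISO 8601 strings."""
        return (self.official, self.ticker, self.side,
                self.transaction_date.isoformat(), self.filing_date.isoformat(),
                self.amount_range)


def upsert_trades(conn: sqlite3.Connection, records: Iterable[TradeRecord]) -> Dict[str, int]:
    """Insert new trades, update changed ones, skip identical ones; safe to re-run."""
    counts = {"new": 0, "updated": 0, "skipped": 0}
    cols = ", ".join(FIELDS)
    for r in records:
        key = (r.source, r.source_id)
        existing = conn.execute(
            f"SELECT {cols} FROM trades WHERE source = ? AND source_id = ?", key
        ).fetchone()
        if existing is None:
            conn.execute(
                f"INSERT INTO trades (source, source_id, {cols}) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                key + r.values(),
            )
            counts["new"] += 1
        elif tuple(existing) == r.values():
            counts["skipped"] += 1  # identical row already stored: nothing to do
        else:
            assignments = ", ".join(f"{c} = ?" for c in FIELDS)
            conn.execute(
                f"UPDATE trades SET {assignments} WHERE source = ? AND source_id = ?",
                r.values() + key,
            )
            counts["updated"] += 1
    conn.commit()
    return counts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder values, not real disclosure data.
    rec = TradeRecord("example_source", "abc-1", "Official A", "XYZ", "buy",
                      date(2024, 1, 2), date(2024, 2, 1), "$1,001 - $15,000")
    print(upsert_trades(conn, [rec]))  # {'new': 1, 'updated': 0, 'skipped': 0}
    print(upsert_trades(conn, [rec]))  # {'new': 0, 'updated': 0, 'skipped': 1}
```

The returned counts map directly onto the new/updated/skipped logging planned for the ETL job. With SQLAlchemy on Postgres, a dialect-level `ON CONFLICT` upsert would be the more typical choice, though it makes per-row counting less direct.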