# MVP (Phase 1) — US Congress prototype

This document defines a **minimal viable research system** for ingesting U.S. Congress trade disclosures, storing them in a relational DB, joining them to daily price data, and computing a small set of descriptive metrics.

## Non-goals (explicit)
- No trading execution, brokerage integration, alerts for “buy/sell”, or portfolio automation.
- No claims of insider information.
- No promises of alpha; all outputs are descriptive analytics with caveats.

## MVP definition (what “done” means)
The MVP is “done” when a researcher can:
- Ingest recent U.S. Congress trade disclosures from at least **one** public source (e.g., QuiverQuant or FMP) into a DB.
- Ingest daily prices for traded tickers (e.g., yfinance) into the DB.
- Run a query/report that shows, for an official and date range:
  - trades (buy/sell, transaction + filing dates, amount/value range when available)
  - post-trade returns over fixed windows (e.g., 1M/3M/6M), compared against a simple benchmark (e.g., SPY) to produce an **abnormal return** (trade return minus benchmark return over the same window)
- Compute and store a small set of **risk/ethics flags** (rule-based, transparent, caveated).

## PR-sized rollout plan (sequence)
### PR 1 — Project scaffold + tooling (small, boring, reliable)
- Create `src/` + `tests/` layout
- Add `pyproject.toml` with formatting/lint/test tooling
- Add `.env.example` + settings loader
- Update `README`: how to run tests and configure the DB

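A settings loader of the kind listed above could look roughly like the sketch below. The variable names `DATABASE_URL` and `BENCHMARK_TICKER`, their defaults, and the `Settings` fields are illustrative assumptions, not names taken from the project.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Runtime configuration; field names are illustrative assumptions."""
    database_url: str
    benchmark_ticker: str


def load_settings(env=None):
    """Build Settings from a mapping of environment variables.

    Passing a plain dict instead of os.environ keeps tests offline and
    deterministic.
    """
    env = os.environ if env is None else env
    return Settings(
        # Assumed default: a local SQLite file for development.
        database_url=env.get("DATABASE_URL", "sqlite:///dev.db"),
        benchmark_ticker=env.get("BENCHMARK_TICKER", "SPY"),
    )
```

Taking the environment as a parameter means the loader can be exercised in tests without mutating the real process environment.
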
### PR 2 — Database + schema (SQLAlchemy + Alembic)
- SQLAlchemy models for:
  - `officials`
  - `securities`
  - `trades`
  - `prices`
  - `metrics_trade` (derived metrics per trade)
  - `metrics_official` (aggregates)
- Alembic migration + SQLite dev default
- Tests: model constraints + simple insert/query smoke tests

### PR 3 — API client: Congress trade disclosures (one source)
- Implement a small client module (requests/httpx)
- Add retry/backoff + basic rate limiting
- Normalize raw payloads → internal dataclasses/pydantic models
- Tests: unit tests with mocked HTTP responses

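The retry/backoff and normalization steps for this client might be sketched as follows, using only the standard library. The `fetch` callable stands in for whichever HTTP client is chosen. The payload keys (`representative`, `ticker`, `type`, `transaction_date`, `disclosure_date`) and the `TradeRecord` fields are hypothetical, since the plan does not fix a source schema.

```python
import time
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TradeRecord:
    """Internal representation of one disclosure (hypothetical fields)."""
    official_name: str
    ticker: str
    side: str
    transaction_date: date
    filing_date: date


def with_retry(fetch, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the delay after each failure.

    Only ConnectionError is retried here; a real client would map its own
    transport errors onto this. The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def normalize(raw):
    """Map one raw payload dict (assumed keys) to a TradeRecord."""
    return TradeRecord(
        official_name=raw["representative"].strip(),
        ticker=raw["ticker"].strip().upper(),
        side=raw["type"].strip().lower(),
        transaction_date=date.fromisoformat(raw["transaction_date"]),
        filing_date=date.fromisoformat(raw["disclosure_date"]),
    )
```

Injecting `sleep` lets unit tests record the backoff schedule instead of actually waiting, which fits the mocked-HTTP testing approach listed above.
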
### PR 4 — ETL: upsert officials/securities/trades
- Idempotent ETL job:
  - fetch recent disclosures
  - normalize
  - upsert into DB
- Logging of counts (new/updated/skipped)
- Tests: idempotency and upsert behavior with SQLite

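As a rough illustration of the idempotent upsert with new/updated/skipped counts, the sketch below uses the standard-library `sqlite3` module rather than SQLAlchemy, purely to stay self-contained. The reduced `trades` table and the `(source, external_id)` natural key are assumptions made for the example.

```python
import sqlite3

# Hypothetical minimal table; the real `trades` model would carry more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS trades (
    source      TEXT NOT NULL,
    external_id TEXT NOT NULL,
    ticker      TEXT NOT NULL,
    side        TEXT NOT NULL,
    UNIQUE (source, external_id)
)
"""


def upsert_trades(conn, rows):
    """Insert or update rows keyed on (source, external_id); return counts."""
    counts = {"new": 0, "updated": 0, "skipped": 0}
    for row in rows:
        key = (row["source"], row["external_id"])
        values = (row["ticker"], row["side"])
        existing = conn.execute(
            "SELECT ticker, side FROM trades WHERE source = ? AND external_id = ?",
            key,
        ).fetchone()
        if existing is None:
            conn.execute(
                "INSERT INTO trades (ticker, side, source, external_id) "
                "VALUES (?, ?, ?, ?)",
                values + key,
            )
            counts["new"] += 1
        elif tuple(existing) != values:
            conn.execute(
                "UPDATE trades SET ticker = ?, side = ? "
                "WHERE source = ? AND external_id = ?",
                values + key,
            )
            counts["updated"] += 1
        else:
            # Identical row already stored: re-running the job is a no-op.
            counts["skipped"] += 1
    conn.commit()
    return counts
```

Running the same batch twice should report every row as skipped on the second pass, which is the property the idempotency tests would check.
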
### PR 5 — Price loader (daily bars)
- Given tickers + date range: fetch prices (e.g., yfinance) and upsert
- Basic caching:
  - don’t refetch days already present unless forced
  - fetch missing ranges only
- Tests: caching behavior (mock provider)

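The "fetch missing ranges only" rule can be sketched as a pure function over dates, shown below. The function name and the `force` parameter are hypothetical. For simplicity it treats every calendar day as expected, so a real loader would likely need a trading calendar to avoid re-requesting weekends and holidays.

```python
from datetime import date, timedelta


def missing_ranges(start, end, have, force=False):
    """Return inclusive (start, end) date ranges within [start, end] to fetch.

    `have` is the set of dates already stored for one ticker. With
    force=True the whole range is returned regardless of what is stored.
    """
    if force:
        return [(start, end)]
    ranges = []
    run_start = None
    day = start
    one_day = timedelta(days=1)
    while day <= end:
        if day not in have:
            if run_start is None:
                run_start = day  # a new gap begins here
        elif run_start is not None:
            ranges.append((run_start, day - one_day))  # gap ended yesterday
            run_start = None
        day += one_day
    if run_start is not None:
        ranges.append((run_start, end))
    return ranges
```

Keeping this logic separate from the provider call makes the caching behavior testable with a mock provider and no network access.
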
### PR 6 — Metrics + first “research signals” (non-advice)
- Compute per-trade:
  - forward returns (1M/3M/6M)
  - benchmark returns (SPY) and abnormal returns
- Store to `metrics_trade`
- Aggregate to `metrics_official`
- Add **transparent flags** (examples):
  - `watch_large_trade`: trade value range above a configurable threshold
  - `watch_fast_filing_gap`: long or otherwise unusual gap between transaction and filing dates (descriptive)
  - `watch_sensitive_sector`: sector in a configurable list (research-only heuristic)
- Tests: deterministic calculations on synthetic price series

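The per-trade calculation might look like the sketch below, which takes the abnormal return to be the trade's forward return minus the benchmark's forward return over the same window. Mapping 1M/3M/6M to 30/90/180 calendar days, and using the first available close on or after each date, are assumed conventions for the example and are not specified in the plan.

```python
from bisect import bisect_left
from datetime import date, timedelta

# Assumed convention: windows expressed in calendar days.
WINDOWS = {"1M": 30, "3M": 90, "6M": 180}


def price_on_or_after(series, day):
    """Return the first close on or after `day`, or None if there is none.

    `series` is a list of (date, close) tuples sorted by date.
    """
    idx = bisect_left(series, (day,))
    return series[idx][1] if idx < len(series) else None


def forward_return(series, trade_date, days):
    """Simple return from the trade date to `days` calendar days later."""
    start = price_on_or_after(series, trade_date)
    end = price_on_or_after(series, trade_date + timedelta(days=days))
    if start is None or end is None:
        return None  # not enough price history yet
    return end / start - 1.0


def abnormal_returns(stock, benchmark, trade_date):
    """Per-window stock return minus benchmark return (None if unavailable)."""
    out = {}
    for label, days in WINDOWS.items():
        r = forward_return(stock, trade_date, days)
        b = forward_return(benchmark, trade_date, days)
        out[label] = None if r is None or b is None else r - b
    return out
```

Returning `None` for windows that have not yet elapsed keeps recent trades from being stored with misleading partial-window figures.
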
### PR 7 — CLI / query helpers (research workflow)
- CLI commands:
  - “show trades for official”
  - “top officials by average abnormal return (with sample size)”
  - “sector interest trend”
- All outputs include: **“research only, not investment advice”**

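One possible shape for the CLI, sketched with the standard-library `argparse`, is shown below. The program name `congress-research`, the subcommand names, and their options are hypothetical. Only the disclaimer text comes from the plan.

```python
import argparse

DISCLAIMER = "research only, not investment advice"


def build_parser():
    """Build the argument parser; command and option names are assumptions."""
    parser = argparse.ArgumentParser(prog="congress-research")
    sub = parser.add_subparsers(dest="command", required=True)

    trades = sub.add_parser("trades", help="show trades for an official")
    trades.add_argument("--official", required=True)
    trades.add_argument("--start", help="ISO date, inclusive")
    trades.add_argument("--end", help="ISO date, inclusive")

    top = sub.add_parser(
        "top-officials",
        help="rank officials by average abnormal return (with sample size)",
    )
    top.add_argument("--min-trades", type=int, default=5)

    sub.add_parser("sector-trend", help="sector interest trend")
    return parser


def render(lines):
    """Join report lines and append the disclaimer to every output."""
    return "\n".join([*lines, "", DISCLAIMER])
```

Routing every command's output through a single `render` helper is one way to guarantee that the disclaimer is never omitted.
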
## Key MVP decisions (defaults)
- **DB**: SQLite by default for dev; Postgres supported via environment configuration.
- **Time**: store all dates in ISO 8601 format; use timezone-aware datetimes where needed.
- **Idempotency**: every ingestion and metric step can be re-run safely.
- **Reproducibility**: record data source and raw identifiers for traceability.