POTE/docs/00_mvp.md
ilia 204cd0e75b Initial commit: POTE Phase 1 complete
- PR1: Project scaffold, DB models, price loader
- PR2: Congressional trade ingestion (House Stock Watcher)
- PR3: Security enrichment + deployment infrastructure
- 37 passing tests, 87%+ coverage
- Docker + Proxmox deployment ready
- Complete documentation
- Works 100% offline with fixtures
2025-12-14 20:45:34 -05:00

# MVP (Phase 1) — US Congress prototype
This document defines a **minimum viable research system** for ingesting U.S. Congress trade disclosures, storing them in a relational DB, joining them to daily price data, and computing a small set of descriptive metrics.
## Non-goals (explicit)
- No trading execution, brokerage integration, alerts for “buy/sell”, or portfolio automation.
- No claims of insider information.
- No promises of alpha; all outputs are descriptive analytics with caveats.
## MVP definition (what “done” means)
The MVP is “done” when a researcher can:
- Ingest recent U.S. Congress trade disclosures from at least **one** public source (e.g., QuiverQuant or FMP) into a DB.
- Ingest daily prices for traded tickers (e.g., yfinance) into the DB.
- Run a query/report that shows, for an official and date range:
  - trades (buy/sell, transaction + filing dates, amount/value range when available)
  - post-trade returns over fixed windows (e.g., 1M/3M/6M), compared against a simple benchmark (e.g., SPY) over the same window to produce an **abnormal return** (trade return minus benchmark return)
- Compute and store a small set of **risk/ethics flags** (rule-based, transparent, caveated).
## PR-sized rollout plan (sequence)
### PR 1 — Project scaffold + tooling (small, boring, reliable)
- Create `src/` + `tests/` layout
- Add `pyproject.toml` with formatting/lint/test tooling
- Add `.env.example` + settings loader
- Update `README`: how to run tests and configure the DB
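
As a rough illustration of the settings loader, the sketch below reads configuration from environment variables and falls back to a SQLite dev default. The variable names `POTE_DATABASE_URL` and `POTE_LOG_LEVEL`, the `Settings` class, and the default file name are hypothetical; the plan does not specify them.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ).

    Variable names and defaults are illustrative assumptions.
    """
    env = os.environ if env is None else env
    return Settings(
        # SQLite is the dev default; a Postgres URL can be supplied via env.
        database_url=env.get("POTE_DATABASE_URL", "sqlite:///pote_dev.db"),
        log_level=env.get("POTE_LOG_LEVEL", "INFO").upper(),
    )
```

Passing the mapping explicitly keeps the loader easy to test without touching the real process environment.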
### PR 2 — Database + schema (SQLAlchemy + Alembic)
- SQLAlchemy models for:
  - `officials`
  - `securities`
  - `trades`
  - `prices`
  - `metrics_trade` (derived metrics per trade)
  - `metrics_official` (aggregates)
- Alembic migration + SQLite dev default
- Tests: model constraints + simple insert/query smoke tests
### PR 3 — API client: Congress trade disclosures (one source)
- Implement a small client module (requests/httpx)
- Add retry/backoff + basic rate limiting
- Normalize raw payloads → internal dataclasses/pydantic models
- Tests: unit tests with mocked HTTP responses
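
The retry and normalization steps might look roughly like the sketch below. It is transport-agnostic: `fetch` is any callable that performs the HTTP request, so the same wrapper would work with requests or httpx. The raw field names (`representative`, `type`, `disclosure_date`, and so on), the `TradeRecord` class, and the retry defaults are assumptions for illustration, not the schema of any particular provider. Rate limiting is not shown.

```python
import time
from dataclasses import dataclass
from datetime import date

# Assumed mapping from raw transaction types to internal sides.
SIDE_ALIASES = {"purchase": "buy", "buy": "buy", "sale": "sell", "sell": "sell"}


@dataclass(frozen=True)
class TradeRecord:
    official_name: str
    ticker: str
    side: str  # "buy" or "sell"
    transaction_date: date
    filing_date: date


def with_retry(fetch, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the delay after each failure."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


def normalize(raw):
    """Map one raw payload dict to a TradeRecord (field names are assumed)."""
    return TradeRecord(
        official_name=raw["representative"].strip(),
        ticker=raw["ticker"].strip().upper(),
        side=SIDE_ALIASES[raw["type"].strip().lower()],  # KeyError on unknown types
        transaction_date=date.fromisoformat(raw["transaction_date"]),
        filing_date=date.fromisoformat(raw["disclosure_date"]),
    )
```

Injecting `sleep` lets unit tests record the backoff delays instead of actually waiting, which fits the mocked-HTTP testing approach.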
### PR 4 — ETL: upsert officials/securities/trades
- Idempotent ETL job:
  - fetch recent disclosures
  - normalize
  - upsert into DB
- Logging of counts (new/updated/skipped)
- Tests: idempotency and upsert behavior with SQLite
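
To make the idempotency requirement concrete, here is a minimal sketch of an upsert that reports new/updated/skipped counts. It uses the standard-library `sqlite3` module instead of SQLAlchemy to stay self-contained, and a select-then-write approach instead of a native `ON CONFLICT` clause so the three counts are easy to derive. The table layout and the `(source, source_id)` key are assumptions.

```python
import sqlite3

# Assumed minimal trades table; the natural key is the source plus its raw id.
SCHEMA = """
CREATE TABLE IF NOT EXISTS trades (
    source      TEXT NOT NULL,
    source_id   TEXT NOT NULL,
    official    TEXT NOT NULL,
    ticker      TEXT NOT NULL,
    side        TEXT NOT NULL,
    trade_date  TEXT NOT NULL,
    filing_date TEXT NOT NULL,
    PRIMARY KEY (source, source_id)
)
"""

COLUMNS = ("source", "source_id", "official", "ticker", "side", "trade_date", "filing_date")


def upsert_trades(conn, rows):
    """Insert or update rows keyed by (source, source_id); return counts."""
    counts = {"new": 0, "updated": 0, "skipped": 0}
    for row in rows:
        key = (row["source"], row["source_id"])
        values = tuple(row[c] for c in COLUMNS)
        existing = conn.execute(
            "SELECT " + ", ".join(COLUMNS)
            + " FROM trades WHERE source = ? AND source_id = ?",
            key,
        ).fetchone()
        if existing is None:
            conn.execute("INSERT INTO trades VALUES (?, ?, ?, ?, ?, ?, ?)", values)
            counts["new"] += 1
        elif tuple(existing) != values:
            conn.execute(
                "UPDATE trades SET official = ?, ticker = ?, side = ?, "
                "trade_date = ?, filing_date = ? "
                "WHERE source = ? AND source_id = ?",
                values[2:] + key,
            )
            counts["updated"] += 1
        else:
            counts["skipped"] += 1  # identical row already stored
    conn.commit()
    return counts
```

Running the same batch twice should report every row as skipped on the second pass, which is the property the idempotency tests would assert.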
### PR 5 — Price loader (daily bars)
- Given tickers + date range: fetch prices (e.g., yfinance) and upsert
- Basic caching:
  - don't refetch days already present unless forced
  - fetch missing ranges only
- Tests: caching behavior (mock provider)
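
The "fetch missing ranges only" rule can be sketched as a pure function that compares the requested range with the dates already stored. The function name and signature are hypothetical, and calendar days are used for simplicity; a real loader would likely skip weekends and market holidays.

```python
from datetime import date, timedelta


def missing_ranges(start: date, end: date, have: set[date],
                   force: bool = False) -> list[tuple[date, date]]:
    """Return inclusive (first, last) ranges within [start, end] to fetch.

    `have` is the set of dates already stored for the ticker. With
    force=True every day counts as missing, so the full range is refetched.
    """
    ranges = []
    run_start = None
    day = start
    while day <= end:
        needed = force or day not in have
        if needed and run_start is None:
            run_start = day  # a new gap begins here
        elif not needed and run_start is not None:
            ranges.append((run_start, day - timedelta(days=1)))
            run_start = None  # the gap ended on the previous day
        day += timedelta(days=1)
    if run_start is not None:
        ranges.append((run_start, end))
    return ranges
```

Because the function has no I/O, the caching tests can check it directly and then verify that a mocked provider is called once per returned range.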
### PR 6 — Metrics + first “research signals” (non-advice)
- Compute per-trade:
  - forward returns (1M/3M/6M)
  - benchmark returns (SPY) and abnormal returns
- Store to `metrics_trade`
- Aggregate to `metrics_official`
- Add **transparent flags** (examples):
  - `watch_large_trade`: above configurable value range threshold
  - `watch_fast_filing_gap`: unusually long gap between transaction date and filing date, above a configurable number of days (descriptive)
  - `watch_sensitive_sector`: sector in a configurable list (research-only heuristic)
- Tests: deterministic calculations on synthetic price series
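
A minimal sketch of the per-trade calculations follows, using a synthetic price series in the spirit of the tests above. Several choices here are assumptions: windows are approximated in calendar days, entry and exit use the first close on or after the target date, and the flag thresholds and sector list are placeholders.

```python
from bisect import bisect_left
from datetime import date, timedelta

# Calendar-day approximations of the 1M/3M/6M windows (assumed values).
WINDOWS = {"1M": 30, "3M": 91, "6M": 182}


def close_on_or_after(prices, day):
    """prices: list of (date, close) sorted by date; first close on or after day."""
    dates = [d for d, _ in prices]
    i = bisect_left(dates, day)
    return prices[i][1] if i < len(prices) else None


def forward_return(prices, trade_date, days):
    """Simple return from the trade date to `days` calendar days later."""
    entry = close_on_or_after(prices, trade_date)
    exit_close = close_on_or_after(prices, trade_date + timedelta(days=days))
    if entry is None or exit_close is None:
        return None  # window not fully covered by the price data
    return exit_close / entry - 1.0


def abnormal_return(stock, benchmark, trade_date, days):
    """Stock forward return minus benchmark forward return, same window."""
    r = forward_return(stock, trade_date, days)
    b = forward_return(benchmark, trade_date, days)
    return None if r is None or b is None else r - b


def trade_flags(amount_low, trade_date, filing_date, sector,
                large_threshold=250_000, max_gap_days=45,
                sensitive_sectors=("Defense", "Energy")):
    """Rule-based flags; thresholds and the sector list are placeholders."""
    return {
        "watch_large_trade": amount_low >= large_threshold,
        "watch_fast_filing_gap": (filing_date - trade_date).days > max_gap_days,
        "watch_sensitive_sector": sector in sensitive_sectors,
    }


# Synthetic two-point series: the stock rises 10%, the benchmark 5%.
d0 = date(2024, 1, 2)
stock = [(d0, 100.0), (d0 + timedelta(days=30), 110.0)]
spy = [(d0, 200.0), (d0 + timedelta(days=30), 210.0)]
print(abnormal_return(stock, spy, d0, WINDOWS["1M"]))  # about 0.05, up to float rounding
```

Returning `None` when a window is not yet observable keeps recent trades from being stored with misleading partial-window returns.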
### PR 7 — CLI / query helpers (research workflow)
- CLI commands:
  - “show trades for official”
  - “top officials by average abnormal return (with sample size)”
  - “sector interest trend”
- All outputs include: **“research only, not investment advice”**
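
One possible shape for the CLI, sketched with the standard-library `argparse` module. The program name `pote`, the subcommand names, and the options are hypothetical; the plan only names the three queries and the disclaimer.

```python
import argparse

DISCLAIMER = "research only, not investment advice"


def build_parser():
    """Parser with one subcommand per research query (names are assumed)."""
    parser = argparse.ArgumentParser(prog="pote", epilog=DISCLAIMER)
    sub = parser.add_subparsers(dest="command", required=True)

    trades = sub.add_parser("trades", help="show trades for official")
    trades.add_argument("official")
    trades.add_argument("--start")
    trades.add_argument("--end")

    top = sub.add_parser(
        "top-officials", help="top officials by average abnormal return"
    )
    top.add_argument("--window", choices=["1M", "3M", "6M"], default="3M")
    # A minimum sample size keeps one-trade averages out of the ranking.
    top.add_argument("--min-trades", type=int, default=5)

    sub.add_parser("sector-trend", help="sector interest trend")
    return parser


def render(lines):
    """Join report lines and append the disclaimer every output must carry."""
    return "\n".join(list(lines) + ["", f"[{DISCLAIMER}]"])
```

Routing all report text through a single `render` helper is one way to guarantee the disclaimer appears on every output.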
## Key MVP decisions (defaults)
- **DB**: SQLite by default for dev; Postgres supported via env.
- **Time**: store all dates in ISO 8601 format; use timezone-aware datetimes where needed.
- **Idempotency**: every ingestion and metric step can be re-run safely.
- **Reproducibility**: record data source and raw identifiers for traceability.