POTE/docs/00_mvp.md

MVP (Phase 1) — US Congress prototype

This document defines a minimal viable research system for ingesting U.S. Congress trade disclosures, storing them in a relational DB, joining to daily price data, and computing a small set of descriptive metrics.

Non-goals (explicit)

  • No trading execution, brokerage integration, “buy/sell” alerts, or portfolio automation.
  • No claims of insider information.
  • No promises of alpha; all outputs are descriptive analytics with caveats.

MVP definition (what “done” means)

The MVP is “done” when a researcher can:

  • Ingest recent U.S. Congress trade disclosures from at least one public source (e.g., QuiverQuant or FMP) into a DB.
  • Ingest daily prices for traded tickers (e.g., yfinance) into the DB.
  • Run a query/report that shows, for an official and date range:
    • trades (buy/sell, transaction + filing dates, amount/value range when available)
    • post-trade returns over fixed windows (e.g., 1M/3M/6M), compared against a simple benchmark (e.g., SPY) to give an abnormal return
  • Compute and store a small set of risk/ethics flags (rule-based, transparent, caveated).
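
To make "rule-based, transparent, caveated" concrete, here is a minimal sketch of how such flags might be evaluated. It reuses the flag names listed under PR 6 in this document; the `Trade` fields, the threshold values, and the sector names are illustrative assumptions, not decisions made by this plan. Each flag that fires carries a plain-language reason so a reader can see exactly which rule triggered.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional

# Hypothetical thresholds; in practice these would come from configuration.
LARGE_TRADE_MIN_USD = 250_000
FILING_GAP_MAX_DAYS = 45
SENSITIVE_SECTORS = {"Defense", "Health Care"}


@dataclass(frozen=True)
class Trade:
    """Illustrative trade record; field names are assumptions."""
    transaction_date: date
    filing_date: date
    value_low_usd: Optional[int]  # lower bound of the disclosed value range
    sector: Optional[str]


def compute_flags(trade: Trade) -> Dict[str, str]:
    """Return {flag_name: reason} for every rule that fires (descriptive only)."""
    flags: Dict[str, str] = {}
    if trade.value_low_usd is not None and trade.value_low_usd >= LARGE_TRADE_MIN_USD:
        flags["watch_large_trade"] = (
            f"range lower bound {trade.value_low_usd} >= {LARGE_TRADE_MIN_USD}"
        )
    gap_days = (trade.filing_date - trade.transaction_date).days
    if gap_days > FILING_GAP_MAX_DAYS:
        flags["watch_fast_filing_gap"] = (
            f"filed {gap_days} days after transaction (> {FILING_GAP_MAX_DAYS})"
        )
    if trade.sector in SENSITIVE_SECTORS:
        flags["watch_sensitive_sector"] = f"sector '{trade.sector}' is on the watch list"
    return flags
```

A flag here is only a pointer for further reading of the disclosure; it says nothing about intent.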

PR-sized rollout plan (sequence)

PR 1 — Project scaffold + tooling (small, boring, reliable)

  • Create src/ + tests/ layout
  • Add pyproject.toml with formatting/lint/test tooling
  • Add .env.example + settings loader
  • Update README: how to run tests and configure the DB
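
The settings loader could be as small as the sketch below. The variable names `DATABASE_URL` and `LOG_LEVEL` and the default SQLite path are assumptions for illustration; the plan only states that settings come from the environment and that SQLite is the dev default. This sketch reads an environment mapping directly and does not parse a `.env` file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from an environment mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        # Hypothetical variable names and defaults.
        database_url=env.get("DATABASE_URL", "sqlite:///pote_dev.db"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing the mapping as an argument keeps the loader easy to test without touching the real process environment.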

PR 2 — Database + schema (SQLAlchemy + Alembic)

  • SQLAlchemy models for:
    • officials
    • securities
    • trades
    • prices
    • metrics_trade (derived metrics per trade)
    • metrics_official (aggregates)
  • Alembic migration + SQLite dev default
  • Tests: model constraints + simple insert/query smoke tests
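
As a rough picture of the table shapes and constraints, the sketch below uses plain DDL through the standard-library `sqlite3` module. This is deliberately not SQLAlchemy or Alembic, which is what this PR calls for; it only illustrates the kind of constraints the smoke tests might exercise. All column names are assumptions, and the two metrics tables are omitted for brevity.

```python
import sqlite3

# Illustrative schema; column names and constraints are assumptions.
SCHEMA = """
CREATE TABLE officials (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    chamber TEXT,
    UNIQUE (name, chamber)
);
CREATE TABLE securities (
    id INTEGER PRIMARY KEY,
    ticker TEXT NOT NULL UNIQUE,
    sector TEXT
);
CREATE TABLE trades (
    id INTEGER PRIMARY KEY,
    official_id INTEGER NOT NULL REFERENCES officials(id),
    security_id INTEGER NOT NULL REFERENCES securities(id),
    side TEXT NOT NULL CHECK (side IN ('buy', 'sell')),
    transaction_date TEXT NOT NULL,  -- ISO date string
    filing_date TEXT,
    source TEXT NOT NULL,            -- data source name, for traceability
    source_id TEXT NOT NULL,         -- raw identifier from that source
    UNIQUE (source, source_id)
);
CREATE TABLE prices (
    security_id INTEGER NOT NULL REFERENCES securities(id),
    date TEXT NOT NULL,
    close REAL NOT NULL,
    PRIMARY KEY (security_id, date)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the illustrative tables on an empty database."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


# Simple insert smoke test against an in-memory database.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO officials (name, chamber) VALUES ('Jane Doe', 'House')")
conn.execute("INSERT INTO securities (ticker) VALUES ('XYZ')")
conn.execute(
    "INSERT INTO trades (official_id, security_id, side, transaction_date, source, source_id) "
    "VALUES (1, 1, 'buy', '2024-01-02', 'demo', 't-1')"
)
```

The `UNIQUE (source, source_id)` constraint is what later makes re-running ingestion safe: the same raw record can never be stored twice.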

PR 3 — API client: Congress trade disclosures (one source)

  • Implement a small client module (requests/httpx)
  • Add retry/backoff + basic rate limiting
  • Normalize raw payloads → internal dataclasses/pydantic models
  • Tests: unit tests with mocked HTTP responses
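
The retry and normalization pieces might look like the sketch below. It is independent of any HTTP library: `with_retry` wraps an arbitrary fetch callable, so either requests or httpx could sit behind it. The raw payload keys (`representative`, `ticker`, `type`, `traded`, `filed`) are hypothetical and would need to match whichever source is chosen; rate limiting is left out.

```python
import time
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class Disclosure:
    """Internal, source-independent representation of one disclosed trade."""
    official_name: str
    ticker: str
    side: str  # "buy" or "sell"
    transaction_date: date
    filing_date: Optional[date]


def with_retry(fetch: Callable[[], Any], attempts: int = 4,
               base_delay: float = 0.5,
               sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call fetch(); on failure wait base_delay * 2**attempt and try again."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:  # a real client would catch only transient errors
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Hypothetical mapping from source vocabulary to internal sides.
SIDE_MAP = {"purchase": "buy", "buy": "buy", "sale": "sell", "sell": "sell"}


def normalize(raw: Dict[str, Any]) -> Disclosure:
    """Convert one raw payload (hypothetical keys) into a Disclosure."""
    side_key = raw["type"].strip().lower()
    if side_key not in SIDE_MAP:
        raise ValueError(f"unrecognized transaction type: {raw['type']!r}")
    filed = raw.get("filed")
    return Disclosure(
        official_name=raw["representative"].strip(),
        ticker=raw["ticker"].strip().upper(),
        side=SIDE_MAP[side_key],
        transaction_date=date.fromisoformat(raw["traded"]),
        filing_date=date.fromisoformat(filed) if filed else None,
    )
```

Injecting `sleep` lets unit tests assert on the backoff schedule without actually waiting.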

PR 4 — ETL: upsert officials/securities/trades

  • Idempotent ETL job:
    • fetch recent disclosures
    • normalize
    • upsert into DB
  • Logging of counts (new/updated/skipped)
  • Tests: idempotency and upsert behavior with SQLite
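
One way to get idempotent upserts with new/updated/skipped counts is sketched below, again with the standard-library `sqlite3` module rather than SQLAlchemy. The simplified `trades` table and the `(source, source_id)` natural key are assumptions; `ON CONFLICT ... DO UPDATE` needs SQLite 3.24 or newer.

```python
import sqlite3
from typing import Dict, Iterable


def make_db() -> sqlite3.Connection:
    """In-memory database with a simplified, illustrative trades table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE trades (
               source TEXT NOT NULL,
               source_id TEXT NOT NULL,
               ticker TEXT NOT NULL,
               side TEXT NOT NULL,
               transaction_date TEXT NOT NULL,
               PRIMARY KEY (source, source_id)
           )"""
    )
    return conn


def upsert_trades(conn: sqlite3.Connection, rows: Iterable[dict]) -> Dict[str, int]:
    """Insert or update rows keyed by (source, source_id); return change counts."""
    counts = {"new": 0, "updated": 0, "skipped": 0}
    for row in rows:
        key = (row["source"], row["source_id"])
        incoming = (row["ticker"], row["side"], row["transaction_date"])
        existing = conn.execute(
            "SELECT ticker, side, transaction_date FROM trades "
            "WHERE source = ? AND source_id = ?",
            key,
        ).fetchone()
        if existing == incoming:
            counts["skipped"] += 1  # identical row already stored
            continue
        conn.execute(
            """INSERT INTO trades (source, source_id, ticker, side, transaction_date)
               VALUES (?, ?, ?, ?, ?)
               ON CONFLICT (source, source_id) DO UPDATE SET
                   ticker = excluded.ticker,
                   side = excluded.side,
                   transaction_date = excluded.transaction_date""",
            key + incoming,
        )
        counts["updated" if existing else "new"] += 1
    conn.commit()
    return counts
```

Running the same batch twice should report every row as skipped the second time, which is the property the idempotency test would assert.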

PR 5 — Price loader (daily bars)

  • Given tickers + date range: fetch prices (e.g., yfinance) and upsert
  • Basic caching:
    • don't refetch days already present unless forced
    • fetch missing ranges only
  • Tests: caching behavior (mock provider)
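
The "fetch missing ranges only" rule can be sketched as a pure function that is easy to test with a mock provider. The function name and signature are hypothetical. As a simplifying assumption it treats Monday to Friday as trading days and does not model market holidays, so a holiday would be reported as missing on every run unless a trading calendar is added.

```python
from datetime import date, timedelta
from typing import List, Set, Tuple


def missing_ranges(start: date, end: date, present: Set[date],
                   force: bool = False) -> List[Tuple[date, date]]:
    """Return inclusive (first, last) weekday ranges in [start, end] to fetch.

    present holds the dates already stored for a ticker; force=True ignores it.
    """
    ranges: List[Tuple[date, date]] = []
    run_start = last_missing = None
    day = start
    while day <= end:
        if day.weekday() < 5:  # Mon-Fri only; weekends never open or close a run
            if force or day not in present:
                if run_start is None:
                    run_start = day
                last_missing = day
            elif run_start is not None:
                ranges.append((run_start, last_missing))
                run_start = None
        day += timedelta(days=1)
    if run_start is not None:
        ranges.append((run_start, last_missing))
    return ranges
```

Each returned range maps to one provider call, so a fully cached request results in no calls at all.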

PR 6 — Metrics + first “research signals” (non-advice)

  • Compute per-trade:
    • forward returns (1M/3M/6M)
    • benchmark returns (SPY) and abnormal returns
  • Store to metrics_trade
  • Aggregate to metrics_official
  • Add transparent flags (examples):
    • watch_large_trade: above configurable value range threshold
    • watch_fast_filing_gap: gap between transaction date and filing date outside a configurable range (descriptive)
    • watch_sensitive_sector: sector in a configurable list (research-only heuristic)
  • Tests: deterministic calculations on synthetic price series
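
The per-trade return calculation can be sketched on a synthetic price series as follows. Two assumptions are made here that the plan does not fix: windows are approximated in calendar days (30/91/182), and the abnormal return is taken as the simple difference between the trade's forward return and the benchmark's forward return over the same window. Prices are looked up on the first available date on or after the target, and a window that runs past the end of the data yields `None`.

```python
import bisect
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

Series = List[Tuple[date, float]]  # (date, close), sorted by date

# Calendar-day approximations of the windows (assumption).
WINDOWS_DAYS = {"1M": 30, "3M": 91, "6M": 182}


def price_on_or_after(series: Series, target: date) -> Optional[float]:
    """Return the first close dated on or after target, or None if none exists."""
    dates = [d for d, _ in series]
    i = bisect.bisect_left(dates, target)
    return series[i][1] if i < len(series) else None


def forward_return(series: Series, trade_date: date, days: int) -> Optional[float]:
    """Simple return from trade_date to trade_date + days."""
    entry = price_on_or_after(series, trade_date)
    exit_ = price_on_or_after(series, trade_date + timedelta(days=days))
    if entry is None or exit_ is None:
        return None
    return exit_ / entry - 1.0


def trade_metrics(asset: Series, benchmark: Series,
                  trade_date: date) -> Dict[str, Optional[Dict[str, float]]]:
    """Forward, benchmark, and abnormal returns for each window."""
    out: Dict[str, Optional[Dict[str, float]]] = {}
    for label, days in WINDOWS_DAYS.items():
        r = forward_return(asset, trade_date, days)
        b = forward_return(benchmark, trade_date, days)
        if r is None or b is None:
            out[label] = None  # not enough price history yet
        else:
            out[label] = {"return": r, "benchmark": b, "abnormal": r - b}
    return out


def linear_series(start: date, n: int, first_close: float) -> Series:
    """Synthetic series: close rises by exactly 1.0 per calendar day."""
    return [(start + timedelta(days=k), first_close + k) for k in range(n)]
```

With `linear_series(date(2024, 1, 1), 200, 100.0)` as the asset and the same call with `200.0` as the benchmark, the 1M return is 130/100 - 1 for the asset and 230/200 - 1 for the benchmark, so the expected values can be written down by hand in a test.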

PR 7 — CLI / query helpers (research workflow)

  • CLI commands:
    • “show trades for official”
    • “top officials by average abnormal return (with sample size)”
    • “sector interest trend”
  • All outputs include: “research only, not investment advice”
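
A minimal `argparse` sketch of one of these commands is shown below. The program name `pote`, the subcommand name `top-officials`, the `--min-trades` option, and the in-memory `ROWS` (standing in for a query against `metrics_trade`) are all hypothetical. It illustrates two points from the list: the ranking always reports its sample size, and every output ends with the research-only notice.

```python
import argparse
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

DISCLAIMER = "research only, not investment advice"

# Hypothetical rows standing in for a metrics_trade query result.
ROWS = [
    {"official": "Jane Doe", "abnormal_3m": 0.04},
    {"official": "Jane Doe", "abnormal_3m": -0.01},
    {"official": "John Roe", "abnormal_3m": 0.02},
]


def top_officials(rows: Sequence[dict], min_trades: int = 1) -> List[Tuple[str, float, int]]:
    """Return (official, average abnormal return, sample size), best first."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        grouped[row["official"]].append(row["abnormal_3m"])
    ranked = [
        (name, sum(vals) / len(vals), len(vals))
        for name, vals in grouped.items()
        if len(vals) >= min_trades
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pote")
    sub = parser.add_subparsers(dest="command", required=True)
    top = sub.add_parser("top-officials", help="rank by average abnormal return")
    top.add_argument("--min-trades", type=int, default=1)
    return parser


def run(argv: Sequence[str]) -> str:
    """Parse argv, run the command, and return the text to print."""
    args = build_parser().parse_args(argv)
    lines = []
    if args.command == "top-officials":
        for name, avg, n in top_officials(ROWS, args.min_trades):
            lines.append(f"{name}: avg abnormal 3M return {avg:+.2%} (n={n})")
    lines.append(f"[{DISCLAIMER}]")  # appended to every output
    return "\n".join(lines)


print(run(["top-officials", "--min-trades", "1"]))
```

Returning the text from `run` instead of printing inside it keeps the command easy to unit test.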

Key MVP decisions (defaults)

  • DB: SQLite by default for dev; Postgres supported via env.
  • Time: store all dates in ISO 8601 format (YYYY-MM-DD); use timezone-aware datetimes wherever a time of day is recorded.
  • Idempotency: every ingestion and metric step can be re-run safely.
  • Reproducibility: record data source and raw identifiers for traceability.