POTE/docs/02_data_model.md

Data model (normalized schema sketch)

This is the Phase 1 target schema. Exact fields may vary slightly depending on what each data source provides; the goal is to keep raw ingestion traceable and analytics reproducible.

Core tables

officials

Represents an individual official (starting with U.S. Congress).

Suggested fields:

  • id (PK)
  • name (string)
  • chamber (enum-like string: House/Senate/Unknown)
  • party (string, nullable)
  • state (string, nullable)
  • identifiers (JSON) — e.g., bioguide ID, source-specific IDs
  • created_at, updated_at
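As a rough illustration, the fields above could map to a SQLite table along these lines. This is a sketch using Python's standard `sqlite3` module, not the project's actual models. The `CHECK` constraint on `chamber`, the choice to store `identifiers` as JSON text, the `NOT NULL` on `name`, the helper name `create_officials_table`, and the placeholder ID value are all assumptions made for the example.

```python
import json
import sqlite3

# Hypothetical DDL; column names follow the suggested fields above.
OFFICIALS_DDL = """
CREATE TABLE IF NOT EXISTS officials (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    chamber     TEXT NOT NULL DEFAULT 'Unknown'
                CHECK (chamber IN ('House', 'Senate', 'Unknown')),
    party       TEXT,
    state       TEXT,
    identifiers TEXT NOT NULL DEFAULT '{}',  -- JSON stored as text
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def create_officials_table(conn: sqlite3.Connection) -> None:
    """Create the officials table if it does not exist yet."""
    conn.execute(OFFICIALS_DDL)


conn = sqlite3.connect(":memory:")
create_officials_table(conn)

# "X000000" is a made-up placeholder, not a real bioguide ID.
conn.execute(
    "INSERT INTO officials (name, chamber, identifiers) VALUES (?, ?, ?)",
    ("Example Official", "House", json.dumps({"bioguide": "X000000"})),
)
row = conn.execute("SELECT name, chamber, identifiers FROM officials").fetchone()
```

Keeping `chamber` as a constrained string rather than a native enum matches the "enum-like string" wording and stays portable across databases.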

securities

Represents a traded instrument.

Suggested fields:

  • id (PK)
  • ticker (string, indexed, nullable) — some disclosures may be missing ticker
  • name (string, nullable)
  • exchange (string, nullable)
  • sector (string, nullable)
  • identifiers (JSON) — ISIN, CUSIP, etc (when available)
  • created_at, updated_at

trades

One disclosed transaction record.

Suggested fields:

  • id (PK)
  • official_id (FK → officials.id)
  • security_id (FK → securities.id)
  • source (string) — e.g., quiver, fmp, house_disclosure
  • source_trade_id (string, nullable) — unique if provided
  • transaction_date (date, nullable if unknown)
  • filing_date (date, nullable)
  • side (enum-like string: BUY/SELL/EXCHANGE/UNKNOWN)
  • value_range_low (numeric, nullable)
  • value_range_high (numeric, nullable)
  • amount (numeric, nullable) — shares/contracts if available
  • currency (string, default USD)
  • quality_flags (JSON) — parse warnings, missing fields, etc
  • raw (JSON) — optional: raw payload snapshot for traceability
  • created_at, updated_at

Uniqueness strategy (typical):

  • unique constraint on (source, source_trade_id) when source_trade_id exists
  • otherwise a best-effort dedupe key over (official_id, security_id, transaction_date, side, value_range_high, filing_date)
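The two rules above can be sketched with partial unique indexes in SQLite. This is one possible approach, not necessarily how the project implements it. The `dedupe_key` column, the index names, and the `dedupe_key()` helper are hypothetical and do not appear in the field list, and the table is trimmed to the columns the example needs.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trades (
    id               INTEGER PRIMARY KEY,
    official_id      INTEGER NOT NULL,
    security_id      INTEGER NOT NULL,
    source           TEXT NOT NULL,
    source_trade_id  TEXT,
    transaction_date TEXT,
    filing_date      TEXT,
    side             TEXT NOT NULL DEFAULT 'UNKNOWN',
    value_range_high NUMERIC,
    dedupe_key       TEXT  -- hypothetical column, not in the field list
);
-- Rule 1: (source, source_trade_id) is unique only when an ID was provided.
CREATE UNIQUE INDEX uq_trades_source_id
    ON trades (source, source_trade_id)
    WHERE source_trade_id IS NOT NULL;
-- Rule 2: rows without a source ID fall back to the best-effort key.
CREATE UNIQUE INDEX uq_trades_dedupe_key
    ON trades (dedupe_key)
    WHERE source_trade_id IS NULL;
""")


def dedupe_key(official_id, security_id, transaction_date, side,
               value_range_high, filing_date) -> str:
    """Hash the best-effort key fields into a stable hex string.

    Callers should normalize values first (e.g. 15000 vs 15000.0),
    because the key is built from their string form.
    """
    parts = [official_id, security_id, transaction_date, side,
             value_range_high, filing_date]
    raw = "|".join("" if p is None else str(p) for p in parts)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Illustrative values only.
key = dedupe_key(1, 2, "2025-01-15", "BUY", 15000, "2025-02-10")
INSERT_SQL = """
INSERT INTO trades (official_id, security_id, source, transaction_date,
                    filing_date, side, value_range_high, dedupe_key)
VALUES (1, 2, 'house_disclosure', '2025-01-15', '2025-02-10', 'BUY', 15000, ?)
"""
conn.execute(INSERT_SQL, (key,))
try:
    conn.execute(INSERT_SQL, (key,))  # same disclosure ingested twice
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Because the fallback key is best-effort, two genuinely distinct trades with identical fields would collide; recording that case in `quality_flags` rather than silently dropping the row is one way to keep it traceable.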

prices

Daily OHLCV for a ticker.

Suggested fields:

  • id (PK) or composite key
  • ticker (string, indexed)
  • date (date, indexed)
  • open, high, low, close (numeric)
  • adj_close (numeric, nullable)
  • volume (bigint, nullable)
  • source (string) — e.g., yfinance
  • created_at, updated_at

Unique constraint:

  • (ticker, date, source)
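With that constraint in place, reloading the same bar can be made idempotent with an upsert. The sketch below uses SQLite's `ON CONFLICT ... DO UPDATE` clause (available in SQLite 3.24 and later) through Python's `sqlite3` module. The helper name `upsert_price`, the ticker `TEST`, and all numbers are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE prices (
    id        INTEGER PRIMARY KEY,
    ticker    TEXT NOT NULL,
    date      TEXT NOT NULL,  -- ISO 8601 date string
    open      REAL,
    high      REAL,
    low       REAL,
    close     REAL,
    adj_close REAL,
    volume    INTEGER,
    source    TEXT NOT NULL,
    UNIQUE (ticker, date, source)
)
""")

UPSERT_SQL = """
INSERT INTO prices (ticker, date, open, high, low, close, adj_close, volume, source)
VALUES (:ticker, :date, :open, :high, :low, :close, :adj_close, :volume, :source)
ON CONFLICT (ticker, date, source) DO UPDATE SET
    open = excluded.open,
    high = excluded.high,
    low = excluded.low,
    close = excluded.close,
    adj_close = excluded.adj_close,
    volume = excluded.volume
"""


def upsert_price(conn: sqlite3.Connection, bar: dict) -> None:
    """Insert a daily bar, or overwrite it if (ticker, date, source) exists."""
    conn.execute(UPSERT_SQL, bar)


# Made-up bar for a made-up ticker.
bar = {
    "ticker": "TEST", "date": "2025-01-02", "open": 100.0, "high": 102.0,
    "low": 99.0, "close": 101.0, "adj_close": 101.0, "volume": 1000,
    "source": "yfinance",
}
upsert_price(conn, bar)
upsert_price(conn, {**bar, "close": 101.5})  # corrected reload of the same day

count, close = conn.execute("SELECT COUNT(*), MAX(close) FROM prices").fetchone()
```

Including `source` in the key lets bars from different providers coexist for the same ticker and date instead of overwriting each other.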

Derived tables

metrics_trade

Per-trade derived analytics (computed after prices are loaded).

Suggested fields:

  • id (PK)
  • trade_id (FK → trades.id, unique)
  • forward returns: ret_1m, ret_3m, ret_6m
  • benchmark returns: bm_ret_1m, bm_ret_3m, bm_ret_6m
  • abnormal returns: abret_1m, abret_3m, abret_6m
  • calc_version (string) — allows recomputation while tracking methodology
  • created_at, updated_at
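The three return families can be related with a small sketch. It assumes a simple benchmark-adjusted definition (abnormal return = forward return minus benchmark return over the same window), treats "1m" as 30 calendar days, and falls back to the most recent earlier close when a date has no price. None of these choices is specified above; they are assumptions for illustration, as are the function names and sample prices.

```python
from datetime import date, timedelta
from typing import Dict, Optional

PriceSeries = Dict[date, float]  # close (or adj_close) keyed by trading day


def price_on_or_before(prices: PriceSeries, day: date,
                       max_lookback: int = 7) -> Optional[float]:
    """Return the price on `day`, or the latest one within `max_lookback` days before it."""
    for offset in range(max_lookback + 1):
        price = prices.get(day - timedelta(days=offset))
        if price is not None:
            return price
    return None


def forward_return(prices: PriceSeries, start: date, days: int) -> Optional[float]:
    """Simple return from `start` to `start + days`; None if a price is missing."""
    p0 = price_on_or_before(prices, start)
    p1 = price_on_or_before(prices, start + timedelta(days=days))
    if p0 is None or p1 is None or p0 == 0:
        return None
    return p1 / p0 - 1.0


def abnormal_return(ret: Optional[float], bm_ret: Optional[float]) -> Optional[float]:
    """Benchmark-adjusted return; None when either input is unavailable."""
    if ret is None or bm_ret is None:
        return None
    return ret - bm_ret


# Made-up prices: the stock gains 10%, the benchmark gains 2%.
stock = {date(2025, 1, 2): 100.0, date(2025, 1, 31): 110.0}
bench = {date(2025, 1, 2): 400.0, date(2025, 1, 31): 408.0}

ret_1m = forward_return(stock, date(2025, 1, 2), 30)
bm_ret_1m = forward_return(bench, date(2025, 1, 2), 30)
abret_1m = abnormal_return(ret_1m, bm_ret_1m)
```

Whatever definition is actually used, bumping `calc_version` when it changes keeps old and new rows distinguishable.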

metrics_official

Aggregate metrics per official.

Suggested fields:

  • id (PK)
  • official_id (FK → officials.id, unique)
  • n_trades, n_buys, n_sells
  • mean and median abnormal returns for buys, per window (1m, 3m, 6m), with sample sizes
  • cluster_label (nullable)
  • flags (JSON) — descriptive risk/ethics flags + supporting metrics
  • calc_version
  • created_at, updated_at
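A per-official rollup along these lines could be computed from `metrics_trade` rows joined with each trade's `side`. The sketch below uses only the standard library. The function name `summarize_official`, the output key names such as `buy_abret_1m_mean`, and the input dict shape are assumptions, not the project's actual column names.

```python
from statistics import mean, median
from typing import Dict, List, Optional, Sequence


def summarize_official(trades: List[Dict],
                       windows: Sequence[str] = ("1m", "3m", "6m")) -> Dict[str, Optional[float]]:
    """Aggregate counts and buy-side abnormal returns for one official."""
    buys = [t for t in trades if t["side"] == "BUY"]
    sells = [t for t in trades if t["side"] == "SELL"]
    summary: Dict[str, Optional[float]] = {
        "n_trades": len(trades),
        "n_buys": len(buys),
        "n_sells": len(sells),
    }
    for w in windows:
        col = f"abret_{w}"
        # Skip trades whose window has not been computed yet.
        values = [t[col] for t in buys if t.get(col) is not None]
        summary[f"buy_{col}_n"] = len(values)
        summary[f"buy_{col}_mean"] = mean(values) if values else None
        summary[f"buy_{col}_median"] = median(values) if values else None
    return summary


# Made-up trades for a single official.
sample = [
    {"side": "BUY", "abret_1m": 0.02},
    {"side": "BUY", "abret_1m": 0.06},
    {"side": "BUY", "abret_1m": None},   # window not yet computable
    {"side": "SELL", "abret_1m": -0.01},
]
summary = summarize_official(sample, windows=("1m",))
```

Storing the sample size next to each mean and median makes it clear when an average rests on only a handful of trades.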

Notes on time and lags

  • Disclosures often have a filing delay; keep both transaction_date and filing_date.
  • When computing event windows, anchor them on transaction_date, and also record the disclosure lag (filing_date minus transaction_date) as a descriptive attribute.
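The disclosure lag is a simple date difference. A minimal sketch, assuming both dates are already parsed into `datetime.date` objects; the function name `disclosure_lag_days` and the sample dates are hypothetical.

```python
from datetime import date
from typing import Optional


def disclosure_lag_days(transaction_date: Optional[date],
                        filing_date: Optional[date]) -> Optional[int]:
    """Days between the trade and its filing; None if either date is unknown.

    A negative result (filing before the trade) is returned as-is so the
    caller can decide how to flag it.
    """
    if transaction_date is None or filing_date is None:
        return None
    return (filing_date - transaction_date).days


# Made-up dates for illustration.
lag = disclosure_lag_days(date(2025, 1, 15), date(2025, 2, 10))
unknown = disclosure_lag_days(None, date(2025, 2, 10))
```

Since both dates are nullable in `trades`, returning `None` rather than raising keeps the lag computable row by row during ingestion.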