Data model (normalized schema sketch)
This is the Phase 1 target schema. Exact fields may vary slightly depending on the available source data; the goal is to keep raw ingestion traceable and analytics reproducible.
Core tables
officials
Represents an individual official (starting with U.S. Congress).
Suggested fields:
- `id` (PK)
- `name` (string)
- `chamber` (enum-like string: House/Senate/Unknown)
- `party` (string, nullable)
- `state` (string, nullable)
- `identifiers` (JSON): e.g., bioguide ID, source-specific IDs
- `created_at`, `updated_at`
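The following is a minimal sketch of how an ingestion step might map raw source labels onto the enum-like `chamber` values and hold an `officials` record in memory. The `Official` dataclass, the `normalize_chamber` helper, and the alias table are hypothetical. Real sources may use different labels.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical alias table: raw chamber labels -> enum-like values used by
# `officials.chamber`. These aliases are assumptions, not taken from a real source.
_CHAMBER_ALIASES = {
    "house": "House",
    "house of representatives": "House",
    "representative": "House",
    "rep": "House",
    "senate": "Senate",
    "senator": "Senate",
    "sen": "Senate",
}


def normalize_chamber(raw: Optional[str]) -> str:
    """Return "House", "Senate", or "Unknown" for a raw chamber label."""
    if raw is None:
        return "Unknown"
    key = raw.strip().lower().rstrip(".")
    return _CHAMBER_ALIASES.get(key, "Unknown")


@dataclass
class Official:
    """In-memory shape of one `officials` row (timestamps left to the database)."""
    name: str
    chamber: str = "Unknown"
    party: Optional[str] = None
    state: Optional[str] = None
    # Source-keyed IDs, e.g. a bioguide ID plus any source-specific IDs.
    identifiers: Dict[str, str] = field(default_factory=dict)
```

Unrecognized labels fall back to `Unknown` instead of raising. This keeps ingestion running while leaving the gap visible in queries.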
securities
Represents a traded instrument.
Suggested fields:
- `id` (PK)
- `ticker` (string, indexed, nullable): some disclosures may be missing a ticker
- `name` (string, nullable)
- `exchange` (string, nullable)
- `sector` (string, nullable)
- `identifiers` (JSON): ISIN, CUSIP, etc. (when available)
- `created_at`, `updated_at`
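Because `ticker` is nullable, it helps to turn placeholder values into a real NULL rather than storing empty strings. The sketch below assumes that sources sometimes emit tokens such as `--` or `N/A` for a missing ticker. The token list and the `normalize_ticker` name are hypothetical.

```python
from typing import Optional

# Placeholder strings treated as "no ticker". This set is an assumption;
# adjust it to whatever the ingested sources actually emit.
_MISSING_TICKER_TOKENS = {"", "-", "--", "N/A", "NA", "NONE", "NULL"}


def normalize_ticker(raw: Optional[str]) -> Optional[str]:
    """Trim and upper-case a raw ticker; return None when it is effectively missing.

    Returning None (rather than "") keeps `securities.ticker` NULL, so the
    security can still be matched by name or by identifiers such as ISIN/CUSIP.
    """
    if raw is None:
        return None
    ticker = raw.strip().upper().lstrip("$")  # "$aapl " -> "AAPL"
    if ticker in _MISSING_TICKER_TOKENS:
        return None
    return ticker
```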
trades
One disclosed transaction record.
Suggested fields:
- `id` (PK)
- `official_id` (FK → `officials.id`)
- `security_id` (FK → `securities.id`)
- `source` (string): e.g., `quiver`, `fmp`, `house_disclosure`
- `source_trade_id` (string, nullable): unique if provided
- `transaction_date` (date, nullable if unknown)
- `filing_date` (date, nullable)
- `side` (enum-like string: BUY/SELL/EXCHANGE/UNKNOWN)
- `value_range_low` (numeric, nullable)
- `value_range_high` (numeric, nullable)
- `amount` (numeric, nullable): shares/contracts, if available
- `currency` (string, default USD)
- `quality_flags` (JSON): parse warnings, missing fields, etc.
- `raw` (JSON, optional): raw payload snapshot for traceability
- `created_at`, `updated_at`
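Below is a sketch of turning raw disclosure fields into `side`, `value_range_low`/`value_range_high`, and `quality_flags`. It makes two assumptions: that amounts arrive as dollar bands such as `"$1,001 - $15,000"`, and that transaction types arrive as labels like `purchase` or `sale_partial`. Both formats and all helper names are assumptions to check against each source.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Assumed raw transaction-type labels -> enum-like `side` values.
_SIDE_MAP = {
    "purchase": "BUY",
    "buy": "BUY",
    "sale": "SELL",
    "sale_full": "SELL",
    "sale_partial": "SELL",
    "sale (full)": "SELL",
    "sale (partial)": "SELL",
    "sell": "SELL",
    "exchange": "EXCHANGE",
}

# A number with optional thousands separators and decimals, e.g. "15,000".
_NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def normalize_side(raw: Optional[str]) -> str:
    if raw is None:
        return "UNKNOWN"
    return _SIDE_MAP.get(raw.strip().lower(), "UNKNOWN")


def parse_value_range(raw: Optional[str], flags: List[str]) -> Tuple[Optional[float], Optional[float]]:
    """Parse an amount band like "$1,001 - $15,000" into (low, high).

    Problems are appended to `flags` instead of raising, so they can be
    stored in `trades.quality_flags`.
    """
    if not raw:
        flags.append("missing_value_range")
        return None, None
    numbers = [float(n.replace(",", "")) for n in _NUMBER_RE.findall(raw)]
    if len(numbers) == 2:
        low, high = sorted(numbers)
        return low, high
    if len(numbers) == 1:
        # Open-ended band (e.g. "Over $50,000,000"): keep the lower bound only.
        flags.append("open_ended_value_range")
        return numbers[0], None
    flags.append("unparsed_value_range")
    return None, None


def normalize_trade_fields(raw_type: Optional[str], raw_amount: Optional[str]) -> Dict[str, Any]:
    """Build the normalized subset of a `trades` row from two raw fields."""
    flags: List[str] = []
    side = normalize_side(raw_type)
    if side == "UNKNOWN":
        flags.append("unknown_side")
    low, high = parse_value_range(raw_amount, flags)
    return {
        "side": side,
        "value_range_low": low,
        "value_range_high": high,
        "quality_flags": {"warnings": flags},
    }
```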
Uniqueness strategy (typical):
- unique constraint on (`source`, `source_trade_id`) when `source_trade_id` exists
- otherwise, a best-effort dedupe key (official, security, `transaction_date`, `side`, `value_range_high`, `filing_date`)
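The two-tier uniqueness strategy can be sketched as a key function. It uses `(source, source_trade_id)` when the source supplies an ID and otherwise falls back to the best-effort tuple. Rows are assumed to be plain dicts keyed by the `trades` column names. `dedupe_key` and `dedupe` are hypothetical helpers; in the database, the first tier would be enforced by the unique constraint itself.

```python
from typing import Any, Dict, Hashable, Iterable, List, Tuple

# Columns forming the best-effort fallback key (mirrors the list above).
_FALLBACK_COLUMNS = (
    "official_id",
    "security_id",
    "transaction_date",
    "side",
    "value_range_high",
    "filing_date",
)


def dedupe_key(row: Dict[str, Any]) -> Tuple[Hashable, ...]:
    """Return the key used to detect duplicate trade records."""
    if row.get("source_trade_id"):
        # Tier 1: the source's own ID is authoritative within that source.
        return ("source_id", row["source"], row["source_trade_id"])
    # Tier 2: best-effort key built from the disclosed attributes.
    return ("fallback",) + tuple(row.get(col) for col in _FALLBACK_COLUMNS)


def dedupe(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first occurrence of each key, preserving input order."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = dedupe_key(row)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows
```

Because the fallback key does not include `source`, it also collapses the same trade reported by two sources when neither provides an ID.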
prices
Daily OHLCV for a ticker.
Suggested fields:
- `id` (PK), or a composite key
- `ticker` (string, indexed)
- `date` (date, indexed)
- `open`, `high`, `low`, `close` (numeric)
- `adj_close` (numeric, nullable)
- `volume` (bigint, nullable)
- `source` (string): e.g., `yfinance`
- `created_at`, `updated_at`
Unique constraint:
- (`ticker`, `date`, `source`)
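One way to express the `prices` table and its (`ticker`, `date`, `source`) constraint is sketched below with Python's built-in `sqlite3` module. This document does not specify the project's actual database or ORM. With the constraint in place, re-running a load becomes an upsert instead of creating duplicate rows. `create_prices_table` and `upsert_prices` are hypothetical names, and the `ON CONFLICT DO UPDATE` upsert syntax requires SQLite 3.24 or newer.

```python
import sqlite3
from typing import Any, Dict, Iterable

_DDL = """
CREATE TABLE IF NOT EXISTS prices (
    id         INTEGER PRIMARY KEY,
    ticker     TEXT NOT NULL,
    date       TEXT NOT NULL,  -- ISO-8601 'YYYY-MM-DD'
    open       REAL,
    high       REAL,
    low        REAL,
    close      REAL,
    adj_close  REAL,
    volume     INTEGER,
    source     TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (ticker, date, source)
);
-- The UNIQUE index already serves lookups by ticker; add one for date scans.
CREATE INDEX IF NOT EXISTS idx_prices_date ON prices (date);
"""

_UPSERT = """
INSERT INTO prices (ticker, date, open, high, low, close, adj_close, volume, source)
VALUES (:ticker, :date, :open, :high, :low, :close, :adj_close, :volume, :source)
ON CONFLICT (ticker, date, source) DO UPDATE SET
    open = excluded.open,
    high = excluded.high,
    low = excluded.low,
    close = excluded.close,
    adj_close = excluded.adj_close,
    volume = excluded.volume,
    updated_at = CURRENT_TIMESTAMP
"""


def create_prices_table(conn: sqlite3.Connection) -> None:
    conn.executescript(_DDL)


def upsert_prices(conn: sqlite3.Connection, rows: Iterable[Dict[str, Any]]) -> None:
    """Insert daily bars; an existing (ticker, date, source) row is updated in place."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(_UPSERT, rows)
```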
Derived tables
metrics_trade
Per-trade derived analytics (computed after prices are loaded).
Suggested fields:
- `id` (PK)
- `trade_id` (FK → `trades.id`, unique)
- forward returns: `ret_1m`, `ret_3m`, `ret_6m`
- benchmark returns: `bm_ret_1m`, `bm_ret_3m`, `bm_ret_6m`
- abnormal returns: `abret_1m`, `abret_3m`, `abret_6m`
- `calc_version` (string): allows recomputation while tracking the methodology used
- `created_at`, `updated_at`
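The following is a minimal sketch of the per-trade calculation. Several choices here are assumptions rather than part of the schema:

- windows are measured in trading days (21/63/126 for 1m/3m/6m);
- the entry price is the first close on or after `transaction_date`;
- returns use the `close` column;
- abnormal return is the simple market-adjusted difference `ret - bm_ret` against a caller-supplied benchmark series.

Tagging results with `calc_version` is what allows a later change to any of these choices to be recomputed side by side.

```python
from bisect import bisect_left
from datetime import date
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed window lengths in trading days (about 21 per month).
WINDOWS = {"1m": 21, "3m": 63, "6m": 126}
# Hypothetical methodology tag stored in metrics_trade.calc_version.
CALC_VERSION = "market_adjusted_close_v1"

# (date, close) pairs for one ticker, sorted by date ascending.
PriceSeries = Sequence[Tuple[date, float]]


def forward_return(series: PriceSeries, start: date, horizon: int) -> Optional[float]:
    """Simple return from the first bar on/after `start` to `horizon` bars later."""
    dates: List[date] = [d for d, _ in series]
    i = bisect_left(dates, start)  # entry bar: first trading day >= start
    j = i + horizon
    if j >= len(series):
        return None  # not enough price history yet
    return series[j][1] / series[i][1] - 1.0


def trade_metrics(stock: PriceSeries, benchmark: PriceSeries, transaction_date: date) -> Dict[str, object]:
    """Build one metrics_trade row (minus id, trade_id, and timestamps)."""
    row: Dict[str, object] = {"calc_version": CALC_VERSION}
    for label, horizon in WINDOWS.items():
        ret = forward_return(stock, transaction_date, horizon)
        bm_ret = forward_return(benchmark, transaction_date, horizon)
        row[f"ret_{label}"] = ret
        row[f"bm_ret_{label}"] = bm_ret
        # Market-adjusted abnormal return; undefined if either leg is missing.
        row[f"abret_{label}"] = None if ret is None or bm_ret is None else ret - bm_ret
    return row
```

Returning `None` for windows without enough history (rather than skipping the row) keeps one `metrics_trade` row per trade. The row can then be filled in by a later recomputation.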
metrics_official
Aggregate metrics per official.
Suggested fields:
- `id` (PK)
- `official_id` (FK → `officials.id`, unique)
- `n_trades`, `n_buys`, `n_sells`
- average/median abnormal returns for buys (by window), plus sample sizes
- `cluster_label` (nullable)
- `flags` (JSON): descriptive risk/ethics flags plus supporting metrics
- `calc_version`
- `created_at`, `updated_at`
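Below is a sketch of rolling `metrics_trade` rows up to one official. It assumes each input dict carries the trade's `side` plus its `abret_*` values. The output column names (`buy_abret_mean_1m` and so on) are hypothetical, because the schema only calls for average/median abnormal returns for buys by window, plus sample sizes. Missing windows are skipped rather than treated as zero, so each window reports its own sample size.

```python
from statistics import mean, median
from typing import Any, Dict, Iterable, Mapping

WINDOW_LABELS = ("1m", "3m", "6m")


def official_metrics(trades: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Aggregate per-trade metrics for one official into a metrics_official row."""
    rows = list(trades)
    buys = [t for t in rows if t["side"] == "BUY"]
    out: Dict[str, Any] = {
        "n_trades": len(rows),
        "n_buys": len(buys),
        "n_sells": sum(1 for t in rows if t["side"] == "SELL"),
    }
    for label in WINDOW_LABELS:
        # Only buys with a computed abnormal return for this window count.
        values = [t[f"abret_{label}"] for t in buys if t.get(f"abret_{label}") is not None]
        out[f"buy_abret_n_{label}"] = len(values)
        out[f"buy_abret_mean_{label}"] = mean(values) if values else None
        out[f"buy_abret_median_{label}"] = median(values) if values else None
    return out
```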
Notes on time and lags
- Disclosures often have a filing delay; keep both `transaction_date` and `filing_date`.
- When computing "event windows", prefer windows relative to `transaction_date`, but also compute and record the disclosure lag as a descriptive attribute.
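Recording the lag is a single date subtraction once both dates are present. The sketch below returns `None` when either date is unknown. It also flags a negative lag (a filing dated before the transaction) as a likely data-quality problem. The helper names and flag strings are hypothetical.

```python
from datetime import date
from typing import Any, Dict, List, Optional


def disclosure_lag_days(transaction_date: Optional[date], filing_date: Optional[date]) -> Optional[int]:
    """Calendar days between the trade and its disclosure, or None if unknown."""
    if transaction_date is None or filing_date is None:
        return None
    return (filing_date - transaction_date).days


def lag_attributes(transaction_date: Optional[date], filing_date: Optional[date]) -> Dict[str, Any]:
    """Descriptive lag plus warnings suitable for trades.quality_flags."""
    lag = disclosure_lag_days(transaction_date, filing_date)
    warnings: List[str] = []
    if lag is None:
        warnings.append("disclosure_lag_unknown")
    elif lag < 0:
        warnings.append("filing_before_transaction")
    return {"disclosure_lag_days": lag, "warnings": warnings}
```

Event windows still start at `transaction_date`. The lag is stored alongside them so that results can later be broken down by how late a trade became public.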