# Data model (normalized schema sketch)

This is the Phase 1 target schema. Exact fields may vary slightly by available source data; the goal is to keep raw ingestion **traceable** and analytics **reproducible**.

## Core tables

### `officials`
Represents an individual official (starting with U.S. Congress).

Suggested fields:
- `id` (PK)
- `name` (string)
- `chamber` (enum-like string: House/Senate/Unknown)
- `party` (string, nullable)
- `state` (string, nullable)
- `identifiers` (JSON): e.g., bioguide ID, source-specific IDs
- `created_at`, `updated_at`
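
As a rough illustration of the fields above, here is a minimal SQLite sketch using only the Python standard library. The column types, the `CHECK` constraint on `chamber`, storing `identifiers` as JSON text, and the sample row are all assumptions made for this example, not the project's actual models.

```python
import json
import sqlite3

# Hypothetical DDL: column names follow the suggested fields,
# the SQLite types and defaults are assumptions for this sketch.
OFFICIALS_DDL = """
CREATE TABLE officials (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    chamber     TEXT NOT NULL DEFAULT 'Unknown'
                CHECK (chamber IN ('House', 'Senate', 'Unknown')),
    party       TEXT,
    state       TEXT,
    identifiers TEXT NOT NULL DEFAULT '{}',  -- JSON serialized as text
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(OFFICIALS_DDL)

# Placeholder official; the name and ID are made up for illustration.
conn.execute(
    "INSERT INTO officials (name, chamber, identifiers) VALUES (?, ?, ?)",
    ("Jane Example", "House", json.dumps({"bioguide": "X000000"})),
)

row = conn.execute(
    "SELECT name, chamber, party, identifiers FROM officials"
).fetchone()
official_ids = json.loads(row[3])  # decode the JSON column back into a dict
```

Keeping `identifiers` as a JSON blob lets new source-specific IDs be added without a schema migration, at the cost of weaker indexing.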

### `securities`
Represents a traded instrument.

Suggested fields:
- `id` (PK)
- `ticker` (string, indexed, nullable): some disclosures may be missing a ticker
- `name` (string, nullable)
- `exchange` (string, nullable)
- `sector` (string, nullable)
- `identifiers` (JSON): ISIN, CUSIP, etc. (when available)
- `created_at`, `updated_at`
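
Because `ticker` is nullable, ingestion needs some rule for matching a disclosure to a security row. The sketch below shows one possible get-or-create approach; `normalize_ticker`, `get_or_create_security`, the placeholder values, and the name-based fallback are hypothetical choices for illustration only.

```python
import sqlite3
from typing import Optional


def normalize_ticker(raw: Optional[str]) -> Optional[str]:
    """Upper-case a ticker; treat empty or placeholder values as missing."""
    if raw is None:
        return None
    cleaned = raw.strip().upper()
    # Assumed placeholders; adjust to whatever the source actually emits.
    return None if cleaned in {"", "--", "N/A"} else cleaned


def get_or_create_security(conn, ticker: Optional[str], name: Optional[str]) -> int:
    """Return the id of a matching securities row, inserting one if needed."""
    ticker = normalize_ticker(ticker)
    if ticker is not None:
        row = conn.execute(
            "SELECT id FROM securities WHERE ticker = ?", (ticker,)
        ).fetchone()
    else:
        # No ticker: fall back to an exact name match among ticker-less rows.
        row = conn.execute(
            "SELECT id FROM securities WHERE ticker IS NULL AND name IS ?", (name,)
        ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO securities (ticker, name) VALUES (?, ?)", (ticker, name)
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE securities (id INTEGER PRIMARY KEY, ticker TEXT, name TEXT)")
conn.execute("CREATE INDEX ix_securities_ticker ON securities (ticker)")

aapl_id = get_or_create_security(conn, " aapl ", "Apple Inc.")
same_id = get_or_create_security(conn, "AAPL", None)  # resolves to the same row
bond_id = get_or_create_security(conn, "--", "Example Municipal Bond")
```

Exact name matching is deliberately conservative; fuzzy matching of ticker-less instruments would need its own quality flags.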

### `trades`
One disclosed transaction record.

Suggested fields:
- `id` (PK)
- `official_id` (FK → `officials.id`)
- `security_id` (FK → `securities.id`)
- `source` (string): e.g., `quiver`, `fmp`, `house_disclosure`
- `source_trade_id` (string, nullable): unique per `source` when provided
- `transaction_date` (date, nullable if unknown)
- `filing_date` (date, nullable)
- `side` (enum-like string: BUY/SELL/EXCHANGE/UNKNOWN)
- `value_range_low` (numeric, nullable)
- `value_range_high` (numeric, nullable)
- `amount` (numeric, nullable): number of shares or contracts, if available
- `currency` (string, default USD)
- `quality_flags` (JSON): parse warnings, missing fields, etc.
- `raw` (JSON, optional): raw payload snapshot for traceability
- `created_at`, `updated_at`
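
Disclosures typically report a dollar range rather than an exact amount, which is why the schema carries `value_range_low` and `value_range_high`. The following sketch shows one way such a range string might be split into the two columns while collecting entries for `quality_flags`. The input formats, the function name `parse_value_range`, and the flag names are assumptions for this example.

```python
import re
from decimal import Decimal
from typing import List, Optional, Tuple

# Matches dollar-style amounts such as "$1,001" or "15000".
_AMOUNT = re.compile(r"\$?\s*(\d[\d,]*(?:\.\d+)?)")


def parse_value_range(
    text: Optional[str],
) -> Tuple[Optional[Decimal], Optional[Decimal], List[str]]:
    """Return (low, high, flags) parsed from a free-text value range."""
    flags: List[str] = []
    amounts = [Decimal(a.replace(",", "")) for a in _AMOUNT.findall(text or "")]
    if len(amounts) >= 2:
        # Use the first two amounts, ordered so that low <= high.
        low, high = sorted(amounts[:2])
    elif len(amounts) == 1:
        # Only one bound given (e.g. "Over $50,000,000"): keep it as the low bound.
        low, high = amounts[0], None
        flags.append("single_bound_value_range")
    else:
        low, high = None, None
        flags.append("unparsed_value_range")
    return low, high, flags


typical = parse_value_range("$1,001 - $15,000")
open_ended = parse_value_range("Over $50,000,000")
missing = parse_value_range(None)
```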

Uniqueness strategy (typical):
- unique constraint on (`source`, `source_trade_id`) when `source_trade_id` exists
- otherwise, a best-effort dedupe key built from (`official_id`, `security_id`, `transaction_date`, `side`, `value_range_high`, `filing_date`)
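
The two-tier uniqueness strategy can be sketched with SQLite partial unique indexes. The `dedupe_key` column, the SHA-256 hashing, and the `insert_trade` helper are hypothetical; the schema does not prescribe how the best-effort key is stored.

```python
import hashlib
import sqlite3


def dedupe_key(official_id, security_id, transaction_date, side,
               value_range_high, filing_date) -> str:
    """Hash the best-effort identity fields into a stable hex key."""
    parts = [official_id, security_id, transaction_date, side,
             value_range_high, filing_date]
    canonical = "|".join("" if p is None else str(p) for p in parts)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trades (
    id              INTEGER PRIMARY KEY,
    source          TEXT NOT NULL,
    source_trade_id TEXT,
    dedupe_key      TEXT NOT NULL  -- hypothetical column for the fallback key
);
-- Tier 1: trust the source's own ID when it exists.
CREATE UNIQUE INDEX uq_trades_source_id
    ON trades (source, source_trade_id)
    WHERE source_trade_id IS NOT NULL;
-- Tier 2: fall back to the best-effort key when it does not.
CREATE UNIQUE INDEX uq_trades_dedupe_key
    ON trades (dedupe_key)
    WHERE source_trade_id IS NULL;
""")


def insert_trade(conn, source, source_trade_id, key) -> bool:
    """Insert a trade; return False if it was skipped as a duplicate."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO trades (source, source_trade_id, dedupe_key) "
        "VALUES (?, ?, ?)",
        (source, source_trade_id, key),
    )
    return cur.rowcount == 1


key = dedupe_key(1, 7, "2024-01-10", "BUY", 15000, "2024-02-09")
first = insert_trade(conn, "house_disclosure", None, key)
repeat = insert_trade(conn, "house_disclosure", None, key)  # duplicate, ignored
```

Re-running an ingestion job then becomes idempotent: rows already present are skipped instead of raising.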

### `prices`
Daily OHLCV for a ticker.

Suggested fields:
- `id` (PK) or composite key
- `ticker` (string, indexed)
- `date` (date, indexed)
- `open`, `high`, `low`, `close` (numeric)
- `adj_close` (numeric, nullable)
- `volume` (bigint, nullable)
- `source` (string): e.g., `yfinance`
- `created_at`, `updated_at`

Unique constraint:
- (`ticker`, `date`, `source`)
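
The unique constraint makes price loading naturally idempotent via an upsert. Below is a minimal sketch with SQLite (upsert syntax needs SQLite 3.24 or newer); the `REAL` column types, the omitted timestamp columns, and the sample bar are simplifications for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE prices (
    id        INTEGER PRIMARY KEY,
    ticker    TEXT NOT NULL,
    date      TEXT NOT NULL,   -- ISO date string
    open      REAL, high REAL, low REAL, close REAL,
    adj_close REAL,
    volume    INTEGER,
    source    TEXT NOT NULL,
    UNIQUE (ticker, date, source)
)
""")

# Insert a bar, or overwrite the existing bar for the same (ticker, date, source).
UPSERT = """
INSERT INTO prices (ticker, date, open, high, low, close, adj_close, volume, source)
VALUES (:ticker, :date, :open, :high, :low, :close, :adj_close, :volume, :source)
ON CONFLICT (ticker, date, source) DO UPDATE SET
    open = excluded.open,
    high = excluded.high,
    low = excluded.low,
    close = excluded.close,
    adj_close = excluded.adj_close,
    volume = excluded.volume
"""

# Made-up bar for illustration only.
bar = {
    "ticker": "AAPL", "date": "2024-01-02", "open": 100.0, "high": 102.0,
    "low": 99.0, "close": 101.0, "adj_close": 101.0, "volume": 1000,
    "source": "yfinance",
}
conn.execute(UPSERT, bar)
conn.execute(UPSERT, {**bar, "close": 101.5, "adj_close": 101.5})  # revised bar

n_rows = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
latest_close = conn.execute("SELECT close FROM prices").fetchone()[0]
```

Including `source` in the key means the same day can be stored once per provider, which keeps provider discrepancies visible rather than silently overwritten.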

## Derived tables

### `metrics_trade`
Per-trade derived analytics (computed after prices are loaded).

Suggested fields:
- `id` (PK)
- `trade_id` (FK → `trades.id`, unique)
- forward returns: `ret_1m`, `ret_3m`, `ret_6m`
- benchmark returns: `bm_ret_1m`, `bm_ret_3m`, `bm_ret_6m`
- abnormal returns: `abret_1m`, `abret_3m`, `abret_6m`
- `calc_version` (string): allows recomputation while tracking methodology
- `created_at`, `updated_at`
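
To make the three return families concrete, here is one possible way to compute them. This sketch assumes calendar-day windows (30/91/182 days), simple close-to-close returns, and a market-adjusted abnormal return defined as the trade's return minus the benchmark's return; the actual methodology is whatever `calc_version` records, and the function names are hypothetical.

```python
from bisect import bisect_left
from datetime import date, timedelta
from typing import Optional, Sequence, Tuple

# Assumed window lengths in calendar days.
WINDOW_DAYS = {"1m": 30, "3m": 91, "6m": 182}

PriceSeries = Sequence[Tuple[date, float]]  # (date, close), sorted by date


def forward_return(series: PriceSeries, start: date, days: int) -> Optional[float]:
    """Simple return from the first close on/after `start`
    to the first close on/after `start + days`."""
    dates = [d for d, _ in series]
    i = bisect_left(dates, start)
    j = bisect_left(dates, start + timedelta(days=days))
    if i >= len(series) or j >= len(series):
        return None  # not enough price history yet
    return series[j][1] / series[i][1] - 1.0


def abnormal_return(ret: Optional[float], bm_ret: Optional[float]) -> Optional[float]:
    """Market-adjusted return: trade return minus benchmark return."""
    if ret is None or bm_ret is None:
        return None
    return ret - bm_ret


# Made-up closes for a stock and a benchmark.
stock = [(date(2024, 1, 2), 100.0), (date(2024, 2, 1), 110.0)]
bench = [(date(2024, 1, 2), 400.0), (date(2024, 2, 1), 420.0)]

trade_day = date(2024, 1, 2)
ret_1m = forward_return(stock, trade_day, WINDOW_DAYS["1m"])
bm_ret_1m = forward_return(bench, trade_day, WINDOW_DAYS["1m"])
abret_1m = abnormal_return(ret_1m, bm_ret_1m)
ret_3m = forward_return(stock, trade_day, WINDOW_DAYS["3m"])  # None: window not covered
```

Returning `None` for incomplete windows keeps recent trades in the table without inventing returns for them.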

### `metrics_official`
Aggregate metrics per official.

Suggested fields:
- `id` (PK)
- `official_id` (FK → `officials.id`, unique)
- `n_trades`, `n_buys`, `n_sells`
- average/median abnormal returns for buys (by window) + sample sizes
- `cluster_label` (nullable)
- `flags` (JSON): descriptive risk/ethics flags + supporting metrics
- `calc_version`
- `created_at`, `updated_at`
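
The per-official aggregates can be derived from trade-level rows roughly as follows. This is a sketch: the input dict shape, the output key names (`buy_<window>_mean` and so on), and the sample values are assumptions, not the project's column names or data.

```python
from statistics import mean, median
from typing import Dict, List, Optional


def summarize_official(trades: List[Dict], window: str = "abret_3m") -> Dict[str, Optional[float]]:
    """Aggregate one official's trades; abnormal-return stats cover buys only."""
    buys = [t for t in trades if t["side"] == "BUY"]
    sells = [t for t in trades if t["side"] == "SELL"]
    # Skip buys whose window has not been computed yet.
    values = [t[window] for t in buys if t.get(window) is not None]
    return {
        "n_trades": len(trades),
        "n_buys": len(buys),
        "n_sells": len(sells),
        f"buy_{window}_mean": mean(values) if values else None,
        f"buy_{window}_median": median(values) if values else None,
        f"buy_{window}_n": len(values),  # sample size behind the averages
    }


# Made-up trade metrics for a single official.
sample = [
    {"side": "BUY", "abret_3m": 0.10},
    {"side": "BUY", "abret_3m": -0.02},
    {"side": "BUY", "abret_3m": 0.03},
    {"side": "BUY", "abret_3m": None},  # too recent for a 3-month window
    {"side": "SELL", "abret_3m": 0.50},
]
summary = summarize_official(sample)
```

Reporting the sample size next to each average makes it easier to avoid over-reading results from officials with only a handful of trades.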

## Notes on time and lags
- Disclosures often have a filing delay; keep **both** `transaction_date` and `filing_date`.
- When computing event windows, prefer windows relative to `transaction_date`, but also compute and record the **disclosure lag** (`filing_date` minus `transaction_date`) as a descriptive attribute.
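
A small sketch of the disclosure-lag calculation, taking the lag as `filing_date` minus `transaction_date` in days. The function name and the `negative_disclosure_lag` flag are hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple


def disclosure_lag_days(
    transaction_date: Optional[date], filing_date: Optional[date]
) -> Tuple[Optional[int], List[str]]:
    """Return (lag in days, quality flags); lag is None if either date is missing."""
    if transaction_date is None or filing_date is None:
        return None, ["missing_date"]
    lag = (filing_date - transaction_date).days
    # A filing dated before the transaction usually signals a data problem.
    flags = ["negative_disclosure_lag"] if lag < 0 else []
    return lag, flags


lag, lag_flags = disclosure_lag_days(date(2024, 1, 10), date(2024, 2, 9))
```

Storing the lag alongside the trade makes it possible to compare returns measured from the transaction date with what was knowable at the filing date.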