# Data sources (public) + limitations

POTE only uses **lawfully available public data**. This project is for **private research** and produces **descriptive analytics** (not investment advice).

## Candidate sources (Phase 1)

### U.S. Congress trading disclosures

- **QuiverQuant (API)**: provides congressional trading data (availability depends on plan/keys).
- **Financial Modeling Prep (FMP)**: provides endpoints related to congressional trading and other market metadata (availability depends on plan/keys).
- **Official disclosure sources** (future): House/Senate disclosure filings where accessible and lawful to process.

POTE will treat source data as “best effort” and store:

- `source` (where it came from)
- `source_trade_id` (if provided)
- `raw` payload snapshot (optional, for traceability)
- `quality_flags` describing parse/coverage issues

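
The provenance fields listed above could be carried on a normalized record roughly as follows. This is a minimal sketch, not the project's actual model: the `TradeRecord` class, the `normalize` helper, the payload keys `id` and `ticker`, and the flag names are all hypothetical; only the four field names come from the list above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TradeRecord:
    """Hypothetical normalized trade row carrying the provenance fields."""
    source: str                                  # where the record came from
    source_trade_id: Optional[str] = None        # provider's ID, if it supplies one
    raw: Optional[dict[str, Any]] = None         # optional payload snapshot for traceability
    quality_flags: list[str] = field(default_factory=list)  # parse/coverage issues


def normalize(source: str, payload: dict[str, Any], keep_raw: bool = True) -> TradeRecord:
    """Build a TradeRecord from one provider payload (assumed keys: 'id', 'ticker')."""
    flags: list[str] = []
    if not payload.get("ticker"):
        flags.append("missing_ticker")   # hypothetical flag name
    trade_id = payload.get("id")
    if trade_id is None:
        flags.append("no_source_id")     # hypothetical flag name
    return TradeRecord(
        source=source,
        source_trade_id=None if trade_id is None else str(trade_id),
        raw=dict(payload) if keep_raw else None,
        quality_flags=flags,
    )
```

Keeping `raw` optional lets a deployment trade storage for traceability without changing the rest of the record.
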
### Daily price data

- **yfinance** (Yahoo Finance wrapper) for daily OHLCV (research use; subject to availability and terms).
- Alternative provider adapters can be added later (e.g., Stooq, Alpha Vantage, or Polygon, as configured by the user).

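
One way to keep price providers swappable is a small adapter interface. The sketch below is an assumption about how such adapters might look, not the repo's actual design: `Bar`, `PriceProvider`, `FixtureProvider`, and `load_closes` are hypothetical names, and the in-memory provider stands in for a real adapter such as one wrapping yfinance.

```python
from dataclasses import dataclass
from datetime import date
from typing import Protocol


@dataclass(frozen=True)
class Bar:
    """One daily OHLCV row (hypothetical internal shape)."""
    day: date
    open: float
    high: float
    low: float
    close: float
    volume: int


class PriceProvider(Protocol):
    """Anything that can return daily bars for a ticker and date range."""
    def daily_bars(self, ticker: str, start: date, end: date) -> list[Bar]: ...


class FixtureProvider:
    """In-memory provider, useful for tests or offline runs."""
    def __init__(self, bars_by_ticker: dict[str, list[Bar]]) -> None:
        self._bars = bars_by_ticker

    def daily_bars(self, ticker: str, start: date, end: date) -> list[Bar]:
        # Unknown tickers yield an empty list rather than raising.
        return [b for b in self._bars.get(ticker, []) if start <= b.day <= end]


def load_closes(provider: PriceProvider, ticker: str, start: date, end: date) -> dict[date, float]:
    """Provider-agnostic loader: depends only on the PriceProvider protocol."""
    return {b.day: b.close for b in provider.daily_bars(ticker, start, end)}
```

Because `load_closes` depends only on the protocol, a new provider needs just a `daily_bars` method with the same signature.
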
## Known limitations / pitfalls

### Disclosure quality and ambiguity

- **Tickers may be missing or wrong**; some disclosures list company names only or broad funds.
- Transactions may be **value ranges** rather than exact amounts.
- Some entries may reflect **family accounts** or managed accounts depending on disclosure details.
- Duplicate records can occur across sources; deduplication is probabilistic when no unique ID exists.

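
Value ranges can be kept as explicit lower and upper bounds instead of being collapsed to a single number. The following is a rough sketch that assumes range strings shaped like `$1,001 - $15,000` or `Over $50,000,000`; the exact formats vary by source, and `parse_amount_range` is a hypothetical helper name.

```python
import re
from typing import Optional

# A number with optional thousands separators and decimals, optionally prefixed by "$".
_AMOUNT = re.compile(r"\$?\s*(\d[\d,]*(?:\.\d+)?)")


def parse_amount_range(text: str) -> tuple[Optional[float], Optional[float]]:
    """Return (low, high). high is None for open-ended ranges; both are None if unparseable."""
    values = [float(m.replace(",", "")) for m in _AMOUNT.findall(text)]
    if not values:
        return (None, None)
    if len(values) == 1:
        stripped = text.strip().lower()
        # A single number with "over", a trailing "+", or a dangling "-" has no known upper bound.
        if stripped.startswith("over") or stripped.endswith(("+", "-")):
            return (values[0], None)
        return (values[0], values[0])  # exact amount
    return (min(values), max(values))
```

Storing both bounds keeps the uncertainty visible to later analysis; an unparseable string is a natural candidate for a `quality_flags` entry.
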
### Timing and “lag”

- Trades are often disclosed **after** the transaction date. Any analysis must account for:
  - transaction date
  - filing date
  - **disclosure lag** (filing - transaction)

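
The lag defined above is a plain date difference. A minimal sketch, with hypothetical function names, that also shows one way to avoid look-ahead bias by anchoring analysis on the date the trade became public:

```python
from datetime import date


def disclosure_lag_days(transaction_date: date, filing_date: date) -> int:
    """Disclosure lag in calendar days: filing date minus transaction date."""
    return (filing_date - transaction_date).days


def first_public_date(transaction_date: date, filing_date: date) -> date:
    """Earliest date the trade could be known publicly.

    Taking the later of the two dates guards against records where the
    filing date precedes the transaction date (a data-quality problem).
    """
    return max(transaction_date, filing_date)
```

A negative lag is worth flagging rather than silently accepting, since it usually indicates a parsing or source error.
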
### Survivorship / coverage

- Some data providers may have incomplete histories or change coverage over time.
- Price history may be missing for delisted tickers or incomplete around corporate actions (e.g., splits, mergers, ticker changes).

### Interpretation risks

- Correlation is not causation; return outcomes do not imply intent or information access.
- High abnormal returns can occur by chance; small samples are especially noisy.

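
To make the small-sample point concrete, the standard error of a mean return shrinks only with the square root of the sample size. This is an illustrative sketch using the textbook formula, not a description of POTE's analytics; `mean_and_standard_error` is a hypothetical helper.

```python
import math
import statistics


def mean_and_standard_error(returns: list[float]) -> tuple[float, float]:
    """Sample mean and its standard error (sample stdev / sqrt(n))."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two observations to estimate a standard error")
    mean = statistics.fmean(returns)
    se = statistics.stdev(returns) / math.sqrt(n)
    return mean, se
```

With only a handful of trades the standard error can easily be as large as the mean itself, so an apparently high average abnormal return may not be distinguishable from zero.
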
## Source governance in this repo

- No scraping that violates terms or access controls.
- No bypassing paywalls, authentication, or restrictions.
- When adding a new source, document:
  - endpoint/coverage
  - required API keys / limits
  - normalization mapping to the internal schema
  - known quirks

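
The documentation checklist above could also be enforced mechanically. The sketch below is one possible approach under assumed names: `SourceDoc` and `missing_sections` are hypothetical and do not correspond to anything confirmed in the repo.

```python
from dataclasses import dataclass, fields


@dataclass
class SourceDoc:
    """Hypothetical record mirroring the four items to document per source."""
    name: str
    endpoint_coverage: str = ""
    api_keys_and_limits: str = ""
    normalization_mapping: str = ""
    known_quirks: str = ""


def missing_sections(doc: SourceDoc) -> list[str]:
    """Names of fields that are still empty or whitespace-only."""
    return [f.name for f in fields(doc) if not getattr(doc, f.name).strip()]
```

A check like this could run in CI so that a new source adapter cannot be merged with undocumented sections.
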