POTE/docs/03_data_sources.md
ilia 204cd0e75b Initial commit: POTE Phase 1 complete
2025-12-14 20:45:34 -05:00

Data sources (public) + limitations

POTE uses only lawfully available public data. This project is for private research and produces descriptive analytics (not investment advice).

Candidate sources (Phase 1)

U.S. Congress trading disclosures

  • QuiverQuant (API): provides congressional trading data (availability depends on plan/keys).
  • Financial Modeling Prep (FMP): provides endpoints related to congressional trading and other market metadata (availability depends on plan/keys).
  • Official disclosure sources (future): House/Senate disclosure filings where accessible and lawful to process.

POTE will treat source data as “best effort” and store:

  • source (where it came from)
  • source_trade_id (if provided)
  • raw payload snapshot (optional, for traceability)
  • quality_flags describing parse/coverage issues
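The stored fields above could be modeled roughly as follows. This is a minimal sketch, not the project's actual schema: the `TradeRecord` class, the `from_raw` helper, the raw payload keys (`id`, `ticker`, `transaction_date`), and the flag names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Optional


@dataclass
class TradeRecord:
    source: str                                   # where the record came from
    ticker: Optional[str]
    transaction_date: Optional[date]
    source_trade_id: Optional[str] = None         # only if the provider supplies one
    raw_payload: Optional[dict[str, Any]] = None  # optional snapshot for traceability
    quality_flags: list[str] = field(default_factory=list)


def from_raw(source: str, raw: dict[str, Any], keep_raw: bool = True) -> TradeRecord:
    """Build a record from a provider payload, flagging problems instead of raising."""
    flags: list[str] = []

    # Hypothetical key names; real providers will differ.
    ticker = (raw.get("ticker") or "").strip().upper() or None
    if ticker is None:
        flags.append("missing_ticker")

    tx_date: Optional[date] = None
    try:
        tx_date = date.fromisoformat(raw.get("transaction_date", ""))
    except (TypeError, ValueError):
        flags.append("unparseable_transaction_date")

    raw_id = raw.get("id")
    return TradeRecord(
        source=source,
        ticker=ticker,
        transaction_date=tx_date,
        source_trade_id=str(raw_id) if raw_id is not None else None,
        raw_payload=dict(raw) if keep_raw else None,
        quality_flags=flags,
    )
```

Recording problems as flags rather than rejecting the row keeps incomplete disclosures available for later review, which matches the "best effort" stance.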

Daily price data

  • yfinance (Yahoo Finance wrapper) for daily OHLCV (research use; subject to availability and terms).
  • Alternative provider adapters can be added later (e.g., Stooq, Alpha Vantage, or Polygon, as configured by the user).
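One way to keep price providers swappable is a small adapter interface. The sketch below is an assumption about how that could look, not the repo's actual design: `Bar`, `PriceProvider`, `InMemoryProvider`, and `load_prices` are hypothetical names. A yfinance-backed adapter would implement the same `daily_bars` method; it is omitted here so the example runs without network access.

```python
from dataclasses import dataclass
from datetime import date
from typing import Protocol


@dataclass(frozen=True)
class Bar:
    day: date
    open: float
    high: float
    low: float
    close: float
    volume: int


class PriceProvider(Protocol):
    """Hypothetical adapter interface; each data source implements daily_bars."""

    def daily_bars(self, ticker: str, start: date, end: date) -> list[Bar]: ...


class InMemoryProvider:
    """Fixture-backed provider, useful for tests that must run offline."""

    def __init__(self, bars_by_ticker: dict[str, list[Bar]]) -> None:
        self._bars = bars_by_ticker

    def daily_bars(self, ticker: str, start: date, end: date) -> list[Bar]:
        # Inclusive date filter over the stored bars.
        return [b for b in self._bars.get(ticker, []) if start <= b.day <= end]


def load_prices(providers: list[PriceProvider], ticker: str,
                start: date, end: date) -> list[Bar]:
    """Try providers in order and return the first non-empty result."""
    for provider in providers:
        bars = provider.daily_bars(ticker, start, end)
        if bars:
            return bars
    return []  # no provider had data for this ticker and window
```

An empty result is returned rather than an exception so that callers can treat missing price history as a coverage gap.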

Known limitations / pitfalls

Disclosure quality and ambiguity

  • Tickers may be missing or wrong; some disclosures list company names only or broad funds.
  • Transactions may be value ranges rather than exact amounts.
  • Some entries may reflect family accounts or managed accounts depending on disclosure details.
  • Duplicate records can occur across sources; deduplication is probabilistic when no unique ID exists.
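Two of the pitfalls above, value ranges and duplicates without a unique ID, can be illustrated with a short sketch. Everything here is hypothetical: the range string format, the function names, and the choice of fields in the fingerprint are assumptions. The fingerprint is a simple exact-match heuristic on normalized fields, not a probabilistic matcher; it only surfaces candidate duplicates for further checks.

```python
import hashlib
import re
from typing import Optional

_NUMBER = re.compile(r"\d[\d,]*")


def parse_amount_range(text: str) -> Optional[tuple[int, Optional[int]]]:
    """Parse a disclosed value range into (low, high).

    A single number is treated as a lower bound with unknown upper bound.
    Anything else (no numbers, three or more numbers, low > high) returns None
    so the caller can record a quality flag.
    """
    numbers = [int(m.replace(",", "")) for m in _NUMBER.findall(text)]
    if len(numbers) == 1:
        return numbers[0], None
    if len(numbers) == 2 and numbers[0] <= numbers[1]:
        return numbers[0], numbers[1]
    return None


def trade_fingerprint(person: str, ticker: Optional[str],
                      transaction_date: str, side: str, amount_text: str) -> str:
    """Hash normalized fields so near-identical rows from two sources collide."""
    parts = [
        " ".join(person.lower().split()),   # collapse whitespace, ignore case
        (ticker or "").strip().upper(),
        transaction_date.strip(),
        side.strip().lower(),
        str(parse_amount_range(amount_text)),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```

Two rows with the same fingerprint are not guaranteed to be the same trade (a person can make two identical trades on one day), so a match is better treated as a prompt for review than as proof.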

Timing and “lag”

  • Trades are often disclosed after the transaction date. Any analysis must account for:
    • transaction date
    • filing date
    • disclosure lag (filing date minus transaction date)
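The lag described above is a plain date difference. A minimal sketch, assuming both dates have already been parsed into `datetime.date` values (the function name is hypothetical):

```python
from datetime import date
from typing import Optional


def disclosure_lag_days(transaction_date: Optional[date],
                        filing_date: Optional[date]) -> Optional[int]:
    """Return filing date minus transaction date in days, or None if either is missing."""
    if transaction_date is None or filing_date is None:
        return None
    return (filing_date - transaction_date).days


# Example: traded on 5 January 2024, filed on 14 February 2024.
lag = disclosure_lag_days(date(2024, 1, 5), date(2024, 2, 14))  # 40
```

A negative result likely indicates swapped or mis-parsed dates and may be worth recording as a quality flag rather than silently correcting.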

Survivorship / coverage

  • Some data providers may have incomplete histories or change coverage over time.
  • Price history may be missing for delisted tickers, or incomplete or distorted around corporate actions (e.g., splits, mergers, ticker changes).

Interpretation risks

  • Correlation is not causation; return outcomes do not imply intent or information access.
  • High abnormal returns can occur by chance; small samples are especially noisy.
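To make the small-sample point concrete, the standard error of a mean return shrinks only with the square root of the sample size. The sketch below is illustrative, not part of POTE; it assumes the observations are independent, which real trade returns often are not, so the true uncertainty may be larger.

```python
import math
import statistics


def mean_and_standard_error(returns: list[float]) -> tuple[float, float]:
    """Return (mean, standard error of the mean) for a list of returns.

    Uses the sample standard deviation divided by sqrt(n); assumes
    independent observations (an assumption, not a property of the data).
    """
    if len(returns) < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(returns)
    standard_error = statistics.stdev(returns) / math.sqrt(len(returns))
    return mean, standard_error


# Same spread of outcomes, very different certainty about the mean.
_, se_small = mean_and_standard_error([0.10, -0.10])        # n = 2
_, se_large = mean_and_standard_error([0.10, -0.10] * 100)  # n = 200
```

With only two observations the standard error is as large as the individual returns themselves, so an apparent "abnormal" average carries little information.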

Source governance in this repo

  • No scraping that violates terms or access controls.
  • No bypassing paywalls, authentication, or restrictions.
  • When adding a new source, document:
    • endpoint/coverage
    • required API keys / limits
    • normalization mapping to the internal schema
    • known quirks
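The documentation checklist above could also be captured in code next to each adapter. This is a sketch under stated assumptions: `SourceSpec`, `normalize`, the field names, and the placeholder endpoint are hypothetical and do not describe any real provider's API.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SourceSpec:
    name: str
    endpoint: str                 # endpoint or dataset location
    coverage: str                 # what the source covers and over which period
    api_key_env: Optional[str]    # environment variable holding the key, if any
    rate_limit_note: str          # known limits, in plain words
    field_map: dict[str, str]     # internal field name -> source field name
    quirks: tuple[str, ...] = ()  # known oddities worth documenting


def normalize(spec: SourceSpec, raw: dict[str, Any]) -> dict[str, Any]:
    """Project a raw payload onto internal field names; missing fields become None."""
    return {internal: raw.get(external) for internal, external in spec.field_map.items()}


# Placeholder values only; not a real endpoint or provider schema.
example_spec = SourceSpec(
    name="example_source",
    endpoint="https://example.invalid/trades",
    coverage="unknown",
    api_key_env="EXAMPLE_API_KEY",
    rate_limit_note="unknown",
    field_map={"ticker": "Symbol", "transaction_date": "TxDate"},
    quirks=("dates arrive as strings",),
)
row = normalize(example_spec, {"Symbol": "MSFT", "TxDate": "2024-01-05", "Extra": 1})
```

Keeping the mapping as data makes it easy to review what each source contributes to the internal schema and which fields it cannot supply.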