# Data sources (public) + limitations

POTE only uses **lawfully available public data**. This project is for **private research** and produces **descriptive analytics** (not investment advice).

## Candidate sources (Phase 1)

### U.S. Congress trading disclosures

- **QuiverQuant (API)**: provides congressional trading data (availability depends on plan/keys).
- **Financial Modeling Prep (FMP)**: provides endpoints related to congressional trading and other market metadata (availability depends on plan/keys).
- **Official disclosure sources** (future): House/Senate disclosure filings where accessible and lawful to process.

POTE will treat source data as “best effort” and store:

- `source` (where it came from)
- `source_trade_id` (if provided)
- `raw` payload snapshot (optional, for traceability)
- `quality_flags` describing parse/coverage issues

### Daily price data

- **yfinance** (Yahoo Finance wrapper) for daily OHLCV (research use; subject to availability and terms).
- Alternative provider adapters can be added later (e.g., Stooq, AlphaVantage, Polygon, etc. as configured by the user).

## Known limitations / pitfalls

### Disclosure quality and ambiguity

- **Tickers may be missing or wrong**; some disclosures list company names only or broad funds.
- Transactions may be **value ranges** rather than exact amounts.
- Some entries may reflect **family accounts** or managed accounts depending on disclosure details.
- Duplicate records can occur across sources; deduplication is probabilistic when no unique ID exists.

### Timing and “lag”

- Trades are often disclosed **after** the transaction date. Any analysis must account for:
  - transaction date
  - filing date
  - **disclosure lag** (filing - transaction)

### Survivorship / coverage

- Some data providers may have incomplete histories or change coverage over time.
- Price history may be missing for delisted tickers or corporate actions.

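
The disclosure lag listed under “Timing and lag” (filing date minus transaction date) can be sketched as a small helper. This is a minimal illustration, not code from this repo: the function name `disclosure_lag_days` and the choice to return `None` when a date is missing are assumptions.

```python
from datetime import date
from typing import Optional


def disclosure_lag_days(
    transaction_date: Optional[date],
    filing_date: Optional[date],
) -> Optional[int]:
    """Return the disclosure lag in whole days (filing - transaction).

    Hypothetical helper: returns None when either date is unavailable,
    so callers can tell "unknown lag" apart from a lag of zero.
    """
    if transaction_date is None or filing_date is None:
        return None
    return (filing_date - transaction_date).days


# Example: a trade on 2024-01-10 that was filed on 2024-02-09.
lag = disclosure_lag_days(date(2024, 1, 10), date(2024, 2, 9))
print(lag)
```

A negative result would mean the filing date precedes the transaction date, which most likely indicates a parsing or source error; one option is to record that in `quality_flags` rather than drop the row.
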
### Interpretation risks

- Correlation is not causation; return outcomes do not imply intent or information access.
- High abnormal returns can occur by chance; small samples are especially noisy.

## Source governance in this repo

- No scraping that violates terms or access controls.
- No bypassing paywalls, authentication, or restrictions.
- When adding a new source, document:
  - endpoint/coverage
  - required API keys / limits
  - normalization mapping to the internal schema
  - known quirks
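
As a rough sketch of what a “normalization mapping to the internal schema” might look like, the example below maps one provider payload onto a record carrying the stored fields listed earlier (`source`, `source_trade_id`, `raw`, `quality_flags`). Everything else is an assumption made for illustration: the `NormalizedTrade` class, the `normalize` function, the payload keys (`id`, `ticker`, `transaction_date`, `filing_date`), and the flag names do not come from this repo or from any real provider API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Optional


@dataclass
class NormalizedTrade:
    # Fields named in this document.
    source: str
    source_trade_id: Optional[str] = None
    raw: Optional[dict[str, Any]] = None
    quality_flags: list[str] = field(default_factory=list)
    # Hypothetical fields, added only for this sketch.
    ticker: Optional[str] = None
    transaction_date: Optional[date] = None
    filing_date: Optional[date] = None


def _parse_date(payload: dict[str, Any], key: str, flags: list[str]) -> Optional[date]:
    """Parse an ISO date (YYYY-MM-DD); record a flag instead of raising."""
    value = payload.get(key)
    if not value:
        flags.append(f"missing_{key}")
        return None
    try:
        return date.fromisoformat(str(value))
    except ValueError:
        flags.append(f"unparseable_{key}")
        return None


def normalize(source: str, payload: dict[str, Any]) -> NormalizedTrade:
    """Map a (hypothetical) provider payload onto the internal record."""
    flags: list[str] = []
    # Assumes the ticker, when present, is a string.
    ticker = (payload.get("ticker") or "").strip().upper() or None
    if ticker is None:
        flags.append("missing_ticker")
    trade_id = payload.get("id")
    return NormalizedTrade(
        source=source,
        source_trade_id=str(trade_id) if trade_id is not None else None,
        raw=dict(payload),  # snapshot kept for traceability
        quality_flags=flags,
        ticker=ticker,
        transaction_date=_parse_date(payload, "transaction_date", flags),
        filing_date=_parse_date(payload, "filing_date", flags),
    )
```

Recording problems as flags instead of raising keeps imperfect rows available for inspection, which fits the “best effort” treatment of source data described above.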