# Data sources (public) + limitations

POTE uses only **lawfully available public data**. This project is for **private research** and produces **descriptive analytics**, not investment advice.

## Candidate sources (Phase 1)

### U.S. Congress trading disclosures
- **QuiverQuant (API)**: provides congressional trading data (availability depends on plan/keys).
- **Financial Modeling Prep (FMP)**: provides endpoints related to congressional trading and other market metadata (availability depends on plan/keys).
- **Official disclosure sources** (future): House/Senate disclosure filings where accessible and lawful to process.

POTE will treat source data as “best effort” and store:
- `source` (where it came from)
- `source_trade_id` (if provided)
- `raw` payload snapshot (optional, for traceability)
- `quality_flags` describing parse/coverage issues
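
As a rough illustration of the fields above, a normalized record could look like the sketch below. `TradeRecord`, `flag_quality`, the extra `ticker` and `transaction_date` fields, and the flag names are all hypothetical; this is not the project's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Any, Optional


@dataclass
class TradeRecord:
    """Hypothetical normalized trade row (illustrative only)."""

    source: str                               # where the row came from
    ticker: Optional[str] = None              # assumed field, may be absent
    transaction_date: Optional[date] = None   # assumed field, may be absent
    source_trade_id: Optional[str] = None     # only if the provider supplies one
    raw: Optional[dict[str, Any]] = None      # optional payload snapshot
    quality_flags: list[str] = field(default_factory=list)


def flag_quality(record: TradeRecord) -> TradeRecord:
    """Append flags for issues detectable from the record itself."""
    if not record.ticker:
        record.quality_flags.append("missing_ticker")
    if record.source_trade_id is None:
        record.quality_flags.append("no_source_id")
    if record.transaction_date is None:
        record.quality_flags.append("missing_transaction_date")
    return record
```

Keeping the flags on the record, rather than dropping incomplete rows, lets later analysis decide which issues are tolerable.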

### Daily price data
- **yfinance** (a Yahoo Finance wrapper) for daily OHLCV data (research use; subject to availability and terms of use).
- Alternative provider adapters can be added later (e.g., Stooq, Alpha Vantage, or Polygon, as configured by the user).
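
One possible shape for a provider adapter is sketched below, assuming a small common interface that every price source implements. `DailyBar`, `PriceProvider`, `InMemoryProvider`, and `load_prices` are hypothetical names, and the in-memory provider is only a stand-in for a real adapter such as one built on yfinance.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Protocol


@dataclass(frozen=True)
class DailyBar:
    day: date
    open: float
    high: float
    low: float
    close: float
    volume: int


class PriceProvider(Protocol):
    """Hypothetical interface each price adapter would satisfy."""

    def daily_bars(self, ticker: str, start: date, end: date) -> list[DailyBar]:
        ...


class InMemoryProvider:
    """Stand-in provider; a real adapter would query its data source here."""

    def __init__(self, bars: dict[str, list[DailyBar]]) -> None:
        self._bars = bars

    def daily_bars(self, ticker: str, start: date, end: date) -> list[DailyBar]:
        # Unknown tickers yield an empty list rather than an error.
        return [b for b in self._bars.get(ticker, []) if start <= b.day <= end]


def load_prices(provider: PriceProvider, ticker: str, start: date, end: date) -> list[DailyBar]:
    """Fetch bars through any provider and return them in date order."""
    return sorted(provider.daily_bars(ticker, start, end), key=lambda b: b.day)
```

With this shape, swapping providers means passing a different object to `load_prices`; the calling code stays the same.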

## Known limitations / pitfalls

### Disclosure quality and ambiguity
- **Tickers may be missing or wrong**; some disclosures list company names only or broad funds.
- Transactions may be **value ranges** rather than exact amounts.
- Some entries may reflect **family accounts** or managed accounts depending on disclosure details.
- Duplicate records can occur across sources; deduplication is probabilistic when no unique ID exists.
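
Two of the points above, value ranges and deduplication without a unique ID, can be sketched as follows. The range formats (`$1,001 - $15,000`, `Over $50,000,000`), the function names, and the choice of fields in the match key are assumptions for illustration. Equal keys only mark candidate duplicates; they do not prove two rows describe the same trade.

```python
from __future__ import annotations

import hashlib
import re
from typing import Optional

# A number that starts with a digit and may contain thousands separators.
_AMOUNT = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_amount_range(text: str) -> tuple[Optional[float], Optional[float]]:
    """Parse a disclosed amount into (low, high); None marks an unknown bound."""
    values = [float(m.replace(",", "")) for m in _AMOUNT.findall(text)]
    if not values:
        return (None, None)
    if text.strip().lower().startswith("over"):
        return (values[0], None)          # open-ended upper bound
    if len(values) == 1:
        return (values[0], values[0])     # exact amount
    low, high = sorted(values[:2])
    return (low, high)


def fallback_match_key(person: str, ticker: str, transaction_date: str,
                       side: str, amount_text: str) -> str:
    """Build a heuristic key for rows that lack a source_trade_id."""
    low, high = parse_amount_range(amount_text)
    parts = [
        person.strip().lower(),
        ticker.strip().upper(),
        transaction_date.strip(),
        side.strip().lower(),
        str(low),
        str(high),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]
```

Rows that share a key across sources are worth reviewing or merging under a quality flag rather than silently dropping.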

### Timing and “lag”
- Trades are often disclosed **after** the transaction date. Any analysis must account for:
  - transaction date
  - filing date
  - **disclosure lag** (filing date minus transaction date, in days)
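
A minimal sketch of the lag calculation, using hypothetical helper names. Treating the filing date as the earliest date a trade was publicly known is an assumption of this sketch; it is one way to keep an analysis from using information before it was disclosed.

```python
from datetime import date


def disclosure_lag_days(transaction_date: date, filing_date: date) -> int:
    """Days between the trade and its filing; negative values suggest bad data."""
    return (filing_date - transaction_date).days


def publicly_known_from(transaction_date: date, filing_date: date) -> date:
    """Earliest date the trade could have been known from the disclosure."""
    return max(transaction_date, filing_date)
```

A negative lag is a natural candidate for a `quality_flags` entry, since a filing should not precede the trade it reports.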

### Survivorship / coverage
- Some data providers may have incomplete histories or change coverage over time.
- Price history may be missing for delisted tickers or distorted by corporate actions (for example splits or ticker changes).

### Interpretation risks
- Correlation is not causation; return outcomes do not imply intent or information access.
- High abnormal returns can occur by chance; small samples are especially noisy.
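
One way to make the small-sample point concrete is to report a standard error alongside any mean return. The sketch below uses made-up numbers and a hypothetical helper, `mean_and_stderr`; it is not a full significance test and assumes the returns are independent.

```python
import math
import statistics


def mean_and_stderr(returns):
    """Return (mean, standard error of the mean) for a list of returns."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(returns)
    stderr = statistics.stdev(returns) / math.sqrt(n)
    return mean, stderr


# Made-up abnormal returns for four trades (illustrative, not real data).
sample = [0.10, -0.05, 0.20, -0.01]
mean, stderr = mean_and_stderr(sample)
print(f"mean={mean:.3f} stderr={stderr:.3f}")
```

Here the mean is about 0.06 with a standard error of similar size, so a seemingly strong average from four trades is hard to distinguish from zero.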

## Source governance in this repo
- No scraping that violates a source's terms of service or access controls.
- No bypassing paywalls, authentication, or restrictions.
- When adding a new source, document:
  - endpoint/coverage
  - required API keys / limits
  - normalization mapping to the internal schema
  - known quirks
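
The normalization mapping for a new source could be documented as data, as in the sketch below. The provider field names, the internal field names, and `normalize` are hypothetical and do not describe any real provider's payload or this repo's schema.

```python
from typing import Any

# Hypothetical provider field -> hypothetical internal field.
EXAMPLE_FIELD_MAP = {
    "Representative": "person",
    "Ticker": "ticker",
    "TransactionDate": "transaction_date",
    "ReportDate": "filing_date",
    "Range": "amount_text",
}


def normalize(payload: dict, field_map: dict, source: str) -> dict:
    """Map one provider payload onto internal field names, flagging gaps."""
    record: dict = {"source": source, "raw": dict(payload), "quality_flags": []}
    for src_key, dst_key in field_map.items():
        value: Any = payload.get(src_key)
        if value is None or value == "":
            record[dst_key] = None
            record["quality_flags"].append(f"missing_{dst_key}")
        else:
            record[dst_key] = value
    return record
```

Keeping the mapping in one dictionary per source makes a provider's quirks reviewable in a single place.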