
# Free Testing: Data Sources & Sample Data Strategies

Your question: "How can we test for free?"

Great question! Here are several strategies for testing the full pipeline without paid API keys:


## Strategy 1: Mock/Fixture Data (Current Approach)

What we already have:

- `tests/conftest.py` creates an in-memory SQLite DB with sample officials, securities, and trades
- Unit tests use mocked yfinance responses (see `test_price_loader.py`)
- Cost: $0
- Coverage: models, DB logic, ETL transforms, analytics calculations

**Pros:** fast, deterministic, no network, exercises edge cases.
**Cons:** doesn't validate real API behavior or data quality.
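
For instance, a minimal sketch of the mocked-price-client pattern (the function and client names here are illustrative, not POTE's actual API):

```python
# Illustrative sketch of unit-testing a price fetch with a mocked client.
# `fetch_latest_close` and the injected client are hypothetical names.
from unittest.mock import MagicMock


def fetch_latest_close(ticker: str, client) -> float:
    """Return the most recent closing price via an injected price client."""
    history = client.history(ticker, period="1d")
    return history[-1]["close"]


def test_fetch_latest_close_with_mock():
    fake = MagicMock()
    fake.history.return_value = [{"close": 189.25}]
    assert fetch_latest_close("AAPL", fake) == 189.25
    fake.history.assert_called_once_with("AAPL", period="1d")
```

Because the client is injected, the test never touches the network and runs in milliseconds.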


## Strategy 2: Free Public Congressional Trade Data

### Option A: House Stock Watcher (Community Project)

- URL: https://housestockwatcher.com/
- Format: web scraping (no official API, but an RSS feed is available)
- Data: real-time House trades (its sister project, Senate Stock Watcher, covers the Senate)
- License: public domain (scraped from official disclosures)
- Cost: $0
- How to use:
  1. Scrape the RSS feed or JSON data from their GitHub repo
  2. Parse it into our trades schema
  3. Use it as an integration test fixture

Example:

```python
# Unofficial but free JSON endpoint (may change without notice)
import httpx

resp = httpx.get("https://housestockwatcher.com/api/all_transactions")
trades = resp.json()
```
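
Since the endpoint is unofficial and may change, one option is to snapshot a response once and replay it offline; a sketch (the helper name and fixture path are illustrative, not existing POTE code):

```python
# Sketch: persist a one-time API response as an offline test fixture.
# `save_trade_snapshot` is a hypothetical helper, not existing POTE code.
import json
from pathlib import Path


def save_trade_snapshot(trades: list[dict], path: Path, limit: int = 100) -> int:
    """Write the first `limit` trades to a JSON fixture; return the count saved."""
    subset = trades[:limit]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(subset, indent=2))
    return len(subset)
```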

### Option B: Senate Stock Watcher API

### Option C: Official Senate eFD (Electronic Financial Disclosures)

### Option D: Quiver Quantitative Free Tier

Integration test example:

```python
# Set QUIVERQUANT_API_KEY in .env for integration tests
import os

import pytest

@pytest.mark.integration
@pytest.mark.skipif(not os.getenv("QUIVERQUANT_API_KEY"), reason="No API key")
def test_quiver_live_fetch():
    client = QuiverClient(api_key=os.getenv("QUIVERQUANT_API_KEY"))
    trades = client.fetch_recent_trades(limit=10)
    assert len(trades) > 0
```

## Strategy 3: Use Sample/Historical Datasets

### Option A: Pre-downloaded CSV Snapshots

1. Manually download 1-2 weeks of data from House/Senate Stock Watcher
2. Store it in `tests/fixtures/sample_trades.csv`
3. Load it in integration tests

Example:

```python
import pandas as pd
from pathlib import Path

def test_etl_with_real_data():
    csv_path = Path(__file__).parent / "fixtures" / "sample_trades.csv"
    df = pd.read_csv(csv_path)
    # Run the ETL pipeline (TradeLoader and session come from the project)
    loader = TradeLoader(session)
    loader.ingest_trades(df)
    # Assert trades were stored correctly
```

### Option B: Kaggle Datasets

- Search for "congressional stock trades" on Kaggle
- Example: https://www.kaggle.com/datasets (check for recent uploads)
- Download the CSV and store it in `tests/fixtures/`

Combine all strategies:

1. Unit tests (fast, always run):
   - Use mocked data for models, ETL, analytics
   - `pytest tests/` (current setup)

2. Integration tests (optional, gated by env var):

   ```python
   import os

   import pytest

   @pytest.mark.integration
   @pytest.mark.skipif(not os.getenv("ENABLE_LIVE_TESTS"), reason="Skipping live tests")
   def test_live_quiver_api():
       # Hits the real Quiver API (free tier)
       pass
   ```

3. Fixture-based tests (real data shape, no network):
   - Store 100 real trades in `tests/fixtures/sample_trades.json`
   - Test ETL, analytics, edge cases

4. Manual smoke tests (dev only):
   - `python scripts/fetch_sample_prices.py` (uses yfinance, free)
   - `python scripts/ingest_house_watcher.py` (once we build it)
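
One way to wire up that env-var gating is a `conftest.py` hook that auto-skips `live`-marked tests unless `ENABLE_LIVE_TESTS` is set. A sketch (hypothetical, not existing POTE code):

```python
# Hypothetical tests/conftest.py sketch: skip `live` tests unless ENABLE_LIVE_TESTS is set.
import os

import pytest


def pytest_collection_modifyitems(config, items):
    """Auto-skip tests marked `live` when live testing is not enabled."""
    if os.getenv("ENABLE_LIVE_TESTS"):
        return  # env var present: run everything as collected
    skip_live = pytest.mark.skip(reason="set ENABLE_LIVE_TESTS=1 to run live tests")
    for item in items:
        if "live" in item.keywords:
            item.add_marker(skip_live)
```

With this in place, a plain `pytest tests/` run stays free and offline; exporting `ENABLE_LIVE_TESTS=1` opts into the network-hitting tests.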

For PR2 (Congress Trade Ingestion):

1. Build a House Stock Watcher scraper (free, no API key needed):
   - Module: `src/pote/ingestion/house_watcher.py`
   - Scrape their RSS or JSON endpoint
   - Parse into the `Trade` model
   - Store 100 sample trades in `tests/fixtures/`

2. Add integration test markers:

   ```toml
   # pyproject.toml
   [tool.pytest.ini_options]
   markers = [
       "integration: marks tests as integration tests (require DB/network)",
       "slow: marks tests as slow",
       "live: requires external API/network (use --live flag)",
   ]
   ```

3. Make PR2 testable without paid APIs:

   ```bash
   # Unit tests (always pass, use mocks)
   pytest tests/ -m "not integration"

   # Integration tests (optional, use fixtures or free APIs)
   pytest tests/ -m integration

   # Live tests (only if you have API keys)
   QUIVERQUANT_API_KEY=xxx pytest tests/ -m live
   ```
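The fixture from step 1 can then back a fast regression test. A sketch of loading and sanity-checking it (the function name and required-field set are assumptions, not POTE's actual code):

```python
# Sketch: load the stored JSON fixture and keep only well-formed rows.
# `load_trade_fixture` and REQUIRED_FIELDS are illustrative, not POTE's actual code.
import json
from pathlib import Path

REQUIRED_FIELDS = {"official_name", "ticker", "transaction_date", "side"}


def load_trade_fixture(path: Path) -> list[dict]:
    """Return fixture rows that contain every required field."""
    rows = json.loads(path.read_text())
    return [row for row in rows if REQUIRED_FIELDS <= row.keys()]
```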

## Cost Comparison

| Source | Free Tier | Paid Tier | Best For |
|---|---|---|---|
| yfinance | Unlimited | N/A | Prices (already working) |
| House Stock Watcher | Unlimited scraping | N/A | Free trades (best option) |
| Quiver Free | 500 calls/mo | $30/mo (5k calls) | Testing, not production |
| FMP Free | 250 calls/day | $15/mo | Alternative for trades |
| Mock data | N/A | N/A | Unit tests |

## Bottom Line

You can build and test the entire system for $0 by:

1. Using House/Senate Stock Watcher for real trade data (free, unlimited)
2. Using yfinance for prices (already working)
3. Storing fixture snapshots for regression tests
4. Optionally using the Quiver free tier (500 calls/mo) for validation

No paid API is required until you want:

- Production-grade rate limits
- Historical data beyond 1-2 years
- Official support/SLAs

## Example: Building a Free Trade Scraper (PR2)

```python
# src/pote/ingestion/house_watcher.py
from datetime import date, datetime, timedelta

import httpx


class HouseWatcherClient:
    """Free congressional trade scraper."""

    BASE_URL = "https://housestockwatcher.com"

    def fetch_recent_trades(self, days: int = 7) -> list[dict]:
        """Scrape recent trades (free, no API key)."""
        resp = httpx.get(f"{self.BASE_URL}/api/all_transactions")
        resp.raise_for_status()

        # Filter to the last N days; dates arrive as ISO-formatted strings
        cutoff = date.today() - timedelta(days=days)
        trades = []
        for raw in resp.json():
            try:
                tx_date = datetime.strptime(raw["transaction_date"], "%Y-%m-%d").date()
            except (KeyError, ValueError):
                continue  # skip rows with missing or malformed dates
            if tx_date >= cutoff:
                trades.append(self._normalize(raw))
        return trades

    def _normalize(self, raw: dict) -> dict:
        """Convert HouseWatcher format to our Trade schema."""
        return {
            "official_name": raw["representative"],
            "ticker": raw["ticker"],
            "transaction_date": raw["transaction_date"],
            "filing_date": raw["disclosure_date"],
            # Compare case-insensitively to catch "purchase" / "Purchase" variants
            "side": "buy" if "purchase" in raw["type"].lower() else "sell",
            "value_min": raw.get("amount_min"),
            "value_max": raw.get("amount_max"),
            "source": "house_watcher",
        }
```
Let me know if you want me to implement this scraper now for PR2! 🚀