- PR1: Project scaffold, DB models, price loader
- PR2: Congressional trade ingestion (House Stock Watcher)
- PR3: Security enrichment + deployment infrastructure
- 37 passing tests, 87%+ coverage
- Docker + Proxmox deployment ready
- Complete documentation
- Works 100% offline with fixtures
Free Testing: Data Sources & Sample Data Strategies
Your Question: "How can we test for free?"
Great question! Here are multiple strategies for testing the full pipeline without paid API keys:
Strategy 1: Mock/Fixture Data (Current Approach ✅)
What we already have:
- `tests/conftest.py` creates an in-memory SQLite DB with sample officials, securities, and trades
- Unit tests use mocked `yfinance` responses (see `test_price_loader.py`)
- Cost: $0
- Coverage: models, DB logic, ETL transforms, analytics calculations
Pros: Fast, deterministic, no network, tests edge cases
Cons: Doesn't validate real API behavior or data quality
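To make Strategy 1 concrete, here is a minimal sketch of a network-free price test using dependency injection instead of patching (`latest_close` is a hypothetical helper for illustration, not the repo's actual price loader; in production you would pass `yfinance.download` as the downloader):

```python
import pandas as pd

def latest_close(ticker: str, downloader) -> float:
    """Hypothetical helper: return the last close using an injected
    downloader (in production, pass yfinance.download here)."""
    data = downloader(ticker, period="5d", progress=False)
    return float(data["Close"].iloc[-1])

def fake_download(ticker, **kwargs):
    # Deterministic stand-in for yfinance.download: same DataFrame shape
    return pd.DataFrame({"Close": [101.0, 102.5]})

def test_latest_close_with_mock():
    assert latest_close("AAPL", downloader=fake_download) == 102.5
```

Injecting the downloader keeps the test deterministic and makes the same helper trivially swappable for a live yfinance call in smoke tests.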
Strategy 2: Free Public Congressional Trade Data
Option A: House Stock Watcher (Community Project)
- URL: https://housestockwatcher.com/
- Format: Web scraping (no official API, but RSS feed available)
- Data: House trade disclosures, updated as filings are published (Senate Stock Watcher, below, covers the Senate)
- License: Public domain (scraped from official disclosures)
- Cost: $0
- How to use:
- Scrape the RSS feed or JSON data from their GitHub repo
- Parse into our `trades` schema
- Use as an integration test fixture
Example:
```python
# They have a JSON API endpoint (unofficial but free)
import httpx

resp = httpx.get("https://housestockwatcher.com/api/all_transactions")
trades = resp.json()
```
Option B: Senate Stock Watcher API
- URL: https://senatestockwatcher.com/
- Similar to House Stock Watcher, community-maintained
- Free JSON endpoints
Option C: Official Senate eFD (Electronic Financial Disclosures)
- URL: https://efdsearch.senate.gov/search/
- Format: Web forms (no API, requires scraping)
- Cost: $0, but requires building a scraper
- Data: Official Senate disclosures (PTRs)
Option D: Quiver Quantitative Free Tier
- URL: https://www.quiverquant.com/
- Free tier: 500 API calls/month (limited but usable for testing)
- Signup: Email + API key (free)
- Data: Congress, Senate, House trades + insider trades
- Docs: https://api.quiverquant.com/docs
Integration test example:
```python
import os

import pytest

# Set QUIVERQUANT_API_KEY in .env for integration tests
@pytest.mark.integration
@pytest.mark.skipif(not os.getenv("QUIVERQUANT_API_KEY"), reason="No API key")
def test_quiver_live_fetch():
    client = QuiverClient(api_key=os.getenv("QUIVERQUANT_API_KEY"))
    trades = client.fetch_recent_trades(limit=10)
    assert len(trades) > 0
```
Strategy 3: Use Sample/Historical Datasets
Option A: Pre-downloaded CSV Snapshots
- Manually download 1-2 weeks of data from House/Senate Stock Watcher
- Store in
tests/fixtures/sample_trades.csv - Load in integration tests
Example:
```python
import pandas as pd
from pathlib import Path

def test_etl_with_real_data():
    csv_path = Path(__file__).parent / "fixtures" / "sample_trades.csv"
    df = pd.read_csv(csv_path)
    # Run the ETL pipeline
    loader = TradeLoader(session)
    loader.ingest_trades(df)
    # Assert trades were stored correctly
```
Option B: Kaggle Datasets
- Search for "congressional stock trades" on Kaggle
- Example: https://www.kaggle.com/datasets (check for recent uploads)
- Download the CSV and store it in `tests/fixtures/`
Strategy 4: Hybrid Testing (Recommended 🌟)
Combine all strategies:
- Unit tests (fast, always run):
  - Use mocked data for models, ETL, analytics
  - `pytest tests/` (current setup)
- Integration tests (optional, gated by env var):

  ```python
  @pytest.mark.integration
  @pytest.mark.skipif(not os.getenv("ENABLE_LIVE_TESTS"), reason="Skipping live tests")
  def test_live_quiver_api():
      # Hits the real Quiver API (free tier)
      pass
  ```

- Fixture-based tests (real data shape, no network):
  - Store 100 real trades in `tests/fixtures/sample_trades.json`
  - Test ETL, analytics, edge cases
- Manual smoke tests (dev only):
  - `python scripts/fetch_sample_prices.py` (uses yfinance, free)
  - `python scripts/ingest_house_watcher.py` (once we build it)
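Rather than repeating the `skipif` guard on every live test, the env-var gate can live once in `tests/conftest.py` via pytest's collection hook. A sketch, assuming a `live` marker is registered in `pyproject.toml`:

```python
# tests/conftest.py (sketch): auto-skip live tests unless explicitly enabled
import os

import pytest

def pytest_collection_modifyitems(config, items):
    if os.getenv("ENABLE_LIVE_TESTS"):
        return  # run everything, including live tests
    skip_live = pytest.mark.skip(reason="set ENABLE_LIVE_TESTS=1 to run")
    for item in items:
        if "live" in item.keywords:
            item.add_marker(skip_live)
```

With this in place, `pytest tests/` silently skips live tests and `ENABLE_LIVE_TESTS=1 pytest tests/` runs them.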
Recommended Next Steps
For PR2 (Congress Trade Ingestion):
- Build a House Stock Watcher scraper (free, no API key needed)
  - Module: `src/pote/ingestion/house_watcher.py`
  - Scrape their RSS or JSON endpoint
  - Parse into the `Trade` model
  - Store 100 sample trades in `tests/fixtures/`
- Add integration test markers:

  ```toml
  # pyproject.toml
  [tool.pytest.ini_options]
  markers = [
      "integration: marks tests as integration tests (require DB/network)",
      "slow: marks tests as slow",
      "live: requires external API/network (use --live flag)",
  ]
  ```

- Make PR2 testable without paid APIs:

  ```bash
  # Unit tests (always pass, use mocks)
  pytest tests/ -m "not integration"

  # Integration tests (optional, use fixtures or free APIs)
  pytest tests/ -m integration

  # Live tests (only if you have API keys)
  QUIVERQUANT_API_KEY=xxx pytest tests/ -m live
  ```
Cost Comparison
| Source | Free Tier | Paid Tier | Best For |
|---|---|---|---|
| yfinance | Unlimited | N/A | Prices (already working ✅) |
| House Stock Watcher | Unlimited scraping | N/A | Free trades (best option) |
| Quiver Free | 500 calls/mo | $30/mo (5k calls) | Testing, not production |
| FMP Free | 250 calls/day | $15/mo | Alternative for trades |
| Mock data | ∞ | N/A | Unit tests |
Bottom Line
You can build and test the entire system for $0 by:
- Using House/Senate Stock Watcher for real trade data (free, unlimited)
- Using yfinance for prices (already working)
- Storing fixture snapshots for regression tests
- Optionally using Quiver free tier (500 calls/mo) for validation
No paid API required until you want:
- Production-grade rate limits
- Historical data beyond 1-2 years
- Official support/SLAs
Example: Building a Free Trade Scraper (PR2)
```python
# src/pote/ingestion/house_watcher.py
from datetime import date, timedelta

import httpx


class HouseWatcherClient:
    """Free congressional trade scraper."""

    BASE_URL = "https://housestockwatcher.com"

    def fetch_recent_trades(self, days: int = 7) -> list[dict]:
        """Scrape recent trades (free, no API key)."""
        resp = httpx.get(f"{self.BASE_URL}/api/all_transactions")
        resp.raise_for_status()
        trades = resp.json()
        # Filter to the last N days, then normalize to our schema
        # (assumes ISO-formatted dates; verify the feed's actual format)
        cutoff = (date.today() - timedelta(days=days)).isoformat()
        recent = [t for t in trades if t["transaction_date"] >= cutoff]
        return [self._normalize(t) for t in recent]

    def _normalize(self, raw: dict) -> dict:
        """Convert HouseWatcher format to our Trade schema."""
        return {
            "official_name": raw["representative"],
            "ticker": raw["ticker"],
            "transaction_date": raw["transaction_date"],
            "filing_date": raw["disclosure_date"],
            "side": "buy" if "Purchase" in raw["type"] else "sell",
            "value_min": raw.get("amount_min"),
            "value_max": raw.get("amount_max"),
            "source": "house_watcher",
        }
```
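The `side` mapping in `_normalize` is brittle if the feed uses variants such as `sale_full` or `sale_partial` (the exact type strings are an assumption here; inspect the real feed before relying on them). A slightly more defensive standalone sketch:

```python
def classify_side(raw_type: str) -> str:
    """Map a raw transaction-type string to 'buy' or 'sell'.

    Handles assumed variants like 'Purchase', 'Sale (Full)', 'sale_partial';
    anything unrecognized raises so bad rows are caught at ingest time.
    """
    t = raw_type.strip().lower()
    if "purchase" in t or t == "buy":
        return "buy"
    if "sale" in t or t == "sell":
        return "sell"
    raise ValueError(f"unknown transaction type: {raw_type!r}")
```

Failing loudly on unknown types is usually preferable to silently classifying everything non-purchase as a sell.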
Let me know if you want me to implement this scraper now for PR2! 🚀