# Free Testing: Data Sources & Sample Data Strategies

## Your Question: "How can we test for free?"

Great question! Here are multiple strategies for testing the full pipeline **without paid API keys**:

---

## Strategy 1: Mock/Fixture Data (Current Approach ✅)

**What we already have:**
- `tests/conftest.py` creates an in-memory SQLite DB with sample officials, securities, and trades
- Unit tests use mocked `yfinance` responses (see `test_price_loader.py`)
- **Cost**: $0
- **Coverage**: Models, DB logic, ETL transforms, analytics calculations

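As a rough illustration of the fixture approach listed above, the sketch below seeds an in-memory SQLite database and replaces the price source with a stub passed in as a parameter (dependency injection, rather than `unittest.mock`). It uses only the standard library. The table layout, the sample values, and the names `make_test_db`, `fake_fetch_prices`, and `latest_close` are all hypothetical; the project's real `conftest.py` may well use an ORM and `pytest` fixtures instead.

```python
import sqlite3


def make_test_db() -> sqlite3.Connection:
    """Create an in-memory SQLite DB seeded with one sample trade (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trades (official TEXT, ticker TEXT, side TEXT, trade_date TEXT)"
    )
    conn.execute(
        "INSERT INTO trades VALUES (?, ?, ?, ?)",
        ("Sample Official", "AAPL", "buy", "2024-01-02"),  # made-up sample row
    )
    conn.commit()
    return conn


def fake_fetch_prices(ticker: str) -> dict[str, float]:
    """Stand-in for a network price lookup: returns fixed, made-up closes keyed by date."""
    return {"2024-01-02": 100.0, "2024-01-03": 101.5}


def latest_close(ticker: str, fetch=fake_fetch_prices) -> float:
    """Return the most recent close; `fetch` is injected so tests never hit the network."""
    prices = fetch(ticker)
    return prices[max(prices)]  # ISO date strings sort chronologically


conn = make_test_db()
rows = conn.execute("SELECT ticker, side FROM trades").fetchall()
```
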
**Pros**: Fast, deterministic, no network, tests edge cases

**Cons**: Doesn't validate real API behavior or data quality

---

## Strategy 2: Free Public Congressional Trade Data

### Option A: **House Stock Watcher** (Community Project)
- **URL**: https://housestockwatcher.com/
- **Format**: Web scraping or unofficial JSON (no official API, but an RSS feed is available)
- **Data**: House trade disclosures, updated as filings are published (Senate trades are covered by Senate Stock Watcher, Option B)
- **License**: Derived from public official disclosures; check the project's terms before redistributing
- **Cost**: $0
- **How to use**:
  1. Scrape the RSS feed or fetch the JSON data from their GitHub repo
  2. Parse into our `trades` schema
  3. Use as integration test fixture

**Example**:
```python
# Unofficial JSON endpoint (free, but the path may change without notice)
import httpx

resp = httpx.get("https://housestockwatcher.com/api/all_transactions", timeout=30)
resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
trades = resp.json()
```

### Option B: **Senate Stock Watcher** API
- **URL**: https://senatestockwatcher.com/
- Similar to House Stock Watcher, community-maintained
- Free JSON endpoints

### Option C: **Official Senate eFD** (Electronic Financial Disclosures)
- **URL**: https://efdsearch.senate.gov/search/
- **Format**: Web forms (no API, requires scraping)
- **Cost**: $0, but requires building a scraper
- **Data**: Official Senate disclosures (Periodic Transaction Reports, PTRs)

### Option D: **Quiver Quantitative Free Tier**
- **URL**: https://www.quiverquant.com/
- **Free tier**: 500 API calls/month (limited but usable for testing)
- **Signup**: Email + API key (free)
- **Data**: Congressional (House and Senate) trades + insider trades
- **Docs**: https://api.quiverquant.com/docs

**Integration test example**:
```python
import os

import pytest

# QuiverClient is the project's own API wrapper (not shown here).
# Set QUIVERQUANT_API_KEY in the environment (e.g. via .env) for integration tests.
@pytest.mark.integration
@pytest.mark.skipif(not os.getenv("QUIVERQUANT_API_KEY"), reason="No API key")
def test_quiver_live_fetch():
    client = QuiverClient(api_key=os.getenv("QUIVERQUANT_API_KEY"))
    trades = client.fetch_recent_trades(limit=10)
    assert len(trades) > 0
```

---

## Strategy 3: Use Sample/Historical Datasets

### Option A: **Pre-downloaded CSV Snapshots**
1. Manually download 1-2 weeks of data from House/Senate Stock Watcher
2. Store in `tests/fixtures/sample_trades.csv`
3. Load in integration tests

**Example**:
```python
from pathlib import Path

import pandas as pd

# TradeLoader is the project's ETL class; `session` is assumed to be the
# DB session fixture provided by tests/conftest.py.
def test_etl_with_real_data(session):
    csv_path = Path(__file__).parent / "fixtures" / "sample_trades.csv"
    df = pd.read_csv(csv_path)
    assert not df.empty  # guard against an empty or truncated fixture

    # Run ETL pipeline
    loader = TradeLoader(session)
    loader.ingest_trades(df)
    # Assert trades were stored correctly
```

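Steps 1 and 2 above could also be scripted rather than done by hand. The following is a minimal sketch, assuming the downloaded records are already a list of dicts; `write_csv_fixture` is a hypothetical helper and not part of the project, and the columns are simply whatever keys the first record carries.

```python
import csv
from pathlib import Path


def write_csv_fixture(records: list[dict], path: Path, limit: int = 100) -> int:
    """Write at most `limit` records to a CSV fixture; returns the number of rows written."""
    rows = records[:limit]
    if not rows:
        raise ValueError("no records to write")
    fieldnames = list(rows[0].keys())
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as fh:
        # extrasaction="ignore" drops keys that the first record did not have
        writer = csv.DictWriter(fh, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Capping the snapshot keeps the fixture small enough to commit, and writing it once means later test runs need no network access.
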
### Option B: **Kaggle Datasets**
- Search for "congressional stock trades" on Kaggle
- Browse https://www.kaggle.com/datasets and prefer recent uploads
- Download CSV, store in `tests/fixtures/`

---

## Strategy 4: Hybrid Testing (Recommended 🌟)

**Combine all strategies**:

1. **Unit tests** (fast, always run):
   - Use mocked data for models, ETL, analytics
   - `pytest tests/` (current setup)

2. **Integration tests** (optional, gated by env var):
   ```python
   import os

   import pytest

   @pytest.mark.integration
   @pytest.mark.skipif(not os.getenv("ENABLE_LIVE_TESTS"), reason="Skipping live tests")
   def test_live_quiver_api():
       # Hits real Quiver API (free tier)
       pass
   ```

3. **Fixture-based tests** (real data shape, no network):
   - Store 100 real trades in `tests/fixtures/sample_trades.json`
   - Test ETL, analytics, edge cases

4. **Manual smoke tests** (dev only):
   - `python scripts/fetch_sample_prices.py` (uses yfinance, free)
   - `python scripts/ingest_house_watcher.py` (once we build it)

---

## Recommended Next Steps

### For PR2 (Congress Trade Ingestion):
1. **Build a House Stock Watcher scraper** (free, no API key needed)
   - Module: `src/pote/ingestion/house_watcher.py`
   - Scrape their RSS or JSON endpoint
   - Parse into `Trade` model
   - Store 100 sample trades in `tests/fixtures/`

2. **Register the test markers**:
   ```toml
   # pyproject.toml
   [tool.pytest.ini_options]
   markers = [
       "integration: marks tests as integration tests (require DB/network)",
       "slow: marks tests as slow",
       "live: requires external API/network (select with -m live)",
   ]
   ```

3. **Make PR2 testable without paid APIs**:
   ```bash
   # Unit tests (always pass, use mocks)
   pytest tests/ -m "not integration and not live"

   # Integration tests (optional, use fixtures or free APIs)
   pytest tests/ -m integration

   # Live tests (only if you have API keys)
   QUIVERQUANT_API_KEY=xxx pytest tests/ -m live
   ```

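If you would rather gate live tests behind an explicit `--live` command-line flag, note that pytest does not provide one on its own; it has to be registered in `conftest.py`. A minimal sketch using pytest's `pytest_addoption` and `pytest_collection_modifyitems` hooks follows. The flag name and skip reason are assumptions, not existing project code. With it in place, tests marked `live` are skipped unless you run, for example, `pytest tests/ -m live --live`.

```python
# conftest.py (sketch)
import pytest


def pytest_addoption(parser):
    """Register a --live command-line flag (off by default)."""
    parser.addoption(
        "--live",
        action="store_true",
        default=False,
        help="run tests marked 'live' (they call external APIs)",
    )


def pytest_collection_modifyitems(config, items):
    """Skip every test marked 'live' unless --live was passed."""
    if config.getoption("--live"):
        return
    skip_live = pytest.mark.skip(reason="needs --live to run")
    for item in items:
        if "live" in item.keywords:
            item.add_marker(skip_live)
```
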
---

## Cost Comparison

| Source | Free Tier | Paid Tier | Best For |
|--------|-----------|-----------|----------|
| **yfinance** | Unlimited (unofficial; Yahoo may throttle) | N/A | Prices (already working ✅) |
| **House Stock Watcher** | Unlimited scraping | N/A | Free trades (best option) |
| **Quiver Free** | 500 calls/mo | $30/mo (5k calls) | Testing, not production |
| **FMP (Financial Modeling Prep) Free** | 250 calls/day | $15/mo | Alternative for trades |
| **Mock data** | ∞ | N/A | Unit tests |

---

## Bottom Line

**You can build and test the entire system for $0** by:
1. Using **House/Senate Stock Watcher** for real trade data (free, unlimited)
2. Using **yfinance** for prices (already working)
3. Storing **fixture snapshots** for regression tests
4. Optionally using **Quiver free tier** (500 calls/mo) for validation

**No paid API required until you want:**
- Production-grade rate limits
- Historical data beyond 1-2 years
- Official support/SLAs

---

## Example: Building a Free Trade Scraper (PR2)

```python
# src/pote/ingestion/house_watcher.py
from datetime import date, timedelta

import httpx


class HouseWatcherClient:
    """Free congressional trade scraper."""

    BASE_URL = "https://housestockwatcher.com"

    def fetch_recent_trades(self, days: int = 7) -> list[dict]:
        """Scrape recent trades (free, no API key)."""
        resp = httpx.get(f"{self.BASE_URL}/api/all_transactions", timeout=30)
        resp.raise_for_status()

        trades = resp.json()
        # Filter to last N days, normalize to our schema (capped at 100 records)
        cutoff = date.today() - timedelta(days=days)
        recent = [t for t in trades if self._is_recent(t, cutoff)]
        return [self._normalize(t) for t in recent[:100]]

    @staticmethod
    def _is_recent(raw: dict, cutoff: date) -> bool:
        """True if the trade date (assumed ISO YYYY-MM-DD) is on or after the cutoff."""
        try:
            return date.fromisoformat(raw["transaction_date"]) >= cutoff
        except (KeyError, TypeError, ValueError):
            return False  # skip records with missing or malformed dates

    def _normalize(self, raw: dict) -> dict:
        """Convert HouseWatcher format to our Trade schema."""
        return {
            "official_name": raw["representative"],
            "ticker": raw["ticker"],
            "transaction_date": raw["transaction_date"],
            "filing_date": raw["disclosure_date"],
            # Case-insensitive: the feed's casing of "purchase" is not guaranteed
            "side": "buy" if "purchase" in raw["type"].lower() else "sell",
            "value_min": raw.get("amount_min"),
            "value_max": raw.get("amount_max"),
            "source": "house_watcher",
        }
```

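The `_normalize` step above reads `amount_min` and `amount_max` directly. Congressional disclosures report trade sizes as dollar ranges, so if the feed exposes a single range string instead (assumed here to look like `$1,001 - $15,000`), a small parser could split it before normalization. `parse_amount_range` is a hypothetical helper, and the string format is an assumption to check against real records.

```python
import re
from typing import Optional

# Matches dollar figures such as "$1,001" or "$15,000".
_DOLLARS = re.compile(r"\$\s*(\d[\d,]*)")


def parse_amount_range(text: Optional[str]) -> tuple[Optional[int], Optional[int]]:
    """Split a range string like "$1,001 - $15,000" into (min, max)."""
    if not text:
        return (None, None)
    values = [int(m.replace(",", "")) for m in _DOLLARS.findall(text)]
    if not values:
        return (None, None)
    if len(values) == 1:
        # Open-ended form such as "$50,000,000 +": only a lower bound is known.
        return (values[0], None)
    return (min(values), max(values))
```

Returning `None` for anything unparseable keeps bad rows from aborting ingestion; whether to drop or flag such rows is left to the loader.
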
Let me know if you want me to implement this scraper now for PR2! 🚀