Hiring Cafe Extractor
Browser-backed extractor for Hiring Cafe search APIs.
Special thanks: initial implementation inspiration came from umur957/hiring-cafe-job-scraper.
Environment
- HIRING_CAFE_SEARCH_TERMS (JSON array, or a list delimited by |, commas, or newlines)
- HIRING_CAFE_COUNTRY (default: united kingdom)
- HIRING_CAFE_MAX_JOBS_PER_TERM (default: 200)
- HIRING_CAFE_DATE_FETCHED_PAST_N_DAYS (default: 7)
- HIRING_CAFE_LOCATION_QUERY (optional city, e.g. Leeds)
- HIRING_CAFE_LOCATION_RADIUS_MILES (default: 1 when city is set)
- HIRING_CAFE_OUTPUT_JSON (default: storage/datasets/default/jobs.json)
- JOBOPS_EMIT_PROGRESS=1 to emit JOBOPS_PROGRESS events
- HIRING_CAFE_HEADLESS=false to run headed
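To illustrate the formats accepted by HIRING_CAFE_SEARCH_TERMS, here is a minimal sketch that tries the JSON array form first and otherwise splits on |, commas, or newlines. The helper name parseSearchTerms is hypothetical and is not taken from the extractor's source. The real parsing may differ, for example in how it trims or de-duplicates terms.

```typescript
// Hypothetical helper for illustration only; not the extractor's actual code.
function parseSearchTerms(raw: string | undefined): string[] {
  const value = (raw ?? "").trim();
  if (value === "") return [];

  // Try the JSON array form first, e.g. '["web developer","data engineer"]'.
  if (value.startsWith("[")) {
    try {
      const parsed: unknown = JSON.parse(value);
      if (Array.isArray(parsed)) {
        return parsed
          .filter((item): item is string => typeof item === "string")
          .map((item) => item.trim())
          .filter((item) => item.length > 0);
      }
    } catch {
      // Not valid JSON: fall through to the delimiter-based form.
    }
  }

  // Otherwise split on "|", commas, or newlines and drop empty entries.
  return value
    .split(/[|,\n]/)
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}
```

In this sketch, an unset or blank value yields an empty list, so the caller decides whether to fall back to a default term or stop.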
Notes
- The extractor uses s = base64(url-encoded JSON search state).
- worldwide and usa/ca are treated as broad search modes without hard country location filters.
- City geocoding uses Nominatim (OpenStreetMap data).
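The s parameter described in the notes can be sketched as below, assuming the steps run in the order the formula suggests: JSON-serialize the state, URL-encode it, then base64-encode the result. The function names and the fields of the example state object are hypothetical placeholders, not the actual Hiring Cafe search-state schema. Whether the API expects standard or URL-safe base64 is also an assumption (standard is used here).

```typescript
// Field names passed in are illustrative; the real schema is not documented here.
type SearchState = Record<string, unknown>;

// s = base64(url-encoded JSON search state)
function encodeSearchState(state: SearchState): string {
  const json = JSON.stringify(state);
  // encodeURIComponent yields ASCII-only output, which btoa can handle safely.
  const urlEncoded = encodeURIComponent(json);
  return btoa(urlEncoded);
}

// Inverse operation, handy for inspecting an existing `s` value.
function decodeSearchState(s: string): SearchState {
  return JSON.parse(decodeURIComponent(atob(s))) as SearchState;
}

// Example with placeholder fields (assumed names, not confirmed by the source).
const exampleParam = encodeSearchState({
  searchQuery: "data engineer",
  dateFetchedPastNDays: 7,
});
```

URL-encoding before base64 keeps non-ASCII characters (for example in city names) from breaking btoa, which only accepts Latin-1 input.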