Shaheer Sarfaraz 82e142a8a8
Auto-Registering Extractor System (#223)
* initial commit?

* Address PR feedback on extractor discovery and startup resilience

* Address latest PR review comments

* fix city resolution fallback when input parses empty

* address PR feedback on extractor registry and pipeline validation

* address copilot comments on manifests and registry startup

* fix extractor discovery export handling and env isolation in tests

* enforce duplicate manifest id failures in strict mode

* Fix remaining extractor registry and runtime review comments

* docs

* docs

* test all, logic remains in extractors

* Address PR review feedback on extractor registry and validation

* Revert extractor moduleResolution to bundler

* Enforce shared city filtering across all discovery sources

* Deduplicate extractor strict city post-filtering
2026-02-21 17:44:07 +00:00

Adzuna Extractor

A minimal extractor that fetches job listings from Adzuna's search API and writes them to a JSON dataset for the orchestrator to ingest.

Environment

  • ADZUNA_APP_ID (required)
  • ADZUNA_APP_KEY (required)
  • ADZUNA_COUNTRY (default: gb)
  • ADZUNA_SEARCH_TERMS (JSON array, or a string delimited by |, commas, or newlines)
  • ADZUNA_LOCATION_QUERY (optional city/location text, passed to Adzuna as the where parameter)
  • ADZUNA_MAX_JOBS_PER_TERM (default: 50)
  • ADZUNA_RESULTS_PER_PAGE (default: 50, max 50)
  • ADZUNA_OUTPUT_JSON (default: storage/datasets/default/jobs.json)
  • JOBOPS_EMIT_PROGRESS (set to 1 to emit JOBOPS_PROGRESS events)
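
As a rough illustration of how the variables above could be read and defaulted, here is a minimal sketch. The names AdzunaConfig, parseSearchTerms, parsePositiveInt, and loadConfig are hypothetical and are not taken from the extractor's source. The handling of invalid numbers (falling back to the default) is likewise an assumption, and the real implementation may differ.

```typescript
// Hypothetical sketch: these names are illustrative, not the extractor's actual code.
type Env = Record<string, string | undefined>;

interface AdzunaConfig {
  appId: string;
  appKey: string;
  country: string;
  searchTerms: string[];
  locationQuery: string | undefined;
  maxJobsPerTerm: number;
  resultsPerPage: number;
  outputJson: string;
  emitProgress: boolean;
}

// Accepts a JSON array, or text delimited by "|", commas, or newlines.
function parseSearchTerms(raw: string | undefined): string[] {
  const text = (raw ?? "").trim();
  if (text === "") return [];
  if (text.startsWith("[")) {
    try {
      const parsed: unknown = JSON.parse(text);
      if (Array.isArray(parsed)) {
        return parsed.map((t) => String(t).trim()).filter((t) => t !== "");
      }
    } catch {
      // Not valid JSON: fall through to delimiter splitting.
    }
  }
  return text
    .split(/[|,\n]/)
    .map((t) => t.trim())
    .filter((t) => t !== "");
}

// Assumption: unparsable or non-positive values fall back to the default.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

function loadConfig(env: Env): AdzunaConfig {
  const appId = env.ADZUNA_APP_ID?.trim();
  const appKey = env.ADZUNA_APP_KEY?.trim();
  if (!appId || !appKey) {
    throw new Error("ADZUNA_APP_ID and ADZUNA_APP_KEY are required");
  }
  return {
    appId,
    appKey,
    country: env.ADZUNA_COUNTRY?.trim() || "gb",
    searchTerms: parseSearchTerms(env.ADZUNA_SEARCH_TERMS),
    locationQuery: env.ADZUNA_LOCATION_QUERY?.trim() || undefined,
    maxJobsPerTerm: parsePositiveInt(env.ADZUNA_MAX_JOBS_PER_TERM, 50),
    // Documented maximum is 50, so larger values are clamped.
    resultsPerPage: Math.min(parsePositiveInt(env.ADZUNA_RESULTS_PER_PAGE, 50), 50),
    outputJson: env.ADZUNA_OUTPUT_JSON?.trim() || "storage/datasets/default/jobs.json",
    emitProgress: env.JOBOPS_EMIT_PROGRESS === "1",
  };
}
```

In a Node.js process this sketch would be called as loadConfig(process.env). Taking the environment as a parameter keeps the parsing testable without touching the real process environment.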