Shaheer Sarfaraz c5c6675f04
feat: add Adzuna extractor with orchestrator integration (#177)
2026-02-17 16:49:42 +00:00


| id | title | description | sidebar_position |
| --- | --- | --- | --- |
| overview | Extractors Overview | Technical index of supported extractors and how they work. | 1 |

This page helps you choose the right extractor for your run, understand key constraints, and navigate to detailed technical guides.

Extractor chooser

| Extractor | Best use case | Core constraints/dependencies | Notable controls | Output/behavior notes |
| --- | --- | --- | --- | --- |
| Gradcracker | UK graduate roles from Gradcracker | Crawling stability depends on page structure and anti-bot behavior; tuned for low concurrency | `GRADCRACKER_SEARCH_TERMS`, `GRADCRACKER_MAX_JOBS_PER_TERM`, `JOBOPS_SKIP_APPLY_FOR_EXISTING` | Scrapes listing metadata, then visits detail pages and resolves apply URLs |
| JobSpy | Multi-source discovery (Indeed, LinkedIn, Glassdoor) | Runs a Python wrapper once per search term; source availability and result quality vary by site and location | `JOBSPY_SITES`, `JOBSPY_SEARCH_TERMS`, `JOBSPY_RESULTS_WANTED`, `JOBSPY_HOURS_OLD`, `JOBSPY_LINKEDIN_FETCH_DESCRIPTION` | Produces JSON output per term; the orchestrator then normalizes results and de-duplicates by `jobUrl` |
| Adzuna | API-based multi-country discovery with low scraping overhead | Requires a valid App ID and App Key; the country must be on Adzuna's supported list | `ADZUNA_APP_ID`, `ADZUNA_APP_KEY`, `ADZUNA_MAX_JOBS_PER_TERM` | Paginates the API into dataset output; the orchestrator maps progress and de-duplicates by `sourceJobId`/`jobUrl` |
| UKVisaJobs | UK visa sponsorship-focused roles | Requires an authenticated session and periodic token/cookie refresh | `UKVISAJOBS_EMAIL`, `UKVISAJOBS_PASSWORD`, `UKVISAJOBS_MAX_JOBS`, `UKVISAJOBS_SEARCH_KEYWORD` | Paginates the API into dataset output; the orchestrator de-duplicates and may fetch missing descriptions |
| Manual Import | One-off jobs not covered by scrapers | Inference quality depends on the model/provider and on the input; some URLs cannot be fetched reliably | App/API endpoints (`/api/manual-jobs/infer`, `/api/manual-jobs/import`) | Accepts text, HTML, or a URL; runs inference, then saves and scores the job after review |
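The Adzuna row lists three controls and a country constraint. Below is a minimal sketch of how they might fit together: validate the credentials, work out how many result pages cover `ADZUNA_MAX_JOBS_PER_TERM`, and build one request URL per page. This is illustrative only. `AdzunaConfig`, `loadAdzunaConfig`, `buildSearchUrls`, the default cap of 50, the page size, and the country subset are all hypothetical and are not the project's code. The URL shape follows Adzuna's public `/v1/api/jobs/{country}/search/{page}` search endpoint, so check the Adzuna API documentation before relying on it.

```typescript
// Hypothetical config shape for the Adzuna controls named in the table.
interface AdzunaConfig {
  appId: string;
  appKey: string;
  maxJobsPerTerm: number;
}

// Assumption: an illustrative subset of Adzuna country codes, not the full list.
const SUPPORTED_COUNTRIES = new Set(["gb", "us", "de", "fr", "au", "ca"]);

// Read settings from an env-like object (e.g. process.env in Node).
function loadAdzunaConfig(env: Record<string, string | undefined>): AdzunaConfig {
  const appId = env.ADZUNA_APP_ID?.trim();
  const appKey = env.ADZUNA_APP_KEY?.trim();
  if (!appId || !appKey) {
    throw new Error("ADZUNA_APP_ID and ADZUNA_APP_KEY are both required");
  }
  // Assumption: fall back to 50 when the cap is unset or not a positive integer.
  const parsed = Number.parseInt(env.ADZUNA_MAX_JOBS_PER_TERM ?? "", 10);
  const maxJobsPerTerm = Number.isFinite(parsed) && parsed > 0 ? parsed : 50;
  return { appId, appKey, maxJobsPerTerm };
}

// Build one search URL per page needed to reach the per-term cap.
function buildSearchUrls(
  config: AdzunaConfig,
  country: string,
  term: string,
  resultsPerPage = 50, // assumed page size
): string[] {
  const code = country.toLowerCase();
  if (!SUPPORTED_COUNTRIES.has(code)) {
    throw new Error(`Country "${country}" is not in the supported list`);
  }
  // Round up so the cap is covered, e.g. 120 jobs at 50 per page -> 3 pages.
  const pages = Math.ceil(config.maxJobsPerTerm / resultsPerPage);
  const query = [
    ["app_id", config.appId],
    ["app_key", config.appKey],
    ["what", term],
    ["results_per_page", String(resultsPerPage)],
  ]
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");

  const urls: string[] = [];
  for (let page = 1; page <= pages; page++) {
    urls.push(`https://api.adzuna.com/v1/api/jobs/${code}/search/${page}?${query}`);
  }
  return urls;
}
```

The page count is rounded up, so the last page can overshoot the cap. For example, 120 jobs at 50 per page fetches up to 150, and a caller would trim the results back to `maxJobsPerTerm`.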

Which extractor should I use?

  • Use JobSpy for broad first-pass sourcing across common boards.
  • Use Adzuna when you want API-first discovery in supported non-UK markets.
  • Use Gradcracker when targeting graduate pipelines in the UK.
  • Use UKVisaJobs for sponsorship-specific UK searches.
  • Use Manual Import when you already have a specific posting and need direct import.

Many runs combine sources: broad discovery first, then manual import for high-priority jobs that the extractors miss.
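When a run combines sources, overlapping results need to collapse into a single record. Below is an illustrative sketch of that merge step. It is not the orchestrator's actual code. The `JobRecord` shape and `mergeJobs` are hypothetical. The rule is an assumption: a job counts as a duplicate if either its source-scoped `sourceJobId` or its `jobUrl` has already been seen, which is one reading of the "`sourceJobId`/`jobUrl`" de-duplication described for the extractors.

```typescript
// Hypothetical record shape; the real orchestrator's job type may differ.
interface JobRecord {
  source: string; // e.g. "jobspy", "adzuna", "manual"
  sourceJobId?: string; // provider ID, when the source exposes one
  jobUrl: string;
  title: string;
}

// Merge batches from several extractors, keeping the first copy of each job.
// Assumption: a job is a duplicate if its source-scoped sourceJobId OR its
// (trimmed) jobUrl has already been seen.
function mergeJobs(...batches: JobRecord[][]): JobRecord[] {
  const seenIds = new Set<string>();
  const seenUrls = new Set<string>();
  const merged: JobRecord[] = [];

  for (const batch of batches) {
    for (const job of batch) {
      // Scope IDs by source so "42" from two providers never collides.
      const idKey = job.sourceJobId ? `${job.source}:${job.sourceJobId}` : undefined;
      const urlKey = job.jobUrl.trim();

      if ((idKey !== undefined && seenIds.has(idKey)) || seenUrls.has(urlKey)) {
        continue; // already kept from an earlier batch
      }
      if (idKey !== undefined) seenIds.add(idKey);
      seenUrls.add(urlKey);
      merged.push(job);
    }
  }
  return merged;
}
```

Batch order decides which copy survives. Passing broad-discovery batches first keeps the discovered version, so swap the order if manual imports should take precedence.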