Jobber/overview.md at c5c6675f0494e803ac623f0e6af7124346aa499b

Shaheer Sarfaraz c5c6675f04

feat: add Adzuna extractor with orchestrator integration (#177 )

* feat(settings): add adzuna source fields and country compatibility

* feat(discovery): integrate adzuna extractor into pipeline

* feat(client): wire adzuna in source selection and run budgeting

* docs(extractors): add adzuna guide and configuration notes

* chore(workspaces): register adzuna extractor in lockfile

* fix(adzuna): run extractor via npm script instead of npx

* fix(adzuna): execute extractor via node+tsx without shell

* fix(adzuna): prefer npm run start without shell, fallback to tsx

* fix(docker): include adzuna extractor workspace in image

* chore(adzuna): reuse shared type-conversion utilities

* type-check adzuna

* formatting

* deeedooop

* better instructions

2026-02-17 16:49:42 +00:00

2.9 KiB


| id | title | description | sidebar_position |
| --- | --- | --- | --- |
| overview | Extractors Overview | Technical index of supported extractors and how they work. | 1 |

This page helps you choose the right extractor for your run, understand key constraints, and navigate to detailed technical guides.

Extractor chooser

| Extractor | Best use case | Core constraints/dependencies | Notable controls | Output/behavior notes |
| --- | --- | --- | --- | --- |
| Gradcracker | UK graduate roles from Gradcracker | Crawling stability depends on page structure and anti-bot behavior; tuned for low concurrency | `GRADCRACKER_SEARCH_TERMS`, `GRADCRACKER_MAX_JOBS_PER_TERM`, `JOBOPS_SKIP_APPLY_FOR_EXISTING` | Scrapes listing metadata, then visits detail pages and resolves apply URLs |
| JobSpy | Multi-source discovery (Indeed, LinkedIn, Glassdoor) | Runs a Python wrapper once per search term; source availability and quality vary by site and location | `JOBSPY_SITES`, `JOBSPY_SEARCH_TERMS`, `JOBSPY_RESULTS_WANTED`, `JOBSPY_HOURS_OLD`, `JOBSPY_LINKEDIN_FETCH_DESCRIPTION` | Produces JSON per term; the orchestrator then normalizes results and de-duplicates by `jobUrl` |
| Adzuna | API-based multi-country discovery with low scraping overhead | Requires a valid App ID and App Key; the configured country must be one Adzuna supports | `ADZUNA_APP_ID`, `ADZUNA_APP_KEY`, `ADZUNA_MAX_JOBS_PER_TERM` | Paginates the API and writes results to dataset output; the orchestrator maps progress and de-duplicates by `sourceJobId`/`jobUrl` |
| UKVisaJobs | UK visa sponsorship-focused roles | Requires an authenticated session and periodic token/cookie refresh | `UKVISAJOBS_EMAIL`, `UKVISAJOBS_PASSWORD`, `UKVISAJOBS_MAX_JOBS`, `UKVISAJOBS_SEARCH_KEYWORD` | Paginates the API and writes dataset output; the orchestrator de-duplicates and may fetch missing descriptions |
| Manual Import | One-off jobs not covered by scrapers | Inference quality depends on the model/provider and on input quality; some URLs cannot be fetched reliably | App/API endpoints (`/api/manual-jobs/infer`, `/api/manual-jobs/import`) | Accepts text, HTML, or a URL, runs inference, then saves and scores the job after review |
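The Adzuna row describes paginated API fetching with a per-term cap. The following is a minimal sketch of such a loop, not the extractor's actual code. It assumes Adzuna's public search endpoint (`/v1/api/jobs/{country}/search/{page}` with `app_id`, `app_key`, `what`, and `results_per_page` query parameters). The names `fetchJobsForTerm`, `buildSearchUrl`, and `PageFetcher`, the default page size of 50, and the short country list are hypothetical. The country list is an illustrative subset, not Adzuna's full list.

```typescript
// Hypothetical sketch: page through Adzuna search results for one term and
// stop at a per-term cap (the role ADZUNA_MAX_JOBS_PER_TERM plays).

interface AdzunaJob {
  id: string;
  title: string;
  redirect_url: string;
}

interface AdzunaPage {
  results: AdzunaJob[];
}

// Injected fetcher (hypothetical) so the loop can run without network access.
type PageFetcher = (url: string) => Promise<AdzunaPage>;

// Assumption: illustrative subset of Adzuna country codes, not the full list.
const SUPPORTED_COUNTRIES = new Set(["gb", "us", "au", "ca", "de", "fr"]);

function buildSearchUrl(
  country: string,
  page: number,
  appId: string,
  appKey: string,
  term: string,
  perPage: number,
): string {
  const params = new URLSearchParams({
    app_id: appId,
    app_key: appKey,
    what: term,
    results_per_page: String(perPage),
  });
  return `https://api.adzuna.com/v1/api/jobs/${country}/search/${page}?${params}`;
}

async function fetchJobsForTerm(
  opts: {
    appId?: string;
    appKey?: string;
    country: string;
    term: string;
    maxJobs: number;
    perPage?: number;
  },
  fetchPage: PageFetcher,
): Promise<AdzunaJob[]> {
  const { appId, appKey, country, term, maxJobs } = opts;
  const perPage = opts.perPage ?? 50; // assumed default page size
  // Fail fast on the two constraints listed in the table.
  if (!appId || !appKey) throw new Error("ADZUNA_APP_ID and ADZUNA_APP_KEY are required");
  if (!SUPPORTED_COUNTRIES.has(country)) throw new Error(`Unsupported Adzuna country: ${country}`);

  const jobs: AdzunaJob[] = [];
  for (let page = 1; jobs.length < maxJobs; page++) {
    const url = buildSearchUrl(country, page, appId, appKey, term, perPage);
    const { results } = await fetchPage(url);
    if (results.length === 0) break; // no more results
    jobs.push(...results.slice(0, maxJobs - jobs.length)); // never exceed the cap
    if (results.length < perPage) break; // a short page is the last page
  }
  return jobs;
}
```

Under these assumptions, the cap is enforced while pages are being consumed rather than by trimming afterwards, so no extra pages are requested once the per-term limit is reached.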

Which extractor should I use?

- Use JobSpy for broad first-pass sourcing across common boards.
- Use Adzuna when you want API-first discovery in supported non-UK markets.
- Use Gradcracker when targeting graduate pipelines in the UK.
- Use UKVisaJobs for sponsorship-specific UK searches.
- Use Manual Import when you already have a specific posting and need direct import.

Many runs combine sources: broad discovery first, then manual import for high-priority jobs that scraping misses.
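When a run combines sources, the same posting can arrive more than once. The extractor table notes that the orchestrator de-duplicates by `jobUrl` (JobSpy) and by `sourceJobId`/`jobUrl` (Adzuna). The following is a minimal sketch of that idea, not the orchestrator's actual logic. It assumes a job is a duplicate if either its source-scoped `sourceJobId` or its normalized `jobUrl` has already been seen. The `NormalizedJob` shape, the `dedupeJobs` and `normalizeUrl` helpers, and the fragment-stripping rule are all hypothetical.

```typescript
// Hypothetical normalized job shape; only jobUrl and sourceJobId come from the docs.
interface NormalizedJob {
  source: string;
  sourceJobId?: string;
  jobUrl: string;
  title: string;
}

// Assumption: URL fragments never identify a different posting.
function normalizeUrl(raw: string): string {
  try {
    const url = new URL(raw.trim());
    url.hash = "";
    return url.toString();
  } catch {
    return raw.trim(); // leave unparseable URLs as-is
  }
}

// Keeps the first occurrence; later jobs matching a seen ID or URL are dropped.
function dedupeJobs(jobs: NormalizedJob[]): NormalizedJob[] {
  const seenIds = new Set<string>();
  const seenUrls = new Set<string>();
  const kept: NormalizedJob[] = [];
  for (const job of jobs) {
    // Scope IDs by source so two providers can reuse the same numeric ID.
    const idKey = job.sourceJobId ? `${job.source}:${job.sourceJobId}` : undefined;
    const urlKey = normalizeUrl(job.jobUrl);
    if ((idKey && seenIds.has(idKey)) || seenUrls.has(urlKey)) continue;
    if (idKey) seenIds.add(idKey);
    seenUrls.add(urlKey);
    kept.push(job);
  }
  return kept;
}
```

Because this sketch keeps the first occurrence, the order in which source results are concatenated decides which copy survives.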
