Jobber/gradcracker.md at f3c164d2522f1a7fa4a107cb317c995f4a3969f4

Shaheer Sarfaraz d34a9f041b

* feat(hiringcafe): register new source across shared/server/client enums

* feat(hiringcafe-extractor): add browser-backed Hiring Cafe dataset extractor

* feat(orchestrator): integrate Hiring Cafe discovery service into pipeline

* feat(orchestrator-ui): add Hiring Cafe to source availability and run estimates

* chore(hiringcafe): wire CI/docker and add extractor documentation

* chore(format): apply biome formatting for Hiring Cafe integration

* add original websites

* coomints

* number or null

2026-02-19 12:51:55 +00:00

1.3 KiB

Raw Blame History

id, title, description, sidebar_position

id	title	description	sidebar_position
gradcracker	Gradcracker Extractor	How the Gradcracker crawler builds search URLs and extracts jobs.	2

A plain-English walkthrough of the Gradcracker extractor in extractors/gradcracker.

Original website: gradcracker.com

Big picture

The crawler builds search URLs, scrapes listing pages, then opens job details for descriptions and apply URLs.

1) Build search URLs

Combines UK regions with role terms.
Defaults include roles such as web-development and software-systems.
GRADCRACKER_SEARCH_TERMS overrides defaults.

2) Crawl list pages

Waits for job cards (article[wire:key]).
Extracts title, employer, discipline, deadline, salary, location, degree, start date.
Queues job detail pages.

Controls:

GRADCRACKER_MAX_JOBS_PER_TERM
JOBOPS_SKIP_APPLY_FOR_EXISTING=1
JOBOPS_EXISTING_JOB_URLS / JOBOPS_EXISTING_JOB_URLS_FILE

3) Crawl detail pages

Waits for .body-content
Captures full description text
Clicks apply button to resolve final application URL
Handles popup and same-tab redirects

4) Progress reporting

Set JOBOPS_EMIT_PROGRESS=1 for structured progress lines consumable by orchestrator UI.

Notes

Uses Playwright + Crawlee via Camoufox.
Low concurrency and longer timeouts for stability.

1.3 KiB Raw Blame History