Jobber/docs-site/docs/extractors/gradcracker.md
Shaheer Sarfaraz d34a9f041b
2026-02-19 12:51:55 +00:00

id: gradcracker
title: Gradcracker Extractor
description: How the Gradcracker crawler builds search URLs and extracts jobs.
sidebar_position: 2

A plain-English walkthrough of the Gradcracker extractor in extractors/gradcracker.

Original website: gradcracker.com

Big picture

The crawler builds search URLs, scrapes the listing pages, then opens each job's detail page to collect the description and apply URL.

1) Build search URLs

  • Pairs UK regions with role terms to form the search URLs.
  • Defaults include roles such as web-development and software-systems.
  • Set GRADCRACKER_SEARCH_TERMS to override the default roles.
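
The URL-building step can be sketched roughly as follows. This is an illustration, not the extractor's actual code: the region slugs, the URL pattern, and the comma-separated format of GRADCRACKER_SEARCH_TERMS are all assumptions, and `resolveRoles` and `buildSearchUrls` are hypothetical helper names.

```typescript
type Env = Record<string, string | undefined>;

// The two role slugs come from this page; the region slugs are placeholders.
const DEFAULT_ROLES = ["web-development", "software-systems"];
const DEFAULT_REGIONS = ["london", "scotland"];

// Assumption: GRADCRACKER_SEARCH_TERMS holds a comma-separated list of role slugs.
function resolveRoles(env: Env): string[] {
  const raw = env.GRADCRACKER_SEARCH_TERMS ?? "";
  const terms = raw
    .split(",")
    .map((term) => term.trim())
    .filter((term) => term.length > 0);
  return terms.length > 0 ? terms : DEFAULT_ROLES;
}

// Assumption: the path layout below is a placeholder, not Gradcracker's real route.
function buildSearchUrls(env: Env, regions: string[] = DEFAULT_REGIONS): string[] {
  const urls: string[] = [];
  for (const role of resolveRoles(env)) {
    for (const region of regions) {
      urls.push(
        `https://www.gradcracker.com/search/${encodeURIComponent(role)}/${encodeURIComponent(region)}`,
      );
    }
  }
  return urls;
}
```

Passing the environment in as an argument (for example `buildSearchUrls(process.env)`) keeps the function easy to test with a plain object.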

2) Crawl list pages

  • Waits for job cards (article[wire:key]).
  • Extracts title, employer, discipline, deadline, salary, location, degree, start date.
  • Queues job detail pages.
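
As a rough picture of what one listing card yields, the sketch below models the fields listed above as a record. The field names, the `ListingCard` type, and the rule that empty text becomes null are assumptions made for illustration; the real extractor may shape its data differently.

```typescript
// Hypothetical record for one job card; every optional field may be missing on the page.
interface ListingCard {
  title: string;
  employer: string | null;
  discipline: string | null;
  deadline: string | null;
  salary: string | null;
  location: string | null;
  degree: string | null;
  startDate: string | null;
  detailUrl: string;
}

// Collapse whitespace and turn empty or missing text into null.
function cleanText(text: string | null | undefined): string | null {
  const collapsed = (text ?? "").replace(/\s+/g, " ").trim();
  return collapsed.length > 0 ? collapsed : null;
}

// Build a card from raw scraped strings; cards without a title are dropped.
function toListingCard(
  raw: Record<string, string | undefined>,
  detailUrl: string,
): ListingCard | null {
  const title = cleanText(raw.title);
  if (title === null) return null;
  return {
    title,
    employer: cleanText(raw.employer),
    discipline: cleanText(raw.discipline),
    deadline: cleanText(raw.deadline),
    salary: cleanText(raw.salary),
    location: cleanText(raw.location),
    degree: cleanText(raw.degree),
    startDate: cleanText(raw.startDate),
    detailUrl,
  };
}
```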

Controls:

  • GRADCRACKER_MAX_JOBS_PER_TERM
  • JOBOPS_SKIP_APPLY_FOR_EXISTING=1
  • JOBOPS_EXISTING_JOB_URLS / JOBOPS_EXISTING_JOB_URLS_FILE
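
One plausible reading of these controls is sketched below: cap the number of jobs queued per search term, and skip the apply-button step for jobs that are already known. The semantics are inferred from the variable names, the JSON-array format for JOBOPS_EXISTING_JOB_URLS is an assumption, the file variant is left out, and `planDetailPages` is a hypothetical helper.

```typescript
type ControlEnv = Record<string, string | undefined>;

interface DetailPlan {
  url: string;
  resolveApplyUrl: boolean; // false means: scrape the page but do not click apply
}

// Assumption: JOBOPS_EXISTING_JOB_URLS is a JSON array of URL strings.
function parseExistingUrls(env: ControlEnv): Set<string> {
  try {
    const parsed: unknown = JSON.parse(env.JOBOPS_EXISTING_JOB_URLS ?? "[]");
    if (!Array.isArray(parsed)) return new Set();
    return new Set(parsed.filter((item): item is string => typeof item === "string"));
  } catch {
    return new Set(); // malformed input is treated as "no existing jobs"
  }
}

function planDetailPages(jobUrls: string[], env: ControlEnv): DetailPlan[] {
  // A missing or invalid cap means "no cap".
  const cap = Number(env.GRADCRACKER_MAX_JOBS_PER_TERM);
  const limit = Number.isInteger(cap) && cap > 0 ? cap : jobUrls.length;

  const skipExisting = env.JOBOPS_SKIP_APPLY_FOR_EXISTING === "1";
  const existing = parseExistingUrls(env);

  return jobUrls.slice(0, limit).map((url) => ({
    url,
    resolveApplyUrl: !(skipExisting && existing.has(url)),
  }));
}
```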

3) Crawl detail pages

  • Waits for .body-content.
  • Captures the full description text.
  • Clicks the apply button to resolve the final application URL.
  • Handles both popup and same-tab redirects.
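
The popup versus same-tab handling can be sketched as a race between two waits that are started before the click. The `PageLike` interface is a hand-written subset modeled on Playwright's Page methods so the example stays self-contained; the selector argument, the timeout, and the `resolveApplyUrl` name are assumptions rather than the extractor's real code.

```typescript
// Minimal structural subset of the page methods used here (names follow Playwright's Page).
interface PageLike {
  url(): string;
  click(selector: string): Promise<void>;
  waitForEvent(event: "popup", options?: { timeout?: number }): Promise<PageLike>;
  waitForURL(
    predicate: (url: { href: string }) => boolean,
    options?: { timeout?: number },
  ): Promise<void>;
  waitForLoadState(state?: "load" | "domcontentloaded"): Promise<void>;
  close(): Promise<void>;
}

async function resolveApplyUrl(
  page: PageLike,
  applySelector: string, // hypothetical selector for the apply button
  timeoutMs = 15000,
): Promise<string | null> {
  const startUrl = page.url();

  // Start both waits before clicking so neither outcome is missed.
  // Timeouts are converted to null instead of rejecting.
  const popup: Promise<PageLike | null> = page
    .waitForEvent("popup", { timeout: timeoutMs })
    .catch(() => null);
  const sameTab: Promise<PageLike | null> = page
    .waitForURL((u) => u.href !== startUrl, { timeout: timeoutMs })
    .then(() => page, () => null);

  await page.click(applySelector);

  // The first non-null result wins; null means neither happened in time.
  const target = await Promise.race([
    popup.then((p) => p ?? sameTab),
    sameTab.then((p) => p ?? popup),
  ]);
  if (target === null) return null;

  // Let redirects settle before reading the URL.
  await target.waitForLoadState("load").catch(() => undefined);
  const finalUrl = target.url();
  if (target !== page) await target.close(); // close the popup, keep the original tab
  return finalUrl;
}
```

Returning null when neither a popup nor a navigation occurs leaves it to the caller to decide whether to fall back to the detail page URL.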

4) Progress reporting

Set JOBOPS_EMIT_PROGRESS=1 to emit structured progress lines that the orchestrator UI can consume.
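
To make "structured progress lines" concrete, here is a minimal sketch of one possible shape: a fixed prefix followed by a JSON payload, written only when the flag is set. The prefix, the event fields, and the function names are hypothetical; the actual line format is defined by the extractor and orchestrator, not by this page.

```typescript
type ProgressEnv = Record<string, string | undefined>;

// Hypothetical event shape; total is null when the count is not yet known.
interface ProgressEvent {
  phase: "list" | "detail";
  done: number;
  total: number | null;
}

// Hypothetical marker that lets a consumer pick progress lines out of ordinary logs.
const PROGRESS_PREFIX = "JOBOPS_PROGRESS ";

function formatProgress(event: ProgressEvent): string {
  return PROGRESS_PREFIX + JSON.stringify(event);
}

// Writes a line only when JOBOPS_EMIT_PROGRESS=1; returns whether it wrote.
function emitProgress(
  env: ProgressEnv,
  event: ProgressEvent,
  write: (line: string) => void,
): boolean {
  if (env.JOBOPS_EMIT_PROGRESS !== "1") return false;
  write(formatProgress(event));
  return true;
}

// Consumer side: returns null for lines that are not progress lines.
function parseProgress(line: string): ProgressEvent | null {
  if (!line.startsWith(PROGRESS_PREFIX)) return null;
  try {
    return JSON.parse(line.slice(PROGRESS_PREFIX.length)) as ProgressEvent;
  } catch {
    return null;
  }
}
```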

Notes

  • Uses Crawlee with Playwright, running the Camoufox browser.
  • Runs with low concurrency and longer timeouts for stability.
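
For a sense of what "low concurrency and longer timeouts" might look like in practice, the sketch below collects the relevant settings in one object. The option names match Crawlee's crawler options, but the values are illustrative guesses and the `StabilityOptions` type is defined locally so the example has no dependencies.

```typescript
// Local type mirroring a few Crawlee crawler option names.
interface StabilityOptions {
  maxConcurrency: number; // pages processed in parallel
  maxRequestRetries: number; // retries before a request is marked failed
  navigationTimeoutSecs: number; // time allowed for a page navigation
  requestHandlerTimeoutSecs: number; // time allowed for the handler on one page
}

// Illustrative values only; the extractor's real settings may differ.
const stabilityOptions: StabilityOptions = {
  maxConcurrency: 1,
  maxRequestRetries: 3,
  navigationTimeoutSecs: 120,
  requestHandlerTimeoutSecs: 300,
};
```

An object like this could be spread into the crawler's constructor options alongside the request handler.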