Jobber/jobspy.md at a0220df17fb00e401f28cb21756fcf8ab3c5e246

Shaheer Sarfaraz d34a9f041b

2026-02-19 12:51:55 +00:00

id	title	description	sidebar_position
jobspy	JobSpy Extractor	How the JobSpy Python wrapper is orchestrated and normalized.	3

A walkthrough of the JobSpy extractor for Indeed, LinkedIn, and Glassdoor.

Original websites:

Big picture

JobSpy runs as a Python script once per search term and writes its results to a JSON file. The orchestrator then ingests that file and normalizes each row into the internal job shape.

1) Inputs and defaults

Key environment variables:

JOBSPY_SITES (default: indeed,linkedin)
JOBSPY_SEARCH_TERM (default: web developer)
JOBSPY_LOCATION (default: UK)
JOBSPY_RESULTS_WANTED (default: 200)
JOBSPY_HOURS_OLD (default: 72)
JOBSPY_COUNTRY_INDEED (default: UK)
JOBSPY_LINKEDIN_FETCH_DESCRIPTION (default: true)
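
As an illustration of how the defaults above could be resolved, here is a minimal Python sketch. The variable names and default values come from the list; the function name `load_jobspy_config`, the returned keys, and the parsing rules (comma-splitting sites, treating `0`/`false`/`no`/`off` as "disabled") are assumptions made for this example, not the project's actual code.

```python
import os

# Defaults copied from the list above.
DEFAULTS = {
    "JOBSPY_SITES": "indeed,linkedin",
    "JOBSPY_SEARCH_TERM": "web developer",
    "JOBSPY_LOCATION": "UK",
    "JOBSPY_RESULTS_WANTED": "200",
    "JOBSPY_HOURS_OLD": "72",
    "JOBSPY_COUNTRY_INDEED": "UK",
    "JOBSPY_LINKEDIN_FETCH_DESCRIPTION": "true",
}

# Assumption: these spellings switch a boolean flag off.
FALSY = {"0", "false", "no", "off"}


def load_jobspy_config(env=None):
    """Resolve JobSpy settings from an env mapping (hypothetical helper)."""
    env = os.environ if env is None else env

    def get(name):
        # Unset or empty variables fall back to the documented default.
        return env.get(name) or DEFAULTS[name]

    return {
        "sites": [s.strip() for s in get("JOBSPY_SITES").split(",") if s.strip()],
        "search_term": get("JOBSPY_SEARCH_TERM"),
        "location": get("JOBSPY_LOCATION"),
        "results_wanted": int(get("JOBSPY_RESULTS_WANTED")),
        "hours_old": int(get("JOBSPY_HOURS_OLD")),
        "country_indeed": get("JOBSPY_COUNTRY_INDEED"),
        "linkedin_fetch_description": (
            get("JOBSPY_LINKEDIN_FETCH_DESCRIPTION").strip().lower() not in FALSY
        ),
    }
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise without touching the real environment.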

2) Orchestrator flow

The service in orchestrator/src/server/services/jobspy.ts:

Builds search-term list from UI or env
Runs Python once per term with unique output file
Reads JSON and maps to CreateJobInput
De-dupes by jobUrl
Deletes temp output files best-effort
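
The steps above can be sketched roughly as follows. The real service is TypeScript; this Python version only illustrates the control flow. `run_terms`, the `run_scraper` callable, and the output file naming are hypothetical, and the mapping step is left out, so rows are assumed to already carry a `jobUrl` key.

```python
import json
import uuid
from pathlib import Path


def run_terms(terms, run_scraper, out_dir):
    """Run the scraper once per term and return de-duplicated rows.

    run_scraper(term, out_file) is a caller-supplied callable (an
    assumption for this sketch) that writes a JSON array to out_file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    seen_urls = set()
    jobs = []
    for term in terms:
        # A unique file per term keeps runs from overwriting each other.
        out_file = out_dir / f"jobspy-{uuid.uuid4().hex}.json"
        try:
            run_scraper(term, out_file)
            rows = json.loads(out_file.read_text(encoding="utf-8"))
            for row in rows:
                url = row.get("jobUrl")
                if not url or url in seen_urls:
                    continue  # de-dupe by jobUrl, first occurrence wins
                seen_urls.add(url)
                jobs.append(row)
        finally:
            # Best-effort cleanup: a failed delete must not fail the run.
            try:
                out_file.unlink()
            except OSError:
                pass
    return jobs
```

In practice `run_scraper` would launch the Python script as a subprocess; injecting it as a callable keeps the sketch independent of how that script is invoked.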

3) Mapping and cleanup

Normalizes salary ranges
Converts empty values to null
Keeps metadata like skills, ratings, remote flags when available
Skips rows with invalid site or missing URL
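
A minimal sketch of what such a mapping step might look like, in Python for illustration only. The input keys (`site`, `job_url`, `min_amount`, `max_amount`, `is_remote`, `skills`) and the output keys are assumptions, as is the specific salary rule used here (coerce to numbers, swap a reversed range); the actual `CreateJobInput` fields may differ.

```python
import math

# Sites named in this document; anything else is treated as invalid.
VALID_SITES = {"indeed", "linkedin", "glassdoor"}


def _clean(value):
    """Convert None, NaN, and blank strings to None; pass others through."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, str):
        return value.strip() or None
    return value


def _number(value):
    """Return a finite float, or None when the value is not numeric."""
    value = _clean(value)
    if value is None or isinstance(value, bool):
        return None
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if math.isfinite(number) else None


def map_row(row):
    """Map one raw row to a job dict, or None if the row must be skipped."""
    site = _clean(row.get("site"))
    url = _clean(row.get("job_url"))
    if not isinstance(site, str) or site.lower() not in VALID_SITES:
        return None  # invalid site
    if not isinstance(url, str):
        return None  # missing URL
    low, high = _number(row.get("min_amount")), _number(row.get("max_amount"))
    if low is not None and high is not None and low > high:
        low, high = high, low  # assumed rule: repair a reversed range
    return {
        "source": site.lower(),
        "jobUrl": url,
        "title": _clean(row.get("title")),
        "salaryMin": low,
        "salaryMax": high,
        "isRemote": _clean(row.get("is_remote")),
        "skills": _clean(row.get("skills")),
    }
```

Returning `None` for skipped rows lets the caller filter them out in one pass before de-duplication.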

Notes

JOBSPY_SEARCH_TERMS can be a JSON array or text delimited by |, commas, or newlines.
Set JOBSPY_LINKEDIN_FETCH_DESCRIPTION=0 to speed up runs.
Temp output files are stored under data/imports/.
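
The accepted JOBSPY_SEARCH_TERMS formats noted above could be parsed along these lines. This is a sketch under assumptions: the function name `parse_search_terms` is hypothetical, and falling back to delimiter splitting when the JSON is malformed is a choice made for this example rather than documented behaviour.

```python
import json
import re


def parse_search_terms(raw):
    """Parse a JSON array or |, comma, or newline-delimited text into terms."""
    raw = (raw or "").strip()
    if not raw:
        return []
    if raw.startswith("["):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = None  # not valid JSON; fall through to delimiter split
        if isinstance(parsed, list):
            return [str(t).strip() for t in parsed if str(t).strip()]
    # Split on any of the three supported delimiters and drop blanks.
    return [t.strip() for t in re.split(r"[|,\n]", raw) if t.strip()]
```

For example, `parse_search_terms("web developer|data engineer")` and `parse_search_terms('["web developer", "data engineer"]')` both yield the same two terms.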