---
id: jobspy
title: JobSpy Extractor
description: How the JobSpy Python wrapper is orchestrated and normalized.
sidebar_position: 3
---

A walkthrough of the JobSpy extractor for Indeed, LinkedIn, and Glassdoor.

Original websites:
- [indeed.com](https://www.indeed.com)
- [linkedin.com/jobs](https://www.linkedin.com/jobs)
- [glassdoor.com](https://www.glassdoor.com)

## Big picture

JobSpy runs as a Python script per search term, writes JSON, then orchestrator ingests and normalizes into internal job shape.

## 1) Inputs and defaults

Key environment variables:

- `JOBSPY_SITES` (default: `indeed,linkedin`)
- `JOBSPY_SEARCH_TERM` (default: `web developer`)
- `JOBSPY_LOCATION` (default: `UK`)
- `JOBSPY_RESULTS_WANTED` (default: `200`)
- `JOBSPY_HOURS_OLD` (default: `72`)
- `JOBSPY_COUNTRY_INDEED` (default: `UK`)
- `JOBSPY_LINKEDIN_FETCH_DESCRIPTION` (default: `true`)

## 2) Orchestrator flow

The service in `orchestrator/src/server/services/jobspy.ts`:

- Builds search-term list from UI or env
- Runs Python once per term with unique output file
- Reads JSON and maps to `CreateJobInput`
- De-dupes by `jobUrl`
- Deletes temp output files best-effort

## 3) Mapping and cleanup

- Normalizes salary ranges
- Converts empty values to null
- Keeps metadata like skills, ratings, remote flags when available
- Skips rows with invalid site or missing URL

## Notes

- `JOBSPY_SEARCH_TERMS` can be JSON array or `|`, comma, newline-delimited text.
- Set `JOBSPY_LINKEDIN_FETCH_DESCRIPTION=0` to speed runs.
- Temp output files are stored under `data/imports/`.