ilia 7b3dfb002a
feat(extractors): add 17 job source extractors and cross-source dedup
Adds extractor packages: arbeitnow, ashby, careerjet, fourdayweek,
greenhouse, himalayas, jobicy, jooble, lever, reed, remoteok, remotive,
themuse, usajobs, weworkremotely, workday — each with manifest, package
metadata and README.

Pipeline / shared:
- shared/job-fingerprint: stable hash for cross-source dedup, with tests
- discover-jobs: dedup via fingerprint and richer per-source merging
- jobs repository: fingerprint-aware upsert / lookup
- settings-registry, settings types/routes, demo-defaults: knobs for the
  new sources
- shared extractors index: register the new manifests
- location-support, profiles route: small fixes for the new sources

Tooling:
- scripts/smoke-extractors.ts to sanity-check each source locally
- scripts/jobber-cron-{cherepaha,dobkin}.env.example: per-host cron
  templates (CHANGEME placeholders only)
- .env.example: documented env vars for the new extractors
- .gitignore: ignore extractors/*/storage/ runtime caches (was ukvisajobs only)

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 20:17:52 -04:00

# careerjet-extractor

Job extractor for the [Careerjet publisher API v4](https://www.careerjet.com/partners/api/), querying `https://search.api.careerjet.net/v4/query`.

## Required configuration

- **`CAREERJET_AFFID`** — Your publisher **API key** (settings key `careerjetAffid`). Used as the Basic auth **username**; password is empty.
- **`CAREERJET_REFERER`** — The `Referer` header Careerjet requires: the full URL of the job-search page on your registered site (e.g. `https://yoursite.com/find-jobs/`).
- **`CAREERJET_USER_IP`** — The `user_ip` query parameter. In the [publisher dashboard](https://www.careerjet.com/partners/), add your **server's outbound IP** (and any dev machine IP) under "Server IP address". This value should match one of the allowlisted addresses.
- **`CAREERJET_USER_AGENT`** (optional) — Overrides the default `user_agent` parameter if Careerjet asks for a specific string.
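
The settings above could be combined into a single request roughly as follows. This is a minimal sketch, not the extractor's actual code. `buildCareerjetRequest`, `CareerjetConfig` and the `careerjet-extractor` default user agent are hypothetical. The `keywords` and `page` query parameter names are assumptions about the v4 API and should be checked against Careerjet's documentation.

```typescript
// Hypothetical config shape mirroring the env vars documented above.
interface CareerjetConfig {
  affid: string; // CAREERJET_AFFID: publisher API key
  referer: string; // CAREERJET_REFERER: URL of your job-search page
  userIp: string; // CAREERJET_USER_IP: allowlisted outbound IP
  userAgent?: string; // CAREERJET_USER_AGENT: optional override
}

interface CareerjetRequest {
  url: string;
  headers: Record<string, string>;
}

const CAREERJET_ENDPOINT = "https://search.api.careerjet.net/v4/query";
// Placeholder: the extractor's real default user_agent is not documented here.
const DEFAULT_USER_AGENT = "careerjet-extractor";

function buildCareerjetRequest(
  config: CareerjetConfig,
  keywords: string,
  localeCode: string, // derived from selectedCountry
  searchCities: string[], // assumed to be already split into tokens
  page: number,
): CareerjetRequest {
  const params = new URLSearchParams({
    keywords, // assumed parameter name
    locale_code: localeCode,
    user_ip: config.userIp,
    user_agent: config.userAgent ?? DEFAULT_USER_AGENT,
    page: String(page), // assumed parameter name
  });

  // Only the first searchCities token is sent as `location`.
  const firstCity = searchCities[0]?.trim();
  if (firstCity) {
    params.set("location", firstCity);
  }

  // Basic auth: API key as username, empty password, i.e. base64("<key>:").
  const credentials = btoa(`${config.affid}:`);

  return {
    url: `${CAREERJET_ENDPOINT}?${params.toString()}`,
    headers: {
      Authorization: `Basic ${credentials}`,
      Referer: config.referer,
    },
  };
}
```

Only `searchCities[0]` is used. A call with `["London", "Leeds"]` therefore sends `location=London` and ignores `Leeds`.
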

`selectedCountry` maps to `locale_code`, and the first `searchCities` token is sent as `location`. Results are capped per search term by `careerjetMaxJobsPerTerm` (default 100). The v4 API allows up to **10** pages per query.
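
The per-term cap and the 10-page limit combine into one stop condition. The sketch below assumes a hypothetical `fetchPage` callback that returns one page of results and an empty array once results run out. `collectJobsForTerm` is illustrative, not the extractor's real function.

```typescript
const MAX_PAGES_PER_QUERY = 10; // v4 API page limit stated above
const DEFAULT_MAX_JOBS_PER_TERM = 100; // careerjetMaxJobsPerTerm default

// Hypothetical page fetcher: returns the jobs on a 1-based page (empty when exhausted).
type FetchPage<T> = (page: number) => Promise<T[]>;

async function collectJobsForTerm<T>(
  fetchPage: FetchPage<T>,
  maxJobsPerTerm: number = DEFAULT_MAX_JOBS_PER_TERM,
): Promise<T[]> {
  const jobs: T[] = [];
  for (let page = 1; page <= MAX_PAGES_PER_QUERY; page++) {
    // Stop once the per-term cap is reached.
    if (jobs.length >= maxJobsPerTerm) break;
    const batch = await fetchPage(page);
    // An empty page means there are no more results for this term.
    if (batch.length === 0) break;
    jobs.push(...batch);
  }
  // The last page may overshoot the cap, so trim to exactly maxJobsPerTerm.
  return jobs.slice(0, maxJobsPerTerm);
}
```

Stopping on an empty page avoids requesting pages past the end of the result set. The final `slice` handles a last page that pushes the total over the cap.
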