---
id: supplementary-sources-access-notes
title: Supplementary sources — access notes
description: Credential gates, sources to skip, and practical alternatives for boards without a native JobOps extractor.
sidebar_position: 15
---
This page captures **verified access paths** and realistic integration effort for boards that are not fully wired as pipeline extractors. Pair it with [Extractors overview](/docs/next/extractors/overview), [Manual Import](/docs/next/extractors/manual), and [Add an Extractor](/docs/next/workflows/add-an-extractor).
## Credential-gated APIs (usually straightforward)
### Careerjet (v4)
- **Sign-up:** [careerjet.com/partners](https://www.careerjet.com/partners) → add your site → Access API → register **server egress IP(s)**.
- **Endpoint:** `https://search.api.careerjet.net/v4/query`
- **Important parameters:** `affid` (publisher key); **`user_ip`** (documented as the end user's IP; for headless/server runs, use an IP you allowlisted, since requests are fraud-checked); **`user_agent`**; **`url`** (referrer URL of the page where results would appear; maps to `CAREERJET_REFERER`, sent as the `url` query parameter plus a `Referer` header). Missing `user_ip` or `user_agent` tends to yield **403**.
- **Tip:** Official Python client: [`careerjet/careerjet-api-client-python`](https://github.com/careerjet/careerjet-api-client-python).
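
A minimal TypeScript sketch of assembling that request, assuming the parameters above are passed as query-string values. `buildCareerjetRequest` and `CareerjetQuery` are hypothetical names, not JobOps or Careerjet APIs, and `keywords`/`location` are assumed parameter names carried over from the older Careerjet API; confirm them against the v4 documentation.

```typescript
// Sketch only: assembles a Careerjet v4 request from the parameters listed above.
// ASSUMPTION: `keywords` and `location` are accepted by v4 (names taken from the older API).
interface CareerjetQuery {
  affid: string; // publisher key
  userIp: string; // allowlisted egress IP for headless/server runs
  userAgent: string;
  referer: string; // page where results would appear (CAREERJET_REFERER)
  keywords?: string; // assumed parameter name
  location?: string; // assumed parameter name
}

const CAREERJET_V4_ENDPOINT = "https://search.api.careerjet.net/v4/query";

function buildCareerjetRequest(q: CareerjetQuery): {
  url: URL;
  headers: Record<string, string>;
} {
  const url = new URL(CAREERJET_V4_ENDPOINT);
  url.searchParams.set("affid", q.affid);
  // Omitting either of these tends to produce HTTP 403.
  url.searchParams.set("user_ip", q.userIp);
  url.searchParams.set("user_agent", q.userAgent);
  // The referrer travels twice: as the `url` query parameter and as a header.
  url.searchParams.set("url", q.referer);
  if (q.keywords) url.searchParams.set("keywords", q.keywords);
  if (q.location) url.searchParams.set("location", q.location);
  return { url, headers: { Referer: q.referer } };
}
```

Keeping request assembly separate from the HTTP call makes it easy to assert in tests that `user_ip` and `user_agent` are never dropped.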
### Reed
- **Sign-up:** [reed.co.uk/developers](https://www.reed.co.uk/developers) — API key is issued via their contact flow (often ~12 business days).
- **Endpoint:** e.g. `https://www.reed.co.uk/api/1.0/search?keywords=...&locationName=...&resultsToTake=100`
- **Auth:** HTTP Basic — username = API key, password empty (`curl -u "YOUR_API_KEY:" ...`).
- **Pagination:** `resultsToTake` max **100** per request; advance with `resultsToSkip`.
- **Scope:** UK-centric; still useful for remote UK employers.
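
The auth and pagination rules above could be combined roughly as in this TypeScript sketch. It assumes the JSON response carries a `results` array (check the Reed docs), and `fetchAllReed`, `reedSearchUrl`, `reedAuthHeader`, and `FetchLike` are hypothetical names; the fetch function is injected so the paging logic can be exercised offline.

```typescript
// Sketch: page through Reed's search API with HTTP Basic auth.
// ASSUMPTION: the response body is JSON with a `results` array.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const REED_SEARCH = "https://www.reed.co.uk/api/1.0/search";
const PAGE_SIZE = 100; // Reed caps resultsToTake at 100

function reedAuthHeader(apiKey: string): string {
  // Basic auth: API key as the username, empty password.
  return `Basic ${btoa(`${apiKey}:`)}`;
}

function reedSearchUrl(keywords: string, locationName: string, skip: number): string {
  const url = new URL(REED_SEARCH);
  url.searchParams.set("keywords", keywords);
  url.searchParams.set("locationName", locationName);
  url.searchParams.set("resultsToTake", String(PAGE_SIZE));
  url.searchParams.set("resultsToSkip", String(skip));
  return url.toString();
}

async function fetchAllReed(
  apiKey: string,
  keywords: string,
  locationName: string,
  fetchImpl: FetchLike,
  maxResults = 500, // hypothetical per-run cap
): Promise<unknown[]> {
  const all: unknown[] = [];
  for (let skip = 0; skip < maxResults; skip += PAGE_SIZE) {
    const res = await fetchImpl(reedSearchUrl(keywords, locationName, skip), {
      headers: { Authorization: reedAuthHeader(apiKey) },
    });
    if (!res.ok) throw new Error(`Reed search failed: HTTP ${res.status}`);
    const body = (await res.json()) as { results?: unknown[] };
    const page = body.results ?? [];
    all.push(...page);
    // A short page means there is nothing left to skip to.
    if (page.length < PAGE_SIZE) break;
  }
  return all;
}
```

In Node 18+ the global `fetch` fits the `FetchLike` shape, so a real run could be `fetchAllReed(key, "qa engineer", "London", fetch)`.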
## Usually not worth scraping yourself
### Job Bank (Canada)
XML syndication requires **manual approval**: an active Canadian Business Number and an established Canadian-facing employment site. There is no simple public JSON/RSS feed for arbitrary candidates. The HTML site exists but is heavy JSF with anti-bot measures; treat it as **skip** unless you qualify for the feed.
### Jobboom / Workopolis / BCJobs
No stable public API/RSS documented for generic job discovery. Third-party scrapers often need **residential proxies** and paid runtime — **skip or pay** for a maintained provider.
### Jobillico
Employer-oriented XML/OAuth API (posting and limited pull). Needs a **business account** — not a candidate discovery API.
### MyVisaJobs / H1BGrader
No practical public API for their enriched data. Alternatives: **DOL OFLC LCA disclosure** data (public, bulk, published quarterly), joined to your own job corpus; or paid marketplace scrapers, if you accept the cost/compliance tradeoffs. Browser extensions may still be useful **personally**.
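
If you take the LCA route, the join could look roughly like this TypeScript sketch. It assumes the disclosure file has already been converted to row objects with `EMPLOYER_NAME` and `CASE_STATUS` fields (verify the field names for the fiscal year you download); `Job`, `normalizeEmployer`, `certifiedEmployers`, and `tagSponsors` are hypothetical names, and the suffix stripping is a crude heuristic.

```typescript
// Sketch: flag jobs whose employer appears in certified LCA disclosure rows.
// ASSUMPTION: rows expose EMPLOYER_NAME and CASE_STATUS; Job is a hypothetical corpus shape.
interface LcaRow {
  EMPLOYER_NAME: string;
  CASE_STATUS: string;
}
interface Job {
  id: string;
  company: string;
}

// Normalize names so "Acme Corp." and "ACME CORP" compare equal.
function normalizeEmployer(name: string): string {
  return name
    .toUpperCase()
    .replace(/[.,]/g, " ")
    .replace(/\b(INC|LLC|LTD|CORP|CORPORATION|CO)\b/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Keep only plain "Certified" cases ("Certified - Withdrawn", "Denied", etc. are skipped).
function certifiedEmployers(rows: LcaRow[]): Set<string> {
  const sponsors = new Set<string>();
  for (const row of rows) {
    if (row.CASE_STATUS.trim().toUpperCase() === "CERTIFIED") {
      sponsors.add(normalizeEmployer(row.EMPLOYER_NAME));
    }
  }
  return sponsors;
}

function tagSponsors(jobs: Job[], sponsors: Set<string>): Array<Job & { lcaSponsor: boolean }> {
  return jobs.map((job) => ({ ...job, lcaSponsor: sponsors.has(normalizeEmployer(job.company)) }));
}
```

A certified LCA reflects past filing activity, not a promise that a given posting offers sponsorship, so treat `lcaSponsor` as a ranking signal rather than a filter.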
### Untapped (Jopwell)
Closed candidate platform — **no public job-posting API** for arbitrary ingestion.
## Practical additions in JobOps
### iCIMS (per-tenant HTML)
Many tenants expose anonymous search HTML suitable for stable scraping patterns, e.g.:

`https://{tenant}.icims.com/jobs/search?ss=1&searchKeyword=…&in_iframe=1`

Pagination often uses `pr=`; job URLs commonly follow `/jobs/{id}/{slug}/job`. Maintain a **tenant host list** (similar to Greenhouse/Lever company lists). This is **not** the authenticated iCIMS Job Portal API.

Shipped extractor: **iCIMS tenants (HTML)**. Configure `icimsTenants` and the related caps in Settings.
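
The URL pattern and job-link shape above could be handled roughly like this TypeScript sketch. It assumes `pr` is a zero-based page index (omitted for the first page) and that job links appear as plain `href` paths; markup varies by tenant, and `icimsSearchUrl`/`extractIcimsJobs` are hypothetical names, not the shipped extractor's code.

```typescript
// Sketch: tenant search URLs and job-ID extraction for iCIMS search HTML.
// ASSUMPTION: `pr` is a zero-based page index; verify per tenant.
function icimsSearchUrl(tenant: string, keyword: string, page = 0): string {
  const url = new URL(`https://${tenant}.icims.com/jobs/search`);
  url.searchParams.set("ss", "1");
  url.searchParams.set("searchKeyword", keyword);
  url.searchParams.set("in_iframe", "1");
  if (page > 0) url.searchParams.set("pr", String(page));
  return url.toString();
}

// Job links commonly look like /jobs/{id}/{slug}/job.
const JOB_LINK = /\/jobs\/(\d+)\/([^/"'?#]+)\/job/g;

function extractIcimsJobs(html: string): Array<{ id: string; slug: string }> {
  // Deduplicate by job ID, keeping first-seen order.
  const seen = new Map<string, string>();
  for (const match of html.matchAll(JOB_LINK)) {
    const [, id, slug] = match;
    if (id && slug && !seen.has(id)) seen.set(id, slug);
  }
  return Array.from(seen, ([id, slug]) => ({ id, slug }));
}
```

Deduplicating by ID matters because search pages often link the same posting more than once (title and "apply" links).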
### BC T-Net RSS
Free aggregate RSS (example): `https://www.bctechnology.com/rss/jobs/tnetjobs.xml`. Useful for **BC / Vancouver** tech roles; build custom slices with the site's RSS builder.

Shipped extractor: **BC T-Net (RSS)**. Canada geography only; the optional `bctenetRssUrls` overrides the default feed.
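
Reading an RSS 2.0 feed like this needs little more than `<item>`, `<title>`, and `<link>`. The TypeScript sketch below is a deliberately naive, dependency-free reader (no namespaces, no Atom `<link href>` support, only common entities decoded); `parseRssItems` and `RssItem` are hypothetical names, and a production extractor would more likely use a real XML parser.

```typescript
// Sketch: minimal RSS 2.0 item reader using regular expressions.
// Simplification: assumes flat <item> blocks with text-only <title>/<link>/<pubDate>.
interface RssItem {
  title: string;
  link: string;
  pubDate: string | undefined;
}

function tagText(block: string, tag: string): string | undefined {
  const m = block.match(new RegExp(`<${tag}(?:\\s[^>]*)?>([\\s\\S]*?)</${tag}>`));
  if (!m || m[1] === undefined) return undefined;
  const raw = m[1].trim();
  // CDATA content is literal, so return it without entity decoding.
  const cdata = raw.match(/^<!\[CDATA\[([\s\S]*?)\]\]>$/);
  if (cdata) return (cdata[1] ?? "").trim();
  // Decode common entities; &amp; goes last to avoid double-decoding.
  return raw
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}

function parseRssItems(xml: string): RssItem[] {
  const items: RssItem[] = [];
  for (const m of xml.matchAll(/<item\b[^>]*>([\s\S]*?)<\/item>/g)) {
    const block = m[1] ?? "";
    const title = tagText(block, "title");
    const link = tagText(block, "link");
    // Skip items missing the fields a job row needs.
    if (title && link) items.push({ title, link, pubDate: tagText(block, "pubDate") });
  }
  return items;
}
```

The channel-level `<title>` is ignored because only text inside `<item>` blocks is searched.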
## Related pages
- [Extractors overview](/docs/next/extractors/overview)
- [Eluta](/docs/next/extractors/eluta) (Canada RSS)
- [SmartRecruiters](/docs/next/extractors/smartrecruiters)
- [Canadian companies — QA-strong ATS](/docs/next/extractors/canadian-companies-qa-ats)
- [Manual Import](/docs/next/extractors/manual)