---
id: supplementary-sources-access-notes
title: Supplementary sources — access notes
description: Credential gates, sources to skip, and practical alternatives for boards without a native JobOps extractor.
sidebar_position: 15
---

This page captures **verified access paths** and realistic integration effort for boards that are not fully wired as pipeline extractors. Pair it with [Extractors overview](/docs/next/extractors/overview), [Manual Import](/docs/next/extractors/manual), and [Add an Extractor](/docs/next/workflows/add-an-extractor).

## Credential-gated APIs (usually straightforward)

### Careerjet (v4)

- **Sign-up:** [careerjet.com/partners](https://www.careerjet.com/partners) → add your site → Access API → register your **server egress IP(s)**.
- **Endpoint:** `https://search.api.careerjet.net/v4/query`
- **Important parameters:** `affid` (publisher key); **`user_ip`** (documented as the end user's IP; for headless or server-side runs, pass an IP you have allowlisted, because requests are fraud-checked); **`user_agent`**; **`url`** (the referrer URL where the results would be displayed; in JobOps this comes from `CAREERJET_REFERER` and is sent both as the `url` query parameter and as the `Referer` header). Requests missing `user_ip` or `user_agent` tend to fail with **403**.
- **Tip:** Official Python client: [`careerjet/careerjet-api-client-python`](https://github.com/careerjet/careerjet-api-client-python).
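
A minimal TypeScript sketch of how a v4 request could be assembled from the parameters above. `buildCareerjetRequest` is a hypothetical helper, not the JobOps extractor, and the `keywords` and `location` parameter names are assumptions; confirm the exact v4 parameter set (and any additional authentication) against Careerjet's partner documentation.

```typescript
// Hypothetical helper: assembles a Careerjet v4 query URL plus headers.
// Only affid, user_ip, user_agent and url come from the notes above;
// `keywords` and `location` are ASSUMED parameter names.
interface CareerjetQuery {
  affid: string; // publisher key
  userIp: string; // for headless runs: an allowlisted server egress IP
  userAgent: string;
  referer: string; // e.g. the value of CAREERJET_REFERER
  keywords: string;
  location?: string;
}

interface PreparedRequest {
  url: string;
  headers: Record<string, string>;
}

const CAREERJET_V4_ENDPOINT = "https://search.api.careerjet.net/v4/query";

function buildCareerjetRequest(q: CareerjetQuery): PreparedRequest {
  // Fail fast locally: a missing user_ip or user_agent tends to yield a 403.
  if (!q.userIp || !q.userAgent) {
    throw new Error("Careerjet v4 requires user_ip and user_agent");
  }
  const url = new URL(CAREERJET_V4_ENDPOINT);
  url.searchParams.set("affid", q.affid);
  url.searchParams.set("user_ip", q.userIp);
  url.searchParams.set("user_agent", q.userAgent);
  // The referrer goes out twice: as the `url` query parameter and as a header.
  url.searchParams.set("url", q.referer);
  url.searchParams.set("keywords", q.keywords);
  if (q.location) url.searchParams.set("location", q.location);
  return {
    url: url.toString(),
    headers: { "User-Agent": q.userAgent, Referer: q.referer },
  };
}
```

With Node 18+, the result can be passed to `fetch(req.url, { headers: req.headers })`. Validating `user_ip` and `user_agent` up front turns an opaque 403 into a clear local error.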

### Reed

- **Sign-up:** [reed.co.uk/developers](https://www.reed.co.uk/developers). The API key is issued on request through their contact flow (typically within ~1–2 business days).
- **Endpoint:** e.g. `https://www.reed.co.uk/api/1.0/search?keywords=...&locationName=...&resultsToTake=100`
- **Auth:** HTTP Basic, with username = API key and an empty password (`curl -u "YOUR_API_KEY:" ...`).
- **Pagination:** `resultsToTake` is capped at **100** per request; advance with `resultsToSkip`.
- **Scope:** UK-centric, but still useful for remote roles at UK employers.
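
The Basic-auth and pagination rules above can be sketched in TypeScript as follows. The helper names are hypothetical, and the sketch assumes a runtime with global `URL` and `btoa` (Node 16+ or a browser).

```typescript
// Hypothetical helpers for the Reed search API as described above.
const REED_SEARCH = "https://www.reed.co.uk/api/1.0/search";
const REED_MAX_PAGE = 100; // resultsToTake is capped at 100 per request

// HTTP Basic auth: the API key is the username, the password is empty ("KEY:").
function reedAuthHeader(apiKey: string): string {
  return "Basic " + btoa(`${apiKey}:`);
}

// Builds the URL for one page of results, clamping the page size to the cap.
function reedSearchUrl(
  keywords: string,
  locationName: string,
  skip: number,
  take: number = REED_MAX_PAGE,
): string {
  const url = new URL(REED_SEARCH);
  url.searchParams.set("keywords", keywords);
  url.searchParams.set("locationName", locationName);
  url.searchParams.set("resultsToTake", String(Math.min(take, REED_MAX_PAGE)));
  url.searchParams.set("resultsToSkip", String(skip));
  return url.toString();
}

// resultsToSkip offsets needed to cover `limit` results in pages of pageSize.
function reedPageOffsets(limit: number, pageSize: number = REED_MAX_PAGE): number[] {
  const offsets: number[] = [];
  for (let skip = 0; skip < limit; skip += pageSize) offsets.push(skip);
  return offsets;
}
```

A caller would loop over `reedPageOffsets(limit)`, fetch each `reedSearchUrl(...)` with an `Authorization: reedAuthHeader(key)` header, and stop once enough results have been collected.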

## Usually not worth scraping yourself

### Job Bank (Canada)

The XML syndication feed requires **manual approval**: you need an active Canadian Business Number and an established Canadian-facing employment site. There is no simple public JSON or RSS feed for arbitrary candidates. The HTML site exists but is heavy JSF with anti-bot protection, so treat Job Bank as a **skip** unless you qualify for the feed.

### Jobboom / Workopolis / BCJobs

None of these boards offers a documented, stable public API or RSS feed for generic job discovery. Third-party scrapers often need **residential proxies** and paid runtime, so either **skip** them or **pay** for a maintained provider.

### Jobillico

Jobillico offers an employer-oriented XML/OAuth API (job posting plus limited pull) that requires a **business account**; it is not a candidate discovery API.

### MyVisaJobs / H1BGrader

Neither offers a practical public API for its enriched UX. Alternatives: the **DOL OFLC LCA disclosure** quarterly files (public, bulk download), joined against your own job corpus; or paid marketplace scrapers, if you accept the cost and compliance tradeoffs. Browser extensions may still be useful for **personal** use.
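
Joining LCA disclosure data to your own corpus can start as a normalized employer-name lookup. This sketch assumes the quarterly file has already been parsed into rows and that the relevant columns are named `EMPLOYER_NAME` and `CASE_STATUS`; check the record layout published with each release. The function names are hypothetical.

```typescript
// ASSUMED row shape: LCA disclosure rows already parsed from the quarterly file.
interface LcaRow {
  EMPLOYER_NAME: string;
  CASE_STATUS: string;
}

// Hypothetical shape for postings in your own job corpus.
interface JobPosting {
  id: string;
  company: string;
}

// Normalizes employer names so "Acme, Inc." and "ACME INC" compare equal:
// lowercase, drop punctuation, drop common legal suffixes, collapse spaces.
function normalizeEmployer(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, " ")
    .replace(/\b(inc|llc|ltd|corp|corporation|co|lp|plc)\b/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Returns IDs of postings whose employer has at least one "Certified" LCA row.
function jobsWithCertifiedLca(rows: LcaRow[], jobs: JobPosting[]): string[] {
  const sponsors = new Set(
    rows
      .filter((r) => r.CASE_STATUS.trim().toLowerCase() === "certified")
      .map((r) => normalizeEmployer(r.EMPLOYER_NAME)),
  );
  return jobs
    .filter((j) => sponsors.has(normalizeEmployer(j.company)))
    .map((j) => j.id);
}
```

Exact matching on normalized names is deliberately crude: subsidiaries, trade names, and spelling variants will slip through, so treat a hit as a signal that the employer has filed certified LCAs before, not as proof it will sponsor a given role. The strict equality check also excludes statuses such as `Certified - Withdrawn`.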

### Untapped (Jopwell)

Untapped is a closed candidate platform with **no public job-posting API** for arbitrary ingestion.

## Practical additions in JobOps

### iCIMS (per-tenant HTML)

Many tenants expose anonymous search HTML suitable for stable scraping patterns, e.g.:

`https://{tenant}.icims.com/jobs/search?ss=1&searchKeyword=…&in_iframe=1`

Pagination often uses `pr=`; job URLs commonly follow `/jobs/{id}/{slug}/job`. Maintain a **tenant host list** (similar to Greenhouse/Lever company lists). This is **not** the authenticated iCIMS Job Portal API.

Shipped extractor: **iCIMS tenants (HTML)**. Configure `icimsTenants` (plus the related caps) in Settings.
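
The URL pattern and job-link shape above can be turned into two small helpers. This is a sketch under stated assumptions: `pr` is treated as a zero-based page index and omitted on the first page, and job links are found with a regex over the raw HTML rather than a DOM parser. Both helper names are hypothetical and do not describe the shipped extractor.

```typescript
// Hypothetical helper: search URL for one iCIMS tenant.
// ASSUMPTION: `pr` is a zero-based page index, left off for the first page.
function icimsSearchUrl(tenant: string, keyword: string, page: number = 0): string {
  const url = new URL(`https://${tenant}.icims.com/jobs/search`);
  url.searchParams.set("ss", "1");
  url.searchParams.set("searchKeyword", keyword);
  url.searchParams.set("in_iframe", "1");
  if (page > 0) url.searchParams.set("pr", String(page));
  return url.toString();
}

interface IcimsJobLink {
  id: string;
  slug: string;
  url: string;
}

// Extracts unique /jobs/{id}/{slug}/job links from a search results page,
// keeping the first occurrence of each job ID in page order.
function extractIcimsJobs(html: string, tenant: string): IcimsJobLink[] {
  const byId = new Map<string, IcimsJobLink>();
  const re = /\/jobs\/(\d+)\/([^\/"'?#\s]+)\/job/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    const id = m[1];
    const slug = m[2];
    if (!byId.has(id)) {
      byId.set(id, { id, slug, url: `https://${tenant}.icims.com/jobs/${id}/${slug}/job` });
    }
  }
  return Array.from(byId.values());
}
```

Iterating `tenant` over the configured host list gives the same fan-out pattern as Greenhouse/Lever company lists, and deduplicating by job ID guards against the same posting being linked more than once on a page.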

### BC T-Net RSS

A free aggregate RSS feed, for example `https://www.bctechnology.com/rss/jobs/tnetjobs.xml`, is useful for **BC / Vancouver** tech roles; the site’s RSS builder can produce custom slices.

Shipped extractor: **BC T-Net (RSS)**, limited to Canada geography. Set the optional `bctenetRssUrls` to override the default feed.
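
A minimal sketch of feed resolution and item parsing for this source. It assumes `bctenetRssUrls` is a list of feed URLs and that each `<item>` carries `<title>` and `<link>`; the regex-based parsing only illustrates the shape of the data, and a real extractor should use a proper XML parser. All function names are hypothetical.

```typescript
// Default feed from the notes above; used when no override is configured.
const DEFAULT_BCTNET_FEED = "https://www.bctechnology.com/rss/jobs/tnetjobs.xml";

// ASSUMPTION: bctenetRssUrls is a list of URLs; blank entries are ignored.
function resolveFeedUrls(bctenetRssUrls?: string[]): string[] {
  const overrides = (bctenetRssUrls ?? []).map((u) => u.trim()).filter((u) => u.length > 0);
  return overrides.length > 0 ? overrides : [DEFAULT_BCTNET_FEED];
}

interface RssItem {
  title: string;
  link: string;
}

// Decodes the basic XML entities; &amp; goes last to avoid double-decoding.
function decodeEntities(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;|&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Text of the first <tag> child, unwrapping CDATA or decoding entities.
function childText(itemXml: string, tag: string): string {
  const m = new RegExp(`<${tag}(?:\\s[^>]*)?>([\\s\\S]*?)</${tag}>`).exec(itemXml);
  if (!m) return "";
  const raw = m[1].trim();
  const cdata = /^<!\[CDATA\[([\s\S]*?)\]\]>$/.exec(raw);
  return cdata ? cdata[1].trim() : decodeEntities(raw);
}

// Collects <item> entries that have both a title and a link.
function parseRssItems(xml: string): RssItem[] {
  const items: RssItem[] = [];
  const itemRe = /<item(?:\s[^>]*)?>([\s\S]*?)<\/item>/g;
  let m: RegExpExecArray | null;
  while ((m = itemRe.exec(xml)) !== null) {
    const title = childText(m[1], "title");
    const link = childText(m[1], "link");
    if (title && link) items.push({ title, link });
  }
  return items;
}
```

Falling back to the default feed only when no non-blank override is present keeps an empty settings field from silently disabling the source.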

## Related pages

- [Extractors overview](/docs/next/extractors/overview)
- [Eluta](/docs/next/extractors/eluta) (Canada RSS)
- [SmartRecruiters](/docs/next/extractors/smartrecruiters)
- [Canadian companies — QA-strong ATS](/docs/next/extractors/canadian-companies-qa-ats)
- [Manual Import](/docs/next/extractors/manual)