feat(extractors): expand catalog, smoke coverage, and sourcing docs

Adds Arc.dev, BC T-Net, Eluta, iCIMS tenants, QAJobsBoard, and SmartRecruiters
manifests with registry/settings/UI wiring; registers full extractor list in
smoke-extractors and documents supplementary board access paths. Aligns Careerjet
v4 with the url query parameter and fixes strict typing in QAJobsBoard.

Co-authored-by: Cursor <cursoragent@cursor.com>
ilia 2026-05-15 22:36:23 -04:00
parent 67508d56ea
commit c840f289e1
50 changed files with 2926 additions and 101 deletions

View File

@ -200,6 +200,7 @@ ADZUNA_APP_KEY=
# LEVER_COMPANIES=netflix,figma
# ASHBY_COMPANIES=ramp,linear
# GREENHOUSE_COMPANIES=stripe,airbnb
# Canadian QA-employer examples (full table): docs-site/docs/extractors/canadian-companies-qa-ats.md
# =============================================================================
# Workday (public career sites) - optional
@ -210,3 +211,46 @@ ADZUNA_APP_KEY=
# 2) A JSON object with explicit fields:
# {"company":"NVIDIA","tenantUrl":"https://nvidia.wd5.myworkdayjobs.com","tenant":"nvidia","site":"NVIDIAExternalCareerSite","locale":"en-US"}
# WORKDAY_TENANTS=
# =============================================================================
# SmartRecruiters (public Posting API) - optional
# =============================================================================
# Comma- or newline-separated company identifiers (API path segment), e.g.
# jobs.smartrecruiters.com/smartrecruiters/... → "smartrecruiters".
# SMARTRECRUITERS_COMPANIES=smartrecruiters
# SMARTRECRUITERS_MAX_JOBS_PER_COMPANY=100
# =============================================================================
# Eluta (Canada, RSS by location) - optional
# =============================================================================
# Comma- or newline-separated location strings for https://www.eluta.ca/rss?location=...
# Example: ELUTA_RSS_LOCATIONS=Toronto, ON|Vancouver, BC
# ELUTA_MAX_JOBS_PER_TERM=100
# =============================================================================
# BC T-Net (British Columbia tech jobs RSS) - optional
# =============================================================================
# Default feed is built into the extractor when this is unset:
# https://www.bctechnology.com/rss/jobs/tnetjobs.xml
# Override with JSON array or newline-separated URLs (custom feeds from T-Net builder).
# BCTENET_RSS_URLS=
# Prefer Settings: bctenetRssUrls (JSON array), bctenetMaxJobsPerTerm (default 400).
# =============================================================================
# iCIMS tenant portals (anonymous HTML search) - optional
# =============================================================================
# Comma- or newline-separated hosts, e.g. careers-example.icims.com
# ICIMS_TENANTS=
# Caps via Settings: icimsMaxJobsPerTenant (default 250), icimsMaxPagesPerSearch (default 10).
# =============================================================================
# QAJobsBoard (QA JobBoardly JSON) - optional
# =============================================================================
# Configure caps via Settings: qajobsboardMaxJobsPerTerm (default 100).
# =============================================================================
# Arc.dev remote listings - optional
# =============================================================================
# Comma-separated paths under https://arc.dev used when seeding defaults (e.g. Playwright + Cypress feeds).
# ARC_REMOTE_JOBS_PATHS=/remote-jobs/playwright,/remote-jobs/cypress
# Prefer Settings for overrides: arcRemoteJobsPaths (JSON array), arcMaxJobsPerPath (default 120).

View File

@ -0,0 +1,34 @@
---
id: arcdev
title: Arc.dev Extractor
description: Remote tech roles from Arc.dev listing pages via embedded Next.js data.
sidebar_position: 17
---
## What it is
[Arc.dev](https://arc.dev) exposes remote job listings on paths such as `/remote-jobs/playwright` and `/remote-jobs/cypress`. The extractor downloads SSR HTML and parses the embedded `__NEXT_DATA__` payload (Arc-managed and external rows).
Implementation: `extractors/arcdev/manifest.ts`.
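As a rough illustration of the `__NEXT_DATA__` step, the sketch below pulls the embedded JSON out of an SSR HTML string. The helper name `extractNextData` and the sample payload shape (`props.pageProps.jobs`) are assumptions made for this example; they are not taken from `extractors/arcdev/manifest.ts`, and the real payload may be shaped differently.

```typescript
// Hypothetical helper (not the manifest's actual code): returns the parsed
// __NEXT_DATA__ JSON embedded in a Next.js SSR page, or null when absent.
function extractNextData(html: string): unknown {
  // Next.js embeds page props in a <script id="__NEXT_DATA__"> JSON tag.
  const match = html.match(
    /<script[^>]*id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/,
  );
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null; // tag present, but the payload is not valid JSON
  }
}

// Made-up sample page; the jobs array shape is an assumption for illustration.
const sampleHtml =
  '<html><body><script id="__NEXT_DATA__" type="application/json">' +
  '{"props":{"pageProps":{"jobs":[{"title":"QA Engineer"}]}}}' +
  "</script></body></html>";

console.log(JSON.stringify(extractNextData(sampleHtml)));
```

Returning `null` instead of throwing keeps a payload-shape change from aborting a whole run; the caller can log the miss and move on to the next path.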
## Why it exists
Curated remote hiring with explicit tooling-oriented feeds; many roles are open to North America when labeled that way on the site.
## How to use it
1. Enable **Arc.dev** in pipeline sources (no credentials).
2. Configure **`arcRemoteJobsPaths`** as a JSON array of path strings (defaults include Playwright and Cypress remote feeds). Optionally seed defaults from **`ARC_REMOTE_JOBS_PATHS`** (comma-separated paths).
3. Set **`arcMaxJobsPerPath`** (default `120`, max `300`) to cap rows per listing URL after deduplication.
4. Align **`searchTerms`** with titles or stacks you care about; empty-term behavior is handled inside the manifest per path.
## Common problems
- **HTML changes:** If Arc ships a new payload shape, parsing may need an update; smoke-test with `npx tsx scripts/smoke-extractors.ts arcdev`, or run the full extractor suite with `npx tsx scripts/smoke-extractors.ts`.
- **`Arc talent network` employer:** Some Arc-managed rows omit a company name; the mapper uses that placeholder.
## Related pages
- [Extractors overview](/docs/next/extractors/overview)
- [Canadian QA contracting firms](/docs/next/extractors/qa-contract-staffing-canada)
- [Manual Import](/docs/next/extractors/manual)

View File

@ -0,0 +1,117 @@
---
id: canadian-companies-qa-ats
title: Canadian companies — strong QA orgs and scrapable ATS
description: Reference list of Canadian tech employers with solid QA cultures and practical ATS endpoints for JobOps pipelines.
sidebar_position: 41
---
## What it is
A curated reference of **Canadian-headquartered or Canadian-heavy tech employers** where QA / SDET / test automation is often a first-class function, together with **scrapable ATS endpoints** where they exist.
Tier 1 targets map cleanly to the shipped **Ashby**, **Greenhouse**, **Lever**, **Workday**, and **SmartRecruiters** extractors. Tier 2 entries need custom scraping, browser automation, or upstream quirks.
**Verification:** Tier 1 integrations below were probed successfully (**HTTP `200`**, JSON where applicable). **Posting counts change daily** — re-run probes locally when you need exact volumes.
## Why it exists
Canada-focused QA sourcing benefits from **employer-direct ATS feeds** (clean titles, real apply URLs) instead of only aggregator noise. This page maps recognizable brands to **exact integration shapes** so you can paste slugs into Settings or env without rediscovering URLs.
## How to use it
### Tier 1 — Public ATS APIs (shipped extractors)
| Company | HQ | ATS | Endpoint / shape (reference) | JobOps wiring |
| --- | --- | --- | --- | --- |
| Wealthsimple | Toronto | Ashby | `GET https://api.ashbyhq.com/posting-api/job-board/wealthsimple` | Add **`wealthsimple`** to **`ashbyCompanies`** / `ASHBY_COMPANIES`. |
| 1Password | Toronto (remote-first) | Ashby | `.../job-board/1password` | Add **`1password`**. |
| Jobber | Edmonton / Toronto | Ashby | `.../job-board/jobber` | Add **`jobber`**. |
| Nylas | Toronto / SF | Ashby | `.../job-board/nylas` | Add **`nylas`**. |
| Hootsuite | Vancouver | Greenhouse | `GET https://boards-api.greenhouse.io/v1/boards/hootsuite/jobs?content=true` | Add **`hootsuite`** to **`greenhouseCompanies`**. |
| Faire | Waterloo / SF | Greenhouse | `.../boards/faire/jobs?content=true` | Add **`faire`**. |
| PointClickCare | Mississauga | Lever | `GET https://api.lever.co/v0/postings/pointclickcare?mode=json` | Add **`pointclickcare`** to **`leverCompanies`**. |
| Clio | Burnaby / Calgary / Toronto | Workday | `POST https://clio.wd3.myworkdayjobs.com/wday/cxs/clio/ClioCareerSite/jobs` (`limit`, `offset`, `searchText`) | Add **`https://clio.wd3.myworkdayjobs.com/en-US/ClioCareerSite`** to **`workdayTenants`** / `WORKDAY_TENANTS`. |
| Coveo | Quebec City / Montreal | SmartRecruiters | `GET https://api.smartrecruiters.com/v1/companies/Coveo/postings` | Add **`Coveo`** to **`smartrecruitersCompanies`**. API stays **`200`**; **`totalFound`** may be **zero** between hiring waves. |
Optional Ashby query parameter `?includeCompensation=true` works in browsers and `curl` for richer payloads; the bundled Ashby extractor calls the **same path without that query** and still returns full job lists.
**Example Settings JSON (merge with your existing lists):**
```json
["wealthsimple", "1password", "jobber", "nylas"]
```
```json
["hootsuite", "faire"]
```
```json
["pointclickcare"]
```
```json
["https://clio.wd3.myworkdayjobs.com/en-US/ClioCareerSite"]
```
```json
["Coveo"]
```
### Tier 2 — Harder or custom surfaces
| Company | HQ | ATS | Notes |
| --- | --- | --- | --- |
| Shopify | Ottawa / remote | Ashby (custom) | Hosted board / GraphQL (`jobs.ashbyhq.com/api/non-user-graphql`, `organizationHostedJobsPageName: "shopify"`) or parse careers HTML — not covered by the slug-based Ashby extractor today. |
| Lightspeed Commerce | Montreal | Custom (often Cloudflare) | Careers HTML at `https://www.lightspeedhq.com/careers/openings/` — browser or tolerant fetcher; no shipped extractor. |
| RBC Borealis | Toronto / Montreal | Greenhouse (embedded) | `boards-api` path **`rbcborealis`** returned **404** when probed — scrape `https://rbcborealis.com/careers/` or rediscover the active board slug before using Greenhouse JSON. |
| Vidyard | Kitchener-Waterloo | JS-heavy site | `https://careers.vidyard.com/`; use Playwright/Puppeteer if automating. |
| Loblaw Digital | Toronto | Workday (parent) | Parent Workday host may need the correct site segment; careers marketing site often lists roles — browser-backed discovery may be more reliable than guessing CXS paths. |
### Ready-to-use CLI filters (QA-oriented titles)
Ashby (example: Wealthsimple):
```bash
curl -s 'https://api.ashbyhq.com/posting-api/job-board/wealthsimple?includeCompensation=true' \
| jq '.jobs[] | select(.title | test("QA|SDET|Test|Automation"; "i")) | {title, location, url: .jobUrl}'
```
Greenhouse (example: Hootsuite):
```bash
curl -s 'https://boards-api.greenhouse.io/v1/boards/hootsuite/jobs?content=true' \
| jq '.jobs[] | select(.title | test("QA|SDET|Test|Automation"; "i")) | {title, location: .location.name, url: .absolute_url}'
```
Lever (PointClickCare):
```bash
curl -s 'https://api.lever.co/v0/postings/pointclickcare?mode=json' \
| jq '.[] | select(.text | test("QA|SDET|Test|Automation"; "i")) | {title: .text, location: .categories.location, url: .hostedUrl}'
```
Workday (Clio — QA search text):
```bash
curl -s -X POST 'https://clio.wd3.myworkdayjobs.com/wday/cxs/clio/ClioCareerSite/jobs' \
-H 'Content-Type: application/json' \
-d '{"limit":50,"offset":0,"searchText":"QA"}' \
| jq '.jobPostings[] | select(.title | test("QA|SDET|Test|Automation"; "i")) | {title, location: .locationsText}'
```
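For scripts that already hold the parsed JSON in Node, the same title filter used in the `jq` commands above can be sketched in TypeScript. The names `QA_TITLE`, `TitledRow`, and `filterQaTitles` are hypothetical, and the sample rows are made up.

```typescript
// Same pattern as the jq filters above, matched case-insensitively.
const QA_TITLE = /QA|SDET|Test|Automation/i;

// Minimal row shape: only the title matters for this filter.
interface TitledRow {
  title: string;
}

// Keeps rows whose title contains any of the QA-oriented keywords.
function filterQaTitles<T extends TitledRow>(rows: T[]): T[] {
  return rows.filter((row) => QA_TITLE.test(row.title));
}

// Made-up sample data for illustration only.
const sampleRows = [
  { title: "Senior SDET" },
  { title: "Product Designer" },
  { title: "qa analyst" },
];

console.log(filterQaTitles(sampleRows).map((row) => row.title));
```

The pattern is a plain substring match, so a title such as "Contest Manager" would also pass because it contains "test"; adding word boundaries (`\b`) is one way to tighten it if that noise matters.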
### Other strong-QA Canadian employers (ATS not deep-verified here)
Worth manual checks or Eluta / LinkedIn cross-reference: **Wattpad**, **Knix**, **Ada**, **Hopper**, **Plusgrade**, **D2L**, **Kinaxis**, **TELUS Digital / Mirum**, **Trulioo**, **OpenText / Hubdoc**.
## Common problems
- **Ashby counts vs `includeCompensation`:** Omitting the query param still returns jobs; compensation fields may be sparser.
- **Greenhouse board slug drift:** If `boards-api` returns `404`, the employer may have renamed the board — inspect their careers page embed or HTML source for the current board id.
- **SmartRecruiters zero postings:** Still a valid integration; don't treat empty arrays as a broken extractor.
## Related pages
- [Extractors overview](/docs/next/extractors/overview)
- [Canadian / NA QA contracting firms](/docs/next/extractors/qa-contract-staffing-canada)
- [Eluta](/docs/next/extractors/eluta)
- [Manual Import](/docs/next/extractors/manual)

View File

@ -0,0 +1,44 @@
---
id: eluta
title: Eluta Extractor
description: Canadian job discovery via Eluta.ca public RSS feeds.
sidebar_position: 15
---
## What it is
Original site: [eluta.ca](https://www.eluta.ca)
The extractor lives in `extractors/eluta/manifest.ts`. It requests one or more public RSS URLs of the form `https://www.eluta.ca/rss?location=...`, parses items (title, employer, location, link, description), filters by pipeline search terms, and merges feeds while de-duplicating by `guid` / URL.
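The merge-and-de-duplicate step might look roughly like the sketch below. `FeedItem` and `dedupeItems` are hypothetical names for this example, and the "prefer `guid`, fall back to link" rule is an assumption about how the key is chosen; the manifest may do it differently.

```typescript
// Hypothetical parsed RSS item; field names are assumptions for this sketch.
interface FeedItem {
  guid?: string;
  link: string;
  title: string;
}

// Merges items from several feeds, keeping the first occurrence of each key.
function dedupeItems(items: FeedItem[]): FeedItem[] {
  const seen = new Set<string>();
  const unique: FeedItem[] = [];
  for (const item of items) {
    // Prefer the guid; fall back to the link when the guid is missing or blank.
    const key = item.guid?.trim() || item.link.trim();
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(item);
  }
  return unique;
}

// Made-up items as if merged from two location feeds.
const mergedItems: FeedItem[] = [
  { guid: "a1", link: "https://example.test/job/1", title: "QA Analyst" },
  { guid: "a1", link: "https://example.test/job/1?src=rss", title: "QA Analyst" },
  { link: "https://example.test/job/2", title: "SDET" },
  { link: "https://example.test/job/2", title: "SDET" },
];

console.log(dedupeItems(mergedItems).length);
```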
## Why it exists
Eluta surfaces Canadian roles indexed directly from employer career sites, often with less aggregator noise than generic job search. RSS provides a stable, low-auth integration compared to scraping HTML.
## How to use it
1. Choose **location strings** Eluta accepts in the RSS `location` query parameter (for example `Toronto, ON`, `Vancouver, BC`). Very broad values such as a whole country may return empty feeds; prefer metros or provinces.
2. In **Settings**, set **Eluta RSS locations** (`elutaRssLocations`) as a JSON array or comma/newline-separated list, or set `ELUTA_RSS_LOCATIONS` in the environment (for example `Toronto, ON|Montreal, QC`).
3. Optionally set **Eluta max jobs per term** (`elutaMaxJobsPerTerm`, default `100`).
4. Set your search geography to **Canada** — Eluta is **Canada-only** and is skipped automatically when the resolved pipeline country is not Canada.
5. Enable **Eluta** in pipeline sources and run the pipeline.
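Because location strings contain commas and spaces, they need URL encoding before they go into the `location` query parameter. A minimal sketch, assuming a hypothetical helper named `elutaRssUrl` (the extractor's own URL construction is not shown here):

```typescript
// Hypothetical helper: builds the RSS URL for one location string.
function elutaRssUrl(location: string): string {
  const url = new URL("https://www.eluta.ca/rss");
  // URLSearchParams handles the encoding of commas and spaces.
  url.searchParams.set("location", location.trim());
  return url.toString();
}

console.log(elutaRssUrl("Toronto, ON"));
// https://www.eluta.ca/rss?location=Toronto%2C+ON
```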
## Common problems
### Eluta is skipped for my run
- Search geography is not Canada (city/country/Indeed country resolution). Align geography to Canada or disable Eluta for non-Canada profiles.
### Empty feeds
- The `location` string may be too broad or spelled differently than Eluta expects. Try a major city plus province (e.g. `Calgary, AB`).
### RSS HTTP errors
- Eluta may block unusual clients; the extractor sends a conventional User-Agent. Retry later or reduce the number of location feeds per run.
## Related pages
- [Extractors Overview](/docs/next/extractors/overview)
- [Add an Extractor](/docs/next/workflows/add-an-extractor)
- [Settings](/docs/next/features/settings)

View File

@ -19,6 +19,12 @@ Extractor integrations are now registered through manifests and loaded automatic
| [Hiring Cafe](/docs/next/extractors/hiring-cafe) | Browser-backed discovery using Hiring Cafe search APIs | Subject to upstream anti-bot checks; uses browser context and encoded search-state payloads | `HIRING_CAFE_SEARCH_TERMS`, `HIRING_CAFE_COUNTRY`, `HIRING_CAFE_MAX_JOBS_PER_TERM`, `HIRING_CAFE_DATE_FETCHED_PAST_N_DAYS` | Uses existing pipeline term/country/budget knobs and maps directly to normalized jobs |
| [startup.jobs](/docs/next/extractors/startup-jobs) | Startup-focused discovery through the published `startup-jobs-scraper` package | No credentials required; detail enrichment depends on Playwright browser binaries being installed | existing pipeline `searchTerms`, selected country/cities, `jobspyResultsWanted`; `npx playwright install` for fresh environments | Algolia-backed search plus detail-page enrichment via package import; orchestrator maps normalized records and de-duplicates by `jobUrl` |
| [UKVisaJobs](/docs/next/extractors/ukvisajobs) | UK visa sponsorship-focused roles | Requires authenticated session and periodic token/cookie refresh | `UKVISAJOBS_EMAIL`, `UKVISAJOBS_PASSWORD`, `UKVISAJOBS_MAX_JOBS`, `UKVISAJOBS_SEARCH_KEYWORD` | API pagination + dataset output; orchestrator de-dupes and may fetch missing descriptions |
| [SmartRecruiters](/docs/next/extractors/smartrecruiters) | Enterprise employers on SmartRecruiters public boards | No auth; needs configured company identifiers; one HTTP round-trip per posting for apply URLs + descriptions | `SMARTRECRUITERS_COMPANIES`, `SMARTRECRUITERS_MAX_JOBS_PER_COMPANY` | Paginates the public Posting API, filters by pipeline terms, normalizes to `CreateJobInput` |
| iCIMS tenants (HTML) | Large employers on iCIMS portals | No auth; HTML search varies by tenant — maintain explicit tenant hosts | `ICIMS_TENANTS`, Settings: `icimsTenants`, `icimsMaxJobsPerTenant`, `icimsMaxPagesPerSearch` | Fetches `/jobs/search` with iframe-style params, parses listing links, caps per tenant |
| BC T-Net (RSS) | BC tech aggregate via T-Net | Canada-only; free RSS (default feed built-in); optional extra feeds | `BCTENET_RSS_URLS`, Settings: `bctenetRssUrls`, `bctenetMaxJobsPerTerm` | Fetches RSS item blocks, normalizes quirky CDATA link fragments, filters by pipeline terms |
| [Eluta](/docs/next/extractors/eluta) | Canadian listings aggregated from employer career sites (RSS) | Canada-only source (skipped when search geography is not Canada); RSS `location` strings must be set | `ELUTA_RSS_LOCATIONS`, `ELUTA_MAX_JOBS_PER_TERM` | Fetches one or more `eluta.ca` RSS feeds, filters by terms, de-duplicates by guid/URL |
| [QAJobsBoard](/docs/next/extractors/qajobsboard) | QA / SDET / automation-heavy board (global JSON feed) | No auth; geography skew is manual/filter downstream | `qajobsboardMaxJobsPerTerm` | Fetches JobBoardly JSON, filters by pipeline terms |
| [Arc.dev](/docs/next/extractors/arcdev) | Remote roles from Arc.dev listing pages (tool-tagged paths) | Parses SSR `__NEXT_DATA__`; relies on stable Next payload | `ARC_REMOTE_JOBS_PATHS` (seeds defaults), `arcRemoteJobsPaths`, `arcMaxJobsPerPath` | Merges Arc-managed + external rows; dedupes by URL |
| [Manual Import](/docs/next/extractors/manual) | One-off jobs not covered by scrapers | Inference quality depends on model/provider and input quality; some URLs cannot be fetched reliably | App/API endpoints (`/api/manual-jobs/infer`, `/api/manual-jobs/import`) | Accepts text/HTML/URL, runs inference, then saves and scores job after review |
## Which extractor should I use?
@ -29,10 +35,38 @@ Extractor integrations are now registered through manifests and loaded automatic
- Use **startup.jobs** when you want startup-heavy listings without maintaining another scraper locally.
- Use **Gradcracker** when targeting graduate pipelines in the UK.
- Use **UKVisaJobs** for sponsorship-specific UK searches.
- Use **SmartRecruiters** when you can list target employers' public SmartRecruiters company identifiers.
- Use **iCIMS tenants** when you can list target `*.icims.com` career hosts (anonymous portal HTML search).
- Use **BC T-Net** for British Columbia tech RSS listings (runs only when search geography is Canada).
- Use **Eluta** for Canadian employer-direct listings via RSS (set metro/province `location` strings).
- Use **QAJobsBoard** or **Arc.dev** when you want QA- or remote-stack-focused feeds without extra credentials.
- Use **Manual Import** when you already have a specific posting and need direct import.
Many runs combine sources: broad discovery first, then manual import for high-priority jobs that scraping misses.
### QA-focused boards (shipped extractors)
- **[QAJobsBoard](/docs/next/extractors/qajobsboard)** — Large QA-oriented index via public JSON; filter geography downstream.
- **[Arc.dev](/docs/next/extractors/arcdev)** — Remote feeds (e.g. Playwright / Cypress paths); good for vetted remote slices.
### Canadian QA contracting firms (reference)
Staffing and consultancy firms that frequently post QA automation contracts — scrape hints and CLI probes: **[Canadian / NA QA contracting firms](/docs/next/extractors/qa-contract-staffing-canada)**.
### Canadian employers — QA-strong ATS (reference)
Direct ATS JSON / extractor wiring for well-known Canadian tech brands (Ashby, Greenhouse, Lever, Workday, SmartRecruiters): **[Canadian companies — strong QA orgs and scrapable ATS](/docs/next/extractors/canadian-companies-qa-ats)**.
## Supplementary job boards
Some boards are **credential-gated**, **approval-gated**, or **scraping-hostile** — see **[Supplementary sources — access notes](/docs/next/extractors/supplementary-sources-access-notes)** for realistic paths (Careerjet, Reed, Job Bank XML policy, sponsorship data sources, etc.).
JobOps ships **BC T-Net** and **iCIMS tenant HTML** extractors for two cases that are usually workable without vendor contracts; everything else in the old “long tail” list still lands best via **[Manual Import](/docs/next/extractors/manual)** until someone promotes it to a manifest.
### Still common manual-import targets
- **Wellfound** (formerly AngelList), **Otta**, **Welcome to the Jungle**, **Dice**, **Job Bank** (unless you qualify for syndication), regional boards without stable feeds — use Manual Import or an external tool, then normalize here.
## Related extractor docs
- [Gradcracker](/docs/next/extractors/gradcracker)
@ -41,5 +75,12 @@ Many runs combine sources: broad discovery first, then manual import for high-pr
- [Hiring Cafe](/docs/next/extractors/hiring-cafe)
- [startup.jobs](/docs/next/extractors/startup-jobs)
- [UKVisaJobs](/docs/next/extractors/ukvisajobs)
- [SmartRecruiters](/docs/next/extractors/smartrecruiters)
- [Supplementary sources — access notes](/docs/next/extractors/supplementary-sources-access-notes)
- [Eluta](/docs/next/extractors/eluta)
- [QAJobsBoard](/docs/next/extractors/qajobsboard)
- [Arc.dev](/docs/next/extractors/arcdev)
- [Canadian / NA QA contracting firms](/docs/next/extractors/qa-contract-staffing-canada)
- [Canadian companies — QA-strong ATS](/docs/next/extractors/canadian-companies-qa-ats)
- [Manual Import](/docs/next/extractors/manual)
- [Add an Extractor](/docs/next/workflows/add-an-extractor)

View File

@ -0,0 +1,92 @@
---
id: qa-contract-staffing-canada
title: Canadian / NA QA contracting firms
description: Staffing and consultancy boards that often post QA automation contracts, with JobOps wiring notes and scrape hints.
sidebar_position: 40
---
## What it is
A curated list of Canadian and North American **staffing firms and consultancies** that regularly carry **QA / SDET / test automation** contract roles. Coverage emphasizes targets that are **live**, **contract-heavy**, and roughly ordered by **scraping ease** for automation.
This is **not** an extractor implementation checklist: several firms need HTML or browser automation. Use native JobOps extractors where they apply, and [Manual Import](/docs/next/extractors/manual) elsewhere.
## Why it exists
Contract QA pipelines often come through agencies before they appear on Indeed or LinkedIn runs. Mapping firms to **ATS type** (Workday, Greenhouse, Lever, custom API, JS-rendered site) saves weeks of one-off research.
Counts and role titles **change daily** — re-verify listings before relying on them for outreach.
## How to use it
### Tier 1 — Confirmed live QA contracts + scrapable
| Firm | HQ | Where to scrape | Confirmed QA / contract notes |
| --- | --- | --- | --- |
| Procom | Toronto | [Find a job](https://procomservices.com/en-ca/find-a-job/) (~230+ roles when checked; paginated HTML). Titles such as “QA/QC Analyst”, “Automation QA Analyst”, “Sr QA Analyst” appear regularly. | Strong banking-sector volume; typical **4-6 month** contracts. No shipped extractor: use HTML or browser automation. |
| S.i. Systems | Calgary / Toronto | [Search IT jobs (`q=QA`)](https://www.sisystems.com/search-it-jobs/?q=QA) — custom web API behind the UI; inspect DevTools network for JSON. | Frequent **Sr QA**, **Mobile QE** (WebdriverIO / Appium) and similar; lots of **GTA** roles; postings refresh often. |
| Synechron | NYC / Toronto / Montreal | Workday CXS: `POST https://synechron.wd1.myworkdayjobs.com/wday/cxs/synechron/SynechronCareers/jobs` with e.g. `{"limit":20,"offset":0,"searchText":"QA automation"}`. | Often **20+** QA automation-facing rows when searched; includes **Playwright** and related stacks. Add **`https://synechron.wd1.myworkdayjobs.com/en-US/SynechronCareers`** to **`workdayTenants`** / `WORKDAY_TENANTS` and use QA search terms (bundled extractor sends empty facets — see CLI snippet below for Canada facet example). |
| Capco | London / Toronto | Greenhouse: [boards-api capco](https://boards-api.greenhouse.io/v1/boards/capco/jobs?content=true) | Large board (**700+** roles when checked); dozens of QA-related titles, but many are in **India / Poland**, so **filter by location** (Toronto / Canada metros). Add **`capco`** to **`greenhouseCompanies`**. |
| Foilcon | Toronto | [CVViz — Foilcon](https://jobs.cvviz.com/foilcon) | Lower volume; roles such as **Systems Testing QA Specialist** show up when hiring. |
| Robert Half (Technology) | Toronto / intl | [Jobs — QA automation keyword](https://www.roberthalf.com/ca/en/jobs?keywords=qa+automation) (often Workday-backed listings) | Discover tenant/host patterns from network tab if you want bulk Workday-style pulls; otherwise HTML/manual. |
| Hays Canada | Toronto | [QA automation search](https://www.hays.ca/job-search/qa-automation) — `/job-detail/...` permalinks | Custom HTML board; manual import or custom crawler. |
| Randstad Digital | Toronto / Montreal | [Randstad Canada jobs](https://www.randstad.ca/jobs/) | Site listings plus heavy **LinkedIn** contract volume under the Randstad Digital brand; often manual cross-check. |
| Compunnel | NJ / Toronto | [Job search](https://www.compunnel.com/job-search/) — **JS-rendered** | **Quality Assurance Automation Engineer** (and similar) titles recur; use **Playwright** if automating. |
| Pyramid Consulting | Atlanta / Toronto | [Job openings](https://www.pyramidci.com/careers/job-openings/) | **QA Automation Engineer** style roles often cross-posted on **LinkedIn** under their brand. |
| Qualitest Group | NYC / global | [Careers](https://careers.qualitestgroup.com) | Pure-play QA consultancy — schema varies by region; inspect ATS per locale. |
### Tier 2 — Active in Canadian QA contracting but smaller / harder to scrape
| Firm | Notes |
| --- | --- |
| Jarvis Consulting Group (`jrvs.ca`) | Toronto; [jobs.jrvs.ca/home/](https://jobs.jrvs.ca/home/) is **JS-rendered**, so use **Playwright** if scraping; SDET roles also surface on LinkedIn. |
| Electric Mind (formerly Intelliware) | Toronto. Lever: [`electricmind` postings JSON](https://api.lever.co/v0/postings/electricmind?mode=json) — add **`electricmind`** to **`leverCompanies`** when hiring opens; volume can be **small** (e.g. only a handful of open reqs) and sometimes **no QA** until project demand spikes. |
| Light Consulting (LightCI) | Toronto. Lever: [`lightci` postings JSON](https://api.lever.co/v0/postings/lightci?mode=json). Often **empty** publicly — they may prefer direct outreach. |
| Yoush Consulting | Toronto IT staffing; **no structured job board** — SDET contracts often only on **LinkedIn** company presence. |
| Accenture / Deloitte / EY / KPMG / PwC | Big Four (plus Accenture) **Canadian banking-tech QA** contract pools; usually **Workday** per brand (e.g. **EY**: `ey.wd3.myworkdayjobs.com`). Add each tenant you care about to **`workdayTenants`**. |
| CGI | Montreal (large gov / enterprise contractor). [cgi.com/en/careers](https://www.cgi.com/en/careers) — HTML / embedded ATS; inspect for stable patterns. |
| TEKsystems | US-based with Canadian offices — [Find a job (Canada)](https://www.teksystems.com/en-ca/careers/find-a-job). Enterprise staffing; HTML/search UX varies. |
| Robertson & Company | Often referenced from **LinkedIn** QA contract threads — treat as manual / referral-led unless you find a stable feed. |
| Plan A Technologies | Legit shop — [planatechnologies.com](https://www.planatechnologies.com); **no public job board** in many periods; **LinkedIn / referrals**. |
### Ready-to-use CLI probes
Greenhouse (Capco) — Canada-oriented QA filter (tweak city regex as needed):
```bash
curl -s 'https://boards-api.greenhouse.io/v1/boards/capco/jobs?content=true' \
| jq '.jobs[]
| select(.title | test("QA|SDET|Test|Automation|Quality"; "i"))
| select(.location.name | test("Toronto|Canada|Montreal|Vancouver|Calgary|Ottawa"; "i"))
| {title, location: .location.name, url: .absolute_url}'
```
Synechron Workday: `QA automation` search plus example **country facet** (`appliedFacets.locationCountry`). Facet IDs are **tenant-specific** and **expire or change**; if this stops matching, capture a fresh id from the career site's network panel.
```bash
curl -s 'https://synechron.wd1.myworkdayjobs.com/wday/cxs/synechron/SynechronCareers/jobs' \
-X POST \
-H 'Content-Type: application/json' \
-d '{"limit":50,"offset":0,"searchText":"QA automation","appliedFacets":{"locationCountry":["bc33aa3152ec42d4995f4791a106ed09"]}}' \
| jq '.jobPostings[]
| {title, location: .locationsText, url: ("https://synechron.wd1.myworkdayjobs.com" + .externalPath)}'
```
### Heads-up: Ionosphere and “Reqd”
- **Ionosphere Inc.** — May only show a **LinkedIn** presence plus legacy **Google Sites** (e.g. `sites.google.com/a/ionosphereinc.com/...`) **without** a real careers board or steady postings — **not** a dependable scrape target unless you confirm a dedicated jobs URL. (“Ionosphere” is also used by unrelated firms — disambiguate by legal name and domain.)
- **Reqd** — No widely confirmed Canadian IT staffing brand spelled exactly **Reqd**. Possible leads to double-check: **Recroot**, **Reqroute**, **Recruitio**, **Required Technologies**, etc. If you have a **website or sample posting URL**, verify before adding to any automation list.
## Common problems
- **Workday without facets:** The bundled Workday extractor posts `appliedFacets: {}`. You still get roles via **`searchText`**; use tighter terms or post-filter for Canada.
- **Greenhouse volume:** Boards like Capco are large — always filter by title regex and location text before importing hundreds of rows.
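When no facet id is available, post-filtering on the location text is one workable fallback. The sketch below builds a request body with empty facets and keeps only postings whose `locationsText` hints at Canada. The names `buildSearchBody`, `CANADA_HINT`, and `keepCanadian` are hypothetical, and the city list is only an example.

```typescript
// Fields used here mirror the jq snippets above; other fields are omitted.
interface WorkdayPosting {
  title: string;
  locationsText?: string;
  externalPath?: string;
}

// Request body with no facets applied, relying on searchText alone.
function buildSearchBody(searchText: string, offset = 0, limit = 20) {
  return { limit, offset, searchText, appliedFacets: {} };
}

// Example hint list; extend with the metros you care about.
const CANADA_HINT = /Canada|Toronto|Montreal|Vancouver|Calgary|Ottawa/i;

// Keeps postings whose location text mentions Canada or a listed city.
function keepCanadian(postings: WorkdayPosting[]): WorkdayPosting[] {
  return postings.filter((p) => CANADA_HINT.test(p.locationsText ?? ""));
}

// Made-up postings for illustration.
const samplePostings: WorkdayPosting[] = [
  { title: "QA Automation Engineer", locationsText: "Toronto, Ontario" },
  { title: "QA Lead", locationsText: "Pune, India" },
  { title: "SDET", locationsText: "2 Locations" },
];

console.log(JSON.stringify(buildSearchBody("QA automation")));
console.log(keepCanadian(samplePostings).map((p) => p.title));
```

A text filter like this drops postings whose location is only a summary (for example a "2 Locations" style string, as in the made-up third row), so treat it as a coarse first pass rather than a replacement for a real country facet.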
## Related pages
- [Extractors overview](/docs/next/extractors/overview)
- [Canadian companies — QA-strong ATS](/docs/next/extractors/canadian-companies-qa-ats)
- [Eluta](/docs/next/extractors/eluta) (Canada employer-direct RSS)
- [QAJobsBoard](/docs/next/extractors/qajobsboard)
- [Arc.dev](/docs/next/extractors/arcdev)
- [Manual Import](/docs/next/extractors/manual)

View File

@ -0,0 +1,36 @@
---
id: qajobsboard
title: QAJobsBoard Extractor
description: QA and automation-focused listings via the board's public JSON feed.
sidebar_position: 16
---
## What it is
[QAJobsBoard](https://www.qajobsboard.com) publishes postings through JobBoardly. The extractor calls:
`GET https://qajobsboard.jobboardly.com/jobs.json`
Implementation: `extractors/qajobsboard/manifest.ts`.
## Why it exists
Dense QA / SDET / automation signal versus generic boards; categories often reflect tooling (Playwright, Cypress, Selenium). Geography skews India-remote unless you combine region filtering downstream.
## How to use it
1. Enable **QAJobsBoard** in pipeline sources (no credentials).
2. Set **`qajobsboardMaxJobsPerTerm`** (default `100`) to cap mapped rows after term filtering.
3. Tune **`searchTerms`** for QA-focused phrases (`QA automation`, `SDET`, `Playwright`, etc.).
4. Optional: narrow by geography using orchestrator city/country filters where applicable.
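The interaction between `searchTerms` and the per-term cap can be pictured with the sketch below. The row shape (`BoardRow`), the helper name `matchTerms`, and the choice to keep every row when no terms are given are all assumptions for illustration; the actual JobBoardly JSON fields and the manifest's empty-term behavior are not documented here.

```typescript
// Hypothetical row shape; the real feed's field names may differ.
interface BoardRow {
  title: string;
  description?: string;
}

// Keeps rows matching any term (case-insensitive substring), then applies the cap.
function matchTerms(rows: BoardRow[], terms: string[], maxRows = 100): BoardRow[] {
  const needles = terms
    .map((term) => term.trim().toLowerCase())
    .filter((term) => term !== "");
  const kept =
    needles.length === 0
      ? rows // assumption: no terms means no filtering
      : rows.filter((row) => {
          const haystack = `${row.title} ${row.description ?? ""}`.toLowerCase();
          return needles.some((needle) => haystack.includes(needle));
        });
  // The cap is applied after term filtering, as step 2 describes.
  return kept.slice(0, maxRows);
}

// Made-up rows for illustration.
const boardRows: BoardRow[] = [
  { title: "SDET (Playwright)" },
  { title: "Backend Engineer", description: "Go services" },
  { title: "QA Automation Engineer", description: "Cypress, Selenium" },
];

console.log(matchTerms(boardRows, ["sdet", "QA automation"], 1).map((r) => r.title));
```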
## Common problems
- **Few or no rows:** Terms may be too narrow; broaden titles or temporarily remove strict city filters.
- **Irrelevant locales:** The feed is global; pair with geography or employer filters in your pipeline profile.
## Related pages
- [Extractors overview](/docs/next/extractors/overview)
- [Canadian QA contracting firms](/docs/next/extractors/qa-contract-staffing-canada)
- [Manual Import](/docs/next/extractors/manual)

View File

@ -0,0 +1,44 @@
---
id: smartrecruiters
title: SmartRecruiters Extractor
description: Public SmartRecruiters Posting API discovery with per-company identifiers.
sidebar_position: 14
---
## What it is
Original API: [SmartRecruiters Posting API](https://developers.smartrecruiters.com/reference/v1listpostings)
The extractor lives in `extractors/smartrecruiters/manifest.ts`. It calls the public JSON endpoints (no API key for public boards), paginates active **PUBLIC** postings per configured company, optionally matches pipeline search terms against title and location, then loads each posting's detail document so `jobUrl` / `applicationLink` and HTML descriptions resolve to the same URLs candidates see on `jobs.smartrecruiters.com`.
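The pagination pass can be sketched as an offset loop. The page shape below (`offset`, `limit`, `totalFound`, `content`) and the `limit` / `offset` query parameters reflect the public Posting API as understood here and should be treated as assumptions; `postingsUrl`, `FetchPage`, and `collectPostings` are hypothetical names, not the manifest's. The page fetcher is injected so the loop can be exercised without network access.

```typescript
// Assumed list-response shape for the public Posting API.
interface PostingPage<T> {
  offset: number;
  limit: number;
  totalFound: number;
  content: T[];
}

// A function that returns one page; a real one would fetch postingsUrl(...).
type FetchPage<T> = (offset: number, limit: number) => Promise<PostingPage<T>>;

// Builds the list URL for one company identifier and page window.
function postingsUrl(company: string, offset: number, limit: number): string {
  const base = `https://api.smartrecruiters.com/v1/companies/${encodeURIComponent(company)}/postings`;
  return `${base}?limit=${limit}&offset=${offset}`;
}

// Walks pages until the cap is reached, the API runs dry, or totalFound is covered.
async function collectPostings<T>(
  fetchPage: FetchPage<T>,
  maxJobs = 100,
  pageSize = 50,
): Promise<T[]> {
  const collected: T[] = [];
  let offset = 0;
  while (collected.length < maxJobs) {
    const page = await fetchPage(offset, pageSize);
    collected.push(...page.content);
    offset += page.content.length;
    // Stop on an empty page (covers totalFound === 0) or once all rows are seen.
    if (page.content.length === 0 || offset >= page.totalFound) break;
  }
  return collected.slice(0, maxJobs);
}

console.log(postingsUrl("Coveo", 0, 50));
```

An empty `content` array with `totalFound` of zero ends the loop cleanly and yields `[]`, which matches the point that zero postings is a valid result rather than an error.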
## Why it exists
Many large employers (including a significant share in Canada and the EU) publish on SmartRecruiters. This source complements Greenhouse, Lever, Ashby, and Workday by covering another major ATS with a predictable public API.
## How to use it
1. Find each employer's **company identifier**: the path segment in their public board URL (for example `jobs.smartrecruiters.com/smartrecruiters/...` → `smartrecruiters`).
2. In **Settings**, set **SmartRecruiters companies** (`smartrecruitersCompanies`) to a JSON array or comma/newline-separated list of those identifiers, or set `SMARTRECRUITERS_COMPANIES` in the environment.
3. Optionally set **SmartRecruiters max jobs per company** (`smartrecruitersMaxJobsPerCompany`, default `100`, max `500`) to cap pagination after term filtering.
4. Set your pipeline **search geography** and **search terms** as usual; terms filter postings by title, location text, and company display name.
5. Enable **SmartRecruiters** in pipeline sources and run the pipeline.
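The two settings in steps 2 and 3 can be read roughly as follows. This is a sketch under stated assumptions, not the shipped parser: `parseCompanyIdentifiers` and `clampMaxJobs` are hypothetical names, and the lower bound of `1` is an assumption (the page only states the default `100` and the maximum `500`).

```typescript
// Accepts either a JSON array string or a comma/newline-separated list.
function parseCompanyIdentifiers(raw: string | undefined): string[] {
  if (!raw) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    if (Array.isArray(parsed)) {
      return parsed
        .filter((entry): entry is string => typeof entry === "string")
        .map((entry) => entry.trim())
        .filter(Boolean);
    }
  } catch {
    // Not JSON; fall through to the plain-list form.
  }
  return raw
    .split(/[\n,]+/)
    .map((entry) => entry.trim())
    .filter(Boolean);
}

// Default 100 and upper bound 500 come from step 3; the floor of 1 is assumed.
function clampMaxJobs(raw: string | undefined): number {
  const parsed = raw ? Number.parseInt(raw, 10) : 100;
  if (!Number.isFinite(parsed)) return 100;
  return Math.min(Math.max(parsed, 1), 500);
}
```

Both input forms produce the same list, so `["smartrecruiters","acme"]` and `smartrecruiters, acme` are interchangeable here (`acme` is a placeholder identifier).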
## Common problems
### SmartRecruiters never appears in source toggles
- No companies are configured (`smartrecruitersCompanies` / `SMARTRECRUITERS_COMPANIES` is empty).
### Zero jobs for a slug I know is correct
- The identifier must match the **public Posting API** path, not necessarily the marketing site name. Confirm listings exist on `jobs.smartrecruiters.com/<identifier>/`.
### Rate limiting or intermittent HTTP errors
- Reduce `smartrecruitersMaxJobsPerCompany` or the number of configured companies; each kept posting triggers a detail request after the list pass.
## Related pages
- [Extractors Overview](/docs/next/extractors/overview)
- [Add an Extractor](/docs/next/workflows/add-an-extractor)
- [Settings](/docs/next/features/settings)

@ -0,0 +1,73 @@
---
id: supplementary-sources-access-notes
title: Supplementary sources — access notes
description: Credential gates, sources to skip, and practical alternatives for boards without a native JobOps extractor.
sidebar_position: 15
---
This page captures **verified access paths** and realistic integration effort for boards that are not fully wired as pipeline extractors. Pair it with [Extractors overview](/docs/next/extractors/overview), [Manual Import](/docs/next/extractors/manual), and [Add an Extractor](/docs/next/workflows/add-an-extractor).
## Credential-gated APIs (usually straightforward)
### Careerjet (v4)
- **Sign-up:** [careerjet.com/partners](https://www.careerjet.com/partners) → add your site → Access API → register **server egress IP(s)**.
- **Endpoint:** `https://search.api.careerjet.net/v4/query`
- **Important parameters:** `affid` (publisher key), **`user_ip`** (documented as end-user IP; for headless/server runs use an IP you allowlisted — fraud-checked), **`user_agent`**, **`url`** (referrer URL where results would appear — maps to `CAREERJET_REFERER` / query `url` + `Referer` header). Missing `user_ip` or `user_agent` tends to yield **403**.
- **Tip:** Official Python client: [`careerjet/careerjet-api-client-python`](https://github.com/careerjet/careerjet-api-client-python).
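As a rough illustration of how the required query parameters fit together, the sketch below builds a v4 request URL. `CareerjetQuery` and `buildCareerjetUrl` are hypothetical names, authentication is deliberately left out, and search parameters such as keywords are omitted because their names are not listed on this page.

```typescript
interface CareerjetQuery {
  userIp: string; // an IP you allowlisted in the publisher dashboard
  userAgent: string;
  referer: string; // page where results would appear; sent as `url`
  pageSize: number;
}

const CAREERJET_ENDPOINT = "https://search.api.careerjet.net/v4/query";

// Builds the query string only; the caller still has to add auth and a
// matching Referer header when issuing the request.
function buildCareerjetUrl(query: CareerjetQuery): string {
  const url = new URL(CAREERJET_ENDPOINT);
  url.searchParams.set("page_size", String(query.pageSize));
  url.searchParams.set("user_ip", query.userIp);
  url.searchParams.set("user_agent", query.userAgent);
  url.searchParams.set("url", query.referer);
  return url.toString();
}
```

Using `URL.searchParams` rather than string concatenation keeps the referrer URL and user agent correctly percent-encoded.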
### Reed
- **Sign-up:** [reed.co.uk/developers](https://www.reed.co.uk/developers). The API key is issued via their contact flow (often ~1–2 business days).
- **Endpoint:** e.g. `https://www.reed.co.uk/api/1.0/search?keywords=...&locationName=...&resultsToTake=100`
- **Auth:** HTTP Basic — username = API key, password empty (`curl -u "YOUR_API_KEY:" ...`).
- **Pagination:** `resultsToTake` max **100** per request; advance with `resultsToSkip`.
- **Scope:** UK-centric; still useful for remote UK employers.
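The pagination and auth rules above can be sketched as follows. `reedPageUrls` and `reedAuthHeader` are hypothetical helper names; only the endpoint, the query parameter names, the 100-row page limit, and the "key as username, empty password" rule come from the notes above.

```typescript
const REED_SEARCH = "https://www.reed.co.uk/api/1.0/search";
const REED_PAGE_MAX = 100; // resultsToTake upper bound per request

// Returns one URL per page needed to cover `wanted` results.
function reedPageUrls(
  keywords: string,
  locationName: string,
  wanted: number,
): string[] {
  const urls: string[] = [];
  for (let skip = 0; skip < wanted; skip += REED_PAGE_MAX) {
    const url = new URL(REED_SEARCH);
    url.searchParams.set("keywords", keywords);
    url.searchParams.set("locationName", locationName);
    url.searchParams.set(
      "resultsToTake",
      String(Math.min(REED_PAGE_MAX, wanted - skip)),
    );
    url.searchParams.set("resultsToSkip", String(skip));
    urls.push(url.toString());
  }
  return urls;
}

// HTTP Basic with the API key as username and an empty password.
// Assumes an ASCII key, which is what btoa accepts.
function reedAuthHeader(apiKey: string): string {
  return `Basic ${btoa(`${apiKey}:`)}`;
}
```

In practice a caller would stop early once a page returns fewer rows than requested, rather than always fetching every precomputed URL.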
## Usually not worth scraping yourself
### Job Bank (Canada)
XML syndication requires **manual approval**: an active Canadian Business Number and an established Canadian-facing employment site. There is no simple public JSON/RSS for arbitrary candidates. HTML exists but is heavy JSF with anti-bot measures, so treat it as **skip** unless you qualify for the feed.
### Jobboom / Workopolis / BCJobs
No stable public API/RSS documented for generic job discovery. Third-party scrapers often need **residential proxies** and paid runtime — **skip or pay** for a maintained provider.
### Jobillico
Employer-oriented XML/OAuth API (posting and limited pull). Needs a **business account** — not a candidate discovery API.
### MyVisaJobs / H1BGrader
No practical public API for their enriched UX. Alternatives: **DOL OFLC LCA disclosure** quarterly CSVs (public, bulk), then join to your own job corpus; paid marketplace scrapers if you accept cost/compliance tradeoffs. Browser extensions may still be useful **personally**.
### Untapped (Jopwell)
Closed candidate platform — **no public job-posting API** for arbitrary ingestion.
## Practical additions in JobOps
### iCIMS (per-tenant HTML)
Many tenants expose anonymous search HTML suitable for stable scraping patterns, e.g.:
`https://{tenant}.icims.com/jobs/search?ss=1&searchKeyword=…&in_iframe=1`
Pagination often uses `pr=`; job URLs commonly follow `/jobs/{id}/{slug}/job`. Maintain a **tenant host list** (similar to Greenhouse/Lever company lists). This is **not** the authenticated iCIMS Job Portal API.
Shipped extractor: **iCIMS tenants (HTML)** — configure `icimsTenants` (+ caps in Settings).
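The URL pattern above can be sketched as a small builder plus a job-link parser. `icimsSearchUrl` and `parseJobLink` are hypothetical names, the host in the usage note is a placeholder, and treating `pr=0` as the first page (so the parameter is omitted there) is an assumption rather than documented iCIMS behavior.

```typescript
// Builds the anonymous search URL for one tenant host and keyword.
function icimsSearchUrl(tenantHost: string, keyword: string, page = 0): string {
  const url = new URL(`https://${tenantHost}/jobs/search`);
  url.searchParams.set("ss", "1");
  url.searchParams.set("searchKeyword", keyword);
  url.searchParams.set("in_iframe", "1");
  // Assumption: the first page needs no `pr` parameter.
  if (page > 0) url.searchParams.set("pr", String(page));
  return url.toString();
}

const JOB_LINK = /\/jobs\/(\d+)\/([^"/?]+)\/job/;

// Extracts the numeric id and slug from a `/jobs/{id}/{slug}/job` link.
function parseJobLink(href: string): { id: string; slug: string } | null {
  const match = JOB_LINK.exec(href);
  if (!match) return null;
  const [, id, slug] = match;
  if (!id || !slug) return null;
  return { id, slug };
}
```

For example, `icimsSearchUrl("careers-example.icims.com", "qa automation", 2)` yields a URL whose query carries `ss=1`, the encoded keyword, `in_iframe=1`, and `pr=2`.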
### BC T-Net RSS
Free aggregate RSS (example): `https://www.bctechnology.com/rss/jobs/tnetjobs.xml`. Useful for **BC / Vancouver** tech roles; build custom slices with the site's RSS builder.
Shipped extractor: **BC T-Net (RSS)** — Canada geography only; optional `bctenetRssUrls` overrides default feed.
## Related pages
- [Extractors overview](/docs/next/extractors/overview)
- [Eluta](/docs/next/extractors/eluta) (Canada RSS)
- [SmartRecruiters](/docs/next/extractors/smartrecruiters)
- [Canadian companies — QA-strong ATS](/docs/next/extractors/canadian-companies-qa-ats)
- [Manual Import](/docs/next/extractors/manual)

@ -41,7 +41,8 @@ That keeps runtime wiring dynamic while preserving compile-time safety in API an
- append to `EXTRACTOR_SOURCE_IDS`
- add an entry in `EXTRACTOR_SOURCE_METADATA`
5. Ensure your extractor maps output to `CreateJobInput[]`.
6. Run the full CI checks.
6. Register it in `scripts/smoke-extractors.ts` (`ALL_TARGETS`): add one row per manifest so `npx tsx scripts/smoke-extractors.ts` exercises every shipped extractor (keyed sources `SKIP` until env vars exist).
7. Run the full CI checks.
Example manifest:
@ -77,6 +78,22 @@ Subprocess extractors are supported. Keep subprocess spawning inside `run(contex
- Add the new source id to `shared/src/extractors/index.ts`.
- Confirm metadata exists for that source id.
### Smoke connectivity
After wiring settings/env, run:
```bash
npx tsx scripts/smoke-extractors.ts myextractor
```
Or the full suite (may take several minutes — JobSpy invokes Python, Hiring Cafe / startup.jobs may need browser deps):
```bash
npx tsx scripts/smoke-extractors.ts
```
Keep `ALL_TARGETS` in that script aligned with manifests under each `extractors/<name>/` package (`manifest.ts` or `src/manifest.ts`).
### Source appears in shared catalog but is unavailable at runtime
- The manifest was not loaded successfully.

@ -46,6 +46,11 @@ const sidebars: SidebarsConfig = {
label: "Extractors",
items: [
"extractors/overview",
"extractors/supplementary-sources-access-notes",
"extractors/qajobsboard",
"extractors/arcdev",
"extractors/qa-contract-staffing-canada",
"extractors/canadian-companies-qa-ats",
"extractors/gradcracker",
"extractors/jobspy",
"extractors/adzuna",

@ -0,0 +1,15 @@
# arcdev-extractor
Reads Arc remote-job listings from **SSR HTML**: each page embeds `__NEXT_DATA__` with `arcJobs` (Arc talent network) and `externalJobs` (partner postings).
Configure **`arcRemoteJobsPaths`** as URL paths on `https://arc.dev`, for example:
- `/remote-jobs/playwright`
- `/remote-jobs/cypress`
- `/remote-jobs/selenium`
Or set `ARC_REMOTE_JOBS_PATHS` (comma/newline-separated). Defaults include Playwright and Cypress stacks.
**Employer names:** External jobs include `company.name`. Arc-managed listings omit company names in the payload — those rows use employer `"Arc talent network"` while preserving titles and skill categories.
Cap the matches kept from each path with `arcMaxJobsPerPath` (applied separately per path; default `120`, clamped to `1`–`300`).

@ -0,0 +1,329 @@
/**
* Arc.dev remote jobs: parse the embedded Next.js __NEXT_DATA__ payload from SSR HTML.
*
* Listing URLs look like https://arc.dev/remote-jobs/playwright
*/
import type {
ExtractorManifest,
ExtractorRunResult,
} from "@shared/types/extractors";
import type { CreateJobInput } from "@shared/types/jobs";
const ORIGIN = "https://arc.dev";
interface ArcCategory {
name?: string;
urlString?: string;
}
interface ArcCompanyJson {
randomKey?: string | null;
urlString?: string;
name?: string;
}
interface ArcJobJson {
randomKey?: string;
title?: string;
jobType?: string;
jobRole?: string;
urlString?: string;
postedAt?: number;
company?: ArcCompanyJson;
categories?: ArcCategory[];
requiredCountries?: string[];
minAnnualSalary?: number | null;
maxAnnualSalary?: number | null;
minHourlyRate?: number | null;
maxHourlyRate?: number | null;
timeZone?: string | null;
positionType?: string;
experienceLevel?: string;
experienceLevels?: string[];
}
function readPaths(raw: string | undefined): string[] {
if (!raw) return [];
try {
const parsed = JSON.parse(raw);
if (Array.isArray(parsed)) {
return parsed
.map((entry) => (typeof entry === "string" ? entry.trim() : ""))
.filter(Boolean);
}
} catch {
// fall through
}
return raw
.split(/[\n,;|]+/)
.map((entry) => entry.trim())
.filter(Boolean);
}
function defaultArcPaths(): string[] {
const raw =
typeof process !== "undefined" ? process.env.ARC_REMOTE_JOBS_PATHS : "";
const parsed = readPaths(raw);
return parsed.length > 0
? parsed
: ["/remote-jobs/playwright", "/remote-jobs/cypress"];
}
function asString(value: unknown): string | undefined {
if (typeof value !== "string") return undefined;
const t = value.trim();
return t ? t : undefined;
}
function categoryHaystack(job: ArcJobJson): string {
if (!Array.isArray(job.categories)) return "";
return job.categories
.map((c) => `${c.name ?? ""} ${c.urlString ?? ""}`)
.join(" ")
.toLowerCase();
}
function matchesTerm(job: ArcJobJson, term: string): boolean {
const lower = term.toLowerCase();
if (job.title?.toLowerCase().includes(lower)) return true;
if (categoryHaystack(job).includes(lower)) return true;
if (job.jobRole?.toLowerCase().includes(lower)) return true;
if (job.positionType?.toLowerCase().includes(lower)) return true;
if (
Array.isArray(job.experienceLevels) &&
job.experienceLevels.some((l) => l.toLowerCase().includes(lower))
)
return true;
if (job.experienceLevel?.toLowerCase().includes(lower)) return true;
return false;
}
function salaryParts(job: ArcJobJson): string | undefined {
const bits: string[] = [];
if (
typeof job.minAnnualSalary === "number" &&
typeof job.maxAnnualSalary === "number"
) {
bits.push(`USD ${job.minAnnualSalary}–${job.maxAnnualSalary} / yr`);
} else if (typeof job.minAnnualSalary === "number") {
bits.push(`USD ${job.minAnnualSalary}+ / yr`);
}
if (
typeof job.minHourlyRate === "number" ||
typeof job.maxHourlyRate === "number"
) {
bits.push(`$${job.minHourlyRate ?? "?"}–${job.maxHourlyRate ?? "?"} / hr`);
}
return bits.length > 0 ? bits.join("; ") : undefined;
}
function locationLine(job: ArcJobJson): string {
if (
Array.isArray(job.requiredCountries) &&
job.requiredCountries.length > 0
) {
return job.requiredCountries.join(", ");
}
if (job.timeZone) return job.timeZone;
return "Remote";
}
function postedIso(postedAt: number | undefined): string | undefined {
if (typeof postedAt !== "number" || !Number.isFinite(postedAt))
return undefined;
return new Date(postedAt * 1000).toISOString();
}
function parseNextPageProps(html: string): {
arcJobs: ArcJobJson[];
externalJobs: ArcJobJson[];
} | null {
const match = html.match(
/<script id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/,
);
if (!match?.[1]) return null;
try {
const parsed = JSON.parse(match[1]) as {
props?: { pageProps?: unknown };
};
const pageProps = parsed.props?.pageProps as
| {
arcJobs?: ArcJobJson[];
externalJobs?: ArcJobJson[];
}
| undefined;
if (!pageProps) return null;
return {
arcJobs: Array.isArray(pageProps.arcJobs) ? pageProps.arcJobs : [],
externalJobs: Array.isArray(pageProps.externalJobs)
? pageProps.externalJobs
: [],
};
} catch {
return null;
}
}
function mapExternalJob(job: ArcJobJson): CreateJobInput | null {
const rk = asString(job.randomKey);
const slug = asString(job.urlString);
if (!rk || !slug) return null;
const jobUrl = `${ORIGIN}/remote-jobs/j/${slug}-${rk}`;
const employer = asString(job.company?.name)?.trim() || "Unknown employer";
const disciplines = Array.isArray(job.categories)
? job.categories
.map((c) => c.name?.trim())
.filter((v): v is string => Boolean(v))
.join(", ")
: undefined;
return {
source: "arcdev",
sourceJobId: slug,
title: asString(job.title) ?? "Unknown Title",
employer,
jobUrl,
applicationLink: jobUrl,
location: locationLine(job),
datePosted: postedIso(job.postedAt),
jobType: asString(job.jobType),
salary: salaryParts(job),
disciplines,
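// An empty experienceLevels array joins to "", so normalize through asString
// before falling back to the single experienceLevel field.
jobLevel:
asString(job.experienceLevels?.join(", ")) ??
asString(job.experienceLevel),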
isRemote: true,
};
}
function mapArcManagedJob(job: ArcJobJson): CreateJobInput | null {
const rk = asString(job.randomKey);
const slug = asString(job.urlString);
if (!rk || !slug) return null;
const jobUrl = `${ORIGIN}/remote-jobs/details/${slug}-${rk}`;
const disciplines = Array.isArray(job.categories)
? job.categories
.map((c) => c.name?.trim())
.filter((v): v is string => Boolean(v))
.join(", ")
: undefined;
const employer = "Arc talent network";
return {
source: "arcdev",
sourceJobId: `${slug}-${rk}`,
title: asString(job.title) ?? "Unknown Title",
employer,
jobUrl,
applicationLink: jobUrl,
location: locationLine(job),
datePosted: postedIso(job.postedAt),
jobType: asString(job.jobType),
salary: salaryParts(job),
disciplines,
jobLevel: asString(job.experienceLevel),
jobFunction: asString(job.jobRole),
isRemote: true,
};
}
export const manifest: ExtractorManifest = {
id: "arcdev",
displayName: "Arc.dev (remote)",
providesSources: ["arcdev"],
async run(context): Promise<ExtractorRunResult> {
if (context.shouldCancel?.()) return { success: true, jobs: [] };
let paths = readPaths(context.settings.arcRemoteJobsPaths);
if (paths.length === 0) paths = defaultArcPaths();
paths = paths.map((p) => (p.startsWith("/") ? p : `/${p}`));
const maxPerPath = context.settings.arcMaxJobsPerPath
? Number.parseInt(context.settings.arcMaxJobsPerPath, 10)
: 120;
const cap = Number.isFinite(maxPerPath)
? Math.min(Math.max(maxPerPath, 1), 300)
: 120;
const terms = context.searchTerms.length > 0 ? context.searchTerms : [];
const seen = new Set<string>();
const out: CreateJobInput[] = [];
try {
for (let i = 0; i < paths.length; i += 1) {
if (context.shouldCancel?.()) break;
const path = paths[i];
const pageUrl = `${ORIGIN}${path}`;
context.onProgress?.({
phase: "list",
termsProcessed: i,
termsTotal: paths.length,
currentUrl: pageUrl,
detail: `Arc.dev: fetching (${i + 1}/${paths.length}) ${path}`,
});
const response = await fetch(pageUrl, {
headers: {
Accept: "text/html",
"User-Agent":
"Mozilla/5.0 (compatible; JobOps/1.0; +https://github.com)",
},
});
if (!response.ok) {
throw new Error(
`Arc.dev "${path}" failed with status ${response.status}`,
);
}
const html = await response.text();
const payload = parseNextPageProps(html);
if (!payload) {
throw new Error(`Arc.dev "${path}": missing __NEXT_DATA__ payload`);
}
let pathAdded = 0;
const labeled = [
...payload.arcJobs.map((job) => ({ job, kind: "arc" as const })),
...payload.externalJobs.map((job) => ({ job, kind: "ext" as const })),
];
for (const { job: raw, kind } of labeled) {
if (pathAdded >= cap) break;
if (terms.length > 0 && !terms.some((t) => matchesTerm(raw, t))) {
continue;
}
const mapped =
kind === "arc" ? mapArcManagedJob(raw) : mapExternalJob(raw);
if (!mapped) continue;
if (seen.has(mapped.jobUrl)) continue;
seen.add(mapped.jobUrl);
out.push(mapped);
pathAdded += 1;
}
context.onProgress?.({
phase: "list",
termsProcessed: i + 1,
termsTotal: paths.length,
currentUrl: pageUrl,
jobPagesProcessed: out.length,
detail: `Arc.dev: ${path}: ${pathAdded} kept (${payload.arcJobs.length} arc + ${payload.externalJobs.length} external rows)`,
});
}
return { success: true, jobs: out };
} catch (error) {
const message = error instanceof Error ? error.message : "Unknown error";
return { success: false, jobs: out, error: message };
}
},
};
export default manifest;

@ -0,0 +1,17 @@
{
"name": "arcdev-extractor",
"version": "0.0.1",
"type": "module",
"description": "Arc.dev remote jobs extractor (__NEXT_DATA__ SSR)",
"main": "manifest.ts",
"dependencies": {
"job-ops-shared": "^1.0.0"
},
"devDependencies": {
"@types/node": "^24.0.0",
"typescript": "~5.9.0"
},
"scripts": {
"check:types": "tsc --noEmit"
}
}

@ -0,0 +1,17 @@
{
"compilerOptions": {
"module": "ESNext",
"moduleResolution": "bundler",
"target": "ES2022",
"outDir": "dist",
"strict": true,
"noUnusedLocals": false,
"lib": ["ES2022", "DOM"],
"types": ["node"],
"baseUrl": ".",
"paths": {
"@shared/*": ["../../shared/src/*"]
}
},
"include": ["./manifest.ts", "./src/**/*"]
}

@ -0,0 +1,9 @@
# bctenet-extractor
Consumes the public **BC T-Net** aggregated tech-jobs RSS feed (no auth).
Default feed: `https://www.bctechnology.com/rss/jobs/tnetjobs.xml`
Controls: `bctenetRssUrls` (optional feeds from the T-Net RSS builder; when set, they replace the default feed), `bctenetMaxJobsPerTerm` (default `400`).
Canada-focused listings (British Columbia). The orchestrator skips this source when pipeline geography is not Canada (`countryAllowlist`).
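Despite its name, `bctenetMaxJobsPerTerm` acts as a per-term budget that is multiplied by the number of search terms to give one overall limit for the run. A minimal sketch of that arithmetic, mirroring the bounds used in `manifest.ts` (default `400`, clamped to `1`–`2000`); `totalJobCap` is a hypothetical name used only for illustration:

```typescript
// Overall cap = per-term cap x number of search terms (at least one).
function totalJobCap(rawMaxPerTerm: string | undefined, termCount: number): number {
  const parsed = rawMaxPerTerm ? Number.parseInt(rawMaxPerTerm, 10) : 400;
  const perTerm = Number.isFinite(parsed)
    ? Math.min(Math.max(parsed, 1), 2000)
    : 400;
  return perTerm * Math.max(termCount, 1);
}
```

With three search terms and `bctenetMaxJobsPerTerm` set to `50`, the run keeps at most 150 rows in total, regardless of which term each row matched.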

@ -0,0 +1,194 @@
/**
* BC T-Net: public RSS aggregate of BC tech jobs.
*
* Default: https://www.bctechnology.com/rss/jobs/tnetjobs.xml
*
* Feeds may embed `<![CDATA[&]]>` inside `<link>` URLs; these are normalized to a plain `&` while items are parsed.
*/
import type {
ExtractorManifest,
ExtractorRunResult,
} from "@shared/types/extractors";
import type { CreateJobInput } from "@shared/types/jobs";
interface BcItem {
title?: string;
link?: string;
guid?: string;
description?: string;
pubDate?: string;
category?: string;
}
function xmlText(xml: string, tag: string): string | undefined {
const pattern = new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`);
const match = xml.match(pattern);
if (!match?.[1]) return undefined;
return (
match[1].replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, "$1").trim() || undefined
);
}
function normalizeFeedLink(raw: string): string {
return raw.replace(/<!\[CDATA\[&\]\]>/g, "&").trim();
}
function parseItems(xml: string): BcItem[] {
const items: BcItem[] = [];
const blocks = xml.match(/<item>([\s\S]*?)<\/item>/g) ?? [];
for (const raw of blocks) {
const block = raw.replace(/^<item>/, "").replace(/<\/item>$/, "");
const linkRaw = xmlText(block, "link");
items.push({
title: xmlText(block, "title"),
link: linkRaw ? normalizeFeedLink(linkRaw) : undefined,
guid: xmlText(block, "guid"),
description: xmlText(block, "description"),
pubDate: xmlText(block, "pubDate"),
category: xmlText(block, "category"),
});
}
return items;
}
function readUrls(raw: string | undefined): string[] {
if (!raw) return [];
try {
const parsed = JSON.parse(raw);
if (Array.isArray(parsed)) {
return parsed
.map((entry) => (typeof entry === "string" ? entry.trim() : ""))
.filter(Boolean);
}
} catch {
// fall through
}
return raw
.split(/[\n|]+/)
.map((entry) => entry.trim())
.filter(Boolean);
}
function decodeHtmlEntities(html: string): string {
return html
.replace(/&amp;/g, "&")
.replace(/&lt;/g, "<")
.replace(/&gt;/g, ">")
.replace(/&#x2f;/gi, "/")
.replace(/&#x26;/gi, "&");
}
function matchesTerm(item: BcItem, term: string): boolean {
const lower = term.toLowerCase();
const hay =
`${item.title ?? ""} ${item.description ?? ""} ${item.category ?? ""}`.toLowerCase();
return hay.includes(lower);
}
function mapJob(item: BcItem): CreateJobInput | null {
const jobUrl = item.link?.trim();
if (!jobUrl) return null;
const title = item.title ? decodeHtmlEntities(item.title) : "Unknown Title";
const employer = item.category?.trim() || "Unknown Employer";
return {
source: "bctenet",
sourceJobId: item.guid ?? jobUrl,
title,
employer,
jobUrl,
applicationLink: jobUrl,
location: "British Columbia, Canada",
datePosted: item.pubDate,
jobDescription: item.description
? decodeHtmlEntities(item.description)
: undefined,
};
}
export const manifest: ExtractorManifest = {
id: "bctenet",
displayName: "BC T-Net (RSS)",
providesSources: ["bctenet"],
async run(context): Promise<ExtractorRunResult> {
if (context.shouldCancel?.()) return { success: true, jobs: [] };
const defaults = ["https://www.bctechnology.com/rss/jobs/tnetjobs.xml"];
const configured = readUrls(context.settings.bctenetRssUrls);
const urls = configured.length > 0 ? configured : defaults;
const maxJobs = context.settings.bctenetMaxJobsPerTerm
? Number.parseInt(context.settings.bctenetMaxJobsPerTerm, 10)
: 400;
const cap = Number.isFinite(maxJobs)
? Math.min(Math.max(maxJobs, 1), 2000)
: 400;
const terms = context.searchTerms.length > 0 ? context.searchTerms : [];
const maxTotal = cap * Math.max(terms.length, 1);
const seen = new Set<string>();
const out: CreateJobInput[] = [];
try {
for (let i = 0; i < urls.length; i += 1) {
if (context.shouldCancel?.()) break;
const rssUrl = urls[i];
context.onProgress?.({
phase: "list",
termsProcessed: i,
termsTotal: urls.length,
currentUrl: rssUrl,
detail: `BC T-Net: fetching (${i + 1}/${urls.length})`,
});
const response = await fetch(rssUrl, {
headers: {
Accept: "application/rss+xml, application/xml, text/xml",
"User-Agent":
"Mozilla/5.0 (compatible; JobOps/1.0) BC T-Net RSS consumer",
},
});
if (!response.ok) {
throw new Error(`BC T-Net RSS failed: ${response.status}`);
}
const xml = await response.text();
const items = parseItems(xml);
for (const item of items) {
if (out.length >= maxTotal) break;
if (terms.length > 0 && !terms.some((t) => matchesTerm(item, t))) {
continue;
}
const mapped = mapJob(item);
if (!mapped) continue;
const key = mapped.sourceJobId || mapped.jobUrl;
if (seen.has(key)) continue;
seen.add(key);
out.push(mapped);
}
context.onProgress?.({
phase: "list",
termsProcessed: i + 1,
termsTotal: urls.length,
currentUrl: rssUrl,
jobPagesProcessed: out.length,
detail: `BC T-Net: ${items.length} items (${out.length} kept total)`,
});
}
return { success: true, jobs: out };
} catch (error) {
const message = error instanceof Error ? error.message : "Unknown error";
return { success: false, jobs: out, error: message };
}
},
};
export default manifest;

@ -0,0 +1,17 @@
{
"name": "bctenet-extractor",
"version": "0.0.1",
"type": "module",
"description": "BC T-Net public RSS job feed (British Columbia tech roles)",
"main": "manifest.ts",
"dependencies": {
"job-ops-shared": "^1.0.0"
},
"devDependencies": {
"@types/node": "^24.0.0",
"typescript": "~5.9.0"
},
"scripts": {
"check:types": "tsc --noEmit"
}
}

@ -0,0 +1,17 @@
{
"compilerOptions": {
"module": "ESNext",
"moduleResolution": "bundler",
"target": "ES2022",
"outDir": "dist",
"strict": true,
"noUnusedLocals": false,
"lib": ["ES2022", "DOM"],
"types": ["node"],
"baseUrl": ".",
"paths": {
"@shared/*": ["../../shared/src/*"]
}
},
"include": ["./manifest.ts", "./src/**/*"]
}

@ -8,8 +8,11 @@
* Referer header and `user_ip` / `user_agent` query params. Register your
* server's outbound IP(s) in the Careerjet publisher dashboard.
*
* Env: CAREERJET_AFFID (API key), CAREERJET_REFERER (job-search page URL),
* CAREERJET_USER_IP (must match an allowlisted IP), optional CAREERJET_USER_AGENT.
* Publisher signup: careerjet.com/partners, then register allowlisted server IP(s).
* Env: CAREERJET_AFFID (API key, sent as the Basic auth username), CAREERJET_REFERER
* (sent as the Referer header and as the API `url` query param: the page where
* results would appear), CAREERJET_USER_IP (public egress IP allowlisted in the
* dashboard; fraud-checked), optional CAREERJET_USER_AGENT. Missing user_ip /
* user_agent yields 403 per docs.
*/
import type {
@ -117,6 +120,7 @@ async function fetchPage(args: {
url.searchParams.set("page_size", String(args.pageSize));
url.searchParams.set("user_ip", args.userIp);
url.searchParams.set("user_agent", args.userAgent);
url.searchParams.set("url", args.referer);
const response = await fetch(url.toString(), {
headers: {
@ -213,11 +217,7 @@ export const manifest: ExtractorManifest = {
let collected = 0;
let page = 1;
let totalPages = Number.POSITIVE_INFINITY;
while (
collected < maxJobsPerTerm &&
page <= totalPages &&
page <= 10
) {
while (collected < maxJobsPerTerm && page <= totalPages && page <= 10) {
if (context.shouldCancel?.()) break;
const body = await fetchPage({
apiKey,

@ -0,0 +1,9 @@
# eluta-extractor
Pulls Canadian job postings from [Eluta.ca](https://www.eluta.ca) public **RSS** feeds (`https://www.eluta.ca/rss?location=...`). Listings are indexed from employer career sites.
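- Configure **`elutaRssLocations`**: a JSON array or a plain list of location strings passed to the `location` query parameter (e.g. `Toronto, ON`, `Vancouver, BC`). The plain-list form splits on commas, newlines, `;`, and `|`, so use the JSON array form for values that contain a comma (e.g. `["Toronto, ON", "Vancouver, BC"]`). RSS for very broad regions (e.g. a whole country) may return an empty feed; prefer metro/province strings.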
- Optional: `ELUTA_RSS_LOCATIONS` environment default (same format).
- **`elutaMaxJobsPerTerm`** caps how many RSS items are kept after pipeline search-term filtering (default 100).
This source is **Canada-only**: it is automatically skipped when your search geography is not Canada.
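The sketch below shows how a location string becomes a feed URL, and why a plain comma-separated list is risky for values like `Toronto, ON`. `elutaFeedUrl` and `splitPlainList` are hypothetical names; the separator set mirrors the one used in `manifest.ts`.

```typescript
const ELUTA_RSS = "https://www.eluta.ca/rss";

// One feed URL per location string.
function elutaFeedUrl(location: string): string {
  return `${ELUTA_RSS}?location=${encodeURIComponent(location)}`;
}

// Plain-list form: commas, newlines, semicolons and pipes all separate entries.
function splitPlainList(raw: string): string[] {
  return raw
    .split(/[\n,;|]+/)
    .map((entry) => entry.trim())
    .filter(Boolean);
}
```

Because the comma is itself a separator, `splitPlainList("Toronto, ON")` produces two entries (`Toronto` and `ON`) instead of one; passing `["Toronto, ON"]` as a JSON array avoids the split.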

@ -0,0 +1,201 @@
/**
* Eluta.ca public RSS feeds (Canadian employer-direct listings).
*
* Example: https://www.eluta.ca/rss?location=Toronto%2C%20ON
*
* No auth. Multiple `elutaRssLocations` values each fetch a feed; results are
* merged and de-duplicated by guid/link.
*/
import type {
ExtractorManifest,
ExtractorRunResult,
} from "@shared/types/extractors";
import type { CreateJobInput } from "@shared/types/jobs";
const RSS_BASE = "https://www.eluta.ca/rss";
interface ElutaItem {
title?: string;
link?: string;
guid?: string;
description?: string;
pubDate?: string;
employer?: string;
location?: string;
}
function xmlText(xml: string, tag: string): string | undefined {
const pattern = new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`);
const match = xml.match(pattern);
if (!match?.[1]) return undefined;
return (
match[1].replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, "$1").trim() || undefined
);
}
function parseItems(xml: string): ElutaItem[] {
const items: ElutaItem[] = [];
const blocks = xml.match(/<item>([\s\S]*?)<\/item>/g) ?? [];
for (const raw of blocks) {
const block = raw.replace(/^<item>/, "").replace(/<\/item>$/, "");
items.push({
title: xmlText(block, "title"),
link: xmlText(block, "link"),
guid: xmlText(block, "guid"),
description: xmlText(block, "description"),
pubDate: xmlText(block, "pubDate"),
employer: xmlText(block, "employer"),
location: xmlText(block, "location"),
});
}
return items;
}
function readLocations(raw: string | undefined): string[] {
if (!raw) return [];
try {
const parsed = JSON.parse(raw);
if (Array.isArray(parsed)) {
return parsed
.map((entry) => (typeof entry === "string" ? entry.trim() : ""))
.filter(Boolean);
}
} catch {
// fall through
}
return raw
.split(/[\n,;|]+/)
.map((entry) => entry.trim())
.filter(Boolean);
}
function decodeHtmlEntities(html: string): string {
return html
.replace(/&amp;/g, "&")
.replace(/&lt;/g, "<")
.replace(/&gt;/g, ">");
}
function matchesTerm(item: ElutaItem, term: string): boolean {
const lower = term.toLowerCase();
if (item.title?.toLowerCase().includes(lower)) return true;
if (item.description?.toLowerCase().includes(lower)) return true;
if (item.employer?.toLowerCase().includes(lower)) return true;
if (item.location?.toLowerCase().includes(lower)) return true;
return false;
}
function mapJob(item: ElutaItem): CreateJobInput | null {
const jobUrl = item.link || item.guid;
if (!jobUrl) return null;
const title = item.title ? decodeHtmlEntities(item.title) : "Unknown Title";
const employer = item.employer?.trim() || "Unknown Employer";
const location = item.location?.trim() || "Canada";
return {
source: "eluta",
sourceJobId: item.guid ?? item.link,
title,
employer,
jobUrl,
applicationLink: jobUrl,
location,
datePosted: item.pubDate,
jobDescription: item.description
? decodeHtmlEntities(item.description)
: undefined,
};
}
export const manifest: ExtractorManifest = {
id: "eluta",
displayName: "Eluta",
providesSources: ["eluta"],
async run(context): Promise<ExtractorRunResult> {
if (context.shouldCancel?.()) return { success: true, jobs: [] };
const locations = readLocations(context.settings.elutaRssLocations);
if (locations.length === 0) {
return {
success: true,
jobs: [],
error:
'No Eluta RSS locations configured. Set ELUTA_RSS_LOCATIONS or elutaRssLocations to a JSON array (e.g. ["Toronto, ON","Vancouver, BC"]) or a list separated by commas, newlines, ";" or "|".',
};
}
const maxJobs = context.settings.elutaMaxJobsPerTerm
? Number.parseInt(context.settings.elutaMaxJobsPerTerm, 10)
: 100;
const cap = Number.isFinite(maxJobs)
? Math.min(Math.max(maxJobs, 1), 500)
: 100;
const terms = context.searchTerms.length > 0 ? context.searchTerms : [];
const maxTotal = cap * Math.max(terms.length, 1);
const seen = new Set<string>();
const out: CreateJobInput[] = [];
try {
for (let i = 0; i < locations.length; i += 1) {
if (context.shouldCancel?.()) break;
const loc = locations[i];
const rssUrl = `${RSS_BASE}?location=${encodeURIComponent(loc)}`;
context.onProgress?.({
phase: "list",
termsProcessed: i,
termsTotal: locations.length,
currentUrl: rssUrl,
detail: `Eluta: fetching RSS (${i + 1}/${locations.length}) — ${loc}`,
});
const response = await fetch(rssUrl, {
headers: {
Accept: "application/rss+xml, application/xml, text/xml",
"User-Agent": "JobOps/1.0 (+https://github.com) Eluta RSS consumer",
},
});
if (!response.ok) {
throw new Error(`Eluta RSS failed (${loc}): ${response.status}`);
}
const xml = await response.text();
const items = parseItems(xml);
for (const item of items) {
if (out.length >= maxTotal) break;
if (terms.length > 0 && !terms.some((t) => matchesTerm(item, t))) {
continue;
}
const mapped = mapJob(item);
if (!mapped) continue;
const key = mapped.sourceJobId || mapped.jobUrl;
if (seen.has(key)) continue;
seen.add(key);
out.push(mapped);
}
context.onProgress?.({
phase: "list",
termsProcessed: i + 1,
termsTotal: locations.length,
currentUrl: rssUrl,
jobPagesProcessed: out.length,
detail: `Eluta: ${loc}: ${items.length} items in feed (${out.length} matched total)`,
});
}
return { success: true, jobs: out };
} catch (error) {
const message = error instanceof Error ? error.message : "Unknown error";
return { success: false, jobs: out, error: message };
}
},
};
export default manifest;

@ -0,0 +1,17 @@
{
"name": "eluta-extractor",
"version": "0.0.1",
"type": "module",
"description": "Eluta.ca RSS feed extractor (Canadian employer-direct listings)",
"main": "manifest.ts",
"dependencies": {
"job-ops-shared": "^1.0.0"
},
"devDependencies": {
"@types/node": "^24.0.0",
"typescript": "~5.9.0"
},
"scripts": {
"check:types": "tsc --noEmit"
}
}

@ -0,0 +1,17 @@
{
"compilerOptions": {
"module": "ESNext",
"moduleResolution": "bundler",
"target": "ES2022",
"outDir": "dist",
"strict": true,
"noUnusedLocals": false,
"lib": ["ES2022", "DOM"],
"types": ["node"],
"baseUrl": ".",
"paths": {
"@shared/*": ["../../shared/src/*"]
}
},
"include": ["./manifest.ts", "./src/**/*"]
}

@ -0,0 +1,13 @@
# icims-extractor
Lightweight fetch of **anonymous iCIMS portal HTML search results** per configured tenant host.
Example host: `careers-appliedsystems.icims.com`
Controls:
- `icimsTenants` — newline/comma-separated hosts or JSON array of hosts
- `icimsMaxJobsPerTenant` — cap rows accepted per tenant host (default `250`)
- `icimsMaxPagesPerSearch` — max `pr=` pages per keyword query (default `10`)
This does **not** use the authenticated iCIMS Job Portal API.

@ -0,0 +1,233 @@
/**
* iCIMS tenant portal: anonymous HTML search (`/jobs/search`) pattern.
*
* Many tenants expose listings suitable for HTML extraction when loaded with
* `ss=1` + `in_iframe=1`. Job links typically follow `/jobs/{id}/{slug}/job`.
*/
import type {
ExtractorManifest,
ExtractorRunResult,
} from "@shared/types/extractors";
import type { CreateJobInput } from "@shared/types/jobs";
interface ParsedJobRow {
url: string;
title: string;
}
function parseHosts(raw: string | undefined): string[] {
if (!raw) return [];
try {
const parsed = JSON.parse(raw);
if (Array.isArray(parsed)) {
return parsed
.map((entry) => (typeof entry === "string" ? entry.trim() : ""))
.filter(Boolean);
}
} catch {
// fall through
}
return raw
.split(/[\n,]+/)
.map((entry) => entry.trim())
.filter(Boolean);
}
function normalizeHost(hostOrUrl: string): string {
const trimmed = hostOrUrl.trim();
if (!trimmed) return "";
try {
if (trimmed.includes("://")) {
const url = new URL(trimmed);
return url.host;
}
} catch {
return trimmed.replace(/^\/\//, "");
}
return trimmed.replace(/^\/\//, "");
}
function canonicalJobUrl(url: string): string {
try {
const parsed = new URL(url);
parsed.search = "";
return parsed.toString();
} catch {
return url.replace(/\?[^#]*/, "");
}
}
function extractRows(html: string): ParsedJobRow[] {
const out: ParsedJobRow[] = [];
const seen = new Set<string>();
const primary =
/<a[^>]*href="(https:\/\/[^"]+\/jobs\/\d+\/[^"]+\/job)(?:\?[^"]*)?"[^>]*title="\d+\s*-\s*([^"]+)"/gi;
for (;;) {
const match = primary.exec(html);
if (match === null) break;
const url = canonicalJobUrl(match[1]);
const title = match[2]?.trim();
if (!url || !title || seen.has(url)) continue;
seen.add(url);
out.push({ url, title });
}
const fallback =
/<a[^>]*href="(https:\/\/[^"]+\/jobs\/\d+\/([^"/]+)\/job)(?:\?[^"]*)?"[^>]*>/gi;
for (;;) {
const match = fallback.exec(html);
if (match === null) break;
const url = canonicalJobUrl(match[1]);
const slug = match[2];
if (!url || seen.has(url)) continue;
seen.add(url);
const title = slug
? decodeURIComponent(slug.replace(/\+/g, " "))
: "Unknown Title";
out.push({ url, title });
}
return out;
}
function matchesTerm(row: ParsedJobRow, term: string): boolean {
const lower = term.toLowerCase();
return row.title.toLowerCase().includes(lower);
}
function employerFromHost(host: string): string {
const prefix = host.replace(/^careers-/, "").replace(/^careers\./, "");
const base = prefix.replace(/\.icims\.com$/i, "");
return base.replace(/[-_.]/g, " ").trim() || host;
}
export const manifest: ExtractorManifest = {
id: "icims",
displayName: "iCIMS tenants (HTML)",
providesSources: ["icims"],
async run(context): Promise<ExtractorRunResult> {
if (context.shouldCancel?.()) return { success: true, jobs: [] };
const hosts = parseHosts(context.settings.icimsTenants)
.map(normalizeHost)
.filter(Boolean);
if (hosts.length === 0) {
return {
success: false,
jobs: [],
error: "No icimsTenants configured",
};
}
const maxPagesRaw = context.settings.icimsMaxPagesPerSearch;
const maxPages = maxPagesRaw ? Number.parseInt(maxPagesRaw, 10) : 10;
const pages = Number.isFinite(maxPages)
? Math.min(Math.max(maxPages, 1), 50)
: 10;
const maxPerTenantRaw = context.settings.icimsMaxJobsPerTenant;
const maxPerTenant = maxPerTenantRaw
? Number.parseInt(maxPerTenantRaw, 10)
: 250;
const tenantCap = Number.isFinite(maxPerTenant)
? Math.min(Math.max(maxPerTenant, 1), 2000)
: 250;
const terms = context.searchTerms.length > 0 ? context.searchTerms : [""];
const jobs: CreateJobInput[] = [];
const seenGlobal = new Set<string>();
try {
let tenantIndex = 0;
for (const rawHost of hosts) {
if (context.shouldCancel?.()) break;
const host = normalizeHost(rawHost);
tenantIndex += 1;
let tenantCount = 0;
context.onProgress?.({
phase: "list",
termsProcessed: tenantIndex - 1,
termsTotal: hosts.length,
currentUrl: host,
detail: `iCIMS tenant ${tenantIndex}/${hosts.length}: ${host}`,
});
for (const term of terms) {
if (tenantCount >= tenantCap) break;
for (let page = 1; page <= pages; page += 1) {
if (tenantCount >= tenantCap) break;
const query = new URLSearchParams({
ss: "1",
in_iframe: "1",
searchKeyword: term,
pr: String(page),
});
const searchUrl = `https://${host}/jobs/search?${query.toString()}`;
const response = await fetch(searchUrl, {
headers: {
Accept: "text/html",
"User-Agent":
"Mozilla/5.0 (compatible; JobOps/1.0) iCIMS portal reader",
},
});
if (!response.ok) {
throw new Error(
`iCIMS fetch failed (${host}): ${response.status}`,
);
}
const html = await response.text();
const rows = extractRows(html).filter((row) =>
term ? matchesTerm(row, term) : true,
);
if (rows.length === 0) break;
for (const row of rows) {
if (tenantCount >= tenantCap) break;
if (seenGlobal.has(row.url)) continue;
seenGlobal.add(row.url);
tenantCount += 1;
jobs.push({
source: "icims",
sourceJobId: row.url,
title: row.title,
employer: employerFromHost(host),
jobUrl: row.url,
applicationLink: row.url,
});
}
context.onProgress?.({
phase: "list",
termsProcessed: tenantIndex - 1,
termsTotal: hosts.length,
currentUrl: host,
jobPagesProcessed: jobs.length,
detail: `iCIMS ${host}: page ${page}, +${rows.length} rows`,
});
}
}
}
return { success: true, jobs };
} catch (error) {
const message = error instanceof Error ? error.message : "Unknown error";
return { success: false, jobs, error: message };
}
},
};
export default manifest;


@ -0,0 +1,17 @@
{
"name": "icims-extractor",
"version": "0.0.1",
"type": "module",
"description": "iCIMS tenant job portal HTML search (anonymous iframe-style listings)",
"main": "manifest.ts",
"dependencies": {
"job-ops-shared": "^1.0.0"
},
"devDependencies": {
"@types/node": "^24.0.0",
"typescript": "~5.9.0"
},
"scripts": {
"check:types": "tsc --noEmit"
}
}


@ -0,0 +1,17 @@
{
"compilerOptions": {
"module": "ESNext",
"moduleResolution": "bundler",
"target": "ES2022",
"outDir": "dist",
"strict": true,
"noUnusedLocals": false,
"lib": ["ES2022", "DOM"],
"types": ["node"],
"baseUrl": ".",
"paths": {
"@shared/*": ["../../shared/src/*"]
}
},
"include": ["./manifest.ts", "./src/**/*"]
}


@ -0,0 +1,10 @@
# qajobsboard-extractor
Loads QA-focused postings from [QAJobsBoard](https://www.qajobsboard.com) via the host's public JSON feed:
`GET https://qajobsboard.jobboardly.com/jobs.json`
RSS is also published at `jobs.rss`; this extractor uses JSON for structured fields.
- Caps matches via `qajobsboardMaxJobsPerTerm` (default `100`).
- Filters client-side by pipeline search terms against title, categories, and description HTML.
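
The client-side filter described above can be sketched roughly as follows. This is a simplified variant for illustration, not the manifest's exact code: the `SampleJob` shape, the helper names, and the sample rows are assumptions made up for this example.

```typescript
// Minimal shape assumed for this sketch; the real feed carries more fields.
interface SampleJob {
  title?: string;
  categories?: { name?: string }[];
  description?: { html?: string };
}

// Strip tags so matching runs against visible text rather than markup.
function visibleText(html: string): string {
  return html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
}

// True when the term appears in the title, a category name,
// or the description text (case-insensitive substring match).
function jobMatchesTerm(job: SampleJob, term: string): boolean {
  const needle = term.toLowerCase();
  const haystack = [
    job.title ?? "",
    ...(job.categories ?? []).map((c) => c.name ?? ""),
    visibleText(job.description?.html ?? ""),
  ]
    .join("\n")
    .toLowerCase();
  return haystack.includes(needle);
}

// Made-up sample rows, for illustration only.
const sampleJobs: SampleJob[] = [
  {
    title: "SDET",
    categories: [{ name: "Automation" }],
    description: { html: "<p>Playwright and CI pipelines</p>" },
  },
  { title: "Manual Tester", categories: [{ name: "Manual QA" }] },
];

const matched = sampleJobs.filter((job) => jobMatchesTerm(job, "automation"));
console.log(matched.map((job) => job.title));
```

Because the whole feed is fetched once and filtered locally, adding more search terms does not add more network requests.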


@ -0,0 +1,217 @@
/**
* QAJobsBoard (JobBoardly) public jobs listing JSON.
*
* https://qajobsboard.jobboardly.com/jobs.json
*/
import type {
ExtractorManifest,
ExtractorRunResult,
} from "@shared/types/extractors";
import type { CreateJobInput } from "@shared/types/jobs";
const JOBS_URL = "https://qajobsboard.jobboardly.com/jobs.json";
interface JobCategory {
name?: string;
}
interface SalaryBand {
schedule?: string;
minimum?: number | null;
maximum?: number | null;
}
interface DescriptionBlock {
html?: string;
}
interface QaJobBoardlyJob {
title?: string;
arrangement?: string;
location?: string;
location_limits?: string[];
published_at?: string;
application_link?: string;
description?: DescriptionBlock;
company?: { name?: string; logo?: string };
salary?: SalaryBand;
categories?: JobCategory[];
links?: { self?: string };
}
function asString(value: unknown): string | undefined {
if (typeof value !== "string") return undefined;
const trimmed = value.trim();
return trimmed ? trimmed : undefined;
}
function decodeHtmlEntities(value: string): string {
return value
.replace(/&amp;/g, "&")
.replace(/&lt;/g, "<")
.replace(/&gt;/g, ">")
.replace(/&quot;/g, '"')
.replace(/&#39;/g, "'")
.replace(/&nbsp;/g, " ");
}
function stripHtml(html: string): string {
const noTags = html.replace(/<[^>]+>/g, " ");
return decodeHtmlEntities(noTags).replace(/\s+/g, " ").trim();
}
function salaryLabel(raw: SalaryBand | undefined): string | undefined {
if (!raw) return undefined;
const schedule = raw.schedule ? `${raw.schedule}: ` : "";
if (
typeof raw.minimum === "number" &&
typeof raw.maximum === "number" &&
Number.isFinite(raw.minimum) &&
Number.isFinite(raw.maximum)
) {
return `${schedule}${raw.minimum}-${raw.maximum}`;
}
if (typeof raw.minimum === "number" && Number.isFinite(raw.minimum)) {
return `${schedule}${raw.minimum}+`;
}
if (typeof raw.maximum === "number" && Number.isFinite(raw.maximum)) {
return `${schedule}up to ${raw.maximum}`;
}
return schedule.trim() || undefined;
}
function locationLabel(job: QaJobBoardlyJob): string {
const limits = Array.isArray(job.location_limits)
? job.location_limits.filter(
(v): v is string => typeof v === "string" && v.trim().length > 0,
)
: [];
if (limits.length > 0) return limits.join(", ");
const loc = asString(job.location);
if (loc) return loc;
return "Unknown";
}
function matchesTerm(job: QaJobBoardlyJob, term: string): boolean {
const lower = term.toLowerCase();
if (job.title?.toLowerCase().includes(lower)) return true;
const cats = Array.isArray(job.categories)
? job.categories.map((c) => c.name?.toLowerCase() ?? "").join(" ")
: "";
if (cats.includes(lower)) return true;
const html = job.description?.html ?? "";
if (stripHtml(html).toLowerCase().includes(lower)) return true;
return false;
}
function mapJob(raw: QaJobBoardlyJob): CreateJobInput | null {
const jobUrl = asString(raw.links?.self);
if (!jobUrl) return null;
const employer =
asString(raw.company?.name)
?.replace(/^[\s-]+/, "")
.trim() || "Unknown Employer";
const applicationLink = asString(raw.application_link) ?? jobUrl;
const descHtml = raw.description?.html;
const jobDescription = descHtml ? stripHtml(descHtml) : undefined;
const salary = salaryLabel(raw.salary);
const cats = Array.isArray(raw.categories)
? raw.categories
.map((c) => c?.name?.trim())
.filter((v): v is string => Boolean(v))
.join(", ")
: undefined;
return {
source: "qajobsboard",
sourceJobId: jobUrl.split("/").pop(),
title: asString(raw.title) ?? "Unknown Title",
employer,
jobUrl,
applicationLink,
location: locationLabel(raw),
isRemote: asString(raw.location)?.toLowerCase() === "remote",
datePosted: asString(raw.published_at),
jobDescription,
jobType: asString(raw.arrangement),
salary,
disciplines: cats,
companyLogo: asString(raw.company?.logo),
};
}
export const manifest: ExtractorManifest = {
id: "qajobsboard",
displayName: "QAJobsBoard",
providesSources: ["qajobsboard"],
async run(context): Promise<ExtractorRunResult> {
if (context.shouldCancel?.()) return { success: true, jobs: [] };
const maxJobs = context.settings.qajobsboardMaxJobsPerTerm
? Number.parseInt(context.settings.qajobsboardMaxJobsPerTerm, 10)
: 100;
const cap = Number.isFinite(maxJobs)
? Math.min(Math.max(maxJobs, 1), 500)
: 100;
const terms = context.searchTerms.length > 0 ? context.searchTerms : [];
context.onProgress?.({
phase: "list",
termsProcessed: 0,
termsTotal: 1,
currentUrl: JOBS_URL,
detail: "QAJobsBoard: fetching jobs.json",
});
try {
const response = await fetch(JOBS_URL, {
headers: { Accept: "application/json", "User-Agent": "JobOps/1.0" },
});
if (!response.ok) {
throw new Error(
`QAJobsBoard request failed with status ${response.status}`,
);
}
const body = (await response.json()) as unknown;
const rows = Array.isArray(body) ? body : [];
const seen = new Set<string>();
const out: CreateJobInput[] = [];
for (const row of rows as QaJobBoardlyJob[]) {
if (out.length >= cap) break;
if (terms.length > 0 && !terms.some((t) => matchesTerm(row, t)))
continue;
const mapped = mapJob(row);
if (!mapped) continue;
const key = mapped.sourceJobId || mapped.jobUrl;
if (seen.has(key)) continue;
seen.add(key);
out.push(mapped);
}
context.onProgress?.({
phase: "list",
termsProcessed: 1,
termsTotal: 1,
currentUrl: JOBS_URL,
jobPagesProcessed: out.length,
detail: `QAJobsBoard: ${out.length} matched (${rows.length} total listings)`,
});
return { success: true, jobs: out };
} catch (error) {
const message = error instanceof Error ? error.message : "Unknown error";
return { success: false, jobs: [], error: message };
}
},
};
export default manifest;


@ -0,0 +1,17 @@
{
"name": "qajobsboard-extractor",
"version": "0.0.1",
"type": "module",
"description": "QAJobsBoard (JobBoardly) public jobs.json extractor",
"main": "manifest.ts",
"dependencies": {
"job-ops-shared": "^1.0.0"
},
"devDependencies": {
"@types/node": "^24.0.0",
"typescript": "~5.9.0"
},
"scripts": {
"check:types": "tsc --noEmit"
}
}


@ -0,0 +1,17 @@
{
"compilerOptions": {
"module": "ESNext",
"moduleResolution": "bundler",
"target": "ES2022",
"outDir": "dist",
"strict": true,
"noUnusedLocals": false,
"lib": ["ES2022", "DOM"],
"types": ["node"],
"baseUrl": ".",
"paths": {
"@shared/*": ["../../shared/src/*"]
}
},
"include": ["./manifest.ts", "./src/**/*"]
}


@ -0,0 +1,11 @@
# smartrecruiters-extractor
Fetches public job postings from the [SmartRecruiters Posting API](https://developers.smartrecruiters.com/reference/v1listpostings):
`GET https://api.smartrecruiters.com/v1/companies/{companyIdentifier}/postings`
No API key is required for public listings. Configure one **company identifier** per employer (the slug from `jobs.smartrecruiters.com/<identifier>/...` or `careers.smartrecruiters.com/<identifier>`).
- Set `smartrecruitersCompanies` (JSON array or comma/newline-separated identifiers), or `SMARTRECRUITERS_COMPANIES` in the environment.
- Optional: `smartrecruitersMaxJobsPerCompany` caps how many postings are pulled **per company** after search-term filtering (default 100).
- The manifest loads posting **details** for each match so `jobUrl` / `applicationLink` and HTML descriptions resolve correctly.
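
A minimal sketch of the configuration parsing and list request described above, under stated assumptions: the helper names `parseCompanyIdentifiers` and `postingsListUrl` are hypothetical, `ExampleCo` and `AcmeCorp` are placeholder identifiers rather than real tenants, and the query parameters simply mirror the ones the manifest in this commit sends.

```typescript
// Hypothetical helper: accepts a JSON array or a comma/newline-separated string.
function parseCompanyIdentifiers(raw: string | undefined): string[] {
  if (!raw) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    if (Array.isArray(parsed)) {
      return parsed
        .filter((entry): entry is string => typeof entry === "string")
        .map((entry) => entry.trim())
        .filter(Boolean);
    }
  } catch {
    // Not JSON; fall back to delimiter splitting below.
  }
  return raw
    .split(/[\n,]+/)
    .map((entry) => entry.trim())
    .filter(Boolean);
}

// Builds the public list endpoint for one company and paging offset.
function postingsListUrl(company: string, offset: number, limit = 100): string {
  const base = `https://api.smartrecruiters.com/v1/companies/${encodeURIComponent(company)}/postings`;
  return `${base}?destination=PUBLIC&limit=${limit}&offset=${offset}`;
}

// Placeholder identifiers, for illustration only.
const fromJson = parseCompanyIdentifiers('["ExampleCo", " AcmeCorp "]');
const fromList = parseCompanyIdentifiers("ExampleCo,\nAcmeCorp");
console.log(fromJson, fromList, postingsListUrl(fromJson[0], 100));
```

Trying JSON first and falling back to delimiter splitting lets the same setting accept either a pasted array or a plain list from an environment variable.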


@ -0,0 +1,287 @@
/**
* SmartRecruiters public Posting API (no auth for public boards).
*
* https://developers.smartrecruiters.com/reference/v1listpostings
* GET https://api.smartrecruiters.com/v1/companies/{companyIdentifier}/postings
* GET https://api.smartrecruiters.com/v1/companies/{companyIdentifier}/postings/{postingId}
*/
import type {
ExtractorManifest,
ExtractorRunResult,
} from "@shared/types/extractors";
import type { CreateJobInput } from "@shared/types/jobs";
const LIST_LIMIT = 100;
interface SrCompany {
identifier?: string;
name?: string;
}
interface SrLocation {
fullLocation?: string;
city?: string;
region?: string;
country?: string;
remote?: boolean;
hybrid?: boolean;
}
interface SrPostingSummary {
id?: string;
name?: string;
releasedDate?: string;
company?: SrCompany;
location?: SrLocation;
typeOfEmployment?: { label?: string };
experienceLevel?: { id?: string; label?: string };
}
interface SrListResponse {
content?: SrPostingSummary[];
totalFound?: number;
offset?: number;
limit?: number;
}
interface SrDetail extends SrPostingSummary {
postingUrl?: string;
applyUrl?: string;
jobAd?: {
sections?: Record<string, { title?: string; text?: string } | undefined>;
};
}
function asString(value: unknown): string | undefined {
if (typeof value !== "string") return undefined;
const trimmed = value.trim();
return trimmed ? trimmed : undefined;
}
function readCompanies(raw: string | undefined): string[] {
if (!raw) return [];
try {
const parsed = JSON.parse(raw);
if (Array.isArray(parsed)) {
return parsed
.map((entry) => (typeof entry === "string" ? entry.trim() : ""))
.filter(Boolean);
}
} catch {
// fall through
}
return raw
.split(/[\n,;|]+/)
.map((entry) => entry.trim())
.filter(Boolean);
}
function decodeHtmlEntities(value: string): string {
return value
.replace(/&amp;/g, "&")
.replace(/&lt;/g, "<")
.replace(/&gt;/g, ">")
.replace(/&quot;/g, '"')
.replace(/&#39;/g, "'")
.replace(/&nbsp;/g, " ");
}
function stripHtml(html: string): string {
const noTags = html.replace(/<[^>]+>/g, " ");
return decodeHtmlEntities(noTags).replace(/\s+/g, " ").trim();
}
function locationString(loc: SrLocation | undefined): string {
if (!loc) return "Unknown";
const full = asString(loc.fullLocation);
if (full) return full;
const parts = [loc.city, loc.region, loc.country]
.map((p) => asString(p))
.filter(Boolean) as string[];
return parts.length > 0 ? parts.join(", ") : "Unknown";
}
function extractDescription(detail: SrDetail): string | undefined {
const sections = detail.jobAd?.sections;
if (!sections || typeof sections !== "object") return undefined;
const chunks: string[] = [];
for (const block of Object.values(sections)) {
const text = block && typeof block.text === "string" ? block.text : "";
if (text.trim()) chunks.push(text);
}
if (chunks.length === 0) return undefined;
return stripHtml(chunks.join("\n\n"));
}
function matchesTerm(summary: SrPostingSummary, term: string): boolean {
const lower = term.toLowerCase();
if (summary.name?.toLowerCase().includes(lower)) return true;
if (locationString(summary.location).toLowerCase().includes(lower))
return true;
if (summary.company?.name?.toLowerCase().includes(lower)) return true;
return false;
}
async function fetchPostingsPage(
company: string,
offset: number,
): Promise<SrListResponse> {
const base = `https://api.smartrecruiters.com/v1/companies/${encodeURIComponent(company)}/postings`;
const url = `${base}?destination=PUBLIC&limit=${LIST_LIMIT}&offset=${offset}`;
const response = await fetch(url, {
headers: { Accept: "application/json" },
});
if (response.status === 404) {
return { content: [], totalFound: 0, offset: 0, limit: LIST_LIMIT };
}
if (!response.ok) {
throw new Error(
`SmartRecruiters list for "${company}" failed with status ${response.status}`,
);
}
return (await response.json()) as SrListResponse;
}
async function fetchPostingDetail(
company: string,
postingId: string,
): Promise<SrDetail | null> {
const url = `https://api.smartrecruiters.com/v1/companies/${encodeURIComponent(company)}/postings/${encodeURIComponent(postingId)}`;
const response = await fetch(url, {
headers: { Accept: "application/json" },
});
if (!response.ok) return null;
return (await response.json()) as SrDetail;
}
function mapDetailToJob(detail: SrDetail): CreateJobInput | null {
const postingId = asString(detail.id);
if (!postingId) return null;
const jobUrl = asString(detail.applyUrl) ?? asString(detail.postingUrl);
if (!jobUrl) return null;
const employer =
asString(detail.company?.name) ??
asString(detail.company?.identifier) ??
"Unknown Employer";
const jobType = asString(detail.typeOfEmployment?.label);
const jobLevel = asString(detail.experienceLevel?.label);
return {
source: "smartrecruiters",
sourceJobId: postingId,
title: asString(detail.name) ?? "Unknown Title",
employer,
jobUrl,
applicationLink: asString(detail.applyUrl) ?? jobUrl,
location: locationString(detail.location),
isRemote: detail.location?.remote === true,
datePosted: asString(detail.releasedDate),
jobDescription: extractDescription(detail),
jobType: jobType || undefined,
jobLevel: jobLevel || undefined,
};
}
export const manifest: ExtractorManifest = {
id: "smartrecruiters",
displayName: "SmartRecruiters (ATS)",
providesSources: ["smartrecruiters"],
async run(context): Promise<ExtractorRunResult> {
if (context.shouldCancel?.()) return { success: true, jobs: [] };
const companies = readCompanies(context.settings.smartrecruitersCompanies);
if (companies.length === 0) {
return {
success: true,
jobs: [],
error:
"No SmartRecruiters companies configured. Set SMARTRECRUITERS_COMPANIES or smartrecruitersCompanies (comma- or newline-separated company identifiers).",
};
}
const maxPerCompany = context.settings.smartrecruitersMaxJobsPerCompany
? Number.parseInt(context.settings.smartrecruitersMaxJobsPerCompany, 10)
: 100;
const cap = Number.isFinite(maxPerCompany)
? Math.min(Math.max(maxPerCompany, 1), 500)
: 100;
const terms = context.searchTerms.length > 0 ? context.searchTerms : [];
const seen = new Set<string>();
const out: CreateJobInput[] = [];
try {
for (let i = 0; i < companies.length; i += 1) {
if (context.shouldCancel?.()) break;
const company = companies[i];
context.onProgress?.({
phase: "list",
termsProcessed: i,
termsTotal: companies.length,
currentUrl: company,
detail: `SmartRecruiters: ${company} (${i + 1}/${companies.length})`,
});
const matchedSummaries: SrPostingSummary[] = [];
let offset = 0;
let totalFound = Number.POSITIVE_INFINITY;
while (matchedSummaries.length < cap && offset < totalFound) {
if (context.shouldCancel?.()) break;
const page = await fetchPostingsPage(company, offset);
const batch = Array.isArray(page.content) ? page.content : [];
totalFound =
typeof page.totalFound === "number" ? page.totalFound : offset;
if (batch.length === 0) break;
for (const row of batch) {
if (matchedSummaries.length >= cap) break;
if (terms.length > 0 && !terms.some((t) => matchesTerm(row, t))) {
continue;
}
matchedSummaries.push(row);
}
offset += batch.length;
if (offset >= totalFound) break;
}
let added = 0;
for (const summary of matchedSummaries) {
if (context.shouldCancel?.()) break;
const id = asString(summary.id);
if (!id) continue;
const detail = await fetchPostingDetail(company, id);
if (!detail) continue;
const mapped = mapDetailToJob(detail);
if (!mapped) continue;
const key = mapped.sourceJobId || mapped.jobUrl;
if (seen.has(key)) continue;
seen.add(key);
out.push(mapped);
added += 1;
}
context.onProgress?.({
phase: "list",
termsProcessed: i + 1,
termsTotal: companies.length,
currentUrl: company,
jobPagesProcessed: out.length,
detail: `SmartRecruiters: ${company} - ${added} jobs (${out.length} total)`,
});
}
} catch (error) {
const message = error instanceof Error ? error.message : "Unknown error";
return { success: false, jobs: out, error: message };
}
return { success: true, jobs: out };
},
};
export default manifest;


@ -0,0 +1,17 @@
{
"name": "smartrecruiters-extractor",
"version": "0.0.1",
"type": "module",
"description": "SmartRecruiters public Posting API extractor",
"main": "manifest.ts",
"dependencies": {
"job-ops-shared": "^1.0.0"
},
"devDependencies": {
"@types/node": "^24.0.0",
"typescript": "~5.9.0"
},
"scripts": {
"check:types": "tsc --noEmit"
}
}


@ -0,0 +1,17 @@
{
"compilerOptions": {
"module": "ESNext",
"moduleResolution": "bundler",
"target": "ES2022",
"outDir": "dist",
"strict": true,
"noUnusedLocals": false,
"lib": ["ES2022", "DOM"],
"types": ["node"],
"baseUrl": ".",
"paths": {
"@shared/*": ["../../shared/src/*"]
}
},
"include": ["./manifest.ts", "./src/**/*"]
}


@ -503,7 +503,9 @@ async function fetchApi<T>(
): Promise<T> {
const method = (options?.method || "GET").toUpperCase();
const activeCreds = getActiveBasicAuthCredentials();
let authHeader = activeCreds ? encodeBasicAuthHeaderValue(activeCreds) : undefined;
let authHeader = activeCreds
? encodeBasicAuthHeaderValue(activeCreds)
: undefined;
let authAttempt = 0;
let usernameHint = activeCreds?.username;


@ -210,6 +210,11 @@ export const getEnabledSources = (
const hasGreenhouseCompanies =
(settings.greenhouseCompanies?.value ?? []).length > 0;
const hasWorkdayTenants = (settings.workdayTenants?.value ?? []).length > 0;
const hasSmartrecruitersCompanies =
(settings.smartrecruitersCompanies?.value ?? []).length > 0;
const hasElutaRssLocations =
(settings.elutaRssLocations?.value ?? []).length > 0;
const hasIcimsTenants = (settings.icimsTenants?.value ?? []).length > 0;
for (const source of orderedSources) {
if (source === "gradcracker") {
@ -272,6 +277,22 @@ export const getEnabledSources = (
if (hasWorkdayTenants) enabled.push(source);
continue;
}
if (source === "smartrecruiters") {
if (hasSmartrecruitersCompanies) enabled.push(source);
continue;
}
if (source === "icims") {
if (hasIcimsTenants) enabled.push(source);
continue;
}
if (source === "bctenet") {
enabled.push(source);
continue;
}
if (source === "eluta") {
if (hasElutaRssLocations) enabled.push(source);
continue;
}
if (
source === "indeed" ||
source === "linkedin" ||
@ -286,7 +307,9 @@ export const getEnabledSources = (
source === "arbeitnow" ||
source === "himalayas" ||
source === "weworkremotely" ||
source === "fourdayweek"
source === "fourdayweek" ||
source === "qajobsboard" ||
source === "arcdev"
) {
enabled.push(source);
}


@ -10,9 +10,9 @@ import { asyncRoute, fail, ok } from "@infra/http";
import { logger } from "@infra/logger";
import { getRequestId } from "@infra/request-context";
import { isDemoMode, sendDemoBlocked } from "@server/config/demo";
import { getJobOwnerProfileId } from "@server/infra/request-context";
import { DEFAULT_JOB_OWNER_PROFILE_ID } from "@server/infra/job-owner-context";
import { parseBasicAuthUsername } from "@server/infra/basic-auth-credentials";
import { DEFAULT_JOB_OWNER_PROFILE_ID } from "@server/infra/job-owner-context";
import { getJobOwnerProfileId } from "@server/infra/request-context";
import * as profilesRepo from "@server/repositories/profiles";
import { getSetting } from "@server/repositories/settings";
import { setBackupSettings } from "@server/services/backup/index";
@ -30,11 +30,11 @@ import {
} from "@server/services/rxresume";
import { getEffectiveSettings } from "@server/services/settings";
import { applySettingsUpdates } from "@server/services/settings-update";
import { jobSearchProfileSchema } from "@shared/settings-registry";
import {
type UpdateSettingsInput,
updateSettingsSchema,
} from "@shared/settings-schema";
import { jobSearchProfileSchema } from "@shared/settings-registry";
import { type Request, type Response, Router } from "express";
export const settingsRouter = Router();
@ -251,9 +251,12 @@ settingsRouter.patch(
) {
const parsed = jobSearchProfileSchema.safeParse(input.jobSearchProfile);
if (parsed.success) {
const username = parseBasicAuthUsername(req.headers.authorization)?.trim();
const dataWithOwner =
username ? { ...parsed.data, basicAuthUser: username } : parsed.data;
const username = parseBasicAuthUsername(
req.headers.authorization,
)?.trim();
const dataWithOwner = username
? { ...parsed.data, basicAuthUser: username }
: parsed.data;
await profilesRepo.updateProfile(ownerId, { data: dataWithOwner });
}
}


@ -272,6 +272,12 @@ export const DEMO_SOURCE_BASE_URLS: Record<JobSource, string> = {
lever: "https://jobs.lever.co",
greenhouse: "https://boards.greenhouse.io",
workday: "https://workday.com",
smartrecruiters: "https://www.smartrecruiters.com",
icims: "https://www.icims.com",
bctenet: "https://www.bctechnology.com",
eluta: "https://www.eluta.ca",
qajobsboard: "https://www.qajobsboard.com",
arcdev: "https://arc.dev",
manual: "https://example.com",
};


@ -12,13 +12,13 @@ import {
normalizeCountryKey,
} from "@shared/location-support.js";
import { resolveBlockedCompanyKeywordsFromStoredString } from "@shared/resolve-blocked-company-keywords.js";
import { jobSearchProfileSchema } from "@shared/settings-registry.js";
import {
inferCountryKeyFromSearchGeography,
matchesRequestedCity,
resolveSearchCities,
shouldApplyStrictCityFilter,
} from "@shared/search-cities.js";
import { jobSearchProfileSchema } from "@shared/settings-registry.js";
import type { CreateJobInput, PipelineConfig } from "@shared/types";
import { type CrawlSource, progressHelpers, updateProgress } from "../progress";
@ -106,19 +106,14 @@ const ROLE_TOKEN_STOPWORDS = new Set([
]);
function normalizeText(value: string | null | undefined): string {
return (value ?? "")
.toLowerCase()
.replace(/\s+/g, " ")
.trim();
return (value ?? "").toLowerCase().replace(/\s+/g, " ").trim();
}
function buildRoleMatchers(phrases: string[]): {
phraseMatchers: string[];
tokenMatchers: string[];
} {
const phraseMatchers = phrases
.map((p) => normalizeText(p))
.filter(Boolean);
const phraseMatchers = phrases.map((p) => normalizeText(p)).filter(Boolean);
const tokenSet = new Set<string>();
for (const phrase of phraseMatchers) {
@ -164,7 +159,10 @@ function filterJobsBySearchProfile(args: {
const body = normalizeText(job.jobDescription);
const haystack = `${title}\n${body}`;
if (dealBreakersLower.length > 0 && matchesAny(haystack, dealBreakersLower)) {
if (
dealBreakersLower.length > 0 &&
matchesAny(haystack, dealBreakersLower)
) {
return false;
}

151
package-lock.json generated

@ -14,6 +14,7 @@
"devDependencies": {
"@types/node": "^25.2.3",
"dotenv": "^17.2.3",
"job-ops-shared": "workspace:*",
"knip": "^5.83.1",
"tsx": "^4.19.2",
"typescript": "^5.9.3"
@ -125,6 +126,27 @@
"undici-types": "~7.16.0"
}
},
"extractors/arcdev": {
"name": "arcdev-extractor",
"version": "0.0.1",
"dependencies": {
"job-ops-shared": "^1.0.0"
},
"devDependencies": {
"@types/node": "^24.0.0",
"typescript": "~5.9.0"
}
},
"extractors/arcdev/node_modules/@types/node": {
"version": "24.12.4",
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.12.4.tgz",
"integrity": "sha512-GUUEShf+PBCGW2KaXwcIt3Yk+e3pkKwWKb9GSyM9WQVE+ep2jzmHdGsHzu4wgcZy5fN9FBdVzjpBQsYlpfpgLA==",
"dev": true,
"license": "MIT",
"dependencies": {
"undici-types": "~7.16.0"
}
},
"extractors/ashby": {
"name": "ashby-extractor",
"version": "0.0.1",
@ -146,6 +168,27 @@
"undici-types": "~7.16.0"
}
},
"extractors/bctenet": {
"name": "bctenet-extractor",
"version": "0.0.1",
"dependencies": {
"job-ops-shared": "^1.0.0"
},
"devDependencies": {
"@types/node": "^24.0.0",
"typescript": "~5.9.0"
}
},
"extractors/bctenet/node_modules/@types/node": {
"version": "24.12.4",
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.12.4.tgz",
"integrity": "sha512-GUUEShf+PBCGW2KaXwcIt3Yk+e3pkKwWKb9GSyM9WQVE+ep2jzmHdGsHzu4wgcZy5fN9FBdVzjpBQsYlpfpgLA==",
"dev": true,
"license": "MIT",
"dependencies": {
"undici-types": "~7.16.0"
}
},
"extractors/careerjet": {
"name": "careerjet-extractor",
"version": "0.0.1",
@ -167,6 +210,27 @@
"undici-types": "~7.16.0"
}
},
"extractors/eluta": {
"name": "eluta-extractor",
"version": "0.0.1",
"dependencies": {
"job-ops-shared": "^1.0.0"
},
"devDependencies": {
"@types/node": "^24.0.0",
"typescript": "~5.9.0"
}
},
"extractors/eluta/node_modules/@types/node": {
"version": "24.12.4",
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.12.4.tgz",
"integrity": "sha512-GUUEShf+PBCGW2KaXwcIt3Yk+e3pkKwWKb9GSyM9WQVE+ep2jzmHdGsHzu4wgcZy5fN9FBdVzjpBQsYlpfpgLA==",
"dev": true,
"license": "MIT",
"dependencies": {
"undici-types": "~7.16.0"
}
},
"extractors/fourdayweek": {
"name": "fourdayweek-extractor",
"version": "0.0.1",
@ -308,6 +372,27 @@
"undici-types": "~7.16.0"
}
},
"extractors/icims": {
"name": "icims-extractor",
"version": "0.0.1",
"dependencies": {
"job-ops-shared": "^1.0.0"
},
"devDependencies": {
"@types/node": "^24.0.0",
"typescript": "~5.9.0"
}
},
"extractors/icims/node_modules/@types/node": {
"version": "24.12.4",
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.12.4.tgz",
"integrity": "sha512-GUUEShf+PBCGW2KaXwcIt3Yk+e3pkKwWKb9GSyM9WQVE+ep2jzmHdGsHzu4wgcZy5fN9FBdVzjpBQsYlpfpgLA==",
"dev": true,
"license": "MIT",
"dependencies": {
"undici-types": "~7.16.0"
}
},
"extractors/jobicy": {
"name": "jobicy-extractor",
"version": "0.0.1",
@ -371,6 +456,27 @@
"undici-types": "~7.16.0"
}
},
"extractors/qajobsboard": {
"name": "qajobsboard-extractor",
"version": "0.0.1",
"dependencies": {
"job-ops-shared": "^1.0.0"
},
"devDependencies": {
"@types/node": "^24.0.0",
"typescript": "~5.9.0"
}
},
"extractors/qajobsboard/node_modules/@types/node": {
"version": "24.12.4",
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.12.4.tgz",
"integrity": "sha512-GUUEShf+PBCGW2KaXwcIt3Yk+e3pkKwWKb9GSyM9WQVE+ep2jzmHdGsHzu4wgcZy5fN9FBdVzjpBQsYlpfpgLA==",
"dev": true,
"license": "MIT",
"dependencies": {
"undici-types": "~7.16.0"
}
},
"extractors/reed": {
"name": "reed-extractor",
"version": "0.0.1",
@ -434,6 +540,27 @@
"undici-types": "~7.16.0"
}
},
"extractors/smartrecruiters": {
"name": "smartrecruiters-extractor",
"version": "0.0.1",
"dependencies": {
"job-ops-shared": "^1.0.0"
},
"devDependencies": {
"@types/node": "^24.0.0",
"typescript": "~5.9.0"
}
},
"extractors/smartrecruiters/node_modules/@types/node": {
"version": "24.12.4",
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.12.4.tgz",
"integrity": "sha512-GUUEShf+PBCGW2KaXwcIt3Yk+e3pkKwWKb9GSyM9WQVE+ep2jzmHdGsHzu4wgcZy5fN9FBdVzjpBQsYlpfpgLA==",
"dev": true,
"license": "MIT",
"dependencies": {
"undici-types": "~7.16.0"
}
},
"extractors/startupjobs": {
"name": "startupjobs-extractor",
"version": "0.0.1",
@ -9725,6 +9852,10 @@
"resolved": "extractors/arbeitnow",
"link": true
},
"node_modules/arcdev-extractor": {
"resolved": "extractors/arcdev",
"link": true
},
"node_modules/arg": {
"version": "5.0.2",
"resolved": "https://registry.npmjs.org/arg/-/arg-5.0.2.tgz",
@ -10008,6 +10139,10 @@
"integrity": "sha512-x+VAiMRL6UPkx+kudNvxTl6hB2XNNCG2r+7wixVfIYwu/2HKRXimwQyaumLjMveWvT2Hkd/cAJw+QBMfJ/EKVw==",
"license": "MIT"
},
"node_modules/bctenet-extractor": {
"resolved": "extractors/bctenet",
"link": true
},
"node_modules/better-sqlite3": {
"version": "12.6.2",
"resolved": "https://registry.npmjs.org/better-sqlite3/-/better-sqlite3-12.6.2.tgz",
@ -12298,6 +12433,10 @@
"integrity": "sha512-3vifjt1HgrGW/h76UEeny+adYApveS9dH2h3p57JYzBSXJIKUJAvtmIytDKjcSCt9xHfrNCFJ7gts6vkhuq++w==",
"license": "ISC"
},
"node_modules/eluta-extractor": {
"resolved": "extractors/eluta",
"link": true
},
"node_modules/emoji-regex": {
"version": "8.0.0",
"resolved": "https://registry.npmjs.org/emoji-regex/-/emoji-regex-8.0.0.tgz",
@ -14774,6 +14913,10 @@
"node": ">=10.18"
}
},
"node_modules/icims-extractor": {
"resolved": "extractors/icims",
"link": true
},
"node_modules/iconv-lite": {
"version": "0.7.2",
"resolved": "https://registry.npmjs.org/iconv-lite/-/iconv-lite-0.7.2.tgz",
@ -21784,6 +21927,10 @@
"node": ">=16.0.0"
}
},
"node_modules/qajobsboard-extractor": {
"resolved": "extractors/qajobsboard",
"link": true
},
"node_modules/qs": {
"version": "6.14.2",
"resolved": "https://registry.npmjs.org/qs/-/qs-6.14.2.tgz",
@ -23500,6 +23647,10 @@
"npm": ">= 3.0.0"
}
},
"node_modules/smartrecruiters-extractor": {
"resolved": "extractors/smartrecruiters",
"link": true
},
"node_modules/smol-toml": {
"version": "1.6.0",
"resolved": "https://registry.npmjs.org/smol-toml/-/smol-toml-1.6.0.tgz",


@ -26,6 +26,7 @@
"devDependencies": {
"dotenv": "^17.2.3",
"@types/node": "^25.2.3",
"job-ops-shared": "workspace:*",
"knip": "^5.83.1",
"tsx": "^4.19.2",
"typescript": "^5.9.3"


@ -1,8 +1,14 @@
/**
* Tiny smoke-test for new extractors: imports each manifest, runs it with a
* minimal context, and prints the count of mapped jobs + a few samples.
* Smoke-test helper for extractor manifests: imports each manifest, runs it with a
* minimal context, and prints mapped job counts + a sample row.
*
* Run from repo root: npx tsx scripts/smoke-extractors.ts [comma,separated,ids]
* Run from repo root:
* npx tsx scripts/smoke-extractors.ts
* npx tsx scripts/smoke-extractors.ts arcdev,icims
* npx tsx scripts/smoke-extractors.ts indeed # alias for `jobspy` (same manifest)
*
* Keep `ALL_TARGETS` aligned with every shipped manifest under each
* `extractors/<name>/` package (`manifest.ts` or `src/manifest.ts`).
*
* Loads repo-root `.env` so keyed extractors match orchestrator behavior (plain
* `tsx` does not read `.env` automatically).
@ -14,7 +20,7 @@ import { config as loadEnv } from "dotenv";
import type {
ExtractorManifest,
ExtractorRuntimeContext,
} from "../shared/src/types/extractors";
} from "job-ops-shared/types/extractors";
const repoRoot = path.resolve(
path.dirname(fileURLToPath(import.meta.url)),
@ -22,54 +28,56 @@ const repoRoot = path.resolve(
);
loadEnv({ path: path.join(repoRoot, ".env") });
/** Left column width for log alignment (longest pipeline source id today). */
const ID_COL = 15;
/** JobSpy serves Indeed / LinkedIn / Glassdoor; CLI filter accepts those ids as aliases. */
const JOBSPY_SITE_IDS = ["indeed", "linkedin", "glassdoor"] as const;
function expandSmokeFilter(ids: Set<string>): Set<string> {
const next = new Set(ids);
for (const site of JOBSPY_SITE_IDS) {
if (next.has(site)) {
next.add("jobspy");
break;
}
}
return next;
}
interface Target {
id: string;
importPath: string;
needs?: string[]; // env vars required to run; skipped if missing
settings?: Record<string, string>;
/** When set, replaces the default smoke search terms (use [] for sources that filter client-side). */
searchTerms?: string[];
/** Geography passed as `selectedCountry` (must match what each extractor expects). */
selectedCountry?: string;
}
const ALL_TARGETS: Target[] = [
{
id: "jobicy",
importPath: "../extractors/jobicy/manifest",
settings: { jobicyMaxJobsPerTerm: "10" },
},
{
id: "themuse",
importPath: "../extractors/themuse/manifest",
settings: { themuseMaxJobsPerTerm: "10" },
},
{
id: "usajobs",
importPath: "../extractors/usajobs/manifest",
needs: ["USAJOBS_API_KEY", "USAJOBS_USER_AGENT"],
settings: { usajobsMaxJobsPerTerm: "10" },
},
{
id: "jooble",
importPath: "../extractors/jooble/manifest",
needs: ["JOOBLE_API_KEY"],
settings: { joobleMaxJobsPerTerm: "10" },
},
{
id: "careerjet",
importPath: "../extractors/careerjet/manifest",
needs: ["CAREERJET_AFFID", "CAREERJET_REFERER", "CAREERJET_USER_IP"],
settings: { careerjetMaxJobsPerTerm: "10" },
},
{
id: "reed",
importPath: "../extractors/reed/manifest",
needs: ["REED_API_KEY"],
settings: { reedMaxJobsPerTerm: "10" },
},
{
id: "lever",
importPath: "../extractors/lever/manifest",
id: "adzuna",
importPath: "../extractors/adzuna/manifest",
needs: ["ADZUNA_APP_ID", "ADZUNA_APP_KEY"],
selectedCountry: "United States",
settings: {
// Known active public Lever board used purely as a connectivity check.
leverCompanies: JSON.stringify(["palantir", "netflix"]),
adzunaMaxJobsPerTerm: "10",
searchCities: "United States",
},
},
{
id: "arbeitnow",
importPath: "../extractors/arbeitnow/manifest",
settings: { arbeitnowMaxJobsPerTerm: "10" },
},
{
id: "arcdev",
importPath: "../extractors/arcdev/manifest",
settings: {
arcRemoteJobsPaths: JSON.stringify(["/remote-jobs/playwright"]),
arcMaxJobsPerPath: "20",
},
},
{
@ -79,6 +87,40 @@ const ALL_TARGETS: Target[] = [
ashbyCompanies: JSON.stringify(["ramp", "linear"]),
},
},
{
id: "bctenet",
importPath: "../extractors/bctenet/manifest",
selectedCountry: "Canada",
settings: {
bctenetMaxJobsPerTerm: "25",
},
},
{
id: "careerjet",
importPath: "../extractors/careerjet/manifest",
needs: ["CAREERJET_AFFID", "CAREERJET_REFERER", "CAREERJET_USER_IP"],
settings: { careerjetMaxJobsPerTerm: "10", searchCities: "United States" },
},
{
id: "eluta",
importPath: "../extractors/eluta/manifest",
selectedCountry: "Canada",
settings: {
elutaRssLocations: JSON.stringify(["Toronto, ON"]),
elutaMaxJobsPerTerm: "15",
},
},
{
id: "fourdayweek",
importPath: "../extractors/fourdayweek/manifest",
settings: { fourdayweekMaxJobsPerTerm: "10" },
},
{
id: "gradcracker",
importPath: "../extractors/gradcracker/manifest",
selectedCountry: "United Kingdom",
settings: { gradcrackerMaxJobsPerTerm: "10" },
},
{
id: "greenhouse",
importPath: "../extractors/greenhouse/manifest",
@ -87,13 +129,70 @@ const ALL_TARGETS: Target[] = [
},
},
{
id: "workday",
importPath: "../extractors/workday/manifest",
settings: {
workdayTenants: JSON.stringify([
"https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite",
]),
id: "himalayas",
importPath: "../extractors/himalayas/manifest",
settings: { himalayasMaxJobsPerTerm: "10" },
},
{
id: "hiringcafe",
importPath: "../extractors/hiringcafe/manifest",
selectedCountry: "United Kingdom",
settings: {
searchCities: "UK",
jobspyResultsWanted: "10",
workplaceTypes: JSON.stringify(["remote", "hybrid", "onsite"]),
},
},
{
id: "icims",
importPath: "../extractors/icims/manifest",
searchTerms: [],
settings: {
icimsTenants: JSON.stringify(["careers-appliedsystems.icims.com"]),
icimsMaxJobsPerTenant: "15",
icimsMaxPagesPerSearch: "2",
},
},
{
id: "jobicy",
importPath: "../extractors/jobicy/manifest",
settings: { jobicyMaxJobsPerTerm: "10" },
},
{
id: "jobspy",
importPath: "../extractors/jobspy/manifest",
selectedCountry: "United Kingdom",
settings: {
searchCities: "UK",
jobspyCountryIndeed: "UK",
jobspyResultsWanted: "5",
workplaceTypes: JSON.stringify(["remote", "hybrid", "onsite"]),
},
},
{
id: "jooble",
importPath: "../extractors/jooble/manifest",
needs: ["JOOBLE_API_KEY"],
settings: { joobleMaxJobsPerTerm: "10", searchCities: "United States" },
},
{
id: "lever",
importPath: "../extractors/lever/manifest",
settings: {
leverCompanies: JSON.stringify(["palantir", "netflix"]),
},
},
{
id: "qajobsboard",
importPath: "../extractors/qajobsboard/manifest",
settings: { qajobsboardMaxJobsPerTerm: "25" },
},
{
id: "reed",
importPath: "../extractors/reed/manifest",
needs: ["REED_API_KEY"],
selectedCountry: "United Kingdom",
settings: { reedMaxJobsPerTerm: "10" },
},
{
id: "remoteok",
@ -106,14 +205,40 @@ const ALL_TARGETS: Target[] = [
settings: { remotiveMaxJobsPerTerm: "10" },
},
{
id: "arbeitnow",
importPath: "../extractors/arbeitnow/manifest",
settings: { arbeitnowMaxJobsPerTerm: "10" },
id: "smartrecruiters",
importPath: "../extractors/smartrecruiters/manifest",
settings: {
smartrecruitersCompanies: JSON.stringify(["smartrecruiters"]),
smartrecruitersMaxJobsPerCompany: "5",
},
},
{
id: "himalayas",
importPath: "../extractors/himalayas/manifest",
settings: { himalayasMaxJobsPerTerm: "10" },
id: "startupjobs",
importPath: "../extractors/startupjobs/src/manifest",
selectedCountry: "United Kingdom",
settings: {
searchCities: "UK",
startupjobsMaxJobsPerTerm: "10",
workplaceTypes: JSON.stringify(["remote", "hybrid", "onsite"]),
},
},
{
id: "themuse",
importPath: "../extractors/themuse/manifest",
settings: { themuseMaxJobsPerTerm: "10" },
},
{
id: "ukvisajobs",
importPath: "../extractors/ukvisajobs/manifest",
needs: ["UKVISAJOBS_EMAIL", "UKVISAJOBS_PASSWORD"],
selectedCountry: "United Kingdom",
settings: { ukvisajobsMaxJobs: "10" },
},
{
id: "usajobs",
importPath: "../extractors/usajobs/manifest",
needs: ["USAJOBS_API_KEY", "USAJOBS_USER_AGENT"],
settings: { usajobsMaxJobsPerTerm: "10" },
},
{
id: "weworkremotely",
@ -121,22 +246,29 @@ const ALL_TARGETS: Target[] = [
settings: { weworkremotelyMaxJobsPerTerm: "10" },
},
{
id: "fourdayweek",
importPath: "../extractors/fourdayweek/manifest",
settings: { fourdayweekMaxJobsPerTerm: "10" },
id: "workday",
importPath: "../extractors/workday/manifest",
settings: {
workdayTenants: JSON.stringify([
"https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite",
]),
},
},
];
function buildContext(
source: string,
settings: Record<string, string>,
): ExtractorRuntimeContext {
function buildContext(args: {
source: string;
settings: Record<string, string>;
searchTerms?: string[];
selectedCountry?: string;
}): ExtractorRuntimeContext {
return {
source,
selectedSources: [source],
settings,
searchTerms: ["software engineer"],
selectedCountry: "United States",
source: args.source,
selectedSources: [args.source],
settings: args.settings,
searchTerms:
args.searchTerms !== undefined ? args.searchTerms : ["software engineer"],
selectedCountry: args.selectedCountry ?? "United States",
getExistingJobUrls: async () => [],
shouldCancel: () => false,
onProgress: () => {},
@ -151,7 +283,7 @@ async function runOne(target: Target): Promise<void> {
const missing = (target.needs ?? []).filter((k) => !process.env[k]);
if (missing.length > 0) {
console.log(
`${pad(target.id, 12)} SKIP missing env: ${missing.join(", ")}`,
`${pad(target.id, ID_COL)} SKIP missing env: ${missing.join(", ")}`,
);
return;
}
@ -161,20 +293,25 @@ async function runOne(target: Target): Promise<void> {
mod = await import(target.importPath);
} catch (err) {
console.log(
`${pad(target.id, 12)} FAIL import error: ${(err as Error).message}`,
`${pad(target.id, ID_COL)} FAIL import error: ${(err as Error).message}`,
);
return;
}
const manifest = mod.manifest ?? mod.default;
if (!manifest) {
console.log(`${pad(target.id, 12)} FAIL manifest export missing`);
console.log(`${pad(target.id, ID_COL)} FAIL manifest export missing`);
return;
}
const started = Date.now();
try {
const ctx = buildContext(target.id, target.settings ?? {});
const ctx = buildContext({
source: target.id,
settings: target.settings ?? {},
searchTerms: target.searchTerms,
selectedCountry: target.selectedCountry,
});
const result = await manifest.run(ctx);
const ms = Date.now() - started;
const status = result.success ? "OK " : "ERR ";
@ -183,12 +320,12 @@ async function runOne(target: Target): Promise<void> {
? ` | first: "${sample.title}" @ ${sample.employer}`
: "";
console.log(
`${pad(target.id, 12)} ${status} jobs=${result.jobs.length} ${ms}ms${result.error ? ` | error: ${result.error}` : ""}${sampleStr}`,
`${pad(target.id, ID_COL)} ${status} jobs=${result.jobs.length} ${ms}ms${result.error ? ` | error: ${result.error}` : ""}${sampleStr}`,
);
} catch (err) {
const ms = Date.now() - started;
console.log(
`${pad(target.id, 12)} CRASH ${ms}ms ${(err as Error).message}`,
`${pad(target.id, ID_COL)} CRASH ${ms}ms ${(err as Error).message}`,
);
}
}
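Review note: `buildContext` now distinguishes an explicit empty `searchTerms` array (used by the `icims` target, which filters client-side) from an omitted one. A minimal sketch of that rule follows; `DEFAULT_TERMS` and `resolveSearchTerms` are hypothetical names used only for illustration and do not exist in the script.

```typescript
// Hypothetical helper mirroring the `args.searchTerms !== undefined` check.
const DEFAULT_TERMS: string[] = ["software engineer"];

function resolveSearchTerms(override?: string[]): string[] {
  // An explicit [] is preserved; only `undefined` falls back to the default.
  return override !== undefined ? override : DEFAULT_TERMS;
}

console.log(resolveSearchTerms([]));
console.log(resolveSearchTerms());
```

A truthiness or length check here would replace the empty array with the default terms, which is presumably why the comparison is against `undefined`.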
@ -196,11 +333,13 @@ async function runOne(target: Target): Promise<void> {
async function main() {
const requested = (process.argv[2] ?? "").trim();
const filter = requested
? new Set(
? expandSmokeFilter(
new Set(
requested
.split(",")
.map((s) => s.trim())
.filter(Boolean),
),
)
: null;
const targets = filter
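Review note: the CLI filter now accepts `indeed`, `linkedin`, or `glassdoor` as aliases for the `jobspy` target. The sketch below is a standalone approximation of that expansion combined with the comma-splitting in `main()`; `parseFilterArg` is a hypothetical wrapper, not a function in the script.

```typescript
// Site ids served by the JobSpy manifest (taken from this change).
const JOBSPY_SITE_IDS = ["indeed", "linkedin", "glassdoor"] as const;

function expandSmokeFilter(ids: Set<string>): Set<string> {
  const next = new Set(ids);
  // Any one alias is enough to pull in the shared `jobspy` target.
  if (JOBSPY_SITE_IDS.some((site) => next.has(site))) {
    next.add("jobspy");
  }
  return next;
}

// Hypothetical wrapper: split a CLI argument such as "indeed, arcdev".
function parseFilterArg(arg: string): Set<string> {
  const ids = arg
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);
  return expandSmokeFilter(new Set(ids));
}

console.log(Array.from(parseFilterArg("indeed, arcdev")));
```

The alias id itself stays in the set, so the expansion adds `jobspy` without removing what the caller typed.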


@ -27,6 +27,12 @@ export const EXTRACTOR_SOURCE_IDS = [
"lever",
"greenhouse",
"workday",
"smartrecruiters",
"icims",
"bctenet",
"eluta",
"qajobsboard",
"arcdev",
"manual",
] as const;
@ -203,6 +209,44 @@ export const EXTRACTOR_SOURCE_METADATA: Record<
category: "pipeline",
region: "global",
},
smartrecruiters: {
label: "SmartRecruiters (ATS)",
order: 245,
category: "pipeline",
region: "global",
},
icims: {
label: "iCIMS tenants (HTML)",
order: 241,
category: "pipeline",
region: "global",
},
bctenet: {
label: "BC T-Net (RSS)",
order: 243,
category: "pipeline",
region: "global",
countryAllowlist: ["canada"],
},
eluta: {
label: "Eluta",
order: 247,
category: "pipeline",
region: "global",
countryAllowlist: ["canada"],
},
qajobsboard: {
label: "QAJobsBoard",
order: 248,
category: "pipeline",
region: "remote",
},
arcdev: {
label: "Arc.dev",
order: 249,
category: "pipeline",
region: "remote",
},
manual: { label: "Manual", order: 900, category: "manual" },
};


@ -59,6 +59,9 @@ describe("location-support", () => {
true,
);
expect(isSourceAllowedForCountry("startupjobs", "worldwide")).toBe(true);
expect(isSourceAllowedForCountry("eluta", "canada")).toBe(true);
expect(isSourceAllowedForCountry("eluta", "united states")).toBe(false);
expect(isSourceAllowedForCountry("smartrecruiters", "japan")).toBe(true);
});
it("filters incompatible sources while preserving compatible order", () => {


@ -1,3 +1,4 @@
import { EXTRACTOR_SOURCE_METADATA } from "./extractors";
import type { JobSource } from "./types";
const COUNTRY_ALIASES: Record<string, string> = {
@ -199,6 +200,16 @@ export function isSourceAllowedForCountry(
if (US_ONLY_SOURCES.has(source)) return isUsCountry(country);
if (source === "glassdoor") return isGlassdoorCountry(country);
if (source === "adzuna") return getAdzunaCountryCode(country) !== null;
const meta =
EXTRACTOR_SOURCE_METADATA[source as keyof typeof EXTRACTOR_SOURCE_METADATA];
if (meta?.countryAllowlist && meta.countryAllowlist.length > 0) {
const key = normalizeCountryKey(country);
return meta.countryAllowlist.some(
(token) => normalizeCountryKey(token) === key,
);
}
return true;
}
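Review note: the new `countryAllowlist` branch can be exercised in isolation. The sketch below assumes a simplified `normalizeCountryKey` (trim and lowercase only; the real one may also resolve aliases) and a hypothetical two-entry `META` table standing in for `EXTRACTOR_SOURCE_METADATA`.

```typescript
type SourceMeta = { label: string; countryAllowlist?: readonly string[] };

// Hypothetical subset of the metadata added in this change.
const META: Record<string, SourceMeta> = {
  eluta: { label: "Eluta", countryAllowlist: ["canada"] },
  smartrecruiters: { label: "SmartRecruiters (ATS)" },
};

// Assumed normalizer; the real implementation may do more than this.
function normalizeCountryKey(value: string): string {
  return value.trim().toLowerCase();
}

function isAllowedByAllowlist(source: string, country: string): boolean {
  const allowlist = META[source]?.countryAllowlist;
  // No allowlist (or an empty one) means the source is unrestricted.
  if (!allowlist || allowlist.length === 0) return true;
  const key = normalizeCountryKey(country);
  return allowlist.some((token) => normalizeCountryKey(token) === key);
}

console.log(isAllowedByAllowlist("eluta", "Canada"));
console.log(isAllowedByAllowlist("eluta", "united states"));
```

This matches the expectations added to the `location-support` test: `eluta` passes only for Canada, while `smartrecruiters` has no allowlist and passes for any country.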


@ -445,6 +445,37 @@ export const settingsRegistry = {
parse: parseIntOrNull,
serialize: serializeNullableNumber,
},
qajobsboardMaxJobsPerTerm: {
kind: "typed" as const,
schema: z.number().int().min(1).max(1000),
default: (): number => 100,
parse: parseIntOrNull,
serialize: serializeNullableNumber,
},
/**
* Arc.dev remote job listing paths (on https://arc.dev), e.g. /remote-jobs/playwright
*/
arcRemoteJobsPaths: {
kind: "typed" as const,
schema: z.array(z.string().trim().min(1).max(200)).max(20),
default: (): string[] => {
const fromEnv = parseCompanyList(
typeof process !== "undefined" ? process.env.ARC_REMOTE_JOBS_PATHS : "",
);
return fromEnv.length > 0
? fromEnv
: ["/remote-jobs/playwright", "/remote-jobs/cypress"];
},
parse: parseJsonArrayOrNull,
serialize: serializeNullableJsonArray,
},
arcMaxJobsPerPath: {
kind: "typed" as const,
schema: z.number().int().min(1).max(300),
default: (): number => 120,
parse: parseIntOrNull,
serialize: serializeNullableNumber,
},
/**
* Comma- or newline-separated company slugs to fetch from public ATS feeds.
* `lever`, `ashby`, and `greenhouse` each take one entry per company.
@ -493,6 +524,108 @@ export const settingsRegistry = {
parse: parseJsonArrayOrNull,
serialize: serializeNullableJsonArray,
},
/**
* SmartRecruiters company identifiers (Posting API path segment), e.g.
* `jobs.smartrecruiters.com/smartrecruiters/...` yields "smartrecruiters".
*/
smartrecruitersCompanies: {
kind: "typed" as const,
schema: z.array(z.string().trim().min(1).max(100)).max(200),
default: (): string[] =>
parseCompanyList(
typeof process !== "undefined"
? process.env.SMARTRECRUITERS_COMPANIES
: "",
),
parse: parseJsonArrayOrNull,
serialize: serializeNullableJsonArray,
},
smartrecruitersMaxJobsPerCompany: {
kind: "typed" as const,
schema: z.number().int().min(1).max(500),
default: (): number => 100,
parse: parseIntOrNull,
serialize: serializeNullableNumber,
},
/**
* Eluta RSS `location` query values, e.g. "Toronto, ON", "Vancouver, BC".
* Very broad values may yield empty feeds; prefer metro/province strings.
*/
elutaRssLocations: {
kind: "typed" as const,
schema: z.array(z.string().trim().min(1).max(200)).max(50),
default: (): string[] =>
parseCompanyList(
typeof process !== "undefined" ? process.env.ELUTA_RSS_LOCATIONS : "",
),
parse: parseJsonArrayOrNull,
serialize: serializeNullableJsonArray,
},
elutaMaxJobsPerTerm: {
kind: "typed" as const,
schema: z.number().int().min(1).max(1000),
default: (): number => 100,
parse: parseIntOrNull,
serialize: serializeNullableNumber,
},
/**
* Optional BC T-Net RSS URLs. When empty, the extractor uses the default public
* aggregate feed. Env `BCTENET_RSS_URLS` seeds defaults (JSON array or newline-separated).
*/
bctenetRssUrls: {
kind: "typed" as const,
schema: z.array(z.string().trim().min(1).max(500)).max(20),
default: (): string[] => {
const raw =
typeof process !== "undefined"
? process.env.BCTENET_RSS_URLS?.trim()
: "";
if (!raw) return [];
const parsed = parseJsonArrayOrNull(raw);
if (parsed && parsed.length > 0) return parsed;
return raw
.split(/\n+/)
.map((piece) => piece.trim())
.filter(Boolean);
},
parse: parseJsonArrayOrNull,
serialize: serializeNullableJsonArray,
},
bctenetMaxJobsPerTerm: {
kind: "typed" as const,
schema: z.number().int().min(1).max(2000),
default: (): number => 400,
parse: parseIntOrNull,
serialize: serializeNullableNumber,
},
/**
* iCIMS tenant hosts (e.g. careers-example.icims.com). Env `ICIMS_TENANTS`
* seeds defaults (comma/newline-separated).
*/
icimsTenants: {
kind: "typed" as const,
schema: z.array(z.string().trim().min(1).max(200)).max(100),
default: (): string[] =>
parseCompanyList(
typeof process !== "undefined" ? process.env.ICIMS_TENANTS : "",
),
parse: parseJsonArrayOrNull,
serialize: serializeNullableJsonArray,
},
icimsMaxJobsPerTenant: {
kind: "typed" as const,
schema: z.number().int().min(1).max(2000),
default: (): number => 250,
parse: parseIntOrNull,
serialize: serializeNullableNumber,
},
icimsMaxPagesPerSearch: {
kind: "typed" as const,
schema: z.number().int().min(1).max(50),
default: (): number => 10,
parse: parseIntOrNull,
serialize: serializeNullableNumber,
},
searchTerms: {
kind: "typed" as const,
schema: z.array(z.string().trim().min(1).max(200)).max(100),
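Review note: `bctenetRssUrls` is the only new list setting whose env default accepts either a JSON array or newline-separated values. The sketch below reproduces that fallback order in a standalone form; `seedRssUrls` is a hypothetical name, and this `parseJsonArrayOrNull` is a stand-in that may differ from the real helper.

```typescript
// Hypothetical stand-in for the registry's parseJsonArrayOrNull helper.
function parseJsonArrayOrNull(raw: string): string[] | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (!Array.isArray(parsed)) return null;
    return parsed.filter((v): v is string => typeof v === "string");
  } catch {
    return null; // not valid JSON
  }
}

// Mirrors the default() body: JSON array first, then newline-separated text.
function seedRssUrls(envValue: string | undefined): string[] {
  const raw = envValue?.trim() ?? "";
  if (!raw) return [];
  const parsed = parseJsonArrayOrNull(raw);
  if (parsed && parsed.length > 0) return parsed;
  return raw
    .split(/\n+/)
    .map((piece) => piece.trim())
    .filter(Boolean);
}

console.log(seedRssUrls('["https://a.example/rss"]'));
console.log(seedRssUrls("https://a.example/rss\nhttps://b.example/rss"));
```

A comma-separated value is not split by this path, so it would come back as a single entry; that differs from the `parseCompanyList` defaults used by the other list settings.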


@ -200,10 +200,30 @@ export const createAppSettings = (
himalayasMaxJobsPerTerm: { value: 50, default: 50, override: null },
weworkremotelyMaxJobsPerTerm: { value: 50, default: 50, override: null },
fourdayweekMaxJobsPerTerm: { value: 50, default: 50, override: null },
qajobsboardMaxJobsPerTerm: { value: 100, default: 100, override: null },
arcRemoteJobsPaths: {
value: ["/remote-jobs/playwright", "/remote-jobs/cypress"],
default: ["/remote-jobs/playwright", "/remote-jobs/cypress"],
override: null,
},
arcMaxJobsPerPath: { value: 120, default: 120, override: null },
leverCompanies: { value: [], default: [], override: null },
ashbyCompanies: { value: [], default: [], override: null },
greenhouseCompanies: { value: [], default: [], override: null },
workdayTenants: { value: [], default: [], override: null },
smartrecruitersCompanies: { value: [], default: [], override: null },
smartrecruitersMaxJobsPerCompany: {
value: 100,
default: 100,
override: null,
},
elutaRssLocations: { value: [], default: [], override: null },
elutaMaxJobsPerTerm: { value: 100, default: 100, override: null },
bctenetRssUrls: { value: [], default: [], override: null },
bctenetMaxJobsPerTerm: { value: 400, default: 400, override: null },
icimsTenants: { value: [], default: [], override: null },
icimsMaxJobsPerTenant: { value: 250, default: 250, override: null },
icimsMaxPagesPerSearch: { value: 10, default: 10, override: null },
searchTerms: {
value: ["Software Engineer"],
default: ["Software Engineer"],


@ -213,10 +213,22 @@ export interface AppSettings {
himalayasMaxJobsPerTerm: Resolved<number>;
weworkremotelyMaxJobsPerTerm: Resolved<number>;
fourdayweekMaxJobsPerTerm: Resolved<number>;
qajobsboardMaxJobsPerTerm: Resolved<number>;
arcRemoteJobsPaths: Resolved<string[]>;
arcMaxJobsPerPath: Resolved<number>;
leverCompanies: Resolved<string[]>;
ashbyCompanies: Resolved<string[]>;
greenhouseCompanies: Resolved<string[]>;
workdayTenants: Resolved<string[]>;
smartrecruitersCompanies: Resolved<string[]>;
smartrecruitersMaxJobsPerCompany: Resolved<number>;
elutaRssLocations: Resolved<string[]>;
elutaMaxJobsPerTerm: Resolved<number>;
bctenetRssUrls: Resolved<string[]>;
bctenetMaxJobsPerTerm: Resolved<number>;
icimsTenants: Resolved<string[]>;
icimsMaxJobsPerTenant: Resolved<number>;
icimsMaxPagesPerSearch: Resolved<number>;
searchTerms: Resolved<string[]>;
workplaceTypes: Resolved<Array<"remote" | "hybrid" | "onsite">>;
blockedCompanyKeywords: Resolved<string[]>;