feat(extractors): expand catalog, smoke coverage, and sourcing docs
Some checks failed
CI / Linting (Biome) (push) Failing after 40s
CI / Tests (push) Successful in 5m54s
CI / Type Check (adzuna-extractor) (push) Successful in 1m8s
CI / Type Check (gradcracker-extractor) (push) Successful in 1m11s
CI / Type Check (hiringcafe-extractor) (push) Successful in 1m8s
CI / Type Check (orchestrator) (push) Successful in 1m23s
CI / Type Check (startupjobs-extractor) (push) Successful in 1m6s
CI / Type Check (ukvisajobs-extractor) (push) Successful in 1m7s
CI / Documentation (push) Successful in 1m54s
Adds Arc.dev, BC T-Net, Eluta, iCIMS tenants, QAJobsBoard, and SmartRecruiters manifests with registry/settings/UI wiring; registers full extractor list in smoke-extractors and documents supplementary board access paths. Aligns Careerjet v4 with the url query parameter and fixes strict typing in QAJobsBoard. Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
parent
67508d56ea
commit
c840f289e1
44
.env.example
@@ -200,6 +200,7 @@ ADZUNA_APP_KEY=
# LEVER_COMPANIES=netflix,figma
# ASHBY_COMPANIES=ramp,linear
# GREENHOUSE_COMPANIES=stripe,airbnb
# Canadian QA-employer examples (full table): docs-site/docs/extractors/canadian-companies-qa-ats.md

# =============================================================================
# Workday (public career sites) - optional
@@ -210,3 +211,46 @@ ADZUNA_APP_KEY=
# 2) A JSON object with explicit fields:
# {"company":"NVIDIA","tenantUrl":"https://nvidia.wd5.myworkdayjobs.com","tenant":"nvidia","site":"NVIDIAExternalCareerSite","locale":"en-US"}
# WORKDAY_TENANTS=

# =============================================================================
# SmartRecruiters (public Posting API) - optional
# =============================================================================
# Comma- or newline-separated company identifiers (API path segment), e.g.
# jobs.smartrecruiters.com/smartrecruiters/... → "smartrecruiters".
# SMARTRECRUITERS_COMPANIES=smartrecruiters
# SMARTRECRUITERS_MAX_JOBS_PER_COMPANY=100

# =============================================================================
# Eluta (Canada, RSS by location) - optional
# =============================================================================
# Comma- or newline-separated location strings for https://www.eluta.ca/rss?location=...
# Example: ELUTA_RSS_LOCATIONS=Toronto, ON|Vancouver, BC
# ELUTA_MAX_JOBS_PER_TERM=100

# =============================================================================
# BC T-Net (British Columbia tech jobs RSS) - optional
# =============================================================================
# Default feed is built into the extractor when this is unset:
# https://www.bctechnology.com/rss/jobs/tnetjobs.xml
# Override with JSON array or newline-separated URLs (custom feeds from T-Net builder).
# BCTENET_RSS_URLS=
# Prefer Settings: bctenetRssUrls (JSON array), bctenetMaxJobsPerTerm (default 400).

# =============================================================================
# iCIMS tenant portals (anonymous HTML search) - optional
# =============================================================================
# Comma- or newline-separated hosts, e.g. careers-example.icims.com
# ICIMS_TENANTS=
# Caps via Settings: icimsMaxJobsPerTenant (default 250), icimsMaxPagesPerSearch (default 10).

# =============================================================================
# QAJobsBoard (QA JobBoardly JSON) - optional
# =============================================================================
# Configure caps via Settings: qajobsboardMaxJobsPerTerm (default 100).

# =============================================================================
# Arc.dev remote listings - optional
# =============================================================================
# Comma-separated paths under https://arc.dev used when seeding defaults (e.g. Playwright + Cypress feeds).
# ARC_REMOTE_JOBS_PATHS=/remote-jobs/playwright,/remote-jobs/cypress
# Prefer Settings for overrides: arcRemoteJobsPaths (JSON array), arcMaxJobsPerPath (default 120).
34
docs-site/docs/extractors/arcdev.md
Normal file
@@ -0,0 +1,34 @@
---
id: arcdev
title: Arc.dev Extractor
description: Remote tech roles from Arc.dev listing pages via embedded Next.js data.
sidebar_position: 17
---

## What it is

[Arc.dev](https://arc.dev) exposes remote job listings on paths such as `/remote-jobs/playwright` and `/remote-jobs/cypress`. The extractor downloads the server-rendered (SSR) HTML for each path and parses the embedded `__NEXT_DATA__` payload, which carries both Arc-managed and external rows.

Implementation: `extractors/arcdev/manifest.ts`.
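
The parsing step can be pictured with a minimal sketch. It assumes the page follows the usual Next.js convention of a `<script id="__NEXT_DATA__">` tag holding JSON; the helper name `extractNextData` is hypothetical and is not taken from `manifest.ts`, and the shape of the parsed payload is deliberately left as `unknown`.

```typescript
// Hypothetical helper (not the shipped manifest code): pull the JSON that
// Next.js embeds in server-rendered HTML inside <script id="__NEXT_DATA__">.
function extractNextData(html: string): unknown {
  const match = html.match(
    /<script[^>]*\bid="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/i,
  );
  if (!match) return null; // tag missing: the page shape probably changed
  try {
    return JSON.parse(match[1]);
  } catch {
    return null; // tag present but the payload is not valid JSON
  }
}
```

Returning `null` for both failure modes keeps the caller simple: a `null` result is the signal to re-run the smoke test and inspect the page by hand.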

## Why it exists

Arc.dev curates remote roles and exposes tooling-oriented feeds; many roles are open to North American candidates, and the site labels them as such.

## How to use it

1. Enable **Arc.dev** in pipeline sources (no credentials).
2. Configure **`arcRemoteJobsPaths`** as a JSON array of path strings (defaults include Playwright and Cypress remote feeds). Optionally seed defaults from **`ARC_REMOTE_JOBS_PATHS`** (comma-separated paths).
3. Set **`arcMaxJobsPerPath`** (default `120`, max `300`) to cap rows per listing URL after deduplication.
4. Align **`searchTerms`** with titles or stacks you care about; empty-term behavior is handled inside the manifest per path.
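
As an illustration of step 3, the sketch below deduplicates rows by URL and then applies the cap. The `ArcRow` type and `capArcRows` function are hypothetical names, and the lower bound of `1` is an assumption; only the default of `120` and the maximum of `300` come from the settings described above.

```typescript
// Hypothetical row shape; the real mapper output may carry more fields.
interface ArcRow {
  url: string;
  title: string;
}

const ARC_DEFAULT_MAX = 120; // documented default for arcMaxJobsPerPath
const ARC_HARD_MAX = 300; // documented maximum for arcMaxJobsPerPath

// Deduplicate by URL first, then cap, so duplicates never use up the budget.
function capArcRows(rows: ArcRow[], requestedMax: number = ARC_DEFAULT_MAX): ArcRow[] {
  const requested = Number.isFinite(requestedMax) ? Math.floor(requestedMax) : ARC_DEFAULT_MAX;
  const limit = Math.min(Math.max(requested, 1), ARC_HARD_MAX); // lower bound of 1 is assumed
  const seen = new Set<string>();
  const unique: ArcRow[] = [];
  for (const row of rows) {
    if (seen.has(row.url)) continue;
    seen.add(row.url);
    unique.push(row);
  }
  return unique.slice(0, limit);
}
```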

## Common problems

- **HTML changes:** If Arc ships a new payload shape, parsing may need an update; smoke-test with `npx tsx scripts/smoke-extractors.ts arcdev`, or run the full extractor suite with `npx tsx scripts/smoke-extractors.ts`.
- **`Arc talent network` employer:** Some Arc-managed rows omit a company name; the mapper substitutes `Arc talent network` as a placeholder employer.

## Related pages

- [Extractors overview](/docs/next/extractors/overview)
- [Canadian / NA QA contracting firms](/docs/next/extractors/qa-contract-staffing-canada)
- [Manual Import](/docs/next/extractors/manual)
117
docs-site/docs/extractors/canadian-companies-qa-ats.md
Normal file
@@ -0,0 +1,117 @@
---
id: canadian-companies-qa-ats
title: Canadian companies — strong QA orgs and scrapable ATS
description: Reference list of Canadian tech employers with solid QA cultures and practical ATS endpoints for JobOps pipelines.
sidebar_position: 41
---

## What it is

A curated reference of **Canadian-headquartered or Canadian-heavy tech employers** where QA / SDET / test automation is often a first-class function, together with **scrapable ATS endpoints** where they exist.

Tier 1 targets map cleanly to the shipped **Ashby**, **Greenhouse**, **Lever**, **Workday**, and **SmartRecruiters** extractors. Tier 2 entries need custom scraping or browser automation, or have upstream quirks to work around.

**Verification:** Tier 1 integrations below were probed successfully (**HTTP `200`**, JSON where applicable). **Posting counts change daily** — re-run probes locally when you need exact volumes.

## Why it exists

Canada-focused QA sourcing benefits from **employer-direct ATS feeds** (clean titles, real apply URLs) instead of only aggregator noise. This page maps recognizable brands to **exact integration shapes** so you can paste slugs into Settings or env without rediscovering URLs.

## How to use it

### Tier 1 — Public ATS APIs (shipped extractors)

| Company | HQ | ATS | Endpoint / shape (reference) | JobOps wiring |
| --- | --- | --- | --- | --- |
| Wealthsimple | Toronto | Ashby | `GET https://api.ashbyhq.com/posting-api/job-board/wealthsimple` | Add **`wealthsimple`** to **`ashbyCompanies`** / `ASHBY_COMPANIES`. |
| 1Password | Toronto (remote-first) | Ashby | `.../job-board/1password` | Add **`1password`**. |
| Jobber | Edmonton / Toronto | Ashby | `.../job-board/jobber` | Add **`jobber`**. |
| Nylas | Toronto / SF | Ashby | `.../job-board/nylas` | Add **`nylas`**. |
| Hootsuite | Vancouver | Greenhouse | `GET https://boards-api.greenhouse.io/v1/boards/hootsuite/jobs?content=true` | Add **`hootsuite`** to **`greenhouseCompanies`**. |
| Faire | Waterloo / SF | Greenhouse | `.../boards/faire/jobs?content=true` | Add **`faire`**. |
| PointClickCare | Mississauga | Lever | `GET https://api.lever.co/v0/postings/pointclickcare?mode=json` | Add **`pointclickcare`** to **`leverCompanies`**. |
| Clio | Burnaby / Calgary / Toronto | Workday | `POST https://clio.wd3.myworkdayjobs.com/wday/cxs/clio/ClioCareerSite/jobs` (`limit`, `offset`, `searchText`) | Add **`https://clio.wd3.myworkdayjobs.com/en-US/ClioCareerSite`** to **`workdayTenants`** / `WORKDAY_TENANTS`. |
| Coveo | Quebec City / Montreal | SmartRecruiters | `GET https://api.smartrecruiters.com/v1/companies/Coveo/postings` | Add **`Coveo`** to **`smartrecruitersCompanies`**. API stays **`200`**; **`totalFound`** may be **zero** between hiring waves. |

The optional Ashby query parameter `?includeCompensation=true` works in browsers and with `curl` and returns richer payloads. The bundled Ashby extractor calls the **same path without that query** and still receives the full job list.
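
The endpoint shapes in the Tier 1 table can be expressed as a small lookup. This is a sketch only: `postingEndpoint` and `workdayCxsEndpoint` are hypothetical helpers, not the shipped extractors' code, and the Workday derivation assumes the tenant name is the first hostname label and the site is the last path segment, which matches the Clio row above.

```typescript
type SlugAts = "ashby" | "greenhouse" | "lever" | "smartrecruiters";

// URL templates copied from the "Endpoint / shape" column of the table.
const ENDPOINT_BUILDERS: Record<SlugAts, (slug: string) => string> = {
  ashby: (s) => `https://api.ashbyhq.com/posting-api/job-board/${s}`,
  greenhouse: (s) => `https://boards-api.greenhouse.io/v1/boards/${s}/jobs?content=true`,
  lever: (s) => `https://api.lever.co/v0/postings/${s}?mode=json`,
  smartrecruiters: (s) => `https://api.smartrecruiters.com/v1/companies/${s}/postings`,
};

function postingEndpoint(ats: SlugAts, slug: string): string {
  return ENDPOINT_BUILDERS[ats](encodeURIComponent(slug));
}

// Assumption: "https://<tenant>.<pod>.myworkdayjobs.com/<locale>/<site>"
// maps to "<origin>/wday/cxs/<tenant>/<site>/jobs" (a POST endpoint).
function workdayCxsEndpoint(tenantUrl: string): string {
  const url = new URL(tenantUrl);
  const tenant = url.hostname.split(".")[0];
  const segments = url.pathname.split("/").filter(Boolean);
  const site = segments[segments.length - 1];
  if (!tenant || !site) {
    throw new Error(`Cannot derive tenant/site from ${tenantUrl}`);
  }
  return `${url.origin}/wday/cxs/${tenant}/${site}/jobs`;
}
```

For example, `workdayCxsEndpoint("https://clio.wd3.myworkdayjobs.com/en-US/ClioCareerSite")` yields the `POST` URL listed in the Clio row.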

**Example Settings JSON (merge with your existing lists), in order: `ashbyCompanies`, `greenhouseCompanies`, `leverCompanies`, `workdayTenants`, `smartrecruitersCompanies`:**

```json
["wealthsimple", "1password", "jobber", "nylas"]
```

```json
["hootsuite", "faire"]
```

```json
["pointclickcare"]
```

```json
["https://clio.wd3.myworkdayjobs.com/en-US/ClioCareerSite"]
```

```json
["Coveo"]
```

### Tier 2 — Harder or custom surfaces

| Company | HQ | ATS | Notes |
| --- | --- | --- | --- |
| Shopify | Ottawa / remote | Ashby (custom) | Hosted board / GraphQL (`jobs.ashbyhq.com/api/non-user-graphql`, `organizationHostedJobsPageName: "shopify"`) or parse careers HTML — not covered by the slug-based Ashby extractor today. |
| Lightspeed Commerce | Montreal | Custom (often Cloudflare) | Careers HTML at `https://www.lightspeedhq.com/careers/openings/` — browser or tolerant fetcher; no shipped extractor. |
| RBC Borealis | Toronto / Montreal | Greenhouse (embedded) | `boards-api` path **`rbcborealis`** returned **404** when probed — scrape `https://rbcborealis.com/careers/` or rediscover the active board slug before using Greenhouse JSON. |
| Vidyard | Kitchener–Waterloo | JS-heavy site | `https://careers.vidyard.com/` — Playwright/Puppeteer if automating. |
| Loblaw Digital | Toronto | Workday (parent) | Parent Workday host may need the correct site segment; careers marketing site often lists roles — browser-backed discovery may be more reliable than guessing CXS paths. |

### Ready-to-use CLI filters (QA-oriented titles)

Ashby (example: Wealthsimple):

```bash
curl -s 'https://api.ashbyhq.com/posting-api/job-board/wealthsimple?includeCompensation=true' \
  | jq '.jobs[] | select(.title | test("QA|SDET|Test|Automation"; "i")) | {title, location, url: .jobUrl}'
```

Greenhouse (example: Hootsuite):

```bash
curl -s 'https://boards-api.greenhouse.io/v1/boards/hootsuite/jobs?content=true' \
  | jq '.jobs[] | select(.title | test("QA|SDET|Test|Automation"; "i")) | {title, location: .location.name, url: .absolute_url}'
```

Lever (PointClickCare):

```bash
curl -s 'https://api.lever.co/v0/postings/pointclickcare?mode=json' \
  | jq '.[] | select(.text | test("QA|SDET|Test|Automation"; "i")) | {title: .text, location: .categories.location, url: .hostedUrl}'
```

Workday (Clio — QA search text):

```bash
curl -s -X POST 'https://clio.wd3.myworkdayjobs.com/wday/cxs/clio/ClioCareerSite/jobs' \
  -H 'Content-Type: application/json' \
  -d '{"limit":50,"offset":0,"searchText":"QA"}' \
  | jq '.jobPostings[] | select(.title | test("QA|SDET|Test|Automation"; "i")) | {title, location: .locationsText}'
```
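
If you would rather post-filter in code than in `jq`, the same title test can be sketched in TypeScript. The `filterQaPostings` name is hypothetical, and the regular expression simply mirrors the `jq` filters above.

```typescript
// Same unanchored, case-insensitive pattern as the jq filters above.
const QA_TITLE_PATTERN = /QA|SDET|Test|Automation/i;

// Works for any posting shape that exposes a `title` string.
function filterQaPostings<T extends { title: string }>(postings: T[]): T[] {
  return postings.filter((posting) => QA_TITLE_PATTERN.test(posting.title));
}
```

Because the pattern is unanchored, it also matches substrings: a title such as "Attestation Analyst" passes via `test`. Adding word boundaries (for example `\bTest`) tightens the filter at the cost of missing compound titles.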

### Other strong-QA Canadian employers (ATS not deep-verified here)

Worth manual checks or Eluta / LinkedIn cross-reference: **Wattpad**, **Knix**, **Ada**, **Hopper**, **Plusgrade**, **D2L**, **Kinaxis**, **TELUS Digital / Mirum**, **Trulioo**, **OpenText / Hubdoc**.

## Common problems

- **Ashby counts vs `includeCompensation`:** Omitting the query param still returns jobs; compensation fields may be sparser.
- **Greenhouse board slug drift:** If `boards-api` returns `404`, the employer may have renamed the board — inspect their careers page embed or HTML source for the current board id.
- **SmartRecruiters zero postings:** Still a valid integration; don’t treat empty arrays as a broken extractor.
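
One way to encode the distinctions above when probing endpoints is a small classifier. This is a sketch under stated assumptions: `classifyProbe` and its verdict labels are hypothetical, and lumping every status other than `200` and `404` into `"unexpected"` is a simplification.

```typescript
type ProbeVerdict = "ok" | "ok-empty" | "slug-drift" | "unexpected";

// status: HTTP status of the probe; postingCount: jobs found in the body.
function classifyProbe(status: number, postingCount: number): ProbeVerdict {
  if (status === 404) return "slug-drift"; // board renamed or removed
  if (status !== 200) return "unexpected"; // needs a manual look
  return postingCount > 0 ? "ok" : "ok-empty"; // empty is still a valid integration
}
```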

## Related pages

- [Extractors overview](/docs/next/extractors/overview)
- [Canadian / NA QA contracting firms](/docs/next/extractors/qa-contract-staffing-canada)
- [Eluta](/docs/next/extractors/eluta)
- [Manual Import](/docs/next/extractors/manual)
44
docs-site/docs/extractors/eluta.md
Normal file
@@ -0,0 +1,44 @@
---
id: eluta
title: Eluta Extractor
description: Canadian job discovery via Eluta.ca public RSS feeds.
sidebar_position: 15
---

## What it is

Original site: [eluta.ca](https://www.eluta.ca)

The extractor lives in `extractors/eluta/manifest.ts`. It requests one or more public RSS URLs of the form `https://www.eluta.ca/rss?location=...`, parses items (title, employer, location, link, description), filters by pipeline search terms, and merges feeds while de-duplicating by `guid` / URL.
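
The request and merge steps can be sketched as follows. The `FeedItem` shape and the `elutaRssUrl` and `mergeFeeds` helpers are hypothetical names for illustration, not the manifest's actual code; only the URL form and the `guid` / URL de-duplication rule come from the description above.

```typescript
// Hypothetical parsed RSS item; the real mapper keeps more fields.
interface FeedItem {
  guid?: string;
  link: string;
  title: string;
}

// Build the feed URL and let URLSearchParams handle the encoding of
// commas and spaces in values such as "Toronto, ON".
function elutaRssUrl(location: string): string {
  const url = new URL("https://www.eluta.ca/rss");
  url.searchParams.set("location", location);
  return url.toString();
}

// Merge feeds in order, dropping an item when its guid OR its link was seen.
function mergeFeeds(feeds: FeedItem[][]): FeedItem[] {
  const seen = new Set<string>();
  const merged: FeedItem[] = [];
  for (const feed of feeds) {
    for (const item of feed) {
      const keys = [item.guid, item.link].filter(
        (key): key is string => typeof key === "string" && key.length > 0,
      );
      if (keys.some((key) => seen.has(key))) continue;
      for (const key of keys) seen.add(key);
      merged.push(item);
    }
  }
  return merged;
}
```

Checking both keys means the same posting is dropped even when one metro feed carries it with a tracking suffix on the link but an identical `guid`.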

## Why it exists

Eluta surfaces Canadian roles indexed directly from employer career sites, often with less aggregator noise than generic job search. The public RSS feeds need no credentials and are a more stable integration point than scraped HTML.

## How to use it

1. Choose **location strings** that Eluta accepts in the RSS `location` query parameter (for example `Toronto, ON`, `Vancouver, BC`). Very broad values such as a whole country may return empty feeds; prefer metros or provinces.
2. In **Settings**, set **Eluta RSS locations** (`elutaRssLocations`) as a JSON array or comma/newline-separated list, or set `ELUTA_RSS_LOCATIONS` in the environment (for example `Toronto, ON|Montreal, QC`).
3. Optionally set **Eluta max jobs per term** (`elutaMaxJobsPerTerm`, default `100`).
4. Set your search geography to **Canada** — Eluta is **Canada-only** and is skipped automatically when the resolved pipeline country is not Canada.
5. Enable **Eluta** in pipeline sources and run the pipeline.

## Common problems

### Eluta is skipped for my run

- Search geography is not Canada (city/country/Indeed country resolution). Align geography to Canada or disable Eluta for non-Canada profiles.

### Empty feeds

- The `location` string may be too broad or spelled differently than Eluta expects. Try a major city plus province (e.g. `Calgary, AB`).

### RSS HTTP errors

- Eluta may block unusual clients; the extractor sends a conventional User-Agent. Retry later or reduce the number of location feeds per run.

## Related pages

- [Extractors Overview](/docs/next/extractors/overview)
- [Add an Extractor](/docs/next/workflows/add-an-extractor)
- [Settings](/docs/next/features/settings)
@@ -19,6 +19,12 @@ Extractor integrations are now registered through manifests and loaded automatic
| [Hiring Cafe](/docs/next/extractors/hiring-cafe) | Browser-backed discovery using Hiring Cafe search APIs | Subject to upstream anti-bot checks; uses browser context and encoded search-state payloads | `HIRING_CAFE_SEARCH_TERMS`, `HIRING_CAFE_COUNTRY`, `HIRING_CAFE_MAX_JOBS_PER_TERM`, `HIRING_CAFE_DATE_FETCHED_PAST_N_DAYS` | Uses existing pipeline term/country/budget knobs and maps directly to normalized jobs |
| [startup.jobs](/docs/next/extractors/startup-jobs) | Startup-focused discovery through the published `startup-jobs-scraper` package | No credentials required; detail enrichment depends on Playwright browser binaries being installed | existing pipeline `searchTerms`, selected country/cities, `jobspyResultsWanted`; `npx playwright install` for fresh environments | Algolia-backed search plus detail-page enrichment via package import; orchestrator maps normalized records and de-duplicates by `jobUrl` |
| [UKVisaJobs](/docs/next/extractors/ukvisajobs) | UK visa sponsorship-focused roles | Requires authenticated session and periodic token/cookie refresh | `UKVISAJOBS_EMAIL`, `UKVISAJOBS_PASSWORD`, `UKVISAJOBS_MAX_JOBS`, `UKVISAJOBS_SEARCH_KEYWORD` | API pagination + dataset output; orchestrator de-dupes and may fetch missing descriptions |
| [SmartRecruiters](/docs/next/extractors/smartrecruiters) | Enterprise employers on SmartRecruiters public boards | No auth; needs configured company identifiers; one HTTP round-trip per posting for apply URLs + descriptions | `SMARTRECRUITERS_COMPANIES`, `SMARTRECRUITERS_MAX_JOBS_PER_COMPANY` | Paginates the public Posting API, filters by pipeline terms, normalizes to `CreateJobInput` |
| iCIMS tenants (HTML) | Large employers on iCIMS portals | No auth; HTML search varies by tenant — maintain explicit tenant hosts | `ICIMS_TENANTS`, Settings: `icimsTenants`, `icimsMaxJobsPerTenant`, `icimsMaxPagesPerSearch` | Fetches `/jobs/search` with iframe-style params, parses listing links, caps per tenant |
| BC T-Net (RSS) | BC tech aggregate via T-Net | Canada-only; free RSS (default feed built-in); optional extra feeds | `BCTENET_RSS_URLS`, Settings: `bctenetRssUrls`, `bctenetMaxJobsPerTerm` | Fetches RSS item blocks, normalizes quirky CDATA link fragments, filters by pipeline terms |
| [Eluta](/docs/next/extractors/eluta) | Canadian listings aggregated from employer career sites (RSS) | Canada-only source (skipped when search geography is not Canada); RSS `location` strings must be set | `ELUTA_RSS_LOCATIONS`, `ELUTA_MAX_JOBS_PER_TERM` | Fetches one or more `eluta.ca` RSS feeds, filters by terms, de-duplicates by guid/URL |
| [QAJobsBoard](/docs/next/extractors/qajobsboard) | QA / SDET / automation-heavy board (global JSON feed) | No auth; geography skew is manual/filter downstream | `qajobsboardMaxJobsPerTerm` | Fetches JobBoardly JSON, filters by pipeline terms |
| [Arc.dev](/docs/next/extractors/arcdev) | Remote roles from Arc.dev listing pages (tool-tagged paths) | Parses SSR `__NEXT_DATA__`; relies on stable Next payload | `ARC_REMOTE_JOBS_PATHS` (seeds defaults), `arcRemoteJobsPaths`, `arcMaxJobsPerPath` | Merges Arc-managed + external rows; dedupes by URL |
| [Manual Import](/docs/next/extractors/manual) | One-off jobs not covered by scrapers | Inference quality depends on model/provider and input quality; some URLs cannot be fetched reliably | App/API endpoints (`/api/manual-jobs/infer`, `/api/manual-jobs/import`) | Accepts text/HTML/URL, runs inference, then saves and scores job after review |

## Which extractor should I use?
@@ -29,10 +35,38 @@ Extractor integrations are now registered through manifests and loaded automatic
- Use **startup.jobs** when you want startup-heavy listings without maintaining another scraper locally.
- Use **Gradcracker** when targeting graduate pipelines in the UK.
- Use **UKVisaJobs** for sponsorship-specific UK searches.
- Use **SmartRecruiters** when you can list target employers’ public SmartRecruiters company identifiers.
- Use **iCIMS tenants** when you can list target `*.icims.com` career hosts (anonymous portal HTML search).
- Use **BC T-Net** for British Columbia tech RSS listings (runs only when search geography is Canada).
- Use **Eluta** for Canadian employer-direct listings via RSS (set metro/province `location` strings).
- Use **QAJobsBoard** or **Arc.dev** when you want QA- or remote-stack-focused feeds without extra credentials.
- Use **Manual Import** when you already have a specific posting and need direct import.

Many runs combine sources: broad discovery first, then manual import for high-priority jobs that scraping misses.

### QA-focused boards (shipped extractors)

- **[QAJobsBoard](/docs/next/extractors/qajobsboard)** — Large QA-oriented index via public JSON; filter geography downstream.
- **[Arc.dev](/docs/next/extractors/arcdev)** — Remote feeds (e.g. Playwright / Cypress paths); good for vetted remote slices.

### Canadian QA contracting firms (reference)

Staffing and consultancy firms that frequently post QA automation contracts — scrape hints and CLI probes: **[Canadian / NA QA contracting firms](/docs/next/extractors/qa-contract-staffing-canada)**.

### Canadian employers — QA-strong ATS (reference)

Direct ATS JSON / extractor wiring for well-known Canadian tech brands (Ashby, Greenhouse, Lever, Workday, SmartRecruiters): **[Canadian companies — strong QA orgs and scrapable ATS](/docs/next/extractors/canadian-companies-qa-ats)**.

## Supplementary job boards

Some boards are **credential-gated**, **approval-gated**, or **scraping-hostile** — see **[Supplementary sources — access notes](/docs/next/extractors/supplementary-sources-access-notes)** for realistic paths (Careerjet, Reed, Job Bank XML policy, sponsorship data sources, etc.).

JobOps ships **BC T-Net** and **iCIMS tenant HTML** extractors for two cases that are usually workable without vendor contracts; everything else in the old “long tail” list still lands best via **[Manual Import](/docs/next/extractors/manual)** until someone promotes it to a manifest.

### Still common manual-import targets

- **Wellfound** (formerly AngelList), **Otta**, **Welcome to the Jungle**, **Dice**, **Job Bank** (unless you qualify for syndication), regional boards without stable feeds — use Manual Import or an external tool, then normalize here.

## Related extractor docs

- [Gradcracker](/docs/next/extractors/gradcracker)
@@ -41,5 +75,12 @@ Many runs combine sources: broad discovery first, then manual import for high-pr
- [Hiring Cafe](/docs/next/extractors/hiring-cafe)
- [startup.jobs](/docs/next/extractors/startup-jobs)
- [UKVisaJobs](/docs/next/extractors/ukvisajobs)
- [SmartRecruiters](/docs/next/extractors/smartrecruiters)
- [Supplementary sources — access notes](/docs/next/extractors/supplementary-sources-access-notes)
- [Eluta](/docs/next/extractors/eluta)
- [QAJobsBoard](/docs/next/extractors/qajobsboard)
- [Arc.dev](/docs/next/extractors/arcdev)
- [Canadian / NA QA contracting firms](/docs/next/extractors/qa-contract-staffing-canada)
- [Canadian companies — QA-strong ATS](/docs/next/extractors/canadian-companies-qa-ats)
- [Manual Import](/docs/next/extractors/manual)
- [Add an Extractor](/docs/next/workflows/add-an-extractor)
92
docs-site/docs/extractors/qa-contract-staffing-canada.md
Normal file
@@ -0,0 +1,92 @@
---
id: qa-contract-staffing-canada
title: Canadian / NA QA contracting firms
description: Staffing and consultancy boards that often post QA automation contracts, with JobOps wiring notes and scrape hints.
sidebar_position: 40
---

## What it is

A curated list of Canadian and North American **staffing firms and consultancies** that regularly carry **QA / SDET / test automation** contract roles. Coverage emphasizes targets that are **live**, **contract-heavy**, and roughly ordered by **scraping ease** for automation.

This is **not** an extractor implementation checklist: several firms need HTML or browser automation. Use native JobOps extractors where they apply, and [Manual Import](/docs/next/extractors/manual) elsewhere.

## Why it exists

Contract QA pipelines often come through agencies before they appear on Indeed or LinkedIn runs. Mapping firms to **ATS type** (Workday, Greenhouse, Lever, custom API, JS-rendered site) saves repeated one-off research.

Counts and role titles **change daily** — re-verify listings before relying on them for outreach.

## How to use it

### Tier 1 — Confirmed live QA contracts + scrapable

| Firm | HQ | Where to scrape | Confirmed QA / contract notes |
| --- | --- | --- | --- |
| Procom | Toronto | [Find a job](https://procomservices.com/en-ca/find-a-job/) (~230+ roles when checked; paginated HTML). Titles such as “QA/QC Analyst”, “Automation QA Analyst”, “Sr QA Analyst” appear regularly. | Strong banking-sector volume; typical **4–6 month** contracts. No shipped extractor — HTML or browser automation. |
| S.i. Systems | Calgary / Toronto | [Search IT jobs (`q=QA`)](https://www.sisystems.com/search-it-jobs/?q=QA) — custom web API behind the UI; inspect DevTools network for JSON. | Frequent **Sr QA**, **Mobile QE** (WebdriverIO / Appium) and similar; lots of **GTA** roles; postings refresh often. |
| Synechron | NYC / Toronto / Montreal | Workday CXS: `POST https://synechron.wd1.myworkdayjobs.com/wday/cxs/synechron/SynechronCareers/jobs` with e.g. `{"limit":20,"offset":0,"searchText":"QA automation"}`. | Often **20+** QA automation-facing rows when searched; includes **Playwright** and related stacks. Add **`https://synechron.wd1.myworkdayjobs.com/en-US/SynechronCareers`** to **`workdayTenants`** / `WORKDAY_TENANTS` and use QA search terms (bundled extractor sends empty facets — see CLI snippet below for Canada facet example). |
| Capco | London / Toronto | Greenhouse: [boards-api capco](https://boards-api.greenhouse.io/v1/boards/capco/jobs?content=true) | Large board (**700+** roles when checked); dozens of QA-related titles but many **India / Poland** — **filter by location** (Toronto / Canada metros). Add **`capco`** to **`greenhouseCompanies`**. |
| Foilcon | Toronto | [CVViz — Foilcon](https://jobs.cvviz.com/foilcon) | Lower volume; roles such as **Systems Testing QA Specialist** show up when hiring. |
| Robert Half (Technology) | Toronto / intl | [Jobs — QA automation keyword](https://www.roberthalf.com/ca/en/jobs?keywords=qa+automation) (often Workday-backed listings) | Discover tenant/host patterns from network tab if you want bulk Workday-style pulls; otherwise HTML/manual. |
| Hays Canada | Toronto | [QA automation search](https://www.hays.ca/job-search/qa-automation) — `/job-detail/...` permalinks | Custom HTML board; manual import or custom crawler. |
| Randstad Digital | Toronto / Montreal | [Randstad Canada jobs](https://www.randstad.ca/jobs/) | Site listings plus heavy **LinkedIn** contract volume under the Randstad Digital brand; often manual cross-check. |
| Compunnel | NJ / Toronto | [Job search](https://www.compunnel.com/job-search/) — **JS-rendered** | **Quality Assurance Automation Engineer** (and similar) titles recur; use **Playwright** if automating. |
| Pyramid Consulting | Atlanta / Toronto | [Job openings](https://www.pyramidci.com/careers/job-openings/) | **QA Automation Engineer** style roles often cross-posted on **LinkedIn** under their brand. |
| Qualitest Group | NYC / global | [Careers](https://careers.qualitestgroup.com) | Pure-play QA consultancy — schema varies by region; inspect ATS per locale. |

### Tier 2 — Active in Canadian QA contracting but smaller / harder to scrape

| Firm | Notes |
| --- | --- |
| Jarvis Consulting Group (`jrvs.ca`) | Toronto; [jobs.jrvs.ca/home/](https://jobs.jrvs.ca/home/) is **JS-rendered** — **Playwright** if scraping; SDET roles also surface on LinkedIn. |
| Electric Mind (formerly Intelliware) | Toronto. Lever: [`electricmind` postings JSON](https://api.lever.co/v0/postings/electricmind?mode=json) — add **`electricmind`** to **`leverCompanies`** when hiring opens; volume can be **small** (e.g. only a handful of open reqs) and sometimes **no QA** until project demand spikes. |
| Light Consulting (LightCI) | Toronto. Lever: [`lightci` postings JSON](https://api.lever.co/v0/postings/lightci?mode=json). Often **empty** publicly — they may prefer direct outreach. |
| Yoush Consulting | Toronto IT staffing; **no structured job board** — SDET contracts often only on **LinkedIn** company presence. |
| Accenture / Deloitte / EY / KPMG / PwC | Big Four (Deloitte, EY, KPMG, PwC) plus Accenture: **Canadian banking-tech QA** contract pools; usually **Workday** per brand (e.g. **EY** → `ey.wd3.myworkdayjobs.com`). Add each tenant you care about to **`workdayTenants`**. |
| CGI | Montreal (large gov / enterprise contractor). [cgi.com/en/careers](https://www.cgi.com/en/careers) — HTML / embedded ATS; inspect for stable patterns. |
| TEKsystems | US-based with Canadian offices — [Find a job (Canada)](https://www.teksystems.com/en-ca/careers/find-a-job). Enterprise staffing; HTML/search UX varies. |
| Robertson & Company | Often referenced from **LinkedIn** QA contract threads — treat as manual / referral-led unless you find a stable feed. |
| Plan A Technologies | Legit shop — [planatechnologies.com](https://www.planatechnologies.com); **no public job board** in many periods; **LinkedIn / referrals**. |
### Ready-to-use CLI probes
|
||||
|
||||
Greenhouse (Capco) — Canada-oriented QA filter (tweak city regex as needed):
|
||||
|
||||
```bash
|
||||
curl -s 'https://boards-api.greenhouse.io/v1/boards/capco/jobs?content=true' \
|
||||
| jq '.jobs[]
|
||||
| select(.title | test("QA|SDET|Test|Automation|Quality"; "i"))
|
||||
| select(.location.name | test("Toronto|Canada|Montreal|Vancouver|Calgary|Ottawa"; "i"))
|
||||
| {title, location: .location.name, url: .absolute_url}'
|
||||
```
|
||||
|
||||
Synechron Workday — `QA automation` search plus example **country facet** (`appliedFacets.locationCountry`). Facet IDs are **tenant-specific** and **expire or change** — if this stops matching, capture a fresh id from the career site’s network panel.
|
||||
|
||||
```bash
curl -s 'https://synechron.wd1.myworkdayjobs.com/wday/cxs/synechron/SynechronCareers/jobs' \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"limit":50,"offset":0,"searchText":"QA automation","appliedFacets":{"locationCountry":["bc33aa3152ec42d4995f4791a106ed09"]}}' \
  | jq '.jobPostings[]
    | {title, location: .locationsText, url: ("https://synechron.wd1.myworkdayjobs.com" + .externalPath)}'
```
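The request body used in the Workday probe can also be assembled in code. The following is a minimal sketch, assuming only the fields shown in the `curl` call above; `workdaySearchBody` and `WorkdaySearchBody` are hypothetical names for illustration, not part of the bundled extractor.

```typescript
// Hypothetical helper: mirrors the JSON body from the curl probe above.
interface WorkdaySearchBody {
  limit: number;
  offset: number;
  searchText: string;
  appliedFacets: Record<string, string[]>;
}

function workdaySearchBody(
  searchText: string,
  countryFacetId?: string, // tenant-specific id captured from the career site
  offset = 0,
  limit = 50,
): WorkdaySearchBody {
  return {
    limit,
    offset,
    searchText,
    // Without a facet id, send an empty object and rely on searchText alone.
    appliedFacets: countryFacetId ? { locationCountry: [countryFacetId] } : {},
  };
}

console.log(JSON.stringify(workdaySearchBody("QA automation")));
```

Keeping the facet id optional makes it easy to fall back to a plain `searchText` query when a captured id stops matching.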
|
||||
|
||||
### Heads-up: Ionosphere and “Reqd”
|
||||
|
||||
- **Ionosphere Inc.** — May only show a **LinkedIn** presence plus legacy **Google Sites** (e.g. `sites.google.com/a/ionosphereinc.com/...`) **without** a real careers board or steady postings — **not** a dependable scrape target unless you confirm a dedicated jobs URL. (“Ionosphere” is also used by unrelated firms — disambiguate by legal name and domain.)
|
||||
- **Reqd** — No widely confirmed Canadian IT staffing brand spelled exactly **Reqd**. Possible leads to double-check: **Recroot**, **Reqroute**, **Recruitio**, **Required Technologies**, etc. If you have a **website or sample posting URL**, verify before adding to any automation list.
|
||||
|
||||
## Common problems
|
||||
|
||||
- **Workday without facets:** The bundled Workday extractor posts `appliedFacets: {}`. You still get roles via **`searchText`**; use tighter terms or post-filter for Canada.
|
||||
- **Greenhouse volume:** Boards like Capco are large — always filter by title regex and location text before importing hundreds of rows.
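The title and location post-filter described here can be sketched in TypeScript. This is an illustrative sketch only: the `GreenhouseJob` interface covers just the fields used by the jq probe on this page, and `filterCanadianQaJobs` is a hypothetical helper name.

```typescript
// Hypothetical shape: only the fields the jq probe reads.
interface GreenhouseJob {
  title: string;
  location?: { name?: string };
  absolute_url: string;
}

// Same patterns as the jq probe; tweak the city list as needed.
const TITLE_RE = /QA|SDET|Test|Automation|Quality/i;
const LOCATION_RE = /Toronto|Canada|Montreal|Vancouver|Calgary|Ottawa/i;

function filterCanadianQaJobs(jobs: GreenhouseJob[]) {
  return jobs
    .filter((job) => TITLE_RE.test(job.title))
    // Treat a missing location as an empty string so it simply fails the match.
    .filter((job) => LOCATION_RE.test(job.location?.name ?? ""))
    .map((job) => ({
      title: job.title,
      location: job.location?.name ?? "",
      url: job.absolute_url,
    }));
}

const sample: GreenhouseJob[] = [
  { title: "Senior SDET", location: { name: "Toronto, ON" }, absolute_url: "https://example.com/1" },
  { title: "QA Analyst", location: { name: "London, UK" }, absolute_url: "https://example.com/2" },
  { title: "Accountant", location: { name: "Toronto, ON" }, absolute_url: "https://example.com/3" },
];
console.log(filterCanadianQaJobs(sample));
```

Running both filters before mapping keeps large boards from flooding an import with unrelated rows.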
|
||||
|
||||
## Related pages
|
||||
|
||||
- [Extractors overview](/docs/next/extractors/overview)
|
||||
- [Canadian companies — QA-strong ATS](/docs/next/extractors/canadian-companies-qa-ats)
|
||||
- [Eluta](/docs/next/extractors/eluta) (Canada employer-direct RSS)
|
||||
- [QAJobsBoard](/docs/next/extractors/qajobsboard)
|
||||
- [Arc.dev](/docs/next/extractors/arcdev)
|
||||
- [Manual Import](/docs/next/extractors/manual)
|
||||
36
docs-site/docs/extractors/qajobsboard.md
Normal file
@ -0,0 +1,36 @@
|
||||
---
|
||||
id: qajobsboard
|
||||
title: QAJobsBoard Extractor
|
||||
description: QA and automation-focused listings via the board’s public JSON feed.
|
||||
sidebar_position: 16
|
||||
---
|
||||
|
||||
## What it is
|
||||
|
||||
[QAJobsBoard](https://www.qajobsboard.com) publishes postings through JobBoardly. The extractor calls:
|
||||
|
||||
`GET https://qajobsboard.jobboardly.com/jobs.json`
|
||||
|
||||
Implementation: `extractors/qajobsboard/manifest.ts`.
|
||||
|
||||
## Why it exists
|
||||
|
||||
The board carries a denser QA / SDET / automation signal than generic boards, and its categories often reflect tooling (Playwright, Cypress, Selenium). Geography skews India-remote unless you combine it with region filtering downstream.
|
||||
|
||||
## How to use it
|
||||
|
||||
1. Enable **QAJobsBoard** in pipeline sources (no credentials).
|
||||
2. Set **`qajobsboardMaxJobsPerTerm`** (default `100`) to cap mapped rows after term filtering.
|
||||
3. Tune **`searchTerms`** for QA-focused phrases (`QA automation`, `SDET`, `Playwright`, etc.).
|
||||
4. Optional: narrow by geography using orchestrator city/country filters where applicable.
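The interplay of `searchTerms` and the row cap can be illustrated with a small sketch. The `FeedRow` shape and the `filterAndCap` helper are assumptions made for illustration; the real feed fields and the extractor's exact per-term semantics may differ, and this version applies the cap to the combined result for simplicity.

```typescript
// Hypothetical row shape; the real JSON feed may use different fields.
interface FeedRow {
  title: string;
  tags?: string[];
}

function filterAndCap(rows: FeedRow[], terms: string[], maxRows: number): FeedRow[] {
  const lowered = terms.map((t) => t.toLowerCase());
  const kept: FeedRow[] = [];
  for (const row of rows) {
    if (kept.length >= maxRows) break; // cap applies after term filtering
    const haystack = `${row.title} ${(row.tags ?? []).join(" ")}`.toLowerCase();
    // With no terms configured, every row passes.
    if (lowered.length === 0 || lowered.some((t) => haystack.includes(t))) {
      kept.push(row);
    }
  }
  return kept;
}

const rows: FeedRow[] = [
  { title: "SDET (Playwright)", tags: ["playwright"] },
  { title: "Manual Tester" },
  { title: "Senior SDET", tags: ["cypress"] },
];
console.log(filterAndCap(rows, ["sdet"], 1));
```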
|
||||
|
||||
## Common problems
|
||||
|
||||
- **Few or no rows:** Terms may be too narrow; broaden titles or temporarily remove strict city filters.
|
||||
- **Irrelevant locales:** The feed is global; pair with geography or employer filters in your pipeline profile.
|
||||
|
||||
## Related pages
|
||||
|
||||
- [Extractors overview](/docs/next/extractors/overview)
|
||||
- [Canadian QA contracting firms](/docs/next/extractors/qa-contract-staffing-canada)
|
||||
- [Manual Import](/docs/next/extractors/manual)
|
||||
44
docs-site/docs/extractors/smartrecruiters.md
Normal file
@ -0,0 +1,44 @@
|
||||
---
|
||||
id: smartrecruiters
|
||||
title: SmartRecruiters Extractor
|
||||
description: Public SmartRecruiters Posting API discovery with per-company identifiers.
|
||||
sidebar_position: 14
|
||||
---
|
||||
|
||||
## What it is
|
||||
|
||||
Original API: [SmartRecruiters Posting API](https://developers.smartrecruiters.com/reference/v1listpostings)
|
||||
|
||||
The extractor lives in `extractors/smartrecruiters/manifest.ts`. It calls the public JSON endpoints (no API key for public boards), paginates active **PUBLIC** postings per configured company, optionally matches pipeline search terms against title and location, then loads each posting’s detail document so `jobUrl` / `applicationLink` and HTML descriptions resolve to the same URLs candidates see on `jobs.smartrecruiters.com`.
|
||||
|
||||
## Why it exists
|
||||
|
||||
Many large employers (including a significant share in Canada and the EU) publish on SmartRecruiters. This source complements Greenhouse, Lever, Ashby, and Workday by covering another major ATS with a predictable public API.
|
||||
|
||||
## How to use it
|
||||
|
||||
1. Find each employer’s **company identifier** — the path segment in their public board URL (for example `jobs.smartrecruiters.com/smartrecruiters/...` → `smartrecruiters`).
|
||||
2. In **Settings**, set **SmartRecruiters companies** (`smartrecruitersCompanies`) to a JSON array or comma/newline-separated list of those identifiers, or set `SMARTRECRUITERS_COMPANIES` in the environment.
|
||||
3. Optionally set **SmartRecruiters max jobs per company** (`smartrecruitersMaxJobsPerCompany`, default `100`, max `500`) to cap pagination after term filtering.
|
||||
4. Set your pipeline **search geography** and **search terms** as usual; terms filter postings by title, location text, and company display name.
|
||||
5. Enable **SmartRecruiters** in pipeline sources and run the pipeline.
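Step 2 accepts either a JSON array or a comma/newline-separated list. A minimal sketch of that kind of parsing follows; `parseCompanyList` is a hypothetical name and not necessarily how the extractor implements it.

```typescript
// Hypothetical parser: try JSON first, then fall back to comma/newline splitting.
function parseCompanyList(raw: string | undefined): string[] {
  if (!raw) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    if (Array.isArray(parsed)) {
      return parsed
        .filter((entry): entry is string => typeof entry === "string")
        .map((entry) => entry.trim())
        .filter(Boolean);
    }
  } catch {
    // Not valid JSON; treat the value as a plain separated list.
  }
  return raw
    .split(/[\n,]+/)
    .map((entry) => entry.trim())
    .filter(Boolean);
}

console.log(parseCompanyList('["smartrecruiters", "acme"]'));
console.log(parseCompanyList("smartrecruiters, acme\nexample"));
```

Accepting both forms lets the same value come from the Settings UI or from an environment variable without extra quoting rules.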
|
||||
|
||||
## Common problems
|
||||
|
||||
### SmartRecruiters never appears in source toggles
|
||||
|
||||
- No companies are configured (`smartrecruitersCompanies` / `SMARTRECRUITERS_COMPANIES` is empty).
|
||||
|
||||
### Zero jobs for a slug I know is correct
|
||||
|
||||
- The identifier must match the **public Posting API** path, not necessarily the marketing site name. Confirm listings exist on `jobs.smartrecruiters.com/<identifier>/`.
|
||||
|
||||
### Rate limiting or intermittent HTTP errors
|
||||
|
||||
- Reduce `smartrecruitersMaxJobsPerCompany` or the number of configured companies; each kept posting triggers a detail request after the list pass.
|
||||
|
||||
## Related pages
|
||||
|
||||
- [Extractors Overview](/docs/next/extractors/overview)
|
||||
- [Add an Extractor](/docs/next/workflows/add-an-extractor)
|
||||
- [Settings](/docs/next/features/settings)
|
||||
@ -0,0 +1,73 @@
|
||||
---
|
||||
id: supplementary-sources-access-notes
|
||||
title: Supplementary sources — access notes
|
||||
description: Credential gates, sources to skip, and practical alternatives for boards without a native JobOps extractor.
|
||||
sidebar_position: 15
|
||||
---
|
||||
|
||||
This page captures **verified access paths** and realistic integration effort for boards that are not fully wired as pipeline extractors. Pair it with [Extractors overview](/docs/next/extractors/overview), [Manual Import](/docs/next/extractors/manual), and [Add an Extractor](/docs/next/workflows/add-an-extractor).
|
||||
|
||||
## Credential-gated APIs (usually straightforward)
|
||||
|
||||
### Careerjet (v4)
|
||||
|
||||
- **Sign-up:** [careerjet.com/partners](https://www.careerjet.com/partners) → add your site → Access API → register **server egress IP(s)**.
|
||||
- **Endpoint:** `https://search.api.careerjet.net/v4/query`
|
||||
- **Important parameters:** `affid` (publisher key); **`user_ip`** (documented as the end user's IP and fraud-checked; for headless or server runs, send an IP you allowlisted); **`user_agent`**; and **`url`** (the referrer URL where results would appear; it maps to `CAREERJET_REFERER` and is sent as both the `url` query parameter and the `Referer` header). Requests missing `user_ip` or `user_agent` tend to return **403**.
|
||||
- **Tip:** Official Python client: [`careerjet/careerjet-api-client-python`](https://github.com/careerjet/careerjet-api-client-python).
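As an illustration of the parameters listed above, the query URL could be assembled as follows. This is a sketch under stated assumptions: `buildCareerjetUrl` and its option names are hypothetical, only the four parameters named on this page are set explicitly, and any search parameters go through `extra` because their names are not documented here.

```typescript
// Hypothetical helper; only affid, user_ip, user_agent, and url come from this page.
interface CareerjetQuery {
  affid: string;
  userIp: string;
  userAgent: string;
  pageUrl: string; // referrer URL where results would appear
  extra?: Record<string, string>; // search parameters, names not covered here
}

function buildCareerjetUrl(query: CareerjetQuery): string {
  const url = new URL("https://search.api.careerjet.net/v4/query");
  url.searchParams.set("affid", query.affid);
  url.searchParams.set("user_ip", query.userIp);
  url.searchParams.set("user_agent", query.userAgent);
  url.searchParams.set("url", query.pageUrl);
  for (const [key, value] of Object.entries(query.extra ?? {})) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

console.log(
  buildCareerjetUrl({
    affid: "YOUR_AFFID",
    userIp: "203.0.113.10", // documentation-range IP used as a placeholder
    userAgent: "JobOps/1.0",
    pageUrl: "https://example.com/jobs",
  }),
);
```

When sending the request, the same `pageUrl` value would also go in the `Referer` header, as the parameter notes describe.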
|
||||
|
||||
### Reed
|
||||
|
||||
- **Sign-up:** [reed.co.uk/developers](https://www.reed.co.uk/developers) — API key is issued via their contact flow (often ~1–2 business days).
|
||||
- **Endpoint:** e.g. `https://www.reed.co.uk/api/1.0/search?keywords=...&locationName=...&resultsToTake=100`
|
||||
- **Auth:** HTTP Basic — username = API key, password empty (`curl -u "YOUR_API_KEY:" ...`).
|
||||
- **Pagination:** `resultsToTake` max **100** per request; advance with `resultsToSkip`.
|
||||
- **Scope:** UK-centric; still useful for remote UK employers.
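The auth and pagination rules above can be sketched as two small helpers. The names `reedAuthHeader` and `reedPageUrls` are hypothetical; the endpoint and query parameters are the ones listed in the bullets.

```typescript
// HTTP Basic with the API key as username and an empty password.
function reedAuthHeader(apiKey: string): string {
  return `Basic ${btoa(`${apiKey}:`)}`;
}

// Build one URL per page, stepping resultsToSkip in blocks of at most 100.
function reedPageUrls(keywords: string, locationName: string, totalWanted: number): string[] {
  const pageSize = 100; // documented per-request maximum
  const urls: string[] = [];
  for (let skip = 0; skip < totalWanted; skip += pageSize) {
    const url = new URL("https://www.reed.co.uk/api/1.0/search");
    url.searchParams.set("keywords", keywords);
    url.searchParams.set("locationName", locationName);
    url.searchParams.set("resultsToTake", String(Math.min(pageSize, totalWanted - skip)));
    url.searchParams.set("resultsToSkip", String(skip));
    urls.push(url.toString());
  }
  return urls;
}

console.log(reedAuthHeader("YOUR_API_KEY"));
console.log(reedPageUrls("QA automation", "London", 250));
```

A caller would fetch each URL in turn with the `Authorization` header set, stopping early if a page returns fewer rows than requested.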
|
||||
|
||||
## Usually not worth scraping yourself
|
||||
|
||||
### Job Bank (Canada)
|
||||
|
||||
XML syndication requires **manual approval**: you need an active Canadian Business Number and an established Canadian-facing employment site. There is no simple public JSON/RSS feed for arbitrary candidates. The HTML site exists but is heavy JSF with anti-bot measures, so treat it as **skip** unless you qualify for the feed.
|
||||
|
||||
### Jobboom / Workopolis / BCJobs
|
||||
|
||||
No stable public API/RSS documented for generic job discovery. Third-party scrapers often need **residential proxies** and paid runtime — **skip or pay** for a maintained provider.
|
||||
|
||||
### Jobillico
|
||||
|
||||
Employer-oriented XML/OAuth API (posting and limited pull). Needs a **business account** — not a candidate discovery API.
|
||||
|
||||
### MyVisaJobs / H1BGrader
|
||||
|
||||
No practical public API for their enriched UX. Alternatives: **DOL OFLC LCA disclosure** quarterly CSVs (public, bulk), then join to your own job corpus; paid marketplace scrapers if you accept cost/compliance tradeoffs. Browser extensions may still be useful **personally**.
|
||||
|
||||
### Untapped (Jopwell)
|
||||
|
||||
Closed candidate platform — **no public job-posting API** for arbitrary ingestion.
|
||||
|
||||
## Practical additions in JobOps
|
||||
|
||||
### iCIMS (per-tenant HTML)
|
||||
|
||||
Many tenants expose anonymous search HTML suitable for stable scraping patterns, e.g.:
|
||||
|
||||
`https://{tenant}.icims.com/jobs/search?ss=1&searchKeyword=…&in_iframe=1`
|
||||
|
||||
Pagination often uses `pr=`; job URLs commonly follow `/jobs/{id}/{slug}/job`. Maintain a **tenant host list** (similar to Greenhouse/Lever company lists). This is **not** the authenticated iCIMS Job Portal API.
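The URL patterns above can be sketched as a pair of helpers. This is illustrative only: `icimsSearchUrl` and `extractJobRefs` are hypothetical names, the numeric job id and the meaning of the `pr` value are assumptions, and real tenants may deviate from these patterns.

```typescript
// Build the anonymous search URL for one tenant; `pr` is added only for later pages.
function icimsSearchUrl(tenant: string, keyword: string, page = 0): string {
  const url = new URL(`https://${tenant}.icims.com/jobs/search`);
  url.searchParams.set("ss", "1");
  url.searchParams.set("searchKeyword", keyword);
  url.searchParams.set("in_iframe", "1");
  if (page > 0) url.searchParams.set("pr", String(page)); // assumed page index
  return url.toString();
}

// Collect unique /jobs/{id}/{slug}/job references from a search page.
function extractJobRefs(html: string): { id: string; slug: string }[] {
  const pattern = /\/jobs\/(\d+)\/([^\/"?#]+)\/job/g; // assumes numeric ids
  const found = new Map<string, string>();
  let match: RegExpExecArray | null = pattern.exec(html);
  while (match !== null) {
    found.set(match[1], match[2]);
    match = pattern.exec(html);
  }
  const refs: { id: string; slug: string }[] = [];
  found.forEach((slug, id) => refs.push({ id, slug }));
  return refs;
}

console.log(icimsSearchUrl("acme", "QA automation", 1));
```

Deduplicating by id matters because the same posting is often linked more than once on a results page.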
|
||||
|
||||
Shipped extractor: **iCIMS tenants (HTML)** — configure `icimsTenants` (+ caps in Settings).
|
||||
|
||||
### BC T-Net RSS
|
||||
|
||||
Free aggregate RSS (example): `https://www.bctechnology.com/rss/jobs/tnetjobs.xml` — useful for **BC / Vancouver** tech roles; custom slices via the site’s RSS builder.
|
||||
|
||||
Shipped extractor: **BC T-Net (RSS)** — Canada geography only; optional `bctenetRssUrls` overrides default feed.
|
||||
|
||||
## Related pages
|
||||
|
||||
- [Extractors overview](/docs/next/extractors/overview)
|
||||
- [Eluta](/docs/next/extractors/eluta) (Canada RSS)
|
||||
- [SmartRecruiters](/docs/next/extractors/smartrecruiters)
|
||||
- [Canadian companies — QA-strong ATS](/docs/next/extractors/canadian-companies-qa-ats)
|
||||
- [Manual Import](/docs/next/extractors/manual)
|
||||
@ -41,7 +41,8 @@ That keeps runtime wiring dynamic while preserving compile-time safety in API an
|
||||
- append to `EXTRACTOR_SOURCE_IDS`
|
||||
- add an entry in `EXTRACTOR_SOURCE_METADATA`
|
||||
5. Ensure your extractor maps output to `CreateJobInput[]`.
|
||||
6. Run the full CI checks.
|
||||
6. Register it in `scripts/smoke-extractors.ts` (`ALL_TARGETS`): add one row per manifest so `npx tsx scripts/smoke-extractors.ts` exercises every shipped extractor (keyed sources `SKIP` until env vars exist).
|
||||
7. Run the full CI checks.
|
||||
|
||||
Example manifest:
|
||||
|
||||
@ -77,6 +78,22 @@ Subprocess extractors are supported. Keep subprocess spawning inside `run(contex
|
||||
- Add the new source id to `shared/src/extractors/index.ts`.
|
||||
- Confirm metadata exists for that source id.
|
||||
|
||||
### Smoke connectivity
|
||||
|
||||
After wiring settings/env, run:
|
||||
|
||||
```bash
|
||||
npx tsx scripts/smoke-extractors.ts myextractor
|
||||
```
|
||||
|
||||
Or the full suite (may take several minutes — JobSpy invokes Python, Hiring Cafe / startup.jobs may need browser deps):
|
||||
|
||||
```bash
|
||||
npx tsx scripts/smoke-extractors.ts
|
||||
```
|
||||
|
||||
Keep `ALL_TARGETS` in that script aligned with manifests under each `extractors/<name>/` package (`manifest.ts` or `src/manifest.ts`).
|
||||
|
||||
### Source appears in shared catalog but is unavailable at runtime
|
||||
|
||||
- The manifest was not loaded successfully.
|
||||
|
||||
@ -46,6 +46,11 @@ const sidebars: SidebarsConfig = {
|
||||
label: "Extractors",
|
||||
items: [
|
||||
"extractors/overview",
|
||||
"extractors/supplementary-sources-access-notes",
|
||||
"extractors/qajobsboard",
|
||||
"extractors/arcdev",
|
||||
"extractors/qa-contract-staffing-canada",
|
||||
"extractors/canadian-companies-qa-ats",
|
||||
"extractors/gradcracker",
|
||||
"extractors/jobspy",
|
||||
"extractors/adzuna",
|
||||
|
||||
15
extractors/arcdev/README.md
Normal file
@ -0,0 +1,15 @@
|
||||
# arcdev-extractor
|
||||
|
||||
Reads Arc remote-job listings from **SSR HTML**: each page embeds `__NEXT_DATA__` with `arcJobs` (Arc talent network) and `externalJobs` (partner postings).
|
||||
|
||||
Configure **`arcRemoteJobsPaths`** as URL paths on `https://arc.dev`, for example:
|
||||
|
||||
- `/remote-jobs/playwright`
|
||||
- `/remote-jobs/cypress`
|
||||
- `/remote-jobs/selenium`
|
||||
|
||||
Or set `ARC_REMOTE_JOBS_PATHS` (comma/newline-separated). Defaults include Playwright and Cypress stacks.
|
||||
|
||||
**Employer names:** External jobs include `company.name`. Arc-managed listings omit company names in the payload — those rows use employer `"Arc talent network"` while preserving titles and skill categories.
|
||||
|
||||
Cap the merged matches kept from each path with `arcMaxJobsPerPath` (applied separately per path; default `120`, clamped to the range 1 to 300).
|
||||
329
extractors/arcdev/manifest.ts
Normal file
@ -0,0 +1,329 @@
|
||||
/**
|
||||
* Arc.dev remote jobs — parse embedded Next.js __NEXT_DATA__ from SSR HTML.
|
||||
*
|
||||
* Listing URLs look like https://arc.dev/remote-jobs/playwright
|
||||
*/
|
||||
|
||||
import type {
|
||||
ExtractorManifest,
|
||||
ExtractorRunResult,
|
||||
} from "@shared/types/extractors";
|
||||
import type { CreateJobInput } from "@shared/types/jobs";
|
||||
|
||||
const ORIGIN = "https://arc.dev";
|
||||
|
||||
interface ArcCategory {
|
||||
name?: string;
|
||||
urlString?: string;
|
||||
}
|
||||
|
||||
interface ArcCompanyJson {
|
||||
randomKey?: string | null;
|
||||
urlString?: string;
|
||||
name?: string;
|
||||
}
|
||||
|
||||
interface ArcJobJson {
|
||||
randomKey?: string;
|
||||
title?: string;
|
||||
jobType?: string;
|
||||
jobRole?: string;
|
||||
urlString?: string;
|
||||
postedAt?: number;
|
||||
company?: ArcCompanyJson;
|
||||
categories?: ArcCategory[];
|
||||
requiredCountries?: string[];
|
||||
minAnnualSalary?: number | null;
|
||||
maxAnnualSalary?: number | null;
|
||||
minHourlyRate?: number | null;
|
||||
maxHourlyRate?: number | null;
|
||||
timeZone?: string | null;
|
||||
positionType?: string;
|
||||
experienceLevel?: string;
|
||||
experienceLevels?: string[];
|
||||
}
|
||||
|
||||
function readPaths(raw: string | undefined): string[] {
|
||||
if (!raw) return [];
|
||||
try {
|
||||
const parsed = JSON.parse(raw);
|
||||
if (Array.isArray(parsed)) {
|
||||
return parsed
|
||||
.map((entry) => (typeof entry === "string" ? entry.trim() : ""))
|
||||
.filter(Boolean);
|
||||
}
|
||||
} catch {
|
||||
// fall through
|
||||
}
|
||||
return raw
|
||||
.split(/[\n,;|]+/)
|
||||
.map((entry) => entry.trim())
|
||||
.filter(Boolean);
|
||||
}
|
||||
|
||||
function defaultArcPaths(): string[] {
|
||||
const raw =
|
||||
typeof process !== "undefined" ? process.env.ARC_REMOTE_JOBS_PATHS : "";
|
||||
const parsed = readPaths(raw);
|
||||
return parsed.length > 0
|
||||
? parsed
|
||||
: ["/remote-jobs/playwright", "/remote-jobs/cypress"];
|
||||
}
|
||||
|
||||
function asString(value: unknown): string | undefined {
|
||||
if (typeof value !== "string") return undefined;
|
||||
const t = value.trim();
|
||||
return t ? t : undefined;
|
||||
}
|
||||
|
||||
function categoryHaystack(job: ArcJobJson): string {
|
||||
if (!Array.isArray(job.categories)) return "";
|
||||
return job.categories
|
||||
.map((c) => `${c.name ?? ""} ${c.urlString ?? ""}`)
|
||||
.join(" ")
|
||||
.toLowerCase();
|
||||
}
|
||||
|
||||
function matchesTerm(job: ArcJobJson, term: string): boolean {
|
||||
const lower = term.toLowerCase();
|
||||
if (job.title?.toLowerCase().includes(lower)) return true;
|
||||
if (categoryHaystack(job).includes(lower)) return true;
|
||||
if (job.jobRole?.toLowerCase().includes(lower)) return true;
|
||||
if (job.positionType?.toLowerCase().includes(lower)) return true;
|
||||
if (
|
||||
Array.isArray(job.experienceLevels) &&
|
||||
job.experienceLevels.some((l) => l.toLowerCase().includes(lower))
|
||||
)
|
||||
return true;
|
||||
if (job.experienceLevel?.toLowerCase().includes(lower)) return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
function salaryParts(job: ArcJobJson): string | undefined {
|
||||
const bits: string[] = [];
|
||||
if (
|
||||
typeof job.minAnnualSalary === "number" &&
|
||||
typeof job.maxAnnualSalary === "number"
|
||||
) {
|
||||
bits.push(`USD ${job.minAnnualSalary}–${job.maxAnnualSalary} / yr`);
|
||||
} else if (typeof job.minAnnualSalary === "number") {
|
||||
bits.push(`USD ${job.minAnnualSalary}+ / yr`);
|
||||
}
|
||||
if (
|
||||
typeof job.minHourlyRate === "number" ||
|
||||
typeof job.maxHourlyRate === "number"
|
||||
) {
|
||||
bits.push(`$${job.minHourlyRate ?? "?"}–${job.maxHourlyRate ?? "?"} / hr`);
|
||||
}
|
||||
return bits.length > 0 ? bits.join("; ") : undefined;
|
||||
}
|
||||
|
||||
function locationLine(job: ArcJobJson): string {
|
||||
if (
|
||||
Array.isArray(job.requiredCountries) &&
|
||||
job.requiredCountries.length > 0
|
||||
) {
|
||||
return job.requiredCountries.join(", ");
|
||||
}
|
||||
if (job.timeZone) return job.timeZone;
|
||||
return "Remote";
|
||||
}
|
||||
|
||||
function postedIso(postedAt: number | undefined): string | undefined {
|
||||
if (typeof postedAt !== "number" || !Number.isFinite(postedAt))
|
||||
return undefined;
|
||||
return new Date(postedAt * 1000).toISOString();
|
||||
}
|
||||
|
||||
function parseNextPageProps(html: string): {
|
||||
arcJobs: ArcJobJson[];
|
||||
externalJobs: ArcJobJson[];
|
||||
} | null {
|
||||
const match = html.match(
|
||||
/<script id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/,
|
||||
);
|
||||
if (!match?.[1]) return null;
|
||||
try {
|
||||
const parsed = JSON.parse(match[1]) as {
|
||||
props?: { pageProps?: unknown };
|
||||
};
|
||||
const pageProps = parsed.props?.pageProps as
|
||||
| {
|
||||
arcJobs?: ArcJobJson[];
|
||||
externalJobs?: ArcJobJson[];
|
||||
}
|
||||
| undefined;
|
||||
if (!pageProps) return null;
|
||||
return {
|
||||
arcJobs: Array.isArray(pageProps.arcJobs) ? pageProps.arcJobs : [],
|
||||
externalJobs: Array.isArray(pageProps.externalJobs)
|
||||
? pageProps.externalJobs
|
||||
: [],
|
||||
};
|
||||
} catch {
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
function mapExternalJob(job: ArcJobJson): CreateJobInput | null {
|
||||
const rk = asString(job.randomKey);
|
||||
const slug = asString(job.urlString);
|
||||
if (!rk || !slug) return null;
|
||||
const jobUrl = `${ORIGIN}/remote-jobs/j/${slug}-${rk}`;
|
||||
  // asString already trims and maps empty strings to undefined.
  const employer = asString(job.company?.name) ?? "Unknown employer";
|
||||
|
||||
const disciplines = Array.isArray(job.categories)
|
||||
? job.categories
|
||||
.map((c) => c.name?.trim())
|
||||
.filter((v): v is string => Boolean(v))
|
||||
.join(", ")
|
||||
: undefined;
|
||||
|
||||
return {
|
||||
source: "arcdev",
|
||||
sourceJobId: slug,
|
||||
title: asString(job.title) ?? "Unknown Title",
|
||||
employer,
|
||||
jobUrl,
|
||||
applicationLink: jobUrl,
|
||||
location: locationLine(job),
|
||||
datePosted: postedIso(job.postedAt),
|
||||
jobType: asString(job.jobType),
|
||||
salary: salaryParts(job),
|
||||
disciplines,
|
||||
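    // An empty experienceLevels array joins to "", which `??` would not skip;
    // asString turns it into undefined so the single-level fallback applies.
    jobLevel:
      asString(job.experienceLevels?.join(", ")) ??
      asString(job.experienceLevel),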
|
||||
isRemote: true,
|
||||
};
|
||||
}
|
||||
|
||||
function mapArcManagedJob(job: ArcJobJson): CreateJobInput | null {
|
||||
const rk = asString(job.randomKey);
|
||||
const slug = asString(job.urlString);
|
||||
if (!rk || !slug) return null;
|
||||
const jobUrl = `${ORIGIN}/remote-jobs/details/${slug}-${rk}`;
|
||||
|
||||
const disciplines = Array.isArray(job.categories)
|
||||
? job.categories
|
||||
.map((c) => c.name?.trim())
|
||||
.filter((v): v is string => Boolean(v))
|
||||
.join(", ")
|
||||
: undefined;
|
||||
|
||||
const employer = "Arc talent network";
|
||||
|
||||
return {
|
||||
source: "arcdev",
|
||||
sourceJobId: `${slug}-${rk}`,
|
||||
title: asString(job.title) ?? "Unknown Title",
|
||||
employer,
|
||||
jobUrl,
|
||||
applicationLink: jobUrl,
|
||||
location: locationLine(job),
|
||||
datePosted: postedIso(job.postedAt),
|
||||
jobType: asString(job.jobType),
|
||||
salary: salaryParts(job),
|
||||
disciplines,
|
||||
jobLevel: asString(job.experienceLevel),
|
||||
jobFunction: asString(job.jobRole),
|
||||
isRemote: true,
|
||||
};
|
||||
}
|
||||
|
||||
export const manifest: ExtractorManifest = {
|
||||
id: "arcdev",
|
||||
displayName: "Arc.dev (remote)",
|
||||
providesSources: ["arcdev"],
|
||||
async run(context): Promise<ExtractorRunResult> {
|
||||
if (context.shouldCancel?.()) return { success: true, jobs: [] };
|
||||
|
||||
let paths = readPaths(context.settings.arcRemoteJobsPaths);
|
||||
if (paths.length === 0) paths = defaultArcPaths();
|
||||
|
||||
paths = paths.map((p) => (p.startsWith("/") ? p : `/${p}`));
|
||||
|
||||
const maxPerPath = context.settings.arcMaxJobsPerPath
|
||||
? Number.parseInt(context.settings.arcMaxJobsPerPath, 10)
|
||||
: 120;
|
||||
const cap = Number.isFinite(maxPerPath)
|
||||
? Math.min(Math.max(maxPerPath, 1), 300)
|
||||
: 120;
|
||||
|
||||
const terms = context.searchTerms.length > 0 ? context.searchTerms : [];
|
||||
const seen = new Set<string>();
|
||||
const out: CreateJobInput[] = [];
|
||||
|
||||
try {
|
||||
for (let i = 0; i < paths.length; i += 1) {
|
||||
if (context.shouldCancel?.()) break;
|
||||
const path = paths[i];
|
||||
const pageUrl = `${ORIGIN}${path}`;
|
||||
|
||||
context.onProgress?.({
|
||||
phase: "list",
|
||||
termsProcessed: i,
|
||||
termsTotal: paths.length,
|
||||
currentUrl: pageUrl,
|
||||
detail: `Arc.dev: fetching (${i + 1}/${paths.length}) ${path}`,
|
||||
});
|
||||
|
||||
const response = await fetch(pageUrl, {
|
||||
headers: {
|
||||
Accept: "text/html",
|
||||
"User-Agent":
|
||||
"Mozilla/5.0 (compatible; JobOps/1.0; +https://github.com)",
|
||||
},
|
||||
});
|
||||
if (!response.ok) {
|
||||
throw new Error(
|
||||
`Arc.dev "${path}" failed with status ${response.status}`,
|
||||
);
|
||||
}
|
||||
const html = await response.text();
|
||||
const payload = parseNextPageProps(html);
|
||||
if (!payload) {
|
||||
throw new Error(`Arc.dev "${path}": missing __NEXT_DATA__ payload`);
|
||||
}
|
||||
|
||||
let pathAdded = 0;
|
||||
|
||||
const labeled = [
|
||||
...payload.arcJobs.map((job) => ({ job, kind: "arc" as const })),
|
||||
...payload.externalJobs.map((job) => ({ job, kind: "ext" as const })),
|
||||
];
|
||||
|
||||
for (const { job: raw, kind } of labeled) {
|
||||
if (pathAdded >= cap) break;
|
||||
if (terms.length > 0 && !terms.some((t) => matchesTerm(raw, t))) {
|
||||
continue;
|
||||
}
|
||||
const mapped =
|
||||
kind === "arc" ? mapArcManagedJob(raw) : mapExternalJob(raw);
|
||||
if (!mapped) continue;
|
||||
if (seen.has(mapped.jobUrl)) continue;
|
||||
seen.add(mapped.jobUrl);
|
||||
out.push(mapped);
|
||||
pathAdded += 1;
|
||||
}
|
||||
|
||||
context.onProgress?.({
|
||||
phase: "list",
|
||||
termsProcessed: i + 1,
|
||||
termsTotal: paths.length,
|
||||
currentUrl: pageUrl,
|
||||
jobPagesProcessed: out.length,
|
||||
detail: `Arc.dev: ${path} → ${pathAdded} kept (${payload.arcJobs.length} arc + ${payload.externalJobs.length} external rows)`,
|
||||
});
|
||||
}
|
||||
|
||||
return { success: true, jobs: out };
|
||||
} catch (error) {
|
||||
const message = error instanceof Error ? error.message : "Unknown error";
|
||||
return { success: false, jobs: out, error: message };
|
||||
}
|
||||
},
|
||||
};
|
||||
|
||||
export default manifest;
|
||||
17
extractors/arcdev/package.json
Normal file
@ -0,0 +1,17 @@
|
||||
{
|
||||
"name": "arcdev-extractor",
|
||||
"version": "0.0.1",
|
||||
"type": "module",
|
||||
"description": "Arc.dev remote jobs extractor (__NEXT_DATA__ SSR)",
|
||||
"main": "manifest.ts",
|
||||
"dependencies": {
|
||||
"job-ops-shared": "^1.0.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@types/node": "^24.0.0",
|
||||
"typescript": "~5.9.0"
|
||||
},
|
||||
"scripts": {
|
||||
"check:types": "tsc --noEmit"
|
||||
}
|
||||
}
|
||||
17
extractors/arcdev/tsconfig.json
Normal file
@ -0,0 +1,17 @@
|
||||
{
|
||||
"compilerOptions": {
|
||||
"module": "ESNext",
|
||||
"moduleResolution": "bundler",
|
||||
"target": "ES2022",
|
||||
"outDir": "dist",
|
||||
"strict": true,
|
||||
"noUnusedLocals": false,
|
||||
"lib": ["ES2022", "DOM"],
|
||||
"types": ["node"],
|
||||
"baseUrl": ".",
|
||||
"paths": {
|
||||
"@shared/*": ["../../shared/src/*"]
|
||||
}
|
||||
},
|
||||
"include": ["./manifest.ts", "./src/**/*"]
|
||||
}
|
||||
9
extractors/bctenet/README.md
Normal file
@ -0,0 +1,9 @@
|
||||
# bctenet-extractor
|
||||
|
||||
Consumes the public **BC T-Net** aggregated tech-jobs RSS feed (no auth).
|
||||
|
||||
Default feed: `https://www.bctechnology.com/rss/jobs/tnetjobs.xml`
|
||||
|
||||
Controls: `bctenetRssUrls` (optional extra feeds from the T-Net RSS builder), `bctenetMaxJobsPerTerm`.
|
||||
|
||||
Canada-focused listings (British Columbia). The orchestrator skips this source when pipeline geography is not Canada (`countryAllowlist`).
|
||||
194
extractors/bctenet/manifest.ts
Normal file
@ -0,0 +1,194 @@
|
||||
/**
|
||||
* BC T-Net — public RSS aggregate of BC tech jobs.
|
||||
*
|
||||
* Default: https://www.bctechnology.com/rss/jobs/tnetjobs.xml
|
||||
*
|
||||
* Feeds may embed `<![CDATA[&]]>` inside `<link>` URLs — normalized before fetch.
|
||||
*/
|
||||
|
||||
import type {
|
||||
ExtractorManifest,
|
||||
ExtractorRunResult,
|
||||
} from "@shared/types/extractors";
|
||||
import type { CreateJobInput } from "@shared/types/jobs";
|
||||
|
||||
interface BcItem {
|
||||
title?: string;
|
||||
link?: string;
|
||||
guid?: string;
|
||||
description?: string;
|
||||
pubDate?: string;
|
||||
category?: string;
|
||||
}
|
||||
|
||||
function xmlText(xml: string, tag: string): string | undefined {
|
||||
const pattern = new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`);
|
||||
const match = xml.match(pattern);
|
||||
if (!match?.[1]) return undefined;
|
||||
return (
|
||||
match[1].replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, "$1").trim() || undefined
|
||||
);
|
||||
}
|
||||
|
||||
function normalizeFeedLink(raw: string): string {
|
||||
return raw.replace(/<!\[CDATA\[&\]\]>/g, "&").trim();
|
||||
}
|
||||
|
||||
function parseItems(xml: string): BcItem[] {
|
||||
const items: BcItem[] = [];
|
||||
const blocks = xml.match(/<item>([\s\S]*?)<\/item>/g) ?? [];
|
||||
|
||||
for (const raw of blocks) {
|
||||
const block = raw.replace(/^<item>/, "").replace(/<\/item>$/, "");
|
||||
const linkRaw = xmlText(block, "link");
|
||||
items.push({
|
||||
title: xmlText(block, "title"),
|
||||
link: linkRaw ? normalizeFeedLink(linkRaw) : undefined,
|
||||
guid: xmlText(block, "guid"),
|
||||
description: xmlText(block, "description"),
|
||||
pubDate: xmlText(block, "pubDate"),
|
||||
category: xmlText(block, "category"),
|
||||
});
|
||||
}
|
||||
|
||||
return items;
|
||||
}
|
||||
|
||||
function readUrls(raw: string | undefined): string[] {
|
||||
if (!raw) return [];
|
||||
try {
|
||||
const parsed = JSON.parse(raw);
|
||||
if (Array.isArray(parsed)) {
|
||||
return parsed
|
||||
.map((entry) => (typeof entry === "string" ? entry.trim() : ""))
|
||||
.filter(Boolean);
|
||||
}
|
||||
} catch {
|
||||
// fall through
|
||||
}
|
||||
return raw
|
||||
.split(/[\n|]+/)
|
||||
.map((entry) => entry.trim())
|
||||
.filter(Boolean);
|
||||
}
|
||||
|
||||
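function decodeHtmlEntities(html: string): string {
  // The entity patterns were decoded away when this page was rendered;
  // they are restored here. The final numeric form (&#x26;) is a best guess.
  return html
    .replace(/&amp;/g, "&")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&#x2F;/gi, "/")
    .replace(/&#x26;/gi, "&");
}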
|
||||
|
||||
function matchesTerm(item: BcItem, term: string): boolean {
|
||||
const lower = term.toLowerCase();
|
||||
const hay =
|
||||
`${item.title ?? ""} ${item.description ?? ""} ${item.category ?? ""}`.toLowerCase();
|
||||
return hay.includes(lower);
|
||||
}
|
||||
|
||||
function mapJob(item: BcItem): CreateJobInput | null {
|
||||
const jobUrl = item.link?.trim();
|
||||
if (!jobUrl) return null;
|
||||
|
||||
const title = item.title ? decodeHtmlEntities(item.title) : "Unknown Title";
|
||||
const employer = item.category?.trim() || "Unknown Employer";
|
||||
|
||||
return {
|
||||
source: "bctenet",
|
||||
sourceJobId: item.guid ?? jobUrl,
|
||||
title,
|
||||
employer,
|
||||
jobUrl,
|
||||
applicationLink: jobUrl,
|
||||
location: "British Columbia, Canada",
|
||||
datePosted: item.pubDate,
|
||||
jobDescription: item.description
|
||||
? decodeHtmlEntities(item.description)
|
||||
: undefined,
|
||||
};
|
||||
}
|
||||
|
||||
export const manifest: ExtractorManifest = {
|
||||
id: "bctenet",
|
||||
displayName: "BC T-Net (RSS)",
|
||||
providesSources: ["bctenet"],
|
||||
async run(context): Promise<ExtractorRunResult> {
|
||||
if (context.shouldCancel?.()) return { success: true, jobs: [] };
|
||||
|
||||
const defaults = ["https://www.bctechnology.com/rss/jobs/tnetjobs.xml"];
|
||||
const configured = readUrls(context.settings.bctenetRssUrls);
|
||||
const urls = configured.length > 0 ? configured : defaults;
|
||||
|
||||
const maxJobs = context.settings.bctenetMaxJobsPerTerm
|
||||
? Number.parseInt(context.settings.bctenetMaxJobsPerTerm, 10)
|
||||
: 400;
|
||||
const cap = Number.isFinite(maxJobs)
|
||||
? Math.min(Math.max(maxJobs, 1), 2000)
|
||||
: 400;
|
||||
|
||||
const terms = context.searchTerms.length > 0 ? context.searchTerms : [];
|
||||
const maxTotal = cap * Math.max(terms.length, 1);
|
||||
const seen = new Set<string>();
|
||||
const out: CreateJobInput[] = [];
|
||||
|
||||
try {
|
||||
for (let i = 0; i < urls.length; i += 1) {
|
||||
if (context.shouldCancel?.()) break;
|
||||
const rssUrl = urls[i];
|
||||
|
||||
context.onProgress?.({
|
||||
phase: "list",
|
||||
termsProcessed: i,
|
||||
termsTotal: urls.length,
|
||||
currentUrl: rssUrl,
|
||||
detail: `BC T-Net: fetching (${i + 1}/${urls.length})`,
|
||||
});
|
||||
|
||||
const response = await fetch(rssUrl, {
|
||||
headers: {
|
||||
Accept: "application/rss+xml, application/xml, text/xml",
|
||||
"User-Agent":
|
||||
"Mozilla/5.0 (compatible; JobOps/1.0) BC T-Net RSS consumer",
|
||||
},
|
||||
});
|
||||
if (!response.ok) {
|
||||
throw new Error(`BC T-Net RSS failed: ${response.status}`);
|
||||
}
|
||||
|
||||
const xml = await response.text();
|
||||
const items = parseItems(xml);
|
||||
|
||||
for (const item of items) {
|
||||
if (out.length >= maxTotal) break;
|
||||
if (terms.length > 0 && !terms.some((t) => matchesTerm(item, t))) {
|
||||
continue;
|
||||
}
|
||||
const mapped = mapJob(item);
|
||||
if (!mapped) continue;
|
||||
const key = mapped.sourceJobId || mapped.jobUrl;
|
||||
if (seen.has(key)) continue;
|
||||
seen.add(key);
|
||||
out.push(mapped);
|
||||
}
|
||||
|
||||
context.onProgress?.({
|
||||
phase: "list",
|
||||
termsProcessed: i + 1,
|
||||
termsTotal: urls.length,
|
||||
currentUrl: rssUrl,
|
||||
jobPagesProcessed: out.length,
|
||||
detail: `BC T-Net: ${items.length} items (${out.length} kept total)`,
|
||||
});
|
||||
}
|
||||
|
||||
return { success: true, jobs: out };
|
||||
} catch (error) {
|
||||
const message = error instanceof Error ? error.message : "Unknown error";
|
||||
return { success: false, jobs: out, error: message };
|
||||
}
|
||||
},
|
||||
};
|
||||
|
||||
export default manifest;
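The cap handling in `run` above (read a string setting, fall back to a default, clamp to a range) recurs across the manifests in this commit. A minimal sketch of factoring it into one helper follows; `readIntSetting` is a hypothetical name used only for illustration and is not part of the shared package.

```typescript
// Hypothetical helper (not in the repository): parse a string setting as an
// integer, fall back to a default when it is missing or not numeric, and
// clamp the result to [min, max].
function readIntSetting(
  raw: string | undefined,
  fallback: number,
  min: number,
  max: number,
): number {
  const parsed = raw ? Number.parseInt(raw, 10) : fallback;
  return Number.isFinite(parsed)
    ? Math.min(Math.max(parsed, min), max)
    : fallback;
}

// Mirrors the BC T-Net defaults shown above: default 400, clamped to 1..2000.
const cap = readIntSetting("5000", 400, 1, 2000);
console.log(cap);
```

With such a helper, each manifest would only need to state its own default and bounds.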
|
||||
17
extractors/bctenet/package.json
Normal file
@ -0,0 +1,17 @@
|
||||
{
|
||||
"name": "bctenet-extractor",
|
||||
"version": "0.0.1",
|
||||
"type": "module",
|
||||
"description": "BC T-Net public RSS job feed (British Columbia tech roles)",
|
||||
"main": "manifest.ts",
|
||||
"dependencies": {
|
||||
"job-ops-shared": "^1.0.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@types/node": "^24.0.0",
|
||||
"typescript": "~5.9.0"
|
||||
},
|
||||
"scripts": {
|
||||
"check:types": "tsc --noEmit"
|
||||
}
|
||||
}
|
||||
17
extractors/bctenet/tsconfig.json
Normal file
@ -0,0 +1,17 @@
|
||||
{
|
||||
"compilerOptions": {
|
||||
"module": "ESNext",
|
||||
"moduleResolution": "bundler",
|
||||
"target": "ES2022",
|
||||
"outDir": "dist",
|
||||
"strict": true,
|
||||
"noUnusedLocals": false,
|
||||
"lib": ["ES2022", "DOM"],
|
||||
"types": ["node"],
|
||||
"baseUrl": ".",
|
||||
"paths": {
|
||||
"@shared/*": ["../../shared/src/*"]
|
||||
}
|
||||
},
|
||||
"include": ["./manifest.ts", "./src/**/*"]
|
||||
}
|
||||
@ -8,8 +8,11 @@
|
||||
* Referer header and `user_ip` / `user_agent` query params. Register your
|
||||
* server's outbound IP(s) in the Careerjet publisher dashboard.
|
||||
*
|
||||
- * Env: CAREERJET_AFFID (API key), CAREERJET_REFERER (job-search page URL),
- * CAREERJET_USER_IP (must match an allowlisted IP), optional CAREERJET_USER_AGENT.
+ * Publisher signup: careerjet.com/partners → register allowlisted server IP(s).
+ * Env: CAREERJET_AFFID (API key for Basic auth username), CAREERJET_REFERER (maps to
+ * Referer header and the API `url` query param — page where results would appear),
+ * CAREERJET_USER_IP (public egress IP allowlisted in dashboard; fraud-checked),
+ * optional CAREERJET_USER_AGENT. Missing user_ip / user_agent yields 403 per docs.
|
||||
*/
|
||||
|
||||
import type {
|
||||
@ -117,6 +120,7 @@ async function fetchPage(args: {
|
||||
url.searchParams.set("page_size", String(args.pageSize));
|
||||
url.searchParams.set("user_ip", args.userIp);
|
||||
url.searchParams.set("user_agent", args.userAgent);
|
||||
+ url.searchParams.set("url", args.referer);
|
||||
|
||||
const response = await fetch(url.toString(), {
|
||||
headers: {
|
||||
@ -213,11 +217,7 @@ export const manifest: ExtractorManifest = {
|
||||
let collected = 0;
|
||||
let page = 1;
|
||||
let totalPages = Number.POSITIVE_INFINITY;
|
||||
- while (
- collected < maxJobsPerTerm &&
- page <= totalPages &&
- page <= 10
- ) {
+ while (collected < maxJobsPerTerm && page <= totalPages && page <= 10) {
|
||||
if (context.shouldCancel?.()) break;
|
||||
const body = await fetchPage({
|
||||
apiKey,
|
||||
|
||||
9
extractors/eluta/README.md
Normal file
@ -0,0 +1,9 @@
|
||||
# eluta-extractor

Pulls Canadian job postings from [Eluta.ca](https://www.eluta.ca) public **RSS** feeds (`https://www.eluta.ca/rss?location=...`). Listings are indexed from employer career sites.

- Configure **`elutaRssLocations`**: a JSON array (recommended) or newline-, semicolon-, or pipe-separated location strings passed to the `location` query parameter (e.g. `Toronto, ON`, `Vancouver, BC`). Location strings usually contain a comma, so avoid commas as a separator. RSS for very broad regions (e.g. a whole country) may return an empty feed; prefer metro/province strings.
- Optional: the `ELUTA_RSS_LOCATIONS` environment variable supplies a default (same format).
- **`elutaMaxJobsPerTerm`** caps how many RSS items are kept per search term after pipeline search-term filtering (default 100); the overall limit is this value multiplied by the number of search terms.

This source is **Canada-only**: it is automatically skipped when your search geography is not Canada.
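
As a rough illustration of the settings described above, the sketch below parses a locations value and builds one feed URL per location. `parseLocations` and `buildFeedUrl` are hypothetical names rather than the extractor's exports, and the separator set is an assumption that deliberately excludes commas so that values like `Toronto, ON` stay intact.

```typescript
// Hypothetical helpers for illustration; not the extractor's actual exports.
const RSS_BASE = "https://www.eluta.ca/rss";

function parseLocations(raw: string | undefined): string[] {
  if (!raw) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    if (Array.isArray(parsed)) {
      return parsed
        .filter((entry): entry is string => typeof entry === "string")
        .map((entry) => entry.trim())
        .filter(Boolean);
    }
  } catch {
    // Not JSON: fall through to delimiter splitting.
  }
  // Assumption: newline, semicolon, and pipe separate entries; commas do not.
  return raw
    .split(/[\n;|]+/)
    .map((entry) => entry.trim())
    .filter(Boolean);
}

function buildFeedUrl(location: string): string {
  return `${RSS_BASE}?location=${encodeURIComponent(location)}`;
}

const urls = parseLocations('["Toronto, ON", "Vancouver, BC"]').map(buildFeedUrl);
console.log(urls);
```

The first URL produced here has the same shape as the `https://www.eluta.ca/rss?location=...` pattern quoted in this README.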
|
||||
201
extractors/eluta/manifest.ts
Normal file
@ -0,0 +1,201 @@
|
||||
/**
|
||||
* Eluta.ca — public RSS feeds (Canadian employer-direct listings).
|
||||
*
|
||||
* Example: https://www.eluta.ca/rss?location=Toronto%2C%20ON
|
||||
*
|
||||
* No auth. Multiple `elutaRssLocations` values each fetch a feed; results are
|
||||
* merged and de-duplicated by guid/link.
|
||||
*/
|
||||
|
||||
import type {
|
||||
ExtractorManifest,
|
||||
ExtractorRunResult,
|
||||
} from "@shared/types/extractors";
|
||||
import type { CreateJobInput } from "@shared/types/jobs";
|
||||
|
||||
const RSS_BASE = "https://www.eluta.ca/rss";
|
||||
|
||||
interface ElutaItem {
|
||||
title?: string;
|
||||
link?: string;
|
||||
guid?: string;
|
||||
description?: string;
|
||||
pubDate?: string;
|
||||
employer?: string;
|
||||
location?: string;
|
||||
}
|
||||
|
||||
function xmlText(xml: string, tag: string): string | undefined {
|
||||
const pattern = new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`);
|
||||
const match = xml.match(pattern);
|
||||
if (!match?.[1]) return undefined;
|
||||
return (
|
||||
match[1].replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, "$1").trim() || undefined
|
||||
);
|
||||
}
|
||||
|
||||
function parseItems(xml: string): ElutaItem[] {
|
||||
const items: ElutaItem[] = [];
|
||||
const blocks = xml.match(/<item>([\s\S]*?)<\/item>/g) ?? [];
|
||||
|
||||
for (const raw of blocks) {
|
||||
const block = raw.replace(/^<item>/, "").replace(/<\/item>$/, "");
|
||||
items.push({
|
||||
title: xmlText(block, "title"),
|
||||
link: xmlText(block, "link"),
|
||||
guid: xmlText(block, "guid"),
|
||||
description: xmlText(block, "description"),
|
||||
pubDate: xmlText(block, "pubDate"),
|
||||
employer: xmlText(block, "employer"),
|
||||
location: xmlText(block, "location"),
|
||||
});
|
||||
}
|
||||
|
||||
return items;
|
||||
}
|
||||
|
||||
function readLocations(raw: string | undefined): string[] {
|
||||
if (!raw) return [];
|
||||
try {
|
||||
const parsed = JSON.parse(raw);
|
||||
if (Array.isArray(parsed)) {
|
||||
return parsed
|
||||
.map((entry) => (typeof entry === "string" ? entry.trim() : ""))
|
||||
.filter(Boolean);
|
||||
}
|
||||
} catch {
|
||||
// fall through
|
||||
}
|
||||
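  // Location strings such as "Toronto, ON" contain commas, so commas are not
  // treated as separators here; use newlines, semicolons, or pipes instead.
  return raw
    .split(/[\n;|]+/)
    .map((entry) => entry.trim())
    .filter(Boolean);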
|
||||
}
|
||||
|
||||
function decodeHtmlEntities(html: string): string {
  // Decode "&amp;" last so that "&amp;lt;" becomes "&lt;" rather than "<".
  return html
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}
|
||||
|
||||
function matchesTerm(item: ElutaItem, term: string): boolean {
|
||||
const lower = term.toLowerCase();
|
||||
if (item.title?.toLowerCase().includes(lower)) return true;
|
||||
if (item.description?.toLowerCase().includes(lower)) return true;
|
||||
if (item.employer?.toLowerCase().includes(lower)) return true;
|
||||
if (item.location?.toLowerCase().includes(lower)) return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
function mapJob(item: ElutaItem): CreateJobInput | null {
|
||||
const jobUrl = item.link || item.guid;
|
||||
if (!jobUrl) return null;
|
||||
|
||||
const title = item.title ? decodeHtmlEntities(item.title) : "Unknown Title";
|
||||
const employer = item.employer?.trim() || "Unknown Employer";
|
||||
const location = item.location?.trim() || "Canada";
|
||||
|
||||
return {
|
||||
source: "eluta",
|
||||
sourceJobId: item.guid ?? item.link,
|
||||
title,
|
||||
employer,
|
||||
jobUrl,
|
||||
applicationLink: jobUrl,
|
||||
location,
|
||||
datePosted: item.pubDate,
|
||||
jobDescription: item.description
|
||||
? decodeHtmlEntities(item.description)
|
||||
: undefined,
|
||||
};
|
||||
}
|
||||
|
||||
export const manifest: ExtractorManifest = {
|
||||
id: "eluta",
|
||||
displayName: "Eluta",
|
||||
providesSources: ["eluta"],
|
||||
async run(context): Promise<ExtractorRunResult> {
|
||||
if (context.shouldCancel?.()) return { success: true, jobs: [] };
|
||||
|
||||
const locations = readLocations(context.settings.elutaRssLocations);
|
||||
if (locations.length === 0) {
|
||||
return {
|
||||
success: true,
|
||||
jobs: [],
|
||||
error:
|
||||
'No Eluta RSS locations configured. Set ELUTA_RSS_LOCATIONS or elutaRssLocations, e.g. a JSON array such as ["Toronto, ON", "Vancouver, BC"].',
|
||||
};
|
||||
}
|
||||
|
||||
const maxJobs = context.settings.elutaMaxJobsPerTerm
|
||||
? Number.parseInt(context.settings.elutaMaxJobsPerTerm, 10)
|
||||
: 100;
|
||||
const cap = Number.isFinite(maxJobs)
|
||||
? Math.min(Math.max(maxJobs, 1), 500)
|
||||
: 100;
|
||||
|
||||
const terms = context.searchTerms.length > 0 ? context.searchTerms : [];
|
||||
const maxTotal = cap * Math.max(terms.length, 1);
|
||||
const seen = new Set<string>();
|
||||
const out: CreateJobInput[] = [];
|
||||
|
||||
try {
|
||||
for (let i = 0; i < locations.length; i += 1) {
|
||||
if (context.shouldCancel?.()) break;
|
||||
const loc = locations[i];
|
||||
const rssUrl = `${RSS_BASE}?location=${encodeURIComponent(loc)}`;
|
||||
|
||||
context.onProgress?.({
|
||||
phase: "list",
|
||||
termsProcessed: i,
|
||||
termsTotal: locations.length,
|
||||
currentUrl: rssUrl,
|
||||
detail: `Eluta: fetching RSS (${i + 1}/${locations.length}) — ${loc}`,
|
||||
});
|
||||
|
||||
const response = await fetch(rssUrl, {
|
||||
headers: {
|
||||
Accept: "application/rss+xml, application/xml, text/xml",
|
||||
"User-Agent": "JobOps/1.0 (+https://github.com) Eluta RSS consumer",
|
||||
},
|
||||
});
|
||||
if (!response.ok) {
|
||||
throw new Error(`Eluta RSS failed (${loc}): ${response.status}`);
|
||||
}
|
||||
|
||||
const xml = await response.text();
|
||||
const items = parseItems(xml);
|
||||
|
||||
for (const item of items) {
|
||||
if (out.length >= maxTotal) break;
|
||||
if (terms.length > 0 && !terms.some((t) => matchesTerm(item, t))) {
|
||||
continue;
|
||||
}
|
||||
const mapped = mapJob(item);
|
||||
if (!mapped) continue;
|
||||
const key = mapped.sourceJobId || mapped.jobUrl;
|
||||
if (seen.has(key)) continue;
|
||||
seen.add(key);
|
||||
out.push(mapped);
|
||||
}
|
||||
|
||||
context.onProgress?.({
|
||||
phase: "list",
|
||||
termsProcessed: i + 1,
|
||||
termsTotal: locations.length,
|
||||
currentUrl: rssUrl,
|
||||
jobPagesProcessed: out.length,
|
||||
detail: `Eluta: ${loc} → ${items.length} items in feed (${out.length} matched total)`,
|
||||
});
|
||||
}
|
||||
|
||||
return { success: true, jobs: out };
|
||||
} catch (error) {
|
||||
const message = error instanceof Error ? error.message : "Unknown error";
|
||||
return { success: false, jobs: out, error: message };
|
||||
}
|
||||
},
|
||||
};
|
||||
|
||||
export default manifest;
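
To make the regex-based parsing above easier to follow, here is a small self-contained sketch that runs the same `xmlText` pattern on a made-up item. The sample XML and its URL are invented for illustration; real Eluta feeds may differ in shape.

```typescript
// Same pattern as xmlText above: take the first <tag ...>...</tag> pair and
// unwrap any CDATA section inside it.
function xmlText(xml: string, tag: string): string | undefined {
  const pattern = new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`);
  const match = xml.match(pattern);
  if (!match?.[1]) return undefined;
  return (
    match[1].replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, "$1").trim() || undefined
  );
}

// Invented sample item body (assumption: not taken from a real feed).
const sampleItem = `
  <title><![CDATA[QA Analyst]]></title>
  <link>https://example.com/jobs/1</link>
  <guid isPermaLink="false">job-1</guid>
`;

const parsed = {
  title: xmlText(sampleItem, "title"),
  link: xmlText(sampleItem, "link"),
  guid: xmlText(sampleItem, "guid"),
  employer: xmlText(sampleItem, "employer"), // tag absent in the sample
};
console.log(parsed);
```

Because the pattern is not a real XML parser, it only finds the first matching tag pair and treats a missing or empty tag as `undefined`, which is why `mapJob` supplies fallbacks such as "Unknown Employer".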
|
||||
17
extractors/eluta/package.json
Normal file
@ -0,0 +1,17 @@
|
||||
{
|
||||
"name": "eluta-extractor",
|
||||
"version": "0.0.1",
|
||||
"type": "module",
|
||||
"description": "Eluta.ca RSS feed extractor (Canadian employer-direct listings)",
|
||||
"main": "manifest.ts",
|
||||
"dependencies": {
|
||||
"job-ops-shared": "^1.0.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@types/node": "^24.0.0",
|
||||
"typescript": "~5.9.0"
|
||||
},
|
||||
"scripts": {
|
||||
"check:types": "tsc --noEmit"
|
||||
}
|
||||
}
|
||||
17
extractors/eluta/tsconfig.json
Normal file
@ -0,0 +1,17 @@
|
||||
{
|
||||
"compilerOptions": {
|
||||
"module": "ESNext",
|
||||
"moduleResolution": "bundler",
|
||||
"target": "ES2022",
|
||||
"outDir": "dist",
|
||||
"strict": true,
|
||||
"noUnusedLocals": false,
|
||||
"lib": ["ES2022", "DOM"],
|
||||
"types": ["node"],
|
||||
"baseUrl": ".",
|
||||
"paths": {
|
||||
"@shared/*": ["../../shared/src/*"]
|
||||
}
|
||||
},
|
||||
"include": ["./manifest.ts", "./src/**/*"]
|
||||
}
|
||||
13
extractors/icims/README.md
Normal file
@ -0,0 +1,13 @@
|
||||
# icims-extractor

Fetches the **anonymous HTML search results** of each configured iCIMS tenant portal and extracts job links from them.

Example host: `careers-appliedsystems.icims.com`

Settings:

- `icimsTenants`: newline- or comma-separated hosts, or a JSON array of hosts (full URLs are reduced to their host)
- `icimsMaxJobsPerTenant`: maximum number of jobs accepted per tenant host (default `250`)
- `icimsMaxPagesPerSearch`: maximum number of `pr=` result pages requested per keyword query (default `10`)

This extractor does **not** use the authenticated iCIMS Job Portal API.
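
For orientation, the sketch below shows how one search-page URL could be assembled from a tenant host, a keyword, and a page number. The query parameters (`ss`, `in_iframe`, `searchKeyword`, `pr`) are the ones used by this extractor's manifest; `buildIcimsSearchUrl` is a hypothetical helper name, and individual tenants may not honour every parameter.

```typescript
// Hypothetical helper for illustration; the manifest builds this URL inline.
function buildIcimsSearchUrl(
  host: string,
  keyword: string,
  page: number,
): string {
  const query = new URLSearchParams({
    ss: "1",
    in_iframe: "1",
    searchKeyword: keyword,
    pr: String(page), // result page requested from the tenant portal
  });
  return `https://${host}/jobs/search?${query.toString()}`;
}

const searchUrl = buildIcimsSearchUrl(
  "careers-appliedsystems.icims.com",
  "qa engineer",
  2,
);
console.log(searchUrl);
```

`URLSearchParams` handles the escaping, so keywords containing spaces or punctuation need no manual encoding.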
|
||||
233
extractors/icims/manifest.ts
Normal file
@ -0,0 +1,233 @@
|
||||
/**
|
||||
* iCIMS tenant portal — anonymous HTML search (`/jobs/search`) pattern.
|
||||
*
|
||||
* Many tenants expose listings suitable for HTML extraction when loaded with
|
||||
* `ss=1` + `in_iframe=1`. Job links typically follow `/jobs/{id}/{slug}/job`.
|
||||
*/
|
||||
|
||||
import type {
|
||||
ExtractorManifest,
|
||||
ExtractorRunResult,
|
||||
} from "@shared/types/extractors";
|
||||
import type { CreateJobInput } from "@shared/types/jobs";
|
||||
|
||||
interface ParsedJobRow {
|
||||
url: string;
|
||||
title: string;
|
||||
}
|
||||
|
||||
function parseHosts(raw: string | undefined): string[] {
|
||||
if (!raw) return [];
|
||||
try {
|
||||
const parsed = JSON.parse(raw);
|
||||
if (Array.isArray(parsed)) {
|
||||
return parsed
|
||||
.map((entry) => (typeof entry === "string" ? entry.trim() : ""))
|
||||
.filter(Boolean);
|
||||
}
|
||||
} catch {
|
||||
// fall through
|
||||
}
|
||||
return raw
|
||||
.split(/[\n,]+/)
|
||||
.map((entry) => entry.trim())
|
||||
.filter(Boolean);
|
||||
}
|
||||
|
||||
function normalizeHost(hostOrUrl: string): string {
|
||||
const trimmed = hostOrUrl.trim();
|
||||
if (!trimmed) return "";
|
||||
try {
|
||||
if (trimmed.includes("://")) {
|
||||
const url = new URL(trimmed);
|
||||
return url.host;
|
||||
}
|
||||
} catch {
|
||||
return trimmed.replace(/^\/\//, "");
|
||||
}
|
||||
return trimmed.replace(/^\/\//, "");
|
||||
}
|
||||
|
||||
function canonicalJobUrl(url: string): string {
|
||||
try {
|
||||
const parsed = new URL(url);
|
||||
parsed.search = "";
|
||||
return parsed.toString();
|
||||
} catch {
|
||||
return url.replace(/\?[^#]*/, "");
|
||||
}
|
||||
}
|
||||
|
||||
function extractRows(html: string): ParsedJobRow[] {
|
||||
const out: ParsedJobRow[] = [];
|
||||
const seen = new Set<string>();
|
||||
|
||||
const primary =
|
||||
/<a[^>]*href="(https:\/\/[^"]+\/jobs\/\d+\/[^"]+\/job)(?:\?[^"]*)?"[^>]*title="\d+\s*-\s*([^"]+)"/gi;
|
||||
for (;;) {
|
||||
const match = primary.exec(html);
|
||||
if (match === null) break;
|
||||
const url = canonicalJobUrl(match[1]);
|
||||
const title = match[2]?.trim();
|
||||
if (!url || !title || seen.has(url)) continue;
|
||||
seen.add(url);
|
||||
out.push({ url, title });
|
||||
}
|
||||
|
||||
const fallback =
|
||||
/<a[^>]*href="(https:\/\/[^"]+\/jobs\/\d+\/([^"/]+)\/job)(?:\?[^"]*)?"[^>]*>/gi;
|
||||
for (;;) {
|
||||
const match = fallback.exec(html);
|
||||
if (match === null) break;
|
||||
const url = canonicalJobUrl(match[1]);
|
||||
const slug = match[2];
|
||||
if (!url || seen.has(url)) continue;
|
||||
seen.add(url);
|
||||
const title = slug
|
||||
? decodeURIComponent(slug.replace(/\+/g, " "))
|
||||
: "Unknown Title";
|
||||
out.push({ url, title });
|
||||
}
|
||||
|
||||
return out;
|
||||
}
|
||||
|
||||
function matchesTerm(row: ParsedJobRow, term: string): boolean {
|
||||
const lower = term.toLowerCase();
|
||||
return row.title.toLowerCase().includes(lower);
|
||||
}
|
||||
|
||||
function employerFromHost(host: string): string {
|
||||
const prefix = host.replace(/^careers-/, "").replace(/^careers\./, "");
|
||||
const base = prefix.replace(/\.icims\.com$/i, "");
|
||||
return base.replace(/[-_.]/g, " ").trim() || host;
|
||||
}
|
||||
|
||||
export const manifest: ExtractorManifest = {
|
||||
id: "icims",
|
||||
displayName: "iCIMS tenants (HTML)",
|
||||
providesSources: ["icims"],
|
||||
async run(context): Promise<ExtractorRunResult> {
|
||||
if (context.shouldCancel?.()) return { success: true, jobs: [] };
|
||||
|
||||
const hosts = parseHosts(context.settings.icimsTenants)
|
||||
.map(normalizeHost)
|
||||
.filter(Boolean);
|
||||
|
||||
if (hosts.length === 0) {
|
||||
return {
|
||||
success: false,
|
||||
jobs: [],
|
||||
error: "No icimsTenants configured",
|
||||
};
|
||||
}
|
||||
|
||||
const maxPagesRaw = context.settings.icimsMaxPagesPerSearch;
|
||||
const maxPages = maxPagesRaw ? Number.parseInt(maxPagesRaw, 10) : 10;
|
||||
const pages = Number.isFinite(maxPages)
|
||||
? Math.min(Math.max(maxPages, 1), 50)
|
||||
: 10;
|
||||
|
||||
const maxPerTenantRaw = context.settings.icimsMaxJobsPerTenant;
|
||||
const maxPerTenant = maxPerTenantRaw
|
||||
? Number.parseInt(maxPerTenantRaw, 10)
|
||||
: 250;
|
||||
const tenantCap = Number.isFinite(maxPerTenant)
|
||||
? Math.min(Math.max(maxPerTenant, 1), 2000)
|
||||
: 250;
|
||||
|
||||
const terms = context.searchTerms.length > 0 ? context.searchTerms : [""];
|
||||
|
||||
const jobs: CreateJobInput[] = [];
|
||||
const seenGlobal = new Set<string>();
|
||||
|
||||
try {
|
||||
let tenantIndex = 0;
|
||||
for (const rawHost of hosts) {
|
||||
if (context.shouldCancel?.()) break;
|
||||
|
||||
const host = normalizeHost(rawHost);
|
||||
tenantIndex += 1;
|
||||
let tenantCount = 0;
|
||||
|
||||
context.onProgress?.({
|
||||
phase: "list",
|
||||
termsProcessed: tenantIndex - 1,
|
||||
termsTotal: hosts.length,
|
||||
currentUrl: host,
|
||||
detail: `iCIMS tenant ${tenantIndex}/${hosts.length}: ${host}`,
|
||||
});
|
||||
|
||||
for (const term of terms) {
|
||||
if (tenantCount >= tenantCap) break;
|
||||
|
||||
for (let page = 1; page <= pages; page += 1) {
|
||||
if (tenantCount >= tenantCap) break;
|
||||
|
||||
const query = new URLSearchParams({
|
||||
ss: "1",
|
||||
in_iframe: "1",
|
||||
searchKeyword: term,
|
||||
pr: String(page),
|
||||
});
|
||||
|
||||
const searchUrl = `https://${host}/jobs/search?${query.toString()}`;
|
||||
const response = await fetch(searchUrl, {
|
||||
headers: {
|
||||
Accept: "text/html",
|
||||
"User-Agent":
|
||||
"Mozilla/5.0 (compatible; JobOps/1.0) iCIMS portal reader",
|
||||
},
|
||||
});
|
||||
|
||||
if (!response.ok) {
|
||||
throw new Error(
|
||||
`iCIMS fetch failed (${host}): ${response.status}`,
|
||||
);
|
||||
}
|
||||
|
||||
const html = await response.text();
|
||||
const rows = extractRows(html).filter((row) =>
|
||||
term ? matchesTerm(row, term) : true,
|
||||
);
|
||||
|
||||
if (rows.length === 0) break;
|
||||
|
||||
for (const row of rows) {
|
||||
if (tenantCount >= tenantCap) break;
|
||||
if (seenGlobal.has(row.url)) continue;
|
||||
|
||||
seenGlobal.add(row.url);
|
||||
tenantCount += 1;
|
||||
|
||||
jobs.push({
|
||||
source: "icims",
|
||||
sourceJobId: row.url,
|
||||
title: row.title,
|
||||
employer: employerFromHost(host),
|
||||
jobUrl: row.url,
|
||||
applicationLink: row.url,
|
||||
});
|
||||
}
|
||||
|
||||
context.onProgress?.({
|
||||
phase: "list",
|
||||
termsProcessed: tenantIndex - 1,
|
||||
termsTotal: hosts.length,
|
||||
currentUrl: host,
|
||||
jobPagesProcessed: jobs.length,
|
||||
detail: `iCIMS ${host}: page ${page}, +${rows.length} rows`,
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return { success: true, jobs };
|
||||
} catch (error) {
|
||||
const message = error instanceof Error ? error.message : "Unknown error";
|
||||
return { success: false, jobs, error: message };
|
||||
}
|
||||
},
|
||||
};
|
||||
|
||||
export default manifest;
|
||||
17
extractors/icims/package.json
Normal file
@ -0,0 +1,17 @@
|
||||
{
|
||||
"name": "icims-extractor",
|
||||
"version": "0.0.1",
|
||||
"type": "module",
|
||||
"description": "iCIMS tenant job portal HTML search (anonymous iframe-style listings)",
|
||||
"main": "manifest.ts",
|
||||
"dependencies": {
|
||||
"job-ops-shared": "^1.0.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@types/node": "^24.0.0",
|
||||
"typescript": "~5.9.0"
|
||||
},
|
||||
"scripts": {
|
||||
"check:types": "tsc --noEmit"
|
||||
}
|
||||
}
|
||||
17
extractors/icims/tsconfig.json
Normal file
@ -0,0 +1,17 @@
|
||||
{
|
||||
"compilerOptions": {
|
||||
"module": "ESNext",
|
||||
"moduleResolution": "bundler",
|
||||
"target": "ES2022",
|
||||
"outDir": "dist",
|
||||
"strict": true,
|
||||
"noUnusedLocals": false,
|
||||
"lib": ["ES2022", "DOM"],
|
||||
"types": ["node"],
|
||||
"baseUrl": ".",
|
||||
"paths": {
|
||||
"@shared/*": ["../../shared/src/*"]
|
||||
}
|
||||
},
|
||||
"include": ["./manifest.ts", "./src/**/*"]
|
||||
}
|
||||
10
extractors/qajobsboard/README.md
Normal file
@ -0,0 +1,10 @@
|
||||
# qajobsboard-extractor

Loads QA-focused postings from [QAJobsBoard](https://www.qajobsboard.com) via the host's public JSON feed:

`GET https://qajobsboard.jobboardly.com/jobs.json`

RSS is also published at `jobs.rss`; this extractor uses JSON because it exposes structured fields.

- Caps the number of kept postings via `qajobsboardMaxJobsPerTerm` (default `100`).
- Filters client-side: a posting is kept when any pipeline search term matches its title, categories, or description text (HTML stripped).
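
The client-side filter described above can be sketched roughly as follows. `JobLike`, `stripTags`, and `matchesAnyTerm` are hypothetical names, and the sample posting is invented; the field names (`title`, `categories[].name`, `description.html`) follow the shape used in this extractor's manifest.

```typescript
// Minimal posting shape, assumed from the fields the manifest reads.
interface JobLike {
  title?: string;
  categories?: { name?: string }[];
  description?: { html?: string };
}

// Replace tags with spaces and collapse whitespace.
function stripTags(html: string): string {
  return html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
}

// A posting is kept when there are no terms, or when any term occurs
// (case-insensitively) in the title, a category name, or the description text.
function matchesAnyTerm(job: JobLike, terms: string[]): boolean {
  if (terms.length === 0) return true;
  const haystack = [
    job.title ?? "",
    ...(job.categories ?? []).map((c) => c.name ?? ""),
    stripTags(job.description?.html ?? ""),
  ]
    .join(" ")
    .toLowerCase();
  return terms.some((term) => haystack.includes(term.toLowerCase()));
}

// Invented sample posting.
const sample: JobLike = {
  title: "Senior SDET",
  categories: [{ name: "Automation" }],
  description: { html: "<p>Build <b>Playwright</b> suites</p>" },
};
console.log(matchesAnyTerm(sample, ["playwright"]));
```

Matching is plain substring search, so a short term such as `qa` also matches inside longer words.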
|
||||
217
extractors/qajobsboard/manifest.ts
Normal file
@ -0,0 +1,217 @@
|
||||
/**
|
||||
* QAJobsBoard (JobBoardly) — public jobs listing JSON.
|
||||
*
|
||||
* https://qajobsboard.jobboardly.com/jobs.json
|
||||
*/
|
||||
|
||||
import type {
|
||||
ExtractorManifest,
|
||||
ExtractorRunResult,
|
||||
} from "@shared/types/extractors";
|
||||
import type { CreateJobInput } from "@shared/types/jobs";
|
||||
|
||||
const JOBS_URL = "https://qajobsboard.jobboardly.com/jobs.json";
|
||||
|
||||
interface JobCategory {
|
||||
name?: string;
|
||||
}
|
||||
|
||||
interface SalaryBand {
|
||||
schedule?: string;
|
||||
minimum?: number | null;
|
||||
maximum?: number | null;
|
||||
}
|
||||
|
||||
interface DescriptionBlock {
|
||||
html?: string;
|
||||
}
|
||||
|
||||
interface QaJobBoardlyJob {
|
||||
title?: string;
|
||||
arrangement?: string;
|
||||
location?: string;
|
||||
location_limits?: string[];
|
||||
published_at?: string;
|
||||
application_link?: string;
|
||||
description?: DescriptionBlock;
|
||||
company?: { name?: string; logo?: string };
|
||||
salary?: SalaryBand;
|
||||
categories?: JobCategory[];
|
||||
links?: { self?: string };
|
||||
}
|
||||
|
||||
function asString(value: unknown): string | undefined {
|
||||
if (typeof value !== "string") return undefined;
|
||||
const trimmed = value.trim();
|
||||
return trimmed ? trimmed : undefined;
|
||||
}
|
||||
|
||||
function decodeHtmlEntities(value: string): string {
  // Decode "&amp;" last so already-escaped entities are not decoded twice.
  return value
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&nbsp;/g, " ")
    .replace(/&amp;/g, "&");
}
|
||||
|
||||
function stripHtml(html: string): string {
|
||||
const noTags = html.replace(/<[^>]+>/g, " ");
|
||||
return decodeHtmlEntities(noTags).replace(/\s+/g, " ").trim();
|
||||
}
|
||||
|
||||
function salaryLabel(raw: SalaryBand | undefined): string | undefined {
|
||||
if (!raw) return undefined;
|
||||
const schedule = raw.schedule ? `${raw.schedule}: ` : "";
|
||||
if (
|
||||
typeof raw.minimum === "number" &&
|
||||
typeof raw.maximum === "number" &&
|
||||
Number.isFinite(raw.minimum) &&
|
||||
Number.isFinite(raw.maximum)
|
||||
) {
|
||||
return `${schedule}${raw.minimum}–${raw.maximum}`;
|
||||
}
|
||||
if (typeof raw.minimum === "number" && Number.isFinite(raw.minimum)) {
|
||||
return `${schedule}${raw.minimum}+`;
|
||||
}
|
||||
if (typeof raw.maximum === "number" && Number.isFinite(raw.maximum)) {
|
||||
return `${schedule}≤${raw.maximum}`;
|
||||
}
|
||||
return schedule.trim() || undefined;
|
||||
}
|
||||
|
||||
function locationLabel(job: QaJobBoardlyJob): string {
|
||||
const limits = Array.isArray(job.location_limits)
|
||||
? job.location_limits.filter(
|
||||
(v): v is string => typeof v === "string" && v.trim().length > 0,
|
||||
)
|
||||
: [];
|
||||
if (limits.length > 0) return limits.join(", ");
|
||||
const loc = asString(job.location);
|
||||
if (loc) return loc;
|
||||
return "Unknown";
|
||||
}
|
||||
|
||||
function matchesTerm(job: QaJobBoardlyJob, term: string): boolean {
|
||||
const lower = term.toLowerCase();
|
||||
if (job.title?.toLowerCase().includes(lower)) return true;
|
||||
const cats = Array.isArray(job.categories)
|
||||
? job.categories.map((c) => c.name?.toLowerCase() ?? "").join(" ")
|
||||
: "";
|
||||
if (cats.includes(lower)) return true;
|
||||
const html = job.description?.html ?? "";
|
||||
if (stripHtml(html).toLowerCase().includes(lower)) return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
function mapJob(raw: QaJobBoardlyJob): CreateJobInput | null {
|
||||
const jobUrl = asString(raw.links?.self);
|
||||
if (!jobUrl) return null;
|
||||
|
||||
const employer =
|
||||
asString(raw.company?.name)
|
||||
?.replace(/^[\s–-]+/, "")
|
||||
.trim() || "Unknown Employer";
|
||||
|
||||
const applicationLink = asString(raw.application_link) ?? jobUrl;
|
||||
|
||||
const descHtml = raw.description?.html;
|
||||
const jobDescription = descHtml ? stripHtml(descHtml) : undefined;
|
||||
|
||||
const salary = salaryLabel(raw.salary);
|
||||
|
||||
const cats = Array.isArray(raw.categories)
|
||||
? raw.categories
|
||||
.map((c) => c?.name?.trim())
|
||||
.filter((v): v is string => Boolean(v))
|
||||
.join(", ")
|
||||
: undefined;
|
||||
|
||||
return {
|
||||
source: "qajobsboard",
|
||||
sourceJobId: jobUrl.split("/").pop(),
|
||||
title: asString(raw.title) ?? "Unknown Title",
|
||||
employer,
|
||||
jobUrl,
|
||||
applicationLink,
|
||||
location: locationLabel(raw),
|
||||
isRemote: asString(raw.location)?.toLowerCase() === "remote",
|
||||
datePosted: asString(raw.published_at),
|
||||
jobDescription,
|
||||
jobType: asString(raw.arrangement),
|
||||
salary,
|
||||
disciplines: cats,
|
||||
companyLogo: asString(raw.company?.logo),
|
||||
};
|
||||
}
|
||||
|
||||
export const manifest: ExtractorManifest = {
|
||||
id: "qajobsboard",
|
||||
displayName: "QAJobsBoard",
|
||||
providesSources: ["qajobsboard"],
|
||||
async run(context): Promise<ExtractorRunResult> {
|
||||
if (context.shouldCancel?.()) return { success: true, jobs: [] };
|
||||
|
||||
const maxJobs = context.settings.qajobsboardMaxJobsPerTerm
|
||||
? Number.parseInt(context.settings.qajobsboardMaxJobsPerTerm, 10)
|
||||
: 100;
|
||||
const cap = Number.isFinite(maxJobs)
|
||||
? Math.min(Math.max(maxJobs, 1), 500)
|
||||
: 100;
|
||||
|
||||
const terms = context.searchTerms.length > 0 ? context.searchTerms : [];
|
||||
|
||||
context.onProgress?.({
|
||||
phase: "list",
|
||||
termsProcessed: 0,
|
||||
termsTotal: 1,
|
||||
currentUrl: JOBS_URL,
|
||||
detail: "QAJobsBoard: fetching jobs.json",
|
||||
});
|
||||
|
||||
try {
|
||||
const response = await fetch(JOBS_URL, {
|
||||
headers: { Accept: "application/json", "User-Agent": "JobOps/1.0" },
|
||||
});
|
||||
if (!response.ok) {
|
||||
throw new Error(
|
||||
`QAJobsBoard request failed with status ${response.status}`,
|
||||
);
|
||||
}
|
||||
const body = (await response.json()) as unknown;
|
||||
const rows = Array.isArray(body) ? body : [];
|
||||
|
||||
const seen = new Set<string>();
|
||||
const out: CreateJobInput[] = [];
|
||||
|
||||
for (const row of rows as QaJobBoardlyJob[]) {
|
||||
if (out.length >= cap) break;
|
||||
if (terms.length > 0 && !terms.some((t) => matchesTerm(row, t)))
|
||||
continue;
|
||||
const mapped = mapJob(row);
|
||||
if (!mapped) continue;
|
||||
const key = mapped.sourceJobId || mapped.jobUrl;
|
||||
if (seen.has(key)) continue;
|
||||
seen.add(key);
|
||||
out.push(mapped);
|
||||
}
|
||||
|
||||
context.onProgress?.({
|
||||
phase: "list",
|
||||
termsProcessed: 1,
|
||||
termsTotal: 1,
|
||||
currentUrl: JOBS_URL,
|
||||
jobPagesProcessed: out.length,
|
||||
detail: `QAJobsBoard: ${out.length} matched (${rows.length} total listings)`,
|
||||
});
|
||||
|
||||
return { success: true, jobs: out };
|
||||
} catch (error) {
|
||||
const message = error instanceof Error ? error.message : "Unknown error";
|
||||
return { success: false, jobs: [], error: message };
|
||||
}
|
||||
},
|
||||
};
|
||||
|
||||
export default manifest;
|
||||
17
extractors/qajobsboard/package.json
Normal file
@ -0,0 +1,17 @@
|
||||
{
|
||||
"name": "qajobsboard-extractor",
|
||||
"version": "0.0.1",
|
||||
"type": "module",
|
||||
"description": "QAJobsBoard (JobBoardly) public jobs.json extractor",
|
||||
"main": "manifest.ts",
|
||||
"dependencies": {
|
||||
"job-ops-shared": "^1.0.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@types/node": "^24.0.0",
|
||||
"typescript": "~5.9.0"
|
||||
},
|
||||
"scripts": {
|
||||
"check:types": "tsc --noEmit"
|
||||
}
|
||||
}
|
||||
17
extractors/qajobsboard/tsconfig.json
Normal file
@ -0,0 +1,17 @@
|
||||
{
|
||||
"compilerOptions": {
|
||||
"module": "ESNext",
|
||||
"moduleResolution": "bundler",
|
||||
"target": "ES2022",
|
||||
"outDir": "dist",
|
||||
"strict": true,
|
||||
"noUnusedLocals": false,
|
||||
"lib": ["ES2022", "DOM"],
|
||||
"types": ["node"],
|
||||
"baseUrl": ".",
|
||||
"paths": {
|
||||
"@shared/*": ["../../shared/src/*"]
|
||||
}
|
||||
},
|
||||
"include": ["./manifest.ts", "./src/**/*"]
|
||||
}
|
||||
11
extractors/smartrecruiters/README.md
Normal file
@ -0,0 +1,11 @@
|
||||
# smartrecruiters-extractor

Fetches public job postings from the [SmartRecruiters Posting API](https://developers.smartrecruiters.com/reference/v1listpostings):

`GET https://api.smartrecruiters.com/v1/companies/{companyIdentifier}/postings`

No API key is required for public listings. Configure one **company identifier** per employer (the slug from `jobs.smartrecruiters.com/<identifier>/...` or `careers.smartrecruiters.com/<identifier>`).

- Set `smartrecruitersCompanies` (a JSON array, or comma- or newline-separated identifiers), or `SMARTRECRUITERS_COMPANIES` in the environment.
- Optional: `smartrecruitersMaxJobsPerCompany` caps how many postings are kept **per company** after search-term filtering (default 100).
- The manifest fetches the posting **details** for each match so that `jobUrl`, `applicationLink`, and the HTML description are populated.
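
As a sketch of the two endpoints involved, the helpers below build the list URL and the per-posting detail URL. The function names and the `ExampleCo` identifier are hypothetical, and the `limit` / `offset` query parameters are an assumption inferred from the list response fields of the same name; check the linked API reference before relying on them.

```typescript
const API_BASE = "https://api.smartrecruiters.com/v1/companies";

// Hypothetical helper: list endpoint for one company, one page of results.
// Assumption: paging uses `limit` and `offset` query parameters.
function listPostingsUrl(
  companyIdentifier: string,
  offset = 0,
  limit = 100,
): string {
  const url = new URL(
    `${API_BASE}/${encodeURIComponent(companyIdentifier)}/postings`,
  );
  url.searchParams.set("limit", String(limit));
  url.searchParams.set("offset", String(offset));
  return url.toString();
}

// Hypothetical helper: detail endpoint for a single posting.
function postingDetailUrl(companyIdentifier: string, postingId: string): string {
  return `${API_BASE}/${encodeURIComponent(companyIdentifier)}/postings/${encodeURIComponent(postingId)}`;
}

console.log(listPostingsUrl("ExampleCo", 100));
console.log(postingDetailUrl("ExampleCo", "123"));
```

Fetching details only for postings that pass the search-term filter keeps the number of detail requests proportional to the matches rather than to the full listing.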
|
||||
287
extractors/smartrecruiters/manifest.ts
Normal file
@ -0,0 +1,287 @@
|
||||
/**
|
||||
* SmartRecruiters public Posting API (no auth for public boards).
|
||||
*
|
||||
* https://developers.smartrecruiters.com/reference/v1listpostings
|
||||
* GET https://api.smartrecruiters.com/v1/companies/{companyIdentifier}/postings
|
||||
* GET https://api.smartrecruiters.com/v1/companies/{companyIdentifier}/postings/{postingId}
|
||||
*/
|
||||
|
||||
import type {
|
||||
ExtractorManifest,
|
||||
ExtractorRunResult,
|
||||
} from "@shared/types/extractors";
|
||||
import type { CreateJobInput } from "@shared/types/jobs";
|
||||
|
||||
const LIST_LIMIT = 100;
|
||||
|
||||
interface SrCompany {
|
||||
identifier?: string;
|
||||
name?: string;
|
||||
}
|
||||
|
||||
interface SrLocation {
|
||||
fullLocation?: string;
|
||||
city?: string;
|
||||
region?: string;
|
||||
country?: string;
|
||||
remote?: boolean;
|
||||
hybrid?: boolean;
|
||||
}
|
||||
|
||||
interface SrPostingSummary {
|
||||
id?: string;
|
||||
name?: string;
|
||||
releasedDate?: string;
|
||||
company?: SrCompany;
|
||||
location?: SrLocation;
|
||||
typeOfEmployment?: { label?: string };
|
||||
experienceLevel?: { id?: string; label?: string };
|
||||
}
|
||||
|
||||
interface SrListResponse {
|
||||
content?: SrPostingSummary[];
|
||||
totalFound?: number;
|
||||
offset?: number;
|
||||
limit?: number;
|
||||
}
|
||||
|
||||
interface SrDetail extends SrPostingSummary {
|
||||
postingUrl?: string;
|
||||
applyUrl?: string;
|
||||
jobAd?: {
|
||||
sections?: Record<string, { title?: string; text?: string } | undefined>;
|
||||
};
|
||||
}
|
||||
|
||||
function asString(value: unknown): string | undefined {
|
||||
if (typeof value !== "string") return undefined;
|
||||
const trimmed = value.trim();
|
||||
return trimmed ? trimmed : undefined;
|
||||
}
|
||||
|
||||
function readCompanies(raw: string | undefined): string[] {
|
||||
if (!raw) return [];
|
||||
try {
|
||||
const parsed = JSON.parse(raw);
|
||||
if (Array.isArray(parsed)) {
|
||||
return parsed
|
||||
.map((entry) => (typeof entry === "string" ? entry.trim() : ""))
|
||||
.filter(Boolean);
|
||||
}
|
||||
} catch {
|
||||
// fall through
|
||||
}
|
||||
return raw
|
||||
.split(/[\n,;|]+/)
|
||||
.map((entry) => entry.trim())
|
||||
.filter(Boolean);
|
||||
}
|
||||
|
||||
function decodeHtmlEntities(value: string): string {
|
||||
return value
|
||||
.replace(/&amp;/g, "&")
.replace(/&lt;/g, "<")
.replace(/&gt;/g, ">")
.replace(/&quot;/g, '"')
.replace(/&#39;/g, "'")
.replace(/&nbsp;/g, " ");
|
||||
}
|
||||
|
||||
function stripHtml(html: string): string {
|
||||
const noTags = html.replace(/<[^>]+>/g, " ");
|
||||
return decodeHtmlEntities(noTags).replace(/\s+/g, " ").trim();
|
||||
}
|
||||
|
||||
function locationString(loc: SrLocation | undefined): string {
|
||||
if (!loc) return "Unknown";
|
||||
const full = asString(loc.fullLocation);
|
||||
if (full) return full;
|
||||
const parts = [loc.city, loc.region, loc.country]
|
||||
.map((p) => asString(p))
|
||||
.filter(Boolean) as string[];
|
||||
return parts.length > 0 ? parts.join(", ") : "Unknown";
|
||||
}
|
||||
|
||||
function extractDescription(detail: SrDetail): string | undefined {
|
||||
const sections = detail.jobAd?.sections;
|
||||
if (!sections || typeof sections !== "object") return undefined;
|
||||
const chunks: string[] = [];
|
||||
for (const block of Object.values(sections)) {
|
||||
const text = block && typeof block.text === "string" ? block.text : "";
|
||||
if (text.trim()) chunks.push(text);
|
||||
}
|
||||
if (chunks.length === 0) return undefined;
|
||||
return stripHtml(chunks.join("\n\n"));
|
||||
}
|
||||
|
||||
function matchesTerm(summary: SrPostingSummary, term: string): boolean {
|
||||
const lower = term.toLowerCase();
|
||||
if (summary.name?.toLowerCase().includes(lower)) return true;
|
||||
if (locationString(summary.location).toLowerCase().includes(lower))
|
||||
return true;
|
||||
if (summary.company?.name?.toLowerCase().includes(lower)) return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
async function fetchPostingsPage(
|
||||
company: string,
|
||||
offset: number,
|
||||
): Promise<SrListResponse> {
|
||||
const base = `https://api.smartrecruiters.com/v1/companies/${encodeURIComponent(company)}/postings`;
|
||||
const url = `${base}?destination=PUBLIC&limit=${LIST_LIMIT}&offset=${offset}`;
|
||||
const response = await fetch(url, {
|
||||
headers: { Accept: "application/json" },
|
||||
});
|
||||
if (response.status === 404) {
|
||||
return { content: [], totalFound: 0, offset: 0, limit: LIST_LIMIT };
|
||||
}
|
||||
if (!response.ok) {
|
||||
throw new Error(
|
||||
`SmartRecruiters list for "${company}" failed with status ${response.status}`,
|
||||
);
|
||||
}
|
||||
return (await response.json()) as SrListResponse;
|
||||
}
|
||||
|
||||
async function fetchPostingDetail(
|
||||
company: string,
|
||||
postingId: string,
|
||||
): Promise<SrDetail | null> {
|
||||
const url = `https://api.smartrecruiters.com/v1/companies/${encodeURIComponent(company)}/postings/${encodeURIComponent(postingId)}`;
|
||||
const response = await fetch(url, {
|
||||
headers: { Accept: "application/json" },
|
||||
});
|
||||
if (!response.ok) return null;
|
||||
return (await response.json()) as SrDetail;
|
||||
}
|
||||
|
||||
function mapDetailToJob(detail: SrDetail): CreateJobInput | null {
|
||||
const postingId = asString(detail.id);
|
||||
if (!postingId) return null;
|
||||
|
||||
const jobUrl = asString(detail.applyUrl) ?? asString(detail.postingUrl);
|
||||
if (!jobUrl) return null;
|
||||
|
||||
const employer =
|
||||
asString(detail.company?.name) ??
|
||||
asString(detail.company?.identifier) ??
|
||||
"Unknown Employer";
|
||||
|
||||
const jobType = asString(detail.typeOfEmployment?.label);
|
||||
const jobLevel = asString(detail.experienceLevel?.label);
|
||||
|
||||
return {
|
||||
source: "smartrecruiters",
|
||||
sourceJobId: postingId,
|
||||
title: asString(detail.name) ?? "Unknown Title",
|
||||
employer,
|
||||
jobUrl,
|
||||
applicationLink: asString(detail.applyUrl) ?? jobUrl,
|
||||
location: locationString(detail.location),
|
||||
isRemote: detail.location?.remote === true,
|
||||
datePosted: asString(detail.releasedDate),
|
||||
jobDescription: extractDescription(detail),
|
||||
jobType: jobType || undefined,
|
||||
jobLevel: jobLevel || undefined,
|
||||
};
|
||||
}
|
||||
|
||||
export const manifest: ExtractorManifest = {
|
||||
id: "smartrecruiters",
|
||||
displayName: "SmartRecruiters (ATS)",
|
||||
providesSources: ["smartrecruiters"],
|
||||
async run(context): Promise<ExtractorRunResult> {
|
||||
if (context.shouldCancel?.()) return { success: true, jobs: [] };
|
||||
|
||||
const companies = readCompanies(context.settings.smartrecruitersCompanies);
|
||||
if (companies.length === 0) {
|
||||
return {
|
||||
success: true,
|
||||
jobs: [],
|
||||
error:
|
||||
"No SmartRecruiters companies configured. Set SMARTRECRUITERS_COMPANIES or smartrecruitersCompanies (comma- or newline-separated company identifiers).",
|
||||
};
|
||||
}
|
||||
|
||||
const maxPerCompany = context.settings.smartrecruitersMaxJobsPerCompany
|
||||
? Number.parseInt(context.settings.smartrecruitersMaxJobsPerCompany, 10)
|
||||
: 100;
|
||||
const cap = Number.isFinite(maxPerCompany)
|
||||
? Math.min(Math.max(maxPerCompany, 1), 500)
|
||||
: 100;
|
||||
|
||||
const terms = context.searchTerms.length > 0 ? context.searchTerms : [];
|
||||
const seen = new Set<string>();
|
||||
const out: CreateJobInput[] = [];
|
||||
|
||||
try {
|
||||
for (let i = 0; i < companies.length; i += 1) {
|
||||
if (context.shouldCancel?.()) break;
|
||||
const company = companies[i];
|
||||
context.onProgress?.({
|
||||
phase: "list",
|
||||
termsProcessed: i,
|
||||
termsTotal: companies.length,
|
||||
currentUrl: company,
|
||||
detail: `SmartRecruiters: ${company} (${i + 1}/${companies.length})`,
|
||||
});
|
||||
|
||||
const matchedSummaries: SrPostingSummary[] = [];
|
||||
let offset = 0;
|
||||
let totalFound = Number.POSITIVE_INFINITY;
|
||||
|
||||
while (matchedSummaries.length < cap && offset < totalFound) {
|
||||
if (context.shouldCancel?.()) break;
|
||||
const page = await fetchPostingsPage(company, offset);
|
||||
const batch = Array.isArray(page.content) ? page.content : [];
|
||||
totalFound =
|
||||
typeof page.totalFound === "number" ? page.totalFound : offset;
|
||||
if (batch.length === 0) break;
|
||||
|
||||
for (const row of batch) {
|
||||
if (matchedSummaries.length >= cap) break;
|
||||
if (terms.length > 0 && !terms.some((t) => matchesTerm(row, t))) {
|
||||
continue;
|
||||
}
|
||||
matchedSummaries.push(row);
|
||||
}
|
||||
|
||||
offset += batch.length;
|
||||
if (offset >= totalFound) break;
|
||||
}
|
||||
|
||||
let added = 0;
|
||||
for (const summary of matchedSummaries) {
|
||||
if (context.shouldCancel?.()) break;
|
||||
const id = asString(summary.id);
|
||||
if (!id) continue;
|
||||
const detail = await fetchPostingDetail(company, id);
|
||||
if (!detail) continue;
|
||||
const mapped = mapDetailToJob(detail);
|
||||
if (!mapped) continue;
|
||||
const key = mapped.sourceJobId || mapped.jobUrl;
|
||||
if (seen.has(key)) continue;
|
||||
seen.add(key);
|
||||
out.push(mapped);
|
||||
added += 1;
|
||||
}
|
||||
|
||||
context.onProgress?.({
|
||||
phase: "list",
|
||||
termsProcessed: i + 1,
|
||||
termsTotal: companies.length,
|
||||
currentUrl: company,
|
||||
jobPagesProcessed: out.length,
|
||||
detail: `SmartRecruiters: ${company} → ${added} jobs (${out.length} total)`,
|
||||
});
|
||||
}
|
||||
} catch (error) {
|
||||
const message = error instanceof Error ? error.message : "Unknown error";
|
||||
return { success: false, jobs: out, error: message };
|
||||
}
|
||||
|
||||
return { success: true, jobs: out };
|
||||
},
|
||||
};
|
||||
|
||||
export default manifest;
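
One detail worth keeping in mind when decoding HTML entities with chained replacements, as the `stripHtml` path in this manifest does: the order of the replacements affects the result. The sketch below is an illustration only, not the committed code. It assumes the six common entities shown in its table and decodes `&amp;` last, so text that was escaped twice is unescaped only one level. `ENTITY_MAP` and `decodeEntitiesOnce` are names invented for this example.

```typescript
// Illustrative sketch: the entity list is an assumption, and `&amp;` is
// deliberately placed last so "&amp;lt;" becomes "&lt;" rather than "<".
const ENTITY_MAP: ReadonlyArray<readonly [RegExp, string]> = [
  [/&lt;/g, "<"],
  [/&gt;/g, ">"],
  [/&quot;/g, '"'],
  [/&#39;/g, "'"],
  [/&nbsp;/g, " "],
  [/&amp;/g, "&"],
];

// Applies each replacement exactly once, in table order.
function decodeEntitiesOnce(value: string): string {
  return ENTITY_MAP.reduce(
    (text, [pattern, replacement]) => text.replace(pattern, replacement),
    value,
  );
}
```

Whether single-level decoding is the right behaviour depends on how the upstream job ads are escaped; for plain-text descriptions the difference only shows up in postings that contain literal entity text.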
|
||||
17
extractors/smartrecruiters/package.json
Normal file
@ -0,0 +1,17 @@
|
||||
{
|
||||
"name": "smartrecruiters-extractor",
|
||||
"version": "0.0.1",
|
||||
"type": "module",
|
||||
"description": "SmartRecruiters public Posting API extractor",
|
||||
"main": "manifest.ts",
|
||||
"dependencies": {
|
||||
"job-ops-shared": "^1.0.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@types/node": "^24.0.0",
|
||||
"typescript": "~5.9.0"
|
||||
},
|
||||
"scripts": {
|
||||
"check:types": "tsc --noEmit"
|
||||
}
|
||||
}
|
||||
17
extractors/smartrecruiters/tsconfig.json
Normal file
@ -0,0 +1,17 @@
|
||||
{
|
||||
"compilerOptions": {
|
||||
"module": "ESNext",
|
||||
"moduleResolution": "bundler",
|
||||
"target": "ES2022",
|
||||
"outDir": "dist",
|
||||
"strict": true,
|
||||
"noUnusedLocals": false,
|
||||
"lib": ["ES2022", "DOM"],
|
||||
"types": ["node"],
|
||||
"baseUrl": ".",
|
||||
"paths": {
|
||||
"@shared/*": ["../../shared/src/*"]
|
||||
}
|
||||
},
|
||||
"include": ["./manifest.ts", "./src/**/*"]
|
||||
}
|
||||
@ -503,7 +503,9 @@ async function fetchApi<T>(
|
||||
): Promise<T> {
|
||||
const method = (options?.method || "GET").toUpperCase();
|
||||
const activeCreds = getActiveBasicAuthCredentials();
|
||||
let authHeader = activeCreds ? encodeBasicAuthHeaderValue(activeCreds) : undefined;
|
||||
let authHeader = activeCreds
|
||||
? encodeBasicAuthHeaderValue(activeCreds)
|
||||
: undefined;
|
||||
let authAttempt = 0;
|
||||
let usernameHint = activeCreds?.username;
|
||||
|
||||
|
||||
@ -210,6 +210,11 @@ export const getEnabledSources = (
|
||||
const hasGreenhouseCompanies =
|
||||
(settings.greenhouseCompanies?.value ?? []).length > 0;
|
||||
const hasWorkdayTenants = (settings.workdayTenants?.value ?? []).length > 0;
|
||||
const hasSmartrecruitersCompanies =
|
||||
(settings.smartrecruitersCompanies?.value ?? []).length > 0;
|
||||
const hasElutaRssLocations =
|
||||
(settings.elutaRssLocations?.value ?? []).length > 0;
|
||||
const hasIcimsTenants = (settings.icimsTenants?.value ?? []).length > 0;
|
||||
|
||||
for (const source of orderedSources) {
|
||||
if (source === "gradcracker") {
|
||||
@ -272,6 +277,22 @@ export const getEnabledSources = (
|
||||
if (hasWorkdayTenants) enabled.push(source);
|
||||
continue;
|
||||
}
|
||||
if (source === "smartrecruiters") {
|
||||
if (hasSmartrecruitersCompanies) enabled.push(source);
|
||||
continue;
|
||||
}
|
||||
if (source === "icims") {
|
||||
if (hasIcimsTenants) enabled.push(source);
|
||||
continue;
|
||||
}
|
||||
if (source === "bctenet") {
|
||||
enabled.push(source);
|
||||
continue;
|
||||
}
|
||||
if (source === "eluta") {
|
||||
if (hasElutaRssLocations) enabled.push(source);
|
||||
continue;
|
||||
}
|
||||
if (
|
||||
source === "indeed" ||
|
||||
source === "linkedin" ||
|
||||
@ -286,7 +307,9 @@ export const getEnabledSources = (
|
||||
source === "arbeitnow" ||
|
||||
source === "himalayas" ||
|
||||
source === "weworkremotely" ||
|
||||
source === "fourdayweek"
|
||||
source === "fourdayweek" ||
|
||||
source === "qajobsboard" ||
|
||||
source === "arcdev"
|
||||
) {
|
||||
enabled.push(source);
|
||||
}
|
||||
|
||||
@ -10,9 +10,9 @@ import { asyncRoute, fail, ok } from "@infra/http";
|
||||
import { logger } from "@infra/logger";
|
||||
import { getRequestId } from "@infra/request-context";
|
||||
import { isDemoMode, sendDemoBlocked } from "@server/config/demo";
|
||||
import { getJobOwnerProfileId } from "@server/infra/request-context";
|
||||
import { DEFAULT_JOB_OWNER_PROFILE_ID } from "@server/infra/job-owner-context";
|
||||
import { parseBasicAuthUsername } from "@server/infra/basic-auth-credentials";
|
||||
import { DEFAULT_JOB_OWNER_PROFILE_ID } from "@server/infra/job-owner-context";
|
||||
import { getJobOwnerProfileId } from "@server/infra/request-context";
|
||||
import * as profilesRepo from "@server/repositories/profiles";
|
||||
import { getSetting } from "@server/repositories/settings";
|
||||
import { setBackupSettings } from "@server/services/backup/index";
|
||||
@ -30,11 +30,11 @@ import {
|
||||
} from "@server/services/rxresume";
|
||||
import { getEffectiveSettings } from "@server/services/settings";
|
||||
import { applySettingsUpdates } from "@server/services/settings-update";
|
||||
import { jobSearchProfileSchema } from "@shared/settings-registry";
|
||||
import {
|
||||
type UpdateSettingsInput,
|
||||
updateSettingsSchema,
|
||||
} from "@shared/settings-schema";
|
||||
import { jobSearchProfileSchema } from "@shared/settings-registry";
|
||||
import { type Request, type Response, Router } from "express";
|
||||
|
||||
export const settingsRouter = Router();
|
||||
@ -251,9 +251,12 @@ settingsRouter.patch(
|
||||
) {
|
||||
const parsed = jobSearchProfileSchema.safeParse(input.jobSearchProfile);
|
||||
if (parsed.success) {
|
||||
const username = parseBasicAuthUsername(req.headers.authorization)?.trim();
|
||||
const dataWithOwner =
|
||||
username ? { ...parsed.data, basicAuthUser: username } : parsed.data;
|
||||
const username = parseBasicAuthUsername(
|
||||
req.headers.authorization,
|
||||
)?.trim();
|
||||
const dataWithOwner = username
|
||||
? { ...parsed.data, basicAuthUser: username }
|
||||
: parsed.data;
|
||||
await profilesRepo.updateProfile(ownerId, { data: dataWithOwner });
|
||||
}
|
||||
}
|
||||
|
||||
@ -272,6 +272,12 @@ export const DEMO_SOURCE_BASE_URLS: Record<JobSource, string> = {
|
||||
lever: "https://jobs.lever.co",
|
||||
greenhouse: "https://boards.greenhouse.io",
|
||||
workday: "https://workday.com",
|
||||
smartrecruiters: "https://www.smartrecruiters.com",
|
||||
icims: "https://www.icims.com",
|
||||
bctenet: "https://www.bctechnology.com",
|
||||
eluta: "https://www.eluta.ca",
|
||||
qajobsboard: "https://www.qajobsboard.com",
|
||||
arcdev: "https://arc.dev",
|
||||
manual: "https://example.com",
|
||||
};
|
||||
|
||||
|
||||
@ -12,13 +12,13 @@ import {
|
||||
normalizeCountryKey,
|
||||
} from "@shared/location-support.js";
|
||||
import { resolveBlockedCompanyKeywordsFromStoredString } from "@shared/resolve-blocked-company-keywords.js";
|
||||
import { jobSearchProfileSchema } from "@shared/settings-registry.js";
|
||||
import {
|
||||
inferCountryKeyFromSearchGeography,
|
||||
matchesRequestedCity,
|
||||
resolveSearchCities,
|
||||
shouldApplyStrictCityFilter,
|
||||
} from "@shared/search-cities.js";
|
||||
import { jobSearchProfileSchema } from "@shared/settings-registry.js";
|
||||
import type { CreateJobInput, PipelineConfig } from "@shared/types";
|
||||
import { type CrawlSource, progressHelpers, updateProgress } from "../progress";
|
||||
|
||||
@ -106,19 +106,14 @@ const ROLE_TOKEN_STOPWORDS = new Set([
|
||||
]);
|
||||
|
||||
function normalizeText(value: string | null | undefined): string {
|
||||
return (value ?? "")
|
||||
.toLowerCase()
|
||||
.replace(/\s+/g, " ")
|
||||
.trim();
|
||||
return (value ?? "").toLowerCase().replace(/\s+/g, " ").trim();
|
||||
}
|
||||
|
||||
function buildRoleMatchers(phrases: string[]): {
|
||||
phraseMatchers: string[];
|
||||
tokenMatchers: string[];
|
||||
} {
|
||||
const phraseMatchers = phrases
|
||||
.map((p) => normalizeText(p))
|
||||
.filter(Boolean);
|
||||
const phraseMatchers = phrases.map((p) => normalizeText(p)).filter(Boolean);
|
||||
|
||||
const tokenSet = new Set<string>();
|
||||
for (const phrase of phraseMatchers) {
|
||||
@ -164,7 +159,10 @@ function filterJobsBySearchProfile(args: {
|
||||
const body = normalizeText(job.jobDescription);
|
||||
const haystack = `${title}\n${body}`;
|
||||
|
||||
if (dealBreakersLower.length > 0 && matchesAny(haystack, dealBreakersLower)) {
|
||||
if (
|
||||
dealBreakersLower.length > 0 &&
|
||||
matchesAny(haystack, dealBreakersLower)
|
||||
) {
|
||||
return false;
|
||||
}
|
||||
|
||||
|
||||
151
package-lock.json
generated
@ -14,6 +14,7 @@
|
||||
"devDependencies": {
|
||||
"@types/node": "^25.2.3",
|
||||
"dotenv": "^17.2.3",
|
||||
"job-ops-shared": "workspace:*",
|
||||
"knip": "^5.83.1",
|
||||
"tsx": "^4.19.2",
|
||||
"typescript": "^5.9.3"
|
||||
@ -125,6 +126,27 @@
|
||||
"undici-types": "~7.16.0"
|
||||
}
|
||||
},
|
||||
"extractors/arcdev": {
|
||||
"name": "arcdev-extractor",
|
||||
"version": "0.0.1",
|
||||
"dependencies": {
|
||||
"job-ops-shared": "^1.0.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@types/node": "^24.0.0",
|
||||
"typescript": "~5.9.0"
|
||||
}
|
||||
},
|
||||
"extractors/arcdev/node_modules/@types/node": {
|
||||
"version": "24.12.4",
|
||||
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.12.4.tgz",
|
||||
"integrity": "sha512-GUUEShf+PBCGW2KaXwcIt3Yk+e3pkKwWKb9GSyM9WQVE+ep2jzmHdGsHzu4wgcZy5fN9FBdVzjpBQsYlpfpgLA==",
|
||||
"dev": true,
|
||||
"license": "MIT",
|
||||
"dependencies": {
|
||||
"undici-types": "~7.16.0"
|
||||
}
|
||||
},
|
||||
"extractors/ashby": {
|
||||
"name": "ashby-extractor",
|
||||
"version": "0.0.1",
|
||||
@ -146,6 +168,27 @@
|
||||
"undici-types": "~7.16.0"
|
||||
}
|
||||
},
|
||||
"extractors/bctenet": {
|
||||
"name": "bctenet-extractor",
|
||||
"version": "0.0.1",
|
||||
"dependencies": {
|
||||
"job-ops-shared": "^1.0.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@types/node": "^24.0.0",
|
||||
"typescript": "~5.9.0"
|
||||
}
|
||||
},
|
||||
"extractors/bctenet/node_modules/@types/node": {
|
||||
"version": "24.12.4",
|
||||
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.12.4.tgz",
|
||||
"integrity": "sha512-GUUEShf+PBCGW2KaXwcIt3Yk+e3pkKwWKb9GSyM9WQVE+ep2jzmHdGsHzu4wgcZy5fN9FBdVzjpBQsYlpfpgLA==",
|
||||
"dev": true,
|
||||
"license": "MIT",
|
||||
"dependencies": {
|
||||
"undici-types": "~7.16.0"
|
||||
}
|
||||
},
|
||||
"extractors/careerjet": {
|
||||
"name": "careerjet-extractor",
|
||||
"version": "0.0.1",
|
||||
@ -167,6 +210,27 @@
|
||||
"undici-types": "~7.16.0"
|
||||
}
|
||||
},
|
||||
"extractors/eluta": {
|
||||
"name": "eluta-extractor",
|
||||
"version": "0.0.1",
|
||||
"dependencies": {
|
||||
"job-ops-shared": "^1.0.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@types/node": "^24.0.0",
|
||||
"typescript": "~5.9.0"
|
||||
}
|
||||
},
|
||||
"extractors/eluta/node_modules/@types/node": {
|
||||
"version": "24.12.4",
|
||||
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.12.4.tgz",
|
||||
"integrity": "sha512-GUUEShf+PBCGW2KaXwcIt3Yk+e3pkKwWKb9GSyM9WQVE+ep2jzmHdGsHzu4wgcZy5fN9FBdVzjpBQsYlpfpgLA==",
|
||||
"dev": true,
|
||||
"license": "MIT",
|
||||
"dependencies": {
|
||||
"undici-types": "~7.16.0"
|
||||
}
|
||||
},
|
||||
"extractors/fourdayweek": {
|
||||
"name": "fourdayweek-extractor",
|
||||
"version": "0.0.1",
|
||||
@ -308,6 +372,27 @@
|
||||
"undici-types": "~7.16.0"
|
||||
}
|
||||
},
|
||||
"extractors/icims": {
|
||||
"name": "icims-extractor",
|
||||
"version": "0.0.1",
|
||||
"dependencies": {
|
||||
"job-ops-shared": "^1.0.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@types/node": "^24.0.0",
|
||||
"typescript": "~5.9.0"
|
||||
}
|
||||
},
|
||||
"extractors/icims/node_modules/@types/node": {
|
||||
"version": "24.12.4",
|
||||
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.12.4.tgz",
|
||||
"integrity": "sha512-GUUEShf+PBCGW2KaXwcIt3Yk+e3pkKwWKb9GSyM9WQVE+ep2jzmHdGsHzu4wgcZy5fN9FBdVzjpBQsYlpfpgLA==",
|
||||
"dev": true,
|
||||
"license": "MIT",
|
||||
"dependencies": {
|
||||
"undici-types": "~7.16.0"
|
||||
}
|
||||
},
|
||||
"extractors/jobicy": {
|
||||
"name": "jobicy-extractor",
|
||||
"version": "0.0.1",
|
||||
@ -371,6 +456,27 @@
|
||||
"undici-types": "~7.16.0"
|
||||
}
|
||||
},
|
||||
"extractors/qajobsboard": {
|
||||
"name": "qajobsboard-extractor",
|
||||
"version": "0.0.1",
|
||||
"dependencies": {
|
||||
"job-ops-shared": "^1.0.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@types/node": "^24.0.0",
|
||||
"typescript": "~5.9.0"
|
||||
}
|
||||
},
|
||||
"extractors/qajobsboard/node_modules/@types/node": {
|
||||
"version": "24.12.4",
|
||||
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.12.4.tgz",
|
||||
"integrity": "sha512-GUUEShf+PBCGW2KaXwcIt3Yk+e3pkKwWKb9GSyM9WQVE+ep2jzmHdGsHzu4wgcZy5fN9FBdVzjpBQsYlpfpgLA==",
|
||||
"dev": true,
|
||||
"license": "MIT",
|
||||
"dependencies": {
|
||||
"undici-types": "~7.16.0"
|
||||
}
|
||||
},
|
||||
"extractors/reed": {
|
||||
"name": "reed-extractor",
|
||||
"version": "0.0.1",
|
||||
@ -434,6 +540,27 @@
|
||||
"undici-types": "~7.16.0"
|
||||
}
|
||||
},
|
||||
"extractors/smartrecruiters": {
|
||||
"name": "smartrecruiters-extractor",
|
||||
"version": "0.0.1",
|
||||
"dependencies": {
|
||||
"job-ops-shared": "^1.0.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@types/node": "^24.0.0",
|
||||
"typescript": "~5.9.0"
|
||||
}
|
||||
},
|
||||
"extractors/smartrecruiters/node_modules/@types/node": {
|
||||
"version": "24.12.4",
|
||||
"resolved": "https://registry.npmjs.org/@types/node/-/node-24.12.4.tgz",
|
||||
"integrity": "sha512-GUUEShf+PBCGW2KaXwcIt3Yk+e3pkKwWKb9GSyM9WQVE+ep2jzmHdGsHzu4wgcZy5fN9FBdVzjpBQsYlpfpgLA==",
|
||||
"dev": true,
|
||||
"license": "MIT",
|
||||
"dependencies": {
|
||||
"undici-types": "~7.16.0"
|
||||
}
|
||||
},
|
||||
"extractors/startupjobs": {
|
||||
"name": "startupjobs-extractor",
|
||||
"version": "0.0.1",
|
||||
@ -9725,6 +9852,10 @@
|
||||
"resolved": "extractors/arbeitnow",
|
||||
"link": true
|
||||
},
|
||||
"node_modules/arcdev-extractor": {
|
||||
"resolved": "extractors/arcdev",
|
||||
"link": true
|
||||
},
|
||||
"node_modules/arg": {
|
||||
"version": "5.0.2",
|
||||
"resolved": "https://registry.npmjs.org/arg/-/arg-5.0.2.tgz",
|
||||
@ -10008,6 +10139,10 @@
|
||||
"integrity": "sha512-x+VAiMRL6UPkx+kudNvxTl6hB2XNNCG2r+7wixVfIYwu/2HKRXimwQyaumLjMveWvT2Hkd/cAJw+QBMfJ/EKVw==",
|
||||
"license": "MIT"
|
||||
},
|
||||
"node_modules/bctenet-extractor": {
|
||||
"resolved": "extractors/bctenet",
|
||||
"link": true
|
||||
},
|
||||
"node_modules/better-sqlite3": {
|
||||
"version": "12.6.2",
|
||||
"resolved": "https://registry.npmjs.org/better-sqlite3/-/better-sqlite3-12.6.2.tgz",
|
||||
@ -12298,6 +12433,10 @@
|
||||
"integrity": "sha512-3vifjt1HgrGW/h76UEeny+adYApveS9dH2h3p57JYzBSXJIKUJAvtmIytDKjcSCt9xHfrNCFJ7gts6vkhuq++w==",
|
||||
"license": "ISC"
|
||||
},
|
||||
"node_modules/eluta-extractor": {
|
||||
"resolved": "extractors/eluta",
|
||||
"link": true
|
||||
},
|
||||
"node_modules/emoji-regex": {
|
||||
"version": "8.0.0",
|
||||
"resolved": "https://registry.npmjs.org/emoji-regex/-/emoji-regex-8.0.0.tgz",
|
||||
@ -14774,6 +14913,10 @@
|
||||
"node": ">=10.18"
|
||||
}
|
||||
},
|
||||
"node_modules/icims-extractor": {
|
||||
"resolved": "extractors/icims",
|
||||
"link": true
|
||||
},
|
||||
"node_modules/iconv-lite": {
|
||||
"version": "0.7.2",
|
||||
"resolved": "https://registry.npmjs.org/iconv-lite/-/iconv-lite-0.7.2.tgz",
|
||||
@ -21784,6 +21927,10 @@
|
||||
"node": ">=16.0.0"
|
||||
}
|
||||
},
|
||||
"node_modules/qajobsboard-extractor": {
|
||||
"resolved": "extractors/qajobsboard",
|
||||
"link": true
|
||||
},
|
||||
"node_modules/qs": {
|
||||
"version": "6.14.2",
|
||||
"resolved": "https://registry.npmjs.org/qs/-/qs-6.14.2.tgz",
|
||||
@ -23500,6 +23647,10 @@
|
||||
"npm": ">= 3.0.0"
|
||||
}
|
||||
},
|
||||
"node_modules/smartrecruiters-extractor": {
|
||||
"resolved": "extractors/smartrecruiters",
|
||||
"link": true
|
||||
},
|
||||
"node_modules/smol-toml": {
|
||||
"version": "1.6.0",
|
||||
"resolved": "https://registry.npmjs.org/smol-toml/-/smol-toml-1.6.0.tgz",
|
||||
|
||||
@ -26,6 +26,7 @@
|
||||
"devDependencies": {
|
||||
"dotenv": "^17.2.3",
|
||||
"@types/node": "^25.2.3",
|
||||
"job-ops-shared": "workspace:*",
|
||||
"knip": "^5.83.1",
|
||||
"tsx": "^4.19.2",
|
||||
"typescript": "^5.9.3"
|
||||
|
||||
@ -1,8 +1,14 @@
|
||||
/**
|
||||
* Tiny smoke-test for new extractors: imports each manifest, runs it with a
|
||||
* minimal context, and prints the count of mapped jobs + a few samples.
|
||||
* Smoke-test helper for extractor manifests: imports each manifest, runs it with a
|
||||
* minimal context, and prints mapped job counts + a sample row.
|
||||
*
|
||||
* Run from repo root: npx tsx scripts/smoke-extractors.ts [comma,separated,ids]
|
||||
* Run from repo root:
|
||||
* npx tsx scripts/smoke-extractors.ts
|
||||
* npx tsx scripts/smoke-extractors.ts arcdev,icims
|
||||
* npx tsx scripts/smoke-extractors.ts indeed # alias → `jobspy` (same manifest)
|
||||
*
|
||||
* Keep `ALL_TARGETS` aligned with every shipped manifest under each
|
||||
* `extractors/<name>/` package (`manifest.ts` or `src/manifest.ts`).
|
||||
*
|
||||
* Loads repo-root `.env` so keyed extractors match orchestrator behavior (plain
|
||||
* `tsx` does not read `.env` automatically).
|
||||
@ -14,7 +20,7 @@ import { config as loadEnv } from "dotenv";
|
||||
import type {
|
||||
ExtractorManifest,
|
||||
ExtractorRuntimeContext,
|
||||
} from "../shared/src/types/extractors";
|
||||
} from "job-ops-shared/types/extractors";
|
||||
|
||||
const repoRoot = path.resolve(
|
||||
path.dirname(fileURLToPath(import.meta.url)),
|
||||
@ -22,54 +28,56 @@ const repoRoot = path.resolve(
|
||||
);
|
||||
loadEnv({ path: path.join(repoRoot, ".env") });
|
||||
|
||||
/** Left column width for log alignment (longest pipeline source id today). */
|
||||
const ID_COL = 15;
|
||||
|
||||
/** JobSpy serves Indeed / LinkedIn / Glassdoor; CLI filter accepts those ids as aliases. */
|
||||
const JOBSPY_SITE_IDS = ["indeed", "linkedin", "glassdoor"] as const;
|
||||
|
||||
function expandSmokeFilter(ids: Set<string>): Set<string> {
|
||||
const next = new Set(ids);
|
||||
for (const site of JOBSPY_SITE_IDS) {
|
||||
if (next.has(site)) {
|
||||
next.add("jobspy");
|
||||
break;
|
||||
}
|
||||
}
|
||||
return next;
|
||||
}
|
||||
|
||||
interface Target {
|
||||
id: string;
|
||||
importPath: string;
|
||||
needs?: string[]; // env vars required to run; skipped if missing
|
||||
settings?: Record<string, string>;
|
||||
/** When set, replaces the default smoke search terms (use [] for sources that filter client-side). */
|
||||
searchTerms?: string[];
|
||||
/** Geography passed as `selectedCountry` (must match what each extractor expects). */
|
||||
selectedCountry?: string;
|
||||
}
|
||||
|
||||
const ALL_TARGETS: Target[] = [
|
||||
{
|
||||
id: "jobicy",
|
||||
importPath: "../extractors/jobicy/manifest",
|
||||
settings: { jobicyMaxJobsPerTerm: "10" },
|
||||
},
|
||||
{
|
||||
id: "themuse",
|
||||
importPath: "../extractors/themuse/manifest",
|
||||
settings: { themuseMaxJobsPerTerm: "10" },
|
||||
},
|
||||
{
|
||||
id: "usajobs",
|
||||
importPath: "../extractors/usajobs/manifest",
|
||||
needs: ["USAJOBS_API_KEY", "USAJOBS_USER_AGENT"],
|
||||
settings: { usajobsMaxJobsPerTerm: "10" },
|
||||
},
|
||||
{
|
||||
id: "jooble",
|
||||
importPath: "../extractors/jooble/manifest",
|
||||
needs: ["JOOBLE_API_KEY"],
|
||||
settings: { joobleMaxJobsPerTerm: "10" },
|
||||
},
|
||||
{
|
||||
id: "careerjet",
|
||||
importPath: "../extractors/careerjet/manifest",
|
||||
needs: ["CAREERJET_AFFID", "CAREERJET_REFERER", "CAREERJET_USER_IP"],
|
||||
settings: { careerjetMaxJobsPerTerm: "10" },
|
||||
},
|
||||
{
|
||||
id: "reed",
|
||||
importPath: "../extractors/reed/manifest",
|
||||
needs: ["REED_API_KEY"],
|
||||
settings: { reedMaxJobsPerTerm: "10" },
|
||||
},
|
||||
{
|
||||
id: "lever",
|
||||
importPath: "../extractors/lever/manifest",
|
||||
id: "adzuna",
|
||||
importPath: "../extractors/adzuna/manifest",
|
||||
needs: ["ADZUNA_APP_ID", "ADZUNA_APP_KEY"],
|
||||
selectedCountry: "United States",
|
||||
settings: {
|
||||
// Known active public Lever board used purely as a connectivity check.
|
||||
leverCompanies: JSON.stringify(["palantir", "netflix"]),
|
||||
adzunaMaxJobsPerTerm: "10",
|
||||
searchCities: "United States",
|
||||
},
|
||||
},
|
||||
{
|
||||
id: "arbeitnow",
|
||||
importPath: "../extractors/arbeitnow/manifest",
|
||||
settings: { arbeitnowMaxJobsPerTerm: "10" },
|
||||
},
|
||||
{
|
||||
id: "arcdev",
|
||||
importPath: "../extractors/arcdev/manifest",
|
||||
settings: {
|
||||
arcRemoteJobsPaths: JSON.stringify(["/remote-jobs/playwright"]),
|
||||
arcMaxJobsPerPath: "20",
|
||||
},
|
||||
},
|
||||
{
|
||||
@ -79,6 +87,40 @@ const ALL_TARGETS: Target[] = [
|
||||
ashbyCompanies: JSON.stringify(["ramp", "linear"]),
|
||||
},
|
||||
},
|
||||
{
|
||||
id: "bctenet",
|
||||
importPath: "../extractors/bctenet/manifest",
|
||||
selectedCountry: "Canada",
|
||||
settings: {
|
||||
bctenetMaxJobsPerTerm: "25",
|
||||
},
|
||||
},
|
||||
{
|
||||
id: "careerjet",
|
||||
importPath: "../extractors/careerjet/manifest",
|
||||
needs: ["CAREERJET_AFFID", "CAREERJET_REFERER", "CAREERJET_USER_IP"],
|
||||
settings: { careerjetMaxJobsPerTerm: "10", searchCities: "United States" },
|
||||
},
|
||||
{
|
||||
id: "eluta",
|
||||
importPath: "../extractors/eluta/manifest",
|
||||
selectedCountry: "Canada",
|
||||
settings: {
|
||||
elutaRssLocations: JSON.stringify(["Toronto, ON"]),
|
||||
elutaMaxJobsPerTerm: "15",
|
||||
},
|
||||
},
|
||||
{
|
||||
id: "fourdayweek",
|
||||
importPath: "../extractors/fourdayweek/manifest",
|
||||
settings: { fourdayweekMaxJobsPerTerm: "10" },
|
||||
},
|
||||
{
|
||||
id: "gradcracker",
|
||||
importPath: "../extractors/gradcracker/manifest",
|
||||
selectedCountry: "United Kingdom",
|
||||
settings: { gradcrackerMaxJobsPerTerm: "10" },
|
||||
},
|
||||
{
|
||||
id: "greenhouse",
|
||||
importPath: "../extractors/greenhouse/manifest",
|
||||
@ -87,14 +129,71 @@ const ALL_TARGETS: Target[] = [
|
||||
},
|
||||
},
|
||||
{
|
||||
id: "workday",
|
||||
importPath: "../extractors/workday/manifest",
|
||||
id: "himalayas",
|
||||
importPath: "../extractors/himalayas/manifest",
|
||||
settings: { himalayasMaxJobsPerTerm: "10" },
|
||||
},
|
||||
{
|
||||
id: "hiringcafe",
|
||||
importPath: "../extractors/hiringcafe/manifest",
|
||||
selectedCountry: "United Kingdom",
|
||||
settings: {
|
||||
workdayTenants: JSON.stringify([
|
||||
"https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite",
|
||||
]),
|
||||
searchCities: "UK",
|
||||
jobspyResultsWanted: "10",
|
||||
workplaceTypes: JSON.stringify(["remote", "hybrid", "onsite"]),
|
||||
},
|
||||
},
|
||||
{
|
||||
id: "icims",
|
||||
importPath: "../extractors/icims/manifest",
|
||||
searchTerms: [],
|
||||
settings: {
|
||||
icimsTenants: JSON.stringify(["careers-appliedsystems.icims.com"]),
|
||||
icimsMaxJobsPerTenant: "15",
|
||||
icimsMaxPagesPerSearch: "2",
|
||||
},
|
||||
},
|
||||
{
|
||||
id: "jobicy",
|
||||
importPath: "../extractors/jobicy/manifest",
|
||||
settings: { jobicyMaxJobsPerTerm: "10" },
|
||||
},
|
||||
{
|
||||
id: "jobspy",
|
||||
importPath: "../extractors/jobspy/manifest",
|
||||
selectedCountry: "United Kingdom",
|
||||
settings: {
|
||||
searchCities: "UK",
|
||||
jobspyCountryIndeed: "UK",
|
||||
jobspyResultsWanted: "5",
|
||||
workplaceTypes: JSON.stringify(["remote", "hybrid", "onsite"]),
|
||||
},
|
||||
},
|
||||
{
|
||||
id: "jooble",
|
||||
importPath: "../extractors/jooble/manifest",
|
||||
needs: ["JOOBLE_API_KEY"],
|
||||
settings: { joobleMaxJobsPerTerm: "10", searchCities: "United States" },
|
||||
},
|
||||
{
|
||||
id: "lever",
|
||||
importPath: "../extractors/lever/manifest",
|
||||
settings: {
|
||||
leverCompanies: JSON.stringify(["palantir", "netflix"]),
|
||||
},
|
||||
},
|
||||
{
|
||||
id: "qajobsboard",
|
||||
importPath: "../extractors/qajobsboard/manifest",
|
||||
settings: { qajobsboardMaxJobsPerTerm: "25" },
|
||||
},
|
||||
{
|
||||
id: "reed",
|
||||
importPath: "../extractors/reed/manifest",
|
||||
needs: ["REED_API_KEY"],
|
||||
selectedCountry: "United Kingdom",
|
||||
settings: { reedMaxJobsPerTerm: "10" },
|
||||
},
|
||||
{
|
||||
id: "remoteok",
|
||||
importPath: "../extractors/remoteok/manifest",
|
||||
@ -106,14 +205,40 @@ const ALL_TARGETS: Target[] = [
|
||||
settings: { remotiveMaxJobsPerTerm: "10" },
|
||||
},
|
||||
{
|
||||
id: "arbeitnow",
|
||||
importPath: "../extractors/arbeitnow/manifest",
|
||||
settings: { arbeitnowMaxJobsPerTerm: "10" },
|
||||
id: "smartrecruiters",
|
||||
importPath: "../extractors/smartrecruiters/manifest",
|
||||
settings: {
|
||||
smartrecruitersCompanies: JSON.stringify(["smartrecruiters"]),
|
||||
smartrecruitersMaxJobsPerCompany: "5",
|
||||
},
|
||||
},
|
||||
{
|
||||
id: "himalayas",
|
||||
importPath: "../extractors/himalayas/manifest",
|
||||
settings: { himalayasMaxJobsPerTerm: "10" },
|
||||
id: "startupjobs",
|
||||
importPath: "../extractors/startupjobs/src/manifest",
|
||||
selectedCountry: "United Kingdom",
|
||||
settings: {
|
||||
searchCities: "UK",
|
||||
startupjobsMaxJobsPerTerm: "10",
|
||||
workplaceTypes: JSON.stringify(["remote", "hybrid", "onsite"]),
|
||||
},
|
||||
},
|
||||
{
|
||||
id: "themuse",
|
||||
importPath: "../extractors/themuse/manifest",
|
||||
settings: { themuseMaxJobsPerTerm: "10" },
|
||||
},
|
||||
{
|
||||
id: "ukvisajobs",
|
||||
importPath: "../extractors/ukvisajobs/manifest",
|
||||
needs: ["UKVISAJOBS_EMAIL", "UKVISAJOBS_PASSWORD"],
|
||||
selectedCountry: "United Kingdom",
|
||||
settings: { ukvisajobsMaxJobs: "10" },
|
||||
},
|
||||
{
|
||||
id: "usajobs",
|
||||
importPath: "../extractors/usajobs/manifest",
|
||||
needs: ["USAJOBS_API_KEY", "USAJOBS_USER_AGENT"],
|
||||
settings: { usajobsMaxJobsPerTerm: "10" },
|
||||
},
|
||||
{
|
||||
id: "weworkremotely",
|
||||
@ -121,22 +246,29 @@ const ALL_TARGETS: Target[] = [
|
||||
settings: { weworkremotelyMaxJobsPerTerm: "10" },
|
||||
},
|
||||
{
|
||||
-id: "fourdayweek",
-importPath: "../extractors/fourdayweek/manifest",
-settings: { fourdayweekMaxJobsPerTerm: "10" },
+id: "workday",
+importPath: "../extractors/workday/manifest",
+settings: {
+workdayTenants: JSON.stringify([
+"https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite",
+]),
+},
|
||||
},
|
||||
];
|
||||
|
||||
-function buildContext(
-source: string,
-settings: Record<string, string>,
-): ExtractorRuntimeContext {
+function buildContext(args: {
+source: string;
+settings: Record<string, string>;
+searchTerms?: string[];
+selectedCountry?: string;
+}): ExtractorRuntimeContext {
|
||||
return {
|
||||
-source,
-selectedSources: [source],
-settings,
-searchTerms: ["software engineer"],
-selectedCountry: "United States",
+source: args.source,
+selectedSources: [args.source],
+settings: args.settings,
+searchTerms:
+args.searchTerms !== undefined ? args.searchTerms : ["software engineer"],
+selectedCountry: args.selectedCountry ?? "United States",
|
||||
getExistingJobUrls: async () => [],
|
||||
shouldCancel: () => false,
|
||||
onProgress: () => {},
|
||||
@ -151,7 +283,7 @@ async function runOne(target: Target): Promise<void> {
|
||||
const missing = (target.needs ?? []).filter((k) => !process.env[k]);
|
||||
if (missing.length > 0) {
|
||||
console.log(
|
||||
-`${pad(target.id, 12)} SKIP missing env: ${missing.join(", ")}`,
+`${pad(target.id, ID_COL)} SKIP missing env: ${missing.join(", ")}`,
|
||||
);
|
||||
return;
|
||||
}
|
||||
@ -161,20 +293,25 @@ async function runOne(target: Target): Promise<void> {
|
||||
mod = await import(target.importPath);
|
||||
} catch (err) {
|
||||
console.log(
|
||||
-`${pad(target.id, 12)} FAIL import error: ${(err as Error).message}`,
+`${pad(target.id, ID_COL)} FAIL import error: ${(err as Error).message}`,
|
||||
);
|
||||
return;
|
||||
}
|
||||
|
||||
const manifest = mod.manifest ?? mod.default;
|
||||
if (!manifest) {
|
||||
-console.log(`${pad(target.id, 12)} FAIL manifest export missing`);
+console.log(`${pad(target.id, ID_COL)} FAIL manifest export missing`);
|
||||
return;
|
||||
}
|
||||
|
||||
const started = Date.now();
|
||||
try {
|
||||
-const ctx = buildContext(target.id, target.settings ?? {});
+const ctx = buildContext({
+source: target.id,
+settings: target.settings ?? {},
+searchTerms: target.searchTerms,
+selectedCountry: target.selectedCountry,
+});
|
||||
const result = await manifest.run(ctx);
|
||||
const ms = Date.now() - started;
|
||||
const status = result.success ? "OK " : "ERR ";
|
||||
@ -183,12 +320,12 @@ async function runOne(target: Target): Promise<void> {
|
||||
? ` | first: "${sample.title}" @ ${sample.employer}`
|
||||
: "";
|
||||
console.log(
|
||||
-`${pad(target.id, 12)} ${status} jobs=${result.jobs.length} ${ms}ms${result.error ? ` | error: ${result.error}` : ""}${sampleStr}`,
+`${pad(target.id, ID_COL)} ${status} jobs=${result.jobs.length} ${ms}ms${result.error ? ` | error: ${result.error}` : ""}${sampleStr}`,
|
||||
);
|
||||
} catch (err) {
|
||||
const ms = Date.now() - started;
|
||||
console.log(
|
||||
-`${pad(target.id, 12)} CRASH ${ms}ms ${(err as Error).message}`,
+`${pad(target.id, ID_COL)} CRASH ${ms}ms ${(err as Error).message}`,
|
||||
);
|
||||
}
|
||||
}
|
||||
@ -196,11 +333,13 @@ async function runOne(target: Target): Promise<void> {
|
||||
async function main() {
|
||||
const requested = (process.argv[2] ?? "").trim();
|
||||
const filter = requested
|
||||
-? new Set(
-requested
-.split(",")
-.map((s) => s.trim())
-.filter(Boolean),
+? expandSmokeFilter(
+new Set(
+requested
+.split(",")
+.map((s) => s.trim())
+.filter(Boolean),
+),
|
||||
)
|
||||
: null;
|
||||
const targets = filter
|
||||
|
||||
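Reviewer note on the `buildContext` change above: the `!== undefined` check means a target that passes `searchTerms: []` (as the `icims` target does) keeps its empty list, while a target that omits the field gets the default term. A minimal sketch of that behaviour, assuming a trimmed-down context type; `SmokeContext` and `buildSmokeContext` are hypothetical names used only for illustration and are not part of the commit:

```typescript
// Hypothetical, reduced context type; the real ExtractorRuntimeContext has more fields.
interface SmokeContext {
  source: string;
  searchTerms: string[];
  selectedCountry: string;
}

function buildSmokeContext(args: {
  source: string;
  searchTerms?: string[];
  selectedCountry?: string;
}): SmokeContext {
  return {
    source: args.source,
    // An explicit [] is kept as-is; only undefined falls back to the default term.
    searchTerms:
      args.searchTerms !== undefined ? args.searchTerms : ["software engineer"],
    selectedCountry: args.selectedCountry ?? "United States",
  };
}

// Tenant-driven target: empty search terms are passed through unchanged.
const icims = buildSmokeContext({ source: "icims", searchTerms: [] });
// Country-specific target: default search term, overridden country.
const eluta = buildSmokeContext({ source: "eluta", selectedCountry: "Canada" });

console.log(icims.searchTerms.length, eluta.searchTerms, eluta.selectedCountry);
```

Since `[]` is neither `null` nor `undefined`, writing the fallback with `??` would behave the same way here; the explicit comparison just makes the intent visible.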
@ -27,6 +27,12 @@ export const EXTRACTOR_SOURCE_IDS = [
|
||||
"lever",
|
||||
"greenhouse",
|
||||
"workday",
|
||||
"smartrecruiters",
|
||||
"icims",
|
||||
"bctenet",
|
||||
"eluta",
|
||||
"qajobsboard",
|
||||
"arcdev",
|
||||
"manual",
|
||||
] as const;
|
||||
|
||||
@ -203,6 +209,44 @@ export const EXTRACTOR_SOURCE_METADATA: Record<
|
||||
category: "pipeline",
|
||||
region: "global",
|
||||
},
|
||||
smartrecruiters: {
|
||||
label: "SmartRecruiters (ATS)",
|
||||
order: 245,
|
||||
category: "pipeline",
|
||||
region: "global",
|
||||
},
|
||||
icims: {
|
||||
label: "iCIMS tenants (HTML)",
|
||||
order: 241,
|
||||
category: "pipeline",
|
||||
region: "global",
|
||||
},
|
||||
bctenet: {
|
||||
label: "BC T-Net (RSS)",
|
||||
order: 243,
|
||||
category: "pipeline",
|
||||
region: "global",
|
||||
countryAllowlist: ["canada"],
|
||||
},
|
||||
eluta: {
|
||||
label: "Eluta",
|
||||
order: 247,
|
||||
category: "pipeline",
|
||||
region: "global",
|
||||
countryAllowlist: ["canada"],
|
||||
},
|
||||
qajobsboard: {
|
||||
label: "QAJobsBoard",
|
||||
order: 248,
|
||||
category: "pipeline",
|
||||
region: "remote",
|
||||
},
|
||||
arcdev: {
|
||||
label: "Arc.dev",
|
||||
order: 249,
|
||||
category: "pipeline",
|
||||
region: "remote",
|
||||
},
|
||||
manual: { label: "Manual", order: 900, category: "manual" },
|
||||
};
|
||||
|
||||
|
||||
@ -59,6 +59,9 @@ describe("location-support", () => {
|
||||
true,
|
||||
);
|
||||
expect(isSourceAllowedForCountry("startupjobs", "worldwide")).toBe(true);
|
||||
expect(isSourceAllowedForCountry("eluta", "canada")).toBe(true);
|
||||
expect(isSourceAllowedForCountry("eluta", "united states")).toBe(false);
|
||||
expect(isSourceAllowedForCountry("smartrecruiters", "japan")).toBe(true);
|
||||
});
|
||||
|
||||
it("filters incompatible sources while preserving compatible order", () => {
|
||||
|
||||
@ -1,3 +1,4 @@
|
||||
import { EXTRACTOR_SOURCE_METADATA } from "./extractors";
|
||||
import type { JobSource } from "./types";
|
||||
|
||||
const COUNTRY_ALIASES: Record<string, string> = {
|
||||
@ -199,6 +200,16 @@ export function isSourceAllowedForCountry(
|
||||
if (US_ONLY_SOURCES.has(source)) return isUsCountry(country);
|
||||
if (source === "glassdoor") return isGlassdoorCountry(country);
|
||||
if (source === "adzuna") return getAdzunaCountryCode(country) !== null;
|
||||
|
||||
const meta =
|
||||
EXTRACTOR_SOURCE_METADATA[source as keyof typeof EXTRACTOR_SOURCE_METADATA];
|
||||
if (meta?.countryAllowlist && meta.countryAllowlist.length > 0) {
|
||||
const key = normalizeCountryKey(country);
|
||||
return meta.countryAllowlist.some(
|
||||
(token) => normalizeCountryKey(token) === key,
|
||||
);
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
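Reviewer note on the allowlist branch above: sources whose metadata carries a non-empty `countryAllowlist` are restricted to those countries, and everything else stays allowed. A self-contained sketch of that rule, assuming a simple trim-and-lowercase normalization; `SOURCE_META`, `normalizeKey`, and `isAllowedByMetadata` are hypothetical stand-ins, and the real `normalizeCountryKey` may do more (for example, alias resolution):

```typescript
// Hypothetical stand-in for EXTRACTOR_SOURCE_METADATA, reduced to two entries.
type SourceMeta = { label: string; countryAllowlist?: string[] };

const SOURCE_META: Record<string, SourceMeta> = {
  eluta: { label: "Eluta", countryAllowlist: ["canada"] },
  smartrecruiters: { label: "SmartRecruiters (ATS)" },
};

// Assumed normalization: trim and lowercase only.
const normalizeKey = (value: string): string => value.trim().toLowerCase();

function isAllowedByMetadata(source: string, country: string): boolean {
  const meta = SOURCE_META[source];
  if (meta?.countryAllowlist && meta.countryAllowlist.length > 0) {
    const key = normalizeKey(country);
    // Allowed only when the normalized country matches an allowlist entry.
    return meta.countryAllowlist.some((token) => normalizeKey(token) === key);
  }
  // No allowlist (or an empty one) means no restriction.
  return true;
}

console.log(isAllowedByMetadata("eluta", "canada"));
console.log(isAllowedByMetadata("eluta", "united states"));
console.log(isAllowedByMetadata("smartrecruiters", "japan"));
```

The three calls correspond to the expectations added in the `location-support` test hunk.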
@ -445,6 +445,37 @@ export const settingsRegistry = {
|
||||
parse: parseIntOrNull,
|
||||
serialize: serializeNullableNumber,
|
||||
},
|
||||
qajobsboardMaxJobsPerTerm: {
|
||||
kind: "typed" as const,
|
||||
schema: z.number().int().min(1).max(1000),
|
||||
default: (): number => 100,
|
||||
parse: parseIntOrNull,
|
||||
serialize: serializeNullableNumber,
|
||||
},
|
||||
/**
|
||||
* Arc.dev remote job listing paths (on https://arc.dev), e.g. /remote-jobs/playwright
|
||||
*/
|
||||
arcRemoteJobsPaths: {
|
||||
kind: "typed" as const,
|
||||
schema: z.array(z.string().trim().min(1).max(200)).max(20),
|
||||
default: (): string[] => {
|
||||
const fromEnv = parseCompanyList(
|
||||
typeof process !== "undefined" ? process.env.ARC_REMOTE_JOBS_PATHS : "",
|
||||
);
|
||||
return fromEnv.length > 0
|
||||
? fromEnv
|
||||
: ["/remote-jobs/playwright", "/remote-jobs/cypress"];
|
||||
},
|
||||
parse: parseJsonArrayOrNull,
|
||||
serialize: serializeNullableJsonArray,
|
||||
},
|
||||
arcMaxJobsPerPath: {
|
||||
kind: "typed" as const,
|
||||
schema: z.number().int().min(1).max(300),
|
||||
default: (): number => 120,
|
||||
parse: parseIntOrNull,
|
||||
serialize: serializeNullableNumber,
|
||||
},
|
||||
/**
|
||||
* Comma- or newline-separated company slugs to fetch from public ATS feeds.
|
||||
* `lever`, `ashby`, and `greenhouse` each take one entry per company.
|
||||
@ -493,6 +524,108 @@ export const settingsRegistry = {
|
||||
parse: parseJsonArrayOrNull,
|
||||
serialize: serializeNullableJsonArray,
|
||||
},
|
||||
/**
|
||||
* SmartRecruiters company identifiers (Posting API path segment), e.g.
|
||||
* `jobs.smartrecruiters.com/smartrecruiters/...` → "smartrecruiters".
|
||||
*/
|
||||
smartrecruitersCompanies: {
|
||||
kind: "typed" as const,
|
||||
schema: z.array(z.string().trim().min(1).max(100)).max(200),
|
||||
default: (): string[] =>
|
||||
parseCompanyList(
|
||||
typeof process !== "undefined"
|
||||
? process.env.SMARTRECRUITERS_COMPANIES
|
||||
: "",
|
||||
),
|
||||
parse: parseJsonArrayOrNull,
|
||||
serialize: serializeNullableJsonArray,
|
||||
},
|
||||
smartrecruitersMaxJobsPerCompany: {
|
||||
kind: "typed" as const,
|
||||
schema: z.number().int().min(1).max(500),
|
||||
default: (): number => 100,
|
||||
parse: parseIntOrNull,
|
||||
serialize: serializeNullableNumber,
|
||||
},
|
||||
/**
|
||||
* Eluta RSS `location` query values, e.g. "Toronto, ON", "Vancouver, BC".
|
||||
* Very broad values may yield empty feeds; prefer metro/province strings.
|
||||
*/
|
||||
elutaRssLocations: {
|
||||
kind: "typed" as const,
|
||||
schema: z.array(z.string().trim().min(1).max(200)).max(50),
|
||||
default: (): string[] =>
|
||||
parseCompanyList(
|
||||
typeof process !== "undefined" ? process.env.ELUTA_RSS_LOCATIONS : "",
|
||||
),
|
||||
parse: parseJsonArrayOrNull,
|
||||
serialize: serializeNullableJsonArray,
|
||||
},
|
||||
elutaMaxJobsPerTerm: {
|
||||
kind: "typed" as const,
|
||||
schema: z.number().int().min(1).max(1000),
|
||||
default: (): number => 100,
|
||||
parse: parseIntOrNull,
|
||||
serialize: serializeNullableNumber,
|
||||
},
|
||||
/**
|
||||
* Optional BC T-Net RSS URLs. When empty, the extractor uses the default public
|
||||
* aggregate feed. Env `BCTENET_RSS_URLS` seeds defaults (JSON array or newline-separated).
|
||||
*/
|
||||
bctenetRssUrls: {
|
||||
kind: "typed" as const,
|
||||
schema: z.array(z.string().trim().min(1).max(500)).max(20),
|
||||
default: (): string[] => {
|
||||
const raw =
|
||||
typeof process !== "undefined"
|
||||
? process.env.BCTENET_RSS_URLS?.trim()
|
||||
: "";
|
||||
if (!raw) return [];
|
||||
const parsed = parseJsonArrayOrNull(raw);
|
||||
if (parsed && parsed.length > 0) return parsed;
|
||||
return raw
|
||||
.split(/\n+/)
|
||||
.map((piece) => piece.trim())
|
||||
.filter(Boolean);
|
||||
},
|
||||
parse: parseJsonArrayOrNull,
|
||||
serialize: serializeNullableJsonArray,
|
||||
},
|
||||
bctenetMaxJobsPerTerm: {
|
||||
kind: "typed" as const,
|
||||
schema: z.number().int().min(1).max(2000),
|
||||
default: (): number => 400,
|
||||
parse: parseIntOrNull,
|
||||
serialize: serializeNullableNumber,
|
||||
},
|
||||
/**
|
||||
* iCIMS tenant hosts (e.g. careers-example.icims.com). Env `ICIMS_TENANTS`
|
||||
* seeds defaults (comma/newline-separated).
|
||||
*/
|
||||
icimsTenants: {
|
||||
kind: "typed" as const,
|
||||
schema: z.array(z.string().trim().min(1).max(200)).max(100),
|
||||
default: (): string[] =>
|
||||
parseCompanyList(
|
||||
typeof process !== "undefined" ? process.env.ICIMS_TENANTS : "",
|
||||
),
|
||||
parse: parseJsonArrayOrNull,
|
||||
serialize: serializeNullableJsonArray,
|
||||
},
|
||||
icimsMaxJobsPerTenant: {
|
||||
kind: "typed" as const,
|
||||
schema: z.number().int().min(1).max(2000),
|
||||
default: (): number => 250,
|
||||
parse: parseIntOrNull,
|
||||
serialize: serializeNullableNumber,
|
||||
},
|
||||
icimsMaxPagesPerSearch: {
|
||||
kind: "typed" as const,
|
||||
schema: z.number().int().min(1).max(50),
|
||||
default: (): number => 10,
|
||||
parse: parseIntOrNull,
|
||||
serialize: serializeNullableNumber,
|
||||
},
|
||||
searchTerms: {
|
||||
kind: "typed" as const,
|
||||
schema: z.array(z.string().trim().min(1).max(200)).max(100),
|
||||
|
||||
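Reviewer note on the `bctenetRssUrls` default above: the env seed is read as a JSON array first and falls back to newline-separated values. A rough sketch of that two-step parse, assuming `parseJsonArrayOrNull` returns `null` for non-JSON input; `parseRssUrlSeed` is a hypothetical helper and the feed URLs below are placeholders, not real BC T-Net endpoints:

```typescript
// Hypothetical helper mirroring the default() logic; not part of the commit.
function parseRssUrlSeed(raw: string | undefined): string[] {
  const trimmed = raw?.trim();
  if (!trimmed) return [];
  try {
    const parsed: unknown = JSON.parse(trimmed);
    if (Array.isArray(parsed) && parsed.length > 0) {
      // Keep only string entries from the JSON array.
      return parsed.filter((item): item is string => typeof item === "string");
    }
  } catch {
    // Not valid JSON: fall through to newline splitting.
  }
  return trimmed
    .split(/\n+/)
    .map((piece) => piece.trim())
    .filter(Boolean);
}

// Placeholder URLs for illustration only.
const fromJson = parseRssUrlSeed('["https://a.example/rss","https://b.example/rss"]');
const fromLines = parseRssUrlSeed("https://a.example/rss\n\nhttps://b.example/rss\n");
console.log(fromJson, fromLines);
```

Note that, unlike the other list settings in this hunk, this seed is not split on commas, presumably because feed URLs can contain commas in their query strings.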
@ -200,10 +200,30 @@ export const createAppSettings = (
|
||||
himalayasMaxJobsPerTerm: { value: 50, default: 50, override: null },
|
||||
weworkremotelyMaxJobsPerTerm: { value: 50, default: 50, override: null },
|
||||
fourdayweekMaxJobsPerTerm: { value: 50, default: 50, override: null },
|
||||
qajobsboardMaxJobsPerTerm: { value: 100, default: 100, override: null },
|
||||
arcRemoteJobsPaths: {
|
||||
value: ["/remote-jobs/playwright", "/remote-jobs/cypress"],
|
||||
default: ["/remote-jobs/playwright", "/remote-jobs/cypress"],
|
||||
override: null,
|
||||
},
|
||||
arcMaxJobsPerPath: { value: 120, default: 120, override: null },
|
||||
leverCompanies: { value: [], default: [], override: null },
|
||||
ashbyCompanies: { value: [], default: [], override: null },
|
||||
greenhouseCompanies: { value: [], default: [], override: null },
|
||||
workdayTenants: { value: [], default: [], override: null },
|
||||
smartrecruitersCompanies: { value: [], default: [], override: null },
|
||||
smartrecruitersMaxJobsPerCompany: {
|
||||
value: 100,
|
||||
default: 100,
|
||||
override: null,
|
||||
},
|
||||
elutaRssLocations: { value: [], default: [], override: null },
|
||||
elutaMaxJobsPerTerm: { value: 100, default: 100, override: null },
|
||||
bctenetRssUrls: { value: [], default: [], override: null },
|
||||
bctenetMaxJobsPerTerm: { value: 400, default: 400, override: null },
|
||||
icimsTenants: { value: [], default: [], override: null },
|
||||
icimsMaxJobsPerTenant: { value: 250, default: 250, override: null },
|
||||
icimsMaxPagesPerSearch: { value: 10, default: 10, override: null },
|
||||
searchTerms: {
|
||||
value: ["Software Engineer"],
|
||||
default: ["Software Engineer"],
|
||||
|
||||
@ -213,10 +213,22 @@ export interface AppSettings {
|
||||
himalayasMaxJobsPerTerm: Resolved<number>;
|
||||
weworkremotelyMaxJobsPerTerm: Resolved<number>;
|
||||
fourdayweekMaxJobsPerTerm: Resolved<number>;
|
||||
qajobsboardMaxJobsPerTerm: Resolved<number>;
|
||||
arcRemoteJobsPaths: Resolved<string[]>;
|
||||
arcMaxJobsPerPath: Resolved<number>;
|
||||
leverCompanies: Resolved<string[]>;
|
||||
ashbyCompanies: Resolved<string[]>;
|
||||
greenhouseCompanies: Resolved<string[]>;
|
||||
workdayTenants: Resolved<string[]>;
|
||||
smartrecruitersCompanies: Resolved<string[]>;
|
||||
smartrecruitersMaxJobsPerCompany: Resolved<number>;
|
||||
elutaRssLocations: Resolved<string[]>;
|
||||
elutaMaxJobsPerTerm: Resolved<number>;
|
||||
bctenetRssUrls: Resolved<string[]>;
|
||||
bctenetMaxJobsPerTerm: Resolved<number>;
|
||||
icimsTenants: Resolved<string[]>;
|
||||
icimsMaxJobsPerTenant: Resolved<number>;
|
||||
icimsMaxPagesPerSearch: Resolved<number>;
|
||||
searchTerms: Resolved<string[]>;
|
||||
workplaceTypes: Resolved<Array<"remote" | "hybrid" | "onsite">>;
|
||||
blockedCompanyKeywords: Resolved<string[]>;
|
||||
|
||||