Jobber/.env.example
ilia 7b3dfb002a
feat(extractors): add 17 job source extractors and cross-source dedup
Adds extractor packages: arbeitnow, ashby, careerjet, fourdayweek,
greenhouse, himalayas, jobicy, jooble, lever, reed, remoteok, remotive,
themuse, usajobs, weworkremotely, workday — each with manifest, package
metadata and README.

Pipeline / shared:
- shared/job-fingerprint: stable hash for cross-source dedup, with tests
- discover-jobs: dedup via fingerprint and richer per-source merging
- jobs repository: fingerprint-aware upsert / lookup
- settings-registry, settings types/routes, demo-defaults: knobs for the
  new sources
- shared extractors index: register the new manifests
- location-support, profiles route: small fixes for the new sources

Tooling:
- scripts/smoke-extractors.ts to sanity-check each source locally
- scripts/jobber-cron-{cherepaha,dobkin}.env.example: per-host cron
  templates (CHANGEME placeholders only)
- .env.example: documented env vars for the new extractors
- .gitignore: ignore extractors/*/storage/ runtime caches (was ukvisajobs only)

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 20:17:52 -04:00

# =============================================================================
# Job Ops - Environment Variables
# Copy this file to .env and fill in your values
# =============================================================================
MODEL=google/gemini-3-flash-preview
# Self-hosted Ollama (e.g. 16GB GPU): use a 22B-class model for scoring/tailoring; pull the tag on the server first.
# MODEL=mistral-small:22b
# LLM_PROVIDER=ollama
# LLM_BASE_URL=http://127.0.0.1:11434
# Heavier option (may offload layers to CPU on a 16GB GPU): qwen2.5:32b
# DEPRECATED (auto-copied to LLM_API_KEY for compatibility)
# OPENROUTER_API_KEY=your_openrouter_api_key_here
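# Replacement for the deprecated key above. The variable name is taken from the
# compatibility note; the value is a placeholder, so check it against your provider.
# LLM_API_KEY=your_llm_api_key_here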
# Self-hosted RxResume base URL, e.g., http://rxresume.local.net
# Defaults to https://v4.rxresu.me
# RXRESUME_URL=
# Optional: load resume JSON from disk instead of the RxResume API (scoring, tailoring, cover letters).
# Path is absolute or relative to the orchestrator process cwd (often `orchestrator/` when using `npm run dev` there).
# Takes precedence over Settings → local path. PDF export still uses RxResume when enabled.
# Example (monorepo): hand-authored v5 JSON may live under `data/resumes/` (that folder is gitignored by default).
# If you use seeded search profiles with `resumeLocalPath` + login auto-activate, leave this unset so Settings → local path wins.
# JOBOPS_LOCAL_RESUME_PATH=../data/resumes/ilia-dobkin.json
# RxResume credentials for PDF generation
# Create an account at: https://v4.rxresu.me
RXRESUME_EMAIL=your_email@example.com
RXRESUME_PASSWORD=your_password_here
# Optional: Basic Auth for write access.
# If these are unset (the default), the app is fully unauthenticated.
# When set, all write actions (POST/PATCH/DELETE) require Basic Auth.
# Optional second user (e.g. paired with a second search profile / `basicAuthUser` in profile JSON):
# BASIC_AUTH_USER_2=
# BASIC_AUTH_PASSWORD_2=
# Example local pairing with DB-seeded profiles (change passwords before exposing the UI):
# BASIC_AUTH_USER=ilia
# BASIC_AUTH_PASSWORD=changeme-ilia
# BASIC_AUTH_USER_2=cherepaha
# BASIC_AUTH_PASSWORD_2=changeme-cherepaha
BASIC_AUTH_USER=
BASIC_AUTH_PASSWORD=
# Optional: client build only — skip RxResume steps in the onboarding wizard (search without PDF export).
# Prefer setting `JOBOPS_LOCAL_RESUME_PATH` above: the API tells the UI to skip RxResume onboarding automatically.
# Otherwise: copy `orchestrator/.env.example` → `orchestrator/.env` and set VITE_SKIP_RXRESUME_ONBOARDING=true
# (Vite only reads `orchestrator/.env`, not this root file.)
# Docker: Vite vars must be set at image build time (Dockerfile ARG / docker-compose build args), not in the runtime .env.
# VITE_SKIP_RXRESUME_ONBOARDING=true
# Public base URL used to generate tracer links when PDFs are created by
# background/pipeline runs (where request host cannot be inferred).
# Example: JOBOPS_PUBLIC_BASE_URL=https://jobops.example.com
JOBOPS_PUBLIC_BASE_URL=
# =============================================================================
# Gmail OAuth (Tracking Inbox) - optional
# =============================================================================
# Required to connect Gmail from the UI.
GMAIL_OAUTH_CLIENT_ID=
GMAIL_OAUTH_CLIENT_SECRET=
# Optional override for OAuth callback URL.
# If unset, defaults to <request-origin>/oauth/gmail/callback
# GMAIL_OAUTH_REDIRECT_URI=http://localhost:3005/oauth/gmail/callback
# =============================================================================
# UKVisaJobs (UK visa sponsorship jobs) - optional
# =============================================================================
# Provide email/password for automatic login and token refresh.
# See extractors/ukvisajobs/README.md for detailed instructions.
UKVISAJOBS_EMAIL=
UKVISAJOBS_PASSWORD=
UKVISAJOBS_HEADLESS=true
# =============================================================================
# Adzuna (multi-country API source) - optional
# =============================================================================
# Register at https://developer.adzuna.com/admin/access_details
ADZUNA_APP_ID=
ADZUNA_APP_KEY=
# Default cap per search term (orchestrator run budget / settings can override).
# ADZUNA_MAX_JOBS_PER_TERM=50
# API page size (Adzuna max 50).
# ADZUNA_RESULTS_PER_PAGE=50
# Optional global `where` text for Adzuna. Pipeline runs usually use Settings → search cities
# instead; leave unset unless you want a fixed location for standalone extractor use.
# ADZUNA_LOCATION_QUERY=
# Only for running the extractor CLI alone; the pipeline sets country from your run (us / ca / gb / …).
# ADZUNA_COUNTRY=gb
# =============================================================================
# JobSpy - Job search configuration
# =============================================================================
# Filter for remote-only jobs (default: 0 = disabled)
# JOBSPY_IS_REMOTE=0
# =============================================================================
# USAJOBS API (US federal jobs) - optional, US-only
# =============================================================================
# Register at https://developer.usajobs.gov/APIRequest/Index
# USAJOBS requires a User-Agent that is a real contact email (per their TOS).
# Leave unset to disable the source.
# USAJOBS_API_KEY=
# USAJOBS_USER_AGENT=you@example.com
# USAJOBS_MAX_JOBS_PER_TERM=100
# =============================================================================
# Jobicy (remote jobs feed) - optional, no auth
# =============================================================================
# Public JSON endpoint, capped at 50 results per call.
# JOBICY_MAX_JOBS_PER_TERM=100
# =============================================================================
# The Muse (jobs API) - optional, API key recommended
# =============================================================================
# https://www.themuse.com/developers/api/v2 — works without a key but is
# heavily rate-limited. Set THEMUSE_API_KEY for higher quotas.
# THEMUSE_API_KEY=
# THEMUSE_MAX_JOBS_PER_TERM=100
# =============================================================================
# Jooble (aggregator API) - optional
# =============================================================================
# Sign up at https://jooble.org/api/about for an API key.
# JOOBLE_API_KEY=
# JOOBLE_MAX_JOBS_PER_TERM=100
# =============================================================================
# Careerjet (publisher API v4) - optional
# =============================================================================
# Register at https://www.careerjet.com/partners/api/ — declare API key + server IP(s).
# CAREERJET_AFFID=your_api_key
# CAREERJET_REFERER=https://your-site.com/path-to-job-search/
# CAREERJET_USER_IP=203.0.113.1
# Optional override for the required user_agent query param:
# CAREERJET_USER_AGENT=Mozilla/5.0 ...
# CAREERJET_MAX_JOBS_PER_TERM=100
# =============================================================================
# Reed.co.uk (UK jobs API) - optional, UK-only
# =============================================================================
# Register at https://www.reed.co.uk/developers/jobseeker for an API key.
# REED_API_KEY=
# REED_MAX_JOBS_PER_TERM=100
# =============================================================================
# Remote OK (remote jobs feed) - optional, no auth
# =============================================================================
# Public single-shot JSON feed at https://remoteok.com/api. We filter
# client-side by your search terms (matched against position + tags).
# Per Remote OK's TOS, link back to the original posting URLs when republishing.
# REMOTEOK_MAX_JOBS_PER_TERM=100
# =============================================================================
# Remotive (remote jobs feed) - optional, no auth
# =============================================================================
# Public JSON API at https://remotive.com/api/remote-jobs?limit=N&search=term.
# Each search term is sent as the `search` parameter.
# REMOTIVE_MAX_JOBS_PER_TERM=100
# =============================================================================
# Arbeitnow (multi-ATS aggregator) - optional, no auth
# =============================================================================
# Public JSON API at https://www.arbeitnow.com/api/job-board-api?page=N.
# Aggregates from Greenhouse, SmartRecruiters, Join, TeamTailor, Recruitee,
# and Comeet. No server-side search; filtering is done client-side.
# ARBEITNOW_MAX_JOBS_PER_TERM=100
# =============================================================================
# Himalayas (remote jobs feed) - optional, no auth
# =============================================================================
# Public JSON API at https://himalayas.app/jobs/api?limit=N&offset=M.
# No server-side search; filtering is done client-side by title + categories.
# HIMALAYAS_MAX_JOBS_PER_TERM=100
# =============================================================================
# We Work Remotely (RSS feed) - optional, no auth
# =============================================================================
# Public RSS at https://weworkremotely.com/remote-jobs.rss (all categories).
# Single fetch; filtering is done client-side by title + skills + category.
# WEWORKREMOTELY_MAX_JOBS_PER_TERM=100
# =============================================================================
# 4 Day Week (reduced-schedule jobs) - optional, no auth
# =============================================================================
# Public JSON API at https://4dayweek.io/api/jobs?page=N.
# Paginated; filtering is done client-side by title + tech stack.
# No job description in listings; links to 4dayweek.io for details.
# FOURDAYWEEK_MAX_JOBS_PER_TERM=100
# =============================================================================
# Public ATS sources (Lever / Ashby / Greenhouse) - optional
# =============================================================================
# Comma- or newline-separated company slugs. The slug is the path segment used
# in each provider's public job board, e.g. `lever.co/some-company` → "some-company".
# LEVER_COMPANIES=netflix,figma
# ASHBY_COMPANIES=ramp,linear
# GREENHOUSE_COMPANIES=stripe,airbnb
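# Sketch of the newline-separated form. This assumes the dotenv loader in use
# accepts quoted multi-line values, which is not confirmed here; the comma form
# above is the safer default.
# LEVER_COMPANIES="netflix
# figma"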
# =============================================================================
# Workday (public career sites) - optional
# =============================================================================
# Newline- or comma-separated entries. Each entry is either:
#   1) A career-site URL that is auto-parsed, e.g.
#      https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite
#   2) A JSON object with explicit fields:
#      {"company":"NVIDIA","tenantUrl":"https://nvidia.wd5.myworkdayjobs.com","tenant":"nvidia","site":"NVIDIAExternalCareerSite","locale":"en-US"}
# WORKDAY_TENANTS=
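# Sketch of the URL form, reusing the example URL above. Judging by the JSON
# form, that URL presumably maps to tenant "nvidia", locale "en-US", and site
# "NVIDIAExternalCareerSite"; the exact parsing rules are an assumption.
# WORKDAY_TENANTS=https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite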