31 Commits

Author SHA1 Message Date
03d293699a feat(qajobsboard): fetch job detail pages for concrete location text
The jobs.json feed often labels roles Remote/Worldwide while the public
job page JSON-LD and description include the real city (e.g. Mumbai/Nagpur).
Enrich vague rows by reading each QAJobsBoard detail URL before import.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-16 17:20:53 -04:00
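
A minimal sketch of the enrichment idea in the commit above: read the schema.org `JobPosting` JSON-LD from a detail page and build a concrete location string. The function name `extractJsonLdLocation` and the types are hypothetical, not the repository's actual code, and the sketch assumes the page embeds its JSON-LD in a `<script type="application/ld+json">` tag.

```typescript
// Hypothetical helper: names and shapes are illustrative, not taken from the repository.
interface PostalAddress {
  addressLocality?: string;
  addressRegion?: string;
  addressCountry?: string | { name?: string };
}

// Return the first concrete location found in JobPosting JSON-LD, or null if none exists.
export function extractJsonLdLocation(html: string): string | null {
  const scriptPattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;

  for (const match of html.matchAll(scriptPattern)) {
    let data: unknown;
    try {
      data = JSON.parse(match[1] ?? "");
    } catch {
      continue; // Skip malformed JSON-LD blocks instead of failing the whole row.
    }

    const nodes = Array.isArray(data) ? data : [data];
    for (const node of nodes) {
      if (!node || typeof node !== "object") continue;
      const posting = node as { "@type"?: unknown; jobLocation?: unknown };
      if (posting["@type"] !== "JobPosting") continue;

      // jobLocation may be a single Place or an array of Places.
      const places = Array.isArray(posting.jobLocation)
        ? posting.jobLocation
        : [posting.jobLocation];

      for (const place of places) {
        const address = (place as { address?: PostalAddress } | null | undefined)?.address;
        if (!address || typeof address !== "object") continue;

        const country =
          typeof address.addressCountry === "string"
            ? address.addressCountry
            : address.addressCountry?.name;

        const parts = [address.addressLocality, address.addressRegion, country]
          .filter((part): part is string => typeof part === "string" && part.trim() !== "")
          .map((part) => part.trim());

        if (parts.length > 0) return parts.join(", ");
      }
    }
  }
  return null;
}
```

A caller would presumably run this only for rows whose feed location is vague (for example Remote/Worldwide) and keep the feed value when the function returns `null`.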
0a63316100 fix(discovery): block countries in vague locations via job description
QAJobsBoard and similar feeds often store Worldwide/Remote while the real
country is only in the description. Scan title and description when location
is vague, and prefer concrete locations from QAJobsBoard postings.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-16 17:15:18 -04:00
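
The rule described in the commit above can be sketched roughly as follows: when the location is vague, the title and description are scanned for blocked countries as well. The `JobRow` shape, the vague-token list, and `isBlockedByCountry` are assumptions for illustration only; the real filter may match differently.

```typescript
// Hypothetical row shape; the repository's job type is likely richer.
interface JobRow {
  title: string;
  location: string | null;
  description: string | null;
}

// Assumed set of tokens that carry no real geography.
const VAGUE_TOKENS = new Set(["remote", "worldwide", "anywhere", "global"]);

// A location is vague when it is empty or every token in it is a vague token.
function isVagueLocation(location: string): boolean {
  const tokens = location
    .toLowerCase()
    .split(/[\/,|]+/)
    .map((token) => token.trim())
    .filter((token) => token !== "");
  return tokens.every((token) => VAGUE_TOKENS.has(token));
}

// Case-insensitive whole-word match, with regex metacharacters escaped.
function mentionsCountry(text: string, country: string): boolean {
  const name = country.trim();
  if (name === "") return false;
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped}\\b`, "i").test(text);
}

export function isBlockedByCountry(job: JobRow, blockedCountries: string[]): boolean {
  const location = (job.location ?? "").trim();
  // Only widen the search to title and description when the location is vague.
  const haystacks = isVagueLocation(location)
    ? [location, job.title, job.description ?? ""]
    : [location];
  return blockedCountries.some((country) =>
    haystacks.some((text) => mentionsCountry(text, country)),
  );
}
```

Keeping the wider scan behind the vague-location check avoids dropping a job with a concrete location just because its description mentions an office elsewhere.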
2e44a131e1 fix(jobs): treat isRemote as 100% remote only; tighten cron for Canada QA
Reject hybrid or partial-office postings at ingest so the Remote badge and
filters match fully remote roles. Cron can PATCH search geography, remote-only
workplace types, and QA search terms before each scheduled pipeline run.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-16 15:53:55 -04:00
f5179304c1 feat(discovery): blocked countries filter and smoke subprocess fixes
Add blockedCountries in Settings so pipeline discovery drops jobs whose
location mentions listed countries (existing discovered rows are kept).
Document the feature, fix smoke tsconfig inheritance for nested extractors,
and run smoke via an absolute-tsconfig wrapper.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-16 11:41:29 -04:00
c840f289e1 feat(extractors): expand catalog, smoke coverage, and sourcing docs
Adds Arc.dev, BC T-Net, Eluta, iCIMS tenants, QAJobsBoard, and SmartRecruiters
manifests with registry/settings/UI wiring; registers full extractor list in
smoke-extractors and documents supplementary board access paths. Aligns Careerjet
v4 with the url query parameter and fixes strict typing in QAJobsBoard.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-15 22:36:23 -04:00
7b3dfb002a feat(extractors): add 17 job source extractors and cross-source dedup
Adds extractor packages: arbeitnow, ashby, careerjet, fourdayweek,
greenhouse, himalayas, jobicy, jooble, lever, reed, remoteok, remotive,
themuse, usajobs, weworkremotely, workday — each with manifest, package
metadata and README.

Pipeline / shared:
- shared/job-fingerprint: stable hash for cross-source dedup, with tests
- discover-jobs: dedup via fingerprint and richer per-source merging
- jobs repository: fingerprint-aware upsert / lookup
- settings-registry, settings types/routes, demo-defaults: knobs for the
  new sources
- shared extractors index: register the new manifests
- location-support, profiles route: small fixes for the new sources

Tooling:
- scripts/smoke-extractors.ts to sanity-check each source locally
- scripts/jobber-cron-{cherepaha,dobkin}.env.example: per-host cron
  templates (CHANGEME placeholders only)
- .env.example: documented env vars for the new extractors
- .gitignore: ignore extractors/*/storage/ runtime caches (was ukvisajobs only)

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 20:17:52 -04:00
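
To illustrate the cross-source dedup idea in the commit above, here is a small sketch of a stable job fingerprint. The commit does not say which fields or hash `shared/job-fingerprint` uses, so the fields (title, employer, location), the normalization, and the FNV-1a hash below are all stand-in assumptions.

```typescript
// Hypothetical identity fields; the real fingerprint may use different ones.
interface JobIdentity {
  title: string;
  employer: string;
  location?: string | null;
}

// Lowercase, strip punctuation, and collapse whitespace so cosmetic differences vanish.
function normalize(value: string | null | undefined): string {
  return (value ?? "")
    .normalize("NFKD")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// 32-bit FNV-1a, used here only as a dependency-free illustrative hash.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

export function jobFingerprint(job: JobIdentity): string {
  const key = [normalize(job.title), normalize(job.employer), normalize(job.location)].join("|");
  return fnv1a(key);
}

// Keep the first job seen for each fingerprint, regardless of which source produced it.
export function dedupeByFingerprint<T extends JobIdentity>(jobs: T[]): T[] {
  const seen = new Map<string, T>();
  for (const job of jobs) {
    const fingerprint = jobFingerprint(job);
    if (!seen.has(fingerprint)) seen.set(fingerprint, job);
  }
  return [...seen.values()];
}
```

A production version would more likely use a cryptographic hash and merge fields from duplicates rather than keeping only the first row, as the "richer per-source merging" item suggests.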
fea00ae656 feat: search profiles, cover letters, discovery fixes
- Add search profiles (DB, API, settings UI) and wire into scorer/pipeline search terms.
- Add cover letter generation (service, job action, JobDetail UI).
- Align JobSpy Indeed country with country-level search geography when settings conflict; warn in logs.
- Infer country from search cities via inferCountryKeyFromSearchGeography (shared).
- Ignore extractor venv/storage and local data in Biome; ignore orchestrator/storage and JobSpy .venv in git.
- Vite: do not watch orchestrator/storage (prevents reloads during startup.jobs pipeline).
- JobSpy: document Python 3.10+ and venv setup in README/requirements.
- Onboarding and settings: local resume path handling, orchestrator .env.example for Vite.

Made-with: Cursor
2026-04-05 19:35:14 -04:00
Ryan Foote
0b22c08d7d
feat: add support for indicating workplaceTypes (#296) 2026-03-21 20:43:43 +00:00
0x1355
7f517776df
fix: auto-detect jobspy venv so contributors don't need PYTHON_PATH (#293) 2026-03-20 08:18:27 +00:00
Shaheer Sarfaraz
71e640b563
Add startup.jobs extractor support (#279)
* Add startup.jobs extractor support

* Harden startup.jobs extractor inputs

* Wire startupjobs into Docker and CI

* Tighten startupjobs review follow-ups

* fix: publish ghcr during release workflow

* feat: add startupjobs max jobs configuration and update related tests
2026-03-17 12:20:45 +00:00
DaKheera47
11d1e9820b Update license 2026-03-10 15:46:57 +00:00
Shaheer Sarfaraz
3da5ea35b4
Deduplicate shared helpers and enforce aliased imports (#228)
* Deduplicate string cleanup helpers and not-found responses

* Enforce aliased imports for infra and shared modules

* Enforce @client/@server aliases for deep relative imports

* Deduplicate visa sponsor and location filter definitions

* Use shared city filter export in extractor location checks
2026-02-22 16:13:52 +00:00
Shaheer Sarfaraz
82e142a8a8
Auto-Registering Extractor System (#223)
* initial commit?

* Address PR feedback on extractor discovery and startup resilience

* Address latest PR review comments

* fix city resolution fallback when input parses empty

* address PR feedback on extractor registry and pipeline validation

* address copilot comments on manifests and registry startup

* fix extractor discovery export handling and env isolation in tests

* enforce duplicate manifest id failures in strict mode

* Fix remaining extractor registry and runtime review comments

* docs

* docs

* test all, logic remains in extractors

* Address PR review feedback on extractor registry and validation

* Revert extractor moduleResolution to bundler

* Enforce shared city filtering across all discovery sources

* Deduplicate extractor strict city post-filtering
2026-02-21 17:44:07 +00:00
Shaheer Sarfaraz
19266fe5eb
City search (#217)
* wave 1, jobspy only

* combine usa/ca to united states

* strict city location filter

* hide and show based on focus

* UI changes

* allow clicking cross!

* pill animate in

* animate out, uggo fix

* animate out

* framer motion

* animate component height

* adzuna

* hiring cafe implementation

* refactor: centralize shared search-city parsing and matching

* feat: migrate city setting to searchCities with legacy fallback

* docs: update pipeline and extractor city-search wording

* fix(orchestrator): normalize tokenized paste behavior

* fix(shared): tighten city matching semantics

* docs(extractors): document city-location knobs and geocoding note
2026-02-21 00:42:09 +00:00
Shaheer Sarfaraz
eed5c2adba
Gemini api key issue (#204)
* uggo ternary fix

* fix ai studio url

* service returns a 403 if unauthed

* pass validation correctly

* fix response format

* Update orchestrator/src/client/pages/settings/utils.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix nested ternaries client

* server fix

* Address PR #204 review feedback and stabilize CI

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-20 00:01:34 +00:00
Shaheer Sarfaraz
d34a9f041b
Hiring cafe extractor (#192)
* feat(hiringcafe): register new source across shared/server/client enums

* feat(hiringcafe-extractor): add browser-backed Hiring Cafe dataset extractor

* feat(orchestrator): integrate Hiring Cafe discovery service into pipeline

* feat(orchestrator-ui): add Hiring Cafe to source availability and run estimates

* chore(hiringcafe): wire CI/docker and add extractor documentation

* chore(format): apply biome formatting for Hiring Cafe integration

* add original websites

* coomints

* number or null
2026-02-19 12:51:55 +00:00
Shaheer Sarfaraz
c5c6675f04
feat: add Adzuna extractor with orchestrator integration (#177)
* feat(settings): add adzuna source fields and country compatibility

* feat(discovery): integrate adzuna extractor into pipeline

* feat(client): wire adzuna in source selection and run budgeting

* docs(extractors): add adzuna guide and configuration notes

* chore(workspaces): register adzuna extractor in lockfile

* fix(adzuna): run extractor via npm script instead of npx

* fix(adzuna): execute extractor via node+tsx without shell

* fix(adzuna): prefer npm run start without shell, fallback to tsx

* fix(docker): include adzuna extractor workspace in image

* chore(adzuna): reuse shared type-conversion utilities

* type-check adzuna

* formatting

* deeedooop

* better instructions
2026-02-17 16:49:42 +00:00
Shaheer Sarfaraz
4e1ea28301
Enable Glassdoor as a JobSpy source (#126)
* feat(shared): add glassdoor to job source model

* feat(jobspy): support glassdoor site in scraper and discovery

* feat(pipeline): include glassdoor in source selection and API schema

* feat(ui): add glassdoor toggle to jobspy settings and run estimates

* test/docs: cover glassdoor jobspy integration end-to-end

* fix(jobspy): make glassdoor always-on without settings toggle

* fix(jobspy): fallback glassdoor when location is country-level

* refactor(jobspy): drop direct pandas usage in wrapper

* feat(pipeline): gate glassdoor by supported countries

* fix(jobspy): restore pandas output and keep glassdoor disable copy

* fix(jobspy): map country-level glassdoor searches to city fallbacks

* feat(ui): require glassdoor city for country-level runs
2026-02-10 17:57:49 +00:00
Shaheer Sarfaraz
a409aa5ee0
Live scraping updates in pipeline UI (#100)
* initial commit

* fix clear script

* cancelling pipelines

* formatting
2026-02-07 22:44:00 +00:00
Shaheer Sarfaraz
b94f85b149
Reduce low risk duplication (#79)
* clean up helpers

* shared in its own top level folder

* workspaces setup

* build fix

* disable workspaces?

* run ci

* rename job-flow to gradcracker

* optional dependencies

* formatting?

* more optional modules

* allow post install runs

* node bump

* remove post install

* add optionals

* add more

* formatting

* comments, but I'm unsure

* run typescript DIRECTLY

* better build

* camoufox simplification

* lint

* build process doesn't exist

* build fix

* lockfile

* type check everything, build only for client

* rename steps correctly

* import from package!

* fix formatting

* don't fetch twice

* fix concern
2026-02-02 21:30:14 +00:00
Devin Collins
65952259ce
feat: add remote jobs filter for JobSpy (#70)
* feat(types): add jobspyIsRemote to TypeScript type definitions

- Add jobspyIsRemote boolean fields to AppSettings interface
- Follow three-field pattern: value, default, override
- Update test fixture with default values (false)

* feat(validation): add jobspyIsRemote validation schema

Add Zod validation schema for jobspyIsRemote boolean setting to ensure type safety in the settings API endpoint.

* feat(db): add jobspyIsRemote to database repository setting keys

* feat(api): add jobspyIsRemote storage to settings API route

* feat(service): add jobspyIsRemote to settings service with environment variable support

* feat(jobspy): add isRemote parameter to JobSpy service interface

* feat(pipeline): pass isRemote setting to JobSpy service

* feat(python): add is_remote parameter to JobSpy scraper script

* feat(ui): add Remote Jobs checkbox to JobSpy settings

* feat(ui): add Remote badge to job display

- Display Remote badge when job.isRemote === true
- Position badge next to Source badge in JobHeader
- Use Badge component with outline variant
- Badge does not display when isRemote is false or null

* docs(env): add JOBSPY_IS_REMOTE environment variable documentation

- Added JobSpy section to .env.example with JOBSPY_IS_REMOTE variable
- Documents remote-only job filtering option with default value of 0 (disabled)
- Follows existing .env.example format with clear section headers and descriptions

* test(remote-jobs): verify end-to-end functionality with comprehensive feedback loops
2026-01-31 16:48:17 +00:00
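
The "three-field pattern: value, default, override" mentioned in #70 could look roughly like this. The type and function names are hypothetical, and the env parsing assumes `JOBSPY_IS_REMOTE` uses `1` for enabled and `0` (the documented default) for disabled.

```typescript
// Hypothetical shape for one setting exposed as value / default / override.
interface ResolvedSetting<T> {
  value: T; // what the pipeline actually uses
  default: T; // fallback, e.g. derived from an environment variable
  override: T | null; // user-supplied value stored in settings, if any
}

// Assumed env convention: "1" or "true" enables the flag, anything else disables it.
export function parseBooleanEnv(raw: string | undefined): boolean {
  if (raw === undefined) return false;
  const normalized = raw.trim().toLowerCase();
  return normalized === "1" || normalized === "true";
}

// The override wins when present; otherwise the default applies.
export function resolveSetting<T>(defaultValue: T, override: T | null): ResolvedSetting<T> {
  return {
    value: override ?? defaultValue,
    default: defaultValue,
    override,
  };
}
```

With this shape, an explicit `false` override still takes precedence over a `true` default, because only `null` falls through to the default.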
DaKheera47
4ffaf06b1d in extractors 2026-01-25 13:34:16 +00:00
DaKheera47
2cf9249159 gradcracker limits 2026-01-15 19:17:23 +00:00
DaKheera47
bdae9d13cc ukvisajobs api search results come in after login 2026-01-15 18:00:00 +00:00
DaKheera47
b914026d8b file saved as name 2026-01-08 23:54:06 +00:00
DaKheera47
2b2af06bb8 autologin for ukvisajobs 2026-01-07 23:53:01 +00:00
DaKheera47
572cb1d42d keywords can be set from UI 2025-12-26 22:25:55 +00:00
DaKheera47
bd7baafbec passing max ukvisajobs 2025-12-26 20:47:28 +00:00
DaKheera47
0f36d9b8a6 initial implementation 2025-12-26 20:17:05 +00:00
DaKheera47
deb30efa44 better default search terms for me 2025-12-17 16:36:00 +00:00
DaKheera47
d24f71ab3d rename extractors to their own folder 2025-12-14 22:44:37 +00:00