26 Commits

Author SHA1 Message Date
7b3dfb002a feat(extractors): add 17 job source extractors and cross-source dedup
Adds extractor packages: arbeitnow, ashby, careerjet, fourdayweek,
greenhouse, himalayas, jobicy, jooble, lever, reed, remoteok, remotive,
themuse, usajobs, weworkremotely, workday — each with manifest, package
metadata and README.

Pipeline / shared:
- shared/job-fingerprint: stable hash for cross-source dedup, with tests
- discover-jobs: dedup via fingerprint and richer per-source merging
- jobs repository: fingerprint-aware upsert / lookup
- settings-registry, settings types/routes, demo-defaults: knobs for the
  new sources
- shared extractors index: register the new manifests
- location-support, profiles route: small fixes for the new sources
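
A minimal sketch of what a stable cross-source fingerprint could look like, assuming the hash is built from a normalized title, employer, and location. The names `JobLike`, `normalize`, `jobFingerprint`, and `dedupeByFingerprint` are hypothetical and are not taken from shared/job-fingerprint; the real module may use different fields and a different hash.

```typescript
import { createHash } from "node:crypto";

// Hypothetical job shape for illustration; the real model likely has more fields.
interface JobLike {
  title: string;
  employer: string;
  location?: string | null;
}

// Strip accents and punctuation, lowercase, and collapse whitespace so that
// cosmetic differences between sources do not change the hash.
function normalize(value: string | null | undefined): string {
  return (value ?? "")
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// Assumption: title + employer + location is enough to identify a posting.
function jobFingerprint(job: JobLike): string {
  const key = [job.title, job.employer, job.location].map(normalize).join("|");
  return createHash("sha256").update(key).digest("hex");
}

// Keep the first job seen for each fingerprint, preserving input order.
function dedupeByFingerprint<T extends JobLike>(jobs: T[]): T[] {
  const seen = new Map<string, T>();
  for (const job of jobs) {
    const fp = jobFingerprint(job);
    if (!seen.has(fp)) seen.set(fp, job);
  }
  return Array.from(seen.values());
}
```

Under these assumptions, the same posting scraped from two sources with different casing or punctuation maps to one fingerprint, so an upsert keyed on it would update the existing row instead of inserting a duplicate.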

Tooling:
- scripts/smoke-extractors.ts to sanity-check each source locally
- scripts/jobber-cron-{cherepaha,dobkin}.env.example: per-host cron
  templates (CHANGEME placeholders only)
- .env.example: documented env vars for the new extractors
- .gitignore: ignore extractors/*/storage/ runtime caches (was ukvisajobs only)

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 20:17:52 -04:00
fea00ae656 feat: search profiles, cover letters, discovery fixes
- Add search profiles (DB, API, settings UI) and wire into scorer/pipeline search terms.
- Add cover letter generation (service, job action, JobDetail UI).
- Align JobSpy Indeed country with country-level search geography when settings conflict; warn in logs.
- Infer country from search cities via inferCountryKeyFromSearchGeography (shared).
- Ignore extractor venv/storage and local data in Biome; ignore orchestrator/storage and JobSpy .venv in git.
- Vite: do not watch orchestrator/storage (prevents reloads during startup.jobs pipeline).
- JobSpy: document Python 3.10+ and venv setup in README/requirements.
- Onboarding and settings: local resume path handling, orchestrator .env.example for Vite.

Made-with: Cursor
2026-04-05 19:35:14 -04:00
Ryan Foote
0b22c08d7d
feat: add support for indicating workplaceTypes (#296) 2026-03-21 20:43:43 +00:00
0x1355
7f517776df
fix: auto-detect jobspy venv so contributors don't need PYTHON_PATH (#293) 2026-03-20 08:18:27 +00:00
Shaheer Sarfaraz
71e640b563
Add startup.jobs extractor support (#279)
* Add startup.jobs extractor support

* Harden startup.jobs extractor inputs

* Wire startupjobs into Docker and CI

* Tighten startupjobs review follow-ups

* fix: publish ghcr during release workflow

* feat: add startupjobs max jobs configuration and update related tests
2026-03-17 12:20:45 +00:00
DaKheera47
11d1e9820b Update license 2026-03-10 15:46:57 +00:00
Shaheer Sarfaraz
3da5ea35b4
Deduplicate shared helpers and enforce aliased imports (#228)
* Deduplicate string cleanup helpers and not-found responses

* Enforce aliased imports for infra and shared modules

* Enforce @client/@server aliases for deep relative imports

* Deduplicate visa sponsor and location filter definitions

* Use shared city filter export in extractor location checks
2026-02-22 16:13:52 +00:00
Shaheer Sarfaraz
82e142a8a8
Auto-Registering Extractor System (#223)
* initial commit?

* Address PR feedback on extractor discovery and startup resilience

* Address latest PR review comments

* fix city resolution fallback when input parses empty

* address PR feedback on extractor registry and pipeline validation

* address copilot comments on manifests and registry startup

* fix extractor discovery export handling and env isolation in tests

* enforce duplicate manifest id failures in strict mode

* Fix remaining extractor registry and runtime review comments

* docs

* docs

* test all, logic remains in extractors

* Address PR review feedback on extractor registry and validation

* Revert extractor moduleResolution to bundler

* Enforce shared city filtering across all discovery sources

* Deduplicate extractor strict city post-filtering
2026-02-21 17:44:07 +00:00
Shaheer Sarfaraz
19266fe5eb
City search (#217)
* wave 1, jobspy only

* combine usa/ca to united states

* strict city location filter

* hide and show based on focus

* UI changes

* allow clicking cross!

* pill animate in

* animate out, uggo fix

* animate out

* framer motion

* animate component height

* adzuna

* hiring cafe implementation

* refactor: centralize shared search-city parsing and matching

* feat: migrate city setting to searchCities with legacy fallback

* docs: update pipeline and extractor city-search wording

* fix(orchestrator): normalize tokenized paste behavior

* fix(shared): tighten city matching semantics

* docs(extractors): document city-location knobs and geocoding note
2026-02-21 00:42:09 +00:00
Shaheer Sarfaraz
eed5c2adba
Gemini api key issue (#204)
* uggo ternary fix

* fix ai studio url

* service returns a 403 if unauthed

* pass validation correctly

* fix response format

* Update orchestrator/src/client/pages/settings/utils.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix nested ternaries client

* server fix

* Address PR #204 review feedback and stabilize CI

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-20 00:01:34 +00:00
Shaheer Sarfaraz
d34a9f041b
Hiring cafe extractor (#192)
* feat(hiringcafe): register new source across shared/server/client enums

* feat(hiringcafe-extractor): add browser-backed Hiring Cafe dataset extractor

* feat(orchestrator): integrate Hiring Cafe discovery service into pipeline

* feat(orchestrator-ui): add Hiring Cafe to source availability and run estimates

* chore(hiringcafe): wire CI/docker and add extractor documentation

* chore(format): apply biome formatting for Hiring Cafe integration

* add original websites

* comments

* number or null
2026-02-19 12:51:55 +00:00
Shaheer Sarfaraz
c5c6675f04
feat: add Adzuna extractor with orchestrator integration (#177)
* feat(settings): add adzuna source fields and country compatibility

* feat(discovery): integrate adzuna extractor into pipeline

* feat(client): wire adzuna in source selection and run budgeting

* docs(extractors): add adzuna guide and configuration notes

* chore(workspaces): register adzuna extractor in lockfile

* fix(adzuna): run extractor via npm script instead of npx

* fix(adzuna): execute extractor via node+tsx without shell

* fix(adzuna): prefer npm run start without shell, fallback to tsx

* fix(docker): include adzuna extractor workspace in image

* chore(adzuna): reuse shared type-conversion utilities

* type-check adzuna

* formatting

* deeedooop

* better instructions
2026-02-17 16:49:42 +00:00
Shaheer Sarfaraz
4e1ea28301
Enable Glassdoor as a JobSpy source (#126)
* feat(shared): add glassdoor to job source model

* feat(jobspy): support glassdoor site in scraper and discovery

* feat(pipeline): include glassdoor in source selection and API schema

* feat(ui): add glassdoor toggle to jobspy settings and run estimates

* test/docs: cover glassdoor jobspy integration end-to-end

* fix(jobspy): make glassdoor always-on without settings toggle

* fix(jobspy): fallback glassdoor when location is country-level

* refactor(jobspy): drop direct pandas usage in wrapper

* feat(pipeline): gate glassdoor by supported countries

* fix(jobspy): restore pandas output and keep glassdoor disable copy

* fix(jobspy): map country-level glassdoor searches to city fallbacks

* feat(ui): require glassdoor city for country-level runs
2026-02-10 17:57:49 +00:00
Shaheer Sarfaraz
a409aa5ee0
Live scraping updates in pipeline UI (#100)
* initial commit

* fix clear script

* cancelling pipelines

* formatting
2026-02-07 22:44:00 +00:00
Shaheer Sarfaraz
b94f85b149
Reduce low risk duplication (#79)
* clean up helpers

* shared in its own top level folder

* workspaces setup

* build fix

* disable workspaces?

* run ci

* rename job-flow to gradcracker

* optional dependencies

* formatting?

* more optional modules

* allow post install runs

* node bump

* remove post install

* add optionals

* add more

* formatting

* comments, but im unsure

* run typescript DIRECTLY

* better build

* camoufox simplification

* lint

* build process doesn't exist

* build fix

* lockfile

* type check everything, build only for client

* rename steps correctly

* import from package!

* fix formatting

* don't fetch twice

* fix concern
2026-02-02 21:30:14 +00:00
Devin Collins
65952259ce
feat: add remote jobs filter for JobSpy (#70)
* feat(types): add jobspyIsRemote to TypeScript type definitions

- Add jobspyIsRemote boolean fields to AppSettings interface
- Follow three-field pattern: value, default, override
- Update test fixture with default values (false)
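
The three-field pattern above could be sketched roughly as follows, assuming the default comes from the JOBSPY_IS_REMOTE environment variable, the override is the stored user setting (null when unset), and the effective value prefers the override. The names `BooleanSetting`, `parseEnvFlag`, and `resolveIsRemote` are hypothetical and do not come from the repository.

```typescript
// Hypothetical three-field setting: effective value, default, and user override.
interface BooleanSetting {
  value: boolean; // what the pipeline would actually use
  default: boolean; // derived from the environment
  override: boolean | null; // stored user choice, null when unset
}

// Assumption: "1" or "true" enables the flag; anything else (including unset) disables it.
function parseEnvFlag(raw: string | undefined): boolean {
  const v = (raw ?? "").trim().toLowerCase();
  return v === "1" || v === "true";
}

// The override wins when present; otherwise fall back to the env-derived default.
function resolveIsRemote(
  override: boolean | null,
  env: Record<string, string | undefined>,
): BooleanSetting {
  const def = parseEnvFlag(env.JOBSPY_IS_REMOTE);
  return { value: override ?? def, default: def, override };
}
```

Keeping all three fields lets a settings UI show both the current choice and what it would revert to if the override were cleared.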

* feat(validation): add jobspyIsRemote validation schema

Add Zod validation schema for jobspyIsRemote boolean setting to ensure type safety in the settings API endpoint.

* feat(db): add jobspyIsRemote to database repository setting keys

* feat(api): add jobspyIsRemote storage to settings API route

* feat(service): add jobspyIsRemote to settings service with environment variable support

* feat(jobspy): add isRemote parameter to JobSpy service interface

* feat(pipeline): pass isRemote setting to JobSpy service

* feat(python): add is_remote parameter to JobSpy scraper script

* feat(ui): add Remote Jobs checkbox to JobSpy settings

* feat(ui): add Remote badge to job display

- Display Remote badge when job.isRemote === true
- Position badge next to Source badge in JobHeader
- Use Badge component with outline variant
- Badge does not display when isRemote is false or null

* docs(env): add JOBSPY_IS_REMOTE environment variable documentation

- Added JobSpy section to .env.example with JOBSPY_IS_REMOTE variable
- Documents remote-only job filtering option with default value of 0 (disabled)
- Follows existing .env.example format with clear section headers and descriptions

* test(remote-jobs): verify end-to-end functionality with comprehensive feedback loops
2026-01-31 16:48:17 +00:00
DaKheera47
4ffaf06b1d in extractors 2026-01-25 13:34:16 +00:00
DaKheera47
2cf9249159 gradcracker limits 2026-01-15 19:17:23 +00:00
DaKheera47
bdae9d13cc ukvisajobs api search results come in after login 2026-01-15 18:00:00 +00:00
DaKheera47
b914026d8b file saved as name 2026-01-08 23:54:06 +00:00
DaKheera47
2b2af06bb8 autologin for ukvisajobs 2026-01-07 23:53:01 +00:00
DaKheera47
572cb1d42d keywords can be set from UI 2025-12-26 22:25:55 +00:00
DaKheera47
bd7baafbec passing max ukvisajobs 2025-12-26 20:47:28 +00:00
DaKheera47
0f36d9b8a6 initial implementation 2025-12-26 20:17:05 +00:00
DaKheera47
deb30efa44 better default search terms for me 2025-12-17 16:36:00 +00:00
DaKheera47
d24f71ab3d rename extractors to their own folder 2025-12-14 22:44:37 +00:00