Job Ops
Automated job discovery -> AI suitability scoring -> tailored resume PDFs -> a dashboard to review/apply (with optional Notion sync).
How it works (pipeline)
- Crawl: job-extractor (Crawlee + Playwright + Camoufox) visits Gradcracker search pages, opens each job page, extracts structured fields + the job description, and captures the real application URL by clicking the apply button (skipped for already-known jobs).
- Import + dedupe: orchestrator reads the Crawlee dataset (job-extractor/storage/datasets/default/*.json) and inserts new jobs into SQLite (jobs.job_url is unique).
- Score: orchestrator scores up to 50 unprocessed jobs via OpenRouter (cached as suitabilityScore/suitabilityReason).
- Select: take the top N jobs above minSuitabilityScore.
- Process: for each selected job:
  - generate a tailored resume summary via OpenRouter (stored on the job)
  - generate a PDF by injecting the summary into resume-generator/base.json, writing a temp resume JSON, then running resume-generator/rxresume_automation.py (Playwright automates rxresu.me import -> export)
- Review/apply: the React dashboard shows job status, score, links, and PDFs; clicking Mark Applied optionally creates a Notion page.
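As a rough illustration of the import + dedupe step, the sketch below uses Python's built-in sqlite3 module to show how a UNIQUE constraint on job_url lets a repeated import skip jobs that are already stored. The table layout, the import_jobs helper, and the sample records are assumptions made for illustration; the orchestrator itself is Node/TS and its real schema and queries may differ.

```python
import sqlite3


def import_jobs(conn, jobs):
    """Insert crawled jobs and return how many were new (hypothetical helper)."""
    inserted = 0
    for job in jobs:
        cur = conn.execute(
            # OR IGNORE skips rows that would violate the UNIQUE constraint.
            "INSERT OR IGNORE INTO jobs (job_url, title, employer) VALUES (?, ?, ?)",
            (job["jobUrl"], job["title"], job["employer"]),
        )
        inserted += cur.rowcount  # 1 for a new row, 0 for an ignored duplicate
    conn.commit()
    return inserted


# Assumed minimal schema: only the columns needed to show deduplication.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs ("
    "id INTEGER PRIMARY KEY, job_url TEXT NOT NULL UNIQUE, title TEXT, employer TEXT)"
)

# Made-up sample records standing in for the Crawlee dataset.
batch = [
    {"jobUrl": "https://example.com/jobs/1", "title": "Graduate Engineer", "employer": "Acme"},
    {"jobUrl": "https://example.com/jobs/2", "title": "Junior Developer", "employer": "Globex"},
]

first_run = import_jobs(conn, batch)   # both rows are new
second_run = import_jobs(conn, batch)  # both rows already exist
total_rows = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
```

Letting the database enforce uniqueness keeps the import idempotent, so re-reading the same dataset after a repeated crawl should not create duplicate rows.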
Live progress is streamed to the UI via Server-Sent Events at GET /api/pipeline/progress (the crawler emits stdout lines prefixed with JOBOPS_PROGRESS; the orchestrator forwards them).
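The forwarding described above can be sketched roughly as follows. Only the JOBOPS_PROGRESS prefix and the use of Server-Sent Events come from this README; the JSON payload after the prefix, the field names, and both helper functions are assumptions for illustration, and the real orchestrator is written in Node/TS.

```python
import json

PREFIX = "JOBOPS_PROGRESS"


def parse_progress_line(line):
    """Return the decoded payload of a progress line, or None for ordinary log output.

    Assumes the text after the prefix is a JSON object (not confirmed by the README).
    """
    if not line.startswith(PREFIX):
        return None
    payload = line[len(PREFIX):].strip()
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None


def to_sse_frame(event):
    """Format one event as an SSE frame: a 'data:' line followed by a blank line."""
    return f"data: {json.dumps(event)}\n\n"


# Made-up crawler output: one progress line and one ordinary log line.
stdout_lines = [
    'JOBOPS_PROGRESS {"phase": "crawl", "jobsFound": 3}',
    "INFO fetching next list page",
]
frames = [
    to_sse_frame(event)
    for event in map(parse_progress_line, stdout_lines)
    if event is not None
]
```

Filtering on the prefix means ordinary crawler logs can share stdout with progress events without reaching the UI.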
Architecture (Mermaid)
flowchart LR
subgraph UI[User Interface]
DASH[React Dashboard]
end
subgraph ORCH[Orchestrator (Node/TS)]
API[Express API<br/>/api/*]
PIPE[Pipeline Runner]
DB[(SQLite<br/>jobs.db)]
PDFS[(PDFs<br/>pdfs/)]
end
subgraph CRAWL[job-extractor (Crawlee/Playwright)]
C1[Seed search URLs<br/>(locations x roles)]
C2[Parse list pages<br/>enqueue job pages]
C3[Parse job pages<br/>extract JD + apply URL]
DS[(Crawlee dataset<br/>storage/datasets/default)]
end
subgraph EXT[External Services]
GC[Gradcracker]
OR[OpenRouter]
RX[rxresu.me]
NO[Notion (optional)]
N8N[n8n / cron (optional)]
end
N8N -->|POST /api/webhook/trigger| API
DASH -->|REST| API
API -->|REST JSON| DASH
DASH -->|SSE connect| API
API -->|progress events| DASH
API -->|run pipeline| PIPE
PIPE -->|spawn| CRAWL
C1 --> GC
C2 --> GC
C3 --> GC
CRAWL --> DS
API -->|read| DS
API --> DB
PIPE -->|score + summary| OR
PIPE -->|spawn python| RX
RX -->|export| PDFS
API -->|serve /pdfs/*| PDFS
API -->|create page| NO
Repo layout
job-ops/
  orchestrator/          # Express API + React dashboard + pipeline
    src/server/          # API routes, pipeline, DB, services
    src/client/          # UI (polls jobs, listens to SSE progress)
    src/shared/          # shared types (Job, PipelineRun, etc.)
  job-extractor/         # Crawlee crawler (Gradcracker)
  resume-generator/      # Python Playwright automation for rxresu.me
    base.json            # your exported base resume (template)
  data/                  # persisted runtime artifacts (Docker default)
    jobs.db              # SQLite database
    pdfs/                # generated PDFs (resume_<jobId>.pdf)
  docker-compose.yml     # single-container deployment
  Dockerfile             # builds orchestrator + installs browsers
Data model (SQLite)
- jobs
  - from crawl: title, employer, jobUrl, applicationLink, deadline, salary, location, jobDescription, etc.
  - enrichments: status (discovered -> processing -> ready -> applied/rejected), suitabilityScore, suitabilityReason, tailoredSummary, pdfPath, notionPageId
- pipeline_runs: audit log of runs (running/completed/failed, counts, error)
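The status flow listed for jobs can be read as a small state machine. The sketch below encodes exactly the chain written above and nothing more; treating it as strict (for example, no rejecting straight from discovered and no reverting after a failed run) is an assumption, and the ALLOWED_TRANSITIONS table and can_transition helper are hypothetical names, not part of the codebase.

```python
# Transitions taken literally from: discovered -> processing -> ready -> applied/rejected.
# Assumption: no other moves are allowed; the real app may be more permissive.
ALLOWED_TRANSITIONS = {
    "discovered": {"processing"},
    "processing": {"ready"},
    "ready": {"applied", "rejected"},
    "applied": set(),
    "rejected": set(),
}


def can_transition(current, new):
    """Return True if a job may move from status `current` to status `new`."""
    return new in ALLOWED_TRANSITIONS.get(current, set())


def advance(job, new_status):
    """Return a copy of the job with the new status, or raise on an invalid move."""
    if not can_transition(job["status"], new_status):
        raise ValueError(f"cannot move job from {job['status']} to {new_status}")
    return {**job, "status": new_status}


job = {"id": 1, "status": "discovered"}
job = advance(job, "processing")
job = advance(job, "ready")
```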
Running (Docker)
- Create .env at repo root (cp .env.example .env) and set:
  - OPENROUTER_API_KEY
  - RXRESUME_EMAIL, RXRESUME_PASSWORD
  - optional: NOTION_API_KEY, NOTION_DATABASE_ID, WEBHOOK_SECRET
- Put your exported RXResume JSON at resume-generator/base.json.
- Start: docker compose up -d --build
- Open:
  - Dashboard/UI: http://localhost:3005
  - API: http://localhost:3005/api
  - Health: http://localhost:3005/health
Persistent data lives in ./data (bind-mounted into the container).
Running (local dev)
Prereqs: Node 20+, Python 3.10+, Playwright browsers (Firefox).
Install Node deps (both packages):
cd orchestrator && npm install
cd ../job-extractor && npm install
Configure the orchestrator env + DB:
cd ../orchestrator
cp .env.example .env
npm run db:migrate
npm run dev
Set up the resume generator (used for PDF export):
cd ../resume-generator
python -m venv .venv
# Windows PowerShell:
.\.venv\Scripts\Activate.ps1
# macOS/Linux:
# source .venv/bin/activate
pip install playwright
python -m playwright install firefox
If you're on Windows, set PYTHON_PATH in orchestrator/.env to your venv python (e.g. ..\resume-generator\.venv\Scripts\python.exe) or use Docker/WSL.
Dev URLs:
- API: http://localhost:3001/api
- UI (Vite): http://localhost:5173
Key endpoints
- Jobs: GET /api/jobs, POST /api/jobs/:id/process, POST /api/jobs/:id/apply, POST /api/jobs/:id/reject, POST /api/jobs/process-discovered
- Pipeline: POST /api/pipeline/run, GET /api/pipeline/status, GET /api/pipeline/progress (SSE)
- Webhook: POST /api/webhook/trigger (optional auth via WEBHOOK_SECRET)
- Ops: DELETE /api/database (wipes DB)
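For scheduled runs (n8n or cron), a trigger call might look something like the sketch below, which only builds the request and does not send it. The endpoint path and the Docker port come from this README; how WEBHOOK_SECRET is presented to the server is not documented here, so the Authorization: Bearer header is purely an assumption (check the orchestrator's webhook route for the real scheme), and build_trigger_request is a hypothetical helper.

```python
import urllib.request


def build_trigger_request(base_url, secret=None):
    """Build (but do not send) a POST to the webhook trigger endpoint."""
    req = urllib.request.Request(
        f"{base_url}/api/webhook/trigger",
        data=b"",       # empty body; the endpoint's expected body is not documented
        method="POST",
    )
    if secret:
        # ASSUMPTION: bearer-token auth. The README only says auth is
        # "optional via WEBHOOK_SECRET", not how the secret is passed.
        req.add_header("Authorization", f"Bearer {secret}")
    return req


req = build_trigger_request("http://localhost:3005", secret="change-me")
# To actually fire the trigger against a running instance:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status)
```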
Notes / sharp edges
- Crawl targets: edit job-extractor/src/main.ts to change the Gradcracker location/role matrix.
- Notion sync is schema-dependent: orchestrator/src/server/services/notion.ts assumes property names; adjust to match your Notion database.
- Pipeline config knobs: POST /api/pipeline/run accepts { topN, minSuitabilityScore }; PIPELINE_TOP_N / PIPELINE_MIN_SCORE are used by npm run pipeline:run (CLI runner).
- Anti-bot reality: crawling is headless + "humanized", but sites can still block; expect occasional flakiness.
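To make the two pipeline knobs concrete, here is a minimal sketch of how topN and minSuitabilityScore could shape the selection step. The environment variable names are from this README; the default values (10 and 50), the choice of >= for "above", and the resolve_config and select_jobs helpers are assumptions for illustration rather than the orchestrator's actual behavior.

```python
def resolve_config(env, default_top_n=10, default_min_score=50):
    """Read the CLI-style knobs from an env mapping; defaults here are made up."""
    return {
        "topN": int(env.get("PIPELINE_TOP_N", default_top_n)),
        "minSuitabilityScore": float(env.get("PIPELINE_MIN_SCORE", default_min_score)),
    }


def select_jobs(jobs, top_n, min_score):
    """Keep scored jobs at or above min_score, best first, capped at top_n."""
    eligible = [
        job for job in jobs
        if job.get("suitabilityScore") is not None
        and job["suitabilityScore"] >= min_score  # assumption: threshold is inclusive
    ]
    eligible.sort(key=lambda job: job["suitabilityScore"], reverse=True)
    return eligible[:top_n]


# Made-up jobs; an unscored job (None) is never selected.
jobs = [
    {"id": 1, "suitabilityScore": 82},
    {"id": 2, "suitabilityScore": 45},
    {"id": 3, "suitabilityScore": None},
    {"id": 4, "suitabilityScore": 91},
    {"id": 5, "suitabilityScore": 70},
]

config = resolve_config({"PIPELINE_TOP_N": "2"})
selected = select_jobs(jobs, config["topN"], config["minSuitabilityScore"])
selected_ids = [job["id"] for job in selected]
```

With these sample values only jobs 4 and 1 are selected: job 5 clears the threshold but falls outside the top two.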
License
MIT