diff --git a/README.md b/README.md
index be9fb9a..a64953b 100644
--- a/README.md
+++ b/README.md
@@ -1,126 +1,166 @@
-# Job Ops 🚀
+# Job Ops
-Automated job discovery, scoring, and resume generation pipeline.
+Automated job discovery -> AI suitability scoring -> tailored resume PDFs -> a dashboard to review/apply (with optional Notion sync).
-## Features
+## How it works (pipeline)
-- **Job Crawler** - Discovers jobs from Gradcracker and other sources
-- **AI Scoring** - Ranks jobs by suitability using OpenRouter API
-- **Resume Generator** - Creates tailored PDFs via RXResume automation
-- **Dashboard UI** - React-based interface for reviewing and applying
+1. **Crawl**: `job-extractor` (Crawlee + Playwright + Camoufox) visits Gradcracker search pages, opens each job page, extracts structured fields + the job description, and captures the real application URL by clicking the apply button (skipped for already-known jobs).
+2. **Import + dedupe**: `orchestrator` reads the Crawlee dataset (`job-extractor/storage/datasets/default/*.json`) and inserts new jobs into SQLite (`jobs.job_url` is unique).
+3. **Score**: `orchestrator` scores up to 50 unprocessed jobs via OpenRouter (cached as `suitabilityScore`/`suitabilityReason`).
+4. **Select**: take the top `N` jobs above `minSuitabilityScore`.
+5. **Process**: for each selected job:
+   - generate a tailored resume summary via OpenRouter (stored on the job)
+   - generate a PDF by injecting the summary into `resume-generator/base.json`, writing a temp resume JSON, then running `resume-generator/rxresume_automation.py` (Playwright automates `rxresu.me` import -> export)
+6. **Review/apply**: the React dashboard shows job status, score, links, and PDFs; clicking `Mark Applied` optionally creates a Notion page.
-## Quick Start with Docker
+Live progress is streamed to the UI via Server-Sent Events at `GET /api/pipeline/progress` (the crawler emits stdout lines prefixed with `JOBOPS_PROGRESS`; the orchestrator forwards them).
-### 1. Configure Environment
+## Architecture (Mermaid)
-```bash
-# Copy the example env file
-cp .env.example .env
+```mermaid
+flowchart LR
+  subgraph UI[User Interface]
+    DASH[React Dashboard]
+  end
-# Edit with your credentials
-nano .env
+  subgraph ORCH[Orchestrator (Node/TS)]
+    API[Express API<br/>/api/*]
+    PIPE[Pipeline Runner]
+    DB[(SQLite<br/>jobs.db)]
+    PDFS[(PDFs<br/>pdfs/)]
+  end
+
+  subgraph CRAWL[job-extractor (Crawlee/Playwright)]
+    C1[Seed search URLs<br/>(locations x roles)]
+    C2[Parse list pages<br/>enqueue job pages]
+    C3[Parse job pages<br/>extract JD + apply URL]
+    DS[(Crawlee dataset<br/>storage/datasets/default)]
+  end
+
+  subgraph EXT[External Services]
+    GC[Gradcracker]
+    OR[OpenRouter]
+    RX[rxresu.me]
+    NO[Notion (optional)]
+    N8N[n8n / cron (optional)]
+  end
+
+  N8N -->|POST /api/webhook/trigger| API
+  DASH -->|REST| API
+  API -->|REST JSON| DASH
+  DASH -->|SSE connect| API
+  API -->|progress events| DASH
+
+  API -->|run pipeline| PIPE
+
+  PIPE -->|spawn| CRAWL
+  C1 --> GC
+  C2 --> GC
+  C3 --> GC
+  CRAWL --> DS
+  API -->|read| DS
+  API --> DB
+
+  PIPE -->|score + summary| OR
+  PIPE -->|spawn python| RX
+  RX -->|export| PDFS
+  API -->|serve /pdfs/*| PDFS
+
+  API -->|create page| NO
 ```
-Required environment variables:
-- `OPENROUTER_API_KEY` - Get from [openrouter.ai/keys](https://openrouter.ai/keys)
-- `RXRESUME_EMAIL` - Your [rxresu.me](https://rxresu.me) account email
-- `RXRESUME_PASSWORD` - Your RXResume password
+## Repo layout
-### 2. Add Your Base Resume
-
-Place your resume JSON at `resume-generator/base.json`.
-You can export this from RXResume.
-
-### 3. Run
-
-```bash
-# Build and start
-docker compose up -d
-
-# View logs
-docker compose logs -f
-
-# Stop
-docker compose down
+```
+job-ops/
+  orchestrator/        # Express API + React dashboard + pipeline
+    src/server/        # API routes, pipeline, DB, services
+    src/client/        # UI (polls jobs, listens to SSE progress)
+    src/shared/        # shared types (Job, PipelineRun, etc.)
+  job-extractor/       # Crawlee crawler (Gradcracker)
+  resume-generator/    # Python Playwright automation for rxresu.me
+    base.json          # your exported base resume (template)
+  data/                # persisted runtime artifacts (Docker default)
+    jobs.db            # SQLite database
+    pdfs/              # generated PDFs (resume_.pdf)
+  docker-compose.yml   # single-container deployment
+  Dockerfile           # builds orchestrator + installs browsers
 ```
-### 4. Access
+## Data model (SQLite)
-- **Dashboard**: http://localhost:3001
-- **API**: http://localhost:3001/api
-- **Health**: http://localhost:3001/health
+- `jobs`
+  - from crawl: `title`, `employer`, `jobUrl`, `applicationLink`, `deadline`, `salary`, `location`, `jobDescription`, etc.
+  - enrichments: `status` (`discovered` -> `processing` -> `ready` -> `applied`/`rejected`), `suitabilityScore`, `suitabilityReason`, `tailoredSummary`, `pdfPath`, `notionPageId`
+- `pipeline_runs`: audit log of runs (`running`/`completed`/`failed`, counts, error)
-## Data Persistence
+## Running (Docker)
-All data is stored in the `./data` directory:
-- `data/jobs.db` - SQLite database
-- `data/pdfs/` - Generated resume PDFs
+1. Create `.env` at repo root (`cp .env.example .env`) and set:
+   - `OPENROUTER_API_KEY`
+   - `RXRESUME_EMAIL`, `RXRESUME_PASSWORD`
+   - optional: `NOTION_API_KEY`, `NOTION_DATABASE_ID`, `WEBHOOK_SECRET`
+2. Put your exported RXResume JSON at `resume-generator/base.json`.
+3. Start: `docker compose up -d --build`
+4. Open:
+   - Dashboard/UI: `http://localhost:3005`
+   - API: `http://localhost:3005/api`
+   - Health: `http://localhost:3005/health`
-## Development
+Persistent data lives in `./data` (bind-mounted into the container).
-### Without Docker
+## Running (local dev)
+
+Prereqs: Node 20+, Python 3.10+, Playwright browsers (Firefox).
+
+Install Node deps (both packages):
 ```bash
-# Install dependencies
 cd orchestrator && npm install
 cd ../job-extractor && npm install
+```
-# Set up Python environment for resume generator
-cd ../resume-generator
-python3 -m venv .venv
-source .venv/bin/activate
-pip install playwright
-playwright install chromium
+Configure the orchestrator env + DB:
-# Run orchestrator (from orchestrator folder)
+```bash
 cd ../orchestrator
-cp .env.example .env # Configure your env
+cp .env.example .env
 npm run db:migrate
 npm run dev
 ```
-### Build Docker Image
+Set up the resume generator (used for PDF export):
 ```bash
-docker build -t job-ops:latest .
+cd ../resume-generator
+python -m venv .venv
+# Windows PowerShell:
+.\.venv\Scripts\Activate.ps1
+# macOS/Linux:
+# source .venv/bin/activate
+pip install playwright
+python -m playwright install firefox
 ```
-### Push to Docker Hub
+If you're on Windows, set `PYTHON_PATH` in `orchestrator/.env` to your venv python (e.g. `..\resume-generator\.venv\Scripts\python.exe`) or use Docker/WSL.
-```bash
-docker tag job-ops:latest yourusername/job-ops:latest
-docker push yourusername/job-ops:latest
-```
+Dev URLs:
+- API: `http://localhost:3001/api`
+- UI (Vite): `http://localhost:5173`
-## API Endpoints
+## Key endpoints
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| GET | `/api/jobs` | List all jobs |
-| GET | `/api/jobs/:id` | Get job details |
-| PATCH | `/api/jobs/:id` | Update job |
-| POST | `/api/jobs/:id/process` | Generate resume for job |
-| POST | `/api/jobs/:id/apply` | Mark as applied |
-| POST | `/api/jobs/:id/reject` | Skip job |
-| POST | `/api/jobs/process-discovered` | Process all discovered jobs |
-| GET | `/api/pipeline/status` | Pipeline status |
-| POST | `/api/pipeline/run` | Trigger pipeline |
-| GET | `/api/pipeline/progress` | SSE progress stream |
-| DELETE | `/api/database` | Clear all data |
+- Jobs: `GET /api/jobs`, `POST /api/jobs/:id/process`, `POST /api/jobs/:id/apply`, `POST /api/jobs/:id/reject`, `POST /api/jobs/process-discovered`
+- Pipeline: `POST /api/pipeline/run`, `GET /api/pipeline/status`, `GET /api/pipeline/progress` (SSE)
+- Webhook: `POST /api/webhook/trigger` (optional auth via `WEBHOOK_SECRET`)
+- Ops: `DELETE /api/database` (wipes DB)
-## Architecture
+## Notes / sharp edges
-```
-job-ops/
-├── orchestrator/ # Node.js backend + React frontend
-│   ├── src/server/ # Express API, services, DB
-│   └── src/client/ # React dashboard
-├── job-extractor/ # Crawlee-based job crawler
-├── resume-generator/ # Python Playwright automation
-├── data/ # SQLite DB + generated PDFs
-├── Dockerfile
-└── docker-compose.yml
-```
+- **Crawl targets**: edit `job-extractor/src/main.ts` to change the Gradcracker location/role matrix.
+- **Notion sync is schema-dependent**: `orchestrator/src/server/services/notion.ts` assumes property names; adjust to match your Notion database.
+- **Pipeline config knobs**: `POST /api/pipeline/run` accepts `{ topN, minSuitabilityScore }`; `PIPELINE_TOP_N`/`PIPELINE_MIN_SCORE` are used by `npm run pipeline:run` (CLI runner).
+- **Anti-bot reality**: crawling is headless + "humanized", but sites can still block; expect occasional flakiness.
 ## License