readme
This commit is contained in:
parent
87804480fc
commit
cdd6956eb7
212
README.md
212
README.md
@ -1,126 +1,166 @@
|
||||
# Job Ops 🚀
|
||||
# Job Ops
|
||||
|
||||
Automated job discovery, scoring, and resume generation pipeline.
|
||||
Automated job discovery -> AI suitability scoring -> tailored resume PDFs -> a dashboard to review/apply (with optional Notion sync).
|
||||
|
||||
## Features
|
||||
## How it works (pipeline)
|
||||
|
||||
- **Job Crawler** - Discovers jobs from Gradcracker and other sources
|
||||
- **AI Scoring** - Ranks jobs by suitability using OpenRouter API
|
||||
- **Resume Generator** - Creates tailored PDFs via RXResume automation
|
||||
- **Dashboard UI** - React-based interface for reviewing and applying
|
||||
1. **Crawl**: `job-extractor` (Crawlee + Playwright + Camoufox) visits Gradcracker search pages, opens each job page, extracts structured fields + the job description, and captures the real application URL by clicking the apply button (skipped for already-known jobs).
|
||||
2. **Import + dedupe**: `orchestrator` reads the Crawlee dataset (`job-extractor/storage/datasets/default/*.json`) and inserts new jobs into SQLite (`jobs.job_url` is unique).
|
||||
3. **Score**: `orchestrator` scores up to 50 unprocessed jobs via OpenRouter (cached as `suitabilityScore`/`suitabilityReason`).
|
||||
4. **Select**: take the top `N` jobs above `minSuitabilityScore`.
|
||||
5. **Process**: for each selected job:
|
||||
- generate a tailored resume summary via OpenRouter (stored on the job)
|
||||
- generate a PDF by injecting the summary into `resume-generator/base.json`, writing a temp resume JSON, then running `resume-generator/rxresume_automation.py` (Playwright automates `rxresu.me` import -> export)
|
||||
6. **Review/apply**: the React dashboard shows job status, score, links, and PDFs; clicking `Mark Applied` optionally creates a Notion page.
|
||||
|
||||
## Quick Start with Docker
|
||||
Live progress is streamed to the UI via Server-Sent Events at `GET /api/pipeline/progress` (the crawler emits stdout lines prefixed with `JOBOPS_PROGRESS`; the orchestrator forwards them).
|
||||
|
||||
### 1. Configure Environment
|
||||
## Architecture (Mermaid)
|
||||
|
||||
```bash
|
||||
# Copy the example env file
|
||||
cp .env.example .env
|
||||
```mermaid
|
||||
flowchart LR
|
||||
subgraph UI[User Interface]
|
||||
DASH[React Dashboard]
|
||||
end
|
||||
|
||||
# Edit with your credentials
|
||||
nano .env
|
||||
subgraph ORCH[Orchestrator (Node/TS)]
|
||||
API[Express API<br/>/api/*]
|
||||
PIPE[Pipeline Runner]
|
||||
DB[(SQLite<br/>jobs.db)]
|
||||
PDFS[(PDFs<br/>pdfs/)]
|
||||
end
|
||||
|
||||
subgraph CRAWL[job-extractor (Crawlee/Playwright)]
|
||||
C1[Seed search URLs<br/>(locations x roles)]
|
||||
C2[Parse list pages<br/>enqueue job pages]
|
||||
C3[Parse job pages<br/>extract JD + apply URL]
|
||||
DS[(Crawlee dataset<br/>storage/datasets/default)]
|
||||
end
|
||||
|
||||
subgraph EXT[External Services]
|
||||
GC[Gradcracker]
|
||||
OR[OpenRouter]
|
||||
RX[rxresu.me]
|
||||
NO[Notion (optional)]
|
||||
N8N[n8n / cron (optional)]
|
||||
end
|
||||
|
||||
N8N -->|POST /api/webhook/trigger| API
|
||||
DASH -->|REST| API
|
||||
API -->|REST JSON| DASH
|
||||
DASH -->|SSE connect| API
|
||||
API -->|progress events| DASH
|
||||
|
||||
API -->|run pipeline| PIPE
|
||||
|
||||
PIPE -->|spawn| CRAWL
|
||||
C1 --> GC
|
||||
C2 --> GC
|
||||
C3 --> GC
|
||||
CRAWL --> DS
|
||||
API -->|read| DS
|
||||
API --> DB
|
||||
|
||||
PIPE -->|score + summary| OR
|
||||
PIPE -->|spawn python| RX
|
||||
RX -->|export| PDFS
|
||||
API -->|serve /pdfs/*| PDFS
|
||||
|
||||
API -->|create page| NO
|
||||
```
|
||||
|
||||
Required environment variables:
|
||||
- `OPENROUTER_API_KEY` - Get from [openrouter.ai/keys](https://openrouter.ai/keys)
|
||||
- `RXRESUME_EMAIL` - Your [rxresu.me](https://rxresu.me) account email
|
||||
- `RXRESUME_PASSWORD` - Your RXResume password
|
||||
## Repo layout
|
||||
|
||||
### 2. Add Your Base Resume
|
||||
|
||||
Place your resume JSON at `resume-generator/base.json`.
|
||||
You can export this from RXResume.
|
||||
|
||||
### 3. Run
|
||||
|
||||
```bash
|
||||
# Build and start
|
||||
docker compose up -d
|
||||
|
||||
# View logs
|
||||
docker compose logs -f
|
||||
|
||||
# Stop
|
||||
docker compose down
|
||||
```
|
||||
job-ops/
|
||||
orchestrator/ # Express API + React dashboard + pipeline
|
||||
src/server/ # API routes, pipeline, DB, services
|
||||
src/client/ # UI (polls jobs, listens to SSE progress)
|
||||
src/shared/ # shared types (Job, PipelineRun, etc.)
|
||||
job-extractor/ # Crawlee crawler (Gradcracker)
|
||||
resume-generator/ # Python Playwright automation for rxresu.me
|
||||
base.json # your exported base resume (template)
|
||||
data/ # persisted runtime artifacts (Docker default)
|
||||
jobs.db # SQLite database
|
||||
pdfs/ # generated PDFs (resume_<jobId>.pdf)
|
||||
docker-compose.yml # single-container deployment
|
||||
Dockerfile # builds orchestrator + installs browsers
|
||||
```
|
||||
|
||||
### 4. Access
|
||||
## Data model (SQLite)
|
||||
|
||||
- **Dashboard**: http://localhost:3001
|
||||
- **API**: http://localhost:3001/api
|
||||
- **Health**: http://localhost:3001/health
|
||||
- `jobs`
|
||||
- from crawl: `title`, `employer`, `jobUrl`, `applicationLink`, `deadline`, `salary`, `location`, `jobDescription`, etc.
|
||||
- enrichments: `status` (`discovered` -> `processing` -> `ready` -> `applied`/`rejected`), `suitabilityScore`, `suitabilityReason`, `tailoredSummary`, `pdfPath`, `notionPageId`
|
||||
- `pipeline_runs`: audit log of runs (`running`/`completed`/`failed`, counts, error)
|
||||
|
||||
## Data Persistence
|
||||
## Running (Docker)
|
||||
|
||||
All data is stored in the `./data` directory:
|
||||
- `data/jobs.db` - SQLite database
|
||||
- `data/pdfs/` - Generated resume PDFs
|
||||
1. Create `.env` at repo root (`cp .env.example .env`) and set:
|
||||
- `OPENROUTER_API_KEY`
|
||||
- `RXRESUME_EMAIL`, `RXRESUME_PASSWORD`
|
||||
- optional: `NOTION_API_KEY`, `NOTION_DATABASE_ID`, `WEBHOOK_SECRET`
|
||||
2. Put your exported RXResume JSON at `resume-generator/base.json`.
|
||||
3. Start: `docker compose up -d --build`
|
||||
4. Open:
|
||||
- Dashboard/UI: `http://localhost:3005`
|
||||
- API: `http://localhost:3005/api`
|
||||
- Health: `http://localhost:3005/health`
|
||||
|
||||
## Development
|
||||
Persistent data lives in `./data` (bind-mounted into the container).
|
||||
|
||||
### Without Docker
|
||||
## Running (local dev)
|
||||
|
||||
Prereqs: Node 20+, Python 3.10+, Playwright browsers (Firefox).
|
||||
|
||||
Install Node deps (both packages):
|
||||
|
||||
```bash
|
||||
# Install dependencies
|
||||
cd orchestrator && npm install
|
||||
cd ../job-extractor && npm install
|
||||
```
|
||||
|
||||
# Set up Python environment for resume generator
|
||||
cd ../resume-generator
|
||||
python3 -m venv .venv
|
||||
source .venv/bin/activate
|
||||
pip install playwright
|
||||
playwright install chromium
|
||||
Configure the orchestrator env + DB:
|
||||
|
||||
# Run orchestrator (from orchestrator folder)
|
||||
```bash
|
||||
cd ../orchestrator
|
||||
cp .env.example .env # Configure your env
|
||||
cp .env.example .env
|
||||
npm run db:migrate
|
||||
npm run dev
|
||||
```
|
||||
|
||||
### Build Docker Image
|
||||
Set up the resume generator (used for PDF export):
|
||||
|
||||
```bash
|
||||
docker build -t job-ops:latest .
|
||||
cd ../resume-generator
|
||||
python -m venv .venv
|
||||
# Windows PowerShell:
|
||||
.\.venv\Scripts\Activate.ps1
|
||||
# macOS/Linux:
|
||||
# source .venv/bin/activate
|
||||
pip install playwright
|
||||
python -m playwright install firefox
|
||||
```
|
||||
|
||||
### Push to Docker Hub
|
||||
If you're on Windows, set `PYTHON_PATH` in `orchestrator/.env` to your venv python (e.g. `..\resume-generator\.venv\Scripts\python.exe`) or use Docker/WSL.
|
||||
|
||||
```bash
|
||||
docker tag job-ops:latest yourusername/job-ops:latest
|
||||
docker push yourusername/job-ops:latest
|
||||
```
|
||||
Dev URLs:
|
||||
- API: `http://localhost:3001/api`
|
||||
- UI (Vite): `http://localhost:5173`
|
||||
|
||||
## API Endpoints
|
||||
## Key endpoints
|
||||
|
||||
| Method | Endpoint | Description |
|
||||
|--------|----------|-------------|
|
||||
| GET | `/api/jobs` | List all jobs |
|
||||
| GET | `/api/jobs/:id` | Get job details |
|
||||
| PATCH | `/api/jobs/:id` | Update job |
|
||||
| POST | `/api/jobs/:id/process` | Generate resume for job |
|
||||
| POST | `/api/jobs/:id/apply` | Mark as applied |
|
||||
| POST | `/api/jobs/:id/reject` | Skip job |
|
||||
| POST | `/api/jobs/process-discovered` | Process all discovered jobs |
|
||||
| GET | `/api/pipeline/status` | Pipeline status |
|
||||
| POST | `/api/pipeline/run` | Trigger pipeline |
|
||||
| GET | `/api/pipeline/progress` | SSE progress stream |
|
||||
| DELETE | `/api/database` | Clear all data |
|
||||
- Jobs: `GET /api/jobs`, `POST /api/jobs/:id/process`, `POST /api/jobs/:id/apply`, `POST /api/jobs/:id/reject`, `POST /api/jobs/process-discovered`
|
||||
- Pipeline: `POST /api/pipeline/run`, `GET /api/pipeline/status`, `GET /api/pipeline/progress` (SSE)
|
||||
- Webhook: `POST /api/webhook/trigger` (optional auth via `WEBHOOK_SECRET`)
|
||||
- Ops: `DELETE /api/database` (wipes DB)
|
||||
|
||||
## Architecture
|
||||
## Notes / sharp edges
|
||||
|
||||
```
|
||||
job-ops/
|
||||
├── orchestrator/ # Node.js backend + React frontend
|
||||
│ ├── src/server/ # Express API, services, DB
|
||||
│ └── src/client/ # React dashboard
|
||||
├── job-extractor/ # Crawlee-based job crawler
|
||||
├── resume-generator/ # Python Playwright automation
|
||||
├── data/ # SQLite DB + generated PDFs
|
||||
├── Dockerfile
|
||||
└── docker-compose.yml
|
||||
```
|
||||
- **Crawl targets**: edit `job-extractor/src/main.ts` to change the Gradcracker location/role matrix.
|
||||
- **Notion sync is schema-dependent**: `orchestrator/src/server/services/notion.ts` assumes property names; adjust to match your Notion database.
|
||||
- **Pipeline config knobs**: `POST /api/pipeline/run` accepts `{ topN, minSuitabilityScore }`; `PIPELINE_TOP_N`/`PIPELINE_MIN_SCORE` are used by `npm run pipeline:run` (CLI runner).
|
||||
- **Anti-bot reality**: crawling is headless + "humanized", but sites can still block; expect occasional flakiness.
|
||||
|
||||
## License
|
||||
|
||||
|
||||
Loading…
x
Reference in New Issue
Block a user