better readme

This commit is contained in:
DaKheera47 2026-01-15 17:46:36 +00:00
parent 9264ecc1ce
commit 133a636de0

118
README.md
View File

@ -1,103 +1,37 @@
# Job Ops
# Job-Ops
Automated job discovery -> AI suitability scoring -> tailored resume PDFs -> a dashboard to review/apply (with optional Notion sync).
AI-powered job discovery and application pipeline. Automatically finds jobs, scores them against your profile, and generates tailored resumes.
## How it works (pipeline)
## Workflow
1. **Search**: Scrapes Gradcracker, Indeed, LinkedIn, and UK Visa Sponsorship jobs.
2. **Score**: AI ranks jobs by suitability using OpenRouter.
3. **Tailor**: Generates a custom resume summary for top-tier matches.
4. **Export**: Automates [RxResume](https://rxresu.me) to create tailored PDFs.
5. **Manage**: Review and mark jobs as "Applied" via the dashboard (syncs to Notion).
1. **Crawl**: `extractors/gradcracker` (Crawlee + Playwright + Camoufox) visits Gradcracker search pages, opens each job page, extracts structured fields + the job description, and captures the real application URL by clicking the apply button (skipped for already-known jobs).
2. **Import + dedupe**: `orchestrator` reads the Crawlee dataset (`extractors/gradcracker/storage/datasets/default/*.json`) and inserts new jobs into SQLite (`jobs.job_url` is unique).
3. **Score**: `orchestrator` scores up to 50 unprocessed jobs via OpenRouter (cached as `suitabilityScore`/`suitabilityReason`).
4. **Select**: take the top `N` jobs above `minSuitabilityScore`.
5. **Process**: for each selected job:
- generate a tailored resume summary via OpenRouter (stored on the job)
- generate a PDF by injecting the summary into `resume-generator/base.json`, writing a temp resume JSON, then running `resume-generator/rxresume_automation.py` (Playwright automates `rxresu.me` import -> export)
6. **Review/apply**: the React dashboard shows job status, score, links, and PDFs; clicking `Mark Applied` optionally creates a Notion page.
## Quick Start
```bash
# 1. Setup environment
cp .env.example .env
Live progress is streamed to the UI via Server-Sent Events at `GET /api/pipeline/progress` (the crawler emits stdout lines prefixed with `JOBOPS_PROGRESS`; the orchestrator forwards them).
# 2. Run with Docker
docker compose up -d --build
## Architecture (Mermaid)
```mermaid
flowchart LR
subgraph UI["User Interface"]
DASH["React Dashboard"]
end
subgraph ORCH["Orchestrator (Node/TS)"]
API["Express API<br/>/api/*"]
PIPE["Pipeline Runner"]
DB[(SQLite<br/>jobs.db)]
PDFS[(PDFs<br/>pdfs/)]
end
subgraph CRAWL["extractors/gradcracker (Crawlee/Playwright)"]
C1["Seed search URLs<br/>(locations x roles)"]
C2["Parse list pages<br/>enqueue job pages"]
C3["Parse job pages<br/>extract JD + apply URL"]
DS[(Crawlee dataset<br/>storage/datasets/default)]
end
subgraph EXT["External Services"]
GC["Gradcracker"]
OR["OpenRouter"]
RX["rxresu.me"]
NO["Notion (optional)"]
N8N["n8n / cron (optional)"]
end
N8N -->|"POST /api/webhook/trigger"| API
DASH <-->|"REST"| API
DASH <-->|"SSE progress"| API
PIPE -->|"spawn"| CRAWL
C1 --> GC
C2 --> GC
C3 --> GC
CRAWL --> DS
API -->|"read"| DS
API --> DB
PIPE -->|"score + summary"| OR
PIPE -->|"spawn python"| RX
RX -->|"export"| PDFS
API -->|"serve /pdfs/*"| PDFS
API -->|"create page"| NO
# 3. Access Dashboard
# http://localhost:3005
```
## Repo layout
## Setup
Essential variables in `.env`:
- `OPENROUTER_API_KEY`: For job scoring and tailoring.
- `RXRESUME_EMAIL`/`PASSWORD`: To automate PDF exports.
- `JOBSPY_SEARCH_TERMS`: Keywords for Indeed/LinkedIn scraping.
```
job-ops/
orchestrator/ # Express API + React dashboard + pipeline
src/server/ # API routes, pipeline, DB, services
src/client/ # UI (polls jobs, listens to SSE progress)
src/shared/ # shared types (Job, PipelineRun, etc.)
extractors/gradcracker/ # Crawlee crawler (Gradcracker)
extractors/jobspy/ # JobSpy wrapper (Indeed/LinkedIn/etc)
extractors/ukvisajobs/ # UK Visa Jobs API extractor
resume-generator/ # Python Playwright automation for rxresu.me
base.json # your exported base resume (template)
data/ # persisted runtime artifacts (Docker default)
jobs.db # SQLite database
pdfs/ # generated PDFs (resume_<jobId>.pdf)
docker-compose.yml # single-container deployment
Dockerfile # builds orchestrator + installs browsers
```
## Data model (SQLite)
- `jobs`
- from crawl: `title`, `employer`, `jobUrl`, `applicationLink`, `deadline`, `salary`, `location`, `jobDescription`, `source` (gradcracker/indeed/linkedin/ukvisajobs), etc.
- enrichments: `status` (`discovered` -> `processing` -> `ready` -> `applied`/`skipped`), `suitabilityScore`, `suitabilityReason`, `tailoredSummary`, `pdfPath`, `notionPageId`
- `pipeline_runs`: audit log of runs (`running`/`completed`/`failed`, counts, error)
## Running (Docker)
1. Create `.env` at repo root (`cp .env.example .env`) and set:
- `OPENROUTER_API_KEY`
- `RXRESUME_EMAIL`, `RXRESUME_PASSWORD`
- optional: `NOTION_API_KEY`, `NOTION_DATABASE_ID`, `WEBHOOK_SECRET`
## Structure
- `/orchestrator`: React frontend + Node.js backend & pipeline.
- `/extractors`: Specialized scrapers (Gradcracker, JobSpy, UKVisaJobs).
- `/resume-generator`: Python script for RxResume PDF automation.
- `/data`: Persistent storage for SQLite DB and generated PDFs.
2. Put your exported RXResume JSON at `resume-generator/base.json`.
3. Start: `docker compose up -d --build`
4. Open: