diff --git a/README.md b/README.md
index be9fb9a..a64953b 100644
--- a/README.md
+++ b/README.md
@@ -1,126 +1,166 @@
-# Job Ops 🚀
+# Job Ops
-Automated job discovery, scoring, and resume generation pipeline.
+Automated job discovery -> AI suitability scoring -> tailored resume PDFs -> a dashboard to review/apply (with optional Notion sync).
-## Features
+## How it works (pipeline)
-- **Job Crawler** - Discovers jobs from Gradcracker and other sources
-- **AI Scoring** - Ranks jobs by suitability using OpenRouter API
-- **Resume Generator** - Creates tailored PDFs via RXResume automation
-- **Dashboard UI** - React-based interface for reviewing and applying
+1. **Crawl**: `job-extractor` (Crawlee + Playwright + Camoufox) visits Gradcracker search pages, opens each job page, extracts structured fields + the job description, and captures the real application URL by clicking the apply button (skipped for already-known jobs).
+2. **Import + dedupe**: `orchestrator` reads the Crawlee dataset (`job-extractor/storage/datasets/default/*.json`) and inserts new jobs into SQLite (`jobs.job_url` is unique).
+3. **Score**: `orchestrator` scores up to 50 unprocessed jobs via OpenRouter (cached as `suitabilityScore`/`suitabilityReason`).
+4. **Select**: take the top `N` jobs above `minSuitabilityScore`.
+5. **Process**: for each selected job:
+ - generate a tailored resume summary via OpenRouter (stored on the job)
+ - generate a PDF by injecting the summary into `resume-generator/base.json`, writing a temp resume JSON, then running `resume-generator/rxresume_automation.py` (Playwright automates `rxresu.me` import -> export)
+6. **Review/apply**: the React dashboard shows job status, score, links, and PDFs; clicking `Mark Applied` optionally creates a Notion page.
-## Quick Start with Docker
+Live progress is streamed to the UI via Server-Sent Events at `GET /api/pipeline/progress` (the crawler emits stdout lines prefixed with `JOBOPS_PROGRESS`; the orchestrator forwards them).
-### 1. Configure Environment
+## Architecture (Mermaid)
-```bash
-# Copy the example env file
-cp .env.example .env
+```mermaid
+flowchart LR
+ subgraph UI[User Interface]
+ DASH[React Dashboard]
+ end
-# Edit with your credentials
-nano .env
+ subgraph ORCH[Orchestrator (Node/TS)]
+ API[Express API
/api/*]
+ PIPE[Pipeline Runner]
+ DB[(SQLite
jobs.db)]
+ PDFS[(PDFs
pdfs/)]
+ end
+
+ subgraph CRAWL[job-extractor (Crawlee/Playwright)]
+ C1[Seed search URLs
(locations x roles)]
+ C2[Parse list pages
enqueue job pages]
+ C3[Parse job pages
extract JD + apply URL]
+ DS[(Crawlee dataset
storage/datasets/default)]
+ end
+
+ subgraph EXT[External Services]
+ GC[Gradcracker]
+ OR[OpenRouter]
+ RX[rxresu.me]
+ NO[Notion (optional)]
+ N8N[n8n / cron (optional)]
+ end
+
+ N8N -->|POST /api/webhook/trigger| API
+ DASH -->|REST| API
+ API -->|REST JSON| DASH
+ DASH -->|SSE connect| API
+ API -->|progress events| DASH
+
+ API -->|run pipeline| PIPE
+
+ PIPE -->|spawn| CRAWL
+ C1 --> GC
+ C2 --> GC
+ C3 --> GC
+ CRAWL --> DS
+ API -->|read| DS
+ API --> DB
+
+ PIPE -->|score + summary| OR
+ PIPE -->|spawn python| RX
+ RX -->|export| PDFS
+ API -->|serve /pdfs/*| PDFS
+
+ API -->|create page| NO
```
-Required environment variables:
-- `OPENROUTER_API_KEY` - Get from [openrouter.ai/keys](https://openrouter.ai/keys)
-- `RXRESUME_EMAIL` - Your [rxresu.me](https://rxresu.me) account email
-- `RXRESUME_PASSWORD` - Your RXResume password
+## Repo layout
-### 2. Add Your Base Resume
-
-Place your resume JSON at `resume-generator/base.json`.
-You can export this from RXResume.
-
-### 3. Run
-
-```bash
-# Build and start
-docker compose up -d
-
-# View logs
-docker compose logs -f
-
-# Stop
-docker compose down
+```
+job-ops/
+ orchestrator/ # Express API + React dashboard + pipeline
+ src/server/ # API routes, pipeline, DB, services
+ src/client/ # UI (polls jobs, listens to SSE progress)
+ src/shared/ # shared types (Job, PipelineRun, etc.)
+ job-extractor/ # Crawlee crawler (Gradcracker)
+ resume-generator/ # Python Playwright automation for rxresu.me
+ base.json # your exported base resume (template)
+ data/ # persisted runtime artifacts (Docker default)
+ jobs.db # SQLite database
+ pdfs/ # generated PDFs (resume_.pdf)
+ docker-compose.yml # single-container deployment
+ Dockerfile # builds orchestrator + installs browsers
```
-### 4. Access
+## Data model (SQLite)
-- **Dashboard**: http://localhost:3001
-- **API**: http://localhost:3001/api
-- **Health**: http://localhost:3001/health
+- `jobs`
+ - from crawl: `title`, `employer`, `jobUrl`, `applicationLink`, `deadline`, `salary`, `location`, `jobDescription`, etc.
+ - enrichments: `status` (`discovered` -> `processing` -> `ready` -> `applied`/`rejected`), `suitabilityScore`, `suitabilityReason`, `tailoredSummary`, `pdfPath`, `notionPageId`
+- `pipeline_runs`: audit log of runs (`running`/`completed`/`failed`, counts, error)
-## Data Persistence
+## Running (Docker)
-All data is stored in the `./data` directory:
-- `data/jobs.db` - SQLite database
-- `data/pdfs/` - Generated resume PDFs
+1. Create `.env` at repo root (`cp .env.example .env`) and set:
+ - `OPENROUTER_API_KEY`
+ - `RXRESUME_EMAIL`, `RXRESUME_PASSWORD`
+ - optional: `NOTION_API_KEY`, `NOTION_DATABASE_ID`, `WEBHOOK_SECRET`
+2. Put your exported RXResume JSON at `resume-generator/base.json`.
+3. Start: `docker compose up -d --build`
+4. Open:
+ - Dashboard/UI: `http://localhost:3005`
+ - API: `http://localhost:3005/api`
+ - Health: `http://localhost:3005/health`
-## Development
+Persistent data lives in `./data` (bind-mounted into the container).
-### Without Docker
+## Running (local dev)
+
+Prereqs: Node 20+, Python 3.10+, Playwright browsers (Firefox).
+
+Install Node deps (both packages):
```bash
-# Install dependencies
cd orchestrator && npm install
cd ../job-extractor && npm install
+```
-# Set up Python environment for resume generator
-cd ../resume-generator
-python3 -m venv .venv
-source .venv/bin/activate
-pip install playwright
-playwright install chromium
+Configure the orchestrator env + DB:
-# Run orchestrator (from orchestrator folder)
+```bash
cd ../orchestrator
-cp .env.example .env # Configure your env
+cp .env.example .env
npm run db:migrate
npm run dev
```
-### Build Docker Image
+Set up the resume generator (used for PDF export):
```bash
-docker build -t job-ops:latest .
+cd ../resume-generator
+python -m venv .venv
+# Windows PowerShell:
+.\.venv\Scripts\Activate.ps1
+# macOS/Linux:
+# source .venv/bin/activate
+pip install playwright
+python -m playwright install firefox
```
-### Push to Docker Hub
+If you're on Windows, set `PYTHON_PATH` in `orchestrator/.env` to your venv python (e.g. `..\resume-generator\.venv\Scripts\python.exe`) or use Docker/WSL.
-```bash
-docker tag job-ops:latest yourusername/job-ops:latest
-docker push yourusername/job-ops:latest
-```
+Dev URLs:
+- API: `http://localhost:3001/api`
+- UI (Vite): `http://localhost:5173`
-## API Endpoints
+## Key endpoints
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| GET | `/api/jobs` | List all jobs |
-| GET | `/api/jobs/:id` | Get job details |
-| PATCH | `/api/jobs/:id` | Update job |
-| POST | `/api/jobs/:id/process` | Generate resume for job |
-| POST | `/api/jobs/:id/apply` | Mark as applied |
-| POST | `/api/jobs/:id/reject` | Skip job |
-| POST | `/api/jobs/process-discovered` | Process all discovered jobs |
-| GET | `/api/pipeline/status` | Pipeline status |
-| POST | `/api/pipeline/run` | Trigger pipeline |
-| GET | `/api/pipeline/progress` | SSE progress stream |
-| DELETE | `/api/database` | Clear all data |
+- Jobs: `GET /api/jobs`, `POST /api/jobs/:id/process`, `POST /api/jobs/:id/apply`, `POST /api/jobs/:id/reject`, `POST /api/jobs/process-discovered`
+- Pipeline: `POST /api/pipeline/run`, `GET /api/pipeline/status`, `GET /api/pipeline/progress` (SSE)
+- Webhook: `POST /api/webhook/trigger` (optional auth via `WEBHOOK_SECRET`)
+- Ops: `DELETE /api/database` (wipes DB)
-## Architecture
+## Notes / sharp edges
-```
-job-ops/
-├── orchestrator/ # Node.js backend + React frontend
-│ ├── src/server/ # Express API, services, DB
-│ └── src/client/ # React dashboard
-├── job-extractor/ # Crawlee-based job crawler
-├── resume-generator/ # Python Playwright automation
-├── data/ # SQLite DB + generated PDFs
-├── Dockerfile
-└── docker-compose.yml
-```
+- **Crawl targets**: edit `job-extractor/src/main.ts` to change the Gradcracker location/role matrix.
+- **Notion sync is schema-dependent**: `orchestrator/src/server/services/notion.ts` assumes property names; adjust to match your Notion database.
+- **Pipeline config knobs**: `POST /api/pipeline/run` accepts `{ topN, minSuitabilityScore }`; `PIPELINE_TOP_N`/`PIPELINE_MIN_SCORE` are used by `npm run pipeline:run` (CLI runner).
+- **Anti-bot reality**: crawling is headless + "humanized", but sites can still block; expect occasional flakiness.
## License