Jobber/DEPLOY_GITEA_VM_CRON_TELEGRAM.md
2026-04-05 20:03:06 -04:00

# Deploy on a VM, run the pipeline on a schedule, notify Telegram
End-to-end checklist:
1. Push this repo to your Git host (Gitea, GitHub, etc.).
2. On the server: install Docker + Compose v2, clone the repo, copy `.env.example` to `.env`, run `docker compose up -d`.
3. Confirm the UI/API on the mapped host port (default **3005** → container **3001**).
4. Add a cron job that `POST`s `/api/pipeline/run` (see §3).
5. Optional: Telegram via `scripts/jobber-pipeline-telegram.sh`, pipeline webhook relay, or n8n (see §4).
---
## Git remote (example)
Replace host, user/org, and repo name with yours:
```bash
git remote add gitea git@GITEA_HOST:YOUR_USER/Jobber.git
# or: git remote set-url gitea ...
git push -u gitea main
```
If you have uncommitted changes:
```bash
git add -A && git commit -m "Your message" && git push gitea main
```
---
## 1. Deploy on a Linux VM (bare metal or cloud)
1. Install **Docker** and **Docker Compose** (plugin v2).
2. Clone from your Git server (SSH or HTTPS):
```bash
git clone git@GITEA_HOST:YOUR_USER/Jobber.git
cd Jobber
```
3. Environment:
```bash
cp .env.example .env
# Edit .env: MODEL / LLM keys, RXRESUME_*, search settings, etc.
```
4. Start:
```bash
docker compose up -d --build
```
The image build sets `VITE_SKIP_RXRESUME_ONBOARDING=true` by default so the first-run wizard only asks for **LLM** (Reactive Resume / PDF template steps are skipped; configure those in **Settings** if needed). Rebuild after changing that build arg.
5. Open the UI: `http://<VM-IP>:3005` (port mapped in `docker-compose.yml`).
6. Persist data: compose mounts `./data` — back up that directory.
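The backup in step 6 can be scripted. The sketch below archives the data directory into a timestamped tarball. The function name `backup_data`, the archive naming, and the destination directory are illustrative assumptions, not something the repo ships. For a consistent SQLite snapshot, stop the stack first (`docker compose stop`).

```bash
# Hypothetical helper (not part of the repo): archive a data directory
# into a timestamped .tar.gz and print the archive path.
backup_data() {
  src_dir="$1"    # e.g. /opt/Jobber/data
  dest_dir="$2"   # e.g. /var/backups/jobber (assumed location)
  [ -d "$src_dir" ] || { echo "no such directory: $src_dir" >&2; return 1; }
  mkdir -p "$dest_dir" || return 1
  archive="$dest_dir/jobber-data-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C keeps paths inside the archive relative to the parent of src_dir
  tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
  echo "$archive"
}
```

Example call: `backup_data /opt/Jobber/data /var/backups/jobber`.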
---
## 2. Deploy as a container (same image, any host)
Same as the VM path: only Docker is required.
- Ensure port **3005** (or your chosen host port) is reachable if you use the UI from another machine.
- For **only** API/cron from localhost, bind to `127.0.0.1:3005` by changing the `ports:` line in `docker-compose.yml` (e.g. `"127.0.0.1:3005:3001"`).
Inside the container the app listens on **3001**; the default host map is **3005 → 3001**.
**Cron on the host** should call the API on the host:
- Browser: `http://127.0.0.1:3005`
- **API**: same origin; `/api/...` is served by the app.
If the API is only on a Docker network, use the container name and port `3001` from another container, or publish `3005` on the host and use `127.0.0.1:3005` from cron.
---
## 3. Run the pipeline on a schedule (cron)
`POST /api/pipeline/run` **starts** the pipeline in the **background** and returns quickly (`{ ok: true, data: { message: "Pipeline started" } }`). That is enough for scheduling.
Example crontab (host timezone — adjust hours):
```cron
# 08:00, 14:00, 20:00 daily
0 8,14,20 * * * /usr/local/bin/jobops-pipeline-run.sh >> /var/log/jobops-pipeline.log 2>&1
```
Create `/usr/local/bin/jobops-pipeline-run.sh`:
```bash
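#!/usr/bin/env bash
set -euo pipefail
BASE_URL="${JOBOPS_URL:-http://127.0.0.1:3005}"
# Optional Basic auth arguments; the array stays empty unless you uncomment the line below.
AUTH=()
# If BASIC_AUTH_USER / BASIC_AUTH_PASSWORD are set in .env, export them for cron and uncomment:
# AUTH=(-u "${BASIC_AUTH_USER:?}:${BASIC_AUTH_PASSWORD:?}")
# The cron line already appends stdout/stderr to the log, so no tee is needed here.
# ${AUTH[@]+...} expands to nothing when AUTH is empty (an empty "" argument would make curl fail).
curl -sS -X POST "${BASE_URL}/api/pipeline/run" \
  -H "Content-Type: application/json" \
  -d '{}' \
  ${AUTH[@]+"${AUTH[@]}"}
echo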
```
```bash
sudo chmod +x /usr/local/bin/jobops-pipeline-run.sh
```
Set `JOBOPS_URL` in root's crontab or `/etc/environment` if the app is on another host.
**Basic auth:** When `BASIC_AUTH_USER` and `BASIC_AUTH_PASSWORD` are in `.env`, non-GET API calls need Basic auth — use `curl -u user:pass` as above.
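Because the endpoint returns quickly, a cron wrapper can check the response and log a clear failure when the run did not start. The sketch below is a minimal illustration. The function names are hypothetical, and it assumes the response is serialized as JSON containing `"ok": true`, matching the shape quoted at the top of this section.

```bash
# Hypothetical helper: succeed only if the API response (on stdin) reports ok: true.
# Uses grep so that no extra dependencies are needed.
pipeline_started() {
  grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# Trigger the pipeline and fail loudly when the response is not ok.
trigger_pipeline() {
  base_url="${JOBOPS_URL:-http://127.0.0.1:3005}"
  response="$(curl -sS -X POST "$base_url/api/pipeline/run" \
    -H "Content-Type: application/json" -d '{}')" || return 1
  if printf '%s' "$response" | pipeline_started; then
    echo "$(date '+%F %T') pipeline started"
  else
    echo "$(date '+%F %T') unexpected response: $response" >&2
    return 1
  fi
}
```

Calling `trigger_pipeline` from the cron script returns a non-zero exit code on failure, which makes failed triggers easy to spot in the log.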
---
## 4. Telegram notifications
The app does **not** send Telegram messages by itself. Practical options:
### Option A — Pipeline webhook (recommended)
1. In the app: **Settings → Webhooks** (or env `PIPELINE_WEBHOOK_URL` / `WEBHOOK_SECRET`) set a URL that receives JSON when a run **completes or fails**.
2. Point that URL to a small relay that maps the JSON to Telegram `sendMessage`.
Telegram HTTP API:
```text
https://api.telegram.org/bot<BOT_TOKEN>/sendMessage
```
Body (JSON):
```json
{
"chat_id": "<YOUR_CHAT_ID>",
"text": "Pipeline finished: ..."
}
```
Host the relay on the same VM (Flask/FastAPI/Node, or n8n). Keep **bot token** and **chat id** in environment variables.
Payload shape (sanitized) includes fields like `event`, `pipelineRunId`, `jobsDiscovered`, `jobsProcessed`, `error` — see `orchestrator/src/server/pipeline/steps/notify-webhook.ts`.
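The mapping step of such a relay can be sketched with `jq`. The function below reads a webhook payload on stdin and prints a `sendMessage` body. It uses the field names listed above, while the function name, the fallback values, and the message wording are assumptions; check `notify-webhook.ts` for the real payload before relying on it.

```bash
# Hypothetical relay step: turn a pipeline webhook payload (JSON on stdin)
# into a Telegram sendMessage body (JSON on stdout). Requires jq.
# Fallback values and wording are illustrative assumptions.
webhook_to_telegram_body() {
  jq -c --arg chat_id "$1" '{
    chat_id: $chat_id,
    text: ([
      "Pipeline event: " + (.event // "unknown" | tostring),
      "Run: " + (.pipelineRunId // "n/a" | tostring),
      "Discovered: " + (.jobsDiscovered // 0 | tostring),
      "Processed: " + (.jobsProcessed // 0 | tostring)
    ] + (if .error then ["Error: " + (.error | tostring)] else [] end)
    | join("\n"))
  }'
}
```

A relay would pipe the incoming request body through `webhook_to_telegram_body "$TELEGRAM_CHAT_ID"` and send the result with `curl -sS -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" -H "Content-Type: application/json" -d @-`.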
### Option B — Shipped script: run pipeline + Telegram summary (cron-friendly)
The repo includes `scripts/jobber-pipeline-telegram.sh`: it `POST`s `/api/pipeline/run`, polls `GET /api/pipeline/status` until the run ends, then sends one Telegram message with **status**, **jobsDiscovered**, and **jobsProcessed** (and **errorMessage** if the run failed).
**1. Dependencies on the host** (LXC/VM that runs cron):
```bash
apt-get update && apt-get install -y jq curl
```
**2. Install script and secrets** (after `git pull` in `/opt/Jobber` or your clone path):
```bash
install -m 755 /opt/Jobber/scripts/jobber-pipeline-telegram.sh /usr/local/bin/jobber-pipeline-telegram.sh
cp /opt/Jobber/scripts/jobber-cron.env.example /root/.jobber-cron.env
chmod 600 /root/.jobber-cron.env
nano /root/.jobber-cron.env
```
Fill **`TELEGRAM_BOT_TOKEN`** (from @BotFather) and **`TELEGRAM_CHAT_ID`**. For a **private** chat with your bot, use `message.chat.id` from `getUpdates` (same as your Telegram user id in the JSON). **`JOBOPS_URL`** defaults to `http://127.0.0.1:3005` when Jobber runs on the same host.
**3. Manual test** (before cron):
```bash
/usr/local/bin/jobber-pipeline-telegram.sh
```
You should get one Telegram message when the pipeline finishes. Optional log: append `>> /var/log/jobber-pipeline.log 2>&1` on the cron line.
**4. Cron** (uses the host's **local** timezone, which you can check with `timedatectl`; edit with `crontab -e` as root):
```cron
# 09:00, 13:00, 18:00 daily — pipeline + Telegram summary
0 9,13,18 * * * /usr/local/bin/jobber-pipeline-telegram.sh >> /var/log/jobber-pipeline.log 2>&1
```
Other examples: `0 8,14,20 * * *` for 08:00 / 14:00 / 20:00.
**5. Pull latest code and redeploy** (on the VM, from the repo root, e.g. `/opt/Jobber`):
```bash
cd /opt/Jobber
git fetch origin && git pull --ff-only
install -m 755 scripts/jobber-pipeline-telegram.sh /usr/local/bin/jobber-pipeline-telegram.sh
docker compose up -d --build
```
Wait until `curl -sf http://127.0.0.1:3005/health` succeeds before relying on cron (container needs a few seconds after start).
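The wait can be automated with a bounded retry loop. This is a minimal sketch: the `/health` URL comes from the text above, while the function name and the default of 30 attempts with a 2 second delay are assumptions to tune for your host.

```bash
# Hypothetical helper: poll the health endpoint until it answers or attempts run out.
# Usage: wait_for_health [url] [max_attempts] [delay_seconds]
wait_for_health() {
  url="${1:-http://127.0.0.1:3005/health}"
  max_attempts="${2:-30}"
  delay="${3:-2}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -f makes curl return non-zero on HTTP errors, -s keeps it quiet
    if curl -sf "$url" >/dev/null 2>&1; then
      echo "healthy after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "not healthy after $max_attempts attempts: $url" >&2
  return 1
}
```

Running `wait_for_health` right after `docker compose up -d --build` lets a redeploy script stop with an error instead of leaving cron to hit a container that never came up.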
**5b. Copy your local SQLite to the VM (profiles, settings, jobs)** — optional; use when you want the same **search profile**, `activeProfileId`, and job rows as on your laptop.
1. **Stop** the app that holds the DB open: local `npm run dev` (Ctrl+C) and on the VM `docker compose stop` (or `docker stop job-ops`).
2. **Checkpoint WAL** on the machine that owns the canonical DB (usually your laptop), so a copy is self-contained:
```bash
cd /path/to/Jobber
sqlite3 data/jobs.db "PRAGMA wal_checkpoint(FULL);"
```
3. **Copy** a consistent DB file to the VM's `./data/` (same path Docker mounts). Prefer a **SQLite backup**, which stays consistent even if something such as `npm run dev` is still writing to the database:
```bash
sqlite3 data/jobs.db ".backup 'data/jobs.deploy.db'"
scp ./data/jobs.deploy.db YOUR_USER@10.0.10.178:/opt/Jobber/data/jobs.db
rm -f data/jobs.deploy.db
```
On the VM, with the container **stopped**, **delete stale sidecars** or SQLite may report corrupt DB:
```bash
rm -f /opt/Jobber/data/jobs.db-wal /opt/Jobber/data/jobs.db-shm
```
Alternatively `rsync` only `jobs.db` after `PRAGMA wal_checkpoint(FULL)` on the laptop with nothing else writing to the file.
4. On the VM: `docker compose up -d` and verify `GET /api/settings` / the Settings UI shows your profile.
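Before starting the container again, the copied file can be sanity-checked on the VM. The sketch below refuses to continue if stale sidecars are still present and then runs SQLite's `PRAGMA integrity_check`, which prints `ok` for a healthy database. The function name is hypothetical and the path in the comment is the example path used in this guide.

```bash
# Hypothetical helper: fail if stale sidecars exist or the copied
# database does not pass SQLite's integrity check. Requires sqlite3.
verify_copied_db() {
  db="$1"   # e.g. /opt/Jobber/data/jobs.db
  [ -f "$db" ] || { echo "missing database: $db" >&2; return 1; }
  for sidecar in "$db-wal" "$db-shm"; do
    if [ -e "$sidecar" ]; then
      echo "stale sidecar present: $sidecar" >&2
      return 1
    fi
  done
  # integrity_check prints exactly "ok" for a healthy database
  result="$(sqlite3 "$db" 'PRAGMA integrity_check;')" || return 1
  [ "$result" = "ok" ] || { echo "integrity_check: $result" >&2; return 1; }
  echo "database looks consistent: $db"
}
```

Example call with the container stopped: `verify_copied_db /opt/Jobber/data/jobs.db`.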
**Default pipeline sources** (empty JSON body to `POST /api/pipeline/run`, e.g. cron script) include **Glassdoor** via JobSpy with Indeed and LinkedIn. Glassdoor's API often returns errors in logs; LinkedIn/Indeed can still produce rows. To force an explicit list from cron, set `JOBBER_PIPELINE_SOURCES` in `/root/.jobber-cron.env` (see `scripts/jobber-cron.env.example`).
**Security:** Never commit `/root/.jobber-cron.env` or paste bot tokens in Git. Revoke the token in BotFather if it was exposed.
### Option B2 — Minimal curl-only (no wait-for-finish)
If you only want to **trigger** the pipeline from cron without this script, use §3. For a one-off Telegram message without polling:
```bash
TELEGRAM_BOT_TOKEN="123456:ABC..."
CHAT_ID="your_numeric_chat_id"
MSG="Pipeline finished. Check dashboard."
# Build the JSON body with jq so quotes and newlines in MSG are escaped correctly.
curl -sS -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg chat_id "$CHAT_ID" --arg text "$MSG" '{chat_id: $chat_id, text: $text}')"
```
**chat_id:** Message your bot, then open `https://api.telegram.org/bot<TOKEN>/getUpdates` and read `message.chat.id` (if `result` is empty, send **Start** to the bot first, or call `deleteWebhook` if a webhook was set).
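Reading the id out of the `getUpdates` response by eye is error-prone, so a small `jq` filter can help. The sketch below prints each distinct `message.chat.id` found in the response and skips updates that carry no `message`. The function name is hypothetical.

```bash
# Hypothetical helper: print every distinct chat id found in a getUpdates
# response (JSON on stdin). Requires jq.
chat_ids_from_updates() {
  jq -r '[.result[]?.message.chat.id | select(. != null)] | unique | .[]'
}
```

Example call: `curl -sS "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getUpdates" | chat_ids_from_updates`. No output means `result` was empty, in which case the hints above apply.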
### Option C — External automation
Use n8n, Grafana OnCall, or similar: schedule → `POST /api/pipeline/run` → wait/poll → Telegram node.
---
## 5. Security notes
- Do not commit `.env` or Telegram tokens.
- Prefer Basic Auth if the instance is reachable from the internet.
- Restrict firewall so only your IP (or VPN) can reach the published port if exposed.
---
## 6. Git remotes (reference)
```bash
git remote -v
git push origin main # or: git push gitea main — whatever you configured
```
---
## Related
- Env knobs: `PIPELINE_WEBHOOK_URL`, `WEBHOOK_SECRET`, `BASIC_AUTH_USER`, `BASIC_AUTH_PASSWORD` in `.env.example`.
- Local docs: `npm run docs:dev` from the repo root.
---
## 7. Proxmox: VM vs LXC, sizing, fast setup
### VM or container (LXC)?
| | **QEMU VM (recommended)** | **LXC** |
|---|---------------------------|--------|
| **Docker** | Works the same as on any Linux server. | Possible with `nesting=1` (and sometimes `keyctl`); more Proxmox/Docker footguns. |
| **This app** | Playwright/Firefox + Node inside Docker — predictable. | Same stack *can* work in nested Docker, but troubleshooting is harder. |
| **Overhead** | Slightly higher RAM for a full kernel. | Lower overhead per CT. |
**Choose a VM** unless you already run Docker in LXC on this cluster and know the knobs. For speed and fewer surprises: **Ubuntu 24.04 LTS cloud image**, 2-4 vCPU, 4-8 GB RAM, **≥ 40 GB** disk (Docker layers + `./data`).
**Rough sizing**
- **Light personal use:** 2 vCPU, **4 GB RAM**, 40 GB disk — often enough.
- **Comfortable (pipelines + browsers + headroom):** 4 vCPU, **6-8 GB RAM**, 64 GB disk.
- **Tight:** 2 GB RAM can work for idle UI only; **scraping/LLM runs will swap or OOM** — avoid.
### Proxmox UI (once per guest)
1. **Create VM** → ISO or cloud-init image (e.g. Ubuntu 24.04).
2. **Network**: bridge (e.g. `vmbr0`) so the guest gets a LAN IP.
3. **Disk**: virtio, discard on if SSD.
4. **CPU type:** `host` if single-node and you want a tiny perf edge; `kvm64` is fine.
5. After install: **Guest agent** optional but handy for IP in Proxmox UI.
### One-shot shell setup (inside the Ubuntu VM)
Run as a user with `sudo`. Set `REPO_URL` to your Git remote (HTTPS or SSH). First build can take several minutes.
```bash
set -euo pipefail
REPO_URL="${REPO_URL:-https://github.com/YOUR_USER/Jobber.git}" # change
APP_DIR="${APP_DIR:-$HOME/Jobber}"
sudo apt-get update
sudo apt-get install -y ca-certificates curl git
# Docker Engine + Compose plugin (official convenience script; review if you prefer manual repo install)
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker "$USER"
# Log out and back in so `docker` works without sudo. `newgrp docker` also works for the
# current session, but it starts a new shell, so run it by hand after this block finishes.
git clone "$REPO_URL" "$APP_DIR"
cd "$APP_DIR"
cp .env.example .env
echo "Edit .env now (LLM keys, RXRESUME, etc.), then run: docker compose up -d --build"
```
Then edit `.env`, then:
```bash
cd "$APP_DIR"
docker compose up -d --build
```
Open `http://<VM-IP>:3005`. Keep backups of `$APP_DIR/data` and your `.env`.