# Levkin self-hosted stack — plan & decisions

Reference doc for the Proxmox homelab. Lives alongside the Cursor project that has the Proxmox info.

**Conventions:**
- All groups run inside an LXC unless marked **VM**.
- Inside each LXC: one `docker-compose.yml`, managed by **Dockge** where applicable.
- Caddy on the `edge` LXC is the only thing exposed to the internet.
- Authentik on the `identity` LXC is the source of truth for who you are.
- Vaultwarden stays standalone (it's the break-glass path if Authentik dies).

---

## Progress summary (updated 2026-05-24)

| Area | Status |
|------|--------|
| **Phase 0** Foundation | ✅ **Done** — pve10 LXCs static; UniFi VM DHCP reservations; auth + apex DNS; Caddy on **VM 106** @ `.50` (edge LXC = Phase 1.5) |
| **Phase 1** Identity (Authentik) | ✅ LXC **217** @ `10.0.10.21` — admin + TOTP |
| **Phase 2** Monitoring | ✅ LXC **218** — Kuma (17 monitors), Dockge, Umami, Beszel (16 agents), SMTP |
| **Phase 3** Cal.com | ✅ LXC **210** — booking + auto consult button; **OIDC deferred** (no enterprise license) |
| **Phase 4** SSO | ✅ Vikunja, Listmonk, Mattermost, Mailcow — browser smoke tests remaining |
| **Phase 5–8** | ⏳ Immich, Crater, Outline, automation depth — after P0 backlog |
| **Comms health** | ✅ Mailcow + Listmonk restored 2026-05-23 — [mailcow-lan-proxy-fix.md](mailcow-lan-proxy-fix.md) |
| **Site consolidation** | ⏳ **Partial** — git LXCs + levkin.ca LXC 220; optional later: static on Caddy VM |
| **dev-apps** | ⏳ punimTag **9101** on pve201 until testing done |
| **Nextcloud retire** | ✅ VM **201** stopped, `onboot 0`, Caddy removed (~8 GiB RAM freed) |
| **Portainer retire** | ✅ VM **109** destroyed 2026-05-23 (~16 GiB on pve10) |
| **Security pass** | 🟡 Partial — SSH keys + apt + cron 2026-05-23 — [security-remediation-plan.md](security-remediation-plan.md) |

---

## Capacity headroom (live check 2026-05-24)

Use this before adding LXCs/VMs. Re-check with `pvesm status` and `free -h` on each node.

### pve10 (PVENAS) — **primary place for new homelab services**

| Resource | Total | Used | **Available** | Notes |
|----------|-------|------|---------------|--------|
| **local-lvm** (thin) | ~1.67 TiB | ~22% | **~1.30 TiB** | New guests on **local-lvm** only (NAS SP00 degraded) |
| **RAM** (host) | 62 GiB | ~40 GiB | **~22 GiB** | Portainer **109** + Nextcloud **201** freed |

**Running:** LXCs 210, 215–221; VMs 102–108, 117, 150, 200. **Stopped:** 101 Jellyfin, 201 Nextcloud.

**Headroom:** ~**20+ GiB RAM** for Immich, Crater, or dev-apps LXC.

**Still available to free:**

| Stop / retire | Frees (maxmem) |
|---------------|----------------|
| ~~Portainer VM **109**~~ | ✅ **16 GiB** freed |
| ~~Nextcloud VM **201**~~ | ✅ **8 GiB** freed |
| Hermes VM **117** (if not needed) | **16 GiB** |
| Site LXCs 215/216 → Caddy static (optional) | **~1 GiB** |

### pve201 (pve) — **do not add new homelab services**

| Resource | Total | Used | **Available** | Notes |
|----------|-------|------|---------------|--------|
| **local-lvm** | ~1.67 TiB | ~46% | **~922 GiB** | Disk OK |
| **RAM** | 125 GiB | ~105 GiB | **~19 GiB** | GPU **104** (64 GB), DebianDesktop **100** (24 GB ✅ rebooted), punim **9101** (16 GB) |

**Verdict:** New stacks on **pve10** only. pve201: stop/migrate punim after testing.

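
A minimal sketch of the RAM re-check described above, assuming it runs on a Proxmox node with a standard `/proc/meminfo`. The helper names `mem_available_gib` and `fits_budget` are illustrative and not part of any existing script in this project; `MemAvailable` is the kernel's estimate that the `available` column of `free -h` is based on.

```bash
# Print available RAM in GiB from /proc/meminfo-style input on stdin.
# MemAvailable is reported in kiB, so divide by 1024 * 1024.
mem_available_gib() {
    awk '/^MemAvailable:/ { printf "%.1f\n", $2 / 1048576 }'
}

# Succeed only if the requested budget fits into the available RAM.
# $1 = available GiB, $2 = requested GiB
fits_budget() {
    awk -v avail="$1" -v need="$2" 'BEGIN { exit !(need <= avail) }'
}

# Example on a node (not run here):
#   avail=$(mem_available_gib < /proc/meminfo)
#   fits_budget "$avail" 4 && echo "a 4 GiB LXC fits"
```
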
---

## Current state (May 2026)

**Already running:**
- Caddy reverse proxy — currently on a **VM** (should migrate to LXC, see "Caddy migration" section)
- Mailcow — VM, mail domain is `levkine.ca` (with e)
- Vaultwarden, Vikunja, n8n, Listmonk, Mattermost — across various LXCs/VMs
- **Cal.com** — LXC id `210`, `cal.levkin.ca`, Postgres included, admin user `ilia`, 15-min consult event live at `cal.levkin.ca/ilia/consult` with Jitsi link
- Caddy entries live for: `levkin.ca`, `caseware.levkin.ca`, `auto.levkin.ca`, `iliadobkin.com`, `cal.levkin.ca`, `listmonk.levkin.ca`, `pdf.levkin.ca`, `search.levkin.ca`, `auth.levkin.ca`, `stats.levkin.ca`, **`status.levkin.ca`**
- **Authentik** — LXC **217** @ `10.0.10.21`, `https://auth.levkin.ca`, admin + TOTP enrolled
- **Monitoring** — LXC **218** @ `10.0.10.22`: Uptime Kuma `:3001`, Dockge `:5001`, Umami `:3000` (LAN-only) — [monitoring-stack.md](monitoring-stack.md)
- **Umami** + **Authentik** admin/TOTP/backup codes — done
- **Uptime Kuma** — monitors live; email alerts via Mailcow — see [monitoring-stack.md](monitoring-stack.md)
- **Dockge** on 218 — manages local `/opt/monitoring` stack
- **Snapshots** `backup-20260522` on LXCs **217**, **218**
- **Jellyfin** (VM 101) — stopped
- LXC **210, 215–221** — static via `pct set`; **Caddy VM 106** — static in-guest `.50`
- **Nextcloud VM 201** — retired (stopped, `onboot 0`, Caddy removed)
- ~~**Portainer VM 109**~~ — **removed** 2026-05-23 (~16 GiB RAM freed on pve10)
- **Marketing sites** — LXC **220** (`levkin.ca`), **215/216/219** (git deploy), not yet on Caddy VM static roots
- **punimTag dev** — pve201 LXC **9101** @ `10.0.10.121` (16 GB) — leave until testing done; then `dev-apps` on pve10

**Decisions locked in:**
- Container manager: **Dockge** (not Portainer, not Coolify/Dokploy/CapRover)
- Chat: **Mattermost only** — no Matrix/Synapse
- Knowledge tool: **Outline** for client-facing, **SiYuan** if/when PhD work picks up (don't run Affine + Trilium too)
- Bookmark manager: **Linkwarden** (full-page archive is the killer feature)
- Authentik is the SSO target; Vaultwarden stays standalone

---

## LXC / VM grouping table

| Group | What's inside | Why grouped | LXC or VM |
|---|---|---|---|
| **edge** | Caddy reverse proxy, Crowdsec/Fail2ban | The front door — small, stable, restart rarely | LXC, 1 vCPU, 1GB RAM |
| **identity** | Authentik (+ Postgres + Redis), Vaultwarden | Auth-critical — touch rarely, back up religiously | LXC, 2 vCPU, 2GB RAM |
| **comms** | Mailcow | Mailcow's compose is huge (15+ containers) and self-contained — wants its own host | **VM**, 4GB RAM |
| **automation** | n8n, Windmill (later), Huginn (later) | Active workloads, frequent updates, you'll touch these a lot | LXC, 2–4 vCPU, 4GB RAM |
| **productivity** | Vikunja, Listmonk, Outline, Mealie, Linkwarden | Personal/team productivity, low-resource | LXC, 2 vCPU, 4GB RAM |
| **media** | Immich, Nextcloud, Paperless-ngx | Large storage, GPU passthrough useful for Immich ML | **VM** if GPU passthrough, else LXC. Lots of disk. |
| **business** | Cal.com ✅, Crater | Client-facing, financial — back up often | LXC, 2 vCPU, 2GB RAM |
| **monitoring** | Uptime Kuma ✅, Dockge ✅, Umami ✅, Beszel ✅ | Ops stack on LXC **218** | LXC, 2 vCPU, 2GB RAM |
| **labs** | Anything experimental — Flowise, Trigger.dev | Things you're trying out, can be wiped | LXC, scratch space |

### Why this grouping (cheat sheet)

- One service goes bad → only its group restarts.
- Need a kernel upgrade for one stack → snapshot the LXC, upgrade, roll back if broken.
- Mailcow's huge surface area is isolated in its own VM.
- Edge LXC is tiny and stable → perfect for the layer everything depends on.
- Backup cadence is set per group (see "Backup strategy").
- Resource limits per LXC mean a runaway container can't eat n8n's RAM.

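
The snapshot, upgrade, roll back flow from the cheat sheet could be scripted roughly like this. It is a sketch that assumes it runs as root on the Proxmox node; the helper names are made up for illustration, and the `backup-YYYYMMDD` name simply follows the `backup-20260522` snapshots mentioned elsewhere in this doc.

```bash
# Build a snapshot name in the backup-YYYYMMDD style.
# $1 = optional date stamp (YYYYMMDD); defaults to today
snap_name() {
    printf 'backup-%s\n' "${1:-$(date +%Y%m%d)}"
}

# Take a snapshot of one LXC and print the matching rollback command.
# $1 = LXC ID
snapshot_before_upgrade() {
    name=$(snap_name)
    pct snapshot "$1" "$name" || return 1
    printf 'Roll back with: pct rollback %s %s\n' "$1" "$name"
}

# Example (not run here): snapshot_before_upgrade 218
```
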
---

## Subdomains

Only expose what actually needs to be public. Internal services use Tailscale/WireGuard for remote access.

### Expose publicly

| Subdomain | Service | Group | Why public | Status |
|---|---|---|---|---|
| `levkin.ca` | Company site (spec + `/folders`) | edge | Main brand | ✅ LXC 220 — **DNS must point to home IP** (was parked elsewhere) |
| `caseware.levkin.ca` | Static site | edge | Marketing | ✅ live |
| `auto.levkin.ca` | Static site | edge | Marketing | ✅ live |
| `iliadobkin.com` | Portfolio (SDET) | edge | Personal site | ✅ live (pve10 LXC 219) |
| `cal.levkin.ca` | Cal.com | business | Clients book on it | ✅ live |
| `listmonk.levkin.ca` | Listmonk | productivity | Unsubscribe URLs must resolve | ✅ live |
| `mail.levkine.ca` | Mailcow | comms | Mail server | ✅ live |
| `auth.levkin.ca` | Authentik | identity | OIDC redirect URLs need external resolution | ✅ live |
| `bill.levkin.ca` | Crater | business | Clients view invoices | ⏳ Phase 6 |
| `cloud.levkin.ca` | Nextcloud | media | **Retired** — VM 201 stopped, Caddy entry removed | 🗑️ |
| `photos.levkin.ca` | Immich | media | Mobile apps need public hostname | ⏳ Phase 5 |
| `vault.levkin.ca` | Vaultwarden | identity | Mobile clients need public hostname | ⏳ |
| `notes.levkin.ca` | Outline | productivity | Sharing docs with clients | ⏳ |
| `chat.levkin.ca` | Mattermost | comms | Only if inviting outside users | ⏳ optional |

### Keep internal only (no public DNS, no Caddy block)

Reachable only via local network or Tailscale/WireGuard:

| Service | Reason |
|---|---|
| Umami admin UI | Only you need the dashboard. Tracking endpoint can be public, dashboard isn't. |
| Uptime Kuma | Status dashboard is for you. Don't advertise infrastructure. |
| Beszel | Metrics are admin-only. |
| Dockge | Admin UI — local only. |
| n8n editor | UI shouldn't be exposed. Webhooks go on `hooks.levkin.ca` if needed. |
| Huginn / Windmill / Flowise | Admin tools. |
| Vikunja | Personal task manager. |
| Mealie | Family recipes. |
| Trigger.dev | Internal automation. |
| Paperless-ngx | Personal documents. Never expose. |
| SiYuan | Personal knowledge. |
| Linkwarden | Personal bookmarks. |

### Borderline (decide per service)

| Subdomain | Service | Notes |
|---|---|---|
| `stats.levkin.ca` | Umami | Public tracker script; admin UI prefer LAN `:3000` |
| `status.levkin.ca` | Uptime Kuma | **Public status page** only (not admin UI) |
| *(none)* | Beszel | **LAN/Tailscale** `10.0.10.22:8090` — host metrics, no public DNS |

---

## Phased rollout

### Phase 0 — Foundation ✅
1. ✅ Caddy running (on VM — migrate to LXC in Phase 1.5)
2. ✅ **Static IP audit** — all pve10 LXCs pinned via `pct set`; Caddy VM static `.50`; homelab VMs pinned via UniFi DHCP — see [host-list.md](host-list.md)
3. ✅ DNS for `auth.levkin.ca` + `levkin.ca` apex → home IP
4. ✅ `identity` LXC **217** @ `10.0.10.21` (2 vCPU, 2GB RAM, 20GB `local-lvm`, Debian 12 + Docker Compose)

### Phase 1 — Identity ✅
1. ✅ Deploy Authentik in `identity` LXC (Authentik + Postgres + Redis, official compose at `/opt/authentik`)
2. ✅ Caddy: `auth.levkin.ca` → `10.0.10.21:9000` (simple passthrough, no forward-auth)
3. ✅ Admin user (`admin`), TOTP enrolled
4. ✅ `authentik Admins` group (skip custom `users` group until more accounts)
5. ✅ Static backup codes; **don't OIDC other apps until Cal.com test**

### Phase 1.5 — Caddy migration to LXC ⏳ (tracked in the Phase 2 backlog)

Deferred until after sprint merge. Authentik + SSO are stable; edge migration is the next structural change.

Why Caddy belongs in an LXC, not a VM:
- ~50MB OS overhead vs ~512MB for a VM
- Boot/restart in 2-5s vs 20-40s (matters when reloading config)
- Snapshot/backup is faster
- Caddy is a Go binary doing reverse-proxy work — no need for kernel isolation
- Near-native network performance

Steps:
1. Create `edge` LXC: Debian 12, 1 vCPU, 512MB RAM, 8GB disk, **static IP from host list**
2. Install Caddy from the official Caddy apt repository (hosted on Cloudsmith):
   ```bash
   # curl and gnupg are not always present on a fresh Debian 12 LXC
   apt update && apt install -y debian-keyring debian-archive-keyring apt-transport-https curl gnupg
   # Import the repository signing key
   curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
   # Register the apt source, then install
   curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | tee /etc/apt/sources.list.d/caddy-stable.list
   apt update && apt install -y caddy
   ```
3. Copy `Caddyfile` + custom snippets (`(security-headers)` etc.) from the VM
4. Add a **test subdomain** (e.g. `test.levkin.ca`) pointing at the new LXC — verify that TLS certificates issue and routing works
5. Cut over: update router port-forward (80/443) to the new LXC IP. DNS A records don't need to change if they point to your home IP.
6. Watch Mailcow, Cal.com, Listmonk, the marketing sites for ~24h
7. Keep the old VM snapshot for a week, then delete

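
Before the port-forward is switched in step 5, routing through the new LXC can be probed from the LAN. The sketch below uses curl's `--resolve` option to pin a hostname to the new LXC's address; the function names are hypothetical, and `10.0.10.20` is the planned edge address from the backlog. It only checks what the new Caddy answers, and certificate issuance still depends on how ACME challenges reach the new LXC.

```bash
# Build curl's --resolve value so the request goes to the new LXC
# instead of whatever DNS currently returns.
# $1 = hostname, $2 = IP of the new edge LXC
resolve_arg() {
    printf '%s:443:%s\n' "$1" "$2"
}

# Print the HTTP status code a hostname returns when served by the given IP.
probe_via() {
    curl -sS -o /dev/null -w '%{http_code}\n' \
        --resolve "$(resolve_arg "$1" "$2")" "https://$1/"
}

# Example (not run here): probe_via test.levkin.ca 10.0.10.20
```
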
### Phase 2 — Quick wins ✅
1. ✅ **Umami** — tracking on levkin.ca, caseware, auto, and iliadobkin.com (portfolio)
2. ✅ **Uptime Kuma** — monitors in UI
3. ✅ **Dockge** — logged in; register `/opt/monitoring` stack (see [monitoring-stack.md](monitoring-stack.md))
4. ✅ **Kuma email alerts** — SMTP via Mailcow — [monitoring-stack.md](monitoring-stack.md)

### Phase 3 — Cal.com (mostly done) ✅
1. ✅ Cal.com deployed in `business` LXC (id 210, Postgres included)
2. ✅ `cal.levkin.ca` proxied via Caddy
3. ✅ Booking link live at `cal.levkin.ca/ilia/consult` with Jitsi location
4. ✅ Email working via `cal@levkine.ca` SMTP through Mailcow
5. ⏳ **Cal.com OIDC** — **deferred** ([cal-authentik-oidc.md](cal-authentik-oidc.md)) — needs enterprise `CALCOM_LICENSE_KEY`
6. ✅ `auto.levkin.ca` consult button → `cal.levkin.ca/ilia/consult`

### Phase 4 — SSO migration ✅
1. ✅ **Vikunja** — [vikunja-authentik-oidc.md](vikunja-authentik-oidc.md)
2. ~~**Nextcloud**~~ — skipped (VM 201 retired)
3. ✅ **Listmonk** — [listmonk-authentik-oidc.md](listmonk-authentik-oidc.md) (v6.1.0)
4. ✅ **Mattermost** — [mattermost-authentik-gitlab-oauth.md](mattermost-authentik-gitlab-oauth.md)
5. ✅ **Mailcow** — [mailcow-authentik-oidc.md](mailcow-authentik-oidc.md)

**Remaining:** browser smoke tests as `ilia`; rotate OIDC secrets when done.

For each: keep a local admin password as a break-glass account.

### Phase 5 — Family / personal wins (~1 evening)
1. **Immich** in `media` VM — install mobile apps for you and family, enable auto-upload. Face recognition runs in background; "my kids 2024" works within a couple days.
2. Skip PhotoPrism — Immich covers it.

### Phase 6 — Business / consulting (~1–2 evenings)
1. **Crater** in `business` LXC — tax rates, company info, Stripe integration if you want online payment
2. ✅ **Beszel** hub in `monitoring` LXC + agents on each LXC — one dashboard for resource usage (already live: 16 agents)

### Phase 7 — Automation depth (ongoing)
Only when you have a real use case:
1. **Huginn** in `automation` — first agent: competitor pages, kosher product availability, grant deadlines
2. **Windmill** in `automation` — first script: rewrite an n8n flow with too many code nodes
3. **Flowise** in `labs` — first flow: chat-with-docs against your consulting notes

### Phase 8 — Knowledge / research
1. **Outline** in `productivity` LXC — client-facing wiki + your notes
2. **Linkwarden** in `productivity` LXC — bookmarks with full-page archive
3. **Paperless-ngx** in `media` — scan and OCR the paper that's accumulating
4. **SiYuan** — only if/when PhD or long-form research becomes relevant

---

## Static IP audit

**Maintain a `host-list.md` file** (in this Cursor project, alongside this plan) with every LXC/VM, its current IP, its target static IP, and DHCP/static status. Cursor will use this as the source of truth when scripting changes.

Suggested format:

| LXC/VM ID | Name | Role | Current IP | Target static IP | DHCP/Static | Notes |
|---|---|---|---|---|---|---|
| 210 | cal | Cal.com | 10.0.10.228/24 (DHCP) | 10.0.10.228/24 | ⏳ static | Convert ASAP |
| ... | ... | ... | ... | ... | ... | ... |

### Recommended IP plan

Keep everything in the single `10.0.10.0/24` subnet (or whatever your LAN is) and carve it into role-based ranges so it's scannable:

| Range | Reserved for |
|---|---|
| `.1 - .9` | Network gear (router, switches, APs) |
| `.10 - .19` | Proxmox host(s) + PBS |
| `.20 - .39` | Edge / identity / comms (critical infra) |
| `.40 - .79` | Application LXCs (productivity, automation, business, monitoring) |
| `.80 - .99` | Media VM(s) |
| `.100 - .199` | DHCP pool (clients, phones, laptops) |
| `.200 - .249` | Labs / experimental |
| `.250 - .254` | Reserved |

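
To check a proposed address against the range table above, a small lookup helps. This is only a sketch: `role_for_octet` is a hypothetical helper, the role labels are shortened versions of the table's wording, and it expects the last octet as a plain integer.

```bash
# Map the last octet of a 10.0.10.x address to the role reserved for it
# in the range table. Returns non-zero for octets outside the plan.
role_for_octet() {
    o=$1
    if   [ "$o" -ge 1 ]   && [ "$o" -le 9 ];   then echo "network gear"
    elif [ "$o" -ge 10 ]  && [ "$o" -le 19 ];  then echo "proxmox hosts"
    elif [ "$o" -ge 20 ]  && [ "$o" -le 39 ];  then echo "critical infra"
    elif [ "$o" -ge 40 ]  && [ "$o" -le 79 ];  then echo "application"
    elif [ "$o" -ge 80 ]  && [ "$o" -le 99 ];  then echo "media"
    elif [ "$o" -ge 100 ] && [ "$o" -le 199 ]; then echo "dhcp pool"
    elif [ "$o" -ge 200 ] && [ "$o" -le 249 ]; then echo "labs"
    elif [ "$o" -ge 250 ] && [ "$o" -le 254 ]; then echo "reserved"
    else return 1   # .0 and .255 are not assignable in a /24
    fi
}
```

For example, `role_for_octet 21` prints `critical infra`, while `role_for_octet 228` prints `labs`, so the Cal.com address `.228` currently sits in the labs range rather than the application range.
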
### How to set static on a Proxmox LXC

Two methods — pick one and stick with it:

**Method A — Proxmox CLI (recommended, survives reboots cleanly):**
```bash
pct set <ID> -net0 name=eth0,bridge=vmbr0,ip=10.0.10.X/24,gw=10.0.10.1
pct reboot <ID>
```

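
When several LXCs need converting, it can be safer to print the Method A commands first and review them before running anything. The sketch below is a dry run under that assumption: the function names are illustrative, the input is `ID OCTET` pairs on stdin, and the bridge and gateway are copied from the Method A example.

```bash
# Print the Method A commands for one LXC instead of running them.
# $1 = LXC ID, $2 = last octet of the target address
pct_static_cmds() {
    printf 'pct set %s -net0 name=eth0,bridge=vmbr0,ip=10.0.10.%s/24,gw=10.0.10.1\n' "$1" "$2"
    printf 'pct reboot %s\n' "$1"
}

# Read "ID OCTET" pairs from stdin, skipping blank lines and # comments.
plan_static_ips() {
    while read -r id octet; do
        case "$id" in
            ''|'#'*) continue ;;
        esac
        pct_static_cmds "$id" "$octet"
    done
}

# Example (prints four lines, runs nothing):
#   printf '217 21\n218 22\n' | plan_static_ips
```
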
**Method B — Router DHCP reservation:**
- Reserve the IP in your router's DHCP table by MAC address. LXC stays "DHCP" technically, but always gets the same IP.
- Easier if you have many hosts and one router.
- Risk: if the LXC's MAC changes (rebuild from snapshot to new ID), reservation breaks.

**Recommendation:** Method A (`pct set`) for everything critical (edge, identity, comms, business). Method B is fine for labs/experimental LXCs.

### Audit checklist

1. List every LXC: `pct list`
2. List every VM: `qm list`
3. For each, run `pct exec <ID> -- ip a` (or `qm guest exec <ID> -- ip a` for VMs, which requires the QEMU guest agent) and check whether the IP came from DHCP (leased addresses usually carry the `dynamic` flag)
4. Fill in `host-list.md`
5. Pick target IPs from the range plan above
6. Convert one at a time, lowest-risk first (labs → productivity → business → comms → identity → edge)
7. **After each conversion**, verify the Caddy reverse-proxy entry still works (curl from outside)
8. Update `host-list.md` status column

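
Steps 1 and 3 of the checklist can be sketched as a loop over `pct list`. This assumes each LXC's interface is named `eth0` (as in Method A) and that the guest's DHCP client leaves the `dynamic` flag on leased addresses, which is common with iproute2 but not guaranteed. The function names are hypothetical.

```bash
# Succeed if a line of `ip -o -4 addr show` output describes a DHCP lease.
# iproute2 marks addresses with a finite lifetime as "dynamic".
is_dhcp_line() {
    case " $1 " in
        *" dynamic "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print "ID  dhcp|static  address" for every running LXC on this node.
audit_lxc_ips() {
    pct list | awk 'NR > 1 && $2 == "running" { print $1 }' | while read -r id; do
        # </dev/null keeps pct exec from swallowing the loop's input
        line=$(pct exec "$id" -- ip -o -4 addr show dev eth0 </dev/null | head -n 1)
        if is_dhcp_line "$line"; then kind=dhcp; else kind=static; fi
        printf '%s\t%s\t%s\n' "$id" "$kind" "$(printf '%s\n' "$line" | awk '{ print $4 }')"
    done
}
```
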
### Hosts known to need conversion right now

None outstanding. The hosts previously listed here are now static:

- ~~**LXC 210 (cal)**~~ — static at `10.0.10.228` ✅
- **Site LXCs 220, 215/216/219** — static; served via Caddy → nginx on each LXC (git deploy). Optional future: static files on Caddy VM only.

---

## Backlog (priority order)

### P0 — status (2026-05-24)

| # | Item | Status |
|---|------|--------|
| 1 | Umami / Kuma / Dockge | ✅ |
| 2 | Portainer VM 109 | ✅ removed |
| 3 | Nextcloud VM 201 | ✅ retired |
| 4 | Listmonk → LXC 221 | ✅ + SMTP + VM 113 destroyed |
| 5 | Beszel agents | ✅ **16 systems** |
| 6 | Kuma monitors + email | ✅ **17 monitors**, all alert-linked |
| 7 | DNS `levkin.ca` apex | ✅ |
| 8 | Vikunja OIDC infra | ✅ live — browser test as `ilia` still manual |
| 9 | UniFi DHCP listmonk MAC | ⏳ manual @ UniFi |
| 10 | NAS / Jellyfin / DebianDesktop | **deferred** |
| 11 | Cal OIDC | deferred (no license) |

### P1 — next

See **[handoff-next-steps.md](handoff-next-steps.md)** — SSO smoke tests, secret rotation.

### Phase 2 backlog (was P1 infra)

1. **Caddy → edge LXC** @ `10.0.10.20`
2. **Security remediation** — [security-remediation-plan.md](security-remediation-plan.md)
3. **NAS / Jellyfin** — disk `W4J0L3PY`

### P1 — when ready
- **Outline** — wiki for client docs
- **Linkwarden** — bookmarks with full-page archive
- **Plane** — Jira-lite project management (pair with Mattermost)

### P2 — when you have a real need
- **Crater** — invoicing (Phase 6)
- **Immich** — photos (Phase 5)
- **Paperless-ngx** — document scanning (Phase 8)
- **Huginn** — first when you have a monitoring use case
- **Windmill** — when n8n hits limits
- **Trigger.dev** — durable background jobs in code (better fit than Windmill for QA work)
- **PrivateBin** — encrypted paste for sharing secrets with contractors
- **Addy.io** — email aliases
- **SiYuan** — if PhD work picks up
- **Flowise** — labs only, when LLM workflow use case appears

### Skip / declined
- ~~PhotoPrism~~ — Immich covers it
- ~~Activepieces~~ — you already have n8n
- ~~Affine / Trilium~~ — picked Outline + SiYuan instead
- ~~Matrix/Synapse + Element~~ — staying on Mattermost
- ~~Coolify / Dokploy / CapRover~~ — Dockge is enough; revisit only if writing many custom apps

---

## Backup strategy

- **Proxmox Backup Server (PBS)** or `vzdump` to a NAS — snapshot each LXC/VM nightly
- **Critical groups** (`identity`, `comms`, `business`): 7 daily + 4 weekly + 12 monthly
- **Productivity/automation**: 7 daily + 4 weekly
- **Labs**: 3 daily, no long retention
- **Off-site copy** of `identity` and `business` LXCs — these contain auth and billing data. Encrypted copy to Wasabi or Backblaze B2.

The whole LXC gets snapshotted — much simpler than file-level container backup.

**Done on pve10 (2026-05-22):** `pct snapshot` **`backup-20260522`** on LXCs **217** (identity) and **218** (monitoring).

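
The retention tiers above map onto `vzdump`'s `--prune-backups` option. The sketch below only prints the command for review; the function names and the storage name `nas-backups` are placeholders, and groups the list does not cover (edge, media, monitoring) deliberately return an error rather than guessing a policy.

```bash
# Map a group name to a vzdump --prune-backups value.
retention_for() {
    case "$1" in
        identity|comms|business) echo "keep-daily=7,keep-weekly=4,keep-monthly=12" ;;
        productivity|automation) echo "keep-daily=7,keep-weekly=4" ;;
        labs)                    echo "keep-daily=3" ;;
        *)                       return 1 ;;   # no retention defined for this group
    esac
}

# Print the vzdump command for one guest (dry run; nothing is executed).
# $1 = guest ID, $2 = group, $3 = backup storage name (placeholder)
backup_cmd() {
    keep=$(retention_for "$2") || return 1
    printf 'vzdump %s --storage %s --mode snapshot --prune-backups %s\n' "$1" "$3" "$keep"
}

# Example (not run here): backup_cmd 217 identity nas-backups
```
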
---

## Next steps (priority order)

See **[handoff-2026-05-24.md](handoff-2026-05-24.md)** for sprint status checklist.

| # | Task | Status | Effort | Frees / unlocks |
|---|------|--------|--------|-----------------|
| 1 | **Kuma SMTP** | ✅ done | — | — |
| 2 | **Cal.com → Authentik OIDC** | ⏸ **deferred** | — | Needs `CALCOM_LICENSE_KEY`; infra ready — [sso-selfhosted-matrix.md](sso-selfhosted-matrix.md) |
| 3 | **auto.levkin.ca** → Cal booking link | ✅ | — | Consult button live |
| 4 | **Stop Portainer VM 109** | ✅ | — | Removed 2026-05-23; **~16 GiB RAM** on pve10 |
| 5 | **Retire Nextcloud VM 201** | ✅ | — | ~8 GiB RAM freed |
| 6 | **Vikunja → Authentik OIDC** | 🟡 infra OK | 15 min | Browser login as `ilia` |
| 7 | **UniFi DHCP reservations** | ⏳ | 20 min | [unifi-static-dhcp.md](unifi-static-dhcp.md) |
| 8 | **DNS levkin.ca apex** | ✅ | — | `142.180.237.136` |
| 9 | **Beszel + Kuma** | ✅ | — | 16 Beszel agents; 17 Kuma monitors |
| 10 | ~~**Listmonk SMTP**~~ | ✅ | — | UI + vault |
| 11 | **NAS.SP00** disk → Jellyfin | ⏳ hardware | — | VM 101 |
| 12 | **DebianDesktop reboot** | ✅ | — | VM 100 rebooted; 24 GB active on pve201 |
| 13 | **Caddy → edge LXC `.20`** | ⏳ defer | ~30 min | Phase 1.5 |
| 14 | **dev-apps LXC** | ⏳ defer | half day | After punim testing |
| 15 | **Static sites → Caddy VM** | ⏳ optional | 1 h | Defer |

**Defer:** Immich, Crater, Outline; Cal OIDC until license. (Listmonk/Mattermost/Mailcow SSO is already wired; only browser smoke tests remain.)

### Adding a new service — quick rule

| Want to add… | Node | RAM budget | Prerequisite |
|--------------|------|------------|--------------|
| Small app (Mealie, Linkwarden) | pve10 | 2 GB LXC | ~22 GiB free on pve10 |
| Medium (Outline, Crater) | pve10 | 4 GB LXC | Portainer + Nextcloud already freed |
| Heavy (Immich + ML) | pve10 or pve201 GPU | 4–8 GB+ | NAS healthy; pve201 only after GPU/punim sized down |
| Dev sandbox | pve10 `dev-apps` | 6–8 GB | punim 9101 migration only after testing |

### Nextcloud decommission (VM 201)

1. Confirm export in `exports/nextcloud-2026-05-21/` is complete
2. Delete **Nextcloud** monitor in Kuma
3. Remove `nextcloud.levkin.ca` from Caddy VM
4. Stop VM 201; update [host-list.md](host-list.md)
5. After NAS healthy: optional `vzdump` archive then delete disk

---

## Important rules

1. **Never put Authentik behind itself.** `auth.levkin.ca` is a simple Caddy passthrough — no forward-auth, no fancy dependencies. If it sat behind its own forward-auth and Authentik went down, you couldn't log in to fix Authentik.
2. **Vaultwarden stays standalone.** It's your break-glass path if Authentik dies. Don't OIDC it.
3. **Keep a local admin password on every SSO-wired app.** OIDC integrations break during upgrades — you need to log in to fix them.
4. **Keep a local admin account on the Proxmox host.** Independent of Authentik and Vaultwarden. Written down somewhere physical.
5. **Don't expose admin UIs publicly.** Dockge, Beszel, Uptime Kuma admin, n8n editor — use Tailscale or WireGuard for remote access.
6. **Static IPs for every LXC.** DHCP will eventually move them and Caddy will break. Set via `pct set <id> -net0 ...ip=10.0.10.X/24,gw=...` or a router reservation.
7. **Cal.com LXC (210)** — static at `.228` ✅.
8. **Maintain `host-list.md`** as the single source of truth for IPs. Update it whenever a new LXC/VM is created or migrated.