# Levkin self-hosted stack — plan & decisions

Reference doc for the Proxmox homelab. Lives alongside the Cursor project that has the Proxmox info.

**Conventions:**

- All groups run inside an LXC unless marked **VM**.
- Inside each LXC: one `docker-compose.yml`, managed by **Dockge** where applicable.
- Caddy on the `edge` LXC is the only thing exposed to the internet.
- Authentik on the `identity` LXC is the source of truth for who you are.
- Vaultwarden stays standalone (it's the break-glass path if Authentik dies).

---

## Progress summary (updated 2026-05-23)

| Area | Status |
|------|--------|
| **Phase 0** Foundation | ✅ Mostly done — static IPs on pve10 LXCs; Caddy still on **VM 106** |
| **Phase 1** Identity (Authentik) | ✅ LXC **217** @ `10.0.10.21` |
| **Phase 2** Monitoring (Kuma, Dockge, Umami) | ✅ LXC **218** @ `10.0.10.22` |
| **Phase 3** Cal.com | ✅ LXC **210** — OIDC + auto site button still open |
| **Phase 4** SSO migration | ⏳ Not started (Cal → Authentik first) |
| **Phase 5–8** Immich, Crater, Outline, etc. | ⏳ Deferred |
| **Site consolidation** | ⏳ **Partial** — **levkin.ca** on LXC **220** @ `10.0.10.60` ✅; caseware/auto/portfolio on **215/216/219** ([site-lxc-git.md](site-lxc-git.md)); moving all static to Caddy VM is optional later |
| **dev-apps** (punim/pote/mirrormatch) | ⏳ **Not started** — punimTag **9101** still on **pve201** (active testing; do not migrate yet) |
| **Nextcloud retire** | ⏳ VM **201 is running again** on pve10 — finish decommission |
| **Portainer retire** | ⏳ VM **109 still running** (16 GB maxmem) on pve10 — stop after Dockge confirmed |

---

## Capacity headroom (live check 2026-05-23)

Use this before adding LXCs/VMs. Re-check with `pvesm status` and `free -h` on each node.

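
The RAM half of that re-check can be scripted. The sketch below is only an illustration: `avail_gib` and `has_headroom` are hypothetical helper names, and it assumes the procps-ng layout of `free -g`, where "available" is the seventh field of the `Mem:` row.

```bash
#!/usr/bin/env bash
# Sketch: read `free -g` output on stdin and print the "available" column (GiB).
# Assumption: procps-ng layout, where "available" is field 7 of the Mem: row.
avail_gib() {
  awk '/^Mem:/ { print $7 }'
}

# Hypothetical helper: succeed only if at least $1 GiB is available.
has_headroom() {
  local need="$1" have
  have="$(avail_gib)"
  [ -n "$have" ] && [ "$have" -ge "$need" ]
}

# On a node:
#   free -g | avail_gib
#   free -g | has_headroom 4 && echo "room for a 4 GiB LXC"
```

`free -g` rounds down to whole GiB, so the check errs on the cautious side.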
### pve10 (PVENAS) — **primary place for new homelab services**

| Resource | Total | Used | **Available** | Notes |
|----------|-------|------|---------------|--------|
| **local-lvm** (thin) | ~1.67 TiB | ~22% | **~1.30 TiB** | Plenty of disk for new LXCs |
| **RAM** (host) | 62 GiB | ~44 GiB | **~17 GiB** | Enough for **2–3 small LXCs** (2 GB each) as-is |

**Realistic new capacity on pve10 (without stopping anything):** ~**4–6 GiB RAM** + **100–200 GiB disk** for one productivity/media LXC (Outline, Mealie, Immich-lite).

**If you free RAM first (recommended):**

| Stop / retire | Frees (maxmem) |
|---------------|----------------|
| Portainer VM **109** | **16 GiB** |
| Nextcloud VM **201** | **8 GiB** |
| Hermes VM **117** (if not needed) | **16 GiB** |
| Site LXCs 215/216 → Caddy static (future) | **~1 GiB** |

After Portainer + Nextcloud off: **~41 GiB effective headroom** on pve10 — room for Immich, Crater, Beszel, or a **dev-apps** LXC (6–8 GiB).

### pve201 (pve) — **do not add new services**

| Resource | Total | Used | **Available** | Notes |
|----------|-------|------|---------------|--------|
| **local-lvm** | ~1.67 TiB | ~46% | **~922 GiB** | Disk OK |
| **RAM** | 125 GiB | ~122 GiB | **~3 GiB** | Saturated; GPU VM **104** (73 GB), punimTag **9101** (16 GB) |

**Verdict:** New stacks belong on **pve10**. pve201 only benefits from **stopping/migrating** guests (punim after testing, GPU resize, old Kuma already stopped).

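
The ~41 GiB figure for pve10 is the ~17 GiB currently available plus the maxmem of the two VMs to retire. A minimal sketch of that arithmetic follows; `projected_headroom` is a hypothetical helper, and maxmem is an upper bound on what stopping a guest actually returns.

```bash
#!/usr/bin/env bash
# Sketch: projected headroom (GiB) = available now + maxmem of each guest to retire.
# Figures are copied from the pve10 tables; maxmem is an upper bound, not measured usage.
projected_headroom() {
  local total="$1"; shift
  local freed
  for freed in "$@"; do
    total=$(( total + freed ))
  done
  echo "$total"
}

# ~17 GiB available, plus Portainer VM 109 (16) and Nextcloud VM 201 (8)
projected_headroom 17 16 8   # prints 41
```

Adding Hermes VM 117 as a fourth argument (`projected_headroom 17 16 8 16`) gives 57.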
---

## Current state (May 2026)

**Already running:**

- Caddy reverse proxy — currently on a **VM** (should migrate to LXC, see "Phase 1.5 — Caddy migration to LXC")
- Mailcow — VM, mail domain is `levkine.ca` (with e)
- Vaultwarden, Vikunja, n8n, Listmonk, Mattermost, Nextcloud — across various LXCs
- **Cal.com** — LXC id `210`, `cal.levkin.ca`, Postgres included, admin user `ilia`, 15-min consult event live at `cal.levkin.ca/ilia/consult` with Jitsi link
- Caddy entries live for: `levkin.ca`, `caseware.levkin.ca`, `auto.levkin.ca`, `iliadobkin.com`, `cal.levkin.ca`, `listmonk.levkin.ca`, `pdf.levkin.ca`, `search.levkin.ca`, `auth.levkin.ca`, `stats.levkin.ca`
- **Authentik** — LXC **217** @ `10.0.10.21`, `https://auth.levkin.ca`, admin + TOTP enrolled
- **Monitoring** — LXC **218** @ `10.0.10.22`: Uptime Kuma `:3001`, Dockge `:5001`, Umami `:3000` (LAN-only) — [monitoring-stack.md](monitoring-stack.md)
- **Umami** + **Authentik** admin/TOTP/backup codes — done
- **Uptime Kuma** — monitors live; email alerts via Mailcow — see [monitoring-stack.md](monitoring-stack.md)
- **Dockge** on 218 — manages local `/opt/monitoring` stack
- **Snapshots** `backup-20260522` on LXCs **217**, **218**
- **Jellyfin** (VM 101) — stopped
- LXC **210, 215–219** — static via `pct set`; **Caddy VM 106** — static in-guest `.50`
- **Nextcloud VM 201** — export done; VM **still running** on pve10 — **retire next** (reclaims 8 GB RAM once stopped)
- **Portainer VM 109** — still **running** on pve10 (16 GB) — retire; Dockge on 218 replaces it
- **Marketing sites** — LXC **220** (`levkin.ca`), **215/216/219** (git deploy), not yet on Caddy VM static roots
- **punimTag dev** — pve201 LXC **9101** @ `10.0.10.121` (16 GB) — leave until testing done; then `dev-apps` on pve10

**Decisions locked in:**

- Container manager: **Dockge** (not Portainer, not Coolify/Dokploy/CapRover)
- Chat: **Mattermost only** — no Matrix/Synapse
- Knowledge tool: **Outline** for client-facing, **SiYuan** if/when PhD work picks up (don't run Affine + Trilium too)
- Bookmark manager: **Linkwarden** (full-page archive is the killer feature)
- Authentik is the SSO target; Vaultwarden stays standalone

---

## LXC / VM grouping table

| Group | What's inside | Why grouped | LXC or VM |
|---|---|---|---|
| **edge** | Caddy reverse proxy, CrowdSec/Fail2ban | The front door — small, stable, restart rarely | LXC, 1 vCPU, 1GB RAM |
| **identity** | Authentik (+ Postgres + Redis), Vaultwarden | Auth-critical — touch rarely, back up religiously | LXC, 2 vCPU, 2GB RAM |
| **comms** | Mailcow | Mailcow's compose is huge (15+ containers) and self-contained — wants its own host | **VM**, 4GB RAM |
| **automation** | n8n, Windmill (later), Huginn (later) | Active workloads, frequent updates, you'll touch these a lot | LXC, 2–4 vCPU, 4GB RAM |
| **productivity** | Vikunja, Listmonk, Outline, Mealie, Linkwarden | Personal/team productivity, low-resource | LXC, 2 vCPU, 4GB RAM |
| **media** | Immich, Nextcloud, Paperless-ngx | Large storage, GPU passthrough useful for Immich ML | **VM** if GPU passthrough, else LXC. Lots of disk. |
| **business** | Cal.com ✅, Crater | Client-facing, financial — back up often | LXC, 2 vCPU, 2GB RAM |
| **monitoring** | Uptime Kuma ✅, Dockge ✅, Umami ✅, Beszel (later) | Ops stack on LXC **218** | LXC, 2 vCPU, 2GB RAM |
| **labs** | Anything experimental — Flowise, Trigger.dev | Things you're trying out, can be wiped | LXC, scratch space |

### Why this grouping (cheat sheet)

- One service goes bad → only its group restarts.
- Need a kernel upgrade for one stack → snapshot the LXC, upgrade, roll back if broken.
- Mailcow's huge surface area is isolated in its own VM.
- Edge LXC is tiny and stable → perfect for the layer everything depends on.
- Backup cadence per group (see Backup strategy section).
- Resource limits per LXC mean a runaway container can't eat n8n's RAM.

---

## Subdomains

Only expose what actually needs to be public. Internal services use Tailscale/WireGuard for remote access.

### Expose publicly

| Subdomain | Service | Group | Why public | Status |
|---|---|---|---|---|
| `levkin.ca` | Company site (spec + `/folders`) | edge | Main brand | ✅ LXC 220 — **DNS must point to home IP** (was parked elsewhere) |
| `caseware.levkin.ca` | Static site | edge | Marketing | ✅ live |
| `auto.levkin.ca` | Static site | edge | Marketing | ✅ live |
| `iliadobkin.com` | Portfolio (SDET) | edge | Personal site | ✅ live (pve10 LXC 219) |
| `cal.levkin.ca` | Cal.com | business | Clients book on it | ✅ live |
| `listmonk.levkin.ca` | Listmonk | productivity | Unsubscribe URLs must resolve | ✅ live |
| `mail.levkine.ca` | Mailcow | comms | Mail server | ✅ live |
| `auth.levkin.ca` | Authentik | identity | OIDC redirect URLs need external resolution | ✅ live |
| `bill.levkin.ca` | Crater | business | Clients view invoices | ⏳ Phase 6 |
| `cloud.levkin.ca` | Nextcloud | media | **Retiring** — decommission VM 201 after cutover | 🗑️ |
| `photos.levkin.ca` | Immich | media | Mobile apps need public hostname | ⏳ Phase 5 |
| `vault.levkin.ca` | Vaultwarden | identity | Mobile clients need public hostname | ⏳ |
| `notes.levkin.ca` | Outline | productivity | Sharing docs with clients | ⏳ |
| `chat.levkin.ca` | Mattermost | comms | Only if inviting outside users | ⏳ optional |

### Keep internal only (no public DNS, no Caddy block)

Reachable only via local network or Tailscale/WireGuard:

| Service | Reason |
|---|---|
| Umami admin UI | Only you need the dashboard. Tracking endpoint can be public, dashboard isn't. |
| Uptime Kuma | Status dashboard is for you. Don't advertise infrastructure. |
| Beszel | Metrics are admin-only. |
| Dockge | Admin UI — local only. |
| n8n editor | UI shouldn't be exposed. Webhooks go on `hooks.levkin.ca` if needed. |
| Huginn / Windmill / Flowise | Admin tools. |
| Vikunja | Personal task manager. |
| Mealie | Family recipes. |
| Trigger.dev | Internal automation. |
| Paperless-ngx | Personal documents. Never expose. |
| SiYuan | Personal knowledge. |
| Linkwarden | Personal bookmarks. |

### Borderline (decide per service)

| Subdomain | Service | Notes |
|---|---|---|
| `stats.levkin.ca` | Umami collector | Only the tracking script endpoint needs to be public; admin UI stays internal |
| `status.levkin.ca` | Uptime Kuma | Kuma supports a separate public status page URL — that one can be public |

---

## Phased rollout

### Phase 0 — Foundation

1. ✅ Caddy running (on VM — migrate to LXC in Phase 1.5)
2. ✅ **Static IP audit (partial)** — all LXCs on pve10 pinned; Caddy VM static `.50`; remaining VMs on stable DHCP — see [host-list.md](host-list.md)
3. ✅ DNS for `auth.levkin.ca` → home IP (verified 2026-05-22)
4. ✅ `identity` LXC **217** @ `10.0.10.21` (2 vCPU, 2GB RAM, 20GB `local-lvm`, Debian 12 + Docker Compose)

### Phase 1 — Identity ✅

1. ✅ Deploy Authentik in `identity` LXC (Authentik + Postgres + Redis, official compose at `/opt/authentik`)
2. ✅ Caddy: `auth.levkin.ca` → `10.0.10.21:9000` (simple passthrough, no forward-auth)
3. ✅ Admin user (`admin`), TOTP enrolled
4. ✅ `authentik Admins` group (skip custom `users` group until more accounts)
5. ✅ Static backup codes; **don't OIDC other apps until Cal.com test**

### Phase 1.5 — Caddy migration to LXC (~30 min)

Why now (after Phase 1, before bulk SSO work in Phase 4): Authentik is stable enough to absorb a small change, but you haven't yet built the dependency web of OIDC integrations that would make a Caddy reload risky.

Why Caddy belongs in an LXC, not a VM:

- ~50MB OS overhead vs ~512MB for a VM
- Boot/restart in 2-5s vs 20-40s (matters when reloading config)
- Snapshot/backup is faster
- Caddy is a Go binary doing reverse-proxy work — no need for kernel isolation
- Near-native network performance

Steps:

1. Create `edge` LXC: Debian 12, 1 vCPU, 512MB RAM, 8GB disk, **static IP from host list**
2. Install Caddy via official Debian repo:

   ```bash
   # Prerequisites for adding the Caddy apt repository (curl is used by the next two commands)
   apt install -y debian-keyring debian-archive-keyring apt-transport-https curl
   # Import the repository signing key
   curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
   # Register the repository, then install
   curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | tee /etc/apt/sources.list.d/caddy-stable.list
   apt update && apt install caddy
   ```

3. Copy `Caddyfile` + custom snippets (`(security-headers)` etc.) from the VM
4. Add a **test subdomain** (e.g. `test.levkin.ca`) pointing at the new LXC — verify that the TLS certificate issues and routing works
5. Cut over: update router port-forward (80/443) to the new LXC IP. DNS A records don't need to change if they point to your home IP.
6. Watch Mailcow, Cal.com, Listmonk, the marketing sites for ~24h
7. Keep the old VM snapshot for a week, then delete

### Phase 2 — Quick wins ✅

1. ✅ **Umami** — tracking on levkin.ca, caseware, auto, and iliadobkin.com (portfolio)
2. ✅ **Uptime Kuma** — monitors in UI
3. ✅ **Dockge** — logged in; register `/opt/monitoring` stack (see [monitoring-stack.md](monitoring-stack.md))
4. ✅ **Kuma email alerts** — SMTP via Mailcow (see [homelab-status-2026-05-22.md](homelab-status-2026-05-22.md))

### Phase 3 — Cal.com (mostly done) ✅

1. ✅ Cal.com deployed in `business` LXC (id 210, Postgres included)
2. ✅ `cal.levkin.ca` proxied via Caddy
3. ✅ Booking link live at `cal.levkin.ca/ilia/consult` with Jitsi location
4. ✅ Email working via `cal@levkine.ca` SMTP through Mailcow
5. ⏳ **Wire Cal.com to Authentik via OIDC** (first real SSO connection — do this after Phase 1)
6. ⏳ Update `auto.levkin.ca` button → `cal.levkin.ca/ilia/consult` (currently points to placeholder)

### Phase 4 — SSO migration (~half a day, staged)

Wire each to Authentik, least-risky first:

1. **Vikunja** (OIDC native) — easy, single-user impact
2. ~~**Nextcloud**~~ — **skipped** (VM 201 retiring)
3. **Listmonk** (OIDC native, admin only) — easy
4. **Mattermost** (SAML or OIDC native) — moderate
5. **Mailcow** (OIDC) — last, because mail-critical

For each: keep a local admin password as a break-glass account.

### Phase 5 — Family / personal wins (~1 evening)

1. **Immich** in `media` VM — install mobile apps for you and family, enable auto-upload. Face recognition runs in background; "my kids 2024" works within a couple days.
2. Skip PhotoPrism — Immich covers it.

### Phase 6 — Business / consulting (~1–2 evenings)

1. **Crater** in `business` LXC — tax rates, company info, Stripe integration if you want online payment
2. **Beszel** hub in `monitoring` LXC + agents on each LXC — one dashboard for resource usage

### Phase 7 — Automation depth (ongoing)

Only when you have a real use case:

1. **Huginn** in `automation` — first agent: competitor pages, kosher product availability, grant deadlines
2. **Windmill** in `automation` — first script: rewrite an n8n flow with too many code nodes
3. **Flowise** in `labs` — first flow: chat-with-docs against your consulting notes

### Phase 8 — Knowledge / research

1. **Outline** in `productivity` LXC — client-facing wiki + your notes
2. **Linkwarden** in `productivity` LXC — bookmarks with full-page archive
3. **Paperless-ngx** in `media` — scan and OCR the paper that's accumulating
4. **SiYuan** — only if/when PhD or long-form research becomes relevant

---

## Static IP audit

**Maintain a `host-list.md` file** (in this Cursor project, alongside this plan) with every LXC/VM, its current IP, its target static IP, and DHCP/static status. Cursor will use this as the source of truth when scripting changes.

Suggested format:

| LXC/VM ID | Name | Role | Current IP | Target static IP | DHCP/Static | Notes |
|---|---|---|---|---|---|---|
| 210 | cal | Cal.com | 10.0.10.228/24 (DHCP) | 10.0.10.228/24 | ⏳ static | Convert ASAP |
| ... | ... | ... | ... | ... | ... | ... |

### Recommended IP plan

Carve `10.0.10.0/24` (or whatever your LAN is) into role-based ranges so it's scannable:

| Range | Reserved for |
|---|---|
| `.1 - .9` | Network gear (router, switches, APs) |
| `.10 - .19` | Proxmox host(s) + PBS |
| `.20 - .39` | Edge / identity / comms (critical infra) |
| `.40 - .79` | Application LXCs (productivity, automation, business, monitoring) |
| `.80 - .99` | Media VM(s) |
| `.100 - .199` | DHCP pool (clients, phones, laptops) |
| `.200 - .249` | Labs / experimental |
| `.250 - .254` | Reserved |

### How to set static on a Proxmox LXC

Two methods — pick one and stick with it:

**Method A — Proxmox CLI (recommended, survives reboots cleanly):**

```bash
# Replace VMID with the container ID and X with the host part from the range plan
pct set VMID -net0 name=eth0,bridge=vmbr0,ip=10.0.10.X/24,gw=10.0.10.1
pct reboot VMID
```

**Method B — Router DHCP reservation:**

- Reserve the IP in your router's DHCP table by MAC address. LXC stays "DHCP" technically, but always gets the same IP.
- Easier if you have many hosts and one router.
- Risk: if the LXC's MAC changes (rebuild from snapshot to new ID), reservation breaks.

**Recommendation:** Method A (`pct set`) for everything critical (edge, identity, comms, business). Method B is fine for labs/experimental LXCs.

### Audit checklist

1. List every LXC: `pct list`
2. List every VM: `qm list`
3. For each, run `pct exec VMID -- ip a` (or `qm guest exec VMID -- ip a` for VMs, which needs the QEMU guest agent) and check whether the IP came from DHCP
4. Fill in `host-list.md`
5. Pick target IPs from the range plan above
6. Convert one at a time, lowest-risk first (labs → productivity → business → comms → identity → edge)
7. **After each conversion**, verify the Caddy reverse-proxy entry still works (curl from outside)
8. Update `host-list.md` status column

### Hosts known to need conversion right now

- ~~**LXC 210 (cal)**~~ — static at `10.0.10.228` ✅
- **Site LXCs 220, 215/216/219** — static; served via Caddy → nginx on each LXC (git deploy). Optional future: static files on Caddy VM only.

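
Step 3 of the audit checklist can also be answered from the host side for LXCs, because `pct config` prints the `net0` options, including `ip=dhcp` or `ip=ADDRESS/PREFIX`. A minimal sketch, assuming that option layout; `net_mode` is a hypothetical helper name, and VMs still need the in-guest check.

```bash
#!/usr/bin/env bash
# Sketch: classify a container's net0 line (as printed by `pct config VMID`)
# as "dhcp", "static ADDRESS/PREFIX", or "unknown".
net_mode() {
  local line="$1" ip
  line="${line#net0: }"   # drop the key prefix if present
  # Pull the ip=... field out of the comma-separated options (ip6= is ignored).
  ip="$(printf '%s\n' "$line" | tr ',' '\n' | sed -n 's/^ip=//p' | head -n 1)"
  case "$ip" in
    dhcp) echo "dhcp" ;;
    */*)  echo "static $ip" ;;
    *)    echo "unknown" ;;
  esac
}

# On a Proxmox node (requires pct), one line per container:
#   for id in $(pct list | awk 'NR > 1 { print $1 }'); do
#     printf '%s %s\n' "$id" "$(net_mode "$(pct config "$id" | grep '^net0:')")"
#   done
```

Anything reported as `dhcp` is a candidate for Method A or for a router reservation.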
---

## Backlog (priority order)

### P0 — next (Phase 1–2 largely ✅)

1. ~~Umami~~ ✅
2. ~~Uptime Kuma~~ ✅
3. ~~Dockge~~ ✅
4. **Cal.com → Authentik OIDC** — first SSO
5. **Retire Nextcloud VM 201** + **Portainer VM 109** — frees **~24 GiB** on pve10
6. **Beszel** — fits on monitoring LXC 218 or small agent LXCs
7. **Mealie** — new small LXC on pve10 (~2 GB)

### P1 — when ready

- **Outline** — wiki for client docs
- **Linkwarden** — bookmarks with full-page archive
- **Plane** — Jira-lite project management (pair with Mattermost)

### P2 — when you have a real need

- **Crater** — invoicing (Phase 6)
- **Immich** — photos (Phase 5)
- **Paperless-ngx** — document scanning (Phase 8)
- **Huginn** — first when you have a monitoring use case
- **Windmill** — when n8n hits limits
- **Trigger.dev** — durable background jobs in code (better fit than Windmill for QA work)
- **PrivateBin** — encrypted paste for sharing secrets with contractors
- **Addy.io** — email aliases
- **SiYuan** — if PhD work picks up
- **Flowise** — labs only, when LLM workflow use case appears

### Skip / declined

- ~~PhotoPrism~~ — Immich covers it
- ~~Activepieces~~ — you already have n8n
- ~~Affine / Trilium~~ — picked Outline + SiYuan instead
- ~~Matrix/Synapse + Element~~ — staying on Mattermost
- ~~Coolify / Dokploy / CapRover~~ — Dockge is enough; revisit only if writing many custom apps

---

## Backup strategy

- **Proxmox Backup Server (PBS)** or `vzdump` to a NAS — snapshot each LXC/VM nightly
- **Critical groups** (`identity`, `comms`, `business`): 7 daily + 4 weekly + 12 monthly
- **Productivity/automation**: 7 daily + 4 weekly
- **Labs**: 3 daily, no long retention
- **Off-site copy** of `identity` and `business` LXCs — these contain auth and billing data. Encrypted copy to Wasabi or Backblaze B2.

The whole LXC gets snapshotted — much simpler than file-level container backup.

**Done on pve10 (2026-05-22):** `pct snapshot` **`backup-20260522`** on LXCs **217** (identity) and **218** (monitoring).

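
The retention tiers above map onto the `keep-daily` / `keep-weekly` / `keep-monthly` keys that `vzdump --prune-backups` accepts. The sketch below only builds that string per group; `retention_for` is a hypothetical helper, `BACKUP_STORAGE` is a placeholder, and groups without a stated cadence (edge, media, monitoring) are deliberately left out.

```bash
#!/usr/bin/env bash
# Sketch: map a group name to a vzdump --prune-backups retention string.
# Only the tiers written down in the backup strategy are covered.
retention_for() {
  case "$1" in
    identity|comms|business) echo "keep-daily=7,keep-weekly=4,keep-monthly=12" ;;
    productivity|automation) echo "keep-daily=7,keep-weekly=4" ;;
    labs)                    echo "keep-daily=3" ;;
    *) echo "no retention defined for group: $1" >&2; return 1 ;;
  esac
}

# Example call on a node (BACKUP_STORAGE is a placeholder for your storage ID):
#   vzdump 217 --mode snapshot --storage BACKUP_STORAGE \
#     --prune-backups "$(retention_for identity)"
```

Returning non-zero for an unlisted group makes a nightly job fail loudly instead of silently backing up with no pruning rule.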
---

## Next steps (priority order)

See **[homelab-status-2026-05-22.md](homelab-status-2026-05-22.md)** for automation checklist.

| # | Task | Status | Effort | Frees / unlocks |
|---|------|--------|--------|-----------------|
| 1 | **Kuma SMTP** | ✅ done | — | — |
| 2 | **Cal.com → Authentik OIDC** | ⏳ **next** | 1–2 h | First SSO; test before Vikunja/Listmonk |
| 3 | **auto.levkin.ca** → Cal booking link | ⏳ | 15 min | Phase 3 item 6 |
| 4 | **Stop Portainer VM 109** | ⏳ | 10 min | **~16 GiB RAM** on pve10 |
| 5 | **Retire Nextcloud VM 201** | ⏳ | 30 min | **~8 GiB RAM**; remove Caddy + Kuma monitor |
| 6 | **UniFi DHCP reservations** | ⏳ | 20 min | [unifi-static-dhcp.md](unifi-static-dhcp.md) |
| 7 | **Beszel** on 218 or agents | ⏳ | 1 h | Capacity visibility before Immich |
| 8 | **NAS.SP00** disk → Jellyfin | ⏳ hardware | — | VM 101 |
| 9 | **Caddy → edge LXC `.20`** | ⏳ defer | ~30 min | Phase 1.5 |
| 10 | **dev-apps LXC** (pote, mirrormatch, then punim) | ⏳ defer | half day | pve201 RAM; punim **last** |
| 11 | **Static sites → Caddy VM** (optional) | ⏳ defer | 1 h | ~1 GiB; breaks git-on-LXC workflow unless you move deploy to Caddy |

**Defer:** Immich, Crater, Outline, Plane, SSO for Vikunja/Listmonk/Mailcow until rows 2–5 done.

### Adding a new service — quick rule

| Want to add… | Node | RAM budget | Prerequisite |
|--------------|------|------------|--------------|
| Small app (Mealie, Linkwarden) | pve10 | 2 GB LXC | Stop 109 and/or 201 first if host feels tight |
| Medium (Outline, Crater) | pve10 | 4 GB LXC | Free **~24 GiB** via Portainer + Nextcloud retire |
| Heavy (Immich + ML) | pve10 or pve201 GPU | 4–8 GB+ | NAS healthy; pve201 only after GPU/punim sized down |
| Dev sandbox | pve10 `dev-apps` | 6–8 GB | punim 9101 migration only after testing |

### Nextcloud decommission (VM 201)

1. Confirm export in `exports/nextcloud-2026-05-21/` is complete
2. Delete **Nextcloud** monitor in Kuma
3. Remove `nextcloud.levkin.ca` from Caddy VM
4. Stop VM 201; update [host-list.md](host-list.md)
5. After NAS healthy: optional `vzdump` archive then delete disk

---

## Important rules

1. **Never put Authentik behind itself.** `auth.levkin.ca` is a simple Caddy passthrough — no forward-auth, no fancy dependencies. If reaching the login page depended on Authentik and Authentik went down, you'd be locked out of the one UI needed to fix it.
2. **Vaultwarden stays standalone.** It's your break-glass path if Authentik dies. Don't OIDC it.
3. **Keep a local admin password on every SSO-wired app.** OIDC integrations break during upgrades — you need to log in to fix them.
4. **Local admin to Proxmox host.** Independent of Authentik and Vaultwarden. Written down somewhere physical.
5. **Don't expose admin UIs publicly.** Dockge, Beszel, Uptime Kuma admin, n8n editor — use Tailscale or WireGuard for remote access.
6. **Static IPs for every LXC.** DHCP will eventually move them and Caddy will break. Set via `pct set VMID -net0 ...ip=10.0.10.X/24,gw=...` or a router reservation.
7. **Cal.com LXC (210)** — static at `.228` ✅.
8. **Maintain `host-list.md`** as the single source of truth for IPs. Update it whenever a new LXC/VM is created or migrated.
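
Rule 8 is easier to keep if `host-list.md` gets a sanity check whenever it changes. The sketch below flags target IPs that appear on more than one row. It assumes the file uses the seven-column table from "Suggested format" (target static IP in the fifth column), and `duplicate_targets` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: read a host-list markdown table on stdin and print every target
# static IP that is assigned to more than one row.
# Assumption: columns follow the "Suggested format" table, so the target IP
# is the 6th pipe-separated field (the 1st field is empty, before the leading pipe).
duplicate_targets() {
  awk -F'|' '
    NF >= 7 {
      ip = $6
      gsub(/^[ \t]+|[ \t]+$/, "", ip)   # trim cell padding
      # Header, separator, and "..." rows do not look like an IPv4 address.
      if (ip ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/) count[ip]++
    }
    END { for (ip in count) if (count[ip] > 1) print ip }
  ' | sort
}

# Usage, from the project directory:
#   cat host-list.md | duplicate_targets
```

Empty output means no two hosts claim the same target address.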