ansible/docs/guides/levkin-selfhost-plan-2.md
ilia 0f34c51fc8
Complete homelab post-sprint: SSO docs, monitoring scripts, phase 0/1 closure.
Consolidate sprint status into handoff docs, add Listmonk/Mattermost/Mailcow
and Vikunja SSO guides, Beszel alerts script, mattermost inventory, and
mark phases 0–1 complete with phase 2 backlog for edge Caddy and security.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-24 12:13:55 -04:00

# Levkin self-hosted stack — plan & decisions
Reference doc for the Proxmox homelab. Lives alongside the Cursor project that has the Proxmox info.
**Conventions:**
- All groups run inside an LXC unless marked **VM**.
- Inside each LXC: one `docker-compose.yml`, managed by **Dockge** where applicable.
- Caddy on the `edge` LXC is the only thing exposed to the internet.
- Authentik on the `identity` LXC is the source of truth for who you are.
- Vaultwarden stays standalone (it's the break-glass path if Authentik dies).
---
## Progress summary (updated 2026-05-24)
| Area | Status |
|------|--------|
| **Phase 0** Foundation | ✅ **Done** — pve10 LXCs static; UniFi VM DHCP reservations; auth + apex DNS; Caddy on **VM 106** @ `.50` (edge LXC = Phase 1.5) |
| **Phase 1** Identity (Authentik) | ✅ LXC **217** @ `10.0.10.21` — admin + TOTP |
| **Phase 2** Monitoring | ✅ LXC **218** — Kuma (17 monitors), Dockge, Umami, Beszel (16 agents), SMTP |
| **Phase 3** Cal.com | ✅ LXC **210** — booking + auto consult button; **OIDC deferred** (no enterprise license) |
| **Phase 4** SSO | ✅ Vikunja, Listmonk, Mattermost, Mailcow — browser smoke tests remaining |
| **Phase 5–8** | ⏳ Immich, Crater, Outline, automation depth — after P0 backlog |
| **Comms health** | ✅ Mailcow + Listmonk restored 2026-05-23 — [mailcow-lan-proxy-fix.md](mailcow-lan-proxy-fix.md) |
| **Site consolidation** | ⏳ **Partial** — git LXCs + levkin.ca LXC 220; optional later: static on Caddy VM |
| **dev-apps** | ⏳ punimTag **9101** on pve201 until testing done |
| **Nextcloud retire** | ✅ VM **201** stopped, `onboot 0`, Caddy removed (~8 GiB RAM freed) |
| **Portainer retire** | ✅ VM **109** destroyed 2026-05-23 (~16 GiB on pve10) |
| **Security pass** | 🟡 Partial — SSH keys + apt + cron 2026-05-23 — [security-remediation-plan.md](security-remediation-plan.md) |
---
## Capacity headroom (live check 2026-05-24)
Use this before adding LXCs/VMs. Re-check with `pvesm status` and `free -h` on each node.
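The re-check can be turned into a quick yes/no before creating a guest. This is a minimal sketch, not part of the existing tooling: the function names and the 4 GiB safety reserve are assumptions, and it reads `MemAvailable` from `/proc/meminfo` on whichever node it runs on.

```bash
# mem_available_gib [meminfo-file]
# Print MemAvailable in whole GiB (defaults to /proc/meminfo on this node).
mem_available_gib() {
  local meminfo="${1:-/proc/meminfo}"
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 / 1024 }' "$meminfo"
}

# fits_in_ram <available_gib> <needed_gib> [reserve_gib]
# Succeeds when needed + reserve <= available. The 4 GiB default reserve
# is an assumed safety margin, not a figure from this plan.
fits_in_ram() {
  local available="$1" needed="$2" reserve="${3:-4}"
  [ $(( needed + reserve )) -le "$available" ]
}

# Example using the ~22 GiB pve10 figure recorded in this section.
if fits_in_ram 22 8; then
  echo "8 GiB guest fits"
else
  echo "8 GiB guest does not fit"
fi
```

On a node, `fits_in_ram "$(mem_available_gib)" 8` uses the live value instead of the recorded one.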
### pve10 (PVENAS) — **primary place for new homelab services**
| Resource | Total | Used | **Available** | Notes |
|----------|-------|------|---------------|--------|
| **local-lvm** (thin) | ~1.67 TiB | ~22% | **~1.30 TiB** | New guests on **local-lvm** only (NAS SP00 degraded) |
| **RAM** (host) | 62 GiB | ~40 GiB | **~22 GiB** | Portainer **109** + Nextcloud **201** freed |
**Running:** LXCs 210, 215–221; VMs 102–108, 117, 150, 200. **Stopped:** 101 Jellyfin, 201 Nextcloud.
**Headroom:** ~**20+ GiB RAM** for Immich, Crater, or dev-apps LXC.
**Still available to free:**
| Stop / retire | Frees (maxmem) |
|---------------|----------------|
| ~~Portainer VM **109**~~ | ✅ **16 GiB** freed |
| ~~Nextcloud VM **201**~~ | ✅ **8 GiB** freed |
| Hermes VM **117** (if not needed) | **16 GiB** |
| Site LXCs 215/216 → Caddy static (optional) | **~1 GiB** |
### pve201 (pve) — **do not add new homelab services**
| Resource | Total | Used | **Available** | Notes |
|----------|-------|------|---------------|--------|
| **local-lvm** | ~1.67 TiB | ~46% | **~922 GiB** | Disk OK |
| **RAM** | 125 GiB | ~105 GiB | **~19 GiB** | GPU **104** (64 GB), DebianDesktop **100** (24 GB ✅ rebooted), punim **9101** (16 GB) |
**Verdict:** New stacks on **pve10** only. pve201: stop/migrate punim after testing.
---
## Current state (May 2026)
**Already running:**
- Caddy reverse proxy — currently on a **VM** (should migrate to LXC, see "Caddy migration" section)
- Mailcow — VM, mail domain is `levkine.ca` (with e)
- Vaultwarden, Vikunja, n8n, Listmonk, Mattermost — across various LXCs/VMs
- **Cal.com** — LXC id `210`, `cal.levkin.ca`, Postgres included, admin user `ilia`, 15-min consult event live at `cal.levkin.ca/ilia/consult` with Jitsi link
- Caddy entries live for: `levkin.ca`, `caseware.levkin.ca`, `auto.levkin.ca`, `iliadobkin.com`, `cal.levkin.ca`, `listmonk.levkin.ca`, `pdf.levkin.ca`, `search.levkin.ca`, `auth.levkin.ca`, `stats.levkin.ca`, **`status.levkin.ca`**
- **Authentik** — LXC **217** @ `10.0.10.21`, `https://auth.levkin.ca`, admin + TOTP enrolled
- **Monitoring** — LXC **218** @ `10.0.10.22`: Uptime Kuma `:3001`, Dockge `:5001`, Umami `:3000` (LAN-only) — [monitoring-stack.md](monitoring-stack.md)
- **Umami** + **Authentik** admin/TOTP/backup codes — done
- **Uptime Kuma** — monitors live; email alerts via Mailcow — see [monitoring-stack.md](monitoring-stack.md)
- **Dockge** on 218 — manages local `/opt/monitoring` stack
- **Snapshots** `backup-20260522` on LXCs **217**, **218**
- **Jellyfin** (VM 101) — stopped
- LXC **210, 215–221** — static via `pct set`; **Caddy VM 106** — static in-guest `.50`
- **Nextcloud VM 201** — retired (stopped, `onboot 0`, Caddy removed)
- ~~**Portainer VM 109**~~ — **removed** 2026-05-23 (~16 GiB RAM freed on pve10)
- **Marketing sites** — LXC **220** (`levkin.ca`), **215/216/219** (git deploy), not yet on Caddy VM static roots
- **punimTag dev** — pve201 LXC **9101** @ `10.0.10.121` (16 GB) — leave until testing done; then `dev-apps` on pve10
**Decisions locked in:**
- Container manager: **Dockge** (not Portainer, not Coolify/Dokploy/CapRover)
- Chat: **Mattermost only** — no Matrix/Synapse
- Knowledge tool: **Outline** for client-facing, **SiYuan** if/when PhD work picks up (don't run Affine + Trilium too)
- Bookmark manager: **Linkwarden** (full-page archive is the killer feature)
- Authentik is the SSO target; Vaultwarden stays standalone
---
## LXC / VM grouping table
| Group | What's inside | Why grouped | LXC or VM |
|---|---|---|---|
| **edge** | Caddy reverse proxy, Crowdsec/Fail2ban | The front door — small, stable, restart rarely | LXC, 1 vCPU, 1GB RAM |
| **identity** | Authentik (+ Postgres + Redis), Vaultwarden | Auth-critical — touch rarely, back up religiously | LXC, 2 vCPU, 2GB RAM |
| **comms** | Mailcow | Mailcow's compose is huge (15+ containers) and self-contained — wants its own host | **VM**, 4GB RAM |
| **automation** | n8n, Windmill (later), Huginn (later) | Active workloads, frequent updates, you'll touch these a lot | LXC, 2–4 vCPU, 4GB RAM |
| **productivity** | Vikunja, Listmonk, Outline, Mealie, Linkwarden | Personal/team productivity, low-resource | LXC, 2 vCPU, 4GB RAM |
| **media** | Immich, Nextcloud, Paperless-ngx | Large storage, GPU passthrough useful for Immich ML | **VM** if GPU passthrough, else LXC. Lots of disk. |
| **business** | Cal.com ✅, Crater | Client-facing, financial — back up often | LXC, 2 vCPU, 2GB RAM |
| **monitoring** | Uptime Kuma ✅, Dockge ✅, Umami ✅, Beszel ✅ | Ops stack on LXC **218** | LXC, 2 vCPU, 2GB RAM |
| **labs** | Anything experimental — Flowise, Trigger.dev | Things you're trying out, can be wiped | LXC, scratch space |
### Why this grouping (cheat sheet)
- One service goes bad → only its group restarts.
- Need a kernel upgrade for one stack → snapshot the LXC, upgrade, roll back if broken.
- Mailcow's huge surface area is isolated in its own VM.
- Edge LXC is tiny and stable → perfect for the layer everything depends on.
- Backup cadence per group (see Backups section).
- Resource limits per LXC mean a runaway container can't eat n8n's RAM.
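Per-LXC limits are set on the Proxmox host with `pct set`. The sketch below only prints the command so it can be reviewed before running; the helper name is hypothetical, and the example values are the monitoring group's sizing from the table above (LXC 218, 2 vCPU, 2GB RAM).

```bash
# lxc_limit_cmd <id> <cores> <memory-MiB>
# Print (do not run) the pct command that caps an LXC's CPU and RAM.
lxc_limit_cmd() {
  local id="$1" cores="$2" mem_mb="$3"
  # Refuse anything that is not a plain number, so a typo cannot end up in the command.
  case "$id$cores$mem_mb" in
    ''|*[!0-9]*) echo "usage: lxc_limit_cmd <id> <cores> <memory-MiB>" >&2; return 1 ;;
  esac
  printf 'pct set %s -cores %s -memory %s\n' "$id" "$cores" "$mem_mb"
}

lxc_limit_cmd 218 2 2048
```

Once the printed line looks right, paste it on the node that hosts the guest.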
---
## Subdomains
Only expose what actually needs to be public. Internal services use Tailscale/Wireguard for remote access.
### Expose publicly
| Subdomain | Service | Group | Why public | Status |
|---|---|---|---|---|
| `levkin.ca` | Company site (spec + `/folders`) | edge | Main brand | ✅ LXC 220 — **DNS must point to home IP** (was parked elsewhere) |
| `caseware.levkin.ca` | Static site | edge | Marketing | ✅ live |
| `auto.levkin.ca` | Static site | edge | Marketing | ✅ live |
| `iliadobkin.com` | Portfolio (SDET) | edge | Personal site | ✅ live (pve10 LXC 219) |
| `cal.levkin.ca` | Cal.com | business | Clients book on it | ✅ live |
| `listmonk.levkin.ca` | Listmonk | productivity | Unsubscribe URLs must resolve | ✅ live |
| `mail.levkine.ca` | Mailcow | comms | Mail server | ✅ live |
| `auth.levkin.ca` | Authentik | identity | OIDC redirect URLs need external resolution | ✅ live |
| `bill.levkin.ca` | Crater | business | Clients view invoices | ⏳ Phase 6 |
| `cloud.levkin.ca` | Nextcloud | media | **Retiring** — decommission VM 201 after cutover | 🗑️ |
| `photos.levkin.ca` | Immich | media | Mobile apps need public hostname | ⏳ Phase 5 |
| `vault.levkin.ca` | Vaultwarden | identity | Mobile clients need public hostname | ⏳ |
| `notes.levkin.ca` | Outline | productivity | Sharing docs with clients | ⏳ |
| `chat.levkin.ca` | Mattermost | comms | Only if inviting outside users | ⏳ optional |
### Keep internal only (no public DNS, no Caddy block)
Reachable only via local network or Tailscale/Wireguard:
| Service | Reason |
|---|---|
| Umami admin UI | Only you need the dashboard. Tracking endpoint can be public, dashboard isn't. |
| Uptime Kuma | Status dashboard is for you. Don't advertise infrastructure. |
| Beszel | Metrics are admin-only. |
| Dockge | Admin UI — local only. |
| n8n editor | UI shouldn't be exposed. Webhooks go on `hooks.levkin.ca` if needed. |
| Huginn / Windmill / Flowise | Admin tools. |
| Vikunja | Personal task manager. |
| Mealie | Family recipes. |
| Trigger.dev | Internal automation. |
| Paperless-ngx | Personal documents. Never expose. |
| SiYuan | Personal knowledge. |
| Linkwarden | Personal bookmarks. |
### Borderline (decide per service)
| Subdomain | Service | Notes |
|---|---|---|
| `stats.levkin.ca` | Umami | Public tracker script; admin UI prefer LAN `:3000` |
| `status.levkin.ca` | Uptime Kuma | **Public status page** only (not admin UI) |
| *(none)* | Beszel | **LAN/Tailscale** `10.0.10.22:8090` — host metrics, no public DNS |
---
## Phased rollout
### Phase 0 — Foundation ✅
1. ✅ Caddy running (on VM — migrate to LXC in Phase 1.5)
2. ✅ **Static IP audit** — all pve10 LXCs pinned via `pct set`; Caddy VM static `.50`; homelab VMs pinned via UniFi DHCP — see [host-list.md](host-list.md)
3. ✅ DNS for `auth.levkin.ca` + `levkin.ca` apex → home IP
4. ✅ `identity` LXC **217** @ `10.0.10.21` (2 vCPU, 2GB RAM, 20GB `local-lvm`, Debian 12 + Docker Compose)
### Phase 1 — Identity ✅
1. ✅ Deploy Authentik in `identity` LXC (Authentik + Postgres + Redis, official compose at `/opt/authentik`)
2. ✅ Caddy: `auth.levkin.ca` → `10.0.10.21:9000` (simple passthrough, no forward-auth)
3. ✅ Admin user (`admin`), TOTP enrolled
4. ✅ `authentik Admins` group (skip custom `users` group until more accounts)
5. ✅ Static backup codes; **don't OIDC other apps until Cal.com test**
### Phase 2 — Next infra (was Phase 1.5) — Caddy migration to LXC ⏳
Deferred until after sprint merge. Authentik + SSO are stable; edge migration is the next structural change.
Why Caddy belongs in an LXC, not a VM:
- ~50MB OS overhead vs ~512MB for a VM
- Boot/restart in 2-5s vs 20-40s (matters when reloading config)
- Snapshot/backup is faster
- Caddy is a Go binary doing reverse-proxy work — no need for kernel isolation
- Near-native network performance
Steps:
1. Create `edge` LXC: Debian 12, 1 vCPU, 512MB RAM, 8GB disk, **static IP from host list**
2. Install Caddy via official Debian repo:
```bash
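# Prerequisites: archive keyrings plus curl and gnupg, which a minimal Debian LXC may lack
apt install -y debian-keyring debian-archive-keyring apt-transport-https curl gnupg
# Import Caddy's signing key and register the stable repo
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | tee /etc/apt/sources.list.d/caddy-stable.list
# Install Caddy from the new repo
apt update && apt install -y caddy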
```
3. Copy `Caddyfile` + custom snippets (`(security-headers)` etc.) from the VM
4. Add a **test subdomain** (e.g. `test.levkin.ca`) pointing at the new LXC — verify TLS issues and routing works
5. Cut over: update router port-forward (80/443) to the new LXC IP. DNS A records don't need to change if they point to your home IP.
6. Watch Mailcow, Cal.com, Listmonk, the marketing sites for ~24h
7. Keep the old VM snapshot for a week, then delete
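Before step 5, each hostname can be requested through the new LXC directly, without touching DNS or the router. This is a sketch under assumptions: the edge LXC is at `10.0.10.20` (the backlog's target address), `test.levkin.ca` is the throwaway hostname from step 4, and both helper names are hypothetical. The check only passes once Caddy on the new LXC already holds a valid certificate for the host.

```bash
# resolve_arg <host> <ip>
# Build the value for curl's --resolve flag (host:port:address).
resolve_arg() {
  printf '%s:443:%s\n' "$1" "$2"
}

# check_via_edge <host> <ip>
# Request the site through <ip> only and print the HTTP status code.
# --resolve pins the hostname to <ip>, so DNS and the router port-forward
# are bypassed while TLS SNI and the Host header stay real.
check_via_edge() {
  local host="$1" ip="$2"
  curl --silent --show-error --max-time 10 --output /dev/null \
       --write-out '%{http_code}\n' \
       --resolve "$(resolve_arg "$host" "$ip")" \
       "https://${host}/"
}

# Shown, not executed here:
#   check_via_edge test.levkin.ca 10.0.10.20
#   for h in cal.levkin.ca listmonk.levkin.ca auth.levkin.ca; do
#     check_via_edge "$h" 10.0.10.20
#   done
```

Running the same loop again after the cutover, without `--resolve`, confirms the public path matches.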
### Phase 2 — Quick wins ✅
1. ✅ **Umami** — tracking on levkin.ca, caseware, auto, and iliadobkin.com (portfolio)
2. ✅ **Uptime Kuma** — monitors in UI
3. ✅ **Dockge** — logged in; register `/opt/monitoring` stack (see [monitoring-stack.md](monitoring-stack.md))
4. ✅ **Kuma email alerts** — SMTP via Mailcow — [monitoring-stack.md](monitoring-stack.md)
### Phase 3 — Cal.com (mostly done) ✅
1. ✅ Cal.com deployed in `business` LXC (id 210, Postgres included)
2. ✅ `cal.levkin.ca` proxied via Caddy
3. ✅ Booking link live at `cal.levkin.ca/ilia/consult` with Jitsi location
4. ✅ Email working via `cal@levkine.ca` SMTP through Mailcow
5. ⏳ **Cal.com OIDC** — **deferred** ([cal-authentik-oidc.md](cal-authentik-oidc.md)) — needs enterprise `CALCOM_LICENSE_KEY`
6. ✅ `auto.levkin.ca` consult button → `cal.levkin.ca/ilia/consult`
### Phase 4 — SSO migration ✅
1. ✅ **Vikunja** — [vikunja-authentik-oidc.md](vikunja-authentik-oidc.md)
2. ~~**Nextcloud**~~ — skipped (VM 201 retired)
3. ✅ **Listmonk** — [listmonk-authentik-oidc.md](listmonk-authentik-oidc.md) (v6.1.0)
4. ✅ **Mattermost** — [mattermost-authentik-gitlab-oauth.md](mattermost-authentik-gitlab-oauth.md)
5. ✅ **Mailcow** — [mailcow-authentik-oidc.md](mailcow-authentik-oidc.md)
**Remaining:** browser smoke tests as `ilia`; rotate OIDC secrets when done.
For each: keep a local admin password as a break-glass account.
### Phase 5 — Family / personal wins (~1 evening)
1. **Immich** in `media` VM — install mobile apps for you and family, enable auto-upload. Face recognition runs in background; "my kids 2024" works within a couple days.
2. Skip PhotoPrism — Immich covers it.
### Phase 6 — Business / consulting (~1–2 evenings)
1. **Crater** in `business` LXC — tax rates, company info, Stripe integration if you want online payment
2. ✅ **Beszel** hub in `monitoring` LXC + agents on each LXC — one dashboard for resource usage (done early: 16 agents on LXC 218)
### Phase 7 — Automation depth (ongoing)
Only when you have a real use case:
1. **Huginn** in `automation` — first agent: competitor pages, kosher product availability, grant deadlines
2. **Windmill** in `automation` — first script: rewrite an n8n flow with too many code nodes
3. **Flowise** in `labs` — first flow: chat-with-docs against your consulting notes
### Phase 8 — Knowledge / research
1. **Outline** in `productivity` LXC — client-facing wiki + your notes
2. **Linkwarden** in `productivity` LXC — bookmarks with full-page archive
3. **Paperless-ngx** in `media` — scan and OCR the paper that's accumulating
4. **SiYuan** — only if/when PhD or long-form research becomes relevant
---
## Static IP audit
**Maintain a `host-list.md` file** (in this Cursor project, alongside this plan) with every LXC/VM, its current IP, its target static IP, and DHCP/static status. Cursor will use this as the source of truth when scripting changes.
Suggested format:
| LXC/VM ID | Name | Role | Current IP | Target static IP | DHCP/Static | Notes |
|---|---|---|---|---|---|---|
| 210 | cal | Cal.com | 10.0.10.228/24 (DHCP) | 10.0.10.228/24 | ⏳ static | Convert ASAP |
| ... | ... | ... | ... | ... | ... | ... |
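Since `host-list.md` is meant to be the source of truth, a duplicate check on the target column can catch a collision before any `pct set`. This is a minimal sketch: it assumes the file uses exactly the column order above (target static IP as the fifth column), and the function name is hypothetical.

```bash
# duplicate_target_ips <host-list.md>
# Print every target static IP that appears on more than one table row.
duplicate_target_ips() {
  awk -F'|' '
    NF >= 8 {
      ip = $6                      # 5th table column; $1 is the empty text before the first pipe
      gsub(/[ \t]/, "", ip)        # trim cell padding
      sub(/\/[0-9]+$/, "", ip)     # drop a trailing /24 style suffix
      if (ip ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) count[ip]++
    }
    END { for (ip in count) if (count[ip] > 1) print ip }
  ' "$1" | sort
}

# Shown, not executed here:
#   duplicate_target_ips host-list.md
```

Header and separator rows are skipped automatically because their fifth cell is not an IPv4 address.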
### Recommended IP plan
Stay inside the single `10.0.10.0/24` LAN (or whatever your LAN is) and carve it into role-based ranges by last octet so it's scannable:
| Range | Reserved for |
|---|---|
| `.1 - .9` | Network gear (router, switches, APs) |
| `.10 - .19` | Proxmox host(s) + PBS |
| `.20 - .39` | Edge / identity / comms (critical infra) |
| `.40 - .79` | Application LXCs (productivity, automation, business, monitoring) |
| `.80 - .99` | Media VM(s) |
| `.100 - .199` | DHCP pool (clients, phones, laptops) |
| `.200 - .249` | Labs / experimental |
| `.250 - .254` | Reserved |
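The range table can be encoded as a small lookup so a proposed address can be checked against its role before it goes into `host-list.md`. This is a sketch: the function name and the short role labels are illustrative, and only the numeric ranges come from the table above.

```bash
# role_for_octet <last-octet>
# Print the role range an address falls in under the recommended plan.
role_for_octet() {
  local o="$1"
  case "$o" in
    ''|*[!0-9]*) echo "invalid"; return 1 ;;
  esac
  if   [ "$o" -ge 1 ]   && [ "$o" -le 9 ];   then echo "network-gear"
  elif [ "$o" -ge 10 ]  && [ "$o" -le 19 ];  then echo "proxmox-pbs"
  elif [ "$o" -ge 20 ]  && [ "$o" -le 39 ];  then echo "critical-infra"
  elif [ "$o" -ge 40 ]  && [ "$o" -le 79 ];  then echo "apps"
  elif [ "$o" -ge 80 ]  && [ "$o" -le 99 ];  then echo "media"
  elif [ "$o" -ge 100 ] && [ "$o" -le 199 ]; then echo "dhcp-pool"
  elif [ "$o" -ge 200 ] && [ "$o" -le 249 ]; then echo "labs"
  elif [ "$o" -ge 250 ] && [ "$o" -le 254 ]; then echo "reserved"
  else echo "invalid"; return 1
  fi
}

role_for_octet 21    # Authentik, 10.0.10.21
role_for_octet 228   # Cal.com, 10.0.10.228
```

Run against addresses in this plan, it places Authentik's `.21` in the critical-infra range and Cal.com's `.228` in the labs range, which is outside the range recommended for application LXCs.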
### How to set static on a Proxmox LXC
Two methods — pick one and stick with it:
**Method A — Proxmox CLI (recommended, survives reboots cleanly):**
```bash
pct set <ID> -net0 name=eth0,bridge=vmbr0,ip=10.0.10.X/24,gw=10.0.10.1
pct reboot <ID>
```
**Method B — Router DHCP reservation:**
- Reserve the IP in your router's DHCP table by MAC address. LXC stays "DHCP" technically, but always gets the same IP.
- Easier if you have many hosts and one router.
- Risk: if the LXC's MAC changes (rebuild from snapshot to new ID), reservation breaks.
**Recommendation:** Method A (`pct set`) for everything critical (edge, identity, comms, business). Method B is fine for labs/experimental LXCs.
### Audit checklist
1. List every LXC: `pct list`
2. List every VM: `qm list`
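3. For each, run `pct exec <ID> -- ip a` (or `qm guest exec <ID> -- ip a` for VMs, which needs the QEMU guest agent running in the guest) and check whether the IP came from DHCP; a DHCP-assigned address is usually flagged `dynamic` in the output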
4. Fill in `host-list.md`
5. Pick target IPs from the range plan above
6. Convert one at a time, lowest-risk first (labs → productivity → business → comms → identity → edge)
7. **After each conversion**, verify the Caddy reverse-proxy entry still works (curl from outside)
8. Update `host-list.md` status column
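Steps 1 and 3 can be scripted for the LXCs. The sketch below assumes the usual `iproute2` behaviour where a DHCP lease shows up as the `dynamic` flag in `ip -o -4 addr show`; the function names are hypothetical, and a guest that gets its address another way may need a manual look.

```bash
# addr_source
# Read `ip -o -4 addr show` output on stdin and print
# "<iface> <cidr> dhcp|static" for each non-loopback address.
addr_source() {
  awk '$2 != "lo" {
    src = "static"
    for (i = 5; i <= NF; i++) if ($i == "dynamic") src = "dhcp"
    print $2, $4, src
  }'
}

# audit_lxcs
# Run on a Proxmox node: prefix each running LXC's addresses with its ID.
audit_lxcs() {
  local id
  for id in $(pct list | awk 'NR > 1 && $2 == "running" { print $1 }'); do
    pct exec "$id" -- ip -o -4 addr show | addr_source | sed "s/^/$id /"
  done
}

# Shown, not executed here:
#   audit_lxcs
```

Docker bridge interfaces inside a guest will also be listed as static; only the LAN-facing interface matters for `host-list.md`.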
### Hosts known to need conversion right now
- ~~**LXC 210 (cal)**~~ — static at `10.0.10.228` ✅
- **Site LXCs 220, 215/216/219** — static; served via Caddy → nginx on each LXC (git deploy). Optional future: static files on Caddy VM only.
---
## Backlog (priority order)
### P0 — status (2026-05-24)
| # | Item | Status |
|---|------|--------|
| 1 | Umami / Kuma / Dockge | ✅ |
| 2 | Portainer VM 109 | ✅ removed |
| 3 | Nextcloud VM 201 | ✅ retired |
| 4 | Listmonk → LXC 221 | ✅ + SMTP + VM 113 destroyed |
| 5 | Beszel agents | ✅ **16 systems** |
| 6 | Kuma monitors + email | ✅ **17 monitors**, all alert-linked |
| 7 | DNS `levkin.ca` apex | ✅ |
| 8 | Vikunja OIDC infra | ✅ live — browser test as `ilia` still manual |
| 9 | UniFi DHCP listmonk MAC | ⏳ manual @ UniFi |
| 10 | NAS / Jellyfin / DebianDesktop | **deferred** |
| 11 | Cal OIDC | deferred (no license) |
### P1 — next
See **[handoff-next-steps.md](handoff-next-steps.md)** — SSO smoke tests, secret rotation.
### Phase 2 backlog (was P1 infra)
1. **Caddy → edge LXC** @ `10.0.10.20`
2. **Security remediation** — [security-remediation-plan.md](security-remediation-plan.md)
3. **NAS / Jellyfin** — disk `W4J0L3PY`
### P1 — when ready
- **Outline** — wiki for client docs
- **Linkwarden** — bookmarks with full-page archive
- **Plane** — Jira-lite project management (pair with Mattermost)
### P2 — when you have a real need
- **Crater** — invoicing (Phase 6)
- **Immich** — photos (Phase 5)
- **Paperless-ngx** — document scanning (Phase 8)
- **Huginn** — first when you have a monitoring use case
- **Windmill** — when n8n hits limits
- **Trigger.dev** — durable background jobs in code (better fit than Windmill for QA work)
- **PrivateBin** — encrypted paste for sharing secrets with contractors
- **Addy.io** — email aliases
- **SiYuan** — if PhD work picks up
- **Flowise** — labs only, when LLM workflow use case appears
### Skip / declined
- ~~PhotoPrism~~ — Immich covers it
- ~~Activepieces~~ — you already have n8n
- ~~Affine / Trilium~~ — picked Outline + SiYuan instead
- ~~Matrix/Synapse + Element~~ — staying on Mattermost
- ~~Coolify / Dokploy / CapRover~~ — Dockge is enough; revisit only if writing many custom apps
---
## Backup strategy
- **Proxmox Backup Server (PBS)** or `vzdump` to a NAS — snapshot each LXC/VM nightly
- **Critical groups** (`identity`, `comms`, `business`): 7 daily + 4 weekly + 12 monthly
- **Productivity/automation**: 7 daily + 4 weekly
- **Labs**: 3 daily, no long retention
- **Off-site copy** of `identity` and `business` LXCs — these contain auth and billing data. Encrypted copy to Wasabi or Backblaze B2.
The whole LXC gets snapshotted — much simpler than file-level container backup.
**Done on pve10 (2026-05-22):** `pct snapshot` **`backup-20260522`** on LXCs **217** (identity) and **218** (monitoring).
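The retention tiers map onto `vzdump`'s `--prune-backups` option. The sketch below prints the command for review instead of running it; the function names are hypothetical and `nas-backups` is a placeholder storage ID (pick a real one from `pvesm status`). Groups without a retention rule in this section (edge, media, monitoring) are rejected, not guessed.

```bash
# retention_for_group <group>
# Print the --prune-backups value for a group, per the tiers above.
retention_for_group() {
  case "$1" in
    identity|comms|business) echo "keep-daily=7,keep-weekly=4,keep-monthly=12" ;;
    productivity|automation) echo "keep-daily=7,keep-weekly=4" ;;
    labs)                    echo "keep-daily=3" ;;
    *) echo "no retention rule for group: $1" >&2; return 1 ;;
  esac
}

# backup_cmd <id> <group>
# Print (do not run) a snapshot-mode vzdump command for one guest.
# "nas-backups" is a hypothetical storage ID.
backup_cmd() {
  local id="$1" group="$2" keep
  keep="$(retention_for_group "$group")" || return 1
  printf 'vzdump %s --mode snapshot --storage nas-backups --prune-backups %s\n' "$id" "$keep"
}

backup_cmd 217 identity
```

The same retention strings can be pasted into a scheduled backup job in the Proxmox UI if you prefer not to script it.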
---
## Next steps (priority order)
See **[handoff-2026-05-24.md](handoff-2026-05-24.md)** for sprint status checklist.
| # | Task | Status | Effort | Frees / unlocks |
|---|------|--------|--------|-----------------|
| 1 | **Kuma SMTP** | ✅ done | — | — |
| 2 | **Cal.com → Authentik OIDC** | ⏸ **deferred** | — | Needs `CALCOM_LICENSE_KEY`; infra ready — [sso-selfhosted-matrix.md](sso-selfhosted-matrix.md) |
| 3 | **auto.levkin.ca** → Cal booking link | ✅ | — | Consult button live |
| 4 | **Stop Portainer VM 109** | ✅ | — | Removed 2026-05-23; **~16 GiB RAM** on pve10 |
| 5 | **Retire Nextcloud VM 201** | ✅ | — | ~8 GiB RAM freed |
| 6 | **Vikunja → Authentik OIDC** | 🟡 infra OK | 15 min | Browser login as `ilia` |
| 7 | **UniFi DHCP reservations** | ⏳ | 20 min | [unifi-static-dhcp.md](unifi-static-dhcp.md) |
| 8 | **DNS levkin.ca apex** | ✅ | — | `142.180.237.136` |
| 9 | **Beszel + Kuma** | ✅ | — | 16 Beszel agents; 17 Kuma monitors |
| 10 | ~~**Listmonk SMTP**~~ | ✅ | — | UI + vault |
| 11 | **NAS.SP00** disk → Jellyfin | ⏳ hardware | — | VM 101 |
| 12 | **DebianDesktop reboot** | ✅ | — | VM 100 rebooted; 24 GB active on pve201 |
| 13 | **Caddy → edge LXC `.20`** | ⏳ defer | ~30 min | Phase 1.5 |
| 14 | **dev-apps LXC** | ⏳ defer | half day | After punim testing |
| 15 | **Static sites → Caddy VM** | ⏳ optional | 1 h | Defer |
**Defer:** Immich, Crater, Outline; Cal OIDC until license. Listmonk/Mattermost/Mailcow SSO is already wired (Phase 4); only browser smoke tests remain.
### Adding a new service — quick rule
| Want to add… | Node | RAM budget | Prerequisite |
|--------------|------|------------|--------------|
| Small app (Mealie, Linkwarden) | pve10 | 2 GB LXC | ~22 GiB free on pve10 |
| Medium (Outline, Crater) | pve10 | 4 GB LXC | Portainer + Nextcloud already freed |
| Heavy (Immich + ML) | pve10 or pve201 GPU | 4–8 GB+ | NAS healthy; pve201 only after GPU/punim sized down |
| Dev sandbox | pve10 `dev-apps` | 6–8 GB | punim 9101 migration only after testing |
### Nextcloud decommission (VM 201)
1. Confirm export in `exports/nextcloud-2026-05-21/` is complete
2. Delete **Nextcloud** monitor in Kuma
3. Remove `nextcloud.levkin.ca` from Caddy VM
4. Stop VM 201; update [host-list.md](host-list.md)
5. After NAS healthy: optional `vzdump` archive then delete disk
---
## Important rules
1. **Never put Authentik behind itself.** `auth.levkin.ca` is a simple Caddy passthrough: no forward-auth, no fancy dependencies. If the login page itself required Authentik's forward-auth, an Authentik outage would lock you out of the very UI you need to fix it.
2. **Vaultwarden stays standalone.** It's your break-glass path if Authentik dies. Don't OIDC it.
3. **Keep a local admin password on every SSO-wired app.** OIDC integrations break during upgrades — you need to log in to fix them.
4. **Local admin to Proxmox host.** Independent of Authentik and Vaultwarden. Written down somewhere physical.
5. **Don't expose admin UIs publicly.** Dockge, Beszel, Uptime Kuma admin, n8n editor — use Tailscale or Wireguard for remote access.
6. **Static IPs for every LXC.** DHCP will eventually move them and Caddy will break. Set via `pct set <id> -net0 ...ip=10.0.10.X/24,gw=...` or a router reservation.
7. **Cal.com LXC (210)** — static at `.228` ✅.
8. **Maintain `host-list.md`** as the single source of truth for IPs. Update it whenever a new LXC/VM is created or migrated.