# Levkin self-hosted stack — plan & decisions

Reference doc for the Proxmox homelab. Lives alongside the Cursor project that has the Proxmox info.

**Conventions:**
- All groups run inside an LXC unless marked **VM**.
- Inside each LXC: one `docker-compose.yml`, managed by **Dockge** where applicable.
- Caddy on the `edge` LXC is the only thing exposed to the internet.
- Authentik on the `identity` LXC is the source of truth for who you are.
- Vaultwarden stays standalone (it's the break-glass path if Authentik dies).

---

## Current state (May 2026)

**Already running:**
- Caddy reverse proxy — currently on a **VM** (should migrate to LXC, see "Caddy migration" section)
- Mailcow — VM, mail domain is `levkine.ca` (with e)
- Vaultwarden, Vikunja, n8n, Listmonk, Mattermost, Nextcloud — across various LXCs
- **Cal.com** — LXC id `210`, `cal.levkin.ca`, Postgres included, admin user `ilia`, 15-min consult event live at `cal.levkin.ca/ilia/consult` with Jitsi link
- Caddy entries live for: `caseware.levkin.ca`, `auto.levkin.ca`, `iliadobkin.com`, `cal.levkin.ca`, `listmonk.levkin.ca`, `pdf.levkin.ca`, `search.levkin.ca`, `auth.levkin.ca`
- **Authentik** — LXC **217** @ `10.0.10.21`, `https://auth.levkin.ca`, admin + TOTP enrolled
- **Monitoring** — LXC **218** @ `10.0.10.22`: Uptime Kuma `:3001`, Dockge `:5001`, Umami `:3000` (LAN-only) — [monitoring-stack.md](monitoring-stack.md)
- **Umami** + **Authentik** admin/TOTP/backup codes — done
- **Uptime Kuma** — monitors live; email alerts via Mailcow — see [monitoring-stack.md](monitoring-stack.md)
- **Dockge** on 218 — manages local `/opt/monitoring` stack
- **Snapshots** `backup-20260522` on LXCs **217**, **218**
- **Jellyfin** (VM 101) — stopped
- LXC **210, 215–218, 219** — static via `pct set`; **Caddy VM 106** — static in-guest `.50`
- **Nextcloud VM 201** — export done; **retire soon** (no SSO, remove Kuma monitor + Caddy block when off)

**Decisions locked in:**
- Container manager: **Dockge** (not Portainer, not Coolify/Dokploy/CapRover)
- Chat: **Mattermost only** — no Matrix/Synapse
- Knowledge tool: **Outline** for client-facing, **SiYuan** if/when PhD work picks up (don't run Affine + Trilium too)
- Bookmark manager: **Linkwarden** (full-page archive is the killer feature)
- Authentik is the SSO target; Vaultwarden stays standalone

---

## LXC / VM grouping table

| Group | What's inside | Why grouped | LXC or VM |
|---|---|---|---|
| **edge** | Caddy reverse proxy, Crowdsec/Fail2ban | The front door — small, stable, restart rarely | LXC, 1 vCPU, 1GB RAM |
| **identity** | Authentik (+ Postgres + Redis), Vaultwarden | Auth-critical — touch rarely, back up religiously | LXC, 2 vCPU, 2GB RAM |
| **comms** | Mailcow | Mailcow's compose is huge (15+ containers) and self-contained — wants its own host | **VM**, 4GB RAM |
| **automation** | n8n, Windmill (later), Huginn (later) | Active workloads, frequent updates, you'll touch these a lot | LXC, 2–4 vCPU, 4GB RAM |
| **productivity** | Vikunja, Listmonk, Outline, Mealie, Linkwarden | Personal/team productivity, low-resource | LXC, 2 vCPU, 4GB RAM |
| **media** | Immich, Nextcloud, Paperless-ngx | Large storage, GPU passthrough useful for Immich ML | **VM** if GPU passthrough, else LXC. Lots of disk. |
| **business** | Cal.com ✅, Crater | Client-facing, financial — back up often | LXC, 2 vCPU, 2GB RAM |
| **monitoring** | Uptime Kuma ✅, Dockge ✅, Umami ✅, Beszel (later) | Ops stack on LXC **218** | LXC, 2 vCPU, 2GB RAM |
| **labs** | Anything experimental — Flowise, Trigger.dev | Things you're trying out, can be wiped | LXC, scratch space |

### Why this grouping (cheat sheet)

- One service goes bad → only its group restarts.
- Need a kernel upgrade for one stack → snapshot the LXC, upgrade, roll back if broken.
- Mailcow's huge surface area is isolated in its own VM.
- Edge LXC is tiny and stable → perfect for the layer everything depends on.
- Backup cadence per group (see Backups section).
- Resource limits per LXC mean a runaway container can't eat n8n's RAM.

---

## Subdomains

Only expose what actually needs to be public. Internal services use Tailscale/Wireguard for remote access.

### Expose publicly

| Subdomain | Service | Group | Why public | Status |
|---|---|---|---|---|
| `caseware.levkin.ca` | Static site | edge | Marketing | ✅ live |
| `auto.levkin.ca` | Static site | edge | Marketing | ✅ live |
| `iliadobkin.com` | Portfolio (SDET) | edge | Personal site | ✅ live (pve10 LXC 219) |
| `cal.levkin.ca` | Cal.com | business | Clients book on it | ✅ live |
| `listmonk.levkin.ca` | Listmonk | productivity | Unsubscribe URLs must resolve | ✅ live |
| `mail.levkine.ca` | Mailcow | comms | Mail server | ✅ live |
| `auth.levkin.ca` | Authentik | identity | OIDC redirect URLs need external resolution | ✅ live |
| `bill.levkin.ca` | Crater | business | Clients view invoices | ⏳ Phase 6 |
| `cloud.levkin.ca` | Nextcloud | media | **Retiring** — decommission VM 201 after cutover | 🗑️ |
| `photos.levkin.ca` | Immich | media | Mobile apps need public hostname | ⏳ Phase 5 |
| `vault.levkin.ca` | Vaultwarden | identity | Mobile clients need public hostname | ⏳ |
| `notes.levkin.ca` | Outline | productivity | Sharing docs with clients | ⏳ |
| `chat.levkin.ca` | Mattermost | comms | Only if inviting outside users | ⏳ optional |

### Keep internal only (no public DNS, no Caddy block)

Reachable only via local network or Tailscale/Wireguard:

| Service | Reason |
|---|---|
| Umami admin UI | Only you need the dashboard. Tracking endpoint can be public, dashboard isn't. |
| Uptime Kuma | Status dashboard is for you. Don't advertise infrastructure. |
| Beszel | Metrics are admin-only. |
| Dockge | Admin UI — local only. |
| n8n editor | UI shouldn't be exposed. Webhooks go on `hooks.levkin.ca` if needed. |
| Huginn / Windmill / Flowise | Admin tools. |
| Vikunja | Personal task manager. |
| Mealie | Family recipes. |
| Trigger.dev | Internal automation. |
| Paperless-ngx | Personal documents. Never expose. |
| SiYuan | Personal knowledge. |
| Linkwarden | Personal bookmarks. |

### Borderline (decide per service)

| Subdomain | Service | Notes |
|---|---|---|
| `stats.levkin.ca` | Umami collector | Only the tracking script endpoint needs to be public; admin UI stays internal |
| `status.levkin.ca` | Uptime Kuma | Kuma supports a separate public status page URL — that one can be public |

---

## Phased rollout

### Phase 0 — Foundation
1. ✅ Caddy running (on VM — migrate to LXC in Phase 1.5)
2. ✅ **Static IP audit (partial)** — all LXCs on pve10 pinned; Caddy VM static `.50`; remaining VMs on stable DHCP — see [host-list.md](host-list.md)
3. ✅ DNS for `auth.levkin.ca` → home IP (verified 2026-05-22)
4. ✅ `identity` LXC **217** @ `10.0.10.21` (2 vCPU, 2GB RAM, 20GB `local-lvm`, Debian 12 + Docker Compose)

### Phase 1 — Identity ✅
1. ✅ Deploy Authentik in `identity` LXC (Authentik + Postgres + Redis, official compose at `/opt/authentik`)
2. ✅ Caddy: `auth.levkin.ca` → `10.0.10.21:9000` (simple passthrough, no forward-auth)
3. ✅ Admin user (`admin`), TOTP enrolled
4. ✅ `authentik Admins` group (skip custom `users` group until more accounts)
5. ✅ Static backup codes; **don't OIDC other apps until Cal.com test**

### Phase 1.5 — Caddy migration to LXC (~30 min)

Why now (after Phase 1, before bulk SSO work in Phase 4): Authentik is stable enough to absorb a small change, but you haven't yet built the dependency web of OIDC integrations that would make a Caddy reload risky.

Why Caddy belongs in an LXC, not a VM:
- ~50MB OS overhead vs ~512MB for a VM
- Boot/restart in 2-5s vs 20-40s (matters when reloading config)
- Snapshot/backup is faster
- Caddy is a Go binary doing reverse-proxy work — no need for kernel isolation
- Near-native network performance

Steps:
1. Create `edge` LXC: Debian 12, 1 vCPU, 512MB RAM, 8GB disk, **static IP from host list**
2. Install Caddy via official Debian repo:
```bash
# Prerequisites; a minimal Debian 12 LXC may not ship curl or gnupg
apt install -y debian-keyring debian-archive-keyring apt-transport-https curl gnupg
# Add Caddy's signing key and the stable apt repository
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | tee /etc/apt/sources.list.d/caddy-stable.list
# Refresh the package index and install Caddy
apt update && apt install -y caddy
```
3. Copy `Caddyfile` + custom snippets (`(security-headers)` etc.) from the VM
4. Add a **test subdomain** (e.g. `test.levkin.ca`) pointing at the new LXC — verify TLS issues and routing works
5. Cut over: update router port-forward (80/443) to the new LXC IP. DNS A records don't need to change if they point to your home IP.
6. Watch Mailcow, Cal.com, Listmonk, the marketing sites for ~24h
7. Keep the old VM snapshot for a week, then delete

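The "watch for ~24h" step can be partly scripted. The following is a minimal sketch, assuming the public hostnames listed in the subdomain tables and that any 2xx or 3xx answer counts as healthy; the function names and the `RUN` switch are illustrative, not part of any existing tooling.

```bash
#!/usr/bin/env bash
# Post-cutover smoke check for step 6: request each public hostname and flag
# anything that does not answer with a 2xx or 3xx status.
# Assumption: this host list mirrors the "Expose publicly" table; adjust as needed.
HOSTS="mail.levkine.ca cal.levkin.ca listmonk.levkin.ca caseware.levkin.ca auto.levkin.ca"

# Treat 2xx/3xx as healthy; everything else (including curl's 000) as a failure.
status_ok() {
  case $1 in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Fetch only the status code for each host and print one line per host.
check_sites() {
  local h code
  for h in $HOSTS; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://$h/")
    if status_ok "$code"; then
      echo "ok   $h ($code)"
    else
      echo "FAIL $h ($code)"
    fi
  done
}

# Network calls happen only when explicitly requested.
if [ "${RUN:-0}" = "1" ]; then
  check_sites
fi
```

Running it as `RUN=1 bash check-sites.sh` from outside the LAN (a phone hotspot, for example) exercises the router port-forward as well as Caddy. The script name is hypothetical.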
### Phase 2 — Quick wins ✅
1. ✅ **Umami** — tracking on caseware, auto, and iliadobkin.com (portfolio)
2. ✅ **Uptime Kuma** — monitors in UI
3. ✅ **Dockge** — logged in; register `/opt/monitoring` stack (see [monitoring-stack.md](monitoring-stack.md))
4. ⏳ **Kuma email alerts** — SMTP via Mailcow `alerts@levkine.ca` → your inbox (steps in monitoring-stack.md)

### Phase 3 — Cal.com (mostly done) ✅
1. ✅ Cal.com deployed in `business` LXC (id 210, Postgres included)
2. ✅ `cal.levkin.ca` proxied via Caddy
3. ✅ Booking link live at `cal.levkin.ca/ilia/consult` with Jitsi location
4. ✅ Email working via `cal@levkine.ca` SMTP through Mailcow
5. ⏳ **Wire Cal.com to Authentik via OIDC** (first real SSO connection — do this after Phase 1)
6. ⏳ Update `auto.levkin.ca` button → `cal.levkin.ca/ilia/consult` (currently points to placeholder)

### Phase 4 — SSO migration (~half a day, staged)
Wire each to Authentik, least-risky first:
1. **Vikunja** (OIDC native) — easy, single-user impact
2. ~~**Nextcloud**~~ — **skipped** (VM 201 retiring)
3. **Listmonk** (OIDC native, admin only) — easy
4. **Mattermost** (SAML or OIDC native) — moderate
5. **Mailcow** (OIDC) — last, because mail-critical

For each: keep a local admin password as a break-glass account.

### Phase 5 — Family / personal wins (~1 evening)
1. **Immich** in `media` VM — install mobile apps for you and family, enable auto-upload. Face recognition runs in background; "my kids 2024" works within a couple days.
2. Skip PhotoPrism — Immich covers it.

### Phase 6 — Business / consulting (~1–2 evenings)
1. **Crater** in `business` LXC — tax rates, company info, Stripe integration if you want online payment
2. **Beszel** hub in `monitoring` LXC + agents on each LXC — one dashboard for resource usage

### Phase 7 — Automation depth (ongoing)
Only when you have a real use case:
1. **Huginn** in `automation` — first agent: competitor pages, kosher product availability, grant deadlines
2. **Windmill** in `automation` — first script: rewrite an n8n flow with too many code nodes
3. **Flowise** in `labs` — first flow: chat-with-docs against your consulting notes

### Phase 8 — Knowledge / research
1. **Outline** in `productivity` LXC — client-facing wiki + your notes
2. **Linkwarden** in `productivity` LXC — bookmarks with full-page archive
3. **Paperless-ngx** in `media` — scan and OCR the paper that's accumulating
4. **SiYuan** — only if/when PhD or long-form research becomes relevant

---

## Static IP audit

**Maintain a `host-list.md` file** (in this Cursor project, alongside this plan) with every LXC/VM, its current IP, its target static IP, and DHCP/static status. Cursor will use this as the source of truth when scripting changes.

Suggested format:

| LXC/VM ID | Name | Role | Current IP | Target static IP | DHCP/Static | Notes |
|---|---|---|---|---|---|---|
| 210 | cal | Cal.com | 10.0.10.228/24 (DHCP) | 10.0.10.228/24 | ⏳ static | Convert ASAP |
| ... | ... | ... | ... | ... | ... | ... |

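Because `host-list.md` is meant to drive scripted changes, it may help to query it directly. The sketch below assumes the seven-column layout suggested above and that unfinished rows carry the ⏳ marker in the DHCP/Static column; `pending_hosts` is a hypothetical helper name, and the 217 row is sample data for illustration.

```bash
#!/usr/bin/env bash
# Print "<ID> <Name>" for every host-list row still marked pending (⏳).
# Reads the file(s) given as arguments, or stdin when none are given.
pending_hosts() {
  awk -F'|' '
    # Only consider table rows whose first cell is a numeric guest ID;
    # this skips the header, the separator, and the "..." placeholder row.
    $2 ~ /^ *[0-9]+ *$/ && $7 ~ /⏳/ {
      gsub(/^ +| +$/, "", $2)   # trim the ID cell
      gsub(/^ +| +$/, "", $3)   # trim the Name cell
      print $2, $3
    }
  ' "$@"
}

# Example with inline sample rows instead of the real file.
pending_hosts <<'EOF'
| LXC/VM ID | Name | Role | Current IP | Target static IP | DHCP/Static | Notes |
|---|---|---|---|---|---|---|
| 210 | cal | Cal.com | 10.0.10.228/24 (DHCP) | 10.0.10.228/24 | ⏳ static | Convert ASAP |
| 217 | identity | Authentik | 10.0.10.21/24 | 10.0.10.21/24 | ✅ static | |
EOF
```

Against the real file the call would be `pending_hosts host-list.md`. Keying on the status column rather than the IP text keeps the check independent of how addresses are written.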
### Recommended IP plan

Carve `10.0.10.0/24` (or whatever your LAN is) into role-based ranges so it's scannable:

| Range | Reserved for |
|---|---|
| `.1 - .9` | Network gear (router, switches, APs) |
| `.10 - .19` | Proxmox host(s) + PBS |
| `.20 - .39` | Edge / identity / comms (critical infra) |
| `.40 - .79` | Application LXCs (productivity, automation, business, monitoring) |
| `.80 - .99` | Media VM(s) |
| `.100 - .199` | DHCP pool (clients, phones, laptops) |
| `.200 - .249` | Labs / experimental |
| `.250 - .254` | Reserved |

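The range table can be encoded as a small lookup so that a proposed address is checked against the plan before it is assigned. This is a sketch only; `role_for_octet` and its labels are illustrative names, and it assumes the `10.0.10.0/24` layout above.

```bash
#!/usr/bin/env bash
# Map the last octet of a 10.0.10.0/24 address to its reserved role,
# following the range table above.
role_for_octet() {
  local o=$1
  # Reject anything that is not a plain decimal number.
  case $o in
    ''|*[!0-9]*) echo "invalid"; return 1 ;;
  esac
  o=$((10#$o))   # force base 10 so inputs like "08" are not read as octal
  if   (( o >= 1   && o <= 9   )); then echo "network gear"
  elif (( o >= 10  && o <= 19  )); then echo "proxmox/pbs"
  elif (( o >= 20  && o <= 39  )); then echo "critical infra"
  elif (( o >= 40  && o <= 79  )); then echo "application"
  elif (( o >= 80  && o <= 99  )); then echo "media"
  elif (( o >= 100 && o <= 199 )); then echo "dhcp pool"
  elif (( o >= 200 && o <= 249 )); then echo "labs"
  elif (( o >= 250 && o <= 254 )); then echo "reserved"
  else echo "invalid"; return 1
  fi
}

# Example: classify two addresses that appear in this doc.
role_for_octet 21    # identity LXC 217 at 10.0.10.21
role_for_octet 228   # Cal.com LXC 210 at 10.0.10.228
```

The second call reports the labs range, which suggests that the Cal.com address predates this plan; whether to renumber it is a separate decision.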
### How to set static on a Proxmox LXC

Two methods — pick one and stick with it:

**Method A — Proxmox CLI (recommended, survives reboots cleanly):**
```bash
pct set <ID> -net0 name=eth0,bridge=vmbr0,ip=10.0.10.X/24,gw=10.0.10.1
pct reboot <ID>
```

**Method B — Router DHCP reservation:**
- Reserve the IP in your router's DHCP table by MAC address. LXC stays "DHCP" technically, but always gets the same IP.
- Easier if you have many hosts and one router.
- Risk: if the LXC's MAC changes (rebuild from snapshot to new ID), reservation breaks.

**Recommendation:** Method A (`pct set`) for everything critical (edge, identity, comms, business). Method B is fine for labs/experimental LXCs.

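When several LXCs need Method A, a wrapper that prints the commands first makes review easier. The sketch below assumes the bridge, gateway, and prefix shown in Method A; `pin_cmd`, `pin_lxc`, and the `APPLY` switch are hypothetical names introduced for illustration.

```bash
#!/usr/bin/env bash
# Build the `pct set` command that pins an LXC to a static address.
# Assumptions: bridge vmbr0, gateway 10.0.10.1, /24 prefix (as in Method A).
pin_cmd() {
  local id=$1 octet=$2
  printf 'pct set %s -net0 name=eth0,bridge=vmbr0,ip=10.0.10.%s/24,gw=10.0.10.1\n' "$id" "$octet"
}

# Dry run by default: print the commands. Execute only when APPLY=1 is set.
pin_lxc() {
  local id=$1 octet=$2 cmd
  cmd=$(pin_cmd "$id" "$octet")
  if [ "${APPLY:-0}" = "1" ]; then
    # Unquoted on purpose: the command contains no spaces inside arguments.
    $cmd && pct reboot "$id"
  else
    echo "$cmd"
    echo "pct reboot $id"
  fi
}

# Dry run for the identity LXC; prints two lines and changes nothing.
pin_lxc 217 21
```

With `APPLY=1` the same call runs on the Proxmox host. Reviewing the printed commands first keeps an accidental typo from taking a critical container off the network.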
### Audit checklist

1. List every LXC: `pct list`
2. List every VM: `qm list`
3. For each, run `pct exec <ID> -- ip a` (or `qm guest exec <ID> -- ip a` for VMs, which needs the QEMU guest agent running inside the VM) and check whether the IP came from DHCP. For LXCs, `pct config <ID>` is a quicker check: the `net0` line shows `ip=dhcp` when the address is DHCP-assigned.
4. Fill in `host-list.md`
5. Pick target IPs from the range plan above
6. Convert one at a time, lowest-risk first (labs → productivity → business → comms → identity → edge)
7. **After each conversion**, verify the Caddy reverse-proxy entry still works (curl from outside)
8. Update `host-list.md` status column

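Steps 1 and 3 can be combined for LXCs by reading each container's `net0` line. This is a rough sketch, assuming it runs on the Proxmox host; `net0_mode` and `audit_lxcs` are illustrative names, and VMs are not covered because their addressing lives inside the guest.

```bash
#!/usr/bin/env bash
# Classify one `net0:` line from `pct config <ID>` as dhcp, static, or unknown.
net0_mode() {
  case $1 in
    *ip=dhcp*) echo "dhcp" ;;
    *ip=*/*)   echo "static" ;;    # e.g. ip=10.0.10.21/24
    *)         echo "unknown" ;;
  esac
}

# Walk every LXC on the host and report its addressing mode.
# Returns early with a note when `pct` is not available.
audit_lxcs() {
  command -v pct >/dev/null 2>&1 || {
    echo "pct not found; run this on the Proxmox host" >&2
    return 0
  }
  local id line
  # `pct list` prints a header row first, so skip line 1.
  for id in $(pct list | awk 'NR > 1 { print $1 }'); do
    line=$(pct config "$id" | grep '^net0:')
    printf '%s\t%s\n' "$id" "$(net0_mode "$line")"
  done
}

audit_lxcs
```

The output is one tab-separated `ID  mode` pair per container, which can be pasted into the status column of `host-list.md`.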
### Hosts known to need conversion right now

- **LXC 210 (cal)** — was on DHCP at `10.0.10.228/24`; now pinned static at `.228` via `pct set` ✅ (it had to be static before the Caddy migration)

---

## Backlog (priority order)

### P0 — next batch after Phase 1 admin bootstrap
1. **Umami** — analytics on landing pages, 10 min to deploy, immediate signal
2. **Uptime Kuma** — monitor what you already have
3. **Dockge** — UI over existing compose
4. **Beszel** — homelab resource visibility
5. **Mealie** — family recipes, simple win

### P1 — when ready
- **Outline** — wiki for client docs
- **Linkwarden** — bookmarks with full-page archive
- **Plane** — Jira-lite project management (pair with Mattermost)

### P2 — when you have a real need
- **Crater** — invoicing (Phase 6)
- **Immich** — photos (Phase 5)
- **Paperless-ngx** — document scanning (Phase 8)
- **Huginn** — first when you have a monitoring use case
- **Windmill** — when n8n hits limits
- **Trigger.dev** — durable background jobs in code (better fit than Windmill for QA work)
- **PrivateBin** — encrypted paste for sharing secrets with contractors
- **Addy.io** — email aliases
- **SiYuan** — if PhD work picks up
- **Flowise** — labs only, when LLM workflow use case appears

### Skip / declined
- ~~PhotoPrism~~ — Immich covers it
- ~~Activepieces~~ — you already have n8n
- ~~Affine / Trilium~~ — picked Outline + SiYuan instead
- ~~Matrix/Synapse + Element~~ — staying on Mattermost
- ~~Coolify / Dokploy / CapRover~~ — Dockge is enough; revisit only if writing many custom apps

---

## Backup strategy

- **Proxmox Backup Server (PBS)** or `vzdump` to a NAS — snapshot each LXC/VM nightly
- **Critical groups** (`identity`, `comms`, `business`): 7 daily + 4 weekly + 12 monthly
- **Productivity/automation**: 7 daily + 4 weekly
- **Labs**: 3 daily, no long retention
- **Off-site copy** of `identity` and `business` LXCs — these contain auth and billing data. Encrypted copy to Wasabi or Backblaze B2.

The whole LXC gets snapshotted — much simpler than file-level container backup.

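The retention tiers above map onto `vzdump`'s `--prune-backups` option. The sketch below only prints the command for review; the storage ID `nas-backups` is a hypothetical placeholder, and groups without a tier in the list (edge, media, monitoring) are rejected deliberately rather than guessed.

```bash
#!/usr/bin/env bash
# Map a group name to a vzdump retention spec matching the tiers above.
retention_for() {
  case $1 in
    identity|comms|business) echo "keep-daily=7,keep-weekly=4,keep-monthly=12" ;;
    productivity|automation) echo "keep-daily=7,keep-weekly=4" ;;
    labs)                    echo "keep-daily=3" ;;
    *) echo "no retention tier defined for group: $1" >&2; return 1 ;;
  esac
}

# Print the backup command for one guest.
# Assumption: "nas-backups" is a placeholder storage ID; override with STORAGE=...
backup_cmd() {
  local id=$1 group=$2 keep
  keep=$(retention_for "$group") || return 1
  echo "vzdump $id --mode snapshot --storage ${STORAGE:-nas-backups} --prune-backups $keep"
}

# Example: the identity LXC from this doc.
backup_cmd 217 identity
```

In practice the same retention values can be set once on a scheduled backup job in the Proxmox UI; the script form is mainly useful for one-off runs before risky changes.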
**Done on pve10 (2026-05-22):** `pct snapshot` **`backup-20260522`** on LXCs **217** (identity) and **218** (monitoring).

---

## Next steps (priority order)

See **[homelab-status-2026-05-22.md](homelab-status-2026-05-22.md)** for done vs todo.

| # | Task | Effort | Doc |
|---|------|--------|-----|
| 1 | **Kuma SMTP** test in UI | 5 min | [monitoring-stack.md](monitoring-stack.md) |
| 2 | **UniFi DHCP reservations** | 20 min | [unifi-static-dhcp.md](unifi-static-dhcp.md) |
| 3 | **Cal.com → Authentik OIDC** | 1–2 h | Phase 3 above |
| 4 | **Retire Nextcloud VM 201** | 30 min | [nextcloud-export-2026-05-21.md](nextcloud-export-2026-05-21.md) |
| 5 | **NAS.SP00** disk replace → Jellyfin | hardware | [nas-sp00-drive-failure-report.md](nas-sp00-drive-failure-report.md) |
| 6 | **Caddy → edge LXC `.20`** | ~30 min | Phase 1.5 |

**Defer:** Nextcloud SSO, Immich, Crater, Beszel until above are done.

### Nextcloud decommission (VM 201)

1. Confirm export in `exports/nextcloud-2026-05-21/` is complete
2. Delete **Nextcloud** monitor in Kuma
3. Remove `nextcloud.levkin.ca` from Caddy VM
4. Stop VM 201; update [host-list.md](host-list.md)
5. After NAS healthy: optional `vzdump` archive then delete disk

---

## Important rules

1. **Never put Authentik behind itself.** `auth.levkin.ca` is a simple Caddy passthrough — no forward-auth, no fancy dependencies. If the login page sat behind Authentik's own forward-auth, an Authentik outage would lock you out of the one UI you need to fix it.
2. **Vaultwarden stays standalone.** It's your break-glass path if Authentik dies. Don't OIDC it.
3. **Keep a local admin password on every SSO-wired app.** OIDC integrations break during upgrades — you need to log in to fix them.
4. **Local admin to Proxmox host.** Independent of Authentik and Vaultwarden. Written down somewhere physical.
5. **Don't expose admin UIs publicly.** Dockge, Beszel, Uptime Kuma admin, n8n editor — use Tailscale or Wireguard for remote access.
6. **Static IPs for every LXC.** DHCP will eventually move them and Caddy will break. Set via `pct set <id> -net0 ...ip=10.0.10.X/24,gw=...` or a router reservation.
7. **Cal.com LXC (210)** — static at `.228` ✅.
8. **Maintain `host-list.md`** as the single source of truth for IPs. Update it whenever a new LXC/VM is created or migrated.