ilia/ansible

ilia de49b34cdc

Add homelab monitoring, portfolio site, and vault tooling.

Document pve10 static IPs, monitoring stack, and site LXCs; add portfolio
to inventory; Mailcow mailbox automation; vault import/export scripts;
security audit guides and UniFi DHCP reference.

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-22 16:25:07 -04:00

Levkin self-hosted stack — plan & decisions

Reference doc for the Proxmox homelab. Lives alongside the Cursor project that has the Proxmox info.

Conventions:

All groups run inside an LXC unless marked VM.
Inside each LXC: one docker-compose.yml, managed by Dockge where applicable.
Caddy on the edge LXC is the only thing exposed to the internet.
Authentik on the identity LXC is the source of truth for who you are.
Vaultwarden stays standalone (it's the break-glass path if Authentik dies).

Current state (May 2026)

Already running:

Caddy reverse proxy — currently on a VM (should migrate to LXC, see "Caddy migration" section)
Mailcow — VM, mail domain is levkine.ca (with e)
Vaultwarden, Vikunja, n8n, Listmonk, Mattermost, Nextcloud — across various LXCs
Cal.com — LXC id 210, cal.levkin.ca, Postgres included, admin user ilia, 15-min consult event live at cal.levkin.ca/ilia/consult with Jitsi link
Caddy entries live for: caseware.levkin.ca, auto.levkin.ca, iliadobkin.com, cal.levkin.ca, listmonk.levkin.ca, pdf.levkin.ca, search.levkin.ca, auth.levkin.ca
Authentik — LXC 217 @ 10.0.10.21, https://auth.levkin.ca, admin + TOTP enrolled
Monitoring — LXC 218 @ 10.0.10.22: Uptime Kuma :3001, Dockge :5001, Umami :3000 (LAN-only) — monitoring-stack.md
Umami + Authentik admin/TOTP/backup codes — done
Uptime Kuma — monitors live; email alerts via Mailcow — see monitoring-stack.md
Dockge on 218 — manages local /opt/monitoring stack
Snapshots backup-20260522 on LXCs 217, 218
Jellyfin (VM 101) — stopped
LXC 210, 215–218, 219 — static via pct set; Caddy VM 106 — static in-guest .50
Nextcloud VM 201 — export done; retire soon (no SSO, remove Kuma monitor + Caddy block when off)

Decisions locked in:

Container manager: Dockge (not Portainer, not Coolify/Dokploy/CapRover)
Chat: Mattermost only — no Matrix/Synapse
Knowledge tool: Outline for client-facing, SiYuan if/when PhD work picks up (don't run Affine + Trilium too)
Bookmark manager: Linkwarden (full-page archive is the killer feature)
Authentik is the SSO target; Vaultwarden stays standalone

LXC / VM grouping table

Group	What's inside	Why grouped	LXC or VM
edge	Caddy reverse proxy, Crowdsec/Fail2ban	The front door — small, stable, restart rarely	LXC, 1 vCPU, 1GB RAM
identity	Authentik (+ Postgres + Redis), Vaultwarden	Auth-critical — touch rarely, back up religiously	LXC, 2 vCPU, 2GB RAM
comms	Mailcow	Mailcow's compose is huge (15+ containers) and self-contained — wants its own host	VM, 4GB RAM
automation	n8n, Windmill (later), Huginn (later)	Active workloads, frequent updates, you'll touch these a lot	LXC, 2–4 vCPU, 4GB RAM
productivity	Vikunja, Listmonk, Outline, Mealie, Linkwarden	Personal/team productivity, low-resource	LXC, 2 vCPU, 4GB RAM
media	Immich, Nextcloud, Paperless-ngx	Large storage, GPU passthrough useful for Immich ML	VM if GPU passthrough, else LXC. Lots of disk.
business	Cal.com ✅, Crater	Client-facing, financial — back up often	LXC, 2 vCPU, 2GB RAM
monitoring	Uptime Kuma ✅, Dockge ✅, Umami ✅, Beszel (later)	Ops stack on LXC 218	LXC, 2 vCPU, 2GB RAM
labs	Anything experimental — Flowise, Trigger.dev	Things you're trying out, can be wiped	LXC, scratch space

Why this grouping (cheat sheet)

One service goes bad → only its group restarts.
Need a kernel upgrade for one stack → snapshot the LXC, upgrade, roll back if broken.
Mailcow's huge surface area is isolated in its own VM.
Edge LXC is tiny and stable → perfect for the layer everything depends on.
Backup cadence can be set per group (see Backup strategy).
Resource limits per LXC mean a runaway container can't eat n8n's RAM.
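
A minimal sketch of the resource-limit point, assuming limits are applied with pct set on the Proxmox host. The helper name lxc_limit_cmd and CTID 220 are hypothetical; the helper only prints the command so it can be reviewed before running.

```shell
# Print (do not run) the pct command that caps an LXC's CPU cores and RAM.
# Usage: lxc_limit_cmd <ctid> <cores> <memory_mib>
lxc_limit_cmd() {
    printf 'pct set %s -cores %s -memory %s\n' "$1" "$2" "$3"
}

# Example: an automation LXC capped at 4 vCPU / 4 GiB.
# CTID 220 is a placeholder, not an ID from this plan.
lxc_limit_cmd 220 4 4096
```

Memory is given in MiB, so 4096 corresponds to the 4GB figure in the grouping table.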

Subdomains

Only expose what actually needs to be public. Internal services use Tailscale/Wireguard for remote access.

Expose publicly

Subdomain	Service	Group	Why public	Status
`caseware.levkin.ca`	Static site	edge	Marketing	✅ live
`auto.levkin.ca`	Static site	edge	Marketing	✅ live
`iliadobkin.com`	Portfolio (SDET)	edge	Personal site	✅ live (pve10 LXC 219)
`cal.levkin.ca`	Cal.com	business	Clients book on it	✅ live
`listmonk.levkin.ca`	Listmonk	productivity	Unsubscribe URLs must resolve	✅ live
`mail.levkine.ca`	Mailcow	comms	Mail server	✅ live
`auth.levkin.ca`	Authentik	identity	OIDC redirect URLs need external resolution	✅ live
`bill.levkin.ca`	Crater	business	Clients view invoices	⏳ Phase 6
`cloud.levkin.ca`	Nextcloud	media	Retiring — decommission VM 201 after cutover	🗑️
`photos.levkin.ca`	Immich	media	Mobile apps need public hostname	⏳ Phase 5
`vault.levkin.ca`	Vaultwarden	identity	Mobile clients need public hostname	⏳
`notes.levkin.ca`	Outline	productivity	Sharing docs with clients	⏳
`chat.levkin.ca`	Mattermost	comms	Only if inviting outside users	⏳ optional

Keep internal only (no public DNS, no Caddy block)

Reachable only via local network or Tailscale/Wireguard:

Service	Reason
Umami admin UI	Only you need the dashboard. Tracking endpoint can be public, dashboard isn't.
Uptime Kuma	Status dashboard is for you. Don't advertise infrastructure.
Beszel	Metrics are admin-only.
Dockge	Admin UI — local only.
n8n editor	UI shouldn't be exposed. Webhooks go on `hooks.levkin.ca` if needed.
Huginn / Windmill / Flowise	Admin tools.
Vikunja	Personal task manager.
Mealie	Family recipes.
Trigger.dev	Internal automation.
Paperless-ngx	Personal documents. Never expose.
SiYuan	Personal knowledge.
Linkwarden	Personal bookmarks.

Borderline (decide per service)

Subdomain	Service	Notes
`stats.levkin.ca`	Umami collector	Only the tracking script endpoint needs to be public; admin UI stays internal
`status.levkin.ca`	Uptime Kuma	Kuma supports a separate public status page URL — that one can be public

Phased rollout

Phase 0 — Foundation

✅ Caddy running (on VM — migrate to LXC in Phase 1.5)
✅ Static IP audit (partial) — all LXCs on pve10 pinned; Caddy VM static .50; remaining VMs on stable DHCP — see host-list.md
✅ DNS for auth.levkin.ca → home IP (verified 2026-05-22)
✅ identity LXC 217 @ 10.0.10.21 (2 vCPU, 2GB RAM, 20GB local-lvm, Debian 12 + Docker Compose)

Phase 1 — Identity ✅

✅ Deploy Authentik in identity LXC (Authentik + Postgres + Redis, official compose at /opt/authentik)
✅ Caddy: auth.levkin.ca → 10.0.10.21:9000 (simple passthrough, no forward-auth)
✅ Admin user (admin), TOTP enrolled
✅ authentik Admins group (skip custom users group until more accounts)
✅ Static backup codes; don't OIDC other apps until Cal.com test

Phase 1.5 — Caddy migration to LXC (~30 min)

Why now (after Phase 1, before bulk SSO work in Phase 4): Authentik is stable enough to absorb a small change, but you haven't yet built the dependency web of OIDC integrations that would make a Caddy reload risky.

Why Caddy belongs in an LXC, not a VM:

~50MB OS overhead vs ~512MB for a VM
Boot/restart in 2-5s vs 20-40s (matters when reloading config)
Snapshot/backup is faster
Caddy is a Go binary doing reverse-proxy work — no need for kernel isolation
Near-native network performance

Steps:

Create edge LXC: Debian 12, 1 vCPU, 512MB RAM, 8GB disk, static IP from host list

Install Caddy via official Debian repo:

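# Run as root inside the LXC. curl and gnupg are listed because minimal Debian templates may not ship them.
apt install -y debian-keyring debian-archive-keyring apt-transport-https curl gnupg
# Add the Caddy signing key and the stable apt repository
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | tee /etc/apt/sources.list.d/caddy-stable.list
# Refresh package lists, then install Caddy
apt update && apt install -y caddy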

Copy Caddyfile + custom snippets ((security-headers) etc.) from the VM
Add a test subdomain (e.g. test.levkin.ca) pointing at the new LXC — verify TLS issues and routing works
Cut over: update router port-forward (80/443) to the new LXC IP. DNS A records don't need to change if they point to your home IP.
Watch Mailcow, Cal.com, Listmonk, the marketing sites for ~24h
Keep the old VM snapshot for a week, then delete
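
The 24h watch in the steps above can start with a quick status sweep. This is a sketch, assuming curl is available on a machine outside the LAN; the helper name check_sites is hypothetical and the hostnames come from the Caddy entries listed under Current state.

```shell
# Print "<host> <http status>" for each hostname given.
# A status of 000 means curl got no HTTP response at all.
check_sites() {
    for host in "$@"; do
        # -m 10 caps each request at 10 seconds; || true keeps the loop going on failure.
        code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "https://$host/") || true
        printf '%s %s\n' "$host" "$code"
    done
}

# Run after moving the port-forward, for example:
#   check_sites cal.levkin.ca listmonk.levkin.ca auth.levkin.ca iliadobkin.com
```

Running the same sweep before the cutover gives a baseline to compare against.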

Phase 2 — Quick wins ✅

✅ Umami — tracking on caseware, auto, and iliadobkin.com (portfolio)
✅ Uptime Kuma — monitors in UI
✅ Dockge — logged in; register /opt/monitoring stack (see monitoring-stack.md)
⏳ Kuma email alerts — SMTP via Mailcow alerts@levkine.ca → your inbox (steps in monitoring-stack.md)

Phase 3 — Cal.com (mostly done) ✅

✅ Cal.com deployed in business LXC (id 210, Postgres included)
✅ cal.levkin.ca proxied via Caddy
✅ Booking link live at cal.levkin.ca/ilia/consult with Jitsi location
✅ Email working via cal@levkine.ca SMTP through Mailcow
⏳ Wire Cal.com to Authentik via OIDC (first real SSO connection — do this after Phase 1)
⏳ Update auto.levkin.ca button → cal.levkin.ca/ilia/consult (currently points to placeholder)

Phase 4 — SSO migration (~half a day, staged)

Wire each to Authentik, least-risky first:

Vikunja (OIDC native) — easy, single-user impact
~~Nextcloud~~ — skipped (VM 201 retiring)
Listmonk (OIDC native, admin only) — easy
Mattermost (SAML or OIDC native) — moderate
Mailcow (OIDC) — last, because mail-critical

For each: keep a local admin password as a break-glass account.
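
Before wiring an app, it can help to confirm that its Authentik provider publishes an OIDC discovery document. The sketch below assumes Authentik's per-application URL layout under /application/o/; the helper name and the slug vikunja are hypothetical (the slug is whatever the application is called in Authentik).

```shell
# Build the OIDC discovery URL for an Authentik application slug.
# Usage: authentik_discovery_url <application-slug>
authentik_discovery_url() {
    printf 'https://auth.levkin.ca/application/o/%s/.well-known/openid-configuration\n' "$1"
}

# Fetch it before touching the app's own config, for example:
#   curl -fsS "$(authentik_discovery_url vikunja)"
```

If the fetch fails, the problem is on the Authentik or Caddy side, and the app's local admin login is still untouched.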

Phase 5 — Family / personal wins (~1 evening)

Immich in media VM — install mobile apps for you and family, enable auto-upload. Face recognition runs in background; "my kids 2024" works within a couple days.
Skip PhotoPrism — Immich covers it.

Phase 6 — Business / consulting (~1–2 evenings)

Crater in business LXC — tax rates, company info, Stripe integration if you want online payment
Beszel hub in monitoring LXC + agents on each LXC — one dashboard for resource usage

Phase 7 — Automation depth (ongoing)

Only when you have a real use case:

Huginn in automation — first agent: competitor pages, kosher product availability, grant deadlines
Windmill in automation — first script: rewrite an n8n flow with too many code nodes
Flowise in labs — first flow: chat-with-docs against your consulting notes

Phase 8 — Knowledge / research

Outline in productivity LXC — client-facing wiki + your notes
Linkwarden in productivity LXC — bookmarks with full-page archive
Paperless-ngx in media — scan and OCR the paper that's accumulating
SiYuan — only if/when PhD or long-form research becomes relevant

Static IP audit

Maintain a host-list.md file (in this Cursor project, alongside this plan) with every LXC/VM, its current IP, its target static IP, and DHCP/static status. Cursor will use this as the source of truth when scripting changes.

Suggested format:

LXC/VM ID	Name	Role	Current IP	Target static IP	DHCP/Static	Notes
210	cal	Cal.com	10.0.10.228/24 (DHCP)	10.0.10.228/24	⏳ static	Convert ASAP
...	...	...	...	...	...	...

Recommended IP plan

Use role-based ranges within 10.0.10.0/24 (or whatever your LAN subnet is) so the layout is scannable:

Range	Reserved for
`.1 - .9`	Network gear (router, switches, APs)
`.10 - .19`	Proxmox host(s) + PBS
`.20 - .39`	Edge / identity / comms (critical infra)
`.40 - .79`	Application LXCs (productivity, automation, business, monitoring)
`.80 - .99`	Media VM(s)
`.100 - .199`	DHCP pool (clients, phones, laptops)
`.200 - .249`	Labs / experimental
`.250 - .254`	Reserved
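
The range plan is easy to encode, which makes it possible to flag hosts sitting outside their intended range when filling in host-list.md. A minimal sketch; the helper name ip_role and the short role labels are my own shorthand for the table rows.

```shell
# Print the reserved role for the last octet of a 10.0.10.x address.
# Ranges mirror the table above; anything outside them is "unassigned".
ip_role() {
    case $1 in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    if   [ "$1" -ge 1 ]   && [ "$1" -le 9 ];   then echo "network gear"
    elif [ "$1" -ge 10 ]  && [ "$1" -le 19 ];  then echo "proxmox"
    elif [ "$1" -ge 20 ]  && [ "$1" -le 39 ];  then echo "critical infra"
    elif [ "$1" -ge 40 ]  && [ "$1" -le 79 ];  then echo "application"
    elif [ "$1" -ge 80 ]  && [ "$1" -le 99 ];  then echo "media"
    elif [ "$1" -ge 100 ] && [ "$1" -le 199 ]; then echo "dhcp pool"
    elif [ "$1" -ge 200 ] && [ "$1" -le 249 ]; then echo "labs"
    elif [ "$1" -ge 250 ] && [ "$1" -le 254 ]; then echo "reserved"
    else echo "unassigned"
    fi
}
```

For example, ip_role 21 prints critical infra and ip_role 228 prints labs.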

How to set static on a Proxmox LXC

Two methods — pick one and stick with it:

Method A — Proxmox CLI (recommended, survives reboots cleanly):

pct set <ID> -net0 name=eth0,bridge=vmbr0,ip=10.0.10.X/24,gw=10.0.10.1
pct reboot <ID>
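
Method A can be wrapped in a small dry-run helper, sketched here under the same assumptions as the commands above (interface eth0, bridge vmbr0, gateway 10.0.10.1). The helper name static_ip_plan is hypothetical, and the last printed command is an added verification step, not part of the original method.

```shell
# Print the commands to pin an LXC to 10.0.10.<octet> and confirm the result.
# Nothing is executed; review the output, then run it on the Proxmox host.
# Usage: static_ip_plan <ctid> <last-octet>
static_ip_plan() {
    printf 'pct set %s -net0 name=eth0,bridge=vmbr0,ip=10.0.10.%s/24,gw=10.0.10.1\n' "$1" "$2"
    printf 'pct reboot %s\n' "$1"
    # Added check: show the IPv4 address the container actually came up with.
    printf 'pct exec %s -- ip -4 -o addr show eth0\n' "$1"
}

static_ip_plan 217 21
```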

Method B — Router DHCP reservation:

Reserve the IP in your router's DHCP table by MAC address. LXC stays "DHCP" technically, but always gets the same IP.
Easier if you have many hosts and one router.
Risk: if the LXC's MAC changes (rebuild from snapshot to new ID), reservation breaks.

Recommendation: Method A (pct set) for everything critical (edge, identity, comms, business). Method B is fine for labs/experimental LXCs.

Audit checklist

List every LXC: pct list
List every VM: qm list
For each, run pct exec <ID> -- ip a (or qm guest exec <ID> -- ip a for VMs, which needs the QEMU guest agent running in the guest) and check whether the IP came from DHCP
Fill in host-list.md
Pick target IPs from the range plan above
Convert one at a time, lowest-risk first (labs → productivity → business → comms → identity → edge)
After each conversion, verify the Caddy reverse-proxy entry still works (curl from outside)
Update host-list.md status column
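
For LXCs, the DHCP-or-static question in the checklist can also be answered from the container config instead of from inside the guest. A sketch, assuming the net0 line of pct config carries either ip=dhcp or ip=<address>/<prefix>; the helper name net0_mode is hypothetical.

```shell
# Read `pct config <id>` output on stdin and classify the net0 interface.
# Prints "dhcp", "static <cidr>", or "none" (no net0 line or no IPv4 setting).
net0_mode() {
    line=$(grep '^net0:' | head -n 1)
    case $line in
        *ip=dhcp*) echo "dhcp" ;;
        *ip=*)
            ip=${line#*ip=}          # drop everything up to the first "ip="
            echo "static ${ip%%,*}"  # keep the value up to the next comma
            ;;
        *) echo "none" ;;
    esac
}

# On the Proxmox host, one line per container:
#   for id in $(pct list | awk 'NR>1 {print $1}'); do
#       printf '%s ' "$id"; pct config "$id" | net0_mode
#   done
```

The output maps directly onto the DHCP/Static column of host-list.md.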

Hosts known to need conversion right now

LXC 210 (cal): was on DHCP at 10.0.10.228/24 and is now pinned to the same address via pct set (see Current state), so it no longer blocks the Caddy migration

Backlog (priority order)

P0 — next batch after Phase 1 admin bootstrap

Umami — analytics on landing pages, 10 min to deploy, immediate signal
Uptime Kuma — monitor what you already have
Dockge — UI over existing compose
Beszel — homelab resource visibility
Mealie — family recipes, simple win

P1 — when ready

Outline — wiki for client docs
Linkwarden — bookmarks with full-page archive
Plane — Jira-lite project management (pair with Mattermost)

P2 — when you have a real need

Crater — invoicing (Phase 6)
Immich — photos (Phase 5)
Paperless-ngx — document scanning (Phase 8)
Huginn — first when you have a monitoring use case
Windmill — when n8n hits limits
Trigger.dev — durable background jobs in code (better fit than Windmill for QA work)
PrivateBin — encrypted paste for sharing secrets with contractors
Addy.io — email aliases
SiYuan — if PhD work picks up
Flowise — labs only, when LLM workflow use case appears

Skip / declined

~~PhotoPrism~~ — Immich covers it
~~Activepieces~~ — you already have n8n
~~Affine / Trilium~~ — picked Outline + SiYuan instead
~~Matrix/Synapse + Element~~ — staying on Mattermost
~~Coolify / Dokploy / CapRover~~ — Dockge is enough; revisit only if writing many custom apps

Backup strategy

Proxmox Backup Server (PBS) or vzdump to a NAS — snapshot each LXC/VM nightly
Critical groups (identity, comms, business): 7 daily + 4 weekly + 12 monthly
Productivity/automation: 7 daily + 4 weekly
Labs: 3 daily, no long retention
Off-site copy of identity and business LXCs — these contain auth and billing data. Encrypted copy to Wasabi or Backblaze B2.

The whole LXC gets snapshotted — much simpler than file-level container backup.

Done on pve10 (2026-05-22): pct snapshot backup-20260522 on LXCs 217 (identity) and 218 (monitoring).
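
The retention tiers can be expressed as vzdump prune settings. This is a sketch, assuming backups go through vzdump with its prune-backups option; the helper name retention_for and the storage name nas are hypothetical, and groups the strategy does not mention (edge, media, monitoring) are deliberately rejected rather than guessed.

```shell
# Map a group name to a vzdump --prune-backups retention string.
# Usage: retention_for <group>
retention_for() {
    case $1 in
        identity|comms|business) echo 'keep-daily=7,keep-weekly=4,keep-monthly=12' ;;
        productivity|automation) echo 'keep-daily=7,keep-weekly=4' ;;
        labs)                    echo 'keep-daily=3' ;;
        *) echo "unknown group: $1" >&2; return 1 ;;
    esac
}

# Example for the identity LXC (storage name "nas" is a placeholder):
#   vzdump 217 --storage nas --mode snapshot --prune-backups "$(retention_for identity)"
```

Note that pct snapshot, as used on 217 and 218, keeps the snapshot on the same storage as the container, while vzdump writes an archive to the target storage.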

Next steps (priority order)

See homelab-status-2026-05-22.md for done vs todo.

#	Task	Effort	Doc
1	Kuma SMTP test in UI	5 min	monitoring-stack.md
2	UniFi DHCP reservations	20 min	unifi-static-dhcp.md
3	Cal.com → Authentik OIDC	1–2 h	Phase 3 above
4	Retire Nextcloud VM 201	30 min	nextcloud-export-2026-05-21.md
5	NAS.SP00 disk replace → Jellyfin	hardware	nas-sp00-drive-failure-report.md
6	Caddy → edge LXC `.20`	~30 min	Phase 1.5

Defer: Nextcloud SSO, Immich, Crater, Beszel until above are done.

Nextcloud decommission (VM 201)

Confirm export in exports/nextcloud-2026-05-21/ is complete
Delete Nextcloud monitor in Kuma
Remove nextcloud.levkin.ca from Caddy VM
Stop VM 201; update host-list.md
After NAS healthy: optional vzdump archive then delete disk

Important rules

Never put Authentik behind itself. auth.levkin.ca is a simple Caddy passthrough with no forward-auth and no fancy dependencies. If Authentik sat behind its own forward-auth and went down, you'd be locked out of the login page you need to fix it.
Vaultwarden stays standalone. It's your break-glass path if Authentik dies. Don't OIDC it.
Keep a local admin password on every SSO-wired app. OIDC integrations break during upgrades — you need to log in to fix them.
Keep a local admin account on the Proxmox host, independent of Authentik and Vaultwarden, with the password written down somewhere physical.
Don't expose admin UIs publicly. Dockge, Beszel, Uptime Kuma admin, n8n editor — use Tailscale or Wireguard for remote access.
Static IPs for every LXC. DHCP will eventually move them and Caddy will break. Set via pct set <id> -net0 ...ip=10.0.10.X/24,gw=... or a router reservation.
Cal.com LXC (210) — static at .228 ✅.
Maintain host-list.md as the single source of truth for IPs. Update it whenever a new LXC/VM is created or migrated.
