Security Remediation Plan
Based on: security-audit-report.md (2026-05-20)
Goal: Align hosts with roles/ssh (keys only, no password SSH) without locking yourself out.
How you should log in (not “ladmin → root” everywhere)
Your inventory uses different users on purpose. After hardening, the pattern is:
| Host type | Inventory user | How you work | Root access |
|---|---|---|---|
| Proxmox (pve201, pve10) | root | ssh root@10.0.10.201 with your SSH key | Direct root (keys only, no password) |
| Dev / QA (dev01, git-ci-01, …) | ladmin (or beast, master) | ssh ladmin@host with key | sudo for admin tasks; Ansible become: true |
| Services (caddy, jellyfin, …) | often root | ssh root@host with key | Direct root (keys only) |
| Optional bootstrap | n/a | make bootstrap-root-ssh HOST=x | One-time: key on ladmin → su to install root key → then harden SSH |
You do not need “SSH ladmin then su root” on Proxmox if you keep managing them as root in inventory — you need root + SSH key + passwords disabled.
You do use ladmin → sudo on dev/qa boxes where ansible_user=ladmin. That is normal: unprivileged (or sudo) login + elevation, not password guessing on root.
PermitRootLogin prohibit-password means: root may log in only with a key, never with a password. It does not mean “ban root; use ladmin only.”
PasswordAuthentication no means: nobody (root, ladmin, etc.) can SSH with a password — keys only.
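To confirm that both settings are actually in effect, it helps to check the effective configuration rather than the file. A minimal sketch, assuming the output of sshd -T is piped in; check_sshd_policy is a hypothetical helper name, not something that exists in the repo. It also accepts without-password, because some OpenSSH versions print that alias for prohibit-password.

```shell
# Hypothetical helper: reads `sshd -T` style output on stdin and
# reports whether root is key-only and password logins are disabled.
check_sshd_policy() {
  awk '
    $1 == "permitrootlogin"        { root = $2 }
    $1 == "passwordauthentication" { pass = $2 }
    END {
      ok = 1
      if (root != "prohibit-password" && root != "without-password") {
        print "FAIL permitrootlogin=" root; ok = 0
      }
      if (pass != "no") {
        print "FAIL passwordauthentication=" pass; ok = 0
      }
      if (ok) print "OK keys-only"
      exit !ok
    }'
}

# Usage (run from the Mac, needs root on the target):
#   ssh root@10.0.10.201 'sshd -T' | check_sshd_policy
```

The function exits non-zero on any failure, so it can gate a later step in a script.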
Phases overview
| Phase | When | Focus |
|---|---|---|
| 0 — Backup + prep | Before any change | Snapshots, sshd copies, git commit, keys, second SSH session |
| 1 — Critical | Week 1 | Proxmox SSH + 8006, keys everywhere, RAM on 201 |
| 2 — High | Week 1–2 | LXCs SSH, fail2ban, patching, app ports |
| 3 — Medium | Week 2–4 | unattended-upgrades, Ansible make security, TLS |
| 4 — Low | Ongoing | rpcbind, naming, stopped CTs, Mac, docs |
Phase 0 — Backup (before any hardening)
Yes — back up first. SSH and firewall mistakes can lock you out; patches can break services. Use the right backup type per layer.
What to back up (by layer)
| Layer | What | Method | Rollback if SSH breaks |
|---|---|---|---|
| Your Mac | Ansible repo + ~/.ansible-vault-pass (secure copy) + SSH keys | Time Machine / git commit / copy ~/.ssh | N/A |
| Proxmox hosts | /etc/ssh/sshd_config, /etc/pve/, firewall rules | Copy files + Proxmox snapshot optional | Console in web UI (pct enter / VM console) |
| Each LXC/VM | Full guest state | Proxmox snapshot or vzdump | Restore snapshot or rollback CT |
| Dev workstations | OS + home (if Timeshift installed) | make timeshift-snapshot HOST=dev02 | make timeshift-restore |
| Central PBS | N/A | Not reliable today (10.0.10.200 unreachable) | Fix PBS later; don’t depend on it for this work |
0A — Mac / repo (5 minutes)
cd ~/Documents/code/ansible
git status
git add -A && git commit -m "Pre-security-hardening baseline" # if you want a restore point
# Store vault passphrase somewhere safe (password manager), NOT only on disk
# Optional: encrypted copy of ~/.ansible-vault-pass offline
0B — Proxmox: config files (both nodes)
for pve in 10.0.10.201 10.0.10.10; do
ssh root@$pve "mkdir -p /root/pre-hardening-$(date +%Y%m%d) && \
cp -a /etc/ssh/sshd_config /root/pre-hardening-$(date +%Y%m%d)/ && \
cp -a /etc/pve /root/pre-hardening-$(date +%Y%m%d)/pve-etc 2>/dev/null; \
ls -la /root/pre-hardening-$(date +%Y%m%d)/"
done
0C — Proxmox: snapshots (recommended before SSH/firewall on PVE)
Running LXCs on pve201 (from audit): 301–308, 9101 — snapshot each before pct exec SSH changes.
Running LXCs on pve10: 210, 215, 216.
# On pve201 — snapshot (fast, local-lvm; needs free space)
ssh root@10.0.10.201 'for id in 301 302 303 304 305 306 307 308 9101; do
name=$(pct list | awk -v i="$id" "\$1==i {print \$NF}")
echo "Snapshot vmid=$id ($name)"
pct snapshot $id pre-ssh-hardening-$(date +%Y%m%d) || echo "FAILED $id"
done'
# On pve10
ssh root@10.0.10.10 'for id in 210 215 216; do
pct snapshot $id pre-ssh-hardening-$(date +%Y%m%d) || echo "FAILED $id"
done'
Optional full backup (slower, larger) — important CTs only if snapshots fail (low disk on 201):
vzdump <vmid> --storage local --mode snapshot --compress zstd
Check space on pve201 first (only ~2.5 GB RAM is free, and a snapshot needs free space on local-lvm):
ssh root@10.0.10.201 'pvesm status; free -h'
If snapshots fail for lack of space: do 0B only on PVE, then harden SSH using Proxmox console as safety net (no snapshot).
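The space check can be turned into a number you can compare before starting the snapshot loop. This is a sketch only: it assumes pvesm status prints the available space in KiB in the sixth column (Name, Type, Status, Total, Used, Available, %), which is worth confirming on your Proxmox version. storage_avail_gib is a hypothetical helper name.

```shell
# Hypothetical helper: reads `pvesm status` output on stdin and prints
# the available space of the named storage in whole GiB.
# Assumes column 6 is "Available" in KiB.
storage_avail_gib() {
  awk -v name="$1" '
    $1 == name { printf "%d\n", $6 / 1024 / 1024; found = 1 }
    END        { exit !found }   # non-zero exit if the storage is not listed
  '
}

# Usage:
#   ssh root@10.0.10.201 pvesm status | storage_avail_gib local-lvm
```

If the result is only a few GiB, fall back to the config-file backup from 0B and rely on the console as the safety net.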
0D — Inventory VMs with Timeshift (dev group)
Only where Timeshift is already installed (e.g. dev02):
make timeshift-snapshot HOST=dev02
make timeshift-list HOST=dev02
Not used on Proxmox or most LXCs by default.
0E — Export current SSH settings (audit trail)
mkdir -p ~/security-hardening-backup-$(date +%Y%m%d)
ssh root@10.0.10.201 'bash -s' < scripts/security-audit-ssh.sh > ~/security-hardening-backup-$(date +%Y%m%d)/pve201-ssh.txt
ssh root@10.0.10.10 'bash -s' < scripts/security-audit-ssh.sh > ~/security-hardening-backup-$(date +%Y%m%d)/pve10-ssh.txt
ssh root@10.0.10.201 'bash -s' < scripts/security-audit-lxc-via-pve.sh > ~/security-hardening-backup-$(date +%Y%m%d)/pve201-lxc.txt
Backup exit criteria (do not skip)
- Git commit (or branch) for ansible repo
- sshd_config (+ optional /etc/pve) copied on both PVE nodes
- Proxmox snapshots or documented reason skipped (disk/RAM)
- Second SSH session tested to pve201/pve10
- You know how to open Proxmox → VM/CT → Console if SSH fails
Rollback quick reference
| Problem | Rollback |
|---|---|
| Bad sshd_config on PVE | Console → restore /root/pre-hardening-*/sshd_config → systemctl reload sshd |
| Bad LXC SSH | pct rollback <vmid> pre-ssh-hardening-YYYYMMDD |
| Bad patch on CT | Same snapshot rollback |
| Locked out of LAN on 8006 | Console → disable the datacenter firewall rule (or stop the firewall with pve-firewall stop) |
Phase 0 — Prep (after backups)
| # | Task | Command / notes |
|---|---|---|
| 0.1 | Confirm vault password file | ~/.ansible-vault-pass |
| 0.2 | Bootstrap control node | make bootstrap |
| 0.3 | Verify key on Proxmox | ssh -o BatchMode=yes root@10.0.10.201 true |
| 0.4 | Copy keys to inventory | make copy-ssh-keys (or per group) |
| 0.5 | Document admin IP | e.g. 10.0.10.127 for firewall rules |
| 0.6 | Open second terminal before changing sshd | Test login before closing first session |
Exit criteria: Backups done (above) + key login works to pve201, pve10, and hosts you will harden next.
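The key-login part of the exit criteria can be checked for several hosts in one pass. A minimal sketch: check_key_logins and the SSH_BIN override are hypothetical names introduced here so the loop can be exercised with a stub; by default it calls the real ssh with BatchMode=yes, so it never falls back to a password prompt.

```shell
# Hypothetical helper: tries a key-only login to each user@host argument
# and prints OK/FAIL per target. Returns 1 if any target failed.
check_key_logins() {
  failed=0
  for target in "$@"; do
    if ${SSH_BIN:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$target" true 2>/dev/null; then
      echo "OK $target"
    else
      echo "FAIL $target"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   check_key_logins root@10.0.10.201 root@10.0.10.10
```

Any FAIL line means that host still needs a key installed before you turn passwords off.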
Phase 1 — Critical
1.1 Proxmox SSH (pve201 + pve10)
Issue: PermitRootLogin yes + PasswordAuthentication yes — password brute force on root.
Fix (per host, after 0.3):
# On pve201 OR pve10 — keep existing session open!
sed -i 's/^#*PermitRootLogin.*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config
sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sshd -t && systemctl reload sshd
Verify (new terminal): ssh -o BatchMode=yes root@10.0.10.201 true
Ansible (later): dedicated play for [proxmox] with roles/ssh (today make security only targets dev playbook).
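The sed lines above only change a setting that already appears in the file (commented or not). A slightly more defensive sketch appends the setting when no such line exists; set_sshd_option is a hypothetical helper, not part of the repo. Note as an assumption to verify: on Debian-based hosts a drop-in under /etc/ssh/sshd_config.d/ may be included first and win over the main file, so check the effective value with sshd -T afterwards.

```shell
# Hypothetical helper: set KEY to VALUE in FILE.
# Rewrites every existing "KEY ..." or "#KEY ..." line, or appends
# the setting if the key is not present at all.
set_sshd_option() {
  file=$1; key=$2; value=$3
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    $0 ~ ("^#*" k "([ \t]|$)") { print k " " v; found = 1; next }
    { print }
    END { if (!found) print k " " v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file mode and owner
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Usage (on the host, with a second session open):
#   set_sshd_option /etc/ssh/sshd_config PermitRootLogin prohibit-password
#   set_sshd_option /etc/ssh/sshd_config PasswordAuthentication no
#   sshd -t && systemctl reload sshd
```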
| Host | Priority |
|---|---|
| pve201 | P0 |
| pve10 | P0 |
1.2 Restrict Proxmox UI/API (port 8006)
Issue: Anyone on LAN can hit full cluster API.
Fix (choose one):
- A (Proxmox firewall, recommended): Datacenter → Firewall → add rule: accept 8006 from 10.0.10.0/24 and/or your Mac IP; drop others.
- B (SSH tunnel only): no LAN exposure; ssh -L 8006:127.0.0.1:8006 root@10.0.10.201 → browser https://127.0.0.1:8006.
Do not block 8006 globally without A or B in place.
1.3 RAM on pve201 (~2.5 GB free)
Issue: New guests or updates risk OOM.
Fix:
ssh root@10.0.10.201 'free -h; pct list'
# Stop non-essential CTs/VMs or migrate workload to pve10
Review running guests from make proxmox-info ALL=true; stop labs you do not need.
1.4 Deploy SSH keys to unreachable inventory hosts
Issue: Cannot audit or Ansible-manage hosts without keys.
Order:
1. make copy-ssh-key HOST=caddy (and each [services] host)
2. make bootstrap-root-ssh HOST=listmonk where root password still works but key does not
3. make copy-ssh-keys GROUP=qa for ladmin hosts
Exit criteria: make ping succeeds for each group you will harden in phase 2.
Phase 2 — High
2.1 LXC SSH — disable password auth (all running CTs)
Issue: passwordauthentication yes on every audited LXC.
Fix from Proxmox host (no Mac SSH to CT required):
# pve201: example for each running VMID
for id in 301 302 303 304 305 306 307 308 9101; do
  pct exec $id -- sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
  # Reload only if the config validates; the unit is sshd or ssh depending on the distro
  pct exec $id -- sh -c 'sshd -t && (systemctl reload sshd || systemctl reload ssh)' || echo "FAILED $id"
done
# pve10
for id in 210 215 216; do
  pct exec $id -- sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
  pct exec $id -- sh -c 'sshd -t && (systemctl reload sshd || systemctl reload ssh)' || echo "FAILED $id"
done
Before disable: install your key on CTs you need (make copy-ssh-key HOST=vikanjans, etc.).
Note: CTs already have permitrootlogin without-password — keep that; only turn off passwords.
2.2 fail2ban on hypervisors
Issue: No brute-force protection on SSH (and eventually 8006 if proxied).
ssh root@10.0.10.201 'apt install -y fail2ban && systemctl enable --now fail2ban'
ssh root@10.0.10.10 'apt install -y fail2ban && systemctl enable --now fail2ban'
Optional: extend to high-value LXCs via roles/monitoring_server or manual install.
2.3 Patch backlog
| Target | Pending | Action |
|---|---|---|
| pve201 | ~105 | apt update && apt dist-upgrade -y (maintenance window; Proxmox recommends dist-upgrade over plain upgrade) |
| pve10 | ~92 | same |
| LXCs 303, 306, 307, 9101 | 79–89 | pct exec <id> -- sh -c 'apt update && apt upgrade -y' (without sh -c, the part after && runs on the hypervisor) |
| caseware, auto (pve10) | ~40 | same |
Order: hypervisors first (after snapshot), then LXCs one by one.
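To re-measure the backlog before and after each maintenance window, the pending count can be derived from a simulated upgrade. A small sketch, assuming the usual apt-get -s output in which each package to be upgraded appears on a line starting with Inst; count_pending_upgrades is a hypothetical helper name.

```shell
# Hypothetical helper: reads `apt-get -s upgrade` (or dist-upgrade)
# output on stdin and prints the number of packages that would be installed.
count_pending_upgrades() {
  awk '/^Inst / { n++ } END { print n + 0 }'
}

# Usage:
#   ssh root@10.0.10.201 'apt-get -s dist-upgrade' | count_pending_upgrades
#   pct exec <vmid> -- apt-get -s upgrade | count_pending_upgrades   # on the PVE host
```

The simulation changes nothing on the target, so it is safe to run outside a maintenance window.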
2.4 Application ports on 0.0.0.0
Issue: HTTP services exposed on LAN without TLS/auth.
| LXC / host | Port | Fix |
|---|---|---|
| qbit (91) | 8080 | Prefer VPN; or Caddy + auth; bind to internal IP |
| searchXNG (70) | 8080 | Same |
| punimTagFE (121) | 8000 | Behind Caddy; firewall allow only 10.0.10.0/24 |
| vaultwarden (142) | 8080 | Already in inventory — reverse proxy + TLS |
| portfolio | 106:80 (pve10 LXC 219, nginx) | Migrated 2026-05-22; pve201 LXC 306 destroyed |
| vikunja (159) | 3456 | Proxy via Caddy (todo.levkin.ca) |
Pattern: App listens 127.0.0.1 only; Caddy (10.0.10.50) terminates TLS for public URLs in inventory.
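To see which services still break this pattern, you can list the TCP ports bound to every interface. A minimal sketch, assuming ss -H -tln output (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port); wildcard_listeners is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ss -H -tln` output on stdin and prints the
# ports that listen on all interfaces (0.0.0.0, * or [::]), sorted and unique.
wildcard_listeners() {
  awk '
    $1 == "LISTEN" && $4 ~ /^(0\.0\.0\.0|\*|\[::\]):[0-9]+$/ {
      sub(/^.*:/, "", $4)   # keep only the port
      print $4
    }
  ' | sort -nu
}

# Usage (on the PVE host):
#   pct exec <vmid> -- ss -H -tln | wildcard_listeners
```

Port 22 will normally show up here; the ones to chase are the application ports from the table above.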
2.5 pve10 infrastructure
| Issue | Fix |
|---|---|
| ZFS NAS.SP00 suspended | zpool status; import/clear errors |
| PBS 10.0.10.200 unreachable | Fix network/service or remove stale datastore |
| Load ~30 | Identify heavy VMs; migrate or stop |
Phase 3 — Medium
3.1 unattended-upgrades
Hypervisors + important LXCs:
apt install -y unattended-upgrades apt-listchanges
dpkg-reconfigure -plow unattended-upgrades
3.2 Ansible security roles (by group)
Today make security runs playbooks/development.yml on dev only.
Expand with new/changed playbooks:
| Group | Playbook idea | Roles |
|---|---|---|
| [proxmox] | playbooks/infrastructure/proxmox-hardening.yml | ssh, monitoring_server |
| [services] | extend playbooks/servers.yml | ssh, base, fail2ban |
| [qa] | tag run on qa hosts | ssh |
| LXCs | optional pct + Ansible over SSH after keys | ssh |
Workflow:
make check HOST=pve201 # after proxmox play exists
make dev HOST=dev01 --tags security
3.3 UFW on LXCs
Only punimTagFE-dev has UFW today. Template for others:
- Allow 22 from 10.0.10.0/24
- Allow app port only if needed on LAN
- Default deny incoming
Use roles/ssh UFW tasks or Proxmox guest firewall (firewall=1 on net0).
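If you apply the template by hand rather than through the role, the three rules map to a short list of ufw commands. The sketch below only prints the commands so you can review them first; ufw_plan is a hypothetical helper name, and the default-allow-outgoing line is an added assumption rather than part of the template.

```shell
# Hypothetical helper: prints the ufw commands for the template above.
# $1 = admin subnet, $2 = optional app port to allow from that subnet.
ufw_plan() {
  subnet=$1
  app_port=${2:-}
  echo "ufw default deny incoming"
  echo "ufw default allow outgoing"   # assumption: outbound stays open
  echo "ufw allow from $subnet to any port 22 proto tcp"
  if [ -n "$app_port" ]; then
    echo "ufw allow from $subnet to any port $app_port proto tcp"
  fi
  echo "ufw --force enable"
}

# Usage: review the output, then run it inside the CT as root:
#   ufw_plan 10.0.10.0/24 8000
#   ufw_plan 10.0.10.0/24 8000 | sh
```

The SSH allow rule is printed before the enable line so that enabling the firewall does not cut off the subnet you are connecting from.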
3.4 Align names / inventory
| Proxmox name | Ansible | Action |
|---|---|---|
| punimTagFE-dev | punimTag-dev | Rename CT or update app_projects name |
| vikunja-debian | vikanjans | OK (IP 159) |
| qbit-debian | qBittorrent | OK (IP 91) |
3.5 Mac (control machine)
| Issue | Fix |
|---|---|
| Firewall off | System Settings → Firewall → On |
| FileVault off | Enable FileVault |
| Docker on *:3000 | Bind to 127.0.0.1 unless LAN needed |
Phase 4 — Low
| Item | Fix |
|---|---|
| rpcbind (111) on pve201 / 9101 | Disable if unused: systemctl disable --now rpcbind.socket rpcbind.service |
| X11Forwarding on Proxmox | Set no in sshd |
| Stopped CTs 9001, 9401 | Leave stopped or destroy if unused |
| make security-audit target | Add Makefile → runs audit scripts, appends to report |
| Quarterly re-audit | Re-run scripts/security-audit-lxc-via-pve.sh |
Suggested calendar
| Week | Critical | High | Medium |
|---|---|---|---|
| 1 | 0.x prep, 1.1 SSH both PVE, 1.2 firewall 8006, 1.4 keys | 2.1 LXC passwords off (after keys), 2.2 fail2ban | — |
| 2 | 1.3 RAM 201 | 2.3 patch PVE + LXCs, 2.4 Caddy for 8080 services | 3.1 unattended-upgrades |
| 3 | — | 2.5 pve10 ZFS/PBS/load | 3.2 Ansible plays for proxmox + services |
| 4 | — | — | 3.3 UFW, 3.4 naming, 3.5 Mac |
Rollback (if locked out of SSH)
- Proxmox: use console in web UI (or physical/IPMI) → edit /etc/ssh/sshd_config → PasswordAuthentication yes temporarily → reload sshd.
- LXC: pct enter <vmid> from PVE host.
Tracking checklist
Copy into your issue tracker or tick in security-audit-report.md:
Backup (Phase 0 — before everything)
- Git commit / branch for ansible repo
- PVE sshd_config backup on 201 + 10
- Proxmox CT snapshots (or vzdump) on critical LXCs
- Audit outputs saved locally (security-hardening-backup-*)
- Console access tested in Proxmox UI
Critical
- pve201 SSH: prohibit-password + no passwords
- pve10 SSH: same
- 8006 restricted to admin subnet/IP
- SSH keys on all inventory hosts
- pve201 RAM relieved
High
- All running LXCs: PasswordAuthentication no
- fail2ban on pve201 + pve10
- Patch pve201, pve10, LXCs with 40+ upgrades
- qBit / searchXNG / punimTag / vaultwarden port exposure reduced
- pve10 ZFS + PBS investigated
Medium
- unattended-upgrades on PVE + key LXCs
- make security (or new plays) for proxmox, services, qa
- UFW on critical LXCs
- Mac firewall + FileVault
Low
- rpcbind, X11, audit Makefile, naming cleanup
Quick reference: your login after plan
# Proxmox
ssh root@10.0.10.201 # key only
# Dev / QA
ssh ladmin@10.0.10.223 # key only → sudo -i when you need root
# Services (inventory root)
ssh root@10.0.10.50 # key only
# Proxmox UI (if 8006 restricted)
ssh -L 8006:127.0.0.1:8006 root@10.0.10.201
# → https://127.0.0.1:8006