# Security Remediation Plan

**Based on:** [security-audit-report.md](security-audit-report.md) (2026-05-20)

**Goal:** Align hosts with `roles/ssh` (keys only, no password SSH) without locking yourself out.

---

## How you should log in (not “ladmin → root” everywhere)

Your inventory uses **different users on purpose**. After hardening, the pattern is:

| Host type | Inventory user | How you work | Root access |
|-----------|----------------|--------------|-------------|
| **Proxmox** (`pve201`, `pve10`) | `root` | `ssh root@10.0.10.201` with **your SSH key** | Direct root (keys only, no password) |
| **Dev / QA** (`dev01`, `git-ci-01`, …) | `ladmin` (or `beast`, `master`) | `ssh ladmin@host` with **key** | `sudo` for admin tasks; Ansible `become: true` |
| **Services** (caddy, jellyfin, …) | often `root` | `ssh root@host` with **key** | Direct root (keys only) |
| **Optional bootstrap** | n/a | `make bootstrap-root-ssh HOST=x` | One-time: key on `ladmin` → `su` to install **root** key → then harden SSH |

**You do not need** “SSH ladmin then su root” on Proxmox if you keep managing them as `root` in inventory. You need **root + SSH key + passwords disabled**.

**You do** use ladmin → sudo on dev/qa boxes where `ansible_user=ladmin`. That is normal: unprivileged (or sudo) login + elevation, not password guessing on root.

**`PermitRootLogin prohibit-password`** means: root may log in **only with a key**, never with a password. It does **not** mean “ban root; use ladmin only.”

**`PasswordAuthentication no`** means: **nobody** (root, ladmin, etc.) can SSH with a password; keys only.

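
To confirm that a host really ends up with the two settings described above, one option is to inspect the effective configuration that `sshd -T` prints rather than reading `sshd_config` by eye. The sketch below is a hypothetical helper (the name `check_sshd_policy` is not part of the repo); it assumes `sshd -T` output is piped in, with lowercase keywords, and it accepts `without-password` because some OpenSSH versions print that older synonym for `prohibit-password`.

```shell
# Hypothetical helper, not part of the repo.
# Reads `sshd -T` style output on stdin and succeeds only if root is
# key-only and password authentication is off.
check_sshd_policy() {
  awk '
    $1 == "permitrootlogin"        { root = $2 }
    $1 == "passwordauthentication" { pw = $2 }
    END {
      printf "permitrootlogin=%s passwordauthentication=%s\n", root, pw
      ok = (root == "prohibit-password" || root == "without-password") && pw == "no"
      exit (ok ? 0 : 1)
    }'
}

# Usage on the host, as root:
#   sshd -T | check_sshd_policy && echo "policy OK"
```

A non-zero exit status means at least one of the two settings differs from the plan, so the helper can gate a later step in a script.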

---

## Phases overview

| Phase | When | Focus |
|-------|------|--------|
| **0: Backup + prep** | Before any change | Snapshots, `sshd` copies, git commit, keys, second SSH session |
| **1: Critical** | Week 1 | Proxmox SSH + 8006, keys everywhere, RAM on 201 |
| **2: High** | Week 1–2 | LXCs SSH, fail2ban, patching, app ports |
| **3: Medium** | Week 2–4 | unattended-upgrades, Ansible `make security`, TLS |
| **4: Low** | Ongoing | rpcbind, naming, stopped CTs, Mac, docs |

---

## Phase 0: Backup (before any hardening)

**Yes, back up first.** SSH and firewall mistakes can lock you out; patches can break services. Use the right backup type per layer.

### What to back up (by layer)

| Layer | What | Method | Rollback if SSH breaks |
|-------|------|--------|-------------------------|
| **Your Mac** | Ansible repo + `~/.ansible-vault-pass` (secure copy) + SSH keys | Time Machine / git commit / copy `~/.ssh` | N/A |
| **Proxmox hosts** | `/etc/ssh/sshd_config`, `/etc/pve/`, firewall rules | Copy files + **Proxmox snapshot** optional | **Console** in web UI (`pct enter` / VM console) |
| **Each LXC/VM** | Full guest state | **Proxmox snapshot** or `vzdump` | Restore snapshot or rollback CT |
| **Dev workstations** | OS + home (if Timeshift installed) | `make timeshift-snapshot HOST=dev02` | `make timeshift-restore` |
| **Central PBS** | n/a | **Not reliable today**: `10.0.10.200` unreachable | Fix PBS later; don’t depend on it for this work |

### 0A: Mac / repo (5 minutes)

```bash
cd ~/Documents/code/ansible
git status
git add -A && git commit -m "Pre-security-hardening baseline"  # if you want a restore point
# Store vault passphrase somewhere safe (password manager), NOT only on disk
# Optional: encrypted copy of ~/.ansible-vault-pass offline
```

### 0B: Proxmox config files (both nodes)

```bash
stamp=$(date +%Y%m%d)  # evaluated once, locally, so both nodes use the same directory name
for pve in 10.0.10.201 10.0.10.10; do
  ssh root@$pve "mkdir -p /root/pre-hardening-$stamp && \
    cp -a /etc/ssh/sshd_config /root/pre-hardening-$stamp/ && \
    cp -a /etc/pve /root/pre-hardening-$stamp/pve-etc 2>/dev/null; \
    ls -la /root/pre-hardening-$stamp/"
done
```

### 0C: Proxmox snapshots (recommended before SSH/firewall on PVE)

**Running LXCs on pve201** (from audit): 301–308, 9101. Snapshot each before `pct exec` SSH changes.

**Running LXCs on pve10:** 210, 215, 216.

```bash
# On pve201: snapshot (fast, local-lvm; needs free space)
ssh root@10.0.10.201 'for id in 301 302 303 304 305 306 307 308 9101; do
  # \$ keeps the remote shell from expanding awk fields; the CT name is the last column of `pct list`
  name=$(pct list | awk -v i=$id "\$1==i {print \$NF}")
  echo "Snapshot vmid=$id ($name)"
  pct snapshot $id pre-ssh-hardening-$(date +%Y%m%d) || echo "FAILED $id"
done'

# On pve10
ssh root@10.0.10.10 'for id in 210 215 216; do
  pct snapshot $id pre-ssh-hardening-$(date +%Y%m%d) || echo "FAILED $id"
done'
```

**Optional full backup** (slower, larger), for important CTs only, if snapshots fail (low disk on 201):

```bash
vzdump <vmid> --storage local --mode snapshot --compress zstd
```

**Check space on pve201 first** (~2.5 GB RAM free; snapshots also need free disk space on `local-lvm`):

```bash
ssh root@10.0.10.201 'pvesm status; free -h'
```

If snapshots fail for lack of space: do **0B only** on PVE, then harden SSH using the **Proxmox console** as the safety net (no snapshot).

### 0D: Inventory VMs with Timeshift (`dev` group)

Only where Timeshift is already installed (e.g. `dev02`):

```bash
make timeshift-snapshot HOST=dev02
make timeshift-list HOST=dev02
```

Not used on Proxmox or most LXCs by default.

### 0E: Export current SSH settings (audit trail)

```bash
mkdir -p ~/security-hardening-backup-$(date +%Y%m%d)
ssh root@10.0.10.201 'bash -s' < scripts/security-audit-ssh.sh > ~/security-hardening-backup-$(date +%Y%m%d)/pve201-ssh.txt
ssh root@10.0.10.10 'bash -s' < scripts/security-audit-ssh.sh > ~/security-hardening-backup-$(date +%Y%m%d)/pve10-ssh.txt
ssh root@10.0.10.201 'bash -s' < scripts/security-audit-lxc-via-pve.sh > ~/security-hardening-backup-$(date +%Y%m%d)/pve201-lxc.txt
```

### Backup exit criteria (do not skip)

- [ ] Git commit (or branch) for ansible repo
- [ ] `sshd_config` (+ optional `/etc/pve`) copied on **both** PVE nodes
- [ ] Proxmox snapshots **or** documented reason skipped (disk/RAM)
- [ ] Second SSH session tested to `pve201` / `pve10`
- [ ] You know how to open **Proxmox → VM/CT → Console** if SSH fails

### Rollback quick reference

| Problem | Rollback |
|---------|----------|
| Bad `sshd_config` on PVE | Console → restore `/root/pre-hardening-*/sshd_config` → `systemctl reload sshd` |
| Bad LXC SSH | `pct rollback <vmid> pre-ssh-hardening-YYYYMMDD` |
| Bad patch on CT | Same snapshot rollback |
| Locked out of LAN on 8006 | Console → disable the datacenter firewall rule |

---

## Phase 0: Prep (after backups)

| # | Task | Command / notes |
|---|------|----------------|
| 0.1 | Confirm vault password file | `~/.ansible-vault-pass` |
| 0.2 | Bootstrap control node | `make bootstrap` |
| 0.3 | Verify key on Proxmox | `ssh -o BatchMode=yes root@10.0.10.201 true` |
| 0.4 | Copy keys to inventory | `make copy-ssh-keys` (or per group) |
| 0.5 | Document admin IP | e.g. `10.0.10.127` for firewall rules |
| 0.6 | Open **second terminal** before changing `sshd` | Test login before closing first session |

**Exit criteria:** Backups done (above) + key login works to `pve201`, `pve10`, and hosts you will harden next.

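
The key-login exit criterion can be checked for several hosts at once by repeating the `BatchMode=yes` test from step 0.3. This is a minimal sketch, not something the repo provides: the function name `check_key_login` and the 5-second `ConnectTimeout` are assumptions, and the example addresses are the two Proxmox nodes named in this plan.

```shell
# Hypothetical helper, not part of the repo.
# Reports key-only SSH reachability for each user@host argument.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_key_login() {
  failed=0
  for target in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true 2>/dev/null; then
      echo "OK   $target"
    else
      echo "FAIL $target"
      failed=1
    fi
  done
  return "$failed"
}

# Example:
#   check_key_login root@10.0.10.201 root@10.0.10.10
```

The function returns non-zero if any host fails, so a `FAIL` line is a signal to fix keys on that host before touching its `sshd_config`.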

---

## Phase 1: Critical

### 1.1 Proxmox SSH (pve201 + pve10)

**Issue:** `PermitRootLogin yes` + `PasswordAuthentication yes`: root is exposed to password brute force.

**Fix (per host, after 0.3):**

```bash
# On pve201 OR pve10: keep existing session open!
sed -i 's/^#*PermitRootLogin.*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config
sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sshd -t && systemctl reload sshd
```

**Verify (new terminal):** `ssh -o BatchMode=yes root@10.0.10.201 true`

**Ansible (later):** dedicated play for `[proxmox]` with `roles/ssh` (today `make security` only targets `dev` playbook).

| Host | Priority |
|------|----------|
| pve201 | P0 |
| pve10 | P0 |

---

### 1.2 Restrict Proxmox UI/API (port 8006)

**Issue:** Anyone on LAN can hit full cluster API.

**Fix (choose one):**

- **A. Proxmox firewall (recommended):** Datacenter → Firewall → add rule: accept `8006` from `10.0.10.0/24` and/or your Mac IP; drop others.
- **B. SSH tunnel only:** no LAN exposure; `ssh -L 8006:127.0.0.1:8006 root@10.0.10.201` → browser `https://127.0.0.1:8006`.

**Do not** block 8006 globally without A or B in place.

---

### 1.3 RAM on pve201 (~2.5 GB free)

**Issue:** New guests or updates risk OOM.

**Fix:**

```bash
ssh root@10.0.10.201 'free -h; pct list'
# Stop non-essential CTs/VMs or migrate workload to pve10
```

Review running guests from `make proxmox-info ALL=true`; stop labs you do not need.

---

### 1.4 Deploy SSH keys to unreachable inventory hosts

**Issue:** Cannot audit or Ansible-manage hosts without keys.

**Order:**

1. `make copy-ssh-key HOST=caddy` (and each `[services]` host)
2. `make bootstrap-root-ssh HOST=listmonk` where root password still works but key does not
3. `make copy-ssh-keys GROUP=qa` for `ladmin` hosts

**Exit criteria:** `make ping` succeeds for each group you will harden in phase 2.

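
The `sed` one-liners this plan uses on `sshd_config` only rewrite a directive that is already present (commented or not); if the line is missing, nothing changes. A slightly more defensive variant is sketched below. It is an illustration under stated assumptions, not the repo's method: the function name `set_sshd_option` is hypothetical, GNU `sed` is assumed (as on Debian-based Proxmox), and a missing directive is prepended because `sshd` uses the first value it finds and a line at the top cannot end up inside a `Match` block. Drop-in files pulled in by an `Include` line can still override the result, so checking with `sshd -T` afterwards remains worthwhile.

```shell
# Hypothetical helper, not part of the repo. Assumes GNU sed.
# Usage: set_sshd_option KEY VALUE FILE
set_sshd_option() {
  key=$1
  value=$2
  file=$3
  if grep -Eq "^#*${key}[[:space:]]" "$file"; then
    # Directive exists (possibly commented out): rewrite it in place.
    sed -i -E "s/^#*${key}[[:space:]].*/${key} ${value}/" "$file"
  else
    # Directive missing: prepend it, writing back through the original
    # file so its owner and mode are preserved.
    tmp=$(mktemp)
    { printf '%s %s\n' "$key" "$value"; cat "$file"; } > "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
  fi
}

# Example on a PVE host (keep a second session open):
#   set_sshd_option PermitRootLogin prohibit-password /etc/ssh/sshd_config
#   set_sshd_option PasswordAuthentication no /etc/ssh/sshd_config
#   sshd -t && systemctl reload sshd
```

Running it twice gives the same file, which makes it safer to repeat across hosts than a blind append.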

---

## Phase 2: High

### 2.1 LXC SSH: disable password auth (all running CTs)

**Issue:** `passwordauthentication yes` on every audited LXC.

**Fix from Proxmox host (no Mac SSH to CT required):**

```bash
# pve201: example for each running VMID
for id in 301 302 303 304 305 306 307 308 9101; do
  pct exec $id -- sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
  # Reload only if the config validates; the unit is named sshd or ssh depending on the distro
  pct exec $id -- bash -c 'sshd -t && (systemctl reload sshd || systemctl reload ssh)'
done

# pve10
for id in 210 215 216; do
  pct exec $id -- sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
  pct exec $id -- bash -c 'sshd -t && (systemctl reload sshd || systemctl reload ssh)'
done
```

**Before disable:** install your key on CTs you need (`make copy-ssh-key HOST=vikanjans`, etc.).

**Note:** CTs already have `permitrootlogin without-password`. Keep that; only turn off passwords.

---

### 2.2 fail2ban on hypervisors

**Issue:** No brute-force protection on SSH (and eventually 8006 if proxied).

```bash
ssh root@10.0.10.201 'apt install -y fail2ban && systemctl enable --now fail2ban'
ssh root@10.0.10.10 'apt install -y fail2ban && systemctl enable --now fail2ban'
```

Optional: extend to high-value LXCs via `roles/monitoring_server` or manual install.

---

### 2.3 Patch backlog

| Target | Pending | Action |
|--------|---------|--------|
| pve201 | ~105 | `apt update && apt upgrade -y` (maintenance window) |
| pve10 | ~92 | same |
| LXCs 303, 306, 307, 9101 | 79–89 | `pct exec <vmid> -- bash -c 'apt update && apt upgrade -y'` |
| caseware, auto (pve10) | ~40 | same |

**Order:** hypervisors first (after snapshot), then LXCs one by one.

---

### 2.4 Application ports on `0.0.0.0`

**Issue:** HTTP services exposed on LAN without TLS/auth.


| LXC / host | Port | Fix |
|------------|------|-----|
| qbit (91) | 8080 | Prefer VPN; or Caddy + auth; bind to internal IP |
| searchXNG (70) | 8080 | Same |
| punimTagFE (121) | 8000 | Behind Caddy; firewall allow only 10.0.10.0/24 |
| vaultwarden (142) | 8080 | Already in inventory: reverse proxy + TLS |
| portfolio | **106:80** (pve10 LXC 219, nginx) | Migrated 2026-05-22; pve201 LXC **306 destroyed** |
| vikunja (159) | 3456 | Proxy via Caddy (`todo.levkin.ca`) |

**Pattern:** The app listens on `127.0.0.1` only when its proxy runs on the same host, or on its internal IP when Caddy runs elsewhere; **Caddy** (`10.0.10.50`) terminates TLS for public URLs in inventory.

---

### 2.5 pve10 infrastructure

| Issue | Fix |
|-------|-----|
| ZFS `NAS.SP00` suspended | `zpool status`; import/clear errors |
| PBS 10.0.10.200 unreachable | Fix network/service or remove stale datastore |
| Load ~30 | Identify heavy VMs; migrate or stop |

---

## Phase 3: Medium

### 3.1 unattended-upgrades

Hypervisors + important LXCs:

```bash
apt install -y unattended-upgrades apt-listchanges
dpkg-reconfigure -plow unattended-upgrades
```

### 3.2 Ansible security roles (by group)

Today `make security` runs `playbooks/development.yml` on **`dev` only**.

**Expand with new/changed playbooks:**

| Group | Playbook idea | Roles |
|-------|---------------|-------|
| `[proxmox]` | `playbooks/infrastructure/proxmox-hardening.yml` | `ssh`, `monitoring_server` |
| `[services]` | extend `playbooks/servers.yml` | `ssh`, `base`, fail2ban |
| `[qa]` | tag run on qa hosts | `ssh` |
| LXCs | optional `pct` + Ansible over SSH after keys | `ssh` |

**Workflow:**

```bash
make check HOST=pve201  # after proxmox play exists
make dev HOST=dev01 --tags security
```

### 3.3 UFW on LXCs

Only **punimTagFE-dev** has UFW today. Template for others:

- Allow 22 from `10.0.10.0/24`
- Allow app port only if needed on LAN
- Default deny incoming

Use `roles/ssh` UFW tasks or Proxmox guest firewall (`firewall=1` on `net0`).

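
The UFW template above can be written down as concrete commands. The sketch below only prints them so they can be reviewed before anything changes on a CT. The function name `ufw_rules` is hypothetical, limiting the rules to TCP is an assumption, and `--force` is included only so that `ufw enable` does not stop at its confirmation prompt.

```shell
# Hypothetical helper, not part of the repo.
# Prints the ufw commands for the template; pass an app port to open it on the LAN.
ufw_rules() {
  lan=10.0.10.0/24
  app_port=${1:-}
  echo "ufw default deny incoming"
  echo "ufw allow from $lan to any port 22 proto tcp"
  if [ -n "$app_port" ]; then
    echo "ufw allow from $lan to any port $app_port proto tcp"
  fi
  echo "ufw --force enable"
}

# Review first, then apply inside the CT as root:
#   ufw_rules 8000
#   ufw_rules 8000 | sh
```

Keeping the SSH rule ahead of `enable` matters: enabling UFW with default deny and no port 22 rule would cut off the session used to apply it.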
### 3.4 Align names / inventory

| Proxmox name | Ansible | Action |
|--------------|---------|--------|
| punimTagFE-dev | punimTag-dev | Rename CT or update `app_projects` name |
| vikunja-debian | vikanjans | OK (IP 159) |
| qbit-debian | qBittorrent | OK (IP 91) |

### 3.5 Mac (control machine)

| Issue | Fix |
|-------|-----|
| Firewall off | System Settings → Firewall → On |
| FileVault off | Enable FileVault |
| Docker on `*:3000` | Bind to `127.0.0.1` unless LAN needed |

---

## Phase 4: Low

| Item | Fix |
|------|-----|
| rpcbind (111) on pve201 / 9101 | Disable if unused: `systemctl disable --now rpcbind.socket rpcbind.service` |
| X11Forwarding on Proxmox | Set `X11Forwarding no` in `sshd_config` |
| Stopped CTs 9001, 9401 | Leave stopped or destroy if unused |
| `make security-audit` target | Add a Makefile target that runs the audit scripts and appends to the report |
| Quarterly re-audit | Re-run `scripts/security-audit-lxc-via-pve.sh` |

---

## Suggested calendar

| Week | Critical | High | Medium |
|------|----------|------|--------|
| **1** | 0.x prep, 1.1 SSH both PVE, 1.2 firewall 8006, 1.4 keys | 2.1 LXC passwords off (after keys), 2.2 fail2ban | n/a |
| **2** | 1.3 RAM 201 | 2.3 patch PVE + LXCs, 2.4 Caddy for 8080 services | 3.1 unattended-upgrades |
| **3** | n/a | 2.5 pve10 ZFS/PBS/load | 3.2 Ansible plays for proxmox + services |
| **4** | n/a | n/a | 3.3 UFW, 3.4 naming, 3.5 Mac |

---

## Rollback (if locked out of SSH)

- Proxmox: use **console** in web UI (or physical/IPMI) → edit `/etc/ssh/sshd_config` → `PasswordAuthentication yes` temporarily → reload sshd.
- LXC: `pct enter <vmid>` from PVE host.

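
As an alternative to re-enabling passwords by hand, the console session can put back the copy saved in step 0B. The sketch below assumes the `/root/pre-hardening-YYYYMMDD/` layout from that step. The function name `restore_sshd_config` is hypothetical, and it picks the newest backup by relying on the date-stamped directory names sorting in order.

```shell
# Hypothetical helper, not part of the repo.
# Usage: restore_sshd_config [BACKUP_ROOT] [TARGET]
restore_sshd_config() {
  backup_root=${1:-/root}
  target=${2:-/etc/ssh/sshd_config}
  latest=
  # Globs expand in sorted order, so the last match is the newest date stamp.
  for dir in "$backup_root"/pre-hardening-*/; do
    if [ -f "${dir}sshd_config" ]; then
      latest=$dir
    fi
  done
  if [ -z "$latest" ]; then
    echo "no sshd_config backup found under $backup_root" >&2
    return 1
  fi
  cp -a "${latest}sshd_config" "$target" && echo "restored ${latest}sshd_config -> $target"
}

# From the Proxmox console, as root:
#   restore_sshd_config && sshd -t && systemctl reload sshd
```

Running `sshd -t` before the reload keeps a damaged backup from replacing one broken configuration with another.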

---

## Tracking checklist

Copy into your issue tracker or tick in [security-audit-report.md](security-audit-report.md):

**Backup (Phase 0, before everything)**

- [ ] Git commit / branch for ansible repo
- [ ] PVE `sshd_config` backup on 201 + 10
- [ ] Proxmox CT snapshots (or vzdump) on critical LXCs
- [ ] Audit outputs saved locally (`security-hardening-backup-*`)
- [ ] Console access tested in Proxmox UI

**Critical**

- [ ] pve201 SSH: prohibit-password + no passwords
- [ ] pve10 SSH: same
- [ ] 8006 restricted to admin subnet/IP
- [ ] SSH keys on all inventory hosts
- [ ] pve201 RAM relieved

**High**

- [ ] All running LXCs: PasswordAuthentication no
- [ ] fail2ban on pve201 + pve10
- [ ] Patch pve201, pve10, LXCs with 40+ upgrades
- [ ] qBit / searchXNG / punimTag / vaultwarden port exposure reduced
- [ ] pve10 ZFS + PBS investigated

**Medium**

- [ ] unattended-upgrades on PVE + key LXCs
- [ ] `make security` (or new plays) for proxmox, services, qa
- [ ] UFW on critical LXCs
- [ ] Mac firewall + FileVault

**Low**

- [ ] rpcbind, X11, audit Makefile, naming cleanup

---

## Quick reference: your login after plan

```bash
# Proxmox
ssh root@10.0.10.201    # key only

# Dev / QA
ssh ladmin@10.0.10.223  # key only → sudo -i when you need root

# Services (inventory root)
ssh root@10.0.10.50     # key only

# Proxmox UI (if 8006 restricted)
ssh -L 8006:127.0.0.1:8006 root@10.0.10.201
# → https://127.0.0.1:8006
```