# Security Audit Report

**Last audit:** 2026-05-23 (re-run after SSH keys + `make maintenance`)
**Previous audit:** 2026-05-20
**Auditor:** `scripts/security-audit-*.sh`, Ansible `maintenance` + `maintenance_cron` roles
**Repo baseline** (`roles/ssh/defaults/main.yml`): `PermitRootLogin prohibit-password`, `PasswordAuthentication no`, UFW enabled.

---

## 2026-05-23 — Actions completed

| Action | Status |
|--------|--------|
| SSH keys → caseware, auto, cal, vikunja, mailcow, listmonk | ✅ All six reachable as `root` |
| SSH keys → mailcow/listmonk VMs | ✅ Via brief VM shutdown + disk inject on pve201 (no guest agent) |
| Inventory rename `vikanjans` → `vikunja` | ✅ `hosts` + `proxmox_vmid=301` |
| `apt upgrade` fleet (skip reboot) | ✅ 14 hosts via Ansible; auto via `pct exec` on pve10 |
| Tier 1 cron (journal + apt) | ✅ `roles/maintenance_cron` on PVE, sites, comms, ansible, hermes, etc. |
| Tier 2 cron (docker prune) | ✅ identity, monitoring, vikunja; git-ci-01 keeps `docker-prune-ci` |
| VM 104 (GPU-Dev) RAM 72→64 GiB | ✅ pve201; host free RAM ~1.7→10 GiB |
| Fix broken `host_vars` (ansibleVM, listmonk) | ✅ Plain YAML; old blobs → `*.vault-bak` |
| Vault `vault_*_become_password` + maintenance vaultwardenVM | ✅ 2026-05-23 |
| caddy root SSH + maintenance | ✅ `bootstrap-root-ssh-caddy`; inventory `ansible_user=root` |
| ansibleVM maintenance | ✅ become password in vault |

### Post-maintenance SSH reachability

| Host | SSH | Notes |
|------|-----|-------|
| caseware | ✅ | |
| auto | ✅ | Was slow from laptop earlier; OK after upgrade |
| cal | ✅ | |
| vikunja | ✅ | LXC 301 @ 10.0.10.159 |
| mailcow | ✅ | ~1 min downtime for key inject |
| listmonk | ✅ | ~1 min downtime for key inject |

### Maintenance playbook recap (`skip_reboot=true`)

| Host | Result |
|------|--------|
| pve201, pve10, caseware, cal, vikunja, mailcow, listmonk, identity, monitoring, hermes, levkin, portfolio, git-ci-01, sonarqube-01 | ✅ upgraded |
| caddy | ✅ (as `root`; no `sudo` package on host) |
| ansibleVM | ✅ (`vault_ansiblevm_become_password`) |
| vaultwardenVM | ✅ (`vault_vaultwarden_become_password`) |

### Open security gaps (unchanged until `make security`)

| Control | Fleet status | Risk if fixed wrong |
|---------|--------------|---------------------|
| `PasswordAuthentication yes` | Most LXCs + both PVE | **Low break risk** if SSH keys tested first in a second session |
| `PermitRootLogin yes` | pve201, pve10, sonarqube-01 | Same — use `prohibit-password`, not `no`, if you need root+key |
| fail2ban | Off everywhere | Enabling is safe; may lock you out only if you brute-force yourself |
| UFW | Off (except one dev LXC) | **Medium risk** — wrong rules drop SSH/80/443; apply via Ansible `roles/ssh` after allowlist |
| unattended-upgrades | hermes, ansibleVM only | Safe; schedule reboots separately |
| Proxmox :8006 | Open on LAN | Restrict in PVE firewall — **won't break VMs** |
| Docker on `0.0.0.0` | identity, monitoring, vaultwarden, qBit | Bind to `127.0.0.1` — **can break access** if Caddy route missing; test URL after |
| Tailscale | **Deferred** | Off by choice; remote access via **UniFi VPN** to LAN |

See [Risk explanations (2026-05-23)](#risk-explanations-2026-05-23) and [fail2ban vs password SSH](#fail2ban-vs-password-ssh) below.

---

## GPU-Dev (pve201 VM 104) — Ollama / LLMs

| Resource | Current |
|----------|---------|
| Host | pve201, VMID **104**, `GPU-Dev-Debian` |
| LAN IP | **10.0.10.122** (inventory `devGPU` @ 10.0.30.63 is a different network — use `.122` from LAN) |
| RAM | **64 GiB** guest (~60 GiB available when idle) |
| GPU | **RTX 4080 16 GiB** (PCI passthrough `hostpci0`) |
| Workload | **Ollama** already running (~3.6 GiB VRAM in sample) |

### Getting the most from RAM + GPU

1. **Right-size models to VRAM** — On a 16 GiB 4080, prefer quantised models that fit entirely in VRAM (e.g. 7B–14B Q4/Q5, or 32B Q2/Q3 if you accept quality trade-offs). If a model spills to CPU RAM, throughput drops sharply.
2. **One heavy model at a time** — Ollama loads models on demand; set `OLLAMA_MAX_LOADED_MODELS=1` (or keep only one client) so you do not split 64 GiB RAM + 16 GiB VRAM across several large models.
3. **Parallel requests** — The `OLLAMA_NUM_PARALLEL` default is conservative; raise it only if VRAM headroom exists (watch `nvidia-smi` under load).
4. **Keep guest RAM for KV cache** — With 64 GiB you can run larger context windows; set `OLLAMA_CONTEXT_LENGTH` / model `num_ctx` to what the workload needs rather than to the maximum.
5. **CPU offload only when needed** — Set `num_gpu` so that all layers run on the GPU for speed; partial offload is a fallback for models that do not fit in VRAM, not a tuning knob.
6. **Disk** — Store models on fast local disk (not NFS); `ollama pull` once, then prune old tags periodically (`ollama list`, then `ollama rm <model>` for unused ones).
7. **Proxmox** — Do not enable ballooning on the GPU VM; with PCI passthrough the guest's memory is pinned on the host, so the full 64 GiB stays reserved. Freeing RAM on pve201 meant lowering this VM from 72 to 64 GiB, not overcommitting other guests on pve201.
8. **Optional** — [Open WebUI](https://github.com/open-webui/open-webui) on localhost + Caddy TLS; bind Ollama to `127.0.0.1:11434` only (LAN via VPN).

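
The environment variables named in the list above can be collected in one systemd drop-in. The following is a minimal sketch, assuming Ollama runs as the `ollama.service` systemd unit on the guest; the helper name `render_ollama_override` and the example values are illustrative assumptions, not measured optima for this VM.

```bash
# Print a systemd drop-in for ollama.service (assumed unit name).
# Arguments: max loaded models, parallel requests, context length.
# All defaults are illustrative starting points, not tuned values.
render_ollama_override() {
  local max_models="${1:-1}" parallel="${2:-1}" ctx="${3:-8192}"
  cat <<EOF
[Service]
Environment="OLLAMA_HOST=127.0.0.1:11434"
Environment="OLLAMA_MAX_LOADED_MODELS=${max_models}"
Environment="OLLAMA_NUM_PARALLEL=${parallel}"
Environment="OLLAMA_CONTEXT_LENGTH=${ctx}"
EOF
}

# Preview the drop-in on stdout.
render_ollama_override 1 2 8192

# To apply on the guest (as root):
#   mkdir -p /etc/systemd/system/ollama.service.d
#   render_ollama_override 1 2 8192 > /etc/systemd/system/ollama.service.d/override.conf
#   systemctl daemon-reload && systemctl restart ollama
```

With `OLLAMA_HOST` bound to `127.0.0.1`, clients on the LAN or VPN would need a reverse proxy such as Caddy in front of port 11434; watch `nvidia-smi` after each change before raising the parallel or context values further.
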
**Not in Ansible yet:** add `devGPU` / `10.0.10.122` to inventory when you want playbooks (cron, hardening) on this box.

---

## fail2ban vs password SSH

**What fail2ban does:** After too many failed SSH logins from an IP, it adds a **temporary firewall ban** for that IP (typically 10–60 minutes). It does **not** disable password authentication globally.

**Can passwords stay on if fail2ban is on?** Technically yes: fail2ban only rate-limits brute-force attempts, and passwords remain weaker than keys. Best practice on servers: **keys + `PasswordAuthentication no` + fail2ban** (defence in depth).

**Your Proxmox console fallback:** If you lock yourself out of SSH on a guest, you can still use **Proxmox → VM → Console**, `pct enter` (LXCs), or `qm guest exec` (VMs with the QEMU guest agent installed) from pve201/pve10. That is a good break-glass path, but it is **not** a substitute for keys on hosts you manage daily: the console is slow and easy to misconfigure under pressure.

**Recommendation:** Enable fail2ban via `make security` with `ignoreip` including `10.0.10.0/24` and your UniFi VPN client subnet. Then disable password SSH once keys work everywhere you care about.

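
The recommendation above can be sketched as a small `jail.local` generator. This assumes the stock `sshd` jail; `10.8.0.0/24` is only a placeholder for the UniFi VPN client subnet, which this report does not state, and `render_jail_local` is a hypothetical helper name. What `make security` actually templates may differ.

```bash
# Print a minimal /etc/fail2ban/jail.local.
# $1 = LAN CIDR, $2 = VPN client CIDR (placeholder value in the example call).
render_jail_local() {
  local lan_cidr="$1" vpn_cidr="$2"
  cat <<EOF
[DEFAULT]
# Never ban loopback, the admin LAN, or VPN clients.
ignoreip = 127.0.0.1/8 ::1 ${lan_cidr} ${vpn_cidr}
bantime  = 10m
findtime = 10m
maxretry = 5

[sshd]
enabled = true
EOF
}

# Preview on stdout; the VPN subnet below is an assumption.
render_jail_local 10.0.10.0/24 10.8.0.0/24

# To apply (as root):
#   render_jail_local 10.0.10.0/24 <vpn-cidr> > /etc/fail2ban/jail.local
#   systemctl restart fail2ban && fail2ban-client status sshd
# Note: on Debian 12 hosts without rsyslog, the sshd jail may also need
# "backend = systemd"; check the fail2ban log if the jail fails to start.
```
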

---

## Risk explanations (2026-05-23)

### Password SSH (`PasswordAuthentication yes`)

**How bad:** High on internet-facing IPs; medium on `10.0.10.0/24` only. Anyone who can reach :22 can try passwords indefinitely (no fail2ban).

**Will fixing break things?** No, if you (1) confirm key login works, (2) set `PasswordAuthentication no`, (3) keep a second SSH session open, (4) reload sshd. Breakage happens only if keys are missing/wrong.

### Root login (`PermitRootLogin yes` on hypervisors)

**How bad:** High — root + password on PVE is full cluster compromise.

**Will fixing break things?** Use `prohibit-password` (keys only), not `no`, unless you have another admin user with sudo. Ansible playbooks expect root on PVE today.

### fail2ban off

**How bad:** Medium — relies on LAN trust; SSH noise from scanners still fills logs.

**Will fixing break things?** Rarely. Tune `ignoreip` to your admin IP/subnet so your own typos don't ban you.

### UFW off

**How bad:** Medium on segmented LAN; high if any host has a public IP.

**Will fixing break things?** **Yes, if misconfigured**: a default-deny policy that does not allow 22 from the admin IP, 80/443 from Caddy, or Docker-published ports you still need will cut off those services. Use Ansible `roles/ssh` (UFW after SSH rules) and test.

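
The allowlist-before-enable order described above can be previewed as a dry run. A minimal sketch, assuming the admin subnet is `10.0.10.0/24` and that web traffic reaches the host only from the `caddy` host at `10.0.10.50` (both taken from this report); `print_ufw_rules` is a hypothetical helper, and the rules that `roles/ssh` actually applies may differ.

```bash
# Print (not run) the ufw commands for a default-deny host.
# $1 = admin CIDR allowed to SSH, $2 = reverse-proxy IP, rest = proxied ports.
print_ufw_rules() {
  local admin_cidr="$1" proxy_ip="$2"
  shift 2
  echo "ufw default deny incoming"
  echo "ufw default allow outgoing"
  # SSH allow rule comes before enabling, so the current session survives.
  echo "ufw allow from ${admin_cidr} to any port 22 proto tcp"
  local port
  for port in "$@"; do
    echo "ufw allow from ${proxy_ip} to any port ${port} proto tcp"
  done
  echo "ufw --force enable"
}

# Review the output, then pipe it to "bash" as root once it looks right.
print_ufw_rules 10.0.10.0/24 10.0.10.50 80 443
```

Ports published by Docker are handled by Docker's own iptables chains and are generally not filtered by UFW rules, so binding those services to `127.0.0.1` remains a separate step.
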
### unattended-upgrades off

**How bad:** Medium — security patches lag until manual maintenance.

**Will fixing break things?** Usually no. Kernel updates may require reboot; use `Unattended-Upgrade::Automatic-Reboot "false"` until you want reboot windows.

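
As a sketch of the no-automatic-reboot setup described above: the two APT snippets below enable the periodic run and keep reboots manual. The override file name `52unattended-upgrades-local` is an arbitrary choice for illustration, and the helper names are hypothetical; `reboot_pending` assumes the Debian convention of a `/var/run/reboot-required` flag file.

```bash
# Content for /etc/apt/apt.conf.d/20auto-upgrades: enable the daily run.
render_auto_upgrades() {
  cat <<'EOF'
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
EOF
}

# Content for a local override file (name is an assumption), e.g.
# /etc/apt/apt.conf.d/52unattended-upgrades-local: never reboot automatically.
render_no_reboot() {
  cat <<'EOF'
Unattended-Upgrade::Automatic-Reboot "false";
EOF
}

# Succeeds when the reboot flag file exists (path can be overridden for testing).
reboot_pending() {
  [ -e "${1:-/var/run/reboot-required}" ]
}

render_auto_upgrades
render_no_reboot
if reboot_pending; then
  echo "reboot required: schedule a maintenance window"
fi
```
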
### Proxmox UI :8006 exposed

**How bad:** **Critical** on untrusted networks — API gives VM/storage control.

**Will fixing break things?** Restricting to `10.0.10.0/24` does not break normal LAN admin access.

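
If the restriction is done with plain iptables on the host rather than the Proxmox firewall UI, the rule set might look like the dry run below. This is a sketch only: `print_8006_rules` is a hypothetical helper, the VPN subnet is not stated in this report and would be passed as an extra argument, and rules added this way do not persist across reboots unless saved separately.

```bash
# Print (not run) iptables rules that allow TCP 8006 only from the given CIDRs.
print_8006_rules() {
  local cidr
  for cidr in "$@"; do
    echo "iptables -A INPUT -p tcp --dport 8006 -s ${cidr} -j ACCEPT"
  done
  # Everything else to 8006 is dropped; other ports are untouched.
  echo "iptables -A INPUT -p tcp --dport 8006 -j DROP"
}

# Review first; run the printed commands as root on pve201 / pve10 afterwards.
print_8006_rules 10.0.10.0/24
```

Keep an SSH session to the node open while testing, so a wrong source range can be removed with `iptables -D` instead of via the console.
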
### HTTP services on all interfaces (8080, 3000, …)

**How bad:** High without TLS/auth at the edge; medium behind Caddy + LAN only.

**Will fixing break things?** **Yes** if you bind to `127.0.0.1` before Caddy `reverse_proxy` is updated. Order: Caddy route → test → then bind Docker to localhost.

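
To see which ports are still bound to every interface before and after the change, the output of `ss -tlnH` can be filtered. A minimal sketch; `wildcard_listeners` is a hypothetical helper that reads `ss` output on stdin and prints the ports listening on `0.0.0.0`, `[::]`, or `*`.

```bash
# Read "ss -tlnH" output on stdin; print unique ports bound to all interfaces.
wildcard_listeners() {
  awk '$1 == "LISTEN" && $4 ~ /^(0\.0\.0\.0|\[::\]|\*):[0-9]+$/ {
         n = split($4, parts, ":")   # port is the last colon-separated field
         print parts[n]
       }' | sort -un
}

# On the host:
#   ss -tlnH | wildcard_listeners
# After the Caddy route is confirmed, publish the container port on loopback only,
# e.g. "docker run -p 127.0.0.1:8080:8080 ..." or, in a compose file,
#   ports: ["127.0.0.1:8080:8080"]
```

Port 22 will normally remain in the list; the goal is for application ports such as 8080 or 3000 to disappear from it once they are bound to localhost.
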
### Remote access (Tailscale deferred)

**Decision:** Tailscale off; use **UniFi site-to-site / VPN** into `10.0.10.0/24` for admin and Ollama/GPU access.

**Security:** Ensure VPN is required for SSH and Proxmox :8006 from outside; do not port-forward :22/:8006 on the router without IP allowlists.

### pve201 RAM (was 97% used)

**How bad:** **Critical** — OOM kills guests, swap thrashing.

**Mitigation done:** VM 104 reduced 73728→65536 MiB (~8 GiB freed on hypervisor). Still tight; consider moving git-ci-01 or other workloads to pve10.

---

## 2026-05-20 — Original audit

**Scope:** Proxmox nodes `pve201` (10.0.10.201) and `pve10` (10.0.10.10), all LXCs via `pct exec`, SSH deep-dive on hypervisors.

---

## Executive summary

| Area | Critical | High | Medium |
|------|----------|------|--------|
| Hypervisors (201, 10) | 2 | 4 | 2 |
| LXCs on 201 (10 running) | 0 | 10 | 8 |
| LXCs on 10 (3 running) | 0 | 3 | 3 |

**Top priorities**

1. Harden **SSH on both Proxmox hosts** (root + passwords currently allowed).
2. Restrict **Proxmox API/UI port 8006** to admin IPs.
3. Disable **password SSH on all LXCs**; deploy keys + `make copy-ssh-keys` for inventory IPs.
4. Patch hosts with **40–105** pending apt upgrades (hypervisors worst).
5. Put **HTTP services** (8080, 8000, qBit, etc.) behind reverse proxy + TLS or bind to internal IPs.

---

## Proxmox hypervisors

### pve201 — 10.0.10.201 (`pve`)

| Resource | Status |
|----------|--------|
| OS | Debian 12, PVE 8.4.16, kernel 6.8.12-18-pve |
| RAM free | ~2.5 GB / 126 GB (**critical**) |
| Pending apt | **105** |
| UFW / fail2ban / unattended-upgrades | **None** |

#### SSH audit (dedicated)

| Setting | Current | Target |
|---------|---------|--------|
| `permitrootlogin` | **yes** | `prohibit-password` |
| `passwordauthentication` | **yes** | `no` |
| `pubkeyauthentication` | yes | yes |
| `maxauthtries` | 6 | 3–4 |
| `x11forwarding` | yes | no (on servers) |
| Root keys | 3 keys in `authorized_keys` | audit/remove unused |

#### Exposed services

| Port | Service | Risk |
|------|---------|------|
| 22 | SSH | Brute-force (no fail2ban) |
| 8006 | Proxmox API/UI | **Critical** — full cluster control |
| 3128 | spiceproxy | Medium |
| 111 | rpcbind | Low — reduce exposure |

#### Fixes (pve201)

```bash
# 1) SSH — prefer Ansible after limiting to your IP
make copy-ssh-key HOST=pve201   # if needed
# Manual quick fix on host:
sed -i 's/^#*PermitRootLogin.*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config
sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sshd -t && systemctl reload sshd

# 2) Proxmox firewall — Datacenter → Firewall → restrict 8006 to 10.0.10.0/24 or admin IP
# Or iptables on host for port 8006

# 3) fail2ban
apt install fail2ban -y
systemctl enable --now fail2ban

# 4) Auto security updates
apt install unattended-upgrades apt-listchanges -y
dpkg-reconfigure -plow unattended-upgrades

# 5) Patch
apt update && apt upgrade -y
```

**Ansible (when ready):** add `pve201` / `pve10` to a `proxmox` group play with `roles/ssh` + `roles/monitoring_server` (fail2ban).
Do **not** lock yourself out — test with second session first.

---

### pve10 — 10.0.10.10 (`PVENAS`)

| Resource | Status |
|----------|--------|
| OS | Debian 13 (trixie), PVE, kernel 6.17.13-3-pve |
| Load | **~30** on 24 CPUs (overloaded) |
| Pending apt | **92** |
| UFW / fail2ban / unattended-upgrades | **None** |
| ZFS `NAS.SP00` | **inactive** (I/O suspended) |
| PBS `PVEBUVD00` → 10.0.10.200:8007 | **unreachable** |

#### SSH audit (dedicated)

Same as pve201: `permitrootlogin yes`, `passwordauthentication yes`, 3 root authorized_keys.

#### Exposed services

| Port | Service | Risk |
|------|---------|------|
| 22 | SSH | High |
| 8006 | Proxmox API/UI | **Critical** |
| 2049, mountd, statd | NFS/RPC | High on LAN |
| 3128 | spiceproxy | Medium |

#### Fixes (pve10)

Same SSH / fail2ban / unattended-upgrades / patch steps as pve201.

Additional:

```bash
# Investigate ZFS pool
zpool status NAS.SP00
# Fix PBS connectivity or remove stale datastore from Proxmox UI
```

---

## LXCs on pve201 (via `pct exec`)

| VMID | Name | IP | Status | SSH root | Password auth | UFW | fail2ban | Upgrades | Public services |
|------|------|-----|--------|----------|---------------|-----|----------|----------|-----------------|
| 301 | vikunja-debian | 10.0.10.159 | running | without-password | **yes** | no | no | 0 | **3456**, 22 |
| 302 | qbit-debian | 10.0.10.91 | running | without-password | **yes** | no | no | 0 | **8080** (qBit), 22 |
| 303 | searchXNG-debian | 10.0.10.70 | running | without-password | **yes** | no | no | **83** | **8080**, 22 |
| 304 | wireguard-debian | 10.0.10.192 | running | without-password | **yes** | no | no | 0 | 22 |
| 305 | kuma-debian | 10.0.10.197 | **stopped** | — | — | — | — | — | replaced by LXC 218 |
| 306 | portfolio | — | **destroyed** | — | — | — | — | — | migrated → pve10 LXC **219** @ `10.0.10.106` (purged 2026-05-22) |
| 307 | jobber-delian | 10.0.10.178 | running | without-password | **yes** | no | no | **83** | **3005**, 22 |
| 308 | stirling-pdf | 10.0.10.43 | running | without-password | **yes** | no | no | 0 | **8080**, 22 |
| 9001 | pote-dev | 10.0.10.114 | **stopped** | — | — | — | — | — | — |
| 9101 | punimTagFE-dev | 10.0.10.121 | running | without-password | **yes** | **active** | no | **89** | **8000**, 111, 22 |
| 9401 | mirrormatch-dev | 10.0.10.141 | **stopped** | — | — | — | — | — | — |

**Inventory mapping:** `vikunja` → 159 (LXC 301), `qBittorrent` → 91, `punimTag` app → 121.

### Common LXC issues (pve201)

| Issue | Severity | Fix |
|-------|----------|-----|
| `passwordauthentication yes` on all LXCs | High | Set `PasswordAuthentication no` in `/etc/ssh/sshd_config`, reload sshd |
| No fail2ban | High | Install fail2ban or rely on Proxmox FW + LAN segmentation |
| Apps on `0.0.0.0:8080` / 8000 / 3456 | High | Bind to localhost + Caddy, or restrict via Proxmox guest firewall (`firewall=1` on net0 — enable rules) |
| 79–89 pending upgrades on several CTs | Medium | `pct exec <id> -- bash -c 'apt update && apt upgrade -y'` (without `bash -c`, the `apt upgrade` after `&&` runs on the Proxmox host) |
| Stopped dev CTs (9001, 9401) | Low | Start when needed or keep stopped to reduce attack surface |

### Per-LXC fixes (pve201)

```bash
# Example: harden + patch vikunja (301) from Proxmox host
pct exec 301 -- sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
pct exec 301 -- systemctl reload ssh

# Patch container
pct exec 303 -- bash -c 'apt update && apt upgrade -y'

# Copy your SSH key (from Mac, once password/key works)
make copy-ssh-key HOST=vikunja       # 10.0.10.159
make copy-ssh-key HOST=qBittorrent   # 10.0.10.91
```

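
To patch every running container on a node rather than one VMID at a time, the `pct list` output can drive the same command. A minimal sketch, assuming the usual `pct list` layout with a header row and the status in the second column; `running_ctids` and `patch_commands` are hypothetical helper names, and the commands are printed for review instead of being run.

```bash
# Read "pct list" output on stdin; print the VMIDs of running containers.
running_ctids() {
  awk 'NR > 1 && $2 == "running" { print $1 }'
}

# Read VMIDs on stdin; print one patch command per container.
# bash -c keeps both apt commands inside the container.
patch_commands() {
  local id
  while read -r id; do
    printf "pct exec %s -- bash -c 'apt update && apt upgrade -y'\n" "$id"
  done
}

# On the Proxmox host (review the output, then pipe it to "bash"):
#   pct list | running_ctids | patch_commands
```
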
**punimTagFE-dev (9101):** Only LXC with **UFW active** — extend rules to deny inbound except 22 from admin subnet; still disable password auth.

---

## LXCs on pve10 (via `pct exec`)

| VMID | Name | IP | Status | SSH root | Password auth | UFW | fail2ban | Upgrades | Public services |
|------|------|-----|--------|----------|---------------|-----|----------|----------|-----------------|
| 210 | cal | 10.0.10.228 | running | without-password | **yes** | no | no | 0 | **3000**, 22 |
| 215 | caseware | 10.0.10.105 | running | without-password | **yes** | no | no | **40** | **80** (nginx), 22 |
| 216 | auto | 10.0.10.59 | running | without-password | **yes** | no | no | **40** | **80** (nginx), 22 |

**Inventory mapping:** `caseware` → 105, `auto` → 59.

### Fixes (pve10 LXCs)

```bash
# SSH harden caseware (215)
pct exec 215 -- sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
pct exec 215 -- systemctl reload sshd

# Patch (bash -c keeps both apt commands inside the container;
# a bare "&&" would run "apt upgrade" on the Proxmox host instead)
pct exec 215 -- bash -c 'apt update && apt upgrade -y'
pct exec 216 -- bash -c 'apt update && apt upgrade -y'

# Deploy keys from Mac
make copy-ssh-key HOST=caseware
make copy-ssh-key HOST=auto
```

**HTTP port 80 on caseware/auto:** Ensure TLS termination on Caddy (inventory host `caddy` 10.0.10.50) and no plain HTTP from WAN if exposed.

---

## SSH hardening checklist (all Linux targets)

Use this order to avoid lockout:

1. Confirm your key works: `ssh -o BatchMode=yes root@<ip> true`
2. Set `PasswordAuthentication no`
3. Set `PermitRootLogin prohibit-password` (LXCs already report `without-password`, the deprecated alias for the same keys-only setting)
4. `sshd -t && systemctl reload sshd`
5. Open a **second terminal** and test a new login before closing the first
6. Optional: change SSH port, `MaxAuthTries 4`, disable `X11Forwarding`

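
After step 4, the effective configuration can be checked rather than the file contents, since `sshd -T` prints the settings sshd actually applies (lowercase keys, as in the audit tables above). A minimal sketch; `check_sshd_effective` is a hypothetical helper that reads that output on stdin and accepts `without-password` as equivalent to `prohibit-password`.

```bash
# Read "sshd -T" output on stdin; report settings that miss the repo baseline.
# Prints "OK" and returns 0 when both settings comply, otherwise prints one
# FAIL line per setting and returns 1.
check_sshd_effective() {
  awk '
    $1 == "passwordauthentication" && $2 != "no" {
      print "FAIL passwordauthentication=" $2; bad = 1
    }
    $1 == "permitrootlogin" && $2 != "prohibit-password" && $2 != "without-password" && $2 != "no" {
      print "FAIL permitrootlogin=" $2; bad = 1
    }
    END { if (!bad) print "OK"; exit (bad ? 1 : 0) }
  '
}

# On the target host (as root):
#   sshd -T | check_sshd_effective
# From a Proxmox node, for a container:
#   pct exec 301 -- sshd -T | check_sshd_effective
```

Checking the effective values also catches cases where a drop-in under `/etc/ssh/sshd_config.d/` overrides the line edited in `sshd_config`.
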
**Ansible alignment:**

```bash
# After keys on host
make dev HOST=<hostname> --tags security
# or role ssh via playbooks that include roles/ssh
```

---

## Re-run audits

```bash
# Hypervisor full audit
ssh root@10.0.10.201 'bash -s' < scripts/security-audit-remote.sh
ssh root@10.0.10.10 'bash -s' < scripts/security-audit-remote.sh

# Hypervisor SSH-only
ssh root@10.0.10.201 'bash -s' < scripts/security-audit-ssh.sh

# All LXCs on a node
ssh root@10.0.10.201 'bash -s' < scripts/security-audit-lxc-via-pve.sh
ssh root@10.0.10.10 'bash -s' < scripts/security-audit-lxc-via-pve.sh
```

---

## Tracking

| Item | Completed | Status |
|------|-----------|--------|
| SSH keys caseware, auto, cal, vikunja, mailcow, listmonk | 2026-05-23 | ☑ |
| Fleet `apt upgrade` (no reboot) | 2026-05-23 | ☑ all previously failed hosts fixed |
| Tier 1 cron (journal + apt) | 2026-05-23 | ☑ PVE + most hosts via Ansible |
| Tier 2 cron (docker prune) | 2026-05-23 | ☑ identity, monitoring, vikunja, git-ci-01 |
| VM 104 RAM 72→64 GiB | 2026-05-23 | ☑ |
| Inventory `vikunja` rename | 2026-05-23 | ☑ |
| Fix `host_vars` ansibleVM / listmonk merge | 2026-05-23 | ☑ plain YAML (review `*.vault-bak`) |
| SSH harden pve201 | | ☐ |
| SSH harden pve10 | | ☐ |
| Restrict 8006 on both nodes | | ☐ |
| fail2ban on hypervisors | | ☐ |
| `make security` on production groups | | ☐ |
| Disable password SSH on all LXCs | | ☐ |
| `copy-ssh-keys` remaining inventory | | ☐ partial |
| TLS / localhost bind for :8080 services | | ☐ |
| unattended-upgrades all production | | ☐ |
| Tailscale re-auth | | ⏸ deferred (UniFi VPN) |
| Fix ZFS NAS.SP00 on pve10 | | ☐ |
| caddy Ansible as root | 2026-05-23 | ☑ |
| vaultwardenVM / ansibleVM become in vault | 2026-05-23 | ☑ |
| Add GPU-Dev `10.0.10.122` to inventory | | ☐ |
| Ollama bind localhost + optional Open WebUI | | ☐ |

---

## Next steps (priority)

1. **`make security`** on one site host (e.g. caseware) with a second SSH session open — disable password SSH, enable UFW + fail2ban (`ignoreip` = LAN + VPN pool).
2. **Restrict Proxmox :8006** to `10.0.10.0/24` + VPN subnet on pve201 and pve10.
3. **Bind internal Docker ports** on identity / monitoring / vaultwarden to `127.0.0.1` after confirming Caddy routes.
4. **GPU-Dev:** point clients at `http://10.0.10.122:11434` over VPN; tune Ollama env vars; add host to inventory when automating.
5. **unattended-upgrades** on production LXCs (reboot policy manual).
6. Review `host_vars/*.vault-bak` and merge any secrets still needed into vault + plain host_vars.

---

## References

- **[Security remediation plan](security-remediation-plan.md)** — phased fixes (critical → low) and login model
- [Security hardening guide](security.md)
- [SECURITY_HARDENING_PLAN.md](../SECURITY_HARDENING_PLAN.md)
- Role defaults: `roles/ssh/defaults/main.yml`