ansible/docs/guides/monitoring-stack.md
ilia f0ff00a8dc
2026-05-22 22:38:56 -04:00


Monitoring stack (LXC 218)

Host: monitoring @ 10.0.10.22 (PVENAS pve10, VMID 218)
Compose: /opt/monitoring/compose.yml
Stacks dir (Dockge): /opt/stacks

All admin UIs are LAN-only (no public Caddy blocks). Reach them over Tailscale or the local network.

| Service | URL | Port | Notes |
| --- | --- | --- | --- |
| Uptime Kuma | http://10.0.10.22:3001 | 3001 | Admin and monitors configured (replaces pve201 LXC 305 @ .197, now stopped) |
| Dockge | http://10.0.10.22:5001 | 5001 | Manages compose stacks on this LXC only |
| Umami | http://10.0.10.22:3000 | 3000 | Password changed; levkin.ca, caseware, auto, and portfolio tracked |

Secrets: /opt/monitoring/.env on the LXC (mode 600). Not in git.
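The helper below is one way to confirm that permission. It is a sketch that assumes GNU coreutils `stat`, which is what a Debian or Ubuntu LXC ships. BSD and macOS `stat` take different flags. `check_env_mode` is a hypothetical name and is not part of this repo.

```shell
# Hypothetical helper: verify the secrets file exists and is mode 600.
# Assumes GNU coreutils stat (`stat -c '%a'` prints the octal mode, e.g. 600).
check_env_mode() {
  local env_file="${1:-/opt/monitoring/.env}" mode
  if [ ! -f "$env_file" ]; then
    echo "missing: $env_file" >&2
    return 2
  fi
  mode=$(stat -c '%a' "$env_file")
  if [ "$mode" != "600" ]; then
    echo "$env_file has mode $mode, expected 600 (fix: chmod 600 $env_file)" >&2
    return 1
  fi
  echo "$env_file: mode 600 ok"
}
```

Usage on the LXC: `check_env_mode /opt/monitoring/.env || chmod 600 /opt/monitoring/.env`.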


Backups (pve10)

| Guest | VMID | Snapshot | Date |
| --- | --- | --- | --- |
| identity | 217 | backup-20260522 | 2026-05-22 |
| monitoring | 218 | backup-20260522 | 2026-05-22 |

On pve10:

```shell
pct listsnapshot 217
pct listsnapshot 218
# Rollback if needed:
# pct rollback 217 backup-20260522
```

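Optional extra copy (when the NAS is healthy). As written, `--storage local` puts the dump on pve10's own local storage. For a true off-node copy, point `--storage` at the NAS-backed storage instead:

```shell
vzdump 217 218 --storage local --mode snapshot --compress zstd
```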
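The next round of snapshots can be scripted roughly as follows. This is a sketch that assumes it runs as root on pve10 with GNU `date`. It reuses the `backup-YYYYMMDD` naming from the table above. `snapshot_name`, `run` and `snapshot_guests` are hypothetical helper names. Setting `DRY_RUN=1` only prints the `pct` commands.

```shell
# Hypothetical helpers: dated snapshots of identity (217) and monitoring (218).
# Assumes GNU date; the optional argument is any date string `date -d` accepts.
snapshot_name() {
  date -d "${1:-today}" +backup-%Y%m%d
}

# Print instead of execute when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

snapshot_guests() {
  local name vmid
  name=$(snapshot_name) || return 1
  for vmid in 217 218; do
    run pct snapshot "$vmid" "$name" --description "pre-change snapshot" || return 1
  done
}
```

To preview, run `DRY_RUN=1 snapshot_guests`, then run `snapshot_guests` for real. `pct` rejects a snapshot name that is already in use, so a second run on the same day fails instead of overwriting the earlier snapshot.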


Uptime Kuma — monitors

Configured in UI (all green). Remove the Nextcloud monitor when VM 201 is retired.

| Name | URL |
| --- | --- |
| Authentik | https://auth.levkin.ca |
| Cal.com | https://cal.levkin.ca |
| Caseware / Auto | marketing sites |
| Mailcow | https://mail.levkine.ca |
| Listmonk, Gitea, Vault, Todo, PVE nodes | per your dashboard |

Uptime Kuma — email alerts (Mailcow)

The mail domain is levkine.ca (with an e), not levkin.ca. Cal.com already sends through Mailcow as cal@levkine.ca.

Which email to use

| Role | Value | Notes |
| --- | --- | --- |
| SMTP server | mail.levkine.ca | Mailcow host |
| SMTP port | 587 | STARTTLS (not 465 unless you prefer SMTPS) |
| From (sender) | alerts@levkine.ca | Create the mailbox in Mailcow if it does not exist |
| To (you) | idobkin@gmail.com or ilia@levkine.ca | Use whichever you read; Gmail is fine for alerts |

1. Create mailbox in Mailcow (if needed)

Automated (needs Mailcow API key):

```shell
# Define the mailbox in group_vars/all/mailcow.yml and its password in vault, then:
make mailcow-mailbox MAILBOX=alerts
# (alias: make mailcow-create-alerts)

# If the API key and password are not in vault yet, import them from .env once, then delete .env:
cp .env.example .env   # then fill in MAILCOW_API_KEY=... ALERTS_PASSWORD=...
make vault-import-env
make mailcow-mailbox MAILBOX=alerts
```

To add another mailbox later, define it in mailcow.yml and add its password as vault_mailcow_mailbox_passwords.<name>. Then run make mailcow-mailbox MAILBOX=<name>.

Manual UI:

  1. https://mail.levkine.ca → admin login
  2. Email → Mailboxes → Add → alerts@levkine.ca (use a strong password and store it in Vaultwarden)
  3. Optional: add an alias monitoring@levkine.ca pointing to the same inbox

2. Add notification in Kuma

Automated (from your Mac, once the mailbox exists):

```shell
cd /path/to/ansible
pip install uptime-kuma-api   # or: .venv/bin/pip install uptime-kuma-api
export KUMA_URL=http://10.0.10.22:3001 KUMA_USER=admin KUMA_PASSWORD='...'
export SMTP_USER=alerts@levkine.ca SMTP_PASS='...' SMTP_TO=idobkin@gmail.com
./scripts/kuma-setup-smtp.sh
```
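To fail fast on a missing variable before the script contacts Kuma, a preflight like the one below can run first. This is a sketch. The variable list mirrors the exports above. `kuma_preflight` is a hypothetical name, and this guide does not say whether `kuma-setup-smtp.sh` already performs its own checks.

```shell
# Hypothetical preflight: report every unset or empty variable the setup needs.
kuma_preflight() {
  local missing="" var val
  for var in KUMA_URL KUMA_USER KUMA_PASSWORD SMTP_USER SMTP_PASS SMTP_TO; do
    # Indirect lookup that also works in POSIX sh (no bash ${!var}).
    eval "val=\${$var:-}"
    [ -n "$val" ] || missing="$missing $var"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
}
```

Usage: `kuma_preflight && ./scripts/kuma-setup-smtp.sh`.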

Manual UI:

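  1. http://10.0.10.22:3001 → Settings → Notifications → Setup Notification
  2. Type: Email (SMTP)
  3. Fill in:

     | Field | Value |
     | --- | --- |
     | SMTP Host | mail.levkine.ca |
     | SMTP Port | 587 |
     | Security | STARTTLS (the TLS option is for port 465) |
     | Username | alerts@levkine.ca |
     | Password | the mailbox password |
     | From Email | alerts@levkine.ca |
     | To Email | idobkin@gmail.com (or your @levkine.ca address) |

  4. Click Test, then Save.
  5. Edit each monitor → Notifications → enable this channel (or tick "Default enabled" / "Apply on all existing monitors" in the notification dialog).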

Alternative: Mattermost webhook (slack.levkin.ca) if you prefer chat over email.


Dockge — what to do after login

Current layout on the server:

| Path | Contents |
| --- | --- |
| /opt/monitoring/compose.yml | Live stack (Docker project monitoring, 4 containers running) |
| /opt/stacks/monitoring/compose.yaml | Copy for Dockge (same services) |
| /opt/stacks/authentik-ref/, /opt/stacks/cal-ref/ | README only, no compose file (ignore) |

Why “Scan Stacks Folder” looks empty

  • Scan only picks up folders under /opt/stacks that contain compose.yaml / compose.yml.
  • Your containers were started from /opt/monitoring, so nothing links them to /opt/stacks/monitoring until you register that folder in Dockge.
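The scan rule above can be checked from a shell. The sketch below lists direct subfolders of the stacks directory that contain compose.yaml or compose.yml. `list_scannable_stacks` is a hypothetical name. Dockge may also recognise other filenames such as docker-compose.yml, but this helper only checks the two named here.

```shell
# Hypothetical helper: print the names of stack folders a scan should pick up.
list_scannable_stacks() {
  local root="${1:-/opt/stacks}" dir
  for dir in "$root"/*/; do
    # With no matches the glob stays literal, so skip anything that is not a dir.
    [ -d "$dir" ] || continue
    if [ -f "${dir}compose.yaml" ] || [ -f "${dir}compose.yml" ]; then
      basename "$dir"
    fi
  done
}
```

On the LXC, `list_scannable_stacks` should print only `monitoring` today. The README-only authentik-ref and cal-ref folders are skipped.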

Fix (pick one):

Dockge UI note (your version)

Settings → General only has a hostname field; there is no “Stacks directory” field. The stacks path is fixed at deploy time:

DOCKGE_STACKS_DIR=/opt/stacks (already set in /opt/monitoring/compose.yml).
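To confirm which directory the deployed Dockge scans, the value can be read straight from the compose file. This is a sketch that does a plain-text match, not YAML parsing. It handles the list form (`- DOCKGE_STACKS_DIR=/opt/stacks`) and the map form (`DOCKGE_STACKS_DIR: /opt/stacks`). `dockge_stacks_dir` is a hypothetical name.

```shell
# Hypothetical helper: print the DOCKGE_STACKS_DIR value set in a compose file.
# Matches optional leading spaces/dash/quote, then KEY=value or KEY: value.
dockge_stacks_dir() {
  sed -nE 's/^[[:space:]-]*"?DOCKGE_STACKS_DIR[=:][[:space:]]*"?([^"[:space:]]+)"?.*/\1/p' \
    "${1:-/opt/monitoring/compose.yml}"
}
```

Usage on the LXC: `dockge_stacks_dir /opt/monitoring/compose.yml`, which should print `/opt/stacks`.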

Stacks are managed from the home / dashboard page, not Settings.

Option 1: Create the stack from the dashboard

  1. http://10.0.10.22:5001 → home (logo / dashboard, not Settings)
  2. + Create Stack (or Compose → new stack)
  3. Name: monitoring
  4. Path: /opt/stacks/monitoring (must contain compose.yaml)
  5. Open the stack → review the compose file → do not Start it while the old project is still running (see Avoid duplicate containers below)

Option 2: Scan from the dashboard menu

  1. Stay on the dashboard (not Settings)
  2. Top-right menu → Scan Stacks Folder
  3. Pick monitoring if it appears (authentik-ref / cal-ref have no compose file; ignore them)

Avoid duplicate containers

Before starting from Dockge:

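```shell
ssh root@10.0.10.22
cd /opt/monitoring && docker compose down
# Dockge runs in this same stack, so its UI is down now; start the copy from the CLI.
# The stack folder needs its own .env (e.g. cp -p /opt/monitoring/.env /opt/stacks/monitoring/).
cd /opt/stacks/monitoring && docker compose --env-file .env up -d
```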

Until then, Kuma, Dockge, and Umami keep running from /opt/monitoring. Using Dockge for edits is optional until the cutover.
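Before running the cutover commands above, you can check whether containers from the old folder are still up. This is a sketch. It relies on the `com.docker.compose.project.working_dir` label that Docker Compose v2 puts on the containers it creates. `old_stack_running` is a hypothetical name. It returns 0 when such containers exist, 1 when none do, and 2 when `docker ps` itself fails.

```shell
# Hypothetical check: are containers started from the given compose dir running?
old_stack_running() {
  local dir="${1:-/opt/monitoring}" ids
  ids=$(docker ps -q --filter "label=com.docker.compose.project.working_dir=$dir") || return 2
  [ -n "$ids" ]
}
```

Usage: `old_stack_running && echo "run docker compose down in /opt/monitoring first"`.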

Optional reference stacks (read-only)

Create empty stacks under /opt/stacks/ only if you want a UI placeholder:

```shell
ssh root@10.0.10.22
mkdir -p /opt/stacks/authentik /opt/stacks/cal
# Copy compose for reference (does NOT control remote host):
scp root@10.0.10.21:/opt/authentik/compose.yml /opt/stacks/authentik/
```

To manage Authentik or Cal from Dockge long term, either move compose to 218 (not recommended) or install Dockge on each LXC later.

Step 3 — Retire Portainer

Once Dockge covers your day-to-day needs, stop VM 109 (Portainer) on pve10 and use Dockge on 218 instead.
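Retiring the VM without deleting it could look like the sketch below. It shuts the VM down gracefully and then turns off start-on-boot, so the VM stays off after a node reboot. It assumes it runs as root on pve10. `run` and `retire_portainer` are hypothetical helper names. The 120-second shutdown timeout is an arbitrary choice.

```shell
# Print instead of execute when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Hypothetical helper: stop the Portainer VM and keep it off across reboots.
retire_portainer() {
  local vmid="${1:-109}"
  run qm shutdown "$vmid" --timeout 120 || return 1
  run qm set "$vmid" --onboot 0
}
```

Preview with `DRY_RUN=1 retire_portainer`. To delete the VM later, use a separate, deliberate `qm destroy`.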


Umami

  • Running at http://10.0.10.22:3000 (LAN / Tailscale only)
  • Public tracking via https://stats.levkin.ca/script.js on levkin.ca (LXC 220), caseware, auto, and iliadobkin.com (portfolio LXC 219)

Three options were considered (none block the sites). Per the tracking note above, B and C are now in place:

| Option | Effort | Notes |
| --- | --- | --- |
| A: Skip public analytics | 0 | Use the Umami dashboard on :3000 when needed; no DNS or Caddy changes |
| B: One DNS record + Caddy block | ~10 min | A record → home IP, plus Caddy `reverse_proxy 10.0.10.22:3000` on the caddy VM |
| C: Re-add script tags | 2 min | After B works, insert the script before `</head>` on 215/216 |

Suggested public hostname (instead of analytics): stats.levkin.ca (short, clear). Alternatives: umami.levkin.ca, metrics.levkin.ca.

```caddyfile
stats.levkin.ca {
    import security-headers
    encode gzip
    reverse_proxy 10.0.10.22:3000
}
```

The script tag then loads https://stats.levkin.ca/script.js.
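For static pages, inserting that tag idempotently can be scripted. This is a sketch that assumes plain HTML files with a `</head>` tag. It uses Umami's `<script defer src=… data-website-id=…>` form. The website ID comes from the Umami dashboard and differs per site. `add_umami_tag` and any file path you pass it are hypothetical.

```shell
# Hypothetical helper: add the Umami tracker before the first </head>, once.
add_umami_tag() {
  local html="$1" website_id="$2"
  local src="https://stats.levkin.ca/script.js"
  if grep -qF "$src" "$html"; then
    echo "already tagged: $html"
    return 0
  fi
  grep -q '</head>' "$html" || { echo "no </head> in $html" >&2; return 1; }
  local tag="<script defer src=\"$src\" data-website-id=\"$website_id\"></script>"
  # Write the edited page to a temp file, then copy it back over the original
  # (copying, not mv, keeps the original file's permissions).
  awk -v tag="$tag" '!done && /<\/head>/ { sub(/<\/head>/, tag "</head>"); done = 1 } { print }' \
    "$html" > "$html.tmp" && cat "$html.tmp" > "$html" && rm -f "$html.tmp"
}
```

Running it twice on the same page leaves a single tag, because the second call sees the script URL and stops.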

Marketing sites do not need Umami to render, so Option A remains a safe fallback if stats.levkin.ca is ever down.


Maintenance

```shell
ssh root@10.0.10.22
cd /opt/monitoring
docker compose --env-file .env pull
docker compose --env-file .env up -d
docker compose ps
```
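The same maintenance steps can be wrapped in one function that stops at the first failure. This is a sketch. `update_monitoring` is a hypothetical name. The body runs in a subshell, so the `cd` does not change the caller's directory. It defaults to the live stack in /opt/monitoring, and after a Dockge cutover you would pass /opt/stacks/monitoring instead.

```shell
# Hypothetical wrapper: pull, recreate, and show status for one compose stack.
update_monitoring() (
  dir=${1:-/opt/monitoring}
  cd "$dir" || exit 1
  [ -f .env ] || { echo "no .env in $dir" >&2; exit 1; }
  docker compose --env-file .env pull || exit 1
  docker compose --env-file .env up -d || exit 1
  docker compose ps
)
```

Usage on the LXC: `update_monitoring` or `update_monitoring /opt/stacks/monitoring`.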