
Proxmox Monitor — Setup & Deploy Runbook

This document is a runbook, not reference material. Work from top to bottom. Each checkbox is an action. Don't skip verification steps — they exist because we've been burned by skipping them.

Target audience: the operator who owns the monitor service.

Total time: 2–3 hours end-to-end with a small test tier, plus however long your Proxmox fleet takes to roll out (10–15 min per host).

Phases of this runbook:

| § | Purpose | Touch points |
|---|---------|--------------|
| 1 | Preflight — confirm prerequisites | Local only |
| 2 | Local build — produce artifacts | Your workstation |
| 3 | Server deploy — one-time LXC bring-up | Hypervisor + LXC |
| 4 | First agent — prove the pipeline | One Proxmox host |
| 5 | Test tier — 2–3 hosts for 24 h | Small batch |
| 6 | Full rollout — the remaining hosts | Fleet-wide |
| 7 | Rollback — when something goes wrong | |
| 8 | Ongoing operations | Upgrades, backups |
| 9 | Go / No-Go sign-off | Final gate |

Related docs (reference, not sequential):

  • server/docs/deploy-lxc.md — deeper LXC detail
  • agent/docs/install.md — single-host agent install
  • server/docs/Caddyfile.example — TLS/WSS proxy template
  • proxmox-monitor-konzept.md — design concept
  • docs/deployment-overview.md — high-level picture

§ 1. Preflight Checklist

1.1 Hardware & network

  • Server LXC can be provisioned on a Proxmox host in the RZ. Minimum: 1 GB RAM, 2 cores, 10 GB disk. Debian 12 template available.
  • DNS A record for monitor.<yourdomain> points at the public IP of the Proxmox host that hosts the LXC. Verify with dig +short monitor.<yourdomain>.
  • Port 443 inbound to the server LXC's public IP is open (Caddy will get Let's Encrypt certs via HTTP-01 and serve on 443).
  • Outbound HTTPS from every Proxmox host to monitor.<yourdomain> is open. Agents connect out; no inbound port is required on Proxmox hosts.
  • You have SSH root access to:
    • The hypervisor running the server LXC (for pct create / pct enter)
    • Every Proxmox host that will run an agent
  • Docker is installed and daemon is running on your build machine (docker --version should succeed and docker ps should not error). If not, use a Linux box (even the server LXC itself) as the build host.

1.2 Versions

  • Proxmox hosts are VE 8.3+ with OpenZFS 2.3+ (check with pveversion and zfs --version). If some hosts are older, either upgrade them first or accept that ZFS payloads will be empty on those.
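Checking the whole fleet in one pass can be scripted. A sketch, assuming pveversion prints a token of the form pve-manager/X.Y… (pve_ok is a throwaway helper for this runbook; it only parses text, so you can feed it output gathered over SSH):

```shell
#!/bin/sh
# pve_ok: accept only pve-manager >= 8.3 (assumption: "pve-manager/X.Y..." appears in the input)
pve_ok() {
  v=$(echo "$1" | sed -n 's#.*pve-manager/\([0-9]*\.[0-9]*\).*#\1#p')
  maj=${v%%.*}
  min=${v#*.}
  [ "$maj" -gt 8 ] || { [ "$maj" -eq 8 ] && [ "$min" -ge 3 ]; }
}

# Fleet sweep (hostnames are placeholders):
# for H in pve-host-01 pve-host-02; do
#   pve_ok "$(ssh root@$H pveversion)" || echo "$H needs an upgrade"
# done

pve_ok "pve-manager/8.3.2/abcdef" && echo "8.3.2: ok"
```

Hosts the sweep flags go on your upgrade list (or the "ZFS payloads will be empty" accept list) before § 4.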

1.3 Tools on your workstation

  • Elixir 1.19 + OTP 28 (elixir --version)
  • Mix + Hex (mix local.hex)
  • SSH + scp
  • sqlite3 CLI (for smoke-test DB inspection; optional)

1.4 Secrets plan

Write down (don't commit) the three secrets you'll need. Keep them in a password manager.

| Secret | Generated how |
|--------|---------------|
| Dashboard password (plaintext) | You choose it. Use a strong random string. |
| SECRET_KEY_BASE | cd server && mix phx.gen.secret (64-byte base64) |
| Agent tokens | Created by the admin UI, one per host, revealed once. |
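For the "strong random string", any generator works; one possibility, assuming openssl is on your workstation:

```shell
# 32 random bytes, base64-encoded => a 44-character password
PW=$(openssl rand -base64 32)
echo "${#PW} chars"   # goes into the password manager, never into the repo
```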

§ 2. Local Build

Do this once, on your build machine. Re-run for every upgrade.

2.1 Clone the repo

  • git clone <repo> if you don't already have it.
  • cd proxmox_monitor
  • git pull --ff-only origin main

2.2 Confirm tests are green

  • cd server && mix deps.get && mix test
  • cd ../agent && mix deps.get && mix test

Expected: both suites pass. If any test fails, stop here. Fix or cherry-pick a known-good commit before continuing.

2.3 Build the server release

  • Generate DASHBOARD_PASSWORD_HASH once:
cd server
mix run -e 'IO.puts(Argon2.hash_pwd_salt("<your-dashboard-password>"))'

Copy the $argon2id$... line into your password manager. You'll paste it into the LXC env file later.

  • Build the release (the placeholder is only needed to satisfy runtime.exs during build; the real value is set on the LXC at start time):
MIX_ENV=prod DASHBOARD_PASSWORD_HASH='placeholder' mix release --overwrite

The release step also runs mix assets.deploy as a pre-assemble step, so minified + digested JS/CSS are baked into the tarball automatically. You don't need to run assets.deploy separately.

Expected: _build/prod/rel/server/ contains bin/server, bin/migrate, erts-*, lib/, releases/, and lib/server-0.1.0/priv/static/cache_manifest.json.

  • Package the release:
tar -czf /tmp/server_release.tgz -C _build/prod/rel server
ls -lh /tmp/server_release.tgz

Expected: ~30–60 MB tarball.

2.4 Build the agent binaries

Requires Docker running locally (or do this on a Linux host).

  • cd ../agent
  • ./scripts/build-linux.sh

Expected output (~5–10 min first run, much faster with Docker layer cache on subsequent runs):

Binaries written to /path/to/agent/dist:
proxmox-monitor-agent_linux_amd64
proxmox-monitor-agent_linux_arm64
  • Sanity check:
file dist/proxmox-monitor-agent_linux_amd64 | grep -E "ELF 64-bit"

Expected: ELF 64-bit LSB executable, x86-64.

If Docker isn't available on your workstation: scp the agent/ directory onto the server LXC after § 3, run ./scripts/build-linux.sh there, then scp the binaries back. The LXC doesn't need Docker at runtime.


§ 3. Server Deployment

One-time. Subsequent upgrades use § 8.1.

3.1 Create the LXC (on the hypervisor)

  • SSH to the hypervisor and run:
pct create 200 \
  /var/lib/vz/template/cache/debian-12-standard_12.7-1_amd64.tar.zst \
  --hostname proxmox-monitor \
  --memory 1024 --cores 2 \
  --rootfs local-zfs:10 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp \
  --unprivileged 1 --features nesting=0 --onboot 1
pct start 200

Adjust the container ID (200), bridge, and rootfs to match your environment.

  • Get the LXC's IP:
pct exec 200 -- ip -4 addr show eth0 | grep -Po 'inet \K[\d.]+'

Put this IP in LXC_IP for the rest of this section (use a shell variable, not a literal in every command — typos here cost hours).
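That can look like the following (first_inet4 is a throwaway helper for this runbook; grep -P needs GNU grep, which Debian ships):

```shell
# Extract the first IPv4 address from `ip -4 addr` output
first_inet4() { grep -Po 'inet \K[\d.]+' | head -n1; }

# On the hypervisor (assumption: container ID 200 as above):
# LXC_IP=$(pct exec 200 -- ip -4 addr show eth0 | first_inet4)
# scp /tmp/server_release.tgz root@"$LXC_IP":/tmp/

# Demo with canned output:
echo "    inet 192.0.2.10/24 brd 192.0.2.255 scope global eth0" | first_inet4
```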

3.2 Base packages inside the LXC

pct enter 200
  • Install Caddy + SQLite + tools:
apt-get update
apt-get install -y ca-certificates curl gnupg debian-keyring debian-archive-keyring apt-transport-https sqlite3

# Caddy's apt repo
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | \
  gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' \
  > /etc/apt/sources.list.d/caddy-stable.list
apt-get update
apt-get install -y caddy

caddy version  # sanity
  • Exit the container: exit.

3.3 Upload the release

On your workstation:

  • scp /tmp/server_release.tgz root@<LXC_IP>:/tmp/

Back inside the LXC (pct enter 200):

  • Unpack:
mkdir -p /opt/proxmox-monitor
tar -xzf /tmp/server_release.tgz -C /opt/proxmox-monitor
ls /opt/proxmox-monitor/server/bin/
# Expected: server, migrate, server.bat, migrate.bat

3.4 Data directory + environment file

  • Create the data dir:
install -d -m 0700 /var/lib/proxmox-monitor
install -d -m 0755 /etc/default
  • Create /etc/default/proxmox-monitor. Substitute the values you generated in § 2.3:
cat > /etc/default/proxmox-monitor <<'EOF'
DATABASE_PATH=/var/lib/proxmox-monitor/monitor.db
SECRET_KEY_BASE=<paste-output-of-mix-phx.gen.secret>
DASHBOARD_PASSWORD_HASH=<paste-$argon2id$-hash-from-2.3>
PHX_SERVER=true
PHX_HOST=monitor.example.com
PORT=4000
EOF
chmod 0600 /etc/default/proxmox-monitor

Gotchas:

  • DASHBOARD_PASSWORD_HASH contains $ characters. Use single quotes around the value in the heredoc, or escape each $ with \$. Double-quoted heredocs will silently eat them.
  • No spaces around =.
  • No quotes around values in the file itself.
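The first gotcha is easy to demonstrate: in an unquoted heredoc the shell expands $argon2id, $v, and $m as (empty) variables and silently drops them, while the quoted form writes the hash byte for byte. The /tmp paths and the hash fragment below are just for the demo:

```shell
# BAD: unquoted heredoc -- the $-prefixed fragments vanish
cat > /tmp/demo-bad <<EOF
DASHBOARD_PASSWORD_HASH=$argon2id$v=19$m=65536,t=3,p=4
EOF

# GOOD: quoted heredoc -- written literally
cat > /tmp/demo-good <<'EOF'
DASHBOARD_PASSWORD_HASH=$argon2id$v=19$m=65536,t=3,p=4
EOF

diff /tmp/demo-bad /tmp/demo-good || echo "note the mangled value in demo-bad"
```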

3.5 Run the first migration

  • Apply migrations:
set -a; . /etc/default/proxmox-monitor; set +a
/opt/proxmox-monitor/server/bin/server eval 'Server.Release.migrate()'

Expected output:

[info] == Running 20260421200116 Server.Repo.Migrations.CreateHosts.change/0 forward
[info] create table hosts
...
[info] == Migrated 20260421200116 in 0.0s
[info] == Running 20260421202512 Server.Repo.Migrations.CreateMetrics.change/0 forward
[info] create table metrics
...
[info] == Migrated 20260421202512 in 0.0s
  • Verify the DB exists:
ls -la /var/lib/proxmox-monitor/monitor.db
sqlite3 /var/lib/proxmox-monitor/monitor.db '.tables'
# Expected: hosts  metrics  schema_migrations

3.6 systemd unit for the server

  • Write the unit:
cat > /etc/systemd/system/proxmox-monitor.service <<'EOF'
[Unit]
Description=Proxmox Monitor Server
After=network-online.target
Wants=network-online.target

[Service]
Type=exec
EnvironmentFile=/etc/default/proxmox-monitor
ExecStartPre=/opt/proxmox-monitor/server/bin/server eval 'Server.Release.migrate()'
ExecStart=/opt/proxmox-monitor/server/bin/server start
ExecStop=/opt/proxmox-monitor/server/bin/server stop
Restart=always
RestartSec=5
User=root

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now proxmox-monitor
  • Watch it come up:
journalctl -u proxmox-monitor -f
# wait for: "Running ServerWeb.Endpoint with Bandit"
# then Ctrl+C
  • Smoke-test from inside the LXC:
curl -s http://127.0.0.1:4000/health
# Expected: {"db":"ok","status":"ok","version":"0.1.0"}

If you see anything other than status:ok: stop. Check journalctl -u proxmox-monitor -n 100. Common causes: missing env var (check /etc/default), DB path not writable.

3.7 Caddy TLS + reverse proxy

  • Copy the template and edit:
cp /opt/proxmox-monitor/server/lib/server-0.1.0/priv/docs/Caddyfile.example \
   /etc/caddy/Caddyfile \
   2>/dev/null || \
  scp root@<YOUR-WORKSTATION>:proxmox_monitor/server/docs/Caddyfile.example \
      /etc/caddy/Caddyfile
# (The first form only works if you bundled docs into the release; the second
# pulls fresh from your checkout.)

ACTUAL_HOST=monitor.example.com   # substitute your real FQDN
sed -i "s/monitor.example.com/$ACTUAL_HOST/g" /etc/caddy/Caddyfile
caddy validate --config /etc/caddy/Caddyfile
  • Reload Caddy:
systemctl reload caddy
journalctl -u caddy -n 30
# Expected: "certificate obtained successfully" from Let's Encrypt
# (only on first reload after DNS is set correctly)
  • Verify from the public internet:
# From any outside machine:
curl -s https://monitor.example.com/health
# Expected: {"db":"ok","status":"ok","version":"0.1.0"}

If this fails:

  • curl -vI to see where it stops. Name resolution? TCP? TLS?
  • dig +short monitor.example.com — does it point to the expected IP?
  • Check the hypervisor's firewall / any cloud-level firewall for port 443.

3.8 Browser smoke-test

  • Open https://monitor.example.com/ in a browser.
  • Confirm: redirect to /login.
  • Log in with your dashboard password.
  • Confirm: empty overview page with "No hosts registered yet."

If login fails with "Incorrect password": your DASHBOARD_PASSWORD_HASH env doesn't match the password you typed. Re-generate the hash (§ 2.3), update the env file (§ 3.4), and restart the service.


§ 4. First Agent — Dry Run

Pick one Proxmox host. This run will validate the whole pipeline before you touch more hosts.

4.1 Register the host in the dashboard

  • Browser → https://monitor.example.com/admin/hosts.
  • Enter the short name (pve-host-01 or whatever matches your convention). Click Add.
  • The page reveals a token. Copy it now — it is shown only once.

4.2 Copy the binary + systemd unit to the host

From your workstation (substitute <HOST>):

export HOST=<proxmox-host-ip-or-name>

scp agent/dist/proxmox-monitor-agent_linux_amd64 \
    root@$HOST:/usr/local/bin/proxmox-monitor-agent
ssh root@$HOST 'chmod 0755 /usr/local/bin/proxmox-monitor-agent'

scp agent/rel/proxmox-monitor-agent.service \
    root@$HOST:/etc/systemd/system/

4.3 Write the agent config

SSH to the host (ssh root@$HOST) and:

install -d -m 0700 /etc/proxmox-monitor
install -d -m 0700 /var/cache/proxmox-monitor-agent

cat > /etc/proxmox-monitor/agent.toml <<'EOF'
server_url = "wss://monitor.example.com/socket/websocket"
token = "<paste-token-from-dashboard>"
host_id = "pve-host-01"

[intervals]
fast_seconds = 30
medium_seconds = 300
slow_seconds = 1800
EOF

chmod 0600 /etc/proxmox-monitor/agent.toml
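Before enabling the agent you can sanity-check the file. A rough sketch (a grep, not a real TOML parser; check_agent_toml is a name made up for this runbook):

```shell
# Confirm the three required keys exist in an agent.toml
check_agent_toml() {
  for key in server_url token host_id; do
    grep -q "^$key *=" "$1" 2>/dev/null || { echo "missing: $key" >&2; return 1; }
  done
  echo "config looks complete"
}

# On the Proxmox host:
# check_agent_toml /etc/proxmox-monitor/agent.toml
```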

4.4 Enable the agent

Still on the Proxmox host:

systemctl daemon-reload
systemctl enable --now proxmox-monitor-agent
  • Watch the log:
journalctl -u proxmox-monitor-agent -f

Expected within 10s:

agent: starting with host_id=pve-host-01
reporter: connected, joining host:pve-host-01
reporter: joined host:pve-host-01

Ctrl+C to stop tailing.

4.5 Confirm in the dashboard

  • Reload https://monitor.example.com/ — the card for pve-host-01 should show online, status green, with Load/RAM/Pools/VMs populated.
  • Click the card. Verify each section (ZFS pools, snapshots, storage, VMs) has real data.

4.6 Stop-and-restart verification

Verify the offline flip works as designed.

  • On the Proxmox host: systemctl stop proxmox-monitor-agent.
  • Dashboard card should switch to offline (grey border) within ~1s.
  • systemctl start proxmox-monitor-agent — card flips back to green within ~30s.

If the card stays green when the agent is stopped: the Channel terminate callback didn't fire, which usually means Caddy's read_timeout is set too short or absent. Check /etc/caddy/Caddyfile contains read_timeout 90s.
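For reference, the relevant shape of the proxy block. This is a sketch assuming the template proxies to the Phoenix port with an HTTP transport; reconcile it against your actual Caddyfile.example rather than copying it blindly:

```caddyfile
monitor.example.com {
    reverse_proxy 127.0.0.1:4000 {
        # Keep this comfortably above the agents' WebSocket heartbeat so
        # idle connections aren't cut (the symptom described in § 4.6).
        transport http {
            read_timeout 90s
        }
    }
}
```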

4.7 Token rotation sanity-check

  • In the admin UI, click Rotate on the host. Confirm.
  • On the Proxmox host, journalctl -u proxmox-monitor-agent -f — within ~30s the agent should log reporter: disconnected then begin reconnecting, failing with invalid_token.
  • Update /etc/proxmox-monitor/agent.toml with the new token and systemctl restart proxmox-monitor-agent. Verify green again.

§ 5. Test Tier (2–3 Hosts)

Pick 2–3 Proxmox hosts that are either non-critical, or critical but with existing independent monitoring you can fall back on.

5.1 Roll out

  • For each host, repeat § 4.1–4.5. Use distinct host_id values.

5.2 Observe for 24 hours

  • Leave the test tier running overnight.
  • Next morning, verify every test-tier card still shows online.
  • Check journalctl -u proxmox-monitor on the server:
    • No [error] lines repeating.
    • retention: pruned N stale samples appears ≥ 1 time (retention fires hourly; after 48h it starts deleting).

5.3 Restart test

Reboot one of the Proxmox hosts. Watch the dashboard:

  • Card goes offline during the reboot.
  • Card flips back to online within a minute of the host coming back, without you touching anything.

5.4 Server reboot test

  • On the server LXC: systemctl restart proxmox-monitor.
  • All agents should briefly flip to offline, then back to online within ~30s as their Slipstream clients reconnect.
  • No agents should end up stuck offline requiring manual restart.

If any agent stays offline: its Slipstream reconnect backoff may need investigation. journalctl -u proxmox-monitor-agent -f on the affected host.

5.5 Go / No-Go gate for full rollout

Do NOT proceed to § 6 until all of these are true for 24h:

  • All test-tier hosts show online continuously.
  • No repeating error lines in server logs.
  • Retention has pruned ≥ 1 row.
  • Rotate + restart behavior works as expected.
  • Dashboard is responsive (<1s LiveView updates).

§ 6. Full Rollout

For each remaining Proxmox host:

  1. Admin UI → register host, copy token.
  2. scp binary + systemd unit.
  3. Write /etc/proxmox-monitor/agent.toml.
  4. systemctl enable --now proxmox-monitor-agent.
  5. Verify in dashboard.

6.1 Loop shortcut

Once you've done 3–4 hosts by hand and are confident, you can batch. The tricky part is that each host needs a unique token, so the admin-UI step still has to be interactive. One workflow:

# On your workstation:
for HOST in pve-host-04 pve-host-05 pve-host-06; do
  echo ">>> Setting up $HOST"
  echo "Register in the admin UI, paste token here, then press Enter:"
  read -s TOKEN
  scp agent/dist/proxmox-monitor-agent_linux_amd64 \
      root@$HOST:/usr/local/bin/proxmox-monitor-agent
  scp agent/rel/proxmox-monitor-agent.service \
      root@$HOST:/etc/systemd/system/
  ssh root@$HOST "chmod 0755 /usr/local/bin/proxmox-monitor-agent && \
    install -d -m 0700 /etc/proxmox-monitor /var/cache/proxmox-monitor-agent && \
    cat > /etc/proxmox-monitor/agent.toml <<EOF
server_url = \"wss://monitor.example.com/socket/websocket\"
token = \"$TOKEN\"
host_id = \"$HOST\"

[intervals]
fast_seconds = 30
medium_seconds = 300
slow_seconds = 1800
EOF
    chmod 0600 /etc/proxmox-monitor/agent.toml && \
    systemctl daemon-reload && \
    systemctl enable --now proxmox-monitor-agent"
  echo ">>> $HOST done."
done

6.2 Validation at scale

After every batch of ~5 hosts:

  • Open / and confirm the card count matches how many agents you've configured.
  • Sort/filter by offline — should be empty.
  • Click a random card and confirm real payload data.

6.3 Completion check

  • Overview shows all N hosts.
  • None are in offline or critical state (unless that's actually true of the host, e.g. a real DEGRADED pool).
  • VM search returns hits for a well-known VM name.

§ 7. Rollback

7.1 Disable a single agent

ssh root@$HOST 'systemctl disable --now proxmox-monitor-agent'

Dashboard card flips to offline. Delete from /admin/hosts if you want it gone entirely.

7.2 Take the whole service down

# Inside the server LXC
systemctl stop proxmox-monitor
systemctl stop caddy

Agents will keep trying to reconnect every few seconds (harmless). Dashboard is gone.

7.3 Roll back to a previous server release

If a new version misbehaves:

# On the LXC — assuming you kept the previous /tmp/server_release_PREV.tgz
systemctl stop proxmox-monitor
rm -rf /opt/proxmox-monitor/server
tar -xzf /tmp/server_release_PREV.tgz -C /opt/proxmox-monitor
systemctl start proxmox-monitor

Your SQLite DB has not been touched — rollbacks are cheap as long as the migration list didn't change between versions.

7.4 DB restore from backup

See § 8.4 for creating backups. To restore:

systemctl stop proxmox-monitor
cp /var/backups/proxmox-monitor/monitor-YYYY-MM-DD.db /var/lib/proxmox-monitor/monitor.db
chown root:root /var/lib/proxmox-monitor/monitor.db
systemctl start proxmox-monitor

Host tokens in the restored DB are still valid. Metrics from after the backup are lost — that's 48h max given the retention policy.


§ 8. Ongoing Operations

8.1 Upgrading the server

Work from the repo on your workstation:

# 1. Build
cd server
MIX_ENV=prod DASHBOARD_PASSWORD_HASH='placeholder' mix release --overwrite
tar -czf /tmp/server_release.tgz -C _build/prod/rel server

# 2. Upload, first renaming the currently deployed tarball so it becomes
#    /tmp/server_release_PREV.tgz — the file § 7.3 expects for rollback
ssh root@<LXC> 'mv -f /tmp/server_release.tgz /tmp/server_release_PREV.tgz 2>/dev/null || true'
scp /tmp/server_release.tgz root@<LXC>:/tmp/

# 3. Swap on the LXC
ssh root@<LXC> '
  systemctl stop proxmox-monitor
  mv /opt/proxmox-monitor/server /opt/proxmox-monitor/server.old
  tar -xzf /tmp/server_release.tgz -C /opt/proxmox-monitor
  systemctl start proxmox-monitor      # ExecStartPre runs migrate
  sleep 5
  systemctl status proxmox-monitor --no-pager
'

Verify /health responds before deleting the .old copy.

8.2 Upgrading an agent

scp agent/dist/proxmox-monitor-agent_linux_amd64 \
    root@$HOST:/usr/local/bin/proxmox-monitor-agent.new
ssh root@$HOST '
  mv /usr/local/bin/proxmox-monitor-agent{.new,}
  systemctl restart proxmox-monitor-agent
'

8.3 Token rotation (leak or routine)

  1. Dashboard → Admin → Rotate on the affected host.
  2. Copy the new token.
  3. SSH to the host: update /etc/proxmox-monitor/agent.toml, systemctl restart proxmox-monitor-agent.
  4. Verify card flips back to green.

8.4 Database backups

The DB is small. SQLite's online backup is safe while the server runs.

Install a cron job inside the LXC:

cat > /etc/cron.d/proxmox-monitor-backup <<'EOF'
# m  h  dom mon dow user command  (cron.d entries must be a single line; cron
# does not support backslash continuations, and % must be escaped as \%)
30 3 * * * root install -d -m 0700 /var/backups/proxmox-monitor && sqlite3 /var/lib/proxmox-monitor/monitor.db ".backup /var/backups/proxmox-monitor/monitor-$(date +\%Y-\%m-\%d).db" && find /var/backups/proxmox-monitor -name 'monitor-*.db' -mtime +30 -delete
EOF

Keeps 30 days of daily snapshots.
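After the first night, it's worth confirming the newest backup actually opens. A sketch (verify_backup is a throwaway name; needs the sqlite3 CLI from § 1.3):

```shell
# Run SQLite's integrity check on the newest backup file
verify_backup() {
  sqlite3 "$1" 'PRAGMA integrity_check;'   # prints "ok" for a healthy database
}

latest=$(ls -1t /var/backups/proxmox-monitor/monitor-*.db 2>/dev/null | head -n1)
if [ -n "$latest" ]; then
  verify_backup "$latest"
else
  echo "no backups found yet"
fi
```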

8.5 Log inspection

Server:

# Live
journalctl -u proxmox-monitor -f

# Last 500
journalctl -u proxmox-monitor -n 500 --no-pager

# Errors only
journalctl -u proxmox-monitor -p err --no-pager

Agents (from the server for any host):

ssh root@$HOST 'journalctl -u proxmox-monitor-agent -n 200 --no-pager'

8.6 External uptime monitoring

Point your uptime service (UptimeRobot, BetterUptime, your-own-Prometheus, etc.) at:

https://monitor.example.com/health

Expect {"status":"ok","db":"ok","version":"0.1.0"} with HTTP 200. Alert on anything else.
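If your uptime service can only check HTTP codes, a cron-able probe that also inspects the body is a few lines. A sketch (healthy is a made-up helper; wire the else-branch into whatever alerting you already have):

```shell
# Succeed only if the /health body claims status ok (field order doesn't matter)
healthy() { echo "$1" | grep -q '"status":"ok"'; }

body=$(curl -fsS --max-time 10 https://monitor.example.com/health || true)
if healthy "$body"; then
  echo "monitor healthy"
else
  echo "monitor UNHEALTHY: ${body:-no response}" >&2
fi
```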

8.7 Changing the dashboard password

  1. On your workstation:
cd server
mix run -e 'IO.puts(Argon2.hash_pwd_salt("<new-password>"))'
  2. On the server LXC: edit /etc/default/proxmox-monitor, replace DASHBOARD_PASSWORD_HASH, systemctl restart proxmox-monitor.
  3. All existing sessions are invalidated on next request.

§ 9. Go / No-Go Sign-Off

Tick each box before declaring the rollout complete.

Production readiness

  • https://monitor.example.com/health returns 200 / status:ok.
  • External uptime monitor is configured and reporting green.
  • All intended Proxmox hosts appear on the overview and show online.
  • At least one full 48h retention cycle has completed (retention log shows pruning).
  • SQLite backup cron is installed and yesterday's .db file exists.
  • You have rolled back once on purpose (drill), proving § 7 works.

Access & secrets hygiene

  • Dashboard password is in a password manager, not a text file.
  • SECRET_KEY_BASE is in a password manager.
  • /etc/default/proxmox-monitor is 0600 root:root.
  • /etc/proxmox-monitor/agent.toml is 0600 root:root on every host.
  • You know how to rotate an agent token in < 2 minutes.

Documentation handoff

  • This runbook's checkboxes are all green for the current rollout.
  • If you're handing this to a teammate, you've walked them through one agent install and one token rotation live.

If all of the above are green, the monitor is in production.


Appendix A — Common Errors

| Symptom | First thing to check |
|---------|----------------------|
| Browser gets NET::ERR_CERT_AUTHORITY_INVALID | Caddy didn't finish LE cert issuance. Wait 60 s; then journalctl -u caddy. |
| Login page loops — correct password rejected | DASHBOARD_PASSWORD_HASH mismatch. Regenerate. |
| Card stays offline after agent restart | Wrong token or unknown_host (name mismatch). Check agent journal. |
| All agents reconnect every ~30 s | Caddy read_timeout missing or too short. |
| /health returns 503 | Server process up, SQLite path unreadable or wrong permissions. |
| LXC can't bind port 4000 | Another process owns it. Find it with ss -ltnp. |
| mix release fails with DASHBOARD error | You forgot to set DASHBOARD_PASSWORD_HASH=placeholder at build. |
| Agent logs {:enoent, "pvesh"} | Agent is running on a non-Proxmox host, or $PATH is empty under systemd. |
| Admin "Add host" redirects to /admin/hosts?host%5Bname%5D=… | Asset bundle didn't ship; cache_manifest.json missing → LiveView JS never attaches → native HTML GET submit. Rebuild the release and redeploy. |

Appendix B — File & Port Cheat Sheet

Server LXC
  /opt/proxmox-monitor/server/        release tree
  /etc/default/proxmox-monitor        env secrets, 0600
  /etc/systemd/system/proxmox-monitor.service
  /etc/caddy/Caddyfile
  /var/lib/proxmox-monitor/monitor.db SQLite
  /var/backups/proxmox-monitor/       daily backups
  tcp 443 (caddy)  → tcp 127.0.0.1:4000 (phoenix)

Proxmox host (per agent)
  /usr/local/bin/proxmox-monitor-agent
  /etc/proxmox-monitor/agent.toml     token + intervals, 0600
  /etc/systemd/system/proxmox-monitor-agent.service
  /var/cache/proxmox-monitor-agent/   Burrito unpack cache
  no listening ports