Runbook

Proxmox Monitor

Setup & Deployment — Production Rollout

Agent-server monitoring for Proxmox hosts. Elixir/OTP backend, Burrito-packaged agents, Phoenix LiveView dashboard. This deck walks you from a clean environment to 20 hosts reporting, in order, with verification at every step.

Reference: SETUP-AND-DEPLOY.md · ~2–3h end-to-end + host rollout time
What you're deploying

Architecture

Two artifacts, independent pipelines, one dashboard

                ┌─────────────────────────┐
                │ Server (LXC in RZ)      │
 agents ──WSS──▶│ · Phoenix release       │
                │ · SQLite                │
                │ · Caddy (TLS)           │
                └─────────────────────────┘
                            ▲
                            │ ssh
                ┌─────────────────────────┐
                │ Operator workstation    │
                │ · builds server         │
                │ · builds agent binary   │
                └─────────────────────────┘
                            │ scp
                            ▼
                ┌─────────────────────────┐
                │ Proxmox host (1 of N)   │
                │ · Burrito binary        │
                │ · systemd unit          │
                └─────────────────────────┘

Agents initiate outbound WSS — no inbound ports on Proxmox hosts.

Phases

Roadmap for this deck

  1. Preflight — confirm prerequisites
  2. Local build — produce the two artifacts
  3. Server deploy — one-time LXC bring-up
  4. First agent — prove the pipeline end-to-end
  5. Test tier — 2–3 hosts for 24h
  6. Full rollout — the remaining fleet
  7. Rollback — because things go wrong
  8. Ongoing operations — upgrades, backups, rotation
  9. Go / No-Go — final sign-off
§ 1 Preflight

Hardware & network

Server LXC
Debian 12 · 1 GB RAM · 2 cores · 10 GB

Unprivileged. Covers >20 agents comfortably.

DNS
A record → public IP

Verify: dig +short monitor.example.com

Inbound
TCP 443 → server LXC

Caddy handles Let's Encrypt via HTTP-01.

Outbound
HTTPS from every Proxmox host

No inbound port required on hosts.

SSH root access: hypervisor + every Proxmox host.

§ 1 Preflight

Versions & tools

Proxmox fleet

  • VE 8.3+
  • OpenZFS 2.3+ (for -j JSON output)
  • Older hosts will report empty ZFS payloads

Build machine

  • Elixir 1.19 + OTP 28
  • Mix + Hex
  • Docker daemon running (for Linux binaries)
  • SSH, scp, sqlite3 (optional)
No Docker? Run ./scripts/build-linux.sh on the server LXC itself instead.
§ 1 Preflight

Secrets plan

Three values — keep in a password manager, never in git

Secret                     How to generate
DASHBOARD_PASSWORD_HASH    mix run -e 'IO.puts(Argon2.hash_pwd_salt("<pw>"))'
SECRET_KEY_BASE            mix phx.gen.secret (64-char secret)
Per-agent tokens           Admin UI → Add host reveals the token once
Tokens are shown once. Paste into your password manager before clicking away.
§ 2 Local build

Tests first

If either suite is red, stop

cd server && mix deps.get && mix test
cd ../agent && mix deps.get && mix test
Server
58 tests, 0 failures
Agent
23 tests, 0 failures

Never build a release from a branch with failing tests.

§ 2 Local build

Hash the password

cd server
mix run -e 'IO.puts(Argon2.hash_pwd_salt("your-password"))'

Output looks like:

$argon2id$v=19$m=65536,t=3,p=4$dSB9...$x0OQ...
Copy the whole $argon2id$... string into your password manager. The plaintext password never leaves your head / password manager.
§ 2 Local build

Server release

MIX_ENV=prod DASHBOARD_PASSWORD_HASH='placeholder' \
  mix release --overwrite

tar -czf /tmp/server_release.tgz -C _build/prod/rel server
ls -lh /tmp/server_release.tgz

Expected: ~30–60 MB tarball.

The placeholder hash only needs to exist so config/runtime.exs accepts it. The real hash is supplied on the LXC at start time.
§ 2 Local build

Agent binaries

cd ../agent
./scripts/build-linux.sh

Expected output:

Binaries written to /.../agent/dist:
  proxmox-monitor-agent_linux_amd64
  proxmox-monitor-agent_linux_arm64

Sanity check:

file dist/proxmox-monitor-agent_linux_amd64 | grep 'ELF 64-bit'

First build: 5–10 min. Subsequent builds: seconds (Docker layer cache).

§ 3 Server deploy

Create the LXC

On the hypervisor:

pct create 200 \
  /var/lib/vz/template/cache/debian-12-standard_12.7-1_amd64.tar.zst \
  --hostname proxmox-monitor \
  --memory 1024 --cores 2 \
  --rootfs local-zfs:10 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp \
  --unprivileged 1 --features nesting=0 --onboot 1

pct start 200
pct exec 200 -- ip -4 addr show eth0 | grep -Po 'inet \K[\d.]+'

Save the IP as LXC_IP. Typos here cost hours.
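The grep in the last pct exec line can be sanity-checked locally before touching the hypervisor. The sample below is canned ip -4 addr output, not from a real host:

```shell
# Same extraction as the pct exec line above, run on a canned sample
sample='2: eth0    inet 192.168.1.50/24 brd 192.168.1.255 scope global eth0'
LXC_IP=$(printf '%s\n' "$sample" | grep -Po 'inet \K[\d.]+')
echo "LXC_IP=$LXC_IP"   # → LXC_IP=192.168.1.50
```

The \K discards everything matched so far, so only the address itself is printed and the /24 suffix is dropped.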

§ 3 Server deploy

Base packages

pct enter 200 then:

apt-get update
apt-get install -y ca-certificates curl gnupg \
  debian-keyring debian-archive-keyring apt-transport-https sqlite3

curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | \
  gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' \
  > /etc/apt/sources.list.d/caddy-stable.list

apt-get update && apt-get install -y caddy
caddy version
§ 3 Server deploy

Upload and extract the release

From your workstation:

scp /tmp/server_release.tgz root@$LXC_IP:/tmp/

Inside the LXC:

mkdir -p /opt/proxmox-monitor
tar -xzf /tmp/server_release.tgz -C /opt/proxmox-monitor
ls /opt/proxmox-monitor/server/bin/
# server  migrate  server.bat  migrate.bat
§ 3 Server deploy

Environment file

install -d -m 0700 /var/lib/proxmox-monitor

cat > /etc/default/proxmox-monitor <<'EOF'
DATABASE_PATH=/var/lib/proxmox-monitor/monitor.db
SECRET_KEY_BASE=<paste-mix-phx.gen.secret-output>
DASHBOARD_PASSWORD_HASH=<paste-$argon2id$-hash>
PHX_SERVER=true
PHX_HOST=monitor.example.com
PORT=4000
EOF
chmod 0600 /etc/default/proxmox-monitor
The quoted 'EOF' heredoc delimiter matters. With an unquoted EOF the shell expands the $argon2id, $v=..., $m=... sequences inside the Argon2 hash and silently mangles it.
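The difference is easy to demonstrate with scratch files under /tmp (hash shortened for readability):

```shell
# Quoted delimiter: body is taken literally.
cat > /tmp/quoted <<'EOF'
DASHBOARD_PASSWORD_HASH=$argon2id$v=19$m=65536
EOF

# Unquoted delimiter: $... sequences undergo variable expansion.
cat > /tmp/unquoted <<EOF
DASHBOARD_PASSWORD_HASH=$argon2id$v=19$m=65536
EOF

cat /tmp/quoted     # hash intact
cat /tmp/unquoted   # $argon2id, $v, $m expanded to nothing: DASHBOARD_PASSWORD_HASH==19=65536
```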
§ 3 Server deploy

Migrate & systemd

set -a; . /etc/default/proxmox-monitor; set +a
/opt/proxmox-monitor/server/bin/server eval 'Server.Release.migrate()'

sqlite3 /var/lib/proxmox-monitor/monitor.db '.tables'
# hosts  metrics  schema_migrations

Then install the systemd unit (see runbook §3.6) with:

ExecStartPre=/opt/proxmox-monitor/server/bin/server eval 'Server.Release.migrate()'
ExecStart=/opt/proxmox-monitor/server/bin/server start
Restart=always
RestartSec=5
systemctl daemon-reload && systemctl enable --now proxmox-monitor
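If the runbook isn't at hand, a minimal unit consistent with the lines above looks like this. EnvironmentFile and the binary paths come from this deck; the target/dependency choices are assumptions to adjust:

```ini
# /etc/systemd/system/proxmox-monitor.service (sketch)
[Unit]
Description=Proxmox Monitor server
After=network-online.target
Wants=network-online.target

[Service]
EnvironmentFile=/etc/default/proxmox-monitor
ExecStartPre=/opt/proxmox-monitor/server/bin/server eval 'Server.Release.migrate()'
ExecStart=/opt/proxmox-monitor/server/bin/server start
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```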
§ 3 Server deploy

Caddy: TLS + WSS reverse-proxy

monitor.example.com {
    reverse_proxy 127.0.0.1:4000 {
        header_up X-Forwarded-Proto {scheme}
        header_up X-Forwarded-For {remote_host}
        transport http {
            read_timeout 90s
            dial_timeout 10s
        }
    }
}
read_timeout 90s is critical. Without it, every agent's WebSocket is torn down every ~30s and the dashboard stays permanently offline-looking.
caddy validate --config /etc/caddy/Caddyfile
systemctl reload caddy
§ 3 Server deploy

Server smoke test

From anywhere on the internet:

curl -s https://monitor.example.com/health

Expected:

{"db":"ok","status":"ok","version":"0.1.0"}

Then in a browser: open https://monitor.example.com and log in with the dashboard password.

Login loops on "Incorrect password"? DASHBOARD_PASSWORD_HASH was not pasted correctly. Re-generate and redeploy per §3.4.
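For scripted gates (uptime checks, rollout scripts) matching the status field is enough. A sketch, run here against the sample payload above rather than a live server:

```shell
# Gate helper: succeed only when status is "ok" in the health JSON
health_ok() { printf '%s' "$1" | grep -q '"status":"ok"'; }

sample='{"db":"ok","status":"ok","version":"0.1.0"}'
health_ok "$sample" && echo PASS || echo FAIL   # → PASS
```

In production you would feed it the output of curl -fsS https://monitor.example.com/health instead of the canned string.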
§ 4 First agent

Register in the admin UI

  1. Browser → /admin/hosts
  2. "Register a new host" → enter short name (e.g. pve-host-01)
  3. Click Add
  4. The page reveals a token — copy it now
Tokens are shown exactly once. If you close the page without copying, click Rotate and copy the new token.
§ 4 First agent

Deploy binary + config

export HOST=pve-host-01

scp agent/dist/proxmox-monitor-agent_linux_amd64 \
    root@$HOST:/usr/local/bin/proxmox-monitor-agent
ssh root@$HOST 'chmod 0755 /usr/local/bin/proxmox-monitor-agent'

scp agent/rel/proxmox-monitor-agent.service \
    root@$HOST:/etc/systemd/system/

On the host — write the TOML config:

install -d -m 0700 /etc/proxmox-monitor /var/cache/proxmox-monitor-agent

cat > /etc/proxmox-monitor/agent.toml <<'EOF'
server_url = "wss://monitor.example.com/socket/websocket"
token = "<paste-token-from-dashboard>"
host_id = "pve-host-01"

[intervals]
fast_seconds = 30
medium_seconds = 300
slow_seconds = 1800
EOF
chmod 0600 /etc/proxmox-monitor/agent.toml
§ 4 First agent

Enable and verify

systemctl daemon-reload
systemctl enable --now proxmox-monitor-agent
journalctl -u proxmox-monitor-agent -f

Expected within 10s:

agent: starting with host_id=pve-host-01
reporter: connected, joining host:pve-host-01
reporter: joined host:pve-host-01

Reload the dashboard — the card should be online (green border) with Load / RAM / Pools / VMs populated.

§ 4 First agent

Verify the offline flip

Test

ssh root@$HOST \
  'systemctl stop proxmox-monitor-agent'

Dashboard card grey within ~1s.

ssh root@$HOST \
  'systemctl start proxmox-monitor-agent'

Green again within 30s.

If the card stays green

Channel terminate callback didn't run — usually Caddy.

  • Check /etc/caddy/Caddyfile has read_timeout 90s
  • systemctl reload caddy after fixing
§ 5 Test tier

2–3 hosts for 24h

Pick non-critical hosts, or hosts with independent monitoring to fall back on.

What to look for overnight:

Tests to actively run

§ 5 Test tier

Go / No-Go gate

Do NOT proceed to full rollout unless ALL are true for 24h

  • All test-tier hosts show online continuously
  • No repeating error lines in server logs
  • Retention has pruned at least one row
  • Token rotation + restart behaves as designed
  • Server-reboot drill: all agents recover without intervention
  • Dashboard is responsive (<1s LiveView updates)
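"No repeating error lines" is easiest to eyeball with a frequency count. The pipeline, demoed on canned log lines; on the LXC you would feed it journalctl -u proxmox-monitor instead:

```shell
# Frequency-count lines; repeated errors bubble to the top
printf '%s\n' \
  'reporter: joined host:pve-host-01' \
  'error: token rejected' \
  'error: token rejected' \
  | sort | uniq -c | sort -rn
```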
§ 6 Full rollout

Batch loop

After 3–4 hosts by hand, batch:

for HOST in pve-host-04 pve-host-05 pve-host-06; do
  echo "Register $HOST in admin UI, paste token:"
  read -s TOKEN

  scp agent/dist/proxmox-monitor-agent_linux_amd64 \
      root@$HOST:/usr/local/bin/proxmox-monitor-agent
  scp agent/rel/proxmox-monitor-agent.service \
      root@$HOST:/etc/systemd/system/

  ssh root@$HOST "chmod 0755 /usr/local/bin/proxmox-monitor-agent &&
    install -d -m 0700 /etc/proxmox-monitor /var/cache/proxmox-monitor-agent &&
    cat > /etc/proxmox-monitor/agent.toml <<EOF
server_url = \"wss://monitor.example.com/socket/websocket\"
token = \"$TOKEN\"
host_id = \"$HOST\"
EOF
    chmod 0600 /etc/proxmox-monitor/agent.toml &&
    systemctl daemon-reload &&
    systemctl enable --now proxmox-monitor-agent"
done

After each batch of ~5: spot-check cards, filter for offline, open a random host detail.

§ 7 Rollback

Four escape hatches

One agent
ssh root@$HOST \
  'systemctl disable --now proxmox-monitor-agent'
Whole service
systemctl stop proxmox-monitor
systemctl stop caddy
Previous release
systemctl stop proxmox-monitor
rm -rf /opt/proxmox-monitor/server
tar -xzf /tmp/server_release_PREV.tgz \
    -C /opt/proxmox-monitor
systemctl start proxmox-monitor
Restore DB
systemctl stop proxmox-monitor
cp /var/backups/proxmox-monitor/monitor-YYYY-MM-DD.db \
   /var/lib/proxmox-monitor/monitor.db
systemctl start proxmox-monitor

Agent tokens live in the database, so a restore brings them back and agents reconnect unchanged. Metrics written after the backup are lost: at most 48h of data, given the retention policy.

§ 8 Ongoing ops

Upgrades

Server

cd server
MIX_ENV=prod DASHBOARD_PASSWORD_HASH='placeholder' \
  mix release --overwrite
tar -czf /tmp/server_release.tgz -C _build/prod/rel server
scp /tmp/server_release.tgz root@$LXC:/tmp/

ssh root@$LXC '
  systemctl stop proxmox-monitor
  mv /opt/proxmox-monitor/server{,.old}
  tar -xzf /tmp/server_release.tgz -C /opt/proxmox-monitor
  systemctl start proxmox-monitor   # ExecStartPre runs migrate
'

Verify /health then delete server.old.

Agent

scp agent/dist/proxmox-monitor-agent_linux_amd64 \
    root@$HOST:/usr/local/bin/proxmox-monitor-agent.new

ssh root@$HOST '
  mv /usr/local/bin/proxmox-monitor-agent{.new,}
  systemctl restart proxmox-monitor-agent
'

No DB on the host, so agent upgrades are trivially atomic.

§ 8 Ongoing ops

SQLite backups

Install as a cron inside the LXC — keeps 30 daily snapshots:

cat > /etc/cron.d/proxmox-monitor-backup <<'EOF'
30 3 * * * root install -d -m 0700 /var/backups/proxmox-monitor && sqlite3 /var/lib/proxmox-monitor/monitor.db ".backup /var/backups/proxmox-monitor/monitor-$(date +\%Y-\%m-\%d).db" && find /var/backups/proxmox-monitor -name 'monitor-*.db' -mtime +30 -delete
EOF

Keep the job on one line: cron does not honor backslash continuations in /etc/cron.d entries.

SQLite's online-backup command is safe while the server is running.

Verify at least one run before declaring the rollout complete.
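One way to check: does yesterday's snapshot exist? A sketch assuming GNU date and the path from the cron entry above:

```shell
# Yesterday's expected snapshot (GNU date; path matches the cron entry)
f="/var/backups/proxmox-monitor/monitor-$(date -d yesterday +%Y-%m-%d).db"
if [ -f "$f" ]; then echo "backup ok: $f"; else echo "MISSING: $f"; fi
```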

§ 9 Sign-off

Production readiness

  • /health returns 200 with status:ok
  • External uptime monitor configured and green
  • All intended Proxmox hosts on overview, all online
  • ≥1 full 48h retention cycle observed (pruning log present)
  • SQLite backup cron installed and yesterday's file exists
  • You have rolled back once on purpose (drill)
§ 9 Sign-off

Secrets hygiene

  • Dashboard password in a password manager, not a text file
  • SECRET_KEY_BASE in a password manager
  • /etc/default/proxmox-monitor is 0600 root:root
  • /etc/proxmox-monitor/agent.toml is 0600 root:root on every host
  • You can rotate an agent token in <2 minutes
  • A teammate has been walked through one agent install and one token rotation live
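The host-side half of a rotation is one sed plus a restart. The edit itself, demoed on a scratch copy of agent.toml; the token values are placeholders, and on a real host the path is /etc/proxmox-monitor/agent.toml followed by systemctl restart proxmox-monitor-agent:

```shell
# Rewrite the token line in-place (GNU sed), here on a temp copy
tmp=$(mktemp)
printf '%s\n' 'token = "old-token"' 'host_id = "pve-host-01"' > "$tmp"
sed -i 's/^token = .*/token = "new-token"/' "$tmp"
grep '^token' "$tmp"   # → token = "new-token"
```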
Appendix A

Common errors

Symptom                                    First thing to check
CERT_AUTHORITY_INVALID in browser          Caddy hasn't finished LE issuance. Wait 60s. journalctl -u caddy.
Login loops on correct password            DASHBOARD_PASSWORD_HASH mismatch. Regenerate and redeploy.
Card stays offline after agent restart     Wrong token or unknown_host. Check agent journal.
All agents reconnect every ~30s            Caddy read_timeout missing or too short.
/health returns 503                        Process up but DB unreadable. Check permissions + DATABASE_PATH.
LXC can't bind port 4000                   Another process owns it. ss -ltnp | grep 4000.
Agent logs {:enoent, "pvesh"}              Not a Proxmox host, or empty $PATH under systemd.
Appendix B

File & port cheat sheet

Server LXC

/opt/proxmox-monitor/server/       release tree
/etc/default/proxmox-monitor       env secrets, 0600
/etc/systemd/system/proxmox-monitor.service
/etc/caddy/Caddyfile
/var/lib/proxmox-monitor/monitor.db
/var/backups/proxmox-monitor/      daily backups

tcp 443 (caddy) → tcp 127.0.0.1:4000 (phoenix)

Proxmox host (per agent)

/usr/local/bin/proxmox-monitor-agent
/etc/proxmox-monitor/agent.toml    token, 0600
/etc/systemd/system/proxmox-monitor-agent.service
/var/cache/proxmox-monitor-agent/  Burrito unpack

no listening ports
Done

MVP in production

All four phases from the concept shipped: monitoring skeleton, ZFS/VM/storage collectors, LiveView dashboard, packaged binaries. The operator has the runbook; agents report; retention prunes; backups run. Everything else is iteration.

Full runbook: SETUP-AND-DEPLOY.md · Concept: proxmox-monitor-konzept.md