Agent-server monitoring for Proxmox hosts. Elixir/OTP backend, Burrito-packaged agents, Phoenix LiveView dashboard. This deck walks you from a clean environment to 20 hosts reporting, in order, with verification at every step.
SETUP-AND-DEPLOY.md · ~2–3h end-to-end + host rollout time
Agents initiate outbound WSS — no inbound ports on Proxmox hosts.
Unprivileged. Covers >20 agents comfortably.
Verify: dig +short monitor.example.com
Caddy handles Let's Encrypt via HTTP-01.
No inbound port required on hosts.
SSH root access: hypervisor + every Proxmox host.
-j JSON output)
sqlite3 (optional)
./scripts/build-linux.sh on the server LXC itself instead.
| Secret | How to generate |
|---|---|
| DASHBOARD_PASSWORD_HASH | mix run -e 'IO.puts(Argon2.hash_pwd_salt("<pw>"))' |
| SECRET_KEY_BASE | mix phx.gen.secret (64-byte base64) |
| Per-agent tokens | Admin UI → Add host reveals token once |
cd server && mix deps.get && mix test
cd ../agent && mix deps.get && mix test
Never build a release from a branch with failing tests.
cd server
mix run -e 'IO.puts(Argon2.hash_pwd_salt("your-password"))'
Output looks like:
$argon2id$v=19$m=65536,t=3,p=4$dSB9...$x0OQ...
Paste the full $argon2id$... string into your password manager.
The plaintext password never leaves your head / password manager.
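A pasted hash that lost a segment to shell expansion fails silently until login. The sketch below checks only the shape of the string against the Argon2id format shown above; it does not verify a password. The helper name looks_like_argon2id is hypothetical and not part of the project.

```shell
# looks_like_argon2id: hypothetical helper, not part of the project.
# Succeeds if the argument has the shape
#   $argon2id$v=N$m=N,t=N,p=N$<salt>$<hash>
# It checks the format only and never touches the password itself.
looks_like_argon2id() {
  printf '%s\n' "$1" |
    grep -Eq '^\$argon2id\$v=[0-9]+\$m=[0-9]+,t=[0-9]+,p=[0-9]+\$[A-Za-z0-9+/]+\$[A-Za-z0-9+/]+$'
}

# Usage: always pass the hash in single quotes so the shell leaves the $ alone.
#   looks_like_argon2id '<paste-hash-here>' && echo "shape ok"
```

A hash pasted inside double quotes typically arrives with segments missing and fails this check.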
MIX_ENV=prod DASHBOARD_PASSWORD_HASH='placeholder' \
mix release --overwrite
tar -czf /tmp/server_release.tgz -C _build/prod/rel server
ls -lh /tmp/server_release.tgz
Expected: ~30–60 MB tarball.
The placeholder hash only needs to exist so that config/runtime.exs
accepts it. The real hash is supplied on the LXC at start time.
cd ../agent
./scripts/build-linux.sh
Expected output:
Binaries written to /.../agent/dist:
proxmox-monitor-agent_linux_amd64
proxmox-monitor-agent_linux_arm64
Sanity check:
file dist/proxmox-monitor-agent_linux_amd64 | grep 'ELF 64-bit'
First build: 5–10 min. Subsequent builds: seconds (Docker layer cache).
On the hypervisor:
pct create 200 \
/var/lib/vz/template/cache/debian-12-standard_12.7-1_amd64.tar.zst \
--hostname proxmox-monitor \
--memory 1024 --cores 2 \
--rootfs local-zfs:10 \
--net0 name=eth0,bridge=vmbr0,ip=dhcp \
--unprivileged 1 --features nesting=0 --onboot 1
pct start 200
pct exec 200 -- ip -4 addr show eth0 | grep -Po 'inet \K[\d.]+'
Save the IP as LXC_IP. Typos here cost hours.
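Because a mistyped address is costly later, the saved value can be checked before it is exported. This is a minimal sketch; is_ipv4 is a hypothetical helper name, and it only validates dotted-quad syntax, not reachability.

```shell
# is_ipv4: hypothetical helper, not part of the project.
# Succeeds only for four dot-separated decimal octets, each in 0-255.
is_ipv4() {
  # Shape check first: digits and dots only, exactly four groups.
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  for octet in "$@"; do
    [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# Usage on the hypervisor (command taken from the step above):
#   LXC_IP=$(pct exec 200 -- ip -4 addr show eth0 | grep -Po 'inet \K[\d.]+')
#   is_ipv4 "$LXC_IP" && export LXC_IP || echo "unexpected value: $LXC_IP"
```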
Run pct enter 200, then:
apt-get update
apt-get install -y ca-certificates curl gnupg \
debian-keyring debian-archive-keyring apt-transport-https sqlite3
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | \
gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' \
> /etc/apt/sources.list.d/caddy-stable.list
apt-get update && apt-get install -y caddy
caddy version
From your workstation:
scp /tmp/server_release.tgz root@$LXC_IP:/tmp/
Inside the LXC:
mkdir -p /opt/proxmox-monitor
tar -xzf /tmp/server_release.tgz -C /opt/proxmox-monitor
ls /opt/proxmox-monitor/server/bin/
# server migrate server.bat migrate.bat
install -d -m 0700 /var/lib/proxmox-monitor
cat > /etc/default/proxmox-monitor <<'EOF'
DATABASE_PATH=/var/lib/proxmox-monitor/monitor.db
SECRET_KEY_BASE=<paste-mix-phx.gen.secret-output>
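DASHBOARD_PASSWORD_HASH='<paste-$argon2id$-hash>'
PHX_SERVER=true
PHX_HOST=monitor.example.com
PORT=4000
EOF
chmod 0600 /etc/default/proxmox-monitor
The quoted 'EOF' delimiter stops the shell from expanding the $ characters in the Argon2 hash while the file is written. Keep the single quotes around the hash value as well: the file is sourced by a shell in the next step, and an unquoted value would be expanded at that point.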
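Before the env file is used, it is worth confirming that every key is filled in and that no paste placeholder is left. The sketch below assumes the key names from the file above; check_env_file is a hypothetical helper, not something the release ships.

```shell
# check_env_file FILE: hypothetical helper, not part of the project.
# Prints "missing: KEY" for every required key that is absent, empty, or still
# holds a <paste-...> placeholder. Returns non-zero if any key is reported.
check_env_file() {
  file=$1
  missing=0
  for key in DATABASE_PATH SECRET_KEY_BASE DASHBOARD_PASSWORD_HASH PHX_HOST PORT; do
    if ! grep -Eq "^${key}=.+" "$file" || grep -Eq "^${key}=.*<paste-" "$file"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}

# Usage inside the LXC:
#   check_env_file /etc/default/proxmox-monitor && echo "env file complete"
```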
set -a; . /etc/default/proxmox-monitor; set +a
/opt/proxmox-monitor/server/bin/server eval 'Server.Release.migrate()'
sqlite3 /var/lib/proxmox-monitor/monitor.db '.tables'
# hosts metrics schema_migrations
Then install the systemd unit (see runbook §3.6) with:
ExecStartPre=/opt/proxmox-monitor/server/bin/server eval 'Server.Release.migrate()'
ExecStart=/opt/proxmox-monitor/server/bin/server start
Restart=always
RestartSec=5
systemctl daemon-reload && systemctl enable --now proxmox-monitor
monitor.example.com {
reverse_proxy 127.0.0.1:4000 {
header_up X-Forwarded-Proto {scheme}
header_up X-Forwarded-For {remote_host}
transport http {
read_timeout 90s
dial_timeout 10s
}
}
}
read_timeout 90s is critical. Without it, every agent's
WebSocket is torn down every ~30s and the dashboard stays permanently offline-looking.
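The Caddyfile can be checked mechanically for this setting. The sketch below reads read_timeout and compares it with the agent's fast interval (30 seconds in the agent config used in this deck). That the timeout must exceed the fast interval is an assumption drawn from those two numbers; the source does not state the rule. Both helper names are hypothetical.

```shell
# read_timeout_seconds FILE: hypothetical helper, not part of the project.
# Prints the first read_timeout value in a Caddyfile, assuming it is written
# in whole seconds such as "90s". Prints nothing if no such line exists.
read_timeout_seconds() {
  sed -n 's/^[[:space:]]*read_timeout[[:space:]]\{1,\}\([0-9]\{1,\}\)s[[:space:]]*$/\1/p' "$1" |
    head -n 1
}

# timeout_covers_interval FILE FAST_SECONDS
# Succeeds when read_timeout is present and strictly greater than FAST_SECONDS.
# The "greater than the fast interval" rule is an assumption, see the lead-in.
timeout_covers_interval() {
  t=$(read_timeout_seconds "$1")
  [ -n "$t" ] && [ "$t" -gt "$2" ]
}

# Usage inside the LXC:
#   timeout_covers_interval /etc/caddy/Caddyfile 30 || echo "read_timeout missing or too short"
```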
caddy validate --config /etc/caddy/Caddyfile
systemctl reload caddy
From anywhere on the internet:
curl -s https://monitor.example.com/health
Expected:
{"db":"ok","status":"ok","version":"0.1.0"}
Then browser:
https://monitor.example.com/ → redirects to /login
If login fails with the correct password, DASHBOARD_PASSWORD_HASH was not pasted correctly. Re-generate and redeploy §3.4.
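The /health check can be scripted so it runs the same way after every deploy. This is a minimal sketch that assumes the compact JSON shape shown in the expected output; health_ok is a hypothetical helper name.

```shell
# health_ok: hypothetical helper, not part of the project.
# Reads a /health response body on stdin and succeeds only if both
# "status" and "db" are "ok". It assumes compact JSON without spaces around
# the colon, as in the expected output shown in this deck.
health_ok() {
  body=$(cat)
  printf '%s' "$body" | grep -q '"status":"ok"' &&
    printf '%s' "$body" | grep -q '"db":"ok"'
}

# Usage from any machine with curl:
#   curl -fsS https://monitor.example.com/health | health_ok && echo healthy
```

With -f, curl exits non-zero on an HTTP error status, so a 503 reaches health_ok as an empty body and the check fails.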
In the dashboard, open /admin/hosts and add the host (e.g. pve-host-01). Copy the token; it is revealed only once.
export HOST=pve-host-01
scp agent/dist/proxmox-monitor-agent_linux_amd64 \
root@$HOST:/usr/local/bin/proxmox-monitor-agent
ssh root@$HOST 'chmod 0755 /usr/local/bin/proxmox-monitor-agent'
scp agent/rel/proxmox-monitor-agent.service \
root@$HOST:/etc/systemd/system/
On the host — write the TOML config:
install -d -m 0700 /etc/proxmox-monitor /var/cache/proxmox-monitor-agent
cat > /etc/proxmox-monitor/agent.toml <<'EOF'
server_url = "wss://monitor.example.com/socket/websocket"
token = "<paste-token-from-dashboard>"
host_id = "pve-host-01"
[intervals]
fast_seconds = 30
medium_seconds = 300
slow_seconds = 1800
EOF
chmod 0600 /etc/proxmox-monitor/agent.toml
systemctl daemon-reload
systemctl enable --now proxmox-monitor-agent
journalctl -u proxmox-monitor-agent -f
Expected within 10s:
agent: starting with host_id=pve-host-01
reporter: connected, joining host:pve-host-01
reporter: joined host:pve-host-01
Reload the dashboard — the card should be online (green border) with Load / RAM / Pools / VMs populated.
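Rather than watching the journal by eye, the join line can be matched in a script. The sketch below assumes the log format shown in the expected output; agent_joined is a hypothetical helper name.

```shell
# agent_joined HOST_ID: hypothetical helper, not part of the project.
# Reads log text on stdin and succeeds if some line ends with
# "reporter: joined host:HOST_ID". Matching on the end of the line keeps
# pve-host-01 from matching pve-host-010.
agent_joined() {
  awk -v want="reporter: joined host:$1" '
    length($0) >= length(want) &&
      substr($0, length($0) - length(want) + 1) == want { found = 1 }
    END { exit !found }
  '
}

# Usage on the Proxmox host:
#   journalctl -u proxmox-monitor-agent --since "1 min ago" --no-pager | agent_joined "$HOST"
```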
ssh root@$HOST \
'systemctl stop proxmox-monitor-agent'
Dashboard card grey within ~1s.
ssh root@$HOST \
'systemctl start proxmox-monitor-agent'
Green again within 30s.
Channel terminate callback didn't run — usually Caddy.
/etc/caddy/Caddyfile has read_timeout 90s
systemctl reload caddy after fixing
Pick non-critical hosts, or hosts with independent monitoring to fall back on.
What to look for overnight:
[error] lines in server log
retention: pruned N stale samples (starts firing after 48h)
systemctl restart proxmox-monitor on the server → all agents flip offline, then green within 30s. No stuck agents.
After 3–4 hosts by hand, batch:
for HOST in pve-host-04 pve-host-05 pve-host-06; do
  echo "Register $HOST in admin UI, paste token:"
  # -r keeps backslashes literal, -s keeps the token off the screen
  read -rs TOKEN
  scp agent/dist/proxmox-monitor-agent_linux_amd64 \
    root@$HOST:/usr/local/bin/proxmox-monitor-agent
  scp agent/rel/proxmox-monitor-agent.service \
    root@$HOST:/etc/systemd/system/
  # $TOKEN and $HOST expand locally; the quoted 'EOF' stops the remote
  # shell from expanding the config a second time.
  ssh root@$HOST "chmod 0755 /usr/local/bin/proxmox-monitor-agent &&
    install -d -m 0700 /etc/proxmox-monitor /var/cache/proxmox-monitor-agent &&
    cat > /etc/proxmox-monitor/agent.toml <<'EOF'
server_url = \"wss://monitor.example.com/socket/websocket\"
token = \"$TOKEN\"
host_id = \"$HOST\"
EOF
    chmod 0600 /etc/proxmox-monitor/agent.toml &&
    systemctl daemon-reload &&
    systemctl enable --now proxmox-monitor-agent"
done
After each batch of ~5: spot-check cards, filter for offline, open a random host detail.
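The escaped quotes in the batch loop are easy to get wrong. One alternative is to render the config locally and pipe it to the host. The sketch below is one way to do that; render_agent_toml is a hypothetical helper, and it emits the same three keys as the loop above.

```shell
# render_agent_toml HOST_ID TOKEN: hypothetical helper, not part of the project.
# Prints the agent config to stdout. The token is inserted as-is: text that
# comes from a variable is not expanded a second time inside the heredoc.
render_agent_toml() {
  cat <<EOF
server_url = "wss://monitor.example.com/socket/websocket"
token = "$2"
host_id = "$1"
EOF
}

# Usage from the workstation; umask 077 makes a newly created file 0600:
#   render_agent_toml "$HOST" "$TOKEN" |
#     ssh root@$HOST 'umask 077; cat > /etc/proxmox-monitor/agent.toml'
```

The remote command is single-quoted, so nothing in the token is ever seen by the remote shell's parser.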
ssh root@$HOST \
'systemctl disable --now proxmox-monitor-agent'
systemctl stop proxmox-monitor
systemctl stop caddy
systemctl stop proxmox-monitor
rm -rf /opt/proxmox-monitor/server
tar -xzf /tmp/server_release_PREV.tgz \
-C /opt/proxmox-monitor
systemctl start proxmox-monitor
systemctl stop proxmox-monitor
cp /var/backups/proxmox-monitor/monitor-YYYY-MM-DD.db \
/var/lib/proxmox-monitor/monitor.db
systemctl start proxmox-monitor
Tokens survive DB restores. Metrics recorded after the backup are lost (48h max by retention policy).
cd server
MIX_ENV=prod DASHBOARD_PASSWORD_HASH='placeholder' \
mix release --overwrite
tar -czf /tmp/server_release.tgz -C _build/prod/rel server
scp /tmp/server_release.tgz root@$LXC_IP:/tmp/
ssh root@$LXC_IP '
systemctl stop proxmox-monitor
mv /opt/proxmox-monitor/server{,.old}
tar -xzf /tmp/server_release.tgz -C /opt/proxmox-monitor
systemctl start proxmox-monitor # ExecStartPre runs migrate
'
Verify /health then delete server.old.
scp agent/dist/proxmox-monitor-agent_linux_amd64 \
root@$HOST:/usr/local/bin/proxmox-monitor-agent.new
ssh root@$HOST '
mv /usr/local/bin/proxmox-monitor-agent{.new,}
systemctl restart proxmox-monitor-agent
'
No DB on the host, so agent upgrades are trivially atomic.
Install as a cron inside the LXC — keeps 30 daily snapshots:
cat > /etc/cron.d/proxmox-monitor-backup <<'EOF'
# cron has no line continuation: the whole job must stay on one line
30 3 * * * root install -d -m 0700 /var/backups/proxmox-monitor && sqlite3 /var/lib/proxmox-monitor/monitor.db ".backup /var/backups/proxmox-monitor/monitor-$(date +\%Y-\%m-\%d).db" && find /var/backups/proxmox-monitor -name 'monitor-*.db' -mtime +30 -delete
EOF
SQLite's online-backup command is safe while the server is running.
Verify at least one run before declaring the rollout complete.
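Checking that a run produced a backup can be scripted. The sketch below relies on the monitor-YYYY-MM-DD.db naming from the cron job, which sorts by date. Both helper names are hypothetical, and -maxdepth is a GNU/BSD find extension (available on Debian).

```shell
# latest_backup DIR: hypothetical helper, not part of the project.
# Prints the newest monitor-YYYY-MM-DD.db in DIR. The date-stamped names sort
# chronologically, so a plain sort is enough.
latest_backup() {
  find "$1" -maxdepth 1 -type f -name 'monitor-*.db' | sort | tail -n 1
}

# backup_present DIR: succeeds if a newest backup exists and is non-empty.
backup_present() {
  f=$(latest_backup "$1")
  [ -n "$f" ] && [ -s "$f" ]
}

# Usage inside the LXC:
#   backup_present /var/backups/proxmox-monitor || echo "no usable backup yet"
```

For a deeper check, running sqlite3 "$(latest_backup /var/backups/proxmox-monitor)" 'PRAGMA integrity_check;' prints ok for a healthy database file.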
/health returns 200 with status:ok
SECRET_KEY_BASE in a password manager
/etc/default/proxmox-monitor is 0600 root:root
/etc/proxmox-monitor/agent.toml is 0600 root:root on every host

| Symptom | First thing to check |
|---|---|
| CERT_AUTHORITY_INVALID in browser | Caddy hasn't finished LE issuance. Wait 60s. journalctl -u caddy. |
| Login loops on correct password | DASHBOARD_PASSWORD_HASH mismatch. Regenerate and redeploy. |
| Card stays offline after agent restart | Wrong token or unknown_host. Check agent journal. |
| All agents reconnect every ~30s | Caddy read_timeout missing or too short. |
| /health returns 503 | Process up but DB unreadable. Check permissions + DATABASE_PATH. |
| LXC can't bind port 4000 | Another process owns it. ss -ltnp \| grep 4000. |
| Agent logs {:enoent, "pvesh"} | Not a Proxmox host, or empty $PATH under systemd. |
/opt/proxmox-monitor/server/ release tree
/etc/default/proxmox-monitor env secrets, 0600
/etc/systemd/system/proxmox-monitor.service
/etc/caddy/Caddyfile
/var/lib/proxmox-monitor/monitor.db
/var/backups/proxmox-monitor/ daily backups
tcp 443 (caddy) → tcp 127.0.0.1:4000 (phoenix)
/usr/local/bin/proxmox-monitor-agent
/etc/proxmox-monitor/agent.toml token, 0600
/etc/systemd/system/proxmox-monitor-agent.service
/var/cache/proxmox-monitor-agent/ Burrito unpack
no listening ports
All four phases from the concept shipped: monitoring skeleton, ZFS/VM/storage collectors, LiveView dashboard, packaged binaries. The operator has the runbook; agents report; retention prunes; backups run. Everything else is iteration.
Full runbook: SETUP-AND-DEPLOY.md · Concept: proxmox-monitor-konzept.md