diff --git a/SETUP-AND-DEPLOY-slides.html b/SETUP-AND-DEPLOY-slides.html
new file mode 100644
index 0000000..a10f2a1
--- /dev/null
+++ b/SETUP-AND-DEPLOY-slides.html
@@ -0,0 +1,1028 @@
+Agent-server monitoring for Proxmox hosts. Elixir/OTP backend, Burrito-packaged agents,
+Phoenix LiveView dashboard. This deck walks you from a clean environment to 20 hosts reporting,
+in order, with verification at every step.
+SETUP-AND-DEPLOY.md · ~2–3h end-to-end + host rollout time
+ Agents initiate outbound WSS — no inbound ports on Proxmox hosts.
+Unprivileged. Covers >20 agents comfortably.
+Verify: dig +short monitor.example.com
Caddy handles Let's Encrypt via HTTP-01.
+No inbound port required on hosts.
+SSH root access: hypervisor + every Proxmox host.
+-j JSON output)
+sqlite3 (optional)
+./scripts/build-linux.sh on the server LXC itself instead.
| Secret | How to generate |
|---|---|
| DASHBOARD_PASSWORD_HASH | mix run -e 'IO.puts(Argon2.hash_pwd_salt("<pw>"))' |
| SECRET_KEY_BASE | mix phx.gen.secret (64-character random string) |
| Per-agent tokens | Admin UI → Add host reveals token once |
cd server && mix deps.get && mix test
+cd ../agent && mix deps.get && mix test
+ Never build a release from a branch with failing tests.
+cd server
+mix run -e 'IO.puts(Argon2.hash_pwd_salt("your-password"))'
+ Output looks like:
+$argon2id$v=19$m=65536,t=3,p=4$dSB9...$x0OQ...
+ Store the full $argon2id$... string in your password manager.
+ The plaintext password never leaves your head / password manager.
+ MIX_ENV=prod DASHBOARD_PASSWORD_HASH='placeholder' \
+ mix release --overwrite
+
+tar -czf /tmp/server_release.tgz -C _build/prod/rel server
+ls -lh /tmp/server_release.tgz
+ Expected: ~30–60 MB tarball.
+The placeholder hash only needs to exist so config/runtime.exs
+ accepts it. The real hash is supplied on the LXC at start time.
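The size expectation above can be turned into a guard before the tarball is shipped. This is a minimal sketch: `check_size_mb` is a hypothetical helper, not part of the repository, and the 20 to 80 MB bounds in the usage line are an assumed tolerance around the stated ~30–60 MB.

```shell
# Hypothetical helper: fail if a file's size falls outside [min_mb, max_mb].
check_size_mb() {
  file=$1 min_mb=$2 max_mb=$3
  # wc -c reads the byte count; a missing file makes the redirect fail.
  bytes=$(wc -c < "$file") || return 2
  mb=$(( bytes / 1024 / 1024 ))
  if [ "$mb" -lt "$min_mb" ] || [ "$mb" -gt "$max_mb" ]; then
    echo "unexpected size: ${mb} MB (wanted ${min_mb}-${max_mb} MB)" >&2
    return 1
  fi
  echo "ok: ${mb} MB"
}
```

Usage: `check_size_mb /tmp/server_release.tgz 20 80`. A tarball far below the range usually means the release directory was incomplete when it was archived.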
+ cd ../agent
+./scripts/build-linux.sh
+ Expected output:
+Binaries written to /.../agent/dist:
+ proxmox-monitor-agent_linux_amd64
+ proxmox-monitor-agent_linux_arm64
+ Sanity check:
+file dist/proxmox-monitor-agent_linux_amd64 | grep 'ELF 64-bit'
+ First build: 5–10 min. Subsequent builds: seconds (Docker layer cache).
+On the hypervisor:
+pct create 200 \
+ /var/lib/vz/template/cache/debian-12-standard_12.7-1_amd64.tar.zst \
+ --hostname proxmox-monitor \
+ --memory 1024 --cores 2 \
+ --rootfs local-zfs:10 \
+ --net0 name=eth0,bridge=vmbr0,ip=dhcp \
+ --unprivileged 1 --features nesting=0 --onboot 1
+
+pct start 200
+pct exec 200 -- ip -4 addr show eth0 | grep -Po 'inet \K[\d.]+'
+ Save the IP as LXC_IP. Typos here cost hours.
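Since a mistyped address is called out as an expensive mistake, a format check before using `LXC_IP` may be worth the few lines. The sketch below assumes an IPv4 address (the `pct create` call above uses DHCP on IPv4); `is_ipv4` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only for four dot-separated octets, each 0-255.
is_ipv4() {
  printf '%s\n' "$1" | awk -F. '
    NF != 4 { exit 1 }
    { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }
  '
}
```

Usage: `is_ipv4 "$LXC_IP" || echo "LXC_IP looks wrong: '$LXC_IP'" >&2`. The check catches truncated addresses, stray whitespace, and letters typed in place of digits; it cannot tell whether the address belongs to the right container.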
+Run pct enter 200, then:
apt-get update
+apt-get install -y ca-certificates curl gnupg \
+ debian-keyring debian-archive-keyring apt-transport-https sqlite3
+
+curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | \
+ gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
+curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' \
+ > /etc/apt/sources.list.d/caddy-stable.list
+
+apt-get update && apt-get install -y caddy
+caddy version
+ From your workstation:
+scp /tmp/server_release.tgz root@$LXC_IP:/tmp/
+ Inside the LXC:
+mkdir -p /opt/proxmox-monitor
+tar -xzf /tmp/server_release.tgz -C /opt/proxmox-monitor
+ls /opt/proxmox-monitor/server/bin/
+# server migrate server.bat migrate.bat
+ install -d -m 0700 /var/lib/proxmox-monitor
+
+cat > /etc/default/proxmox-monitor <<'EOF'
+DATABASE_PATH=/var/lib/proxmox-monitor/monitor.db
+SECRET_KEY_BASE=<paste-mix-phx.gen.secret-output>
+DASHBOARD_PASSWORD_HASH='<paste-$argon2id$-hash>'
+PHX_SERVER=true
+PHX_HOST=monitor.example.com
+PORT=4000
+EOF
+chmod 0600 /etc/default/proxmox-monitor
+ The quoted heredoc delimiter ('EOF') keeps the shell from expanding the $ characters in the Argon2 hash.
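A forgotten placeholder or a missing key in the env file tends to surface only at service start. The following sketch checks for the six keys written above and for leftover `<paste-...>` markers; `check_env_file` is a hypothetical helper, and it does not validate the values themselves.

```shell
# Hypothetical helper: verify the env file defines every key from the heredoc
# above and that no "<paste-...>" placeholder was left unfilled.
check_env_file() {
  f=$1
  rc=0
  for key in DATABASE_PATH SECRET_KEY_BASE DASHBOARD_PASSWORD_HASH \
             PHX_SERVER PHX_HOST PORT; do
    # Require KEY= followed by at least one character.
    grep -q "^${key}=." "$f" || { echo "missing: $key" >&2; rc=1; }
  done
  if grep -q '<paste-' "$f"; then
    echo "unfilled placeholder found" >&2
    rc=1
  fi
  return "$rc"
}
```

Usage: `check_env_file /etc/default/proxmox-monitor && echo "env file looks complete"`.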
+ set -a; . /etc/default/proxmox-monitor; set +a
+/opt/proxmox-monitor/server/bin/server eval 'Server.Release.migrate()'
+
+sqlite3 /var/lib/proxmox-monitor/monitor.db '.tables'
+# hosts metrics schema_migrations
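The table listing can also be checked mechanically instead of by eye. This sketch reads the `.tables` output on stdin and confirms the three tables named above; `has_expected_tables` is a hypothetical helper, and extra tables are deliberately tolerated.

```shell
# Hypothetical helper: read `.tables` output on stdin and confirm that
# hosts, metrics and schema_migrations are all present.
has_expected_tables() {
  # .tables prints names in columns; put one name per line first.
  found=$(tr -s '[:space:]' '\n')
  rc=0
  for t in hosts metrics schema_migrations; do
    printf '%s\n' "$found" | grep -qx "$t" || { echo "missing table: $t" >&2; rc=1; }
  done
  return "$rc"
}
```

Usage: `sqlite3 /var/lib/proxmox-monitor/monitor.db '.tables' | has_expected_tables`.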
+ Then install the systemd unit (see runbook §3.6) with:
+ExecStartPre=/opt/proxmox-monitor/server/bin/server eval 'Server.Release.migrate()'
+ExecStart=/opt/proxmox-monitor/server/bin/server start
+Restart=always
+RestartSec=5
+ systemctl daemon-reload && systemctl enable --now proxmox-monitor
+ monitor.example.com {
+ reverse_proxy 127.0.0.1:4000 {
+ header_up X-Forwarded-Proto {scheme}
+ header_up X-Forwarded-For {remote_host}
+ transport http {
+ read_timeout 90s
+ dial_timeout 10s
+ }
+ }
+}
+ read_timeout 90s is critical. Without it, every agent's
+ WebSocket is torn down every ~30s and the dashboard stays permanently offline-looking.
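Because the timeout is easy to lose when the Caddyfile is edited later, a one-line check that it is still present can help. The sketch below prints every `read_timeout` value found and fails when there is none; `caddy_read_timeout` is a hypothetical helper and only inspects the file text, not the running Caddy config.

```shell
# Hypothetical helper: print the read_timeout value(s) in a Caddyfile and
# exit non-zero if the directive is absent.
caddy_read_timeout() {
  awk '$1 == "read_timeout" { print $2; found = 1 } END { exit !found }' "$1"
}
```

Usage: `caddy_read_timeout /etc/caddy/Caddyfile` should print `90s` for the configuration above.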
+ caddy validate --config /etc/caddy/Caddyfile
+systemctl reload caddy
+ From anywhere on the internet:
+curl -s https://monitor.example.com/health
+ Expected:
+{"db":"ok","status":"ok","version":"0.1.0"}
+ Then browser:
+https://monitor.example.com/ → redirects to /login
+If login fails with the correct password, DASHBOARD_PASSWORD_HASH was not pasted
+ correctly. Re-generate and redeploy §3.4.
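The `/health` response shown above can be asserted in a script rather than read by eye. This sketch assumes the body is compact JSON exactly as in the expected output (no whitespace between tokens); `health_ok` is a hypothetical helper that reads the body on stdin.

```shell
# Hypothetical helper: succeed only if the /health body reports both
# "status":"ok" and "db":"ok". Assumes compact JSON as shown above.
health_ok() {
  body=$(cat)
  case $body in
    *'"status":"ok"'*) ;;
    *) echo "status not ok: $body" >&2; return 1 ;;
  esac
  case $body in
    *'"db":"ok"'*) ;;
    *) echo "db not ok: $body" >&2; return 1 ;;
  esac
}
```

Usage: `curl -fsS https://monitor.example.com/health | health_ok && echo healthy`. Plain pattern matching is used here so the check does not depend on `jq` being installed.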
+ Admin UI → /admin/hosts → Add host (e.g. pve-host-01); the token is revealed once.
+export HOST=pve-host-01
+
+scp agent/dist/proxmox-monitor-agent_linux_amd64 \
+ root@$HOST:/usr/local/bin/proxmox-monitor-agent
+ssh root@$HOST 'chmod 0755 /usr/local/bin/proxmox-monitor-agent'
+
+scp agent/rel/proxmox-monitor-agent.service \
+ root@$HOST:/etc/systemd/system/
+ On the host — write the TOML config:
+install -d -m 0700 /etc/proxmox-monitor /var/cache/proxmox-monitor-agent
+
+cat > /etc/proxmox-monitor/agent.toml <<'EOF'
+server_url = "wss://monitor.example.com/socket/websocket"
+token = "<paste-token-from-dashboard>"
+host_id = "pve-host-01"
+
+[intervals]
+fast_seconds = 30
+medium_seconds = 300
+slow_seconds = 1800
+EOF
+chmod 0600 /etc/proxmox-monitor/agent.toml
+ systemctl daemon-reload
+systemctl enable --now proxmox-monitor-agent
+journalctl -u proxmox-monitor-agent -f
+ Expected within 10s:
+agent: starting with host_id=pve-host-01
+reporter: connected, joining host:pve-host-01
+reporter: joined host:pve-host-01
+ Reload the dashboard: the card should be online (green border) with Load / RAM / Pools / VMs populated.
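For later hosts it can be convenient to confirm the join without tailing the journal interactively. The sketch below scans log text on stdin for the `reporter: joined host:<id>` line shown above; `saw_join` is a hypothetical helper, and it assumes the host ID contains no regular-expression metacharacters beyond hyphens.

```shell
# Hypothetical helper: succeed if the log on stdin contains the final
# "joined" line for the given host_id (anchored at end of line, so
# pve-host-01 does not match pve-host-010).
saw_join() {
  grep -q "reporter: joined host:$1\$"
}
```

Usage: `ssh root@$HOST 'journalctl -u proxmox-monitor-agent -n 50 --no-pager' | saw_join "$HOST" && echo joined`.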
+ssh root@$HOST \
+ 'systemctl stop proxmox-monitor-agent'
+ Dashboard card grey within ~1s.
+ssh root@$HOST \
+ 'systemctl start proxmox-monitor-agent'
+ Green again within 30s.
+Channel terminate callback didn't run — usually Caddy.
+Check that /etc/caddy/Caddyfile has read_timeout 90s
+systemctl reload caddy after fixing
+Pick non-critical hosts, or hosts with independent monitoring to fall back on.
+What to look for overnight:
+[error] lines in server log
+retention: pruned N stale samples (starts firing after 48h)
+systemctl restart proxmox-monitor on the server → all agents flip offline, then green within 30s. No stuck agents.
+After 3–4 hosts by hand, batch:
+for HOST in pve-host-04 pve-host-05 pve-host-06; do
+ echo "Register $HOST in admin UI, paste token:"
+ read -rs TOKEN
+
+ scp agent/dist/proxmox-monitor-agent_linux_amd64 \
+ root@$HOST:/usr/local/bin/proxmox-monitor-agent
+ scp agent/rel/proxmox-monitor-agent.service \
+ root@$HOST:/etc/systemd/system/
+
+ ssh root@$HOST "chmod 0755 /usr/local/bin/proxmox-monitor-agent &&
+ install -d -m 0700 /etc/proxmox-monitor /var/cache/proxmox-monitor-agent &&
+ cat > /etc/proxmox-monitor/agent.toml <<EOF
+server_url = \"wss://monitor.example.com/socket/websocket\"
+token = \"$TOKEN\"
+host_id = \"$HOST\"
+EOF
+ chmod 0600 /etc/proxmox-monitor/agent.toml &&
+ systemctl daemon-reload &&
+ systemctl enable --now proxmox-monitor-agent"
+done
+ After each batch of ~5: spot-check cards, filter for offline, open a random host detail.
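One possible refinement of the batch loop is to render the config locally and stream it over SSH's stdin, which keeps the token out of the remote command line. This is a sketch under that assumption, not the runbook's method; `render_agent_toml` is a hypothetical helper that emits the same three keys the loop writes.

```shell
# Hypothetical helper: print an agent.toml for one host on stdout.
# The server URL is the example hostname used throughout this deck.
render_agent_toml() {
  host_id=$1 token=$2
  cat <<EOF
server_url = "wss://monitor.example.com/socket/websocket"
token = "$token"
host_id = "$host_id"
EOF
}
```

Usage: `render_agent_toml "$HOST" "$TOKEN" | ssh root@$HOST 'umask 077; cat > /etc/proxmox-monitor/agent.toml'`. The `umask 077` makes the file owner-only from the moment it is created, so no separate `chmod 0600` window exists.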
+ssh root@$HOST \
+ 'systemctl disable --now proxmox-monitor-agent'
+ systemctl stop proxmox-monitor
+systemctl stop caddy
+ systemctl stop proxmox-monitor
+rm -rf /opt/proxmox-monitor/server
+tar -xzf /tmp/server_release_PREV.tgz \
+ -C /opt/proxmox-monitor
+systemctl start proxmox-monitor
+ systemctl stop proxmox-monitor
+cp /var/backups/proxmox-monitor/monitor-YYYY-MM-DD.db \
+ /var/lib/proxmox-monitor/monitor.db
+systemctl start proxmox-monitor
+ Tokens survive DB restores. Metrics post-backup are lost (48h max by retention policy).
+cd server
+MIX_ENV=prod DASHBOARD_PASSWORD_HASH='placeholder' \
+ mix release --overwrite
+tar -czf /tmp/server_release.tgz -C _build/prod/rel server
+scp /tmp/server_release.tgz root@$LXC_IP:/tmp/
+
+ssh root@$LXC_IP '
+ systemctl stop proxmox-monitor
+ mv /opt/proxmox-monitor/server{,.old}
+ tar -xzf /tmp/server_release.tgz -C /opt/proxmox-monitor
+ systemctl start proxmox-monitor # ExecStartPre runs migrate
+'
+ Verify /health then delete server.old.
scp agent/dist/proxmox-monitor-agent_linux_amd64 \
+ root@$HOST:/usr/local/bin/proxmox-monitor-agent.new
+
+ssh root@$HOST '
+ mv /usr/local/bin/proxmox-monitor-agent{.new,}
+ systemctl restart proxmox-monitor-agent
+'
+ No DB on the host, so agent upgrades are trivially atomic.
+Install as a cron inside the LXC — keeps 30 daily snapshots:
+cat > /etc/cron.d/proxmox-monitor-backup <<'EOF'
+# cron entries must stay on one line; cron does not support backslash continuations
+30 3 * * * root install -d -m 0700 /var/backups/proxmox-monitor && sqlite3 /var/lib/proxmox-monitor/monitor.db ".backup /var/backups/proxmox-monitor/monitor-$(date +\%Y-\%m-\%d).db" && find /var/backups/proxmox-monitor -name 'monitor-*.db' -mtime +30 -delete
+EOF
+ SQLite's online-backup command is safe while the server is running.
+Verify at least one run before declaring the rollout complete.
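Verifying a run starts with locating the newest snapshot. The sketch below relies on the `monitor-YYYY-MM-DD.db` naming used by the cron entry, where lexical order equals chronological order; `latest_backup` is a hypothetical helper.

```shell
# Hypothetical helper: print the newest monitor-YYYY-MM-DD.db in a directory,
# or nothing if no backup exists yet.
latest_backup() {
  dir=$1
  for f in "$dir"/monitor-*.db; do
    # Skip the unexpanded glob when the directory has no matches.
    [ -e "$f" ] && printf '%s\n' "$f"
  done | sort | tail -n 1
}
```

Usage: `sqlite3 "$(latest_backup /var/backups/proxmox-monitor)" 'PRAGMA integrity_check;'` prints `ok` for a healthy database file.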
+/health returns 200 with status:ok
+SECRET_KEY_BASE in a password manager
+/etc/default/proxmox-monitor is 0600 root:root
+/etc/proxmox-monitor/agent.toml is 0600 root:root on every host
| Symptom | First thing to check |
|---|---|
| CERT_AUTHORITY_INVALID in browser | Caddy hasn't finished LE issuance. Wait 60s. journalctl -u caddy. |
| Login loops on correct password | DASHBOARD_PASSWORD_HASH mismatch. Regenerate and redeploy. |
| Card stays offline after agent restart | Wrong token or unknown_host. Check agent journal. |
| All agents reconnect every ~30s | Caddy read_timeout missing or too short. |
| /health returns 503 | Process up but DB unreadable. Check permissions + DATABASE_PATH. |
| LXC can't bind port 4000 | Another process owns it. ss -ltnp \| grep 4000. |
| Agent logs {:enoent, "pvesh"} | Not a Proxmox host, or empty $PATH under systemd. |
/opt/proxmox-monitor/server/ release tree
+/etc/default/proxmox-monitor env secrets, 0600
+/etc/systemd/system/proxmox-monitor.service
+/etc/caddy/Caddyfile
+/var/lib/proxmox-monitor/monitor.db
+/var/backups/proxmox-monitor/ daily backups
+
+tcp 443 (caddy) → tcp 127.0.0.1:4000 (phoenix)
+ /usr/local/bin/proxmox-monitor-agent
+/etc/proxmox-monitor/agent.toml token, 0600
+/etc/systemd/system/proxmox-monitor-agent.service
+/var/cache/proxmox-monitor-agent/ Burrito unpack
+
+no listening ports
+All four phases from the concept shipped: monitoring skeleton, ZFS/VM/storage collectors,
+LiveView dashboard, packaged binaries. The operator has the runbook; agents report; retention
+prunes; backups run. Everything else is iteration.
+
+ Full runbook: SETUP-AND-DEPLOY.md · Concept: proxmox-monitor-konzept.md
+