From fe7b07db4f975b3472ca691653ded0ce59805b9a Mon Sep 17 00:00:00 2001 From: Carsten Date: Tue, 21 Apr 2026 22:59:24 +0200 Subject: [PATCH] fix(server): only require DASHBOARD_PASSWORD_HASH in prod Blocking bootstrap in dev meant you couldn't even run 'mix run' to generate the initial hash. Now dev/test accept an optional env override and boot without it; prod still raises when unset. --- ...026-04-21-phase2-metrics-und-collectors.md | 2081 +++++++++++++++++ .../2026-04-21-phase3-liveview-dashboard.md | 1889 +++++++++++++++ server/config/runtime.exs | 8 +- 3 files changed, 3977 insertions(+), 1 deletion(-) create mode 100644 docs/superpowers/plans/2026-04-21-phase2-metrics-und-collectors.md create mode 100644 docs/superpowers/plans/2026-04-21-phase3-liveview-dashboard.md diff --git a/docs/superpowers/plans/2026-04-21-phase2-metrics-und-collectors.md b/docs/superpowers/plans/2026-04-21-phase2-metrics-und-collectors.md new file mode 100644 index 0000000..c25e2bd --- /dev/null +++ b/docs/superpowers/plans/2026-04-21-phase2-metrics-und-collectors.md @@ -0,0 +1,2081 @@ +# Phase 2 — Metrics Persistence + ZFS/VM/Storage Collectors + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Persist agent samples in SQLite with 48h retention, expose recent samples via a simple JSON route, and ship agent collectors for ZFS pools/datasets, Proxmox storage, VM/LXC runtime, and system info. End state: agent pushes rich samples on fast/medium/slow intervals; server stores them; `GET /api/hosts/:name` returns the latest data. + +**Architecture:** Channel handlers delegate to a new `Server.Metrics` context that writes to a single `metrics` table with a JSON `payload` column. A `Retention` GenServer prunes old rows hourly. 
Agent gains a tiny command-runner abstraction so each external-command collector (`zfs`, `pvesh`, `apt`, `pveversion`) can be unit-tested with fixture output on macOS. Reporter schedules medium and slow collection alongside the existing fast path. + +**Tech Stack:** Elixir/OTP, Phoenix, Ecto SQLite with JSON column support (`%{...}` stored as TEXT), ExUnit, fixture-based testing for external commands. + +--- + +## File Structure + +``` +server/ +├── priv/repo/migrations/_create_metrics.exs create +├── lib/server/schema/metric.ex create +├── lib/server/metrics.ex create (context) +├── lib/server/retention.ex create (GenServer) +├── lib/server/application.ex modify: add Retention to children +├── lib/server_web/channels/host_channel.ex modify: call Metrics.record_sample +├── lib/server_web/controllers/host_controller.ex create +├── lib/server_web/router.ex modify: add /api/hosts/:name route +├── test/server/metrics_test.exs create +├── test/server/retention_test.exs create +├── test/server_web/channels/host_channel_test.exs modify: assert DB write +└── test/server_web/controllers/host_controller_test.exs create + +agent/ +├── lib/proxmox_agent/shell.ex create (System.cmd wrapper) +├── lib/proxmox_agent/collectors/zfs.ex create +├── lib/proxmox_agent/collectors/storage.ex create +├── lib/proxmox_agent/collectors/vms.ex create +├── lib/proxmox_agent/collectors/system_info.ex create +├── lib/proxmox_agent/reporter.ex modify: medium + slow handlers +├── test/proxmox_agent/shell_test.exs create +├── test/proxmox_agent/collectors/zfs_test.exs create +├── test/proxmox_agent/collectors/storage_test.exs create +├── test/proxmox_agent/collectors/vms_test.exs create +├── test/proxmox_agent/collectors/system_info_test.exs create +└── test/fixtures/ create + ├── zfs/zpool_list.json + ├── zfs/zpool_status.json + ├── zfs/zfs_list.json + ├── pvesh/storage.json + ├── pvesh/qemu.json + ├── pvesh/lxc.json + ├── pvesh/qemu_N_status.json + ├── pvesh/qemu_N_config.json + ├── 
pvesh/qemu_N_agent_interfaces.json + ├── system/pveversion.txt + ├── system/zfs_version.txt + └── system/apt_upgradable.txt +``` + +**Shell abstraction rationale:** Each external-command collector takes a `:runner` keyword option — a `(cmd, args) -> {:ok, output} | {:error, term}` function. Default is `ProxmoxAgent.Shell.run/2` which wraps `System.cmd`. Tests inject a fake that returns fixture content. Same pattern as `proc_dir:` in the existing host collector — consistent, no new abstractions. + +**Sample payload contract (what the agent sends):** + +| Event | Data keys | +|---------------|---------------------------------------------------------------| +| `metric:fast` | `host`, `zfs_pools`, `storage`, `vms_runtime` | +| `metric:medium` | `zfs_datasets`, `vms_detail` | +| `metric:slow` | `system_info` | + +Each key's value is a collector-produced map. `errors` inside each collector allows partial samples. + +--- + +## Task 1: Server — Metrics Schema & Context + +**Files:** +- Create: `server/priv/repo/migrations/_create_metrics.exs` +- Create: `server/lib/server/schema/metric.ex` +- Create: `server/lib/server/metrics.ex` +- Create: `server/test/server/metrics_test.exs` + +- [ ] **Step 1: Generate migration** + +```bash +cd /Users/cabele/claudeprojects/proxmox_monitor/server +mix ecto.gen.migration create_metrics +``` + +Replace the contents of the newly-created migration file with: + +```elixir +defmodule Server.Repo.Migrations.CreateMetrics do + use Ecto.Migration + + def change do + create table(:metrics) do + add :host_id, references(:hosts, on_delete: :delete_all), null: false + add :collected_at, :utc_datetime_usec, null: false + add :interval_type, :string, null: false + add :payload, :string, null: false + + timestamps(type: :utc_datetime_usec, updated_at: false) + end + + create index(:metrics, [:host_id, :collected_at]) + create index(:metrics, [:collected_at]) + end +end +``` + +Note: SQLite has no native JSON column — use TEXT (`:string`). 
Ecto's JSON handling happens at the schema layer: the schema declares the field as `:map`, and the adapter serializes it to JSON text via the configured JSON library (Jason).

- [ ] **Step 2: Write the Metric schema**

Create `server/lib/server/schema/metric.ex`:

```elixir
defmodule Server.Schema.Metric do
  use Ecto.Schema
  import Ecto.Changeset

  @intervals ~w(fast medium slow)

  schema "metrics" do
    belongs_to :host, Server.Schema.Host
    field :collected_at, :utc_datetime_usec
    field :interval_type, :string
    field :payload, :map

    timestamps(type: :utc_datetime_usec, updated_at: false)
  end

  def changeset(metric, attrs) do
    metric
    |> cast(attrs, [:host_id, :collected_at, :interval_type, :payload])
    |> validate_required([:host_id, :collected_at, :interval_type, :payload])
    |> validate_inclusion(:interval_type, @intervals)
    |> assoc_constraint(:host)
  end
end
```

`ecto_sqlite3` serializes `:map` fields to JSON TEXT transparently — no extra config needed.

- [ ] **Step 3: Write failing tests for the Metrics context**

Create `server/test/server/metrics_test.exs`:

```elixir
defmodule Server.MetricsTest do
  use Server.DataCase, async: true

  alias Server.Metrics
  alias Server.Hosts

  setup do
    {:ok, {host, _token}} = Hosts.create_host("pve-01")
    %{host: host}
  end

  describe "record_sample/4" do
    test "inserts a metric row with the given payload", %{host: host} do
      ts = DateTime.utc_now()
      payload = %{"host" => %{"load1" => 0.5}}

      assert {:ok, metric} = Metrics.record_sample(host.id, "fast", ts, payload)
      assert metric.host_id == host.id
      assert metric.interval_type == "fast"
      assert metric.payload == payload
      assert metric.collected_at == ts
    end

    test "rejects unknown interval_type", %{host: host} do
      ts = DateTime.utc_now()
      assert {:error, cs} = Metrics.record_sample(host.id, "nope", ts, %{})
      assert %{interval_type: ["is invalid"]} = errors_on(cs)
    end

    test "rejects unknown host_id" do
      ts = DateTime.utc_now()
      assert {:error, cs} = Metrics.record_sample(999_999, "fast", ts, %{})
+ assert %{host: ["does not exist"]} = errors_on(cs) + end + end + + describe "latest_sample/2" do + test "returns most recent sample for a host and interval", %{host: host} do + {:ok, _} = Metrics.record_sample(host.id, "fast", dt(-60), %{"v" => 1}) + {:ok, _} = Metrics.record_sample(host.id, "fast", dt(-30), %{"v" => 2}) + {:ok, _} = Metrics.record_sample(host.id, "fast", dt(-10), %{"v" => 3}) + + assert %{payload: %{"v" => 3}} = Metrics.latest_sample(host.id, "fast") + end + + test "returns nil when no samples exist", %{host: host} do + assert Metrics.latest_sample(host.id, "fast") == nil + end + end + + describe "delete_older_than/1" do + test "deletes samples with collected_at before the cutoff", %{host: host} do + {:ok, _} = Metrics.record_sample(host.id, "fast", dt(-3600 * 50), %{"v" => "old"}) + {:ok, keep} = Metrics.record_sample(host.id, "fast", dt(-60), %{"v" => "fresh"}) + + cutoff = DateTime.add(DateTime.utc_now(), -48 * 3600, :second) + assert {1, nil} = Metrics.delete_older_than(cutoff) + + remaining = Server.Repo.all(Server.Schema.Metric) + assert length(remaining) == 1 + assert hd(remaining).id == keep.id + end + end + + defp dt(offset_seconds), do: DateTime.add(DateTime.utc_now(), offset_seconds, :second) +end +``` + +- [ ] **Step 4: Run tests — expect compile failure (`Server.Metrics` not defined)** + +```bash +mix test test/server/metrics_test.exs 2>&1 | tail -10 +``` + +Expected: module undefined errors. + +- [ ] **Step 5: Implement the context** + +Create `server/lib/server/metrics.ex`: + +```elixir +defmodule Server.Metrics do + @moduledoc "Metric sample storage and retrieval." 
+ + import Ecto.Query + + alias Server.Repo + alias Server.Schema.Metric + + @spec record_sample(integer(), String.t(), DateTime.t(), map()) :: + {:ok, Metric.t()} | {:error, Ecto.Changeset.t()} + def record_sample(host_id, interval_type, collected_at, payload) do + %Metric{} + |> Metric.changeset(%{ + host_id: host_id, + interval_type: interval_type, + collected_at: collected_at, + payload: payload + }) + |> Repo.insert() + end + + @spec latest_sample(integer(), String.t()) :: Metric.t() | nil + def latest_sample(host_id, interval_type) do + from(m in Metric, + where: m.host_id == ^host_id and m.interval_type == ^interval_type, + order_by: [desc: m.collected_at], + limit: 1 + ) + |> Repo.one() + end + + @spec delete_older_than(DateTime.t()) :: {non_neg_integer(), nil} + def delete_older_than(%DateTime{} = cutoff) do + from(m in Metric, where: m.collected_at < ^cutoff) + |> Repo.delete_all() + end +end +``` + +- [ ] **Step 6: Run the migration and tests** + +```bash +mix ecto.migrate && mix test test/server/metrics_test.exs 2>&1 | tail -6 +``` + +Expected: 6 tests pass. + +- [ ] **Step 7: Commit** + +```bash +cd /Users/cabele/claudeprojects/proxmox_monitor +git add server/priv/repo/migrations server/lib/server/schema/metric.ex server/lib/server/metrics.ex server/test/server/metrics_test.exs +git commit -m "feat(server): metrics schema + context with record/latest/prune" +``` + +--- + +## Task 2: Server — Channel Writes Samples + +**Files:** +- Modify: `server/lib/server_web/channels/host_channel.ex` +- Modify: `server/test/server_web/channels/host_channel_test.exs` + +- [ ] **Step 1: Extend existing tests to assert DB writes** + +Open `server/test/server_web/channels/host_channel_test.exs` and replace the `describe "metric:fast event" do ... 
end` block with: + +```elixir + describe "metric events persist to DB" do + setup %{token: token, host: host} do + {:ok, socket} = connect(AgentSocket, %{}) + + {:ok, _reply, joined} = + subscribe_and_join(socket, "host:pve-01", %{ + "token" => token, + "agent_version" => "0.1.0" + }) + + %{socket: joined, host: host} + end + + test "metric:fast is stored with interval=fast", %{socket: socket, host: host} do + ts = "2026-04-21T12:00:00.123456Z" + + ref = + push(socket, "metric:fast", %{ + "collected_at" => ts, + "data" => %{"cpu_percent" => 12.3, "load1" => 0.2} + }) + + assert_reply ref, :ok + + sample = Server.Metrics.latest_sample(host.id, "fast") + assert sample != nil + assert sample.payload == %{"cpu_percent" => 12.3, "load1" => 0.2} + {:ok, expected, _} = DateTime.from_iso8601(ts) + assert DateTime.compare(sample.collected_at, expected) == :eq + end + + test "metric:medium is stored with interval=medium", %{socket: socket, host: host} do + ref = + push(socket, "metric:medium", %{ + "collected_at" => "2026-04-21T12:05:00Z", + "data" => %{"vms_detail" => []} + }) + + assert_reply ref, :ok + + sample = Server.Metrics.latest_sample(host.id, "medium") + assert sample != nil + assert sample.payload == %{"vms_detail" => []} + end + + test "metric:slow is stored with interval=slow", %{socket: socket, host: host} do + ref = + push(socket, "metric:slow", %{ + "collected_at" => "2026-04-21T12:30:00Z", + "data" => %{"system_info" => %{"pveversion" => "8.3.0"}} + }) + + assert_reply ref, :ok + + sample = Server.Metrics.latest_sample(host.id, "slow") + assert sample != nil + assert sample.payload == %{"system_info" => %{"pveversion" => "8.3.0"}} + end + + test "replies :error when collected_at is missing", %{socket: socket} do + ref = push(socket, "metric:fast", %{"data" => %{}}) + assert_reply ref, :error, %{reason: "missing_collected_at"} + end + + test "replies :error when data is missing", %{socket: socket} do + ref = push(socket, "metric:fast", %{"collected_at" => 
"2026-04-21T12:00:00Z"}) + assert_reply ref, :error, %{reason: "missing_data"} + end + end +``` + +- [ ] **Step 2: Run tests — expect failure** + +```bash +mix test test/server_web/channels/host_channel_test.exs 2>&1 | tail -10 +``` + +Expected: tests in the new `describe` block fail because the channel doesn't persist yet. + +- [ ] **Step 3: Update HostChannel to persist** + +Replace the three `handle_in("metric:..."/2, ...)` clauses in `server/lib/server_web/channels/host_channel.ex` with: + +```elixir + @impl true + def handle_in("metric:" <> kind, payload, socket) when kind in ~w(fast medium slow) do + with {:ok, collected_at} <- parse_collected_at(payload), + {:ok, data} <- parse_data(payload), + {:ok, _} <- Server.Metrics.record_sample(socket.assigns.host_id, kind, collected_at, data) do + Logger.debug("stored #{kind} sample host=#{socket.assigns.host_name}") + {:reply, :ok, socket} + else + {:error, reason} when is_binary(reason) -> + Logger.warning("metric:#{kind} rejected host=#{socket.assigns.host_name} reason=#{reason}") + {:reply, {:error, %{reason: reason}}, socket} + + {:error, %Ecto.Changeset{} = cs} -> + Logger.warning("metric:#{kind} changeset failed: #{inspect(cs.errors)}") + {:reply, {:error, %{reason: "invalid_payload"}}, socket} + end + end + + defp parse_collected_at(%{"collected_at" => ts}) when is_binary(ts) do + case DateTime.from_iso8601(ts) do + {:ok, dt, _} -> {:ok, dt} + _ -> {:error, "invalid_collected_at"} + end + end + + defp parse_collected_at(_), do: {:error, "missing_collected_at"} + + defp parse_data(%{"data" => data}) when is_map(data), do: {:ok, data} + defp parse_data(_), do: {:error, "missing_data"} +``` + +- [ ] **Step 4: Run tests — expect pass** + +```bash +mix test test/server_web/channels/host_channel_test.exs 2>&1 | tail -5 +``` + +Expected: all channel tests pass. + +- [ ] **Step 5: Run full server suite** + +```bash +mix test 2>&1 | tail -4 +``` + +Expected: all green. 
+ +- [ ] **Step 6: Commit** + +```bash +cd /Users/cabele/claudeprojects/proxmox_monitor +git add server/lib/server_web/channels/host_channel.ex server/test/server_web/channels/host_channel_test.exs +git commit -m "feat(server): channel persists fast/medium/slow samples to metrics table" +``` + +--- + +## Task 3: Server — Retention GenServer + +**Files:** +- Create: `server/lib/server/retention.ex` +- Create: `server/test/server/retention_test.exs` +- Modify: `server/lib/server/application.ex` + +- [ ] **Step 1: Write failing test** + +Create `server/test/server/retention_test.exs`: + +```elixir +defmodule Server.RetentionTest do + use Server.DataCase, async: false + + alias Server.{Hosts, Metrics, Retention} + + test "prune_now/1 deletes samples older than the retention window" do + {:ok, {host, _}} = Hosts.create_host("pve-01") + stale_at = DateTime.add(DateTime.utc_now(), -49 * 3600, :second) + fresh_at = DateTime.add(DateTime.utc_now(), -60, :second) + + {:ok, _} = Metrics.record_sample(host.id, "fast", stale_at, %{"x" => 1}) + {:ok, fresh} = Metrics.record_sample(host.id, "fast", fresh_at, %{"x" => 2}) + + {deleted, _} = Retention.prune_now(48 * 3600) + + assert deleted == 1 + remaining = Server.Repo.all(Server.Schema.Metric) + assert length(remaining) == 1 + assert hd(remaining).id == fresh.id + end +end +``` + +- [ ] **Step 2: Run test — expect failure** + +```bash +mix test test/server/retention_test.exs 2>&1 | tail -5 +``` + +Expected: `Server.Retention` undefined. + +- [ ] **Step 3: Implement the GenServer** + +Create `server/lib/server/retention.ex`: + +```elixir +defmodule Server.Retention do + @moduledoc "Deletes metric samples older than the retention window. Runs hourly." + + use GenServer + require Logger + + @default_retention_seconds 48 * 60 * 60 + @default_interval_ms 60 * 60 * 1_000 + + def start_link(opts) do + GenServer.start_link(__MODULE__, opts, name: __MODULE__) + end + + @doc "Synchronous prune used by tests and manual ops." 
+ def prune_now(retention_seconds \\ @default_retention_seconds) do + cutoff = DateTime.add(DateTime.utc_now(), -retention_seconds, :second) + Server.Metrics.delete_older_than(cutoff) + end + + @impl true + def init(opts) do + retention_seconds = Keyword.get(opts, :retention_seconds, @default_retention_seconds) + interval_ms = Keyword.get(opts, :interval_ms, @default_interval_ms) + state = %{retention_seconds: retention_seconds, interval_ms: interval_ms} + Process.send_after(self(), :prune, interval_ms) + {:ok, state} + end + + @impl true + def handle_info(:prune, state) do + {count, _} = prune_now(state.retention_seconds) + if count > 0, do: Logger.info("retention: pruned #{count} stale samples") + Process.send_after(self(), :prune, state.interval_ms) + {:noreply, state} + end +end +``` + +- [ ] **Step 4: Run test — expect pass** + +```bash +mix test test/server/retention_test.exs 2>&1 | tail -5 +``` + +Expected: 1 test passes. + +- [ ] **Step 5: Add Retention to the supervision tree** + +In `server/lib/server/application.ex`, find the children list and add `Server.Retention` **after** `{Phoenix.PubSub, name: Server.PubSub}`: + +```elixir + {Phoenix.PubSub, name: Server.PubSub}, + Server.Retention, + # Start a worker by calling: Server.Worker.start_link(arg) +``` + +- [ ] **Step 6: Run full test suite** + +```bash +mix test 2>&1 | tail -4 +``` + +Expected: all green. 
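`Server.Retention` already accepts `:retention_seconds` and `:interval_ms`, so a different window (say, a short one in dev) only needs a tuple child spec. A hedged sketch of that variant — the `Application.get_env(:server, Server.Retention, [])` config key is hypothetical, not something this plan defines:

```elixir
# server/lib/server/application.ex — optional variant of the child entry.
# Reads tuning from app config; falls back to the module defaults when unset.
children = [
  # ...
  {Phoenix.PubSub, name: Server.PubSub},
  {Server.Retention, Application.get_env(:server, Server.Retention, [])},
  # ...
]
```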
+ +- [ ] **Step 7: Commit** + +```bash +cd /Users/cabele/claudeprojects/proxmox_monitor +git add server/lib/server/retention.ex server/test/server/retention_test.exs server/lib/server/application.ex +git commit -m "feat(server): retention GenServer prunes samples older than 48h hourly" +``` + +--- + +## Task 4: Server — Simple JSON Route + +**Files:** +- Create: `server/lib/server_web/controllers/host_controller.ex` +- Create: `server/test/server_web/controllers/host_controller_test.exs` +- Modify: `server/lib/server_web/router.ex` + +- [ ] **Step 1: Write failing test** + +Create `server/test/server_web/controllers/host_controller_test.exs`: + +```elixir +defmodule ServerWeb.HostControllerTest do + use ServerWeb.ConnCase, async: true + + alias Server.{Hosts, Metrics} + + describe "GET /api/hosts/:name" do + setup do + {:ok, {host, _token}} = Hosts.create_host("pve-01") + + {:ok, _} = + Metrics.record_sample( + host.id, + "fast", + DateTime.utc_now(), + %{"host" => %{"load1" => 0.5}} + ) + + %{host: host} + end + + test "returns host info and latest samples", %{conn: conn, host: host} do + conn = get(conn, ~p"/api/hosts/#{host.name}") + + assert %{ + "name" => "pve-01", + "status" => _, + "samples" => %{"fast" => %{"payload" => %{"host" => %{"load1" => 0.5}}}} + } = json_response(conn, 200) + end + + test "returns 404 for unknown host", %{conn: conn} do + conn = get(conn, ~p"/api/hosts/nope") + assert json_response(conn, 404) == %{"error" => "host_not_found"} + end + end +end +``` + +- [ ] **Step 2: Add route** + +In `server/lib/server_web/router.ex`, add after the existing `scope "/"` block: + +```elixir + scope "/api", ServerWeb do + pipe_through :api + + get "/hosts/:name", HostController, :show + end +``` + +Verify there is already a `:api` pipeline above; in Phoenix 1.7 scaffold it exists. If not, add: + +```elixir + pipeline :api do + plug :accepts, ["json"] + end +``` + +(Check the file first; add only if missing.) 
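Optional check that the route registered before the controller exists (it may emit a compile warning that `ServerWeb.HostController` is undefined — expected until Step 4):

```bash
mix phx.routes | grep /api/hosts
```

Expected: one `GET /api/hosts/:name` line.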
+ +- [ ] **Step 3: Run test — expect failure (controller missing)** + +```bash +mix test test/server_web/controllers/host_controller_test.exs 2>&1 | tail -8 +``` + +Expected: undefined `ServerWeb.HostController`. + +- [ ] **Step 4: Implement controller** + +Create `server/lib/server_web/controllers/host_controller.ex`: + +```elixir +defmodule ServerWeb.HostController do + use ServerWeb, :controller + + alias Server.{Metrics, Repo, Schema.Host} + + def show(conn, %{"name" => name}) do + case Repo.get_by(Host, name: name) do + nil -> + conn + |> put_status(:not_found) + |> json(%{error: "host_not_found"}) + + %Host{} = host -> + samples = + for interval <- ~w(fast medium slow), + sample = Metrics.latest_sample(host.id, interval), + into: %{} do + {interval, %{collected_at: sample.collected_at, payload: sample.payload}} + end + + json(conn, %{ + name: host.name, + status: host.status, + agent_version: host.agent_version, + last_seen_at: host.last_seen_at, + samples: samples + }) + end + end +end +``` + +- [ ] **Step 5: Run test — expect pass** + +```bash +mix test test/server_web/controllers/host_controller_test.exs 2>&1 | tail -5 +``` + +Expected: 2 tests pass. + +- [ ] **Step 6: Run full test suite** + +```bash +mix test 2>&1 | tail -4 +``` + +Expected: all green. 
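Optional manual spot check with the dev server running (`mix phx.server` in another terminal, default dev port 4000; assumes a registered host named `pve-01` with at least one stored sample):

```bash
curl -s http://localhost:4000/api/hosts/pve-01
# expect JSON with "name", "status", "agent_version", "last_seen_at", and a "samples" map

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:4000/api/hosts/nope
# expect: 404
```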
+ +- [ ] **Step 7: Commit** + +```bash +cd /Users/cabele/claudeprojects/proxmox_monitor +git add server/lib/server_web/controllers/host_controller.ex server/lib/server_web/router.ex server/test/server_web/controllers/host_controller_test.exs +git commit -m "feat(server): GET /api/hosts/:name returns latest fast/medium/slow samples" +``` + +--- + +## Task 5: Agent — Shell Runner + +**Files:** +- Create: `agent/lib/proxmox_agent/shell.ex` +- Create: `agent/test/proxmox_agent/shell_test.exs` + +- [ ] **Step 1: Write failing test** + +Create `agent/test/proxmox_agent/shell_test.exs`: + +```elixir +defmodule ProxmoxAgent.ShellTest do + use ExUnit.Case, async: true + + alias ProxmoxAgent.Shell + + test "run/2 returns {:ok, output} on zero exit" do + assert {:ok, output} = Shell.run("/bin/echo", ["hello"]) + assert String.trim(output) == "hello" + end + + test "run/2 returns {:error, {:nonzero_exit, code, output}} on non-zero exit" do + assert {:error, {:nonzero_exit, code, _}} = Shell.run("/bin/sh", ["-c", "exit 7"]) + assert code == 7 + end + + test "run/2 returns {:error, {:enoent, _}} when binary is missing" do + assert {:error, {:enoent, _}} = Shell.run("/does/not/exist/nope", []) + end +end +``` + +- [ ] **Step 2: Run test — expect failure** + +```bash +cd /Users/cabele/claudeprojects/proxmox_monitor/agent +mix test test/proxmox_agent/shell_test.exs 2>&1 | tail -5 +``` + +Expected: module undefined. + +- [ ] **Step 3: Implement Shell** + +Create `agent/lib/proxmox_agent/shell.ex`: + +```elixir +defmodule ProxmoxAgent.Shell do + @moduledoc """ + Thin wrapper over System.cmd for testability. Collectors accept an optional + :runner function of this shape so tests can inject fixture-backed fakes. 
+ """ + + @type result :: {:ok, String.t()} | {:error, term()} + + @spec run(String.t(), [String.t()]) :: result + def run(command, args) do + try do + case System.cmd(command, args, stderr_to_stdout: true) do + {output, 0} -> {:ok, output} + {output, code} -> {:error, {:nonzero_exit, code, output}} + end + rescue + e in ErlangError -> + case e.original do + :enoent -> {:error, {:enoent, command}} + other -> {:error, {:system_error, other}} + end + end + end +end +``` + +- [ ] **Step 4: Run test — expect pass** + +```bash +mix test test/proxmox_agent/shell_test.exs 2>&1 | tail -5 +``` + +Expected: 3 tests pass. + +- [ ] **Step 5: Commit** + +```bash +cd /Users/cabele/claudeprojects/proxmox_monitor +git add agent/lib/proxmox_agent/shell.ex agent/test/proxmox_agent/shell_test.exs +git commit -m "feat(agent): Shell.run wrapper for testable external commands" +``` + +--- + +## Task 6: Agent — ZFS Collector + +**Files:** +- Create: `agent/test/fixtures/zfs/zpool_list.json` +- Create: `agent/test/fixtures/zfs/zpool_status.json` +- Create: `agent/test/fixtures/zfs/zfs_list.json` +- Create: `agent/test/proxmox_agent/collectors/zfs_test.exs` +- Create: `agent/lib/proxmox_agent/collectors/zfs.ex` + +- [ ] **Step 1: Write fixture JSON** + +Create `agent/test/fixtures/zfs/zpool_list.json`: + +```json +{ + "output_version": { "command": "zpool list", "vers_major": 0, "vers_minor": 1 }, + "pools": { + "rpool": { + "name": "rpool", + "size": 500000000000, + "alloc": 200000000000, + "free": 300000000000, + "frag": 17, + "cap": 40, + "health": "ONLINE" + }, + "tank": { + "name": "tank", + "size": 8000000000000, + "alloc": 6000000000000, + "free": 2000000000000, + "frag": 55, + "cap": 75, + "health": "DEGRADED" + } + } +} +``` + +Create `agent/test/fixtures/zfs/zpool_status.json`: + +```json +{ + "output_version": { "command": "zpool status", "vers_major": 0, "vers_minor": 1 }, + "pools": { + "rpool": { + "name": "rpool", + "state": "ONLINE", + "scan": { + "function": "scrub", + 
"state": "FINISHED", + "end_time": "Sat Apr 19 02:00:00 2026" + }, + "error_count": "0", + "vdevs": { + "mirror-0": { + "name": "mirror-0", + "vdev_type": "mirror", + "state": "ONLINE", + "read_errors": "0", + "write_errors": "0", + "checksum_errors": "0" + } + } + }, + "tank": { + "name": "tank", + "state": "DEGRADED", + "scan": { + "function": "scrub", + "state": "FINISHED", + "end_time": "Tue Mar 01 08:00:00 2026" + }, + "error_count": "2", + "vdevs": { + "raidz2-0": { + "name": "raidz2-0", + "vdev_type": "raidz2", + "state": "DEGRADED", + "read_errors": "0", + "write_errors": "0", + "checksum_errors": "2" + } + } + } + } +} +``` + +Create `agent/test/fixtures/zfs/zfs_list.json`: + +```json +{ + "output_version": { "command": "zfs list", "vers_major": 0, "vers_minor": 1 }, + "datasets": { + "rpool": { + "name": "rpool", + "type": "FILESYSTEM", + "properties": { + "used": { "value": "200000000000" }, + "available": { "value": "300000000000" }, + "usedbysnapshots": { "value": "5000000000" } + } + }, + "rpool/data": { + "name": "rpool/data", + "type": "FILESYSTEM", + "properties": { + "used": { "value": "100000000000" }, + "available": { "value": "300000000000" }, + "usedbysnapshots": { "value": "2000000000" } + } + }, + "rpool/data@daily-2026-04-20": { + "name": "rpool/data@daily-2026-04-20", + "type": "SNAPSHOT", + "properties": { + "creation": { "value": "1745107200" } + } + }, + "rpool/data@daily-2026-04-21": { + "name": "rpool/data@daily-2026-04-21", + "type": "SNAPSHOT", + "properties": { + "creation": { "value": "1745193600" } + } + } + } +} +``` + +The Unix timestamps `1745107200` and `1745193600` correspond to 2025-04-20 and 2025-04-21 — plausible ages for tests. 
+ +- [ ] **Step 2: Write failing tests** + +Create `agent/test/proxmox_agent/collectors/zfs_test.exs`: + +```elixir +defmodule ProxmoxAgent.Collectors.ZfsTest do + use ExUnit.Case, async: true + + alias ProxmoxAgent.Collectors.Zfs + + @fixtures Path.expand("../../fixtures/zfs", __DIR__) + + defp fake_runner do + fn + "zpool", ["list", "-j", "--json-int"] -> + {:ok, File.read!(Path.join(@fixtures, "zpool_list.json"))} + + "zpool", ["status", "-j", "--json-flat-vdevs", "--json-int"] -> + {:ok, File.read!(Path.join(@fixtures, "zpool_status.json"))} + + "zfs", ["list", "-j", "--json-int", "-t", "all"] -> + {:ok, File.read!(Path.join(@fixtures, "zfs_list.json"))} + end + end + + describe "collect_pools/1" do + test "returns a summary per pool" do + sample = Zfs.collect_pools(runner: fake_runner()) + assert is_list(sample.pools) + assert length(sample.pools) == 2 + rpool = Enum.find(sample.pools, &(&1.name == "rpool")) + tank = Enum.find(sample.pools, &(&1.name == "tank")) + + assert rpool.health == "ONLINE" + assert rpool.capacity_percent == 40 + assert rpool.fragmentation_percent == 17 + assert rpool.size_bytes == 500_000_000_000 + assert rpool.error_count == 0 + assert rpool.degraded_vdev_count == 0 + + assert tank.health == "DEGRADED" + assert tank.error_count == 2 + assert tank.degraded_vdev_count == 1 + end + + test "populates errors list when zpool fails" do + failing = fn _, _ -> {:error, {:enoent, "zpool"}} end + sample = Zfs.collect_pools(runner: failing) + assert sample.pools == [] + assert length(sample.errors) >= 1 + end + end + + describe "collect_datasets/1" do + test "returns datasets and per-dataset snapshot summary" do + sample = Zfs.collect_datasets(runner: fake_runner()) + assert length(sample.datasets) == 2 + + rpool_data = Enum.find(sample.datasets, &(&1.name == "rpool/data")) + assert rpool_data.used_bytes == 100_000_000_000 + assert rpool_data.usedbysnapshots_bytes == 2_000_000_000 + assert rpool_data.snapshot_count == 2 + assert 
rpool_data.newest_snapshot_unix == 1_745_193_600 + assert rpool_data.oldest_snapshot_unix == 1_745_107_200 + end + end +end +``` + +- [ ] **Step 3: Run tests — expect failure** + +```bash +mix test test/proxmox_agent/collectors/zfs_test.exs 2>&1 | tail -10 +``` + +Expected: `ProxmoxAgent.Collectors.Zfs` undefined. + +- [ ] **Step 4: Implement Zfs collector** + +Create `agent/lib/proxmox_agent/collectors/zfs.ex`: + +```elixir +defmodule ProxmoxAgent.Collectors.Zfs do + @moduledoc """ + Collects ZFS pool health (fast path) and dataset/snapshot info (medium path). + Delegates shelling out to an injectable runner so tests can supply fixtures. + """ + + @type pool_summary :: %{ + name: String.t(), + health: String.t(), + size_bytes: non_neg_integer(), + allocated_bytes: non_neg_integer(), + free_bytes: non_neg_integer(), + fragmentation_percent: non_neg_integer(), + capacity_percent: non_neg_integer(), + error_count: non_neg_integer(), + vdev_count: non_neg_integer(), + degraded_vdev_count: non_neg_integer(), + last_scrub_end: String.t() | nil + } + + @spec collect_pools(keyword()) :: %{pools: [pool_summary()], errors: [map()]} + def collect_pools(opts \\ []) do + runner = runner(opts) + + {list_result, list_err} = decode(runner.("zpool", ["list", "-j", "--json-int"]), :zpool_list) + + {status_result, status_err} = + decode(runner.("zpool", ["status", "-j", "--json-flat-vdevs", "--json-int"]), :zpool_status) + + pools = merge_pools(list_result, status_result) + errors = Enum.filter([list_err, status_err], & &1) + %{pools: pools, errors: errors} + end + + @type dataset_summary :: %{ + name: String.t(), + used_bytes: non_neg_integer(), + usedbysnapshots_bytes: non_neg_integer(), + snapshot_count: non_neg_integer(), + newest_snapshot_unix: non_neg_integer() | nil, + oldest_snapshot_unix: non_neg_integer() | nil + } + + @spec collect_datasets(keyword()) :: %{datasets: [dataset_summary()], errors: [map()]} + def collect_datasets(opts \\ []) do + runner = runner(opts) + + 
{list_result, err} = + decode(runner.("zfs", ["list", "-j", "--json-int", "-t", "all"]), :zfs_list) + + datasets = summarize_datasets(list_result) + errors = if err, do: [err], else: [] + %{datasets: datasets, errors: errors} + end + + defp runner(opts), do: Keyword.get(opts, :runner, &ProxmoxAgent.Shell.run/2) + + defp decode({:ok, body}, _tag) do + case Jason.decode(body) do + {:ok, map} -> {map, nil} + {:error, e} -> {nil, %{tag: "decode", message: Exception.message(e)}} + end + end + + defp decode({:error, reason}, tag), do: {nil, %{tag: Atom.to_string(tag), message: inspect(reason)}} + + defp merge_pools(nil, _), do: [] + defp merge_pools(_, nil), do: [] + + defp merge_pools(%{"pools" => list_pools}, %{"pools" => status_pools}) do + Enum.map(list_pools, fn {name, list_info} -> + status_info = Map.get(status_pools, name, %{}) + vdevs = Map.get(status_info, "vdevs", %{}) |> Map.values() + + %{ + name: name, + health: Map.get(list_info, "health"), + size_bytes: Map.get(list_info, "size", 0), + allocated_bytes: Map.get(list_info, "alloc", 0), + free_bytes: Map.get(list_info, "free", 0), + fragmentation_percent: Map.get(list_info, "frag", 0), + capacity_percent: Map.get(list_info, "cap", 0), + error_count: to_int(Map.get(status_info, "error_count", "0")), + vdev_count: length(vdevs), + degraded_vdev_count: Enum.count(vdevs, &(&1["state"] != "ONLINE")), + last_scrub_end: get_in(status_info, ["scan", "end_time"]) + } + end) + end + + defp summarize_datasets(nil), do: [] + + defp summarize_datasets(%{"datasets" => datasets}) do + by_type = Enum.group_by(datasets, fn {_, d} -> d["type"] end) + + filesystems = Map.get(by_type, "FILESYSTEM", []) + snapshots_by_ds = group_snapshots(Map.get(by_type, "SNAPSHOT", [])) + + Enum.map(filesystems, fn {_name, ds} -> + name = ds["name"] + snaps = Map.get(snapshots_by_ds, name, []) + + %{ + name: name, + used_bytes: get_prop_int(ds, "used"), + usedbysnapshots_bytes: get_prop_int(ds, "usedbysnapshots"), + snapshot_count: 
length(snaps), + newest_snapshot_unix: snaps |> Enum.map(& &1.creation) |> max_or_nil(), + oldest_snapshot_unix: snaps |> Enum.map(& &1.creation) |> min_or_nil() + } + end) + end + + defp group_snapshots(snapshots) do + Enum.reduce(snapshots, %{}, fn {_, snap}, acc -> + [parent | _] = String.split(snap["name"], "@", parts: 2) + creation = get_prop_int(snap, "creation") + entry = %{name: snap["name"], creation: creation} + Map.update(acc, parent, [entry], &[entry | &1]) + end) + end + + defp get_prop_int(ds, key) do + case get_in(ds, ["properties", key, "value"]) do + nil -> 0 + v -> to_int(v) + end + end + + defp to_int(v) when is_integer(v), do: v + defp to_int(v) when is_binary(v), do: String.to_integer(v) + + defp max_or_nil([]), do: nil + defp max_or_nil(list), do: Enum.max(list) + defp min_or_nil([]), do: nil + defp min_or_nil(list), do: Enum.min(list) +end +``` + +- [ ] **Step 5: Run tests — expect pass** + +```bash +mix test test/proxmox_agent/collectors/zfs_test.exs 2>&1 | tail -5 +``` + +Expected: 3 tests pass. 
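The snapshot grouping above leans on one naming convention: a snapshot's full name is `<dataset>@<snapname>`, and `group_snapshots/1` keys entries by the part before the `@`. The same split, sanity-checked in shell (the snapshot name is made up):

```shell
# A ZFS snapshot name is "<dataset>@<snapname>"; grouping keys on the dataset part.
snap="rpool/data@autosnap_2026-04-21_daily"
echo "${snap%%@*}"
```

`parts: 2` in the Elixir version guards the same way against a stray `@` later in the snapshot name: only the first `@` splits.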
+ +- [ ] **Step 6: Commit** + +```bash +cd /Users/cabele/claudeprojects/proxmox_monitor +git add agent/lib/proxmox_agent/collectors/zfs.ex agent/test/proxmox_agent/collectors/zfs_test.exs agent/test/fixtures/zfs +git commit -m "feat(agent): zfs collector for pools + datasets/snapshots with fixture tests" +``` + +--- + +## Task 7: Agent — Proxmox Storage Collector + +**Files:** +- Create: `agent/test/fixtures/pvesh/storage.json` +- Create: `agent/test/proxmox_agent/collectors/storage_test.exs` +- Create: `agent/lib/proxmox_agent/collectors/storage.ex` + +- [ ] **Step 1: Fixture** + +Create `agent/test/fixtures/pvesh/storage.json`: + +```json +[ + { + "storage": "local", + "type": "dir", + "content": "backup,iso,vztmpl", + "active": 1, + "enabled": 1, + "used": 50000000000, + "total": 500000000000, + "avail": 450000000000, + "used_fraction": 0.1 + }, + { + "storage": "local-zfs", + "type": "zfspool", + "content": "images,rootdir", + "active": 1, + "enabled": 1, + "used": 200000000000, + "total": 500000000000, + "avail": 300000000000, + "used_fraction": 0.4 + }, + { + "storage": "backup-nfs", + "type": "nfs", + "content": "backup", + "active": 0, + "enabled": 1, + "used": 0, + "total": 0, + "avail": 0, + "used_fraction": 0.0 + } +] +``` + +- [ ] **Step 2: Write failing test** + +Create `agent/test/proxmox_agent/collectors/storage_test.exs`: + +```elixir +defmodule ProxmoxAgent.Collectors.StorageTest do + use ExUnit.Case, async: true + + alias ProxmoxAgent.Collectors.Storage + + @fixtures Path.expand("../../fixtures/pvesh", __DIR__) + + defp fake_runner do + fn + "pvesh", ["get", "/nodes/" <> _, "--output-format", "json"] -> + {:ok, File.read!(Path.join(@fixtures, "storage.json"))} + end + end + + test "returns one summary per storage entry" do + sample = Storage.collect(node: "pve-01", runner: fake_runner()) + + assert length(sample.storages) == 3 + local = Enum.find(sample.storages, &(&1.name == "local")) + assert local.type == "dir" + assert local.active == true + 
assert local.used_bytes == 50_000_000_000 + assert local.total_bytes == 500_000_000_000 + assert local.content == "backup,iso,vztmpl" + + nfs = Enum.find(sample.storages, &(&1.name == "backup-nfs")) + assert nfs.active == false + end + + test "populates errors on failure" do + failing = fn _, _ -> {:error, {:enoent, "pvesh"}} end + sample = Storage.collect(node: "pve-01", runner: failing) + assert sample.storages == [] + assert sample.errors != [] + end +end +``` + +- [ ] **Step 3: Run test — expect failure** + +```bash +mix test test/proxmox_agent/collectors/storage_test.exs 2>&1 | tail -5 +``` + +Expected: module undefined. + +- [ ] **Step 4: Implement** + +Create `agent/lib/proxmox_agent/collectors/storage.ex`: + +```elixir +defmodule ProxmoxAgent.Collectors.Storage do + @moduledoc "Collects Proxmox storage summary via pvesh." + + @spec collect(keyword()) :: %{storages: [map()], errors: [map()]} + def collect(opts \\ []) do + runner = Keyword.get(opts, :runner, &ProxmoxAgent.Shell.run/2) + node = Keyword.fetch!(opts, :node) + + case runner.("pvesh", ["get", "/nodes/#{node}/storage", "--output-format", "json"]) do + {:ok, body} -> + case Jason.decode(body) do + {:ok, list} when is_list(list) -> + %{storages: Enum.map(list, &normalize/1), errors: []} + + {:error, e} -> + %{storages: [], errors: [%{tag: "decode", message: Exception.message(e)}]} + end + + {:error, reason} -> + %{storages: [], errors: [%{tag: "pvesh", message: inspect(reason)}]} + end + end + + defp normalize(entry) do + %{ + name: entry["storage"], + type: entry["type"], + content: entry["content"], + active: entry["active"] == 1, + enabled: entry["enabled"] == 1, + used_bytes: entry["used"] || 0, + total_bytes: entry["total"] || 0, + avail_bytes: entry["avail"] || 0, + used_fraction: entry["used_fraction"] || 0.0 + } + end +end +``` + +- [ ] **Step 5: Run test — expect pass** + +```bash +mix test test/proxmox_agent/collectors/storage_test.exs 2>&1 | tail -5 +``` + +Expected: 2 tests pass. 
+ +- [ ] **Step 6: Commit** + +```bash +cd /Users/cabele/claudeprojects/proxmox_monitor +git add agent/lib/proxmox_agent/collectors/storage.ex agent/test/proxmox_agent/collectors/storage_test.exs agent/test/fixtures/pvesh/storage.json +git commit -m "feat(agent): pvesh storage collector" +``` + +--- + +## Task 8: Agent — VM/LXC Collector + +**Files:** +- Create: `agent/test/fixtures/pvesh/qemu.json` +- Create: `agent/test/fixtures/pvesh/lxc.json` +- Create: `agent/test/fixtures/pvesh/qemu_100_config.json` +- Create: `agent/test/fixtures/pvesh/qemu_100_agent_interfaces.json` +- Create: `agent/test/proxmox_agent/collectors/vms_test.exs` +- Create: `agent/lib/proxmox_agent/collectors/vms.ex` + +- [ ] **Step 1: Fixtures** + +Create `agent/test/fixtures/pvesh/qemu.json`: + +```json +[ + { + "vmid": 100, + "name": "nginx-proxy", + "status": "running", + "uptime": 86400, + "cpu": 0.05, + "mem": 536870912, + "maxmem": 2147483648, + "tags": "web;production" + }, + { + "vmid": 101, + "name": "db-backup", + "status": "stopped", + "uptime": 0, + "cpu": 0, + "mem": 0, + "maxmem": 4294967296, + "tags": "db" + } +] +``` + +Create `agent/test/fixtures/pvesh/lxc.json`: + +```json +[ + { + "vmid": 200, + "name": "minecraft", + "status": "running", + "uptime": 3600, + "cpu": 0.15, + "mem": 2147483648, + "maxmem": 4294967296, + "tags": "" + } +] +``` + +Create `agent/test/fixtures/pvesh/qemu_100_config.json`: + +```json +{ + "name": "nginx-proxy", + "cores": 2, + "memory": 2048, + "onboot": 1, + "scsi0": "local-zfs:vm-100-disk-0,size=32G", + "net0": "virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0" +} +``` + +Create `agent/test/fixtures/pvesh/qemu_100_agent_interfaces.json`: + +```json +{ + "result": [ + { + "name": "lo", + "ip-addresses": [ + { "ip-address": "127.0.0.1", "ip-address-type": "ipv4" } + ] + }, + { + "name": "eth0", + "ip-addresses": [ + { "ip-address": "192.168.1.10", "ip-address-type": "ipv4" }, + { "ip-address": "fe80::a", "ip-address-type": "ipv6" } + ] + } + ] +} +``` + +- [ 
] **Step 2: Write failing tests** + +Create `agent/test/proxmox_agent/collectors/vms_test.exs`: + +```elixir +defmodule ProxmoxAgent.Collectors.VmsTest do + use ExUnit.Case, async: true + + alias ProxmoxAgent.Collectors.Vms + + @fixtures Path.expand("../../fixtures/pvesh", __DIR__) + + defp read!(name), do: File.read!(Path.join(@fixtures, name)) + + defp fake_runner do + fn + "pvesh", ["get", "/nodes/" <> rest, "--output-format", "json"] -> + cond do + String.ends_with?(rest, "/qemu") -> + {:ok, read!("qemu.json")} + + String.ends_with?(rest, "/lxc") -> + {:ok, read!("lxc.json")} + + String.ends_with?(rest, "/qemu/100/config") -> + {:ok, read!("qemu_100_config.json")} + + String.ends_with?(rest, "/qemu/100/agent/network-get-interfaces") -> + {:ok, read!("qemu_100_agent_interfaces.json")} + + String.ends_with?(rest, "/qemu/101/config") -> + {:ok, ~s({"name":"db-backup","cores":4,"memory":4096})} + + String.ends_with?(rest, "/qemu/101/agent/network-get-interfaces") -> + {:error, {:nonzero_exit, 1, "QEMU guest agent is not running"}} + + String.ends_with?(rest, "/lxc/200/config") -> + {:ok, ~s({"hostname":"minecraft","cores":4,"memory":4096,"net0":"name=eth0,ip=10.0.0.5/24"})} + end + end + end + + describe "collect_runtime/1" do + test "returns qemu + lxc runtime info" do + sample = Vms.collect_runtime(node: "pve-01", runner: fake_runner()) + + assert length(sample.vms) == 3 + nginx = Enum.find(sample.vms, &(&1.vmid == 100)) + assert nginx.type == "qemu" + assert nginx.name == "nginx-proxy" + assert nginx.status == "running" + assert nginx.cpu_usage == 0.05 + assert nginx.mem_bytes == 536_870_912 + assert nginx.max_mem_bytes == 2_147_483_648 + assert nginx.tags == ["web", "production"] + + mc = Enum.find(sample.vms, &(&1.vmid == 200)) + assert mc.type == "lxc" + end + end + + describe "collect_detail/1" do + test "returns per-VM config + IPs" do + sample = Vms.collect_detail(node: "pve-01", runner: fake_runner()) + + nginx = Enum.find(sample.vms, &(&1.vmid == 100)) + 
assert nginx.config["cores"] == 2 + assert nginx.config["memory"] == 2048 + assert "192.168.1.10" in nginx.ips + + db = Enum.find(sample.vms, &(&1.vmid == 101)) + assert db.config["cores"] == 4 + assert db.ips == [] + assert length(db.errors) == 1 + + mc = Enum.find(sample.vms, &(&1.vmid == 200)) + assert mc.config["hostname"] == "minecraft" + assert "10.0.0.5" in mc.ips + end + end +end +``` + +- [ ] **Step 3: Run tests — expect failure** + +```bash +mix test test/proxmox_agent/collectors/vms_test.exs 2>&1 | tail -10 +``` + +Expected: module undefined. + +- [ ] **Step 4: Implement** + +Create `agent/lib/proxmox_agent/collectors/vms.ex`: + +```elixir +defmodule ProxmoxAgent.Collectors.Vms do + @moduledoc """ + Collects VM/LXC runtime (fast path) and per-VM detail incl. IPs (medium path). + """ + + @spec collect_runtime(keyword()) :: %{vms: [map()], errors: [map()]} + def collect_runtime(opts) do + runner = runner(opts) + node = Keyword.fetch!(opts, :node) + + {qemu, e1} = list(runner, node, "qemu") + {lxc, e2} = list(runner, node, "lxc") + + vms = + Enum.map(qemu, &normalize_runtime(&1, "qemu")) ++ + Enum.map(lxc, &normalize_runtime(&1, "lxc")) + + %{vms: vms, errors: Enum.filter([e1, e2], & &1)} + end + + @spec collect_detail(keyword()) :: %{vms: [map()], errors: [map()]} + def collect_detail(opts) do + runner = runner(opts) + node = Keyword.fetch!(opts, :node) + + {qemu, e1} = list(runner, node, "qemu") + {lxc, e2} = list(runner, node, "lxc") + + qemu_details = Enum.map(qemu, &qemu_detail(runner, node, &1)) + lxc_details = Enum.map(lxc, &lxc_detail(runner, node, &1)) + + %{vms: qemu_details ++ lxc_details, errors: Enum.filter([e1, e2], & &1)} + end + + defp runner(opts), do: Keyword.get(opts, :runner, &ProxmoxAgent.Shell.run/2) + + defp list(runner, node, type) do + case runner.("pvesh", ["get", "/nodes/#{node}/#{type}", "--output-format", "json"]) do + {:ok, body} -> + case Jason.decode(body) do + {:ok, list} when is_list(list) -> {list, nil} + {:error, e} -> 
{[], %{tag: "decode_#{type}", message: Exception.message(e)}} + end + + {:error, reason} -> + {[], %{tag: "list_#{type}", message: inspect(reason)}} + end + end + + defp normalize_runtime(entry, type) do + %{ + vmid: entry["vmid"], + type: type, + name: entry["name"] || entry["hostname"], + status: entry["status"], + uptime_seconds: entry["uptime"] || 0, + cpu_usage: entry["cpu"] || 0.0, + mem_bytes: entry["mem"] || 0, + max_mem_bytes: entry["maxmem"] || 0, + tags: parse_tags(entry["tags"]) + } + end + + defp parse_tags(nil), do: [] + defp parse_tags(""), do: [] + + defp parse_tags(str) when is_binary(str) do + str |> String.split([";", ","], trim: true) |> Enum.map(&String.trim/1) + end + + defp qemu_detail(runner, node, entry) do + vmid = entry["vmid"] + {config, cfg_err} = fetch_json(runner, "/nodes/#{node}/qemu/#{vmid}/config") + {ips, ip_err} = fetch_qemu_agent_ips(runner, node, vmid) + + %{ + vmid: vmid, + type: "qemu", + name: entry["name"], + config: config || %{}, + ips: ips, + errors: Enum.filter([cfg_err, ip_err], & &1) + } + end + + defp lxc_detail(runner, node, entry) do + vmid = entry["vmid"] + {config, cfg_err} = fetch_json(runner, "/nodes/#{node}/lxc/#{vmid}/config") + ips = extract_lxc_ips(config || %{}) + + %{ + vmid: vmid, + type: "lxc", + name: entry["name"], + config: config || %{}, + ips: ips, + errors: Enum.filter([cfg_err], & &1) + } + end + + defp fetch_json(runner, path) do + case runner.("pvesh", ["get", path, "--output-format", "json"]) do + {:ok, body} -> + case Jason.decode(body) do + {:ok, map} -> {map, nil} + {:error, e} -> {nil, %{tag: "decode", message: Exception.message(e)}} + end + + {:error, reason} -> + {nil, %{tag: "pvesh", message: inspect(reason)}} + end + end + + defp fetch_qemu_agent_ips(runner, node, vmid) do + case runner.("pvesh", [ + "get", + "/nodes/#{node}/qemu/#{vmid}/agent/network-get-interfaces", + "--output-format", + "json" + ]) do + {:ok, body} -> + case Jason.decode(body) do + {:ok, %{"result" => interfaces}} 
->
+            ips =
+              interfaces
+              |> Enum.reject(&(&1["name"] == "lo"))
+              |> Enum.flat_map(&Map.get(&1, "ip-addresses", []))
+              |> Enum.filter(&(&1["ip-address-type"] == "ipv4"))
+              |> Enum.map(& &1["ip-address"])
+
+            {ips, nil}
+
+          _ ->
+            {[], %{tag: "agent_ips", message: "unexpected shape"}}
+        end
+
+      # Guest agent unreachable (e.g. not running inside the VM): record the
+      # reason so collect_detail/1 surfaces it in the per-VM errors list, as
+      # the test for VM 101 expects.
+      {:error, reason} ->
+        {[], %{tag: "agent_ips", message: inspect(reason)}}
+    end
+  end
+
+  defp extract_lxc_ips(config) do
+    config
+    |> Enum.filter(fn {k, _} -> String.starts_with?(to_string(k), "net") end)
+    |> Enum.flat_map(fn {_, val} -> parse_lxc_net(val) end)
+  end
+
+  defp parse_lxc_net(val) when is_binary(val) do
+    val
+    |> String.split(",")
+    |> Enum.find_value([], fn pair ->
+      case String.split(pair, "=", parts: 2) do
+        ["ip", ip] ->
+          ip = ip |> String.split("/") |> hd()
+          if ip == "dhcp", do: [], else: [ip]
+
+        _ ->
+          nil
+      end
+    end)
+  end
+
+  defp parse_lxc_net(_), do: []
+end
+```
+
+- [ ] **Step 5: Run tests — expect pass**
+
+```bash
+mix test test/proxmox_agent/collectors/vms_test.exs 2>&1 | tail -5
+```
+
+Expected: 2 tests pass.
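`parse_lxc_net/1` pulls the `ip=` pair out of a Proxmox `netN` value and drops the CIDR suffix. A shell sketch of the same extraction (the example string mirrors the lxc fixture, with a `bridge` pair added for realism):

```shell
net0="name=eth0,bridge=vmbr0,ip=10.0.0.5/24"
# Split on commas, keep the ip= pair, strip the /24 suffix.
echo "$net0" | tr ',' '\n' | sed -n 's/^ip=//p' | cut -d/ -f1
```

An `ip=dhcp` entry would survive this pipeline, which is why the Elixir version special-cases `"dhcp"` to an empty list.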
+ +- [ ] **Step 6: Commit** + +```bash +cd /Users/cabele/claudeprojects/proxmox_monitor +git add agent/lib/proxmox_agent/collectors/vms.ex agent/test/proxmox_agent/collectors/vms_test.exs agent/test/fixtures/pvesh +git commit -m "feat(agent): vms/lxc collectors for runtime and detail with fixtures" +``` + +--- + +## Task 9: Agent — System Info Collector + +**Files:** +- Create: `agent/test/fixtures/system/pveversion.txt` +- Create: `agent/test/fixtures/system/zfs_version.txt` +- Create: `agent/test/fixtures/system/apt_upgradable.txt` +- Create: `agent/test/proxmox_agent/collectors/system_info_test.exs` +- Create: `agent/lib/proxmox_agent/collectors/system_info.ex` + +- [ ] **Step 1: Fixtures** + +Create `agent/test/fixtures/system/pveversion.txt`: + +``` +pve-manager/8.3.1/abc123 (running kernel: 6.8.12-1-pve) +``` + +Create `agent/test/fixtures/system/zfs_version.txt`: + +``` +zfs-2.3.0-pve2 +zfs-kmod-2.3.0-pve2 +``` + +Create `agent/test/fixtures/system/apt_upgradable.txt`: + +``` +Listing... 
+libssl3/stable 3.0.11-1~deb12u2 amd64 [upgradable from: 3.0.11-1~deb12u1] +openssh-server/stable 1:9.2p1-2+deb12u3 amd64 [upgradable from: 1:9.2p1-2+deb12u2] +``` + +- [ ] **Step 2: Write failing test** + +Create `agent/test/proxmox_agent/collectors/system_info_test.exs`: + +```elixir +defmodule ProxmoxAgent.Collectors.SystemInfoTest do + use ExUnit.Case, async: true + + alias ProxmoxAgent.Collectors.SystemInfo + + @fixtures Path.expand("../../fixtures/system", __DIR__) + + defp fake_runner do + fn + "pveversion", [] -> {:ok, File.read!(Path.join(@fixtures, "pveversion.txt"))} + "zfs", ["--version"] -> {:ok, File.read!(Path.join(@fixtures, "zfs_version.txt"))} + "apt", ["list", "--upgradable"] -> {:ok, File.read!(Path.join(@fixtures, "apt_upgradable.txt"))} + end + end + + test "collects pveversion, zfs version and pending upgrade count" do + sample = SystemInfo.collect(runner: fake_runner()) + + assert sample.pve_version =~ "pve-manager/8.3.1" + assert sample.zfs_version =~ "2.3.0" + assert sample.pending_updates == 2 + assert sample.errors == [] + end + + test "partial sample when one command fails" do + partial = fn + "pveversion", [] -> {:ok, "pve-manager/8.3.1/abc (running kernel: 6.8.12-1-pve)\n"} + "zfs", ["--version"] -> {:error, {:enoent, "zfs"}} + "apt", ["list", "--upgradable"] -> {:ok, "Listing...\n"} + end + + sample = SystemInfo.collect(runner: partial) + assert sample.pve_version =~ "8.3.1" + assert sample.zfs_version == nil + assert sample.pending_updates == 0 + assert length(sample.errors) == 1 + end +end +``` + +- [ ] **Step 3: Run test — expect failure** + +```bash +mix test test/proxmox_agent/collectors/system_info_test.exs 2>&1 | tail -5 +``` + +Expected: module undefined. + +- [ ] **Step 4: Implement** + +Create `agent/lib/proxmox_agent/collectors/system_info.ex`: + +```elixir +defmodule ProxmoxAgent.Collectors.SystemInfo do + @moduledoc "Slow-path system metadata: pveversion, zfs version, apt upgradable count." 
+ + @spec collect(keyword()) :: %{ + pve_version: String.t() | nil, + zfs_version: String.t() | nil, + pending_updates: non_neg_integer(), + agent_version: String.t(), + errors: [map()] + } + def collect(opts \\ []) do + runner = Keyword.get(opts, :runner, &ProxmoxAgent.Shell.run/2) + + {pve, e1} = trim_output(runner.("pveversion", []), :pveversion) + {zfs, e2} = trim_output(runner.("zfs", ["--version"]), :zfs_version) + {apt, e3} = runner.("apt", ["list", "--upgradable"]) |> count_upgrades() + + %{ + pve_version: pve, + zfs_version: zfs, + pending_updates: apt, + agent_version: ProxmoxAgent.version(), + errors: Enum.filter([e1, e2, e3], & &1) + } + end + + defp trim_output({:ok, text}, _tag), do: {String.trim(text) |> first_line(), nil} + + defp trim_output({:error, reason}, tag), + do: {nil, %{tag: Atom.to_string(tag), message: inspect(reason)}} + + defp first_line(str), do: str |> String.split("\n", parts: 2) |> hd() + + defp count_upgrades({:ok, text}) do + count = + text + |> String.split("\n", trim: true) + |> Enum.count(&String.contains?(&1, "[upgradable")) + + {count, nil} + end + + defp count_upgrades({:error, reason}), + do: {0, %{tag: "apt_upgradable", message: inspect(reason)}} +end +``` + +- [ ] **Step 5: Run test — expect pass** + +```bash +mix test test/proxmox_agent/collectors/system_info_test.exs 2>&1 | tail -5 +``` + +Expected: 2 tests pass. 
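`count_upgrades/1` is deliberately dumb: it counts lines containing `[upgradable` and ignores everything else. The same count against the fixture lines, in shell:

```shell
# The "Listing..." header does not match; the two package lines do.
printf '%s\n' 'Listing...' \
  'libssl3/stable 3.0.11-1~deb12u2 amd64 [upgradable from: 3.0.11-1~deb12u1]' \
  'openssh-server/stable 1:9.2p1-2+deb12u3 amd64 [upgradable from: 1:9.2p1-2+deb12u2]' \
  | grep -c '\[upgradable'
```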
+ +- [ ] **Step 6: Commit** + +```bash +cd /Users/cabele/claudeprojects/proxmox_monitor +git add agent/lib/proxmox_agent/collectors/system_info.ex agent/test/proxmox_agent/collectors/system_info_test.exs agent/test/fixtures/system +git commit -m "feat(agent): system info collector for pveversion/zfs/apt" +``` + +--- + +## Task 10: Agent — Reporter Schedules Medium + Slow + +**Files:** +- Modify: `agent/lib/proxmox_agent/reporter.ex` + +- [ ] **Step 1: Update Reporter to schedule all three intervals and bundle multi-collector payloads** + +Replace the existing `handle_info(:collect_fast, socket)` clause and add medium/slow. Full new module body: + +```elixir +defmodule ProxmoxAgent.Reporter do + @moduledoc """ + Maintains a persistent Phoenix Channel connection to the server, joins + `host:`, and pushes metric samples on fast/medium/slow intervals. + + Payload contract: + metric:fast => %{host, zfs_pools, storage, vms_runtime} + metric:medium => %{zfs_datasets, vms_detail} + metric:slow => %{system_info} + """ + + use Slipstream, restart: :permanent + require Logger + + alias ProxmoxAgent.Collectors.{Host, Storage, SystemInfo, Vms, Zfs} + + def start_link(%ProxmoxAgent.Config{} = cfg) do + Slipstream.start_link(__MODULE__, cfg, name: __MODULE__) + end + + @impl Slipstream + def init(cfg) do + socket = + new_socket() + |> assign(:cfg, cfg) + |> assign(:topic, "host:" <> cfg.host_id) + |> connect!(uri: cfg.server_url) + + {:ok, socket} + end + + @impl Slipstream + def handle_connect(socket) do + topic = socket.assigns.topic + cfg = socket.assigns.cfg + + payload = %{"token" => cfg.token, "agent_version" => ProxmoxAgent.version()} + Logger.info("reporter: connected, joining #{topic}") + {:ok, join(socket, topic, payload)} + end + + @impl Slipstream + def handle_join(topic, _reply, socket) do + Logger.info("reporter: joined #{topic}") + send(self(), :collect_fast) + send(self(), :collect_medium) + send(self(), :collect_slow) + {:ok, socket} + end + + @impl Slipstream + 
def handle_info(:collect_fast, socket) do + cfg = socket.assigns.cfg + + data = %{ + host: Host.collect(), + zfs_pools: Zfs.collect_pools(), + storage: Storage.collect(node: cfg.host_id), + vms_runtime: Vms.collect_runtime(node: cfg.host_id) + } + + push_metric(socket, "metric:fast", data) + Process.send_after(self(), :collect_fast, cfg.fast_seconds * 1000) + {:noreply, socket} + end + + def handle_info(:collect_medium, socket) do + cfg = socket.assigns.cfg + + data = %{ + zfs_datasets: Zfs.collect_datasets(), + vms_detail: Vms.collect_detail(node: cfg.host_id) + } + + push_metric(socket, "metric:medium", data) + Process.send_after(self(), :collect_medium, cfg.medium_seconds * 1000) + {:noreply, socket} + end + + def handle_info(:collect_slow, socket) do + cfg = socket.assigns.cfg + + data = %{system_info: SystemInfo.collect()} + + push_metric(socket, "metric:slow", data) + Process.send_after(self(), :collect_slow, cfg.slow_seconds * 1000) + {:noreply, socket} + end + + @impl Slipstream + def handle_disconnect(reason, socket) do + Logger.warning("reporter: disconnected — #{inspect(reason)}; reconnecting") + reconnect(socket) + end + + @impl Slipstream + def handle_topic_close(topic, reason, socket) do + Logger.warning("reporter: topic #{topic} closed: #{inspect(reason)}; rejoining") + rejoin(socket, topic) + end + + defp push_metric(socket, event, data) do + payload = %{collected_at: DateTime.utc_now() |> DateTime.to_iso8601(), data: data} + + case push(socket, socket.assigns.topic, event, payload) do + {:ok, _ref} -> + :ok + + {:error, reason} -> + Logger.warning("reporter: push #{event} failed: #{inspect(reason)}") + :ok + end + end +end +``` + +- [ ] **Step 2: Compile and run all agent tests** + +```bash +cd /Users/cabele/claudeprojects/proxmox_monitor/agent +mix compile --warnings-as-errors 2>&1 | tail -5 +mix test 2>&1 | tail -5 +``` + +Expected: compile clean, all tests pass (existing + new collector tests). 
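For reference, here is roughly what a `metric:medium` push looks like on the wire once JSON-encoded. The envelope shape comes from `push_metric/3` and the top-level keys from `handle_info(:collect_medium, ...)`; the dataset values are illustrative only:

```json
{
  "collected_at": "2026-04-21T20:59:24Z",
  "data": {
    "zfs_datasets": {
      "datasets": [
        {
          "name": "rpool/data",
          "used_bytes": 1073741824,
          "usedbysnapshots_bytes": 4096,
          "snapshot_count": 2,
          "newest_snapshot_unix": 1745193600,
          "oldest_snapshot_unix": 1745107200
        }
      ],
      "errors": []
    },
    "vms_detail": { "vms": [], "errors": [] }
  }
}
```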
+
+- [ ] **Step 3: Commit**
+
+```bash
+cd /Users/cabele/claudeprojects/proxmox_monitor
+git add agent/lib/proxmox_agent/reporter.ex
+git commit -m "feat(agent): reporter schedules fast/medium/slow collection with bundled payloads"
+```
+
+---
+
+## Task 11: End-to-End Smoke Test
+
+**Goal:** Agent runs against local server, all three intervals produce rows in `metrics`, `GET /api/hosts/pve-dev-01` returns the latest samples.
+
+- [ ] **Step 1: Start server**
+
+```bash
+cd /Users/cabele/claudeprojects/proxmox_monitor/server
+mix ecto.migrate
+mix phx.server
+```
+
+Run in a separate terminal/background. Wait for the `Running ServerWeb.Endpoint` log line.
+
+- [ ] **Step 2: Re-use or create a host token**
+
+The Phase 1 smoke test registered `pve-dev-01`. If that host still exists, use its token. If not:
+
+```bash
+mix run -e 'Server.Release.register_host("pve-dev-01")'
+```
+
+Copy the printed TOKEN.
+
+- [ ] **Step 3: Write agent config**
+
+Interval values line up with the row-count expectation in Step 5; `server_url` must match the agent socket route from Phase 1, and the token comes from Step 2:
+
+```bash
+cat > /tmp/agent-local.toml <<'EOF'
+server_url = "ws://localhost:4000/agent/websocket"
+host_id = "pve-dev-01"
+token = "PASTE_TOKEN_HERE"
+fast_seconds = 5
+medium_seconds = 10
+slow_seconds = 20
+EOF
+```
+
+- [ ] **Step 4: Start agent**
+
+Start the agent with the same invocation as the Phase 1 smoke test, pointing it at `/tmp/agent-local.toml`, and let it run for about 25 seconds.
+
+- [ ] **Step 5: Count metric rows**
+
+```bash
+cd /Users/cabele/claudeprojects/proxmox_monitor/server
+mix run -e 'IO.inspect(Server.Repo.aggregate(Server.Schema.Metric, :count))'
+```
+
+Expected: count >= 5 after ~25 seconds of runtime (fast every 5s = ~5 rows, medium every 10s = ~2-3 rows, slow every 20s = ~1-2 rows).
+
+- [ ] **Step 6: Query the API**
+
+```bash
+curl http://localhost:4000/api/hosts/pve-dev-01
+```
+
+Expected: JSON with the latest samples for `pve-dev-01`.
+
+- [ ] **Step 7: Stop agent, clean up**
+
+```bash
+# Stop agent with Ctrl+C (or pkill for automation)
+rm /tmp/agent-local.toml
+```
+
+No code changes; no commit.
+
+---
+
+## Phase 2 Exit Criteria
+
+- `cd server && mix test` — all green.
+- `cd agent && mix test` — all green.
+- Smoke test: agent pushes three intervals; DB grows; API returns structured samples.
+- Retention GenServer running in server supervision tree.
+- All commits on `main`.
+
+Next up (Phase 3): LiveView dashboard. See roadmap in `proxmox-monitor-konzept.md`.
diff --git a/docs/superpowers/plans/2026-04-21-phase3-liveview-dashboard.md b/docs/superpowers/plans/2026-04-21-phase3-liveview-dashboard.md
new file mode 100644
index 0000000..d4bf1d3
--- /dev/null
+++ b/docs/superpowers/plans/2026-04-21-phase3-liveview-dashboard.md
@@ -0,0 +1,1889 @@
+# Phase 3 — LiveView Dashboard
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Replace the JSON-only view with a real, password-protected LiveView dashboard: traffic-light status overview, per-host detail with ZFS/VM/storage sections, global VM search, and host admin. Samples pushed by agents propagate to open browser sessions in real time via PubSub.
+
+**Architecture:** Single-user session auth (Argon2 password hash in env var). Status derivation is a pure function over the latest fast sample. `Server.Metrics.record_sample/4` broadcasts `{:metric_inserted, host_id}` on `Server.PubSub`; LiveViews subscribe and re-query on each event. No new persistence — everything reads from `hosts` and the `metrics` JSON payload. No charts in this phase — current values + recent history as tables (charts can ride in a later pass).
+
+**Tech Stack:** Phoenix LiveView 1.0, Phoenix.PubSub, Argon2 (`argon2_elixir`), existing Tailwind/core_components from the Phoenix 1.7 scaffold.
+
+---
+
+## File Structure
+
+```
+server/
+├── mix.exs                                               modify: add argon2_elixir
+├── lib/server/auth.ex                                    create (verify_password/1)
+├── lib/server/status.ex                                  create (compute/2 pure fn)
+├── lib/server/hosts.ex                                   modify: list_all/0, delete_host/1, rotate_token/1
+├── lib/server/metrics.ex                                 modify: broadcast after insert
+├── lib/server_web/plugs/require_auth.ex                  create
+├── lib/server_web/controllers/auth_controller.ex         create
+├── lib/server_web/controllers/auth_html/login.html.heex  create
+├── lib/server_web/controllers/auth_html.ex               create
+├── lib/server_web/live/overview_live.ex                  create
+├── lib/server_web/live/host_detail_live.ex               create
+├── lib/server_web/live/vm_search_live.ex                 create
+├── lib/server_web/live/admin_hosts_live.ex               create
+├── lib/server_web/router.ex                              modify: auth pipeline + routes
+├── test/server/auth_test.exs                             create
+├── test/server/status_test.exs                           create
+├── test/server_web/live/overview_live_test.exs           create
+├── test/server_web/live/host_detail_live_test.exs        create
+├── test/server_web/live/vm_search_live_test.exs          create
+└── test/server_web/live/admin_hosts_live_test.exs        create
+```
+
+**Layering:**
+- **`Server.Auth`** — password verification against the env-configured Argon2 hash.
+- **`Server.Status`** — `compute(host_status, payload) :: :offline | :critical | :warning | :ok`; `:offline` is derived from the host row, not from the payload. Pure function, unit-tested.
+- **`Server.Hosts`** — list/delete/rotate helpers.
+- **`Server.Metrics`** — one-line addition: broadcast on successful insert.
+- **`ServerWeb.Plugs.RequireAuth`** — reads session, redirects to `/login` on miss.
+- **`ServerWeb.AuthController`** — login form + POST + logout.
+- **LiveViews** — one per page, each subscribes to `Server.PubSub` for real-time updates on relevant topics.
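The broadcast/subscribe wiring described in the layering list amounts to three one-liners. A sketch with assumed names (the `"metrics"` topic string and the `refresh/2` helper are placeholders, not decided by this plan):

```elixir
# In Server.Metrics.record_sample/4, right after the successful insert:
Phoenix.PubSub.broadcast(Server.PubSub, "metrics", {:metric_inserted, host_id})

# In each LiveView's mount/3, subscribe only on the connected mount:
if connected?(socket), do: Phoenix.PubSub.subscribe(Server.PubSub, "metrics")

# And re-query when the event arrives:
def handle_info({:metric_inserted, host_id}, socket) do
  {:noreply, refresh(socket, host_id)}
end
```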
+
+**Status rules (concept lines 218-222):**
+- **critical** → ANY pool health ∈ `{DEGRADED, FAULTED, SUSPENDED, UNAVAIL}` OR any pool capacity > 90%
+- **warning** → any pool capacity 80–90%, any dataset oldest-snapshot > 30 days old, any pending_updates > 0, last scrub > 35 days ago
+- **ok** → none of the above
+- **offline** → `host.status == "offline"` (takes precedence over all payload-derived states)
+
+---
+
+## Task 1: Auth Dependency + Config
+
+**Files:**
+- Modify: `server/mix.exs`
+- Modify: `server/config/runtime.exs`
+- Modify: `server/config/test.exs`
+
+- [ ] **Step 1: Add argon2_elixir**
+
+In `server/mix.exs`, extend the deps list:
+
+```elixir
+      {:bcrypt_elixir, "~> 3.1"},
+      {:argon2_elixir, "~> 4.0"}
+```
+
+- [ ] **Step 2: Fetch and compile**
+
+```bash
+cd /Users/cabele/claudeprojects/proxmox_monitor/server
+mix deps.get && mix compile 2>&1 | tail -3
+```
+
+Expected: argon2_elixir fetched, NIF builds successfully.
+
+- [ ] **Step 3: Wire env var into runtime config**
+
+Open `server/config/runtime.exs`. At the very top (after `import Config` and any existing code), add:
+
+```elixir
+hash = System.get_env("DASHBOARD_PASSWORD_HASH")
+
+if config_env() == :prod and is_nil(hash) do
+  raise """
+  DASHBOARD_PASSWORD_HASH not set.
+  Generate one with:
+    mix run -e 'IO.puts(Argon2.hash_pwd_salt("your-password"))'
+  """
+end
+
+if hash do
+  config :server, :dashboard_password_hash, hash
+end
+```
+
+Raising only in prod keeps dev and test bootable without the hash (you need a working `mix run` to generate the first hash in the first place); dev can still set the env var to exercise real auth.
+
+- [ ] **Step 4: Test-env Argon2 tuning**
+
+In `server/config/test.exs` add at the bottom:
+
+```elixir
+config :argon2_elixir, t_cost: 1, m_cost: 8
+```
+
+This keeps Argon2 fast in tests. The `:dashboard_password_hash` app env key is only read by `Server.Auth.verify_password/1`; the auth test sets a real hash in its own `setup`, and no other test touches auth, so there's no need for a config-file default.
+ +- [ ] **Step 5: Commit** + +```bash +cd /Users/cabele/claudeprojects/proxmox_monitor +git add server/mix.exs server/mix.lock server/config/runtime.exs server/config/test.exs +git commit -m "feat(server): argon2_elixir dep + dashboard_password_hash config" +``` + +--- + +## Task 2: Server.Auth Module (TDD) + +**Files:** +- Create: `server/lib/server/auth.ex` +- Create: `server/test/server/auth_test.exs` + +- [ ] **Step 1: Failing test** + +Create `server/test/server/auth_test.exs`: + +```elixir +defmodule Server.AuthTest do + use ExUnit.Case, async: true + + alias Server.Auth + + setup do + hash = Argon2.hash_pwd_salt("testpass") + prev = Application.get_env(:server, :dashboard_password_hash) + Application.put_env(:server, :dashboard_password_hash, hash) + on_exit(fn -> Application.put_env(:server, :dashboard_password_hash, prev) end) + :ok + end + + describe "verify_password/1" do + test "returns :ok for correct password" do + assert Auth.verify_password("testpass") == :ok + end + + test "returns :error for wrong password" do + assert Auth.verify_password("wrong") == :error + end + + test "returns :error for non-binary input" do + assert Auth.verify_password(nil) == :error + assert Auth.verify_password(123) == :error + end + end +end +``` + +- [ ] **Step 2: Run — expect failure** + +```bash +cd /Users/cabele/claudeprojects/proxmox_monitor/server +mix test test/server/auth_test.exs 2>&1 | tail -5 +``` + +Expected: `Server.Auth` undefined. + +- [ ] **Step 3: Implement** + +Create `server/lib/server/auth.ex`: + +```elixir +defmodule Server.Auth do + @moduledoc "Single-user dashboard authentication." 

  @spec verify_password(term()) :: :ok | :error
  def verify_password(password) when is_binary(password) do
    hash = Application.fetch_env!(:server, :dashboard_password_hash)

    if Argon2.verify_pass(password, hash) do
      :ok
    else
      :error
    end
  end

  def verify_password(_), do: :error
end
```

- [ ] **Step 4: Run — expect pass**

```bash
mix test test/server/auth_test.exs 2>&1 | tail -5
```

Expected: 3 tests pass.

- [ ] **Step 5: Commit**

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor
git add server/lib/server/auth.ex server/test/server/auth_test.exs
git commit -m "feat(server): Server.Auth.verify_password/1"
```

---

## Task 3: Server.Status Pure Function (TDD)

**Files:**
- Create: `server/lib/server/status.ex`
- Create: `server/test/server/status_test.exs`

- [ ] **Step 1: Failing test**

Create `server/test/server/status_test.exs`:

```elixir
defmodule Server.StatusTest do
  use ExUnit.Case, async: true

  alias Server.Status

  describe "compute/2" do
    test "returns :offline when host status is offline, regardless of payload" do
      assert Status.compute("offline", %{"zfs_pools" => %{"pools" => [healthy_pool()]}}) ==
               :offline
    end

    test "returns :ok with all-healthy payload" do
      payload = %{
        "zfs_pools" => %{"pools" => [healthy_pool()]},
        "system_info" => %{"pending_updates" => 0}
      }

      assert Status.compute("online", payload) == :ok
    end

    test "returns :critical for degraded pool" do
      payload = %{"zfs_pools" => %{"pools" => [Map.put(healthy_pool(), "health", "DEGRADED")]}}
      assert Status.compute("online", payload) == :critical
    end

    test "returns :critical for pool capacity > 90" do
      payload = %{"zfs_pools" => %{"pools" => [Map.put(healthy_pool(), "capacity_percent", 95)]}}
      assert Status.compute("online", payload) == :critical
    end

    test "returns :warning for pool capacity 80..90" do
      payload = %{"zfs_pools" => %{"pools" => [Map.put(healthy_pool(), "capacity_percent", 85)]}}
      assert Status.compute("online", payload) == :warning
    end

    test "returns :warning for pending OS updates > 0" do
      payload = %{
        "zfs_pools" => %{"pools" => [healthy_pool()]},
        "system_info" => %{"pending_updates" => 3}
      }

      assert Status.compute("online", payload) == :warning
    end

    test "returns :ok when payload is nil (never-seen host) but host is online" do
      assert Status.compute("online", nil) == :ok
    end

    test "treats never_connected like offline" do
      assert Status.compute("never_connected", nil) == :offline
    end
  end

  defp healthy_pool do
    %{
      "name" => "rpool",
      "health" => "ONLINE",
      "capacity_percent" => 40
    }
  end
end
```

- [ ] **Step 2: Run — expect failure**

```bash
mix test test/server/status_test.exs 2>&1 | tail -5
```

Expected: `Server.Status` undefined.

- [ ] **Step 3: Implement**

Create `server/lib/server/status.ex`:

```elixir
defmodule Server.Status do
  @moduledoc """
  Derive a status level for a host from its latest fast sample.

    :offline   host has no active agent connection
    :critical  pool DEGRADED/FAULTED or capacity > 90
    :warning   capacity 80..90 or pending OS updates
    :ok        everything nominal
  """

  @bad_pool_states ~w(DEGRADED FAULTED SUSPENDED UNAVAIL)

  @spec compute(String.t(), map() | nil) :: :offline | :critical | :warning | :ok
  def compute(host_status, _payload) when host_status in ~w(offline never_connected),
    do: :offline

  def compute(_host_status, nil), do: :ok

  def compute(_host_status, %{} = payload) do
    pools = get_in(payload, ["zfs_pools", "pools"]) || []
    pending = get_in(payload, ["system_info", "pending_updates"]) || 0

    cond do
      Enum.any?(pools, &critical_pool?/1) -> :critical
      Enum.any?(pools, &warning_pool?/1) -> :warning
      pending > 0 -> :warning
      true -> :ok
    end
  end

  defp critical_pool?(pool) do
    health = pool["health"]
    cap = pool["capacity_percent"] || 0

    health in @bad_pool_states or cap > 90
  end

  defp warning_pool?(pool) do
    cap = pool["capacity_percent"] || 0
    cap >= 80 and cap <= 90
  end
end
```

- [ ] **Step 4: Run — expect pass**

```bash
mix test test/server/status_test.exs 2>&1 | tail -5
```

Expected: 8 tests pass.
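
Since `compute/2` is a pure function, the precedence rules are easy to probe outside the test suite. The sketch below is a condensed, dependency-free restatement of the same logic (the module name `StatusSketch` is ours, not part of the plan), runnable with plain `elixir`:

```elixir
# Condensed copy of the Server.Status decision logic, for experimenting
# outside the project (e.g. `elixir status_sketch.exs`).
defmodule StatusSketch do
  @bad_pool_states ~w(DEGRADED FAULTED SUSPENDED UNAVAIL)

  def compute(host_status, _payload) when host_status in ~w(offline never_connected), do: :offline
  def compute(_host_status, nil), do: :ok

  def compute(_host_status, payload) do
    pools = get_in(payload, ["zfs_pools", "pools"]) || []
    pending = get_in(payload, ["system_info", "pending_updates"]) || 0

    cond do
      Enum.any?(pools, fn p ->
        p["health"] in @bad_pool_states or (p["capacity_percent"] || 0) > 90
      end) ->
        :critical

      Enum.any?(pools, fn p -> (p["capacity_percent"] || 0) in 80..90 end) ->
        :warning

      pending > 0 ->
        :warning

      true ->
        :ok
    end
  end
end

# A DEGRADED pool at 85% capacity is :critical, not :warning, because cond
# evaluates clauses top to bottom and the critical clause is checked first.
degraded = %{"zfs_pools" => %{"pools" => [%{"health" => "DEGRADED", "capacity_percent" => 85}]}}
IO.inspect(StatusSketch.compute("online", degraded)) #=> :critical

# Pending updates only matter once every pool is healthy.
updates = %{
  "zfs_pools" => %{"pools" => [%{"health" => "ONLINE", "capacity_percent" => 40}]},
  "system_info" => %{"pending_updates" => 3}
}
IO.inspect(StatusSketch.compute("online", updates)) #=> :warning
```

The first call confirms that a pool fault always outranks a capacity warning; no ordering of the input data can change that.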

- [ ] **Step 5: Commit**

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor
git add server/lib/server/status.ex server/test/server/status_test.exs
git commit -m "feat(server): pure Status.compute/2 for ok/warning/critical/offline"
```

---

## Task 4: Hosts Context Extensions + Metrics PubSub

**Files:**
- Modify: `server/lib/server/hosts.ex`
- Modify: `server/lib/server/metrics.ex`
- Modify: `server/test/server/hosts_test.exs`
- Modify: `server/test/server/metrics_test.exs`

- [ ] **Step 1: Extend `Server.Hosts` tests**

Open `server/test/server/hosts_test.exs` and append these test blocks before the final `end`:

```elixir
  describe "list_all/0" do
    test "returns every host ordered by name" do
      {:ok, {_, _}} = Hosts.create_host("pve-02")
      {:ok, {_, _}} = Hosts.create_host("pve-01")
      names = Hosts.list_all() |> Enum.map(& &1.name)
      assert names == ["pve-01", "pve-02"]
    end
  end

  describe "delete_host/1" do
    test "deletes the host row" do
      {:ok, {host, _}} = Hosts.create_host("pve-01")
      assert {:ok, _} = Hosts.delete_host(host)
      assert Server.Repo.get(Server.Schema.Host, host.id) == nil
    end
  end

  describe "rotate_token/1" do
    test "replaces token_hash and returns new plaintext token" do
      {:ok, {host, old_token}} = Hosts.create_host("pve-01")
      assert {:ok, {updated, new_token}} = Hosts.rotate_token(host)
      assert updated.id == host.id
      refute updated.token_hash == host.token_hash
      assert is_binary(new_token)
      refute new_token == old_token
      assert {:error, :invalid_token} = Hosts.authenticate("pve-01", old_token)
      assert {:ok, _} = Hosts.authenticate("pve-01", new_token)
    end
  end
```

- [ ] **Step 2: Run — expect failure**

```bash
mix test test/server/hosts_test.exs 2>&1 | tail -5
```

Expected: undefined functions `list_all/0`, `delete_host/1`, `rotate_token/1`.

- [ ] **Step 3: Extend `Server.Hosts`**

Append to `server/lib/server/hosts.ex` before the closing `end`:

```elixir
  @spec list_all() :: [Host.t()]
  def list_all do
    import Ecto.Query
    Repo.all(from h in Host, order_by: [asc: h.name])
  end

  @spec delete_host(Host.t()) :: {:ok, Host.t()} | {:error, Ecto.Changeset.t()}
  def delete_host(%Host{} = host), do: Repo.delete(host)

  @spec rotate_token(Host.t()) :: {:ok, {Host.t(), String.t()}} | {:error, Ecto.Changeset.t()}
  def rotate_token(%Host{} = host) do
    token = :crypto.strong_rand_bytes(32) |> Base.url_encode64(padding: false)
    hash = Bcrypt.hash_pwd_salt(token)

    host
    |> Ecto.Changeset.change(token_hash: hash)
    |> Repo.update()
    |> case do
      {:ok, updated} -> {:ok, {updated, token}}
      {:error, cs} -> {:error, cs}
    end
  end
```

- [ ] **Step 4: Run hosts tests — expect pass**

```bash
mix test test/server/hosts_test.exs 2>&1 | tail -5
```

Expected: 10 tests pass.

- [ ] **Step 5: Add PubSub broadcast to Metrics.record_sample**

Open `server/lib/server/metrics.ex`. Replace the existing `record_sample/4` function (keep everything else) with:

```elixir
  @spec record_sample(integer(), String.t(), DateTime.t(), map()) ::
          {:ok, Metric.t()} | {:error, Ecto.Changeset.t()}
  def record_sample(host_id, interval_type, collected_at, payload) do
    changeset =
      Metric.changeset(%Metric{}, %{
        host_id: host_id,
        interval_type: interval_type,
        collected_at: collected_at,
        payload: payload
      })

    with %Ecto.Changeset{valid?: true} = cs <- changeset,
         true <- host_exists?(host_id) || {:host_missing, cs},
         {:ok, metric} <- Repo.insert(cs) do
      Phoenix.PubSub.broadcast(Server.PubSub, "metrics", {:metric_inserted, host_id, interval_type})
      Phoenix.PubSub.broadcast(Server.PubSub, "metrics:#{host_id}", {:metric_inserted, host_id, interval_type})
      {:ok, metric}
    else
      %Ecto.Changeset{} = cs -> {:error, cs}
      {:host_missing, cs} -> {:error, Ecto.Changeset.add_error(cs, :host, "does not exist")}
      {:error, %Ecto.Changeset{} = cs} -> {:error, cs}
    end
  end
```

- [ ] **Step 6: Add a PubSub assertion to metrics tests**

In `server/test/server/metrics_test.exs` append within the `describe "record_sample/4" do` block (before its closing `end`):

```elixir
    test "broadcasts {:metric_inserted, host_id, interval} on success", %{host: host} do
      Phoenix.PubSub.subscribe(Server.PubSub, "metrics")
      ts = DateTime.utc_now()
      {:ok, _} = Metrics.record_sample(host.id, "fast", ts, %{"v" => 1})
      assert_receive {:metric_inserted, host_id, "fast"}, 500
      assert host_id == host.id
    end
```

- [ ] **Step 7: Run tests — expect pass**

```bash
mix test 2>&1 | tail -4
```

Expected: all green (previous tests + 1 new hosts + 1 new metrics).
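
A side note on the token shape: `rotate_token/1`, like `create_host/1`, encodes 32 random bytes as unpadded url-safe Base64, which always yields 43 characters over the `[A-Za-z0-9_-]` alphabet. That is the shape the admin UI test in Task 10 matches with `~r/[A-Za-z0-9_\-]{40,}/`. A standalone check:

```elixir
# 32 bytes encode to 44 Base64 characters including one "=" of padding;
# with padding: false the "=" is dropped, leaving exactly 43.
token = :crypto.strong_rand_bytes(32) |> Base.url_encode64(padding: false)

43 = String.length(token)
true = Regex.match?(~r/\A[A-Za-z0-9_-]{43}\z/, token)
IO.puts("token looks like: #{token}")
```

Because the encoding never produces `+`, `/` or `=`, the token is safe to paste into env files and URLs without quoting.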

- [ ] **Step 8: Commit**

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor
git add server/lib/server/hosts.ex server/lib/server/metrics.ex server/test/server/hosts_test.exs server/test/server/metrics_test.exs
git commit -m "feat(server): hosts list/delete/rotate helpers + pubsub on metric insert"
```

---

## Task 5: Auth Plug + Session Controller

**Files:**
- Create: `server/lib/server_web/plugs/require_auth.ex`
- Create: `server/lib/server_web/controllers/auth_controller.ex`
- Create: `server/lib/server_web/controllers/auth_html.ex`
- Create: `server/lib/server_web/controllers/auth_html/login.html.heex`

- [ ] **Step 1: Require-auth plug**

Create `server/lib/server_web/plugs/require_auth.ex`:

```elixir
defmodule ServerWeb.Plugs.RequireAuth do
  @moduledoc "Redirects to /login unless the session is authenticated."

  import Plug.Conn
  import Phoenix.Controller

  def init(opts), do: opts

  def call(conn, _opts) do
    if get_session(conn, :authenticated) do
      conn
    else
      conn
      |> put_flash(:error, "Please sign in.")
      |> redirect(to: "/login")
      |> halt()
    end
  end
end
```

- [ ] **Step 2: AuthController**

Create `server/lib/server_web/controllers/auth_controller.ex`:

```elixir
defmodule ServerWeb.AuthController do
  use ServerWeb, :controller

  def login(conn, _params) do
    render(conn, :login, error: nil, layout: false)
  end

  def create(conn, %{"password" => password}) do
    case Server.Auth.verify_password(password) do
      :ok ->
        conn
        |> configure_session(renew: true)
        |> put_session(:authenticated, true)
        |> redirect(to: "/")

      :error ->
        conn
        |> put_status(:unauthorized)
        |> render(:login, error: "Incorrect password.", layout: false)
    end
  end

  def delete(conn, _params) do
    conn
    |> configure_session(drop: true)
    |> redirect(to: "/login")
  end
end
```

- [ ] **Step 3: Auth HTML module (empty — uses embed_templates)**

Create `server/lib/server_web/controllers/auth_html.ex`:

```elixir
defmodule ServerWeb.AuthHTML do
  use ServerWeb, :html

  embed_templates "auth_html/*"
end
```

- [ ] **Step 4: Login template**

Create `server/lib/server_web/controllers/auth_html/login.html.heex` (markup kept minimal; adjust classes to taste):

```heex
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>Sign in · Proxmox Monitor</title>
    <link rel="stylesheet" href={~p"/assets/app.css"} />
  </head>
  <body class="bg-zinc-100">
    <div class="flex min-h-screen items-center justify-center">
      <div class="w-full max-w-sm space-y-4 rounded-lg bg-white p-8 shadow">
        <h1 class="text-xl font-semibold text-zinc-900">Proxmox Monitor</h1>

        <%= if @error do %>
          <div class="rounded bg-red-50 p-3 text-sm text-red-700">
            {@error}
          </div>
        <% end %>

        <form action="/login" method="post" class="space-y-4">
          <input type="hidden" name="_csrf_token" value={get_csrf_token()} />
          <input
            type="password"
            name="password"
            placeholder="Password"
            required
            autofocus
            class="w-full rounded border border-zinc-300 px-3 py-2"
          />
          <button type="submit" class="w-full rounded bg-zinc-900 px-3 py-2 text-white">
            Sign in
          </button>
        </form>
      </div>
    </div>
  </body>
</html>
```

- [ ] **Step 5: Compile to verify no syntax errors**

```bash
mix compile 2>&1 | tail -5
```

Expected: compiles, no warnings.

- [ ] **Step 6: Commit**

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor
git add server/lib/server_web/plugs server/lib/server_web/controllers/auth_controller.ex server/lib/server_web/controllers/auth_html.ex server/lib/server_web/controllers/auth_html
git commit -m "feat(server): session-based auth plug + login controller/template"
```

---

## Task 6: Router — Auth Pipeline + Login/Logout Routes

**Files:**
- Modify: `server/lib/server_web/router.ex`

- [ ] **Step 1: Introduce auth pipeline and wire routes**

Replace the contents of `server/lib/server_web/router.ex` with:

```elixir
defmodule ServerWeb.Router do
  use ServerWeb, :router

  pipeline :browser do
    plug :accepts, ["html"]
    plug :fetch_session
    plug :fetch_live_flash
    plug :put_root_layout, html: {ServerWeb.Layouts, :root}
    plug :protect_from_forgery
    plug :put_secure_browser_headers
  end

  pipeline :require_auth do
    plug ServerWeb.Plugs.RequireAuth
  end

  pipeline :api do
    plug :accepts, ["json"]
  end

  # Public login/logout
  scope "/", ServerWeb do
    pipe_through :browser

    get "/login", AuthController, :login
    post "/login", AuthController, :create
    delete "/logout", AuthController, :delete
  end

  # Authenticated dashboard (LiveView)
  scope "/", ServerWeb do
    pipe_through [:browser, :require_auth]

    live_session :authenticated, on_mount: {ServerWeb.LiveAuth, :require_authenticated} do
      live "/", OverviewLive, :index
      live "/hosts/:name", HostDetailLive, :show
      live "/vms", VmSearchLive, :index
      live "/admin/hosts", AdminHostsLive, :index
    end
  end

  scope "/api", ServerWeb do
    pipe_through :api

    get "/hosts/:name", HostController, :show
  end

  if Application.compile_env(:server, :dev_routes) do
    import Phoenix.LiveDashboard.Router

    scope "/dev" do
      pipe_through :browser

      live_dashboard "/dashboard", metrics: ServerWeb.Telemetry
    end
  end
end
```

- [ ] **Step 2: Create `ServerWeb.LiveAuth` — the `on_mount` hook LiveViews use to enforce auth**

Create `server/lib/server_web/live_auth.ex`:

```elixir
defmodule ServerWeb.LiveAuth do
  @moduledoc "on_mount hook for LiveView sessions requiring authentication."

  import Phoenix.LiveView
  import Phoenix.Component, only: [assign: 3]

  def on_mount(:require_authenticated, _params, session, socket) do
    if session["authenticated"] do
      {:cont, assign(socket, :authenticated, true)}
    else
      {:halt, redirect(socket, to: "/login")}
    end
  end
end
```

- [ ] **Step 3: Replace the default root page**

Phoenix 1.7 scaffold provides `PageController.home/2`. Our router above replaces `/` with `OverviewLive`, which doesn't exist yet. Compile should still succeed — Phoenix only checks live route modules at request time. Verify:

```bash
mix compile 2>&1 | tail -5
```

Expected: clean compile (warnings about unused `ServerWeb.PageController` are OK — we'll leave the file alone for now).

- [ ] **Step 4: Commit**

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor
git add server/lib/server_web/router.ex server/lib/server_web/live_auth.ex
git commit -m "feat(server): router pipelines + live_auth hook for authenticated dashboard"
```

---

## Task 7: Overview LiveView

**Files:**
- Create: `server/lib/server_web/live/overview_live.ex`
- Create: `server/test/server_web/live/overview_live_test.exs`

- [ ] **Step 1: Tests**

Create `server/test/server_web/live/overview_live_test.exs`:

```elixir
defmodule ServerWeb.OverviewLiveTest do
  use ServerWeb.ConnCase, async: false

  import Phoenix.LiveViewTest
  alias Server.{Hosts, Metrics}

  defp auth(conn), do: Plug.Test.init_test_session(conn, %{authenticated: true})

  describe "mount" do
    test "redirects to /login when unauthenticated", %{conn: conn} do
      assert {:error, {:redirect, %{to: "/login"}}} = live(conn, "/")
    end

    test "renders a card for each host", %{conn: conn} do
      {:ok, {h1, _}} = Hosts.create_host("pve-01")
      {:ok, {_h2, _}} = Hosts.create_host("pve-02")

      {:ok, _view, html} = live(auth(conn), "/")

      assert html =~ "pve-01"
      assert html =~ "pve-02"
      # at least two cards visible
      assert length(Floki.find(Floki.parse_document!(html), "[data-role=host-card]")) == 2
      _ = h1
    end

    test "reflects :critical status for a degraded pool", %{conn: conn} do
      {:ok, {host, _}} = Hosts.create_host("pve-01")
      {:ok, _} = Hosts.mark_online(host, "0.1.0")

      payload = %{
        "zfs_pools" => %{
          "pools" => [%{"name" => "rpool", "health" => "DEGRADED", "capacity_percent" => 40}]
        }
      }

      {:ok, _} = Metrics.record_sample(host.id, "fast", DateTime.utc_now(), payload)

      {:ok, _view, html} = live(auth(conn), "/")

      assert html =~ ~r/data-status="critical"/
    end
  end

  describe "pubsub" do
    test "updates the card when a new metric arrives", %{conn: conn} do
      {:ok, {host, _}} = Hosts.create_host("pve-01")
      {:ok, _} = Hosts.mark_online(host, "0.1.0")

      {:ok, view, _html} = live(auth(conn), "/")
      assert render(view) =~ ~r/data-status="ok"/

      payload = %{
        "zfs_pools" => %{
          "pools" => [%{"name" => "rpool", "health" => "DEGRADED", "capacity_percent" => 40}]
        }
      }

      {:ok, _} = Metrics.record_sample(host.id, "fast", DateTime.utc_now(), payload)

      # Allow the PubSub message to round-trip
      Process.sleep(50)
      assert render(view) =~ ~r/data-status="critical"/
    end
  end
end
```

- [ ] **Step 2: Run — expect failure (`OverviewLive` undefined)**

```bash
mix test test/server_web/live/overview_live_test.exs 2>&1 | tail -5
```

- [ ] **Step 3: Implement OverviewLive**

Create `server/lib/server_web/live/overview_live.ex`:

```elixir
defmodule ServerWeb.OverviewLive do
  use ServerWeb, :live_view

  alias Server.{Hosts, Metrics, Status}

  @impl true
  def mount(_params, _session, socket) do
    if connected?(socket), do: Phoenix.PubSub.subscribe(Server.PubSub, "metrics")
    {:ok, assign(socket, :hosts, load_hosts())}
  end

  @impl true
  def handle_info({:metric_inserted, _host_id, _interval}, socket) do
    {:noreply, assign(socket, :hosts, load_hosts())}
  end

  defp load_hosts do
    for host <- Hosts.list_all() do
      sample = Metrics.latest_sample(host.id, "fast")
      payload = sample && sample.payload
      %{host: host, sample: sample, status: Status.compute(host.status, payload)}
    end
  end

  @impl true
  def render(assigns) do
    ~H"""
    <div class="mx-auto max-w-5xl p-6">
      <header class="mb-6 flex items-center gap-4">
        <h1 class="text-xl font-semibold text-zinc-900">Proxmox Monitor</h1>
        <.link navigate={~p"/vms"} class="text-sm text-zinc-600 hover:text-zinc-900">VMs</.link>
        <.link navigate={~p"/admin/hosts"} class="text-sm text-zinc-600 hover:text-zinc-900">Admin</.link>
        <div class="grow"></div>
        <.link href={~p"/logout"} method="delete" class="text-sm text-zinc-500">Sign out</.link>
      </header>

      <div class="grid grid-cols-1 gap-4 sm:grid-cols-2 lg:grid-cols-3">
        <div
          :for={entry <- @hosts}
          data-role="host-card"
          data-status={entry.status}
          class={"rounded-lg border-l-4 bg-white p-4 shadow " <> border_class(entry.status)}
        >
          <.link navigate={~p"/hosts/#{entry.host.name}"} class="block space-y-2">
            <div class="flex items-center justify-between">
              <span class="font-medium">{entry.host.name}</span>
              <span class={"text-sm " <> text_class(entry.status)}>
                {entry.status}
              </span>
            </div>
            <div class="text-xs text-zinc-500">
              Last seen: {last_seen(entry.host.last_seen_at)}
            </div>
            <div :if={entry.sample} class="space-y-1 text-sm text-zinc-700">
              <div>Load: {format_load(entry.sample.payload)}</div>
              <div>RAM used: {format_mem(entry.sample.payload)}</div>
              <div>Pools: {pool_summary(entry.sample.payload)}</div>
              <div>VMs: {vm_count(entry.sample.payload)}</div>
            </div>
            <div :if={is_nil(entry.sample)} class="text-sm text-zinc-400">
              No samples yet
            </div>
          </.link>
        </div>
      </div>

      <div :if={@hosts == []} class="mt-8 text-center text-sm text-zinc-500">
        No hosts registered yet. Add one via /admin/hosts.
      </div>
    </div>
    """
  end

  defp border_class(:ok), do: "border-green-500"
  defp border_class(:warning), do: "border-yellow-500"
  defp border_class(:critical), do: "border-red-500"
  defp border_class(:offline), do: "border-zinc-400"

  defp text_class(:ok), do: "text-green-600"
  defp text_class(:warning), do: "text-yellow-600"
  defp text_class(:critical), do: "text-red-600"
  defp text_class(:offline), do: "text-zinc-500"

  defp last_seen(nil), do: "never"

  defp last_seen(%DateTime{} = dt) do
    secs = DateTime.diff(DateTime.utc_now(), dt, :second)

    cond do
      secs < 60 -> "#{secs}s ago"
      secs < 3600 -> "#{div(secs, 60)}m ago"
      true -> "#{div(secs, 3600)}h ago"
    end
  end

  defp format_load(payload) do
    case get_in(payload, ["host", "load1"]) do
      nil -> "—"
      l -> :io_lib.format("~.2f", [l]) |> to_string()
    end
  end

  defp format_mem(payload) do
    used = get_in(payload, ["host", "mem_used_bytes"])
    total = get_in(payload, ["host", "mem_total_bytes"])

    case {used, total} do
      {u, t} when is_integer(u) and is_integer(t) and t > 0 ->
        "#{Float.round(u / t * 100, 1)}%"

      _ ->
        "—"
    end
  end

  defp pool_summary(payload) do
    pools = get_in(payload, ["zfs_pools", "pools"]) || []
    total = length(pools)
    bad = Enum.count(pools, &(&1["health"] != "ONLINE"))
    if total == 0, do: "—", else: "#{total - bad}/#{total} ok"
  end

  defp vm_count(payload) do
    vms = get_in(payload, ["vms_runtime", "vms"]) || []
    length(vms)
  end
end
```

- [ ] **Step 4: Run — expect pass**

```bash
mix test test/server_web/live/overview_live_test.exs 2>&1 | tail -5
```

Expected: 4 tests pass.
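
A note on `format_load/1`: `:io_lib.format/2` returns Erlang iodata (a possibly nested char list) rather than an Elixir binary, which is why it is piped through `to_string/1` before the template embeds it. A standalone check:

```elixir
# :io_lib.format/2 yields iodata, not a binary.
formatted = :io_lib.format("~.2f", [0.25])
false = is_binary(formatted)

# to_string/1 flattens the iodata into the string the template renders.
"0.25" = to_string(formatted)
IO.puts(to_string(formatted))
```

HEEx would actually accept raw iodata in many positions, but normalizing to a binary keeps the helper's return type predictable for tests.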

- [ ] **Step 5: Commit**

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor
git add server/lib/server_web/live/overview_live.ex server/test/server_web/live/overview_live_test.exs
git commit -m "feat(server): overview LiveView with status ampel + pubsub updates"
```

---

## Task 8: Host Detail LiveView

**Files:**
- Create: `server/lib/server_web/live/host_detail_live.ex`
- Create: `server/test/server_web/live/host_detail_live_test.exs`

- [ ] **Step 1: Tests**

Create `server/test/server_web/live/host_detail_live_test.exs`:

```elixir
defmodule ServerWeb.HostDetailLiveTest do
  use ServerWeb.ConnCase, async: false

  import Phoenix.LiveViewTest
  alias Server.{Hosts, Metrics}

  defp auth(conn), do: Plug.Test.init_test_session(conn, %{authenticated: true})

  setup do
    {:ok, {host, _}} = Hosts.create_host("pve-01")
    {:ok, _} = Hosts.mark_online(host, "0.1.0")

    fast = %{
      "host" => %{"load1" => 0.25, "load5" => 0.3, "load15" => 0.4},
      "zfs_pools" => %{
        "pools" => [
          %{
            "name" => "rpool",
            "health" => "ONLINE",
            "capacity_percent" => 40,
            "error_count" => 0,
            "last_scrub_end" => "Sat Apr 19 02:00:00 2026"
          }
        ]
      },
      "storage" => %{
        "storages" => [
          %{"name" => "local", "type" => "dir", "used_bytes" => 10, "total_bytes" => 100}
        ]
      },
      "vms_runtime" => %{
        "vms" => [%{"vmid" => 100, "name" => "nginx", "type" => "qemu", "status" => "running"}]
      }
    }

    medium = %{
      "zfs_datasets" => %{
        "datasets" => [
          %{
            "name" => "rpool/data",
            "snapshot_count" => 2,
            "newest_snapshot_unix" => 1_745_193_600,
            "oldest_snapshot_unix" => 1_745_107_200
          }
        ]
      },
      "vms_detail" => %{"vms" => []}
    }

    slow = %{
      "system_info" => %{
        "pve_version" => "pve-manager/8.3.1",
        "zfs_version" => "zfs-2.3.0",
        "pending_updates" => 0
      }
    }

    {:ok, _} = Metrics.record_sample(host.id, "fast", DateTime.utc_now(), fast)
    {:ok, _} = Metrics.record_sample(host.id, "medium", DateTime.utc_now(), medium)
    {:ok, _} = Metrics.record_sample(host.id, "slow", DateTime.utc_now(), slow)

    %{host: host}
  end

  test "renders sections for metrics, pools, snapshots, storage, VMs", %{
    conn: conn,
    host: host
  } do
    {:ok, _view, html} = live(auth(conn), ~p"/hosts/#{host.name}")

    assert html =~ "pve-01"
    assert html =~ "pve-manager/8.3.1"
    assert html =~ "rpool"
    assert html =~ "ONLINE"
    assert html =~ "nginx"
    assert html =~ "rpool/data"
    assert html =~ "local"
  end

  test "404 for unknown host", %{conn: conn} do
    assert {:error, {:live_redirect, %{to: "/"}}} =
             live(auth(conn), ~p"/hosts/unknown")
  end
end
```

- [ ] **Step 2: Run — expect failure**

```bash
mix test test/server_web/live/host_detail_live_test.exs 2>&1 | tail -5
```

- [ ] **Step 3: Implement**

Create `server/lib/server_web/live/host_detail_live.ex`:

```elixir
defmodule ServerWeb.HostDetailLive do
  use ServerWeb, :live_view

  alias Server.{Metrics, Repo, Schema.Host}

  @impl true
  def mount(%{"name" => name}, _session, socket) do
    case Repo.get_by(Host, name: name) do
      nil ->
        {:ok,
         socket
         |> put_flash(:error, "Host not found")
         |> push_navigate(to: ~p"/")}

      %Host{} = host ->
        if connected?(socket),
          do: Phoenix.PubSub.subscribe(Server.PubSub, "metrics:#{host.id}")

        {:ok, socket |> assign(:host, host) |> load_samples()}
    end
  end

  @impl true
  def handle_info({:metric_inserted, _host_id, _interval}, socket) do
    {:noreply, load_samples(socket)}
  end

  defp load_samples(socket) do
    host_id = socket.assigns.host.id

    assign(socket,
      fast: Metrics.latest_sample(host_id, "fast"),
      medium: Metrics.latest_sample(host_id, "medium"),
      slow: Metrics.latest_sample(host_id, "slow")
    )
  end

  @impl true
  def render(assigns) do
    ~H"""
    <div class="mx-auto max-w-5xl space-y-6 p-6">
      <header class="flex items-start justify-between">
        <div>
          <.link navigate={~p"/"} class="text-sm text-zinc-500 hover:text-zinc-900">← Back</.link>
          <h1 class="text-xl font-semibold text-zinc-900">{@host.name}</h1>
          <p class="text-sm text-zinc-500">
            {sys_line(@slow)} · Uptime {uptime(@fast)} · Last seen {last_seen(@host.last_seen_at)}
          </p>
        </div>
        <span class={"text-sm " <> status_class(@host.status)}>
          {@host.status}
        </span>
      </header>

      <section class="rounded-lg bg-white p-4 shadow">
        <h2 class="mb-2 font-medium">Host metrics</h2>
        <.metric_row label="Load (1/5/15)" value={host_load(@fast)} />
        <.metric_row label="Memory" value={host_mem(@fast)} />
      </section>

      <section class="rounded-lg bg-white p-4 shadow">
        <h2 class="mb-2 font-medium">ZFS pools</h2>
        <p :if={pools(@fast) == []} class="text-sm text-zinc-500">No data.</p>
        <div :for={pool <- pools(@fast)} class="space-y-1">
          <div class="flex items-center gap-2 text-sm">
            <span class="font-medium">{pool["name"]}</span>
            <span class={pool_class(pool["health"])}>{pool["health"]}</span>
          </div>
          <p class="text-xs text-zinc-500">
            Capacity {pool["capacity_percent"]}% · Fragmentation {pool["fragmentation_percent"] || 0}% · Errors {pool["error_count"] || 0} · vdevs {pool["vdev_count"] || 0} (degraded {pool["degraded_vdev_count"] || 0}) · Last scrub {pool["last_scrub_end"] || "never"}
          </p>
        </div>
      </section>

      <section class="rounded-lg bg-white p-4 shadow">
        <h2 class="mb-2 font-medium">Snapshots</h2>
        <table :if={datasets(@medium) != []} class="w-full text-left text-sm">
          <thead>
            <tr class="text-xs text-zinc-500">
              <th>Dataset</th>
              <th>Count</th>
              <th>Oldest</th>
              <th>Newest</th>
            </tr>
          </thead>
          <tbody>
            <tr :for={ds <- datasets(@medium)} class="border-t border-zinc-100">
              <td>{ds["name"]}</td>
              <td>{ds["snapshot_count"]}</td>
              <td>{unix_to_date(ds["oldest_snapshot_unix"])}</td>
              <td>{unix_to_date(ds["newest_snapshot_unix"])}</td>
            </tr>
          </tbody>
        </table>
        <p :if={datasets(@medium) == []} class="text-sm text-zinc-500">No data.</p>
      </section>

      <section class="rounded-lg bg-white p-4 shadow">
        <h2 class="mb-2 font-medium">Storage</h2>
        <table :if={storages(@fast) != []} class="w-full text-left text-sm">
          <thead>
            <tr class="text-xs text-zinc-500">
              <th>Name</th>
              <th>Type</th>
              <th>Usage</th>
            </tr>
          </thead>
          <tbody>
            <tr :for={s <- storages(@fast)} class="border-t border-zinc-100">
              <td>{s["name"]}</td>
              <td>{s["type"]}</td>
              <td>{storage_usage(s)}</td>
            </tr>
          </tbody>
        </table>
        <p :if={storages(@fast) == []} class="text-sm text-zinc-500">No data.</p>
      </section>

      <section class="rounded-lg bg-white p-4 shadow">
        <h2 class="mb-2 font-medium">VMs / LXCs</h2>
        <table :if={vms(@fast) != []} class="w-full text-left text-sm">
          <thead>
            <tr class="text-xs text-zinc-500">
              <th>VMID</th>
              <th>Name</th>
              <th>Type</th>
              <th>Status</th>
            </tr>
          </thead>
          <tbody>
            <tr :for={vm <- vms(@fast)} class="border-t border-zinc-100">
              <td>{vm["vmid"]}</td>
              <td>{vm["name"]}</td>
              <td>{vm["type"]}</td>
              <td>{vm["status"]}</td>
            </tr>
          </tbody>
        </table>
        <p :if={vms(@fast) == []} class="text-sm text-zinc-500">No data.</p>
      </section>
    </div>
    """
  end

  attr :label, :string, required: true
  attr :value, :string, required: true

  def metric_row(assigns) do
    ~H"""
    <div class="flex justify-between border-b border-zinc-100 py-1 text-sm">
      <span class="text-zinc-500">{@label}</span>
      <span class="font-mono">{@value}</span>
    </div>
    """
  end

  defp status_class("online"), do: "text-green-600"
  defp status_class("offline"), do: "text-zinc-500"
  defp status_class(_), do: "text-zinc-500"

  defp pool_class("ONLINE"), do: "text-green-600 font-mono"
  defp pool_class(_), do: "text-red-600 font-mono"

  defp sys_line(nil), do: "—"

  defp sys_line(%{payload: p}) do
    get_in(p, ["system_info", "pve_version"]) || "—"
  end

  defp uptime(nil), do: "—"

  defp uptime(%{payload: p}) do
    case get_in(p, ["host", "uptime_seconds"]) do
      nil -> "—"
      s when is_integer(s) -> "#{div(s, 86_400)}d"
      _ -> "—"
    end
  end

  defp last_seen(nil), do: "never"

  defp last_seen(%DateTime{} = dt) do
    secs = DateTime.diff(DateTime.utc_now(), dt, :second)

    cond do
      secs < 60 -> "#{secs}s ago"
      secs < 3600 -> "#{div(secs, 60)}m ago"
      true -> "#{div(secs, 3600)}h ago"
    end
  end

  defp host_load(nil), do: "—"

  defp host_load(%{payload: p}) do
    "#{get_in(p, ["host", "load1"]) || "—"} / #{get_in(p, ["host", "load5"]) || "—"} / #{get_in(p, ["host", "load15"]) || "—"}"
  end

  defp host_mem(nil), do: "—"

  defp host_mem(%{payload: p}) do
    used = get_in(p, ["host", "mem_used_bytes"])
    total = get_in(p, ["host", "mem_total_bytes"])

    case {used, total} do
      {u, t} when is_integer(u) and is_integer(t) and t > 0 ->
        "#{Float.round(u / t * 100, 1)}% (#{format_bytes(u)} / #{format_bytes(t)})"

      _ ->
        "—"
    end
  end

  defp pools(nil), do: []
  defp pools(%{payload: p}), do: get_in(p, ["zfs_pools", "pools"]) || []

  defp datasets(nil), do: []
  defp datasets(%{payload: p}), do: get_in(p, ["zfs_datasets", "datasets"]) || []

  defp storages(nil), do: []
  defp storages(%{payload: p}), do: get_in(p, ["storage", "storages"]) || []

  defp vms(nil), do: []
  defp vms(%{payload: p}), do: get_in(p, ["vms_runtime", "vms"]) || []

  defp storage_usage(%{"used_bytes" => u, "total_bytes" => t})
       when is_integer(u) and is_integer(t) and t > 0 do
    "#{Float.round(u / t * 100, 1)}% (#{format_bytes(u)} / #{format_bytes(t)})"
  end

  defp storage_usage(_), do: "—"

  defp unix_to_date(nil), do: "—"

  defp unix_to_date(unix) when is_integer(unix) do
    case DateTime.from_unix(unix) do
      {:ok, dt} -> Calendar.strftime(dt, "%Y-%m-%d")
      _ -> "—"
    end
  end

  defp format_bytes(n) when is_integer(n) do
    units = ["B", "KB", "MB", "GB", "TB"]

    {val, unit} =
      Enum.reduce_while(units, {n * 1.0, "B"}, fn u, {v, _} ->
        if v < 1024, do: {:halt, {v, u}}, else: {:cont, {v / 1024, u}}
      end)

    "#{Float.round(val, 1)} #{unit}"
  end
end
```

- [ ] **Step 4: Run — expect pass**

```bash
mix test test/server_web/live/host_detail_live_test.exs 2>&1 | tail -5
```

Expected: 2 tests pass.

- [ ] **Step 5: Commit**

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor
git add server/lib/server_web/live/host_detail_live.ex server/test/server_web/live/host_detail_live_test.exs
git commit -m "feat(server): host detail LiveView with metrics/pools/snapshots/storage/vms"
```

---

## Task 9: VM Search LiveView

**Files:**
- Create: `server/lib/server_web/live/vm_search_live.ex`
- Create: `server/test/server_web/live/vm_search_live_test.exs`

- [ ] **Step 1: Tests**

Create `server/test/server_web/live/vm_search_live_test.exs`:

```elixir
defmodule ServerWeb.VmSearchLiveTest do
  use ServerWeb.ConnCase, async: false

  import Phoenix.LiveViewTest
  alias Server.{Hosts, Metrics}

  defp auth(conn), do: Plug.Test.init_test_session(conn, %{authenticated: true})

  setup do
    {:ok, {h1, _}} = Hosts.create_host("pve-01")
    {:ok, {h2, _}} = Hosts.create_host("pve-02")

    fast1 = %{
      "vms_runtime" => %{
        "vms" => [
          %{"vmid" => 100, "name" => "nginx-proxy", "type" => "qemu", "status" => "running"}
        ]
      }
    }

    fast2 = %{
      "vms_runtime" => %{
        "vms" => [
          %{"vmid" => 200, "name" => "db-primary", "type" => "qemu", "status" => "running"}
        ]
      }
    }

    medium1 = %{
      "vms_detail" => %{
        "vms" => [%{"vmid" => 100,
"name" => "nginx-proxy", "ips" => ["192.168.1.10"]}] + } + } + + {:ok, _} = Metrics.record_sample(h1.id, "fast", DateTime.utc_now(), fast1) + {:ok, _} = Metrics.record_sample(h2.id, "fast", DateTime.utc_now(), fast2) + {:ok, _} = Metrics.record_sample(h1.id, "medium", DateTime.utc_now(), medium1) + + :ok + end + + test "lists all VMs from all hosts by default", %{conn: conn} do + {:ok, _view, html} = live(auth(conn), "/vms") + assert html =~ "nginx-proxy" + assert html =~ "db-primary" + end + + test "filters by name substring", %{conn: conn} do + {:ok, view, _html} = live(auth(conn), "/vms") + + html = + view + |> form("form", q: "nginx") + |> render_change() + + assert html =~ "nginx-proxy" + refute html =~ "db-primary" + end + + test "filters by IP substring (matches detail payload)", %{conn: conn} do + {:ok, view, _html} = live(auth(conn), "/vms") + + html = + view + |> form("form", q: "192.168.1") + |> render_change() + + assert html =~ "nginx-proxy" + refute html =~ "db-primary" + end +end +``` + +- [ ] **Step 2: Run — expect failure** + +```bash +mix test test/server_web/live/vm_search_live_test.exs 2>&1 | tail -5 +``` + +- [ ] **Step 3: Implement** + +Create `server/lib/server_web/live/vm_search_live.ex`: + +```elixir +defmodule ServerWeb.VmSearchLive do + use ServerWeb, :live_view + + alias Server.{Hosts, Metrics} + + @impl true + def mount(_params, _session, socket) do + if connected?(socket), do: Phoenix.PubSub.subscribe(Server.PubSub, "metrics") + {:ok, socket |> assign(:q, "") |> assign(:vms, load_vms())} + end + + @impl true + def handle_info({:metric_inserted, _, _}, socket) do + {:noreply, assign(socket, :vms, load_vms())} + end + + @impl true + def handle_event("search", %{"q" => q}, socket) do + {:noreply, assign(socket, :q, q)} + end + + defp load_vms do + for host <- Hosts.list_all(), + runtime_sample = Metrics.latest_sample(host.id, "fast"), + detail_sample = Metrics.latest_sample(host.id, "medium"), + vm <- get_in(runtime_sample && 
runtime_sample.payload, ["vms_runtime", "vms"]) || [], + into: [] do + detail_vms = get_in(detail_sample && detail_sample.payload, ["vms_detail", "vms"]) || [] + detail_vm = Enum.find(detail_vms, &(&1["vmid"] == vm["vmid"])) || %{} + ips = detail_vm["ips"] || [] + + %{ + vmid: vm["vmid"], + name: vm["name"], + type: vm["type"], + status: vm["status"], + host_name: host.name, + ips: ips + } + end + end + + defp filter(vms, ""), do: vms + + defp filter(vms, q) do + q = String.downcase(q) + + Enum.filter(vms, fn vm -> + String.contains?(String.downcase(vm.name || ""), q) or + Enum.any?(vm.ips, &String.contains?(&1, q)) + end) + end + + @impl true + def render(assigns) do + ~H""" +
+ <.link navigate={~p"/"} class="text-sm text-zinc-500 hover:text-zinc-900">← Back +

VM Search

+ +
+ +
+ + + + + + + + + + + + + + + + + + + + + + + +
NameHostTypeStatusIPs
{vm.name} + <.link navigate={~p"/hosts/#{vm.host_name}"} class="text-zinc-700 hover:text-zinc-900 underline"> + {vm.host_name} + + {vm.type}{vm.status}{Enum.join(vm.ips, ", ")}
No matches.
+
+ """ + end +end +``` + +- [ ] **Step 4: Run — expect pass** + +```bash +mix test test/server_web/live/vm_search_live_test.exs 2>&1 | tail -5 +``` + +Expected: 3 tests pass. + +- [ ] **Step 5: Commit** + +```bash +cd /Users/cabele/claudeprojects/proxmox_monitor +git add server/lib/server_web/live/vm_search_live.ex server/test/server_web/live/vm_search_live_test.exs +git commit -m "feat(server): vm search LiveView with name+IP filtering" +``` + +--- + +## Task 10: Admin Hosts LiveView + +**Files:** +- Create: `server/lib/server_web/live/admin_hosts_live.ex` +- Create: `server/test/server_web/live/admin_hosts_live_test.exs` + +- [ ] **Step 1: Tests** + +Create `server/test/server_web/live/admin_hosts_live_test.exs`: + +```elixir +defmodule ServerWeb.AdminHostsLiveTest do + use ServerWeb.ConnCase, async: false + + import Phoenix.LiveViewTest + alias Server.Hosts + + defp auth(conn), do: Plug.Test.init_test_session(conn, %{authenticated: true}) + + test "lists hosts", %{conn: conn} do + {:ok, {_, _}} = Hosts.create_host("pve-01") + {:ok, _view, html} = live(auth(conn), "/admin/hosts") + assert html =~ "pve-01" + end + + test "creates a new host and reveals the token", %{conn: conn} do + {:ok, view, _html} = live(auth(conn), "/admin/hosts") + + html = + view + |> form("form[phx-submit=create]", host: %{name: "pve-new"}) + |> render_submit() + + assert html =~ "pve-new" + assert html =~ ~r/[A-Za-z0-9_\-]{40,}/ + end + + test "revokes token", %{conn: conn} do + {:ok, {host, _}} = Hosts.create_host("pve-01") + original_hash = host.token_hash + + {:ok, view, _html} = live(auth(conn), "/admin/hosts") + + _html = render_click(view, "rotate", %{"id" => to_string(host.id)}) + + reloaded = Server.Repo.reload!(host) + refute reloaded.token_hash == original_hash + end + + test "deletes a host", %{conn: conn} do + {:ok, {host, _}} = Hosts.create_host("pve-gone") + {:ok, view, _html} = live(auth(conn), "/admin/hosts") + + html = render_click(view, "delete", %{"id" => 
      to_string(host.id)})
    refute html =~ "pve-gone"
  end
end
```

- [ ] **Step 2: Run — expect failure**

```bash
mix test test/server_web/live/admin_hosts_live_test.exs 2>&1 | tail -5
```

- [ ] **Step 3: Implement**

Create `server/lib/server_web/live/admin_hosts_live.ex`:

```elixir
defmodule ServerWeb.AdminHostsLive do
  use ServerWeb, :live_view

  alias Server.{Hosts, Repo, Schema.Host}

  @impl true
  def mount(_params, _session, socket) do
    {:ok,
     socket
     |> assign(:hosts, Hosts.list_all())
     |> assign(:new_token, nil)
     |> assign(:error, nil)}
  end

  @impl true
  def handle_event("create", %{"host" => %{"name" => name}}, socket) do
    case Hosts.create_host(name) do
      {:ok, {host, token}} ->
        {:noreply,
         socket
         |> assign(:hosts, Hosts.list_all())
         |> assign(:new_token, %{name: host.name, token: token})
         |> assign(:error, nil)}

      {:error, cs} ->
        {:noreply, assign(socket, :error, changeset_message(cs))}
    end
  end

  def handle_event("rotate", %{"id" => id}, socket) do
    %Host{} = host = Repo.get!(Host, id)
    {:ok, {_, token}} = Hosts.rotate_token(host)

    {:noreply,
     socket
     |> assign(:hosts, Hosts.list_all())
     |> assign(:new_token, %{name: host.name, token: token})}
  end

  def handle_event("delete", %{"id" => id}, socket) do
    %Host{} = host = Repo.get!(Host, id)
    {:ok, _} = Hosts.delete_host(host)
    {:noreply, assign(socket, :hosts, Hosts.list_all())}
  end

  defp changeset_message(cs) do
    cs.errors
    |> Enum.map_join(", ", fn {k, {msg, _}} -> "#{k}: #{msg}" end)
  end

  @impl true
  def render(assigns) do
    ~H"""
    <div class="mx-auto max-w-4xl p-6 space-y-8">
      <.link navigate={~p"/"} class="text-sm text-zinc-500 hover:text-zinc-900">← Back</.link>

      <h1 class="text-2xl font-semibold">Hosts</h1>

      <section>
        <h2 class="text-lg font-medium mb-2">Register a new host</h2>

        <form phx-submit="create" class="flex gap-2">
          <input type="text" name="host[name]" placeholder="hostname" class="border border-zinc-300 rounded px-2 py-1" />
          <button type="submit" class="bg-zinc-900 text-white rounded px-3 py-1">Create</button>
        </form>

        <p :if={@error} class="mt-2 text-sm text-red-600">{@error}</p>

        <div :if={@new_token} class="mt-4 rounded border border-zinc-300 bg-zinc-50 p-3">
          <p class="text-sm">
            Token for {@new_token.name} (shown once):
          </p>
          <code class="block mt-1 break-all text-sm">
            {@new_token.token}
          </code>
        </div>
      </section>

      <table class="w-full text-left text-sm">
        <thead>
          <tr class="border-b border-zinc-300">
            <th class="py-2 pr-4">Name</th>
            <th class="py-2 pr-4">Status</th>
            <th class="py-2 pr-4">Agent</th>
            <th class="py-2 pr-4">Last seen</th>
            <th class="py-2">Actions</th>
          </tr>
        </thead>
        <tbody>
          <tr :for={h <- @hosts} class="border-t border-zinc-200">
            <td class="py-2 pr-4">{h.name}</td>
            <td class="py-2 pr-4">{h.status}</td>
            <td class="py-2 pr-4">{h.agent_version || "—"}</td>
            <td class="py-2 pr-4">{format_seen(h.last_seen_at)}</td>
            <td class="py-2">
              <button phx-click="rotate" phx-value-id={h.id} class="underline mr-3">Rotate</button>
              <button phx-click="delete" phx-value-id={h.id} data-confirm="Delete this host?" class="underline text-red-600">Delete</button>
            </td>
          </tr>
          <tr :if={@hosts == []}>
            <td colspan="5" class="py-4 text-zinc-500">No hosts yet.</td>
          </tr>
        </tbody>
      </table>
    </div>
    """
  end

  defp format_seen(nil), do: "never"

  defp format_seen(%DateTime{} = dt) do
    Calendar.strftime(dt, "%Y-%m-%d %H:%M UTC")
  end
end
```

- [ ] **Step 4: Run — expect pass**

```bash
mix test test/server_web/live/admin_hosts_live_test.exs 2>&1 | tail -5
```

Expected: 4 tests pass.

- [ ] **Step 5: Commit**

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor
git add server/lib/server_web/live/admin_hosts_live.ex server/test/server_web/live/admin_hosts_live_test.exs
git commit -m "feat(server): admin LiveView for host registration, rotate, delete"
```

---

## Task 11: Full Suite + Manual Smoke Test

- [ ] **Step 1: Run all server tests**

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor/server
mix test 2>&1 | tail -4
```

Expected: all green.

- [ ] **Step 2: Generate a password hash**

```bash
mix run -e 'IO.puts(Argon2.hash_pwd_salt("devpass"))'
```

Copy the printed hash.

- [ ] **Step 3: Start server with hash set**

```bash
DASHBOARD_PASSWORD_HASH='' mix phx.server
```

Paste the hash from Step 2 between the quotes. Expected: `Running ServerWeb.Endpoint` log line.

- [ ] **Step 4: Browser-drive the dashboard**

Open http://localhost:4000/. Expected: redirect to `/login`. Enter `devpass`. Expected: redirect to `/` showing host cards.

If no hosts exist, go to `/admin/hosts` and register one via the form. Copy the token. Write it to `/tmp/agent-local.toml` (same format as the Phase 1/2 smoke test) with a short fast interval, and run the agent:

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor/agent
AGENT_CONFIG=/tmp/agent-local.toml mix run --no-halt
```

Back in the browser, watch `/` — within 5-10s the card should gain Load/RAM/Pools/VMs rows and flip to green. Click the card: `/hosts/` should show all sections with live data.

Navigate to `/vms`: it should list the VMs, and search should filter them.

Navigate to `/admin/hosts`: Rotate → the agent disconnects (old token invalid) and the status flips to offline in real time.
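The `/tmp/agent-local.toml` referenced in Step 4 follows the Phase 1/2 smoke-test format, which is defined outside this plan. As a hedged sketch only — the key names (`server_url`, `host_name`, `token`, `[intervals]`) are assumptions, not the canonical schema — it might look like:

```toml
# Hypothetical sketch — match the real key names from the Phase 1 agent config.
server_url = "ws://localhost:4000"          # assumed: local Phoenix endpoint
host_name  = "pve-01"                       # must match the host registered in /admin/hosts
token      = "<paste token from /admin/hosts>"

[intervals]
fast   = 2    # seconds — deliberately short so the dashboard updates within the 5-10s window
medium = 10
slow   = 30
```

The short `fast` interval is only for the smoke test; production configs would use the defaults from the agent's own documentation.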

- [ ] **Step 5: Clean up and stop services**

Stop the agent (`Ctrl+C, a`) and the server (`Ctrl+C, a`). Remove `/tmp/agent-local.toml`.

No code changes — no commit.

---

## Phase 3 Exit Criteria

- `mix test` — all green.
- Login redirect + session flow works.
- Overview, Host-Detail, VM-Search, Admin pages render with data.
- PubSub round-trip observed in browser (live update on metric arrival).
- All commits on `main`.

**Deferred to a later phase (YAGNI):**
- Charts over 24h — the dashboard shows current values; historical metrics are in SQLite for future sparklines.
- API auth — `/api/hosts/:name` is still public.
- Export/download of payloads.
diff --git a/server/config/runtime.exs b/server/config/runtime.exs
index af17c0c..206b888 100644
--- a/server/config/runtime.exs
+++ b/server/config/runtime.exs
@@ -1,6 +1,6 @@
 import Config
 
-if config_env() in [:prod, :dev] do
+if config_env() == :prod do
   hash =
     System.get_env("DASHBOARD_PASSWORD_HASH") ||
       raise """
@@ -10,6 +10,12 @@ if config_env() in [:prod, :dev] do
   """
 
   config :server, :dashboard_password_hash, hash
+else
+  # dev/test: accept an env var override, otherwise leave unset.
+  # Dev boot without it will crash only when someone POSTs /login.
+  if hash = System.get_env("DASHBOARD_PASSWORD_HASH") do
+    config :server, :dashboard_password_hash, hash
+  end
 end
 
 # config/runtime.exs is executed for all environments, including