proxMon/docs/superpowers/plans/2026-04-22-zfs-pool-detail.md
Carsten a4f4d3ca51 docs: implementation plan for ZFS pool detail
Four tasks: collector enrichment (pool_type/scan/vdevs), classification
coverage tests, CSS for capacity bar + pool block, LiveView rendering
and test updates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:40:31 +02:00


ZFS Pool Detail Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Show type, total/used/free size, capacity bar, and scan state per ZFS pool on the host detail page — a simple at-a-glance view with no drill-down yet.

Architecture: Extend the existing agent collector (ProxmoxAgent.Collectors.Zfs.collect_pools/1) to derive pool_type, scan_function, scan_state, and a compact vdevs list from the already-fetched zpool status -j --json-flat-vdevs JSON. No new shell-outs. The Phoenix channel stores pool payloads as opaque JSON, so server/DB layers need no change. The host detail LiveView renders a new compact per-pool block using the enriched fields plus a thin capacity bar driven by existing capacity_percent thresholds.

Tech Stack: Elixir / Phoenix LiveView, ExUnit, existing assets/css/app.css. Design doc: docs/superpowers/specs/2026-04-22-zfs-pool-detail-design.md.
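The tasks below stub all shell-outs through the collector's injectable `runner:` option. For orientation, the pattern looks roughly like the sketch below — the module name and exact `zpool` flag lists are illustrative assumptions, not copied from the real agent/lib/proxmox_agent/collectors/zfs.ex:

```elixir
# Sketch of the injectable-runner pattern the plan's tests rely on.
# ZfsRunnerSketch and the flag lists are assumptions for illustration.
defmodule ZfsRunnerSketch do
  def collect_pools(opts \\ []) do
    runner = Keyword.get(opts, :runner, &default_runner/2)

    with {:ok, list_json} <- runner.("zpool", ["list", "-p", "-j"]),
         {:ok, status_json} <- runner.("zpool", ["status", "-j", "--json-flat-vdevs"]) do
      {:ok, %{list: list_json, status: status_json}}
    end
  end

  # Real shell-out used outside tests; tests never hit this clause.
  defp default_runner(cmd, args) do
    case System.cmd(cmd, args, stderr_to_stdout: true) do
      {out, 0} -> {:ok, out}
      {out, code} -> {:error, {cmd, code, out}}
    end
  end
end
```

Tests then pass `runner: fn "zpool", ["list" | _] -> {:ok, fixture_json} ... end`, exactly as Task 2 Step 1 does.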


File Structure

Modify

  • agent/lib/proxmox_agent/collectors/zfs.ex — extend merge_pools/2 to emit pool_type, scan_function, scan_state, and vdevs list.
  • agent/test/proxmox_agent/collectors/zfs_test.exs — extend existing assertions, add new test cases for stripe, mixed, and ignored special vdev types.
  • server/lib/server_web/live/host_detail_live.ex — replace the pool row markup (current lines 69–86), add capbar_level/1, pool_scrub_line/1, pool_layout/1, and degraded_vdevs/1 helpers.
  • server/assets/css/app.css — add .capbar rules.
  • server/test/server_web/live/host_detail_live_test.exs — extend the fast-sample pool fixture with the new fields and add assertions.

No new files. All changes are additive and land inside existing modules.


Task 1: Agent collector — pool_type, scan state, vdev list

Files:

  • Modify: agent/lib/proxmox_agent/collectors/zfs.ex
  • Modify: agent/test/proxmox_agent/collectors/zfs_test.exs
  • Modify: agent/test/fixtures/zfs/zpool_status.json

- [ ] Step 1: Add per-vdev error counters to the fixture so tests can assert on them

Replace agent/test/fixtures/zfs/zpool_status.json with:

{
  "output_version": { "command": "zpool status", "vers_major": 0, "vers_minor": 1 },
  "pools": {
    "rpool": {
      "name": "rpool",
      "state": "ONLINE",
      "scan": {
        "function": "scrub",
        "state": "FINISHED",
        "end_time": "Sat Apr 19 02:00:00 2026"
      },
      "error_count": "0",
      "vdevs": {
        "mirror-0": {
          "name": "mirror-0",
          "vdev_type": "mirror",
          "state": "ONLINE",
          "read_errors": "0",
          "write_errors": "0",
          "checksum_errors": "0"
        }
      }
    },
    "tank": {
      "name": "tank",
      "state": "DEGRADED",
      "scan": {
        "function": "scrub",
        "state": "SCANNING",
        "end_time": "Tue Mar 01 08:00:00 2026"
      },
      "error_count": "2",
      "vdevs": {
        "raidz2-0": {
          "name": "raidz2-0",
          "vdev_type": "raidz2",
          "state": "DEGRADED",
          "read_errors": "0",
          "write_errors": "0",
          "checksum_errors": "2"
        }
      }
    }
  }
}

(The only change from the current fixture is that "tank"'s scan.state is now "SCANNING", so a scrub-in-progress case is covered.)

- [ ] Step 2: Extend the existing fixture-based test with new field assertions

Edit agent/test/proxmox_agent/collectors/zfs_test.exs. Inside describe "collect_pools/1", replace the "returns a summary per pool" test body with:

test "returns a summary per pool" do
  sample = Zfs.collect_pools(runner: fake_runner())
  assert is_list(sample.pools)
  assert length(sample.pools) == 2
  rpool = Enum.find(sample.pools, &(&1.name == "rpool"))
  tank = Enum.find(sample.pools, &(&1.name == "tank"))

  assert rpool.health == "ONLINE"
  assert rpool.capacity_percent == 40
  assert rpool.fragmentation_percent == 17
  assert rpool.size_bytes == 500_000_000_000
  assert rpool.error_count == 0
  assert rpool.degraded_vdev_count == 0
  assert rpool.pool_type == "mirror"
  assert rpool.scan_function == "scrub"
  assert rpool.scan_state == "FINISHED"
  assert [%{name: "mirror-0", type: "mirror", state: "ONLINE",
            read_errors: 0, write_errors: 0, checksum_errors: 0}] = rpool.vdevs

  assert tank.health == "DEGRADED"
  assert tank.error_count == 2
  assert tank.degraded_vdev_count == 1
  assert tank.pool_type == "raidz2"
  assert tank.scan_state == "SCANNING"
  assert [%{name: "raidz2-0", type: "raidz2", state: "DEGRADED",
            checksum_errors: 2}] = tank.vdevs
end

- [ ] Step 3: Run tests — expect FAIL

Run: cd agent && mix test test/proxmox_agent/collectors/zfs_test.exs

Expected: the "returns a summary per pool" test fails because :pool_type, :scan_function, :scan_state, and :vdevs are not yet on the pool map.

- [ ] Step 4: Implement the new fields in the collector

Edit agent/lib/proxmox_agent/collectors/zfs.ex. Update the @type pool_summary and merge_pools/2 function as follows:

  @type vdev_summary :: %{
          name: String.t(),
          type: String.t(),
          state: String.t(),
          read_errors: non_neg_integer(),
          write_errors: non_neg_integer(),
          checksum_errors: non_neg_integer()
        }

  @type pool_summary :: %{
          name: String.t(),
          health: String.t(),
          size_bytes: non_neg_integer(),
          allocated_bytes: non_neg_integer(),
          free_bytes: non_neg_integer(),
          fragmentation_percent: non_neg_integer(),
          capacity_percent: non_neg_integer(),
          error_count: non_neg_integer(),
          vdev_count: non_neg_integer(),
          degraded_vdev_count: non_neg_integer(),
          pool_type: String.t(),
          scan_function: String.t() | nil,
          scan_state: String.t() | nil,
          last_scrub_end: String.t() | nil,
          vdevs: [vdev_summary()]
        }

Replace the body of merge_pools(%{"pools" => list_pools}, %{"pools" => status_pools}) with:

  defp merge_pools(%{"pools" => list_pools}, %{"pools" => status_pools}) do
    Enum.map(list_pools, fn {name, list_info} ->
      status_info = Map.get(status_pools, name, %{})
      raw_vdevs = Map.get(status_info, "vdevs", %{}) |> Map.values()
      vdevs = Enum.map(raw_vdevs, &vdev_summary/1)

      %{
        name: name,
        health: Map.get(list_info, "health"),
        size_bytes: Map.get(list_info, "size", 0),
        allocated_bytes: Map.get(list_info, "alloc", 0),
        free_bytes: Map.get(list_info, "free", 0),
        fragmentation_percent: Map.get(list_info, "frag", 0),
        capacity_percent: Map.get(list_info, "cap", 0),
        error_count: to_int(Map.get(status_info, "error_count", "0")),
        vdev_count: length(vdevs),
        degraded_vdev_count: Enum.count(vdevs, &(&1.state != "ONLINE")),
        pool_type: derive_pool_type(vdevs),
        scan_function: get_in(status_info, ["scan", "function"]),
        scan_state: get_in(status_info, ["scan", "state"]),
        last_scrub_end: get_in(status_info, ["scan", "end_time"]),
        vdevs: vdevs
      }
    end)
  end

  defp vdev_summary(v) do
    %{
      name: Map.get(v, "name"),
      type: Map.get(v, "vdev_type"),
      state: Map.get(v, "state"),
      read_errors: to_int(Map.get(v, "read_errors", "0")),
      write_errors: to_int(Map.get(v, "write_errors", "0")),
      checksum_errors: to_int(Map.get(v, "checksum_errors", "0"))
    }
  end

  @data_vdev_types ~w(mirror raidz1 raidz2 raidz3 disk)
  @special_vdev_types ~w(log cache spare dedup special)

  defp derive_pool_type(vdevs) do
    data_types =
      vdevs
      |> Enum.map(& &1.type)
      |> Enum.reject(&(&1 in @special_vdev_types))
      |> Enum.uniq()

    case data_types do
      [] -> "unknown"
      ["disk"] -> "stripe"
      [t] when t in @data_vdev_types -> t
      _ -> "mixed"
    end
  end
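Note that merge_pools/2 and vdev_summary/1 call to_int/1, which this plan assumes already exists in the collector. If it does not, a minimal version tolerating zpool's stringified counters could look like the sketch below (shown as a standalone module; inside the collector it would be a defp):

```elixir
defmodule ToIntSketch do
  # zpool's JSON emits counters as strings ("0", "2"); accept integers too
  # and fall back to 0 for anything unparsable.
  def to_int(value) when is_integer(value), do: value

  def to_int(value) when is_binary(value) do
    case Integer.parse(value) do
      {n, _rest} -> n
      :error -> 0
    end
  end

  def to_int(_), do: 0
end
```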

- [ ] Step 5: Run tests — expect PASS

Run: cd agent && mix test test/proxmox_agent/collectors/zfs_test.exs

Expected: all tests pass.

- [ ] Step 6: Commit

git add agent/lib/proxmox_agent/collectors/zfs.ex \
        agent/test/proxmox_agent/collectors/zfs_test.exs \
        agent/test/fixtures/zfs/zpool_status.json
git commit -m "feat(agent): enrich zpool summary with type, scan state, vdev list"

Task 2: Agent collector — stripe, mixed, and special-vdev coverage

Files:

  • Modify: agent/test/proxmox_agent/collectors/zfs_test.exs

- [ ] Step 1: Add test for plain stripe, mixed layout, and special-vdev filtering

Append this block inside describe "collect_pools/1" in agent/test/proxmox_agent/collectors/zfs_test.exs:

  test "classifies pool_type for stripe, mixed, and special vdevs" do
    list_json =
      Jason.encode!(%{
        "pools" => %{
          "stripe" => %{"name" => "stripe", "size" => 1, "alloc" => 0, "free" => 1,
                       "frag" => 0, "cap" => 0, "health" => "ONLINE"},
          "mixed" => %{"name" => "mixed", "size" => 1, "alloc" => 0, "free" => 1,
                      "frag" => 0, "cap" => 0, "health" => "ONLINE"},
          "mirror_with_log" => %{"name" => "mirror_with_log", "size" => 1, "alloc" => 0, "free" => 1,
                                  "frag" => 0, "cap" => 0, "health" => "ONLINE"}
        }
      })

    vdev = fn name, type ->
      {name, %{"name" => name, "vdev_type" => type, "state" => "ONLINE",
               "read_errors" => "0", "write_errors" => "0", "checksum_errors" => "0"}}
    end

    status_json =
      Jason.encode!(%{
        "pools" => %{
          "stripe" => %{
            "name" => "stripe", "state" => "ONLINE", "error_count" => "0",
            "vdevs" => Map.new([vdev.("sda", "disk"), vdev.("sdb", "disk")])
          },
          "mixed" => %{
            "name" => "mixed", "state" => "ONLINE", "error_count" => "0",
            "vdevs" => Map.new([vdev.("mirror-0", "mirror"), vdev.("raidz1-1", "raidz1")])
          },
          "mirror_with_log" => %{
            "name" => "mirror_with_log", "state" => "ONLINE", "error_count" => "0",
            "vdevs" => Map.new([vdev.("mirror-0", "mirror"), vdev.("log-0", "log")])
          }
        }
      })

    runner = fn
      "zpool", ["list" | _] -> {:ok, list_json}
      "zpool", ["status" | _] -> {:ok, status_json}
    end

    sample = Zfs.collect_pools(runner: runner)
    by_name = Map.new(sample.pools, &{&1.name, &1})

    assert by_name["stripe"].pool_type == "stripe"
    assert by_name["mixed"].pool_type == "mixed"
    assert by_name["mirror_with_log"].pool_type == "mirror"
    # log vdev is retained in the per-pool vdevs list even though it's ignored for layout classification
    assert Enum.any?(by_name["mirror_with_log"].vdevs, &(&1.type == "log"))
  end

- [ ] Step 2: Run the new test — expect PASS (collector already implements the logic)

Run: cd agent && mix test test/proxmox_agent/collectors/zfs_test.exs

Expected: all tests pass, including the new case.

- [ ] Step 3: Commit

git add agent/test/proxmox_agent/collectors/zfs_test.exs
git commit -m "test(agent): cover stripe, mixed, and special-vdev pool_type classification"

Task 3: UI — capacity bar CSS

Files:

  • Modify: server/assets/css/app.css

- [ ] Step 1: Add .capbar rules after the existing .pool-row block

In server/assets/css/app.css, locate the .pool-row rules (around lines 249–258) and insert the following immediately after them:

.capbar {
  height: 4px;
  background: var(--panel-2);
  border-radius: 2px;
  overflow: hidden;
  margin: 0.25rem 0 0.4rem;
}
.capbar > span {
  display: block;
  height: 100%;
  background: var(--ok);
  transition: width 0.3s ease;
}
.capbar[data-level="warn"] > span { background: var(--warn); }
.capbar[data-level="crit"] > span { background: var(--crit); }

.pool-block {
  padding: 0.6rem 0.9rem;
  border-bottom: 1px solid var(--border);
}
.pool-block:last-child { border-bottom: none; }
.pool-block .head {
  display: flex;
  justify-content: space-between;
  align-items: baseline;
  gap: 0.6rem;
}
.pool-block .head .layout { color: var(--muted); font-size: 0.8rem; margin-left: 0.5rem; }
.pool-block .sizes { font-family: var(--mono); font-size: 0.78rem; color: var(--fg); }
.pool-block .details { color: var(--muted); font-family: var(--mono); font-size: 0.78rem; }

- [ ] Step 2: Commit

git add server/assets/css/app.css
git commit -m "style(ui): capacity bar and per-pool block styles"

Task 4: UI — render per-pool block with type, capacity bar, sizes, scrub state

Files:

  • Modify: server/lib/server_web/live/host_detail_live.ex
  • Modify: server/test/server_web/live/host_detail_live_test.exs

- [ ] Step 1: Extend the LiveView test fixture with the new fields and add assertions

In server/test/server_web/live/host_detail_live_test.exs, replace the "zfs_pools" block inside the fast fixture (currently ~lines 1525) with:

      "zfs_pools" => %{
        "pools" => [
          %{
            "name" => "rpool",
            "health" => "ONLINE",
            "pool_type" => "mirror",
            "size_bytes" => 500_000_000_000,
            "allocated_bytes" => 200_000_000_000,
            "free_bytes" => 300_000_000_000,
            "capacity_percent" => 40,
            "fragmentation_percent" => 17,
            "error_count" => 0,
            "vdev_count" => 1,
            "degraded_vdev_count" => 0,
            "scan_function" => "scrub",
            "scan_state" => "FINISHED",
            "last_scrub_end" => "Sat Apr 19 02:00:00 2026",
            "vdevs" => [
              %{"name" => "mirror-0", "type" => "mirror", "state" => "ONLINE",
                "read_errors" => 0, "write_errors" => 0, "checksum_errors" => 0}
            ]
          }
        ]
      },

Then in the "renders sections..." test, after the existing assertions, add:

    assert html =~ "mirror"
    assert html =~ "465.7 GB"       # size_bytes formatted
    assert html =~ "186.3 GB"       # allocated_bytes formatted
    assert html =~ "279.4 GB"       # free_bytes formatted
    assert html =~ "capbar"
    assert html =~ "scrub"

(format_bytes/1 divides by 1024 per step but labels the units GB/TB, so the decimal-byte fixture values render as 500_000_000_000 B → "465.7 GB", 200_000_000_000 B → "186.3 GB", and 300_000_000_000 B → "279.4 GB".)
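A behavior-equivalent sketch of the existing format_bytes/1 helper, under the assumption of binary (1024) steps with KB/MB/GB labels — the real helper already lives in the LiveView module, so this is only to verify the expected strings:

```elixir
defmodule BytesSketch do
  @units ~w(B KB MB GB TB PB)

  # Binary steps (divide by 1024, one decimal place), matching the plan's
  # expected "465.7 GB" for 500_000_000_000 bytes.
  def format_bytes(bytes) when is_number(bytes) and bytes >= 0 do
    format(bytes / 1, @units)
  end

  defp format(n, [_unit | rest]) when n >= 1024 and rest != [], do: format(n / 1024, rest)
  defp format(n, [unit | _]), do: "#{Float.round(n, 1)} #{unit}"
end

BytesSketch.format_bytes(500_000_000_000)
# => "465.7 GB"
```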

- [ ] Step 2: Run LiveView test — expect FAIL

Run: cd server && mix test test/server_web/live/host_detail_live_test.exs

Expected: "renders sections..." fails on the new assertions ("mirror", sizes, "capbar").

- [ ] Step 3: Replace the pool-rendering block in the LiveView

In server/lib/server_web/live/host_detail_live.ex, replace the panel that renders ZFS pools (current lines 65–87, the <div class="panel"> containing <header><span>ZFS pools</span> down through its closing </div>) with:

      <div class="panel">
        <header><span>ZFS pools</span><span class="mono">{length(pools(@fast))}</span></header>
        <div class="body tight">
          <div :if={pools(@fast) == []} class="empty">No data.</div>
          <div :for={pool <- pools(@fast)} class="pool-block">
            <div class="head">
              <div>
                <span class="mono" style="color: var(--fg-bright); font-weight: 600;">{pool["name"]}</span>
                <span class="layout">{pool_layout(pool)}</span>
              </div>
              <span class="badge" style={pool_badge_style(pool["health"])}>{pool["health"]}</span>
            </div>

            <div class="capbar" data-level={capbar_level(pool["capacity_percent"])}>
              <span style={"width: #{pool["capacity_percent"] || 0}%"}></span>
            </div>

            <div class="sizes">
              used {format_bytes(pool["allocated_bytes"] || 0)} ·
              free {format_bytes(pool["free_bytes"] || 0)} ·
              total {format_bytes(pool["size_bytes"] || 0)}
              <span class="muted">({pool["capacity_percent"] || 0}%)</span>
            </div>

            <div class="details">
              frag {pool["fragmentation_percent"] || 0}% ·
              err {pool["error_count"] || 0} ·
              vdevs {pool["vdev_count"] || 0} (deg {pool["degraded_vdev_count"] || 0}) ·
              {pool_scrub_line(pool)}
            </div>

            <div :for={v <- degraded_vdevs(pool)} class="callout err" style="margin-top: 0.4rem;">
              {v["name"]} {v["state"]} · r={v["read_errors"]} w={v["write_errors"]} cksum={v["checksum_errors"]}
            </div>
          </div>
        </div>
      </div>

Then add these helper functions near the other private helpers (after pool_badge_style/1):

  defp pool_layout(pool) do
    case pool["pool_type"] do
      nil -> "—"
      "" -> "—"
      t -> t
    end
  end

  defp capbar_level(cap) when is_number(cap) and cap >= 90, do: "crit"
  defp capbar_level(cap) when is_number(cap) and cap >= 80, do: "warn"
  defp capbar_level(_), do: "ok"

  defp pool_scrub_line(%{"scan_state" => "SCANNING"}), do: "scrub scanning"

  defp pool_scrub_line(%{"scan_state" => "FINISHED", "last_scrub_end" => end_time})
       when is_binary(end_time) and end_time != "",
       do: "scrub #{end_time}"

  defp pool_scrub_line(%{"last_scrub_end" => end_time}) when is_binary(end_time) and end_time != "",
    do: "scrub #{end_time}"

  defp pool_scrub_line(_), do: "scrub never"

  defp degraded_vdevs(pool) do
    (pool["vdevs"] || [])
    |> Enum.filter(fn v -> Map.get(v, "state") not in [nil, "ONLINE"] end)
  end

- [ ] Step 4: Run LiveView test — expect PASS

Run: cd server && mix test test/server_web/live/host_detail_live_test.exs

Expected: all tests pass.

- [ ] Step 5: Run the full server suite to catch regressions

Run: cd server && mix test

Expected: all tests pass.

- [ ] Step 6: Run the full agent suite to catch regressions

Run: cd agent && mix test

Expected: all tests pass.

- [ ] Step 7: Manual visual check (dev server)

Start the server locally (cd server && mix phx.server), log in, open a host detail page with live agent data, and confirm:

  • Each pool shows its name and pool_type on the first line, with the health badge on the right.
  • The capacity bar renders at the correct width and turns yellow/red at 80% / 90%.
  • used / free / total line shows bytes formatted like 200.0 GB.
  • The details line shows frag/err/vdevs and a scrub label (scrub <end time>, scrub scanning, or scrub never).
  • Degraded pools list each non-ONLINE vdev in a red .callout.err line; ONLINE pools don't.

If the manual check reveals a rendering issue, fix it in host_detail_live.ex and re-run cd server && mix test.

- [ ] Step 8: Commit

git add server/lib/server_web/live/host_detail_live.ex \
        server/test/server_web/live/host_detail_live_test.exs
git commit -m "feat(ui): detailed per-pool block with type, capacity bar, scrub state"