proxMon/docs/superpowers/plans/2026-04-22-zfs-pool-detail.md
Carsten a4f4d3ca51 docs: implementation plan for ZFS pool detail
Four tasks: collector enrichment (pool_type/scan/vdevs), classification
coverage tests, CSS for capacity bar + pool block, LiveView rendering
and test updates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:40:31 +02:00


# ZFS Pool Detail Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Show type, total/used/free size, capacity bar, and scan state per ZFS pool on the host detail page: a simple at-a-glance view with no drill-down yet.

**Architecture:** Extend the existing agent collector (`ProxmoxAgent.Collectors.Zfs.collect_pools/1`) to derive `pool_type`, `scan_function`, `scan_state`, and a compact `vdevs` list from the already-fetched `zpool status -j --json-flat-vdevs` JSON. No new shell-outs. The Phoenix channel stores pool payloads as opaque JSON, so server/DB layers need no change. The host detail LiveView renders a new compact per-pool block using the enriched fields plus a thin capacity bar driven by existing `capacity_percent` thresholds.

**Tech Stack:** Elixir / Phoenix LiveView, ExUnit, existing `assets/css/app.css`. Design doc: `docs/superpowers/specs/2026-04-22-zfs-pool-detail-design.md`.

---
## File Structure
**Modify**

- `agent/lib/proxmox_agent/collectors/zfs.ex`: extend `merge_pools/2` to emit `pool_type`, `scan_function`, `scan_state`, and a `vdevs` list.
- `agent/test/proxmox_agent/collectors/zfs_test.exs`: extend existing assertions, add new test cases for stripe, mixed, and ignored special vdev types.
- `agent/test/fixtures/zfs/zpool_status.json`: update the fixture so a scrub-in-progress pool is covered (see Task 1).
- `server/lib/server_web/live/host_detail_live.ex`: replace the pool row markup (current lines 69-86), add `capbar_level/1`, `pool_scrub_line/1`, and `pool_layout/1` helpers.
- `server/assets/css/app.css`: add `.capbar` rules.
- `server/test/server_web/live/host_detail_live_test.exs`: extend the fast-sample pool fixture with the new fields and add assertions.

**No new files.** All changes are additive and land inside existing modules.

---
## Task 1: Agent collector — pool_type, scan state, vdev list
**Files:**
- Modify: `agent/lib/proxmox_agent/collectors/zfs.ex`
- Modify: `agent/test/proxmox_agent/collectors/zfs_test.exs`
- Modify: `agent/test/fixtures/zfs/zpool_status.json`
### - [ ] Step 1: Add per-vdev error counters to the fixture so tests can assert on them
Replace `agent/test/fixtures/zfs/zpool_status.json` with:
```json
{
  "output_version": { "command": "zpool status", "vers_major": 0, "vers_minor": 1 },
  "pools": {
    "rpool": {
      "name": "rpool",
      "state": "ONLINE",
      "scan": {
        "function": "scrub",
        "state": "FINISHED",
        "end_time": "Sat Apr 19 02:00:00 2026"
      },
      "error_count": "0",
      "vdevs": {
        "mirror-0": {
          "name": "mirror-0",
          "vdev_type": "mirror",
          "state": "ONLINE",
          "read_errors": "0",
          "write_errors": "0",
          "checksum_errors": "0"
        }
      }
    },
    "tank": {
      "name": "tank",
      "state": "DEGRADED",
      "scan": {
        "function": "scrub",
        "state": "SCANNING",
        "end_time": "Tue Mar 01 08:00:00 2026"
      },
      "error_count": "2",
      "vdevs": {
        "raidz2-0": {
          "name": "raidz2-0",
          "vdev_type": "raidz2",
          "state": "DEGRADED",
          "read_errors": "0",
          "write_errors": "0",
          "checksum_errors": "2"
        }
      }
    }
  }
}
```
(The only change from the current fixture is that `"tank"`'s `scan.state` is now `"SCANNING"`, so a scrub-in-progress case is covered.)
### - [ ] Step 2: Extend the existing fixture-based test with new field assertions
Edit `agent/test/proxmox_agent/collectors/zfs_test.exs`. Inside `describe "collect_pools/1"`, replace the `"returns a summary per pool"` test body with:
```elixir
test "returns a summary per pool" do
  sample = Zfs.collect_pools(runner: fake_runner())
  assert is_list(sample.pools)
  assert length(sample.pools) == 2

  rpool = Enum.find(sample.pools, &(&1.name == "rpool"))
  tank = Enum.find(sample.pools, &(&1.name == "tank"))

  assert rpool.health == "ONLINE"
  assert rpool.capacity_percent == 40
  assert rpool.fragmentation_percent == 17
  assert rpool.size_bytes == 500_000_000_000
  assert rpool.error_count == 0
  assert rpool.degraded_vdev_count == 0
  assert rpool.pool_type == "mirror"
  assert rpool.scan_function == "scrub"
  assert rpool.scan_state == "FINISHED"
  assert [%{name: "mirror-0", type: "mirror", state: "ONLINE",
            read_errors: 0, write_errors: 0, checksum_errors: 0}] = rpool.vdevs

  assert tank.health == "DEGRADED"
  assert tank.error_count == 2
  assert tank.degraded_vdev_count == 1
  assert tank.pool_type == "raidz2"
  assert tank.scan_state == "SCANNING"
  assert [%{name: "raidz2-0", type: "raidz2", state: "DEGRADED",
            checksum_errors: 2}] = tank.vdevs
end
```
### - [ ] Step 3: Run tests — expect FAIL
Run: `cd agent && mix test test/proxmox_agent/collectors/zfs_test.exs`
Expected: the `"returns a summary per pool"` test fails because `:pool_type`, `:scan_function`, `:scan_state`, and `:vdevs` are not yet on the pool map.
### - [ ] Step 4: Implement the new fields in the collector
Edit `agent/lib/proxmox_agent/collectors/zfs.ex`. Update the `@type pool_summary` and `merge_pools/2` function as follows:
```elixir
@type vdev_summary :: %{
        name: String.t(),
        type: String.t(),
        state: String.t(),
        read_errors: non_neg_integer(),
        write_errors: non_neg_integer(),
        checksum_errors: non_neg_integer()
      }

@type pool_summary :: %{
        name: String.t(),
        health: String.t(),
        size_bytes: non_neg_integer(),
        allocated_bytes: non_neg_integer(),
        free_bytes: non_neg_integer(),
        fragmentation_percent: non_neg_integer(),
        capacity_percent: non_neg_integer(),
        error_count: non_neg_integer(),
        vdev_count: non_neg_integer(),
        degraded_vdev_count: non_neg_integer(),
        pool_type: String.t(),
        scan_function: String.t() | nil,
        scan_state: String.t() | nil,
        last_scrub_end: String.t() | nil,
        vdevs: [vdev_summary()]
      }
```
Replace the body of `merge_pools(%{"pools" => list_pools}, %{"pools" => status_pools})` with:
```elixir
defp merge_pools(%{"pools" => list_pools}, %{"pools" => status_pools}) do
  Enum.map(list_pools, fn {name, list_info} ->
    status_info = Map.get(status_pools, name, %{})
    raw_vdevs = Map.get(status_info, "vdevs", %{}) |> Map.values()
    vdevs = Enum.map(raw_vdevs, &vdev_summary/1)

    %{
      name: name,
      health: Map.get(list_info, "health"),
      size_bytes: Map.get(list_info, "size", 0),
      allocated_bytes: Map.get(list_info, "alloc", 0),
      free_bytes: Map.get(list_info, "free", 0),
      fragmentation_percent: Map.get(list_info, "frag", 0),
      capacity_percent: Map.get(list_info, "cap", 0),
      error_count: to_int(Map.get(status_info, "error_count", "0")),
      vdev_count: length(vdevs),
      degraded_vdev_count: Enum.count(vdevs, &(&1.state != "ONLINE")),
      pool_type: derive_pool_type(vdevs),
      scan_function: get_in(status_info, ["scan", "function"]),
      scan_state: get_in(status_info, ["scan", "state"]),
      last_scrub_end: get_in(status_info, ["scan", "end_time"]),
      vdevs: vdevs
    }
  end)
end

# Reduce one raw vdev entry from `zpool status` to the compact shape sent to the server.
defp vdev_summary(v) do
  %{
    name: Map.get(v, "name"),
    type: Map.get(v, "vdev_type"),
    state: Map.get(v, "state"),
    read_errors: to_int(Map.get(v, "read_errors", "0")),
    write_errors: to_int(Map.get(v, "write_errors", "0")),
    checksum_errors: to_int(Map.get(v, "checksum_errors", "0"))
  }
end

@data_vdev_types ~w(mirror raidz1 raidz2 raidz3 disk)
@special_vdev_types ~w(log cache spare dedup special)

# Special vdevs are ignored for classification; they stay in the `vdevs` list.
defp derive_pool_type(vdevs) do
  data_types =
    vdevs
    |> Enum.map(& &1.type)
    |> Enum.reject(&(&1 in @special_vdev_types))
    |> Enum.uniq()

  case data_types do
    [] -> "unknown"
    ["disk"] -> "stripe"
    [t] when t in @data_vdev_types -> t
    _ -> "mixed"
  end
end
```
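The classification rules can be tried in isolation before touching the collector. The following is a minimal sketch, assuming a hypothetical `PoolTypeSketch` module that takes a plain list of vdev type strings instead of vdev maps; it is not part of the plan's code and only mirrors the `case` above.

```elixir
defmodule PoolTypeSketch do
  # Hypothetical standalone copy of the classification rules, for experimentation only.
  @data_vdev_types ~w(mirror raidz1 raidz2 raidz3 disk)
  @special_vdev_types ~w(log cache spare dedup special)

  # Takes the top-level vdev type strings of one pool and returns a layout label.
  def classify(types) when is_list(types) do
    data_types =
      types
      |> Enum.reject(&(&1 in @special_vdev_types))
      |> Enum.uniq()

    case data_types do
      [] -> "unknown"
      ["disk"] -> "stripe"
      [t] when t in @data_vdev_types -> t
      _ -> "mixed"
    end
  end
end

IO.inspect(PoolTypeSketch.classify(["mirror", "mirror", "log"]))
IO.inspect(PoolTypeSketch.classify(["disk", "disk"]))
IO.inspect(PoolTypeSketch.classify(["mirror", "raidz1"]))
IO.inspect(PoolTypeSketch.classify(["log", "cache"]))
IO.inspect(PoolTypeSketch.classify(["draid"]))
```

Two edge cases are worth knowing when reading the UI later: a pool with only special vdevs is labelled `"unknown"`, and a single type outside `@data_vdev_types` (the `"draid"` string here is only an illustrative input) falls through to `"mixed"` rather than being passed along verbatim.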
### - [ ] Step 5: Run tests — expect PASS
Run: `cd agent && mix test test/proxmox_agent/collectors/zfs_test.exs`
Expected: all tests pass.
### - [ ] Step 6: Commit
```bash
git add agent/lib/proxmox_agent/collectors/zfs.ex \
agent/test/proxmox_agent/collectors/zfs_test.exs \
agent/test/fixtures/zfs/zpool_status.json
git commit -m "feat(agent): enrich zpool summary with type, scan state, vdev list"
```
---
## Task 2: Agent collector — stripe, mixed, and special-vdev coverage
**Files:**
- Modify: `agent/test/proxmox_agent/collectors/zfs_test.exs`
### - [ ] Step 1: Add test for plain stripe, mixed layout, and special-vdev filtering
Append this block inside `describe "collect_pools/1"` in `agent/test/proxmox_agent/collectors/zfs_test.exs`:
```elixir
test "classifies pool_type for stripe, mixed, and special vdevs" do
  list_json =
    Jason.encode!(%{
      "pools" => %{
        "stripe" => %{"name" => "stripe", "size" => 1, "alloc" => 0, "free" => 1,
                      "frag" => 0, "cap" => 0, "health" => "ONLINE"},
        "mixed" => %{"name" => "mixed", "size" => 1, "alloc" => 0, "free" => 1,
                     "frag" => 0, "cap" => 0, "health" => "ONLINE"},
        "mirror_with_log" => %{"name" => "mirror_with_log", "size" => 1, "alloc" => 0, "free" => 1,
                               "frag" => 0, "cap" => 0, "health" => "ONLINE"}
      }
    })

  vdev = fn name, type ->
    {name, %{"name" => name, "vdev_type" => type, "state" => "ONLINE",
             "read_errors" => "0", "write_errors" => "0", "checksum_errors" => "0"}}
  end

  status_json =
    Jason.encode!(%{
      "pools" => %{
        "stripe" => %{
          "name" => "stripe", "state" => "ONLINE", "error_count" => "0",
          "vdevs" => Map.new([vdev.("sda", "disk"), vdev.("sdb", "disk")])
        },
        "mixed" => %{
          "name" => "mixed", "state" => "ONLINE", "error_count" => "0",
          "vdevs" => Map.new([vdev.("mirror-0", "mirror"), vdev.("raidz1-1", "raidz1")])
        },
        "mirror_with_log" => %{
          "name" => "mirror_with_log", "state" => "ONLINE", "error_count" => "0",
          "vdevs" => Map.new([vdev.("mirror-0", "mirror"), vdev.("log-0", "log")])
        }
      }
    })

  runner = fn
    "zpool", ["list" | _] -> {:ok, list_json}
    "zpool", ["status" | _] -> {:ok, status_json}
  end

  sample = Zfs.collect_pools(runner: runner)
  by_name = Map.new(sample.pools, &{&1.name, &1})

  assert by_name["stripe"].pool_type == "stripe"
  assert by_name["mixed"].pool_type == "mixed"
  assert by_name["mirror_with_log"].pool_type == "mirror"
  # log vdev is retained in the per-pool vdevs list even though it's ignored for layout classification
  assert Enum.any?(by_name["mirror_with_log"].vdevs, &(&1.type == "log"))
end
```
### - [ ] Step 2: Run the new test — expect PASS (collector already implements the logic)
Run: `cd agent && mix test test/proxmox_agent/collectors/zfs_test.exs`
Expected: all tests pass, including the new case.
### - [ ] Step 3: Commit
```bash
git add agent/test/proxmox_agent/collectors/zfs_test.exs
git commit -m "test(agent): cover stripe, mixed, and special-vdev pool_type classification"
```
---
## Task 3: UI — capacity bar CSS
**Files:**
- Modify: `server/assets/css/app.css`
### - [ ] Step 1: Add `.capbar` rules after the existing `.pool-row` block
In `server/assets/css/app.css`, locate the `.pool-row` rules (around lines 249-258) and insert the following immediately after them:
```css
.capbar {
  height: 4px;
  background: var(--panel-2);
  border-radius: 2px;
  overflow: hidden;
  margin: 0.25rem 0 0.4rem;
}
.capbar > span {
  display: block;
  height: 100%;
  background: var(--ok);
  transition: width 0.3s ease;
}
.capbar[data-level="warn"] > span { background: var(--warn); }
.capbar[data-level="crit"] > span { background: var(--crit); }

.pool-block {
  padding: 0.6rem 0.9rem;
  border-bottom: 1px solid var(--border);
}
.pool-block:last-child { border-bottom: none; }
.pool-block .head {
  display: flex;
  justify-content: space-between;
  align-items: baseline;
  gap: 0.6rem;
}
.pool-block .head .layout { color: var(--muted); font-size: 0.8rem; margin-left: 0.5rem; }
.pool-block .sizes { font-family: var(--mono); font-size: 0.78rem; color: var(--fg); }
.pool-block .details { color: var(--muted); font-family: var(--mono); font-size: 0.78rem; }
```
### - [ ] Step 2: Commit
```bash
git add server/assets/css/app.css
git commit -m "style(ui): capacity bar and per-pool block styles"
```
---
## Task 4: UI — render per-pool block with type, capacity bar, sizes, scrub state
**Files:**
- Modify: `server/lib/server_web/live/host_detail_live.ex`
- Modify: `server/test/server_web/live/host_detail_live_test.exs`
### - [ ] Step 1: Extend the LiveView test fixture with the new fields and add assertions
In `server/test/server_web/live/host_detail_live_test.exs`, replace the `"zfs_pools"` block inside the `fast` fixture (currently ~lines 15-25) with:
```elixir
"zfs_pools" => %{
  "pools" => [
    %{
      "name" => "rpool",
      "health" => "ONLINE",
      "pool_type" => "mirror",
      "size_bytes" => 500_000_000_000,
      "allocated_bytes" => 200_000_000_000,
      "free_bytes" => 300_000_000_000,
      "capacity_percent" => 40,
      "fragmentation_percent" => 17,
      "error_count" => 0,
      "vdev_count" => 1,
      "degraded_vdev_count" => 0,
      "scan_function" => "scrub",
      "scan_state" => "FINISHED",
      "last_scrub_end" => "Sat Apr 19 02:00:00 2026",
      "vdevs" => [
        %{"name" => "mirror-0", "type" => "mirror", "state" => "ONLINE",
          "read_errors" => 0, "write_errors" => 0, "checksum_errors" => 0}
      ]
    }
  ]
},
```
Then in the `"renders sections..."` test, after the existing assertions, add:
```elixir
assert html =~ "mirror"
assert html =~ "465.7 GB" # size_bytes formatted
assert html =~ "186.3 GB" # allocated_bytes formatted
assert html =~ "279.4 GB" # free_bytes formatted
assert html =~ "capbar"
assert html =~ "scrub"
```
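(Expected `format_bytes/1` output: the helper divides by 1024 per step, so 500 GB decimal is 465.7 GiB, 200 GB is 186.3 GiB, and 300 GB is 279.4 GiB. The rendered label is still `GB`, which is why the assertions match `"465.7 GB"` and so on.)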
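To double-check the expected strings before running the suite, the arithmetic can be reproduced with a small stand-in. This is a sketch only: `FormatBytesSketch` is a hypothetical module, and its unit list and one-decimal rounding are assumptions chosen to match the assertions above, not the actual `format_bytes/1` in `host_detail_live.ex`.

```elixir
defmodule FormatBytesSketch do
  # Assumed unit labels; the real helper may use a different list.
  @units ~w(B KB MB GB TB PB)

  # Divide by 1024 per step until the value fits the current unit.
  def format_bytes(bytes) when is_integer(bytes) and bytes >= 0 do
    scale(bytes * 1.0, @units)
  end

  # Stop when only the largest unit is left or the value is below 1024.
  defp scale(value, [unit]), do: render(value, unit)
  defp scale(value, [unit | _rest]) when value < 1024, do: render(value, unit)
  defp scale(value, [_unit | rest]), do: scale(value / 1024, rest)

  # One decimal place, matching the strings asserted in the LiveView test.
  defp render(value, unit) do
    "#{:erlang.float_to_binary(value, decimals: 1)} #{unit}"
  end
end

IO.puts(FormatBytesSketch.format_bytes(500_000_000_000))
IO.puts(FormatBytesSketch.format_bytes(200_000_000_000))
IO.puts(FormatBytesSketch.format_bytes(300_000_000_000))
```

If the real helper rounds or labels differently, the three size assertions in Step 1 are the lines to adjust.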
### - [ ] Step 2: Run LiveView test — expect FAIL
Run: `cd server && mix test test/server_web/live/host_detail_live_test.exs`
Expected: `"renders sections..."` fails on the new assertions (`"mirror"`, sizes, `"capbar"`).
### - [ ] Step 3: Replace the pool-rendering block in the LiveView
In `server/lib/server_web/live/host_detail_live.ex`, replace the panel that renders ZFS pools (current lines 65-87, the `<div class="panel">` containing `<header><span>ZFS pools</span>` down through its closing `</div>`) with:
```heex
<div class="panel">
  <header><span>ZFS pools</span><span class="mono">{length(pools(@fast))}</span></header>
  <div class="body tight">
    <div :if={pools(@fast) == []} class="empty">No data.</div>
    <div :for={pool <- pools(@fast)} class="pool-block">
      <div class="head">
        <div>
          <span class="mono" style="color: var(--fg-bright); font-weight: 600;">{pool["name"]}</span>
          <span class="layout">{pool_layout(pool)}</span>
        </div>
        <span class="badge" style={pool_badge_style(pool["health"])}>{pool["health"]}</span>
      </div>
      <div class="capbar" data-level={capbar_level(pool["capacity_percent"])}>
        <span style={"width: #{pool["capacity_percent"] || 0}%"}></span>
      </div>
      <div class="sizes">
        used {format_bytes(pool["allocated_bytes"] || 0)} ·
        free {format_bytes(pool["free_bytes"] || 0)} ·
        total {format_bytes(pool["size_bytes"] || 0)}
        <span class="muted">({pool["capacity_percent"] || 0}%)</span>
      </div>
      <div class="details">
        frag {pool["fragmentation_percent"] || 0}% ·
        err {pool["error_count"] || 0} ·
        vdevs {pool["vdev_count"] || 0} (deg {pool["degraded_vdev_count"] || 0}) ·
        {pool_scrub_line(pool)}
      </div>
      <div :for={v <- degraded_vdevs(pool)} class="callout err" style="margin-top: 0.4rem;">
        {v["name"]} {v["state"]} · r={v["read_errors"]} w={v["write_errors"]} cksum={v["checksum_errors"]}
      </div>
    </div>
  </div>
</div>
```
Then add these helper functions near the other private helpers (after `pool_badge_style/1`):
```elixir
defp pool_layout(pool) do
  case pool["pool_type"] do
    nil -> "—"
    "" -> "—"
    t -> t
  end
end

defp capbar_level(cap) when is_number(cap) and cap >= 90, do: "crit"
defp capbar_level(cap) when is_number(cap) and cap >= 80, do: "warn"
defp capbar_level(_), do: "ok"

defp pool_scrub_line(%{"scan_state" => "SCANNING"}), do: "scrub scanning"

defp pool_scrub_line(%{"scan_state" => "FINISHED", "last_scrub_end" => end_time})
     when is_binary(end_time) and end_time != "",
     do: "scrub #{end_time}"

defp pool_scrub_line(%{"last_scrub_end" => end_time}) when is_binary(end_time) and end_time != "",
  do: "scrub #{end_time}"

defp pool_scrub_line(_), do: "scrub never"

defp degraded_vdevs(pool) do
  (pool["vdevs"] || [])
  |> Enum.filter(fn v -> Map.get(v, "state") not in [nil, "ONLINE"] end)
end
```
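The threshold and scrub-label behaviour of these helpers can be checked at the boundaries without mounting the LiveView. The sketch below assumes a hypothetical `PoolHelpersSketch` module holding public copies of three of the private helpers; the module name and the sample maps are illustrative only.

```elixir
defmodule PoolHelpersSketch do
  # Public copies of the private helpers so they can be called from a script or iex.
  def capbar_level(cap) when is_number(cap) and cap >= 90, do: "crit"
  def capbar_level(cap) when is_number(cap) and cap >= 80, do: "warn"
  def capbar_level(_), do: "ok"

  def pool_scrub_line(%{"scan_state" => "SCANNING"}), do: "scrub scanning"

  def pool_scrub_line(%{"scan_state" => "FINISHED", "last_scrub_end" => end_time})
      when is_binary(end_time) and end_time != "",
      do: "scrub #{end_time}"

  def pool_scrub_line(%{"last_scrub_end" => end_time})
      when is_binary(end_time) and end_time != "",
      do: "scrub #{end_time}"

  def pool_scrub_line(_), do: "scrub never"

  # Keeps only vdevs whose state is present and not ONLINE.
  def degraded_vdevs(pool) do
    (pool["vdevs"] || [])
    |> Enum.filter(fn v -> Map.get(v, "state") not in [nil, "ONLINE"] end)
  end
end

# Boundary values for the capacity bar; nil (missing field) falls back to "ok".
IO.inspect(Enum.map([79, 80, 89.9, 90, nil], &PoolHelpersSketch.capbar_level/1))

# A running scrub wins over a stale end time from the previous run.
IO.inspect(
  PoolHelpersSketch.pool_scrub_line(%{
    "scan_state" => "SCANNING",
    "last_scrub_end" => "Tue Mar 01 08:00:00 2026"
  })
)

IO.inspect(PoolHelpersSketch.pool_scrub_line(%{}))
```

Note that the `"FINISHED"` clause and the generic `"last_scrub_end"` clause produce the same string, so the rendered label is `scrub <end time>` whenever an end time is present and no scrub is running.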
### - [ ] Step 4: Run LiveView test — expect PASS
Run: `cd server && mix test test/server_web/live/host_detail_live_test.exs`
Expected: all tests pass.
### - [ ] Step 5: Run the full server suite to catch regressions
Run: `cd server && mix test`
Expected: all tests pass.
### - [ ] Step 6: Run the full agent suite to catch regressions
Run: `cd agent && mix test`
Expected: all tests pass.
### - [ ] Step 7: Manual visual check (dev server)
Start the server locally (`cd server && mix phx.server`), log in, open a host detail page with live agent data, and confirm:

- Each pool shows `name pool_type` on line 1 with the health badge on the right.
- The capacity bar renders at the correct width and turns yellow/red at 80% / 90%.
- The `used / free / total` line shows bytes formatted like `200.0 GB`.
- The `details` line shows frag/err/vdevs and a scrub label (`scrub <end time>`, `scrub scanning`, or `scrub never`).
- Degraded pools list each non-ONLINE vdev in a red `.callout.err` line; ONLINE pools don't.

If the manual check reveals a rendering issue, fix it in `host_detail_live.ex` and re-run `cd server && mix test`.
### - [ ] Step 8: Commit
```bash
git add server/lib/server_web/live/host_detail_live.ex \
server/test/server_web/live/host_detail_live_test.exs
git commit -m "feat(ui): detailed per-pool block with type, capacity bar, scrub state"
```