diff --git a/docs/superpowers/specs/2026-04-22-zfs-pool-detail-design.md b/docs/superpowers/specs/2026-04-22-zfs-pool-detail-design.md
new file mode 100644
index 0000000..2b87092
--- /dev/null
+++ b/docs/superpowers/specs/2026-04-22-zfs-pool-detail-design.md
@@ -0,0 +1,159 @@
+# ZFS Pool Detail — Design
+
+Date: 2026-04-22
+Status: approved (pending implementation)
+
+## Problem
+
+The host detail view renders a compact row per ZFS pool today (`server/lib/server_web/live/host_detail_live.ex:67`):
+
+```
+rpool [ONLINE]
+cap 0% · frag 0% · err 0 · vdevs 4 (deg 0) scrub never
+```
+
+This hides information the user needs at first glance:
+
+- Total / used / free size (bytes are already collected but never rendered).
+- Pool layout (mirror / raidz1 / raidz2 / stripe / mixed) — not collected.
+- Scan state — only `end_time` is kept, so an in-progress scrub looks like a finished one.
+
+The original concept doc calls for "Health, Capacity-Bar, Fragmentation, Error-Counters, Scrub-Info, vdev-Liste" per pool (`proxmox-monitor-konzept.md:227`). We never finished that.
+
+## Goal
+
+One compact block per pool that answers at a glance: *is it healthy, what layout is it, how full is it, is a scrub running*. No drill-down yet.
+
+## Scope
+
+In scope:
+
+1. Agent collector enrichment — derive `pool_type`, keep vdev summary list, keep scan function/state. No new shell-outs; `zpool status -j --json-flat-vdevs --json-int` already returns all of this.
+2. Host detail LiveView — replace the current single-line pool row with a richer compact block (see layout below).
+3. Capacity bar styling in `server/assets/css/app.css`.
+4. Tests — extend `agent/test/proxmox_agent/collectors/zfs_test.exs` fixtures and assertions for the new fields.
+
+Out of scope (YAGNI):
+
+- Drill-down view with per-vdev disk state, resilver progress bars, or scan history.
+- Persistence schema changes — the payload is stored as a JSON blob; adding keys is additive.
+- Storage/dataset/VM panel changes — separate conversation.
+
+## Agent changes
+
+### Collector output
+
+Extend `ProxmoxAgent.Collectors.Zfs.pool_summary` with four fields:
+
+```elixir
+%{
+  # existing fields unchanged:
+  name:, health:, size_bytes:, allocated_bytes:, free_bytes:,
+  fragmentation_percent:, capacity_percent:, error_count:,
+  vdev_count:, degraded_vdev_count:, last_scrub_end:,
+
+  # new:
+  pool_type: String.t(),            # "mirror" | "raidz1" | "raidz2" | "raidz3" | "stripe" | "mixed"
+  scan_function: String.t() | nil,  # "scrub" | "resilver" | nil
+  scan_state: String.t() | nil,     # "SCANNING" | "FINISHED" | "CANCELED" | nil
+  vdevs: [%{name: String.t(), type: String.t(), state: String.t(),
+            read_errors: non_neg_integer(), write_errors: non_neg_integer(),
+            checksum_errors: non_neg_integer()}]
+}
+```
+
+### Derivation rules
+
+`pool_type` is derived from the set of `vdev_type` values across top-level vdevs:
+
+- All vdevs share the same redundant type → that type (`"mirror"`, `"raidz1"`, `"raidz2"`, `"raidz3"`).
+- All vdevs are `disk` (plain top-level disk with no redundancy) → `"stripe"`.
+- Anything else → `"mixed"`.
+
+Special vdev types (`log`, `cache`, `spare`, `dedup`, `special`) are ignored for layout classification — they don't change the data redundancy story. They are still included in the `vdevs` list.
+
+`scan_function` and `scan_state` read `get_in(status_info, ["scan", "function"])` and `get_in(status_info, ["scan", "state"])` respectively.
+
+Per-vdev numeric fields (`read_errors`, `write_errors`, `checksum_errors`) are parsed the same way `error_count` already is (string or int tolerant).
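+
+The derivation rules above could be sketched roughly as follows. This is a minimal illustration, not the collector's actual code: the module name `PoolTypeSketch` and the functions `pool_type/1` and `to_count/1` are hypothetical, the input is assumed to be the list of `vdev_type` strings of the top-level vdevs, and a pool with no data vdevs is assumed to fall under "anything else". How `error_count` is parsed today is not shown in this spec, so `to_count/1` is only one plausible reading of "string or int tolerant".
+
+```elixir
+defmodule PoolTypeSketch do
+  # Hypothetical module; names and input shape are assumptions for illustration.
+
+  # Special vdev types that are ignored for layout classification.
+  @special ~w(log cache spare dedup special)
+
+  # Classifies the pool layout from the vdev_type values of top-level vdevs.
+  def pool_type(vdev_types) when is_list(vdev_types) do
+    data_types =
+      vdev_types
+      |> Enum.reject(&(&1 in @special))
+      |> Enum.uniq()
+
+    case data_types do
+      # Only plain disks: no redundancy.
+      ["disk"] -> "stripe"
+      # Exactly one distinct type, e.g. "mirror" or "raidz2".
+      [single] -> single
+      # Several distinct types, or no data vdevs at all (assumption).
+      _ -> "mixed"
+    end
+  end
+
+  # Accepts a non-negative integer or a numeric string; anything else counts as 0.
+  def to_count(n) when is_integer(n) and n >= 0, do: n
+
+  def to_count(s) when is_binary(s) do
+    case Integer.parse(s) do
+      {n, _rest} when n >= 0 -> n
+      _ -> 0
+    end
+  end
+
+  def to_count(_), do: 0
+end
+```
+
+Matching `["disk"]` before the generic single-type clause keeps the "stripe" rule from being swallowed by "all vdevs the same type".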
+
+### Tests
+
+`agent/test/fixtures/zfs/zpool_status.json` already has a mirror and a raidz2 pool; extend assertions in `zfs_test.exs`:
+
+- `rpool.pool_type == "mirror"`
+- `tank.pool_type == "raidz2"`
+- `rpool.scan_state == "FINISHED"`
+- `rpool.vdevs` has length 1 with `type: "mirror"`, `state: "ONLINE"`
+
+Add one new fixture-free unit test covering the `"stripe"` and `"mixed"` branches by injecting a synthetic runner.
+
+## Server changes
+
+None in the collector pipeline. The channel handler already stores the whole `zfs_pools.pools` list as JSON (`server/lib/server_web/channels/host_channel.ex` — to confirm in plan) and the LiveView reads it with `get_in/2`. New keys flow through automatically.
+
+## UI changes
+
+### Layout
+
+Replace the current `.pool-row` flex block in `host_detail_live.ex:69-86` with a per-pool compact block:
+
+```
+rpool mirror [ONLINE]
+████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 40%
+used 200.0 GB · free 300.0 GB · total 500.0 GB
+frag 17% · err 0 · vdevs 1 (deg 0) · scrub finished 2026-04-19
+```
+
+Element mapping:
+
+- Line 1: pool name (bright mono, bold) · pool_type (muted) · health badge (right).
+- Line 2: capacity bar (div with width % + background color keyed to capacity thresholds).
+- Line 3: used / free / total — rendered with the existing `format_bytes/1` helper.
+- Line 4: the existing compact details line, plus scrub state — `scrub scanning` / `scrub finished <date>` / `scrub never`.
+
+### Capacity bar
+
+CSS in `server/assets/css/app.css`:
+
+```css
+.capbar {
+  height: 4px; background: var(--panel-2); border-radius: 2px;
+  overflow: hidden; margin: 0.25rem 0;
+}
+.capbar > span { display: block; height: 100%; background: var(--ok); }
+.capbar[data-level="warn"] > span { background: var(--warn); }
+.capbar[data-level="crit"] > span { background: var(--crit); }
+```
+
+Thresholds (matching the concept doc at `proxmox-monitor-konzept.md:218-219`), evaluated in order:
+
+- `cap >= 90` → `data-level="crit"`
+- `cap >= 80` → `data-level="warn"`
+- else → default (ok green).
+
+### Degraded pool callout
+
+For ONLINE pools with `degraded_vdev_count == 0`, do not render per-vdev detail — keep it simple. For anything else, render one line per non-ONLINE vdev below the detail line:
+
+```
+! mirror-1 DEGRADED r=0 w=0 cksum=12
+```
+
+Styled with the existing `.callout.err` class.
+
+### Scrub rendering
+
+- `scan_state == "SCANNING"` → `"scrub scanning"` (no date).
+- `scan_state == "FINISHED"` and `last_scrub_end` present → `"scrub finished #{format_date(last_scrub_end)}"`.
+- Otherwise → `"scrub never"`.
+
+`last_scrub_end` is a string like `"Sun Apr 19 02:00:00 2026"` — keep as-is or reformat to `YYYY-MM-DD` with a tiny helper (strptime isn't stdlib-trivial in Elixir; simplest: split on whitespace and reorder). Accept "as-is" if reformatting is ugly.
+
+## Risks
+
+- ZFS JSON output has changed shape between OpenZFS releases. The concept doc requires `OpenZFS 2.3+`. Agent code tolerates missing keys via `Map.get/3` defaults — keep that discipline.
+- `zpool status --json-flat-vdevs` flattens nested mirrors-of-mirrors. Top-level vdevs are keyed by name; pool_type derivation inspects only top-level entries (no child vdev walking needed in the flat form).
+
+## Rollout
+
+Additive collector changes + additive UI. No DB migration, no breaking payload change. Old agents without the new fields take the graceful-degradation path: `pool_type` shows as `—`, the scrub line falls back to `never`, and the capacity bar still renders from the existing bytes.
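+
+To make the fallback behaviour concrete, the capacity thresholds, scrub label and degraded callout could be expressed as small pure helpers. This is a sketch under stated assumptions, not the LiveView's real code: the module `PoolViewSketch` and all function names are hypothetical, the pool is assumed to arrive as a map with string keys (since the payload is stored as JSON), the date is kept as-is rather than reformatted, and the `scrub finished` wording follows the layout mock.
+
+```elixir
+defmodule PoolViewSketch do
+  # Hypothetical helpers; names and the string-keyed map shape are assumptions.
+
+  # Bar width in percent, computed from the byte counts old agents already send.
+  def cap_percent(%{"size_bytes" => size, "allocated_bytes" => used})
+      when is_integer(size) and size > 0 and is_integer(used) do
+    (used * 100) |> div(size) |> max(0) |> min(100)
+  end
+
+  def cap_percent(_pool), do: 0
+
+  # Value for the data-level attribute; nil means the default (ok green).
+  def cap_level(cap) when is_number(cap) and cap >= 90, do: "crit"
+  def cap_level(cap) when is_number(cap) and cap >= 80, do: "warn"
+  def cap_level(_cap), do: nil
+
+  # A missing scan_state (old agent) falls through to "scrub never".
+  def scrub_label(pool) do
+    case {Map.get(pool, "scan_state"), Map.get(pool, "last_scrub_end")} do
+      {"SCANNING", _} -> "scrub scanning"
+      {"FINISHED", date} when is_binary(date) -> "scrub finished " <> date
+      _ -> "scrub never"
+    end
+  end
+
+  # Healthy pools get no per-vdev detail.
+  def callout_lines(%{"health" => "ONLINE", "degraded_vdev_count" => 0}), do: []
+
+  # Otherwise: one line per non-ONLINE vdev; a missing vdevs list yields [].
+  def callout_lines(pool) do
+    pool
+    |> Map.get("vdevs", [])
+    |> Enum.reject(&(Map.get(&1, "state") == "ONLINE"))
+    |> Enum.map(fn v ->
+      "! #{v["name"]} #{v["state"]} r=#{v["read_errors"] || 0} " <>
+        "w=#{v["write_errors"] || 0} cksum=#{v["checksum_errors"] || 0}"
+    end)
+  end
+end
+```
+
+Every helper has a catch-all clause or a default, so a payload from an agent that predates the new keys still renders without raising.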