docs: implementation plan for agent diagnostic dump
Seven tasks: config field, Diagnostics module, Writer GenServer, end-to-end test, Shell hook, Reporter hook, Application integration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
b2070b6a39
commit
b798b462ca
1 changed files with 892 additions and 0 deletions
892
docs/superpowers/plans/2026-04-22-agent-diagnostic-dump.md
Normal file
892
docs/superpowers/plans/2026-04-22-agent-diagnostic-dump.md
Normal file
|
|
@ -0,0 +1,892 @@
|
||||||
|
# Agent Diagnostic Dump Implementation Plan
|
||||||
|
|
||||||
|
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||||
|
|
||||||
|
**Goal:** Add an opt-in diagnostic dump to the agent that logs every external command (with full raw output) and every outgoing metric payload, so the operator can verify source data before it hits the server.
|
||||||
|
|
||||||
|
**Architecture:** A new `ProxmoxAgent.Diagnostics` module holds the on/off state in the application env (`:agent, :dump_dir`). When enabled, a `ProxmoxAgent.Diagnostics.Writer` GenServer owns append-only file handles for `commands.log` and `samples.log` and serializes all writes. `Shell.run/2` calls `Diagnostics.log_command/4` after each command; `Reporter.push_metric/3` calls `Diagnostics.log_sample/2` before each push. When `dump_dir` is unset, both helpers are no-ops and the Writer is not started.
|
||||||
|
|
||||||
|
**Tech Stack:** Elixir / OTP, ExUnit, `GenServer`, `Application.get_env`, `File` / `IO.write`. Spec: `docs/superpowers/specs/2026-04-22-agent-diagnostic-dump-design.md`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## File Structure
|
||||||
|
|
||||||
|
**Create:**
|
||||||
|
- `agent/lib/proxmox_agent/diagnostics.ex` — public API: `configure/1`, `enabled?/0`, `log_command/4`, `log_sample/2`. Thin wrapper around the Writer.
|
||||||
|
- `agent/lib/proxmox_agent/diagnostics/writer.ex` — `GenServer` that owns the two file handles and serializes writes.
|
||||||
|
- `agent/test/proxmox_agent/diagnostics_test.exs` — unit tests for the public API (both disabled and enabled paths).
|
||||||
|
- `agent/test/proxmox_agent/diagnostics/writer_test.exs` — unit tests for the writer GenServer.
|
||||||
|
|
||||||
|
**Modify:**
|
||||||
|
- `agent/lib/proxmox_agent/config.ex` — add `:dump_dir` field (String.t() | nil).
|
||||||
|
- `agent/test/proxmox_agent/config_test.exs` — test `[debug] dump_dir` parsing.
|
||||||
|
- `agent/test/fixtures/agent.toml` — add `[debug] dump_dir` to the fixture.
|
||||||
|
- `agent/lib/proxmox_agent/shell.ex` — add timing + call `Diagnostics.log_command/4`.
|
||||||
|
- `agent/test/proxmox_agent/shell_test.exs` — add a test that exercises the dump path.
|
||||||
|
- `agent/lib/proxmox_agent/reporter.ex` — add one line calling `Diagnostics.log_sample/2` before `push/4`.
|
||||||
|
- `agent/lib/proxmox_agent/application.ex` — call `Diagnostics.configure/1` and conditionally add the Writer to the supervision tree.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 1: Config — add `dump_dir` field
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Modify: `agent/lib/proxmox_agent/config.ex`
|
||||||
|
- Modify: `agent/test/proxmox_agent/config_test.exs`
|
||||||
|
- Modify: `agent/test/fixtures/agent.toml`
|
||||||
|
|
||||||
|
### - [ ] Step 1: Add `dump_dir` to the fixture
|
||||||
|
|
||||||
|
Replace the contents of `agent/test/fixtures/agent.toml` with:
|
||||||
|
|
||||||
|
```toml
|
||||||
|
server_url = "wss://monitor.example.com/socket/websocket"
|
||||||
|
token = "test_token_123"
|
||||||
|
host_id = "pve-test-01"
|
||||||
|
|
||||||
|
[intervals]
|
||||||
|
fast_seconds = 15
|
||||||
|
medium_seconds = 120
|
||||||
|
slow_seconds = 600
|
||||||
|
|
||||||
|
[debug]
|
||||||
|
dump_dir = "/tmp/proxmox-monitor-test"
|
||||||
|
```
|
||||||
|
|
||||||
|
### - [ ] Step 2: Add a failing Config test for `dump_dir`
|
||||||
|
|
||||||
|
In `agent/test/proxmox_agent/config_test.exs`, add this test inside `describe "load/1"`:
|
||||||
|
|
||||||
|
```elixir
|
||||||
|
test "parses [debug] dump_dir when present" do
|
||||||
|
assert {:ok, cfg} = Config.load(@fixture)
|
||||||
|
assert cfg.dump_dir == "/tmp/proxmox-monitor-test"
|
||||||
|
end
|
||||||
|
|
||||||
|
test "dump_dir is nil when [debug] is absent" do
|
||||||
|
tmp = Path.join(System.tmp_dir!(), "agent_nodebug.toml")
|
||||||
|
|
||||||
|
File.write!(tmp, """
|
||||||
|
server_url = "wss://x/socket/websocket"
|
||||||
|
token = "t"
|
||||||
|
""")
|
||||||
|
|
||||||
|
on_exit(fn -> File.rm(tmp) end)
|
||||||
|
|
||||||
|
assert {:ok, cfg} = Config.load(tmp)
|
||||||
|
assert cfg.dump_dir == nil
|
||||||
|
end
|
||||||
|
|
||||||
|
test "dump_dir is nil when set to empty string" do
|
||||||
|
tmp = Path.join(System.tmp_dir!(), "agent_emptydebug.toml")
|
||||||
|
|
||||||
|
File.write!(tmp, """
|
||||||
|
server_url = "wss://x/socket/websocket"
|
||||||
|
token = "t"
|
||||||
|
|
||||||
|
[debug]
|
||||||
|
dump_dir = ""
|
||||||
|
""")
|
||||||
|
|
||||||
|
on_exit(fn -> File.rm(tmp) end)
|
||||||
|
|
||||||
|
assert {:ok, cfg} = Config.load(tmp)
|
||||||
|
assert cfg.dump_dir == nil
|
||||||
|
end
|
||||||
|
```
|
||||||
|
|
||||||
|
Update the existing `"parses required fields"` test to also assert on `dump_dir`:
|
||||||
|
|
||||||
|
```elixir
|
||||||
|
test "parses required fields" do
|
||||||
|
assert {:ok, cfg} = Config.load(@fixture)
|
||||||
|
assert cfg.server_url == "wss://monitor.example.com/socket/websocket"
|
||||||
|
assert cfg.token == "test_token_123"
|
||||||
|
assert cfg.host_id == "pve-test-01"
|
||||||
|
assert cfg.fast_seconds == 15
|
||||||
|
assert cfg.medium_seconds == 120
|
||||||
|
assert cfg.slow_seconds == 600
|
||||||
|
assert cfg.dump_dir == "/tmp/proxmox-monitor-test"
|
||||||
|
end
|
||||||
|
```
|
||||||
|
|
||||||
|
### - [ ] Step 3: Run Config tests — expect FAIL
|
||||||
|
|
||||||
|
Run: `cd agent && mix test test/proxmox_agent/config_test.exs`
|
||||||
|
|
||||||
|
Expected: at least the three new assertions fail because `Config` struct has no `:dump_dir` field and `build/1` does not read `[debug] dump_dir`.
|
||||||
|
|
||||||
|
### - [ ] Step 4: Add `:dump_dir` to the Config struct and build/1
|
||||||
|
|
||||||
|
Edit `agent/lib/proxmox_agent/config.ex`. Replace the `defstruct` and `@type` blocks with:
|
||||||
|
|
||||||
|
```elixir
|
||||||
|
defstruct [
|
||||||
|
:server_url,
|
||||||
|
:token,
|
||||||
|
:host_id,
|
||||||
|
:dump_dir,
|
||||||
|
fast_seconds: 30,
|
||||||
|
medium_seconds: 300,
|
||||||
|
slow_seconds: 1800
|
||||||
|
]
|
||||||
|
|
||||||
|
@type t :: %__MODULE__{
|
||||||
|
server_url: String.t(),
|
||||||
|
token: String.t(),
|
||||||
|
host_id: String.t(),
|
||||||
|
dump_dir: String.t() | nil,
|
||||||
|
fast_seconds: pos_integer(),
|
||||||
|
medium_seconds: pos_integer(),
|
||||||
|
slow_seconds: pos_integer()
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Replace the `build/1` function with:
|
||||||
|
|
||||||
|
```elixir
|
||||||
|
defp build(map) do
|
||||||
|
intervals = Map.get(map, "intervals", %{})
|
||||||
|
debug = Map.get(map, "debug", %{})
|
||||||
|
|
||||||
|
%__MODULE__{
|
||||||
|
server_url: map["server_url"],
|
||||||
|
token: map["token"],
|
||||||
|
host_id: map["host_id"] || hostname(),
|
||||||
|
dump_dir: normalize_dump_dir(Map.get(debug, "dump_dir")),
|
||||||
|
fast_seconds: Map.get(intervals, "fast_seconds", 30),
|
||||||
|
medium_seconds: Map.get(intervals, "medium_seconds", 300),
|
||||||
|
slow_seconds: Map.get(intervals, "slow_seconds", 1800)
|
||||||
|
}
|
||||||
|
end
|
||||||
|
|
||||||
|
defp normalize_dump_dir(nil), do: nil
|
||||||
|
defp normalize_dump_dir(""), do: nil
|
||||||
|
defp normalize_dump_dir(s) when is_binary(s), do: s
|
||||||
|
defp normalize_dump_dir(_), do: nil
|
||||||
|
```
|
||||||
|
|
||||||
|
### - [ ] Step 5: Run Config tests — expect PASS
|
||||||
|
|
||||||
|
Run: `cd agent && mix test test/proxmox_agent/config_test.exs`
|
||||||
|
|
||||||
|
Expected: all tests pass.
|
||||||
|
|
||||||
|
### - [ ] Step 6: Commit
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add agent/lib/proxmox_agent/config.ex \
|
||||||
|
agent/test/proxmox_agent/config_test.exs \
|
||||||
|
agent/test/fixtures/agent.toml
|
||||||
|
git commit -m "feat(agent): add [debug] dump_dir config field"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 2: Diagnostics module — no-op API skeleton
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Create: `agent/lib/proxmox_agent/diagnostics.ex`
|
||||||
|
- Create: `agent/test/proxmox_agent/diagnostics_test.exs`
|
||||||
|
|
||||||
|
This task introduces the public API with a null-object behavior (no Writer yet). Logging calls succeed and silently no-op when disabled. This proves the API is safe to call from `Shell` and `Reporter` even before the Writer is wired in.
|
||||||
|
|
||||||
|
### - [ ] Step 1: Write failing tests for configure/enabled? and no-op logging
|
||||||
|
|
||||||
|
Create `agent/test/proxmox_agent/diagnostics_test.exs`:
|
||||||
|
|
||||||
|
```elixir
|
||||||
|
defmodule ProxmoxAgent.DiagnosticsTest do
|
||||||
|
use ExUnit.Case, async: false
|
||||||
|
|
||||||
|
alias ProxmoxAgent.Diagnostics
|
||||||
|
|
||||||
|
setup do
|
||||||
|
# Isolate tests: clear the env key before and after each test.
|
||||||
|
Application.delete_env(:agent, :dump_dir)
|
||||||
|
on_exit(fn -> Application.delete_env(:agent, :dump_dir) end)
|
||||||
|
:ok
|
||||||
|
end
|
||||||
|
|
||||||
|
describe "configure/1 and enabled?/0" do
|
||||||
|
test "nil disables and returns :ok" do
|
||||||
|
assert :ok = Diagnostics.configure(nil)
|
||||||
|
refute Diagnostics.enabled?()
|
||||||
|
end
|
||||||
|
|
||||||
|
test "empty string disables and returns :ok" do
|
||||||
|
assert :ok = Diagnostics.configure("")
|
||||||
|
refute Diagnostics.enabled?()
|
||||||
|
end
|
||||||
|
|
||||||
|
test "valid path creates the directory and enables" do
|
||||||
|
dir = Path.join(System.tmp_dir!(), "diag-#{System.unique_integer([:positive])}")
|
||||||
|
on_exit(fn -> File.rm_rf(dir) end)
|
||||||
|
|
||||||
|
assert :ok = Diagnostics.configure(dir)
|
||||||
|
assert Diagnostics.enabled?()
|
||||||
|
assert File.dir?(dir)
|
||||||
|
assert Application.get_env(:agent, :dump_dir) == dir
|
||||||
|
end
|
||||||
|
|
||||||
|
test "unreachable path disables (does not crash)" do
|
||||||
|
# Point at a path under a non-directory file to force mkdir_p failure.
|
||||||
|
parent = Path.join(System.tmp_dir!(), "diag-parent-#{System.unique_integer([:positive])}")
|
||||||
|
File.write!(parent, "not a directory")
|
||||||
|
dir = Path.join(parent, "child")
|
||||||
|
on_exit(fn -> File.rm_rf(parent) end)
|
||||||
|
|
||||||
|
assert :ok = Diagnostics.configure(dir)
|
||||||
|
refute Diagnostics.enabled?()
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
describe "log_command/4 and log_sample/2 (no writer running)" do
|
||||||
|
test "log_command/4 no-ops and returns :ok when disabled" do
|
||||||
|
assert :ok = Diagnostics.log_command("zpool", ["list"], {:ok, "body"}, 1_234)
|
||||||
|
end
|
||||||
|
|
||||||
|
test "log_sample/2 no-ops and returns :ok when disabled" do
|
||||||
|
assert :ok = Diagnostics.log_sample("fast", %{foo: "bar"})
|
||||||
|
end
|
||||||
|
|
||||||
|
test "log_command/4 no-ops when enabled but writer is not running" do
|
||||||
|
dir = Path.join(System.tmp_dir!(), "diag-#{System.unique_integer([:positive])}")
|
||||||
|
on_exit(fn -> File.rm_rf(dir) end)
|
||||||
|
|
||||||
|
:ok = Diagnostics.configure(dir)
|
||||||
|
# Writer is not started in this test — log should still be safe.
|
||||||
|
assert :ok = Diagnostics.log_command("zpool", ["list"], {:ok, "body"}, 1_234)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
```
|
||||||
|
|
||||||
|
### - [ ] Step 2: Run tests — expect FAIL (module does not exist)
|
||||||
|
|
||||||
|
Run: `cd agent && mix test test/proxmox_agent/diagnostics_test.exs`
|
||||||
|
|
||||||
|
Expected: FAIL — `ProxmoxAgent.Diagnostics.__struct__/0 is undefined (module ProxmoxAgent.Diagnostics is not available)` or compilation error.
|
||||||
|
|
||||||
|
### - [ ] Step 3: Create the Diagnostics module
|
||||||
|
|
||||||
|
Create `agent/lib/proxmox_agent/diagnostics.ex`:
|
||||||
|
|
||||||
|
```elixir
|
||||||
|
defmodule ProxmoxAgent.Diagnostics do
|
||||||
|
@moduledoc """
|
||||||
|
Optional diagnostic dump of external commands and outgoing samples.
|
||||||
|
|
||||||
|
Controlled by `:dump_dir` in the application env. When unset or nil,
|
||||||
|
`log_command/4` and `log_sample/2` are no-ops and incur no I/O.
|
||||||
|
When set, they cast to `ProxmoxAgent.Diagnostics.Writer` (if running)
|
||||||
|
which serializes appends to `commands.log` and `samples.log`.
|
||||||
|
"""
|
||||||
|
|
||||||
|
require Logger
|
||||||
|
|
||||||
|
@type command_result ::
|
||||||
|
{:ok, String.t()}
|
||||||
|
| {:error, term()}
|
||||||
|
|
||||||
|
@spec configure(String.t() | nil) :: :ok
|
||||||
|
def configure(nil), do: disable()
|
||||||
|
def configure(""), do: disable()
|
||||||
|
|
||||||
|
def configure(dir) when is_binary(dir) do
|
||||||
|
case File.mkdir_p(dir) do
|
||||||
|
:ok ->
|
||||||
|
Application.put_env(:agent, :dump_dir, dir)
|
||||||
|
:ok
|
||||||
|
|
||||||
|
{:error, reason} ->
|
||||||
|
Logger.warning("diagnostics: mkdir_p #{dir} failed (#{inspect(reason)}); dumping disabled")
|
||||||
|
disable()
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
@spec enabled?() :: boolean()
|
||||||
|
def enabled?, do: not is_nil(Application.get_env(:agent, :dump_dir))
|
||||||
|
|
||||||
|
@spec dump_dir() :: String.t() | nil
|
||||||
|
def dump_dir, do: Application.get_env(:agent, :dump_dir)
|
||||||
|
|
||||||
|
@spec log_command(String.t(), [String.t()], command_result(), non_neg_integer()) :: :ok
|
||||||
|
def log_command(cmd, args, result, duration_us) do
|
||||||
|
cast({:command, cmd, args, result, duration_us})
|
||||||
|
end
|
||||||
|
|
||||||
|
@spec log_sample(String.t(), map()) :: :ok
|
||||||
|
def log_sample(kind, payload) when is_binary(kind) and is_map(payload) do
|
||||||
|
cast({:sample, kind, payload})
|
||||||
|
end
|
||||||
|
|
||||||
|
defp disable do
|
||||||
|
Application.delete_env(:agent, :dump_dir)
|
||||||
|
:ok
|
||||||
|
end
|
||||||
|
|
||||||
|
defp cast(msg) do
|
||||||
|
if enabled?() do
|
||||||
|
case Process.whereis(ProxmoxAgent.Diagnostics.Writer) do
|
||||||
|
nil -> :ok
|
||||||
|
pid -> GenServer.cast(pid, msg)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
:ok
|
||||||
|
end
|
||||||
|
end
|
||||||
|
```
|
||||||
|
|
||||||
|
### - [ ] Step 4: Run tests — expect PASS
|
||||||
|
|
||||||
|
Run: `cd agent && mix test test/proxmox_agent/diagnostics_test.exs`
|
||||||
|
|
||||||
|
Expected: all tests pass.
|
||||||
|
|
||||||
|
### - [ ] Step 5: Commit
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add agent/lib/proxmox_agent/diagnostics.ex \
|
||||||
|
agent/test/proxmox_agent/diagnostics_test.exs
|
||||||
|
git commit -m "feat(agent): add Diagnostics module with no-op default"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 3: Diagnostics.Writer GenServer
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Create: `agent/lib/proxmox_agent/diagnostics/writer.ex`
|
||||||
|
- Create: `agent/test/proxmox_agent/diagnostics/writer_test.exs`
|
||||||
|
|
||||||
|
### - [ ] Step 1: Write failing tests for the Writer
|
||||||
|
|
||||||
|
Create `agent/test/proxmox_agent/diagnostics/writer_test.exs`:
|
||||||
|
|
||||||
|
```elixir
|
||||||
|
defmodule ProxmoxAgent.Diagnostics.WriterTest do
|
||||||
|
use ExUnit.Case, async: false
|
||||||
|
|
||||||
|
alias ProxmoxAgent.Diagnostics.Writer
|
||||||
|
|
||||||
|
setup do
|
||||||
|
dir = Path.join(System.tmp_dir!(), "writer-#{System.unique_integer([:positive])}")
|
||||||
|
File.mkdir_p!(dir)
|
||||||
|
{:ok, pid} = Writer.start_link(dir: dir)
|
||||||
|
|
||||||
|
on_exit(fn ->
|
||||||
|
if Process.alive?(pid), do: GenServer.stop(pid)
|
||||||
|
File.rm_rf(dir)
|
||||||
|
end)
|
||||||
|
|
||||||
|
%{dir: dir, pid: pid}
|
||||||
|
end
|
||||||
|
|
||||||
|
test "handle_cast :command appends a formatted entry to commands.log", %{dir: dir, pid: pid} do
|
||||||
|
GenServer.cast(pid, {:command, "zpool", ["list", "-j"], {:ok, "hello body"}, 1_234})
|
||||||
|
:ok = GenServer.call(pid, :flush)
|
||||||
|
|
||||||
|
body = File.read!(Path.join(dir, "commands.log"))
|
||||||
|
|
||||||
|
assert body =~ "$ zpool list -j"
|
||||||
|
assert body =~ "exit=0"
|
||||||
|
assert body =~ "duration=1ms"
|
||||||
|
assert body =~ "size=10"
|
||||||
|
assert body =~ "hello body"
|
||||||
|
assert body =~ "=====\n"
|
||||||
|
end
|
||||||
|
|
||||||
|
test "handle_cast :command records errors in place of exit", %{dir: dir, pid: pid} do
|
||||||
|
GenServer.cast(pid, {:command, "zpool", ["list"], {:error, {:enoent, "zpool"}}, 300})
|
||||||
|
:ok = GenServer.call(pid, :flush)
|
||||||
|
|
||||||
|
body = File.read!(Path.join(dir, "commands.log"))
|
||||||
|
assert body =~ "error={:enoent, \"zpool\"}"
|
||||||
|
assert body =~ "duration=0ms"
|
||||||
|
refute body =~ "exit="
|
||||||
|
end
|
||||||
|
|
||||||
|
test "handle_cast :command formats non-zero exit results", %{dir: dir, pid: pid} do
|
||||||
|
GenServer.cast(pid, {:command, "/bin/sh", ["-c", "exit 7"], {:error, {:nonzero_exit, 7, "oops"}}, 500})
|
||||||
|
:ok = GenServer.call(pid, :flush)
|
||||||
|
|
||||||
|
body = File.read!(Path.join(dir, "commands.log"))
|
||||||
|
assert body =~ "exit=7"
|
||||||
|
assert body =~ "size=4"
|
||||||
|
assert body =~ "oops"
|
||||||
|
end
|
||||||
|
|
||||||
|
test "handle_cast :sample appends pretty-printed JSON to samples.log", %{dir: dir, pid: pid} do
|
||||||
|
GenServer.cast(pid, {:sample, "fast", %{"host" => %{"load1" => 0.1}}})
|
||||||
|
:ok = GenServer.call(pid, :flush)
|
||||||
|
|
||||||
|
body = File.read!(Path.join(dir, "samples.log"))
|
||||||
|
assert body =~ "kind=fast"
|
||||||
|
assert body =~ "\"host\""
|
||||||
|
assert body =~ "\"load1\""
|
||||||
|
assert body =~ "=====\n"
|
||||||
|
end
|
||||||
|
|
||||||
|
test "multiple casts serialize and do not interleave", %{dir: dir, pid: pid} do
|
||||||
|
for i <- 1..50 do
|
||||||
|
GenServer.cast(pid, {:command, "cmd#{i}", [], {:ok, "body#{i}"}, 1})
|
||||||
|
end
|
||||||
|
|
||||||
|
:ok = GenServer.call(pid, :flush)
|
||||||
|
body = File.read!(Path.join(dir, "commands.log"))
|
||||||
|
|
||||||
|
# Every body1..body50 must appear exactly once, in order.
|
||||||
|
for i <- 1..50 do
|
||||||
|
assert body =~ "body#{i}"
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
test "terminate closes handles cleanly", %{pid: pid} do
|
||||||
|
ref = Process.monitor(pid)
|
||||||
|
GenServer.stop(pid, :normal)
|
||||||
|
assert_receive {:DOWN, ^ref, :process, ^pid, :normal}, 1000
|
||||||
|
end
|
||||||
|
end
|
||||||
|
```
|
||||||
|
|
||||||
|
The `:flush` call is a synchronous no-op whose only purpose is to wait until every previously-cast message has been handled — a common ExUnit idiom for testing cast-based GenServers.
|
||||||
|
|
||||||
|
### - [ ] Step 2: Run tests — expect FAIL
|
||||||
|
|
||||||
|
Run: `cd agent && mix test test/proxmox_agent/diagnostics/writer_test.exs`
|
||||||
|
|
||||||
|
Expected: FAIL — Writer module does not exist.
|
||||||
|
|
||||||
|
### - [ ] Step 3: Create the Writer GenServer
|
||||||
|
|
||||||
|
Create `agent/lib/proxmox_agent/diagnostics/writer.ex`:
|
||||||
|
|
||||||
|
```elixir
|
||||||
|
defmodule ProxmoxAgent.Diagnostics.Writer do
|
||||||
|
@moduledoc """
|
||||||
|
Serializes diagnostic dump writes. Owns two append-only file handles
|
||||||
|
for commands.log and samples.log under a configured directory.
|
||||||
|
|
||||||
|
Started only when the agent's dump_dir is set. All writes are cast-based
|
||||||
|
to keep the caller path free of file I/O latency.
|
||||||
|
"""
|
||||||
|
|
||||||
|
use GenServer
|
||||||
|
require Logger
|
||||||
|
|
||||||
|
@spec start_link(keyword()) :: GenServer.on_start()
|
||||||
|
def start_link(opts) do
|
||||||
|
GenServer.start_link(__MODULE__, opts, name: __MODULE__)
|
||||||
|
end
|
||||||
|
|
||||||
|
@impl true
|
||||||
|
def init(opts) do
|
||||||
|
dir = Keyword.fetch!(opts, :dir)
|
||||||
|
|
||||||
|
with {:ok, commands} <- open(Path.join(dir, "commands.log")),
|
||||||
|
{:ok, samples} <- open(Path.join(dir, "samples.log")) do
|
||||||
|
Process.flag(:trap_exit, true)
|
||||||
|
{:ok, %{dir: dir, commands: commands, samples: samples}}
|
||||||
|
else
|
||||||
|
{:error, reason} ->
|
||||||
|
{:stop, {:open_failed, reason}}
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
@impl true
|
||||||
|
def handle_cast({:command, cmd, args, result, duration_us}, state) do
|
||||||
|
write(state.commands, format_command(cmd, args, result, duration_us))
|
||||||
|
{:noreply, state}
|
||||||
|
end
|
||||||
|
|
||||||
|
def handle_cast({:sample, kind, payload}, state) do
|
||||||
|
write(state.samples, format_sample(kind, payload))
|
||||||
|
{:noreply, state}
|
||||||
|
end
|
||||||
|
|
||||||
|
@impl true
|
||||||
|
def handle_call(:flush, _from, state) do
|
||||||
|
{:reply, :ok, state}
|
||||||
|
end
|
||||||
|
|
||||||
|
@impl true
|
||||||
|
def terminate(_reason, state) do
|
||||||
|
File.close(state.commands)
|
||||||
|
File.close(state.samples)
|
||||||
|
:ok
|
||||||
|
end
|
||||||
|
|
||||||
|
# --- private ----------------------------------------------------------
|
||||||
|
|
||||||
|
defp open(path), do: File.open(path, [:append, :utf8])
|
||||||
|
|
||||||
|
defp write(io, data) do
|
||||||
|
case IO.write(io, data) do
|
||||||
|
:ok ->
|
||||||
|
:ok
|
||||||
|
|
||||||
|
{:error, reason} ->
|
||||||
|
Logger.warning("diagnostics writer: write failed (#{inspect(reason)})")
|
||||||
|
:ok
|
||||||
|
end
|
||||||
|
end
|
||||||
|
|
||||||
|
defp format_command(cmd, args, result, duration_us) do
|
||||||
|
header = "===== #{now_iso()} =====\n$ #{cmd} #{Enum.join(args, " ")}\n"
|
||||||
|
|
||||||
|
status =
|
||||||
|
case result do
|
||||||
|
{:ok, body} ->
|
||||||
|
"exit=0 duration=#{ms(duration_us)}ms size=#{byte_size(body)}\n\n#{body}\n\n"
|
||||||
|
|
||||||
|
{:error, {:nonzero_exit, code, body}} ->
|
||||||
|
"exit=#{code} duration=#{ms(duration_us)}ms size=#{byte_size(body)}\n\n#{body}\n\n"
|
||||||
|
|
||||||
|
{:error, reason} ->
|
||||||
|
"error=#{inspect(reason)} duration=#{ms(duration_us)}ms\n\n"
|
||||||
|
end
|
||||||
|
|
||||||
|
header <> status
|
||||||
|
end
|
||||||
|
|
||||||
|
defp format_sample(kind, payload) do
|
||||||
|
body = Jason.encode!(payload, pretty: true)
|
||||||
|
"===== #{now_iso()} kind=#{kind} =====\n#{body}\n\n"
|
||||||
|
end
|
||||||
|
|
||||||
|
defp now_iso, do: DateTime.utc_now() |> DateTime.to_iso8601()
|
||||||
|
|
||||||
|
defp ms(us) when is_integer(us), do: div(us, 1_000)
|
||||||
|
end
|
||||||
|
```
|
||||||
|
|
||||||
|
### - [ ] Step 4: Run tests — expect PASS
|
||||||
|
|
||||||
|
Run: `cd agent && mix test test/proxmox_agent/diagnostics/writer_test.exs`
|
||||||
|
|
||||||
|
Expected: all tests pass.
|
||||||
|
|
||||||
|
### - [ ] Step 5: Commit
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add agent/lib/proxmox_agent/diagnostics/writer.ex \
|
||||||
|
agent/test/proxmox_agent/diagnostics/writer_test.exs
|
||||||
|
git commit -m "feat(agent): add Diagnostics.Writer GenServer"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 4: Diagnostics end-to-end test with Writer
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Modify: `agent/test/proxmox_agent/diagnostics_test.exs`
|
||||||
|
|
||||||
|
This task adds a test that exercises `log_command/4` and `log_sample/2` through the real Writer, proving the cast path works and the file contents reflect the public API.
|
||||||
|
|
||||||
|
### - [ ] Step 1: Add the end-to-end tests
|
||||||
|
|
||||||
|
Append this block to `agent/test/proxmox_agent/diagnostics_test.exs` after the existing `describe` blocks:
|
||||||
|
|
||||||
|
```elixir
|
||||||
|
describe "log_command/4 and log_sample/2 with writer running" do
|
||||||
|
setup do
|
||||||
|
dir = Path.join(System.tmp_dir!(), "diag-e2e-#{System.unique_integer([:positive])}")
|
||||||
|
File.mkdir_p!(dir)
|
||||||
|
:ok = Diagnostics.configure(dir)
|
||||||
|
|
||||||
|
{:ok, pid} = ProxmoxAgent.Diagnostics.Writer.start_link(dir: dir)
|
||||||
|
|
||||||
|
on_exit(fn ->
|
||||||
|
if Process.alive?(pid), do: GenServer.stop(pid)
|
||||||
|
File.rm_rf(dir)
|
||||||
|
end)
|
||||||
|
|
||||||
|
%{dir: dir}
|
||||||
|
end
|
||||||
|
|
||||||
|
test "log_command/4 writes to commands.log", %{dir: dir} do
|
||||||
|
assert :ok = Diagnostics.log_command("zpool", ["list", "-j"], {:ok, "body"}, 2_500)
|
||||||
|
:ok = GenServer.call(ProxmoxAgent.Diagnostics.Writer, :flush)
|
||||||
|
|
||||||
|
body = File.read!(Path.join(dir, "commands.log"))
|
||||||
|
assert body =~ "$ zpool list -j"
|
||||||
|
assert body =~ "exit=0"
|
||||||
|
assert body =~ "size=4"
|
||||||
|
end
|
||||||
|
|
||||||
|
test "log_sample/2 writes to samples.log", %{dir: dir} do
|
||||||
|
assert :ok = Diagnostics.log_sample("fast", %{"zfs_pools" => %{"pools" => []}})
|
||||||
|
:ok = GenServer.call(ProxmoxAgent.Diagnostics.Writer, :flush)
|
||||||
|
|
||||||
|
body = File.read!(Path.join(dir, "samples.log"))
|
||||||
|
assert body =~ "kind=fast"
|
||||||
|
assert body =~ "\"zfs_pools\""
|
||||||
|
end
|
||||||
|
end
|
||||||
|
```
|
||||||
|
|
||||||
|
### - [ ] Step 2: Run tests — expect PASS
|
||||||
|
|
||||||
|
Run: `cd agent && mix test test/proxmox_agent/diagnostics_test.exs`
|
||||||
|
|
||||||
|
Expected: all tests pass (existing no-op tests plus the new e2e tests).
|
||||||
|
|
||||||
|
### - [ ] Step 3: Commit
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add agent/test/proxmox_agent/diagnostics_test.exs
|
||||||
|
git commit -m "test(agent): end-to-end Diagnostics test with running Writer"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 5: Shell — wrap run/2 with timing and diagnostics hook
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Modify: `agent/lib/proxmox_agent/shell.ex`
|
||||||
|
- Modify: `agent/test/proxmox_agent/shell_test.exs`
|
||||||
|
|
||||||
|
### - [ ] Step 1: Add a failing Shell test that asserts a dump entry
|
||||||
|
|
||||||
|
Add this test to `agent/test/proxmox_agent/shell_test.exs`:
|
||||||
|
|
||||||
|
```elixir
|
||||||
|
test "run/2 logs to Diagnostics when dump_dir is set" do
|
||||||
|
dir = Path.join(System.tmp_dir!(), "shell-dump-#{System.unique_integer([:positive])}")
|
||||||
|
File.mkdir_p!(dir)
|
||||||
|
:ok = ProxmoxAgent.Diagnostics.configure(dir)
|
||||||
|
{:ok, writer} = ProxmoxAgent.Diagnostics.Writer.start_link(dir: dir)
|
||||||
|
|
||||||
|
on_exit(fn ->
|
||||||
|
if Process.alive?(writer), do: GenServer.stop(writer)
|
||||||
|
Application.delete_env(:agent, :dump_dir)
|
||||||
|
File.rm_rf(dir)
|
||||||
|
end)
|
||||||
|
|
||||||
|
assert {:ok, _} = Shell.run("/bin/echo", ["diagnostics-test"])
|
||||||
|
:ok = GenServer.call(ProxmoxAgent.Diagnostics.Writer, :flush)
|
||||||
|
|
||||||
|
body = File.read!(Path.join(dir, "commands.log"))
|
||||||
|
assert body =~ "$ /bin/echo diagnostics-test"
|
||||||
|
assert body =~ "exit=0"
|
||||||
|
assert body =~ "diagnostics-test"
|
||||||
|
end
|
||||||
|
```
|
||||||
|
|
||||||
|
The file should also change to `use ExUnit.Case, async: false` at the top, because manipulating the application env is not safe with async tests sharing the VM:
|
||||||
|
|
||||||
|
```elixir
|
||||||
|
defmodule ProxmoxAgent.ShellTest do
|
||||||
|
use ExUnit.Case, async: false
|
||||||
|
```
|
||||||
|
|
||||||
|
### - [ ] Step 2: Run Shell tests — expect FAIL
|
||||||
|
|
||||||
|
Run: `cd agent && mix test test/proxmox_agent/shell_test.exs`
|
||||||
|
|
||||||
|
Expected: the new test fails because `Shell.run/2` does not call `Diagnostics.log_command/4`.
|
||||||
|
|
||||||
|
### - [ ] Step 3: Rewrite Shell.run/2 to time the call and dispatch to Diagnostics
|
||||||
|
|
||||||
|
Replace the entire body of `agent/lib/proxmox_agent/shell.ex` with:
|
||||||
|
|
||||||
|
```elixir
|
||||||
|
defmodule ProxmoxAgent.Shell do
|
||||||
|
@moduledoc """
|
||||||
|
Thin wrapper over System.cmd for testability. Collectors accept an optional
|
||||||
|
:runner function of this shape so tests can inject fixture-backed fakes.
|
||||||
|
"""
|
||||||
|
|
||||||
|
@type result :: {:ok, String.t()} | {:error, term()}
|
||||||
|
|
||||||
|
@spec run(String.t(), [String.t()]) :: result
|
||||||
|
def run(command, args) do
|
||||||
|
start = System.monotonic_time(:microsecond)
|
||||||
|
result = do_run(command, args)
|
||||||
|
duration_us = System.monotonic_time(:microsecond) - start
|
||||||
|
ProxmoxAgent.Diagnostics.log_command(command, args, result, duration_us)
|
||||||
|
result
|
||||||
|
end
|
||||||
|
|
||||||
|
defp do_run(command, args) do
|
||||||
|
try do
|
||||||
|
case System.cmd(command, args, stderr_to_stdout: true) do
|
||||||
|
{output, 0} -> {:ok, output}
|
||||||
|
{output, code} -> {:error, {:nonzero_exit, code, output}}
|
||||||
|
end
|
||||||
|
rescue
|
||||||
|
e in ErlangError ->
|
||||||
|
case e.original do
|
||||||
|
:enoent -> {:error, {:enoent, command}}
|
||||||
|
other -> {:error, {:system_error, other}}
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
```
|
||||||
|
|
||||||
|
### - [ ] Step 4: Run Shell tests — expect PASS
|
||||||
|
|
||||||
|
Run: `cd agent && mix test test/proxmox_agent/shell_test.exs`
|
||||||
|
|
||||||
|
Expected: all four Shell tests pass.
|
||||||
|
|
||||||
|
### - [ ] Step 5: Run the full agent suite to verify no regressions
|
||||||
|
|
||||||
|
Run: `cd agent && mix test`
|
||||||
|
|
||||||
|
Expected: all tests pass.
|
||||||
|
|
||||||
|
### - [ ] Step 6: Commit
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add agent/lib/proxmox_agent/shell.ex \
|
||||||
|
agent/test/proxmox_agent/shell_test.exs
|
||||||
|
git commit -m "feat(agent): log every shell command to Diagnostics"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 6: Reporter — log sample before push
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Modify: `agent/lib/proxmox_agent/reporter.ex`
|
||||||
|
|
||||||
|
This is a one-line addition in `push_metric/3` with no behavior change when dumping is off. The `Diagnostics.log_sample/2` path is already covered by Task 4's end-to-end test, so no new Reporter test is required.
|
||||||
|
|
||||||
|
### - [ ] Step 1: Add `log_sample/2` call in push_metric
|
||||||
|
|
||||||
|
Edit `agent/lib/proxmox_agent/reporter.ex`. Replace the existing `defp push_metric/3` with:
|
||||||
|
|
||||||
|
```elixir
|
||||||
|
defp push_metric(socket, event, data) do
|
||||||
|
payload = %{collected_at: DateTime.utc_now() |> DateTime.to_iso8601(), data: data}
|
||||||
|
kind = String.replace_prefix(event, "metric:", "")
|
||||||
|
ProxmoxAgent.Diagnostics.log_sample(kind, payload)
|
||||||
|
|
||||||
|
case push(socket, socket.assigns.topic, event, payload) do
|
||||||
|
{:ok, _ref} ->
|
||||||
|
:ok
|
||||||
|
|
||||||
|
{:error, reason} ->
|
||||||
|
Logger.warning("reporter: push #{event} failed: #{inspect(reason)}")
|
||||||
|
:ok
|
||||||
|
end
|
||||||
|
end
|
||||||
|
```
|
||||||
|
|
||||||
|
### - [ ] Step 2: Run the full agent suite to confirm no regressions
|
||||||
|
|
||||||
|
Run: `cd agent && mix test`
|
||||||
|
|
||||||
|
Expected: all tests pass.
|
||||||
|
|
||||||
|
### - [ ] Step 3: Commit
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add agent/lib/proxmox_agent/reporter.ex
|
||||||
|
git commit -m "feat(agent): log outgoing sample to Diagnostics before push"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 7: Application — configure Diagnostics and start Writer
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Modify: `agent/lib/proxmox_agent/application.ex`
|
||||||
|
|
||||||
|
### - [ ] Step 1: Wire Diagnostics.configure and conditionally add the Writer child
|
||||||
|
|
||||||
|
Replace the `start/2` function in `agent/lib/proxmox_agent/application.ex` with:
|
||||||
|
|
||||||
|
```elixir
|
||||||
|
@impl true
|
||||||
|
def start(_type, _args) do
|
||||||
|
children =
|
||||||
|
case load_config() do
|
||||||
|
{:ok, cfg} ->
|
||||||
|
Logger.info("agent: starting with host_id=#{cfg.host_id}")
|
||||||
|
:ok = ProxmoxAgent.Diagnostics.configure(cfg.dump_dir)
|
||||||
|
diagnostics_children(ProxmoxAgent.Diagnostics.dump_dir()) ++
|
||||||
|
[{ProxmoxAgent.Reporter, cfg}]
|
||||||
|
|
||||||
|
{:error, reason} ->
|
||||||
|
Logger.error("agent: no config loaded (#{inspect(reason)}); running in idle mode")
|
||||||
|
[]
|
||||||
|
end
|
||||||
|
|
||||||
|
Supervisor.start_link(children, strategy: :one_for_one, name: ProxmoxAgent.Supervisor)
|
||||||
|
end
|
||||||
|
|
||||||
|
defp diagnostics_children(nil), do: []
|
||||||
|
defp diagnostics_children(dir), do: [{ProxmoxAgent.Diagnostics.Writer, [dir: dir]}]
|
||||||
|
```
|
||||||
|
|
||||||
|
Rationale: we call `Diagnostics.configure/1` with the config value, which normalizes nil/empty and creates the directory if needed. We then read back `Diagnostics.dump_dir()` — this returns nil if `configure/1` failed (e.g. mkdir failed) or if the config had no dump_dir. If non-nil, the Writer is prepended to the child list so it starts before the Reporter.
|
||||||
|
|
||||||
|
### - [ ] Step 2: Run the full agent suite to confirm no regressions
|
||||||
|
|
||||||
|
Run: `cd agent && mix test`
|
||||||
|
|
||||||
|
Expected: all tests pass. (The Application itself is not exercised in unit tests — it's started by `mix test` only if explicitly required, and our existing tests do not start the real supervision tree.)
|
||||||
|
|
||||||
|
### - [ ] Step 3: Smoke-check startup with dump enabled
|
||||||
|
|
||||||
|
Create a temporary config file:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cat > /tmp/proxmox-monitor-smoke.toml <<'EOF'
|
||||||
|
server_url = "wss://localhost:9999/socket/websocket"
|
||||||
|
token = "smoke"
|
||||||
|
host_id = "smoke-host"
|
||||||
|
|
||||||
|
[debug]
|
||||||
|
dump_dir = "/tmp/proxmox-monitor-smoke-dump"
|
||||||
|
EOF
|
||||||
|
```
|
||||||
|
|
||||||
|
Start the app in the IEx shell:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd agent && AGENT_CONFIG=/tmp/proxmox-monitor-smoke.toml iex -S mix
|
||||||
|
```
|
||||||
|
|
||||||
|
In IEx, verify the Writer is alive and the directory exists:
|
||||||
|
|
||||||
|
```elixir
|
||||||
|
Process.whereis(ProxmoxAgent.Diagnostics.Writer) |> is_pid()
|
||||||
|
# expected: true
|
||||||
|
|
||||||
|
File.exists?("/tmp/proxmox-monitor-smoke-dump/commands.log")
|
||||||
|
# expected: true (may be empty if no commands ran yet)
|
||||||
|
|
||||||
|
ProxmoxAgent.Diagnostics.enabled?()
|
||||||
|
# expected: true
|
||||||
|
```
|
||||||
|
|
||||||
|
Exit IEx with `Ctrl-C Ctrl-C`. Clean up: `rm -rf /tmp/proxmox-monitor-smoke-dump /tmp/proxmox-monitor-smoke.toml`.
|
||||||
|
|
||||||
|
If IEx is not available in the environment, skip this manual smoke check and rely on the `mix test` suite.
|
||||||
|
|
||||||
|
### - [ ] Step 4: Commit
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add agent/lib/proxmox_agent/application.ex
|
||||||
|
git commit -m "feat(agent): start Diagnostics.Writer when dump_dir is configured"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Final verification
|
||||||
|
|
||||||
|
After all seven tasks, run the full suites on both sides:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd agent && mix test
|
||||||
|
cd ../server && mix test
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: agent has grown by the new tests (Config +3, Diagnostics +~8, Writer +5, Shell +1) and all pass; server suite is unchanged and still passes.
|
||||||
Loading…
Add table
Add a link
Reference in a new issue