proxMon/docs/superpowers/plans/2026-04-22-agent-diagnostic-dump.md
Carsten b798b462ca docs: implementation plan for agent diagnostic dump
Seven tasks: config field, Diagnostics module, Writer GenServer,
end-to-end test, Shell hook, Reporter hook, Application integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 22:13:19 +02:00

27 KiB

Agent Diagnostic Dump Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add an opt-in diagnostic dump to the agent that logs every external command (with full raw output) and every outgoing metric payload, so the operator can verify source data before it hits the server.

Architecture: A new ProxmoxAgent.Diagnostics module holds the on/off state in the application env (:agent, :dump_dir). When enabled, a ProxmoxAgent.Diagnostics.Writer GenServer owns append-only file handles for commands.log and samples.log and serializes all writes. Shell.run/2 calls Diagnostics.log_command/4 after each command; Reporter.push_metric/3 calls Diagnostics.log_sample/2 before each push. When dump_dir is unset, both helpers are no-ops and the Writer is not started.

Tech Stack: Elixir / OTP, ExUnit, GenServer, Application.get_env, File / IO.write. Spec: docs/superpowers/specs/2026-04-22-agent-diagnostic-dump-design.md.


File Structure

Create:

  • agent/lib/proxmox_agent/diagnostics.ex — public API: configure/1, enabled?/0, log_command/4, log_sample/2. Thin wrapper around the Writer.
  • agent/lib/proxmox_agent/diagnostics/writer.exGenServer that owns the two file handles and serializes writes.
  • agent/test/proxmox_agent/diagnostics_test.exs — unit tests for the public API (both disabled and enabled paths).
  • agent/test/proxmox_agent/diagnostics/writer_test.exs — unit tests for the writer GenServer.

Modify:

  • agent/lib/proxmox_agent/config.ex — add :dump_dir field (String.t() | nil).
  • agent/test/proxmox_agent/config_test.exs — test [debug] dump_dir parsing.
  • agent/test/fixtures/agent.toml — add [debug] dump_dir to the fixture.
  • agent/lib/proxmox_agent/shell.ex — add timing + call Diagnostics.log_command/4.
  • agent/test/proxmox_agent/shell_test.exs — add a test that exercises the dump path.
  • agent/lib/proxmox_agent/reporter.ex — add one line calling Diagnostics.log_sample/2 before push/4.
  • agent/lib/proxmox_agent/application.ex — call Diagnostics.configure/1 and conditionally add the Writer to the supervision tree.

Task 1: Config — add dump_dir field

Files:

  • Modify: agent/lib/proxmox_agent/config.ex
  • Modify: agent/test/proxmox_agent/config_test.exs
  • Modify: agent/test/fixtures/agent.toml

- [ ] Step 1: Add dump_dir to the fixture

Replace the contents of agent/test/fixtures/agent.toml with:

server_url = "wss://monitor.example.com/socket/websocket"
token = "test_token_123"
host_id = "pve-test-01"

[intervals]
fast_seconds = 15
medium_seconds = 120
slow_seconds = 600

[debug]
dump_dir = "/tmp/proxmox-monitor-test"

- [ ] Step 2: Add a failing Config test for dump_dir

In agent/test/proxmox_agent/config_test.exs, add this test inside describe "load/1":

    test "parses [debug] dump_dir when present" do
      assert {:ok, cfg} = Config.load(@fixture)
      assert cfg.dump_dir == "/tmp/proxmox-monitor-test"
    end

    test "dump_dir is nil when [debug] is absent" do
      tmp = Path.join(System.tmp_dir!(), "agent_nodebug.toml")

      File.write!(tmp, """
      server_url = "wss://x/socket/websocket"
      token = "t"
      """)

      on_exit(fn -> File.rm(tmp) end)

      assert {:ok, cfg} = Config.load(tmp)
      assert cfg.dump_dir == nil
    end

    test "dump_dir is nil when set to empty string" do
      tmp = Path.join(System.tmp_dir!(), "agent_emptydebug.toml")

      File.write!(tmp, """
      server_url = "wss://x/socket/websocket"
      token = "t"

      [debug]
      dump_dir = ""
      """)

      on_exit(fn -> File.rm(tmp) end)

      assert {:ok, cfg} = Config.load(tmp)
      assert cfg.dump_dir == nil
    end

Update the existing "parses required fields" test to also assert on dump_dir:

    test "parses required fields" do
      assert {:ok, cfg} = Config.load(@fixture)
      assert cfg.server_url == "wss://monitor.example.com/socket/websocket"
      assert cfg.token == "test_token_123"
      assert cfg.host_id == "pve-test-01"
      assert cfg.fast_seconds == 15
      assert cfg.medium_seconds == 120
      assert cfg.slow_seconds == 600
      assert cfg.dump_dir == "/tmp/proxmox-monitor-test"
    end

- [ ] Step 3: Run Config tests — expect FAIL

Run: cd agent && mix test test/proxmox_agent/config_test.exs

Expected: at least the three new assertions fail because Config struct has no :dump_dir field and build/1 does not read [debug] dump_dir.

- [ ] Step 4: Add :dump_dir to the Config struct and build/1

Edit agent/lib/proxmox_agent/config.ex. Replace the defstruct and @type blocks with:

  defstruct [
    :server_url,
    :token,
    :host_id,
    :dump_dir,
    fast_seconds: 30,
    medium_seconds: 300,
    slow_seconds: 1800
  ]

  @type t :: %__MODULE__{
          server_url: String.t(),
          token: String.t(),
          host_id: String.t(),
          dump_dir: String.t() | nil,
          fast_seconds: pos_integer(),
          medium_seconds: pos_integer(),
          slow_seconds: pos_integer()
        }

Replace the build/1 function with:

  defp build(map) do
    intervals = Map.get(map, "intervals", %{})
    debug = Map.get(map, "debug", %{})

    %__MODULE__{
      server_url: map["server_url"],
      token: map["token"],
      host_id: map["host_id"] || hostname(),
      dump_dir: normalize_dump_dir(Map.get(debug, "dump_dir")),
      fast_seconds: Map.get(intervals, "fast_seconds", 30),
      medium_seconds: Map.get(intervals, "medium_seconds", 300),
      slow_seconds: Map.get(intervals, "slow_seconds", 1800)
    }
  end

  defp normalize_dump_dir(nil), do: nil
  defp normalize_dump_dir(""), do: nil
  defp normalize_dump_dir(s) when is_binary(s), do: s
  defp normalize_dump_dir(_), do: nil

- [ ] Step 5: Run Config tests — expect PASS

Run: cd agent && mix test test/proxmox_agent/config_test.exs

Expected: all tests pass.

- [ ] Step 6: Commit

git add agent/lib/proxmox_agent/config.ex \
        agent/test/proxmox_agent/config_test.exs \
        agent/test/fixtures/agent.toml
git commit -m "feat(agent): add [debug] dump_dir config field"

Task 2: Diagnostics module — no-op API skeleton

Files:

  • Create: agent/lib/proxmox_agent/diagnostics.ex
  • Create: agent/test/proxmox_agent/diagnostics_test.exs

This task introduces the public API with a null-object behavior (no Writer yet). Logging calls succeed and silently no-op when disabled. This proves the API is safe to call from Shell and Reporter even before the Writer is wired in.

- [ ] Step 1: Write failing tests for configure/enabled? and no-op logging

Create agent/test/proxmox_agent/diagnostics_test.exs:

defmodule ProxmoxAgent.DiagnosticsTest do
  use ExUnit.Case, async: false

  alias ProxmoxAgent.Diagnostics

  setup do
    # Isolate tests: clear the env key before and after each test.
    Application.delete_env(:agent, :dump_dir)
    on_exit(fn -> Application.delete_env(:agent, :dump_dir) end)
    :ok
  end

  describe "configure/1 and enabled?/0" do
    test "nil disables and returns :ok" do
      assert :ok = Diagnostics.configure(nil)
      refute Diagnostics.enabled?()
    end

    test "empty string disables and returns :ok" do
      assert :ok = Diagnostics.configure("")
      refute Diagnostics.enabled?()
    end

    test "valid path creates the directory and enables" do
      dir = Path.join(System.tmp_dir!(), "diag-#{System.unique_integer([:positive])}")
      on_exit(fn -> File.rm_rf(dir) end)

      assert :ok = Diagnostics.configure(dir)
      assert Diagnostics.enabled?()
      assert File.dir?(dir)
      assert Application.get_env(:agent, :dump_dir) == dir
    end

    test "unreachable path disables (does not crash)" do
      # Point at a path under a non-directory file to force mkdir_p failure.
      parent = Path.join(System.tmp_dir!(), "diag-parent-#{System.unique_integer([:positive])}")
      File.write!(parent, "not a directory")
      dir = Path.join(parent, "child")
      on_exit(fn -> File.rm_rf(parent) end)

      assert :ok = Diagnostics.configure(dir)
      refute Diagnostics.enabled?()
    end
  end

  describe "log_command/4 and log_sample/2 (no writer running)" do
    test "log_command/4 no-ops and returns :ok when disabled" do
      assert :ok = Diagnostics.log_command("zpool", ["list"], {:ok, "body"}, 1_234)
    end

    test "log_sample/2 no-ops and returns :ok when disabled" do
      assert :ok = Diagnostics.log_sample("fast", %{foo: "bar"})
    end

    test "log_command/4 no-ops when enabled but writer is not running" do
      dir = Path.join(System.tmp_dir!(), "diag-#{System.unique_integer([:positive])}")
      on_exit(fn -> File.rm_rf(dir) end)

      :ok = Diagnostics.configure(dir)
      # Writer is not started in this test — log should still be safe.
      assert :ok = Diagnostics.log_command("zpool", ["list"], {:ok, "body"}, 1_234)
    end
  end
end

- [ ] Step 2: Run tests — expect FAIL (module does not exist)

Run: cd agent && mix test test/proxmox_agent/diagnostics_test.exs

Expected: FAIL — ProxmoxAgent.Diagnostics.__struct__/0 is undefined (module ProxmoxAgent.Diagnostics is not available) or compilation error.

- [ ] Step 3: Create the Diagnostics module

Create agent/lib/proxmox_agent/diagnostics.ex:

defmodule ProxmoxAgent.Diagnostics do
  @moduledoc """
  Optional diagnostic dump of external commands and outgoing samples.

  Controlled by `:dump_dir` in the application env. When unset or nil,
  `log_command/4` and `log_sample/2` are no-ops and incur no I/O.
  When set, they cast to `ProxmoxAgent.Diagnostics.Writer` (if running)
  which serializes appends to `commands.log` and `samples.log`.
  """

  require Logger

  @type command_result ::
          {:ok, String.t()}
          | {:error, term()}

  @spec configure(String.t() | nil) :: :ok
  def configure(nil), do: disable()
  def configure(""), do: disable()

  def configure(dir) when is_binary(dir) do
    case File.mkdir_p(dir) do
      :ok ->
        Application.put_env(:agent, :dump_dir, dir)
        :ok

      {:error, reason} ->
        Logger.warning("diagnostics: mkdir_p #{dir} failed (#{inspect(reason)}); dumping disabled")
        disable()
    end
  end

  @spec enabled?() :: boolean()
  def enabled?, do: not is_nil(Application.get_env(:agent, :dump_dir))

  @spec dump_dir() :: String.t() | nil
  def dump_dir, do: Application.get_env(:agent, :dump_dir)

  @spec log_command(String.t(), [String.t()], command_result(), non_neg_integer()) :: :ok
  def log_command(cmd, args, result, duration_us) do
    cast({:command, cmd, args, result, duration_us})
  end

  @spec log_sample(String.t(), map()) :: :ok
  def log_sample(kind, payload) when is_binary(kind) and is_map(payload) do
    cast({:sample, kind, payload})
  end

  defp disable do
    Application.delete_env(:agent, :dump_dir)
    :ok
  end

  defp cast(msg) do
    if enabled?() do
      case Process.whereis(ProxmoxAgent.Diagnostics.Writer) do
        nil -> :ok
        pid -> GenServer.cast(pid, msg)
      end
    end

    :ok
  end
end

- [ ] Step 4: Run tests — expect PASS

Run: cd agent && mix test test/proxmox_agent/diagnostics_test.exs

Expected: all tests pass.

- [ ] Step 5: Commit

git add agent/lib/proxmox_agent/diagnostics.ex \
        agent/test/proxmox_agent/diagnostics_test.exs
git commit -m "feat(agent): add Diagnostics module with no-op default"

Task 3: Diagnostics.Writer GenServer

Files:

  • Create: agent/lib/proxmox_agent/diagnostics/writer.ex
  • Create: agent/test/proxmox_agent/diagnostics/writer_test.exs

- [ ] Step 1: Write failing tests for the Writer

Create agent/test/proxmox_agent/diagnostics/writer_test.exs:

defmodule ProxmoxAgent.Diagnostics.WriterTest do
  use ExUnit.Case, async: false

  alias ProxmoxAgent.Diagnostics.Writer

  setup do
    dir = Path.join(System.tmp_dir!(), "writer-#{System.unique_integer([:positive])}")
    File.mkdir_p!(dir)
    {:ok, pid} = Writer.start_link(dir: dir)

    on_exit(fn ->
      if Process.alive?(pid), do: GenServer.stop(pid)
      File.rm_rf(dir)
    end)

    %{dir: dir, pid: pid}
  end

  test "handle_cast :command appends a formatted entry to commands.log", %{dir: dir, pid: pid} do
    GenServer.cast(pid, {:command, "zpool", ["list", "-j"], {:ok, "hello body"}, 1_234})
    :ok = GenServer.call(pid, :flush)

    body = File.read!(Path.join(dir, "commands.log"))

    assert body =~ "$ zpool list -j"
    assert body =~ "exit=0"
    assert body =~ "duration=1ms"
    assert body =~ "size=10"
    assert body =~ "hello body"
    assert body =~ "=====\n"
  end

  test "handle_cast :command records errors in place of exit", %{dir: dir, pid: pid} do
    GenServer.cast(pid, {:command, "zpool", ["list"], {:error, {:enoent, "zpool"}}, 300})
    :ok = GenServer.call(pid, :flush)

    body = File.read!(Path.join(dir, "commands.log"))
    assert body =~ "error={:enoent, \"zpool\"}"
    assert body =~ "duration=0ms"
    refute body =~ "exit="
  end

  test "handle_cast :command formats non-zero exit results", %{dir: dir, pid: pid} do
    GenServer.cast(pid, {:command, "/bin/sh", ["-c", "exit 7"], {:error, {:nonzero_exit, 7, "oops"}}, 500})
    :ok = GenServer.call(pid, :flush)

    body = File.read!(Path.join(dir, "commands.log"))
    assert body =~ "exit=7"
    assert body =~ "size=4"
    assert body =~ "oops"
  end

  test "handle_cast :sample appends pretty-printed JSON to samples.log", %{dir: dir, pid: pid} do
    GenServer.cast(pid, {:sample, "fast", %{"host" => %{"load1" => 0.1}}})
    :ok = GenServer.call(pid, :flush)

    body = File.read!(Path.join(dir, "samples.log"))
    assert body =~ "kind=fast"
    assert body =~ "\"host\""
    assert body =~ "\"load1\""
    assert body =~ "=====\n"
  end

  test "multiple casts serialize and do not interleave", %{dir: dir, pid: pid} do
    for i <- 1..50 do
      GenServer.cast(pid, {:command, "cmd#{i}", [], {:ok, "body#{i}"}, 1})
    end

    :ok = GenServer.call(pid, :flush)
    body = File.read!(Path.join(dir, "commands.log"))

    # Every body1..body50 must appear exactly once, in order.
    for i <- 1..50 do
      assert body =~ "body#{i}"
    end
  end

  test "terminate closes handles cleanly", %{pid: pid} do
    ref = Process.monitor(pid)
    GenServer.stop(pid, :normal)
    assert_receive {:DOWN, ^ref, :process, ^pid, :normal}, 1000
  end
end

The :flush call is a synchronous no-op whose only purpose is to wait until every previously-cast message has been handled — a common ExUnit idiom for testing cast-based GenServers.

- [ ] Step 2: Run tests — expect FAIL

Run: cd agent && mix test test/proxmox_agent/diagnostics/writer_test.exs

Expected: FAIL — Writer module does not exist.

- [ ] Step 3: Create the Writer GenServer

Create agent/lib/proxmox_agent/diagnostics/writer.ex:

defmodule ProxmoxAgent.Diagnostics.Writer do
  @moduledoc """
  Serializes diagnostic dump writes. Owns two append-only file handles
  for commands.log and samples.log under a configured directory.

  Started only when the agent's dump_dir is set. All writes are cast-based
  to keep the caller path free of file I/O latency.
  """

  use GenServer
  require Logger

  @spec start_link(keyword()) :: GenServer.on_start()
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(opts) do
    dir = Keyword.fetch!(opts, :dir)

    with {:ok, commands} <- open(Path.join(dir, "commands.log")),
         {:ok, samples} <- open(Path.join(dir, "samples.log")) do
      Process.flag(:trap_exit, true)
      {:ok, %{dir: dir, commands: commands, samples: samples}}
    else
      {:error, reason} ->
        {:stop, {:open_failed, reason}}
    end
  end

  @impl true
  def handle_cast({:command, cmd, args, result, duration_us}, state) do
    write(state.commands, format_command(cmd, args, result, duration_us))
    {:noreply, state}
  end

  def handle_cast({:sample, kind, payload}, state) do
    write(state.samples, format_sample(kind, payload))
    {:noreply, state}
  end

  @impl true
  def handle_call(:flush, _from, state) do
    {:reply, :ok, state}
  end

  @impl true
  def terminate(_reason, state) do
    File.close(state.commands)
    File.close(state.samples)
    :ok
  end

  # --- private ----------------------------------------------------------

  defp open(path), do: File.open(path, [:append, :utf8])

  defp write(io, data) do
    case IO.write(io, data) do
      :ok ->
        :ok

      {:error, reason} ->
        Logger.warning("diagnostics writer: write failed (#{inspect(reason)})")
        :ok
    end
  end

  defp format_command(cmd, args, result, duration_us) do
    header = "===== #{now_iso()} =====\n$ #{cmd} #{Enum.join(args, " ")}\n"

    status =
      case result do
        {:ok, body} ->
          "exit=0  duration=#{ms(duration_us)}ms  size=#{byte_size(body)}\n\n#{body}\n\n"

        {:error, {:nonzero_exit, code, body}} ->
          "exit=#{code}  duration=#{ms(duration_us)}ms  size=#{byte_size(body)}\n\n#{body}\n\n"

        {:error, reason} ->
          "error=#{inspect(reason)}  duration=#{ms(duration_us)}ms\n\n"
      end

    header <> status
  end

  defp format_sample(kind, payload) do
    body = Jason.encode!(payload, pretty: true)
    "===== #{now_iso()}  kind=#{kind} =====\n#{body}\n\n"
  end

  defp now_iso, do: DateTime.utc_now() |> DateTime.to_iso8601()

  defp ms(us) when is_integer(us), do: div(us, 1_000)
end

- [ ] Step 4: Run tests — expect PASS

Run: cd agent && mix test test/proxmox_agent/diagnostics/writer_test.exs

Expected: all tests pass.

- [ ] Step 5: Commit

git add agent/lib/proxmox_agent/diagnostics/writer.ex \
        agent/test/proxmox_agent/diagnostics/writer_test.exs
git commit -m "feat(agent): add Diagnostics.Writer GenServer"

Task 4: Diagnostics end-to-end test with Writer

Files:

  • Modify: agent/test/proxmox_agent/diagnostics_test.exs

This task adds a test that exercises log_command/4 and log_sample/2 through the real Writer, proving the cast path works and the file contents reflect the public API.

- [ ] Step 1: Add the end-to-end tests

Append this block to agent/test/proxmox_agent/diagnostics_test.exs after the existing describe blocks:

  describe "log_command/4 and log_sample/2 with writer running" do
    setup do
      dir = Path.join(System.tmp_dir!(), "diag-e2e-#{System.unique_integer([:positive])}")
      File.mkdir_p!(dir)
      :ok = Diagnostics.configure(dir)

      {:ok, pid} = ProxmoxAgent.Diagnostics.Writer.start_link(dir: dir)

      on_exit(fn ->
        if Process.alive?(pid), do: GenServer.stop(pid)
        File.rm_rf(dir)
      end)

      %{dir: dir}
    end

    test "log_command/4 writes to commands.log", %{dir: dir} do
      assert :ok = Diagnostics.log_command("zpool", ["list", "-j"], {:ok, "body"}, 2_500)
      :ok = GenServer.call(ProxmoxAgent.Diagnostics.Writer, :flush)

      body = File.read!(Path.join(dir, "commands.log"))
      assert body =~ "$ zpool list -j"
      assert body =~ "exit=0"
      assert body =~ "size=4"
    end

    test "log_sample/2 writes to samples.log", %{dir: dir} do
      assert :ok = Diagnostics.log_sample("fast", %{"zfs_pools" => %{"pools" => []}})
      :ok = GenServer.call(ProxmoxAgent.Diagnostics.Writer, :flush)

      body = File.read!(Path.join(dir, "samples.log"))
      assert body =~ "kind=fast"
      assert body =~ "\"zfs_pools\""
    end
  end

- [ ] Step 2: Run tests — expect PASS

Run: cd agent && mix test test/proxmox_agent/diagnostics_test.exs

Expected: all tests pass (existing no-op tests plus the new e2e tests).

- [ ] Step 3: Commit

git add agent/test/proxmox_agent/diagnostics_test.exs
git commit -m "test(agent): end-to-end Diagnostics test with running Writer"

Task 5: Shell — wrap run/2 with timing and diagnostics hook

Files:

  • Modify: agent/lib/proxmox_agent/shell.ex
  • Modify: agent/test/proxmox_agent/shell_test.exs

- [ ] Step 1: Add a failing Shell test that asserts a dump entry

Add this test to agent/test/proxmox_agent/shell_test.exs:

  test "run/2 logs to Diagnostics when dump_dir is set" do
    dir = Path.join(System.tmp_dir!(), "shell-dump-#{System.unique_integer([:positive])}")
    File.mkdir_p!(dir)
    :ok = ProxmoxAgent.Diagnostics.configure(dir)
    {:ok, writer} = ProxmoxAgent.Diagnostics.Writer.start_link(dir: dir)

    on_exit(fn ->
      if Process.alive?(writer), do: GenServer.stop(writer)
      Application.delete_env(:agent, :dump_dir)
      File.rm_rf(dir)
    end)

    assert {:ok, _} = Shell.run("/bin/echo", ["diagnostics-test"])
    :ok = GenServer.call(ProxmoxAgent.Diagnostics.Writer, :flush)

    body = File.read!(Path.join(dir, "commands.log"))
    assert body =~ "$ /bin/echo diagnostics-test"
    assert body =~ "exit=0"
    assert body =~ "diagnostics-test"
  end

The file should also change to use ExUnit.Case, async: false at the top, because manipulating the application env is not safe with async tests sharing the VM:

defmodule ProxmoxAgent.ShellTest do
  use ExUnit.Case, async: false

- [ ] Step 2: Run Shell tests — expect FAIL

Run: cd agent && mix test test/proxmox_agent/shell_test.exs

Expected: the new test fails because Shell.run/2 does not call Diagnostics.log_command/4.

- [ ] Step 3: Rewrite Shell.run/2 to time the call and dispatch to Diagnostics

Replace the entire body of agent/lib/proxmox_agent/shell.ex with:

defmodule ProxmoxAgent.Shell do
  @moduledoc """
  Thin wrapper over System.cmd for testability. Collectors accept an optional
  :runner function of this shape so tests can inject fixture-backed fakes.
  """

  @type result :: {:ok, String.t()} | {:error, term()}

  @spec run(String.t(), [String.t()]) :: result
  def run(command, args) do
    start = System.monotonic_time(:microsecond)
    result = do_run(command, args)
    duration_us = System.monotonic_time(:microsecond) - start
    ProxmoxAgent.Diagnostics.log_command(command, args, result, duration_us)
    result
  end

  defp do_run(command, args) do
    try do
      case System.cmd(command, args, stderr_to_stdout: true) do
        {output, 0} -> {:ok, output}
        {output, code} -> {:error, {:nonzero_exit, code, output}}
      end
    rescue
      e in ErlangError ->
        case e.original do
          :enoent -> {:error, {:enoent, command}}
          other -> {:error, {:system_error, other}}
        end
    end
  end
end

- [ ] Step 4: Run Shell tests — expect PASS

Run: cd agent && mix test test/proxmox_agent/shell_test.exs

Expected: all four Shell tests pass.

- [ ] Step 5: Run the full agent suite to verify no regressions

Run: cd agent && mix test

Expected: all tests pass.

- [ ] Step 6: Commit

git add agent/lib/proxmox_agent/shell.ex \
        agent/test/proxmox_agent/shell_test.exs
git commit -m "feat(agent): log every shell command to Diagnostics"

Task 6: Reporter — log sample before push

Files:

  • Modify: agent/lib/proxmox_agent/reporter.ex

This is a one-line addition in push_metric/3 with no behavior change when dumping is off. The Diagnostics.log_sample/2 path is already covered by Task 4's end-to-end test, so no new Reporter test is required.

- [ ] Step 1: Add log_sample/2 call in push_metric

Edit agent/lib/proxmox_agent/reporter.ex. Replace the existing defp push_metric/3 with:

  defp push_metric(socket, event, data) do
    payload = %{collected_at: DateTime.utc_now() |> DateTime.to_iso8601(), data: data}
    kind = String.replace_prefix(event, "metric:", "")
    ProxmoxAgent.Diagnostics.log_sample(kind, payload)

    case push(socket, socket.assigns.topic, event, payload) do
      {:ok, _ref} ->
        :ok

      {:error, reason} ->
        Logger.warning("reporter: push #{event} failed: #{inspect(reason)}")
        :ok
    end
  end

- [ ] Step 2: Run the full agent suite to confirm no regressions

Run: cd agent && mix test

Expected: all tests pass.

- [ ] Step 3: Commit

git add agent/lib/proxmox_agent/reporter.ex
git commit -m "feat(agent): log outgoing sample to Diagnostics before push"

Task 7: Application — configure Diagnostics and start Writer

Files:

  • Modify: agent/lib/proxmox_agent/application.ex

- [ ] Step 1: Wire Diagnostics.configure and conditionally add the Writer child

Replace the start/2 function in agent/lib/proxmox_agent/application.ex with:

  @impl true
  def start(_type, _args) do
    children =
      case load_config() do
        {:ok, cfg} ->
          Logger.info("agent: starting with host_id=#{cfg.host_id}")
          :ok = ProxmoxAgent.Diagnostics.configure(cfg.dump_dir)
          diagnostics_children(ProxmoxAgent.Diagnostics.dump_dir()) ++
            [{ProxmoxAgent.Reporter, cfg}]

        {:error, reason} ->
          Logger.error("agent: no config loaded (#{inspect(reason)}); running in idle mode")
          []
      end

    Supervisor.start_link(children, strategy: :one_for_one, name: ProxmoxAgent.Supervisor)
  end

  defp diagnostics_children(nil), do: []
  defp diagnostics_children(dir), do: [{ProxmoxAgent.Diagnostics.Writer, [dir: dir]}]

Rationale: we call Diagnostics.configure/1 with the config value, which normalizes nil/empty and creates the directory if needed. We then read back Diagnostics.dump_dir() — this returns nil if configure/1 failed (e.g. mkdir failed) or if the config had no dump_dir. If non-nil, the Writer is prepended to the child list so it starts before the Reporter.

- [ ] Step 2: Run the full agent suite to confirm no regressions

Run: cd agent && mix test

Expected: all tests pass. (The Application itself is not exercised in unit tests — it's started by mix test only if explicitly required, and our existing tests do not start the real supervision tree.)

- [ ] Step 3: Smoke-check startup with dump enabled

Create a temporary config file:

cat > /tmp/proxmox-monitor-smoke.toml <<'EOF'
server_url = "wss://localhost:9999/socket/websocket"
token = "smoke"
host_id = "smoke-host"

[debug]
dump_dir = "/tmp/proxmox-monitor-smoke-dump"
EOF

Start the app in the IEx shell:

cd agent && AGENT_CONFIG=/tmp/proxmox-monitor-smoke.toml iex -S mix

In IEx, verify the Writer is alive and the directory exists:

Process.whereis(ProxmoxAgent.Diagnostics.Writer) |> is_pid()
# expected: true

File.exists?("/tmp/proxmox-monitor-smoke-dump/commands.log")
# expected: true (may be empty if no commands ran yet)

ProxmoxAgent.Diagnostics.enabled?()
# expected: true

Exit IEx with Ctrl-C Ctrl-C. Clean up: rm -rf /tmp/proxmox-monitor-smoke-dump /tmp/proxmox-monitor-smoke.toml.

If IEx is not available in the environment, skip this manual smoke check and rely on the mix test suite.

- [ ] Step 4: Commit

git add agent/lib/proxmox_agent/application.ex
git commit -m "feat(agent): start Diagnostics.Writer when dump_dir is configured"

Final verification

After all seven tasks, run the full suites on both sides:

cd agent && mix test
cd ../server && mix test

Expected: agent has grown by the new tests (Config +3, Diagnostics +~8, Writer +5, Shell +1) and all pass; server suite is unchanged and still passes.