# Phase 1: Skeleton (Grundgerüst) Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Stand up a minimal agent+server pair where an Elixir agent running locally connects via Phoenix Channels to a Phoenix server, authenticates with a token, and pushes host CPU/RAM metrics every 30 seconds. Server logs the incoming payloads.

**Architecture:** Monorepo with two independent Mix projects (`server/` Phoenix+SQLite, `agent/` plain OTP app using Slipstream). Agent initiates a persistent WSS connection, joins topic `host:<host_id>`, pushes `metric:fast` events. Server persists only `hosts` in Phase 1; metric storage lands in Phase 2.

**Tech Stack:** Elixir 1.19 / OTP 28, Phoenix 1.7.14, Ecto + `ecto_sqlite3`, `bcrypt_elixir` (token hashing), `slipstream` (agent Channels client), `toml` (agent config), ExUnit.

---

## File Structure

```
proxmox_monitor/
├── .gitignore
├── README.md
├── proxmox-monitor-konzept.md                              (existing)
├── docs/superpowers/plans/2026-04-21-phase1-grundgeruest.md
│
├── server/                                                 (created by mix phx.new)
│   ├── mix.exs                                             modify: add :bcrypt_elixir
│   ├── config/{config,dev,test,runtime}.exs                scaffolded
│   ├── priv/repo/migrations/<timestamp>_create_hosts.exs   create
│   ├── lib/server/application.ex                           scaffolded
│   ├── lib/server/repo.ex                                  scaffolded
│   ├── lib/server/schema/host.ex                           create
│   ├── lib/server/hosts.ex                                 create (context)
│   ├── lib/server_web/endpoint.ex                          modify: add agent socket
│   ├── lib/server_web/channels/agent_socket.ex             create
│   ├── lib/server_web/channels/host_channel.ex             create
│   ├── test/server/hosts_test.exs                          create
│   └── test/server_web/channels/host_channel_test.exs      create
│
└── agent/                                                  (created by mix new --sup)
    ├── mix.exs                                             modify: deps + app config
    ├── config/config.exs                                   create
    ├── config/runtime.exs                                  create
    ├── lib/agent.ex                                        scaffolded
    ├── lib/agent/application.ex                            modify
    ├── lib/agent/config.ex                                 create
    ├── lib/agent/collectors/host.ex                        create
    ├── lib/agent/reporter.ex                               create
    ├── test/agent/config_test.exs                          create
    ├── test/agent/collectors/host_test.exs                 create
    └── test/fixtures/proc/                                 create (loadavg, meminfo, uptime samples)
```

Each file has one responsibility: schema, context (business logic), channel (transport), collector (data acquisition), reporter (transmission). Test files mirror the source tree.

---

## Task 1: Monorepo Init

**Files:**
- Create: `.gitignore`
- Create: `README.md`

- [ ] **Step 1: Write `.gitignore` (covers both Mix projects)**

```
# Elixir/Mix
/server/_build/
/server/deps/
/server/cover/
/server/doc/
/server/.fetch
/server/erl_crash.dump
/server/*.ez
/server/priv/static/assets/
/server/priv/static/cache_manifest.json
/server/*.db
/server/*.db-journal
/server/*.db-wal
/server/*.db-shm

/agent/_build/
/agent/deps/
/agent/cover/
/agent/doc/
/agent/.fetch
/agent/erl_crash.dump
/agent/*.ez

# Editors / OS
.DS_Store
.vscode/
.idea/
```

- [ ] **Step 2: Write `README.md` (minimal)**

```markdown
# Proxmox Monitor

Agent-Server monitoring for Proxmox hosts. Elixir/OTP. See `proxmox-monitor-konzept.md`.

- `server/` — Phoenix + SQLite + LiveView
- `agent/` — Slipstream Channels client, deploys as Burrito binary

Phase 1 focuses on end-to-end metric push. Later phases add ZFS/VM collectors, persistence, LiveView dashboard.
```

- [ ] **Step 3: Initial commit**

```bash
git add .gitignore README.md proxmox-monitor-konzept.md docs/
git commit -m "chore: project skeleton + phase-1 plan"
```

---

## Task 2: Server — Phoenix Bootstrap

**Files:**
- Create: entire `server/` tree via `mix phx.new`

- [ ] **Step 1: Generate Phoenix project**

Run from `/Users/cabele/claudeprojects/proxmox_monitor`:

```bash
mix phx.new server --database sqlite3 --no-mailer --no-gettext --live --install
```

If prompted, answer `Y` to fetch deps.

Expected: creates `server/` with Phoenix scaffold, SQLite adapter, LiveView enabled, no Gettext, no Mailer.
Deps fetched, assets installed.

- [ ] **Step 2: Verify scaffold builds and tests pass**

```bash
cd server && mix compile && mix test
```

Expected: compiles clean, default `PageControllerTest` passes.

- [ ] **Step 3: Commit the scaffold**

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor
git add server/
git commit -m "feat(server): phoenix 1.7 scaffold with sqlite + liveview"
```

---

## Task 3: Server — Bcrypt Dependency

**Files:**
- Modify: `server/mix.exs`

- [ ] **Step 1: Add `:bcrypt_elixir` to deps**

In `server/mix.exs`, locate the `defp deps do` list and add the line below alongside existing entries:

```elixir
{:bcrypt_elixir, "~> 3.1"},
```

- [ ] **Step 2: Fetch and compile**

```bash
cd server && mix deps.get && mix compile
```

Expected: bcrypt_elixir and cc_precompiler fetched; compile succeeds (bcrypt NIF builds).

- [ ] **Step 3: Commit**

```bash
git add server/mix.exs server/mix.lock
git commit -m "feat(server): add bcrypt_elixir for token hashing"
```

---

## Task 4: Server — Host Schema + Context (TDD)

**Files:**
- Create: `server/priv/repo/migrations/<timestamp>_create_hosts.exs`
- Create: `server/lib/server/schema/host.ex`
- Create: `server/lib/server/hosts.ex`
- Create: `server/test/server/hosts_test.exs`

- [ ] **Step 1: Generate migration file**

```bash
cd server && mix ecto.gen.migration create_hosts
```

Fill the generated file (timestamped name) with:

```elixir
defmodule Server.Repo.Migrations.CreateHosts do
  use Ecto.Migration

  def change do
    create table(:hosts) do
      add :name, :string, null: false
      add :token_hash, :string, null: false
      add :agent_version, :string
      add :proxmox_version, :string
      add :zfs_version, :string
      add :status, :string, null: false, default: "never_connected"
      add :last_seen_at, :utc_datetime_usec

      timestamps(type: :utc_datetime_usec)
    end

    create unique_index(:hosts, [:name])
  end
end
```

- [ ] **Step 2: Write schema module**

Create `server/lib/server/schema/host.ex`:

```elixir
defmodule Server.Schema.Host do
  use Ecto.Schema
  import Ecto.Changeset

  # Referenced as Host.t() in the Server.Hosts typespecs.
  @type t :: %__MODULE__{}

  @statuses ~w(never_connected online offline)

  schema "hosts" do
    field :name, :string
    field :token_hash, :string
    field :agent_version, :string
    field :proxmox_version, :string
    field :zfs_version, :string
    field :status, :string, default: "never_connected"
    field :last_seen_at, :utc_datetime_usec

    timestamps(type: :utc_datetime_usec)
  end

  def create_changeset(host, attrs) do
    host
    |> cast(attrs, [:name, :token_hash])
    |> validate_required([:name, :token_hash])
    |> validate_length(:name, min: 1, max: 100)
    |> unique_constraint(:name)
  end

  def status_changeset(host, attrs) do
    host
    |> cast(attrs, [:status, :last_seen_at, :agent_version])
    |> validate_inclusion(:status, @statuses)
  end
end
```

- [ ] **Step 3: Write failing tests for the context**

Create `server/test/server/hosts_test.exs`:

```elixir
defmodule Server.HostsTest do
  use Server.DataCase, async: true

  alias Server.Hosts

  describe "create_host/1" do
    test "returns host and a plaintext token on success" do
      assert {:ok, {host, token}} = Hosts.create_host("pve-01")
      assert host.name == "pve-01"
      assert host.status == "never_connected"
      assert is_binary(token) and byte_size(token) >= 32
      refute host.token_hash == token
    end

    test "rejects duplicate names" do
      {:ok, _} = Hosts.create_host("pve-01")
      assert {:error, changeset} = Hosts.create_host("pve-01")
      assert %{name: ["has already been taken"]} = errors_on(changeset)
    end
  end

  describe "authenticate/2" do
    test "returns host for valid name+token" do
      {:ok, {host, token}} = Hosts.create_host("pve-01")
      assert {:ok, found} = Hosts.authenticate("pve-01", token)
      assert found.id == host.id
    end

    test "returns :invalid_token for wrong token" do
      {:ok, {_host, _token}} = Hosts.create_host("pve-01")
      assert {:error, :invalid_token} = Hosts.authenticate("pve-01", "wrong")
    end

    test "returns :unknown_host when name does not exist" do
      assert {:error, :unknown_host} = Hosts.authenticate("nope", "whatever")
    end
  end

  describe "mark_online/2 and mark_offline/1" do
    test "mark_online stamps status, last_seen_at, agent_version" do
      {:ok, {host, _}} = Hosts.create_host("pve-01")
      assert {:ok, updated} = Hosts.mark_online(host, "0.1.0")
      assert updated.status == "online"
      assert updated.agent_version == "0.1.0"
      assert updated.last_seen_at != nil
    end

    test "mark_offline sets status to offline" do
      {:ok, {host, _}} = Hosts.create_host("pve-01")
      {:ok, online} = Hosts.mark_online(host, "0.1.0")
      assert {:ok, offline} = Hosts.mark_offline(online)
      assert offline.status == "offline"
    end
  end
end
```

- [ ] **Step 4: Run tests — expect failure**

```bash
cd server && mix test test/server/hosts_test.exs
```

Expected: compile error `Server.Hosts is not available` or similar.

- [ ] **Step 5: Implement the context**

Create `server/lib/server/hosts.ex`:

```elixir
defmodule Server.Hosts do
  @moduledoc "Host registration, authentication, status tracking."

  alias Server.Repo
  alias Server.Schema.Host

  @spec create_host(String.t()) :: {:ok, {Host.t(), String.t()}} | {:error, Ecto.Changeset.t()}
  def create_host(name) do
    token = generate_token()
    hash = Bcrypt.hash_pwd_salt(token)

    %Host{}
    |> Host.create_changeset(%{name: name, token_hash: hash})
    |> Repo.insert()
    |> case do
      {:ok, host} -> {:ok, {host, token}}
      {:error, cs} -> {:error, cs}
    end
  end

  @spec authenticate(String.t(), String.t()) ::
          {:ok, Host.t()} | {:error, :unknown_host | :invalid_token}
  def authenticate(name, token) when is_binary(name) and is_binary(token) do
    case Repo.get_by(Host, name: name) do
      nil ->
        # Burn a comparable amount of time so unknown hosts are not detectable by timing.
        Bcrypt.no_user_verify()
        {:error, :unknown_host}

      host ->
        if Bcrypt.verify_pass(token, host.token_hash) do
          {:ok, host}
        else
          {:error, :invalid_token}
        end
    end
  end

  @spec mark_online(Host.t(), String.t() | nil) :: {:ok, Host.t()} | {:error, Ecto.Changeset.t()}
  def mark_online(%Host{} = host, agent_version) do
    host
    |> Host.status_changeset(%{
      status: "online",
      last_seen_at: DateTime.utc_now(),
      agent_version: agent_version
    })
    |> Repo.update()
  end

  @spec mark_offline(Host.t()) :: {:ok, Host.t()} | {:error, Ecto.Changeset.t()}
  def mark_offline(%Host{} = host) do
    host
    |> Host.status_changeset(%{status: "offline"})
    |> Repo.update()
  end

  @doc "Mark every host offline — called on server boot to clear stale online flags."
  @spec mark_all_offline() :: {integer(), nil}
  def mark_all_offline do
    import Ecto.Query
    Repo.update_all(from(h in Host), set: [status: "offline", updated_at: DateTime.utc_now()])
  end

  defp generate_token do
    :crypto.strong_rand_bytes(32) |> Base.url_encode64(padding: false)
  end
end
```

- [ ] **Step 6: Speed up bcrypt in tests**

In `server/config/test.exs`, add the following at top level (for example at the bottom of the file):

```elixir
config :bcrypt_elixir, :log_rounds, 4
```

- [ ] **Step 7: Run tests — expect all pass**

```bash
cd server && mix ecto.reset && mix test test/server/hosts_test.exs
```

Expected: 7 tests pass.

- [ ] **Step 8: Commit**

```bash
git add server/priv server/lib/server server/test/server server/config/test.exs
git commit -m "feat(server): host schema, context, auth, status transitions"
```

---

## Task 5: Server — AgentSocket + Mark-All-Offline on Boot

**Files:**
- Create: `server/lib/server_web/channels/agent_socket.ex`
- Modify: `server/lib/server_web/endpoint.ex`
- Modify: `server/lib/server/application.ex`

- [ ] **Step 1: Write AgentSocket**

Create `server/lib/server_web/channels/agent_socket.ex`:

```elixir
defmodule ServerWeb.AgentSocket do
  @moduledoc "Entry socket for agents. Actual authentication happens in HostChannel.join/3."
  use Phoenix.Socket

  channel "host:*", ServerWeb.HostChannel

  @impl true
  def connect(_params, socket, _connect_info), do: {:ok, socket}

  @impl true
  def id(_socket), do: nil
end
```

- [ ] **Step 2: Mount the socket in the endpoint**

In `server/lib/server_web/endpoint.ex`, find the existing `socket "/live"` line and add just below it:

```elixir
socket "/socket", ServerWeb.AgentSocket,
  websocket: [timeout: 45_000],
  longpoll: false
```

- [ ] **Step 3: Clear stale online flags on boot**

In `server/lib/server/application.ex`, find the existing `start/2` function. It currently ends with something like:

```elixir
    opts = [strategy: :one_for_one, name: Server.Supervisor]
    Supervisor.start_link(children, opts)
  end
```

Replace those two lines with:

```elixir
    opts = [strategy: :one_for_one, name: Server.Supervisor]
    result = Supervisor.start_link(children, opts)

    with {:ok, _} <- result, do: Server.Hosts.mark_all_offline()

    result
  end
```

Rationale: if the server restarts while agents are connected, their rows keep a stale `online` status. Marking every host offline on boot lets each agent's next channel join flip its row back to `online` cleanly.

- [ ] **Step 4: Compile to verify**

```bash
cd server && mix compile
```

Expected: compiles, with no warnings about an undefined `ServerWeb.HostChannel`. `channel/2` only registers the module name; the module itself is created in the next task.
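The token scheme that the channel join relies on (generate once, store only a hash, verify on join) can be sketched without Ecto or Bcrypt. This is a minimal illustration, not the plan's implementation: `TokenSketch` is a hypothetical module, and a plain SHA-256 digest stands in for `Bcrypt.hash_pwd_salt/1` so that the example runs with the standard library alone. It assumes OTP 25 or later for `:crypto.hash_equals/2`.

```elixir
defmodule TokenSketch do
  # Hypothetical stand-in for Server.Hosts. SHA-256 replaces Bcrypt here
  # purely to keep the sketch dependency-free; it is not a password hash.

  # Same token shape as the plan: 32 random bytes, URL-safe Base64, no padding.
  def generate_token do
    :crypto.strong_rand_bytes(32) |> Base.url_encode64(padding: false)
  end

  # Only this digest would be stored; the plaintext token is shown once.
  def hash_token(token) when is_binary(token), do: :crypto.hash(:sha256, token)

  # Compares the two fixed-length digests in constant time.
  def verify(token, stored_hash) when is_binary(token) do
    :crypto.hash_equals(hash_token(token), stored_hash)
  end
end

token = TokenSketch.generate_token()
stored = TokenSketch.hash_token(token)

IO.inspect(byte_size(token), label: "token length")
IO.inspect(TokenSketch.verify(token, stored), label: "valid token")
IO.inspect(TokenSketch.verify("wrong", stored), label: "wrong token")
```

Because only the digest is kept, a lost token cannot be recovered from the stored value; registering the host again is the only way to obtain a new one.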
- [ ] **Step 5: Commit**

```bash
git add server/lib/server_web/channels/agent_socket.ex server/lib/server_web/endpoint.ex server/lib/server/application.ex
git commit -m "feat(server): agent socket endpoint, clear online status on boot"
```

---

## Task 6: Server — HostChannel (TDD)

**Files:**
- Create: `server/lib/server_web/channels/host_channel.ex`
- Create: `server/test/server_web/channels/host_channel_test.exs`
- Verify (create if missing): `server/test/support/channel_case.ex`

- [ ] **Step 1: Confirm ChannelCase exists**

```bash
ls server/test/support/channel_case.ex
```

Expected: file exists. Depending on the Phoenix version, `mix phx.new` may not generate it. If it is missing, create `ServerWeb.ChannelCase` before continuing, because the tests below depend on it.

- [ ] **Step 2: Write failing channel tests**

Create `server/test/server_web/channels/host_channel_test.exs`:

```elixir
defmodule ServerWeb.HostChannelTest do
  use ServerWeb.ChannelCase, async: false

  alias Server.Hosts
  alias ServerWeb.AgentSocket

  setup do
    {:ok, {host, token}} = Hosts.create_host("pve-01")
    %{host: host, token: token}
  end

  describe "join" do
    test "succeeds with valid token and marks host online", %{host: host, token: token} do
      {:ok, socket} = connect(AgentSocket, %{})

      assert {:ok, _reply, socket} =
               subscribe_and_join(socket, "host:pve-01", %{
                 "token" => token,
                 "agent_version" => "0.1.0"
               })

      assert socket.assigns.host_id == host.id

      reloaded = Server.Repo.reload!(host)
      assert reloaded.status == "online"
      assert reloaded.agent_version == "0.1.0"
      assert reloaded.last_seen_at != nil
    end

    test "rejects invalid token", %{host: _host} do
      {:ok, socket} = connect(AgentSocket, %{})

      assert {:error, %{reason: "invalid_token"}} =
               subscribe_and_join(socket, "host:pve-01", %{
                 "token" => "garbage",
                 "agent_version" => "0.1.0"
               })
    end

    test "rejects unknown host name" do
      {:ok, socket} = connect(AgentSocket, %{})

      assert {:error, %{reason: "unknown_host"}} =
               subscribe_and_join(socket, "host:nope", %{
                 "token" => "x",
                 "agent_version" => "0.1.0"
               })
    end

    test "rejects topic mismatch" do
      {:ok, socket} = connect(AgentSocket, %{})

      assert {:error, %{reason: "bad_topic"}} =
               subscribe_and_join(socket, "host:", %{"token" => "x", "agent_version" => "0.1.0"})
    end
  end

  describe "metric:fast event" do
    setup %{token: token} do
      {:ok, socket} = connect(AgentSocket, %{})

      {:ok, _reply, joined} =
        subscribe_and_join(socket, "host:pve-01", %{
          "token" => token,
          "agent_version" => "0.1.0"
        })

      %{socket: joined}
    end

    test "accepts metric payload and replies :ok", %{socket: socket} do
      ref =
        push(socket, "metric:fast", %{
          "collected_at" => "2026-04-21T12:00:00Z",
          "data" => %{"cpu_percent" => 12.3, "load1" => 0.2}
        })

      assert_reply ref, :ok
    end
  end

  describe "terminate" do
    test "marks host offline when channel process exits", %{host: host, token: token} do
      {:ok, socket} = connect(AgentSocket, %{})

      {:ok, _, joined} =
        subscribe_and_join(socket, "host:pve-01", %{
          "token" => token,
          "agent_version" => "0.1.0"
        })

      Process.unlink(joined.channel_pid)
      ref = Process.monitor(joined.channel_pid)
      close(joined)
      assert_receive {:DOWN, ^ref, :process, _, _}, 1_000

      reloaded = Server.Repo.reload!(host)
      assert reloaded.status == "offline"
    end
  end
end
```

- [ ] **Step 3: Run tests — expect failure (HostChannel not implemented)**

```bash
cd server && mix test test/server_web/channels/host_channel_test.exs
```

Expected: compile error `ServerWeb.HostChannel is not available`.
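The `bad_topic` test above depends on how a topic string is split into its `host:` prefix and the host name. A minimal sketch of that dispatch, using binary pattern matching with a guard, might look as follows. `TopicSketch` is a hypothetical module used only for illustration; it is not part of the server code.

```elixir
defmodule TopicSketch do
  # Hypothetical helper: extracts the host name from a "host:<name>" topic.

  # Matches the literal prefix and binds the remainder; the guard rejects
  # an empty name such as the bare topic "host:".
  def parse("host:" <> name) when name != "", do: {:ok, name}

  # Anything else, including "host:" and unrelated topics, is a bad topic.
  def parse(_topic), do: {:error, :bad_topic}
end

IO.inspect(TopicSketch.parse("host:pve-01"))
IO.inspect(TopicSketch.parse("host:"))
IO.inspect(TopicSketch.parse("room:lobby"))
```

Putting the guarded clause first and a catch-all clause second keeps the rejection path explicit, which is what lets a join on `"host:"` fail with a dedicated reason rather than a function clause error.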
- [ ] **Step 4: Implement HostChannel**

Create `server/lib/server_web/channels/host_channel.ex`:

```elixir
defmodule ServerWeb.HostChannel do
  use ServerWeb, :channel
  require Logger

  alias Server.Hosts

  @impl true
  def join("host:" <> name, params, socket) when name != "" do
    token = Map.get(params, "token", "")
    agent_version = Map.get(params, "agent_version")

    case Hosts.authenticate(name, token) do
      {:ok, host} ->
        {:ok, _} = Hosts.mark_online(host, agent_version)
        Logger.info("agent joined host:#{name}")
        {:ok, socket |> assign(:host_id, host.id) |> assign(:host_name, name)}

      {:error, :unknown_host} ->
        {:error, %{reason: "unknown_host"}}

      {:error, :invalid_token} ->
        {:error, %{reason: "invalid_token"}}
    end
  end

  def join(_topic, _params, _socket), do: {:error, %{reason: "bad_topic"}}

  @impl true
  def handle_in("metric:fast", payload, socket) do
    Logger.info("metric:fast host=#{socket.assigns.host_name} data=#{inspect(payload["data"])}")
    {:reply, :ok, socket}
  end

  def handle_in("metric:medium", payload, socket) do
    Logger.info("metric:medium host=#{socket.assigns.host_name} payload=#{inspect(payload)}")
    {:reply, :ok, socket}
  end

  def handle_in("metric:slow", payload, socket) do
    Logger.info("metric:slow host=#{socket.assigns.host_name} payload=#{inspect(payload)}")
    {:reply, :ok, socket}
  end

  @impl true
  def terminate(_reason, socket) do
    case socket.assigns[:host_id] do
      nil ->
        :ok

      id ->
        with host when not is_nil(host) <- Server.Repo.get(Server.Schema.Host, id) do
          Hosts.mark_offline(host)
        end

        :ok
    end
  end
end
```

- [ ] **Step 5: Run tests — expect pass**

```bash
cd server && mix test test/server_web/channels/host_channel_test.exs
```

Expected: all tests pass.

- [ ] **Step 6: Run full test suite**

```bash
cd server && mix test
```

Expected: all tests green.
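The status lifecycle that the channel and the boot hook implement together (join sets `online`, terminate sets `offline`, boot sweeps everything to `offline`) can be traced with plain maps. This is a sketch under stated assumptions: `StatusSketch` and its in-memory `hosts` map are hypothetical and replace the Ecto-backed `Server.Hosts` functions only for illustration.

```elixir
defmodule StatusSketch do
  # Hypothetical in-memory model: hosts is a map of name => %{status, last_seen_at}.

  # Channel join: flip to online and stamp the time.
  def mark_online(hosts, name, now) do
    Map.update!(hosts, name, &%{&1 | status: "online", last_seen_at: now})
  end

  # Channel terminate: flip to offline, keep the last online stamp.
  def mark_offline(hosts, name) do
    Map.update!(hosts, name, &%{&1 | status: "offline"})
  end

  # Server boot: no agent can be connected yet, so every row becomes offline.
  def mark_all_offline(hosts) do
    Map.new(hosts, fn {name, host} -> {name, %{host | status: "offline"}} end)
  end
end

now = DateTime.utc_now()

hosts = %{
  "pve-01" => %{status: "never_connected", last_seen_at: nil},
  "pve-02" => %{status: "never_connected", last_seen_at: nil}
}

hosts = StatusSketch.mark_online(hosts, "pve-01", now)
IO.inspect(hosts["pve-01"].status, label: "after join")

hosts = StatusSketch.mark_offline(hosts, "pve-01")
IO.inspect(hosts["pve-01"].last_seen_at == now, label: "stamp preserved")

hosts = StatusSketch.mark_all_offline(hosts)
IO.inspect(hosts["pve-02"].status, label: "never-connected host after boot sweep")
```

The last line shows a property worth knowing: an unconditional sweep, like the `update_all` in `mark_all_offline/0`, also moves hosts that never connected from `never_connected` to `offline`.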
- [ ] **Step 7: Commit**

```bash
git add server/lib/server_web/channels/host_channel.ex server/test/server_web/channels/host_channel_test.exs
git commit -m "feat(server): host channel with token auth and metric events"
```

---

## Task 7: Server — Smoke-Test Helper

**Files:**
- Create: `server/lib/server/release.ex` (minimal helper for IEx-driven host creation)

- [ ] **Step 1: Add a tiny release helper**

Create `server/lib/server/release.ex`:

```elixir
defmodule Server.Release do
  @moduledoc "Convenience functions for IEx and future release tasks."

  @doc "Create a host and print the plaintext token once."
  def register_host(name) do
    case Server.Hosts.create_host(name) do
      {:ok, {host, token}} ->
        IO.puts("Host '#{host.name}' registered (id=#{host.id}).")
        IO.puts("TOKEN: #{token}")
        IO.puts("Store this token NOW — it will never be shown again.")
        {:ok, host, token}

      {:error, cs} ->
        IO.puts("Failed to register host: #{inspect(cs.errors)}")
        {:error, cs}
    end
  end
end
```

- [ ] **Step 2: Compile**

```bash
cd server && mix compile
```

- [ ] **Step 3: Commit**

```bash
git add server/lib/server/release.ex
git commit -m "chore(server): iex helper for host registration"
```

---

## Task 8: Agent — Mix Project Bootstrap

**Files:**
- Create: `agent/` directory tree via `mix new`

- [ ] **Step 1: Generate the OTP app**

Run from `/Users/cabele/claudeprojects/proxmox_monitor`:

```bash
mix new agent --sup
```

Expected: creates `agent/` with `mix.exs`, `lib/agent.ex`, `lib/agent/application.ex`, `test/`.

Note: `Agent` is also the name of a module in Elixir's standard library, so `mix new` may reject the name as already taken, and a top-level `defmodule Agent` would shadow the built-in module. If that happens, generate the project with a different module name via `--module` and adjust the module names in the following tasks accordingly.
- [ ] **Step 2: Replace `agent/mix.exs` contents**

Open `agent/mix.exs` and replace with:

```elixir
defmodule Agent.MixProject do
  use Mix.Project

  @version "0.1.0"

  def project do
    [
      app: :agent,
      version: @version,
      elixir: "~> 1.17",
      start_permanent: Mix.env() == :prod,
      deps: deps(),
      elixirc_paths: elixirc_paths(Mix.env())
    ]
  end

  def application do
    [
      extra_applications: [:logger, :crypto],
      mod: {Agent.Application, []}
    ]
  end

  defp deps do
    [
      {:slipstream, "~> 1.1"},
      {:jason, "~> 1.4"},
      {:toml, "~> 0.7"}
    ]
  end

  defp elixirc_paths(:test), do: ["lib", "test/support"]
  defp elixirc_paths(_), do: ["lib"]
end
```

- [ ] **Step 3: Fetch deps and compile**

```bash
cd agent && mix deps.get && mix compile
```

Expected: slipstream, mint_web_socket, jason, toml fetched; compile succeeds.

- [ ] **Step 4: Commit**

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor
git add agent/
git commit -m "feat(agent): otp app scaffold with slipstream + toml deps"
```

---

## Task 9: Agent — Version Constant

**Files:**
- Modify: `agent/lib/agent.ex`

- [ ] **Step 1: Replace the scaffolded Agent module**

Replace the entire contents of `agent/lib/agent.ex` with:

```elixir
defmodule Agent do
  @moduledoc "Top-level namespace. Exposes the compiled version for reporting."
  @version Mix.Project.config()[:version]

  @spec version() :: String.t()
  def version, do: @version
end
```

- [ ] **Step 2: Compile and quick-check in IEx**

```bash
cd agent && mix compile
```

- [ ] **Step 3: Commit**

```bash
git add agent/lib/agent.ex
git commit -m "feat(agent): expose compile-time version"
```

---

## Task 10: Agent — Config Module (TDD)

**Files:**
- Create: `agent/lib/agent/config.ex`
- Create: `agent/test/agent/config_test.exs`
- Create: `agent/test/fixtures/agent.toml` (sample config used by test)

- [ ] **Step 1: Write a fixture config**

Create `agent/test/fixtures/agent.toml`:

```toml
server_url = "wss://monitor.example.com/socket/websocket"
token = "test_token_123"
host_id = "pve-test-01"

[intervals]
fast_seconds = 15
medium_seconds = 120
slow_seconds = 600
```

- [ ] **Step 2: Write failing tests**

Create `agent/test/agent/config_test.exs`:

```elixir
defmodule Agent.ConfigTest do
  use ExUnit.Case, async: true

  alias Agent.Config

  @fixture Path.expand("../fixtures/agent.toml", __DIR__)

  describe "load/1" do
    test "parses required fields" do
      assert {:ok, cfg} = Config.load(@fixture)
      assert cfg.server_url == "wss://monitor.example.com/socket/websocket"
      assert cfg.token == "test_token_123"
      assert cfg.host_id == "pve-test-01"
      assert cfg.fast_seconds == 15
      assert cfg.medium_seconds == 120
      assert cfg.slow_seconds == 600
    end

    test "returns error for missing file" do
      assert {:error, {:file_read, _}} = Config.load("/does/not/exist.toml")
    end

    test "defaults host_id to system hostname when absent" do
      tmp = Path.join(System.tmp_dir!(), "agent_nohost.toml")

      File.write!(tmp, """
      server_url = "wss://x/socket/websocket"
      token = "t"
      """)

      on_exit(fn -> File.rm(tmp) end)

      assert {:ok, cfg} = Config.load(tmp)
      assert is_binary(cfg.host_id)
      assert cfg.host_id != ""
    end

    test "applies default intervals when [intervals] is absent" do
      tmp = Path.join(System.tmp_dir!(), "agent_nointervals.toml")

      File.write!(tmp, """
      server_url = "wss://x/socket/websocket"
      token = "t"
      host_id = "h"
      """)

      on_exit(fn -> File.rm(tmp) end)

      assert {:ok, cfg} = Config.load(tmp)
      assert cfg.fast_seconds == 30
      assert cfg.medium_seconds == 300
      assert cfg.slow_seconds == 1800
    end

    test "returns error when required keys missing" do
      tmp = Path.join(System.tmp_dir!(), "agent_bad.toml")
      File.write!(tmp, "token = \"t\"\n")
      on_exit(fn -> File.rm(tmp) end)

      assert {:error, {:missing_key, :server_url}} = Config.load(tmp)
    end
  end
end
```

- [ ] **Step 3: Run tests — expect failure**

```bash
cd agent && mix test test/agent/config_test.exs
```

Expected: `Agent.Config is not available`.

- [ ] **Step 4: Implement the config loader**

Create `agent/lib/agent/config.ex`:

```elixir
defmodule Agent.Config do
  @moduledoc "Loads and validates the TOML agent config."

  defstruct [
    :server_url,
    :token,
    :host_id,
    fast_seconds: 30,
    medium_seconds: 300,
    slow_seconds: 1800
  ]

  @type t :: %__MODULE__{
          server_url: String.t(),
          token: String.t(),
          host_id: String.t(),
          fast_seconds: pos_integer(),
          medium_seconds: pos_integer(),
          slow_seconds: pos_integer()
        }

  @required ~w(server_url token)a

  @spec load(Path.t()) ::
          {:ok, t()}
          | {:error, {:file_read, term()} | {:parse, term()} | {:missing_key, atom()}}
  def load(path) do
    with {:ok, body} <- read_file(path),
         {:ok, parsed} <- parse_toml(body),
         :ok <- validate_required(parsed) do
      {:ok, build(parsed)}
    end
  end

  defp read_file(path) do
    case File.read(path) do
      {:ok, body} -> {:ok, body}
      {:error, reason} -> {:error, {:file_read, reason}}
    end
  end

  defp parse_toml(body) do
    case Toml.decode(body) do
      {:ok, map} -> {:ok, map}
      {:error, reason} -> {:error, {:parse, reason}}
    end
  end

  # Returns :ok, or an error for the first required key that is missing or empty.
  defp validate_required(map) do
    Enum.find_value(@required, :ok, fn key ->
      case Map.get(map, Atom.to_string(key)) do
        v when is_binary(v) and v != "" -> nil
        _ -> {:error, {:missing_key, key}}
      end
    end)
  end

  defp build(map) do
    intervals = Map.get(map, "intervals", %{})

    %__MODULE__{
      server_url: map["server_url"],
      token: map["token"],
      host_id: map["host_id"] || hostname(),
      fast_seconds: Map.get(intervals, "fast_seconds", 30),
      medium_seconds: Map.get(intervals, "medium_seconds", 300),
      slow_seconds: Map.get(intervals, "slow_seconds", 1800)
    }
  end

  defp hostname do
    case :inet.gethostname() do
      {:ok, name} -> List.to_string(name)
      _ -> "unknown-host"
    end
  end
end
```

- [ ] **Step 5: Run tests — expect pass**

```bash
cd agent && mix test test/agent/config_test.exs
```

Expected: 5 tests pass.

- [ ] **Step 6: Commit**

```bash
git add agent/lib/agent/config.ex agent/test/agent/config_test.exs agent/test/fixtures/agent.toml
git commit -m "feat(agent): toml config loader with defaults and validation"
```

---

## Task 11: Agent — Host Collector (TDD with /proc fixtures)

**Files:**
- Create: `agent/lib/agent/collectors/host.ex`
- Create: `agent/test/agent/collectors/host_test.exs`
- Create: `agent/test/fixtures/proc/loadavg`
- Create: `agent/test/fixtures/proc/meminfo`
- Create: `agent/test/fixtures/proc/uptime`

The collector reads Linux `/proc`. The tests also run on macOS because they point the collector at fixture files instead.
- [ ] **Step 1: Write fixture files**

Create `agent/test/fixtures/proc/loadavg`:

```
0.42 0.55 0.31 3/512 12345
```

Create `agent/test/fixtures/proc/meminfo`:

```
MemTotal:       16384000 kB
MemFree:         2048000 kB
MemAvailable:    8192000 kB
Buffers:          256000 kB
Cached:          4096000 kB
SwapTotal:       4194304 kB
SwapFree:        4194304 kB
```

Create `agent/test/fixtures/proc/uptime`:

```
123456.78 987654.32
```

- [ ] **Step 2: Write failing tests**

Create `agent/test/agent/collectors/host_test.exs`:

```elixir
defmodule Agent.Collectors.HostTest do
  use ExUnit.Case, async: true

  alias Agent.Collectors.Host

  @proc Path.expand("../../fixtures/proc", __DIR__)

  test "collects load average" do
    sample = Host.collect(proc_dir: @proc)
    assert sample.load1 == 0.42
    assert sample.load5 == 0.55
    assert sample.load15 == 0.31
  end

  test "collects memory in bytes" do
    sample = Host.collect(proc_dir: @proc)
    assert sample.mem_total_bytes == 16_384_000 * 1024
    assert sample.mem_available_bytes == 8_192_000 * 1024
    assert sample.mem_used_bytes == sample.mem_total_bytes - sample.mem_available_bytes
  end

  test "collects uptime seconds" do
    sample = Host.collect(proc_dir: @proc)
    assert sample.uptime_seconds == 123_456
  end

  test "includes hostname string" do
    sample = Host.collect(proc_dir: @proc)
    assert is_binary(sample.hostname)
    assert sample.hostname != ""
  end

  test "missing proc files yield :error field, not a crash" do
    sample = Host.collect(proc_dir: "/nonexistent/path/xyz")
    assert sample.errors != []
  end
end
```

- [ ] **Step 3: Run tests — expect failure**

```bash
cd agent && mix test test/agent/collectors/host_test.exs
```

Expected: `Agent.Collectors.Host is not available`.

- [ ] **Step 4: Implement collector**

Create `agent/lib/agent/collectors/host.ex`:

```elixir
defmodule Agent.Collectors.Host do
  @moduledoc """
  Reads host metrics from /proc. Accepts `proc_dir:` option for testability.
  Never raises; on read failure, populates `:errors` and leaves the field nil.
  """

  @type sample :: %{
          hostname: String.t(),
          load1: float() | nil,
          load5: float() | nil,
          load15: float() | nil,
          mem_total_bytes: non_neg_integer() | nil,
          mem_available_bytes: non_neg_integer() | nil,
          mem_used_bytes: non_neg_integer() | nil,
          uptime_seconds: non_neg_integer() | nil,
          errors: [%{source: String.t(), message: String.t()}]
        }

  @spec collect(keyword()) :: sample()
  def collect(opts \\ []) do
    proc_dir = Keyword.get(opts, :proc_dir, "/proc")

    {load, e1} = safe(&read_loadavg/1, [proc_dir], {nil, nil, nil})
    {mem, e2} = safe(&read_meminfo/1, [proc_dir], %{total: nil, available: nil})
    {uptime, e3} = safe(&read_uptime/1, [proc_dir], nil)

    total = mem.total
    avail = mem.available
    used = if total && avail, do: total - avail, else: nil

    {load1, load5, load15} = load

    %{
      hostname: hostname(),
      load1: load1,
      load5: load5,
      load15: load15,
      mem_total_bytes: total,
      mem_available_bytes: avail,
      mem_used_bytes: used,
      uptime_seconds: uptime,
      errors: Enum.filter([e1, e2, e3], & &1)
    }
  end

  # Runs a reader and returns {value, nil} on success or {fallback, error} on failure.
  defp safe(fun, args, fallback) do
    try do
      {apply(fun, args), nil}
    rescue
      e -> {fallback, error_entry(fun, Exception.message(e))}
    catch
      kind, reason -> {fallback, error_entry(fun, "#{kind}: #{inspect(reason)}")}
    end
  end

  # Errors are maps of strings rather than tuples so that the sample stays
  # JSON-encodable when the reporter pushes it (Jason cannot encode tuples).
  defp error_entry(fun, message) do
    %{source: Atom.to_string(fun_name(fun)), message: message}
  end

  defp fun_name(fun), do: Function.info(fun)[:name]

  defp read_loadavg(proc_dir) do
    body = File.read!(Path.join(proc_dir, "loadavg"))
    [l1, l5, l15 | _] = String.split(body, ~r/\s+/, trim: true)
    {to_float(l1), to_float(l5), to_float(l15)}
  end

  defp read_meminfo(proc_dir) do
    body = File.read!(Path.join(proc_dir, "meminfo"))

    parsed =
      body
      |> String.split("\n", trim: true)
      |> Enum.reduce(%{}, fn line, acc ->
        case String.split(line, ~r/:\s+/, parts: 2) do
          [key, val] -> Map.put(acc, key, val)
          _ -> acc
        end
      end)

    %{
      total: kb_to_bytes(parsed["MemTotal"]),
      available: kb_to_bytes(parsed["MemAvailable"])
    }
  end

  defp read_uptime(proc_dir) do
    body = File.read!(Path.join(proc_dir, "uptime"))
    [secs | _] = String.split(body, " ", trim: true)
    secs |> to_float() |> trunc()
  end

  defp kb_to_bytes(nil), do: nil

  defp kb_to_bytes(str) do
    case Regex.run(~r/(\d+)\s*kB/, str) do
      [_, kb] -> String.to_integer(kb) * 1024
      _ -> nil
    end
  end

  defp to_float(s) do
    {f, _} = Float.parse(s)
    f
  end

  defp hostname do
    case :inet.gethostname() do
      {:ok, name} -> List.to_string(name)
      _ -> "unknown-host"
    end
  end
end
```

- [ ] **Step 5: Run tests — expect pass**

```bash
cd agent && mix test test/agent/collectors/host_test.exs
```

Expected: 5 tests pass.

- [ ] **Step 6: Commit**

```bash
git add agent/lib/agent/collectors agent/test/agent/collectors agent/test/fixtures/proc
git commit -m "feat(agent): host collector for /proc loadavg, meminfo, uptime"
```

---

## Task 12: Agent — Reporter (Slipstream Client)

**Files:**
- Create: `agent/lib/agent/reporter.ex`

The Reporter is a Slipstream-backed GenServer. Unit-testing a real WS client is out of scope for Phase 1; coverage comes from the end-to-end smoke test in Task 14.

- [ ] **Step 1: Implement Reporter**

Create `agent/lib/agent/reporter.ex`:

```elixir
defmodule Agent.Reporter do
  @moduledoc """
  Maintains a persistent Phoenix Channel connection to the server, joins
  `host:<host_id>`, and pushes metric samples on the configured fast interval.
  """

  use Slipstream, restart: :permanent
  require Logger

  alias Agent.Collectors.Host

  def start_link(%Agent.Config{} = cfg) do
    Slipstream.start_link(__MODULE__, cfg, name: __MODULE__)
  end

  @impl Slipstream
  def init(cfg) do
    socket =
      new_socket()
      |> assign(:cfg, cfg)
      |> assign(:topic, "host:" <> cfg.host_id)
      |> connect!(uri: cfg.server_url)

    {:ok, socket}
  end

  @impl Slipstream
  def handle_connect(socket) do
    topic = socket.assigns.topic
    cfg = socket.assigns.cfg
    payload = %{"token" => cfg.token, "agent_version" => Agent.version()}
    Logger.info("reporter: connected, joining #{topic}")
    {:ok, join(socket, topic, payload)}
  end

  @impl Slipstream
  def handle_join(topic, _reply, socket) do
    Logger.info("reporter: joined #{topic}")
    {:ok, schedule_collect(socket, 0)}
  end

  @impl Slipstream
  def handle_info(:collect_fast, socket) do
    sample = Host.collect()
    payload = %{collected_at: DateTime.utc_now() |> DateTime.to_iso8601(), data: sample}
    :ok = push_metric(socket, "metric:fast", payload)

    # Slipstream's handle_info/2 follows the GenServer convention and returns :noreply.
    {:noreply, schedule_collect(socket, socket.assigns.cfg.fast_seconds * 1000)}
  end

  @impl Slipstream
  def handle_disconnect(reason, socket) do
    Logger.warning("reporter: disconnected — #{inspect(reason)}; reconnecting")
    reconnect(socket)
  end

  @impl Slipstream
  def handle_topic_close(topic, reason, socket) do
    Logger.warning("reporter: topic #{topic} closed: #{inspect(reason)}; rejoining")
    rejoin(socket, topic)
  end

  # Keeps a single pending timer so that a rejoin after a reconnect does not
  # start a second collection loop alongside the first.
  defp schedule_collect(socket, delay_ms) do
    if timer = socket.assigns[:collect_timer], do: Process.cancel_timer(timer)
    assign(socket, :collect_timer, Process.send_after(self(), :collect_fast, delay_ms))
  end

  defp push_metric(socket, event, payload) do
    case push(socket, socket.assigns.topic, event, payload) do
      {:ok, _ref} ->
        :ok

      {:error, reason} ->
        Logger.warning("reporter: push failed: #{inspect(reason)}")
        :ok
    end
  end
end
```

- [ ] **Step 2: Compile**

```bash
cd agent && mix compile
```

Expected: no errors.
- [ ] **Step 3: Commit**

```bash
git add agent/lib/agent/reporter.ex
git commit -m "feat(agent): slipstream reporter — join, push, auto-reconnect"
```

---

## Task 13: Agent — Application Supervisor

**Files:**
- Modify: `agent/lib/agent/application.ex`
- Create: `agent/config/config.exs`
- Create: `agent/config/runtime.exs`

- [ ] **Step 1: Replace application module**

Replace `agent/lib/agent/application.ex` with:

```elixir
defmodule Agent.Application do
  @moduledoc false
  use Application
  require Logger

  @impl true
  def start(_type, _args) do
    # With no usable config the supervisor starts without children (idle mode).
    children =
      case load_config() do
        {:ok, cfg} ->
          Logger.info("agent: starting with host_id=#{cfg.host_id}")
          [{Agent.Reporter, cfg}]

        {:error, reason} ->
          Logger.error("agent: no config loaded (#{inspect(reason)}); running in idle mode")
          []
      end

    Supervisor.start_link(children, strategy: :one_for_one, name: Agent.Supervisor)
  end

  # Lookup order: AGENT_CONFIG env var, then :config_path app env, then the default path.
  defp load_config do
    path =
      System.get_env("AGENT_CONFIG") ||
        Application.get_env(:agent, :config_path, "/etc/proxmox-monitor/agent.toml")

    if File.exists?(path) do
      Agent.Config.load(path)
    else
      {:error, {:file_missing, path}}
    end
  end
end
```

- [ ] **Step 2: Add minimal compile-time config**

Create `agent/config/config.exs`:

```elixir
import Config

config :logger, :default_formatter, format: "$time [$level] $message\n"

if File.exists?(Path.join([__DIR__, "#{config_env()}.exs"])) do
  import_config "#{config_env()}.exs"
end
```

Create `agent/config/runtime.exs`:

```elixir
import Config

if path = System.get_env("AGENT_CONFIG") do
  config :agent, :config_path, path
end
```

- [ ] **Step 3: Compile and run existing tests**

```bash
cd agent && mix compile && mix test
```

Expected: all tests pass. On cold boot with no config present, the app starts in idle mode (no crash).
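The idle-mode behavior described above relies on a supervisor being able to start with an empty child list. A minimal sketch of that idea, assuming a hypothetical `IdleModeSketch` module in which a sleeping `Task` stands in for `Agent.Reporter`:

```elixir
defmodule IdleModeSketch do
  @moduledoc "Hypothetical illustration: map a config result to supervisor children."

  # A sleeping Task is only a stand-in for the real Reporter child (assumption).
  def children({:ok, _cfg}), do: [{Task, fn -> Process.sleep(:infinity) end}]

  # No config means no children; the supervisor still starts and simply idles.
  def children({:error, _reason}), do: []

  def start(config_result) do
    Supervisor.start_link(children(config_result), strategy: :one_for_one)
  end
end

# Missing config: the supervisor starts with zero workers instead of crashing.
{:ok, idle} = IdleModeSketch.start({:error, {:file_missing, "/etc/proxmox-monitor/agent.toml"}})
IO.inspect(Supervisor.count_children(idle))

# Config present: one worker is supervised.
{:ok, busy} = IdleModeSketch.start({:ok, %{host_id: "pve-dev-01"}})
IO.inspect(Supervisor.count_children(busy))
```

Returning an empty list rather than an error tuple from `start/2` keeps the application bootable on a host that has not been provisioned yet, which is also what allows `mix test` to run without a config file.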
- [ ] **Step 4: Commit**

```bash
git add agent/lib/agent/application.ex agent/config
git commit -m "feat(agent): supervisor boots reporter when config is present"
```

---

## Task 14: End-to-End Smoke Test

**Goal:** Prove that the agent connects to a locally running server, joins the channel, and that the server logs an incoming `metric:fast` payload.

**Files:**
- Create: `/tmp/agent-local.toml` (ad-hoc, not committed)

- [ ] **Step 1: Start the server**

In terminal A:

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor/server
mix ecto.create
mix ecto.migrate
iex -S mix phx.server
```

Expected: `[info] Running ServerWeb.Endpoint with Bandit ... http://localhost:4000`

- [ ] **Step 2: Register a host from the IEx shell in terminal A**

```elixir
iex> Server.Release.register_host("pve-dev-01")
```

Expected output:

```
Host 'pve-dev-01' registered (id=1).
TOKEN: <32+ char string>
Store this token NOW — it will never be shown again.
```

Copy the token for the next step.

- [ ] **Step 3: Write a local agent config**

In terminal B, with the token from the previous step:

```bash
cat > /tmp/agent-local.toml < [...]
```

[...]

```elixir
iex> Server.Repo.get_by(Server.Schema.Host, name: "pve-dev-01") |> Map.take([:status, :agent_version, :last_seen_at])
```

Expected: `%{status: "online", agent_version: "0.1.0", last_seen_at: ~U[...]}`.

- [ ] **Step 7: Verify terminate marks host offline**

Stop the agent in terminal B with `Ctrl+C`, then `a`. Re-run the query from Step 6.

Expected: `status: "offline"`, with `last_seen_at` preserved from the last online stamp.

- [ ] **Step 8: Clean up temp file**

```bash
rm /tmp/agent-local.toml
```

No code changes, so no commit is needed. Phase 1 is functionally complete.

---

## Phase 1 Exit Criteria

- Monorepo with `server/` and `agent/` each building cleanly.
- `cd server && mix test`: all green.
- `cd agent && mix test`: all green.
- Manual smoke test in Task 14: agent joins channel, server logs metrics, host status transitions online→offline on disconnect.
- All commits on `main`.

Next up (Phase 2): metric persistence in SQLite, ZFS collector, VM collector, storage collector. See the roadmap in `proxmox-monitor-konzept.md`.