Phase 1 — Grundgerüst Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Stand up a minimal agent+server pair where an Elixir agent running locally connects via Phoenix Channels to a Phoenix server, authenticates with a token, and pushes host metrics (load average, memory, uptime) every 30 seconds. The server logs the incoming payloads.
Architecture: Monorepo with two independent Mix projects (server/ Phoenix+SQLite, agent/ plain OTP app using Slipstream). Agent initiates a persistent WSS connection, joins topic host:<name>, pushes metric:fast events. Server persists only hosts in Phase 1 — metric storage lands in Phase 2.
Tech Stack: Elixir 1.19 / OTP 28, Phoenix 1.7.14, Ecto + ecto_sqlite3, bcrypt_elixir (token hashing), slipstream (agent Channels client), toml (agent config), ExUnit.
File Structure
proxmox_monitor/
├── .gitignore
├── README.md
├── proxmox-monitor-konzept.md (existing)
├── docs/superpowers/plans/2026-04-21-phase1-grundgeruest.md
│
├── server/ (created by mix phx.new)
│ ├── mix.exs modify: add :bcrypt_elixir
│ ├── config/{config,dev,test,runtime}.exs scaffolded
│ ├── priv/repo/migrations/<ts>_create_hosts.exs create
│ ├── lib/server/application.ex scaffolded
│ ├── lib/server/repo.ex scaffolded
│ ├── lib/server/schema/host.ex create
│ ├── lib/server/hosts.ex create (context)
│ ├── lib/server/release.ex create (IEx helper)
│ ├── lib/server_web/endpoint.ex modify: add agent socket
│ ├── lib/server_web/channels/agent_socket.ex create
│ ├── lib/server_web/channels/host_channel.ex create
│ ├── test/server/hosts_test.exs create
│ ├── test/server_web/channels/host_channel_test.exs create
│ └── test/support/channel_case.ex verify or create
│
└── agent/ (created by mix new --sup)
├── mix.exs modify: deps + app config
├── config/config.exs create
├── config/runtime.exs create
├── lib/agent.ex scaffolded
├── lib/agent/application.ex modify
├── lib/agent/config.ex create
├── lib/agent/collectors/host.ex create
├── lib/agent/reporter.ex create
├── test/agent/config_test.exs create
├── test/agent/collectors/host_test.exs create
├── test/fixtures/agent.toml create (sample config)
└── test/fixtures/proc/ create (loadavg, meminfo, uptime samples)
Each file has one responsibility: schema, context (business logic), channel (transport), collector (data acquisition), reporter (transmission). Test files mirror the source tree.
Task 1: Monorepo Init
Files:
- Create: .gitignore
- Create: README.md
- Step 1: Write .gitignore (covers both Mix projects)
# Elixir/Mix
/server/_build/
/server/deps/
/server/cover/
/server/doc/
/server/.fetch
/server/erl_crash.dump
/server/*.ez
/server/priv/static/assets/
/server/priv/static/cache_manifest.json
/server/*.db
/server/*.db-journal
/server/*.db-wal
/server/*.db-shm
/agent/_build/
/agent/deps/
/agent/cover/
/agent/doc/
/agent/.fetch
/agent/erl_crash.dump
/agent/*.ez
# Editors / OS
.DS_Store
.vscode/
.idea/
- Step 2: Write README.md (minimal)
# Proxmox Monitor
Agent-Server monitoring for Proxmox hosts. Elixir/OTP. See `proxmox-monitor-konzept.md`.
- `server/` — Phoenix + SQLite + LiveView
- `agent/` — Slipstream Channels client, deploys as Burrito binary
Phase 1 focuses on end-to-end metric push. Later phases add ZFS/VM collectors, persistence, LiveView dashboard.
- Step 3: Initial commit
git add .gitignore README.md proxmox-monitor-konzept.md docs/
git commit -m "chore: project skeleton + phase-1 plan"
Task 2: Server — Phoenix Bootstrap
Files:
- Create: entire server/ tree via mix phx.new
- Step 1: Generate Phoenix project
Run from /Users/cabele/claudeprojects/proxmox_monitor:
mix phx.new server --database sqlite3 --no-mailer --no-gettext --live --install
If prompted, answer Y to fetch deps.
Expected: creates server/ with Phoenix scaffold, SQLite adapter, LiveView enabled, no Gettext, no Mailer. Deps fetched, assets installed.
- Step 2: Verify scaffold builds and tests pass
cd server && mix compile && mix test
Expected: compiles clean, default PageControllerTest passes.
- Step 3: Commit the scaffold
cd /Users/cabele/claudeprojects/proxmox_monitor
git add server/
git commit -m "feat(server): phoenix 1.7 scaffold with sqlite + liveview"
Task 3: Server — Bcrypt Dependency
Files:
- Modify: server/mix.exs
- Step 1: Add :bcrypt_elixir to deps
In server/mix.exs, locate the defp deps do list and add the line below alongside existing entries:
{:bcrypt_elixir, "~> 3.1"},
- Step 2: Fetch and compile
cd server && mix deps.get && mix compile
Expected: bcrypt_elixir and its dependencies are fetched and compile succeeds. The bcrypt NIF is compiled from C source, so a C compiler and make must be available.
- Step 3: Commit
git add server/mix.exs server/mix.lock
git commit -m "feat(server): add bcrypt_elixir for token hashing"
Task 4: Server — Host Schema + Context (TDD)
Files:
- Create: server/priv/repo/migrations/<ts>_create_hosts.exs
- Create: server/lib/server/schema/host.ex
- Create: server/lib/server/hosts.ex
- Create: server/test/server/hosts_test.exs
- Step 1: Generate migration file
cd server && mix ecto.gen.migration create_hosts
Fill the generated file (timestamped name) with:
defmodule Server.Repo.Migrations.CreateHosts do
use Ecto.Migration
def change do
create table(:hosts) do
add :name, :string, null: false
add :token_hash, :string, null: false
add :agent_version, :string
add :proxmox_version, :string
add :zfs_version, :string
add :status, :string, null: false, default: "never_connected"
add :last_seen_at, :utc_datetime_usec
timestamps(type: :utc_datetime_usec)
end
create unique_index(:hosts, [:name])
end
end
- Step 2: Write schema module
Create server/lib/server/schema/host.ex:
defmodule Server.Schema.Host do
use Ecto.Schema
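import Ecto.Changeset
# Ecto.Schema does not define t/0; the Host.t() specs in Server.Hosts need it.
@type t :: %__MODULE__{}
@statuses ~w(never_connected online offline)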
schema "hosts" do
field :name, :string
field :token_hash, :string
field :agent_version, :string
field :proxmox_version, :string
field :zfs_version, :string
field :status, :string, default: "never_connected"
field :last_seen_at, :utc_datetime_usec
timestamps(type: :utc_datetime_usec)
end
def create_changeset(host, attrs) do
host
|> cast(attrs, [:name, :token_hash])
|> validate_required([:name, :token_hash])
|> validate_length(:name, min: 1, max: 100)
|> unique_constraint(:name)
end
def status_changeset(host, attrs) do
host
|> cast(attrs, [:status, :last_seen_at, :agent_version])
|> validate_inclusion(:status, @statuses)
end
end
- Step 3: Write failing tests for the context
Create server/test/server/hosts_test.exs:
defmodule Server.HostsTest do
use Server.DataCase, async: true
alias Server.Hosts
describe "create_host/1" do
test "returns host and a plaintext token on success" do
assert {:ok, {host, token}} = Hosts.create_host("pve-01")
assert host.name == "pve-01"
assert host.status == "never_connected"
assert is_binary(token) and byte_size(token) >= 32
refute host.token_hash == token
end
test "rejects duplicate names" do
{:ok, _} = Hosts.create_host("pve-01")
assert {:error, changeset} = Hosts.create_host("pve-01")
assert %{name: ["has already been taken"]} = errors_on(changeset)
end
end
describe "authenticate/2" do
test "returns host for valid name+token" do
{:ok, {host, token}} = Hosts.create_host("pve-01")
assert {:ok, found} = Hosts.authenticate("pve-01", token)
assert found.id == host.id
end
test "returns :invalid_token for wrong token" do
{:ok, {_host, _token}} = Hosts.create_host("pve-01")
assert {:error, :invalid_token} = Hosts.authenticate("pve-01", "wrong")
end
test "returns :unknown_host when name does not exist" do
assert {:error, :unknown_host} = Hosts.authenticate("nope", "whatever")
end
end
describe "mark_online/2 and mark_offline/1" do
test "mark_online stamps status, last_seen_at, agent_version" do
{:ok, {host, _}} = Hosts.create_host("pve-01")
assert {:ok, updated} = Hosts.mark_online(host, "0.1.0")
assert updated.status == "online"
assert updated.agent_version == "0.1.0"
assert updated.last_seen_at != nil
end
test "mark_offline sets status to offline" do
{:ok, {host, _}} = Hosts.create_host("pve-01")
{:ok, online} = Hosts.mark_online(host, "0.1.0")
assert {:ok, offline} = Hosts.mark_offline(online)
assert offline.status == "offline"
end
end
end
- Step 4: Run tests — expect failure
cd server && mix test test/server/hosts_test.exs
Expected: the tests fail with UndefinedFunctionError (module Server.Hosts is not available yet).
- Step 5: Implement the context
Create server/lib/server/hosts.ex:
defmodule Server.Hosts do
@moduledoc "Host registration, authentication, status tracking."
alias Server.Repo
alias Server.Schema.Host
@spec create_host(String.t()) :: {:ok, {Host.t(), String.t()}} | {:error, Ecto.Changeset.t()}
def create_host(name) do
token = generate_token()
hash = Bcrypt.hash_pwd_salt(token)
%Host{}
|> Host.create_changeset(%{name: name, token_hash: hash})
|> Repo.insert()
|> case do
{:ok, host} -> {:ok, {host, token}}
{:error, cs} -> {:error, cs}
end
end
@spec authenticate(String.t(), String.t()) ::
{:ok, Host.t()} | {:error, :unknown_host | :invalid_token}
def authenticate(name, token) when is_binary(name) and is_binary(token) do
case Repo.get_by(Host, name: name) do
nil ->
Bcrypt.no_user_verify()
{:error, :unknown_host}
host ->
if Bcrypt.verify_pass(token, host.token_hash) do
{:ok, host}
else
{:error, :invalid_token}
end
end
end
@spec mark_online(Host.t(), String.t() | nil) :: {:ok, Host.t()} | {:error, Ecto.Changeset.t()}
def mark_online(%Host{} = host, agent_version) do
host
|> Host.status_changeset(%{
status: "online",
last_seen_at: DateTime.utc_now(),
agent_version: agent_version
})
|> Repo.update()
end
@spec mark_offline(Host.t()) :: {:ok, Host.t()} | {:error, Ecto.Changeset.t()}
def mark_offline(%Host{} = host) do
host
|> Host.status_changeset(%{status: "offline"})
|> Repo.update()
end
@doc "Mark every host offline — called on server boot to clear stale online flags."
@spec mark_all_offline() :: {integer(), nil}
def mark_all_offline do
import Ecto.Query
Repo.update_all(from(h in Host), set: [status: "offline", updated_at: DateTime.utc_now()])
end
defp generate_token do
:crypto.strong_rand_bytes(32) |> Base.url_encode64(padding: false)
end
end
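The token format follows directly from generate_token/0: 32 random bytes in URL-safe Base64 without padding always encode to 43 characters (256 bits at 6 bits per character). That is well below bcrypt's 72-byte input limit, so the whole token contributes to the stored hash. A standalone sketch of just the generation step; TokenSketch is a hypothetical name, and the real function stays private inside Server.Hosts:

```elixir
defmodule TokenSketch do
  # Same recipe as Server.Hosts.generate_token/0 (illustrative copy).
  @spec generate_token() :: String.t()
  def generate_token do
    :crypto.strong_rand_bytes(32) |> Base.url_encode64(padding: false)
  end
end

token = TokenSketch.generate_token()
# prints 43: URL-safe characters only, no "=" padding
IO.puts(byte_size(token))
# prints 32: decoding recovers the original random bytes
IO.puts(byte_size(Base.url_decode64!(token, padding: false)))
```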
- Step 6: Speed up bcrypt in tests
In server/config/test.exs, add this line anywhere at top level (for example at the end of the file):
config :bcrypt_elixir, :log_rounds, 4
A cost of 4 is bcrypt's minimum; it keeps hashing fast in tests, while other environments keep the library default.
- Step 7: Run tests — expect all pass
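cd server && mix test test/server/hosts_test.exs
The test alias in the Phoenix scaffold creates and migrates the test database before running, so no separate ecto command is needed.
Expected: 7 tests pass.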
- Step 8: Commit
git add server/priv server/lib/server server/test/server server/config/test.exs
git commit -m "feat(server): host schema, context, auth, status transitions"
Task 5: Server — AgentSocket + Mark-All-Offline on Boot
Files:
- Create: server/lib/server_web/channels/agent_socket.ex
- Modify: server/lib/server_web/endpoint.ex
- Modify: server/lib/server/application.ex
- Step 1: Write AgentSocket
Create server/lib/server_web/channels/agent_socket.ex:
defmodule ServerWeb.AgentSocket do
@moduledoc "Entry socket for agents. Actual authentication happens in HostChannel.join/3."
use Phoenix.Socket
channel "host:*", ServerWeb.HostChannel
@impl true
def connect(_params, socket, _connect_info), do: {:ok, socket}
@impl true
def id(_socket), do: nil
end
- Step 2: Mount the socket in the endpoint
In server/lib/server_web/endpoint.ex, find the existing socket "/live" line and add just below it:
socket "/socket", ServerWeb.AgentSocket,
websocket: [timeout: 45_000],
longpoll: false
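Mounting the socket at "/socket" exposes its WebSocket transport at /socket/websocket, which is the server_url the agent connects to. Assuming the client negotiates Phoenix's 2.0.0 wire protocol (selected by the vsn=2.0.0 query parameter), every message is a JSON array [join_ref, ref, topic, event, payload]. Slipstream handles this framing itself; the sketch below only makes the format visible by building the join frame the agent sends and decoding the phx_reply that HostChannel produces. FrameSketch is a hypothetical name, and the code uses Elixir's built-in JSON module (Elixir 1.18+, covered by the 1.19 target):

```elixir
defmodule FrameSketch do
  # V2 frame layout: [join_ref, ref, topic, event, payload]
  def join_frame(join_ref, ref, topic, payload) do
    JSON.encode!([join_ref, ref, topic, "phx_join", payload])
  end

  # Join replies come back as "phx_reply" with a status and a response map.
  def decode_reply(json) do
    case JSON.decode!(json) do
      [_join_ref, _ref, _topic, "phx_reply", %{"status" => "ok", "response" => resp}] ->
        {:ok, resp}

      [_join_ref, _ref, _topic, "phx_reply", %{"status" => "error", "response" => resp}] ->
        {:error, resp}

      other ->
        {:unexpected, other}
    end
  end
end

FrameSketch.join_frame("1", "1", "host:pve-01", %{"token" => "secret", "agent_version" => "0.1.0"})
|> IO.puts()

reply = ~s(["1","1","host:pve-01","phx_reply",{"status":"error","response":{"reason":"invalid_token"}}])
FrameSketch.decode_reply(reply)
# => {:error, %{"reason" => "invalid_token"}}
```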
- Step 3: Clear stale online flags on boot
In server/lib/server/application.ex, find the existing start/2 function. It currently ends with something like:
opts = [strategy: :one_for_one, name: Server.Supervisor]
Supervisor.start_link(children, opts)
end
Replace those two lines with:
opts = [strategy: :one_for_one, name: Server.Supervisor]
result = Supervisor.start_link(children, opts)
with {:ok, _} <- result, do: Server.Hosts.mark_all_offline()
result
end
Rationale: if the server is restarted while agents were connected, their online row persists stale. Marking everything offline on boot lets the agent's next channel join flip it back to online cleanly.
- Step 4: Compile to verify
cd server && mix compile
Expected: compiles without warnings. ServerWeb.HostChannel does not exist until Task 6, but the channel macro only records the module name, so the compiler does not check it yet.
- Step 5: Commit
git add server/lib/server_web/channels/agent_socket.ex server/lib/server_web/endpoint.ex server/lib/server/application.ex
git commit -m "feat(server): agent socket endpoint, clear online status on boot"
Task 6: Server — HostChannel (TDD)
Files:
- Create: server/lib/server_web/channels/host_channel.ex
- Create: server/test/server_web/channels/host_channel_test.exs
- Verify or create: server/test/support/channel_case.ex
- Step 1: Confirm ChannelCase exists
ls server/test/support/channel_case.ex
Expected: the file exists. Depending on the Phoenix version, mix phx.new may not generate it. If it is missing, create it by hand, modeled on test/support/conn_case.ex: use ExUnit.CaseTemplate; inside the using block import Phoenix.ChannelTest and ServerWeb.ChannelCase and set @endpoint ServerWeb.Endpoint; in setup call Server.DataCase.setup_sandbox(tags). The tests below depend on it.
- Step 2: Write failing channel tests
Create server/test/server_web/channels/host_channel_test.exs:
defmodule ServerWeb.HostChannelTest do
use ServerWeb.ChannelCase, async: false
alias Server.Hosts
alias ServerWeb.AgentSocket
setup do
{:ok, {host, token}} = Hosts.create_host("pve-01")
%{host: host, token: token}
end
describe "join" do
test "succeeds with valid token and marks host online", %{host: host, token: token} do
{:ok, socket} = connect(AgentSocket, %{})
assert {:ok, _reply, socket} =
subscribe_and_join(socket, "host:pve-01", %{
"token" => token,
"agent_version" => "0.1.0"
})
assert socket.assigns.host_id == host.id
reloaded = Server.Repo.reload!(host)
assert reloaded.status == "online"
assert reloaded.agent_version == "0.1.0"
assert reloaded.last_seen_at != nil
end
test "rejects invalid token", %{host: _host} do
{:ok, socket} = connect(AgentSocket, %{})
assert {:error, %{reason: "invalid_token"}} =
subscribe_and_join(socket, "host:pve-01", %{
"token" => "garbage",
"agent_version" => "0.1.0"
})
end
test "rejects unknown host name" do
{:ok, socket} = connect(AgentSocket, %{})
assert {:error, %{reason: "unknown_host"}} =
subscribe_and_join(socket, "host:nope", %{
"token" => "x",
"agent_version" => "0.1.0"
})
end
test "rejects topic mismatch" do
{:ok, socket} = connect(AgentSocket, %{})
assert {:error, %{reason: "bad_topic"}} =
subscribe_and_join(socket, "host:", %{"token" => "x", "agent_version" => "0.1.0"})
end
end
describe "metric:fast event" do
setup %{token: token} do
{:ok, socket} = connect(AgentSocket, %{})
{:ok, _reply, joined} =
subscribe_and_join(socket, "host:pve-01", %{
"token" => token,
"agent_version" => "0.1.0"
})
%{socket: joined}
end
test "accepts metric payload and replies :ok", %{socket: socket} do
ref =
push(socket, "metric:fast", %{
"collected_at" => "2026-04-21T12:00:00Z",
"data" => %{"cpu_percent" => 12.3, "load1" => 0.2}
})
assert_reply ref, :ok
end
end
describe "terminate" do
test "marks host offline when channel process exits", %{host: host, token: token} do
{:ok, socket} = connect(AgentSocket, %{})
{:ok, _, joined} =
subscribe_and_join(socket, "host:pve-01", %{
"token" => token,
"agent_version" => "0.1.0"
})
Process.unlink(joined.channel_pid)
ref = Process.monitor(joined.channel_pid)
close(joined)
assert_receive {:DOWN, ^ref, :process, _, _}, 1_000
reloaded = Server.Repo.reload!(host)
assert reloaded.status == "offline"
end
end
end
- Step 3: Run tests — expect failure (HostChannel not implemented)
cd server && mix test test/server_web/channels/host_channel_test.exs
Expected: the tests fail with UndefinedFunctionError, because ServerWeb.HostChannel does not exist yet.
- Step 4: Implement HostChannel
Create server/lib/server_web/channels/host_channel.ex:
defmodule ServerWeb.HostChannel do
use ServerWeb, :channel
require Logger
alias Server.Hosts
@impl true
def join("host:" <> name, params, socket) when name != "" do
token = Map.get(params, "token", "")
agent_version = Map.get(params, "agent_version")
case Hosts.authenticate(name, token) do
{:ok, host} ->
{:ok, _} = Hosts.mark_online(host, agent_version)
Logger.info("agent joined host:#{name}")
{:ok, assign(socket, :host_id, host.id) |> assign(:host_name, name)}
{:error, :unknown_host} ->
{:error, %{reason: "unknown_host"}}
{:error, :invalid_token} ->
{:error, %{reason: "invalid_token"}}
end
end
def join(_topic, _params, _socket), do: {:error, %{reason: "bad_topic"}}
@impl true
def handle_in("metric:fast", payload, socket) do
Logger.info("metric:fast host=#{socket.assigns.host_name} data=#{inspect(payload["data"])}")
{:reply, :ok, socket}
end
def handle_in("metric:medium", payload, socket) do
Logger.info("metric:medium host=#{socket.assigns.host_name} payload=#{inspect(payload)}")
{:reply, :ok, socket}
end
def handle_in("metric:slow", payload, socket) do
Logger.info("metric:slow host=#{socket.assigns.host_name} payload=#{inspect(payload)}")
{:reply, :ok, socket}
end
@impl true
def terminate(_reason, socket) do
case socket.assigns[:host_id] do
nil ->
:ok
id ->
with host when not is_nil(host) <- Server.Repo.get(Server.Schema.Host, id) do
Hosts.mark_offline(host)
end
:ok
end
end
end
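The join/3 clauses dispatch on the topic string itself: the "host:" <> name binary pattern strips the prefix, and the name != "" guard routes an empty name to the catch-all bad_topic clause. The same matching, isolated from Phoenix so it can be run directly, together with the mapping from Hosts.authenticate/2 errors to the reason strings the tests expect (TopicSketch and its function names are hypothetical):

```elixir
defmodule TopicSketch do
  # Mirrors HostChannel.join/3: extract the host name from "host:<name>".
  def host_name("host:" <> name) when name != "", do: {:ok, name}
  def host_name(_topic), do: {:error, %{reason: "bad_topic"}}

  # Mirrors the case over Hosts.authenticate/2: error atoms become reason strings.
  def join_error(reason) when reason in [:unknown_host, :invalid_token],
    do: {:error, %{reason: Atom.to_string(reason)}}
end

TopicSketch.host_name("host:pve-01")
# => {:ok, "pve-01"}
TopicSketch.host_name("host:")
# => {:error, %{reason: "bad_topic"}}
TopicSketch.join_error(:invalid_token)
# => {:error, %{reason: "invalid_token"}}
```

Everything after the first "host:" prefix is treated as the name, so a topic such as host:a:b would look up a host named a:b.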
- Step 5: Run tests — expect pass
cd server && mix test test/server_web/channels/host_channel_test.exs
Expected: all tests pass.
- Step 6: Run full test suite
cd server && mix test
Expected: all tests green.
- Step 7: Commit
git add server/lib/server_web/channels/host_channel.ex server/test/server_web/channels/host_channel_test.exs
git commit -m "feat(server): host channel with token auth and metric events"
Task 7: Server — Smoke-Test Helper
Files:
- Create: server/lib/server/release.ex (minimal helper for IEx-driven host creation)
- Step 1: Add a tiny release helper
Create server/lib/server/release.ex:
defmodule Server.Release do
@moduledoc "Convenience functions for IEx and future release tasks."
@doc "Create a host and print the plaintext token once."
def register_host(name) do
case Server.Hosts.create_host(name) do
{:ok, {host, token}} ->
IO.puts("Host '#{host.name}' registered (id=#{host.id}).")
IO.puts("TOKEN: #{token}")
IO.puts("Store this token NOW — it will never be shown again.")
{:ok, host, token}
{:error, cs} ->
IO.puts("Failed to register host: #{inspect(cs.errors)}")
{:error, cs}
end
end
end
- Step 2: Compile
cd server && mix compile
- Step 3: Commit
git add server/lib/server/release.ex
git commit -m "chore(server): iex helper for host registration"
Task 8: Agent — Mix Project Bootstrap
Files:
-
Create:
agent/directory tree viamix new -
Step 1: Generate the OTP app
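Run from /Users/cabele/claudeprojects/proxmox_monitor:
mix new agent --sup --module AgentApp
A plain mix new agent --sup aborts: Agent is already a module in Elixir's standard library, and mix new refuses module names that are already taken. The --module flag sidesteps that check. Afterwards, rename AgentApp.Application to Agent.Application in lib/agent/application.ex so it matches the mod: entry in the mix.exs below. Nested names such as Agent.Config and Agent.Reporter are separate modules and do not collide with the standard library; the top-level Agent module written in Task 9 does shadow it (the compiler warns about redefining module Agent).
Expected: creates agent/ with mix.exs, lib/agent.ex, lib/agent/application.ex, test/.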
- Step 2: Replace agent/mix.exs contents
Open agent/mix.exs and replace it with:
defmodule Agent.MixProject do
use Mix.Project
@version "0.1.0"
def project do
[
app: :agent,
version: @version,
elixir: "~> 1.17",
start_permanent: Mix.env() == :prod,
deps: deps(),
elixirc_paths: elixirc_paths(Mix.env())
]
end
def application do
[
extra_applications: [:logger, :crypto],
mod: {Agent.Application, []}
]
end
defp deps do
[
{:slipstream, "~> 1.1"},
{:jason, "~> 1.4"},
{:toml, "~> 0.7"}
]
end
defp elixirc_paths(:test), do: ["lib", "test/support"]
defp elixirc_paths(_), do: ["lib"]
end
- Step 3: Fetch deps and compile
cd agent && mix deps.get && mix compile
Expected: slipstream, mint_web_socket, jason, toml fetched; compile succeeds.
- Step 4: Commit
cd /Users/cabele/claudeprojects/proxmox_monitor
git add agent/
git commit -m "feat(agent): otp app scaffold with slipstream + toml deps"
Task 9: Agent — Version Constant
Files:
-
Modify:
agent/lib/agent.ex -
Step 1: Replace the scaffolded Agent module
Replace the entire contents of agent/lib/agent.ex with:
defmodule Agent do
@moduledoc "Top-level namespace. Exposes the compiled version for reporting."
@version Mix.Project.config()[:version]
@spec version() :: String.t()
def version, do: @version
end
- Step 2: Compile and quick-check in IEx
cd agent && mix compile
- Step 3: Commit
git add agent/lib/agent.ex
git commit -m "feat(agent): expose compile-time version"
Task 10: Agent — Config Module (TDD)
Files:
- Create: agent/lib/agent/config.ex
- Create: agent/test/agent/config_test.exs
- Create: agent/test/fixtures/agent.toml (sample config used by the tests)
- Step 1: Write a fixture config
Create agent/test/fixtures/agent.toml:
server_url = "wss://monitor.example.com/socket/websocket"
token = "test_token_123"
host_id = "pve-test-01"
[intervals]
fast_seconds = 15
medium_seconds = 120
slow_seconds = 600
- Step 2: Write failing tests
Create agent/test/agent/config_test.exs:
defmodule Agent.ConfigTest do
use ExUnit.Case, async: true
alias Agent.Config
@fixture Path.expand("../fixtures/agent.toml", __DIR__)
describe "load/1" do
test "parses required fields" do
assert {:ok, cfg} = Config.load(@fixture)
assert cfg.server_url == "wss://monitor.example.com/socket/websocket"
assert cfg.token == "test_token_123"
assert cfg.host_id == "pve-test-01"
assert cfg.fast_seconds == 15
assert cfg.medium_seconds == 120
assert cfg.slow_seconds == 600
end
test "returns error for missing file" do
assert {:error, {:file_read, _}} = Config.load("/does/not/exist.toml")
end
test "defaults host_id to system hostname when absent" do
tmp = Path.join(System.tmp_dir!(), "agent_nohost.toml")
File.write!(tmp, """
server_url = "wss://x/socket/websocket"
token = "t"
""")
on_exit(fn -> File.rm(tmp) end)
assert {:ok, cfg} = Config.load(tmp)
assert is_binary(cfg.host_id)
assert cfg.host_id != ""
end
test "applies default intervals when [intervals] is absent" do
tmp = Path.join(System.tmp_dir!(), "agent_nointervals.toml")
File.write!(tmp, """
server_url = "wss://x/socket/websocket"
token = "t"
host_id = "h"
""")
on_exit(fn -> File.rm(tmp) end)
assert {:ok, cfg} = Config.load(tmp)
assert cfg.fast_seconds == 30
assert cfg.medium_seconds == 300
assert cfg.slow_seconds == 1800
end
test "returns error when required keys missing" do
tmp = Path.join(System.tmp_dir!(), "agent_bad.toml")
File.write!(tmp, "token = \"t\"\n")
on_exit(fn -> File.rm(tmp) end)
assert {:error, {:missing_key, :server_url}} = Config.load(tmp)
end
end
end
- Step 3: Run tests — expect failure
cd agent && mix test test/agent/config_test.exs
Expected: the tests fail with UndefinedFunctionError (module Agent.Config is not available yet).
- Step 4: Implement the config loader
Create agent/lib/agent/config.ex:
defmodule Agent.Config do
@moduledoc "Loads and validates the TOML agent config."
defstruct [
:server_url,
:token,
:host_id,
fast_seconds: 30,
medium_seconds: 300,
slow_seconds: 1800
]
@type t :: %__MODULE__{
server_url: String.t(),
token: String.t(),
host_id: String.t(),
fast_seconds: pos_integer(),
medium_seconds: pos_integer(),
slow_seconds: pos_integer()
}
@required ~w(server_url token)a
@spec load(Path.t()) ::
{:ok, t()}
| {:error, {:file_read, term()} | {:parse, term()} | {:missing_key, atom()}}
def load(path) do
with {:ok, body} <- read_file(path),
{:ok, parsed} <- parse_toml(body),
:ok <- validate_required(parsed) do
{:ok, build(parsed)}
end
end
defp read_file(path) do
case File.read(path) do
{:ok, body} -> {:ok, body}
{:error, reason} -> {:error, {:file_read, reason}}
end
end
defp parse_toml(body) do
case Toml.decode(body) do
{:ok, map} -> {:ok, map}
{:error, reason} -> {:error, {:parse, reason}}
end
end
defp validate_required(map) do
Enum.find_value(@required, :ok, fn key ->
case Map.get(map, Atom.to_string(key)) do
v when is_binary(v) and v != "" -> nil
_ -> {:error, {:missing_key, key}}
end
end)
end
defp build(map) do
intervals = Map.get(map, "intervals", %{})
%__MODULE__{
server_url: map["server_url"],
token: map["token"],
host_id: map["host_id"] || hostname(),
fast_seconds: Map.get(intervals, "fast_seconds", 30),
medium_seconds: Map.get(intervals, "medium_seconds", 300),
slow_seconds: Map.get(intervals, "slow_seconds", 1800)
}
end
defp hostname do
case :inet.gethostname() do
{:ok, name} -> List.to_string(name)
_ -> "unknown-host"
end
end
end
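Because validation and defaulting operate on the plain map that Toml.decode/1 returns, that logic can be exercised without any TOML file. A sketch reproducing validate_required/1 and the interval defaults from Agent.Config on hand-written maps; ConfigSketch is a hypothetical name, and the real loader additionally reads and parses the file:

```elixir
defmodule ConfigSketch do
  # Same required keys and interval defaults as Agent.Config.
  @required ~w(server_url token)a
  @interval_defaults %{"fast_seconds" => 30, "medium_seconds" => 300, "slow_seconds" => 1800}

  # :ok, or {:error, {:missing_key, key}} for the first required key that is
  # absent, not a string, or empty. Enum.find_value/3 returns :ok if none fails.
  def validate_required(map) do
    Enum.find_value(@required, :ok, fn key ->
      case Map.get(map, Atom.to_string(key)) do
        v when is_binary(v) and v != "" -> nil
        _ -> {:error, {:missing_key, key}}
      end
    end)
  end

  # Values under [intervals] override the defaults; absent keys fall back.
  # Equivalent to the per-key Map.get/3 calls in Agent.Config for the three known keys.
  def intervals(map), do: Map.merge(@interval_defaults, Map.get(map, "intervals", %{}))
end

ConfigSketch.validate_required(%{"token" => "t"})
# => {:error, {:missing_key, :server_url}}
ConfigSketch.validate_required(%{"server_url" => "wss://x/socket/websocket", "token" => "t"})
# => :ok
ConfigSketch.intervals(%{"intervals" => %{"fast_seconds" => 15}})
# => %{"fast_seconds" => 15, "medium_seconds" => 300, "slow_seconds" => 1800}
```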
- Step 5: Run tests — expect pass
cd agent && mix test test/agent/config_test.exs
Expected: 5 tests pass.
- Step 6: Commit
git add agent/lib/agent/config.ex agent/test/agent/config_test.exs agent/test/fixtures/agent.toml
git commit -m "feat(agent): toml config loader with defaults and validation"
Task 11: Agent — Host Collector (TDD with /proc fixtures)
Files:
- Create: agent/lib/agent/collectors/host.ex
- Create: agent/test/agent/collectors/host_test.exs
- Create: agent/test/fixtures/proc/loadavg
- Create: agent/test/fixtures/proc/meminfo
- Create: agent/test/fixtures/proc/uptime
The collector reads Linux /proc. The tests also run on macOS because they point the collector at fixture files instead.
- Step 1: Write fixture files
Create agent/test/fixtures/proc/loadavg:
0.42 0.55 0.31 3/512 12345
Create agent/test/fixtures/proc/meminfo:
MemTotal: 16384000 kB
MemFree: 2048000 kB
MemAvailable: 8192000 kB
Buffers: 256000 kB
Cached: 4096000 kB
SwapTotal: 4194304 kB
SwapFree: 4194304 kB
Create agent/test/fixtures/proc/uptime:
123456.78 987654.32
- Step 2: Write failing tests
Create agent/test/agent/collectors/host_test.exs:
defmodule Agent.Collectors.HostTest do
use ExUnit.Case, async: true
alias Agent.Collectors.Host
@proc Path.expand("../../fixtures/proc", __DIR__)
test "collects load average" do
sample = Host.collect(proc_dir: @proc)
assert sample.load1 == 0.42
assert sample.load5 == 0.55
assert sample.load15 == 0.31
end
test "collects memory in bytes" do
sample = Host.collect(proc_dir: @proc)
assert sample.mem_total_bytes == 16_384_000 * 1024
assert sample.mem_available_bytes == 8_192_000 * 1024
assert sample.mem_used_bytes == sample.mem_total_bytes - sample.mem_available_bytes
end
test "collects uptime seconds" do
sample = Host.collect(proc_dir: @proc)
assert sample.uptime_seconds == 123_456
end
test "includes hostname string" do
sample = Host.collect(proc_dir: @proc)
assert is_binary(sample.hostname)
assert sample.hostname != ""
end
test "missing proc files yield :error field, not a crash" do
sample = Host.collect(proc_dir: "/nonexistent/path/xyz")
assert sample.errors != []
end
end
- Step 3: Run tests — expect failure
cd agent && mix test test/agent/collectors/host_test.exs
Expected: the tests fail with UndefinedFunctionError (module Agent.Collectors.Host is not available yet).
- Step 4: Implement collector
Create agent/lib/agent/collectors/host.ex:
defmodule Agent.Collectors.Host do
@moduledoc """
Reads host metrics from /proc. Accepts `proc_dir:` option for testability.
Never raises — on read failure, populates `:errors` and leaves the field nil.
"""
@type sample :: %{
hostname: String.t(),
load1: float() | nil,
load5: float() | nil,
load15: float() | nil,
mem_total_bytes: non_neg_integer() | nil,
mem_available_bytes: non_neg_integer() | nil,
mem_used_bytes: non_neg_integer() | nil,
uptime_seconds: non_neg_integer() | nil,
errors: [%{source: atom(), message: String.t()}]
}
@spec collect(keyword()) :: sample()
def collect(opts \\ []) do
proc_dir = Keyword.get(opts, :proc_dir, "/proc")
# Each reader is isolated: a failure yields its fallback plus one error entry.
{load, e1} = safe(:loadavg, fn -> read_loadavg(proc_dir) end, {nil, nil, nil})
{mem, e2} = safe(:meminfo, fn -> read_meminfo(proc_dir) end, %{total: nil, available: nil})
{uptime, e3} = safe(:uptime, fn -> read_uptime(proc_dir) end, nil)
total = mem.total
avail = mem.available
used = if total && avail, do: total - avail, else: nil
{load1, load5, load15} = load
%{
hostname: hostname(),
load1: load1,
load5: load5,
load15: load15,
mem_total_bytes: total,
mem_available_bytes: avail,
mem_used_bytes: used,
uptime_seconds: uptime,
errors: Enum.reject([e1, e2, e3], &is_nil/1)
}
end
# Errors are plain maps rather than tuples so the sample stays JSON-encodable
# when the Reporter pushes it (JSON encoders reject tuples). `rescue` catches
# every raised error, including File.Error and MatchError from the readers.
defp safe(source, fun, fallback) do
{fun.(), nil}
rescue
e -> {fallback, %{source: source, message: Exception.message(e)}}
end
defp read_loadavg(proc_dir) do
body = File.read!(Path.join(proc_dir, "loadavg"))
[l1, l5, l15 | _] = String.split(body, ~r/\s+/, trim: true)
{to_float(l1), to_float(l5), to_float(l15)}
end
defp read_meminfo(proc_dir) do
body = File.read!(Path.join(proc_dir, "meminfo"))
parsed =
body
|> String.split("\n", trim: true)
|> Enum.reduce(%{}, fn line, acc ->
case String.split(line, ~r/:\s+/, parts: 2) do
[key, val] -> Map.put(acc, key, val)
_ -> acc
end
end)
%{
total: kb_to_bytes(parsed["MemTotal"]),
available: kb_to_bytes(parsed["MemAvailable"])
}
end
defp read_uptime(proc_dir) do
body = File.read!(Path.join(proc_dir, "uptime"))
[secs | _] = String.split(body, " ", trim: true)
secs |> to_float() |> trunc()
end
defp kb_to_bytes(nil), do: nil
defp kb_to_bytes(str) do
case Regex.run(~r/(\d+)\s*kB/, str) do
[_, kb] -> String.to_integer(kb) * 1024
_ -> nil
end
end
defp to_float(s) do
{f, _} = Float.parse(s)
f
end
defp hostname do
case :inet.gethostname() do
{:ok, name} -> List.to_string(name)
_ -> "unknown-host"
end
end
end
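The parsing rules are easier to see on literal strings. The sketch below applies the same rules as the collector's private readers (whitespace-separated fields for loadavg and uptime, "Key: N kB" lines for meminfo) to file contents passed in as strings. ProcParseSketch is a hypothetical module, not part of the plan's file layout:

```elixir
defmodule ProcParseSketch do
  # /proc/loadavg: "0.42 0.55 0.31 3/512 12345" -> first three fields as floats.
  def loadavg(body) do
    [l1, l5, l15 | _rest] = String.split(body, ~r/\s+/, trim: true)
    {to_float(l1), to_float(l5), to_float(l15)}
  end

  # /proc/meminfo: "MemTotal:   16384000 kB" -> bytes; nil if the key is absent.
  def meminfo_bytes(body, key) do
    pattern = ~r/^#{Regex.escape(key)}:\s+(\d+)\s*kB/

    body
    |> String.split("\n", trim: true)
    |> Enum.find_value(fn line ->
      case Regex.run(pattern, line) do
        [_, kb] -> String.to_integer(kb) * 1024
        nil -> nil
      end
    end)
  end

  # /proc/uptime: "123456.78 987654.32" -> whole seconds since boot.
  def uptime_seconds(body) do
    [secs | _idle] = String.split(body, ~r/\s+/, trim: true)
    secs |> to_float() |> trunc()
  end

  defp to_float(s) do
    {f, _rest} = Float.parse(s)
    f
  end
end

ProcParseSketch.loadavg("0.42 0.55 0.31 3/512 12345\n")
# => {0.42, 0.55, 0.31}
ProcParseSketch.meminfo_bytes("MemTotal:       16384000 kB\nMemAvailable:    8192000 kB\n", "MemAvailable")
# => 8388608000
ProcParseSketch.uptime_seconds("123456.78 987654.32\n")
# => 123456
```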
- Step 5: Run tests — expect pass
cd agent && mix test test/agent/collectors/host_test.exs
Expected: 5 tests pass.
- Step 6: Commit
git add agent/lib/agent/collectors agent/test/agent/collectors agent/test/fixtures/proc
git commit -m "feat(agent): host collector for /proc loadavg, meminfo, uptime"
Task 12: Agent — Reporter (Slipstream Client)
Files:
- Create: agent/lib/agent/reporter.ex
The Reporter is a Slipstream-backed GenServer. Unit-testing a real WebSocket client is out of scope for Phase 1; coverage comes from the end-to-end smoke test in Task 14.
- Step 1: Implement Reporter
Create agent/lib/agent/reporter.ex:
defmodule Agent.Reporter do
@moduledoc """
Maintains a persistent Phoenix Channel connection to the server, joins
`host:<host_id>`, and pushes metric samples on the configured fast interval.
"""
use Slipstream, restart: :permanent
require Logger
alias Agent.Collectors.Host
def start_link(%Agent.Config{} = cfg) do
Slipstream.start_link(__MODULE__, cfg, name: __MODULE__)
end
@impl Slipstream
def init(cfg) do
socket =
new_socket()
|> assign(:cfg, cfg)
|> assign(:topic, "host:" <> cfg.host_id)
|> connect!(uri: cfg.server_url)
{:ok, socket}
end
@impl Slipstream
def handle_connect(socket) do
topic = socket.assigns.topic
cfg = socket.assigns.cfg
payload = %{"token" => cfg.token, "agent_version" => Agent.version()}
Logger.info("reporter: connected, joining #{topic}")
{:ok, join(socket, topic, payload)}
end
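@impl Slipstream
def handle_join(topic, _reply, socket) do
Logger.info("reporter: joined #{topic}")
# Start the collection loop on the first join only. After a reconnect the
# existing timer is still scheduled, so sending :collect_fast again would
# stack a second loop and double the push rate.
if socket.assigns[:loop_started] do
{:ok, socket}
else
send(self(), :collect_fast)
{:ok, assign(socket, :loop_started, true)}
end
end
@impl Slipstream
def handle_info(:collect_fast, socket) do
# While disconnected or rejoining, skip the push but keep the timer alive.
if joined?(socket, socket.assigns.topic) do
sample = Host.collect()
payload = %{collected_at: DateTime.utc_now() |> DateTime.to_iso8601(), data: sample}
:ok = push_metric(socket, "metric:fast", payload)
end
Process.send_after(self(), :collect_fast, socket.assigns.cfg.fast_seconds * 1000)
{:ok, socket}
end
@impl Slipstream
def handle_disconnect(reason, socket) do
Logger.warning("reporter: disconnected (#{inspect(reason)}); reconnecting")
# reconnect/1 returns {:error, reason} when it cannot reconnect; stop in that case.
case reconnect(socket) do
{:ok, socket} -> {:ok, socket}
{:error, reason} -> {:stop, reason, socket}
end
end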
@impl Slipstream
def handle_topic_close(topic, reason, socket) do
Logger.warning("reporter: topic #{topic} closed: #{inspect(reason)}; rejoining")
rejoin(socket, topic)
end
defp push_metric(socket, event, payload) do
case push(socket, socket.assigns.topic, event, payload) do
{:ok, _ref} -> :ok
{:error, reason} ->
Logger.warning("reporter: push failed: #{inspect(reason)}")
:ok
end
end
end
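The reporting loop is a self-rescheduling timer: each :collect_fast message does its work and schedules exactly one successor with Process.send_after/3. The loop must be started only once, because handle_join/3 runs again after every reconnect and each extra send would start a parallel loop. A plain-GenServer sketch of that pattern, runnable without a server; TickLoopSketch and its :joined message are hypothetical stand-ins for the Slipstream callbacks:

```elixir
defmodule TickLoopSketch do
  use GenServer

  def start_link(interval_ms), do: GenServer.start_link(__MODULE__, interval_ms)
  def ticks(pid), do: GenServer.call(pid, :ticks)
  # Simulates Slipstream invoking handle_join/3 (initially and after reconnects).
  def simulate_join(pid), do: send(pid, :joined)

  @impl true
  def init(interval_ms), do: {:ok, %{interval: interval_ms, ticks: 0, loop_started: false}}

  @impl true
  def handle_info(:joined, %{loop_started: true} = state), do: {:noreply, state}

  def handle_info(:joined, state) do
    send(self(), :tick)
    {:noreply, %{state | loop_started: true}}
  end

  def handle_info(:tick, state) do
    # Collect + push would happen here; then schedule exactly one follow-up tick.
    Process.send_after(self(), :tick, state.interval)
    {:noreply, %{state | ticks: state.ticks + 1}}
  end

  @impl true
  def handle_call(:ticks, _from, state), do: {:reply, state.ticks, state}
end

{:ok, pid} = TickLoopSketch.start_link(20)
# Three joins (initial + two reconnects) still start a single loop.
Enum.each(1..3, fn _ -> TickLoopSketch.simulate_join(pid) end)
Process.sleep(110)
# One loop: about one tick per 20 ms, not three times that.
IO.puts(TickLoopSketch.ticks(pid))
```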
- Step 2: Compile
cd agent && mix compile
Expected: compiles without errors.
- Step 3: Commit
git add agent/lib/agent/reporter.ex
git commit -m "feat(agent): slipstream reporter — join, push, auto-reconnect"
Task 13: Agent — Application Supervisor
Files:
- Modify: agent/lib/agent/application.ex
- Create: agent/config/config.exs
- Create: agent/config/runtime.exs
- Step 1: Replace application module
Replace agent/lib/agent/application.ex with:
defmodule Agent.Application do
@moduledoc false
use Application
require Logger
@impl true
def start(_type, _args) do
children =
case load_config() do
{:ok, cfg} ->
Logger.info("agent: starting with host_id=#{cfg.host_id}")
[{Agent.Reporter, cfg}]
{:error, reason} ->
Logger.error("agent: no config loaded (#{inspect(reason)}); running in idle mode")
[]
end
Supervisor.start_link(children, strategy: :one_for_one, name: Agent.Supervisor)
end
defp load_config do
path =
System.get_env("AGENT_CONFIG") ||
Application.get_env(:agent, :config_path, "/etc/proxmox-monitor/agent.toml")
case File.exists?(path) do
true -> Agent.Config.load(path)
false -> {:error, {:file_missing, path}}
end
end
end
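Idle mode relies on the fact that a supervisor with an empty child list starts normally and simply supervises nothing, so a missing config keeps the VM up instead of crashing the application. A runnable sketch of that decision; IdleModeSketch is hypothetical, and a sleeping Task stands in for Agent.Reporter:

```elixir
defmodule IdleModeSketch do
  # Same decision as Agent.Application.start/2: one worker with a config, none without.
  def children({:ok, cfg}), do: [{Task, fn -> placeholder_reporter(cfg) end}]
  def children({:error, _reason}), do: []

  def start(load_result) do
    Supervisor.start_link(children(load_result), strategy: :one_for_one)
  end

  # Hypothetical stand-in for Agent.Reporter: stays alive and does nothing.
  defp placeholder_reporter(_cfg), do: Process.sleep(:infinity)
end

{:ok, idle} = IdleModeSketch.start({:error, {:file_missing, "/etc/proxmox-monitor/agent.toml"}})
Supervisor.count_children(idle)
# => %{active: 0, specs: 0, supervisors: 0, workers: 0}
{:ok, busy} = IdleModeSketch.start({:ok, %{host_id: "pve-dev-01"}})
Supervisor.count_children(busy).active
# => 1
```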
- Step 2: Add minimal compile-time config
Create agent/config/config.exs:
import Config
config :logger, :default_formatter, format: "$time [$level] $message\n"
if File.exists?(Path.join([__DIR__, "#{config_env()}.exs"])) do
import_config "#{config_env()}.exs"
end
Create agent/config/runtime.exs:
import Config
if path = System.get_env("AGENT_CONFIG") do
config :agent, :config_path, path
end
- Step 3: Compile and run existing tests
cd agent && mix compile && mix test
Expected: all tests pass. On cold boot with no config present, the app starts in idle mode (no crash).
- Step 4: Commit
git add agent/lib/agent/application.ex agent/config
git commit -m "feat(agent): supervisor boots reporter when config is present"
Task 14: End-to-End Smoke Test
Goal: Prove the agent connects to a locally-running server, joins the channel, and the server logs an incoming metric:fast payload.
Files:
- Create: /tmp/agent-local.toml (ad hoc, not committed)
- Step 1: Start the server
In terminal A:
cd /Users/cabele/claudeprojects/proxmox_monitor/server
mix ecto.create
mix ecto.migrate
iex -S mix phx.server
Expected: [info] Running ServerWeb.Endpoint with Bandit ... http://localhost:4000
- Step 2: Register a host from the IEx shell in terminal A
iex> Server.Release.register_host("pve-dev-01")
Expected output:
Host 'pve-dev-01' registered (id=1).
TOKEN: <43-character string>
Store this token NOW — it will never be shown again.
Copy the token for the next step.
- Step 3: Write a local agent config
In terminal B, with <TOKEN> from the previous step:
cat > /tmp/agent-local.toml <<EOF
server_url = "ws://localhost:4000/socket/websocket"
token = "<TOKEN>"
host_id = "pve-dev-01"
[intervals]
fast_seconds = 5
medium_seconds = 60
slow_seconds = 300
EOF
- Step 4: Start the agent
Still in terminal B:
cd /Users/cabele/claudeprojects/proxmox_monitor/agent
AGENT_CONFIG=/tmp/agent-local.toml iex -S mix
Expected in terminal B: agent: starting with host_id=pve-dev-01 then reporter: connected, joining host:pve-dev-01 then reporter: joined host:pve-dev-01.
- Step 5: Observe metrics in terminal A
Within 5 seconds, terminal A should show:
[info] agent joined host:pve-dev-01
[info] metric:fast host=pve-dev-01 data=%{...}
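The data= map arrives JSON-decoded, so its keys are strings: "hostname", "load1"/"load5"/"load15", "mem_total_bytes", "mem_available_bytes", "mem_used_bytes", "uptime_seconds" and "errors". On macOS dev machines (no /proc) the load, memory and uptime values are nil and "errors" is populated. That's expected: this step verifies the network path and channel protocol, not the metric values.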
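Whatever lands in the data map must survive JSON encoding on the way to the server. Like Jason (the JSON library in the agent's deps), Elixir's built-in JSON module has no encoder for tuples, so a sample whose errors list holds tuples fails to encode, while maps with atom or string values encode fine. A quick check, assuming Elixir 1.18+ for the JSON module (EncodableSketch is a hypothetical helper):

```elixir
defmodule EncodableSketch do
  # :ok if the term can be JSON-encoded, {:error, message} otherwise.
  def check(term) do
    _json = JSON.encode!(term)
    :ok
  rescue
    e -> {:error, Exception.message(e)}
  end
end

EncodableSketch.check(%{errors: [%{source: :loadavg, message: "could not read file"}]})
# => :ok
EncodableSketch.check(%{errors: [{:loadavg, "could not read file"}]})
# => {:error, message} (no JSON encoder for tuples)
```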
- Step 6: Verify host status in DB
In terminal A IEx:
iex> Server.Repo.get_by(Server.Schema.Host, name: "pve-dev-01") |> Map.take([:status, :agent_version, :last_seen_at])
Expected: %{status: "online", agent_version: "0.1.0", last_seen_at: ~U[...]}.
- Step 7: Verify terminate marks host offline
Stop the agent in terminal B: press Ctrl+C, then a (abort) at the BREAK menu. Re-run the query from Step 6.
Expected: status: "offline", with last_seen_at still holding the timestamp from the last online stamp.
- Step 8: Clean up the temp file
rm /tmp/agent-local.toml
No code changes, so no commit is needed. Phase 1 is functionally complete.
Phase 1 Exit Criteria
- Monorepo with server/ and agent/ each building clean.
- cd server && mix test: all green.
- cd agent && mix test: all green.
- Manual smoke test in Task 14: agent joins the channel, server logs metrics, host status transitions online → offline on disconnect.
- All commits on main.
Next up (Phase 2): metric persistence in SQLite, ZFS collector, VM collector, Storage collector. See roadmap in proxmox-monitor-konzept.md.