# Phase 1: Grundgerüst (Skeleton) Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Stand up a minimal agent+server pair where an Elixir agent running locally connects via Phoenix Channels to a Phoenix server, authenticates with a token, and pushes host CPU/RAM metrics every 30 seconds. The server logs the incoming payloads.

**Architecture:** Monorepo with two independent Mix projects (`server/` Phoenix+SQLite, `agent/` plain OTP app using Slipstream). The agent initiates a persistent WSS connection, joins topic `host:<name>`, and pushes `metric:fast` events. The server persists only `hosts` in Phase 1; metric storage lands in Phase 2.

**Tech Stack:** Elixir 1.19 / OTP 28, Phoenix 1.7.14, Ecto + `ecto_sqlite3`, `bcrypt_elixir` (token hashing), `slipstream` (agent Channels client), `toml` (agent config), ExUnit.

---

## File Structure

```
proxmox_monitor/
├── .gitignore
├── README.md
├── proxmox-monitor-konzept.md                          (existing)
├── docs/superpowers/plans/2026-04-21-phase1-grundgeruest.md
│
├── server/                                             (created by mix phx.new)
│   ├── mix.exs                                         modify: add :bcrypt_elixir
│   ├── config/{config,dev,test,runtime}.exs            scaffolded
│   ├── priv/repo/migrations/<ts>_create_hosts.exs      create
│   ├── lib/server/application.ex                       modify: mark hosts offline on boot
│   ├── lib/server/repo.ex                              scaffolded
│   ├── lib/server/schema/host.ex                       create
│   ├── lib/server/hosts.ex                             create (context)
│   ├── lib/server/release.ex                           create (IEx helper)
│   ├── lib/server_web/endpoint.ex                      modify: add agent socket
│   ├── lib/server_web/channels/agent_socket.ex         create
│   ├── lib/server_web/channels/host_channel.ex         create
│   ├── test/server/hosts_test.exs                      create
│   └── test/server_web/channels/host_channel_test.exs  create
│
└── agent/                                              (created by mix new --sup)
    ├── mix.exs                                         modify: deps + app config
    ├── config/config.exs                               create
    ├── config/runtime.exs                              create
    ├── lib/agent.ex                                    scaffolded
    ├── lib/agent/application.ex                        modify
    ├── lib/agent/config.ex                             create
    ├── lib/agent/collectors/host.ex                    create
    ├── lib/agent/reporter.ex                           create
    ├── test/agent/config_test.exs                      create
    ├── test/agent/collectors/host_test.exs             create
    ├── test/fixtures/agent.toml                        create
    └── test/fixtures/proc/                             create (loadavg, meminfo, uptime samples)
```

Each file has one responsibility: schema, context (business logic), channel (transport), collector (data acquisition), reporter (transmission). Test files mirror the source tree.

---

## Task 1: Monorepo Init

**Files:**
- Create: `.gitignore`
- Create: `README.md`

- [ ] **Step 1: Write `.gitignore` (covers both Mix projects)**

```
# Elixir/Mix
/server/_build/
/server/deps/
/server/cover/
/server/doc/
/server/.fetch
/server/erl_crash.dump
/server/*.ez
/server/priv/static/assets/
/server/priv/static/cache_manifest.json
/server/*.db
/server/*.db-journal
/server/*.db-wal
/server/*.db-shm

/agent/_build/
/agent/deps/
/agent/cover/
/agent/doc/
/agent/.fetch
/agent/erl_crash.dump
/agent/*.ez

# Editors / OS
.DS_Store
.vscode/
.idea/
```

- [ ] **Step 2: Write `README.md` (minimal)**

```markdown
# Proxmox Monitor

Agent-Server monitoring for Proxmox hosts. Elixir/OTP. See `proxmox-monitor-konzept.md`.

- `server/`: Phoenix + SQLite + LiveView
- `agent/`: Slipstream Channels client, deploys as Burrito binary

Phase 1 focuses on end-to-end metric push. Later phases add ZFS/VM collectors, persistence, LiveView dashboard.
```

- [ ] **Step 3: Initial commit**

```bash
git add .gitignore README.md proxmox-monitor-konzept.md docs/
git commit -m "chore: project skeleton + phase-1 plan"
```

---

## Task 2: Server - Phoenix Bootstrap

**Files:**
- Create: entire `server/` tree via `mix phx.new`

- [ ] **Step 1: Generate Phoenix project**

Run from `/Users/cabele/claudeprojects/proxmox_monitor`:

```bash
mix phx.new server --database sqlite3 --no-mailer --no-gettext --live --install
```

If prompted, answer `Y` to fetch deps.

Expected: creates `server/` with Phoenix scaffold, SQLite adapter, LiveView enabled, no Gettext, no Mailer. Deps fetched, assets installed.

- [ ] **Step 2: Verify scaffold builds and tests pass**

```bash
cd server && mix compile && mix test
```

Expected: compiles clean, default `PageControllerTest` passes.

- [ ] **Step 3: Commit the scaffold**

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor
git add server/
git commit -m "feat(server): phoenix 1.7 scaffold with sqlite + liveview"
```

---

## Task 3: Server - Bcrypt Dependency

**Files:**
- Modify: `server/mix.exs`

- [ ] **Step 1: Add `:bcrypt_elixir` to deps**

In `server/mix.exs`, locate the `defp deps do` list and add the line below alongside existing entries:

```elixir
{:bcrypt_elixir, "~> 3.1"},
```

- [ ] **Step 2: Fetch and compile**

```bash
cd server && mix deps.get && mix compile
```

Expected: bcrypt_elixir and cc_precompiler fetched; compile succeeds (bcrypt NIF builds).

- [ ] **Step 3: Commit**

```bash
git add server/mix.exs server/mix.lock
git commit -m "feat(server): add bcrypt_elixir for token hashing"
```

---

## Task 4: Server - Host Schema + Context (TDD)

**Files:**
- Create: `server/priv/repo/migrations/<ts>_create_hosts.exs`
- Create: `server/lib/server/schema/host.ex`
- Create: `server/lib/server/hosts.ex`
- Create: `server/test/server/hosts_test.exs`

- [ ] **Step 1: Generate migration file**

```bash
cd server && mix ecto.gen.migration create_hosts
```

Fill the generated file (timestamped name) with:

```elixir
defmodule Server.Repo.Migrations.CreateHosts do
  use Ecto.Migration

  def change do
    create table(:hosts) do
      add :name, :string, null: false
      add :token_hash, :string, null: false
      add :agent_version, :string
      add :proxmox_version, :string
      add :zfs_version, :string
      add :status, :string, null: false, default: "never_connected"
      add :last_seen_at, :utc_datetime_usec

      timestamps(type: :utc_datetime_usec)
    end

    create unique_index(:hosts, [:name])
  end
end
```

- [ ] **Step 2: Write schema module**

Create `server/lib/server/schema/host.ex`:

```elixir
defmodule Server.Schema.Host do
  use Ecto.Schema
  import Ecto.Changeset

  # Referenced as Host.t() in the specs of Server.Hosts.
  @type t :: %__MODULE__{}

  @statuses ~w(never_connected online offline)

  schema "hosts" do
    field :name, :string
    field :token_hash, :string
    field :agent_version, :string
    field :proxmox_version, :string
    field :zfs_version, :string
    field :status, :string, default: "never_connected"
    field :last_seen_at, :utc_datetime_usec

    timestamps(type: :utc_datetime_usec)
  end

  def create_changeset(host, attrs) do
    host
    |> cast(attrs, [:name, :token_hash])
    |> validate_required([:name, :token_hash])
    |> validate_length(:name, min: 1, max: 100)
    |> unique_constraint(:name)
  end

  def status_changeset(host, attrs) do
    host
    |> cast(attrs, [:status, :last_seen_at, :agent_version])
    |> validate_inclusion(:status, @statuses)
  end
end
```

- [ ] **Step 3: Write failing tests for the context**

Create `server/test/server/hosts_test.exs`:

```elixir
defmodule Server.HostsTest do
  use Server.DataCase, async: true

  alias Server.Hosts

  describe "create_host/1" do
    test "returns host and a plaintext token on success" do
      assert {:ok, {host, token}} = Hosts.create_host("pve-01")
      assert host.name == "pve-01"
      assert host.status == "never_connected"
      assert is_binary(token) and byte_size(token) >= 32
      refute host.token_hash == token
    end

    test "rejects duplicate names" do
      {:ok, _} = Hosts.create_host("pve-01")
      assert {:error, changeset} = Hosts.create_host("pve-01")
      assert %{name: ["has already been taken"]} = errors_on(changeset)
    end
  end

  describe "authenticate/2" do
    test "returns host for valid name+token" do
      {:ok, {host, token}} = Hosts.create_host("pve-01")
      assert {:ok, found} = Hosts.authenticate("pve-01", token)
      assert found.id == host.id
    end

    test "returns :invalid_token for wrong token" do
      {:ok, {_host, _token}} = Hosts.create_host("pve-01")
      assert {:error, :invalid_token} = Hosts.authenticate("pve-01", "wrong")
    end

    test "returns :unknown_host when name does not exist" do
      assert {:error, :unknown_host} = Hosts.authenticate("nope", "whatever")
    end
  end

  describe "mark_online/2 and mark_offline/1" do
    test "mark_online stamps status, last_seen_at, agent_version" do
      {:ok, {host, _}} = Hosts.create_host("pve-01")
      assert {:ok, updated} = Hosts.mark_online(host, "0.1.0")
      assert updated.status == "online"
      assert updated.agent_version == "0.1.0"
      assert updated.last_seen_at != nil
    end

    test "mark_offline sets status to offline" do
      {:ok, {host, _}} = Hosts.create_host("pve-01")
      {:ok, online} = Hosts.mark_online(host, "0.1.0")
      assert {:ok, offline} = Hosts.mark_offline(online)
      assert offline.status == "offline"
    end
  end
end
```

- [ ] **Step 4: Run tests (expect failure)**

```bash
cd server && mix test test/server/hosts_test.exs
```

Expected: failures such as `Server.Hosts is not available`, because the context module does not exist yet.

- [ ] **Step 5: Implement the context**

Create `server/lib/server/hosts.ex`:

```elixir
defmodule Server.Hosts do
  @moduledoc "Host registration, authentication, status tracking."

  alias Server.Repo
  alias Server.Schema.Host

  @spec create_host(String.t()) :: {:ok, {Host.t(), String.t()}} | {:error, Ecto.Changeset.t()}
  def create_host(name) do
    token = generate_token()
    hash = Bcrypt.hash_pwd_salt(token)

    %Host{}
    |> Host.create_changeset(%{name: name, token_hash: hash})
    |> Repo.insert()
    |> case do
      {:ok, host} -> {:ok, {host, token}}
      {:error, cs} -> {:error, cs}
    end
  end

  @spec authenticate(String.t(), String.t()) ::
          {:ok, Host.t()} | {:error, :unknown_host | :invalid_token}
  def authenticate(name, token) when is_binary(name) and is_binary(token) do
    case Repo.get_by(Host, name: name) do
      nil ->
        Bcrypt.no_user_verify()
        {:error, :unknown_host}

      host ->
        if Bcrypt.verify_pass(token, host.token_hash) do
          {:ok, host}
        else
          {:error, :invalid_token}
        end
    end
  end

  @spec mark_online(Host.t(), String.t() | nil) :: {:ok, Host.t()} | {:error, Ecto.Changeset.t()}
  def mark_online(%Host{} = host, agent_version) do
    host
    |> Host.status_changeset(%{
      status: "online",
      last_seen_at: DateTime.utc_now(),
      agent_version: agent_version
    })
    |> Repo.update()
  end

  @spec mark_offline(Host.t()) :: {:ok, Host.t()} | {:error, Ecto.Changeset.t()}
  def mark_offline(%Host{} = host) do
    host
    |> Host.status_changeset(%{status: "offline"})
    |> Repo.update()
  end

  @doc "Mark every host offline. Called on server boot to clear stale online flags."
  @spec mark_all_offline() :: {integer(), nil}
  def mark_all_offline do
    import Ecto.Query
    Repo.update_all(from(h in Host), set: [status: "offline", updated_at: DateTime.utc_now()])
  end

  defp generate_token do
    :crypto.strong_rand_bytes(32) |> Base.url_encode64(padding: false)
  end
end
```

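
The token scheme used by `generate_token/0` can be checked in isolation. The sketch below relies only on OTP's `:crypto` and Elixir's `Base`; the module name `TokenSketch` is hypothetical and not part of the plan. It illustrates why the test's `byte_size(token) >= 32` assertion holds: 32 random bytes encode to 43 URL-safe characters once padding is dropped.

```elixir
defmodule TokenSketch do
  @moduledoc "Illustrative only: mirrors the token generation in Server.Hosts."

  # 32 random bytes become 43 characters of unpadded, URL-safe Base64.
  def generate do
    :crypto.strong_rand_bytes(32) |> Base.url_encode64(padding: false)
  end
end

token = TokenSketch.generate()
IO.puts("token length: #{byte_size(token)}")
```

Because the alphabet is URL-safe, such a token can be pasted into a TOML file or a query string without escaping.
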
- [ ] **Step 6: Speed up bcrypt in tests**

In `server/config/test.exs`, add the following line at top level (for example at the bottom of the file):

```elixir
config :bcrypt_elixir, :log_rounds, 4
```

- [ ] **Step 7: Run tests (expect all pass)**

```bash
cd server && mix ecto.reset && mix test test/server/hosts_test.exs
```

Expected: 7 tests pass.

- [ ] **Step 8: Commit**

```bash
git add server/priv server/lib/server server/test/server server/config/test.exs
git commit -m "feat(server): host schema, context, auth, status transitions"
```

---

## Task 5: Server - AgentSocket + Mark-All-Offline on Boot

**Files:**
- Create: `server/lib/server_web/channels/agent_socket.ex`
- Modify: `server/lib/server_web/endpoint.ex`
- Modify: `server/lib/server/application.ex`

- [ ] **Step 1: Write AgentSocket**

Create `server/lib/server_web/channels/agent_socket.ex`:

```elixir
defmodule ServerWeb.AgentSocket do
  @moduledoc "Entry socket for agents. Actual authentication happens in HostChannel.join/3."
  use Phoenix.Socket

  channel "host:*", ServerWeb.HostChannel

  @impl true
  def connect(_params, socket, _connect_info), do: {:ok, socket}

  @impl true
  def id(_socket), do: nil
end
```

- [ ] **Step 2: Mount the socket in the endpoint**

In `server/lib/server_web/endpoint.ex`, find the existing `socket "/live"` line and add just below it:

```elixir
  socket "/socket", ServerWeb.AgentSocket,
    websocket: [timeout: 45_000],
    longpoll: false
```

- [ ] **Step 3: Clear stale online flags on boot**

In `server/lib/server/application.ex`, find the existing `start/2` function. It currently ends with something like:

```elixir
    opts = [strategy: :one_for_one, name: Server.Supervisor]
    Supervisor.start_link(children, opts)
  end
```

Replace the `opts` and `Supervisor.start_link` lines with:

```elixir
    opts = [strategy: :one_for_one, name: Server.Supervisor]
    result = Supervisor.start_link(children, opts)
    with {:ok, _} <- result, do: Server.Hosts.mark_all_offline()
    result
  end
```

Rationale: if the server restarts while agents are connected, their rows keep a stale `online` status. Marking every host offline on boot lets each agent's next channel join flip its row back to `online` cleanly.

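
The shape of that boot hook can be tried without Phoenix. The sketch below is a stand-alone illustration under assumed names (`BootSketch`, `start_fun`, `on_started` are hypothetical): the `with` runs a side effect only when startup returned `{:ok, _}`, and the original result is always handed back unchanged.

```elixir
defmodule BootSketch do
  # Runs `on_started` only when startup succeeded, and always returns the
  # original startup result so the caller sees it unchanged.
  def start(start_fun, on_started) do
    result = start_fun.()
    with {:ok, _pid} <- result, do: on_started.()
    result
  end
end

# The hook runs here because startup succeeded.
IO.inspect(BootSketch.start(fn -> {:ok, self()} end, fn -> IO.puts("hook ran") end))

# The hook is skipped here and the error is returned as-is.
IO.inspect(BootSketch.start(fn -> {:error, :boom} end, fn -> IO.puts("hook ran") end))
```

Returning `result` rather than the value of the `with` matters: the supervisor's `{:ok, pid}` is what the application callback has to return.
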
- [ ] **Step 4: Compile to verify**

```bash
cd server && mix compile
```

Expected: compiles with no warnings about an undefined `ServerWeb.HostChannel`. The module does not exist yet (it is created in the next task), which is acceptable because `channel/2` only registers the module name.

- [ ] **Step 5: Commit**

```bash
git add server/lib/server_web/channels/agent_socket.ex server/lib/server_web/endpoint.ex server/lib/server/application.ex
git commit -m "feat(server): agent socket endpoint, clear online status on boot"
```

---

## Task 6: Server - HostChannel (TDD)

**Files:**
- Create: `server/lib/server_web/channels/host_channel.ex`
- Create: `server/test/server_web/channels/host_channel_test.exs`
- Verify: `server/test/support/channel_case.ex` exists (required by the channel tests)

- [ ] **Step 1: Confirm ChannelCase exists**

```bash
ls server/test/support/channel_case.ex
```

Expected: the file exists. If it is missing, create it before continuing, because `ServerWeb.ChannelCase` is required for the tests below. Depending on the Phoenix version, `mix phx.new` may not generate this file; the `mix phx.gen.channel` generator is expected to add one when it is absent, but verify this against the installed Phoenix version.

- [ ] **Step 2: Write failing channel tests**

Create `server/test/server_web/channels/host_channel_test.exs`:

```elixir
defmodule ServerWeb.HostChannelTest do
  use ServerWeb.ChannelCase, async: false

  alias Server.Hosts
  alias ServerWeb.AgentSocket

  setup do
    {:ok, {host, token}} = Hosts.create_host("pve-01")
    %{host: host, token: token}
  end

  describe "join" do
    test "succeeds with valid token and marks host online", %{host: host, token: token} do
      {:ok, socket} = connect(AgentSocket, %{})

      assert {:ok, _reply, socket} =
               subscribe_and_join(socket, "host:pve-01", %{
                 "token" => token,
                 "agent_version" => "0.1.0"
               })

      assert socket.assigns.host_id == host.id

      reloaded = Server.Repo.reload!(host)
      assert reloaded.status == "online"
      assert reloaded.agent_version == "0.1.0"
      assert reloaded.last_seen_at != nil
    end

    test "rejects invalid token", %{host: _host} do
      {:ok, socket} = connect(AgentSocket, %{})

      assert {:error, %{reason: "invalid_token"}} =
               subscribe_and_join(socket, "host:pve-01", %{
                 "token" => "garbage",
                 "agent_version" => "0.1.0"
               })
    end

    test "rejects unknown host name" do
      {:ok, socket} = connect(AgentSocket, %{})

      assert {:error, %{reason: "unknown_host"}} =
               subscribe_and_join(socket, "host:nope", %{
                 "token" => "x",
                 "agent_version" => "0.1.0"
               })
    end

    test "rejects topic mismatch" do
      {:ok, socket} = connect(AgentSocket, %{})

      assert {:error, %{reason: "bad_topic"}} =
               subscribe_and_join(socket, "host:", %{"token" => "x", "agent_version" => "0.1.0"})
    end
  end

  describe "metric:fast event" do
    setup %{token: token} do
      {:ok, socket} = connect(AgentSocket, %{})

      {:ok, _reply, joined} =
        subscribe_and_join(socket, "host:pve-01", %{
          "token" => token,
          "agent_version" => "0.1.0"
        })

      %{socket: joined}
    end

    test "accepts metric payload and replies :ok", %{socket: socket} do
      ref =
        push(socket, "metric:fast", %{
          "collected_at" => "2026-04-21T12:00:00Z",
          "data" => %{"cpu_percent" => 12.3, "load1" => 0.2}
        })

      assert_reply ref, :ok
    end
  end

  describe "terminate" do
    test "marks host offline when channel process exits", %{host: host, token: token} do
      {:ok, socket} = connect(AgentSocket, %{})

      {:ok, _, joined} =
        subscribe_and_join(socket, "host:pve-01", %{
          "token" => token,
          "agent_version" => "0.1.0"
        })

      Process.unlink(joined.channel_pid)
      ref = Process.monitor(joined.channel_pid)
      close(joined)
      assert_receive {:DOWN, ^ref, :process, _, _}, 1_000

      reloaded = Server.Repo.reload!(host)
      assert reloaded.status == "offline"
    end
  end
end
```

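
The `metric:fast` test above pushes a map with a `collected_at` timestamp and a string-keyed `data` map. A minimal sketch of how an agent might assemble such an envelope from a collector sample follows; `PayloadSketch.build/2` and the sample keys are illustrative assumptions, and only the two top-level keys are taken from the test.

```elixir
defmodule PayloadSketch do
  # Wraps a metrics map in the envelope used by the channel test:
  # an ISO 8601 timestamp plus a string-keyed "data" map.
  def build(sample, %DateTime{} = now) when is_map(sample) do
    %{
      "collected_at" => DateTime.to_iso8601(now),
      "data" => Map.new(sample, fn {key, value} -> {to_string(key), value} end)
    }
  end
end

payload = PayloadSketch.build(%{cpu_percent: 12.3, load1: 0.2}, ~U[2026-04-21 12:00:00Z])
IO.inspect(payload)
```

Passing the timestamp in as an argument, rather than calling `DateTime.utc_now/0` inside, keeps the function deterministic and easy to assert on.
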
- [ ] **Step 3: Run tests (expect failure, HostChannel not implemented)**

```bash
cd server && mix test test/server_web/channels/host_channel_test.exs
```

Expected: failures reporting that `ServerWeb.HostChannel` is not available.

- [ ] **Step 4: Implement HostChannel**

Create `server/lib/server_web/channels/host_channel.ex`:

```elixir
defmodule ServerWeb.HostChannel do
  use ServerWeb, :channel
  require Logger

  alias Server.Hosts

  @impl true
  def join("host:" <> name, params, socket) when name != "" do
    token = Map.get(params, "token", "")
    agent_version = Map.get(params, "agent_version")

    case Hosts.authenticate(name, token) do
      {:ok, host} ->
        {:ok, _} = Hosts.mark_online(host, agent_version)
        Logger.info("agent joined host:#{name}")
        {:ok, assign(socket, :host_id, host.id) |> assign(:host_name, name)}

      {:error, :unknown_host} ->
        {:error, %{reason: "unknown_host"}}

      {:error, :invalid_token} ->
        {:error, %{reason: "invalid_token"}}
    end
  end

  def join(_topic, _params, _socket), do: {:error, %{reason: "bad_topic"}}

  @impl true
  def handle_in("metric:fast", payload, socket) do
    Logger.info("metric:fast host=#{socket.assigns.host_name} data=#{inspect(payload["data"])}")
    {:reply, :ok, socket}
  end

  def handle_in("metric:medium", payload, socket) do
    Logger.info("metric:medium host=#{socket.assigns.host_name} payload=#{inspect(payload)}")
    {:reply, :ok, socket}
  end

  def handle_in("metric:slow", payload, socket) do
    Logger.info("metric:slow host=#{socket.assigns.host_name} payload=#{inspect(payload)}")
    {:reply, :ok, socket}
  end

  @impl true
  def terminate(_reason, socket) do
    case socket.assigns[:host_id] do
      nil ->
        :ok

      id ->
        with host when not is_nil(host) <- Server.Repo.get(Server.Schema.Host, id) do
          Hosts.mark_offline(host)
        end

        :ok
    end
  end
end
```

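
The two `join/3` clauses rely on binary pattern matching plus a guard. The stand-alone sketch below (module and function names are hypothetical) shows the same technique outside Phoenix: the prefix match extracts the host name, and the guard sends an empty name to the catch-all clause, which is what the "rejects topic mismatch" test exercises with the topic `"host:"`.

```elixir
defmodule TopicSketch do
  # Matches "host:<name>" and rejects an empty name via the guard,
  # so "host:" falls through to the catch-all clause.
  def parse("host:" <> name) when name != "", do: {:ok, name}
  def parse(_topic), do: {:error, :bad_topic}
end

IO.inspect(TopicSketch.parse("host:pve-01"))
IO.inspect(TopicSketch.parse("host:"))
```
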
- [ ] **Step 5: Run tests (expect pass)**

```bash
cd server && mix test test/server_web/channels/host_channel_test.exs
```

Expected: all tests pass.

- [ ] **Step 6: Run full test suite**

```bash
cd server && mix test
```

Expected: all tests green.

- [ ] **Step 7: Commit**

```bash
git add server/lib/server_web/channels/host_channel.ex server/test/server_web/channels/host_channel_test.exs
git commit -m "feat(server): host channel with token auth and metric events"
```

---

## Task 7: Server - Smoke-Test Helper

**Files:**
- Create: `server/lib/server/release.ex` (minimal helper for IEx-driven host creation)

- [ ] **Step 1: Add a tiny release helper**

Create `server/lib/server/release.ex`:

```elixir
defmodule Server.Release do
  @moduledoc "Convenience functions for IEx and future release tasks."

  @doc "Create a host and print the plaintext token once."
  def register_host(name) do
    case Server.Hosts.create_host(name) do
      {:ok, {host, token}} ->
        IO.puts("Host '#{host.name}' registered (id=#{host.id}).")
        IO.puts("TOKEN: #{token}")
        IO.puts("Store this token NOW. It will never be shown again.")
        {:ok, host, token}

      {:error, cs} ->
        IO.puts("Failed to register host: #{inspect(cs.errors)}")
        {:error, cs}
    end
  end
end
```

- [ ] **Step 2: Compile**

```bash
cd server && mix compile
```

- [ ] **Step 3: Commit**

```bash
git add server/lib/server/release.ex
git commit -m "chore(server): iex helper for host registration"
```

---

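## Task 8: Agent - Mix Project Bootstrap

**Files:**
- Create: `agent/` directory tree via `mix new`

- [ ] **Step 1: Generate the OTP app**

Run from `/Users/cabele/claudeprojects/proxmox_monitor`:

```bash
mix new agent --sup
```

Expected: creates `agent/` with `mix.exs`, `lib/agent.ex`, `lib/agent/application.ex`, `test/`.

Caution: `Agent` is also a module in Elixir's standard library, and `mix new` checks whether the derived module name is already taken, so this command may be rejected. In that case, pass `--module` with an unused name and adjust the module names in the following tasks accordingly.
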
- [ ] **Step 2: Replace `agent/mix.exs` contents**

Open `agent/mix.exs` and replace with:

```elixir
defmodule Agent.MixProject do
  use Mix.Project

  @version "0.1.0"

  def project do
    [
      app: :agent,
      version: @version,
      elixir: "~> 1.17",
      start_permanent: Mix.env() == :prod,
      deps: deps(),
      elixirc_paths: elixirc_paths(Mix.env())
    ]
  end

  def application do
    [
      extra_applications: [:logger, :crypto],
      mod: {Agent.Application, []}
    ]
  end

  defp deps do
    [
      {:slipstream, "~> 1.1"},
      {:jason, "~> 1.4"},
      {:toml, "~> 0.7"}
    ]
  end

  defp elixirc_paths(:test), do: ["lib", "test/support"]
  defp elixirc_paths(_), do: ["lib"]
end
```

- [ ] **Step 3: Fetch deps and compile**

```bash
cd agent && mix deps.get && mix compile
```

Expected: slipstream, mint_web_socket, jason, toml fetched; compile succeeds.

- [ ] **Step 4: Commit**

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor
git add agent/
git commit -m "feat(agent): otp app scaffold with slipstream + toml deps"
```

---

## Task 9: Agent - Version Constant

**Files:**
- Modify: `agent/lib/agent.ex`

- [ ] **Step 1: Replace the scaffolded Agent module**

Replace the entire contents of `agent/lib/agent.ex` with:

```elixir
defmodule Agent do
  @moduledoc "Top-level namespace. Exposes the compiled version for reporting."

  @version Mix.Project.config()[:version]

  @spec version() :: String.t()
  def version, do: @version
end
```

- [ ] **Step 2: Compile**

```bash
cd agent && mix compile
```

- [ ] **Step 3: Commit**

```bash
git add agent/lib/agent.ex
git commit -m "feat(agent): expose compile-time version"
```

---

## Task 10: Agent - Config Module (TDD)

**Files:**
- Create: `agent/lib/agent/config.ex`
- Create: `agent/test/agent/config_test.exs`
- Create: `agent/test/fixtures/agent.toml` (sample config used by test)

- [ ] **Step 1: Write a fixture config**

Create `agent/test/fixtures/agent.toml`:

```toml
server_url = "wss://monitor.example.com/socket/websocket"
token = "test_token_123"
host_id = "pve-test-01"

[intervals]
fast_seconds = 15
medium_seconds = 120
slow_seconds = 600
```

- [ ] **Step 2: Write failing tests**

Create `agent/test/agent/config_test.exs`:

```elixir
defmodule Agent.ConfigTest do
  use ExUnit.Case, async: true

  alias Agent.Config

  @fixture Path.expand("../fixtures/agent.toml", __DIR__)

  describe "load/1" do
    test "parses required fields" do
      assert {:ok, cfg} = Config.load(@fixture)
      assert cfg.server_url == "wss://monitor.example.com/socket/websocket"
      assert cfg.token == "test_token_123"
      assert cfg.host_id == "pve-test-01"
      assert cfg.fast_seconds == 15
      assert cfg.medium_seconds == 120
      assert cfg.slow_seconds == 600
    end

    test "returns error for missing file" do
      assert {:error, {:file_read, _}} = Config.load("/does/not/exist.toml")
    end

    test "defaults host_id to system hostname when absent" do
      tmp = Path.join(System.tmp_dir!(), "agent_nohost.toml")

      File.write!(tmp, """
      server_url = "wss://x/socket/websocket"
      token = "t"
      """)

      on_exit(fn -> File.rm(tmp) end)

      assert {:ok, cfg} = Config.load(tmp)
      assert is_binary(cfg.host_id)
      assert cfg.host_id != ""
    end

    test "applies default intervals when [intervals] is absent" do
      tmp = Path.join(System.tmp_dir!(), "agent_nointervals.toml")

      File.write!(tmp, """
      server_url = "wss://x/socket/websocket"
      token = "t"
      host_id = "h"
      """)

      on_exit(fn -> File.rm(tmp) end)

      assert {:ok, cfg} = Config.load(tmp)
      assert cfg.fast_seconds == 30
      assert cfg.medium_seconds == 300
      assert cfg.slow_seconds == 1800
    end

    test "returns error when required keys missing" do
      tmp = Path.join(System.tmp_dir!(), "agent_bad.toml")
      File.write!(tmp, "token = \"t\"\n")
      on_exit(fn -> File.rm(tmp) end)
      assert {:error, {:missing_key, :server_url}} = Config.load(tmp)
    end
  end
end
```

- [ ] **Step 3: Run tests (expect failure)**

```bash
cd agent && mix test test/agent/config_test.exs
```

Expected: `Agent.Config is not available`.

- [ ] **Step 4: Implement the config loader**

Create `agent/lib/agent/config.ex`:

```elixir
defmodule Agent.Config do
  @moduledoc "Loads and validates the TOML agent config."

  defstruct [
    :server_url,
    :token,
    :host_id,
    fast_seconds: 30,
    medium_seconds: 300,
    slow_seconds: 1800
  ]

  @type t :: %__MODULE__{
          server_url: String.t(),
          token: String.t(),
          host_id: String.t(),
          fast_seconds: pos_integer(),
          medium_seconds: pos_integer(),
          slow_seconds: pos_integer()
        }

  @required ~w(server_url token)a

  @spec load(Path.t()) ::
          {:ok, t()}
          | {:error, {:file_read, term()} | {:parse, term()} | {:missing_key, atom()}}
  def load(path) do
    with {:ok, body} <- read_file(path),
         {:ok, parsed} <- parse_toml(body),
         :ok <- validate_required(parsed) do
      {:ok, build(parsed)}
    end
  end

  defp read_file(path) do
    case File.read(path) do
      {:ok, body} -> {:ok, body}
      {:error, reason} -> {:error, {:file_read, reason}}
    end
  end

  defp parse_toml(body) do
    case Toml.decode(body) do
      {:ok, map} -> {:ok, map}
      {:error, reason} -> {:error, {:parse, reason}}
    end
  end

  defp validate_required(map) do
    Enum.find_value(@required, :ok, fn key ->
      case Map.get(map, Atom.to_string(key)) do
        v when is_binary(v) and v != "" -> nil
        _ -> {:error, {:missing_key, key}}
      end
    end)
  end

  defp build(map) do
    intervals = Map.get(map, "intervals", %{})

    %__MODULE__{
      server_url: map["server_url"],
      token: map["token"],
      host_id: map["host_id"] || hostname(),
      fast_seconds: Map.get(intervals, "fast_seconds", 30),
      medium_seconds: Map.get(intervals, "medium_seconds", 300),
      slow_seconds: Map.get(intervals, "slow_seconds", 1800)
    }
  end

  defp hostname do
    case :inet.gethostname() do
      {:ok, name} -> List.to_string(name)
      _ -> "unknown-host"
    end
  end
end
```

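
The required-key check leans on `Enum.find_value/3`, which returns the first truthy value produced by the callback, or the given default when every callback result is falsy. The sketch below isolates that behaviour on plain maps so it can be run without the `toml` dependency; the module name `RequiredKeysSketch` is hypothetical.

```elixir
defmodule RequiredKeysSketch do
  @required ~w(server_url token)a

  # Returns :ok when every required key holds a non-empty string;
  # otherwise returns an error tuple for the first offending key.
  def validate(map) do
    Enum.find_value(@required, :ok, fn key ->
      case Map.get(map, Atom.to_string(key)) do
        value when is_binary(value) and value != "" -> nil
        _ -> {:error, {:missing_key, key}}
      end
    end)
  end
end

IO.inspect(RequiredKeysSketch.validate(%{"server_url" => "wss://x", "token" => "t"}))
IO.inspect(RequiredKeysSketch.validate(%{"token" => "t"}))
```

Returning `nil` for a valid key is what lets the search continue; `:ok` is only produced as the default once no key has failed.
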
- [ ] **Step 5: Run tests (expect pass)**

```bash
cd agent && mix test test/agent/config_test.exs
```

Expected: 5 tests pass.

- [ ] **Step 6: Commit**

```bash
git add agent/lib/agent/config.ex agent/test/agent/config_test.exs agent/test/fixtures/agent.toml
git commit -m "feat(agent): toml config loader with defaults and validation"
```

---

## Task 11: Agent - Host Collector (TDD with /proc fixtures)

**Files:**
- Create: `agent/lib/agent/collectors/host.ex`
- Create: `agent/test/agent/collectors/host_test.exs`
- Create: `agent/test/fixtures/proc/loadavg`
- Create: `agent/test/fixtures/proc/meminfo`
- Create: `agent/test/fixtures/proc/uptime`

The collector reads Linux `/proc`. Tests also run on macOS, where they point the collector at fixture files instead.

- [ ] **Step 1: Write fixture files**

Create `agent/test/fixtures/proc/loadavg`:

```
0.42 0.55 0.31 3/512 12345
```

Create `agent/test/fixtures/proc/meminfo`:

```
MemTotal:       16384000 kB
MemFree:         2048000 kB
MemAvailable:    8192000 kB
Buffers:          256000 kB
Cached:          4096000 kB
SwapTotal:       4194304 kB
SwapFree:        4194304 kB
```

Create `agent/test/fixtures/proc/uptime`:

```
123456.78 987654.32
```

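
To make the fixture formats concrete, here is a minimal parsing sketch for the `loadavg` and `uptime` contents shown above. It is an illustration under assumptions, not the collector itself: `ProcParseSketch` and its function names are hypothetical, and it takes the file body as a string so that it runs on any operating system.

```elixir
defmodule ProcParseSketch do
  # The first three whitespace-separated fields of /proc/loadavg are the
  # 1, 5 and 15 minute load averages.
  def loadavg(body) do
    [l1, l5, l15 | _] = String.split(body, ~r/\s+/, trim: true)
    {String.to_float(l1), String.to_float(l5), String.to_float(l15)}
  end

  # The first field of /proc/uptime is seconds since boot as a decimal;
  # the fractional part is dropped to get whole seconds.
  def uptime(body) do
    [up | _] = String.split(body, ~r/\s+/, trim: true)
    {seconds, _rest} = Float.parse(up)
    trunc(seconds)
  end
end

IO.inspect(ProcParseSketch.loadavg("0.42 0.55 0.31 3/512 12345\n"))
IO.inspect(ProcParseSketch.uptime("123456.78 987654.32\n"))
```
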
- [ ] **Step 2: Write failing tests**

Create `agent/test/agent/collectors/host_test.exs`:

```elixir
defmodule Agent.Collectors.HostTest do
  use ExUnit.Case, async: true

  alias Agent.Collectors.Host

  @proc Path.expand("../../fixtures/proc", __DIR__)

  test "collects load average" do
    sample = Host.collect(proc_dir: @proc)
    assert sample.load1 == 0.42
    assert sample.load5 == 0.55
    assert sample.load15 == 0.31
  end

  test "collects memory in bytes" do
    sample = Host.collect(proc_dir: @proc)
    assert sample.mem_total_bytes == 16_384_000 * 1024
    assert sample.mem_available_bytes == 8_192_000 * 1024
    assert sample.mem_used_bytes == sample.mem_total_bytes - sample.mem_available_bytes
  end

  test "collects uptime seconds" do
    sample = Host.collect(proc_dir: @proc)
    assert sample.uptime_seconds == 123_456
  end

  test "includes hostname string" do
    sample = Host.collect(proc_dir: @proc)
    assert is_binary(sample.hostname)
    assert sample.hostname != ""
  end

  test "missing proc files populate the :errors field instead of crashing" do
    sample = Host.collect(proc_dir: "/nonexistent/path/xyz")
    assert sample.errors != []
  end
end
```

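
The memory assertions above expect kilobyte values from `meminfo` to be converted to bytes, with "used" defined as total minus available. One way to sketch that conversion is shown below; `MeminfoSketch` is a hypothetical module for illustration, and the regular expression is an assumption about the `Key: <n> kB` line shape seen in the fixture.

```elixir
defmodule MeminfoSketch do
  # Parses "Key:   <n> kB" lines into a map of key => size in bytes.
  # Lines that do not fit that shape are skipped.
  def parse(body) do
    body
    |> String.split("\n", trim: true)
    |> Enum.reduce(%{}, fn line, acc ->
      case Regex.run(~r/^(\w+):\s+(\d+)\s*kB/, line) do
        [_, key, kb] -> Map.put(acc, key, String.to_integer(kb) * 1024)
        _ -> acc
      end
    end)
  end

  # "Used" here follows the test's definition: total minus available.
  def used_bytes(%{"MemTotal" => total, "MemAvailable" => avail}), do: total - avail
end

parsed = MeminfoSketch.parse("MemTotal: 16384000 kB\nMemAvailable: 8192000 kB\n")
IO.inspect(MeminfoSketch.used_bytes(parsed))
```
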
- [ ] **Step 3: Run tests (expect failure)**

```bash
cd agent && mix test test/agent/collectors/host_test.exs
```

Expected: `Agent.Collectors.Host is not available`.

- [ ] **Step 4: Implement collector**

Create `agent/lib/agent/collectors/host.ex`:

```elixir
defmodule Agent.Collectors.Host do
  @moduledoc """
  Reads host metrics from /proc. Accepts `proc_dir:` option for testability.
  Never raises. On read failure, populates `:errors` and leaves the field nil.
  """

  @type sample :: %{
          hostname: String.t(),
          load1: float() | nil,
          load5: float() | nil,
          load15: float() | nil,
          mem_total_bytes: non_neg_integer() | nil,
          mem_available_bytes: non_neg_integer() | nil,
          mem_used_bytes: non_neg_integer() | nil,
          uptime_seconds: non_neg_integer() | nil,
          errors: [term()]
        }

  @spec collect(keyword()) :: sample()
  def collect(opts \\ []) do
    proc_dir = Keyword.get(opts, :proc_dir, "/proc")

    {load, e1} = safe(&read_loadavg/1, [proc_dir], {nil, nil, nil})
    {mem, e2} = safe(&read_meminfo/1, [proc_dir], %{total: nil, available: nil})
    {uptime, e3} = safe(&read_uptime/1, [proc_dir], nil)

    total = mem.total
    avail = mem.available
    used = if total && avail, do: total - avail, else: nil
    {load1, load5, load15} = load

    %{
      hostname: hostname(),
      load1: load1,
      load5: load5,
      load15: load15,
      mem_total_bytes: total,
      mem_available_bytes: avail,
      mem_used_bytes: used,
      uptime_seconds: uptime,
      errors: Enum.filter([e1, e2, e3], & &1)
    }
  end

  defp safe(fun, args, fallback) do
    try do
      {apply(fun, args), nil}
    rescue
      e -> {fallback, {fun_name(fun), Exception.message(e)}}
    catch
      :error, reason -> {fallback, {fun_name(fun), reason}}
    end
  end

  defp fun_name(fun), do: Function.info(fun)[:name]

  defp read_loadavg(proc_dir) do
    body = File.read!(Path.join(proc_dir, "loadavg"))
    [l1, l5, l15 | _] = String.split(body, ~r/\s+/, trim: true)
|
|
{to_float(l1), to_float(l5), to_float(l15)}
|
|
end
|
|
|
|
defp read_meminfo(proc_dir) do
|
|
body = File.read!(Path.join(proc_dir, "meminfo"))
|
|
|
|
parsed =
|
|
body
|
|
|> String.split("\n", trim: true)
|
|
|> Enum.reduce(%{}, fn line, acc ->
|
|
case String.split(line, ~r/:\s+/, parts: 2) do
|
|
[key, val] -> Map.put(acc, key, val)
|
|
_ -> acc
|
|
end
|
|
end)
|
|
|
|
%{
|
|
total: kb_to_bytes(parsed["MemTotal"]),
|
|
available: kb_to_bytes(parsed["MemAvailable"])
|
|
}
|
|
end
|
|
|
|
defp read_uptime(proc_dir) do
|
|
body = File.read!(Path.join(proc_dir, "uptime"))
|
|
[secs | _] = String.split(body, " ", trim: true)
|
|
secs |> to_float() |> trunc()
|
|
end
|
|
|
|
defp kb_to_bytes(nil), do: nil
|
|
|
|
defp kb_to_bytes(str) do
|
|
case Regex.run(~r/(\d+)\s*kB/, str) do
|
|
[_, kb] -> String.to_integer(kb) * 1024
|
|
_ -> nil
|
|
end
|
|
end
|
|
|
|
defp to_float(s) do
|
|
{f, _} = Float.parse(s)
|
|
f
|
|
end
|
|
|
|
defp hostname do
|
|
case :inet.gethostname() do
|
|
{:ok, name} -> List.to_string(name)
|
|
_ -> "unknown-host"
|
|
end
|
|
end
|
|
end
|
|
```
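
The "never raises" contract can also be looked at in isolation. The following is a minimal sketch, not the collector itself: `SafeReadSketch`, its `safe/3` signature, and the shape of the error map are illustrative choices made for this example.

```elixir
defmodule SafeReadSketch do
  # Runs `fun`; if it raises, returns the fallback plus a description of what
  # went wrong instead of crashing the caller.
  def safe(label, fun, fallback) do
    {fun.(), nil}
  rescue
    e -> {fallback, %{source: label, message: Exception.message(e)}}
  end
end

# A read that cannot succeed: the path below is assumed not to exist.
{value, error} =
  SafeReadSketch.safe("uptime", fn -> File.read!("/nonexistent/uptime") end, nil)

IO.inspect(value)
IO.inspect(error.source)
```

The caller always gets a two-element tuple back, so it can build the sample map unconditionally and collect the non-`nil` errors afterwards.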

- [ ] **Step 5: Run tests (expect pass)**

```bash
cd agent && mix test test/agent/collectors/host_test.exs
```

Expected: 5 tests pass.

- [ ] **Step 6: Commit**

```bash
git add agent/lib/agent/collectors agent/test/agent/collectors agent/test/fixtures/proc
git commit -m "feat(agent): host collector for /proc loadavg, meminfo, uptime"
```

---

## Task 12: Agent — Reporter (Slipstream Client)

**Files:**
- Create: `agent/lib/agent/reporter.ex`

The Reporter is a Slipstream-backed GenServer. Unit-testing a real WebSocket client is out of scope for Phase 1; coverage comes from the end-to-end smoke test in Task 14.

- [ ] **Step 1: Implement Reporter**

Create `agent/lib/agent/reporter.ex`:

```elixir
defmodule Agent.Reporter do
  @moduledoc """
  Maintains a persistent Phoenix Channel connection to the server, joins
  `host:<host_id>`, and pushes metric samples on the configured fast interval.
  """

  use Slipstream, restart: :permanent
  require Logger

  alias Agent.Collectors.Host

  def start_link(%Agent.Config{} = cfg) do
    Slipstream.start_link(__MODULE__, cfg, name: __MODULE__)
  end

  @impl Slipstream
  def init(cfg) do
    socket =
      new_socket()
      |> assign(:cfg, cfg)
      |> assign(:topic, "host:" <> cfg.host_id)
      |> connect!(uri: cfg.server_url)

    {:ok, socket}
  end

  @impl Slipstream
  def handle_connect(socket) do
    topic = socket.assigns.topic
    cfg = socket.assigns.cfg

    payload = %{"token" => cfg.token, "agent_version" => Agent.version()}
    Logger.info("reporter: connected, joining #{topic}")
    {:ok, join(socket, topic, payload)}
  end

  @impl Slipstream
  def handle_join(topic, _reply, socket) do
    Logger.info("reporter: joined #{topic}")

    # Start the collection loop only once. A rejoin after a reconnect must not
    # start a second timer chain next to the one that is already running.
    if socket.assigns[:collecting?] do
      {:ok, socket}
    else
      send(self(), :collect_fast)
      {:ok, assign(socket, :collecting?, true)}
    end
  end

  @impl Slipstream
  def handle_info(:collect_fast, socket) do
    sample = Host.collect()
    payload = %{collected_at: DateTime.utc_now() |> DateTime.to_iso8601(), data: sample}
    :ok = push_metric(socket, "metric:fast", payload)
    Process.send_after(self(), :collect_fast, socket.assigns.cfg.fast_seconds * 1000)
    # handle_info/2 follows the GenServer convention and returns :noreply.
    {:noreply, socket}
  end

  @impl Slipstream
  def handle_disconnect(reason, socket) do
    Logger.warning("reporter: disconnected (#{inspect(reason)}); reconnecting")

    case reconnect(socket) do
      {:ok, socket} -> {:ok, socket}
      {:error, reason} -> {:stop, reason, socket}
    end
  end

  @impl Slipstream
  def handle_topic_close(topic, reason, socket) do
    Logger.warning("reporter: topic #{topic} closed: #{inspect(reason)}; rejoining")

    case rejoin(socket, topic) do
      {:ok, socket} -> {:ok, socket}
      {:error, reason} -> {:stop, reason, socket}
    end
  end

  # A failed push (for example while disconnected) is logged and dropped; the
  # next tick sends a fresh sample.
  defp push_metric(socket, event, payload) do
    case push(socket, socket.assigns.topic, event, payload) do
      {:ok, _ref} ->
        :ok

      {:error, reason} ->
        Logger.warning("reporter: push failed: #{inspect(reason)}")
        :ok
    end
  end
end
```
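
The collect-and-reschedule loop in `handle_info/2` is an ordinary self-rearming timer. The sketch below reproduces that pattern with a plain `GenServer` so it can be run without Slipstream or a server; `TickSketch`, its options, and the `{:sample, time}` message are hypothetical stand-ins.

```elixir
defmodule TickSketch do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    state = %{
      parent: Keyword.fetch!(opts, :parent),
      interval_ms: Keyword.fetch!(opts, :interval_ms)
    }

    # Fire the first tick immediately, as the Reporter does after joining.
    send(self(), :tick)
    {:ok, state}
  end

  @impl true
  def handle_info(:tick, state) do
    # Stand-in for "collect a sample and push it".
    send(state.parent, {:sample, System.monotonic_time(:millisecond)})
    # Re-arm the timer only after the work is done, so ticks never overlap.
    Process.send_after(self(), :tick, state.interval_ms)
    {:noreply, state}
  end
end

{:ok, pid} = TickSketch.start_link(parent: self(), interval_ms: 20)

for _ <- 1..3 do
  receive do
    {:sample, at} -> IO.puts("sample at #{at} ms")
  after
    1_000 -> raise "no sample received"
  end
end

GenServer.stop(pid)
```

Because each tick schedules the next one, sending the initial message twice would create two independent chains, which is why a loop like this should be started exactly once per process.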

- [ ] **Step 2: Compile**

```bash
cd agent && mix compile
```

Expected: no errors.

- [ ] **Step 3: Commit**

```bash
git add agent/lib/agent/reporter.ex
git commit -m "feat(agent): slipstream reporter (join, push, auto-reconnect)"
```

---

## Task 13: Agent — Application Supervisor

**Files:**
- Modify: `agent/lib/agent/application.ex`
- Create: `agent/config/config.exs`
- Create: `agent/config/runtime.exs`

- [ ] **Step 1: Replace application module**

Replace `agent/lib/agent/application.ex` with:

```elixir
defmodule Agent.Application do
  @moduledoc false
  use Application
  require Logger

  @impl true
  def start(_type, _args) do
    # Without a usable config the supervisor still starts, just with no children.
    children =
      case load_config() do
        {:ok, cfg} ->
          Logger.info("agent: starting with host_id=#{cfg.host_id}")
          [{Agent.Reporter, cfg}]

        {:error, reason} ->
          Logger.error("agent: no config loaded (#{inspect(reason)}); running in idle mode")
          []
      end

    Supervisor.start_link(children, strategy: :one_for_one, name: Agent.Supervisor)
  end

  # Path precedence: AGENT_CONFIG env var, then the :config_path app setting,
  # then the system-wide default.
  defp load_config do
    path =
      System.get_env("AGENT_CONFIG") ||
        Application.get_env(:agent, :config_path, "/etc/proxmox-monitor/agent.toml")

    case File.exists?(path) do
      true -> Agent.Config.load(path)
      false -> {:error, {:file_missing, path}}
    end
  end
end
```
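
The lookup order used by `load_config/0` can be written out as a pure function, which makes the precedence easy to check by hand. This is only a sketch: `ConfigPathSketch` and its two arguments are hypothetical and stand in for the environment variable and the application setting.

```elixir
defmodule ConfigPathSketch do
  @default "/etc/proxmox-monitor/agent.toml"

  # `env_value` stands for System.get_env("AGENT_CONFIG"), `app_value` for the
  # :config_path application setting; both are nil when unset.
  def resolve(env_value, app_value), do: env_value || app_value || @default

  # Mirrors the existence check: a missing file becomes an error tuple.
  def check(path) do
    if File.exists?(path), do: {:ok, path}, else: {:error, {:file_missing, path}}
  end
end

IO.inspect(ConfigPathSketch.resolve("/tmp/agent-local.toml", nil))
IO.inspect(ConfigPathSketch.resolve(nil, nil))
IO.inspect(ConfigPathSketch.check("/nonexistent/agent.toml"))
```

Returning an error tuple for a missing file, rather than raising, is what lets the application fall back to idle mode instead of failing to boot.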

- [ ] **Step 2: Add minimal compile-time config**

Create `agent/config/config.exs`:

```elixir
import Config

config :logger, :default_formatter, format: "$time [$level] $message\n"

# Per-environment overrides are optional; import them only if the file exists.
if File.exists?(Path.join([__DIR__, "#{config_env()}.exs"])) do
  import_config "#{config_env()}.exs"
end
```

Create `agent/config/runtime.exs`:

```elixir
import Config

if path = System.get_env("AGENT_CONFIG") do
  config :agent, :config_path, path
end
```

- [ ] **Step 3: Compile and run existing tests**

```bash
cd agent && mix compile && mix test
```

Expected: all tests pass. On cold boot with no config present, the app starts in idle mode (no crash).

- [ ] **Step 4: Commit**

```bash
git add agent/lib/agent/application.ex agent/config
git commit -m "feat(agent): supervisor boots reporter when config is present"
```

---

## Task 14: End-to-End Smoke Test

**Goal:** Prove the agent connects to a locally running server, joins the channel, and the server logs an incoming `metric:fast` payload.

**Files:**
- Create: `/tmp/agent-local.toml` (ad-hoc, not committed)

- [ ] **Step 1: Start the server**

In terminal A (adjust the path to your checkout):

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor/server
mix ecto.create
mix ecto.migrate
iex -S mix phx.server
```

Expected: `[info] Running ServerWeb.Endpoint with Bandit ... http://localhost:4000`

- [ ] **Step 2: Register a host from the IEx shell in terminal A**

```elixir
iex> Server.Release.register_host("pve-dev-01")
```

Expected output:

```
Host 'pve-dev-01' registered (id=1).
TOKEN: <32+ char string>
Store this token NOW — it will never be shown again.
```

Copy the token for the next step.

- [ ] **Step 3: Write a local agent config**

In terminal B, replace `<TOKEN>` with the token from the previous step. The quoted `'EOF'` keeps the shell from expanding `$` or backticks that might appear in the token.

```bash
cat > /tmp/agent-local.toml <<'EOF'
server_url = "ws://localhost:4000/socket/websocket"
token = "<TOKEN>"
host_id = "pve-dev-01"

[intervals]
fast_seconds = 5
medium_seconds = 60
slow_seconds = 300
EOF
```
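
Before starting the agent it can help to sanity-check the values in this file. The sketch below validates a map shaped like the TOML above; it assumes the decoded config is a map with string keys, and `ConfigCheckSketch` and its error atoms are hypothetical names, not part of `Agent.Config`.

```elixir
defmodule ConfigCheckSketch do
  # Returns :ok or {:error, reason} for a map shaped like the TOML file above.
  def validate(%{"server_url" => url, "token" => token, "host_id" => host_id} = cfg) do
    uri = URI.parse(url)
    fast = get_in(cfg, ["intervals", "fast_seconds"])

    cond do
      uri.scheme not in ["ws", "wss"] -> {:error, :bad_scheme}
      token in ["", "<TOKEN>"] -> {:error, :token_not_filled_in}
      host_id == "" -> {:error, :empty_host_id}
      not (is_integer(fast) and fast > 0) -> {:error, :bad_fast_interval}
      true -> :ok
    end
  end

  def validate(_other), do: {:error, :missing_keys}
end

cfg = %{
  "server_url" => "ws://localhost:4000/socket/websocket",
  "token" => "<TOKEN>",
  "host_id" => "pve-dev-01",
  "intervals" => %{"fast_seconds" => 5}
}

# The placeholder token was never replaced, so validation rejects the config.
IO.inspect(ConfigCheckSketch.validate(cfg))
# prints {:error, :token_not_filled_in}
```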

- [ ] **Step 4: Start the agent**

Still in terminal B (adjust the path to your checkout):

```bash
cd /Users/cabele/claudeprojects/proxmox_monitor/agent
AGENT_CONFIG=/tmp/agent-local.toml iex -S mix
```

Expected in terminal B, in this order:

1. `agent: starting with host_id=pve-dev-01`
2. `reporter: connected, joining host:pve-dev-01`
3. `reporter: joined host:pve-dev-01`

- [ ] **Step 5: Observe metrics in terminal A**

Within 5 seconds, terminal A should show:

```
[info] agent joined host:pve-dev-01
[info] metric:fast host=pve-dev-01 data=%{...}
```

The `data=` map contains `hostname`, `load1`/`load5`/`load15`, `mem_*_bytes`, and `uptime_seconds`. On macOS dev machines there is no `/proc`, so the metric fields are `nil` and `errors` is populated. That is expected: this step verifies the network path and the channel protocol, not the metric values.
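
One detail to keep in mind when reading the server log: the agent builds the sample with atom keys, but a payload that travels as JSON is decoded with string keys on the receiving side. The sketch below only approximates that round trip for a flat map; `WireShapeSketch` and the sample values are made up for illustration.

```elixir
defmodule WireShapeSketch do
  # Approximates what JSON encoding and decoding does to a flat sample map:
  # atom keys arrive as strings, while strings, numbers, lists and nil survive.
  def as_received(sample) do
    Map.new(sample, fn {key, value} -> {Atom.to_string(key), value} end)
  end
end

# Hypothetical sample as it might look on a machine without /proc.
sample = %{hostname: "dev-mac", load1: nil, uptime_seconds: nil}

IO.inspect(WireShapeSketch.as_received(sample))
```

Server-side code that pattern-matches on the payload would therefore match on `"hostname"` rather than `:hostname`.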

- [ ] **Step 6: Verify host status in DB**

In the terminal A IEx session:

```elixir
iex> Server.Repo.get_by(Server.Schema.Host, name: "pve-dev-01") |> Map.take([:status, :agent_version, :last_seen_at])
```

Expected: `%{status: "online", agent_version: "0.1.0", last_seen_at: ~U[...]}`.

- [ ] **Step 7: Verify terminate marks host offline**

Stop the agent in terminal B by pressing `Ctrl+C`, then `a` (abort). Re-run the query from Step 6.

Expected: `status: "offline"`, with `last_seen_at` preserved from the last online stamp.

- [ ] **Step 8: Clean up the temp file**

```bash
rm /tmp/agent-local.toml
```

No code changes were made, so no commit is needed. Phase 1 is functionally complete.

---

## Phase 1 Exit Criteria

- Monorepo with `server/` and `agent/` each building clean.
- `cd server && mix test`: all green.
- `cd agent && mix test`: all green.
- Manual smoke test in Task 14: agent joins channel, server logs metrics, host status transitions online→offline on disconnect.
- All commits on `main`.

Next up (Phase 2): metric persistence in SQLite, ZFS collector, VM collector, Storage collector. See roadmap in `proxmox-monitor-konzept.md`.