61 commits

Author SHA1 Message Date
b798b462ca docs: implementation plan for agent diagnostic dump
Seven tasks: config field, Diagnostics module, Writer GenServer,
end-to-end test, Shell hook, Reporter hook, Application integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 22:13:19 +02:00
b2070b6a39 docs: spec for agent diagnostic dump
Opt-in per-command and per-sample dump to configurable dump_dir.
Config-gated via [debug] dump_dir, no change when unset. Serialized
through a single writer GenServer to avoid interleaving.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 21:57:09 +02:00
28a40a2650 chore(ui,agent): harden collector parsing, drop dead CSS, resilver label
Addresses final code review:
- to_int/1 now returns 0 on nil or unparseable strings instead of crashing
- remove unused .pool-row CSS (superseded by .pool-block)
- clamp capacity bar width to [0, 100] to prevent visual overflow
- pool_scrub_line/1 uses scan_function so resilver shows as "resilver..."
2026-04-22 18:06:17 +02:00
dd992573a1 fix(ui): strengthen scrub assertion, cover degraded vdev render path
Addresses code review: differentiate pool_scrub_line/1 FINISHED clause
with the word "finished", test the degraded-vdev callout via a second
DEGRADED pool in the fixture, and replace the generic "scrub" match
with an assertion on the full finished line.
2026-04-22 18:00:13 +02:00
f05c20ed0b feat(ui): detailed per-pool block with type, capacity bar, scrub state 2026-04-22 17:55:07 +02:00
612091ff1e style(ui): capacity bar and per-pool block styles 2026-04-22 17:51:22 +02:00
041dfc8fc0 test(agent): cover stripe, mixed, and special-vdev pool_type classification 2026-04-22 17:48:45 +02:00
e763ea96bd feat(agent): enrich zpool summary with type, scan state, vdev list 2026-04-22 17:44:07 +02:00
a4f4d3ca51 docs: implementation plan for ZFS pool detail
Four tasks: collector enrichment (pool_type/scan/vdevs), classification
coverage tests, CSS for capacity bar + pool block, LiveView rendering
and test updates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:40:31 +02:00
45f59eb163 docs: spec for ZFS pool detail enrichment
Compact per-pool block with type, capacity bar, used/free/total,
scrub state, and vdev summary. Collector gets pool_type derivation,
scan state, and vdev list — no new shell-outs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:21:06 +02:00
f616d466eb docs: runbook notes assets.deploy runs in release step + troubleshooting entry 2026-04-22 10:19:52 +02:00
50676a7cb8 refactor(ui): minimalistic utilitarian redesign across all views
New design language:
  - dark background, system sans for UI, monospace for data
  - single green accent, amber/red for warn/critical
  - square-bordered panels + tables, no rounded cards or shadows
  - status conveyed via left-border on overview cards + badges

Changes:
  - new app.css defines CSS vars + component classes (.panel, .tbl,
    .card, .btn, .input, .badge with [data-status=*])
  - new ServerWeb.DashboardNav function component for a shared top nav
    with active-link highlighting; replaces per-view navigation clutter
  - strip the Phoenix welcome scaffold (logo, version badge, twitter/GH
    links) from layouts/app.html.heex; leaves only flash + content
  - root.html.heex title suffix switched to 'Proxmox Monitor', body
    loses the Tailwind-white background
  - rewrite render/1 in all four LiveViews + login template to use the
    new classes; admin form now uses <.form for={@form}> and properly
    clears on success
  - login page redesigned to a single tight panel matching the rest

All 58 tests still pass; 'mix compile --warnings-as-errors' is clean.
2026-04-22 10:18:46 +02:00
1b031ecdc3 fix(server): run assets.deploy as a mix release step
Without this, 'mix release' produced a tarball that had app.css/app.js
(so LiveView worked) but was missing cache_manifest.json and the digested
asset paths. Phoenix served the bare files OK, but the client-side
LiveView bootstrap timing was fragile: if a form was submitted before
the LiveSocket attached, the browser fell back to a native HTML GET,
producing bug-report URLs like /admin/hosts?host%5Bname%5D=repl.

Define a project releases/0 with a pre-assemble step that runs
assets.deploy, so minified + digested assets are baked into every
release tarball.

Also gitignore digested priv/static artifacts so dev-time byproducts
don't pollute commits.
2026-04-22 10:18:28 +02:00
bb2a88fb15 fix(agent): bump Dockerfile Zig to 0.15.2 for burrito 1.3
Burrito 1.3 now requires Zig 0.15.2 (build fails with 'Your Zig version
does not match the one Burrito requires! We need 0.15.2, you have: 0.13.0').

Zig also changed its tarball naming around 0.15: the arch now comes
before 'linux' (zig-x86_64-linux-VER.tar.xz instead of
zig-linux-x86_64-VER.tar.xz), so both the download URL and the
post-extract symlink glob had to change.
2026-04-22 09:23:35 +02:00
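
The naming change described in bb2a88fb15 can be illustrated with a small shell sketch. This is not the project's Dockerfile: the helper name `zig_tarball_name` and its `new`/`old` layout argument are hypothetical, chosen only to show the two file-name patterns quoted in the commit message.

```shell
# Hypothetical helper (not from the repository): build the Zig tarball
# file name for a version and architecture.
#   layout "new": arch before 'linux'  -> zig-x86_64-linux-VER.tar.xz
#   layout "old": 'linux' before arch  -> zig-linux-x86_64-VER.tar.xz
zig_tarball_name() {
  version="$1"
  arch="$2"
  layout="$3"
  case "$layout" in
    new) printf 'zig-%s-linux-%s.tar.xz\n' "$arch" "$version" ;;
    old) printf 'zig-linux-%s-%s.tar.xz\n' "$arch" "$version" ;;
    *)   echo "unknown layout: $layout" >&2; return 1 ;;
  esac
}

# The two names mentioned in the commit message.
zig_tarball_name 0.15.2 x86_64 new
zig_tarball_name 0.13.0 x86_64 old
```

Any glob that matches the extracted directory has to follow the same ordering, which is why the commit touches both the download URL and the symlink step.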
2f65bab7cf docs: single-file HTML slide deck for setup & deployment
32-slide self-contained deck mirroring SETUP-AND-DEPLOY.md structure.
Keyboard nav (arrows/space/PageUp-Down/digits/f for fullscreen), swipe,
click-to-advance, deep-linkable slides via #s=N, print-friendly.
Zero external deps — ships as one HTML file.
2026-04-22 09:06:32 +02:00
31f7172cf3 docs: SETUP-AND-DEPLOY runbook for phase 5 production rollout
Single top-to-bottom runbook covering preflight, local build, server deploy,
first-agent dry run, test tier, full rollout, rollback, and ongoing ops.
Each step has a verification command. Ends with a Go/No-Go sign-off list.
2026-04-22 08:51:04 +02:00
579d7fc6e8 feat(server): public GET /health endpoint for uptime monitors
Returns 200 with {status: ok, version, db: ok} when SQLite is reachable,
503 when the DB probe fails. Unauthenticated so external monitors can
poll without credentials.
2026-04-22 08:48:14 +02:00
3ce2940094 docs: phase 4 packaging + deployment plan 2026-04-22 08:43:59 +02:00
b06668fcbb docs: deployment overview + LXC server deploy + per-host agent install 2026-04-22 08:42:25 +02:00
585fbd0623 docs(server): Caddyfile template with TLS + WSS reverse-proxy 2026-04-22 08:41:18 +02:00
b44ab86fdb feat(server): phoenix release with migrate/rollback helpers
Extended Server.Release with migrate/0 and rollback/2 so
'bin/server eval Server.Release.migrate' works from a released binary.

Removed the phx.gen.release-generated rel/overlays/bin/server wrapper
that hardcoded 'start' — it collided with the mix-release default
dispatcher, blocking 'server version', 'server eval', etc. The 'migrate'
overlay is kept (bin/migrate calls server eval under the hood).
2026-04-22 08:41:04 +02:00
2ea5dd4b54 feat(agent): docker-based cross-compile for linux binaries 2026-04-22 08:27:25 +02:00
7ae14f35dd feat(agent): systemd unit + release env.sh for root+journald install 2026-04-22 08:27:02 +02:00
d266a7b56c feat(agent): burrito dep + release config for linux_amd64/arm64 + macos 2026-04-22 08:26:47 +02:00
fe7b07db4f fix(server): only require DASHBOARD_PASSWORD_HASH in prod
Blocking bootstrap in dev meant you couldn't even run 'mix run' to
generate the initial hash. Now dev/test accept an optional env override
and boot without it; prod still raises when unset.
2026-04-21 22:59:24 +02:00
2f787ec31f chore(server): remove unused page_controller scaffold — / is now OverviewLive 2026-04-21 22:56:25 +02:00
667fd7160c feat(server): admin LiveView for host registration, rotate, delete 2026-04-21 22:55:29 +02:00
94034eea9b feat(server): vm search LiveView with name+IP filtering 2026-04-21 22:54:47 +02:00
d65832964e feat(server): host detail LiveView with metrics/pools/snapshots/storage/vms 2026-04-21 22:53:57 +02:00
d0507f290e feat(server): overview LiveView with traffic-light status (Ampel) + pubsub updates 2026-04-21 22:52:40 +02:00
62996d883d feat(server): router pipelines + live_auth hook for authenticated dashboard 2026-04-21 22:51:41 +02:00
4538945b85 feat(server): session-based auth plug + login controller/template 2026-04-21 22:51:11 +02:00
3123743c1c feat(server): hosts list/delete/rotate helpers + pubsub on metric insert 2026-04-21 22:50:33 +02:00
f3e7fab4d2 feat(server): pure Status.compute/2 for ok/warning/critical/offline 2026-04-21 22:49:15 +02:00
9c457c1f68 feat(server): Server.Auth.verify_password/1 2026-04-21 22:48:36 +02:00
58f22243a5 feat(server): argon2_elixir dep + dashboard_password_hash config 2026-04-21 22:48:07 +02:00
663f7a6113 feat(agent): reporter schedules fast/medium/slow collection with bundled payloads 2026-04-21 22:36:14 +02:00
61fa959d52 feat(agent): system info collector for pveversion/zfs/apt 2026-04-21 22:35:32 +02:00
da5ed6cd08 feat(agent): vms/lxc collectors for runtime and detail with fixtures 2026-04-21 22:34:45 +02:00
ec7f08dfda feat(agent): pvesh storage collector 2026-04-21 22:33:27 +02:00
8c3e953e4e feat(agent): zfs collector for pools + datasets/snapshots with fixture tests 2026-04-21 22:32:36 +02:00
6fca450d7e feat(agent): Shell.run wrapper for testable external commands 2026-04-21 22:31:24 +02:00
30b507ba6b feat(server): GET /api/hosts/:name returns latest fast/medium/slow samples 2026-04-21 22:30:35 +02:00
f09a77996b feat(server): retention GenServer prunes samples older than 48h hourly 2026-04-21 22:29:24 +02:00
751e035579 feat(server): channel persists fast/medium/slow samples to metrics table 2026-04-21 22:28:36 +02:00
687fc17082 feat(server): metrics schema + context with record/latest/prune 2026-04-21 22:27:20 +02:00
116f1ada14 fix(server): only mark hosts offline when endpoint is serving
Application.start ran mark_all_offline unconditionally, which meant
every "mix run"/"mix ecto.migrate" invocation would flip all
connected hosts to offline. Gate the call on Phoenix.Endpoint.server?
so non-serving boots don't disturb live state.
2026-04-21 22:15:35 +02:00
4f82701956 fix(agent): jason-safe error entries + correct handle_info return
Errors produced by Collectors.Host were keyword tuples {:tag, msg}, which
Jason cannot encode — metric push crashed the channel. Convert them to
plain maps with :tag and :message fields.

Reporter.handle_info/2 returned {:ok, socket}, which Slipstream rejects
(GenServer-style {:noreply, socket} is the only valid return for that
callback, unlike handle_connect/handle_join/handle_disconnect).
2026-04-21 22:15:32 +02:00
bfe39e71e1 feat(agent): supervisor boots reporter when config is present 2026-04-21 22:09:29 +02:00
3ae38f95a9 feat(agent): slipstream reporter — join, push, auto-reconnect 2026-04-21 22:08:57 +02:00