Monitoring my Homelab, Simply
57 points by tuxes
The claim is one dependency, the Go stdlib, but it then goes on to call various cloud services: ntfy.sh and healthchecks.io. Maybe they mean code dependency?
I’ve settled on Uptime Kuma and just looking at a dashboard. I don’t even want to be paged =P
That’s fair. My goal was to reduce maintenance burden by using stable APIs: Go stdlib, HTTP GET to healthchecks.io, and HTTP POST to ntfy.sh fit the bill.
The usual concerns over longevity of cloud services are a slight worry, but these services are commoditized, and on balance I can’t think of a better solution. :)
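For readers who want to see what that shape looks like, here is a minimal sketch, not the author's actual code. It assumes the public hc-ping.com ping endpoint (with a /fail suffix to report failure) and ntfy's plain POST-to-topic publishing. The environment variable names HC_UUID, NTFY_TOPIC_URL and CHECK_URL, and the helper names, are made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"
)

// pingURL builds a healthchecks.io ping URL for the given check UUID.
// Appending "/fail" tells healthchecks.io that the run failed.
func pingURL(uuid string, ok bool) string {
	u := "https://hc-ping.com/" + uuid
	if !ok {
		u += "/fail"
	}
	return u
}

// newNtfyRequest builds the POST that publishes message to an ntfy topic URL.
func newNtfyRequest(topicURL, message string) (*http.Request, error) {
	return http.NewRequest(http.MethodPost, topicURL, strings.NewReader(message))
}

// serviceUp is a hypothetical check: the target counts as up if it answers
// with a status below 400.
func serviceUp(client *http.Client, url string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 400 {
		return fmt.Errorf("%s returned %s", url, resp.Status)
	}
	return nil
}

func main() {
	// All three variable names are assumptions made for this sketch.
	uuid := os.Getenv("HC_UUID")
	topic := os.Getenv("NTFY_TOPIC_URL")
	target := os.Getenv("CHECK_URL")
	if uuid == "" || topic == "" || target == "" {
		fmt.Println("set HC_UUID, NTFY_TOPIC_URL and CHECK_URL to run the check")
		return
	}

	client := &http.Client{Timeout: 10 * time.Second}
	checkErr := serviceUp(client, target)

	// On failure, push a notification to the ntfy topic.
	if checkErr != nil {
		if req, err := newNtfyRequest(topic, checkErr.Error()); err == nil {
			if resp, err := client.Do(req); err == nil {
				resp.Body.Close()
			}
		}
	}

	// Always report to healthchecks.io, so a silent monitor is noticed too.
	if resp, err := client.Get(pingURL(uuid, checkErr == nil)); err == nil {
		resp.Body.Close()
	}
}
```

Run from cron or a systemd timer, this gives both push alerts (ntfy) and a dead man's switch (healthchecks.io notices when the pings stop).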
My setup is somewhat similar. I have a Cloudflare worker on a 5 minute cronjob. The worker connects to api.mydomain.lan/health.php via Zero Trust and the results are posted to a Discord channel with a webhook.
The health.php script checks a bunch of services running on my Proxmox instance and returns an error that gets forwarded to the Discord channel directly.
So far I found out that my ISP is extremely unreliable and that power outages happen pretty frequently, otherwise things are working very smoothly 😅
I’ve recently gotten my home server spun up and between this and the systemd chatter I’m getting loads of useful ideas.
Now I just want to find some programs to run that actually cause the server to look busy.
My pseudo-homelab is like 1.75 boxes. Recently I’ve been looking into setting up some monitoring to watch basics like temperature, storage usage, etc. So I was reminded of this post: μMon: Stupid simple monitoring. It focuses on observability rather than alerting, though.
nice! i currently use prometheus & blackbox_exporter for just about everything, but have had trouble with exporters breaking backwards compatibility without warning - invalidating my dashboards and alerts silently. something like this is appealing – i love that i can see it working for another 10 years without many changes.
The thing that I value most in a monitoring system is that it does the job, and the thing that I value second most highly is that it not cause more work by itself.
This looks like it can score highly on both parameters for small systems, so huzzah.
I get a lot of mileage from updown.io and healthchecks.io. I haven’t felt a need for custom extras like this though.
I really should set up ntfy.sh. Right now I’m using email and Gotify. Gotify is really not good but it’s what Proxmox supports. They added web hooks recently though, time to revisit.
Thanks for the idea! Just set up a ntfy.sh target successfully on my Proxmox:
https://ntfy.sh/REDACTED
{{title}}
[{{severity}}]
{{message}}
I am actually attracted to the idea of an all-in-one OTEL tool for the homelab. For example, Signoz. Maybe it won’t tick all the boxes for enterprise procurement, but it’s good to have something small yet capable of ingesting anything useful for observability (not everything!). I haven’t tried it yet; I’m mostly looking through its release notes and waiting until it stabilizes enough that every release doesn’t include 10 bugfixes.
For now I settled on Uptime Kuma + ntfy.sh and, sometimes, netdata, to have a few days worth of graphs to understand the trends when looking into root causes.
I wrote a small dashboard in the zero-state just-check-with-code ethos: https://github.com/Vaelatern/gokrazy-statuspage - Does zero alerting, but I like it for knowing at a glance that everything is green or yellow
I’ve happily used Monit for many years for simple monitoring like this. I’m pretty sure it’ll do 90% of what the author is asking for. It has its own tidy config language. https://mmonit.com/monit/
Prometheus and Blackbox exporter Nooooo
Care to explain?
It’s a bit of work, but not too bad. There are several components that interact. Take a look at https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/compose-vm-single.yml but that still misses blackbox exporter (for ping/http probes)
I don’t get the hate for Prometheus. Also, Blackbox-Exporter always served me as a valuable, frictionless tool.
Configuring node-exporter to only measure what I care for is a bit of work… But I guess this wasn’t in question.
I run a Prometheus/Grafana/etc setup at work and use it on my own desktops, but it’s only on my desktops because I’m already familiar with it and I can reuse a lot of our production configuration (and this gives me a test environment). Prometheus and company are sufficiently complicated that I wouldn’t suggest someone try to use them for simple, basic monitoring and alerting. There’s a lot to learn before you can trigger and send your first alert, at least if you want to understand what’s going on and do something custom instead of using other people’s configurations.
(Possibly it gets somewhat simpler if you rely on Grafana alerting instead of Prometheus alerts and Alertmanager.)
I don’t hate Prometheus, it’s been very good for us, but it’s a powerful and flexible tool to deal with a complex set of problems and that more or less necessarily means it’s complex too. It’s a monitoring and alerting construction kit, not a ready to go monitoring and alerting system.
I use Gatus (https://github.com/TwiN/gatus) to monitor my homelab (https://github.com/adampetrovic/home-ops)
It’s simple both in operation and to configure. Supports monitoring just about anything (including heartbeating against a gatus instance from an external process), has 20 different notification targets and provides an attractive dashboard.