I Cannot SSH Into My Server Anymore (And That’s Fine)

47 points by lthms

novalis

Now, every time I update the tag www/soap.coffee:live to point to a newer version of my image, my website is updated within the hour.

Hope you didn't make a typo -- you'll be staring at it for a long time. I don't see the appeal of a system with cycle times this slow, and observability this weak.

dsr

I have a webserver - nginx - which serves static files. I have a script -- it's actually a makefile -- that can invoke an editor to create or edit an article. To publish, it rsyncs the changes over to the webserver.

I can serve, at a rough estimate, every household in the USA one article a day. Maybe two. Actual demand is lower than this by maybe seven orders of magnitude, so I use the server for other things, as well.

lthms

That’s fair, and my setup was just like that one for ages, but I host a website also (maybe mostly) as a way to experiment here and there. 😅 I wouldn’t advise for this setup for anyone else clearly, maybe I should have added a note about that.
m_a_r_k

Ive made a game out of serving as much as i can from my mailserver. Im already paying to run it, so i eek every cycle i can out.

On the smallest vps vultr has,(1 core 512MB ram), i run

Email

Website

Irc bouncer

Rss aggregator

Gemini capsule

Git server

If i had more than 3 mailboxes i couldnt get away with that, but like you, i might as well do more with the space.

Everything else runs in my homelab, but since im paying for it anyway, i stretch that little machine.

polywolf

I also discovered Podman Quadlets recently, and use them to run a Factorio server. It is pretty nice, though I have found that "reading the man page in full" is a lot nicer than any other documentation that you can find from a standard web search.

One point in particular that tripped me up was the following (https://docs.podman.io/en/latest/markdown/podman-systemd.unit.5.html):

The services created by Podman are considered transient by systemd, which means they don’t have the same persistence rules as regular units. In particular, it is not possible to systemctl enable them in order for them to become automatically enabled on the next boot.

To compensate for this, the generator manually applies the [Install] section of the container definition unit files during generation, in the same way systemctl enable does when run later.

That is, so long as the definition has an [Install] section, it will be automatically enabled, and this has to be done due to limitations in systemd.

Anyways, this is a very good article, nice to see a demonstration of a completely automated setup!

lthms

That looks like something that works by accident more than anything else ahah. If I understand correctly, it means that the generator generates the .service file but also shortcut system enable by just creating the symlinks etc.?
koala

Yup, quadlets have so many hidden features- like there's built-in support to update the container images that works OOB!

I resist using containers because they feel like the wrong solution for me, but when you need them, podman/systemd does everything very nicely in a very lightweight fashion.

I also pull Talos for stuff that fits better into K8S- or when I want to learn more K8S. But I find it's a bit more low level unless you add more moving parts.

wink

To publish a new article, I push a new Docker image to the appropriate registry with the correct tag. tinkerbell will fetch and deploy it

I know I just misread it but now I want a blog where article is its own container. And then every comment as well.

yorickpeterse

Due to the recent Cloudflare issues (amongst other ones) I've been looking into self-hosting a few websites again. Similar to the author I want something stable/no-nonsense. In particular I've been looking into an immutable setup where (ideally) you don't even need Ignition/cloud-init/whatever Red Hat comes up with tomorrow, but so far that turns out to be tricky.

Specifically, I've looked into a few tools such as bootc, mkosi, and FreeBSD (see this rough repository for the code), but the results have been a bit mixed. bootc probably comes closest to what I'd like, but the general state of the project is very alpha-ish and the documentation is more or less non-existing. Building an ISO for the initial bootstrap is also difficult enough that I'm considering just bootstrapping it using Fedora CoreOS (= install CoreOS, then rebase onto the bootc image). I'll probably keep SSH access though, it's just too handy :)

Incus OS is probably quite similar to what OP does as it doesn't let you SSH into the instance either, instead you use an API through their CLI. It looks interesting, though its use of Incus instead of Podman is what put me off so far.

lthms

Thank you for the links, I’ll have some reading to do when I commute!

Incus OS in particular seems pretty interesting. I’m very eager to see what exists out there, and what’s the current state of things on the deployment area. We’re currently using Ubuntu VM provisioned with a clunky Ansible playbook at work, and I’m convinced we can do simpler and more reliable, so i’ll take every resources I can find!
- option
  
  Incus can run OCI containers in addition to Incus containers, just fyi
  - koala
    
    And an interesting approach with Incus is that you can run stuff as root on containers and VMs with incus exec machine -- command. There's an Ansible connection plugin to run stuff on Incus machines directly, so you do not have to set up SSH.
    
    Incus OS reminds me of Talos: entirely API driven. It's pretty cool.
    
    I'm also waiting for bootc to improve a bit too- I'd like to use it for my physical workstations!
  - yorickpeterse
    
    IIRC OCI support is still pretty new, though I have to admit I haven't tried it myself.
    
    One thing that wasn't clear to me when looking at Incus (OS) is what init people would run in their LXC/Incus containers, and how logging with such an init is handled. For example, I suppose you can run systemd in a system container but then if you have 10 containers you need to configure it 10 times (e.g. 10x the logging setup). Surely there's a better way of going about that?
    
    strugee
    
    I'm still on LXD, not Incus, but I'm assuming this answer still applies.
    
    You should think of a system container as behaving like a VM, just with a shared kernel (and without all the annoying stuff a virtualized kernel implies, like having to decide up front how much memory the VM gets allocated to it). This is why the difference between a VM and a system container in the Incus CLI/API is basically just a type field in the instance object, nothing more.
    
    Basically, you're supposed to be running a normal OS image in a system container. I use Debian images in my LXD containers, and Debian defaults to systemd; therefore, my containers boot systemd and they log to journaled, inside the container.
    
    For example, I suppose you can run systemd in a system container but then if you have 10 containers you need to configure it 10 times (e.g. 10x the logging setup). Surely there's a better way of going about that?
    
    You do what you would do with any other operating system - you apply configuration management to it, like Ansible or Chef or something. You can also use cloud-init.
    
    If you want your containers to be even more rigidly specified and deployed than what configuration management will get you, then system containers probably aren't for you - you're trying to reinvent OCI application containers. (Or you want NixOS.)
  - yawaramin
    
    To publish a new article, I push a new Docker image to the appropriate registry
    
    Isn't that a bit...heavy-handed to publish a blog post?
    
    lthms
    
    It is!
    
    I’ve refined a little the introduction of the article to make my rationale clearer: tinkerbell is not just for my website, it’s more of a “cloud homelab” for me, a place i can experiment with, deploy stuff, etc. I just started by the only thing that I really, really want to keep online first, but I will deffo keep experimenting and deploying there in the coming months.
  - zie
    
    How do you update and manage coreos and podman though? An RCE in the Linux kernel/coreos/podman would still be a bad time...
    
    Looks like coreos has some sort of auto-update mechanism, but you still have to track that that is working, I imagine. auto-update stuff breaks on occasion.
    
    Maybe the lazy way is just every 30 days, wipe the VPS and re-deploy it with the latest coreos and podman stuff, so you know at least once a month you are up-to-date, and hope the built-in auto-update stuff works in-between for big RCE's.
    
    lthms
    
    That’s a fair point, and a blind spot of the post—i need to spend some time understanding how CoreOS updates work.
    
    auto-update stuff breaks on occasion.
    
    To be fair, they did break once already on my side. I’ve seen an article explaining how to do automated rollback if the new image does not start correctly, I’ll need to borrow some ideas from there as well.
    
    Maybe the lazy way is just every 30 day
    
    Truth be told, for my personal use case it would work quite nicely and go unnoticed ahah.
    
    zie
    
    LOL I know the feeling. The issue is, if a big RCE gets found and the fixes don't get applied, you could find your instance becoming a botnet or mining crypto or who knows what. Your provider will likely get upset when they find out and do bad things, like drop the machine or worse like ban/drop your account.
    
    I'm not a coreos person, so I dunno how reliable the automations for updates are, and if they handle major updates as well, or just security issues. That's why it might be worth just bulk updating and replacing the entire instance every month, to ensure you are close enough to the latest and greatest.
    
    lthms
    
    Default behaviour of Fedora CoreOS seems to reboot as soon as an update is available, see here for what is possible.
    
    If not otherwise configured, the default updates strategy resolves to immediate.
    
    The immediate strategy is an aggressive finalization method which is biased towards finalizing updates as soon as possible, and it is only aware of node-local state.
    
    BinaryIgor
    
    I have come up with something I believe delivers the same functionality, but it is arguably simpler.
    
    What I deploy these days:
    
    A small, public-facing backend (node.js app) with the nginx as a reverse proxy for my blog. Backend is hosted on a DigitalOcean's droplet (virtual machine) and called by my blog hosted on DigitalOcean's CDN.
    
    For this backend:
    
    nginx is dockerized
    
    node.js app is dockerized
    
    I have a script that sets up Let's Encrypt certs with auto renewal
    
    I am deploying all of that without having a Docker registry - just with access to the said virtual machine through ssh. How?
    
    I build images for nginx and node.js app locally and export it to all to the gzip files, that is:
    docker save "api:latest" | gzip > "dist/api.tar.gz"
    
    Then I have a script that does uses ssh & scp to copy this package into the droplet, stop previous one, make sure that the new one is up and running and so on. Loading docker image from tar archive is as simple is: docker load < dist/api.tar.gz
    
    I've been using variations of this approach for my various projects for years - I highly recommend it! You have all the benefits of docker without the need for a registry or any other external infrastructure. Just a few bash scripts :)
    
    alexandria
    
    When I ran a server in 2022 - 2024, I ran Alpine Linux. The server had maybe 50 packages maximum (and those did not include podman/docker) and used about 56mb of the 2 gigs of RAM, and the bulk of the hardening was that I set some kernel options to reject weird TCP packets and the like, and I put ssh on a non-standard port (and tested with nmap that it showed up as non-ssh) — I also set up key authorisation instead of password, and disallowed password auth.
    
    The last two things were honestly more than sufficient to avoid any bots trying to ssh login (I had absolutely no listed attempts in dmesg — it was clean). I ran caddy and reverse proxied through it, and each software was under the same low-privileged user. For observability, I ran things in tmux and could just ssh in, attach, and look at the logs now and then. I'm not sure what the maximum my server could serve was, but if I had wanted to serve a lot of people I would have chosen software using Erlang, because the footprint of that per-user and the failover modes are very convenient to a single small server. I have no doubt that if I had ran XMPP via Prosidy, no matter how many users — the only major issue would have been file upload storage space.
    
    At no point did my server call out to another server except for nntp, and for Caddy's certbot-style stuff. At no point did I experience any major failure or bug or anything else that might necessitate an industrial docker or kubernetes setup.
    
    I get how this might be super fun to play with, but for me, the old style sysadmin is it. I have the UNIX System Administration Handbook on my shelf — the last version written by Evi Nemeth. I like having a named server with a bit of personal cruft on it, rather than an impersonal thing that serves my needs. It is far more pleasant to work with versus an industrial-strength solution that calls out to foreign servers for maintenance reasons, uses a baseline of 500mb of RAM, and does not feel like it has a soul.
    
    mro
    
    quite a stack to "To publish a new article". I know shared hosting for 2€/month that can take articles. Admitted, wouldn't warrant a blog post and has 0 portfolio buzzwords.
    
    sugaryboa
    
    You can publish articles on wordpress for free, or you can use your ocaml page generator in two lines of gitlab pages.
    
    bityard
    
    Very well and clearly written, thank you.
    
    My web stack is currently a big pile of docker-compose.yaml. It works fine and I'm happy with it but I've been meaning to try out podman/quadlets to see what all the fuss is about. I'm not afraid to read docs, but sometimes I don't have the time or energy to sit down and ingest a set of extremely dry man pages. And some of the intro articles and tutorials I've run across have been hilariously bad. Broken english, typos that break things, missing information, etc. This is very easy to read and follow along.
    
    You answered one piece I was wondering about: how networks are provisioned. I'm curious about the other part... How would you handle persistent data storage in this situation? I understand you're hosting a static site so this isn't applicable to your use case, but if you did have to run a PostgreSQL server, what would that look like in this scenario?
    
    lthms
    
    Speaking in Vultr terms, a NVMe block storage provisioned with Terraform, attached to the VM, and, as far as I understand, a oneshot systemd service to format if needed and mount. That’s my next experiment I think 😅
    
    thor77
    
    Interesting read! I‘m running a similar setup using Flatcar Container Linux and an auto-updating Docker compose stack, orchestrated by GitLab CI/CD for a while now.
    
    ktr
    
    Interesting read, thanks for sharing.
    
    Note that you can use a Network so your container can talk to each other, you don't need to add them to a Pod. Then you won't have this issue of all of them restarting. (you'll need to drop a .network file in /etc/containers/systemd too).
    
    Things to consider implementing:
    
    Have a webhook listening service on the server and an Action on your repo to trigger a pull when you commit to master, once the build of the new image is done and pushed to your container registry.
    
    Mak containers read-only and look into hardening services (https://gist.github.com/ageis/f5595e59b1cddb1513d1b425a323db04) ;)
    
    seachub
    
    Is there some way to not restart all the containers in a pod to restart one of them?
    
    lthms
    
    Apparently not, the fix here would be to attach them to a common network instead of using a pod.