TIL: Docker log rotation
60 points by polywolf
Regardless of the subject, it's always really great to see a write-up like this. Chef's kiss to the author, here.
This has always been a really weird decision by Docker. Their default JSON logging driver supports rotation, but it doesn't compress rotated files, and no rotation limits are set by default. I never understood why.
I suspect it was a backwards-compatibility decision. Compression and rotation were introduced much later, after the json-file driver was written.
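For those following along, rotation (and, on newer Docker versions, compression) can be switched on for json-file in /etc/docker/daemon.json. A minimal sketch, with example values:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3",
    "compress": "true"
  }
}

Note this only applies to containers created after the daemon is restarted; existing containers keep their old log config.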
I know people don't like journald, but you really should try it. It centralises log handling (OK, you can have that with syslog + logrotate as well) and makes sure you don't run out of disk space: by default it monitors disk usage and caps the journal so that free space remains available.
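For reference, those caps live in /etc/systemd/journald.conf; a minimal sketch with example values (by default journald already limits itself relative to the filesystem size):

[Journal]
# hard cap on total persistent journal size
SystemMaxUse=500M
# always leave at least this much free on the filesystem
SystemKeepFree=1G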
As a long time syslog + logrotate user that never looked into journald, what are the convenient parts of using journald?
Here's an excerpt I found from an article on the topic:
Think of journald as your mini-command-line-ELK that lives on virtually every Linux box. It provides lots of features, most importantly:
- Indexing. journald uses a binary storage for logs, where data is indexed. Lookups are much faster than with plain text files
- Structured logging. Though it’s possible with syslog, too, it’s enforced here. Combined with indexing, it means you can easily filter specific logs (e.g. with a set priority, in a set timeframe)
- Access control. By default, storage files are split by user, with different permissions to each. As a regular user, you won’t see everything root sees, but you’ll see your own logs
- Automatic log rotation. You can configure journald (see below) to keep logs only up to a space limit, or based on free space
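To make the filtering point concrete, a couple of example lookups (the unit name is just an illustration):

# errors and worse from one unit in the last hour
journalctl -u docker.service -p err --since "1 hour ago"
# the same entries as structured JSON
journalctl -u docker.service -p err --since "1 hour ago" -o json-pretty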
AIUI journald’s indexes are limited to log line metadata, basically just the time, not the log contents. It’s much slower to print or grep a journald log than a plain text or gzipped log file.
That's neat!
I'd like to share one approach I've taken to using. There's a tl;dr at the end, but I'd like to talk about the why first.
Back in 2014 I started working for a video games development studio and we were making, essentially, an "MMORPG" (though, the RPG was much more emphasised than the MMO aspects throughout development). This put us in a weird position due to a couple of constraints.
We had never developed any large-scale system like this before (even though I had run 1% of web traffic at one point in time; you'd be surprised to learn that you can get away with about 25 racks of machines and a CDN, even for heavy sites). This was going to be thousands of machines, not several hundred.
Game developers really only know (knew?) Windows, so the server itself was going to run on Windows.
Perhaps a better-qualified person would have found a way to do everything necessary within the Microsoft ecosystem; but I was not that better-qualified person, so I took to inventing the tools needed to make it work. One of the first issues was service management (especially of remote systems), so we developed a service manager with a websocket interface that allowed us to manage (start/stop/inspect) services remotely. The very next set of issues included: logging.
Logging to a file has traumatic tooling on Windows, and centralised logging was not very flexible for our QA environments or developer workstations. So, following the same approach as with our service manager, we developed a "log server" which listened on a named pipe on Windows and logged to a ring buffer. We could then access this over websockets and inspect what was going on for a particular machine. Later on we refined it so that it could forward "special" logs we deemed important enough to centralised storage (Elasticsearch back then).
This was actually pretty great: no more dealing with log rotation, no more dealing with awkward Windows tools, and it worked via a web browser, so I didn't need special ports to be open (that game dev studio blocked everything by default, something that seems to happen annoyingly often).
This is a pattern I have repeated in my current jobs. As much as I lambast journald for taking away control, it has a sort of ring-buffer mode: you can tell it not to log to files and to keep only some amount of memory active for logs. Of course that's not durable between reboots, but that's not the point; you can configure the same "ship the logs" setup on top of it too. What's missing is the web-browser component (though that's solved via Cockpit if you are OK with the overhead). I even do this for my Docker hosts.
Now I can do everything with just a chromebook (without the developer VM) if needed.
tl;dr: log to journald, set it to use memory, give it a reasonable buffer. If you want durable logs, forward the interesting stuff to Loki using Alloy or something.
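For the curious, a rough journald.conf sketch of that setup (sizes are example values):

[Journal]
# keep the journal in memory only (/run); nothing survives a reboot
Storage=volatile
# cap the in-memory journal, giving the ring-buffer behaviour
RuntimeMaxUse=64M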
That's "log-driver":"journald" in daemon.json for those following along with the article, but you can also pass --log-driver journald to your dockerd service command-line. (Or even to docker run per container.) I know NixOS makes journald the default driver, not sure if any other distros do.
On top of your reasons, I also like journald just because it centralizes logging on the machine. I guess people who aggregate across multiple machines may not care because they rely on tooling, but when looking at a single machine, I no longer have to scour /var/log for where my error is like in the bad old days.
You can actually get web access to journald logs using the separate service systemd-journal-gatewayd. It's an official part of systemd, and offers an API and a basic web UI. (NixOS: add services.journald.gateway.enable = true; and browse to its default port, 19531.)
If you're not using the journald log driver and aren't planning on shipping the logs elsewhere, "log-driver": "local" automatically rotates and compresses the logs [1].
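In daemon.json that looks like the sketch below; the log-opts are optional, since local ships with sane rotation defaults (20m and 5 files, compressed, IIRC), shown here just to make them visible:

{
  "log-driver": "local",
  "log-opts": {
    "max-size": "20m",
    "max-file": "5"
  }
}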
systemd-journal-gatewayd
In case it helps others: https://www.freedesktop.org/software/systemd/man/latest/systemd-journal-gatewayd.service.html
Well, I mean, I was looking for the source in order to find out what "an API" and "basic web UI" meant, but the man page mostly answered my question.
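For anyone else skipping the man page: roughly, it serves journal entries over HTTP, with the web UI at /browse. Example calls, as far as I can tell from the docs (19531 is the default port):

# plain-text entries
curl http://localhost:19531/entries
# JSON output, filtered by a journal field match
curl -H 'Accept: application/json' 'http://localhost:19531/entries?_SYSTEMD_UNIT=docker.service'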
TIL about that. Do you know why it expects to receive a single socket? It seems strange to me; maybe I've misunderstood something.
"Running out of disk space" seems like a perennial failure mode, despite everything. It makes sense (hard disks are there for a reason) but it's funny how there's always something
In the context of this blog post, that's another reason to love maximum instance lifetime, or the slightly more manual aws autoscaling start-instance-refresh.
It, of course, doesn't cure all your "out of disk space" woes, since there are plenty of other places for that landmine to hide.
Sometimes, if you can't edit daemon.json or want to ship this behaviour with the repository, it is also handy to set this in the compose file using YAML anchors:
x-logging: &default-logging
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
services:
foobar:
image: ...
logging: *default-logging
As others have noted, using the journald setting for Docker logs is way more sane than JSON file logs. Plus it integrates 1000 times more neatly into your tooling than yet another log location.
Honestly, Docker could use better tooling to produce systemd services for containers or compose setups. I would vastly prefer using systemctl to control my containers and their dependencies if keeping a compose file and the services in sync weren't so annoying.
Podman Quadlets are a pretty fantastic way to run containers with systemd, though that doesn’t solve the problem of wanting a compose file to be the source of truth.
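For illustration, a minimal quadlet sketch (file name and image are just examples): drop this in /etc/containers/systemd/web.container, run systemctl daemon-reload, and you get a generated web.service whose logs land in journald like any other unit.

[Unit]
Description=Web container

[Container]
Image=docker.io/library/nginx:latest
PublishPort=8080:80

[Install]
WantedBy=multi-user.target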
Agree that there could be better tooling. Though I'm not sure what you mean by "keeping a compose and the services in sync", I just use docker compose directly in my service files. Maybe you're referring to quadlets (which do use entirely different syntax than docker compose)?
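In case it's useful to anyone, the rough shape of such a service file (paths and names are just examples); the compose file stays the single source of truth and the unit is only glue:

[Unit]
Description=foobar stack
Requires=docker.service
After=docker.service

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/foobar
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down

[Install]
WantedBy=multi-user.target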
I never understood why Docker doesn't simply log to /var/log and ship a proper logrotate file for it.
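For anyone who wants to approximate that themselves, a rough logrotate sketch against the default json-file paths (copytruncate because dockerd keeps the file handle open; note it can drop a few lines during the copy):

/var/lib/docker/containers/*/*.log {
    daily
    rotate 7
    compress
    missingok
    copytruncate
}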
I've been bitten by Docker's not-so-sane defaults before as well.