Surely there must be a way to make container secrets less dangerous?
20 points by Spots
There is, tokenization. This doesn't make the secrets not exist. It puts them in one very small, well-controlled, and observable basket.
I first saw this pattern in how Stripe minimized the scope of PCI compliance with Apiori. It is described starting at 18:41 in this talk. This replaces sensitive keys on ingress to the app rather than egress (what most container secrets are), but the pattern works in either direction. (I worked at Stripe but never touched this system; I don't know anything besides what's in this public talk.)
The phantom token pattern/credential proxy is another implementation of tokenization.
Shimming your own services with your own proxies is a useful strategy that generalizes.
A credential proxy seems to be almost exactly what I'm looking for. I'm going to share this around, thank you!
The tricky bit about injecting creds after the request has been made is that you either need to force your client library to make plaintext requests or you need to terminate TLS.
Interesting, I first saw this pattern being used by the Gondolin sandboxing tool, I didn't realize it was a widespread pattern with a name. Thanks for sharing the links.
The idea of using a proxy is definitely interesting and somewhat similar to how Podman handles DNS using aardvark-dns (at least for rootful containers).
I think for me the main issue is that the use of proxy tokens protects against leaking tokens (e.g. through prompt injection), but it does nothing when an attacker gains arbitrary code execution capabilities. To protect against that you'd probably end up needing something like a secure enclave to store tokens in and restrict/bind access to certain processes, but even then I suspect you'll be toast in the event of arbitrary code execution.
Rather than focusing on the form of the credential, think about its lifetime. If the credential is only valid for a short period of time, then once it is exfiltrated it has to be used very quickly to be of any use to the attacker.
But if the credential is short lived then it needs to be rotated quite frequently. More frequently than you could do manually. You would have to automate it. But if the attacker had persistent presence they could obtain the new credential as it was rotated in.
Instead of doing proactive rotation in order to ensure the application (and thus any attacker mimicking the application) has access to the credential, you can do just-in-time credential delivery. Place a proxy between your application and the network resources it needs. That proxy can then do identity-based authentication of your application environment, acquire credentials from an external service, and inject them into the network stream. This allows the credential to be very short-lived. The application never holds the credential directly, and thus any attacker mimicking the application also cannot easily retrieve it. There are a few of these types of proxies available today. I work on one for my day job. I think lobste.rs has rules against self-promotion so I won't link to it, but if you search for "modern auth" or "identity based auth" or "managed access" you'll find more info about the whole category.
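The injection step can be sketched in a few dozen lines of Python, with everything on loopback. `fetch_short_lived_token` here is a stand-in for the call to an external credential service, and the proxy speaks plaintext HTTP to keep the example small (per the sibling comment, real traffic means forcing plaintext client requests or terminating TLS):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "http://127.0.0.1:9001"  # the real API the app wants to reach

def fetch_short_lived_token():
    # Stand-in for the external credential service; a real deployment
    # would authenticate the workload's identity here and receive a
    # token valid only for seconds or minutes.
    return "tok-ephemeral-123"

class CredentialInjectingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # The app sends a plain, credential-free request; the proxy
        # attaches the credential just-in-time, so the app process
        # never holds it and cannot leak it.
        req = urllib.request.Request(UPSTREAM + self.path)
        req.add_header("Authorization", "Bearer " + fetch_short_lived_token())
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

def run_proxy(port=9000):
    server = HTTPServer(("127.0.0.1", port), CredentialInjectingProxy)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The application is then configured to talk to `127.0.0.1:9000` instead of the real endpoint, and never sees the token.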
Build an API running on the host that the container queries for the secrets and it only allows them to be read once after container boot.
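The read-once semantics could be as simple as popping each secret on first access. A sketch (the real thing would sit behind a host-side HTTP endpoint keyed to container identity, which this omits):

```python
class ReadOnceSecrets:
    """Hands each secret out exactly once; later reads fail loudly."""

    def __init__(self, secrets):
        self._secrets = dict(secrets)

    def read(self, name):
        try:
            # pop() removes the secret as it is read, so a later
            # compromise of the container finds nothing to fetch
            return self._secrets.pop(name)
        except KeyError:
            raise RuntimeError(f"secret {name!r} already read or unknown")
```

An attacker who pops the container after boot can still ask the API, but gets an error instead of the secret.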
The danger being, "if container gets popped, then attacker can read all the secrets inside it"? Honestly I think your "cheesy hack" of remounting /run/secrets after you're done with it is probably best given what your threat model seems to be.
In general I'd advise that, if the attacker can pop a shell, they might have been able to run arbitrary code in the context of your secret-accessing service beforehand, so they could've had access to the secrets anyways. In that case, your only recourse is principle of least privilege. The "credential proxy" mentioned by the other comment helps w/ this, limiting the privileges from "having a key that can be used at any time" to "having access to a proxy that can only be used now, inside the container". Still maybe insufficient against a motivated attacker, but stops some forms of exfil at least.
The threat I'm trying to defend against here is tl;dr CVE-2025-27610, or the ability to exfiltrate files out of the container. Most of the projects I'm responsible for, both professionally and personally, are written in Ruby. The CVE quoted above was in Rack, which is HTTP middleware used by virtually all Ruby Web systems – it's roughly equivalent to WSGI in the Python ecosystem.
Popping a shell would indeed allow the ability to run anything as the Web user (or just rails console and Rails.application.credentials.to_json for Rails apps). That isn't what I'm trying to protect against.
And yeah, I do like the idea of a credential proxy.
Hmmm, I was thinking that using environment variables would protect from path traversal, but not if the process can read /proc/self/environ. But likely there are ways to prevent that.
Personally, I prefer to reduce secrets, and to make secrets useless outside their environment. But that of course does not solve everything.
I don't understand why people bend over backwards to shove secrets into env vars / files when client libs for vault, ssm, secret manager are readily available. People will put shittons of business logic into their ops pipelines to avoid adding 7 lines of business logic to their application to fetch creds.
At least in the case of Podman there's a practical reason: its secrets engine only supports files and environment variables out of the box. While it has the ability to run shell scripts to retrieve secrets, you have to glue that together with some secrets backend yourself. Even then it fetches the secrets when the container boots, so the result is the same: they live somewhere in the container, either as a file or an environment variable.
The suggestion here (which I’ve done many times) is to just not use that mechanism at all. Make the application itself directly call Vault, Secrets Manager, or whatever. Then the secrets only exist in the memory of the application. This does require some kind of ambient underlying auth (VM or pod level) but that’s probably already happening (certainly is in AWS or GCP).
I made some reusable config libraries that look in all the places (env vars, files, secrets managers) and abstract this away so I don’t have to worry about it in the apps anymore. That makes it just a deployment choice where the secrets live, and when possible I prefer the direct approach.
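The lookup chain can be sketched like this (names and search order are illustrative, not any particular library's API):

```python
import os

def resolve_secret(name, file_dir="/run/secrets", fetch_remote=None):
    """Resolve a secret from env vars, then mounted files, then a
    secrets-manager callback, so where secrets live is purely a
    deployment choice."""
    # 1. Environment variable (e.g. DB_PASSWORD)
    value = os.environ.get(name.upper())
    if value is not None:
        return value
    # 2. File mounted by the orchestrator (e.g. /run/secrets/db_password)
    path = os.path.join(file_dir, name)
    if os.path.exists(path):
        with open(path) as f:
            return f.read().strip()
    # 3. Direct call into a secrets manager (Vault, AWS SM, ...),
    #    passed as a callable so this sketch stays backend-agnostic
    if fetch_remote is not None:
        return fetch_remote(name)
    raise LookupError(f"no source provided secret {name!r}")
```

Applications call `resolve_secret("db_password")` and never know (or care) which backend answered.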
Then the secrets only exist in the memory of the application.
And what prevents an attacker from taking a memory dump?
Well, hopefully quite a lot! If even the application process isn’t allowed to know the secrets, you’re in kind of a weird threat model. You could use a proxy, but how does the application authenticate to the proxy without knowing any secrets? I guess if the model is the attacker can only read application memory, but not write, or make network requests, a proxy set up by the orchestrator could solve it.
On Linux, ptrace_scope set to 2 (only allowed with CAP_SYS_PTRACE) or 3 (prohibited forever until kernel restart). This prevents not only gdb but also access to /proc/self/mem et al.
That only stops reads from outside the process; something achieving code execution inside the process can still read its own memory.
It won't help with code execution, but rather than relying on the YAMA LSM, the app can also directly ask not to be dumpable, which avoids depending on kernel configuration: prctl(PR_SET_DUMPABLE, SUID_DUMP_DISABLE). Even this doesn't fully prevent it, see https://man7.org/linux/man-pages/man5/proc_pid.5.html -- it basically makes /proc/self/mem owned by root. So don't run your app as root, even inside a container.
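For reference, here's the same call made from a non-C runtime via libc (a ctypes sketch, Linux only; the constants are copied from <linux/prctl.h>):

```python
import ctypes

# Constants from <linux/prctl.h>
PR_GET_DUMPABLE = 3
PR_SET_DUMPABLE = 4
SUID_DUMP_DISABLE = 0

libc = ctypes.CDLL(None, use_errno=True)

def disable_dumpable():
    """Mark this process non-dumpable: blocks core dumps and makes
    /proc/self/mem and ptrace attachment root-only."""
    if libc.prctl(PR_SET_DUMPABLE, SUID_DUMP_DISABLE, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_DUMPABLE) failed")

def is_dumpable():
    return libc.prctl(PR_GET_DUMPABLE, 0, 0, 0, 0) == 1
```

Call `disable_dumpable()` early in startup, before any secrets land in memory.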
I do regret not making the threat model more explicit in the article; the threat I'm trying to prevent (and that I believe we are talking about in general) is arbitrary file reading, not arbitrary code execution.
The only way I'm aware of to protect secrets from arbitrary code execution, assuming a primitive is available to exfiltrate them, is a combination of rapid expiration and ingress whitelisting (that is, ensuring only the app service can use the token issued). Of course, even those don't necessarily help you depending on what actions the secret is allowed to take – an attacker can just use code execution to run a call against the API that is protected by the secret without even exfiltrating the secret itself.
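A toy illustration of those two properties together: an HMAC-signed token that carries both an expiry and the name of the one service allowed to present it (the key, service names, and token format are all made up for the example; binding is approximated by checking the embedded service name rather than real network-level whitelisting):

```python
import hashlib
import hmac
import time

SIGNING_KEY = b"demo-signing-key"  # hypothetical; held only by the issuer

def issue_token(service, ttl_seconds=60, now=None):
    """Issue a token bound to one service, valid for a short window."""
    exp = int(now if now is not None else time.time()) + ttl_seconds
    payload = f"{service}:{exp}"
    sig = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"

def verify_token(token, expected_service, now=None):
    """Reject tokens that are forged, expired, or issued to another service."""
    service, exp, sig = token.rsplit(":", 2)
    payload = f"{service}:{exp}"
    good = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, good):
        return False
    if service != expected_service:
        return False  # only the named service may use this token
    return int(exp) > (now if now is not None else time.time())
```

A stolen token is only useful for a few seconds, and only from the position the attacker already compromised.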
I do regret not making the threat model more explicit in the article;
I asked about mem dumps because in practice the resources required to raise the pod security bar are significant across the board (infra, devs, etc). Relying on a single tool (like a vault) on its own doesn’t improve your defenses by much while adding a significant burden on the infra team[1].
You mention supply chain attacks, and your worry is obviously the collection of tokens/access keys and passwords. But supply chain attacks can install C&C servers just as easily. So at that point, I assume the threat model becomes an intruder getting shell access to the pod with the app's ID. Raising the security bar against that takes a holistic approach.
So now one has to stop and think if all this effort makes sense.
ps. Sorry if I hijacked the thread; I was really thinking about a specific app's use case that I work on, which is why I took the liberty of expanding your threat model. I had no intention of turning this into a philosophical discussion.
[1]: Vault is straightforward to set up & run in k8s HA mode as a deployment or sts. The problem is that we've just added another component in the critical path. The secure/compliant way to handle unseal in HashiCorp Vault using Shamir seals is complicated process-wise.
Not only does it require underlying auth, but at least in my scenario (of everything being self-hosted), now you also need to securely run Vault. And from my experience at both a non-profit and at Big Blue, administering Vault and ensuring its continued security is nearly a full-time job by itself.
There are tradeoffs for every decision (see that lovely design documentation link). I agree that if you can afford hosted Vault or can afford to pay a SecOps team to run Vault for you, it's more secure than environment variables or files. But it's also not always realistic, at least from my perspective.
EC2 solves this exact problem (a credential file on disk) with IMDS: see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html#instance-metadata-v2-how-it-works
Obv. your problem is that your threat model includes things with full access to do whatever, so :shrug:
Another contention I don't see mentioned is that you might want to update the secrets without restarting the container! Ideally resource access is mostly identity based but you can't avoid secrets. If you rotate them often, you might not want to restart your application every single time. I'm thinking about non managed DBs or internal APIs using a password scheme. Honestly at this point I don't see a better solution than hitting an external store like a vault and cache for a limited time at the application level.
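That fetch-and-cache-with-TTL shape is small enough to sketch (illustrative only; `fetch` would be your actual Vault or store client call):

```python
import time

class ExpiringSecretCache:
    """Cache secrets fetched from an external store for a limited time,
    so rotation is picked up without restarting the application."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.time):
        self._fetch = fetch        # e.g. a call into Vault
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}           # name -> (value, fetched_at)

    def get(self, name):
        entry = self._cache.get(name)
        if entry is not None and self._clock() - entry[1] < self._ttl:
            return entry[0]
        value = self._fetch(name)  # cache miss or stale: refetch
        self._cache[name] = (value, self._clock())
        return value
```

The TTL bounds how stale a rotated credential can be, without a fetch on every request.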
One approach I’ve used, for things like a CI build where you control the code that directly creates the containers, is to pass secrets as json on stdin. The code in the container then reads the secrets and stashes them in memory, and they can’t be read again. That isn’t useful though for kubernetes-style workloads where some orchestrator that can’t provide stdin is what’s spawning the containers.
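A sketch of that stdin handoff (parent and child are both illustrative):

```python
import json
import subprocess
import sys

# The "container entrypoint": reads its secrets exactly once from stdin.
# After this they exist only in the process's memory, not in env vars,
# on disk, or in /proc/$pid/environ.
CHILD = r"""
import json, sys
secrets = json.loads(sys.stdin.read())
print(secrets["db_password"])
"""

def spawn_with_secrets(secrets):
    """Launch a child process and hand it secrets over stdin as JSON."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=json.dumps(secrets),
        capture_output=True,
        text=True,
    )
    return proc.stdout.strip()
```

Once the child has consumed stdin, there is nothing left to re-read.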
Also note that, if you pass secrets as env vars (although secrets in env vars is not my preferred approach), you can unsetenv them away after reading and stashing them in your app startup code. One caveat per proc(5): /proc/$pid/environ shows the environment as it existed at execve, so unsetenv alone may not scrub it there; to be thorough you'd overwrite the original strings in place.
The answer is to not put any secrets into the container. The problem is that existing (OS) container tech and (layer cake) security models do not support this either easily or well.
It's a problem best solved by the concept of capabilities, but these are largely incompatible with the layer cake model that everything is currently built on top of.
I guess you're trying to solve this without collaboration from the containerized app, because the app could read the secrets on startup and then remove the files. Similar pattern to processes dropping privileges once they're ready.