Capsicum vs seccomp: Process Sandboxing
14 points by runxiyu
Ugh. Cool topic, but I can't stand the LLM-generated smell to it.
Hmm, I was a bit suspicious but I couldn't spot any specific signs. Could you tell me why so I could also spot them in the future...
The repeated “Not <X>, but <Y>” sentence phrasing is a big one lately. That was originally lazy, low effort writing to pad word count for human-written SEO spam, from what I can tell.
And now LLMs reproduce it at a fraction of the price.
Any human writer worth reading would have gotten bored of that trope several paragraphs before this article’s so-called author did.
I'm skeptical of eBPF as a security boundary tool, since the kernel apparently makes no guarantee that eBPF will see everything: https://securityboulevard.com/2023/09/pitfalls-of-relying-on-ebpf-for-security-monitoring-and-some-solutions/
Thankfully Linux has landlock now: https://docs.kernel.org/userspace-api/landlock.html or the project homepage: https://landlock.io/
Has anyone tried to implement something like cap_enter on Linux by using seccomp-bpf, or perhaps via a cursed mix of unsharing ambient namespaces? I know Google had capsicum-linux but it's an unmaintained fork of Linux which is not viable in the present day.
Also, although the description of capsicum appears to be correct, something feels off about its description of seccomp-bpf (although I'm unfamiliar with the latter).
Has anyone tried to implement something like cap_enter on Linux by using seccomp-bpf
Yup. In the verona-sandbox project, I implement the minimal usable subset, which basically disables all of the syscalls that can create new file descriptors and relies on another process to proxy things like openat (which are safe on FreeBSD in capability mode) securely.
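For anyone curious what "disable the syscalls that create new file descriptors" might look like as a filter, here is a rough pure-Python model. It is a sketch under assumptions, not verona-sandbox's actual filter: the syscall selection is illustrative and x86-64 only, EPERM is an arbitrary choice of error, and `build_filter`, `pack`, and `evaluate` are hypothetical helper names. Nothing is installed into the kernel; `evaluate` just interprets the three classic BPF opcodes the filter uses.

```python
import errno
import struct

# Classic BPF opcodes (values from <linux/bpf_common.h>).
BPF_LD_W_ABS = 0x20  # BPF_LD | BPF_W | BPF_ABS: load a 32-bit word from seccomp_data
BPF_JEQ_K = 0x15     # BPF_JMP | BPF_JEQ | BPF_K: compare accumulator with a constant
BPF_RET_K = 0x06     # BPF_RET | BPF_K: return a constant verdict

# seccomp verdicts (values from <linux/seccomp.h>).
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_ALLOW = 0x7FFF0000

AUDIT_ARCH_X86_64 = 0xC000003E

# x86-64 syscall numbers. Illustrative selection only, not a complete list.
FD_CREATING = {"open": 2, "socket": 41, "creat": 85, "openat": 257, "openat2": 437}


def build_filter(denied):
    """Return a deny-list filter as a list of (code, jt, jf, k) instructions."""
    prog = [
        (BPF_LD_W_ABS, 0, 0, 4),               # A = seccomp_data.arch
        (BPF_JEQ_K, 1, 0, AUDIT_ARCH_X86_64),  # expected arch: skip the kill
        (BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS),
        (BPF_LD_W_ABS, 0, 0, 0),               # A = seccomp_data.nr
    ]
    for nr in denied:
        # On a match fall through to the deny; otherwise skip over it.
        prog.append((BPF_JEQ_K, 0, 1, nr))
        prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ERRNO | errno.EPERM))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW))
    return prog


def pack(prog):
    """Serialize to the struct sock_filter layout: u16 code, u8 jt, u8 jf, u32 k."""
    return b"".join(struct.pack("=HBBI", *insn) for insn in prog)


def evaluate(prog, nr, arch=AUDIT_ARCH_X86_64, args=(0,) * 6):
    """Interpret the filter against a modelled struct seccomp_data."""
    # struct seccomp_data: int nr; u32 arch; u64 instruction_pointer; u64 args[6]
    data = struct.pack("=iIQ6Q", nr, arch, 0, *args)
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        if code == BPF_LD_W_ABS:
            acc = struct.unpack_from("=I", data, k)[0]
            pc += 1
        elif code == BPF_JEQ_K:
            pc += 1 + (jt if acc == k else jf)
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError(f"opcode {code:#x} is outside this model")
```

With this model, `evaluate(build_filter(FD_CREATING.values()), FD_CREATING["openat"])` yields the errno verdict, while a syscall number that is not on the list falls through to allow. On a real system the packed bytes would be handed to the kernel after setting no_new_privs, and the opens themselves would go through the proxy process.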
I also wrote a quite horribly hacky Linux kernel module that provided proxy FDs that worked like the original Capsicum implementation. Modern FreeBSD embeds the credentials in the file descriptor and does the checks in the system calls, but the original one had special file descriptors that interposed between the user and the real ones and did the permission checks. You needed to talk to a special device to replace a file descriptor with the proxy version, but then it could be restricted via an ioctl.
I'm not sure if it's gained more abilities recently, but seccomp-bpf composes very badly with the Linux kernel's model of making system calls extensible by passing the arguments in memory. You can't check the in-memory arguments in BPF, and even if you could, you'd be vulnerable to TOCTOU issues. There's a mode that hooks up a second process to validate these things, and it can use ptrace to copy out the arguments and substitute its own, but that's very complex, and there are lots of ways that subtle bugs in your code can accidentally introduce privilege elevation vulnerabilities.
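The TOCTOU problem with in-memory arguments can be modelled in a few lines. This is a toy simulation, assuming a bytearray stands in for the sandboxed process's memory and a callback stands in for a racing thread; `ALLOWED_PREFIX` and all function names are hypothetical, and the prefix test is a stand-in rather than a real path policy.

```python
ALLOWED_PREFIX = b"/tmp/sandbox/"  # hypothetical policy, for illustration only


def write_cstring(memory, addr, text):
    """Store a NUL-terminated string at addr in the modelled process memory."""
    data = text + b"\0"
    memory[addr:addr + len(data)] = data


def read_cstring(memory, addr):
    """Read a NUL-terminated string starting at addr."""
    end = memory.index(0, addr)
    return bytes(memory[addr:end])


def check_then_use(memory, addr, attacker=None):
    """Racy pattern: validate the path in place, then let the kernel re-read it."""
    if not read_cstring(memory, addr).startswith(ALLOWED_PREFIX):
        return None
    if attacker:
        attacker(memory, addr)         # a second thread rewrites the buffer here
    return read_cstring(memory, addr)  # the path the kernel would actually open


def copy_then_use(memory, addr, attacker=None):
    """Proxy pattern: copy the path out once, then validate and use only the copy."""
    path = read_cstring(memory, addr)
    if not path.startswith(ALLOWED_PREFIX):
        return None
    if attacker:
        attacker(memory, addr)         # later writes no longer matter
    return path                        # the proxy opens its own copy


def swap_path(memory, addr):
    """Modelled attacker: replace the validated path after the check."""
    write_cstring(memory, addr, b"/etc/shadow")
```

Starting from a buffer that holds `/tmp/sandbox/a`, `check_then_use` with `swap_path` as the attacker returns the swapped path even though the check passed, while `copy_then_use` returns the path that was validated. That is roughly why the proxy has to perform the open itself rather than approve it.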
I wish more systems would just implement Capsicum.
Thanks a lot for this overview. I was trying to implement it with seccomp-bpf but quickly ran into my inability to dereference userspace pointers; I also tried landlock but it appears to block openat, even when using existing FDs.
IIRC, the reason for not accepting capsicum as an LSM was that there were too many LSMs and you could compose LSMs to create something capsicum-like, but my (limited) experience doing this, and particularly your response above, have demonstrated it to be... impractical, to say the least.
I'm not an expert here, but I would think using landlock like this would be a better tool than seccomp-bpf. That assumes you are using Linux > 5.19(?), which should be pretty easy, since Linux is going to be at 7.0 any day now.
I'm wondering why the author only mentions landlock once at the very bottom of the article and doesn't explain it at all.
The writing in this article seems excessively florid. I didn’t immediately get LLM vibes, but I can see how the style of language somewhat masks the repetition, so maybe it is LLM-generated and this is a new attempt to hide that.
That said, I found the comment at the end funny:
Capsicum eliminates ambient authority. seccomp restricts it. One locks the door and removes it from the hinges. The other hires a bouncer and hopes the guest list is complete.
I think, as written, this is saying that one removes the door from the frame, so there is no door to get in the way :D