Finding and Fixing Ghostty's Largest Memory Leak
105 points by fluent
Not surprised a big tty memory leak had to do with scrollback - my custom terminal had a similar problem a while back. In the first version I had unlimited scrollback, thinking computers nowadays have so much memory it'd never be a problem, but it added up much faster than I expected; it's amazing how much data can be spammed into a terminal, especially if you leave it open like I do for months or years at a time. In my terminal I ended up using a fairly straightforward circular buffer with a max size, letting the D garbage collector take care of the rest. Works well enough for me, though I do still have a few instances using > 200 MB of memory even now.
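For anyone who hasn't built one, here is a minimal sketch of the kind of fixed-capacity circular scrollback buffer described above (in C with invented names, not the commenter's D code): once the buffer is full, each new line overwrites the oldest one, so memory use stays bounded.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-capacity scrollback: a circular buffer of lines. */
typedef struct {
    char **lines;  /* owned copies of each retained line */
    size_t cap;    /* maximum number of lines retained */
    size_t head;   /* index of the oldest line */
    size_t len;    /* number of lines currently stored */
} scrollback;

static void scrollback_init(scrollback *sb, size_t cap) {
    sb->lines = calloc(cap, sizeof *sb->lines);
    sb->cap = cap;
    sb->head = 0;
    sb->len = 0;
}

static void scrollback_push(scrollback *sb, const char *line) {
    size_t slot;
    if (sb->len < sb->cap) {
        slot = (sb->head + sb->len) % sb->cap;
        sb->len++;
    } else {
        /* Full: reuse the oldest slot and advance the head. */
        slot = sb->head;
        sb->head = (sb->head + 1) % sb->cap;
        free(sb->lines[slot]);
    }
    sb->lines[slot] = strdup(line);
}
```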
Circular buffer is a pretty standard approach to this problem. I think it's what most terminal emulators do.
The reason I went with this doubly linked list approach for Ghostty is that, architecturally, it makes it easier for us to support some other features that either exist or are planned.
As an example of a planned one, one of the most upvoted feature requests is the ability for Ghostty to persist scrollback across relaunches (macOS's built-in Terminal does this, and maybe iTerm2). By using a paged linked list architecture, we can take pages that no longer contain the active area (and are therefore read-only) and archive them off the IO thread during destroy when we need to prune scrollback. We never need to worry that the IO thread might circle around and produce a read/write data race.
Or, as an example of something we don't do yet, we could convert scrollback history into a much more compressed form (maybe literally compressed memory using something like zstd), trading CPU for memory if users are willing to pay a [small, probably imperceptible] CPU time cost when they scroll up.
I'm not saying my approach is strictly better. It has its own problems, but this is why I went this direction.
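To make the compression idea concrete, here is a hedged sketch (in C, using the real libzstd API but with invented types and function names; this is not Ghostty code) of archiving a read-only scrollback page: compress it, free the original buffer, and only decompress when the user scrolls back up to it.

```c
#include <stdlib.h>
#include <zstd.h>

/* A scrollback page that has left the active area, stored compressed. */
typedef struct {
    void  *data;       /* compressed bytes */
    size_t comp_size;  /* compressed length */
    size_t orig_size;  /* size needed to decompress */
} archived_page;

/* Compress a read-only page so its uncompressed buffer can be freed. */
static int archive_page(archived_page *out, const void *page, size_t page_size) {
    size_t bound = ZSTD_compressBound(page_size);
    void *buf = malloc(bound);
    if (!buf) return -1;

    size_t n = ZSTD_compress(buf, bound, page, page_size, /*level=*/3);
    if (ZSTD_isError(n)) { free(buf); return -1; }

    out->data = buf;
    out->comp_size = n;
    out->orig_size = page_size;
    return 0;
}

/* Decompress on demand when the user scrolls up into this page. */
static int restore_page(void *dst, const archived_page *ap) {
    size_t n = ZSTD_decompress(dst, ap->orig_size, ap->data, ap->comp_size);
    return ZSTD_isError(n) ? -1 : 0;
}
```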
Persisting might annoy me (though sometimes I'd want it, depending on whether my terminal session is a temporary or long-lived one), but that compression would probably be fine.
A guiding principle I stick to when making my own decisions here is that the purpose of a terminal emulator is to be a user interface, so the limiting factor is (or should be) the user. If you can do something quickly enough that the user won't notice, that's fine, and if you do something so fast they can't possibly read the output anyway, that's pointless. I think transparent compression would fit right into that sweet spot of user speed.
BTW idk if you have the feature or not, but one I like a lot is being able to pipe the terminal's scrollback buffer into a shell command, meaning I can grep it, save it to a file, etc., using the normal unix shell stuff I already know. In that case, more speed might be important, but I still doubt compression will get in the way.
It does! Cmd/Ctrl+Shift+J inserts the path of a temporary file containing the scrollback buffer. Pretty handy.
+1 for this. You can configure a keybind for Ghostty to dump scrollback to a file, then give you the path. Being able to access the content (or trigger this path) programmatically would be nice.
Discussion for what should be "scriptable" seems to be happening here: https://github.com/ghostty-org/ghostty/discussions/2353
The underlying "pages" are not single virtual memory pages but they are a contiguous block of memory aligned to page boundaries and composed of an even multiple of system pages.
I know the post says it's not important for this but I am very curious about why page alignment + an even multiple of pages was chosen.
It probably doesn't matter as much anymore, but at one point in our history our renderer worked by flagging the active area with a copy-on-write flag, and CoW flags are at the page level. I wanted to prevent false sharing between two terminals, where one writing could trigger massive memory copies on another terminal. It turns out copy-on-write is pretty fucking slow on every OS, so we abandoned that approach completely.
The contiguous part is still important because we store references as 32-bit offsets from a base pointer rather than full 64-bit pointers (on 64-bit systems). So every page has only one real pointer and a ton of offsets.
But yeah, the alignment and even multiple probably matters less today.
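A minimal sketch of the two ideas discussed here, with invented names (not Ghostty's actual layout): allocate each page as one contiguous block aligned to a page boundary and sized as a multiple of system pages, and refer to structures inside it with 32-bit offsets from the page's single base pointer instead of full 64-bit pointers.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* A reference inside a page: an offset in bytes from the page's base. */
typedef uint32_t offset_t;

/* Allocate a page-aligned block that spans `pages` system pages. */
static void *page_alloc(size_t pages) {
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *base = NULL;
    if (posix_memalign(&base, page_size, pages * page_size) != 0)
        return NULL;
    return base;
}

/* Resolve a 32-bit offset against the page's one base pointer. */
static inline void *resolve(void *base, offset_t off) {
    return (char *)base + off;
}

/* Store a reference to `ptr` (which must live inside the page) as an offset. */
static inline offset_t make_offset(void *base, void *ptr) {
    return (offset_t)((char *)ptr - (char *)base);
}
```

Because every structure in the page is addressed relative to the same base, moving or serializing the whole block doesn't invalidate any internal references, and each reference costs 4 bytes instead of 8.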
Thank you for the reply! Fun stuff - I rarely work at this level so it's always interesting to learn more about.
Fun writeup. Did you consider re-using the nonstandard page without resizing instead of freeing it?
There's a brief mention of that idea under the section titled "The Fix"
We could've also reused the non-standard page and just retained the large memory size, but until we have data that shows otherwise, we're still operating under the assumption that standard pages are the common case and it makes sense to reset back to a standard pooled page.
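A hedged sketch of the trade-off described in that quote (C, with invented names and a made-up standard page size; not the actual Ghostty fix code): when the oldest page is pruned, a standard-size page goes back to the pool for reuse, while an oversized non-standard page is freed outright so its extra memory is returned to the allocator.

```c
#include <stdlib.h>

enum { STD_PAGE_SIZE = 64 * 1024 };  /* illustrative, not Ghostty's real size */

typedef struct page {
    struct page *next;  /* pool free list link */
    size_t size;        /* total size of this page's allocation */
    /* ... row/cell data follows in the same block ... */
} page;

static page *pool = NULL;  /* free list of standard-size pages */

/* Decide what to do with the oldest page when scrollback is pruned. */
static void prune(page *oldest) {
    if (oldest->size == STD_PAGE_SIZE) {
        /* Common case: standard pages are pooled for reuse. */
        oldest->next = pool;
        pool = oldest;
    } else {
        /* Non-standard (oversized) pages are released rather than retained,
         * so their extra memory isn't kept alive indefinitely. */
        free(oldest);
    }
}
```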