PAX: The Cache Performance You're Looking For

7 points by sjamaan


WilhelmVonWeiner

Suggested vibecoding bc the writing was done with an LLM, which immediately killed all my interest in reading it :/

andrewrogers

PAX has been common in commercial databases since the 2000s. PostgreSQL predates that, and swapping out the page layout model in an existing database isn't something you can really do. A downside of PAX is that you sometimes have to rewrite the whole page, which is expensive and messes with your WAL protocol.

Modern page engines are typically PAX-ish but more optimized for SIMD. Some recent ones I've worked on are actually hybrids of PAX-ish and modified NSM layouts; these can be significantly more efficient for analytical processing with complex query constraints, if you don't mind the implementation complexity.
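
For readers who haven't seen the two layouts side by side, here is a minimal sketch of the difference, assuming a toy in-memory page. The class and method names (`NSMPage`, `PAXPage`, `scan_column`, `get_row`) are hypothetical and not taken from any real engine; there is no fixed page size, WAL, or SIMD here.

```python
from dataclasses import dataclass, field


@dataclass
class NSMPage:
    """Row-oriented (NSM) page: each row is stored contiguously."""
    rows: list = field(default_factory=list)

    def insert(self, row):
        self.rows.append(tuple(row))

    def scan_column(self, idx):
        # Every row has to be visited to pull out a single attribute.
        return [row[idx] for row in self.rows]


@dataclass
class PAXPage:
    """PAX page: same rows per page, but grouped into per-column minipages."""
    n_columns: int
    minipages: list = field(init=False)

    def __post_init__(self):
        self.minipages = [[] for _ in range(self.n_columns)]

    def insert(self, row):
        # One value goes into each column's minipage.
        for minipage, value in zip(self.minipages, row):
            minipage.append(value)

    def scan_column(self, idx):
        # Only the one minipage is touched, so values are contiguous.
        return list(self.minipages[idx])

    def get_row(self, slot):
        # Reconstructing a row stays inside the page: one read per minipage.
        return tuple(minipage[slot] for minipage in self.minipages)
```

Both pages return the same answer for a column scan; the difference is that the PAX version reads one contiguous run of values, while a full row can still be rebuilt without leaving the page.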

sjamaan

Sounds very cool, unless you're using an ORM, as those typically fetch all columns by default.

vi_mi

TIL about PAX, that was interesting, thanks!

On the point of SELECT * queries, if there's a filter with good selectivity, late materialization [1] can help a lot with pruning.

[1] https://arrow.apache.org/blog/2025/12/11/parquet-late-materialization-deep-dive/
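
A rough sketch of the late-materialization idea, assuming columns held as plain Python lists keyed by name. The function `late_materialize` and its parameters are made up for illustration and are not the Parquet or Arrow API: the predicate column is scanned first, and the remaining columns are read only at the positions that survived the filter.

```python
def late_materialize(columns, filter_col, predicate):
    """Evaluate the filter on one column, then fetch the rest only for matches.

    columns:    dict mapping column name -> list of values (equal lengths)
    filter_col: name of the column the predicate applies to
    predicate:  function taking a value and returning True/False
    """
    # Step 1: scan only the filter column and record matching positions.
    positions = [i for i, v in enumerate(columns[filter_col]) if predicate(v)]

    # Step 2: materialize full rows, reading other columns only at those positions.
    return [
        {name: values[i] for name, values in columns.items()}
        for i in positions
    ]


if __name__ == "__main__":
    table = {
        "id": [1, 2, 3, 4],
        "status": ["ok", "err", "ok", "err"],
        "payload": ["a", "b", "c", "d"],
    }
    for row in late_materialize(table, "status", lambda s: s == "err"):
        print(row)
```

The more selective the filter, the fewer positions are read from the wide columns, which is why this helps even when the query asks for every column.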