Xee: A Modern XPath and XSLT Engine in Rust
91 points by wezm
Happy to see new people discovering the beauty of the XML world. More and better implementations will help.
So glad to see there’s interest here! It’s been a long strange trip through XML land for me.
Thanks for undertaking this work! I have been wanting XSLT 2.0 without Saxon for some time; I always wondered why libxml2 and libxslt were frozen in time. Also thank you for lxml; I haven’t used it much (honestly, it took some time to comprehend the model) but it has been a life saver many times. I am one of the few people who sort of came on the scene in the early 2000s and actually rather like XML still. My catchphrase has been “XML is something you inflict on others, not yourself” but I actually have used it just for myself.
I should write a post about this, but at one time we were reconciling a legacy database with a new design. I made a custom XML format to capture my notes about each table in the new schema, what tables it reflected from the old schema, what the types would be, what conversion steps were necessary, etc. Then I had a stylesheet for turning my notes into a document for humans (both HTML and PDF via a simple Docbook format), and another one for producing a Liquibase migration that would create the new schema, with commentary. I think it would have been hard to do this any other way; XML is very good for situations like this where you are actually making a document, but also want to extract some structured data from it.
I was looking at using XProc for a blogging platform for a minute, but it seemed even more quiescent than XSLT, so I kind of abandoned the idea. Maybe Xee will someday be a good host for that idea!
I don’t know Rust and I’m really bad at contributing code on the side but I will take a look at Xee and see if there is any low-hanging fruit; if I can give you an hour here or there I would like to!
Thank you so much for your kind comment!
I’m very happy to help you along a little if you want to go on a Rust adventure with Xee!
Seems like a really good candidate for a call on This Week In Rust’s call for participation: https://github.com/rust-lang/this-week-in-rust/blob/master/draft/2025-04-02-this-week-in-rust.md#cfp---projects
What are you using xml for? Is it a design choice or just for legacy interop?
I myself am more interested in implementing it than using it!
As I point out in the blog post, XML is a niche but a sizeable niche; a lot of people use this technology for document processing for instance. My customer Paligo who very generously funded this work is interested in ensuring that implementations of these standards are accessible in a variety of ecosystems.
Rust in particular enables a whole bunch of possibilities that you don’t get with Java, which has the most capable implementations. A simple example of such a possibility is that someone yesterday downloaded a compiled binary of the Xee CLI tool to Windows and it just worked (even though I did all the work on Linux). That’s quite different from having to have a Java stack installed.
Xee being in Rust should make it easier to integrate support for these standards in other programming languages, in web services, and even the browser through WASM.
I think there’s value in having a thoroughly specified set of standards to build on; it allows more durable and interoperable tooling.
Personally, I really like XML when I want to mark up inline text in a whitespace-preserving way. For example, when writing tutorials, I like to generate every stage in the code’s progression from a single XML file. It’s not a very common use case, but it’s great in that niche!
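The commenter's actual markup isn't shown, so the following is only a guess at how such a setup could work once the XML has been parsed: each run of code text records the stage that introduces it and, optionally, the stage that removes it, and rendering stage k keeps exactly the fragments alive at k. `Fragment`, `render` and the `added`/`removed` attributes are hypothetical names chosen for this sketch.

```rust
/// One run of tutorial code, tagged with the stage that introduces it and,
/// optionally, the stage that deletes it. In the XML source this might be
/// markup like <code added="2" removed="3">...</code> (a hypothetical format).
struct Fragment {
    text: &'static str,
    added: u32,
    removed: Option<u32>,
}

/// Render the program as it looks at `stage`: keep every fragment that has
/// been added by then and not yet removed. Text is copied verbatim, so the
/// whitespace written in the source is preserved exactly.
fn render(fragments: &[Fragment], stage: u32) -> String {
    fragments
        .iter()
        .filter(|f| f.added <= stage && f.removed.map_or(true, |r| stage < r))
        .map(|f| f.text)
        .collect()
}

/// A two-stage tutorial: stage 2 replaces the greeting line.
fn sample() -> Vec<Fragment> {
    vec![
        Fragment { text: "fn main() {\n", added: 1, removed: None },
        Fragment { text: "    println!(\"hello\");\n", added: 1, removed: Some(2) },
        Fragment { text: "    println!(\"hello, world\");\n", added: 2, removed: None },
        Fragment { text: "}\n", added: 1, removed: None },
    ]
}

fn main() {
    for stage in 1..=2 {
        println!("--- stage {} ---\n{}", stage, render(&sample(), stage));
    }
}
```

Keeping the add/remove stages on the fragments themselves means one source file stays the single point of truth, and every intermediate version is derived rather than copied by hand.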
This actually made my weekend. I’ve been digging into ePub parsing lately for a digital library project I’m working on (think Calibre, but like .01% of it at the moment), and I’m at the point where I want to extract metadata from ePub files.
For those of you that don’t know, an ePub is really just a zip archive full of XML files to describe metadata and contents, and XHTML files that contain the contents. I’m also doing this in Rust and haven’t loved the existing options, so I’ll be giving this a try this weekend.
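For the metadata step, the usual entry point is META-INF/container.xml: in the ePub container format, its rootfile element's full-path attribute names the package (.opf) document, which is where the title, author and other metadata live. A std-only Rust sketch of that first hop follows. It assumes the zip entry has already been read into a string (for instance with a zip crate, not shown) and uses naive string scanning purely for illustration, where real code would use an XML parser or an XPath engine such as Xee.

```rust
/// Naive sketch: return the `full-path` attribute of the first <rootfile>
/// element in the text of an ePub's META-INF/container.xml.
/// Assumptions: the attribute is double-quoted, `<rootfile` is followed by a
/// space, and there are no comments, CDATA sections or entity escapes to
/// worry about. Real code should use an XML parser instead.
fn rootfile_path(container_xml: &str) -> Option<&str> {
    // Jump to the first <rootfile start tag (the trailing space keeps
    // <rootfiles> from matching).
    let start = container_xml.find("<rootfile ")?;
    let rest = &container_xml[start..];
    // Only look inside this start tag.
    let tag = &rest[..rest.find('>')?];
    // Extract the value of full-path="...".
    let attr = "full-path=\"";
    let value_start = tag.find(attr)? + attr.len();
    let value_len = tag[value_start..].find('"')?;
    Some(&tag[value_start..value_start + value_len])
}

fn main() {
    // A minimal container.xml as found inside an ePub zip archive.
    let container = r#"<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"#;
    // Prints Some("OEBPS/content.opf"): the package file holding the metadata.
    println!("{:?}", rootfile_path(container));
}
```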
I am amused to see this post next to How to Write Blog Posts that Developers Read in the listing of posts right now. I wrote this blog post before I read that, but did read it before posting it. So I was able to hold it against the question asked there.
Most software bloggers never think to ask, “Is there a wider audience for this topic?”
This is exactly what I tried to do in the introduction section, because I had the daunting problem that my post talks about XML.
Now hold on. Your brain might shut down when I talk about XML. I totally get that XML may not be your cup of tea. But I’m also going to be talking about a strange different world of technology where everything is specified, and the implementation of a programming language using Rust, so I hope you still decide to read on if those topics could interest you.
I wondered whether I had done a good enough job at this, though, so the sight of my post just under the post that caused me to question myself, the very next day, is striking enough that I wanted to highlight it.
It worked for me. The XPath and XSLT in the title brought back some unpleasant memories, but on the other hand, they also sparked an echo of the optimism from back in the day, when we all thought that we would solve things thoroughly once and for all with XML. (By ‘we’ I mean other, smart people. Definitely not myself.)
The wording you used and the interleaving of history, personal experience and the current project made it entertaining enough to read it all the way to the end and made me briefly wonder what role the two languages could play in my projects. Nothing has popped up yet so far, but one never knows.
Thank you! It’s very kind of you to report back. I have insecurities like anyone so it makes me feel good I managed to get someone to keep reading!
I wonder if there’s interest in a trimmed down configuration at build time that limits the implementation to XSLT 1.0, which is the only version browsers ever implemented. No browser is going to implement anything newer.
I honestly hope this work makes it possible for browsers to reconsider
That is very, very unlikely. The interaction model of XSLT during parsing is incompatible with the DOM and JS’s runtime without major refactoring or rebuilding of event loops. There’s a reason the worlds of XSLT beyond 1.0 and the web diverged.
I don’t know about XSLT but I’d be interested in this being restricted to XPath 1.0, with stdlib extensions.
As far as I’m concerned XPath 1.0 was a rarely rivalled height of taste in DSLs, it does one thing and it does that thing amazingly well. It was hampered by stdlib limitations but most implementations allow providing your own functions so that was generally solvable.
The other thing that’s been floating at the back of my head for years, though I’ve had no use for it, is implementing the match/patch model of XSLT in a language worth using: having the traversal / match / patch features as a library and writing the templates in, say, Ruby. Not as something portable, but because the host program is in Ruby, and writing and debugging Ruby is so much better than XSLT. Replace Ruby with whatever language; the point is being able to use the declarative XSLT transformation model without having to use the XSLT programming language.
Elsewhere I commented about XPath 1.0; there are a few compatibility considerations in the spec that aren’t implemented yet, and we’d need a subset of the parser somehow. How to subset a chumsky parser is an interesting use case I hadn’t considered before.
The way standard library functions for XPath are defined in Xee is quite nice, I think, using Rust macros:
```rust
#[xpath_fn("fn:node-name($arg as node()?) as xs:QName?", context_first)]
fn node_name(
    interpreter: &Interpreter,
    arg: Option<xot::Node>,
) -> error::Result<Option<ast::Name>> {
    Ok(if let Some(node) = arg {
        interpreter.xot().node_name_ref(node)?.map(|n| n.to_owned())
    } else {
        None
    })
}
```
Right now it’s just good enough to support the standard library, but it shouldn’t be difficult to make it possible to write extension libraries as well, and harden the macro a little bit more.
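The xpath_fn macro above is Xee's own mechanism, and what an extension API would eventually look like is up to the project. Purely as a hypothetical illustration of the underlying idea, here is a macro-free Rust sketch of a registry for user-defined functions. `Value`, `ExtFn`, `Registry`, the example namespace URI and the `my:shout` / `my:len` functions are all invented for this sketch and are not Xee's API.

```rust
use std::collections::HashMap;

/// Hypothetical value type; a real XPath engine works with sequences of
/// items and a much richer set of atomic types.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Int(i64),
}

/// Signature shared by all extension functions in this sketch.
type ExtFn = fn(&[Value]) -> Result<Value, String>;

/// Functions are keyed by (namespace URI, local name, arity) because an
/// XPath function's identity includes its arity.
#[derive(Default)]
struct Registry {
    fns: HashMap<(String, String, usize), ExtFn>,
}

impl Registry {
    fn register(&mut self, ns: &str, local: &str, arity: usize, f: ExtFn) {
        self.fns.insert((ns.to_string(), local.to_string(), arity), f);
    }

    fn call(&self, ns: &str, local: &str, args: &[Value]) -> Result<Value, String> {
        let key = (ns.to_string(), local.to_string(), args.len());
        match self.fns.get(&key) {
            Some(f) => f(args),
            // Error text uses the Q{uri}local#arity style of XPath 3.x.
            None => Err(format!("unknown function Q{{{}}}{}#{}", ns, local, args.len())),
        }
    }
}

/// Example extension my:shout($s as xs:string) as xs:string.
fn shout(args: &[Value]) -> Result<Value, String> {
    match args {
        [Value::Str(s)] => Ok(Value::Str(s.to_uppercase())),
        _ => Err("my:shout expects a single string".to_string()),
    }
}

/// Example extension my:len($s as xs:string) as xs:integer (counts characters).
fn len(args: &[Value]) -> Result<Value, String> {
    match args {
        [Value::Str(s)] => Ok(Value::Int(s.chars().count() as i64)),
        _ => Err("my:len expects a single string".to_string()),
    }
}

fn main() {
    let ns = "http://example.com/my"; // hypothetical namespace URI
    let mut registry = Registry::default();
    registry.register(ns, "shout", 1, shout);
    registry.register(ns, "len", 1, len);
    println!("{:?}", registry.call(ns, "shout", &[Value::Str("xee".to_string())]));
    println!("{:?}", registry.call(ns, "len", &[Value::Str("xee".to_string())]));
    println!("{:?}", registry.call(ns, "len", &[]));
}
```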
Using the way templates are matched in XSLT outside of XSLT is an interesting idea!
Right now it’s just good enough to support the standard library, but it shouldn’t be difficult to make it possible to write extension libraries as well, and harden the macro a little bit more.
TBF the extensions to the standard library make that much less necessary e.g. one of the most common missing pieces in xpath 1.0 is matching on a space-delimited token like CSS classes:
contains(concat(' ', normalize-space(@class), ' '), concat(' ', $needle, ' '))
is pretty awful, it’s verbose, it’s fiddly, and it’s hard to remember. xpath 2.0’s standard library makes that
tokenize(@class,'\s+')=$needle
which is a bit weird because of the interaction between sequences and operators, but much better, and with xpath 3.1’s standard library it’s
contains-token(@class, $needle)
which is as good as it gets without built-in syntax.
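To see how the three idioms relate, here is a rough Rust model of each, applied to a single attribute value. It is a sketch, not how any XPath engine implements them, and it makes two simplifying assumptions: Rust's split_whitespace splits on all Unicode whitespace, whereas XPath only treats space, tab, carriage return and line feed as whitespace, and the default codepoint collation is modeled as plain string equality.

```rust
/// XPath 1.0 idiom:
/// contains(concat(' ', normalize-space(@class), ' '), concat(' ', $needle, ' '))
fn padded_contains(class_attr: &str, needle: &str) -> bool {
    // normalize-space: trim, and collapse internal whitespace runs to one space.
    let normalized = class_attr.split_whitespace().collect::<Vec<_>>().join(" ");
    // Substring search on space-padded strings.
    format!(" {} ", normalized).contains(&format!(" {} ", needle))
}

/// XPath 2.0 idiom: tokenize(@class, '\s+') = $needle
/// The general comparison `=` is true if any token equals the needle.
fn any_token_eq(class_attr: &str, needle: &str) -> bool {
    class_attr.split_whitespace().any(|t| t == needle)
}

/// XPath 3.1 idiom: contains-token(@class, $needle)
/// The needle is trimmed first; whitespace tokenization never produces empty
/// tokens, so an all-whitespace needle matches nothing.
fn contains_token(class_attr: &str, needle: &str) -> bool {
    let needle = needle.trim();
    class_attr.split_whitespace().any(|t| t == needle)
}

fn main() {
    let class_attr = "  btn   btn-primary ";
    let idioms: [fn(&str, &str) -> bool; 3] = [padded_contains, any_token_eq, contains_token];
    for f in idioms.iter() {
        // All three agree on simple needles: "btn" matches, "primary" does not.
        println!("{} {}", f(class_attr, "btn"), f(class_attr, "primary"));
    }
}
```

One difference the model makes visible: the padded-substring trick also accepts a multi-token needle such as "a b" against the value "a b c", while the two token-based versions reject it.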
Using the way templates are matched in XSLT outside of XSLT is an interesting idea!
Yeah, my first job involved a lot of XSLT, and it left me both impressed by the transformation model and disgusted by the language to this day. I used to literally dream of writing templates in Haskell instead (after I’d left that job and discovered a liking for learning about PLs).
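To make the idea of using XSLT's matching model from another language concrete, here is a rough Rust sketch of template matching as a library: each rule pairs a match predicate with a priority and an action, and Engine::apply plays the role of xsl:apply-templates. Everything here (`Node`, `Rule`, `Engine`, the rules themselves) is hypothetical and heavily simplified; real XSLT patterns, modes, import precedence and conflict handling are much richer. The priorities 0 and -0.5 mirror XSLT's default priorities for a plain element-name pattern and for `*`.

```rust
/// Minimal document tree: elements with children, and text nodes.
enum Node {
    Elem { name: String, children: Vec<Node> },
    Text(String),
}

/// A template rule: a match predicate, a priority, and an action. The action
/// gets the engine so it can recurse, like xsl:apply-templates.
struct Rule {
    priority: f64,
    matches: fn(&Node) -> bool,
    action: fn(&Engine, &Node, &mut String),
}

struct Engine {
    rules: Vec<Rule>,
}

impl Engine {
    /// Process one node: run the matching rule with the highest priority
    /// (ties go to the earlier rule, a simplification). With no match, behave
    /// like XSLT's built-in rules: copy text, recurse into element children.
    fn apply(&self, node: &Node, out: &mut String) {
        let best = self
            .rules
            .iter()
            .filter(|r| (r.matches)(node))
            .fold(None::<&Rule>, |best, r| match best {
                Some(b) if b.priority >= r.priority => Some(b),
                _ => Some(r),
            });
        match (best, node) {
            (Some(rule), _) => (rule.action)(self, node, out),
            (None, Node::Text(text)) => out.push_str(text),
            (None, Node::Elem { .. }) => self.apply_children(node, out),
        }
    }

    /// Apply templates to each child of an element, in document order.
    fn apply_children(&self, node: &Node, out: &mut String) {
        if let Node::Elem { children, .. } = node {
            for child in children {
                self.apply(child, out);
            }
        }
    }
}

fn is_elem(node: &Node, want: &str) -> bool {
    matches!(node, Node::Elem { name, .. } if name == want)
}

/// Emit <tag>, apply templates to the children, then emit </tag>.
fn wrap(engine: &Engine, node: &Node, out: &mut String, tag: &str) {
    out.push_str(&format!("<{}>", tag));
    engine.apply_children(node, out);
    out.push_str(&format!("</{}>", tag));
}

fn build_engine() -> Engine {
    Engine {
        rules: vec![
            // Like match="para" (default priority 0): emit <p>.
            Rule {
                priority: 0.0,
                matches: |n: &Node| is_elem(n, "para"),
                action: |e: &Engine, n: &Node, out: &mut String| wrap(e, n, out, "p"),
            },
            // Like match="emph": emit <em>.
            Rule {
                priority: 0.0,
                matches: |n: &Node| is_elem(n, "emph"),
                action: |e: &Engine, n: &Node, out: &mut String| wrap(e, n, out, "em"),
            },
            // Like match="*" (default priority -0.5): other elements become <span>.
            Rule {
                priority: -0.5,
                matches: |n: &Node| matches!(n, Node::Elem { .. }),
                action: |e: &Engine, n: &Node, out: &mut String| wrap(e, n, out, "span"),
            },
        ],
    }
}

fn sample_doc() -> Node {
    let text = |s: &str| Node::Text(s.to_string());
    let elem = |name: &str, children: Vec<Node>| Node::Elem { name: name.to_string(), children };
    elem("doc", vec![elem("para", vec![text("Hello "), elem("emph", vec![text("world")])])])
}

fn main() {
    let mut out = String::new();
    build_engine().apply(&sample_doc(), &mut out);
    // Prints <span><p>Hello <em>world</em></p></span>
    println!("{}", out);
}
```

The rules are plain Rust functions, so they can be written, tested and debugged with the host language's tools; only the traversal and rule selection come from the XSLT model.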
As far as I’m concerned XPath 1.0 was a rarely rivalled height of taste in DSLs, it does one thing and it does that thing amazingly well.
Due to its use in YANG, I’ve had to battle with XPath 1.0 a few times; it can be a bit of a mind bender.
It always felt a bit like writing stuff in a weird form of assembly language.
Ruby
Speaking of which, I wonder how hard it’d be to make an Xee-backed gem compatible with the Nokogiri interfaces.
You’d need to translate CSS selectors to xpath ones, but that isn’t necessarily difficult.
In return you’d have a much-improved security posture (there have been lots of CVEs for the libxml/libxslt components of nokogiri).
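Nokogiri already evaluates CSS selectors by translating them to XPath, so an Xee-backed gem would need an equivalent translator. A deliberately tiny Rust sketch for a small subset (type selectors, `*`, `.class`, `#id`, and the descendant and child combinators) follows; the function names are made up, everything else is rejected, and it reuses the XPath 1.0 class-token idiom quoted elsewhere in this thread. A production translator handles far more syntax, escaping and edge cases.

```rust
/// True for the simple ASCII identifiers this sketch accepts as names.
fn is_ident(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}

/// Translate one compound selector such as `div.note#main` into an XPath step.
fn compound_to_xpath(compound: &str) -> Result<String, String> {
    let is_special = |c: char| c == '.' || c == '#';
    // Split off the leading type selector (empty means `*`).
    let split = compound.find(is_special).unwrap_or(compound.len());
    let (tag, mut rest) = compound.split_at(split);
    let mut step = match tag {
        "" | "*" => "*".to_string(),
        t if is_ident(t) => t.to_string(),
        _ => return Err(format!("unsupported selector: {:?}", compound)),
    };
    // Each remaining part starts with '.' (class) or '#' (id).
    while let Some(kind) = rest.chars().next() {
        let body = &rest[1..];
        let end = body.find(is_special).unwrap_or(body.len());
        let name = &body[..end];
        if !is_ident(name) {
            return Err(format!("unsupported selector: {:?}", compound));
        }
        if kind == '.' {
            // Whitespace-separated class token match, XPath 1.0 style.
            step.push_str(&format!(
                "[contains(concat(' ', normalize-space(@class), ' '), ' {} ')]",
                name
            ));
        } else {
            step.push_str(&format!("[@id='{}']", name));
        }
        rest = &body[end..];
    }
    Ok(step)
}

/// Translate a tiny CSS subset into XPath 1.0.
fn css_to_xpath(selector: &str) -> Result<String, String> {
    let mut xpath = String::new();
    // Axis for the next step: `//` for descendant, `/` right after a `>`.
    let mut axis = "//";
    // Pad `>` with spaces so it becomes its own token.
    let spaced = selector.replace('>', " > ");
    for token in spaced.split_whitespace() {
        if token == ">" {
            if xpath.is_empty() || axis == "/" {
                return Err(format!("misplaced '>' in {:?}", selector));
            }
            axis = "/";
            continue;
        }
        xpath.push_str(axis);
        xpath.push_str(&compound_to_xpath(token)?);
        axis = "//";
    }
    if xpath.is_empty() || axis == "/" {
        return Err(format!("incomplete selector {:?}", selector));
    }
    Ok(xpath)
}

fn main() {
    for sel in ["div p", "ul > li", "p.note", "#main", "a[href]"].iter() {
        println!("{:<8} => {:?}", sel, css_to_xpath(sel));
    }
}
```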
No browser is going to implement anything newer.
Why not? Among other things, XSLT is really useful for viewing configuration files. You just open the config file and see a human-friendly representation. XSLT 1.0 is often sufficient, but sometimes not. Same for data files.
`less` is really useful for viewing configuration files. You just open the config file and see a human-friendly representation. If it isn’t human-friendly, it’s a bad configuration file.
XHTML generated by XSLT can show you documentation, explain what particular parameters and values mean, list the other options, add links to external resources, etc. And the config file still contains pure data. The user can make the config as short as possible and leave only the needed parameters, and it will not affect the documentation, because that lives in a separate file.
Really simple example: http://frantovo.cz/disk/temp/xslt/
And with XSD or another schema language, you can provide documentation in a machine-readable format, so it can be displayed in an editor or IDE. This is also a nice example of M×N: you have M editors and N formats. XSD serves as a bridge between them: editors need to implement a single (meta)format and will then support (validation, completion, documentation) any configuration or data format. And authors of formats need to provide a single description of their schema to get support in any editor. The result is support for M×N combinations, while only M+N implementations were needed.
What alternatives do we have? If you put the documentation in comments inside the config file, it will immediately become cluttered, incomplete, outdated… If the next version adds some new parameters, who will maintain these comments and keep old configs up to date? An external schema (and stylesheet) allows updating data and schema independently. Editors also do not understand those comments: they will not tell you what parameters are available and will not validate the config file.
If a config file is hard to read then it’s hard to make changes and hard to review changes.
Comments in config files are important because the reason for a particular setting can be subtle in ways that are not explained in the documentation, e.g. because they depend on specific details of where it is deployed.
I can’t see your link because it redirects to a different URL with broken TLS.
I can’t see your link because it redirects to a different URL with broken TLS.
It doesn’t redirect for me. It’s likely your browser or an extension such as HTTPS Everywhere doing that.
It’s because the server says to use h2c and browsers don’t support h2c (cleartext) only h2 (HTTP/2 over TLS).
It was just plain HTTP (no encryption). It works e.g. in Firefox. However, I just moved it to another server with both protocols.
(it may take some time to fully propagate through DNS)
Right now the XPath 3.1 implementation doesn’t implement the XPath 1 compatibility profile but that shouldn’t be an inordinate amount of work. That said, I think that makes it possible for an XPath 3.1 interpreter to execute XPath 1 expressions, but it would still be a superset. Making a proper subset would be a lot more work, plus then there’d be all this extra stuff in there you’d not want if you want a minimal implementation in a browser.
This whole subset/superset of XPath is an interesting problem though, as XQuery is itself a superset. So it’d be very interesting to try to support this at the parser level.
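One way to approach the subset/superset problem at the parser level is to parse with a single grammar but gate each construct on a language level. The following is a hypothetical sketch of just that gating check: no chumsky and no real parser, and the `Dialect` enum and construct table are illustrative and far from complete. It only addresses syntax; as noted above, XPath 1.0 expressions also need the compatibility rules to behave the same way, so a syntactic subset alone isn't enough.

```rust
/// Language levels, ordered so that later variants accept more syntax.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Dialect {
    XPath10,
    XPath31,
    XQuery31,
}

/// Hypothetical, incomplete table: the first dialect (in this sketch's
/// ordering) whose grammar includes a construct, keyed by a leading keyword
/// or operator.
fn min_dialect(construct: &str) -> Option<Dialect> {
    match construct {
        "or" | "and" | "div" | "mod" | "|" => Some(Dialect::XPath10),
        // Added in XPath 2.0, 3.0 or 3.1, so all part of XPath 3.1.
        "for" | "if" | "some" | "every" | "let" | "||" | "=>" => Some(Dialect::XPath31),
        // Only in XQuery.
        "declare" | "typeswitch" => Some(Dialect::XQuery31),
        _ => None,
    }
}

/// The check a parser could run when it is about to accept a construct.
fn allow(dialect: Dialect, construct: &str) -> Result<(), String> {
    match min_dialect(construct) {
        Some(min) if dialect >= min => Ok(()),
        Some(min) => Err(format!(
            "`{}` requires {:?} or later (parsing as {:?})",
            construct, min, dialect
        )),
        None => Err(format!("unknown construct `{}`", construct)),
    }
}

fn main() {
    let cases = [
        (Dialect::XPath10, "for"),
        (Dialect::XPath31, "let"),
        (Dialect::XPath31, "typeswitch"),
    ];
    for &(dialect, construct) in cases.iter() {
        println!("{:?} {}: {:?}", dialect, construct, allow(dialect, construct));
    }
}
```

Deriving `Ord` on the enum makes "at least this dialect" a plain comparison, which keeps the check cheap enough to run at every gated point in a grammar.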
I’m hoping a trimmed WASM version could make this technology more feasible in the browser for those who do want it.
woahhhh a new XSLT engine! XSLT rules, but having to run java to get saxon is… less than ideal.
XSLT 3 and not the empire of Saxon?!
This is the most exciting thing I will see this month.
I cannot promise to help out soon because I’m under a mountain of other things. But I’m interested and definitely interested as a user as well.
Back during my first technical writing job (at an IoT startup) I cobbled together some XSLT that transformed Doxygen’s XML output into stripped down HTML. We didn’t need a lot of the info that Doxygen provided. That was my first experience with declarative programming and it had a deep but subtle effect on my programmer worldview. Happy to hear that people are improving XSLT tooling. I definitely remember the tooling feeling quite clunky and out-of-date.
This is interesting to me, as my Python tool pyastgrep is based on XPath, using lxml.
https://github.com/spookylukey/pyastgrep?tab=readme-ov-file
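The general technique behind a tool like this is to serialize a syntax tree as XML so that an existing XPath engine can query it. A minimal Rust sketch of that serialization step follows, using a made-up toy AST rather than Python's real ast node types (in Python's AST, for instance, the called function is itself a child node rather than an attribute).

```rust
/// A toy expression AST (hypothetical; Python's real `ast` module has many
/// more node types and fields).
enum Expr {
    Num(i64),
    Name(String),
    BinOp { op: &'static str, left: Box<Expr>, right: Box<Expr> },
    Call { func: String, args: Vec<Expr> },
}

/// Escape text for use inside a double-quoted XML attribute value.
fn escape_attr(s: &str) -> String {
    s.replace('&', "&amp;").replace('<', "&lt;").replace('"', "&quot;")
}

/// Serialize the AST as XML: node kinds become element names, scalar fields
/// become attributes, and child expressions become child elements.
fn to_xml(e: &Expr, out: &mut String) {
    match e {
        Expr::Num(n) => out.push_str(&format!("<Num value=\"{}\"/>", n)),
        Expr::Name(id) => out.push_str(&format!("<Name id=\"{}\"/>", escape_attr(id))),
        Expr::BinOp { op, left, right } => {
            out.push_str(&format!("<BinOp op=\"{}\">", op));
            to_xml(left, out);
            to_xml(right, out);
            out.push_str("</BinOp>");
        }
        Expr::Call { func, args } => {
            out.push_str(&format!("<Call func=\"{}\">", escape_attr(func)));
            for arg in args {
                to_xml(arg, out);
            }
            out.push_str("</Call>");
        }
    }
}

fn main() {
    // Roughly the shape of `print(x + 1)`.
    let expr = Expr::Call {
        func: "print".to_string(),
        args: vec![Expr::BinOp {
            op: "Add",
            left: Box::new(Expr::Name("x".to_string())),
            right: Box::new(Expr::Num(1)),
        }],
    };
    let mut xml = String::new();
    to_xml(&expr, &mut xml);
    // Prints <Call func="print"><BinOp op="Add"><Name id="x"/><Num value="1"/></BinOp></Call>
    println!("{}", xml);
}
```

An XPath engine (lxml, or Xee) could then evaluate a query such as //Call[@func='print']//Name[@id='x'] against that output to find uses of x inside print calls.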
Thanks for sharing!