Google is killing the open web
78 points by kngl
This is well written and has a lot of good history but it’s just not right about one fundamental premise: big tech didn’t wage a war on XML. Developers ran away from XML as fast as they could because it fucking sucked. It went SOAP -> XML -> REST+JSON. Only a tiny minority cared enough to shout “but what about XSLT!” Then the pendulum swung about 10% back in the other direction with GraphQL (schemas, types, etc.).
in 2015, the WHATWG introduces the Fetch API, purportedly intended as the modern replacement for the old XMLHttpRequest;
“purportedly”? Would the author argue that anything about fetch is less modern than the hacked-together, ridiculous XMLHttpRequest API? I was writing about XHR back in 2005 in the early days and this API was a nightmare. It had the trifecta: terribly named, extremely hard to use, and awful interoperability. No one has touched XHR directly (not through an abstraction) since the 2000s. fetch is absolutely, incontrovertibly the modern replacement. Not purportedly.
prominently missing from the new specification is any mention or methods to manage XML documents, in favor of JSON that instead gets a dedicated document body presentation method;
You can still use fetch for XML. XML isn’t elevated to the platform level of JSON with Response helpers like .json() (https://developer.mozilla.org/en-US/docs/Web/API/Response/json), but why should it be?

Don’t get me wrong, this is a great article. I love it. The author never let go of the XML hype from the 2000s and has intermingled their love of it with their nostalgia for the open web from the 2000s, which is adorable but not entirely accurate. We all killed XML.
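For completeness, going from a fetch response to a parsed XML document is only a couple of lines with DOMParser. A minimal sketch (the URL is a placeholder, the await assumes a module or async context, and error handling is omitted):

// Fetch some XML and parse it with the browser's built-in DOMParser.
const res = await fetch("/feed.xml"); // placeholder URL
const text = await res.text(); // there is no res.xml(), so read it as text
const doc = new DOMParser().parseFromString(text, "application/xml");
console.log(doc.querySelector("title")?.textContent);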
“purportedly”? Would the author argue that anything about fetch is less modern than the hacked-together, ridiculous XMLHttpRequest API?
I think “purportedly” here is linked to “intended”, implying that modernization wasn’t the primary goal. Not that I agree with that, but that’s how I read it. I don’t think the author is suggesting that fetch is less modern or less of a replacement.
This is well written and has a lot of good history but it’s just not right about one fundamental premise: big tech didn’t wage a war on XML. Developers ran away from XML as fast as they could because it fucking sucked.
I think both things can be true.
I think both things can be true.
Big Tech discovered that the promises of Web 2.0 (widespread interoperability between disparate sites via XML-based APIs) didn’t make them money. The technology was incidental, but it didn’t help that XML was heavily promoted by a certain kind of enterprise software vendor and thus seen as deeply uncool.
As usual, the reasons web 2.0 died were economic, not technical.
I think “purportedly” here is linked to “intended”, implying that modernization wasn’t the primary goal.
A statement which is even more nonsensical, and outright kooky. As well as insulting to annevk.
The author’s support of XSLT on the grounds that “it’s a standard” is also hilarious when that “standard” is entirely captured by Saxonica.
No one has touched XHR directly (not through an abstraction) since the 2000s.
You’re missing out, it is actually perfectly fine nowadays.
var x = new XMLHttpRequest();
x.addEventListener("load", function(event) { console.log(event.target.response); }); // fires whenever a response arrives, even 4xx/5xx
x.addEventListener("error", function(event) { console.log("get failed"); }); // fires only on network-level failures
x.open("GET", "/");
x.send(); // for POST you can send the body here, including a FormData object
You can also use the shorthand onload and onerror properties, but I like writing it out long form for more consistency.
They’re really not missing out on anything. Here’s the same call with fetch:
fetch("/").then(r => r.text()).then(
t => console.log(t),
e => console.log(e),
);
Not quite exactly the same (XHR’s response dynamically parses different content types, so it can auto-parse XML, JSON, etc. as needed, whereas your fetch always returns the text; there are also those json() and formData() promise functions, but no xml(), alas), but yeah, close enough.
My point is really just that XHR isn’t actually that bad. Its name… ok, it is kinda weird since it isn’t actually tied to XML, but the HttpRequest part is certainly fine, so I’ll grant 1/3 true there. But on the other points, it is not extremely hard to use, and does not have awful interoperability. This simple case here and your fetch case aren’t that different!
OK, yeah, in 2005 it was jankier: you couldn’t just new XMLHttpRequest, you had to do a… I think three-level polyfill on the constructor. And the events weren’t as compatible back then, so it wasn’t just the constructor you had to account for. And you couldn’t just send(formdata) or use URLSearchParams back then either, which was kinda janky. So maybe back then, I’d agree.
But it hasn’t been like that for ages. The XHR API itself has changed, and the surrounding browser APIs have changed, and they work pretty well together now. But if you haven’t touched it since the 2000s you might not know that…
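To make that concrete, modern XHR usage can look something like this. A sketch only (the endpoint is made up); responseType and URLSearchParams are the conveniences I mean:

var x = new XMLHttpRequest();
x.open("POST", "/api/example"); // made-up endpoint
x.responseType = "json"; // .response arrives already parsed
x.addEventListener("load", function(event) { console.log(event.target.response); });
x.addEventListener("error", function(event) { console.log("post failed"); });
x.send(new URLSearchParams({ q: "hello" })); // sent as application/x-www-form-urlencoded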
The big thing is that XHR has a callback-based API, so if you’re using async/await like most modern JS does, you have to manually wrap it in a Promise. fetch is natively async.
This also means you can do something like
try {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error("oh no");
  }
  const result = await response.json();
  console.log(result);
} catch (error) {
  console.error(error.message);
}
to easily have the same codepath for “fetch failed” and “fetch succeeded with a non-OK status code”.
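For comparison, the manual Promise wrapping you need in order to await an XHR looks roughly like this (a minimal sketch; xhrText is a made-up helper name, and the await at the end assumes a module or async context):

// Wrap XHR in a Promise so it can be awaited like fetch.
function xhrText(url) {
  return new Promise(function(resolve, reject) {
    var x = new XMLHttpRequest();
    x.addEventListener("load", function() { resolve(x.responseText); });
    x.addEventListener("error", function() { reject(new Error("request failed")); });
    x.open("GET", url);
    x.send();
  });
}

const body = await xhrText("/");
console.log(body);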
Very pertinent observations indeed. Perhaps it would have been more accurate to focus on RSS/Atom conceptually rather than the technicality that it was an XML file. Truth be told, it could just as well have been a JSON file, had JSON existed back then.
It was a structured, machine-readable way of pushing content to the consumer with a standardized format. The details of the file format are not that important. XSLT never caught on and in my opinion is not an important piece of the open web. RSS feeds on the other hand, that is how podcasts started. That was the peak of aggregation of content from unrelated sources.
Atom being XML was very nice in theory, because it was easy to embed. Many, many years ago, I wrote an Atom -> XMPP bridge, which fetched Atom feeds, split them into entries, and sent each one as an element embedded in an XMPP message.
The client I was using understood a fairly small subset of XHTML, but that was fine, because I could also open the link in a browser to see the full page. This was the promise of XML: that you could embed any XML document in any other XML document with graceful fallback for any unsupported elements.
The downside of this was that, it turned out, a lot of Atom wasn’t quite conformant XML, because it included HTML, not XHTML. Part of the reason for this was PHP. An increasing number of web pages were created with PHP and similar tools that generated HTML with string concatenation, rather than things that built a DOM and serialised it. This was faster, but made it hard to validate the HTML that you were creating (and also hard to fix issues that validators reported).
The push for XML was driven by the same people who really wanted the semantic web. In their model, web sites would provide structured data in self-describing formats and some default rendering. User agents (not just browsers) would take the default rendering as a hint but could also present very different representations.
This model was an existential threat for Google. In a semantic-web world, ad blocking was as simple as not rendering the elements that were described as ads. They had a big incentive to move people from providing structured data with a default rendering to Turing-complete renderers where the only operation was to run the program. They took a lot of the dissatisfaction with the HTML 4 -> XHTML 1 transition and weaponised it against the whole idea of anything related to XHTML.
They were helped a lot by the fact that a lot of the technologies that they were arguing against were terrible. XSLT is just an awful language in every possible way. Strict XHTML compliance was a pain. SVG was ludicrously verbose and not even good XML (it basically encodes lightly modified PostScript command sequences in attribute text in a bunch of places).
Okay, that is actually really helpful. I am too young to have known or cared about this when it was happening, so this extra context helps me understand more why killing XML became a goal.
Truth be told, it could just as well have been a JSON file, had JSON existed back then.
I wish the spec for https://www.jsonfeed.org were better maintained; I’d strongly prefer to use that over Atom otherwise!
The spec is simple enough that I hand-rolled a JSON feed for my microblog.
Yeah, that’s why I want to use this thing, but the fact that the spec repo is essentially dead and the state of the landing page give me big pause.
I added a JSON feed just because a few years ago. It wasn’t hard, so I don’t see why you don’t just add it. On my site, the ATOM feed is the most used, followed by RSS, followed by JSON (yes, it is being fetched).
I don’t think this article justified the title. It’s mainly about Google Chrome’s 10+ year mission to deprecate XSLT from Blink.
I don’t believe there is such a thing as the open web anymore; it’s been at least 15 years since Google killed it by becoming the dominant browser implementation and bloating the spec such that no one is ever able to compete.
So far websites still exist as namespaces inside the Chrome platform, but who knows until when.
This XML point is a minor detail.
As a survivor of the last browser battle I agree and disagree. MSIE 6/7/8 sort of did the same, including letting everyone build alternative browsers on top of their engine.
But, when the time came, we all collectively chose the subset of the features we thought would move the web forward and we cut the slack. Yes, at a cost, but it appears to have been worth it.
Nothing stops collective us from doing the same to Chrome when the time is right.
I don’t know where you saw this collective action. Chrome won the browser wars, in great part because they had google.com, and Chrome decided everything. The best that can happen now is some other browser dethrone them, but that feels unlikely.
In any case that won’t help much. The biggest problem is this general understanding that “the web” is a million JavaScript APIs powering bloated websites and that increasing that number and doing more and more things as webapps is a good thing.
Before Chrome “won the war”, Firefox did; before Firefox, MSIE did; before MSIE, Navigator did. The war is never “won”; it’s just waiting until the next browser comes out on top.
This is a good point, but it’s also important to consider that Google owns not only the dominant browser, but also a substantial and integral backbone of the internet as the majority of people use it. When Firefox, MSIE, and Navigator were at their zenith, they didn’t have such a vertical monopoly.
I don’t really buy the idea that holding more of the vertical increases the switching cost substantially enough that people won’t switch if the offering doesn’t do the job at an acceptable level of quality. I think the real problem is that while quality has gone down, it has effectively gone down across all the offerings, and as a result we’ve hit a stable equilibrium where nothing else out there is substantially better enough to merit switching.
It used to be that the worm would turn and you just had to wait for someone else to be on top – maybe you! But superimpose that on Moore’s Law and realize there comes a point when the power is so great it can prevent any future worm-turning. We’re very close to that now, both in tech and in politics in general.
I don’t see what Moore’s law has to do with it. Can you expand on this?
Basically, with exponential growth in compute power there comes a time when whoever is in power at that time can prevent any future changes in the power structure. Not quite yet? Wait for a couple more doublings.
An actual good solution would be something like this: https://joeyh.name/blog/entry/WASM_Wayland_Web_WWW/
This is really what the Web Application (vs Web Page) people have wanted all along, and if they had gotten it this way in the first place, the document based Web might not have been collateral damage the way it has been.
I don’t think that’s true at all. The success of web applications is largely based on the DOM doing a lot of the hard work (layout, drawing, accessibility trees, etc) for you, without you needing to do everything yourself. The DOM is in large part why Electron took off.
Very complex applications eventually need to move away from this model for various reasons, and manage layout and drawing themselves with the canvas, but that’s significantly more complex to get right, and usually takes a lot more time and resources.
All of that stuff can be done by various libraries that exist. So many you can’t even count. And how many more would exist if more people used them, and how much better they would be!
Meanwhile it’s still impossible to do rich text editing, a combobox, a database, or render a long list in the browser in a way that isn’t awful.
There are libraries for all of those things as well, though - and usually much lighter and simpler than, say, pulling in an entire accessibility tree library and getting it to sync up correctly with the DOM so your application can be used by screen readers. Which is why truly accessible canvas applications are somewhat rare, while comboboxes are ubiquitous.
From my own experiences of building applications of the sort of complexity where canvas starts becoming a viable option, it’s difficult getting everything right. In the end, we opted to keep with a DOM-based implementation because we just didn’t have the manpower needed to support all of that complexity. It is the right option in some cases, and if you do it right, it can produce a much better experience than is possible with just DOM APIs. (In our case, just having complete control over how scrolling worked would have made the application feel much smoother to use, let alone other features.) But if you don’t get it right, the result is usually much worse than a DOM-based implementation because you lose so much that now needs to be done by hand.
But the HTML canvas isn’t the right abstraction. It is too limited and slow. And of course since basically no one uses it then making things there is painful with lots of rough edges.
Still, libraries for rich text editing are always broken in one way or another (except maybe Google Docs) and comboboxes, for example, have libraries that only work on React etc., so now we’re already assuming a UI framework on top of the DOM.
That feels like throwing the baby out with the bathwater.
At the very least, we would all lose a lot of flexibility in being able to block ads, make default font choices, enforce minimum font sizes, use screen readers on the majority of sites, etc
Right, but those are all bonuses for the webapp people; they never wanted you to be able to do those things in the first place.
All those things could be provided too, as long as the app providers implemented some standard interface.
Blocking ads would probably be even easier since in this magic world it’s unlikely that browser extensions would be so horribly limited by Google.
On the internets, where else. Firefox started the avalanche, Chrome finished. As easy as that.
RSS and other XML-based technologies such as Pingbacks were the backbone of blogging […]
Pingback’s completely unrelated to XSLT, right?
XSLT is an essential companion to RSS, as it allows the feed itself to be perused in the browser
Essential? I know some sites that do it, but they’re pretty rare. When I’ve talked to people about XSLT in the past, it was usually seen as a novelty. I can just view the front page of your blog in my browser, paste the link into my feed reader, and it will automatically discover the feed. Sure, being able to view the feed itself is kinda neat, but that doesn’t feel too important?
And that’s just the beginning: as I’ve shown on this same site, it’s possible to use XSLT to plot XML data […]
My gut instinct was “just generate this server-side”. The sparkline at the bottom of the article is indeed an SVG, but after clicking through to another example I acknowledge this is pretty cool. I like this, I wish it had worked out. I don’t think I’ve really seen anyone do this though?
I’m not really sold on why we need this.
I helped create Pingback. The only relationship to XML was that it used XML-RPC as the delivery mechanism.
With hindsight I don’t think it was a great idea! It was effectively an API for SEO spam.
With hindsight I don’t think it was a great idea!
I do think it was a great idea, except for the “let anyone instantly add a link on your site” part :)
I’ve seen some people use a moderated pingback implementation (where they have to approve domains) and it’s led me to some neat blogs I wouldn’t have found otherwise.
A basic sanity check that we didn’t do back in the day but modern implementations do is to check that the source of the pingback/webmention actually does link to you.
My 22-year-old implementation did that:
The PingBack server can then grab your page, check that the link is there and extract a title and short description from the blog.
It’s very easy for a spammer to serve a page that includes a link back to the page they are spamming, then remove that link again a few seconds later after the pingback has been verified.
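For reference, the naive version of that source check is just a fetch and a substring search, something like this sketch (sourceLinksToTarget is a made-up helper, and as noted it won’t catch a spammer who removes the link again afterwards):

// Naive pingback/webmention verification: does the source page link to the target?
async function sourceLinksToTarget(sourceUrl, targetUrl) {
  const res = await fetch(sourceUrl);
  if (!res.ok) return false;
  const html = await res.text();
  // Substring match only; a stricter check would parse the HTML and inspect <a href> values.
  return html.includes(targetUrl);
}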
I’m not really sold on why we need this.
The question is rather if you’re sold that we absolutely don’t need it. Otherwise the same can be said of a whole lot of browser features.
It’s also not relevant to the main point here which is that Google et al. are deciding what the web needs. XSLT may not feel worthy of your defense but it could just as easily be something you like on the chopping block.
One note about this article: HTML is not a SGML dialect and never has been. HTML was certainly inspired by SGML, but that was it and there were various differences (especially things HTML didn’t have that SGML did). At one point the W3C wrote down that ‘HTML is a SGML dialect’ (apparently in HTML2 through HTML4), and then browsers ignored it and didn’t parse HTML as SGML (with all of the more obscure SGML features that would imply). Eventually the HTML5 work put a stake into that, as expressed in eg section 13.2, Parsing HTML documents, explicitly calling out that HTML cannot be parsed and handled as a SGML dialect or with SGML rules.
My only disagreement is that they call Google and Microsoft technology companies. They are both ad companies that employ developers.
Microsoft has some ad products but calling it an ad company seems like a stretch.
I would have agreed, but I had the misfortune of having to run Windows when I worked there and these days Windows is basically an electronic billboard that makes a half-arsed attempt to run applications as a sideline.
Just to add some numbers: in 2024, Microsoft claimed 245,122 M total revenue, only 12,576 M (5%) of which is “Search and news advertising” revenue. Source: Microsoft’s 2024 10-K filing, page 94.
Meanwhile “Google advertising” is 76% of Google’s total revenue. (The other two big ones are “Google subscriptions, platforms, and devices” and “Google Cloud” at 10% each.) Source: Google’s 2024 10-K filing (page 36).
I decided to see if I could find any numbers on how many web pages use XSLT - the best I could find was this Chrome Status chart showing the percentage of page loads that reference the XSLTProcessor JavaScript API - it looks like it’s around 0.05% (that’s 0.0005 of pages, aka 1/2000 page loads).
That chart is part of the very issue I have with arguing about removing it purely based on usage (and not even replacing it with an automatically injected polyfill) - WebUSB hasn’t been straightforward to implement and keep around as far as I know, yet that hasn’t gotten axed despite having a peak metric of 0.003%. WebMIDI has caused security issues, yet it’s kept around despite only having a peak metric of 0.004%.
1/2000 is… a lot? Obviously this’ll be unevenly distributed, but I probably load 2000 web pages in a week, if not a day.
Seems overblown. Defense of the keygen html tag especially: iirc that had a really bewildering UX.
JpegXL could have been neat, tho.
I’ve been experimenting with (very) low-bandwidth images on my website, and AVIF seems to achieve better results/smaller files at low quality settings. Especially aom with the (still unreleased?) ssimulacra tune. It seems like JpegXL might be better only for high-quality/large images, so maybe not all that useful on websites after all.
JPEG XL tends to do less smoothing/waxing over features, which can be a big con depending on the image type & viewer preference. The progressive decoding is nice. The lossless recompression of existing JPEGs makes it a free win for many, since for compat reasons you can just fetch the original JPG. Not sure about the latest data, but last year there seemed to be good benchmarks for JXL. Doing AV1 without hardware encoding/decoding is also brutal. On my website I don’t want to maintain many image file types for size, so I just keep the original PNG/JPEG, and JXL is the enhanced version, since Safari & LibreWolf (with a flag) can do JXL, but also as a middle finger to Google, who tried to say there were no sites hosting the file type so there was no reason to support it, as if that isn’t a chicken-and-egg problem.
There are a lot of “wrong” things in this SEO article, I feel, but setting most of that aside:
Google runs FeedBurner, which made heavy use of XSLT.
The “name and shame” section also seems really messed up. Along with the entire attitude of the article.
Here is a good counterpoint to the online discussion that flared up about that XSLT proposal, by long-time web standards observer Eric Meyer: https://meyerweb.com/eric/thoughts/2025/08/22/no-google-did-not-unilaterally-decide-to-kill-xslt/