Goodbye innerHTML, Hello setHTML: Stronger XSS Protection in Firefox 148
61 points by freddyb
Nice work! I'm really looking forward to seeing checkmarks spread across this table.
Same here. Currently the sanitizer library is about 30% of the total JavaScript code size of my single-user ActivityPub server (which is a 90% JS application).
This is a major step forward for the web! Imagine how much leaner and faster this will make frontend code when we no longer need to ship JS-based HTML sanitizers.
Though it seems like the biggest benefit for most web apps, apart from the improved security, will be the performance boost of doing a single, native parse in setHTML() instead of parsing twice: first in JS and then natively in the innerHTML assignment.
Let's hope this will get good browser support as soon as possible!
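The single-parse point above can be sketched like this (a hedged sketch: `renderUntrusted` is a hypothetical helper, and the default-sanitizer behavior of `setHTML` follows the current Sanitizer API draft, so check browser support before relying on it):

```javascript
// Before: sanitize with a JS library, then parse again on assignment.
// el.innerHTML = DOMPurify.sanitize(untrustedHTML); // two parses

// After: one native parse-and-sanitize where setHTML is available.
function renderUntrusted(el, untrustedHTML) {
  if (typeof el.setHTML === "function") {
    el.setHTML(untrustedHTML); // default sanitizer strips script-capable content
  } else {
    el.textContent = untrustedHTML; // conservative fallback: show as plain text
  }
}
```

The fallback branch is deliberately lossy: plain text is safer than shipping a second sanitizer just for older browsers.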
Imagine how much leaner and faster this will make frontend code when we no longer need to ship JS-based HTML sanitizers.
Few things use an HTML sanitiser. You only need it if you’re dealing with untrusted HTML, which is not a lot of apps.
If you’re writing an email client that talks JMAP, you need it. (e.g. Fastmail uses, and sponsors, DOMPurify.)
If you’re writing some app that has comments from users… you actually probably don’t need it, because you handle the processing and cleaning on the server, so the client is actually receiving trusted HTML. (There are designs that can reasonably want a sanitiser, but it’s less likely, and most of the time probably inferior.)
It’s a good thing to have in the platform, but it’s actually not as widely useful for that purpose as you might initially imagine.
Yeah. Here is the current sanitizer I use for semi-trusted inputs:
export default function sanitizeText(text) {
  let el = document.createElement("div");
  el.innerText = text;
  text = el.innerHTML;
  text = text
    .replace(/&lt;b&gt;/g, "<b>")
    .replace(/&lt;\/b&gt;/g, "</b>")
    .replace(/&lt;i&gt;/g, "<i>")
    .replace(/&lt;\/i&gt;/g, "</i>");
  return text;
}
It makes everything into text and switches just <b> and <i> tags back to HTML. I guess I could support <strong> and <em> too. If it allowed classes or attributes or required balancing, it would be a lot more complex.
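The same idea works without touching the DOM, as a pure string function (an illustrative sketch; `sanitizeTextPure` is a hypothetical name, not code from the comment above): escape everything, then switch the two allowlisted tags back.

```javascript
// Escape all HTML, then re-allow a tiny allowlist of tags -- the same idea as
// the DOM-based version, but as a pure function with no document dependency.
function sanitizeTextPure(text) {
  const escaped = text
    .replace(/&/g, "&amp;")   // must run first, before we introduce entities
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  // Switch just <b> and <i> (and their closing tags) back to real HTML.
  return escaped
    .replace(/&lt;(\/?)b&gt;/g, "<$1b>")
    .replace(/&lt;(\/?)i&gt;/g, "<$1i>");
}
```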
Sorry for being flippant, but what you're telling us is that many things use HTML sanitizers, only they're usually in the backend.
Looks like that's what it's saying. Why is that a problem? Do you have data that says otherwise? That most apps actually use a "JS-based HTML sanitizer", to quote the root comment?
Also, I apologize if my comment sounds flippant. I'm really curious because I don't have data on either, and in my (granted, personal and therefore entirely anecdotal) experience, I rarely ever need this on the client.
I'd argue you should never handle the cleaning on the server: the parser you use on your server and the parser in the user's browser will probably disagree.
If you’re writing some app that has comments from users… you actually probably don’t need it, because you handle the processing and cleaning on the server, so the client is actually receiving trusted HTML. (There are designs that can reasonably want a sanitiser, but it’s less likely, and most of the time probably inferior.)
I would never treat attacker-controlled content without sanitizing at both stages. Are you SURE you sanitized all inputs and that some intern didn't forget to wire a customer data importer up correctly? Do you re-sanitize all database entries whenever you update the sanitizer? Having the browser itself do the checking means it will get updated to cover new features as they are introduced. Even then, I would sandbox comments in an iframe.
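The iframe idea can be sketched like this (illustrative only; `commentFrameHTML` is a hypothetical helper). A bare `sandbox` attribute with no tokens blocks scripts, forms, and same-origin access, so even a sanitizer miss can't reach your cookies:

```javascript
// Wrap a comment in a fully sandboxed iframe via srcdoc. Even if the comment
// HTML slipped past sanitization, scripts cannot run or touch your origin.
function commentFrameHTML(commentHTML) {
  // Attribute-escape the value: ampersands first, then double quotes.
  const escaped = commentHTML.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  return `<iframe sandbox srcdoc="${escaped}"></iframe>`;
}
```

You would still sanitize the comment itself; the sandbox is the belt to the sanitizer's suspenders.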
I must be out of touch with frontend development because I can't think of a situation where this would solve any of my problems.
I mean, yes it sounds good on paper, but when will I take user input and display it just on the frontend without a trip to the backend? (Let's ignore jsfiddle here)
This is predominantly nice for vanilla JS and maybe if your favorite framework adopts it (tell your framework devs you want safe html insertion!).
The Sanitizer is also a good stepping stone to figure out what "safe by default" could look like for an opt-in document setting.
For sure, even frameworks expose setting html, e.g. React's dangerouslySetInnerHTML could have a better alternative.
When you're fetching remote content that has inline HTML and you want to display that safely. For me this happens with ActivityPub objects, which have HTML for content.
When you can't control the backend.
Sometimes you're pulling data from a third party and you need to display it in your client. Sure, it's not a frequent use case, and often you can simply make a proxy backend for this, but you don't often have the possibility - or resources - to do it.
So you do this on the client.
Another thing is if you expect the user to be copy-pasting potentially untrusted data.
E.g. you're a bank website. You have a transaction where the client copied payment data from the email they got. Before they confirm it, you display the entire transaction once more for the user to doublecheck.
So if they were copying data from a phishing email, now they pasted a script that's running on your domain (with your js-root) and can read your cookie data and send it to their masters. Probably not a big deal by itself. But read any white-hat report of how shit got broken into, and you'll see that it's very often a lot of small steps like this, not one big glaring security hole.
So to prevent it, you have to sanitize that description field before you display it.
This can be relevant for any data input site, really.
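For a display-only field like that confirmation screen, the cheapest fix is to never parse the pasted data as HTML at all (a sketch; `showDescription` is a hypothetical helper, not bank code):

```javascript
// If the confirmation screen only ever shows plain text, textContent is
// enough: the pasted payload is displayed verbatim, never parsed as HTML.
function showDescription(el, pasted) {
  el.textContent = pasted; // "<script>..." renders as literal text
}
```

Reach for setHTML or a sanitizer only when the field genuinely needs to keep some formatting.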
A side benefit of this specific thing is that if you're doing it anyway, you avoid the double parsing: once by your JS sanitizer and a second time by the innerHTML parser.
And instead of pulling another npm dependency into your client, you can use this fancy new html standard. Plus it's native so you get speed. Not that it's gonna get noticed by one user, but we all always complain how JavaScript crap is slowing down everything, so now this means less build time, less download and parsing time by the browser, etc etc.
Globally, we waste fewer CPU cycles, so we saved an extra flower or snail or something.
You need to treat all customer data as attacker controlled content. You can't trust a sanitizer checking at the back-end to do the job right, because those sanitizers are decoupled from the browser that is running your code now. This is one reason why mysql_escape_string was so bad.
So anytime you don't control a piece of content, the best pattern is to have whatever is going to parse and manipulate that content (browser, database, etc.) sanitize it first.
I've been waiting for this!
One question that remains for me is how much I can "get rid" of CSP by using setHTML and Trusted Types. By "get rid", I mean stop with the "significant architectural changes for existing web sites and continuous review by security experts" that the article mentions, even if the headers remain, as long as it's less painful.
Right now, you can use Trusted Types to say "no unsafe HTML parsing at all" and disallow innerHTML through eslint. Then you won't accidentally write unsafe code (thanks linter) and the browser will enforce that unsafe code doesn't run (thanks Trusted Types).
People like Jun Kokatsu, formerly of Microsoft Edge, did something similar by going all-in on React (safe HTML insertion unless you use dangerouslySetInnerHTML) and enforcing that no unsafe code runs (with Trusted Types).
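The eslint-plus-Trusted-Types pattern looks roughly like this (a sketch: `escapeForHTML` is a stand-in for a real sanitizer such as DOMPurify or setHTML, and the policy name "app" is made up; the CSP header shown is the real Trusted Types opt-in):

```javascript
// Stand-in sanitizer for illustration; real code would call DOMPurify
// or rely on setHTML instead of a hand-rolled escaper.
function escapeForHTML(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// With a header of:
//   Content-Security-Policy: require-trusted-types-for 'script'; trusted-types app
// assigning a raw string to innerHTML throws; only policy output is accepted.
if (typeof trustedTypes !== "undefined") {
  const policy = trustedTypes.createPolicy("app", { createHTML: escapeForHTML });
  // document.body.innerHTML = policy.createHTML(userInput); // allowed
  // document.body.innerHTML = userInput;                    // TypeError
}
```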
The fundamental issue with innerHTML is that HTML is not a string, it's a tree. setHTML doesn't fix that. Sanitization of unstructured input is weak perimeter security that mitigates XSS but doesn't prevent it. When everything is a string, it's easy to get mixed up.
setHTML is in fact sanitizing a tree. I paid attention in my langsec classes.
This is a prevention mechanism, because it ensures that harmful content will not enter your html document, not a mitigation. CSP is a mitigation because it assumes untrusted content is already within your document.
But maybe we are talking past each other, if that’s the case I would love to read a more detailed comment or a direct email :-)
I think there's plenty in the DOM API for working directly with Elements, Fragments, and Document trees. Since those APIs don't really have a way to shake the tree and remove potentially malicious nodes, I think this is a decent option.
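"Sanitizing a tree" can be made concrete with a small sketch (illustrative only: `pruneTree` walks a plain object tree, not real DOM nodes; setHTML does the equivalent natively on the parsed document tree):

```javascript
// Walk a node tree and keep only an allowlist, dropping everything else
// (including the dropped nodes' subtrees). This is the tree-shaped analogue
// of string sanitization: decisions are made per node, not per substring.
const ALLOWED = new Set(["b", "i", "#text"]);

function pruneTree(node) {
  if (!ALLOWED.has(node.name)) return null; // drop node and its subtree
  const kept = (node.children || []).map(pruneTree).filter(Boolean);
  return { ...node, children: kept };
}
```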