Avoid using "<![CDATA[ ... ]]>" in RSS
4 points by matklad
Not sure I understand the argument. Should we avoid using strings because, you know... we cannot put a double-quote within the string without escaping it? Isn't it like software dev 101 to know that within any pair of delimiters, you cannot write the delimiters without special handling (escaping/splitting)?
The point is that CDATA cannot be escaped. That entire construct really does not make a lot of sense for machine-generated stuff because you can use XML entities to escape. CDATA only exists as a mechanism that was needed for some legacy code in the XHTML transition that ultimately failed, as well as for human-authored text.
I get that. But you can split the ending delimiter into two CDATA blocks, like the OP shows. Same as </script> within <script>.
The thing is... it is well known that you can't put the same delimiter inside a delimiter-enclosed block. You either escape it or split it across blocks. That's the same thing we do everywhere we have delimiters. But we don't treat that as a reason to stop using delimiters, do we?
The author says to do this:
return text.replaceAll("&", "&amp;").replaceAll("<", "&lt;").replaceAll(">", "&gt;").replaceAll('"', "&quot;").replaceAll("'", "&apos;");
But why is that better than just this?
return text.replaceAll("]]>", "]]]]><![CDATA[>")
I don't understand how replacing one .replaceAll() with another makes the solution better or worse. If the OP said, "go for a proper XML DOM library", I'd see their point. But I don't see the point of avoiding one .replaceAll() and going for another .replaceAll().
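For what it's worth, here is a rough side-by-side sketch of the two approaches. The function names escapeXml and wrapCdata are made up for illustration, and the entity list assumes the five predefined XML entities; it is not taken from the linked post.

```javascript
// Hypothetical helper: escape text with the five predefined XML entities.
// "&" has to be replaced first, otherwise the "&" in the entities produced
// by the later replacements would be escaped a second time.
function escapeXml(text) {
  return text
    .replaceAll("&", "&amp;")
    .replaceAll("<", "&lt;")
    .replaceAll(">", "&gt;")
    .replaceAll('"', "&quot;")
    .replaceAll("'", "&apos;");
}

// Hypothetical helper: wrap text in a CDATA section. Every "]]>" in the
// input is split so that "]]" ends one section and ">" starts the next.
function wrapCdata(text) {
  return "<![CDATA[" + text.replaceAll("]]>", "]]]]><![CDATA[>") + "]]>";
}

const sample = "a ]]> b";
console.log(escapeXml(sample));
console.log(wrapCdata(sample));
```

Both are a single pass of string replacement over the input. The practical difference is only which sequences need attention: five characters for entities, one three-character sequence for CDATA.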
I get that. But you can split the ending delimiter into two CDATA blocks, like the OP shows. Same as </script> within <script>.
Note that script within script is escapable because JavaScript allows <\/script>. JavaScript in fact funnily enough supports HTML comments even in two forms. Those are in fact escaping methods specifically embedded in JavaScript to enable the unambiguous embedding within a native CDATA block in SGML style HTML.
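A minimal sketch of that escape, assuming the data being inlined is JSON-serializable. The helper name toInlineScriptJson is hypothetical, and this only covers the "</" case, not the HTML comment forms.

```javascript
// Hypothetical helper: serialize a value so it can be inlined in <script>.
// JSON.stringify only emits "<" inside string literals, and "\/" is a valid
// escape for "/" in a JSON or JavaScript string, so the parsed value is
// unchanged while the literal text "</script>" no longer appears.
function toInlineScriptJson(value) {
  return JSON.stringify(value).replaceAll("</", "<\\/");
}

const payload = { title: "why </script> is awkward" };
const html =
  "<script>const data = " + toInlineScriptJson(payload) + ";</script>";
console.log(html);
```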
But why is that better than just this?
In practice because HTML escaping is a very optimized function that usually exists within a system already, and CDATA is a fossil that only existed because people needed a way to hand-author XHTML documents (and maybe DocBook) and DTD derived CDATA sections don't exist in XML. Since XHTML is effectively dead I don't see a strong argument for CDATA sections.
Fwiw I agree. It's six of one, a half-dozen of the other: both do the same work in the same way.
I am not really convinced. CDATA is much easier to use in a static site generator template, and I think zero of my Atom feed posts contain ]]>. It is a funny gotcha, but I’m not sure it has any effect in real life.
(of course, ironically, this particular post actually contains the sequence)
is much easier to use in a static site generator template
Your static site generator does not have a way to escape XML entities? That sounds … less than optimal.
of course, ironically, this particular post actually contains the sequence
It's funny how that happens, isn't it! The hardest thing to do with any language is to use it to talk about itself. If you can't, you've got a templating language. If you can, you've got an embedding language.
The great thing about an embedding language is that it can serve as a backend for semantic frontends. I can write this post that contains & and > and < and " and ' and ]]> and <!-- because I'm using a semantic editor: it lets me figure out the content I want, and behind the scenes the encoded form is generated and used.
If you use a templating language that has an operator for rendering a string in a safe manner (ie by escaping it), then it's trivial to get the behavior that you want.
For example, in my static site generator (Zine), instead of using :html (which would render HTML verbatim), you use :text, like so:
<description :text="$page.content()"></description>