A Perplexing Javascript Parsing Puzzle
29 points by hwayne
I think part of the answer to the mystery is that Netscape’s JS engine recognises <!-- as a comment.
As mentioned in the link in the post about wrapping JS in comments, the convention was to write the HTML comment terminator inside a JS comment, like //-->, not bare as in the article itself.
So Netscape’s JS engine needed a special case for the HTML comment starter so that it was possible to hide scripts from non-JS browsers, but it didn’t need a special case for the HTML comment terminator, since that could already be hidden from the JS.
The remaining mystery is why bare --> was added to JS nearly two decades later.
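For anyone who never saw the old hiding convention, here is a rough sketch of what the JS engine makes of it. The runSloppy helper and the sample script are made up for illustration, and it assumes an engine that implements the HTML-like comment syntax for non-module code (current browsers and Node do, as far as I know).

```javascript
// Hypothetical helper: runs source text as non-module code via the Function
// constructor, so HTML-like comments are allowed by the engine.
function runSloppy(source) {
  return new Function(source)();
}

// "<!--" is skipped by the JS engine as a one-line comment, and "//-->" hides
// the HTML comment terminator inside an ordinary JS line comment.
const hiddenScript = [
  "<!-- hide this from browsers that do not know about scripts",
  "var greeting = 'hello';",
  "//-->",
  "return greeting;",
].join("\n");

const hiddenResult = runSloppy(hiddenScript);
console.log(hiddenResult); // "hello" on engines with HTML-like comment support
```

Only the opening line needs special treatment from the engine here; the closing line is a plain // comment as far as JS is concerned.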
I suppose if you’re not going to parse the script tag contents with an HTML parser anymore, then both the start and end tokens need to be valid tokens in the javascript parser. Otherwise you’d need to parse the script contents with the HTML parser, and then reparse that parsed HTML with the javascript parser.
And I expect that is to be able to write a spec that requires switching parsers mid-stream.
The interesting aside for me here is that there are probably still old pages that don’t necessarily have the HTML comment end token with all those requirements met.
It gets more fun than this in later revisions as HTML and browsers both evolved. By the time you get to HTML 4, the contents of script were defined as CDATA content, meaning that certain types of characters with special meaning to HTML/SGML – like < and & – are not interpreted as their “HTML” meanings and do not have to be escaped.
But then in XHTML 1.0, supposedly simply an XML “reformulation” of SGML-based HTML 4, the contents of script were defined as PCDATA, meaning that those special characters were interpreted according to their “HTML” meanings and did have to be escaped! Which meant that inline script content in XHTML documents (since they also typically had to be interpretable as non-XHTML HTML due to browser limitations) would have to do explicit <![CDATA[ declarations.
Those were interesting times.
Did the comment trick work with XHTML or did it actually comment out the script? (because the latter would be hilariously annoying)
So what you would do is something like this, if I’m remembering it correctly:
<script type="text/javascript">
//<![CDATA[
for(i = 0; i<10; i++) { i; }
//]]>
</script>
And then you have a situation where:
To an HTML 4 parser, the // on lines 2 and 4 are just text, and the CDATA begin/end bits are read but don’t actually do anything, because the whole contents of the script element are CDATA anyway.
To an XHTML parser, the // on lines 2 and 4 are just text, and the CDATA begin/end bits are vital because they change away from the script element’s default of PCDATA and let you have the < on line 3, as well as potentially other XML-sensitive characters.
To the JS engine, the // on lines 2 and 4 comment out the begin/end of the CDATA, so it isn’t confused by them.
By that point I don’t think anyone did the <!-- trick anymore because the “browser doesn’t support JS” case was handled in the specs.
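The JS engine's side of that can be checked without a browser. A small sketch, assuming we just hand the script body to the Function constructor; the variable names and the running total are invented here so there is something to observe.

```javascript
// The script body from the CDATA-wrapped example, as the JS engine receives it.
// Lines 1 and 4 start with "//", so the CDATA markers are ordinary comments.
const cdataWrappedScript = [
  "//<![CDATA[",
  "var total = 0;",
  "for (var i = 0; i<10; i++) { total += i; }",
  "//]]>",
  "return total;",
].join("\n");

const cdataResult = new Function(cdataWrappedScript)();
console.log(cdataResult); // 45, the sum of 0 through 9
```

The markup parser's view (whether < on the loop line needs protecting) is a separate question that this sketch does not touch.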
Here’s an explanation of the slightly simpler puzzle that this puzzle initially seemed equivalent to:
What does this print?
x = 1
x --> 0
The answer to this puzzle is true. The reasons:
Automatic semicolon insertion ends the first statement, x = 1, with a ;.
The parser reads --> as a postfix decrement operator and a greater than operator.
So that program is equivalent to this:
x = 1;
x-- > 0
The last expression evaluates to whether x is greater than zero (and then decrements x to zero). 1 is greater than 0, so it evaluates to true.
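If you want to convince yourself, here is a minimal sketch that evaluates both spellings side by side. The variable names other than x are mine, and the declarations use let so the snippet runs anywhere.

```javascript
// "-->" in the middle of a line is two tokens: postfix "--" followed by ">".
let x = 1;
const arrowResult = x --> 0; // parsed as (x--) > 0
console.log(arrowResult, x); // true 0

// The explicit spelling behaves identically.
let y = 1;
const explicitResult = y-- > 0;
console.log(explicitResult, y); // true 0
```

The comparison sees the old value of x (1), and the decrement only becomes visible afterwards, which is why the result is true even though x ends up at 0.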
Eek, I’ve seen JS without explicit semicolons, but that parsing page at MDN… was it really worth all that extra complexity to allow implicit semicolons? Rules with exceptions everywhere!
Oh right, I remember implementing those blasted comments in JSC many many many years ago.
I misread the question though as having --> 0 being “this is the output” rather than this is still part of the input. To be fair to the author, I’m not sure how they could possibly write this example in any other way, and this is just muppetry on my part :D