A Perplexing Javascript Parsing Puzzle

29 points by hwayne

fanf

I think part of the answer to the mystery is,

Netscape’s JS engine recognises <!-- as a comment
as mentioned in the link in the post about wrapping JS in comments, the convention was to write the HTML comment terminator inside a JS comment like //-->, not bare as in the article itself
so Netscape’s JS engine needed a special case for the HTML comment starter so that it was possible to hide scripts from non-JS browsers, but it didn’t need a special case for the HTML comment terminator since it could already be hidden from the JS
the remaining mystery is why bare --> was added to JS nearly two decades later
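
A minimal sketch of the comment handling described above, assuming an engine that implements the HTML-like comment syntax from ECMAScript Annex B (browser engines and Node.js do, for non-module scripts). The names `openerSource`, `closerSource`, and `indirectEval` are illustrative only. The sources are kept in strings and passed through indirect eval, so they are parsed as ordinary script code:

```javascript
// "<!--" starts a single-line comment, so only the second line is evaluated.
const openerSource = "<!-- hide this line from non-JS browsers\n40 + 2";

// "-->" is treated as a single-line comment only at the start of a line.
const closerSource = "var n = 5\n--> ignored when it begins a line\nn";

// Calling eval through an alias makes it an indirect (global, sloppy) eval.
const indirectEval = eval;
const openerResult = indirectEval(openerSource);
const closerResult = indirectEval(closerSource);

console.log(openerResult, closerResult);
```

In module code these comment forms are a syntax error, which is why the sketch routes the text through eval instead of writing the tokens directly in the file.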

jessicah

I suppose if you’re not going to parse the script tag contents with an HTML parser anymore, then both the start and end tokens need to be valid tokens in the javascript parser. Otherwise you’d need to parse the script contents with the HTML parser, and then reparse that parsed HTML with the javascript parser.

And I expect the point of that is to be able to write a spec that doesn’t require switching parsers mid-stream.

The interesting aside for me here is that there are probably still old pages that don’t necessarily have the HTML comment end token with all those requirements met.
- ubernostrum
  
  It gets more fun than this in later revisions as HTML and browsers both evolved. By the time you get to HTML 4, the contents of script were defined as CDATA content, meaning that certain types of characters with special meaning to HTML/SGML – like < and & – are not interpreted as their “HTML” meanings and do not have to be escaped.
  
  But then in XHTML 1.0, supposedly simply an XML “reformulation” of SGML-based HTML 4, the contents of script were defined as PCDATA, meaning that those special characters were interpreted according to their “HTML” meanings and did have to be escaped! Which meant that inline script content in XHTML documents (since they also typically had to be interpretable as non-XHTML HTML due to browser limitations) would have to do explicit <![CDATA[ declarations.
  
  Those were interesting times.
  - fanf
    
    Did the comment trick work with XHTML or did it actually comment out the script? (because the latter would be hilariously annoying)
    
    ubernostrum
    
    So what you would do is something like this, if I’m remembering it correctly:
    
    <script type="text/javascript">
    //<![CDATA[
    for(i = 0; i<10; i++) { i; }
    //]]>
    </script>
    
    And then you have a situation where:
    
    In a browser which doesn’t support XHTML, reading it with an HTML 4 parser, the // on lines 2 and 4 are just text and the CDATA begin/end bits are read but don’t actually do anything because the whole contents of the script element are CDATA anyway.
    
    In a browser which does support XHTML, reading it with an XML parser, the // on lines 2 and 4 are just text and the CDATA begin/end bits are vital because they change away from the script element’s default of PCDATA and let you have the < on line 3, as well as potentially other XML-sensitive characters.
    
    In a JavaScript engine, the // on lines 2 and 4 comment out the begin/end of the CDATA so the JS engine isn’t confused by them.
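
A small sketch of the third case, looking only at what the JavaScript engine sees once the markup parser has handed over the script text. The variable names and the summing loop are hypothetical, chosen so there is a result to inspect:

```javascript
// The script element's text content, as the JS engine would receive it.
const scriptBody = [
  "//<![CDATA[",                                 // a JS line comment
  "var total = 0;",
  "for (var i = 0; i<10; i++) { total += i; }",  // the bare < an XML parser would reject outside CDATA
  "//]]>",                                       // also a JS line comment
  "total",
].join("\n");

// Indirect eval parses the text as a plain script.
const cdataResult = (0, eval)(scriptBody);
console.log(cdataResult); // sum of 0..9
```

The CDATA markers never reach the JS grammar as tokens; each sits behind `//` and is discarded with the rest of its line.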
    
    By that point I don’t think anyone did the <!-- [...] --> 0
    
    The answer to this puzzle is true. The reasons:
    
    JavaScript, like most languages with C-like syntax, parses a line as continuing the statement of the previous line if that statement was not already ended with ;.
    
    JavaScript’s unusual automatic semicolon insertion feature adds a semicolon after any newline when that would turn an unparseable program into a parseable one.
    
    JavaScript, like most languages with C-like syntax, parses --> as a postfix decrement operator and a greater than operator.
    
    So that program is equivalent to this:
    
    x = 1; x-- > 0
    
    The last expression evaluates to whether x is greater than zero (and then sets the value of x to zero). 1 is greater than 0, so it evaluates to true.
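
A short sketch of two of the rules cited in this comment, applied only to the single-line form `x = 1; x-- > 0` quoted above. The function names are made up for illustration:

```javascript
// A statement continues onto the next line when the result still parses.
function continuedStatement() {
  let total = 1
  + 2 // no semicolon is inserted after the 1, so this is 1 + 2
  return total
}

// Postfix decrement yields the old value, then updates the variable.
function decrementThenCompare() {
  let x = 1
  const result = x-- > 0 // compares 1 > 0, after which x is 0
  return { result, x }
}

console.log(continuedStatement());   // 3
console.log(decrementThenCompare()); // { result: true, x: 0 }
```

This covers only the case where `--` and `>` follow `x` on the same line. It says nothing about how an engine treats `-->` at the start of a line.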
    
    jessicah
    
    Eek, I’ve seen JS without explicit semicolons, but that parsing page at MDN… was it really worth all that extra complexity to allow implicit semicolons? Rules with exceptions everywhere!
  - olliej
    
    Oh right, I remember implementing those blasted comments in JSC many many many years ago.
    
    I misread the question though as having --> 0 being “this is the output” rather than “this is still part of the input”. To be fair to the author, I’m not sure how they could possibly write this example in any other way, and this is just muppetry on my part :D
    
    freddyb
    
    I misread it as this too