HTML parsers in Portland

22 points by polywolf

Johz

My interpretation of this is that it's very hard to confidently say how much one understands of the code one has written using AI. All of the various ports were advertised as ports, but don't really seem to be, suggesting the authors were not necessarily aware of how much their version of the code had deviated from the previous iteration. In addition, the final JustHTML sample is complex, but probably doesn't need to be that complex. Last time this came up, I said I'm not confident enough in random bits of Python code to give it a proper code review, but even still, that snippet feels unnecessarily complex.

This matches my own experience pretty closely: when my colleagues or I get an LLM to write substantial chunks of code, the result typically works (at least eventually), but is often kind of crap code. And that's fine in a lot of situations (side projects, one-off scripts, demos and PoCs, etc.), but it's a tradeoff that needs to be clear. And in a bunch of these cases, both the ones we're reading about here and the ones we've tried out ourselves, it seems very hard as the "viber" to get a good feel for how unmaintainable the code actually is.

EDIT: As an aside, I'm ashamed of how long it took me to figure out why the author was talking about a random city in the US in the title...

rbr

Re: aside, it's pretty rare to start at the end, right? FWIW, the submission title here could've been changed to be more accurate.
- polywolf
  
  Yeah, sorry about that. This might fall under
  
  add missing information ("The bug at the heart of the npm ecosystem" → "Design flaw in npm manifests")
  
  but I was unsure since technically the pun does contain all the information needed, and the other rules for changing the title don't apply.
  - rbr
    
    Fair! I usually err on the side of discoverability/being able to estimate if something is an interesting read, but I see the case for preserving the pun.
  - Gaelan
    
    ohhhhhhhhh finally got the joke.
    
    (For anyone else still struggling, rot13: Cbegynaq = "cbeg ynaq", orpnhfr vg'f nobhg cbegvat orgjrra qvssrerag ynathntrf)
- Johz
  
  It could have been different, but I think that would have been a shame. It's not clickbait, and it's not even at all incorrect. It's nice to have some fun titles like this.
migurski

Fascinating:

Again, the coding agent clearly did not do a translation of #4. It generated an alternate implementation, which might be copied from some other OCaml HTML parser in its training data.
josephjnk

Re: the hyphenation bug, I would love to see how well the various ports compare when held up to a fuzzer. I’ve never worked with software whose test suite absolutely covered every possible observable behavior; at some point you have to not only trust that the test suite specifies the correct behavior, but also that the developer made reasonable decisions in the gaps between the tests. I find it hard to extend that second type of trust to LLMs.
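
To make the fuzzing idea a bit more concrete, here is a minimal differential-fuzzing sketch in Python. Everything in it is hypothetical: `reference_tag_name` and `ported_tag_name` are toy stand-ins rather than code from any of the ports discussed, and the hyphen bug is invented for illustration, not a reconstruction of the actual one. The point is only the shape of the check: feed both implementations the same random inputs and report the first place they disagree.

```python
import random
import string


def reference_tag_name(text: str) -> str:
    """Toy 'original': read a name made of letters, digits, and hyphens."""
    out = []
    for ch in text:
        if ch.isalnum() or ch == "-":
            out.append(ch.lower())
        else:
            break
    return "".join(out)


def ported_tag_name(text: str) -> str:
    """Toy 'port' with an invented bug: it stops at the first hyphen."""
    out = []
    for ch in text:
        if ch.isalnum():
            out.append(ch.lower())
        else:
            break
    return "".join(out)


# Small ASCII alphabet so that short random strings hit the edge cases often.
ALPHABET = string.ascii_letters + string.digits + "-<> /"


def random_input(rng: random.Random, max_len: int = 12) -> str:
    length = rng.randint(0, max_len)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def differential_fuzz(impl_a, impl_b, trials: int = 10_000, seed: int = 0):
    """Return the first input on which the two implementations disagree, or None."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    for _ in range(trials):
        text = random_input(rng)
        if impl_a(text) != impl_b(text):
            return text
    return None


if __name__ == "__main__":
    counterexample = differential_fuzz(reference_tag_name, ported_tag_name)
    print("first disagreement:", repr(counterexample))
```

This only finds disagreements between two implementations; it says nothing about which one is right, so it complements the test suite rather than replacing it.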