Activating Two Trap Cards at Once, or: A Gentle Response to the Popularity of Vibecoding

30 points by Corbin

bakkot

This is a convincing refutation of the claim that vibecoding allows anyone to perform arbitrary tasks with only trivial effort expended. I don't see many people making that claim here, but I guess it's nice to have a refutation to hand in case it does come up.

I don't see this as a refutation of any stronger claim, such as "some people can use LLMs to complete many tasks faster than they personally otherwise would have been able to do without loss of correctness", which is I think where the meat of it is. I skimmed all the tasks to see if any of them would be a fun way to spend an hour or two, didn't feel like it for any of them except the Menagerie one (specifically targeting wasm), but then I had other stuff happening in my life and never actually tried. So... what are we to take from this? That LLM-assisted coding takes enough effort that putting a semi-specified task in front of a few thousand people will not cause any of them to perform arbitrary tasks in exchange for no incentive except to prove that they can? OK.

milkcanworld

Agreed. I'm not sure what this proves except that people aren't eager to spend their (spare, unpaid) time and actual money (token cost) on ill-specified programming Saw traps that reek of the overwhelming bias of the author against the tools being studied (to be clear, I also share a pretty strong similar bias, but it undermines the "study" when all prose and experimental design is dripping with this). At least to me, it is hard to read this as a good faith attempt at testing the abilities of LLMs or lack thereof, even if that was the author's intent.
- Corbin
  
  Quoting myself from previously, on Lobsters:
  
  I purchased high-RAM machines for my homelab about three years ago, wrote several (five?) parsing harnesses for models like Llama, tried out several code-completion, code-insertion, and code-generation styles; wrote and tested three RAG pipelines, independently tried replicating several papers, and have generally been testing any claim that I can find a way to apply to a local LLM.
  
  I cannot give professional advice to people who pay me without knowing how their technology works. I have had to learn how LLMs work. I have tried repeatedly to ask people to read papers and consider the limitations of their tools; since that has not worked, we are now at the moment of Linus' law and Linus' principle.
  
  I expect to be able to knock out these projects over the next few weekends of February and March. The second and third tasks require research which I haven't done yet, but I still expect them to only take a few working days in total. None of these are urgent.
  
  There was already a moment of good faith in 2023 which faded as I realized that OpenAI is years behind the state of the art. Now it has come out, currently, on Lobsters, that Anthropic is also years behind the state of the art. Neither company has a stable financial outlook and becoming their customer likely means damaging one's cognition, as I speculated about previously, on Lobsters. My position is now that, if you think that any sort of vibecoding harness can do my todo list better than me, you ought to demonstrate it rather than expect me to waste my money and cognition on a scam.
  - bakkot
    
    I don't think you should draw any strong conclusions (or expect anyone else to do so) from the fact that no one thinks "convince @Corbin that LLMs can write code" is worth spending much of our own time and effort on, especially if already-existing examples haven't convinced you.
    
    The stuff about the financial outlook of big AI labs isn't relevant to the question of how well LLMs can write code today. Even whether they damage your own cognition isn't relevant to that questions. Those are certainly relevant to different interesting questions, like "is it a good idea to make use of these tools", but this has no bearing on how well they can write code today.
- Corbin
  
  This was authored for the blog carnival. I didn't see a big summary post yesterday, so I'm submitting this one on its own.
  - veqq
    
    Unfortunately, there were only 2 other submissions:
    
    https://bgslabs.org/blog/why-are-we-using-markdown/ by @bgs_
    
    Spoken Latin by @fpsvogel
    
    I wasn't sure how to proceed and ended up posting one mentioning the other (as it doesn't fit Lobster's content policy alone).
- quad
  
  There wasn't a stampede of folks asking their agents to complete these tasks.
  
  Sorry, where are the attempts linked?
  - Corbin
    
    In comments here and on GitHub on the original two posts, previously on Lobsters (1, 2).
- alexjurkiewicz
  
  Didn't you submit this weeks ago?
  - hyperpape
    
    You can check his stories from his profile, and find the answer (it appears he did not--he submitted the challenge, but not this writeup).