Why AI Coding Advice Contradicts Itself

31 points by anup

orib

AI coding advice contradicts itself because it's cargo culted superstition overlaid on a non-deterministic, rapidly changing black box.

Nobody knows what works. And if they thought they did, it's going to change with the next model.

simonw
I agree that there's a huge amount of superstition surrounding this stuff, but we've been poking at these things for over three years now, which is long enough that some of the things that universally work are beginning to come clear.

I'd hope the following aren't controversial any more:
1. Use a coding agent harness that allows the LLM to execute and test the code that it's writing. This is the single biggest improvement you can make in using these tools, and has been since the original ChatGPT Code Interpreter back in early 2023.
2. Automated tests are not optional. If your dosing agent isn't writing and executing tests who knows anything actually works? And just like human development teams, tests are the best possible protection against regressions caused by future changes.
3. Related, use red/green TDD. Agent writes a test, runs the test to see it fail, then implements code that passes the test. This helps avoid the common pattern where agents may write significantly more code than is strictly necessary.
4. You have to provide a success condition that the coding agent can iterate towards - the clearer the better.
- intarga
  
  isn't this just generic software engineering advice? i guess it proves OP's point about the black box if the LLM-specific advice is all snake oil and we're only left with what we were already doing
  - simonw
    
    Plenty of human engineers treat automated tests as optional, and they often get away with it. For prototypes and exploratory programming I would agree with them. I've heard game developers rarely use automated tests but it's not a world I have any experience in myself.
    
    Personally I've always found red/green TDD to be a waste of my time as a human software engineer - it takes me significantly longer to produce code and the quality is no better than if I wrote the test at the same time or after the implementation. I've always preferred the perfect commit approach of bundling docs, tests and implementation together in a single commit.
    
    It offers a material improvement for agents.
  - erock
    
    Why is it a black box in this case? If anything it proves to me that using coding agents is just another tool that sits nicely in our already well established coding paradigms. Do TDD with code agents or you’re going to churn with an agent just like we churn when we don’t capture edge cases and functionality in tests
- alexandria
  
  Related, use red/green TDD. Agent writes a test, runs the test to see it fail, then implements code that passes the test. This helps avoid the common pattern where agents may write significantly more code than is strictly necessary.
  
  With the new attitude you recently reported on being "Don't look at the code, just make the LLM write tests", what is to stop the LLM from writing tests that do not do as advertised? I've heard reports of some absolutely awful, redundant tests that do not properly test the underlying code being generated. Where is the oversight for this?
- th0ma5
  
  These are superstitions, too.
  - simonw
    
    What makes you think that?
- bryfry
  
  I'm in agreement with this post. In my words: step zero is knowing how code should be written before asking an llm to write it (architecture, language, libraries, practicality).
  
  From the bottom line: The people getting good results are the ones who'd be productive without AI.
  - bediger4000
    
    Seem like a lot fewer people get good results than say they get good results. My working theory is that for people or categories of work where "good enough" is acceptable (advertising, spam, SEO, propaganda), those people actually are getting good results, or results they accept as good.
    
    If you have to do A/B testing on ads to see which one works, you didn't know what worked in the first place. Or course an LLM will produce ad copy or an image that "works well". If 0.01% of the recipients of an email come-on respond, of course LLM generated copy "works well". Good enough yields great results in these fields.
    
    bryfry
    
    For general LLM output, and specifically copy like you mentioned, it's a completely different game.
    
    I think this post focuses on code generation specifically because it has the 'good enough' paradox where it can show promise but often fall short of production grade results. This is not universally true, but I do agree that users with more skills are more likely to succeed.
  - doriancodes
    
    I love that before LLMs documentation to the benefit of other human beings was considered a waste of time at best and now you don't get enough of that for LLMs who kinda get it somehow almost always wrong at the end
  - BinaryIgor
    
    Good take; as you use and experiment with LLMs more, I think you develop some kind of instinct, a sense, where they might help you and where they fail short, taking time rather than saving it; it is quite useful when you are in full control and do not skip on understanding, writing some code by hand, some with LLMs ;) The tricky part is to stay vigilant, sharp and not rely on them too much
  - Yogthos
    
    The reality is that people are still figuring out how to use these tools effectively, and the tools themselves are also evolving rapidly. On top of that, just as with any tool, what works is context dependent. If you're working on one type of project then a particular set of tricks might work well, if you're working on a different type of project then it might not. Some things, like making the agent use TDD, tend to generalize well, but we're still figuring out what they are.
    
    Ultimately, you have to use these tools to figure out what works for you. And this is no different from regular coding. There are plenty of different methodologies for organizing projects, managing project life cycle, different languages, frameworks, and platforms. There's no one way to make software that everybody can agree on.
  - amw-zero
    
    Isn't this exactly the same with everything in programming? Example:
    
    Unit tests are the ideal. Keep the unit under test isolated from as many things as possible.
    
    Integration tests are the ideal. Test large collections of modules so as not to lock down too many internal interfaces.
  - viraptor
    
    The counter-argument: it's like onboarding docs for a new engineer. Except onboarding docs don't need updates every few days.
    
    Not sure I agree. If you have a new engineers joining for every task, you will need to refine (rather than update/change) your onboarding, because each time you'll touch a slightly different area with slightly different requirements. And you weren't prepared for every possible question in V1. This is exactly what happens with documents/instructions/checklists that many people go through. In many places there's even a policy around the recurring policies review and update tasks.
  - ucirello
    
    It reminds a little bit the transition from Desktop apps to Web 1.0 in the middle to late 1990s. I read a very similar argument in an IT magazine back then. It seems what the author is reporting is closer to a transition in the zeitgeist rather something specific to AI. And I think I can say the same about the transition to smartphones - people jumping into a new category, trying to establish practices and reputation, and occupy mindshare.
    
    Good reading though - but the central argument, in its shape, seems more of a reflection than a breakthrough.