What the hell is going on right now?
89 points by alexpls
There’s this combination of “I’ve seen a lot of newcomers use AI really really badly, and this causes me cost and annoyance. Also, I don’t understand how anyone finds AIs useful.”
As an experienced dev that reviews every PR before they make it and would never post slop they don’t understand, AI is genuinely really useful and good to me, and a huge timesaver. I fully agree that the way your newcomers are using it is harmful to them and you. But that’s not an intrinsic property of the tool.
As an experienced dev that reviews every PR before they make it and would never post slop they don’t understand
I used to feel this way until I had the experience of reviewing/merging AI code that tricked me into believing it worked because it looked correct and had subtle flaws. Several times. In ways I do not see humans do.
My perspective now is that AI code cannot be trusted even when reviewed with a lot of experience. It changes the act of reviewing from a last chance to catch something to the primary mode of development. You have to work backwards to understand the behavior of ai generated code, because the human contributor spent much less time reasoning about it up front. And it’s very easy for ai to write convoluted systems to implement something that could have been done with higher simplicity. Reviews often forgive that complexity, because it can take time to unwind what AI did to expose what underlying simplicity it missed. Then that complexity multiplies, and becomes unwieldy for new features and maintainability. I maintain more apps than I build green field these days, and that cost is hard to bear.
Then, relying on an LLM to describe AI-generated code means not having to have experts or institutional knowledge. Leadership might like that, because employee retention in tech is expensive and many companies orient around hiring and turnover rather than employee loyalty and training.
My experience is that writing code with AI is impressive on the surface, but falls apart in practice for the problems I encounter in my work. I think some people find success with it in niches, but a lot of vibe coders don’t recognize all the time bombs they are setting. Or maybe there’s a new expectation that it’s better to rewrite apps than maintain them? I’m not entirely sure how my own experience seems so at-odds with the enthusiasm I see from some folks.
I will probably keep trying, but so far it’s been a disappointment that I don’t think should be reflected back at me that “I’m doing it wrong” if my results don’t match the hype.
I used to feel this way until I had the experience of reviewing/merging AI code that tricked me into believing it worked because it looked correct and had subtle flaws. Several times. In ways I do not see humans do.
Last week I needed to rewrite something to avoid a buggy parser library, and the bug is in production so I’m feeling the time pressure and I try asking the LLMs to rewrite it. And my immediate reaction is that these pages of highly branching code feel very over-engineered, but at the same time it kind of makes sense (and hey it’s trained on the Internet so it’s probably a common solution, right?), and when I try it (after editing it until it compiles) it almost works and I delve into it to try to paper over the various cracks and see if I can’t get it finished before 4pm and I’m just getting this looming feeling of digging myself into a hole. Three hours of tunnel-vision later I have to leave the office to go carry in the wood before it starts raining. And after some time away from the screen it dawns on me that the actual solution is a simple, very understandable 10-line intermediate function that lets me reuse my old code with the new library.
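For what it's worth, that "simple intermediate function" is essentially the adapter pattern: a thin shim that keeps the old call sites intact while delegating to the new library. A minimal, hypothetical Python sketch (the names `parse_config` and `NewParser` are invented for illustration, not from the actual incident):

```python
# Hypothetical setup: old code expects a function taking raw text and
# returning a dict, while the replacement library exposes a class API.

class NewParser:
    """Stand-in for the replacement parser library."""

    def parse(self, text: str):
        # Pretend the library returns (key, value) pairs.
        return [tuple(line.split("=", 1))
                for line in text.splitlines() if "=" in line]


def parse_config(text: str) -> dict:
    """Adapter: keep the old signature, delegate to the new library.

    Existing call sites keep calling parse_config(); only this one
    function knows about NewParser, so the rest of the code is reused
    unchanged.
    """
    parser = NewParser()
    return {key.strip(): value.strip() for key, value in parser.parse(text)}


result = parse_config("host = localhost\nport = 8080")
```

Ten-ish lines, and no pages of branching code to paper over.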
Just like “never deploy on a Friday afternoon”, there should be a principle like “never ask an LLM for a hotfix”.
I fully agree, but in my experience reviewing AI code for design is actually vitally important because AIs tend to be really bad at making scalable designs by thinking in advance, I suspect because RL horizons are too short to learn those lessons. Plus, if you lay in an architecture the model tends to stick to it, it doesn’t tend to feel the urge to refactor even if the code structure is getting truly painful. So that’s very much something that you as a reviewer have to watch like a hawk. Personally I tend to enjoy this process a lot, so it’s nice for me, but I don’t expect that’s universal anymore than the other way around is.
Usually when I’m arguing pro-AI it’s in response to something like “I don’t see how anyone can like this or get a benefit from it.” Some people like review and refactoring!
But isn’t design the fundamental difficult problem of writing code? For me, the design is the very hard part and the rest of it comes as fast as I can type it. What’s the value of AI in that case?
There’s also research and sheer volume. Learning a new API. Design. For instance, refactoring that requires you touching parameters on a dozen functions goes from several minutes of grepping to one order. And Claude is just straight up much better at visual design than I am.
LLMs have perfected the superficial style, but the substance behind it can be a hit or miss. This makes code reviews harder.
When you see “how is babby formed?”, you immediately know it’s not good, but an LLM trained on this will emit “Origin of offspring remains an open question”. They do the same with code.
Programmers love this argument! They love it!!!! Someone says “This thing causes this effect very frequently, and before this thing existed, that effect didn’t happen”, and they’re met with “well it doesn’t cause that effect when I use it, therefore that thing does not cause that effect”. It’s the old two-punch combo of “only my problems matter” and “that’s a skill issue”. It is truly remarkable how many people who work on systems cannot engage in analysis of systemic effects, either because they are unable or unwilling.
“therefore that thing does not cause that effect” yeah except I didn’t say that at all, merely “that thing does not inherently cause that effect”. Note how I agreed that AI was causing harm.
Here is where you assign the blame to a lesser programmer:
the way your newcomers are using it is harmful
And here is where you deflect blame away from the system:
But that’s not an intrinsic property of the tool
Splitting the difference, I agree it is not an intrinsic property of the underlying tool, but I also agree that it is an intrinsic property of the product (in the sense that includes how the tool is presented, documented, and promoted).
That’s a great point. There’s always this sort of tension about whether technologies are ethically neutral. But, at least some of the time, what we’re actually discussing isn’t whether a technology is ethically neutral, but whether the product(s) built on top of it are ethically neutral. Taking it further, does the socioeconomic / cultural context promote unethical products based on a given technology?
True, but we need to be more explicit about the distinction. Especially here where «it will break, but you get to keep both halves» is an ethos close to the hearts of many of us!
Inherent consequences of the tool in proper use and inherent consequences of the product marketing campaign are different and discussing them using the same words goes nowhere, as we see more often than we’d like to.
The claim «I don’t understand how anyone finds AIs useful» is pretty reasonable based on the product as advertised, but quite understandably surprising to those who distrust both the tool and its wrapper yet search for ways to extract utility out of an unreliable tool, sometimes despite the tool producer being strictly worse than zero help. (Those ways are sometimes found.)
It seems to me that some of the people on the LLMs-sometimes-useful side more or less pay for the product to shake the access to some edges of the underlying tool out of it, and intentionally «hold it backwards» because the advertised hold looks cool but cuts the user.
Some people (me included) use some local-only LLM workflows but avoid hosted ones (for reasons such as rug-pull-insurance, incentive considerations not completely unlike even if sometimes more cynical than some of the ethical considerations you cite, and data control).
(It also sometimes feels like there are hidden parameters like reading:typing speed ratio — some people are clearly at 1:1 and some of my workflows make absolutely no sense to them)
(The emotional attitude to the closely related and definitely preexisting dysfunction shining in a new colour under the light of LLM also differs…)
Here is where you assign the blame to a lesser programmer:
I don’t see where I’m assigning blame to them there.
You can engage in analysis of system effects and still respond to the point that “I don’t understand how anyone finds AIs useful” by reporting that you find them useful.
Programmers love “this argument” — the actual argument, not your strawman — because too often, people are quite imprecise in their criticisms, and if I believe something is useful, of course I’m gonna want to isolate the harmful aspects of it in any given discussion, while trying to figure out a way I can still benefit from the useful aspects of it.
Of course, as @FeepingCreature has already responded, they never even made the argument in the way you strawmanned it anyway (and I haven’t seen anyone make it in that way). But it’s not surprising to see imprecise argumentation when the argumentation is essentially making a case for programmers to stop being so precise because “systemic effects” (as if that discussion is precluded by precision in reasoning; I’m not asking for a formal proof level of rigour here, but at least don’t prevent someone else from being more precise by making a distinction that they feel matters).
You then completely ignore the fact that you clearly misrepresented @FeepingCreature, and instead of apologising, you want to jump to the next point of them “deflecting blame away from the system” — which is a completely different discussion to whether they said what you claimed they said — with as much force as you did your first misrepresentation.
As for @FeepingCreature, I would like to mention, I didn’t see that article anywhere saying or implying that the harms they’re experiencing are an intrinsic property of the tool. Saying “I’ve seen a lot of newcomers use AI really really badly, and this causes me cost and annoyance” and “I don’t understand how anyone finds AIs useful” are perfectly compatible, and don’t imply that the harms are intrinsic to the tool.
Also, I didn’t see the latter even said by the article. They reached a conclusion that AI is not useful, they didn’t say that they don’t understand how others can disagree.
The discrepancy in viewpoints on AI effectiveness is between people using AI to do something they are capable of doing themselves vs. people using AI to do something they cannot do themselves.
That doesn’t explain people like me, or Mitsuhiko.
Maybe I’m misinterpreting this comment, but is this a difference between construction and verification? That is, are you pointing out that you and mitsuhiko are using LLMs for things that you couldn’t, today, construct, but can verify? Or are you saying that you’re using LLMs for tasks which you can neither perform nor verify?
My understanding is that my parent is saying “people only think AI is effective if they use it to do things they cannot do on their own,” and I am saying that (speaking mostly for myself but also from what I’ve seen of what Armin is doing) that we both can do things and still find AI effective at doing those things.
Another read, and funnily enough one I’d subscribe to, is the opposite: AI is maximally useful for people that have more knowledge than the AI on a topic, hence they can take the automation wins while still detecting and addressing any mistakes, including getting the AI unstuck.
Those who don’t know how to accomplish the task they set out to do on their own won’t be able to successfully have a “good” result in the first place, because they lack the knowledge to guide the AI and backtrack when needed.
This opinion is why I’m concerned about the state of the industry in the future, where experienced developers are still wanted but juniors aren’t, thereby shutting down the pipeline of future seniors.
That was actually my reading so this thread made me very confused until you showed up to point out there are multiple ways to interpret the ancestor.
People who know how to do a task themselves:
People who don’t know how to do a task themselves:
I think there is an interesting aspect not explored a lot. The general narrative is that AI is making junior devs dumber as they have become helpless. I think that’s a totally legitimate concern on both an individual and systemic level. I think there are a lot of risks - from completely unmaintainable code bases or a lack of juniors, and that’s restricting the risks AI poses to just software development alone…
But humans are amazingly versatile creatures. Therefore I suspect there are very different learning paths to becoming a competent developer that AI is opening up. Weird ones that might be difficult to comprehend. The understanding of software that such developers have might be very different from ours, but perhaps it will be effective. Or, perhaps what ends up happening is a convergence of understanding, just through a different path.
It’s hard to tell. It’s just a hope of mine.
This is what I meant, you’ll feel much more effective using AI if you already understand the concepts you’re working with.
Well the new juniors will be people who can successfully learn while using the AI to generate solutions. I think the generation of coders who start today are going to have perspectives radically different from all of us working today.
I definitely share this concern, though I am not sure that I actually believe it to be true. Only one way to find out, I guess. All I know is that I don’t see a correlation between skill and enjoyment here, I’ve seen examples of all four quadrants. Or at least maybe there is one in the large, but that’s not super clear to me.
Ah, I was interpreting the post as saying that the people who find value in LLMs were the ones using it to automate work they could have done themselves, but didn’t want to. That seems to match your statement. Anecdotally, it matches my experience as well. However that may also be a product of my coding style. I usually try to set up a “framework” where the only natural way to write the filler code is the correct code, so that may help keep the llm productive.
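As a hypothetical illustration of that “framework” style in Python (all names here are invented): the scaffolding owns control flow, validation, and types, so the only hole left for the filler code, human or LLM, is a small hook that is hard to get wrong:

```python
from abc import ABC, abstractmethod


class Job(ABC):
    """Template-method framework: run() owns iteration and validation,
    so a filled-in subclass can only get the small part wrong."""

    def run(self, items: list[str]) -> list[str]:
        results = []
        for item in items:
            if not item:  # validation lives in the framework, not the filler
                continue
            results.append(self.process(item))
        return results

    @abstractmethod
    def process(self, item: str) -> str:
        """The only hole the filler code (human or LLM) must fill."""


class UppercaseJob(Job):
    def process(self, item: str) -> str:
        return item.upper()


out = UppercaseJob().run(["a", "", "b"])  # empty item is filtered by run()
```

The “natural” implementation of `process` is one line, and the framework makes the incorrect variants (forgetting validation, mangling the loop) structurally impossible for the filler to write.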
It sounds to me like the author is working in a bad engineering culture in general and AI has just made it more visible. If you think sending massive commits for review is a good idea (or at least not a bad one), you’re already showing some real inexperience. Bad PRs probably come from holding the LLM wrong: you are pair programming, not giving it a one-shot prompt and sending the result out when it compiles. It’s an ongoing conversation where you’re getting time savings because you can just let the agent do all the typing.
I haven’t handwritten code for 2 months now. I used to enjoy the act of coding, but as my career has progressed I don’t really get a sense of satisfaction from having chained a long Java stream to get just the right data type at the end. I just like solving problems and getting things done now. If I want to have fun I’ll go learn more about different languages outside of work, like Lisp and Haskell.
One thing I do is play the model off against itself. I vibecode the solution, then I review it myself and perform multiple rounds of refactors (which I would have generally avoided, I would have stopped at about one round usually), then have the model assume a code review persona instead to root out the final issues. The code I end up with seems about the same as I would have done, but with significantly less frustration and less time. Oh, and more unit tests :)
I do wonder whether vibecoded code is more understandable to the model. Then over time the generated code actually improves because the model can tell what is going on with accuracy already. We may just be in a transitional phase.
I do wonder whether vibecoded code is more understandable to the model.
This is my whole premise for why vibecoding is a good idea. We have a tendency to secondguess good style by optimizing for a human, but the LLM to some extent knows what works for itself better than we do because of its RL training. With an AI as the primary driver, code is written mainly for the AI to read and only secondarily for the human to read or the computer to execute. Does that mean you should never refactor AI code? I don’t think so, I think a lot is still inexperience or inability on the AI’s part, but I don’t sweat small stuff like paren style or “too many comments” these days. The shorter the scope of a style change, the likelier that the AI has some inherent motivation for it.
I do wonder whether vibecoded code is more understandable to the model. Then over time the generated code actually improves because the model can tell what is going on with accuracy already. We may just be in a transitional phase.
In that case it is a relevant question whether the code is more understandable to the model (as in, updates that currently go without renaming already break the spell), to the model family (yay, new and exciting kinds of lock-in), or to all the models with the same rough shape of the dataset (which might converge universally)…
Yeah, I’ve got to say - I finally poked at using Claude this week, and I’ve been rather impressed with it.
it does require some critical thought to ensure it doesn’t go down the wrong rabbithole… but if you’re able to do that, it’s a really effective and efficient refactoring tool, great for generating tests (that might need fixing, but it’s the boilerplate that idgaf about), and I also never need to waste brain cycles switching into local contexts to address a linter again, most likely.
I can easily see how if you’re mostly just surface-level prompting and accepting most proposed changes, you’ll end up with a mess. It can’t reason about systems and their interfaces and implementations long-term, but there is value in it being a complementary tool alongside linters and other comparable tooling, that needs to be met with a proper code review approach.
Another friend commiserated the difficulty of trying to help an engineer contribute at work. “I review the code, ask for changes, and then they immediately hit me with another round of AI slop.”
When this happens I don’t give the next round of feedback until the next day. And if I see multiple PRs waiting for a review, I always review the work of my colleagues who I know write their own code first.
What sticks in my craw about contributors like this is that if I’m doing review, and they essentially feed my comments to an LLM to put back up for review… I am doing their work for them because all of the work is happening between the llm and the reviewer at that point. Plugging my feedback into an llm is not work - that’s just an inefficient way to prompt.
Those contributors start to feel like you’re gatekeeping and slowing them down arbitrarily because they aren’t engaging in a collaborative process with you, they are trying to get ai to appease you with minimal personal effort.
I feel lucky that I haven’t run into this much – just a couple times – but it feels like a lot more is coming.
It makes me think of Curl battling ai slop CVEs, where the effort necessary to understand and detect/correct slop is way higher than the effort to submit it. That effort imbalance is a problem.
Even if I have to do a few independent internet searches and read through a few dozen stack overflow and doc pages, my own conclusion is so much more reliable and accurate than anything an LLM could ever spit out.
I believe this author might be using the tools wrongly and blaming them instead; in my case, and for the multiple developers I’ve discussed this with, it’s completely the opposite. I don’t even remember the last time I used Stack Overflow for anything other than a “confirmation” of obscure documentation.
I believe that depends a lot on your experience, but if your tools are fairly “new”, with good documentation, you can skim all the docs in less time than a search and get pretty accurate results.
One good usage example that really plays a useful role: agent code reviews. You do your job, you write good tests, you push, you wait for the review, and you revisit. For us at our company it has been a game changer: we used to do reviews manually, and now we largely don’t. It catches a lot of minor, non-“critical” things that might be annoying to debug (e.g. you change the functionality of a method but forget to update the summary doc).
Again, this is not a saint to whom I pray every day; it’s just a fancy indexer that I use.