AI Is Like a Crappy Consultant
39 points by krig
I submitted this story to hopefully have a conversation about this… Personally, I don’t and never have used any kind of LLM coding assistance. I don’t even like using Intellisense/autocomplete; I find that it tends to interrupt my flow. I know that I am in the minority here though, so I don’t expect most people to agree with me.
However, this part of the article really bothered me:
I did find one area where LLMs absolutely excel, and I’d never want to be without them:
AIs can find your syntax error 100x faster than you can.
They’ve been a useful tool in multiple areas, to my surprise. But this is the one space where they’ve been an honestly huge help: I know I’ve made a mistake somewhere and I just can’t track it down. I can spend ten minutes staring at my files and pulling my hair out, or get an answer back in thirty seconds.
To me, this seems like it would really be a bad idea in the long run. Those situations where I feel stuck and really have to dig deep to figure out why something is broken are exactly the moments where I feel that I am gaining a deeper understanding of the problem I am trying to solve, the tools I use or the programming environment in general… If I have a cheat code that lets me skip all the hard parts of the level, won’t I just remain a beginner forever?
The kind of syntax error I’d expect LLMs to help with are the ones like missing a closing brace.
If you use a conventional parser, it will probably tell you that the error is at the end of the file. It’s often completely syntactically valid to keep opening and closing brackets (or braces) within a block, but it’s wrong. If you run something like clang-format, it will incorrectly indent everything after the point where you missed the closing brace, but that isn’t always obvious.
In contrast, a probabilistic model that’s trained on a load of code with braces in the correct place probably has a pretty good idea of where the brace should be. If you asked it what was wrong, I wouldn’t be surprised if it could, with high probability, point to the missing close brace location.
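To illustrate why a conventional parser only complains at the end of the file: to a pure bracket matcher, every prefix of a file with a missing closing brace still looks fine, so the imbalance only surfaces when the input runs out. A minimal sketch (the C snippet inside the string is hypothetical):

```python
# Sketch: why a naive brace counter can only say "unbalanced at end of file".
# The inner `if` below is missing its closing brace, but every prefix of
# the file is still "valid" to a pure bracket matcher, so the imbalance
# is only detected once the input is exhausted.
snippet = """\
void f() {
    if (x) {
        g();
    h();
}
"""

depth = 0
for line in snippet.splitlines():
    depth += line.count("{") - line.count("}")

# depth > 0 here: the counter knows *that* a brace is missing,
# but has no idea *where* it should have gone.
print(depth)  # 1
```

A probabilistic model, by contrast, has seen enough correctly bracketed code to guess that the missing brace probably belongs after `g();`.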
My feeling is that using an LLM is like having access to a walkthrough when playing one of the old 2D adventure games.
It becomes too tempting to quickly bail out and ask the LLM to solve even simple problems like a missing brace, but the price being paid is to have a shallow experience, which will come back to bite later.
But maybe I am wrong about that.
Except that instead of the walkthrough coming from one of the old reliable sources, it’s just something that someone else downloaded from a random site on the internet, and so might just be a collection of hallucinated slop that someone threw up onto the web solely to garner clicks.
But seriously, I think this is an interesting analogy. Thanks to lets-plays, I’ve confirmed that a pattern I had previously noticed in myself is actually quite common: namely, that once you give in and look up the answer to a puzzle, the threshold for looking up your second answer drops like a stone. And it’s not just laziness: the whole thing short-circuits your faith that putting in effort will eventually pay off. You stop trusting yourself, or the process, or both.
Using LLMs has made me a better programmer, because I can solve more problems faster and hence gain significantly more experience along the way.
I learn a lot less hunting for a missing brace for half an hour than if I spend that same half hour tackling three more interesting challenges.
My thing is, why allow broken syntax at all, ever?
Broken syntax is one of the things that makes programming actually-not-easy for noobs. The moment you most need help from your tools, they hang you out to dry.
In my formulation, the IDE just doesn’t ever let you make unbalanced braces (or any invalid syntax).
But then you get the nightmare where you type an open brace and the IDE inserts a close right away. Which is just terribly confusing and slows me down every time.
I have many of my own thoughts about that feature (positive and negative), but since I’m seeking out feedback: what specifically slows you down and confuses you?
For me, it’s very jarring to see text I didn’t type suddenly appear; the cursor jumps to a new location and it throws me out of whatever flow state I was in. I never got used to that and prefer not to. Also, I find that where I want the braces and where the IDE wants the braces differ (and if I didn’t want to have opinions, I’d join a cult: I have strong opinions on how code should be formatted, and everybody else is just plain wrong). It took me years to get used to syntax highlighting, as for the first few decades of my career that wasn’t a thing.
I can also answer this. I really enjoy using intellisense and a lot of IDE features, but I despise any feature that does work unrequested (like inserting text). The only exception to that which I love is “new line keeps indentation”.
I disable these features because corrective actions break the flow and the train of thought, and they slow me down more than a single saved keystroke can speed me up.
The one feature I really, really despise (my colleagues often see me enraged when I use their machines) is “auto-surround”. Luckily, it also supports my argument here:
Instead of deleting, I usually select and overtype, as it reliably saves me a single keystroke. But if auto-surround is enabled and you type one of the magic characters, you suddenly have your selection wrapped in a pair of those characters instead of replaced. This means I now have to repeat my first action (select text), but with more scope. I also have to press delete, then type the character I wanted, then delete the newly inserted auto-pair, because the IDE thought I might want to end the string with a repeated double quote.
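A tiny simulation may make the complaint concrete. This is my own sketch, not the behaviour of any specific IDE: with a selection active, “replace-on-type” and “auto-surround” disagree about what typing a quote character should do.

```python
# Sketch: two possible behaviours when a character is typed while
# the range [sel_start, sel_end) of `text` is selected.

def replace_on_type(text, sel_start, sel_end, ch):
    # "Select and overtype": the selection is replaced by the typed character.
    return text[:sel_start] + ch + text[sel_end:]

def auto_surround(text, sel_start, sel_end, ch):
    # "Auto-surround": the selection is wrapped in a pair of the typed character.
    return text[:sel_start] + ch + text[sel_start:sel_end] + ch + text[sel_end:]

line = "print(hello)"
# Select "hello" (characters 6..11) and type a double quote:
print(replace_on_type(line, 6, 11, '"'))  # print(")
print(auto_surround(line, 6, 11, '"'))    # print("hello")
```

The commenter expects the first result; auto-surround silently produces the second, which then has to be undone by hand.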
On the other side, one of the coolest features I’ve seen in the intellisense stuff is that VS Code or Visual Studio can now find the thing you want based on the capitals. This saves so many keystrokes; in other words, it keeps the flow uninterrupted by searching for the fifth entry.
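For readers unfamiliar with it, “match by capitals” filtering works roughly like this sketch (my own simplified reconstruction; real completion engines are fuzzier and more forgiving):

```python
import re

def camel_hump_match(query: str, identifier: str) -> bool:
    # Simplified "find by capitals" completion filtering: take the
    # initial of each hump (a capitalised word part, or the leading
    # lowercase run) and prefix-match the query against those initials.
    humps = re.findall(r"[A-Z][a-z0-9]*|^[a-z0-9]+", identifier)
    initials = "".join(h[0] for h in humps).lower()
    return initials.startswith(query.lower())

print(camel_hump_match("fbb", "FooBarBaz"))  # True
print(camel_hump_match("fb", "FooBarBaz"))   # True
print(camel_hump_match("fz", "FooBarBaz"))   # False
```

Typing just the capitals narrows a long candidate list to the identifier you meant, without spelling it out.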
I don’t expect typing something to cause the insertion of text at a position where my cursor is not. So usually one of these things happens:
and I’m sure other failure modes different IDEs have hit me with over the years
Yeah, closing brace autoinsertion really needs an indication in the IDE’s text model that this was “virtual autoinserted text”. If you manually type a }, it should clear that flag instead of inserting the brace, and it should NOT swallow the keystroke when there isn’t such a marker.
I suppose the “have to stop and think what should I do now” point could apply regardless of editor behaviour (depending on what the user is used to), but the following behaviour avoids all the others:
Typing ( at foo│ would give foo(│), and typing ) at foo(│) would give foo()│.
This avoids your three concrete examples:
- │, press { to get {│}, press } to get {}│, press left arrow to get {│}
- │, press { to get {│}, type stuff to get { stuff │}, press } to get { stuff }│
- func(1│), type , to get func(1, │), press ( to get func(1, (│)), press 2 to get func(1, (2│)), then press ) to get func(1, (2)│), then press ) to get func(1, (2))│
I’ve been using Emacs this way for so many years that I can’t remember which settings/packages I use to implement it. It can occasionally do something I didn’t want, but I’m struggling to think of such a situation (and it doesn’t apply to cut/copy/paste, so unwanted insertions can always be “cut” as a last resort (AKA kill/yank in Emacs)).
In contrast, the way it handles quoted strings is much more annoying, since a quotation mark like " is both an opener and a closer. For example, if I have foo "bar│ baz" and want to break that string into two, I’ll try to close off the first string after bar by pressing ", but that gives me foo "bar\"│\" baz". Very annoying!
Some editors do your suggested solution, which proves my point above:
I’m editing something from func(1) to func(1, (2)) and end up with func(1, (2) because the editor swallowed the ) I typed when a ) was already next.
I briefly tried a structured editor with exactly the brace-balancing behavior you describe. Fortuitously enough, the next day my colleague submitted a PR with a bug caused by forgetting a closing brace. It seemed like the perfect opportunity to examine the advantages of structured editing.
Except that I could not actually fix the bug. Any attempt to add the missing closing brace immediately added an extra opening brace. I wasted an entire afternoon digging through documentation and forum posts trying to find a way to add a single closing brace to the file. It simply wasn’t possible, so I wound up abandoning the dream.
That just means it needs to shift code further left. If your colleague is able to check in something that is text but not valid code, then you’re collaborating on text.
I’m interested in tools that allow people to collaborate on code without having to worry that their code might just turn out to be text.
Demanding that everyone else must use the same tools you do, and that nobody else is ever allowed to make a mistake, are great ways to impact organizational morale.
Of course everyone has to use the same collab tools. Git for example. It isn’t my place to force anyone to use anything they don’t want to, only to offer tools that are helpful and that maybe people do want to use.
Any attempt to add the missing closing brace immediately added an extra opening brace. I wasted an entire afternoon digging through documentation and forum posts trying to find a way to add a single closing brace to the file. It simply wasn’t possible, so I wound up abandoning the dream.
Wow, sounds horrible. If we’re interacting with text under-the-hood, then I think there has to be an escape-hatch. In my Emacs setup, auto-insertion is enabled in prog-mode
(which all programming language modes are derived from); but cut/paste don’t trigger auto-insertion, so I could have inserted that single brace by pasting it (and likewise I could delete a single brace by cutting it).
Visual Basic did this and it was a usability nightmare. It’s quite common to start typing something on one line, but then need to change something elsewhere. So you stop in the middle, go to the other line, and then come back. VB would helpfully tell you that the line you moved away from was invalid and move the cursor back.
Imagine a word processor that forced you to finish the sentence you’re typing before it let you edit anywhere else. It was even more annoying than that.
Yeah I know there have been a bunch of halfhearted attempts at semantic editing over the years. I wouldn’t be able to use anything as broken as you’re describing, but what you’re describing is a system without embedding gaps. This limitation has no need to exist in a well designed and architected structure editor. We just put a gap in where we know there’s syntax missing. You’re fully free to move away and come back.
If I write
if foo {
    bar();
}
and then I delete the first line, what happens?
You’re thinking about it like text. There is no “delete a line” in structural editing, there is only “delete a structure.”
Haha, Cory got in before me. Yeah, that’s basically it. There’s not even a “select the first line” in structural editing, though you certainly can drag your cursor over that line to select all the syntax nodes your drag operation intersects with.
The best current structure editor to give you a feel for what I think is ideal UX there is Pantograph
More complex refactorings can be harder. If you have an if with an if/else inside it, and want to pull things out so that it’s one level deep with if, else if, blocks, then the easiest way of restructuring this (given the things that you’re moving around include the if conditions and adding blocks) will involve some invalid states. The fact that the editor lets you create those is useful in this and more complex changes to the structure.
Yeah, and you can do all that in an editor which treats a syntax node like a lego brick – which is to say it always gives you the freedom to pull the connection apart, leaving behind a fragment of syntax and the gap where that node used to be.
Because each node/tree contains all its context baked-in, subtrees of a program can float freely outside the main document tree without losing their identity. You can have the text [a] floating unattached to the main tree, yet retain the context that that fragment of code is actually a regex character class matching the letter “a”, rather than an array of one item stored in a variable named a.
I’m also not working from scratch here. I stole loads and loads of ideas from HTML and JSX. Think about it. Can you use the JS APIs to edit the DOM tree in such a way that it has unbalanced tags for example? (No.) So clearly some kind of handcuffs are on here that are preventing you from destroying the tree layer, and yet the system as a whole remains quite expressive. The DOM tree is a solid foundation on which to build, and the expressive power comes from higher-level layers, from d3 to React to Web Components
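The same property shows up in any tree API. As a rough analogy in Python’s standard library (my choice of illustration, not something from the thread), xml.etree.ElementTree simply has no operation that produces unbalanced tags: edits move or remove whole nodes, and serialization always emits a matching close for every open.

```python
# A tree API cannot express "unbalanced tags": you edit nodes, not text.
import xml.etree.ElementTree as ET

root = ET.Element("div")
child = ET.SubElement(root, "span")
child.text = "hi"

root.remove(child)  # structural edits: move/remove whole nodes...
root.append(child)  # ...there is no "delete just the closing tag"

print(ET.tostring(root, encoding="unicode"))  # <div><span>hi</span></div>
```

However you shuffle the nodes, serialization is balanced by construction, which is the handcuff-plus-expressiveness combination described above.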
When people are editing hand-written HTML, they don’t use the JS API; I agree that structured editing is easier to reason about programmatically, but that’s not the thing people usually do in editors.
One transformation that’s difficult to express in a purely structural way but very easy to express in a textual way involves method chains:
let foo = bar
    // swap the next two lines here
    .baz(1)
    .quux("two")
    .something();
Generally ASTs are set up so you have one Method node, or what have you, that has the object, method name, and args. So you can’t select .quux("two") without selecting the entire chain up to it. I’ve run into this quite a bit using structural editing features in editors, as well as when trying to delete individual calls in method chains.
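Python’s ast module (my example, not the commenter’s) makes that nesting concrete: each call’s receiver is the entire previous call, so the node for .quux("two") necessarily contains the whole chain below it.

```python
# Why a method chain nests in an AST: the "object" of each call is the
# entire previous call, so subtrees always drag the rest of the chain along.
import ast

tree = ast.parse('bar.baz(1).quux("two").something()', mode="eval")
outer = tree.body           # Call node for ...something()
middle = outer.func.value   # Call node for bar.baz(1).quux("two")
inner = middle.func.value   # Call node for bar.baz(1)

print(ast.unparse(middle))  # bar.baz(1).quux('two')
print(ast.unparse(inner))   # bar.baz(1)
```

There is no node that corresponds to just .quux("two") on its own, which is exactly why selecting or deleting one link of the chain is awkward in structural editors.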
Even more than that, consider replacing that whole expression with this:
let foo = bar(
baz=1,
quux="two"
).something();
From a text perspective, the transformation is super obvious: I’m converting parens into =, I’m removing periods, and I’m wrapping the whole expression in more parens. But from a structural perspective, trying to do this transformation without creating invalid nodes partway through, and without having a custom transformation to handle just this case, is quite difficult.
Hand-written HTML still won’t render as a page if the browser can’t figure out some document tree. There is no API, JS or parser-based, that will somehow render a web page without having been able to construct some kind of HTML tree. That’s why you don’t have to check if document != null.
As to the method chain problem, that’s great feedback. It’s something we’ll design nice UX for. We have the primitives to be able to work with it. The state you want to get into is having ·.quux("two") and ·.something(), where · is my forum-post notation for an embedding gap. Then you can reassemble these left-associative fragments however you want.
Since this is specifically about syntax errors, what deeper understanding are you hoping to gain from spotting them? Is this not something that any editor with syntax highlighting makes obvious anyway?
Well, if the syntax highlighting makes it obvious, it’s not really an issue at all, and I don’t see what the LLM is providing in that case either. But for those situations where you really have to look hard to figure it out… yeah, I feel like the issue really tends to be that I haven’t quite got the mental model in place, and that figuring out why things are broken is the path to really understanding what is going on.
If the tool helps you with this I say great, but in the long term it isn’t a use case for LLMs.
They just kind of shine an embarrassing spotlight on the fact that we don’t have good enough non-AI tools to find our own syntax errors yet.
Yeah, this seems like another way of saying “my compiler’s error messages are so awful that I need to turn to an external tool to decipher them”.
Forgive my slightly glib rhetorical question…
So why not program using the hardest methods available to you? Surely you would learn more?
I get that point of view. Well, my feeling is that the tools are a net negative, which is why I don’t use them. I guess my issue is that it’s hard to predict the usefulness of LLM-based tooling. It may give me something useful, but it may also give me something that seems correct but only adds confusion. If I am avoiding the friction that is a part of learning the code ”the hard way” but end up having to learn not only the code I had but now also the code that the LLM produced… I guess the value proposition just isn’t very convincing.
I guess I should also say that I also don’t like the idea of making myself reliant on products created by massive companies, built using a combination of copyright infringement, pure theft and slave labor. So I don’t think I would use them even if they actually were technically useful. Which I doubt.
Keep going. Almost there…
Sorry, that was unkind.
All computing is reliant on products created by massive companies, built using a combination of copyright infringement, pure theft, and slave labor.
No worries, I didn’t get your point anyway. ;) Yes, that is of course true, but it’s a matter of degree as well.. it’s not like all products are morally corrupt to an equal degree.
The AI companies are especially easy to dislike…
I think the moral panic around AI is more marketing than muscle.
The main complaints I see right now are around ownership and copyright, but if I read something, then write something based on what I read, that’s not a morally dubious action.
I believe the operation of an LLM is identical in this case: its training is very analogous to ‘reading’ and its output is very analogous to ‘writing’, and researchers who are actually working on these things will agree.
The secondary complaint I see right now is a surge in low-quality ‘pull requests’. This phenomenon is happening simultaneously across all domains, i.e. artists seeing much more low-quality art, scientists being asked to review many more low-quality papers (and cranks), and open source maintainers being asked to review much more low-quality code.
I believe this is a real problem, but these low-quality submissions are coming from individuals who have always wanted to participate but were otherwise unable to (for lack of time, concern, or experience). These submissions are not substituting for good submissions, and I believe the individuals making them are making them sooner than they would otherwise (and this is done in good faith). This is a real problem, but it’s hardly a moral one.
The third moral issue I see raised relates to resource cost and consumption, but these systems are 10-100X more power / water efficient than a human would be doing that same work, and as they improve this metric continues to improve. Any increase in consumption comes from people immediately wanting more of something as costs go down (which is a problem well beyond the scope of any single technological innovation).
And finally, tech bros and the oligarchy: there are absolutely a ton of vampires out there, but like the previous point, this problem isn’t about AI. The problem space and its solutions are well described going back at least 3,500 years, and there is no reason to believe it started there.
These aren’t strawmen; these are real arguments… written without an LLM, though they could probably have been written better had I felt like doing something here instead of musing while I drink my morning coffee.
I think everything around AI is more marketing than muscle, in every camp :D
There is definitely a difference between a person reading something and learning from it, and a company downloading and using data that they do not have the right to use to build a commercial product… I don’t think those are the same at all. But then society may decide that the value of AI as a product outweighs the claims of the copyright holders. Either way, I can only speak for myself here but to me it just doesn’t feel right.
Why is that a rhetorical question? People do exactly that, all the time. It’s called recreational programming. Write a useful program in INTERCAL, or Brainfuck, or god help you, Malbolge. I guarantee you that you will learn a lot from it by the end.
(Of course, what you learn may not be something that you ultimately find valuable and/or interesting. Given that our lives are finite, it makes sense to be a little more selective in the obstacles that you choose to give yourself.)
Why isn’t vibe coding recreational?
I would be surprised to hear anybody argue that so-called vibe coding is not recreational. Certainly I didn’t.
Though perhaps there are two different shades of meaning of the word “recreational” here. There’s recreational as in doing something for its own sake, not to accomplish something useful, and there’s recreational as in casual, easygoing, lackadaisical. My sense is that “vibe coding” leans towards the latter, and is in fact often done with the goal of making something useful. Meanwhile, “recreational programming” usually means the former. (Anyone who thinks recreational programming is easygoing has never tried to parse the results of a code golf competition.)
I think we are talking past each other. Recreational programming is programming done for fun instead of for money.
An example where an LLM found a typo and figuring it out on my own wouldn’t have given any extra insight: a Python function (scikit-learn’s KDTree constructor) expected a keyword argument that’s either a list or None.
I had passed in False instead, and everything ran fine but the clustering algorithm returned just a single cluster. The typo was obvious once I saw it, of course.
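A hypothetical sketch of how that failure mode arises (not scikit-learn’s actual code; the function and parameter names here are invented): in Python, a bare truthiness check treats False, None, and [] identically, so the wrong value silently selects the default path instead of raising an error.

```python
def cluster(points, seeds=None):
    # Invented example function. The pitfall: `if not seeds` cannot
    # distinguish "no seeds given" (None) from the erroneous False,
    # so the bad argument silently falls back to the default behaviour.
    if not seeds:
        return [points]  # degenerate result: everything in one cluster
    return [[s] for s in seeds]

print(cluster([1, 2, 3], seeds=False))  # [[1, 2, 3]] -- no error, one big cluster
```

A stricter check like `if seeds is None:` would have made the False argument blow up immediately instead of running "fine".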
This resonates with me. I’ll often know that an LLM will spit out the answer I need immediately, but I’ll figure it out myself because I worry those skills will atrophy if I don’t use them. But it’s conflicting because I’m intentionally wasting time doing something that’s been automated.
This resonates with me. I use coding assistants, but I always read the output completely. Recently I have been using one mainly for the frontend of a personal project. My colleague (who has only basic programming knowledge) wanted to try vibe coding most of it. Initially the output was much better than I expected. It looked nice and somewhat functional. But then he quickly hit a wall where it wasn’t able to do very simple tweaks such as fixing visual bugs or changing the layout slightly.
So we gave up on it and I started writing it myself using a coding assistant. After a while, I found the random suggestions very annoying. It saved time in some cases, but most of the time it suggested things I didn’t want, which broke my flow and distracted me. Eventually I settled on disabling autocomplete and only using it via inline chat or as a regular chatbot. The most value I got from it was when I decided to migrate from one component library to another. In that case it sped things up a lot, though it also hallucinated some things and used random libraries. I always read the output myself and never trust only the observed program behavior.
I think this story hits really well on where we are in the adoption curve… Right now you will adopt AI if one of these two scenarios is met.
Scenario 1 - You are too naive to do otherwise and you’re going at it with as much vigor as you can muster. (Vibe coding with no programming experience).
Scenario 2 - You are experienced enough to work successfully with a junior engineer and frame your requirements in plain English such that a junior can produce good output, AND you are enough of an early adopter (or unfortunate try-hard) to keep experimenting with an interesting new tool for MONTHS until you understand how it benefits you. (This is important since these tools are improving at a breakneck pace, literally day over day.)
I heard something similar but can’t remember where: treat AI like an over-eager pair programmer who can type really fast and thinks they know more than they do.
I don’t know if this is really a counter-example, but I was able to get a thing done in 30 minutes yesterday using Claude, when I wouldn’t have bothered automating it at all otherwise because it would have taken me too long to figure out.
I had a set of ansible playbooks/roles/tasks that I use to set up VPSs for things I want to throw online to prove out a concept. I used to have a set of hand-rolled shell scripts to deploy my applications to those VPSs, but for (reasons) I want to start using kamal instead. An astute observer might ask whether I could skip ansible and just use kamal, but I can’t do that because I want to configure some aspects of the servers that kamal doesn’t attempt to handle, I still want to automate it, and I’m not comfortable putting the servers on the internet without that.
There were some details of the configuration (UNIX usernames, uids and gids) that I needed to keep in sync between the two tools.
My skills with ansible’s yaml DSL are not great; I’ve picked up enough to get things done, but the things I need to do with it are simple-ish. My skills with embedded ruby (erb) are non-existent. Ruby breaks my brain. I was able, inside of 30 minutes, after uploading the relevant ansible playbook, inventory definition and task definitions to claude, to get the facts I needed to be common between ansible and kamal into the inventory and get kamal to use erb to read the inventory and keep the facts in sync between its deploy engine and my server config.
I don’t have the energy or the need to become a real expert in ansible or kamal. Without Claude, I’d have manually kept the relevant values synchronized between the two. But in the amount of time it’d have taken me to configure and document both (documenting the need to keep them synchronized might’ve pushed the manual version more to the wrong side…) I was able to automate this. And I was able to review the stuff Claude generated even though it’d have taken me hours of research to figure out how to write it myself.
Writing it myself would’ve taken too long to be worthwhile, and wouldn’t have garnered me a deeper understanding of any problem. It’d have developed skills that don’t pay off for me and would rot before I’d need to use them again. Using the AI to help me with this (admittedly non-core) task made it worthwhile to automate this thing and make it more robust for me.
I’ve yet to get nice code out of an LLM. I’ve tried fairly hard in various exercises, and it could never relieve me of any tedious or difficult task.
If AIs could take in a lot of context and interpret it properly, I’d imagine using them to refactor or to generate CLI code from an API.
The LLMs I used failed at even the most basic CLI-to-API code generation, so I just gave up entirely.
Likewise, I have spent a lot more time redirecting co-workers who were misled by LLM output than the time it saved.
It’s often good for a laugh tho!
Having written a handful of scripts, kernel modules, etc. using ChatGPT, I can confirm that if you don’t know what you’re looking at (i.e. the API is foreign to you, the problem domain is new to you), you’re going to struggle.
And it might feel like learning; however, without an intentional deep dive into the generated code, and without API fluency, design-pattern fluency, and domain-knowledge fluency, you’re going to feel lost and not much better off than if you had taken an active role in learning about the technology first. Then come back and whisper clear, succinct, unambiguous requirements into the assistant’s ear.
If you give a contractor or jr dev a nebulous requirement without understanding the technology, does that make you a product owner or non-technical manager?
Whelp. I guess my internal dialogue reads AI text in Nathan Fielder’s voice now. I’m ok with this.
(He’s a comedian who had a show called ‘Nathan for You’ in which he played a really crappy consultant)