I used AI. It worked. I hated it
70 points by edsu
I let this thing into my brain, and now it is always there. For any new potential project, there is a voice in my head telling me how much easier it would be to let the model do it. How much faster it would be to simply describe the objective in a prompt and let go.
I do not want to let go, but I recognize the power of this pull.
I’m so glad I’m not the only one to feel this way. I’ve used Claude at work and paid for a month of the $20 plan to test it with some personal stuff, and it annoyed me that it actually worked. Now it’s just in the back of my brain like “oh, just try asking Claude,” and it’s something I’ve got to actively fight against.
I was there. The solution was trivial, for me. Open notes.md and write out my stream of thoughts on the issue there.
Then I can decide what I want to do next. Which is usually writing more, correcting myself.
What I was hooked on was not the model doing anything. It was the instant clarity I was getting from writing it out and then iterating on it, with a rubber duck.
Just writing it down made me more productive. Easily double. It seems I am more efficient when I postpone the coding and spend some time getting familiar with the problem. Code then tends to mostly write itself.
I still one-shot generate tedious parts and then clean them up, but I avoid the iterative slot-machine loop. The aichat CLI is nice for that: you cannot easily resume after the one-shot prompt, and you can attach reference files easily.
And sometimes I chat the model up like a librarian. Hey, is it possible to..? Is there a known sub-log algorithm to..? That part just works.
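That one-shot flow looks roughly like this (the file name and prompt are made up, and the `-f` flag for attaching reference files is from my reading of aichat's docs — check `aichat --help` on your version):

```shell
# Build the one-shot command as a string first, so you commit to the prompt
# before running it -- no session to resume, no slot-machine loop.
# src/parser.py and the prompt text are hypothetical.
oneshot='aichat -f src/parser.py "Write the tedious pytest boilerplate for this parser"'
echo "$oneshot"   # run it directly once you are happy with the wording
```

Writing the prompt out in full before firing it off is half the rubber-duck benefit anyway.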
i had the same experience!! a work trial not-so-subtly said "here's $20, use claude or you won't be able to make things fast enough" & even though i didn't end up getting the job, i used the rest of the membership to assist w/ personal projects. it's just SO fast, that it almost feels worth it despite having a dozen or so instances in the month since where i've found a much simpler/more robust/cleaner solution to something by thinking for a bit and tinkering around with my own code. once i let my understanding of a feature or system or part of my codebase slip, it gets a lot harder to ignore that little demon on my shoulder telling me to forfeit it to the context gods
I hated writing software this way. Forget the output for a moment; the process was excruciating. Most of my time was spent reading proposed code changes and pressing the 1 key to accept the changes, which I almost always did. [...]
Yeah, no wonder they hated it. Approving every single change the model wants to make is miserable.
The next section talks about why they won't just let-er-rip, because they want to review every line. That's great! But you don't have to torture yourself to do that - you can still trust yourself to have the discipline without forcing misery on yourself.
I genuinely think that one of the biggest differences between people who enjoy coding agents and people who hate them is whether or not they run in YOLO mode (aka --dangerously-skip-permissions). YOLO mode feels like a whole different product. I talked about that on a podcast just the other day.
I find this interesting, because IIRC you used to recommend strongly in the opposite direction... that these tools strongly needed expert engineers who reviewed everything with suspicion.
But at the same time, the shift doesn't feel that surprising: the combination of delay and a high volume of output means most people are going to end up like the drinking bird from The Simpsons hitting the "y" key, which people will obviously tire of, so of course they'll eventually gravitate towards YOLO'ness.
So does it need an expert engineer at all? Especially in YOLO mode, where the cognitive debt effectively extends to a mostly-opaque-to-your-knowledge codebase. Aside from the safety aspects, what do you think this means for the "it absolutely needs a seasoned engineer reviewing this stuff with suspicion" perspective?
One way or another, I think this is becoming a kind of pipeline in terms of how people are sold on these tools.
FWIW I'm doing some hobby projects with Claude Code right now and I am still reviewing every line. I'm just not doing it per change. I prompt it, go do something else for a couple of minutes, come back and run git diff, then either commit (even if I'm going to ask it to change something) or git reset --hard.
Reviewing everything with suspicion doesn't have to mean reviewing each change before it gets written to disk. You can wait for it to finish a chunk of work - a set of changes, often arrived at after a couple of rounds of iteration with the compiler - and then review that whole chunk just like reviewing a small self-contained PR. This is much less painful.
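That chunk-level loop needs nothing beyond plain git; a minimal sketch (the throwaway repo, file names, and commit messages are all illustrative):

```shell
set -e
# Throwaway repo standing in for your project.
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config user.email dev@example.com && git config user.name dev
echo "v1" > app.txt
git add -A && git commit -qm "baseline"

# ... agent iterates on the files for a few minutes ...
echo "v2 (agent's proposed chunk)" > app.txt

git diff                                     # review the whole chunk at once
git add -A && git commit -qm "accept chunk"  # or: git reset --hard  (reject it)
```

The review happens once per chunk, at the `git diff`, rather than once per edit the agent proposes.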
(Also, I disagree with @simonw about --yolo - I haven't found the need for it at all. I have auto-approved some ls and grep and similar such tools and it almost never ends up needing any others.)
Right - the review step can happen later and with larger chunks, especially as you grow confidence in the model that you are using and how it's likely to respond to your prompts and implement your more detailed specifications.
I do a lot of my reviewing in the GitHub PR interface now - I'll have the agent working on a branch and review the PR.
You can even leave comments on individual lines and have the agent read those comments via the "gh" CLI tool.
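The two gh invocations involved look roughly like this (PR number 123 is hypothetical; `gh api` resolves the `{owner}/{repo}` placeholders from the current checkout):

```shell
# Conversation-level comments vs. line-level review comments on a PR.
# Built as strings here so the sketch runs anywhere; run them for real
# inside a checkout of the repo, with gh authenticated.
view_cmd='gh pr view 123 --comments'
api_cmd='gh api repos/{owner}/{repo}/pulls/123/comments'
printf '%s\n%s\n' "$view_cmd" "$api_cmd"
```

The second form returns the line-anchored review comments, which is what you'd feed back to the agent.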
I strongly believe that we as an industry need to get away from the GitHub PR flow; it was designed for, and is basically suitable only for, a world that no longer exists. The issue as I see it is that we have one resource that is truly scarce (human attention) and one that is rapidly becoming omnipresent (code), and systems designed to manage the scarcity of source code are not fit for purpose in a world where code is free.
You're mixing up different concepts I think.
Running YOLO mode just means allowing more unsupervised steps in one go. It doesn't mean you're not reviewing the results carefully before actually integrating into production.
It also doesn't mean you don't need to be an expert, both to start that process with the right context and to actually review the results afterwards.
Jesus Christ! Running in a mode where your entire computer can be wiped? Or all your code stolen? Are you trying to sell us on this, or scare the living crap out of us?
If you run it in Claude Code for web you don't have to worry about that at all.
I'm still figuring out a sandboxing strategy for local usage that I like, but the options for that are getting a lot more promising these days.
Okay for having your computer wiped, but what about code being exfiltrated? I know in the podcast you said you don't care because you work on open source, but what about those of us less fortunate souls/fools?
You can protect against exfiltration by restricting the domains the agent session can talk to. Claude Code for web has a default allow-list with NPM and PyPI for installing packages.
As a major tangent - I wish there was a good default term for this - is your default "Claude Code for web"? I oscillate between that, Claude Desktop, claude.ai, Claude Cloud...
The official name is "Claude Code on the web" but I don't use that because it's a mouthful and doesn't sound like a product name.
I don't like "Claude Code for web" much more but it's close enough to the official name that people hopefully know what I mean.
I'd like to call it "Claude Code Cloud", personally.
I was experimenting with Claude Code the other day. It would generate some Python, I’d be impressed that it looked coherent, but then I’d want it to rename a function or reorder some code blocks. I found myself thinking, “typing out sentences to ask the computer to edit text seems tedious next to just using vim.”
Then I realized I was using it wrong. It is tedious to use Claude to edit text. I think you’re not supposed to use it to write programs, as in like the text of programs. It’s a tool for skipping the text part (or at least relegating it to irrelevance) and just creating running software from English sentences.
Anyway, that’s all to say I think you’re on to something.
Yup, you can still review the whole change/feature later, but obsessing over each step is counterproductive. Also, creating tests to validate the change rather than checking the code itself may be a better time investment.
Really on-point article, especially wrt how boring waiting for agents is. I don't like using opencode or claude for help, but I'm doing it because it's the thing you're going to have to use in the future (until I pivot to dressmaking), and it's so boring waiting for it to spin up a plan for the three-tier architecture I know I want. I spend a lot of time playing Deadlock while I'm coding vicariously through agents.
It’s really only efficient if you can do multiple things at once, so talking to the agent is a background activity.
I’m currently doing a frontend (in-browser) project within a larger existing codebase. I know the parts I know (sending data to an API and processing it), but I don’t really know the browser programming environment or JS/TS well. Using Claude is definitely faster than doing it by hand, but the parallelization is limited so I’m stuck waiting. It’s also kind of fuzzy and approximate, like being stuck with only a beginner’s idea of how to program.
Strong agree, most of my gains have come from parallelism, which I've taken to the hilt as much as I can.
The challenge is that's an even bigger change than the one the author is describing -- you swap deep work for managing (juggling!) context. That's a massive gear change and for a skeptic a divide I don't think they can or will want to cross.
I'm curious to hear how you (and others) reach high levels of parallelization on real-life work. Are you letting clusters of agents loose on unrelated, isolated tasks? Or are you parallelizing the work on a single task?
I've really struggled with the latter approach. Maybe it's just the nature of the things I'm working on, or more likely it's a failure of imagination on my part, but most often when I try to break one of my tasks down into very granular pieces, they end up forming a dependency graph that's close enough to linear that I don't get much opportunity for parallelization.
For example, in my CRUD backend project, I can't see a useful way to let one agent design a database schema while another agent is hard at work implementing the database access code from first principles and another agent is writing the business logic that calls the database access code that hasn't been defined yet. Clearly I'm not breaking this kind of task down correctly for agentic parallelism, but I'm at a loss for other ways to structure it.
Personally I shy away from the complex agent flows. I feel that makes the parallelism harder. I also feel like too much long-loop agent-to-agent chatter eventually degrades. But also I feel I'm the outlier on that front.
For what you're saying: I'd say I have a lot of "chores" such as package upgrades and bugs that I'll run in parallel. As soon as I even think of something like "oh I need to" I'll drop it into Claude Desktop.
For everything else I'll not parallelize one feature, but have multiple running at once. As a very general rule I'd prioritize the frontend first. For anything complex I'll rough it out (sometimes multiple variants!) and ship under a feature flag for feedback. Once that loop is closed I'll go back, refine, and then do the remainder in 1-2 chunks. It's still iterative, but I don't try to ship large features in one hit. But I might have 5-6 features in the pipe at any given moment... it's RISC not CISC.
I'm also fine with just throwing things away. If a prompt goes awry I'll ditch it and maybe try again. Sometimes I'll prompt the same problem multiple times in parallel to force different outcomes or exercise different constraints (this might be as simple as the same problem with "and do it quick!" and "and do it comprehensively!" at the end).
I know it sounds wild, but we have others like Customer Success raising PRs now. That cuts out a surprising amount of pre-work. So a thread might be me just looking at it, tweaking and closing out. A lot of people seem allergic to the idea, but so far it's worked really well for us.
To offset the massive context switching, I spend half my day improving the tooling around it. Again, seems wild, but it's working out for me.
In terms of my day, I structure around 20-minute "turns", with two turns an hour. I do miss the deep work, but in a very strange way my day is a lot more relaxed. Between turns I often call someone, clean the kitchen, prep dinner. It's been quite a shift for me.
Multiple repos and multiple feature branches in parallel. I already have a few dozen projects to manage so..
(until I pivot to dressmaking),
I'm considering a move to selling my home ferments and spice powders myself.
I've been moving recently, and one of my things has been tasking claude with a buncha stuff on my NAS i've been meaning to get around to, while I keep busy unpacking - for example, my auto-rip scripts to drive my libredrive UHD blu-ray reader (for cds, dvds, blu-rays, with a pipeline for doing the matching/organizing into my jellyfin against tmdb...).... I hate to say, but it does work, it's maintainable, and I wouldn't have gotten a lot of that stuff cobbled together otherwise.
I spend a lot of time playing Deadlock while I'm coding vicariously through agents.
yeahhhh that's the default for me if I'm working on something and don't have other tasks to keep me busy.... street brawl is a bit too perfect for it, I fear.
I find it not only boring, but distracting. It's too tempting to pick up my phone and scroll Instagram while it's churning, but I feel like my brain can't quite handle multitasking with several agents yet. I haven't quite figured out how to have my servers running on different ports based on separate worktrees, but maybe if I figured that part out I could try to get better at multitasking.
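One way to set that up is a worktree and branch per agent, with a distinct server port per checkout (a sketch; the port-per-worktree part assumes your dev server honors a PORT environment variable, which not every setup does):

```shell
set -e
# Throwaway repo standing in for the project.
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config user.email dev@example.com && git config user.name dev
echo "hello" > README.md
git add -A && git commit -qm "init"

# One worktree + branch per agent task.
git worktree add -b feature-a "$repo-wt-a"
git worktree add -b feature-b "$repo-wt-b"

# Each checkout then gets its own server port so both can run at once, e.g.:
#   (cd "$repo-wt-a" && PORT=3001 npm run dev &)
#   (cd "$repo-wt-b" && PORT=3002 npm run dev &)
git worktree list
```

Because each worktree is a separate directory on its own branch, the agents never step on each other's uncommitted changes.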
Nuance? On the internet?! In this age?!
Sorry, hard to resist. Very interesting writeup. Regardless of personal preferences and alignments, I really appreciate this aspect of the article. It is deeply personal and a quality read - whether you can identify or align with the author on all points or not.
The fact that LLMs and gen-AI are the particular topic chosen for the format, out of many available challenging ones, is in my opinion entirely secondary to its value. Though obviously interesting given our lens.
Honest nuance is such a rare resource. Thank you for generously providing it. Leaving with inspiration and food for thought.
The title doesn't do the article justice but I'm glad I didn't skip reading it, this really resonated with me.
I'm maintaining a large codebase at work which puts me in the same position as the author, that I really want to understand the code the chatbot produces. For the few smaller experiments where I only cared about the outcome the experience of using these tools is much less frustrating, and the author is on point when they talk about the temptations that come with having used it. Or the temptation to just yolo it when using agents.
That said, I've had more time to socialize because the problem solving is not that deep and I don't hyperfocus on a problem. And then there is the constant distraction of having to wait for the model output, it's a bit like xkcd 303 but for chatbots.
You don't need to keep pressing 1. That sounds terrible.
I generally work in small but semi-complete chunks of functionality, and review the previous chunk while the Slopus is working on the next one and/or while writing the next prompt.
Also I sandboxed the whole thing, so I'm not afraid of just letting the agent do its thing without babysitting every command.