I'm not consulting an LLM
40 points by lr0
They are still useful to point you to actual sources of information. If you don't know what you don't know, LLMs can provide a "field survey" with links and concepts tailored to your level of non-knowledge. Actually evaluating those linked sources is best done by a human brain.
Besides coding, I also often use LLMs to survey a field. However, it is easy to see what the precision is (follow the sources and check them), but hard to know what the recall is (am I seeing all the relevant sources/angles). So when using an LLM, you may be deprived more of alternative angles than, say, good classic textbooks.
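The precision/recall asymmetry described above can be made concrete with a toy sketch (the set names and values below are invented purely for illustration):

```python
# Toy illustration of the precision/recall asymmetry when surveying a field.
# "relevant" is the full set of sources an ideal survey would find
# (hypothetical -- in practice you never know this set);
# "returned" is what the LLM actually cited.
relevant = {"textbook_ch3", "survey_2019", "forum_thread", "blog_post", "paper_A"}
returned = {"survey_2019", "blog_post", "unrelated_page"}

true_positives = returned & relevant

# Precision IS checkable: follow each returned link and verify it yourself.
precision = len(true_positives) / len(returned)

# Recall requires knowing the full relevant set, which you don't have --
# that is exactly the part you cannot verify by following the citations.
recall = len(true_positives) / len(relevant)

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

The point of the sketch is only that the denominator of precision (what was returned) is observable, while the denominator of recall (everything relevant) is not.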
This is also a fundamental difference from, e.g., using an LLM for coding as an experienced programmer. I know what the code/solution should look like; an LLM can just produce it much faster, and I correct the LLM when the solution is not good enough, much as you would give hints to a junior programmer.
Agreed, it's not truly automated expertise - just a form of search. Probably the best one left, but web search used to be a lot better than it is :/
Seeing all sources depends on the service, but it can be good. Perplexity is decent, but only reviews 10 or so sources typically. On the other hand Opus research often goes up to over 400 pages, which is more than I'd ever read manually. This is useful in case of unofficial stuff that lives spread across many forum posts. Like for example details about compatibility of specific OBD2 devices and a given car - there's just no way I'd be able to collect myself what I learned from the report. Not with all the "also someone mentioned that..." details.
The "I'm feeling lucky" button on Google was amazing when Google first came out back in 1998. It's amazing to think that prior to Google, there were at least two dozen search engines in active use, but Google was the asteroid that took them out---it was that good.
Of course, that was 28 years ago (sigh) and boy, how Google has fallen.
People tend to think that GPT (and other LLMs) is doing so well, but only when it comes to things that they themselves do not understand that well.
I would say this is the fundamental gem in this post.
This implies the biggest argument against the current "You're going to lose your job"-narrative: if you're a specialist, LLMs can only reliably contribute if you are already an expert at your job.
I used to think that the claim from the quote is strong and an indication of something bad. But in practice, there's infinitely more things I don't know about than those I do. I'm certain LLMs can give me a better research answer about (for example) pearl diving, than I can find in the same time. If there are things that are wrong in its initial information, I'm as likely to run into those myself anyway.
It’s interesting how something can be both good and an impediment. Guy L. Steele famously said of Java, ‘We were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp’: Java really is an improvement on C++ in a lot of ways. But it (combined with some other trends) also managed to waste roughly fifteen years of the industry’s time. It was better than its predecessors, but hampered the success of those which did even better.
Likewise it seems like LLMs are great at being mediocre or average. This is pretty awesome, because most people are average or worse at most things (which are generally normally distributed). But real advancements don’t rely on the mediocre but rather on the excellent — and so far, at least, LLMs don’t seem to be very excellent.
I think that if you're an expert (or very experienced), you can generally coax LLMs into producing expert-level output. But that's a lot of coaxing. From what I've tried with Claude Code, for every 15-30 minutes it spent implementing a non-trivial feature, I probably spent 4-5 hours interrogating it about code quality, design trade-offs, bugs, test coverage, etc. This wasn't a matter of leaving for 5 hours while it worked, either, but a constant feedback loop with constant course correction, relying fairly heavily on my own domain knowledge and experience with the matter. It eventually produced what I'd sign off as high-quality code.
You need to scrutinize design choices when you're programming manually as well (and the patience to do this is really what separates the good from the great), but it's far more important with something like Claude because it is so fast. Get lazy for a few hours and you've easily added 20K LOC worth of technical debt to your code base[1], and short of reverting the commits and starting over, it won't be easy to get it to fix the problems after the fact.
It's still pretty fast even considering all the coaxing, but holy crap will it rapidly deteriorate the quality of a code base if you just let it rip. It very much feels like how the most vexing enemy of The Flash is like just some random ass banana peel on the road. Raw speed isn't always an asset.
[1] A few weeks ago, the openclaw repo had 50% more lines of code than the ladybird browser. I don't know where it's at right now, but since they're merging on average 100 PRs/day, I don't expect it will have shrunk.
--edit-- had to check, they're at two ladybirds now.
I like this framing. Well-presented (“smooth”) possible-lies are not new; unscrupulous politicians spring to mind. It is easy for us to assume a critical and defensive posture when approaching a rival who is perniciously motivated. The new part is that our modern “oracle” literally does nothing other than smoothly present possible-lies; thus our challenge is to remain alert and to coach the lulled into critical-thinking-as-default.
Mild spoiler warning. In the series The Wandering Inn, the gods create a magic system called the Grand Design, which lets people easily acquire and execute magic and abilities that would typically be outside their reach. This led to people relying on skills more and more, to the point where they stopped grinding and learning manually, which meant that nobody would rise up to challenge the gods anymore.
With that being said, I do really think it's possible to strike a healthy balance. When you're first getting started with certain projects, it's easy to get overwhelmed, or there's a ton of boilerplate to get through before you can start tinkering with the interesting parts. I know I've definitely bounced off a lot of projects and lost interest because the initial hurdle to getting anything up and running was too high.
The article makes two good points:
But why, then, is the response "I'm not going to consult an LLM" rather than "I'm going to consult an LLM, and make sure that I respond to its responses with the same level of caution and verification that I would claims from any other source?"