Horseless intelligence
15 points by nedbat
I love the car analogy here. There is a scientific consensus, more or less, that humanity needs to stop using cars due to the pollution they're causing; there is a way forward where everyone wins, pollution is curbed, and cars remain available to those who need them to keep the world running. Yet we're collectively choosing a worse-off future where humans have to rely on cars and can't live without one. Something like that.
That was very much my thought. We have a solution to the car problem, and it's public transit, and relegating cars to special use cases that most people won't encounter often enough to make buying one worthwhile. But we don't solve it, because a lot of people have a vested interest in not solving it. Despite the fact that not solving it will kill them, too.
We have a solution to the car problem, and it’s public transit
I’m speculating and extrapolating a bit from my own personal situation in my smallish Canadian city… one of the reasons why I very rarely use public transit is because I have very little faith in our institutions to actually provide reliable and comfortable service. I can’t be an hour late for work if the bus I’m on gets delayed, and I’d rather not make the trip to work sitting next to someone who’s tweaking on meth and looking for a fight. Even if those two problems were eliminated, it still takes about 3x as long vs driving.
I’ve been a transit user in the past. When I was in grad school my apartment was right next to a bus stop that went straight to the “bus mall” where you do the transfers, and there was a bus that went straight to the university from there. There were occasionally issues with drugs or violence, but it was short enough and direct enough that it was easy to deal with.
I live in a Danish city which recently introduced a new bus line where about two-thirds of the route is on a bus-only lane. I’ve been late once in the last two years … but that was due to extreme overnight snowfall, which affected my car-driving colleagues even more. I’m very happy with this - and very grateful that the city administration managed to get it funded.
(As a footnote, parking here is so terrible that going by car often takes more time than the bus, when you factor in getting a parking spot.)
You and the other reply both nailed the crux of the problem: if the goal is to get the majority of the world to use public transit instead of personal vehicles, we need to figure out ways to make public transit at least comparable to personal vehicles as far as quality of life goes. I would love to not own a vehicle at all and be able to rely on a system like what you're describing.
There’s a bit of a chicken and egg problem there too. The city looks at the numbers around the current use of transit, determines that there isn’t enough demand to justify spending money to improve the system, and things stay the way they are.
How big and dense is your city?
I live in a 2M people city (6M metro), although it’s abnormally dense.
In my country, there are 10 cities with subways; two of them have about 200K people living in them, the rest are 400K+. I'm not sure how many of the non-subway population areas have good public transportation - maybe some do well with buses and trams, but certainly I know in some places in this country, most people need a car.
I feel public transportation is unsolved for small/non-dense population areas. There might be more exceptions, but I guess in most cases it's about someone wanting to apply solutions that are proven somewhere else.
Services generally are unsolved for small and sparse population areas. Everything – water, sewer, gas, electric, trash – is more expensive and difficult in spread-out suburbs. It’s often hoped that rising home values on new developments will allow this to pencil out in the long-term but the results can be tragic when it doesn’t. Flint water crisis is just one example.
Ahhh 300k but not particularly dense. The university (out near the edge of town) and downtown have the highest density. Everything else is pretty much single-family homes on decent sized lots. There are a few big box-style commercial hubs, all, of course, also located in the corners of the city.
Well, you live in a place designed for car ownership. There’s a lot of those, but there’s also a lot of places designed to not require car ownership. I think globally the % of people living in places that do not require car ownership will likely grow.
But I don't think we can, or even should, try to get to 100%. There are tradeoffs involved. But hopefully, the % of people that live in places that do not require car ownership gets to a point where the negative effects of cars are sustainable. (And hopefully, car technology also improves to reduce those negative effects.)
And on the other hand, there are literally millions of people who have public transportation they prefer to cars.
A large factor of this is population density, though. With very high population density, it's very difficult for cars to work better than public transportation (not enough parking space); with very low population density it's the other way around. (So what I'm saying is that perhaps a lot of people opt for public transport not because it works great, but because a car would be even worse.) (And there's also the factor of people who cannot afford cars. But there's plenty of people who can afford a car - or even own one - and still use public transport.)
But we don’t solve it, because a lot of people have a vested interest in not solving it
FWIW, I’d say it is the opposite: cars is what you get by default, as they don’t require coordination. Getting public transport requires a lot of vested interest to solve the coordination problem.
I’ll repost here what I wrote on Mastodon, in case it might spark some useful discussion here:
The technology is not going to go away. We will not turn our back on it and put it back into the bottle.
We, or at least those who aren’t being ordered to use LLMs in our jobs, can choose to turn our back on them and just not use them, and doing so might be the best way to reduce the harms. Also, I don’t believe you mentioned the harms that come from training on people’s work without their consent to create plagiarism machines. I think that’s the thing that most concerns me now.
But the anti-AI crowd also seems to be railing against it without a clear understanding of the current capabilities or the useful approaches
Yeah, I felt like this article came close to appreciating that not all sceptics are saying it’s a totally useless phenomenon, but rather one where the cost is not justified; the cost to people, for attribution of their works, and for avoiding an economic massacre, or avoiding even more industrialised surveillance at work, at school, and at home.
Also, one technology that we did put back in the bottle despite being of great interest to certain people was commercial supersonic aeroplane travel. The costs (in all dimensions) were too high for too little gain, so back to sleep it went.
It’s worth noting that OpenAI is making a huge loss even without actually having a valid license to all of their training material and compensating the people whose work they’re using. Their customers are paying a premium for novelty and, even including that and the fact that they have enormous externalities, they are not creating a product that has sufficient value to their customers to make money.
Anthropic, the author’s employer, lost around $2bn last year.
There are, in fact, a lot of things that have ‘just gone away’ when they were backed by companies that could not make a product shipping them.
The economics of LLMs are very interesting. The pure LLM companies (OpenAI, Anthropic, etc.) are burning huge amounts of money, and are promising AGI to keep that going. But Google and Microsoft are both in the game, and they can afford to be for the longer term. So I don't think the cat gets put back in the bag just because the pure LLM companies go broke.
Even if they went broke right now, there’s been enough progress made to spur on a decade of AI startups aiming to automate away jobs.
The interesting one is how much LLM inference costs for these models. At the moment inference is being subsidised by investors, so we don’t actually know how much it costs to apply them to different tasks. Though (of course) that cost should decline over time with hardware improvement.
The open source models are pretty decent now too (though not at the level of the largest closed ones), and I don’t think it’s likely they’ll disappear. That they’ll be killed by the expense of inference seems like wishful thinking. My mid-range MacBook can run models that are maybe two years behind the state of the art. If anything that seems like the more fundamental problem with the OpenAI & Anthropic business model, that the technology will get commodified before they can become profitable. I wouldn’t bet much on any particular AI company, but I would definitely bet that LLMs will still exist in 5 years.
The term ‘open source’ is quite misleading. The structure of the models is often quite simple (and documented and easy to reproduce) but the weights are where the real value is. These are derived works of a huge pile of training data and reproducing them requires access to the same corpus (hard) and a lot of training. Even the cheapest models cost tens of millions to train, the larger ones hundreds or more.
Even the cheapest models cost tens of millions to train
I don’t think that’s true anymore? The OLMoE paper, for example, claims that the Allen Institute for AI trained OLMoE-1b-7b (a 7b mixture-of-experts parameter model with 1b active at a time) “on 256 H100 GPUs for approximately 10 days”. That works out to about $150,000 of GPU time at current cloud GPU prices (~60k GPU·hours at ~$2.50/hr). They released the training code and dataset, and 60k GPU hours is small enough that I could get it free via an ACCESS compute allocation, so even I could replicate a model of this size if I had a reason to (I don’t work in LLM training so don’t have a good justification to request the allocation though).
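For what it's worth, the back-of-envelope arithmetic behind that $150,000 figure can be checked directly (assuming round-the-clock GPU usage and the ~$2.50/GPU·hr cloud rate quoted above, which is an approximation, not an official price):

```python
# Rough training cost for OLMoE-1b-7b, per the figures in the paper:
# 256 H100 GPUs for ~10 days, at an assumed ~$2.50 per GPU-hour.
gpus = 256
days = 10
rate_per_gpu_hour = 2.50  # assumed cloud price, not a vendor quote

gpu_hours = gpus * days * 24  # 61,440 GPU-hours (~60k)
cost = gpu_hours * rate_per_gpu_hour

print(f"{gpu_hours:,} GPU-hours, ~${cost:,.0f}")  # 61,440 GPU-hours, ~$153,600
```

So "about $150,000" and "60k GPU·hours" are both consistent with the paper's numbers.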
It is a small ray of hope that individuals have never had the resources to train their own models entirely from scratch; it means there are still relatively centralized points of failure for the dispersal of this technology.
The best scenario I can presently imagine for the future of LLMs:
Through a combination of regulators siding with copyright holders - making companies training LLMs liable for their copyright infringements - and a general failure of genai purveyors to produce sustainable profit, corporate investment in the creation of new models cools and grinds to a halt. A lull in LLM-adjacent career advancement opportunities causes the endless stream of breathless blog posts about “how to use LLMs correctly” to dry up. LLM enthusiasts and spammers (to the extent there's a distinction) continue to play with their pre-baked open-weight local models, but as they grow increasingly outdated it becomes easier to identify the spam they produce and filter it from search results and browsing. The web heals.
Through a combination of regulators siding with copyright holders- making companies training LLMs liable for their copyright infringements- and a general failure of genai purveyors to produce sustainable profit, corporate investment in the creation of new models cools and grinds to a halt.
No copyright-based approach will stop Adobe Firefly or Getty/Shutterstock models, or any model that Disney decided to train on their massive backlog, because they already own the copyright to the entire training set.
Copyright, as you said, protects copyright holders, which is distinct from the individual people that actually created the works.
Sorry, I didn’t communicate what I was thinking too well. I think LLMs are here to stay. I was just pointing out that LLM use will probably decline when the true cost of using them is charged to the user.
In addition, the whole chain-of-thought thing is, if I understand correctly, improving model performance at the cost of making inference more expensive.
Even if OpenAI and Anthropic cease to exist, and Google/Microsoft stop working on LLMs, they will persist in some form
The thing I find interesting is that open source LLMs are not that far behind.
In contrast, consider search engines of the 2000’s – basically there were 5 web indexes: Google, Yahoo, Microsoft, Baidu, and Yandex (with Yahoo abandoning its index)
As far as I know, there was never a highly used European index, but there was desire – e.g. in France IIRC.
So that shows that search indices are hard to build.
But LLMs are not as hard to build. (I guess torrents of libgen help a lot)
I also think there is an incentive to train the chat bots to hallucinate. It’s similar to how there used to be Google queries that return 0, 1, or 2 results. But these days they try to show results for everything, reasoning that it’s money left on the table if you don’t
IOW I don’t think there was a technical reason you couldn’t train the chatbots to say “I don’t know” more often; I think it was more a product choice
And all that said, I do think there is a danger of people being less literate because of LLMs … but I think that already happened with search engines – “Google coding” is a real thing, and most people don’t go to the library. There is a lot of knowledge in older books that’s not on the Internet at all. (although again libgen-like archives have recently changed that)
IOW I don’t think there was a technical reason you couldn’t train the chatbots to say “I don’t know” more often; I think it was more a product choice
Everything I’ve seen so far seems to indicate the contrary. My perception is that if you solve this, you become instantly massively rich.
Indeed, because they don’t know that they’re lying. They don’t have a database of knowledge and a fallback for when things are not in that, they have probabilities that one token follows another.
Most of their value comes from not simply reproducing things in the training corpus but interpolating between n things arranged in an m-dimensional similarity space. Some such things will be correct (according to whatever metric you care about, which may be aesthetic rather than related to objective reality). Others will not. You can’t do this kind of interpolation without introducing nonsense at some points and without the nonsense and the useful outputs being indistinguishable without a priori knowledge.
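The geometry of that point can be shown with a toy sketch (the 3-dimensional vectors here are made up for illustration; real models use embedding spaces with thousands of dimensions): a point between two valid representations is itself a perfectly well-formed vector, and nothing in the space marks it as any more or less "true" than its neighbours.

```python
# Toy illustration: interpolating between two points in a similarity space.
# These vectors are invented for the example; they come from no real model.
a = [1.0, 0.0, 2.0]  # representation of one "known" thing
b = [0.0, 2.0, 4.0]  # representation of another

# The midpoint is a perfectly valid-looking vector that corresponds to
# nothing in the training data. Whatever it decodes to may be useful or
# may be nonsense; the geometry alone cannot tell you which.
mid = [(x + y) / 2 for x, y in zip(a, b)]
print(mid)  # [0.5, 1.0, 3.0]
```

The interpolated point is indistinguishable, structurally, from a point that encodes something real, which is the sense in which nonsense and useful outputs can't be told apart without outside knowledge.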
I am always very impressed because apparently, LLMs handle “translate the following sentence to <language>: <sentence>” very well.
I don’t think it is possible to have a system where that request works and which does not “hallucinate”.
Maybe you could have a knob on LLMs to account for both “exact fact questions” and “interpolate questions”. I am not very knowledgeable on the topic, but so far I’ve never seen anything that indicates any kind of progress in this area.
One of the reasons for this asymmetry is that your “translate…” example provides a direct reference to the model that is used verbatim as input for generating new tokens (aka it is “in context”), whereas factual questions query the billions of numbers that are a very lossy representation of whatever was used as the training data.
Yeah, that’s a big part of this. There are prompts like “tell me a story about an owl who loves cheesecake” which should clearly (to us) produce non-factual output, and questions like “list the names of the owls who have successfully baked a cheesecake” which shouldn’t. Distinguishing between the two from prompt language alone is a very tall order.
The best LLMs right now - Claude 3.7 Sonnet, Gemini 2.5, the latest GPT-4o upgrade from last week with a very confusing lack of any version number at all - are mostly very good at answering “I don't know” when unable to answer a question.
They still hallucinate but it’s impressively rare compared to just a year ago.
Which one would you recommend using to benchmark this? (Both in their quality and how much would it cost me to play with them? Ideally I would like to run the LLM on my hardware, even if it’s slow, but if no model that is good at saying “I don’t know” can be self-hosted, I’m good with using a hosted service.)
And most importantly, if I manage to make the one you recommend hallucinate easily, would you be open to signal boost that?
That’s definitely the big challenge with local models: some of them are getting really good now, but they’re still massively less powerful than the huge proprietary monsters.
My favorite local models at the moment are Mistral Small 3.1 24B and Gemma 27B - both released in the last two weeks - but I’ve not spent enough time with either of them to be able to say how good they are at rejecting questions they don’t know the answers to.
I’ve also not seen anyone put together a benchmark that addresses the “can it say no if it doesn’t know the answer” problem, though it’s possible there’s one out there I haven’t seen yet.
If you recommend me a non-local model, I’m open to testing that too.
The key is:
And most importantly, if I manage to make the one you recommend hallucinate easily, would you be open to signal boost that?
I wouldn’t, for a couple of reasons. Firstly, I don’t like agreeing to conditional “I’ll boost X if Y” deals under any circumstances - my editorial integrity is very important to me, so I avoid situations where someone else has influence over my decision to publish something (same reason I don’t do paid promotional posts).
Secondly, I am 100% sure that anyone can get any model to hallucinate - and I think that will be true for a very long time. “Hallucinating” is part of what makes models useful, as seen in my “tell me a story about an owl who loves cheesecake” example above.
When I'm teaching people to use models, one of the first exercises I encourage them to do is to find a prompt that causes a model to confidently lie to them, to help illustrate that these things don't “know” facts about the world with 100% accuracy. A good one is asking for biographical details of someone who is “Internet famous” enough that models know bits about them, but who isn't an actual celebrity. Lots of models think I was CTO of GitHub for some reason (I've never worked for GitHub.)
Thanks for your answer.
But really, your last sentence is rarely (if ever) heard aloud outside the anti-LLM camp. You hear “make sure that you validate what the LLM says if it’s important for it to be correct”, but what you are saying is completely different, and IMHO you need to understand it to use an LLM properly and to have reasonable expectations about what LLMs can do today and short-term.
You can chalk up to human laziness the malpractice we are seeing, which some people suffer in their own flesh, but I feel people just do not understand that hallucinations are intrinsic to our current LLMs, and people who recommend the use of LLMs should make this abundantly clear.
See also: leaded gasoline, freon refrigerant, asbestos insulation in buildings; all with their own material advantages, but with unacceptable negative externalities to the environment and public health.
(If anything I’d say the above comparisons are being much too generous to the utility of LLMs.)
Sure, but gasoline, refrigeration, and insulation still exist
I don’t think the current crop of LLM is what will persist forever – there are huge flaws – but LLMs will persist in some form. I also don’t have confidence in the industry to do well with the technology
The more accurate comparison is probably cars/gasoline. Cars had absolutely huge negative externalities, and a bunch of positive economic effects. Americans overused cars; we overbuilt highways. We also antagonized/invaded much of the world for energy
Well, when I put it that way, it is a bit depressing. But I don’t think the future is set in stone …in theory, LLMs can be “steered” toward something that is healthy … again open source LLMs give some hope
LLMs don’t cost as much to make as Google or Facebook, and that cost is one reason for the sorry state of the tech industry
I like much in this take. It irritates me when someone says “but it just does X” and then dismisses it.
I believe that ultimately what humans do is the result of a lot of “just” stuff too - not the same as an LLM, but when you look at how it works carefully enough you can see the underlying mechanisms. And yes, the synthesis of tricks humans use is much more capable in many ways than the result of the tricks an LLM uses. (though in other ways an LLM can do things a human can’t do) But we can’t just dismiss something because we think we can see the mechanism.
I’m fascinated by the capabilities of LLMs and I think they can be useful tools.
On the other hand I think there are many problems LLMs are causing and probably a lot more problems are coming up soon. Information pollution is real.
Completely overblown hype is real too. A phenomenon that fascinates me is a bunch of CEOs claiming their AI is going to automate everybody away in the next five minutes and at the same time implying everything else will remain the same: they’ll remain CEOs, their companies will just continue, and society is fundamentally unchanged. I don’t know whether that’s a deep flaw in their imagination or whether they know they’re bullshitting.