Language Models as Thespians

9 points by jstrieb


dz4k

It might be more accurate to compare LLMs to script writers, but comparing them to actors is easier to understand and discuss.

I think the script-writer comparison is not only more accurate, but also shines more light on how AI assistants can generate usually-correct statements even though they don’t “know” anything.

When you’re talking to, say, ChatGPT, you’re not talking to the LLM. You’re contributing dialogue to a story that the LLM is writing. These story generators are usually tuned to produce one kind of story, slice-of-life fiction about the characters User and Assistant.

If Assistant provides genuinely correct responses, it makes a better story, so the story generator usually learns to write correct lines for it. But correctness is only ever an instrumental goal, so sometimes other goals get prioritized (e.g. Assistant needs to be portrayed as smart and helpful, so it shouldn’t say “I don’t know”). And other times, the subject of an episode is just too complicated for the robot author to understand, and it resorts to taking “creative liberties”.

orib

If you want a comparison, an LLM is someone trying to bullshit their way through an interview. I’ve had interviewees know enough to sound plausible, but not enough to be consistently right. However, they’re terrified that if they admit they don’t know, they won’t get the position, so they make stuff up and hope you don’t notice.

LLMs don’t have motivations, but they have the same tendency to bullshit.

peniblec

I enjoyed the “finding the most useful metaphor” angle; as a fellow “designated computer guy & self-proclaimed skeptic” it’s a question I expect I’ll be pondering for a while still, so I always appreciate reading people’s takes on it.

Only bit that gave me pause:

The web developer persona prompt generated better output than the plain request.

Did it? 😐 The plain prompt produced

while the webdev-persona prompt produced

Granted, (a) I’m no marketing expert (b) the persona’d prompt aimed for “simultaneously trendy and timeless” and “memorable”, so the persona output surely is better depending on the metric 😶