Prompting 101: Show, don’t tell
43 points by Gabriella439
Fairly good advice. It's also an example of good writing advice in general. Write for the prompt you want; if you want an expert sysadmin then give them part of a Linux kernel log and systemd output which is realistic but tailored for the scenario at hand. If you want them to ask for permission before doing things then don't put "username: root" in there, and pick usernames which are likely to be matched to the behaviors that you want; you can encourage them to sudo or doas instead by giving an example of proper usage. If you want them to think that they work for an elite corporation then display a banner which matches the ethical and ideological values that such a corporation would embody, and you should ensure that the details are likely to immerse the reader.
You can one-shot the example behavior (conversational, short lines, light punctuation) by prompting for e.g. an IRC conversation on any non-RL'd local model of the past 3yrs. This means that the model's context starts with the IRC client header and is followed by a fake OFTC/Freenode/etc. banner, a fake /join, a fake /topic, and synthesized timestamps before every message. After a fake /names, the model can be grammatically restricted to use only the usernames listed there, and a hard token cutoff can be used to interrupt run-on statements just like real IRC. We only enter the room with whatever we choose to bring into it.
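A minimal sketch of that kind of framing in Python, assuming a raw local model with a plain text-completion interface; the server banner, channel, nicknames, and timestamps below are all invented for illustration, and the generate() call at the end stands in for whatever completion API you actually use:

```python
# Sketch: assemble a fake IRC transcript as the model's context so that its
# completion continues the conversation in the same register.
# Server, channel, and nicknames are invented for illustration.
from datetime import datetime, timedelta

def irc_context(channel, nicks, topic, seed_lines):
    t = datetime(2023, 4, 1, 21, 14)
    lines = [
        "*** Welcome to the OFTC Internet Relay Chat Network",
        f"*** Joining {channel} ...",
        f"*** Topic for {channel}: {topic}",
        f"*** Users on {channel}: " + " ".join(nicks),
    ]
    for nick, msg in seed_lines:
        t += timedelta(seconds=41)                 # fake but plausible timestamps
        lines.append(f"[{t:%H:%M}] <{nick}> {msg}")
    t += timedelta(seconds=41)
    lines.append(f"[{t:%H:%M}] <")                 # leave the next line hanging for the model
    return "\n".join(lines)

prompt = irc_context(
    "#sysadmin",
    ["operator", "kernelhacker", "newbie"],
    "production incidents | ask before you touch prod",
    [("newbie", "the box is throwing ext4 errors again"),
     ("kernelhacker", "paste dmesg | tail -n 20")],
)
# completion = generate(prompt, stop=["\n"], max_tokens=60)  # hypothetical local-model call
```

Constraining the allowed nicknames and stopping on newlines does the rest: the model can only speak as one of the participants it was shown, one short line at a time.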
Similarly if you want an LLM to follow a certain format for output, you get way more accurate results if you show it an example input/output first.
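Even a single worked input/output pair usually beats a paragraph of format instructions. A minimal sketch, with an invented order-to-JSON task standing in for whatever format you actually need:

```python
# Sketch: prepend one worked input/output pair so the model copies the format
# instead of guessing at it. The task, field names, and example record are
# invented for illustration.
example_input = "Order #1234, 2 x widget, ship to Berlin"
example_output = '{"order_id": 1234, "items": [{"sku": "widget", "qty": 2}], "destination": "Berlin"}'

real_input = "Order #5678, 5 x gasket, ship to Oslo"

prompt = (
    "Convert each order line to JSON.\n\n"
    f"Input: {example_input}\n"
    f"Output: {example_output}\n\n"
    f"Input: {real_input}\n"
    "Output:"
)
# response = your_model(prompt)  # hypothetical; swap in whatever client you use
```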
I can't shake the feeling that vibecoding (or even using LLMs when programming) is more akin to magic than engineering. Or maybe alchemy. A lot of experimenting with different methods, often ritualistic in nature, in order to get the desired outcome, and it's very hard to quantify which method actually works. "Oh, be mean to the AI to get better results." "No, it'll just turn passive aggressive on you." "Easier to read languages work better with AI." "No, those with less tokens work better." "It's neither!" "You can get AIs to code in any language." "It works better with a popular language as there's more examples."
It's magic!
This is not engineering.
I mean, a lot of that also sounds like the debate over what the best programming language is, or the best design patterns, or the best git commit messages, or performance engineering, or...
That's not to say it's necessarily good, but there's always been a huge part of software development that's very squishy and difficult to quantify.
"Engineering" is not synonymous with "determinism." In fact, it can often involve trying to take something non-deterministic or unknown and constrain it in order to get certain benefits or reduce certain risks.
Alchemy eventually turned into chemistry. But that took time, effort, and learning.
We are at an incredibly early stage with these tools. That doesn't mean you can't do engineering with them, it means that we are still figuring things out. That takes time and effort. And a lot of debate!
Alchemy eventually turned into chemistry.
However, Astrology didn't become Astronomy. Not all such efforts turn into something useful.
Astrology didn't become Astronomy.
It absolutely did.
People building telescopes and figuring out the math of stellar movement led to astronomy, but generally those were not the same people as those engaged in astrology, e.g. making predictions and giving advice based on stellar movement.
This is in contrast to alchemy, where the people trying to create gold and all that were also the people who figured out how stuff worked and eventually created a science.
Nonetheless, Kepler spent a huge amount of time trying to restore astrology on a firmer philosophical footing, composing numerous astrological calendars, more than 800 nativities, and a number of treatises dealing with the subject of astrology proper.
(This is a Wikipedia quote, it matches secondary sources I know)
«Building telescopes» excludes even Tycho Brahe from astronomers, as he died just a few years before the invention of the telescope. He also sometimes earned various forms of compensation through astrological services.
Your description is generally looking a bit too late in time for the claim in question.
I didn't mean that every astronomer built telescopes. Kepler and Brahe I would describe as being both astronomers and astrologers. But in general some people did astronomy without doing astrology, and many people did astrology without doing astronomy.
I think the scaling up of celestial measurements in Renaissance pre-Kepler Europe was driven mainly by the fact that high-prestige astrology was actually supposed to include observational astronomy. Then prestige considerations allowed people interested in both astronomy and astrology to push for ever-increasing detail and precision; until it got precise enough for Kepler to build a workable and quite precise — unlike the Copernicus version — description of orbits as things in the world outside Earth and not just in the Earth's heaven.
As for low-quality astrology work for scraps, not moving anything forward — which large-scale discipline with commercial income streams is free of that?
Some powerful people were curious specifically about astronomy even pre-Brahe, and also moved it forward; but the stable funding stream for measurement improvements beyond what is practical on a ship seems to be astrology.
After the creation of modern astronomy by Kepler the situation surely changed, though.
Maybe a more fruitful view is that today, there is only chemistry/physics, and no more alchemy.
But we still have people who practice and believe in astrology. It has not been utterly subsumed by astronomy.
Although perhaps the widespread skepticism of modern medicine and stuff like electromagnetic radiation can be derived from the same impulses that fed alchemy.
True all that.
And some parts of pharmacology might still be a bit of «doing alchemy better», in the sense of caring about properties that we cannot predict, and screening huge swaths of ideas blindly. I guess some of the human-effect-oriented alchemists would approve of how we learned to scale up and systematise those efforts.
… and maybe even recognise some of dietary-supplement stuff going on nowadays as straight alchemy.
But also, by now we literally know what is needed to turn lead into gold (not that the methods we have would scale, but we also know why wide classes of more scalable methods definitely don't work).
Predicting character traits at birth? It's not just that astrology promised it and we still cannot do it, we don't even know whether we will get usefully better at it within the next twenty years.
And maybe, just maybe, not shoving LLM use down the throats of those that don't want to use them. IDEs didn't become popular because of mandates from CEOs that programmers must use them, but bottom up from developers finding them actually useful. But AI? LLMs? YOU MUST USE IT OR BE FIRED!
Sorry, I'm just angry at the notion that I'm stupid for not wanting to use LLMs for programming. Why should it matter how I do my job as long as I do my job?
Where did I say any of those things?
Steve, you keep forgetting that while you are asking people to keep an open mind and engage in reasonable debate while working for (1) the single most desired employer in tech right now followed by (2) a startup with your friends, most of your peers work for shitty bosses at shitty companies where the CTOs are handing down AI mandates as quarterly OKR goals and the shitty bosses are now hassling your peers about using Grok more. Your request for open-mindedness and reasonable debate implicitly and inevitably is on the side of the bosses. So while you may never say those things, because they are alien to your experience, you’re on the side of the bosses every time you ask people like spc to keep an open mind about prompt engineering. Sorry!
I’m aware that lots of people in tech have issues with their jobs. Yes, there’s a lot of bad and stupid things going on out there.
It just reads like a non-sequitur. I never said that I think shoving LLMs down people’s throats is a good thing, or that I think anyone is stupid for not using them. If those things were the topic of discussion, I’d agree that all of that is bad. But I don’t see how it’s relevant to a discussion of “can you do engineering with LLMs?” Nothing in my parent comment talked about a job, or being forced to use an LLM, or being called stupid.
You are right, my reply was a non-sequitur, but having thought about how to reply for most of the day, I would have to say that it's not possible to do engineering with LLMs. Think of this as a reply to both you (@steveklabnik) and @k749gtnc9l3w.
Nature does not change. Physics, chemistry, astrophysics, they don't change. They are knowable and discoverable. Science (to me) is the reverse-engineering of nature. Engineering (again, to me) is the application of this reverse-engineering to allow us to do more. Yes, a scientist might not have a clear idea of what will and won't work when setting up an experiment, but the results can be replicated (what's the saying? "The difference between fooling around and science is writing down the results."). That is the basis of both science and engineering---replicating results.
You don't get that with LLMs. LLMs change. What worked when ChatGPT first came out doesn't with the latest version. That's because the underlying nature of ChatGPT (and other LLMs) changes over time. You can't replicate results with ChatGPT, and this is demonstrable. Start with a clean slate LLM. Record the session with the LLM until you get a result. Then start again with a clean slate LLM. Rerun your session with the same starting prompt, and see how long until it diverges.
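A sketch of that replication test, with run_session() as a placeholder for whichever model or API is under test; the example prompt is invented:

```python
# Sketch of the replication test described above. `run_session` is a
# placeholder for whatever model or API you want to test.

def divergence_point(a: str, b: str) -> int:
    """Index of the first character at which two transcripts differ."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))

def run_session(prompt: str) -> str:
    """Placeholder: start a fresh session, send `prompt`, return the full transcript."""
    raise NotImplementedError("call your model of choice here")

# prompt = "You are a sysadmin. Diagnose the attached dmesg output."
# first, second = run_session(prompt), run_session(prompt)
# print("transcripts diverge after", divergence_point(first, second), "characters")
```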
It is not replicable. How is this engineering?
And that, to me, explains the wide variance in experiences with LLMs---what works for you can't be replicated by me. That is not engineering. That is fooling around, even if you write your results down. It's because of the changing nature of LLMs. At best, you can reverse-engineer a single instance of an LLM. But that's it. When it's updated, how it works changes. That's why there's this ever evolving way of interacting with LLMs. Again, nature doesn't change, that's how we can understand it.
I'm struck with the notion of insanity---doing the same thing over and over again, expecting different results. Yet, that's the opposite of LLMs---insanity is doing the same thing over and over again, expecting the same results.
I would have to say that it's not possible to do engineering with people.
People change. What worked with one colleague doesn't with the next. That's because the underlying nature of people (and other species) changes over time. You can't replicate results with people, and this is demonstrable. Start with one person. Record a conversation with them until you get a result. Then start with another person. Rerun your conversation with the same starting prompt, and see how long until it diverges.
It is not replicable. How is this engineering?
And that, to me, explains the wide variance in experiences with people---what works for you can't be replicated by me. That is not engineering. That is fooling around, even if you write your results down. It's because of the changing nature of people. At best, you can reverse-engineer a single instance of a person. But that's it. When it grows, how it works changes. That's why there's this ever evolving way of interacting with people. Again, nature doesn't change, that's how we can understand it.
Apologies, but that is the first thing I think of when seeing these kinds of arguments against the usefulness of non-deterministic LLMs. There are many ways this can be useful while building software. If I ask an LLM to fix a bug and it gets it wrong, then I can ask again, with or without changes to my initial prompt, and re-roll for a new solution.
I of course observe this in my own behavior when I am struggling to implement an algorithm, sometimes for days on end, until one morning it "clicks" and I come up with a solution. I had a particularly nasty one last summer where I had a strict deadline, blocked many people, and needed to rewrite some particularly complex code (parsing generic signatures was the easy part!). I struggled for quite a while, with many failed attempts that had exponential blow up on the inputs or didn't handle all of the edge cases. Am I "not doing engineering" because the random perturbations of chemicals in my brain decided not to reveal a coherent solution until the second week of me working on it? This definition feels far too rigid.
"These are lotto numbers that worked for me." One person's success is not guaranteed to be repeatable.
Wait, what engineering advance has not been about experimenting with different methods without a clear idea of what could work? All I have heard about the development of electric lighting, or usable rubber, or whatever, has been exactly that. (And on a smaller and more personal scale, some of the dirty data crunching I had to do was also exactly that.)
I’ve had similar successes when doing very large, repetitive changes to a codebase. Instead of asking the agent to do it with a text prompt, I do one of the changes myself and save the diff to a file. Then I tell the agent to read the diff and replicate that same change wherever else is necessary (as determined e.g. by a sample grep or the results of failing tests).
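A rough sketch of that workflow in Python; the grep pattern and the final hand-off to the agent are invented for illustration:

```python
# Sketch of the diff-as-example workflow: make one change by hand, capture it,
# and hand the agent the diff plus the list of remaining call sites.
# The grep pattern and the agent hand-off are invented for illustration.
import subprocess

# 1. One change was already made by hand; capture it as the worked example.
diff = subprocess.run(["git", "diff"], capture_output=True, text=True).stdout

# 2. Find the other places that still need the same change.
remaining = subprocess.run(
    ["git", "grep", "-l", "old_api_call("], capture_output=True, text=True
).stdout

prompt = (
    "Here is one example of the change I want, as a diff:\n\n"
    f"{diff}\n\n"
    "Apply the same change to every remaining occurrence in these files:\n"
    f"{remaining}"
)
# send `prompt` to your coding agent of choice
```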
Interesting article. However, if you say to write Google/Meta/NASA production level code, the AI will "know" exactly what that means as there's plenty of training data on specifically that.
Tickled that you illustrated with lots of superfluous “, like…”-constructs and it dutifully incorporated them into the results.