The End of Shouting: Prompts as Programs in GPT-5
6 points by youngbrioche
Before/After: From Shouting to Rules
I'm having a hard time reading guides like this without hard data from experiments. GPT-5 has an instruction-following (IF) score of around 70%, so it's still far from perfect. If someone spends so much time writing up the guide and rules, why not prove it makes a difference? Even better, run some tests through DSPy to really nail it. Right now it's still "this is the vibe I'm getting" territory, unfortunately.
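(Illustrative only: a minimal sketch, in plain Python rather than DSPy, of the kind of A/B test this comment is asking for. The two prompt variants, the test questions, the pass/fail check, and the "gpt-5" model name are all assumptions, not anything from the article.)

    from openai import OpenAI

    client = OpenAI()

    # Two styles of the same instruction: "shouting" vs. a plain rule.
    VARIANTS = {
        "shouting": "ANSWER IN EXACTLY ONE WORD!!! NO EXTRA TEXT!!!",
        "rules": "Answer with exactly one word. Do not add punctuation or explanation.",
    }

    QUESTIONS = [
        "What is the capital of France?",
        "What colour is a ripe banana?",
    ]

    def follows_rule(answer: str) -> bool:
        # The instruction under test: the reply must be a single word.
        return len(answer.strip().split()) == 1

    def score(system_prompt: str) -> float:
        hits = 0
        for q in QUESTIONS:
            resp = client.chat.completions.create(
                model="gpt-5",  # assumed model name
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": q},
                ],
            )
            hits += follows_rule(resp.choices[0].message.content or "")
        return hits / len(QUESTIONS)

    for name, prompt in VARIANTS.items():
        print(f"{name}: {score(prompt):.0%} of replies followed the rule")

A real eval would need far more cases and repeated runs per case, but even a toy harness like this turns "this is the vibe I'm getting" into a number you can compare across prompt variants.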
The era of “shouting”—ALL CAPS, exclamation marks, redundant pleading—is over. Clear rules, explicit policies, and structured instructions now rule the day.
Did caps and exclamation points ever make a difference?
I’d like to see links to the cited studies, and I’d like to see the prompt in question tested against other models (including earlier gpts). I think all models that have been out for the last 6 months would do well with such a comprehensive prompt.
That said, I very very much appreciate worked examples with an analysis of what the author is trying to do with it.
Yes, all caps make a difference, and so does asking for "professional" or "production-ready" code. It is weird, but it also makes sense.
Welcome back, INTERCAL:
(...) and modifiers such as "PLEASE". This last keyword provides two reasons for the program's rejection by the compiler: if "PLEASE" does not appear often enough, the program is considered insufficiently polite, and the error message says this; if it appears too often, the program could be rejected as excessively polite.
- A strict output contract. The prompt ends with a non-negotiable output format: JSON with defined fields. No “pretty please output JSON”—just: “Respond only with a JSON object in this structure.”
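(For illustration only: a sketch of what such a prompt-level contract, and the defensive parsing you still need around it, might look like. The field names and wording are made up, not the article's actual prompt.)

    import json

    OUTPUT_CONTRACT = """
    Respond only with a JSON object in this structure:
    {"verdict": "approve" | "reject", "reasons": ["<string>", ...]}
    Do not wrap the JSON in markdown or add any other text.
    """

    def parse_reply(reply: str) -> dict:
        # The contract is stated in the prompt, not enforced by the API,
        # so validate the shape before trusting it.
        data = json.loads(reply)
        assert data.get("verdict") in ("approve", "reject")
        assert isinstance(data.get("reasons"), list)
        return data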
Has the author (or anyone?) had the experience where these kinds of concrete directives, expressed in prompts or instructions or whatever, are actually and consistently satisfied? Personally I've yet to figure out a way to express any kind of inviolable rule that survives more than a few round-trips in a session, regardless of model...
The instructions have varied success, but if you're specifically after the output format, that's the wrong approach anyway. There's https://platform.openai.com/docs/guides/structured-outputs which is strictly enforced.
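(A minimal sketch of that structured-outputs route, using the Pydantic helper in the openai Python SDK; the schema, the messages, and the "gpt-5" model name are illustrative.)

    from openai import OpenAI
    from pydantic import BaseModel

    class Verdict(BaseModel):
        verdict: str
        reasons: list[str]

    client = OpenAI()

    completion = client.beta.chat.completions.parse(
        model="gpt-5",  # assumed model name
        messages=[
            {"role": "system", "content": "Review the submitted text."},
            {"role": "user", "content": "Ship it?"},
        ],
        response_format=Verdict,  # schema enforced by the API, not by pleading
    )

    result = completion.choices[0].message.parsed  # a Verdict instance, not free text

Because the schema is enforced server-side, the malformed-JSON failure mode of a prompt-only contract largely goes away.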
any kind of inviolable rule
You’re still in non-deterministic country. Don’t get me wrong: you won’t get deterministic results when it comes to instruction following, but the tendency towards reliable instruction following with 5 Thinking is so much higher than with any other model I’ve used to date.
You’re still in non-deterministic country ... the tendency towards reliable instruction following with 5 Thinking is so much higher than with any other model I’ve used to date.
Sure, no argument! But that's not really the question. The question is: how does one get out of non-deterministic country and into deterministic country when interacting with a model? Is it even possible to do via instructions/prompts/input/etc.?
Wow! That Full System Prompt is way too much work! I wonder if I can write a prompt to write my prompt for me?
That might seem flippant, but I'm serious. Metaprogramming is already a concept in this industry, so how long until we get articles about prompting for prompts? And how long until prompts become the source code?
Prompts will never become the source code until they generate perfect, equivalent programs every time.
As for metaprompting, this is essentially what agentic tools are already doing, especially as they encourage the user and the LLM to work with todo lists.
I was pretty skeptical of LLMs writing prompts for other LLMs last year, because I didn't think there was enough established good advice on prompting in the training data for them to do a good job.
My opinion has changed this year: Claude 4 and GPT-5 both feel like they can do a decent enough job of prompting now, so I've started trying them for a few meta-prompting things.
I might have a lengthy conversation with GPT-5 to flesh out the idea for a software tool, then ask it to write a prompt describing the work to build the tool. Then I'll paste that into Claude Code and see what it builds.
Claude also frequently writes prompts for itself. If you tell Claude Code to "use subagents" to solve a task, it will fire off several new instances of itself, each with a fresh context window and a Claude-generated prompt. It's quite effective!
Author here. +1 to what @simonw is saying. The system prompt in the post is the actual production system prompt of the app. The current version is the result of a long (and messy) process of lots of iterations. Mostly human, with machine feedback (mostly GPT-5 Thinking, and their Prompt Optimizer).