Automating the Rubber Stamp: What If an Agent Ran Your Deployment Gate?
2 points by vakradrz
I don't like this.
You've identified a problem: the humans aren't thinking about what our process says they should think about. And your solution is to make sure they can't.
A better use of automation, here, IMO, would be to use it to gather the metrics that were on the checklist, and present the gathered metrics next to the checklist items.
Also, tell people to stop approving this from the car park or the changing room at the swimming pool, or whatever, and only sign off when they can reasonably evaluate the presentation.
That would help the humans do the evaluation we've been expecting them to, rather than just deciding that it doesn't matter.
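To make that concrete, here is a rough sketch of putting the gathered numbers next to the checklist items. The item fields (question, metric, unit) and the fetch_metric callable are hypothetical names of mine, not anything from the article; the point is only that the automation fills in the evidence and the human still does the judging.

```python
def render_checklist(items, fetch_metric):
    """Return one text line per checklist item, paired with its current metric.

    `items` and `fetch_metric` are hypothetical: each item is a dict with a
    'question', a 'metric' name, and an optional 'unit'; fetch_metric(name)
    returns the current value or raises if it cannot be gathered.
    """
    lines = []
    for item in items:
        try:
            value = fetch_metric(item["metric"])
            shown = f"{value} {item.get('unit', '')}".strip()
        except Exception as exc:
            # Show the gap instead of hiding it, so the reviewer knows
            # this item still needs a manual look.
            shown = f"unavailable ({exc})"
        lines.append(f"[ ] {item['question']}  ->  {shown}")
    return "\n".join(lines)


if __name__ == "__main__":
    checklist = [
        {"question": "Is the error rate acceptable?", "metric": "error_rate", "unit": "%"},
        {"question": "Is latency acceptable?", "metric": "p99_latency", "unit": "ms"},
    ]
    gathered = {"error_rate": 0.4}  # made-up value for illustration
    print(render_checklist(checklist, gathered.__getitem__))
```

The checkbox is deliberately left unticked: the script gathers and presents, and the sign-off stays with whoever is reading it.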
The system prompt that “stops asking for an opinion” still seems unnecessarily error-prone.
Run every gate check tool exactly once. Then:
- If every check returned pass: true, call approve_gate with […].
- If any check returned pass: false, call reject_gate. Quote the failing check's numbers in the comment. […]
- If any check errored or returned incomplete data, call page_oncall and stop. […]
You do not have an opinion on whether the release is safe. The checks do.
If you already have a system that can implement deterministic checks in plain Python, you can use Python to implement much of that system prompt. The Python can reliably run every gate check tool exactly once, decide which if condition applies, and then use the model only to write the summary comment. For example, a Python script might implement the second bullet point like this:
    for result in test_results:
        if not result['pass']:
            llm_explanation = call_llm(
                test_results,
                "The proposed production release will be rejected because at least one of these checks returned a rejection. "
                "Write a comment explaining the reason for the rejection. Quote the failing checks’ numbers in the comment."
            )
            reject_gate(llm_explanation)
            break
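Extending that to all three bullets, a minimal sketch might look like the following. Everything here is an assumption on my part: the checks are zero-argument callables returning a dict with a boolean 'pass' key, approve_gate and reject_gate each take a comment string (the prompt elides what approve_gate is actually called with), and call_llm(data, instruction) returns text. The prompt also doesn't say what happens when one check fails and another errors; I've assumed paging wins, and that an empty check list should page rather than approve.

```python
def run_gate(checks, approve_gate, reject_gate, page_oncall, call_llm):
    """Run every check exactly once, then take exactly one gate action.

    All five parameters are hypothetical callables standing in for the
    tools named in the system prompt.
    """
    results, problems = [], []
    for check in checks:  # the loop guarantees each check runs exactly once
        name = getattr(check, "__name__", repr(check))
        try:
            result = check()
        except Exception as exc:
            problems.append(f"{name} raised {exc!r}")
            continue
        if isinstance(result, dict) and isinstance(result.get("pass"), bool):
            results.append(result)
        else:
            problems.append(f"{name} returned incomplete data: {result!r}")

    # Third bullet: any error or incomplete result pages a human and stops.
    # An empty result set is treated the same way (assumption), so the gate
    # never approves on the strength of zero checks.
    if problems or not results:
        page_oncall("; ".join(problems) or "no gate checks were run")
        return "paged"

    # Second bullet: any failure rejects; the model only writes the comment.
    failed = [r for r in results if not r["pass"]]
    if failed:
        reject_gate(call_llm(
            failed,
            "These release checks failed. Write a comment explaining the "
            "rejection and quote each failing check's numbers.",
        ))
        return "rejected"

    # First bullet: every check passed.
    approve_gate(call_llm(results, "Summarize these passing release checks."))
    return "approved"
```

The model never sees a decision to make here. By the time call_llm runs, the branch has already been chosen in plain Python, so the worst it can do is write a bad comment.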
Calling out this blog for being awesome - when I printed it, the graphics switched to white backgrounds to match the paper instead of stubbornly using extra toner.