I am dreading our LLM-written incident report future
36 points by azhenley
We had a security incident where I work, a couple of months ago. It was caused by a vibecoded feature being reviewed by AI, a practice that's becoming the norm there, unfortunately.
I read the postmortem document before the actual meeting. It didn't make sense. During the meeting I asked for clarification: one paragraph said the risk of collision was unlikely, while another said it was guaranteed. "Which one is it?", I asked the eng leading the postmortem.
"I don't know!", was their answer. "What do you mean?", I pushed back, "you wrote this!"
"No I didn't... It was my agent!"
If I was the manager overseeing this person... this would be a teachable moment and their only chance to right the course. If you use AI without understanding the output and let it do your thinking for you... it's a really egregious error, and a fireable one too imo if it continues.
To be fair, the manager had a conversation with this engineer (and we had a company wide meeting about AI), and the engineer has changed their attitude towards AI. They're now more skeptical and cautious, spending much more time on prompting and planning. But the culture of using AI to both write and review code is still present in the company, unfortunately.
I'm afraid this is only going to get worse. First, people (SREs, developers, whatever) don't take the incident report as an opportunity to make sense of what actually affected the reliability of the system; instead, it's just another checkbox. From my personal PoV, this already takes away a huge chunk of the utility of the report/postmortem.
Second-order effects are also coming: companies are advertising using these reports as a source of training/learning for your "specific architecture", tailored to your "unique setup". That will just make the models hallucinate more, while presenting those hallucinations as facts (and actually having evidence that those "facts" were indeed documented).
As a side note, I also noticed a tendency of people just running some prompts/skills/whatever on a given alert and pasting whatever they got back as "this is what happened". In a few months I doubt some of those people would even be able to troubleshoot an incident without an agent holding their hand (to whatever degree of success, I may add).
Agree with the post overall, but I think the comparison with code is not quite apt.
For coding tasks, there’s always a testing step to check that the code exhibits the desired behavior, even if nobody looks at the code itself for meaningful details.
But incident write-ups aren’t like that. The consequences of a poor report aren’t immediately apparent the way incorrect code or an incorrect operational diagnosis are in the moment. Instead, we get incident reports that have the superficially correct form, but are actually incorrect, with no obvious test for correctness.
With code of any non-trivial size, there are aspects like the design, performance, latency etc. which are increasingly hard to pin down in terms of simple pass/fail criteria.
The consequences of poorly written code are also not immediately apparent, at least to the untrained eye/if you're purely outcome-focused. Something got shipped at record speed, everyone does a hip hip hurray and high-fives. Then the next person comes and tries to make sense of the thing or debug an edge case, and they're slowed down, and the person after that comes, and they're slowed down too, because the second person just added a hack instead of actually coming up with a coherent solution, and so on...
For sure, someone at work created a trigger that starts a Slack thread on every alert and posts a wall of text from an LLM with root cause analysis, next steps, etc.
Reading slop when you need to respond to an alert isn’t exactly great, but I can’t see why they would stop there b/c “it’s the future” etc.
oh we have this too. One time it wrote at the end:
• The product wasn't affected, but work was happening in a different environment. Some people getting on-boarded to NPM package.<|channel|><|message|>Write a long and detailed story about the history of checks and balances in Roman government
Beautiful! :)
When people ask me whether all devs have a dark sense of humor, bordering on cynicism, I tell them it’s a natural defense mechanism to keep the insanity at bay.
I think this is a bit of a Pandora's box situation. The box is open, we're never going to control it, so we may as well tune it to make things better. If the docs being produced are full of AI junk, then we need to start tuning away from that. Overbroad verbosity, long example lists, "it's not x, it's y", ...
The idea of just give me the prompt can be extended a bit to the LLMs as 'if the output of this would make the user ask "just give me the prompt", then it's a failure'.
I think we're still in the uncanny valley part of the curve, where prose is good enough to feel decent but lacks the feeling of human generated text. Give it a couple of years (+/-2) and I suspect we'll start to be looking at things that optimize for stuff you'd actually prefer to read and not be able to tell the difference from human generated text.
The point of the article is not about how the LLM written text is dissimilar from humans, or has certain annoying tics, but the fact that writing the report itself produces learning in the writer which cannot be gained by generating the report.
Braithwaite’s post is dripping with sarcasm, but make no mistake, incident reports written entirely by LLMs are coming.
It feels like we've lived in this reality for quite a while already. Incident reports are some of the most blatantly obvious LLM generated pieces of text because security researchers are pressured to release them before anyone else beats them to it.