Amazon holds engineering meeting about GenAI based outages
18 points by dustyweb
I enjoy the engineers trying to avoid responsibility by saying "the LLM made a mistake and caused the outage". My people, your job is to test everything you write. I don't care if you had an LLM help you or not: if you deploy code and production goes down, it is your fault.
Well, it's tricky because upper management is also pressuring people to use these tools as much as possible at many of these companies. "Don't get left behind!" They want to tell their investors that they are ahead of the AI curve. You can see various companies bragging about it.
But the reality is that writing code is not the hard part of engineering, reviewing it is. And we have known since Ka-Ping Yee's dissertation on voting machines that even the smartest engineers struggle to find vulnerabilities and bugs in code upon review, including maliciously inserted ones, even in incredibly simplified examples.
And so, we are pushing to turn people's jobs into the thing they are worst at, with the highest fatigue: reviewing code they didn't write that looks incredibly plausibly correct.
So I would have a hard time blaming the engineers for the situation. It's coming from top-down, and we're going to see consequences from it all over our industry, I think.
It's a shame the real solution--eliminating layers of useless management--is rarely, if ever, on the table. I see this in colleges, too, where the administrators are plentiful and overpaid while getting in the way of the actual work of educating. Good engineers are tempted to move up into high-paying managerial roles just like good teachers are tempted to move up into high-paying administrator roles. We'd be better off if the people doing the fundamental work were paid the most and encouraged to grow in the roles they belong in.
I can't see the article... it just takes me to a "Subscribe" modal instead.
The Financial Times has a pay/regwall; flagging this as "broken link".
Hm, ok I will submit a different article
Unfortunately the only other article I can find is https://www.tomshardware.com/tech-industry/artificial-intelligence/amazon-calls-engineers-to-address-issues-caused-by-use-of-ai-tools-report-claims-company-says-recent-incidents-had-high-blast-radius-and-were-allegedly-related-to-gen-ai-assisted-changes
but that URL is over 250 chars long so lobste.rs won't let me post it.
Too bad. Seems important.
Here's a reprint from ArsTechnica: https://arstechnica.com/ai/2026/03/after-outages-amazon-to-make-senior-engineers-sign-off-on-ai-assisted-changes/
So they had a few production incidents with similar root causes and they had a discussion about it at a regularly scheduled all-hands meeting. This seems like normal good engineering practice, not news.