“You Had One Job”: Why Twenty Years of DevOps Has Failed to Do It

2 points by wrs

quad

Most developers don’t interact with production on a daily basis, unless they’re hunting down a bug or something. Guess who does? Your operations crew—or as they are more likely to be called, cloud engineering, infrastructure, SREs, DevOps, or platform engineering.

Whatever you call them, somebody has to deal with operational feedback loops. They are the last line of defense for your system in the face of perpetual threats. In soccer terms, they are the goalie.

[emphasis mine]

Everything else in this article follows from this premise. And if you, as a "developer," don't operate what you build, then… well… I wish you the best of luck in your career.

intelfx

And if you, as a "developer," don't operate what you build, then… well… I wish you the best of luck in your career.

I mean, judging by the overall proportion of developers whose systems knowledge begins and ends with the "Build and Run" button in their favorite IDE, they do indeed have ample luck in their careers.

Vaelatern

Check out most large organizations that are not "tech companies."

wink

The idea is good, but I have never seen it not break down somewhat over time, in any org.

What happens when a team is dissolved and individual people get shuffled to other teams? (I have a new colleague, and we inherited one decently sized service along with him. We're responsible for it now, and apart from some short onboarding, it's a bit of a mystery.)

What happens if you go easy on "you build it"? We have one old service where another team sometimes adds features (usually because they have phases of lighter workload, unlike my team), but we run it. And "we" is used quite liberally here: we have exactly one person, still here after 5 years, who actively built it.

That said, it's usually still better than the dev-ops split; no disagreement here.

The main downside I see (and I could just be an outlier) is that I generally work in either dev mode or ops mode for a given week, or at least a given day. In ops mode, my priorities are: look for alerts, analyze, fix, and in idle times work on chores that improve the state of things. In dev mode, I want a work package without interruptions. This can be somewhat alleviated by having a dedicated support person on the team for a given day, but I still think you're usually so invested in and close to your team that you don't simply opt out of all important meetings or of helping people.

Maybe I'm just a bad ops person at heart, but when I was only responsible for running things, I was the chillest person around, and the devs' problems did not interest me unless someone asked a question. When I'm doing ops for my own team, it feels like two jobs. When I am only developing, I am very happy to hand off the infra stuff to ops.

olliej

There really needs to be a tag for "marketing masquerading as a technical post." This is just another AI startup trying to justify its existence by using "technical" posts of questionable quality ("Most developers don’t interact with production on a daily basis"? What?) to deliver a startup-tech-of-the-year slop push with a marketing message at the end.

koala

The sad part is that Honeycomb has been around for a long time with what I've always believed to be a futuristic product. When I first saw it, I thought that in a few years everyone would be using it, and I got a bit into playing with OpenTelemetry.

But I don't see a lot of uptake of what Honeycomb sells. They had what looked like an awesome, "rigorous" way to find anomalies and characterize them, which sounded like it would make finding issues much simpler.

But honestly, even though there's lots of talk from other observability vendors, I believe the industry right now is still doing the equivalent of looking at logs with a slightly better UI.

If Honeycomb works, then I find it pretty frustrating that they have entered the "flailing with AI" stage because they have a product from the future that too few people understand.

Or maybe it works and I'm confused.

(Also, the author has some pretty awesome articles that I keep coming back to...)

Vaelatern

Yes, sales pitch at the end.

It's clear to me the thesis is right: devs don't see prod and don't know when they are making decisions that hurt their ability to operate differently. Ops folks don't really care about the application. The two are different, mostly incompatible mindsets. I have a hard time imagining that AI makes this better, but I do believe different tooling can enable interested parties to make good calls.

Corbin

The article doesn't notice that developers and operators are distinct because of how businesses treat computers. Developer silos evolve because business leaders and owners don't understand how computers are used within their business; each developer project is its own expedition from the boardroom to the server room to see what a computer can do for them today. Developers are necessary for a business to survive because, if operators were left to themselves as a culture, they would not do what upper management says; after all, the typical capitalist doesn't know anything other than numbers on a spreadsheet and has no appreciation for the computer as a cultural artifact.

Frankly, I think that the author has been getting discounts on their cloud bills. My biggest issue with observability over the past three years, which has led me to write yet another round of Prometheus-integrating tools, is that I cannot afford the disk space and bandwidth necessary to have detailed metrics from every machine in my fleet. OpenTelemetry is a step backward in this regard. The observability movement has failed to internalize the literature on sensor theory (compressed sensing, sensor fusion, sensor algebra) and continues to promise that one more dashboard will fix the dev-ops split.