ClickHouse is winning the Observability Wars
53 points by siddhartha_golu
In 2019 I worked briefly at a startup that had managed to build an impressive number of things. (They died soon after I joined, but some of their OSS projects live on.) They were using ClickHouse and they showed me a bit of it. I was absolutely impressed. Until that moment, I thought I'd rarely get to a point where I would need anything other than PostgreSQL, yet I wanted to use ClickHouse.
I've had experiences with ElasticSearch, InfluxDB, and a few others, and it has always sucked. Probably because they all implement query languages from scratch whereas ClickHouse adopts good old SQL and extends it just a little bit in the right places.
Plus, like most very successful products, ClickHouse reeks of great implementation and care for the user: it gets all the little details right out of the box.
We're using HyperDX at work now and... well, plotting metrics has been a bit annoying, but tracing works quite well. I'm a bit confused because I thought tracing would quickly become indispensable and would usher in a good jump in productivity, but it seems it's not being adopted as much as I expected, likely because it's not as much of a game changer as I thought it would be. But I'm glad ClickHouse is becoming the player here and I don't have to deal with other stuff.
I'm a bit confused because I thought tracing would quickly become indispensable and would usher in a good jump in productivity, but it seems it's not being adopted as much as I expected
I'll probably write about this more in depth eventually, but in my experience introducing tracing to an organization used to getting by on metrics and logs is a steep uphill battle. It requires doing a lot of outreach, collecting use cases and putting in the (hard) work to show people how they can now do things they couldn't do before, and how difficult things are now much easier. On the technical side it needs to be as seamless as possible to adopt, which depending on your stack and context can be a significant effort on its own (which will fall to you).
In a sense it's no different from any other cross-cutting initiative: it helps if you have personal reputation to bank on and one or two "true believers" across the company to help spread the word.
introducing tracing to an organization used to getting by on metrics and logs is a steep uphill battle
I worked at an org that provided distributed tracing from 2012-2017 and this was exactly my takeaway. Very few people understand the value prop of tracing, and it is not easy to communicate.
It’s interesting here as I’ve always strived to just keep what would be a trace inside one request handler with atomic effects. So request logs are… both sufficient and equivalent.
At my new job there’s a bit of API-calls-API and the Datadog APM traces are my go-to. I see the value just by using it.
I still am not convinced if we are taming accidental complexity with extra complexity here, or if the underlying complexity is fundamentally essential. I do a fair bit of distributed systems/fault tolerance/internal consistency type things so yes you can’t avoid multiple systems at scale but if the foundation is solid the application logic is relatively straightforward (it’s one fault tolerant distributed system handling a single request, not a trace of nested RPC with arbitrary failure modes at any node…).
With every attempt at tracing adoption at $WORK, we have been forced to sample traces in some fashion when used in production (whether for cost or performance reasons) whereas 100% of our production requests have a consistent set of metrics emitted. We have all this tracing instrumentation around, but the only time anyone looks at them is for local debugging because chances are the requests causing a problem don’t have captured traces around.
But even if it's not sampled, the trace id still gets propagated and included in the logs. That alone is huge.
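A minimal sketch of what that can look like, assuming head sampling keyed on the trace ID. The names here (`SAMPLE_RATE`, `is_sampled`, `handle_request`) are hypothetical and not taken from any particular tracing library; the point is only that the sampling decision and the log line are independent, so every log record carries the trace ID whether or not the trace itself is kept.

```python
import hashlib
import logging
import uuid
from typing import Optional

SAMPLE_RATE = 0.1  # assumption: keep roughly 10% of traces

log = logging.getLogger("app")


def is_sampled(trace_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministic head sampling: the same trace ID always gets the same answer."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return bucket < int(rate * 2**64)


def handle_request(trace_id: Optional[str] = None) -> dict:
    """Handle one request; the trace ID is logged even when the trace is dropped."""
    trace_id = trace_id or uuid.uuid4().hex
    sampled = is_sampled(trace_id)
    # The log line always includes the trace ID, sampled or not.
    log.info("request handled trace_id=%s sampled=%s", trace_id, sampled)
    return {"trace_id": trace_id, "sampled": sampled}
```

Because the decision is a pure function of the trace ID, every service that sees the same ID makes the same keep/drop choice without coordinating, and the logs can still be joined on `trace_id` for the requests whose traces were dropped.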
I've had experiences with ElasticSearch, InfluxDB, and a few others, and it has always sucked. Probably because they all implement query languages from scratch whereas ClickHouse adopts good old SQL and extends it just a little bit in the right places.
Are you talking about ElasticSearch's JSON query language? It looks like they support multiple query languages, including a dialect of SQL. I don't know how good the support is for it, though. I've only ever used the JSON query language, and I didn't know about the others until just now.
As a query language, SQL isn't perfect; it annoys me that clauses can't be duplicated or appear in arbitrary order. But it is very good.
I haven't looked at ElasticSearch in a long time. (Honestly, I don't even know what it's good for!)
(I was mostly talking about InfluxDB, which I was reminded of today. I worked with it briefly and the looks-like-SQL-but-not-really query language was extremely infuriating.)
SQL is terrible, but everything else is just worse, and has to play catch-up with the decades of evolution that SQL has gone through.
As a query language, SQL isn't perfect; it annoys me that clauses can't be duplicated or appear in arbitrary order. But it is very good.
My SQL experience is mostly biased by the fact that I machine-generate it from a compiler, but I kind of hate it. BQ's pipe syntax is far better and I wish it was more popular.
My experience mirrors the author's. ClickHouse has a shockingly good scalability story, even when self-hosting (just throw more cores at it). It's true there's perhaps a higher upfront setup cost, but honestly the schema is pretty much defined by the OTel exporter, so you just start with that and extract columns and materialized views as you go when you need extra query performance. In return you can stop worrying about scaling and all your data is available directly to do whatever analytics you want on it, including deriving metrics from traces.
However the other piece of the puzzle, visualization, could definitely use more work. Grafana is... suboptimal to display and analyze traces, especially compared to something like Honeycomb. Hopefully one of the existing players will address this soon (maybe HyperDX?).
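To make "deriving metrics from traces" concrete, here is a rough sketch in Python. In ClickHouse itself this would be an aggregation query or a materialized view over the spans table; the `Span` fields and `derive_metrics` below are hypothetical stand-ins, not the OTel exporter's actual schema.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Span:
    service: str       # hypothetical field names, for illustration only
    duration_ms: float
    is_error: bool


def derive_metrics(spans: Iterable[Span]) -> Dict[str, dict]:
    """Aggregate raw spans into per-service count, error rate, and p95 latency."""
    by_service: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        by_service[span.service].append(span)

    metrics = {}
    for service, group in by_service.items():
        durations = sorted(s.duration_ms for s in group)
        # Nearest-rank percentile: the smallest value with at least 95% of samples at or below it.
        rank = max(1, math.ceil(0.95 * len(durations)))
        metrics[service] = {
            "count": len(group),
            "error_rate": sum(s.is_error for s in group) / len(group),
            "p95_ms": durations[rank - 1],
        }
    return metrics
```

The appeal of keeping the raw spans is that a metric like this can be defined after the fact, rather than having to be decided at instrumentation time.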
This post felt like it started off strong with a compelling introduction, but the analysis of different companies devolved into a list format where the low-effort LLM writing became very apparent. And looking through the intro again more critically revealed more and more traces thereof.
This is a pattern I've seen on a growing portion of recent Lobsters posts, so I'd just like this comment to be my quick thoughts on the more generic observation instead of a specific dig.
I personally do not have much experience with "observability" tooling, which does make parts of this post interesting despite the authorship. But I don't want to follow the fallacy of trusting LLMs on topics I'm unfamiliar with while claiming their writing is flawed for topics I am familiar with. So I don't really know whether to factually trust most of this article or its kin. The author obviously abdicated their thinking on large parts of it, but also seems to have given the machine a reasonably clear thought to start with.
But when I start programming with "clear" thoughts of my own, I find that only the act of writing out a program that fails edge-case tests or doesn't type-check actually convinces me that the thoughts were missing something fundamental. So I think that in writing, if you don't make the arguments yourself and consider the counterpoints carefully after it's all together, then you've communicated no value beyond the original "clear" thought.
I'm pretty sure this is what the people arguing for writing code by just storing prompts for some future LLM are getting at.
But if that's all your programming or your writing consists of, then beyond just thought, you've abdicated rigor, diligence, and respect.
I don't really know what I'd like to see Lobsters do about this. The vibecoding tag is an obviously overworked solution. But perhaps an "LLM smell" tag would be helpful for telling me to be careful about a lack of rigor.
Yeah, I didn't want to post this as a top-level comment, but once I got into the meat I realized it was just LLM platitudes and felt a bit annoyed about the time I spent reading beforehand.
I do have some experience with observability tooling, and this seemed like a pretty clear and sensible article. The thesis is that the different products don't scale the same, and that is illustrated with diagrams and specific commentary about what happens to each of them at different scales.
Do you have some examples of where you saw "obviously abdicated thinking"?
Matthew is, at the very least, well versed in observability (tech lead @ Braintree and now Principal @ SYBO, of Subway Surfers fame). The swearing also manifests a bit of his own voice.
Dropping in GPTZero does give 71% chance LLM, 28% mixed (although I truncated the introduction, which read the most human, to fit in the character limit).
I understand the distaste, but I think he actually reviewed/iterated on this and is just not filtering for LLMisms/optimizing for tone. "No em-dash" is no longer enough to dodge the smell.
For folks who have experience with Honeycomb, I'm curious how that fits into this. Afaik they also use a columnar storage format, and rely heavily on compression and bucketing to skip large quantities of data upon read (rather than an inverted index like Elastic and Datadog, which afaik also runs Elastic under the hood).
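As a rough illustration of the "bucketing to skip data on read" idea, here is a toy sketch that keeps a min/max timestamp per block and never touches blocks outside the queried range. This is not Honeycomb's or ClickHouse's actual implementation; `Block`, `build_blocks`, and `scan` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

Row = Tuple[int, str]  # (timestamp, message)


@dataclass
class Block:
    min_ts: int
    max_ts: int
    rows: List[Row]


def build_blocks(rows: List[Row], block_size: int) -> List[Block]:
    """Sort rows by timestamp and cut them into fixed-size blocks with min/max metadata."""
    ordered = sorted(rows, key=lambda r: r[0])
    blocks = []
    for i in range(0, len(ordered), block_size):
        chunk = ordered[i:i + block_size]
        blocks.append(Block(chunk[0][0], chunk[-1][0], chunk))
    return blocks


def scan(blocks: List[Block], lo: int, hi: int) -> Tuple[List[Row], int]:
    """Return rows with lo <= ts <= hi, plus how many blocks actually had to be read."""
    matched: List[Row] = []
    blocks_read = 0
    for block in blocks:
        # Skip the whole block if its range cannot overlap the query range.
        if block.max_ts < lo or block.min_ts > hi:
            continue
        blocks_read += 1
        matched.extend(r for r in block.rows if lo <= r[0] <= hi)
    return matched, blocks_read
```

An inverted index answers "which rows contain this term" directly, at the cost of maintaining the index on write; the block-skipping approach writes cheaply and relies on sort order plus small per-block metadata to avoid most of the reads.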
IIRC a few years ago Charity Majors (Honeycomb co-founder) said on Twitter that if CH had existed at the time they started the company they would've probably just used it. I'm not sure how much overlap there actually is between the two though, conceptually. Would be interesting to know if the problem space naturally steered them towards similar approaches, which is what I would expect.
I know this might make some folks grumpy, but I found the use of certain words very distracting, so I simply searched for those words and replaced them with words that didn't bother me, and the article was great! I'm not going to say what the word was, but it's an increasingly imprecise term that bothers me. It was nice to just read the article after a simple search-and-replace operation.