A hypothetical search engine on S3 with Tantivy and warm cache on NVMe

8 points by emschwartz


neil_f

Cool post!

It doesn’t say so explicitly, but it seems to be comparing shared-disk vs shared-nothing architectures.

The post also doesn’t mention that the creator of Tantivy, Paul Masurel, built an open source shared-disk search engine on top of Tantivy called Quickwit. Datadog acquired them.

I’m a bit confused by this part:

Assumptions: 100 million documents, 10 GB of Tantivy indexes across 32 shards, 100 queries per hour, 24x7 operation.

Traditional search cluster setup:

3 nodes with 500 GB SSD each: ~$1500/month

Why does the traditional setup need 1.5 TB of SSD across 3 nodes for a 10 GB index? The article describes the traditional architecture as “always-on clusters with fast disks and enough RAM to keep indexes hot”. Perhaps I’m missing something, but that seems off by at least an order of magnitude, and the estimate doesn’t include RAM size, which matters more than disk for an index this small.
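A quick sanity check of the ratio, using only the figures quoted from the article (3 nodes, 500 GB SSD each, 10 GB of indexes):

    fn main() {
        // Figures quoted from the article's "traditional" setup.
        let nodes = 3u64;
        let ssd_per_node_gb = 500u64;
        let index_size_gb = 10u64;

        let provisioned_gb = nodes * ssd_per_node_gb; // 1500 GB
        let ratio = provisioned_gb / index_size_gb;   // 150x the index size

        println!("{provisioned_gb} GB of SSD provisioned for a {index_size_gb} GB index ({ratio}x)");
    }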

An r7gd.large with 16 GB of RAM and 118 GB of SSD costs $0.1361/hour on demand, which is about $98/month, and we could keep the entire 10 GB index in RAM.
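The back-of-envelope math for that comparison, as a minimal sketch (the $0.1361/hour rate is the on-demand price quoted above; the 720-hour month is my assumption to match the $98 figure):

    fn main() {
        // On-demand rate for r7gd.large (2 vCPU, 16 GB RAM, 118 GB SSD), quoted above.
        let hourly_rate = 0.1361_f64;
        let hours_per_month = 720.0; // assumed 30-day month

        let monthly_cost = hourly_rate * hours_per_month;  // ~$98
        let article_estimate = 1500.0;                     // article's 3-node figure
        let savings_factor = article_estimate / monthly_cost;

        println!("r7gd.large: ~${monthly_cost:.0}/month vs ~${article_estimate:.0}/month ({savings_factor:.0}x cheaper)");
        println!("16 GB of RAM comfortably holds the full 10 GB index");
    }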

I’m guessing the three replicas are for availability.