A SQLite VFS for remote databases via gRPC
11 points by danthegoodman1
So, what’s the advantage of having SQLite pretend to be a local DB but actually proxy to a remote DB via gRPC, vs just using the remote DB API directly? Is there a major performance advantage to batching remote writes at the page level vs the row level?
I think some kinds of transactions could be faster because of the atomic batch commits, but the main idea is that you decouple local query execution from remote storage.
Not great for performance, but you can effectively make unlimited, bottomless sqlite databases with this. I think that’s pretty neat.
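Roughly, the shape is that SQLite’s byte-offset reads and writes get turned into page-granular RPCs against the remote store. As a minimal sketch in Go of what that mapping could look like (the interface and names here are hypothetical, not the actual API):

    package remotevfs

    import (
        "context"
        "fmt"
    )

    // PageStore stands in for the gRPC client stub: one RPC per page
    // read, one batched RPC per commit. Hypothetical shape, sketch only.
    type PageStore interface {
        GetPage(ctx context.Context, file string, pageNo uint32) ([]byte, error)
        PutPages(ctx context.Context, file string, pages map[uint32][]byte) error // atomic batch commit
    }

    const pageSize = 4096 // must match the database's page_size

    // ReadAt services the byte-offset reads SQLite issues through the
    // VFS by fetching whole pages and slicing out the requested range.
    func ReadAt(ctx context.Context, s PageStore, file string, p []byte, off int64) (int, error) {
        n := 0
        for n < len(p) {
            pos := off + int64(n)
            page, err := s.GetPage(ctx, file, uint32(pos/pageSize))
            if err != nil {
                return n, fmt.Errorf("page %d: %w", pos/pageSize, err)
            }
            start := pos % pageSize
            if int64(len(page)) <= start {
                return n, fmt.Errorf("short page %d", pos/pageSize)
            }
            n += copy(p[n:], page[start:])
        }
        return n, nil
    }

The batched write RPC is where the atomic-commit advantage mentioned above comes from: all dirty pages of a transaction can land in one call.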
I get the theoretical value prop of a VFS that abstracts over any backend, but that is not what SQLite is purpose-built for. Are there specific use cases in mind where this would be preferred, given the performance trade-offs vs. any remote/server-based database?
I list some in the README, main value is around “DB per user/agent” use cases. Similar to Turso or Neon.
Unlike running Postgres per-user, which needs something as special as Neon’s setup to be viable, there’s negligible overhead here: you don’t have to schedule dedicated compute per database, because queries run on the caller’s own compute.
“Some databases that CANNOT support read-only replicas” lists S3. I wonder if that’s still true given the improvements to S3 over the past 12 months? Most notably, S3 can do conditional writes now; could that be enough of a primitive to enable read-only replicas with this system?
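For concreteness, the primitive I mean is a create-only PutObject. With the AWS SDK for Go v2 it looks roughly like this (bucket and key names are made up); the write fails with 412 PreconditionFailed if the key already exists, which gives you a compare-and-swap for deciding commit races:

    // Sketch only; bucket/key are hypothetical.
    package main

    import (
        "bytes"
        "context"
        "fmt"
        "log"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/s3"
    )

    func main() {
        ctx := context.Background()
        cfg, err := config.LoadDefaultConfig(ctx)
        if err != nil {
            log.Fatal(err)
        }
        client := s3.NewFromConfig(cfg)

        // Whoever creates the next WAL segment first wins the commit;
        // the loser gets 412 and must retry on top of the winner's state.
        _, err = client.PutObject(ctx, &s3.PutObjectInput{
            Bucket:      aws.String("my-db-bucket"),
            Key:         aws.String(fmt.Sprintf("wal/%020d", 42)), // next sequence number
            Body:        bytes.NewReader([]byte("...serialized commit...")),
            IfNoneMatch: aws.String("*"), // create-only: fail if the key exists
        })
        if err != nil {
            log.Fatalf("lost the race or request failed: %v", err)
        }
    }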
I think this deserves a more detailed answer:
S3 as “file per page”, no, not alone.
If you use a WAL on top of S3 to materialize the current “state”, then yes, because you can make the page and the name of the file that holds it disjoint (e.g. the current state of the WAL says this page points to this file in S3). Then you technically get atomic commits too!
Read replicas need to be able to read the DB as of a specific point in time so they can maintain a consistent view of it. I guess technically that could be achieved with object versioning, but I think that only supports restoring, not reading specific old versions.
That’s why I had the “at least independently” caveat on it: you can’t use S3 directly in the file-per-page way you might expect; you need some layer on top to establish point-in-time semantics.
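A minimal sketch of that layer, with hypothetical types: a manifest records which immutable S3 object holds each page as of a given commit, so a replica that pins manifest N keeps a consistent point-in-time view no matter what later commits do:

    // Hypothetical types; sketch only.
    package main

    import "fmt"

    // Manifest is the materialized "current state of the WAL": one entry
    // per page, each pointing at an immutable object in S3.
    type Manifest struct {
        Seq   uint64            // commit sequence this manifest represents
        Pages map[uint32]string // page number -> S3 object key
    }

    // applyCommit builds the next manifest without mutating the old one,
    // so a replica holding Seq N is undisturbed. dirty maps page -> key
    // of the freshly written page object.
    func applyCommit(m Manifest, dirty map[uint32]string) Manifest {
        next := Manifest{Seq: m.Seq + 1, Pages: make(map[uint32]string, len(m.Pages)+len(dirty))}
        for p, k := range m.Pages {
            next.Pages[p] = k
        }
        for p, k := range dirty {
            next.Pages[p] = k
        }
        return next
    }

    func main() {
        m1 := Manifest{Seq: 1, Pages: map[uint32]string{0: "pages/0-v1", 1: "pages/1-v1"}}
        m2 := applyCommit(m1, map[uint32]string{1: "pages/1-v2"})
        fmt.Println(m1.Pages[1], m2.Pages[1]) // replica at seq 1 still reads pages/1-v1
    }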
Edit: I updated the README to reflect this answer