Local-first is not offline-first
24 points by abnercoimbre
I'm building an app that wants to be local-first, but is still just local-only, in that I haven't implemented sync between devices yet.
I've looked into many of the products that get talked about in the local-first circles, but it seems to me that very few of them focus on data sovereignty. The emphasis is on making offline-first, collaborative sync engines. They're cool and all, but I'd be much more interested in a tool that gives my users control and ownership of their data.
Take, for example, the proof of concept "CRDT sync engine over Dropbox" described in this post, which really puts the app's data in the hands of the user, relying on sync infrastructure they already use, without requiring them to self-host anything.
My dream tool would be a robust library / data framework that can sync a SQLite database over any file syncing service.
I have the same dream. I think Niki Tonsky’s concept is easy to build using currently available libraries, but I’m missing an opinionated, batteries-included solution.
Would libSQL get you closer to your dream tool? https://github.com/tursodatabase/libsql
Not really. It does support syncing, but only to a server that you either get from Turso or self-host. I don't want to tell my users "you need to self-host".
I am not sure I get what you’re trying to say. You want someone else to host the data for you, until you discover and sync? Because as you already know, there is no cloud… only other ppl’s servers.
So how are you going to achieve sync A -> B, when device A is offline?
The point is to let the user use a cloud of their choice, without necessarily understanding how the sync happens to that cloud. Instead, from the user's perspective, they should have devices A and B, and some other service that they know and understand (e.g. Dropbox). A and B should sync with each other via this external service.
This is still happening on another person's server, but in this case it's a server that the user already trusts (assuming they already have a Dropbox account and store data on there). That means they don't need to trust @pscanf additionally, because it would be possible to see that the data never gets sent to @pscanf's own servers. But it also means that they don't need to understand how self-hosting works, because @pscanf's software hooks into Dropbox, or Google Drive, or whatever infrastructure they're already familiar with. It would also hook into something like NextCloud, so that the user could self-host if they wanted to. But the key idea here is that the software can be written in a sync-agnostic way, so the user can use whatever tool they're comfortable with.
It's like OpenID but for storing/syncing data.
Man, that puts things into perspective for me. Ditto for this original quote from @pscanf about placing "the app's data in the hands of the user, relying on sync infrastructure they already use".
I really wanted to convince my partner to run this Actual Budget finance app (think of it as a decent enough open-source alternative to YNAB). You can connect to your real banks and everything.
The problem? My partner was interested, but felt put off by the idea of self-hosting the server, even though I offered to set things up. The response was "I'll think about it."
I now realize that if I had just said "hey, you can sync devices with Dropbox", the barrier to entry would've gone way, way down, right? My partner would be managing finances through a cool app instead of barebones Excel.
If I'm developing a local-first app, I would like to be able to tell my users: "to sync between devices, just give the app access to a dedicated folder that gets synced with Dropbox/iCloud/whatever". The app writes its data there, and Dropbox takes care of the sync. Obviously there are servers involved in this, but most likely the user is already using a file syncing service, so it's low effort on their part.
The dream tool I was talking about would allow my app to sync a SQLite database via a file syncing service. Unfortunately, you can't just put the SQLite file in Dropbox and call it a day, because the file might get corrupted, and conflict resolution would be very rough.
That'd be really awesome, but I wonder how it would work? Presumably, your app would have to be in control of resolving conflicts when they arise, but that requirement seems to conflict with it being usable with an arbitrary file sync service, which would have its own way of resolving conflicts.
Do you have a rough design in mind to address this? I'd be really curious to know more!
> Do you have a rough design in mind to address this
Not really. I was thinking about an approach similar to the one shown in the tonsky article, but specific to my app's data structures. I haven't thought about a generalized solution. But indeed it would be an awesome tool to have. :)
I haven’t tried it myself, but possibly PouchDB would fit your requirements? https://pouchdb.com/guides/replication.html
It's already "trivial" to store a SQLite database inside a cloud sync filesystem and replicate that. It sounds like the difficulty you see is the issues with conflict resolution as multiple devices get in/out of sync and potentially multiple collaborators? I'm not sure if there's a good generic way around this as you're basically trying to create a generic concurrent system upon different substrates with varying consistency models (which is ultimately what the different cloud sync engines do.)
Maybe it would be easier to just modularize the "sync" layer from the "data" layer and write components that work with each cloud sync provider? This is a similar way to how git works with remotes.
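One way to picture that split, as a minimal sketch rather than anyone's actual design: the data layer talks only to a small storage interface, and each provider gets its own backend, much like git talks to remotes. Every name here (`BlobStore`, `FolderStore`, `MemoryStore`, `publish`, `collect`) is hypothetical, and the folder backend assumes the provider simply mirrors a local directory.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, List, Protocol


class BlobStore(Protocol):
    """Hypothetical sync-layer interface: the data layer only sees named blobs."""

    def names(self) -> List[str]: ...
    def get(self, name: str) -> bytes: ...
    def put(self, name: str, data: bytes) -> None: ...


class FolderStore:
    """Backend for any service that mirrors a local folder (assumption)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def names(self) -> List[str]:
        # Skip dot-files so in-progress temp files are never listed.
        return sorted(
            p.name
            for p in self.root.iterdir()
            if p.is_file() and not p.name.startswith(".")
        )

    def get(self, name: str) -> bytes:
        return (self.root / name).read_bytes()

    def put(self, name: str, data: bytes) -> None:
        # Write to a temp file, then rename, so readers never see a partial blob.
        fd, tmp = tempfile.mkstemp(dir=self.root, prefix=".tmp-")
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, self.root / name)


class MemoryStore:
    """In-memory backend, useful for tests; same interface as FolderStore."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def names(self) -> List[str]:
        return sorted(self.blobs)

    def get(self, name: str) -> bytes:
        return self.blobs[name]

    def put(self, name: str, data: bytes) -> None:
        self.blobs[name] = data


def publish(store: BlobStore, device_id: str, payload: bytes) -> None:
    """Data layer: each device writes only its own blob."""
    store.put(f"{device_id}.bin", payload)


def collect(store: BlobStore) -> Dict[str, bytes]:
    """Data layer: read every device's blob; merging them is a separate concern."""
    return {name: store.get(name) for name in store.names()}
```

The split does not resolve conflicts by itself. It only confines provider-specific behavior to one class, so whatever merge logic the data layer uses can be written and tested once against `MemoryStore`.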
> It sounds like the difficulty you see is the issues with conflict resolution as multiple devices get in/out of sync and potentially multiple collaborators?
Yes, that's the difficult part. Even if you're using safe methods to save the SQLite file to the Dropbox-synced folder, you don't know when Dropbox will sync up the changes, if it'll sync down some other changes before, etc. When a conflict occurs, one of the branches will overwrite the others, and you can't even tell which one won and which ones lost.
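As for the "safe methods" part, here is a minimal sketch under two assumptions that are not from the thread: each device publishes its own snapshot file (so the sync service never sees two writers for the same path), and the temp file living briefly inside the synced folder is acceptable. The function name `publish_snapshot` and the file naming are hypothetical. It uses the SQLite online backup API, exposed in Python as `sqlite3.Connection.backup`, followed by an atomic rename.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def publish_snapshot(live_db: Path, synced_dir: Path, device_id: str) -> Path:
    """Copy a consistent snapshot of live_db into synced_dir as <device_id>.sqlite."""
    synced_dir.mkdir(parents=True, exist_ok=True)
    target = synced_dir / f"{device_id}.sqlite"

    # Temp file in the same directory, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=synced_dir, prefix=".tmp-", suffix=".sqlite")
    os.close(fd)
    try:
        src = sqlite3.connect(live_db)
        dst = sqlite3.connect(tmp_name)
        try:
            # Online backup: yields a consistent copy even if live_db is in use.
            src.backup(dst)
        finally:
            dst.close()
            src.close()
        # Atomic rename: the target is either the old snapshot or the new one.
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

This only addresses corruption from copying a database mid-write. It does nothing about merging: another device still has to read the per-device snapshots and decide how to combine them.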
> I'm not sure if there's a good generic way around this as you're basically trying to create a generic concurrent system upon different substrates with varying consistency models (which is ultimately what the different cloud sync engines do.)
No idea either. It might very well be that it's not a generally solvable problem.
> Maybe it would be easier to just modularize the "sync" layer from the "data" layer and write components that work with each cloud sync provider? This is a similar way to how git works with remotes.
I'm not sure I understand. How do you see the split helping with conflict resolution?
Don't many (most?) cloud drive services only sync at the file level? Which means if your single SQLite DB gets big, it's wasting a lot of I/O resyncing a single, big file. (Corrections welcome.)
I looked into it a few months back, and iirc Dropbox, Google Drive, and Syncthing actually support binary diff syncing. iCloud does the full sync every time. No idea about OneDrive. But yeah, if there's no binary diff sync it would become impractical for a single large SQLite db.
I imagine you wouldn't want to sync the database itself anyway. Instead, what if you just kept a log of all (side-effecting) queries executed, together with some form of distributed clock? You could chunk the logs as described in the blog post to make the sync even nicer.
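A rough sketch of that idea, with everything in it an assumption made for illustration: the "distributed clock" is a Lamport clock, each device appends to its own JSON-lines file in the synced folder, and the database is rebuilt by replaying all logs in `(clock, device)` order. `OpLog`, `rebuild`, and the log format are hypothetical, and chunking is left out.

```python
import json
import sqlite3
from pathlib import Path
from typing import List


class OpLog:
    """Append-only, per-device log of side-effecting SQL statements."""

    def __init__(self, synced_dir: Path, device_id: str) -> None:
        self.dir = Path(synced_dir)
        self.device_id = device_id
        self.clock = 0
        self.dir.mkdir(parents=True, exist_ok=True)

    def read_all(self) -> List[dict]:
        """Read every device's log and return the ops in one deterministic order."""
        ops = []
        for path in self.dir.glob("*.jsonl"):
            for line in path.read_text(encoding="utf-8").splitlines():
                if line.strip():
                    ops.append(json.loads(line))
        # Lamport order; the device id breaks ties so all devices agree.
        ops.sort(key=lambda op: (op["clock"], op["device"]))
        return ops

    def append(self, sql: str, params: list) -> None:
        # Lamport rule: move past everything seen so far, then tick.
        # Re-reading all logs here is O(n); a real tool would cache this.
        seen = max((op["clock"] for op in self.read_all()), default=0)
        self.clock = max(self.clock, seen) + 1
        entry = {
            "clock": self.clock,
            "device": self.device_id,
            "sql": sql,
            "params": params,
        }
        # Only this device ever writes this file, so the sync service
        # never has two writers for the same path.
        log_path = self.dir / f"{self.device_id}.jsonl"
        with open(log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")


def rebuild(ops: List[dict], schema: str) -> sqlite3.Connection:
    """Replay ops, in the order given, into a fresh in-memory database."""
    db = sqlite3.connect(":memory:")
    db.executescript(schema)
    for op in ops:
        db.execute(op["sql"], op["params"])
    db.commit()
    return db
```

Because every device replays the same ops in the same order, they converge once the files have synced. Concurrent edits to the same row simply resolve to whichever op sorts last, which may or may not be acceptable for a given app.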
Thank you for your reply!
I'm a Google Drive user, and it appears the behavior changed in late 2024. I had no idea!
The term I’ve heard for this is Unhosted (https://unhosted.org/)
I really like the idea, and there’s a lot of cool stuff in the space, from libraries abstracting over Dropbox/Google Drive/etc to ideas like Solid and ATProto pods.
> Offline-first apps focused primarily on staying functional during network interruptions but the server remained the primary data source. Data in this context is stored locally until a connection is restored, after which the locally stored data is deleted in favor of the remote store.
Is this really how people used "offline-first"? This sounds like "offline-capable online-first" at best to me.
Personally I see issues with this philosophy. They might be trade-offs you are willing to make, but I prefer my applications to have a centralized storage system (preferably under my control) that I can back up rather than having to find a way to back up and sync every single device I own. Putting everything on one node makes that job very easy, and my devices can become dumb "terminals" (preferably with some sort of offline cache) that I can wipe, reinstall, or lose without being concerned about losing the most important part to me: The data.
Where do I find actual local-first apps? The last time I did some looking, I noticed that "local-first" is used as a marketing term by cloud-based, often proprietary apps that don't fit the author's definition. If the app requires me to make an account on the publisher's proprietary web service before I can use the app, then I do not perceive it as being "local first". These apps should be called "cloud first", with limited local capabilities.
What I want is the ability to work indefinitely offline, with the full feature set available, after initial install. Also, if I am travelling with both my phone and my laptop, I want to be able to sync them without internet access. I periodically travel to or through places with no cellular or internet access (e.g., my favourite campground). I also aspire to a mostly-offline lifestyle. This will be increasingly important to me in the future, since the enshittification of the internet is accelerating due to AI.
The last time I went through this search, I tried to find a local-first calendar app, and failed.
Having the full feature set available is not really possible in the general case. Some activities simply require coordination. What local-first applications can do is identify the areas where the need for coordination can be relaxed so that many features are available when offline or otherwise partitioned from the full network.
If the two people coordinating are in the same room together, then the app should allow them to coordinate without connecting to the internet. A LAN or Bluetooth should be good enough.
If additional data needs to be accessed online in order to perform some task, then it should be possible to download that data ahead of time, so that I can perform that task when I am out in no-internet land. An example of this is accessing map data. I use CoMaps on Android, which lets me download map data ahead of time, before I drive to the territory where I need the map. Another example is Secure Scuttlebutt, a social medium that allows you to download somebody's feed ahead of time, then browse it later.
I think it's more likely that the full feature set is possible without internet access, but that people don't think about or don't care about the use cases I am describing.
Okay, I understood "offline" to mean "not connected to any other machines" but you meant "not connected to the internet", which is a different thing.
Self promotion warning: the app I'm building (Superego) is like that. Among other things, it aims to be a platform on top of which you build your own apps. Because yes, it's hard to find local-first apps that give you data ownership, but you can easily build your own! (Easily thanks to AI.)
Though, as I was saying in another comment, it's not local-first because it doesn't sync yet. I haven't found any tool that allows me to implement sync in a truly local-first way (no custom or proprietary cloud server involved). So, it looks like I’ll have to implement it myself, which I haven’t gotten around to yet. (The app is still in alpha.)