Maybe we don't need a server
54 points by abnercoimbre
The file system is the wrong abstraction level for application developers; you'd just be ignoring the problems with synchronization and transactions, expecting them to hand roll their own solution.
A few years ago I remember reading through the Fuchsia docs and they had a Ledger concept [0] for maintaining synchronized state across applications. Of course, this being a Google project it had a dependency on the cloud, but you could move that over to an authoritative home / main node. It's worth reading through their docs and examples because they seemed to actually be trying to solve the data synchronization problem for a multi-device world.
Apple has their own solution with File Provider extensions, but my brain shuts off when I try to read through their terrible docs so I haven't been able to compare and evaluate.
Another unresolved issue is what to do when multiple application developers are interacting with your files and they cannot agree on a common interface and conflict resolution strategy. So maybe Apple is directionally correct with their File Provider concept, and you could build a standard extension for various file types which developers then build upon for their own applications. Sounds a bit cursed.
Personally, I feel a bit defeated so I've just resigned myself to running Tailscale with Syncthing for note synchronization. Since I only ever edit files on one device at a time and they're already synchronized I avoid any pain, but that's just me coping with the inadequacy of my solution.
[0] https://github.com/OpenTrustGroup/fuchsia/blob/master/src/ledger/docs/README.md
We built a thing in CoreObject as part of Étoilé around 20 years ago to do this kind of thing. The persistent object model was a tree. You made changes by modifying tree nodes and there were hooks for diffing and merging if you wanted a conflict resolution strategy. The backing store would record changes so you got undo and redo for free (including branching undo). You could build collaborative editing by providing a transport for diffs (I had a PoC that sent them over XMPP, so you could keep two copies of a document in sync with live editing).
Because the core abstraction was diffing and merging, with recorded history and branching points, you could use the same thing for live and deferred merging.
It makes me sad that mainstream F/OSS desktops never copied any of these ideas and are still providing bad reimplementations of abstractions from the 1980s.
Indeed. The filesystem is simple and "easy" but is definitely a primitive abstraction that leaves many common problems unsolved. It's even worse when you consider cross application information sharing.
In addition to Ledger, here are a few more research papers and historical OSes that attempt a different core abstraction (there are countless more, of course):
https://www.arxiv.org/abs/2512.09762
Operation-Based Evolution and Versioning of Data is a "platform for richly structured data supporting change in multiple dimensions: mutation over time, collaboration across space, and evolution through design changes". Interestingly, the paper is very new! It addresses merging of data structures changed in different locations, even with schema changes.
https://bracha.org/objectsAsSoftwareServices.pdf
Two decades old, it talks about "orthogonal synchronization" as a concept (parallel to orthogonal persistence).
"In Genera, all functions and data share the same virtual memory. This shared memory is treated by the software as containing a set of data objects, not uninter- preted bits or bytes".
Also the objects are persisted into a "world" by the OS so really applications don't need to deal with files at all.
Domain/OS had a location-transparent single-level store.
Some operating systems divide storage into several levels. A machine's main memory acts as the primary storage level, while the disk acts as secondary storage. Domain/OS uses a single-level storage mechanism, whereby a program gains access to an object by mapping object pages directly into the process's address space.
The notion of types and interfaces built into the OS.
Each object in the file system is marked with a unique type identifier (type UID)
So if you implement a new object with an interface used by existing apps, they interoperate.
Developers can use the Open System Toolkit to add interesting new file types to the system, while applications that use these new types continue to work without change. A simple example might be a circular log file type. Another useful type might be one which maintains all the versions of a text file under source version control. When an application opened a text file under version control, it would read the most recent version of the text. This obviates the need to perform a separate "fetch" operation before an application can look at a source module.
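You can get a small taste of the single-level-store idea on a conventional OS with mmap, which maps a file's pages straight into the process's address space, so "reading the file" is just indexing memory. A minimal Python sketch (an analogy only, not how Domain/OS actually worked):

```python
import mmap
import os
import tempfile

# Create a small "persistent object" on disk.
path = os.path.join(tempfile.mkdtemp(), "obj.bin")
with open(path, "wb") as f:
    f.write(b"persistent object")

# Map its pages into our address space: no read()/write() calls,
# just ordinary memory indexing backed by the file.
with open(path, "r+b") as f:
    mem = mmap.mmap(f.fileno(), 0)
    assert mem[:10] == b"persistent"  # "reading the file" is indexing memory
    mem[0:1] = b"P"                   # writing memory mutates the object
    mem.flush()
    mem.close()
```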
Unfortunately we are entrenched in filesystems, and are somewhat ironically solving these worthy problems at the library and service layers (e.g. CouchDB, local-first, ...). While it's not easy to move an established core abstraction, I do find all research in this space very interesting.
Oh cool, I didn't know Fuchsia had one too.
& that's what I've resigned myself to as well (Syncthing). While all the local-first/CRDT apps I've been seeing are good at handling sync across two simultaneous versions of the same app, there's no shared format anywhere near as common as plain text files.
expecting them to hand roll their own solution.
Or provide/recommend one?
In the bad old days of email gateways, the lowest common denominator for interop was a shared directory (one writer, one reader). You don't want the reader picking up a file until it is complete, so a basic interoperability "standard" was for the reader to ignore temporary files (signalled with a .tmp extension, or a leading dot, as in .foo). So the writer can:
1. Create foo.tmp
2. Write the contents
3. Flush and close
4. Rename foo.tmp to foo
Step 4 is atomic because you're on same-filesystem. If you need multiple writers, you need a way to avoid filename collisions, but you can typically name the writers and so namespace them.
If you want to do the same thing with multiple files, I guess you could similarly use subdirectories (everything in a temp-named subdir is ignored for sync). That imposes some additional structure during the txn, but you can rename it away afterwards.
There is a potential need for cleanup, but that is likely manageable with a simple grace time.
With this approach, I think you can write sync-friendly apps?
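The steps above can be sketched like this (a minimal Python sketch; the .tmp naming convention is the one from the comment, and os.replace is the atomic rename):

```python
import os

def atomic_write(path: str, data: bytes) -> None:
    """Write data to path so a reader never sees a partial file.

    The temporary file lives in the same directory as the target, so the
    final os.replace() is an atomic rename on the same filesystem.
    """
    tmp = path + ".tmp"       # step 1: readers ignore *.tmp by convention
    with open(tmp, "wb") as f:
        f.write(data)         # step 2: write the contents
        f.flush()
        os.fsync(f.fileno())  # step 3: make sure the bytes hit the disk
    os.replace(tmp, path)     # step 4: the atomic rename
```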
I really like Syncthing, but there is one thing to watch out for if you want to use it as an alternative to a server: data is synced only when devices are online at the same time. Depending on the use case, that may or may not be an issue.
You can always fall back to mimicking a server architecture by having an always-online node. You can even configure it so that it only sees encrypted data (if you're hosting it in the cloud or something).
But now you need to configure and maintain that always-online node. As the author points out, that's not necessarily trivial, even for someone like them who is technically capable. It's even worse for nontechnical users.
I really like @pscanf's point a few days back that the ideal solution would be a better framework for syncing application data via existing filesystem sync engines. That way you can use syncthing if it works (including hosting an always-online node if that's what makes sense for you), or self-host nextcloud or whatever else. But you can also just use Dropbox or Google if that's what's available, without the fear of being locked into those providers.
Unfortunately, you can't even work with plain files like that, because each system has its own approach to conflicts. So you need to build a new persistence layer that sits below the application but above the file system, and that can handle all the synchronisation issues. And even if you could, you probably still want something like SQLite sitting on top of that for application-level storage.
One easy way to have an always-online node is to keep your PC on 24/7.
Again, it's not really typical to be able to do that though. I don't own a PC, for example, in my household we have a couple laptops (which we would obviously rather not leave on 24/7 because then they would not be as portable), a tablet, and a raspberry pi. That last one is only there because I'm a techie weirdo. I know plenty of people who only have tablets and smartphones at their disposal.
The value of something like Dropbox is that it works for almost anyone, because the barrier to entry is so low. To me, that's an important goal - being in control of your own data should be as easy as possible for as many people as possible.
That last one is only there because I'm a techie weirdo.
I used to think raspberry pis were a common household item. I was shocked when some friends within our industry asked me "what's that?" after they saw mine. Gives you perspective!
You can have encrypted nodes. I feel like there's a possibility there for some encrypted-always-on-syncthing-node-as-a-service.
(Personally I just use a raspberry pi connected by cable to my router.)
I’m toying with the idea of a service like that, but so far it’s vaporware. One of these days…
Most people have an always-online device they carry with them alongside the different devices they use over the day: their smartphone. Unfortunately, filesystem sandboxing and energy-saving strategies make something like Syncthing go against the grain of what smartphone operating systems want you to run.
Kind of disappointing that the article about Actually Serverless failed to even mention the biggest challenge about making something without a server.
I think the big problem is that handling merges correctly basically requires application-specific semantics, especially if you want to handle it invisibly and seamlessly. This doesn't mean you need a server, but it does mean you can't really decouple the synchronization layer from the application itself. And if you want to have some kind of undo history, you wind up needing to couple them even more.
Unless you're merging text files, in which case, both merging and history are mostly solved problems (though conflict resolution still sucks). Conflict resolution is one reason I stopped syncing my notes over Nextcloud and committed (pun unintended) to just using git. Which is not a solution for normal people, of course, unless a lot of sugar is put over it.
I don't think merging text files is a solved problem. You still need application-specific conflict resolution. Maybe if your data is just "human-readable text" automatic merging can be good enough, but it isn't "solved". For example Google Docs has pretty reasonable human text merging but can still surprise you with undesirable behaviours from time to time (mostly noticeable when you are working offline for a while).
You could also use iroh, although under the hood it uses a relay server (you can self-host one or use their solution for free).
Have you heard of Syncthing? It syncs a folder between devices, like Dropbox, without the need for someone else's computer. No cloud; it's just your PC talking to your phone, directly and automatically, through your home WiFi.
You are still dependent on a network of discovery servers and possibly relay servers. But I suppose you could consider that part of the network infrastructure.
I do agree that most server-side software is just domain-specific state synchronisation, but that's because distributed state synchronisation is hard! The local-first software people are looking at how to build software like this: https://www.inkandswitch.com/local-first/
But I suppose you could consider that part of the network infrastructure.
Yea, syncthing also has config options to disable the discovery/relays and you can use tailscale for the network infra.
You can get around this by having some public, always-online nodes that run the discovery component and/or have well-known addresses that you configure all of your leaf nodes to talk to. I do both in different cases, but it then does require always-online nodes, which the author wants to avoid.
I don't want to manage my own server for those things
That's my feeling about self-hosting as well. I care about privacy, and I'm more than comfortable deploying services, so on paper I'd be the ideal self-hoster. But I don't want the hassle and the responsibility of keeping up services and backing up data.
The other day I commented on a local-first post wishing for "a robust library / data framework allowing to sync a SQLite database over any file syncing service". I think that would allow so many apps to drop the self-hosting requirement.
I say "SQLite" specifically because it's probably the most common database for client-side apps, but I also really like the idea of storing plain text files (JSON, md, XML, etc.), which are actually much easier to sync. I've actually tried doing that with an app I'm writing, but the lack of filesystem-level transactions was a dealbreaker, because then I need to deal with rollbacks, concurrent accesses, etc. myself.
Does anybody know of a tool/library that implements something like transactions on a generic filesystem? I've experimented with using git, pretending that a commit is transactional (though it's actually not, if I remember my research well), but when files get big or numerous, committing just takes too much time to make the solution viable.
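For what it's worth, the closest cheap approximation I know of is the subdirectory trick from further up the thread: write every file of the new state into a fresh generation directory, then atomically flip a current symlink to it. A hypothetical sketch (POSIX-only; commit_dir and the .gen- prefix are names I made up, and old generations still need cleanup after a grace period):

```python
import os
import tempfile

def commit_dir(data_root: str, files: dict) -> None:
    """All-or-nothing multi-file 'commit': write a fresh generation
    directory, then atomically flip data_root/current to point at it.
    Readers going through data_root/current never see a half-written set.
    """
    # Leading dot so sync tools and readers ignore in-progress generations.
    gen = tempfile.mkdtemp(dir=data_root, prefix=".gen-")
    for name, data in files.items():
        with open(os.path.join(gen, name), "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
    # Flip the symlink: rename() over an existing symlink is atomic.
    tmp_link = os.path.join(data_root, ".current.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(os.path.basename(gen), tmp_link)
    os.replace(tmp_link, os.path.join(data_root, "current"))
```

Stale .gen-* directories can then be garbage-collected after a grace time, as suggested above for stray temp files.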
The other day I commented on a local-first post wishing for "a robust library / data framework allowing to sync a SQLite database over any file syncing service". I think that would allow so many apps to drop the self-hosting requirement.
It's not exactly what you're describing, but are you familiar with litestream?
Yes. S3 could actually be an acceptable alternative to file syncing, in that at least it's a standard(ish) protocol offered by many providers. Litestream in particular doesn't solve the conflict problem, though. But I guess one could make tables append only, and deal with conflicts in the application layer. Still, having a tool that does this for you and gives you a high-level API to resolve conflicts would simplify things a lot.
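To illustrate the append-only idea: if rows are immutable and keyed by a globally unique id, merging two replicas degenerates into set union, and convergence is trivial. A toy sketch, with dicts standing in for tables (real conflict policy still lives in the application layer):

```python
def merge_append_only(local: dict, remote: dict) -> dict:
    """Union-merge two append-only tables keyed by a globally unique row id.

    Rows are never mutated or deleted, so the only possible 'conflict' is
    that both replicas appended rows, and plain union resolves it
    deterministically in either merge order.
    """
    merged = dict(local)
    merged.update(remote)  # the same key implies the identical row
    return merged
```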
But I don't want the hassle and the responsibility of keeping up services and backing up data.
And thus, the Internet gets more and more centralized because of this.
I commented on a local-first post wishing for "a robust library / data framework allowing to sync a SQLite database over any file syncing service".
rsync?
Does anybody know maybe about a tool/library that implements something like transactions on a generic filesystem?
Perhaps I'm unimaginative, but what are you trying to accomplish? Why do you need rollbacks and concurrent accesses?
It's weird that often people can only see two alternatives: either everybody self-hosts or everything is p2p, maximally decentralized; or everybody uses a big central platform that provides some service in a proprietary way.
The obvious (in hindsight) solution for these issues is always to create a standard interface for a type of server (say, a photo server), then write client apps that speak that interface and let multiple implementations of that server exist. You can run your own server and use the same client apps, or you can rent one somewhere and use the same client apps, or you can use both and get backups automatically. Use different providers; maybe there is a community server that offers some free space for members, and a family server, and you use a little bit of each, with maximum freedom.
Totally agreed! I hope Solid takes off, for something like that.
As I understand it, Solid is basically the PDS part of ATProto, which feels like it's taking off a little bit more than Solid has. (Although in fairness, that might just be the circles I hang in — certainly, neither has got anywhere near to meaningful mainstream adoption.)
We have that already with email. How many people bother with hosting their own email?
A feeling I get is that self-hosting is seen as something difficult, because you have to always update everything every 20 minutes, or it gets swamped with spam, or constantly probed for exploits, or web bots suck everything down every 20 seconds.
Disclaimer: I self host my own email, web server, gopher, Gemini, qotd and DNS.
You’re making the same mistake as the parent comment refers to. It’s not a choice between using either Google or self-hosting. You can also use Fastmail, Protonmail, etc etc
The key point being that all these systems interoperate, and you are free to switch from one to another at any time. That is, it is the freedom to choose your provider that is more important than the freedom to become your own provider (although the latter is important for making the former possible).
Absolutely. I think it's sad people don't utilize their freedoms more. I know when Gmail first arrived on the scene, it was unimaginably great (2 GB of storage when most mail providers offered only 4MB or so, and a new search-based paradigm for managing mail, at a time when Google still proudly flew their "don't be evil" banner), but now things have changed a lot but people's habits are still firmly stuck. Gmail is now getting so entrenched it's getting harder to get away from it because it's too big to fail and has started to seemingly randomly drop mail from legitimate mailservers.
Yeah, that's completely true. Email works as both an example of what good decentralised systems can look like (a handful of interoperable providers, plus the ability to host your own), as well as an example of what bad decentralised systems can look like (as all that stuff gets worse, harder, and less interoperable).
It is also a great example of how decentralization is not (just) a technical problem, but a social one. Sure, the protocols need to have decentralization baked into them, but once you have that, it's not a done deal, but it is quite precarious.
With e-mail, we've seen how it can slide downhill quite fast: before GMail, things were much more decentralized. Hotmail was of course a big player, but it was not particularly great (it would delete your mail if it was old enough IIRC, or would stop accepting mail once the box was full), and it was fairly common to use your ISP's e-mail offering or various other e-mail services.
rsync?
I guess you mean the SQLite-specific version, but even that doesn't really work, in the sense that it doesn't handle conflicts, which you might get if you have a desktop version and a mobile version of your app.
But as others in the thread have said, solving conflicts is very application-specific, so maybe there's no general solution to the problem.
Perhaps I'm unimaginative, but what are you trying to accomplish? Why do you need rollbacks and concurrent accesses?
I'd say it's mostly developer convenience. For example, about transactions, say I'm building a meal tracker, and from one user interaction I want to save a food (with nutrition facts) and a meal (referencing many foods). If I store data in simple JSON files, then I need to handle what happens if saving the foods.json file succeeded, but then saving meals.json failed.
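This is exactly the case where SQLite's transactions help: the foods and the meal either all land or none do. A minimal sketch (the schema and save_meal are hypothetical, just mirroring the foods/meals example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foods (id INTEGER PRIMARY KEY, name TEXT, kcal INTEGER);
    CREATE TABLE meals (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE meal_foods (meal_id INTEGER REFERENCES meals(id),
                             food_id INTEGER REFERENCES foods(id));
""")

def save_meal(conn, meal_name, foods):
    # 'with conn' wraps the block in one transaction: it commits on
    # success and rolls back if any statement raises, so there's never
    # a meal referencing foods that failed to save (or vice versa).
    with conn:
        meal_id = conn.execute(
            "INSERT INTO meals (name) VALUES (?)", (meal_name,)).lastrowid
        for name, kcal in foods:
            food_id = conn.execute(
                "INSERT INTO foods (name, kcal) VALUES (?, ?)",
                (name, kcal)).lastrowid
            conn.execute(
                "INSERT INTO meal_foods (meal_id, food_id) VALUES (?, ?)",
                (meal_id, food_id))

save_meal(conn, "lunch", [("rice", 200), ("beans", 150)])
```

With the two JSON files, you'd have to implement that rollback logic by hand.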
The unhosted (https://unhosted.org) people are trying to make apps that use a pluggable storage backend (Google Drive, Dropbox, Self Hosted). I totally agree it’s the right way to go. Similarly, the Solid project
Their version of transactions is to use ETags to give you the current version on the server and only write if the version the client saw last is the same as the one on the server. If you’re out of date, sync client side and then try again.
For SQLite with syncing, I’ve heard good things about PouchDB
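The ETag dance described above is just optimistic concurrency control. A toy in-memory model of it (EtagStore is hypothetical, not remoteStorage's actual API):

```python
import uuid

class EtagStore:
    """Toy model of ETag-based conditional writes: a put succeeds only if
    the client presents the ETag of the version it last saw (None means
    'the key must not exist yet')."""

    def __init__(self):
        self._data = {}  # key -> (etag, value)

    def get(self, key):
        return self._data.get(key, (None, None))

    def put(self, key, value, if_match=None):
        current_etag, _ = self.get(key)
        if current_etag != if_match:
            return None  # precondition failed: sync client-side, then retry
        new_etag = uuid.uuid4().hex
        self._data[key] = (new_etag, value)
        return new_etag
```

A stale writer gets None back, pulls the latest version, re-applies its change, and retries with the fresh ETag.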
I've looked into remotestorage.js, though their integration with Dropbox / Google Drive is via the API, not via the filesystem, which requires you to create or trust an OAuth app to access it.
But the unhosted way is exactly what I wish we'd have. Alas, there doesn't seem to be much movement in that community.
Haha, I'd just commented a link to that comment further up, and then scrolled down and seen this.
In theory, I think it should be possible with SQLite, or at least an SQLite-like interface — there's a list here of all sorts of SQLite-based distributed tools that have sprung up recently with the rise of "local-first" as a slogan. But I think a lot of those options are either mainly focused on scenarios where they control how things get synced over the network (i.e. not via a filesystem), or are very low-level, or are just experimental ideas that haven't been updated in ages.
A transactional filesystem could be a very interesting idea; I don’t know of any. Maybe there is some FUSE one?
Some of our tooling like in-house source control moved away from traditional client<->server communication, and started relying on NFS over (v)LAN, and features/utils of the filesystems like ZFS and NTFS. Backups and rollbacks are no longer dictated by the central authority, instead they rely on copy-on-write and snapshots, while metadata is stored in the SQLite database. On each local machine there is a daemon that agrees on locked files, pulls down upstream changes, tracks modifications (via inotify or ntfs journal), or schedules “commits” through merge queue to avoid conflicts.
This is not a bad idea. I already do this with my password manager database (KeePassXC) and my notes. The problem is that some things have their own internal structure and don't quite work when subjected to the kind of file-based synchronization Syncthing does. For example, if you push to two different git branches on two separate instances of a git repo, conflicts might occur when those two instances are synchronized by Syncthing, while git itself would be able to resolve those kinds of conflicts just fine.
It would be nice if Syncthing allowed extensible conflict resolution. Right now I already have to manually merge my KeePassXC databases from time to time; it would be nice if that could be automated.
This is something I've been quite interested in. I have some ideas I've been pondering on for a FUSE filesystem that performs syncthing-like syncing under the hood, and uses content-addressing for tracking state between nodes.
I'm a self hosting enthusiast, but I still agree with this sentiment. Syncthing has a lot of reliability and usability issues that are partially resolved by hosting a central sync server. My gf loves syncthing too, but some glitches over the years have made her very reluctant to change any settings without my help. The central server helps avoid sync issues, but the damage is done.
My take is that for self hosting to be successful, it really needs to be built on appliances and common protocols. Otherwise, the whole concept ends up drowning in fragmentation and complexity. Theoretically, you want self hosting to be able to do anything a cloud host can do, but there's this really strong constraint on the user side for it to just be plug and play. Most people don't want to be constantly tinkering with this thing that they bought to replace Google or whatever. I'm a self hosting enthusiast and I don't even want to do that. I've done it for a long time, but at a certain point I want it to just work. I'm willing to do some maintenance, because it's something I'm enthusiastic about, and I want it to be accessible to more people, but really, I'm just having fun.
That said, I don't really know what this common protocol / appliance would look like right now. My setup speaks WebDAV, MQTT, Zigbee, XMPP, and SSH; that's a lot of different protocols, and I'm not completely finished with the setup. And that's just Nextcloud, Home Assistant, jabberd, and my MQTT broker.
I tend to prefer unison these days, because of the offline-first approach.
When you put Unison in a cron job, it works quite similarly to Syncthing, but personally I run the sync manually on demand. It's way better for battery life and allows for partial syncs, which is useful if you're on a 2 GB data plan (like me) and just need to quickly sync a few files from your other device.
(How) do you use Unison on Android?
Cute idea, but anyone who has had to untangle a bad git conflict knows how complicated synchronizing "just files" can be...
More things could be just files
You just rediscovered the Unix approach ;-)
I was thinking the same but then realized how horribly painful Android and iOS make dealing with files.
I think the other threads here are more on the mark: applications using truly open protocols with interchangeable hosts so that you can choose where you want to store your stuff (including hosting it yourself). Direct peer to peer transfers would be nice too, of course, but I think we're missing ways to do that for non-technical folks.