I Just Want Simple S3
73 points by kwas
I use rclone serve s3. It's imo a lot easier to use than minio was and uses fewer resources. Assuming it has all the features you want, it may be a good fit.
Documentation: https://rclone.org/commands/rclone_serve_s3/
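For reference, a minimal invocation looks something like this (the directory, address, and key pair are placeholders, not defaults):

```shell
# serve a local directory (or any rclone remote) over the S3 protocol
# ACCESS_KEY,SECRET_KEY is a placeholder credential pair you choose yourself
rclone serve s3 /srv/data \
  --addr localhost:8080 \
  --auth-key ACCESS_KEY,SECRET_KEY
```

Any S3 client pointed at `http://localhost:8080` with those credentials can then read and write `/srv/data`.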
I thought the attraction of S3 was “infinite” scaling, in several dimensions, and redundancy. I don’t really get running it locally, except for integration testing. Would be interested to hear your use cases!
A lot of tools support the s3 protocol for a more structured object storage. I use minio as a backup target for restic. Additionally, minio has many of those scaling and redundancy capabilities.
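As a sketch of that setup (endpoint, bucket name, and credentials here are assumptions for a local minio listening on its default port):

```shell
# restic speaks S3 natively via the s3: repository scheme
export AWS_ACCESS_KEY_ID=minio-access-key        # placeholder
export AWS_SECRET_ACCESS_KEY=minio-secret-key    # placeholder

# initialise the repository once, then back up into it
restic -r s3:http://localhost:9000/restic-backups init
restic -r s3:http://localhost:9000/restic-backups backup ~/documents
```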
There are many ways to get that infinite storage. AWS is by far not the first to tackle that problem. And likely won't be the last.
The thing with S3 is mostly that it turned into a (horrible) protocol that everyone speaks as a "read and write something over the network" protocol.
S3 the protocol has effectively nothing to do with that "infinite" storage. Amazon could have offered the same thing over WebDAV, an SMB share, FTP, 9P, etc.
But since it became the de facto "standard", people want access to storage through that protocol without having to pay Amazon a lot for it. So you have companies offering S3 compatibility, with or without the same guarantees. E.g. Backblaze and Hetzner make S3 storage available. To my knowledge both have the same or similar guarantees as Amazon, just way cheaper.
And then of course you might run your local instance. Maybe you only need a small storage server with 100TB and back that up daily. Or you build a local or distributed cluster. There's probably hundreds of ways to do this nowadays. Just a lot not coming with S3 compatibility out of the box.
And some of these tools provide means to scale essentially infinitely by just putting the service on an empty server.
It's really nice that there is competition in that field. Twenty years ago all of that was niche, and a lot of implementations of distributed storage were just devs who felt like implementing erasure coding, which has been around since 1960 and widely used in consumer hardware since at least the early 90s. So of course everyone wanted to use that as the basis for storing data across systems in an easy manner (see "A Parallel Interleaved File System" from 1988). But just like it was a hassle to do stuff when nobody could agree on how to handle state before cloud providers forced you, nobody could agree on how to access files/objects until cloud providers forced you to.
So in short.. All about standards.
It's a way to hedge your bets. By using a small, simple local server, you avoid paying the complexity tax of a fancy replicated scalable solution. And if you ever need that scalability, you can swap the small local server for the fancy solution, all without having to change how the persistence works, because it speaks the same protocol. It also allows you to move to the cloud and use a managed S3.
The downside is that you have to deal with the S3 API and its limitations. Depends on what your needs are now and what you anticipate in the future, I guess.
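The swap is often just an endpoint change; for example with the AWS CLI (endpoint URL and bucket name are placeholders):

```shell
# same command against a local S3-compatible server...
aws s3 cp backup.tar s3://my-bucket/ --endpoint-url http://localhost:9000

# ...and against real AWS S3: drop the endpoint override, nothing else changes
aws s3 cp backup.tar s3://my-bucket/
```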
I don’t feel super strongly, but object storage is a much nicer, more reliable, more usable, more secure, more scalable interface than the file system. On the other hand, the fact that object storage doesn’t have hard links seems insane.
For cloud use cases that's true, but it's also, albeit unintendedly, useful for storage disaggregation, where the average latency is much higher than a typical network file system (like NFS) expects. I've been using it paired with my NAS to make room for uploaded blobs, and so far an S3-like interface feels like the best solution here.
While I get it and sympathize, I find it funny that anyone would call that great big XML protocol mess that is S3 "simple". Yes, people managed to create relatively simple tooling for it, but S3 is a huge mess of a protocol and very far from simple.
Yet more reason to applaud these tools though.
The Pigsty guy claims to be continuing development for security patches.
Sooo… FTP?
https://rustfs.com/ perhaps? Saw this recently on that: https://www.youtube.com/watch?v=KfuFJS14aQY&t
Last time I looked into it, it did not inspire confidence at all with things like a hardcoded token which allowed bypassing all authentication... https://github.com/rustfs/rustfs/security/advisories/GHSA-h956-rh7x-ppgj
This kind of thing is not a one-off, either; if you look through the GitHub security page for it, there are several more vulnerabilities of the sort!
The project is clearly mostly vibe-coded, and marketing it as a competitor to MinIO is irresponsible at best. I wouldn't touch it with a ten-foot pole, honestly.
The comment about Versity being little-known outside of National Labs is amusing to me. I suppose everyone has a moment when they first discover the extensive universe of commercially supported, open-source software that spun out of the DoE labs and is used almost exclusively by them.
The upside of these projects is that the companies behind them tend to be a few PhDs who took a tech transfer option out of the labs because they wanted to work on their pet software project even more full time. These are some of the most passionate little software companies I have ever seen, engineers who have never worked in the mainstream software industry and are just happy to make a salary for working on their favorite project. I can't think of much that's more "enshittification-proof."
I'll go further: I have never heard of Versity, I don't know what you mean by National Labs (is this a proper name, or an abbreviation?), and I don't know what the E stands for in DoE. Energy? So the US? (Yeah, I could go search now, and I'm not asking you to answer, just think about this response.)
So then maybe this is so US-centric that 90% of the planet's population working in tech has zero touch points.
Not so much US-centric, just science-centric. I would expect someone who knows what the Max Planck institutes in Germany are, for example, to also know the US National Labs and that the Department of Energy provides much of their funding.
I'm definitely using jargon here, but I didn't think it made sense to explain the entire context of the Second World War, large-scale collaboration between academic institutions and the US government creating research centers, the development of the Federally Funded Research and Development Center model to fund these institutions, and the renaming of most of them as "National Laboratories" in order to create more consistency. At least not in a Lobsters comment.
But if you work in areas like HPC, they're some of the most prominent organizations in the world, so you will end up interacting with the many software projects (both open source and commercial) that they fund.
Just wait until someone brings up the MITRE Corporation...
I'm a fan of off-label usage of the Docker container registry. Easy protocol, easy setup, a content-addressable store for blobs; then use the digests (sha256 hashes) in place of blobs everywhere else.
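A sketch of the registry v2 blob API this relies on, assuming a registry running on localhost:5000 and a repository named "myblobs" (note that registries address content by sha256 digest):

```shell
# the blob's content address
digest="sha256:$(sha256sum blob.bin | awk '{print $1}')"

# 1. open an upload session; the registry answers with a Location header
upload_url=$(curl -si -X POST http://localhost:5000/v2/myblobs/blobs/uploads/ \
  | tr -d '\r' | awk -F': ' 'tolower($1)=="location" {print $2}')

# 2. complete the upload in one PUT, keyed by the digest
#    (registries typically return a Location that already carries query
#    parameters, hence the '&')
curl -X PUT --data-binary @blob.bin "${upload_url}&digest=${digest}"

# 3. anyone holding the digest can fetch the blob back
curl -o fetched.bin "http://localhost:5000/v2/myblobs/blobs/${digest}"
```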
I'm a fan of using object storage as an off-label container registry. It's like what you said, but the opposite: just store blobs and let docker/podman push them and get them back.
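With the reference registry (distribution/distribution) that's just configuration; a sketch with placeholder bucket and credentials, pointed at any S3-compatible endpoint:

```shell
# run the registry with S3 (or an S3-compatible server) as its blob store
docker run -d -p 5000:5000 \
  -e REGISTRY_STORAGE=s3 \
  -e REGISTRY_STORAGE_S3_BUCKET=my-registry-bucket \
  -e REGISTRY_STORAGE_S3_REGION=us-east-1 \
  -e REGISTRY_STORAGE_S3_REGIONENDPOINT=http://localhost:9000 \
  -e REGISTRY_STORAGE_S3_ACCESSKEY=placeholder-access-key \
  -e REGISTRY_STORAGE_S3_SECRETKEY=placeholder-secret-key \
  registry:2
```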
I like this too, and soon with things like podman artifact, I'm not sure I'd even consider it off-label.
Write files to filesystem tree. Do it immutably (that is, don't modify existing files), run rsync in a loop from every node and to every node forever. Every node is eventually consistent.
If this ever stops scaling I'll just start sharding.
Everything else is too complicated.
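A minimal sketch of that loop, assuming SSH access between nodes and the same data directory path everywhere (hostnames and paths are placeholders):

```shell
# every node runs this; files are write-once, so --ignore-existing
# never clobbers anything and the nodes converge on the union of all files
PEERS="node1 node2 node3"
DATA_DIR=/srv/blobs

while true; do
  for peer in $PEERS; do
    rsync -a --ignore-existing "${peer}:${DATA_DIR}/" "${DATA_DIR}/"
  done
  sleep 60
done
```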
If this ever stops scaling I'll just start sharding.
I suspect this is gonna stop scaling very abruptly around (1/N^2) utilisation, where N is the number of nodes: every node rsyncs from every other node, so the cluster is running on the order of N^2 full-tree scans per sync interval.
That is, if you have 10 nodes, expect it to fall over at around 1% of what a single box can do.