Gnutella: A Protocol Outliving the World That Created It
99 points by rickcarlino
The funniest thing about Gnutella is that it has nothing to do with the GNU project, they just thought GNU was neat so they put it in their name.
Gnutella was good at providing file downloads that matched search queries, and that is what history remembers it for. Loads and loads of easy downloads. Usually MP3s.
Ahh, yeah. That and the other thing. Thanks for the nostalgia trip, it's a shame the web has turned into the monster that it is.
Slight digression, but when I went to college in the mid 2000s my dorm had a flat network, and iTunes was both extremely popular and shared all your music with anyone on the network. I filled up my brand new, 64-bit(!) HP laptop from Circuit City with Evanescence and Green Day in like a week. Eat my shorts, Columbia House CD Club.
Memories of my dorm having a fileshare network that the IT department unofficially knew all about (because they weren't idiots) but officially knew nothing about (because they were cool). Until a few years after I graduated when someone mentioned something in front of the wrong person, and then they officially knew about it and had to shut it down.
Dtella, by any chance? i have fond memories of a brief stint on the release team of our university’s (rogue, IT-disapproved) Dtella network, probably the first opportunity i had to work somewhat closely with other tech people.
edit: ohh i just realized you are talking about the network itself, not “a flat network” (of eg file sharing nodes) built atop the university network. still, university was a fun time for this sort of thing~
university was a fun time for this sort of thing
My university has blocked off connections between the different dorm buildings this year, and I just couldn't talk the admins into not breaking my shit :( This was the year I was going to finally set up a proper DC hub for my friends too...
I fear this might become more and more common as centralized services displace things people used to run themselves. Most people nowadays have pretty much no use for accessing other people's computers on the LAN, so there isn't much incentive to keep these networks "open".
That's a shame, back in the mid-2000s RIT had a massive DC hub on our residential network (which was also just generally excellent, you got a real routable IP in your dorm room, no NAT) and the IT department looked the other way because it kept us all from saturating the external links with bittorrent traffic.
Thanks for writing this. I often wonder how much of Gnutella's core tech could be re-used for newer small-web, Gemini-esque content discovery.
Many have wrongly asserted that Gnutella "failed", but that's not a fair representation of what happened.
...
Gnutella stood the test of time and solved problems for a software user that no longer exists. It's still there today, chugging along at reduced capacity.
"It failed" is an incomplete idea. You cannot succeed or fail generically. You can only succeed or fail relative to some goal. There's two big reasons that software user no longer exists, both of which can be framed as failures:
I often wonder how much of Gnutella's core tech could be re-used for newer small-web, Gemini-esque content discovery.
If you ever want to collaborate on something, feel free to drop me a line. I actually had a similar idea:
I would love to hear what ideas you might have in the gemini + gnutella space. I am pretty easy to find (Linkedin, Reddit, Fediverse, etc...) and have contact info on my blog.
Aside from the specific tech choices such as using gemtext, this sounds similar to what Hyphanet (née Freenet) did. Roughly speaking, each site is a bundle of static files, and each bundle is versioned and signed. So when you click on a link, the URL contains the public key, and it uses that to do a P2P download of the latest version of the linked-to page.
I think there's room for another system like this, but I'm curious what would set it apart—maybe with the Gemini angle it would be more minimalist?
I believe a lot of the authors of early Gnutella were inspired by the original freenet project. I have found references and callouts to it in older RFCs for extensions to the protocol (though I can't remember which off the top of my head).
I like the concept of a content delivery mechanism on top of Gnutella because:
That being said, I must admit I assumed "old school Freenet" died in the 2000s. I will be re-visiting the project this weekend, thanks!
There's a gemini extension called gempub that could be useful for something like this.
I have sketched some ideas for a similar kind of network. PlanetP is another P2P search network that I think has interesting properties. I should look at Gnutella again though; it's been a long time.
Do you have a link? I found a research paper from 2002 – is this the one? https://scholarship.libraries.rutgers.edu/esploro/outputs/technicalDocumentation/PlanetP-Using-Gossiping-to-Build-Content/991031549992904646/filesAndLinks?index=0
A big part of why Gnutella took off when it did was thanks to Gene Kan and Spencer Kimball, both members of Berkeley's XCF.
Spencer went on to do a lot of great engineering work at Google and now is the CEO of Cockroach Labs, the database company.
Gene had an early success selling a search company to Sun. Unfortunately he died tragically and far too young in 2002.
OnionShare is pretty fun, too! https://onionshare.org/ You can be part of the DaRkWeB!
Does this provide a search overlay or is it only for the transfer part? Looking at the docs, it looks more like a direct connect file transfer tool.
It's really only solving the part where you try to connect to another host directly, but it does so while obfuscating your identity!
AFAIR from a university exercise long ago there are no good solutions for making P2P networks both reasonably fair and open. I'm sure you could use some kind of cryptocurrency scheme to pay and be paid for bytes. It ruins the simplicity pretty hard and the community feeling as well.
From my experience, it is very hard to get to a position where you can actually contribute meaningfully. A lifetime ago I left an anime movie seeding indefinitely on a private tracker, from a VPS running in screen, for a month. In the following decade I did not manage to rack up as much outgoing traffic as I did that one time. I think ratio is overrated. Some people can run a seedbox. Some cannot.
My high school seniors, years before that, bragged about their DC++ collections and how much music they'd put out there.
I think one part that is missing about outliving-conditions, is that there was no real incentive to spam P2P search (and later P2P search spam became a thing)
Gnutella had a higher level of trust than later P2P systems, which made it easy to spam, though the incentive is probably lower now that there are likely only a few thousand users. This is probably the same reason there's little spam on IRC nowadays.
It's interesting how many parts of the protocol rely on trusting the client:
If the protocol added a bunch of key signing mechanisms or reputation management, it might have become too complicated to implement. This might have been a reason for its success. You can actually build a Gnutella client. A lot of modern P2P projects miss this, I think. Secure Scuttlebutt, which I love, comes to mind. They try to account for diverse failure and abuse cases and build something that is near perfect but end up creating an ecosystem that only has one functioning client (built by the spec author and no one else).
The same example applies to gemini:// (federated protocol rather than P2P). The spec has a bunch of problems and plot holes, but ultimately people actually built clients for the spec and there is a decent diversity in that ecosystem, despite the problems.
They try to account for diverse failure and abuse cases and build something that is near perfect but end up creating an ecosystem that only has one functioning client (built by the spec author and no one else).
Oh yes. One quirk that stood out to me: the content hash was computed over JSON, requiring each client to serialize JSON numbers (and everything else) exactly like that one specific NodeJS version.
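To make the JSON hashing problem concrete, here is a minimal Python sketch. It is not Secure Scuttlebutt's actual hashing scheme: the `message` dict, the use of SHA-256, and the `canonical` helper are all assumptions made for illustration. It only shows that the same data hashes differently under different serializations, and that clients agree only once they share one canonical form.

```python
import hashlib
import json


def content_hash(serialized: str) -> str:
    """Hash the exact bytes of a serialized message (SHA-256 chosen for illustration)."""
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def canonical(obj) -> str:
    """One possible canonical form (an assumption): sorted keys, no extra whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


# Hypothetical message; the field names are made up for this example.
message = {"sequence": 1, "content": {"type": "post", "text": "hi"}, "score": 1.0}
# Same data, different key order, as another client might build it.
reordered = {"score": 1.0, "content": {"text": "hi", "type": "post"}, "sequence": 1}

compact = json.dumps(message, separators=(",", ":"))
pretty = json.dumps(message, indent=2)

# Same data, different bytes, different hash.
hashes_differ = content_hash(compact) != content_hash(pretty)

# With an agreed canonical form, both clients hash identical bytes.
canonical_agrees = content_hash(canonical(message)) == content_hash(canonical(reordered))

print(hashes_differ, canonical_agrees)
```

Even this canonical form only holds within one language: Python writes the float above as `1.0`, while JavaScript's `JSON.stringify` writes it as `1`, which is the kind of number-formatting mismatch the comment describes.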
Advertised file counts, bandwidth, etc. are all essentially based on trusting the user to tell the truth. A more complicated protocol might have tried to actually verify these claims.
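As a rough illustration of how little stands behind those numbers, here is a Python sketch of a Pong payload. It assumes the Gnutella 0.4 layout as I remember it: a 14-byte payload with port, IPv4 address, files shared, and kilobytes shared, the integers little-endian and the address in network order. The function names are hypothetical, and the message header is left out.

```python
import socket
import struct

# Assumed layout: port (2 bytes), IPv4 address (4 bytes),
# files shared (4 bytes), kilobytes shared (4 bytes).
PONG_FORMAT = "<H4sII"


def build_pong_payload(port: int, ip: str, files_shared: int, kilobytes_shared: int) -> bytes:
    """Pack whatever the client chooses to claim; nothing here measures anything."""
    return struct.pack(PONG_FORMAT, port, socket.inet_aton(ip), files_shared, kilobytes_shared)


def parse_pong_payload(payload: bytes) -> dict:
    """The receiving peer can only read the claims, not check them."""
    port, raw_ip, files_shared, kilobytes_shared = struct.unpack(PONG_FORMAT, payload)
    return {
        "port": port,
        "ip": socket.inet_ntoa(raw_ip),
        "files_shared": files_shared,
        "kilobytes_shared": kilobytes_shared,
    }


# A client sharing nothing can still advertise a huge library.
payload = build_pong_payload(6346, "10.0.0.5", 1_000_000, 4_000_000_000)
claims = parse_pong_payload(payload)
print(len(payload), claims)
```

The receiving side has no field it could use to verify the counts, so any checking would need extra protocol machinery of the sort discussed above.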
Offhand my understanding is that Bittorrent has mechanisms to mitigate lying about bandwidth? I seem to recall reading somewhere many years ago that you can get away with overstating your bandwidth by a factor of up to 2x in a Bittorrent swarm but above that the other peers will eventually stop believing you about it.
I don't think BitTorrent cares about your bandwidth in the first place (at least v1). The closest it gets is in the uploaded key sent to the tracker (which counts the amount of bytes uploaded in this session), which the tracker could use to estimate your bandwidth, but I don't know if any trackers actually bother with this. opentracker seems to ignore it completely.
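For reference, a small Python sketch of the announce parameters in question, plus the kind of estimate a tracker could make from two successive announces. The parameter names are the standard v1 announce keys. The helper names and the estimation logic are hypothetical, not taken from opentracker or any other tracker.

```python
from urllib.parse import urlencode


def announce_query(info_hash: bytes, peer_id: bytes, port: int,
                   uploaded: int, downloaded: int, left: int) -> str:
    """Build the query string of a v1 tracker announce (all values self-reported)."""
    return urlencode({
        "info_hash": info_hash,   # 20 raw bytes, percent-encoded
        "peer_id": peer_id,       # 20 raw bytes, percent-encoded
        "port": port,
        "uploaded": uploaded,     # bytes uploaded so far, as claimed by the client
        "downloaded": downloaded,
        "left": left,
    })


def estimate_upload_rate(prev_uploaded: int, prev_time: float,
                         cur_uploaded: int, cur_time: float) -> float:
    """Hypothetical tracker-side estimate: bytes per second between two announces."""
    elapsed = cur_time - prev_time
    if elapsed <= 0 or cur_uploaded < prev_uploaded:
        return 0.0  # counter reset or bad timestamps: no usable estimate
    return (cur_uploaded - prev_uploaded) / elapsed


query = announce_query(bytes(20), b"-XX0001-abcdefghijkl", 6881, 1_800_000, 0, 0)
rate = estimate_upload_rate(0, 0.0, 1_800_000, 1800.0)
print(query)
print(rate)
```

Since `uploaded` is reported by the client itself, this estimate rests on the same trust in the client that the thread has been discussing.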
Hm. I thought they used it to try to prioritise whom to send rare chunks to first, on the assumption that you'll share them to the rest of the swarm.