Backing up Spotify

112 points by Aks


dzwdz

Scraping Spotify is based, but I disagree with every point in their reasoning.

Over-focus on the highest possible quality. Since these are created by audiophiles with high end equipment and fans of a particular artist, they chase the highest possible file quality (e.g. lossless FLAC).

Yes, that's great for archival.

This inflates the file size and makes it hard to keep a full archive of all music that humanity has ever produced.

No, you can just transcode media. I believe private trackers incentivize creating torrents for both a lossless and a "decent" lossy version of an album too, for instance.
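To make the "just transcode" point concrete, here's a minimal sketch of deriving a lossy copy from a lossless rip. It assumes `ffmpeg` built with `libopus` is on your PATH; the function names, the 128k bitrate, and the directory layout are my own illustrative choices, not anything from the blog post or any tracker's rules.

```python
import subprocess
from pathlib import Path


def opus_command(src: Path, dst_dir: Path, bitrate: str = "128k") -> list[str]:
    """Build an ffmpeg command that transcodes one lossless file to Opus.

    The bitrate default is an arbitrary example, not a recommendation.
    """
    dst = dst_dir / src.with_suffix(".opus").name
    return [
        "ffmpeg",
        "-n",              # never overwrite an existing output file
        "-i", str(src),
        "-vn",             # drop embedded cover art (stored as a video stream)
        "-c:a", "libopus",
        "-b:a", bitrate,
        str(dst),
    ]


def transcode_album(album_dir: Path, dst_dir: Path, bitrate: str = "128k") -> None:
    """Transcode every .flac file in album_dir into dst_dir (needs ffmpeg installed)."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(album_dir.glob("*.flac")):
        subprocess.run(opus_command(src, dst_dir, bitrate), check=True)
```

The lossless files stay untouched as the archival master, and the lossy set can be regenerated at any bitrate later. That only works in this direction: you can't get the lossless version back from the lossy one.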

No authoritative list of torrents aiming to represent all music ever produced. An equivalent of our book torrent list (which aggregate torrents from LibGen, Sci-Hub, Z-Lib, and many more) does not exist for music.

Private trackers? I do agree that they're not perfect for preservation: they're, well, private, and some rules are ridiculously strict (why the hell do you insist people use their home IPs on a website dedicated to committing felonies?) - on the other hand, I've heard the incentives for long-term seeding are good... but I doubt that's necessary [1, 2].

Still - I've (sadly) used Spotify enough to see the tail of badly tagged albums, missing tracks, etc. I would absolutely not call a Spotify scrape authoritative.

Over-focus on the most popular artists. There is a long tail of music which only gets preserved when a single person cares enough to share it.

...is also hilarious considering how many artists I know aren't on Spotify, with even pretty popular ones removing their music from the platform [1, 2, 3]. Just yesterday I noticed that half of the tracks from an EP I used to listen to a lot are missing.

I also don't really see the issue with the "only gets preserved when a single person cares enough to share it" part. That's how most pirate libraries function, and they're doing great. It can be spun as a positive - there's a lot of AI slop on Spotify nowadays, and the "someone has to care about it" step filters it out. Not that it's a high bar either. I've heard of people buying obscure records and ripping them just to get some buffer on a private tracker.

For popularity>0, we got close to all tracks on the platform. The quality is the original OGG Vorbis at 160kbit/s. [...] For popularity=0, we got files representing about half the number of listens (either original or a copy with the same ISRC). The audio is reencoded to OGG Opus at 75kbit/s

Jesus. So we're re-encoding from one lossy codec to another? That doesn't strike me as a great idea: each lossy pass throws away more of the signal, and you can't get it back.


For now this is a torrents-only archive aimed at preservation, but if there is enough interest, we could add downloading of individual files to Anna’s Archive. Please let us know if you’d like this.

Please don't. The last thing we need is the RIAA on your ass. Anna's Archive is absolutely amazing for ebook piracy, but we already have much better alternatives available for music (that are just as easily accessible, but superior in both quality and available material).

This makes me think of the "free lending" blunder archive.org made around the lockdowns (IIRC). It put them in a lot of unnecessary legal trouble, threatening their other, IMO much more valuable, endeavors. Obviously the situation here is a bit different but I'm still uneasy about this whole thing.