Automating What Backblaze Lifecycle Rules Don't Do Instantly
15 points by Tymscar
Rclone is awesome! I've been using it for over 5 years, in daily cron jobs, with zero issues. I love their encrypted remote setup. A true set-it-and-forget-it tool.
I learned this exact lesson the hard way with my photo backup service. I was running a health check that wrote the same 10 MB file every 15 seconds to see whether the S3 service was still there or whether we needed to hot-failover to the other provider (Hetzner). I guess that was the most expensive health check I have ever had running.
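To put rough numbers on that health check, here is a minimal back-of-the-envelope sketch. It assumes every upload lingers as its own billable version until something purges it, and it uses decimal gigabytes; the function name `health_check_volume` is just made up for illustration, and no prices are included because they are not given above.

```python
# Rough volume written by a health check that uploads a 10 MB file every 15 seconds.
# Assumption: each upload remains stored as a separate version until it is purged.
FILE_SIZE_MB = 10
INTERVAL_SECONDS = 15
SECONDS_PER_DAY = 24 * 60 * 60


def health_check_volume(days: int) -> tuple[int, float]:
    """Return (number of uploads, decimal gigabytes written) over `days` days."""
    uploads = days * SECONDS_PER_DAY // INTERVAL_SECONDS
    gigabytes = uploads * FILE_SIZE_MB / 1000
    return uploads, gigabytes


if __name__ == "__main__":
    for days in (1, 30):
        uploads, gigabytes = health_check_volume(days)
        print(f"{days} day(s): {uploads:,} uploads, {gigabytes:,.1f} GB written")
```

Under those assumptions that is 5,760 uploads and 57.6 GB per day, so anything that keeps old versions around adds up quickly.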
The worst thing is that if you don’t use the B2 native API or tooling, you don’t even see this, because the S3 API will straight-up lie to you about file presence and storage usage. It gets worse: the default lifecycle settings treat deletes as edits (hides), which meant files from expired trial accounts were never actually deleted. The only way to detect this was through the monthly invoices and the Backblaze web console.
Now that I had to clean up millions of hidden file versions, I checked the B2 API several times, because I couldn’t believe there is no bulk delete option like in the S3 API, where you can at least delete 1000 keys at once.
Long story short, the cleanup job for 25 million items ran over 2 days using GNU parallel.
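For anyone facing the same cleanup, here is a rough sketch of the one-delete-per-version fan-out in Python instead of GNU parallel (so not what I actually ran). `delete_one` is a hypothetical callable you would supply yourself, for example a wrapper around the native `b2_delete_file_version` call, which takes a file name and a file ID; the worker count and batch size are arbitrary assumptions, and there is no retry or rate-limit handling.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Version = Tuple[str, str]  # (file_name, file_id)


def batched(items: Iterable[Version], size: int) -> Iterator[List[Version]]:
    """Yield lists of at most `size` items so memory use stays bounded."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def delete_versions(
    versions: Iterable[Version],
    delete_one: Callable[[str, str], None],  # hypothetical: deletes one file version
    workers: int = 32,
    batch_size: int = 10_000,
) -> Tuple[int, int]:
    """Delete versions one call at a time across a thread pool.

    Returns (deleted, failed). Failures are only counted, not retried.
    """

    def attempt(version: Version) -> bool:
        file_name, file_id = version
        try:
            delete_one(file_name, file_id)
            return True
        except Exception:
            return False

    deleted = failed = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batched(versions, batch_size):
            for ok in pool.map(attempt, batch):
                if ok:
                    deleted += 1
                else:
                    failed += 1
    return deleted, failed


def required_rate(items: int, days: float) -> float:
    """Average deletes per second needed to finish `items` in `days` days."""
    return items / (days * 86_400)


if __name__ == "__main__":
    # Dry run against a fake delete function; no network calls are made.
    fake_versions = ((f"photo-{i}.jpg", f"id-{i}") for i in range(1_000))

    def fake_delete(file_name: str, file_id: str) -> None:
        if file_id.endswith("7"):
            raise RuntimeError("simulated failure")

    print(delete_versions(fake_versions, fake_delete, workers=8, batch_size=100))
    print(round(required_rate(25_000_000, 2)), "deletes per second on average")
```

For scale, 25 million items in two days works out to roughly 145 deletes per second on average, which is why the single-item API hurts so much compared with 1000 keys per request.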
Yeah, honestly, whenever I do anything with online services, be it backups or just AWS stuff, I am always so scared that some weird hidden thing will suddenly cost me thousands overnight.
I’m really happy with restic -> B2 for backups. Just yesterday I set it up on my father’s laptop using the restic CLI in macOS Shortcuts, bringing his monthly bill down from $10 (unlimited computer backup) to 25 cents. I have the same setup for my laptop but also back up my homelab to the same repository so they dedup together.
I’m also using “Keep only the last version” since restic handles retention, so this is good to know! They should make this clearer.