Password reuse is rampant: nearly half of observed user logins are compromised
25 points by gmem
The point’s been raised elsewhere on social media, but just to mention it here: Cloudflare appears to have got this data by spying on cleartext usernames and passwords passed through their infrastructure between web service users’ browsers and their clients’ servers, and then analysing the data they got from that spying.
The ethics here are … questionable. Cloudflare, you will recall, claims to be neutral internet infrastructure. Imagine the phone company putting out a press release one day announcing how many phone calls included someone entering a credit card number over touch-tone on unsecured lines. (Okay, that doesn’t happen very much any more. Imagine it had happened in the 1990s.)
I know there are plenty of good arguments to avoid Cloudflare, but I don’t think I buy this one.
If I understand correctly, this is an entirely optional security feature. The website owner can disable this. Of course users might have a problem with it, but it’s nothing new that users must place their trust in website owners’ decisions.
As a user, do I have a problem with the fact that the website is running on a VPS in the cloud where the cloud operator could easily steal my passwords? Not really.
Or if you own your hardware and run your site from your basement, then do I have a problem with the fact that your website is being run on a compromised Gigabyte motherboard? Actually yes.
Anyway, these security-related decisions are very nuanced. I don’t think it’s necessarily an ethical problem for Cloudflare to offer such an optional feature.
Yeah. This is pretty much exactly what I’d expect from them, and you can read further between the lines with a standard-issue tinfoil thinking cap on.
Maybe you can read it as a warning notice. “If you use plaintext auth, any middlebox can compromise your service. See, we already did!”
It’s always the same issue: people deploying websites without knowing how things work.
If your SSL/TLS termination point is not on the same infrastructure as your application server (a different provider, or a different region of the same provider), anyone between that termination point and your application server can read your users’ passwords.
This calls for two technical decisions: where you terminate TLS, and whether you re-encrypt traffic between that termination point and your application server.
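To make the exposure concrete: a minimal sketch (hostname, port, and form-field names are made up) of what any TLS-terminating middlebox holds once it has decrypted a login request. A real proxy would forward the request onward; this one just prints what it saw.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

class TerminatingProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        creds = parse_qs(self.rfile.read(length).decode())
        # Post-decryption, the credentials are plaintext to this process,
        # whether or not it re-encrypts toward the application server.
        print("username:", creds.get("username"),
              "password:", creds.get("password"))
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), TerminatingProxy).serve_forever()
```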
I agree with your broad point, and with the general vibe that Cloudflare is bad for the internet, but it’s not that simple. We don’t know what percent of these connections negotiates TLS to the real backend.
At least when you use Cloudflare for DNS, it suggests by default to “proxy” your traffic through their CDN, with auto-TLS, anti-DDoS, statistics, etc. You just came for DNS and get all those features by default; you have to explicitly disable the proxying if you don’t want them terminating your TLS.
So I would think a majority of users there have it enabled.
Genuine question: who’s coming to Cloudflare “just” for DNS?
Most people would use Cloudflare for its CDN features, but it wouldn’t allow them to do what they are doing here.
The key, I think, is that they offer such a complete, stable, and easy-to-use feature set that once you start using one feature it’s very easy to add more. Oh, I use the CDN, and if I add the DNS too I get all those other features for free.
I’m confused - in this comment you’re saying “most people would use Cloudflare for its CDN features”, but in grandparent you were talking about people using Cloudflare for DNS. Which is it that you think is popular and leads people to use the other?
Also, using Cloudflare just for its CDN features would absolutely allow Cloudflare to do what they’re doing here. (Unless you mean just pointing cdn.example.com at Cloudflare for static assets, instead of putting example.com behind a CDN?)
Also, using Cloudflare just for its CDN features would absolutely allow Cloudflare to do what they’re doing here.
Would your front-end send a POST request to Cloudflare’s CDN? The cached assets would definitely come from their CDN, but the HTML form (or the JS, for that matter) wouldn’t send the login request to the CDN server. Am I missing something here? Happy to be wrong and learn something.
Which is it that you think is popular and leads people to use the other?
It doesn’t matter which feature drags you into their world first. My point is that once you use one of their features, the feature set is so well tailored and easy to use that you readily hand over more of your infra for a lot of convenience. It doesn’t matter whether that’s DNS, CDN, or their DDoS protection.
Would your front-end send a POST request to Cloudflare’s CDN? The cached assets would definitely come from their CDN, but the HTML form (or the JS, for that matter) wouldn’t send the login request to the CDN server. Am I missing something here? Happy to be wrong and learn something.
“CDN” is just a fancy semi-defined-but-not-really word for “layer of geographically well-distributed caching reverse proxies”, and the answer to your question depends on how you set it up. Say you have a cache-busted SPA build that you want to serve and run on your website, example.com. Your SPA has an entry point main.ak6j3p.js, which is referenced in your HTML and at some point will make a POST request to your app server at https://example.com/api/graphql. Where is this JS file hosted? That’s the key question. (Note that I’m picking a SPA because it makes the choice more obvious, since it implies a lot of static assets for which caching is important and useful, but this tradeoff also applies to server-side rendered apps.)
The two main options (exact path layout bikeshedding aside) laid out in grandparent are https://cdn.example.com/static/js/main.ak6j3p.js or https://example.com/static/js/main.ak6j3p.js. These look very similar but they are actually very different. The former (cdn.) case implies a very clean separation between static assets and dynamically rendered responses: your SPA bundle etc. goes on cdn., which at the DNS level points to the CDN, and your GQL POST requests go to example.com, which at the DNS level points to your app servers.
But maybe you’re lazy, and you don’t want to maintain a clean separation between static assets and everything else. You’ve already tuned some caching headers for your SPA bundle, so let’s just throw a CDN in front of that bad boy and make everything faster, automagically. This is the https://example.com/static/js/main.ak6j3p.js case. Here you want the caching benefits of the CDN, so example.com MUST point at the CDN at the DNS level. But that implies your GQL POST request is also going through the CDN! It won’t be cached, because POST requests aren’t cacheable, but it’ll still go through the CDN. This isn’t an unreasonable or, I suspect, uncommon setup.
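A tiny illustration of the difference (URLs lifted from the hypothetical above): in the lazy setup the asset URL and the API URL share a single hostname, so a single DNS answer, i.e. the CDN, receives both, login POST included.

```python
from urllib.parse import urlsplit

# Hypothetical URLs from the two layouts sketched above.
split_setup = ["https://cdn.example.com/static/js/main.ak6j3p.js",
               "https://example.com/api/graphql"]
lazy_setup = ["https://example.com/static/js/main.ak6j3p.js",
              "https://example.com/api/graphql"]

for name, urls in (("split", split_setup), ("lazy", lazy_setup)):
    hosts = {urlsplit(u).hostname for u in urls}
    # One hostname means one DNS answer; if that answer is the CDN,
    # the login POST rides through the CDN too.
    print(f"{name}: {hosts}")
```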
But if they didn’t terminate the TLS connection, TLS would prevent them from looking at the passwords. So, my understanding is that none of these have TLS to the real backend.
Just because you’re terminating TLS doesn’t mean you can’t negotiate a new TLS connection on the backend. See this page.
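For the unfamiliar, a minimal sketch of that second hop (the origin hostname is a placeholder; in Cloudflare’s terminology this is roughly their “Full (strict)” mode, where the origin certificate is also verified):

```python
import socket
import ssl

# After terminating the client's TLS, a proxy can open a fresh,
# certificate-verified TLS session toward the origin.
ctx = ssl.create_default_context()
with socket.create_connection(("origin.example.com", 443)) as raw:
    with ctx.wrap_socket(raw, server_hostname="origin.example.com") as tls:
        tls.sendall(b"GET / HTTP/1.1\r\nHost: origin.example.com\r\n\r\n")
        print(tls.recv(1024).decode(errors="replace"))
```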
A new TLS connection is not relevant to the point that Cloudflare has access to all the traffic, though.
Definitely not. I was replying to this specific point:
If your SSL/TLS termination point is not on the same infrastructure as your application server (a different provider, or a different region of the same provider), anyone between that termination point and your application server can read your users’ passwords.
It’s perfectly possible that some significant portion of the people deploying these websites did not, as great-grandparent put it, “[deploy] websites without knowing how things work” - and because they knew how things worked, turned on TLS to the backend, making a conscious tradeoff to trust Cloudflare with this access but securing the rest of the communication path.
Again, I think Cloudflare is generally bad for the internet and probably pulls some really shady shit as a business. But let’s not pretend their feature set isn’t valuable.
I’m curious if you think that this data is more sensitive than anything on radar.cloudflare.com. I’d assume they do something smarter for analyzing password reuse than collecting all sent passwords in a giant list in cleartext. I also don’t see any indication that usernames were involved let alone associated with passwords.
This seems to be contradicted by the text in the article?
When we perform these checks, Cloudflare does not access or store plaintext end user passwords. We have built a privacy-preserving credential checking service that helps protect our users from compromised credentials. Passwords are hashed – i.e., converted into a random string of characters using a cryptographic algorithm – for the purpose of comparing them against a database of leaked credentials
Cloudflare does not access or store plaintext end user passwords.
Passwords are hashed – i.e., converted into a random string of characters using a cryptographic algorithm – for the purpose of comparing them against a database of leaked credentials
When they say they are “hashing passwords”, that means they started with a plaintext password and fed it into a hashing algorithm.
Perhaps “Cloudflare does not access or store plaintext end user passwords” in the sense that they don’t reach into their customers’ databases and extract plaintext passwords, but they do definitely have access to plaintext end user passwords. The service people pay them to provide requires them to have access to plaintext end user passwords. If they quietly pretended that they don’t (in the same way the postal service promises not to open letters you send, even though they could), that would be one thing, but evidently they believe it’s within their rights to examine and analyse these things as they’re passed along.
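To spell out the hashing point in code: producing the hash requires holding the plaintext first. A toy sketch, with made-up password and “database” contents; Cloudflare’s actual scheme is surely more elaborate than a bare SHA-1 set lookup.

```python
import hashlib

# Hashes of passwords known to be leaked (contents made up).
leaked = {hashlib.sha1(p.encode()).hexdigest() for p in ("123456", "hunter2")}

submitted = "hunter2"  # plaintext, as seen after TLS termination
# You cannot compute this digest without the plaintext in hand.
compromised = hashlib.sha1(submitted.encode()).hexdigest() in leaked
print("compromised:", compromised)  # True
```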
Cloudflare once again reminds us that it’s operating the largest sustained MitM attack of all time.
Orientation at the company I work for nearly forced us (very politely but strongly insisted) to make completely new passwords for company accounts, because breaches had occurred multiple times from passwords leaking out of employees’ other, unrelated accounts that happened to share the same password. It was apparently their most common vulnerability.
Before Microsoft instituted internal universal 2FA for everything it was easy pickings for the red team to see how many logins they could crack with the password “seahawks2014”
I’m pretty sure Lobsters recently saw a couple of account takeovers by a spammer and have an open feature request. If someone has implemented this in Rails before, I’d really appreciate expert advice.
This is of course not an attack on anyone, or an ask for anyone (Lobsters/HIBP folks both) to do anything different, but: deeply ironic that the HIBP database has a Cloudflare sponsor shoutout at the bottom of the page, given the ethical concerns folks are raising in this comments section so far.
Related post from Troy Hunt on why it uses CF.
The problem is that, in my experience, CF is too easy and too good against bot spam not to use, even fully knowing why you shouldn’t. Especially in the light of AI scrapers.
This is the most important comment here! And not just because of lobste.rs, but to let people know that they should all do this on their websites: check against HIBP!
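For anyone implementing this: HIBP’s range API is k-anonymous, so only the first five hex characters of the password’s SHA-1 ever leave your machine. A minimal sketch in Python (the Rails translation is straightforward):

```python
import hashlib
import urllib.request

def hibp_pwned_count(password: str) -> int:
    """Query HIBP's k-anonymity range API for a password.

    Only the first five hex characters of the SHA-1 are sent; the API
    returns every suffix in that bucket and we match locally.
    """
    digest = hashlib.sha1(password.encode()).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    url = f"https://api.pwnedpasswords.com/range/{prefix}"
    with urllib.request.urlopen(url) as resp:
        for line in resp.read().decode().splitlines():
            candidate, _, count = line.partition(":")
            if candidate == suffix:
                return int(count)
    return 0

if __name__ == "__main__":
    # A widely breached password; expect a large count.
    print(hibp_pwned_count("password1"))
```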
I recently changed the “new password” functionality in mox to only have a “Generate new password” button. You don’t get to choose it, at least by default; the admin can configure, per account, whether users are allowed to pick their own passwords.
This was after a feature request for having unguessable usernames (instead of email addresses as usernames), to prevent brute force login attempts. If you don’t trust people to pick safe passwords, better do it for them!
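For reference, the “generate it for them” approach is a few lines in most languages. A sketch in Python; mox itself is written in Go and its actual generator may differ:

```python
import secrets

# ~128 bits of entropy in a URL-safe alphabet, from the OS CSPRNG.
def generate_password(nbytes: int = 16) -> str:
    return secrets.token_urlsafe(nbytes)

print(generate_password())  # e.g. 'Jc9Yl0...'
```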
Users with password managers won’t mind. Users without password managers may start using them, or use “login with email” (though that doesn’t work for mox, a mail server). Hopefully users won’t store the generated passwords in a cloud spreadsheet, though that would probably still be safer than reusing passwords.
Even strong passwords are phishable. At this point, might as well implement passkeys so your users can log in with simple biometrics instead of having to jump through hoops managing the password.
True, generating passwords helps against password reuse (rampant apparently), but not against phishing.
I’m not using passkeys yet. Have read about it, but need to find the time to understand all the details and implement them. But first figure out how to use them properly on Linux, there doesn’t appear to be a unix system API for managing and using passkeys.
You can’t use passkeys yet with IMAP/SMTP (submission). Work has just started on a SASL authentication mechanism for passkeys, and a SASL mechanism for reusing a previous authentication token (you don’t want your email client to ask for confirmation every time the client makes a connection!). There are challenges around registering the token for use in the IMAP/SMTP context too. Also see yesterday’s IETF kitten workgroup agenda, https://datatracker.ietf.org/meeting/122/materials/agenda-122-kitten-00.
In mox, you set the password in the web interface. Brute force login attempts happen mostly over SMTP and IMAP.
I am not saying you should, but if you are using Cloudflare and they are terminating TLS for you, you can use their information about compromised logins to ask users for further authentication and prompt them to change their passwords, which is a pretty useful feature. See https://developers.cloudflare.com/waf/detections/leaked-credentials/.
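On the origin side that can be as simple as checking a request header. A hedged sketch: as I recall the docs, Cloudflare can be configured to add an Exposed-Credential-Check header to requests it forwards, but treat the exact header name and value here as an assumption and verify against the linked page.

```python
# Plain WSGI so it is framework-agnostic. Header name/value are an
# assumption; check Cloudflare's leaked-credentials docs before relying
# on them.
def app(environ, start_response):
    if environ.get("HTTP_EXPOSED_CREDENTIAL_CHECK") == "1":
        # Credentials matched a known leak: deny and ask for step-up
        # auth plus a password change.
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Please verify with a second factor and pick a new password.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"OK\n"]
```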
Can’t wait for their next post drop: “And here are the most common un-compromised-yet username and passwords”
While I agree with the message of this post, there’s one critical missing detail: What is the proportion of bot logins vs human logins in the dataset?
If half of the bot logins are successful, but 99% of those are just different bots hammering the same credentials into the same login page over and over, then that skews the data. Maybe 50% of successful logins are bots, but only 2% of accounts are actually compromised.
Again, I’m not disputing the message. But that detail is pretty important. A more useful statistic would be the percentage of actually compromised accounts.
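Toy numbers, all made up, to show how that skew can work:

```python
# Bots replaying the same leaked credentials can dominate a
# "share of logins using leaked credentials" statistic.
accounts = 1_000
compromised_accounts = 20                # 2% of accounts actually compromised
human_logins = 5_000                     # normal traffic
bot_logins = compromised_accounts * 250  # bots hammering the same creds

leaked_share = bot_logins / (human_logins + bot_logins)
print(f"{leaked_share:.0%} of logins use leaked credentials, "
      f"but only {compromised_accounts / accounts:.0%} of accounts are compromised")
```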
I’m not sure I support the word “spying on” here, at least in this case. More “the website owners willingly and maybe knowingly let a third party (cloudflare) be in a position to be able to read the data in plaintext”.
Cloudflare then violated some unspoken agreement to not do anything but forward that traffic.
Not sure how to phrase this without sounding like I’m trying to defend CF, which I am not. But my gripe is actually more with the people who handed the data to CF in the first place.
Sure, it’s not spying on the admins, but it is spying on unsuspecting users.
How do I know if my data was used here? I certainly didn’t consent.