WhatsApp is untrustable
86 points by apropos
I mean, yeah - but not just in a technical sense.
Meta owns WhatsApp. This is the very same Meta that installed snooping malware in a number of their Android apps.
Regardless of exactly how WhatsApp is implemented, this was enough for me to abandon it altogether, and write up a short blog post to send my WhatsApp contacts explaining why I was leaving.
+1; I am not a Meta hater, but you'd be a fool to trust that company given its track record
Anyone remember Onavo? I installed it when I was too young to know any better, to “save up to 80% of your data” on my 500MB/month plan.
I haven't, so I looked up their Wikipedia page.
In an email dated June 9, 2016, Facebook CEO Mark Zuckerberg directed engineers at his company to find a method of obtaining "reliable analytics" about Snapchat, which he noted that Facebook lacked due to Snapchat's network traffic being encrypted. The solution Facebook engineers proposed to Zuckerberg's directive was to use Onavo [...] It did this by creating "fake digital certificates to impersonate trusted Snapchat, YouTube, and Amazon analytics servers to redirect and decrypt secure traffic from those apps for Facebook’s strategic analysis." The program [...] was later expanded to include Amazon and YouTube.
Jesus fucking christ.
I investigated open sourcing WhatsApp while I worked there many years ago. It was too difficult to do for the iOS app for many reasons. This is somewhat mitigated by the fact that, at least at the time, you could decompile the JVM bytecode for Android to verify that there were no hijinks.
Not being able to look at the source code does not mean anything at all about it being incorrect or malicious.
Something being closed source does not mean everything it says is a lie. The inability for you to verify it does not mean that it is malicious.
Rhetoric like this does nothing to advance OSS because it just comes across as hyperbolic conspiracy theories. There is no evidence of any of the claimed lack of e2e encryption, just “I can’t see the source so it’s presumably a lie”. In the US that’s probably meaningless, but I suspect in the EU that would be violating all kinds of laws around accurate representation, and probably some privacy-specific regulations as well.
Correct, which is why that is not what this post is saying.
I am concerned about the other way around. What does a service need to do such that I do not need to trust it? The answer, in the case of end-to-end encryption, is that its clients need to be open source and verifiably reproducibly buildable. I don't care about open source outside of that. The server can be proprietary, for all I care.
Something being closed source does not mean everything it says is a lie. The inability for you to verify it does not mean that it is malicious.
Technically correct (the best kind of correct), but in the Rot Economy, if you are not lying to your customers about what data your software collects on them, you are leaving money on the table.
Hi Lobsters. I wrote this short piece as exposition on a common misconception I see in the discussion of encrypted messengers like WhatsApp (or Signal). I find that people online (including here!) tend to put quite a lot of weight on whether or not a messenger is "trustable" -- it's my opinion that that's something to be minimized, and ultimately should not matter for any properly-designed encrypted messenger.
This post is about WhatsApp, and uses it as an example (the client apps are not open source), but it is more so about how to think about these kinds of things in general.
Meta: I noticed the discrepancy between the Lobsters cached summary and the actual website on the "N minutes read" at the beginning. It seems you're returning a random number for the minutes; why? (I didn't check whether the word count is correct, but at least it doesn't change between page reloads.)
I have always found "estimated time to read" scores to be funny -- there's a lot of variation in people's reading speed / comprehension time, so it seems like a useless metric (when not on something tracking individual variation, ex. a Kobo). So I thought it would be cute to put a blatantly random value in there.
I had never given it much thought but I recently read there are studies on the topic and I just dug into that more (let's be honest: I'm procrastinating).
Basically, the reading speed as characters per minute (for a given alphabet) appears fairly constant.
I've looked at some implementations and their default value is sometimes a good one, sometimes a bad one. Some expect localization for these values, but some use site configuration for it instead. I don't think most website owners set that correctly (it's clearly better to do it through localization rather than expect everyone to look at that).
On top of that, some implementations let you tweak the estimate for faster or slower readers. If fast is +20% speed and slow is -20% speed, you get a 50% difference between the estimates, which is pretty huge. I prefer when websites give a range because I know the "fast" estimate is more relevant to me. On websites that give only one estimate, I have no idea whether they went for a slow reading speed setting (and, yeah, I often read the article in half the estimated time).
TBH, this would probably benefit from a shared and maintained database (I didn't check if one exists already). It's probably easy overall, low maintenance, and beneficial to less common languages.
So, between poor values, computation based on words rather than characters, different languages and alphabets, and tweaking to fast/slow readers, there's a lot of room for poor estimates.
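To make the arithmetic concrete, here is a toy sketch of a characters-per-minute estimator with a ±20% speed tweak. The 1000 chars/minute baseline and the adjustment factor are made-up illustrative values, not defaults from any real implementation:

```python
# Minimal sketch of a range-based reading-time estimate.
# The 1000 chars/minute baseline and the +/-20% speed adjustment
# are illustrative assumptions, not values from any real implementation.

BASE_CPM = 1000  # assumed average reading speed, in characters per minute


def reading_time_range(text: str, adjustment: float = 0.20) -> tuple[float, float]:
    """Return (fast, slow) estimates in minutes for the given text."""
    chars = len(text)
    fast = chars / (BASE_CPM * (1 + adjustment))  # +20% reading speed
    slow = chars / (BASE_CPM * (1 - adjustment))  # -20% reading speed
    return fast, slow


if __name__ == "__main__":
    article = "x" * 8000  # stand-in for an 8000-character article
    fast, slow = reading_time_range(article)
    # slow / fast == 1.2 / 0.8 == 1.5, i.e. the slow estimate is 50% longer
    print(f"{fast:.1f}-{slow:.1f} minutes")
```

With those numbers the slow estimate comes out 50% longer than the fast one, which is exactly the spread mentioned above before you even account for content difficulty.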
Basically, the reading speed as characters per minute (for a given alphabet) appears fairly constant.
What kind of texts did they study? Like, fiction, popular nonfiction, encyclopedias, children's books, deeply technical content? Was there a metric for handling pictures/diagrams, which can be a significant portion of the content of e.g. children's books or technical content?
My reading speed for gentle nonfiction has got to be 10 times faster than my reading speed for a research paper. Lobsters posts vary along this axis, so I'd expect the reading speed for them to vary significantly (albeit never reaching "research paper" slowness).
From the study paper at https://iovs.arvojournals.org/article.aspx?articleid=2166061, they had to develop their own texts, specifically designed for use in comparisons, and they prioritized having the same amount of information to process across languages.
The content has an impact that is so huge that these rules are probably only barely useful once things get technical and involved. This makes me think of how I often listen to news articles through text-to-speech nowadays; I also tried doing that for technical articles (even simple ones) and it barely works.
I also think having an open source client for these kinds of apps is a very good thing and I agree that Signal is the best choice.
But it’s not impossible to learn how WhatsApp works under the hood. One can analyze it using a tool such as Ghidra or IDA Pro.
Can somebody answer this naive question - yes, Signal is open-source, but how do I confirm that the same open source code is the one that's installed through the iOS app? Last I checked, it's practically impossible.
On Android, you at least have the option to build it yourself and install directly - so that's better. That being said, I doubt the majority of users do that.
In the end, isn't it practically unverifiable for the majority of users?
First: I don’t expect a regular user to be able to analyze a smartphone app using IDA Pro. If they want to, they can go to GitHub and read the source code of Signal instead. That’s still practically unfeasible for most people – including me, since I’m no expert at that kind of stuff – so we still have to rely on experts to perform the analysis and alert the public if they notice something fishy.
For the question of reproducible builds – see this sub-thread.
When one says Signal, they might mean one of three things: the protocol, its implementation, or the service.
Complaints about terms of service prohibiting third-party clients apply to the service, for example, while the praise of cryptographers is usually directed towards the protocol. I'm sure someone has had a look and audited the (client-side) implementation too, but that's a routine code audit; Signal's implementation is not anything special compared to any other security-focused app.
Going from a design-level security analysis of the protocol to a concrete service is a long way. The protocol does not dictate every detail of the implementation or service; there are many gaps you need to fill in along the way. It's not impossible to have a great protocol that gets implemented in a bad service. So there's no contradiction per se here: the cryptographers who praise the protocol and the critics of the service may both be right.
I’m critical of Signal on several grounds. Their desktop app sucks. They require a phone number to register.
They also will log you out of the desktop app if you uninstall Signal from your phone. A secure messenger that uses phone login and requires you to have, and keep online, an Android/iOS phone shouldn't be taken seriously, even if we believe they implemented it right.
shouldn't be taken seriously
I'm not a fan of this rhetorical turn of phrase. It obfuscates the claim that you're making, which makes it difficult to discuss.
It just seemed boring to go into the details of why phone number login is bad. I'm sure you guys can fill in the gap, and one reader's idea won't differ meaningfully from another's. If we discover it's not that trivial or there's disagreement, we can talk about it down the line, but my idea was the trivial one and I really didn't feel like spending more text on it.
The dry reading of it can go like this: the use of phone number and SMS registration has serious security and privacy implications (not clarifying what they are, assuming you fill the gap 🥱), and the confinement of the user to Android/iOS harms usability and freedom — which were already not great due to the poor desktop app. A technical user who shares my sentiment may reject these outright until it's shown that they were either necessary or that Signal brings something so valuable to the table that they are worth tolerating. And mind that there's Matrix without these issues, so it's clearly not impossible.
Had you found it difficult to discuss before my explanation because you saw too many compelling interpretations of my message or too few?
Had you found it difficult to discuss before my explanation because you saw too many compelling interpretations of my message or too few?
My complaint is that "[signal] shouldn't be taken seriously" is a claim about what readers should be doing, rather than (what it sounds like was intended) a claim about the properties of the signal software. Not only am I aware of the arguments that signal is overly reliant on a phone number and Android/iOS, I largely agree with them! However, I think a discussion of where signal lives in the tradeoff space (specifically trading off security & privacy for usability, and the various dimensions wherein) is kicked-off best with technical claims about the subject.
The crux of my issue is this: though I am in agreement with your claims about signal, I don't think it's fair to frame that criticism as signal being "unserious".
This applies to Signal as well, unless someone can verify that the publicly available source code compiles to the exact same blob people are installing from their app stores and as far as I know you can't.
Signal also has been historically actively hostile to alternative clients, which also makes one wonder why, since that would be a good signal about the solidity of their e2ee claims.
These days there is Molly as an alternative client. I don't know what changed in Signal's stance, but a quick look at the Molly home page is sufficient to make the official Signal client look even more suspicious: https://molly.im/
This applies to Signal as well, unless someone can verify that the publicly available source code compiles to the exact same blob people are installing from their app stores and as far as I know you can't.
Yes, you can. I cover this in the article -- Ctrl+F for "reproducibly buildable". The document I link to in the paragraph on Signal contains the (official) detailed instructions for validating that the Play Store version of Signal is the exact same blob that the publicly available source code compiles to.
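As far as I understand, the official process builds the APK in a container and then compares it against the Play Store APK with Signal's own diff tooling. Purely as an illustration of what that comparison step conceptually does, here is a rough Python sketch that diffs two APKs entry by entry, skipping signing metadata; the file names are placeholders, and for real verification you should follow the official instructions rather than this sketch:

```python
# Rough sketch of the comparison step in APK reproducibility checks:
# hash each entry in both APKs and diff the results, skipping signing metadata.
# "store.apk" / "built.apk" are placeholder file names; the real verification
# uses Signal's official instructions and tooling, not this script.
import hashlib
import zipfile


def apk_digests(path: str) -> dict[str, str]:
    """Map each entry name to a SHA-256 digest of its contents."""
    digests = {}
    with zipfile.ZipFile(path) as apk:
        for name in apk.namelist():
            if name.startswith("META-INF/"):
                continue  # the store copy is signed, the local build is not
            digests[name] = hashlib.sha256(apk.read(name)).hexdigest()
    return digests


store = apk_digests("store.apk")   # downloaded from the Play Store
built = apk_digests("built.apk")   # built locally from the published source

mismatched = {n for n in store.keys() & built.keys() if store[n] != built[n]}
only_one_side = store.keys() ^ built.keys()

if mismatched or only_one_side:
    print("APKs differ:", sorted(mismatched | only_one_side))
else:
    print("All compared entries match.")
```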
Signal also has been historically actively hostile to alternative clients, which also makes one wonder why, since that would be a good signal about the solidity of their e2ee claims.
This is the kind of attitude I would like to dispel with this article. The solidity (scare emphasis intentional) of their e2ee claims doesn't matter -- you don't need to trust Signal. You can trust the independent cryptographic community who has verified their (openly inspectable) implementation.
Nevertheless, the last footnote of the article may be of interest to you.
Are there ways of doing that for iOS? As I understand it, Apple themselves handle parts of the build process to optimise for a given generation of the mobile CPU. This would make binary attestation quite difficult, I presume.
Looking into it just now -- apparently, Telegram provides reproducible builds. According to them, you need a jailbroken device, 1.5h of time, and about 90GB of storage space for verification, but they do do it. Kind of? Looking at what their success step says, it seems like they only make a best-effort to compare files, and skip those they can't? For encryption reasons (issues with FairPlay DRM?), it looks like, not optimization reasons.
Signal (perhaps understandably given these conditions, but disappointingly) does not appear to have taken the effort to make their iOS builds reproducible. This isn't very relevant to me because I run GrapheneOS (an Android distribution) but it would be nice for them to have an official statement on the matter.
(Telegram has serious asterisks I mention in the article. This probably counts as a win over Signal, though.)
This isn't very relevant to me because I run GrapheneOS (an Android distribution) but it would be nice for them to have an official statement on the matter.
I don't follow: if you use signal, isn't the security of signal on every platform relevant to you? Your communications can only be secure as long as all ends of e2e are secure, no? So if any of your peers use iOS it does affect your security imo
Perhaps this is also a reason they are so hostile against alternative implementations? There's so much ad-ware and spyware crap in the app stores that it's not an unreasonable position to take.
OTOH, if that were the reason, they could perhaps publish a list of blessed alternative implementations.
Yes, you're right.
I wasn't actually aware the Signal iOS client was not reproducibly buildable before writing this article. I looked at and read through the Android and desktop client reproducibility guides, and assumed iOS was much the same, which it isn't. This does indeed mean that in cases where you're talking with someone on iOS, you have to trust their iOS client -- you don't get the same kind of guarantee that nobody can decrypt your messages, the one you get when messaging between two other clients, which lets you avoid needing to trust Signal at all. (Contingent on your devices not being hacked / the public source code being secure, but that is not related to trusting Signal.)
Trusting Signal here (in that their iOS blob matches their source) is a very reasonable thing to do, of course, and this is realistically completely fine wrt. security -- but it does mean you need to trust them, yes. This is much less trust than you need to give WhatsApp (as they do not provide source code) but it is still some trust.
(I've updated the article to mention Signal's lack of reproducible builds for iOS, to emphasize the reproducibly buildable part, and to change one or two uses of "trust" that were inconsistent with the rest of the post)
TeleMessage isn't an "alternative client", it is a service designed specifically with the goal of leaking messages on purpose (leaking them to their servers, which were supposedly safe, I assume), with the full knowledge of the people installing it (who probably paid a huge sum for it), so it is irrelevant to this discussion. Also, Signal's anti-alternative-clients stance hasn't prevented the usage of TeleMessage, so it can't really be used as an argument.
I didn't know about Android reproducible builds, so it's good to know they have that. Maybe I was misled by Molly's homepage claim that Signal includes "proprietary blobs". Still, if it is true that they have proprietary blobs, even if those blobs are published in the git repository, that's somewhat weird.
I assume the proprietary blobs refer to the Google Play Services library used for push notifications.
Via the same logic, iMessage is untrustable, and iOS is untrustable, and you are running Signal on iOS, so you can't trust anything, enter Ken Thompson's masterpiece
That logic is perfectly valid. That's one reason why the current duopoly is terrible and we need alternative (and open!) mobile platforms.
Worst of all, your CPU is untrustable. We know they run an entire hidden OS in there, totally opaque and undetectable to the user.
I’m afraid I don’t follow. What OS is that?
I assume this is a reference to Intel's "Management Engine" and AMD's analogue whose name escapes me just now. On Intel chips, at least since Skylake, it's Minix.
(I think I've heard of a similar facility in most ARM chips too, but the Intel one was most memorable because of some vulnerabilities and heavy reporting on the same.)
Also, on mobile devices, the baseband processor is an entirely opaque separate CPU/OS that controls the cellular radio system. It’s typically a bus master, meaning it can mess with the main CPU/OS. Look for “baseband modem exploit” to see whatever the latest horror is.
Moreover, if you are running Signal on Android with Google Play Services, Play Services has insane amounts of access unless you have GrapheneOS-level sandboxing; and without Google Play Services, the last time I tried using Signal it tried to suppress metadata leaks in a way that instadrained the battery.
Yes if a primary motivation for using a messaging app is e2e encryption then the client must be open-sourced. Signal is an obvious option at that point.
I'd make a stronger statement than that. If people believe/expect that their messages on a messaging app are private, they will act like they have e2e encryption even if that's not a primary motivation for the app. And if you don't actually have e2e encryption, you could learn very unpleasantly about that when there's a change of ownership in the app, even if all you asked for was "privacy" and not a specific mechanism.
And in order for e2e encryption to work, there needs to be some server-independent means of verifying that. A source audit seems like the most straightforward one to me, but I could imagine ways of using attestation to help communicate the results of that (and the fact that it covers the current version of the clients) to users who don't have the means to do that themselves.
Two other problems are:
I know for a fact that Meta has strict internal data-access policies, but I treat WhatsApp like any other online service: you can’t guarantee that someone with privileged access won’t abuse it. That’s a risk I’m ok with for everyday chatting, but it’s also why I don’t put truly sensitive stuff into tools like Todoist or Notion :)
A few years ago I sketched a “bring your own keys” setup where the service never holds the real authority. You do, and you can revoke access whenever you want. In theory it could work for almost any website (basically "use this key so my data stays protected even from you"), but realistically and working on an auth provider, I don’t see it getting real adoption in our industry.
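For illustration, the client side of that idea could be as small as something like this: a toy sketch using the third-party `cryptography` package's Fernet, where every name is hypothetical and a real design would still need key management, sharing, and rotation:

```python
# Minimal sketch of client-side "bring your own keys" storage:
# the user holds the key, the service only ever sees ciphertext.
# Requires the third-party "cryptography" package; all names are illustrative.
from cryptography.fernet import Fernet


class ClientSideVault:
    def __init__(self, key: bytes):
        self._fernet = Fernet(key)

    def seal(self, plaintext: str) -> bytes:
        """Encrypt locally before anything is sent to the service."""
        return self._fernet.encrypt(plaintext.encode())

    def open(self, ciphertext: bytes) -> str:
        """Decrypt data fetched back from the service."""
        return self._fernet.decrypt(ciphertext).decode()


# The user generates and keeps this key; the service never sees it,
# so "revoking" the service's access is just never handing the key out.
key = Fernet.generate_key()
vault = ClientSideVault(key)

blob = vault.seal("truly sensitive note")  # this is all the service would store
print(vault.open(blob))                    # only the key holder can read it back
```

The hard part, as always, isn't the encryption call; it's making key handling bearable for normal users, which is probably why adoption looks unlikely.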