Internationalise The Fediverse (2024)
16 points by hongminhee
I do agree with this post, but it doesn't really grapple with homograph attacks seriously.
Exactly!
I doubt most people would notice when a capital i is replaced by a lower L - and vice-versa.
Maybe some people won't notlce, but you can notice it if you look closely (you probably realized I typed notlce there instead of notice once). That is not the case with o/о. If you see a post by @cоsarara you will assume it's me, but that o was actually a Cyrillic o, so it's a totally different username.
"capital i" as in Italy vs ltaly.
@cоsarara would be solvable by marking "minority" blocks (as in: the minority in that specific field) in a distinctly different color. cоsarara, but more noticeable.
Thought I'd code up a more complete example: https://ifkee.de/~patrick/experiments/2026-02-visible-homographs/
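For anyone who wants to poke at the idea without a browser, here is a minimal Python sketch of the "mark the minority block" approach. It is not the code behind the link above. The function names `script_of` and `mark_minority` are made up for illustration, square brackets stand in for the different color, and the script is guessed from the first word of the Unicode character name, which is a rough heuristic rather than the real Unicode Script property.

```python
import unicodedata
from collections import Counter


def script_of(ch: str) -> str:
    """Rough script guess (assumption): first word of the Unicode character name."""
    if not ch.isalpha():
        return "COMMON"  # digits, punctuation, '@', etc. belong to no script here
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def mark_minority(handle: str, open_mark: str = "[", close_mark: str = "]") -> str:
    """Wrap every run of characters that is not in the handle's majority script."""
    scripts = [script_of(ch) for ch in handle]
    counts = Counter(s for s in scripts if s != "COMMON")
    if len(counts) < 2:
        return handle  # single-script handle: nothing to highlight
    majority = counts.most_common(1)[0][0]

    out = []
    in_minority = False
    for ch, s in zip(handle, scripts):
        minority = s not in ("COMMON", majority)
        if minority and not in_minority:
            out.append(open_mark)
        elif not minority and in_minority:
            out.append(close_mark)
        out.append(ch)
        in_minority = minority
    if in_minority:
        out.append(close_mark)
    return "".join(out)


# The second letter is U+043E CYRILLIC SMALL LETTER O, written as an escape
# so it cannot be confused with the Latin "o".
print(mark_minority("@c\u043esarara"))
```

A real UI would swap the brackets for a color or background, and would probably want a proper script lookup, since character names are only a proxy for script.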
I find this to be such common sense that I never imagined there were people out there opposing Unicode text in the Fediverse. I already had a terrible opinion of the Mastodon team's decision-making capabilities, but I would not have expected even them to oppose something like this.
I have no horse in this race, because I don't use any Fedi stuff, but (despite coming from a very much not Anglosphere country) I prefer IDs/handles being ASCII. Beyond malicious attacks (which the post skims over pretty quickly), just the users trying to be whimsical can affect how others are able to engage with your profile/project.
I'm thinking of stuff like having to bookmark fancy handles, or have the letters present on your keyboard, or switch to another layout just to type in an ID. And then all your other apps that integrate with this service need to be fully aware of UTF-8 and able to handle it. Not usually an issue, but it's still a far greater scope that needs to work than the couple dozen letters of ASCII.
Having all "machine/URL-oriented" data be ASCII just makes things easier for everyone. And then, you can have display names, and descriptions, and posts, and whatever else be UTF-8, and allow people to write in whichever script they please.
or switch to another layout just to type in an ID.
I mean, yeah, that would be really annoying. That's what people who use other alphabets are complaining about. Some people don't have ASCII on their keyboard.
I doubt most people would notice when a capital i is replaced by a lower L - and vice-versa. Similarly the kerning issue of an r and n looking like an m is well known.
With decent fonts and rendering, neither of those is a problem.
Are mixed language homographs more dangerous? I don't think so.
Amusingly, the author’s own name has several homographs: Τеrеnсе Εdеn (those are Greek Ε & Τ, and Cyrillic е & с).
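If you can't tell which letters were swapped, printing the code point names makes it obvious. A small Python sketch using only the standard library `unicodedata` module; the helper name `describe` is just for illustration, and the string below rebuilds the first word of the lookalike name from escapes so the substitutions are explicit.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str]]:
    """Pair each non-space character with its code point and Unicode name."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if not ch.isspace()
    ]


# Greek capital tau, Cyrillic ie, Latin r, Cyrillic ie, Latin n, Cyrillic es, Cyrillic ie
lookalike = "\u03a4\u0435r\u0435n\u0441\u0435"

for code_point, name in describe(lookalike):
    print(code_point, name)
```

Only the r and the n come back as LATIN; everything else is Greek or Cyrillic.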
On the one hand I do agree that allowing pretty much anything should work; OTOH I still experience issues with escaped or non-escaped entities and characters in various software. It seems safer to be a bit more conservative.