"﷽" U+FDFD: ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM (Unicode Character)
50 points by JordiGH
Wikipedia has a bunch of documents on the standardization. It was controversial:
But if this is encoded as a Unicode character, the door should be open to any word or phrase that is commonly used in some stylized form. (Coca-Cola?)
What carried it through was L2/02-163, which noted that all official documents in Pakistan have to start with the glyph. The document also has a picture of what the real-world glyph actually looks like. It definitely loses something by being flattened to a single line.
Depending on the font, some of the other ligatures in that block hold up a bit better as long as you bump up the font size enough. For example:
ﷺ
That's U+FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM.
It seems to me that this is the more definitive reason:
This ligature is also part of UZT 1.01, the national standard code page of the Government of Pakistan for Urdu.
Unicode wants to subsume all existing encodings, so anything that has already been encoded in another encoding must be included.
I'm personally not persuaded by the other reasons (neither the requirement that it appear in Pakistani documents nor the difficulty Urdu keyboard users have typing an Arabic ligature), because neither of them requires it to be one Unicode codepoint. Computer systems will need to be modernized to use Unicode anyway, and I don't see an inherent restriction that would require any ligature to be exactly one Unicode codepoint.
What a strange coincidence: I've spent the better part of the last two days preparing a talk for Rust in Paris on Friday about... Unicode. There are actually many more such ligatures (all in the same block). What is surprising is that some of them do have a decomposition mapping, and hence are compatibility-equivalent (under NFKC/NFKD) to their "spelled out" form (e.g. U+FDFA), but others like U+FDFD don't!
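You can see the difference with Python's standard `unicodedata` module. This is a minimal sketch, assuming a Python 3.9+ build whose bundled Unicode database includes both characters (any recent one does); the helper name `compat_info` is made up for illustration.

```python
import unicodedata


def compat_info(ch: str) -> tuple[str, str]:
    """Return (raw decomposition mapping, NFKD form) for a single character.

    The mapping is '' when the character has no decomposition; compatibility
    mappings start with a tag such as '<isolated>'.
    """
    return unicodedata.decomposition(ch), unicodedata.normalize("NFKD", ch)


if __name__ == "__main__":
    for ch in ("\uFDFA", "\uFDFD"):
        mapping, nfkd = compat_info(ch)
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
        print("  mapping:", mapping or "(none)")
        print("  NFKD changes it:", nfkd != ch, f"({len(nfkd)} code point(s))")
```

For U+FDFA the mapping is tagged `<isolated>` and NFKD expands it to the spelled-out phrase. For U+FDFD the mapping is empty, so normalization leaves it untouched.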
Pages 1 and 3 of this extract of the Unicode Standard have a list of them.
Apparently, it means "In the name of God, the Most Gracious, the Most Merciful".
Yes. It's the Basmalah. It's recited often and can be found at the start of chapters in the Qur'an.
Not a fan. I read it as: A religion has claimed ownership of Unicode.
Lol, no. Unicode's whole point is to reflect natural language (and then some). It's part of natural language, so it is included. Not that difficult to understand, whatever god you do (or don't) have is just fine.
I like to think that for every person that's mad that Unicode contains ﷽, there's another, opposite person mad you can write 🍆🍑💦 using Unicode.
Also, it's been in Unicode since 2003. The time for complaints has passed.
Were you aware that if you are willing to sacrifice part of the meaning (the patient[^1]) you can use only 4 bytes and one codepoint: 𓂺! Much more efficient and clearer in meaning!
[^1] Modern views would of course argue that assuming 🍑 is the patient argument of this sentence is only true in phallo-dominant-patriarcal cultures, but this is beyond the scope of this humble comment.
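The byte and codepoint counts in that comment are easy to check. Here is a minimal sketch in Python, assuming UTF-8, where any character outside the Basic Multilingual Plane takes 4 bytes. `sizes` is a hypothetical helper name.

```python
def sizes(s: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for s."""
    return len(s), len(s.encode("utf-8"))


if __name__ == "__main__":
    samples = {
        "hieroglyph": "𓂺",  # the single hieroglyph from the comment above
        "emoji": "\U0001F346\U0001F351\U0001F4A6",  # eggplant, peach, splashing sweat
        "bismillah": "\uFDFD",  # U+FDFD, inside the BMP
    }
    for label, s in samples.items():
        cps, nbytes = sizes(s)
        print(f"{label}: {cps} code point(s), {nbytes} UTF-8 byte(s)")
```

The hieroglyph comes out as 1 codepoint and 4 bytes, the emoji trio as 3 codepoints and 12 bytes, and ﷽ as 1 codepoint and 3 bytes.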
I think 🍆🍑💦 is more popular than 𓂺 because 𓂺 is very awkward to use outside progressive tenses (like the present continuous in English), as it clearly implies that the action is in progress. So it's not as flexible. 🍆🍑💦 lends itself far more easily to poetry, for example.
Plus, since it makes the patient explicit, it has an interpersonal quality that is absent from 𓂺, which is why the use of 𓂺 has been largely confined to situations where the ellipsis of the patient argument is entirely deliberate because it's physically absent (although I guess one might argue that in this case it's actually a reflexive so duh).
I believe all of those should have been addressed in ways other than a character encoding, but having code points for them in Unicode isn't harming anyone.
May I present the case of серафими многоꙮчитїи ("many-eyed seraphim") as an argument that it’s not just one religion? :)
(Jokes aside, there’s a procedure to suggest changes to the consortium, and people are using that procedure for whatever purposes they see fit. I’d say that making life easier for people in the fifth most populous country on Earth is very legitimate.)
Oh dear, Unicode might have lost the plot.
I wonder if they'll actually fix the issues with Japanese vs. Chinese kanji. Han unification was a disaster, and the Japanese were particularly unhappy about it.
The plot was always descriptive, not prescriptive, and fucking up Han unification is a great example of why. You wanna fix it, go ahead and start working on it.
It's a legal requirement in Pakistan that official documents start with this. It's purely for an administrative reason: end users with an Urdu keyboard must be able to reach this ligature easily instead of having to switch to another keyboard layout.
That's... such an awful take. Why would you say that? It's bigoted and incorrect. Does the inclusion of ✝️☦️✡️ mean that those religions have claimed ownership of unicode?
Unicode is intended to reflect natural language, and this is it doing that. Sure, to the extent that religion is reflected in natural language, it's also reflected in Unicode. But there's nothing here about a religion taking ownership of Unicode.
It's a very common and effectively atomic unit of Arabic typography. Why shouldn't Unicode have a way to represent that as a single code point? Or other similar atomic units of Arabic? Or of other languages/scripts?
It's difficult to see any reason for this comment that isn't to promote racism. I'm glad to see people have already given thoughtful rebuttals; do not do this again.
I am sad to see so many bad faith reactions.
There have even been those calling me names over an innocent post, and notifying operators in a reply instead of simply not engaging or using the flag feature.
Where did we all go wrong to let the climate get this bad?
This character does not have a decomposition mapping. U+FDFA, however, has a compatibility decomposition mapping. The decomposition of U+FDFA is the longest decomposition in Unicode. In the ICU4X normalizer, I special-cased U+FDFA in order to be able to allocate fewer bits for the length of every other decomposition.
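The claim about U+FDFA can be checked against the Unicode database that ships with Python. This is a minimal sketch, not ICU4X's actual code or data layout. It assumes the single-level mapping from UnicodeData.txt (as exposed by `unicodedata.decomposition`) is a fair proxy for decomposition length, and `mapping_length` and `longest_two` are hypothetical helpers. It scans every code point and reports the two longest mappings, which shows how far the longest one stands out from the runner-up.

```python
import heapq
import sys
import unicodedata


def mapping_length(cp: int) -> int:
    """Number of code points in cp's raw decomposition mapping (0 if none).

    unicodedata.decomposition() returns strings such as '0041 0301' for
    canonical mappings or '<isolated> 0635 0644 ...' for compatibility ones;
    the leading '<tag>' is not a code point, so it is skipped.
    """
    fields = unicodedata.decomposition(chr(cp)).split()
    return sum(1 for f in fields if not f.startswith("<"))


def longest_two() -> list[tuple[int, int]]:
    """Return the two (length, code point) pairs with the longest mappings."""
    pairs = ((mapping_length(cp), cp) for cp in range(sys.maxunicode + 1))
    return heapq.nlargest(2, pairs)


if __name__ == "__main__":
    for length, cp in longest_two():
        print(f"U+{cp:04X} {unicodedata.name(chr(cp), '?')}: "
              f"{length} code points, {length.bit_length()} bits to store the length")
```

If U+FDFA sits alone at the top, a per-entry length field only needs to be wide enough for the runner-up, with U+FDFA handled as a special case. That is the trade-off the comment describes, though the exact bit widths ICU4X uses are not shown here.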
Love that Unicode supports this level of expressiveness for all scripts and cultures. I wonder why this was submitted and why under the accessibility label though.
The document linked by hwayne includes a justification of the submission: it is in common use in Pakistan, but since it's a different script, it's bothersome to keep changing keyboard configurations to type it.
But how do they enter it with an Urdu keyboard? Is there a dedicated key for it, or is it tied to a shortcut combination in the operating system? If the latter, they could just have saved the regular Arabic spelling in the shortcut.