An interactive introduction to the terrific experience of rendering Arabic typography and its technical debt
123 points by lr0
Absolutely incredible article.
Also I love the fact that this:
﷽
Is just one code point. Try copying it! That’s so cool. Apparently it means “in the name of Allah, the most gracious and the most merciful”.
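For anyone who wants to check the single-code-point claim, here is a minimal Python sketch using only the standard library. The variable names are my own and not from the article; it assumes the character above is U+FDFD, which is what I get when I copy it.

```python
import unicodedata

# The character, written as an escape so the source stays ASCII-only.
# Assumption: the glyph quoted above is U+FDFD.
basmala = "\uFDFD"

# A Python str is a sequence of code points, so len() counts them directly.
code_points = [f"U+{ord(ch):04X}" for ch in basmala]

# Official Unicode character name, looked up in the bundled database.
name = unicodedata.name(basmala)

# Number of bytes the character takes once encoded as UTF-8.
utf8_len = len(basmala.encode("utf-8"))

print(len(basmala), code_points, name, utf8_len)
```

The whole phrase is one code point in the string, even though it takes several bytes on the wire and renders as a very wide glyph.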
That (Unicode) character was discussed here: https://lobste.rs/s/7s4sjp
I had fun following the references from the Wikipedia link referenced here https://lobste.rs/c/dq2ucz
Basically, it's in Unicode because it was in the Pakistani code page, and it was there because there's a legal requirement to include the phrase in legislation, and Urdu is an Indo-European language, so "shelling out" to an Arabic code page to write out the Basmallah would be hard to do with the technology of the time.
Unfortunately not all the comments show this community at its best.
﷽
The following line (from the post) is poetry:
A monument from the era when rendering was baked into the encoding because nobody trusted the renderer to do anything, preserved forever, like a fly in amber that recites.
Meta: another excellent example of the need for a typography tag.
Why? The subject seems popular. Is there a big drive to hide this kind of content?
(Tags are primarily for filtering on lobste.rs.)
FWIW I like that lobste.rs tags are an indicator of whether a submission is on-topic, i.e. if there's no tag that really fits your submission then maybe you'll reconsider. I think typography is very relevant to computing so would like to encourage those submissions! I think graphics is borderline for being the correct tag for these posts.
It's also why I'd personally like an embedded tag. Sometimes hardware or assembly just don't feel quite right for a post.
Tags are for more than just filtering, they also promote a topic's inclusion.
From about#tags:
It keeps the site on-topic by only allowing a predefined list of tags. These tags represent what most of the users of the site want to read, so content that does not fit into any of those categories should not be submitted.
We already have "art", "design" and "graphics".
Computer-aided typography is totally on-topic.
I wish we'd handle just the 4 German umlauts correctly (ä,ö,ü,ß), while these guys are light years ahead of us rendering a right-to-left language...
What an incredibly awesome and informative read. I really love the entire history providing extensive context to the entire story. As someone with both a history related degree and IT career this truly hit both interests perfectly.
Something about the text sets off my LLM detector, which is a shame because it's both in-depth and covers a lesser-documented part of the modern tech stack.
An example that reads as LLM-ish (but it's all throughout the article):
The reason no browser ships it is structural, and the structure is rather elegant as obstacles go. Latin justification treats shaped text as frozen, measure the words, pour the slack into the gaps, done. Shaping and layout stay in their separate boxes, and every text stack in production is architected around that separation. Kashida justification breaks the boxes open.
To @lr0, was this article text generated / polished / translated by an LLM? If so you might consider adjusting how much control it has over the final output. I looked through some older blog posts (for example https://lr0.org/blog/p/gpt/ and https://lr0.org/blog/p/linux_new_users/) and they felt far more human.
The Kashida section was contributed to this post from a talk in Arabic by Nawal Hadeed, which she translated and added to the post herself. Although I'm unsure of LLM usage in the translation process, looking at the original Arabic I felt some change in tone while editing the post. I could have declined the translation and never had this documented, procrastinated on translating it myself (which has been ongoing for a while), or published it as is. I found the last option the least damaging.
I don't find the occasional hint of LLMisms too worrisome if the whole of the submission holds together, which it does here.
Great article that I'll throw at anyone who believes they know better about this stuff.
Once had to deal with text editor devs who believed they could implement support for all of this themselves without using any existing libraries (they could not).
Such a fun read, and such great writing too. Many memorable lines.
One of my favorites:
Print and the Arabic script met badly, and that meeting set the pattern for almost everything since: when the machine cannot do the script, simplify the script, ship it, and call it progress.
It's true, too! Text is very much shaped by the tools used to write it and always has been, and I love that for some reason.
This is very lovely. I remember the first time I fed a string with a negative width to my employer's simple in-house typesetting system. It ended up exactly as over-printed as you'd think.
For a counterpoint to this, Andrea Stanton's Broken is Word (chapter 10 of Your Computer Is on Fire) covers the history of how (badly) consumer word processors handle representing Arabic.
Ah, IE’s text-justify property. Interesting things in that era. There was also text-justify: newspaper, which was decades later claimed by some to be Knuth-Plass or similar, but which I don’t believe was; and https://mediumwell.com/wp-content/uploads/2016/02/newsprint-justify-example.gif shows what it claims to be text-justify: newspaper matching what these days is specified as text-justify: inter-character.
IE really did have a lot of nifty things quite early on, which other browsers then allowed to languish in the “too hard” basket, and which either never came back, or took fifteen or thirty years to return. Firefox got text-justify: inter-character in 2017. Chromium finally implemented that stuff a few months ago. Safari still hasn’t.
For greater clarity, that was IE Macintosh Edition, right? IE Windows never got anything like that.
I can’t say. I had lost access to IE 5.5 and 6 by the time I learned about these maybe a decade ago and wanted to investigate. And documentation, never excellent, was basically nonexistent by then.