Punycode: My New Favorite Algorithm

26 points by iand675


dzwdz

Am I too paranoid, or are parts of this AI generated? The base36 section, for example, has so much fluff just to say that it's all the characters allowed by the DNS spec. There's even a bullet list of examples of the different inefficient bases.

It's a shame - IMO all the fluff just detracts from what I think is the main point of the article (the adaptive encoding), which is genuinely interesting.


> Base-36 extracts every bit of information density possible while playing by DNS’s rules. When you’re encoding hundreds of millions of domain names, these efficiency gains matter.

I think this doesn't have much to do with how many domain names you're encoding (if performance were the concern, I'd guess base32 might even be the better choice, since each digit is exactly 5 bits?). Domain names are length-limited (63 octets per label), so you just don't want to waste space that could be used to store more data.
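A quick sketch of those two constraints, assuming Python 3 and its built-in `punycode` codec (the helpers `encode_digit` and `ace_label` are made up for illustration): Punycode's 36 digit values map onto a to z and 0 to 9, with the hyphen reserved as the delimiter, and the finished `xn--` label still has to fit in a 63-octet DNS label.

```python
import string

# Punycode digit alphabet (RFC 3492): values 0..25 -> 'a'..'z', 26..35 -> '0'..'9'.
DIGITS = string.ascii_lowercase + string.digits  # 36 symbols
MAX_LABEL_OCTETS = 63  # DNS label length limit (RFC 1035)

def encode_digit(d: int) -> str:
    """Hypothetical helper: map a Punycode digit value (0..35) to its ASCII character."""
    if not 0 <= d < 36:
        raise ValueError("Punycode digits range over 0..35")
    return DIGITS[d]

def ace_label(label: str) -> str:
    """Hypothetical helper: build an ACE label and enforce the label length limit.

    No nameprep/normalization is done here, unlike real IDNA processing.
    """
    if label.isascii():
        ace = label
    else:
        ace = "xn--" + label.encode("punycode").decode("ascii")
    if len(ace) > MAX_LABEL_OCTETS:
        raise ValueError(f"label too long: {len(ace)} octets")
    return ace

print(len(DIGITS))          # 36
print(ace_label("bücher"))  # xn--bcher-kva
```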

I don't understand the "aü北aü京" example. Won't the non-ASCII characters get sorted by code point anyway? 35, 180, 170 doesn't really seem like a "wildly oscillating" bias, since it settled down around 170-180. The graph actually makes it look like it's reaching more extreme values with damping, but those don't match up with the numbers in the text. idk what that's about.
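A minimal sketch of the RFC 3492 encoder in Python, written only to make the two questions above checkable: it records the order in which non-basic code points get inserted and the bias after each insertion. The names `punycode_encode`, `adapt` and the returned trace are illustrative, not from the article; the encoded output can be cross-checked against Python's built-in `punycode` codec.

```python
# Parameters from RFC 3492, section 5.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 0x80
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"

def adapt(delta: int, numpoints: int, first_time: bool) -> int:
    """Bias adaptation (RFC 3492, section 6.1)."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)

def punycode_encode(s: str):
    """Encode s; also return the insertion order and the bias after each insertion."""
    cps = [ord(ch) for ch in s]
    out = [ch for ch in s if ord(ch) < 0x80]  # basic code points are copied first
    b = h = len(out)
    if b:
        out.append("-")
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    order, biases = [], [bias]
    while h < len(cps):
        # Next code point to insert: the smallest one not yet handled.
        m = min(c for c in cps if c >= n)
        delta += (m - n) * (h + 1)
        n = m
        for c in cps:
            if c < n:
                delta += 1
            elif c == n:
                # Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    out.append(DIGITS[t + (q - t) % (BASE - t)])
                    q = (q - t) // (BASE - t)
                    k += BASE
                out.append(DIGITS[q])
                bias = adapt(delta, h + 1, h == b)
                order.append(chr(n))
                biases.append(bias)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(out), order, biases

encoded, order, biases = punycode_encode("aü北aü京")
print(encoded)
print(biases)
```

Because the encoder always picks `min(c for c in cps if c >= n)`, both ü are handled before 京 and 北 regardless of where they sit in the string; position only changes the `delta` values, and through them the bias. The printed biases can then be compared with the numbers in the article.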

(Also, I'm not sure why, but bücher renders with a regular "e" for me.)