Zig 0.15.1 Release Notes
79 points by dgv
TIL Zig has a third mascot, Carmen the Allocgator! :3
That’s all I have to say :)
the line must go up!
Wait! Somewhere out there, a 6-year-old is asking how to tell the difference between an allocgator and a crocomalloc. Does the mascot have a pointy snout? No, it looks like it might be rounded. Are its lower teeth visible while its mouth is closed? No, just the upper teeth are visible. Well then. Looks like it really is an allocgator. The artist clearly knows their stuff.
Why have only two when you can have three? That’s what I’m thinking. Where all the programming language mascots at? Is Zig just stealing all of them?
Woo! Finally Zig gets proper incremental updates, and it's nice they managed to get their self-hosted backend into a good state. This language sure is shaping up. Gotta love all those breaking changes and deprecations though :) I am definitely rooting for this language; it's a far cry from the usual "rewrite all C/C++ into Rust" crowd. However, Zig does have a fair amount of annoyances and is definitely not ready, but I guess that's not a secret given the language is not 1.0 yet. My take on Zig is that it has more of an "embrace C" than "replace C" approach, which is something I find appealing.
No, not yet - the notes say that even though self-hosted is the default, incremental updates are not.
Excellent work as always. Compile-time improvements are finally beginning to show after years of work, congratulations to Andrew, Jacob, and the rest!
ArrayList: make unmanaged the default
I'm on my phone so can't check this out. Does this mean that every single ArrayList.append will require the allocator as an argument? All to shave some bytes off the structure, which is a container type anyway…?
Is it really worth the pain? I understand the savings will add up for ArrayLists embedded in other containers/structures, but optimizing those should wait until profiling reveals a need for improvement — in which case a switch to ArrayListUnmanaged can be done slowly and on a case-by-case basis.
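For readers who haven't tried the new API yet, a minimal sketch of what the change means in practice, assuming the 0.15.1 std where std.ArrayList is now the unmanaged variant:

```zig
const std = @import("std");

pub fn main() !void {
    // page_allocator just to keep the sketch dependency-free.
    const gpa = std.heap.page_allocator;

    // In 0.15, std.ArrayList is unmanaged: the struct no longer stores
    // an allocator, so every allocating call takes one explicitly.
    var list: std.ArrayList(u32) = .empty;
    defer list.deinit(gpa);

    try list.append(gpa, 1);
    try list.append(gpa, 2);

    std.debug.print("{any}\n", .{list.items});
}
```

The upside is that each list only carries pointer/len/capacity; the downside is exactly the extra argument on every call that this comment complains about.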
Don't misunderstand me; I admire the improvements and work that has been done over the last few years, especially the self-hosted backends and incremental compilation, which are quite impressive. But it seems to me that the direction of the Zig language has always been towards removing ergonomics in the name of clarity or performance.
Yes, noble goals, but at some point you have to wonder whether
.append(context.fba.allocator(), @divFloor(@as(isize, @intCast(coord.x)) - @as(isize, @intCast(refpoint.x)), 2))
is really better than
.append(((coord.x as isize) - (refpoint.x as isize)) / 2)
or even
.append(((coord.x.as_signed()) - (refpoint.x.as_signed())).div_floor(2))
(The above is only semi-contrived; the divFloor expression inside append is straight out of some code I wrote a while ago. I found it by grepping for @.*@.*@.*@.*, which yielded about a hundred lines. Unrelated, but if Zig continues to morph into a @language @that @looks @like @this, I will become extremely insane :))
At least the @divFloor shenanigans can be excused, since there are different kinds of divisions one might want to perform for signed integers, and any of them might be chosen in different situations. But for a container, exactly how often are you going to deliberately use a different allocator halfway through populating an array?
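For context, those "different kinds of divisions" are separate builtins; a small sketch of the distinction:

```zig
const std = @import("std");

pub fn main() void {
    const a: isize = -7;
    // Plain `/` is a compile error for signed integers where the rounding
    // direction would be ambiguous, so you pick a builtin explicitly:
    // @divFloor rounds toward negative infinity, @divTrunc toward zero.
    std.debug.print("{d} {d}\n", .{ @divFloor(a, 2), @divTrunc(a, 2) });
    // -4 and -3 respectively. @divExact(a, 2) would be safety-checked
    // illegal behavior here, since -7 is not evenly divisible by 2.
}
```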
This particular change is also mildly annoying for a different reason: it means I'm going to end up maintaining my own wrappers on top of the standard library's ArrayList, which… well… isn't the standard library one of the reasons I like Zig in the first place? i.e. not needing to carry a standard-library enhancer around to every project the way one might with C?
Writergate
Despite the extra work of upgrading, I will agree with the justifications here. In my projects I tended to have a few giant functions that took Writers and were monomorphized many times, leading to binary bloat in the executable. Eventually I found some workarounds, but this is a good permanent solution.
Is it really worth the pain?
It usually is pain-neutral, and often encourages you to refactor the code to avoid individual small allocations. E.g., append is an anti-pattern; you generally want to bulk-reserve memory and then appendAssumeCapacity.
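A sketch of that pattern with the 0.15-style unmanaged list (function names per the current std docs):

```zig
const std = @import("std");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    var list: std.ArrayList(u32) = .empty;
    defer list.deinit(gpa);

    // One fallible, allocating call up front...
    try list.ensureUnusedCapacity(gpa, 100);

    // ...then the hot loop neither allocates nor needs the allocator,
    // and appendAssumeCapacity cannot fail.
    for (0..100) |i| {
        list.appendAssumeCapacity(@intCast(i));
    }

    std.debug.print("len = {d}\n", .{list.items.len});
}
```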
EDIT: I guess what would help is to take a look at the specific code that becomes painful in context?
The usefulness and size savings of ArrayListUnmanaged become a lot more apparent when you have many arrays, especially in the same struct. After a while I stopped using the managed variant altogether, so this change seems like a natural decision to me.
I wonder if you could comptime-generate a managed wrapper for any unmanaged thing, e.g. Managed(T), which exposes the same methods as T but eats any allocator params.
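Fully generic forwarding would need comptime reflection over every method's parameter list, which Zig can't synthesize automatically today, but a hand-rolled version of the idea is straightforward. A sketch (ManagedList is hypothetical, not anything in std):

```zig
const std = @import("std");

// A hypothetical wrapper in the spirit of the suggestion: store the
// allocator once and forward to the unmanaged std.ArrayList. Each
// method must be written out by hand; comptime cannot currently derive
// "same signature minus the allocator param" for arbitrary types.
fn ManagedList(comptime T: type) type {
    return struct {
        const Self = @This();

        inner: std.ArrayList(T) = .empty,
        gpa: std.mem.Allocator,

        pub fn append(self: *Self, item: T) !void {
            try self.inner.append(self.gpa, item);
        }

        pub fn deinit(self: *Self) void {
            self.inner.deinit(self.gpa);
        }
    };
}

pub fn main() !void {
    var list = ManagedList(u32){ .gpa = std.heap.page_allocator };
    defer list.deinit();
    try list.append(42);
    std.debug.print("{any}\n", .{list.inner.items});
}
```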
While it does save a few bytes, it has the added benefit of forcing functions that cause allocation to require the allocator passed in, rather than implicitly using the one in the array list. Ideally, functions that do not have allocators as arguments do not allocate (but obviously this isn’t always true; it would be nice if we had algebraic effects a la koka, but alas).
Awesome progress as always.
I was intrigued about the comments on unicode.
It seems the stdlib includes things such as networking, HTTP, TLS, and deflate - but supporting Unicode character identification and widths is a step too far? A lower priority for now? A permanent problem, because new Unicode versions are released so frequently that the toolchain is the wrong place to keep that data updated?
It’s a combination of these two things:
As an example, a web server that processes a form in which a person inputs their name and email address does not, and should not be Unicode aware. Just take the data and shove it into the database. It’s often programmers who think they should be doing string manipulation that end up creating applications that fail on diverse inputs, and it’s the simpler software that has no Unicode awareness that gets it right.
Off the top of my head, applications which need to be Unicode aware are:
…I think that’s actually the entire list. I can’t think of any software that needs Unicode data and doesn’t do one of those two things.
As an example, a web server that processes a form in which a person inputs their name and email address does not, and should not be Unicode aware. Just take the data and shove it into the database.
I can’t even tell you how many times my name has gotten mangled as a result of software not handling encodings properly and then just forwarding their idea of what the correct bytes are to a different system with a different idea of what the bytes mean.
It’s particularly fun when the mangled version ends up being on shipping labels and then having the service point people refuse to let me pick it up because the mangled name doesn’t match my ID.
Yeah, but that’s the fault of those systems trying to be smart and encoding-aware. If you treat people’s names as arbitrary bags of bytes, rather than trying to be smart about it, this issue doesn’t happen—because you don’t touch the bytes.
That’s fundamentally incorrect. You cannot render text while treating the text as just “arbitrary bags of bytes.”
If you don’t know whether a bag of bytes is UTF-8 or EBCDIC you don’t know what the characters are, and if you don’t know what the characters are you cannot render them. If you don’t know how to render it you cannot print a shipping label. You can’t even properly render it on a web page because if it doesn’t match the encoding the browser expects for the rest of the content it’ll be rendered incorrectly there as well.
I think there’s a misunderstanding going on. I never said anything about rendering text. A web backend for processing orders doesn’t have to know anything about the text’s encoding, and web backends are where your names get all mangled.
Answering out of order,
and web backends are where your names get all mangled.
In like 99% of cases this is absolutely not the case. There’s pretty much no actual processing that they ever even do on the name because it’s not data they need to do anything with other than simply displaying it, either in a web frontend or on shipping labels and similar.
What actually happens is that the web backend needs to talk to the shipping company's API to order shipping and generate shipping labels. Their API will require the data in some specific encoding, whether that's only an implicit requirement in the form of the API requiring you to submit the data as JSON strings, which means the data has to be in some Unicode encoding (if they follow the IETF's JSON RFC it must be UTF-8 in particular, but JSON in general can be in other Unicode encodings), or it's some fancy proprietary binary format where all strings are some flavor of EBCDIC, or whatever.
But what happens when you treat your input as “bags of bytes that you don’t need to know the encoding of” is that you now cannot serialize that input as a JSON string! To do that you need to either already know that the bytes are valid to include wholesale in a JSON string (but this is unlikely, and means that you can’t display it directly anywhere because you’ll have various things escaped that would need to be unescaped before rendering), or you need to know the current encoding so that you can first potentially convert the text to Unicode and then serialize the string as a JSON string.
So at this step you either 1) ship off random data to their API, very likely generating invalid JSON and violating the shipping company's API in the process, or 2) use some character-encoding detection algorithm, and those regularly fail to give you the correct answer. There's no reliable way to recover the encoding information once it's been thrown away.
And this whole process repeats in every system that receives and uses the name for anything. If you don’t know the encoding at every step then you will end up rendering broken text at some point.
A web backend for processing orders doesn’t have to know anything about the text’s encoding
So no, if you ever want to render the text that your web backend is responsible for storing (and if you don't, then why are you storing data you will never use?), then you must know the text's encoding. Preferably you do this by either only accepting input in a given encoding or normalizing all input to a given standardized encoding, depending on where the input comes from; but if it does neither of these, then it needs to store the encoding that the client said the text was in.
If you don't either 1) make sure all data is in a single known encoding, or 2) explicitly store the encoding of the data, then you can never reliably recover that information, which means you can never reliably render that piece of text. Algorithms that try to recover the encoding may seem to work for certain restricted inputs, mainly ones given to them by western Europeans and their descendants, but they inevitably will not handle all possible inputs, because at that point there is no way to know what the user actually typed.
It’s a good point that application developers munge strings far too much, and I appreciate the way you intentionally design Zig to encourage devs to follow the happy path.
I have implemented "pretty terminal printing" before (for printing tables to the terminal) and needed the monospace character widths to handle rendering correctly. Presumably any ratatui/textual-style library in Zig will need grapheme clustering and character-width information - but I think this would be a good fit for an external Unicode library (it could update in cadence with the Unicode standard rather than Zig releases, for one). I'm not sure if that comes under "font rendering" (which I take to mean what ghostty does - render pixels) or not, but yeah, it's relatively rare.
There are other parts of Unicode which aren't so bad. I've had to deal with CSV files provided by the user that may or may not be UTF-16, for example. A parser works great on a stream of bytes, but at some point you might want to validate that those bytes are at least valid UTF-8. I see this stuff is actually covered in https://ziglang.org/documentation/0.15.1/std/#std.unicode - but when I read the release notes I mistakenly took it to mean there was zero Unicode support at all (just arrays of bytes).
I think in the end I agree with you - this is a good split.
Yeah to be clear when I say “Unicode awareness” I mean logic that requires access to UnicodeData.txt.
Converting between UTF-8, UTF16LE, UTF16BE, Shift-JIS, etc., is by comparison trivial and relevant to a much broader range of applications.
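To be concrete about what does live in std.unicode: a sketch using the 0.15-era std (function names taken from the std docs linked above; worth double-checking against your exact version):

```zig
const std = @import("std");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    const s = "héllo";

    // Validation and encoding conversion don't need UnicodeData.txt,
    // so they live in std.unicode.
    std.debug.assert(std.unicode.utf8ValidateSlice(s));

    const utf16 = try std.unicode.utf8ToUtf16LeAlloc(gpa, s);
    defer gpa.free(utf16);

    std.debug.print("{d} UTF-8 bytes -> {d} UTF-16 code units\n", .{ s.len, utf16.len });
}
```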
The Zig library I use for this is zg https://codeberg.org/atman/zg but it needs to be upgraded to work with the new release. It does some build time code generation that uses file readers, writers, and compression streams, all of which break with 0.15.
Claude Code fed with the full Zig changelog is surprisingly good at upgrading Zig code automatically.
It seems like properly localizing an application without Unicode would be challenging. You need to know glyph width to lay out even terminal interfaces accurately.
I can see maybe pushing user input validation to a different layer of the stack, but I’m not sure that’s entirely a great idea.
I understand wanting to punt on it, but it seems like settling on something like UTF-8 early on for APIs would avoid a lot of headaches real world applications have to deal with.
As an example, a web server that processes a form in which a person inputs their name and email address does not, and should not be Unicode aware.
Where should validation that the user is inputting a valid email address happen? (Assuming we’re talking about them registering a new account, for example.)
Email address validation is a perfect example of where not only is decoding Unicode unnecessary, but probably you will cause a bug by writing extra code that doesn’t need to exist.
The entire process can be done while leaving the email address UTF-8 encoded. Arguably, the best thing to do is try sending an email to it, and then handle the “invalid email address” error from the email sending software. But even if you wanted to implement the entire email address validation spec redundantly in your application, it’s an operation that can and should be done entirely on encoded UTF-8 data.
Where should validation that the user is inputting a valid email address happen?
My take for this specific case? When they hit the "confirm" link they receive in the email :) I use .email as the TLD for my email address and plenty of places tell me it's not valid. There was also this [https://lobste.rs/s/gvtlpo/email_is_easy_email_address_quiz] email address quiz the other day showing how absurd it can be to try to validate email addresses.
Additionally, it is now allowed to have both else and _ in the same switch:
fn someOtherFunction(value: Enum) void {
// Does not compile giving "error: else and '_' prong in switch expression"
switch (value) {
.A => {},
.C => {},
else => {}, // Named tags go here (so, .B in this case)
_ => {}, // Unnamed tags go here
}
}
The distinction here looks rather confusing and hard to discover.
I think the typo in the comment doesn’t help - I guess that should be “Previously did not compile”. I agree it’s not an entirely obvious distinction, but it does at least match how you define non-exhaustive enums, and they are kind of an advanced-ish feature anyway?
It does read weird, especially if you have pattern-matching intuition that _ is a wildcard. It looks less weird in the context of Zig, where _ is also part of the corresponding enum declaration:
const Enum = enum(u32) {
A = 1,
B = 2,
C = 44,
_
};
And, if you omit either else
or _
, you’ll get a compilation error. In other words (and this is the behavior of Zig 0.14 and older as well), else
doesn’t match un-named variants, and _
doesn’t match named variants, the two are disjoint.
I like the vision for Zig, for making everything correct and robust, but I wish that it also was a tiny bit more ergonomic. It’s really taking shape, though! Congrats to the Zig developers.
Would you mind elaborating on that a bit? I’m genuinely curious when devs mention “ergonomics”. I have a vague sense of it for my own work, but it feels difficult to define.
With "ergonomics" in connection with a chair or table, I think of something that is not in the way, does not force you to bend the body at weird angles, and does not hurt to use over extended stretches of time. In my book, a similar definition works well for programming languages too.
That makes sense. I was thinking more about specific examples. In what ways do you consider Zig to not be ergonomic?
.{} is uncomfortable to type in, when all you wanted was print "Hello, World". Same with the const std = business, when all you wanted was import std. Not a crisis, but a bit unergonomic, to me.
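For readers who don't write Zig, the boilerplate being complained about, in full:

```zig
// `const std = @import("std");` instead of `import std`...
const std = @import("std");

pub fn main() void {
    // ...and `.{}` is an empty anonymous tuple: print's format
    // arguments always travel as a tuple, even when there are none.
    std.debug.print("Hello, World\n", .{});
}
```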