phoenix - A modern X server written from scratch in Zig
43 points by ahobson
No tearing by default
Please don't do this. Vulkan, for example, has explicit presentation modes (see VkPresentModeKHR) - please respect them. Let the application or an intermediate layer (e.g. Zink for OpenGL on Vulkan) handle the issue and just respect their choices.
Trying to "fix tearing" is a task you really don't want to take on; it will consume an enormous amount of code and engineering time for very little gain. And, to be fair, neither Windows nor macOS tries that hard. Window resizing falls back to software rendering on both of those and tears, for example.
Yes, the people who complain about tearing are extremely noisy. Tell them to pound sand.
The so-called "damage tracking" is much more important, especially on battery-powered devices (mobile phones, handhelds, laptops, EVs, etc.). It is a caching technique where the compositor subdivides the display into tiles (pixel clusters) and decides which parts should be re-rendered, in order to eliminate an excessive number of draw calls. There are also additional techniques, like the single-pixel buffer, where the application writes solid-color surfaces as a single pixel that is super cheap for the compositor to upscale. On a laptop, my first attempt at implementing a Wayland compositor was draining the entire battery in just 10–15 minutes until somebody told me about damage tracking. Combined with other caching methods, it made the battery last several hours instead.
What you the user perceive as tearing, we the driver authors perceive as buggy rendering. We aren't trying to be perfectionists or to insist that every frame is a painting, just to avoid situations where the user sees pixel patterns that no application intended for the user to see.
"Solving tearing" means expending a lot of coding resources solving a problem that 99.9% of people gain nothing from.
"Solving tearing" means expending a lot of GPU resources solving a problem that 99.9% of people gain nothing from.
"Solving tearing" means introducing problems of your own making (extra latency and power consumption, for example) that way more people care about than care about "solving tearing".
I might even go further and suggest the idea of "perfectly rendered" is a problem in and of itself. For example, what does it even mean to be "correct" when a window overlaps two monitors with different resolutions and refresh rates? (CAD programs probably don't want overlap at all, given that software rendering would kill most of them, while a GUI dialog box likely wants to be rendered at the same physical size even if the resolutions are wildly different.) Sure, you can make an arbitrary choice, but that choice is always going to be wrong for some subset of people. The best you can really do is give developers the ability to make a choice without getting in their way.
Solving tearing usually means enabling vsync and not honoring application-level requests to disable it. Multi-monitor setups are interesting, but your objection doesn't hold on a single monitor either. Consider a video game that, without vsync, runs at 200 frames/second. When output to a 60 Hz monitor, the user is no longer guaranteed to ever see a single complete frame as the developer intended it; perhaps each of the individual rendered frames is correct on its own, but the user might only see pieces of each frame, with emergent beats from inter-frame sampling. (A similar problem occurs when mastering speedruns of 60 frames/second games to 24 Hz or 30 Hz video formats. As you may know, this is an old problem dating back to the introduction of television.) By merely enabling vsync, we (1) set a single flag; it's not much driver code; (2) reduce the load on the GPU by only drawing a third of the frames that we used to draw; (3) reduce next-frame latency, remove next-frame jitter (because the GPU is more idle when draw commands arrive for the next frame), and reduce overall power consumption. As an emergent bonus, many games have physics-engine correctness issues which only occur at extremely high or extremely low frame rates; for example, Antichamber has glitched collision at low frame rates, and speedrunners regularly abuse it to clip through walls.
If you really feel this strongly about your position, then I encourage you to roll up your sleeves and contribute to your GPU drivers. The functionality that you desire is specified by standards, but it only exists in drivers because people contribute the code.
So, okay. I'm hoping somebody could clue me in on this a little bit.
I've done a bit of Xlib/XCB development (application-level: using the libraries, not developing them directly; I wrote and maintain a UI toolkit). Xlib is bad. Like, really, really bad. XCB replaces some of the "bad" with verbosity. There's stuff I would like to support (vsync/vblank and XInput2, among others), but working with these extensions is so frustrating that I just keep punting on it. Maybe Wayland is worse? I haven't done any Wayland dev yet, so I can't say.
As a dev, my opinion is "can we please just let X die already".
As a user, I've been running Wayland on my desktop since... 2022? 2021? Maybe a bit earlier? Even with closed-source legacy apps running via Xwayland, stuff has just worked for a while now. I don't need any sort of remote desktop; screen sharing is a little jankier than on X but has caught up (I also remember when "screen sharing" on X was "copy and paste this ffmpeg command")... I don't know.
I don't understand why people are clinging on to X like this. Is this another weird proxy-culture-war against Red Hat like systemd was/is? X was always kind of janky. Wayland's kind of janky too. And, for that matter, macOS and Windows are janky in their own ways.
I don't get it.
Xlib is bad. Like, really, really bad.
Can you elaborate as to why? Like, I somewhat recently implemented XInput2 in my library and it was pretty easy, and I'm happy with the result; my only wish is that joysticks were included too, since I do remote dev, including remote game dev, all the time, and using a keyboard to test a game is so bleh.
I don't understand why people are clinging on to X like this.
It works well enough and has a massive ecosystem of useful things.
The main reason Xlib is bad is that it presents a synchronous API over an asynchronous protocol. I used XCB ages ago as a low-level interface that I exposed to a higher-level language. It exposes completion tokens, which you can map trivially to futures, which makes the whole thing nicer: you can do a bunch of asynchronous calls and not bother waiting for the reply until you actually need the result.
The main reason Xlib is bad is that it presents a synchronous API over an asynchronous protocol.
This is the answer I expected, but it is, like... only one quarter true. Yes, there are some synchronous functions, but the majority of them aren't, since they don't require a reply anyway. This is why XFlush exists. If anything, the fact that most of them are asynchronous complicates error reporting (you get a sequence number in the error report, but Xlib doesn't normally expose that, so piecing together what you did to generate that error event is a bit tricky).
And yeah, I'd also grant that the xlib buffers need some attention, you gotta loop XPending and not just rely on readiness of the socket, since something might have been buffered internally by some unrelated call. But this kind of thing is hardly unique to xlib.
Maybe I should make a list of sync vs async Xlib functions, but off the top of my head, the main ones that have a sync reply are XInternAtoms (and friends), XGetWindowProperty (and friends), and XQueryExtension (etc). You don't generally use these outside initial setup - these are why some programs start up slowly (though if you batch your XInternAtoms call it helps a lot! that's why I used the plural one here; the singular XInternAtom should pretty much just never be used) - but it has minimal effect on the bulk of the program's normal running.
And for those functions where the xlib calls do legitimately suck, you can swap out the xcb ones - xlib and xcb can be used together, coming from the days before they decided to abandon all compatibility and rewrite from scratch lol.
Calling it less than ideal over these unnecessary synchronous round trips and hard-to-correlate error reports? Sure. Saying it was an unforced error for them to make display disconnect a fatal error? Yeah, I'd throw an exception over that (literally, lol).
There are some other complaints you can bring up too - there's an old page (I don't recall the reference now...) that complained about creating graphics contexts and mapping windows and flushing buffers and so on - but remove those things and the usability result is worse, which is why pretty much everyone does it that way.
So, I don't think it is justified to call it bad over these things.
I mean, XCB's raison d'etre is that Xlib was found to have race conditions that are fundamental to its API (only the Xlib API, not the X11 protocol). That is pretty bad.
EDIT: Added a reference, from section 2.4, Thread Safety:
While Xlib was designed to support threaded applications, and while that support is not unusable, there are known race conditions that cannot be eliminated without changing the Xlib interface. In particular, if an application needs the protocol sequence number of the request it is making, Xlib forces it to make the request and, without any lock held, query the last sequence number sent. If another thread issues a request between these two events, the first thread will obtain the wrong sequence number. XCB simply returns the sequence number from every request.
I mean, XCB's raison d'etre is that Xlib was found to have race conditions that are fundamental to its API
Yeah, I've actually hit the error before where it is like "sequence number inconsistent, this is a bug in xlib and not your fault. sorry. aborting". like it literally says "this is not your fault" in the error message, lol. And this was using the XLockDisplay/XUnlockDisplay pair.
But at the same time... meh. Windows and Mac also have thread restrictions for ui objects, so you get used to handling thread affinity and message passing around it anyway.
Nevertheless, I agree error reporting - which is the main thing you'd want that sequence number for in xlib - is not very good here.
The one I remember was registering atoms, where you send a string and get back an integer. There were also a bunch of things related to extension discovery. I was using an X display with around a 10ms round trip time. Program startup took over a second with XLib with all of the round trips. With XCB, all of those were sent as requests and then I collected the responses when I wanted to use the atoms. This was pretty soon after the first request, but the whole thing took 10ms.
The one I remember was registering atoms, where you send a string and get back an integer.
Yeah, this is the most common slow startup reason, since you gotta do like 20 of them just to begin when following the icccm/netwm specs. But xlib has a solution for this, precisely because it is the most common problem: XInternAtoms, with the s, which does them all at once. It will wait until they all are answered to fulfill its api requirements, but it doesn't loop one at a time, meaning you'll prolly see about that 10ms total latency instead of 200ms or whatever.
There is no similar function for extensions, however, so that can still add up, but you're probably looking at just like 3-5 extensions vs 20+ atoms so it doesn't hurt as much. (One somewhat obvious addition would be to batch extensions too, like you could do this without even breaking binary compatibility by making XOpenDisplay fire off some of these extremely common queries and then the answer sits in a buffer until the application actually requests the answer, then it gets it from the local cache instantly.)
It has been a long time since I did any low-level X programming, but I vaguely remember they needed to be interleaved. Each extension (and it was more than 3-5: just render, damage, composite, and fixes is 4, and that’s the minimum for modern X) needed some extension-related atoms so it was a pile of round trips. Querying the set of extensions and registering the atoms as you got the replies was a big speedup.
I don’t think I tried using render / composite with XLib, but they also had a bunch of places where you could safely fire off a bunch of commands and then handle the responses together (which basically amounted to ‘everything is fine’ or ‘the X server has probably crashed’).
It has been a long time since I did any low-level X programming, but I vaguely remember they needed to be interleaved.
Extensions do often need additional atoms, but they're not actually tied to the extension; they're still just statically known strings, so you can intern them ahead of time in one big batch.
(and it was more than 3-5: just render, damage, composite, and fixes is 4, and that’s the minimum for modern X)
Damage and composite are unnecessary for virtually every application. You only need damage if you're watching paints of other windows - so if you're, for example, a compositor wanting to update your blended copy of the window (this is the specific reason why it was added), or if you're a screen recorder (or vnc server or similar) wanting to encode their changes into a video without constantly polling the whole thing. You can use it to send custom damage reports, but this is rarely necessary. Similarly, composite is primarily used for implementing compositors, and maybe again for screen recorders and such, but not normal applications.
I was thinking some combination of fixes, randr, glx, render, xinput2, present, sync, and mit-shm as the base set. You almost certainly don't actually need all of them since there's overlapping functionality so you'd pick the ones that best fit your needs. But even if you do use them all, it is surely a fraction of the number of atoms you'd want...
I still use X. I have a Linux desktop (tower computer) underneath my desk, and I still use several X-based programs. I can forward the display to my Mac laptop (via XQuartz) and still use them. I, for one, will hate to see this go.
Speaking of XInput: with Xorg I can dynamically recalibrate my cheap pen-screen (screen + stylus input), e.g. to rotate it and such; Wayland seems to want calibration to be static. Also, with Xorg I can experiment with tuning my keyboard layout easily; with Wayland I needed to restart the session. (I have used this for figuring out some dumb-input workflow on the fly.)
Speaking of compositors: X11 splits WM vs server correctly; Wayland couples user-preference logic and talking to lying hardware into the same process. Also, Wayland has still not converged on a screen-sharing protocol, so now applications need to implement who knows how many.
So basically: let's reduce the sharing of hardware-side and application-side logic between compositors. I guess it can work to somewhat reduce user choice, if that's the goal. But now I also need to track which application is compatible with which janky version of what basic functionality.
And of course with X11 split of concerns I can also restart WM without losing the session. Which lets me figure out the customisations with reasonable effort.
And of course xdotool is a valuable tool, and Wayland will take another ten years to decide enough about access control to make xdotool replacement usable and feature-comparable.
Separately, I have no mental model for wlr-randr, because I got a persistent glitch there that I don't know how to interpret. And then every compositor (or at least every compositor library?) has its own behaviour there. Of course, with X11 I could help any Linux user fix projector connection issues with xrandr; with Wayland there is no single command to learn well.
Basically, if neither macOS nor GNOME looks annoying to you, Wayland is fine, probably. If you have a carefully-customised environment you built on X11 and want it to remain possible to build the same, even from scratch, Wayland is horrible. And they clearly made a few design decisions in an even more worse-than-better style than the X protocol.
TBH I don't know why anybody should care whether I use Wayland or X, but since you asked...
For me, moving to Wayland means developing Wayland support for the window manager I use (StumpWM), or moving to something else entirely. Neither is impossible, but both are costly.
On the other hand, Wayland doesn't solve any problems for me and I know a few features I use in X aren't available in Wayland. It's a sideways or backwards move.
From my perspective the move to Wayland is straight out of the Windows and OSX playbook - change just for the sake of it - and I don't have the time or desire to follow along. It's nothing malicious towards anybody, but I'm not changing something that isn't broken.
Wayland is definitely a backwards move in regards to ease of implementing a WM, which is a big con in my book (it goes against 'Personal Mastery' from the Smalltalk Design Principles). It forces one to implement a full compositor if one wants to decide where windows go. IMHO X11's "mechanism, not policy" was the right call.
Xorg hasn't been janky for me in forever, maybe XFree86 was painful with the monitor timings, but I don't really know. I suspect people who have problems with Xorg maybe purchased problematic hardware. I've never had problems with hardware but then I've always stuck with ATI/AMD/Intel and back in the day Matrox.
I use Wayland but there are some things I'm missing:
Being able to forward the display to a Mac with XQuartz
Or the X forwarding scenario I use most often: Windows with Xming. (And yes, I pay for the "website version"; well worth the $15. It has several advantages over the other free options on the web, like not breaking under Windows DPI scaling, and more OpenGL forwarding works.)
I often see people say either that network transparency is useless or that Wayland does it better anyway, and both things are false - even if waypipe works on Linux, I tried searching the web for Windows and Mac versions and came up empty. X works there today - not ideally, the clipboard on Mac is a bit awkward too, but it works. I even used an X server from Android once... wasn't impressed at all with the one I had, but it did still work.
Wayland is also bad. Just on different dimensions.
Doing the basics in Wayland is merely painful, but trying to write a program that works across OSes, window managers, and so on is an exercise in pain.
Gave this a quick go and it's promising, but still early days. A lot of the core protocol stuff that basic X11 clients (xterm etc.) expect is not implemented yet. No extensions either (RENDER etc.).
As the README says:
Phoenix is not ready to be used yet. At the moment it can render simple applications that do GLX, EGL or Vulkan graphics (fully hardware accelerated) nested in an existing X server. Running Phoenix nested will be the only supported mode until Phoenix has progressed more and can run real-world applications.
Indeed, I was able to run some simple OpenGL and Vulkan programs like glxinfo, glxgears, vulkaninfo and vkcube.
If you want to try it, make sure to have zig-0.14 around; it doesn't build with 0.15 yet.
I hope they use something like https://github.com/tonyg/xcb-shim eventually rather than just wrapping libxcb.
Indeed, wrapping libxcb is less ideal than writing/generating the code directly from the XML specifications.
IIRC xcb-shim is an augmented definition? A while back I gave writing a CL one a go. It can successfully do the initial handshake, and I think I implemented intern atom. It kind of died out at the figuring-out-how-to-parse-XML-in-CL stage, so that I could translate it into a set of CL macro definitions (the birth of my daughter might also have had something to do with it). The biggest thing outside of the XML specs is how the request-response cookies (completion tokens?) work. What else did you find the XML specs to be missing?
Btw, in the readme you mention other implementations based on the XML spec. These are the ones I remember off the top of my head:
I've been thinking for a while about what it'd take to write a new X11 server, but never ended up trying... This is really nice to see!
This seems nice. But I am not completely sure where this is aiming given that it's explicitly not trying to be fully X compatible.
Is it trying to basically replace it, just with a different approach than Wayland? I think that's a noble goal. While I'm personally happy with Wayland as a user, there seems to be much valid/constructive criticism about things that have been overlooked, not really thought through, and/or are hard, harder than necessary, or impossible.
So it feels like right now is the last chance before everything completely settles down, probably making things quite a bit harder to port in 5+ years when a lot more breaks.
Always nice to see different approaches, because even now it seems that (current) Wayland is not the last word.
And personally I like the idea of seeing "the old approach, but with the liberty to break things and start from scratch". I think it's a common fallacy in development to consider a new thing better when in reality it's the breaking of compatibility and the clean slate - not the different way of doing certain things - that helps. Not saying that's the case here; personally I know way too little about it.
Certainly it's not always the case, but different working (and not just conceptualized) approaches are nice to have.
X's design for WMs is just better for niche WMs, so running this under something like cage, I think, could be a way to get better window management for Wayland applications too?