Addressing global removal race in Wayland

9 points by Aks

jmillikin

When the bind request finally arrives at the compositor side, the global won’t exist anymore and the only option left will be to post a protocol error and effectively crash the client.

From a distributed systems engineering perspective, why on earth does receiving an error response to an IPC mean the client must crash? Can't it just log/handle the error and continue with whatever it's doing?

The wl_fixes interface was added to work around the fact that the wl_registry interface is frozen.

I don't have a strong opinion on Wayland's technical merits, but sentences like this plus the drama around Frog Protocols make me think something at the protocol layer is deeply broken. When designing a greenfield protocol intended to be implemented by independent projects over a multi-decade timespan, isn't backward-compatible evolution one of the most critical parts?

hobbified

And apparently the "fix" for a race that might occasionally crash a client is to change the protocol in such a way that if an old client (i.e. any software that exists today) exists anywhere in the system, it causes a resource leak in the compositor under normal operation?

I've dealt with systems that have "must crash on protocol error" semantics, and they do have their advantages, but this isn't the kind of situation that it was meant to cover. Surely there's room to make "you tried to interact with an object that doesn't exist anymore, but it's not your fault" into a new, nonfatal sort of status. And okay, maybe existing clients don't know how to deal with that and they crash anyway, but you haven't made anything worse. Now, as clients upgrade to a new protocol library they gain robustness, but a mixed state doesn't make things regress to worse-than-baseline.
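To make the "mixed state doesn't regress" argument concrete, here is a toy sketch in Python. Everything in it is hypothetical: `ToyRegistry`, `BindResult.GLOBAL_GONE`, and the `client_understands_nonfatal` flag are invented for illustration and are not part of wayland.xml or libwayland. It also assumes the compositor can somehow tell which clients understand the new status (say, through version negotiation).

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class BindResult(Enum):
    OK = auto()
    GLOBAL_GONE = auto()  # hypothetical nonfatal status, not in wayland.xml


@dataclass
class ToyRegistry:
    """Toy stand-in for a compositor's registry; not real Wayland API."""

    globals_: dict = field(default_factory=dict)  # numeric name -> interface

    def bind(self, name, client_understands_nonfatal):
        if name in self.globals_:
            return BindResult.OK
        if client_understands_nonfatal:
            # Upgraded client: report the stale name and keep the connection.
            return BindResult.GLOBAL_GONE
        # Old client: same fatal outcome as before, so nothing gets worse.
        raise ConnectionError(f"bind: no global with name {name}")
```

In this model an upgraded client gets `GLOBAL_GONE` and carries on, while an old client sees exactly the failure it would have seen anyway, and the compositor keeps no extra state on its behalf.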

calvin

I don't have a strong opinion on Wayland's technical merits, but sentences like this plus the drama around Frog Protocols make me think something at the protocol layer is deeply broken. When designing a greenfield protocol intended to be implemented by independent projects over a multi-decade timespan, isn't backward-compatible evolution one of the most critical parts?

X11 had pretty much the same thing, also with drama.

mira

The goal was not to reinvent X11 but change the big paradigm and see if people follow that or go elsewhere.

(from pq in the thread you linked)

Is this the common understanding among Wayland leadership? :/

quasi_qua_quasi

From a distributed systems engineering perspective, why on earth does receiving an error response to an IPC mean the client must crash? Can't it just log/handle the error and continue with whatever it's doing?

My read was more that in practice most clients will crash if they get this error because they don't handle it recoverably, not that they must crash.

jmillikin

That's what I would have expected, but I can't find any indication that wl_registry::bind has a mechanism to signal non-fatal error.

The Wayland protocol spec (wayland.xml) has this definition for the method:

    <request name="bind">
      <description summary="bind an object to the display">
        Binds a new, client-created object to the server using the
        specified name as the identifier.
      </description>
      <arg name="name" type="uint" summary="unique numeric name of the object"/>
      <arg name="id" type="new_id" summary="bounded object"/>
    </request>

... and unlike other methods that are documented as returning a specific error type, this one seems to assume the bind call is infallible.

Which leads us to the error event, documented as being "fatal (nonrecoverable)" -- if this is the only error condition that wl_registry::bind is allowed to reply with, then binding to a non-existent object would indeed cause a client to close the Wayland connection (and probably exit...):

    <event name="error">
      <description summary="fatal error event">
        The error event is sent out when a fatal (non-recoverable)
        error has occurred.  The object_id argument is the object
        where the error occurred, most often in response to a request
        to that object.  The code identifies the error and is defined
        by the object interface.  As such, each interface defines its
        own set of error codes.  The message is a brief description
        of the error, for (debugging) convenience.
      </description>
      <arg name="object_id" type="object" summary="object where the error occurred"/>
      <arg name="code" type="uint" summary="error code"/>
      <arg name="message" type="string" summary="error description"/>
    </event>
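
For anyone who wants to see the ordering problem spelled out, here is a minimal toy model in Python. It is a sketch under assumptions, not real libwayland code: `ToyCompositor`, its methods, the global name `7`, and the error text are all made up for illustration. The one thing it takes from the spec excerpts above is that the only failure reply available to a bind is the fatal error event.

```python
from collections import deque


class ToyCompositor:
    """Toy model of the registry race; not real libwayland API."""

    def __init__(self):
        self.globals = {}        # numeric name -> interface string
        self.inbox = deque()     # client requests still "on the wire"
        self.fatal_error = None  # once set, the connection is dead

    def add_global(self, name, interface):
        self.globals[name] = interface

    def remove_global(self, name):
        self.globals.pop(name, None)

    def dispatch(self):
        # Process queued requests in arrival order until a fatal error.
        while self.inbox and self.fatal_error is None:
            request, name = self.inbox.popleft()
            if request == "bind" and name not in self.globals:
                # The only reply available in this model is the fatal
                # error event; the message text is illustrative.
                self.fatal_error = f"bind: no global with name {name}"


def simulate_race():
    comp = ToyCompositor()
    comp.add_global(7, "wl_output")
    comp.inbox.append(("bind", 7))  # client saw global 7 and sends bind
    comp.remove_global(7)           # e.g. the monitor is unplugged first
    comp.dispatch()                 # bind arrives after the removal
    return comp.fatal_error
```

Calling `simulate_race()` returns a non-`None` error even though the client did nothing wrong: the bind was valid when it was sent and stale by the time it was processed.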