
Chapter 4. Wayland Protocol and Model of Operation

4.1. Basic Principles
4.2. Code Generation
4.3. Wire Format
4.4. Interfaces
4.5. Versioning
4.6. Connect Time
4.7. Security and Authentication
4.8. Creating Objects
4.9. Compositor
4.10. Surfaces
4.11. Input
4.12. Output
4.13. Data sharing between clients

4.1. Basic Principles

The Wayland protocol is an asynchronous object-oriented protocol. All requests are method invocations on some object. Each request includes an object ID that uniquely identifies an object on the server. Each object implements an interface, and the request includes an opcode that identifies which method in the interface to invoke.
The server sends events back to the client; each event is emitted from an object. Events can be error conditions. The event includes the object ID and the event opcode, from which the client can determine the type of event. Events are generated either in response to requests (in which case the request and the event constitute a round trip) or spontaneously when the server state changes.
  • State is broadcast on connect, and events are sent out when state changes. Clients must listen for these changes and cache the state. There is no need (or mechanism) to query server state.
  • The server will broadcast the presence of a number of global objects, which in turn will broadcast their current state.

4.2. Code Generation

The interfaces, requests and events are defined in protocol/wayland.xml. This xml is used to generate the function prototypes that can be used by clients and compositors.
The protocol entry points are generated as inline functions which just wrap the wl_proxy_* functions. The inline functions aren't part of the library ABI and language bindings should generate their own stubs for the protocol entry points from the xml.

4.3. Wire Format

The protocol is sent over a UNIX domain stream socket whose endpoint is usually named wayland-0 (although it can be changed via WAYLAND_DISPLAY in the environment). The protocol is message-based. A message sent by a client to the server is called a request. A message from the server to a client is called an event. Every message is structured as 32-bit words; values are represented in the host's byte order.
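The endpoint naming described above can be sketched as a small helper. This is a minimal illustration, not libwayland code: resolve_socket_path is a hypothetical name, and the assumption that the socket lives in the directory named by XDG_RUNTIME_DIR is not stated in this chapter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "<runtime_dir>/<display>" into out.
 * runtime_dir is assumed to be the value of XDG_RUNTIME_DIR and display
 * the value of WAYLAND_DISPLAY (both may be NULL when unset).
 * Returns 0 on success, -1 if no directory is known or out is too small. */
static int resolve_socket_path(const char *runtime_dir, const char *display,
                               char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    if (runtime_dir == NULL)
        return -1;
    /* Fall back to the usual endpoint name when WAYLAND_DISPLAY is unset. */
    if (display == NULL || strlen(display) == 0)
        display = "wayland-0";
    int n = snprintf(out, cap, "%s/%s", runtime_dir, display);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

A caller would typically pass getenv("XDG_RUNTIME_DIR") and getenv("WAYLAND_DISPLAY") and then connect an AF_UNIX stream socket to the resulting path.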
The message header has 2 words in it:
  • The first word is the sender's object ID (32-bit).
  • The second word has two 16-bit parts. The upper 16 bits are the message size in bytes, starting at the header (i.e. it has a minimum value of 8). The lower 16 bits are the request/event opcode.
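The two header words can be packed and unpacked as in the following sketch. The struct and function names are hypothetical; only the layout (object ID, then size in the upper 16 bits and opcode in the lower 16 bits) is taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory view of a message header. */
struct msg_header {
    uint32_t object_id; /* first word: sender's object ID */
    uint16_t size;      /* total message size in bytes, header included */
    uint16_t opcode;    /* request or event opcode */
};

/* Write the two header words in host byte order. */
static void header_pack(const struct msg_header *h, uint32_t out[2])
{
    /* The header alone is 8 bytes, and messages are whole 32-bit words. */
    assert(h->size >= 8 && h->size % 4 == 0);
    out[0] = h->object_id;
    out[1] = ((uint32_t)h->size << 16) | h->opcode;
}

/* Recover the fields from the two header words. */
static struct msg_header header_unpack(const uint32_t in[2])
{
    struct msg_header h;
    h.object_id = in[0];
    h.size = (uint16_t)(in[1] >> 16);
    h.opcode = (uint16_t)(in[1] & 0xffffu);
    return h;
}
```

For example, a 12-byte message with opcode 1 sent on object 3 yields the words 3 and 0x000c0001.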
The payload describes the request/event arguments. Every argument is always aligned to 32-bits. Where padding is required, the value of padding bytes is undefined. There is no prefix that describes the type, but it is inferred implicitly from the xml specification.
The argument types are represented as follows:
int, uint
The value is the 32-bit value of the signed/unsigned int.
fixed
Signed 24.8 decimal numbers. It is a signed decimal type which offers a sign bit, 23 bits of integer precision and 8 bits of decimal precision. This is exposed as an opaque struct with conversion helpers to and from double and int on the C API side.
string
Starts with an unsigned 32-bit length, followed by the string contents, including terminating null byte, then padding to a 32-bit boundary.
object
32-bit object ID.
new_id
The 32-bit object ID. On requests, the client decides the ID. The only events with new_id are advertisements of globals, and the server will use IDs below 0x10000.
array
Starts with 32-bit array size in bytes, followed by the array contents verbatim, and finally padding to a 32-bit boundary.
fd
The file descriptor is not stored in the message buffer, but in the ancillary data of the UNIX domain socket message (msg_control).
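Two of the encodings above lend themselves to a short sketch: the 24.8 fixed type and the padded string argument. All names here are hypothetical and this is not the libwayland implementation; in particular, the conversion from double simply truncates, the padding bytes are zeroed although the protocol leaves them undefined, and the length word is assumed to count the terminating null byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t fixed_24_8; /* hypothetical name: sign, 23 integer bits, 8 fraction bits */

static fixed_24_8 fixed_from_double(double d) { return (fixed_24_8)(d * 256.0); }
static double fixed_to_double(fixed_24_8 f) { return f / 256.0; }
static fixed_24_8 fixed_from_int(int i) { return i * 256; }
static int fixed_to_int(fixed_24_8 f) { return f / 256; }

/* Encode a string argument into out and return the number of bytes used:
 * a 32-bit length, the contents with terminating null byte, then padding
 * up to the next 32-bit boundary. out must be large enough. */
static size_t string_arg_encode(const char *s, uint8_t *out)
{
    assert(s != NULL && out != NULL);
    uint32_t len = (uint32_t)strlen(s) + 1;  /* assumed to include the null byte */
    uint32_t padded = (len + 3) & ~3u;       /* round up to a multiple of 4 */
    memcpy(out, &len, 4);                    /* host byte order */
    memset(out + 4, 0, padded);              /* padding value chosen here, not mandated */
    memcpy(out + 4, s, len);
    return 4 + (size_t)padded;
}
```

An array argument would follow the same pattern, with the byte size of the array in place of the string length and no terminating null byte.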

4.4. Interfaces

The protocol includes several interfaces which are used for interacting with the server. Each interface provides requests, events, and errors (which are really just special events) as described above. Specific compositor implementations may have their own interfaces provided as extensions, but there are several which are always expected to be present.
Core interfaces:
wl_display - core global object
The core global object. This is a special singleton object. It is used for internal Wayland protocol features.
wl_registry - global registry object
The global registry object. The server has a number of global objects that are available to all clients. These objects typically represent an actual object in the server (for example, an input device) or they are singleton objects that provide extension functionality. When a client creates a registry object, the registry object will emit a global event for each global currently in the registry. Globals come and go as a result of device or monitor hotplugs, reconfiguration or other events, and the registry will send out global and global_remove events to keep the client up to date with the changes. To mark the end of the initial burst of events, the client can use the wl_display.sync request immediately after calling wl_display.get_registry. A client can bind to a global object by using the bind request. This creates a client-side handle that lets the object emit events to the client and lets the client invoke requests on the object.
wl_callback - callback object
Clients can handle the 'done' event to get notified when the related request is done.
wl_compositor - the compositor singleton
A compositor. This object is a singleton global. The compositor is in charge of combining the contents of multiple surfaces into one displayable output.
wl_shm_pool - a shared memory pool
The wl_shm_pool object encapsulates a piece of memory shared between the compositor and client. Through the wl_shm_pool object, the client can allocate shared memory wl_buffer objects. All objects created through the same pool share the same underlying mapped memory. Reusing the mapped memory avoids the setup/teardown overhead and is useful when interactively resizing a surface or for many small buffers.
wl_shm - shared memory support
A global singleton object that provides support for shared memory. Clients can create wl_shm_pool objects using the create_pool request. At connection setup time, the wl_shm object emits one or more format events to inform clients about the valid pixel formats that can be used for buffers.
wl_buffer - content for a wl_surface
A buffer provides the content for a wl_surface. Buffers are created through factory interfaces such as wl_drm, wl_shm or similar. It has a width and a height and can be attached to a wl_surface, but the mechanism by which a client provides and updates the contents is defined by the buffer factory interface.
wl_data_offer - offer to transfer data
A wl_data_offer represents a piece of data offered for transfer by another client (the source client). It is used by the copy-and-paste and drag-and-drop mechanisms. The offer describes the different mime types that the data can be converted to and provides the mechanism for transferring the data directly from the source client.
wl_data_source - offer to transfer data
The wl_data_source object is the source side of a wl_data_offer. It is created by the source client in a data transfer and provides a way to describe the offered data and a way to respond to requests to transfer the data.
wl_data_device - data transfer device
There is one wl_data_device per seat which can be obtained from the global wl_data_device_manager singleton. A wl_data_device provides access to inter-client data transfer mechanisms such as copy-and-paste and drag-and-drop.
wl_data_device_manager - data transfer interface
The wl_data_device_manager is a singleton global object that provides access to inter-client data transfer mechanisms such as copy-and-paste and drag-and-drop. These mechanisms are tied to a wl_seat and this interface lets a client get a wl_data_device corresponding to a wl_seat.
wl_shell - create desktop-style surfaces
This interface is implemented by servers that provide desktop-style user interfaces. It allows clients to associate a wl_shell_surface with a basic surface.
wl_shell_surface - desktop-style metadata interface
An interface that may be implemented by a wl_surface, for implementations that provide a desktop-style user interface. It provides requests to treat surfaces like toplevel, fullscreen or popup windows, move, resize or maximize them, associate metadata like title and class, etc. On the server side the object is automatically destroyed when the related wl_surface is destroyed. On client side, wl_shell_surface_destroy() must be called before destroying the wl_surface object.
wl_surface - an onscreen surface
A surface is a rectangular area that is displayed on the screen. It has a location, size and pixel contents. The size of a surface (and relative positions on it) is described in surface local coordinates, which may differ from the buffer local coordinates of the pixel content, in case a buffer_transform or a buffer_scale is used. Surfaces are also used for some special purposes, e.g. as cursor images for pointers, drag icons, etc.
wl_seat - group of input devices
A seat is a group of keyboards, pointer and touch devices. This object is published as a global during start up, or when such a device is hot plugged. A seat typically has a pointer and maintains a keyboard focus and a pointer focus.
wl_pointer - pointer input device
The wl_pointer interface represents one or more input devices, such as mice, which control the pointer location and pointer_focus of a seat. The wl_pointer interface generates motion, enter and leave events for the surfaces that the pointer is located over, and button and axis events for button presses, button releases and scrolling.
wl_keyboard - keyboard input device
The wl_keyboard interface represents one or more keyboards associated with a seat.
wl_touch - touchscreen input device
The wl_touch interface represents a touchscreen associated with a seat. Touch interactions can consist of one or more contacts. For each contact, a series of events is generated, starting with a down event, followed by zero or more motion events, and ending with an up event. Events relating to the same contact point can be identified by the ID of the sequence.
wl_output - compositor output region
An output describes part of the compositor geometry. The compositor works in the 'compositor coordinate system' and an output corresponds to a rectangular area in that space that is actually visible. This typically corresponds to a monitor that displays part of the compositor space. This object is published as a global during start up, or when a monitor is hotplugged.
wl_region - region interface
A region object describes an area. Region objects are used to describe the opaque and input regions of a surface.
wl_subcompositor - sub-surface compositing
The global interface exposing sub-surface compositing capabilities. A wl_surface that has sub-surfaces associated with it is called the parent surface. Sub-surfaces can be arbitrarily nested and create a tree of sub-surfaces. The root surface in a tree of sub-surfaces is the main surface. The main surface cannot be a sub-surface, because sub-surfaces must always have a parent. A main surface with its sub-surfaces forms a (compound) window. For window management purposes, this set of wl_surface objects is to be considered as a single window, and it should also behave as such. The aim of sub-surfaces is to offload some of the compositing work within a window from clients to the compositor. A prime example is a video player with decorations and video in separate wl_surface objects. This should allow the compositor to pass YUV video buffer processing to dedicated overlay hardware when possible.
wl_subsurface - sub-surface interface to a wl_surface
An additional interface to a wl_surface object, which has been made a sub-surface. A sub-surface has one parent surface. A sub-surface's size and position are not limited to that of the parent. In particular, a sub-surface is not automatically clipped to its parent's area.
A sub-surface becomes mapped when a non-NULL wl_buffer is applied and the parent surface is mapped. The order in which these happen is irrelevant. A sub-surface is hidden if the parent becomes hidden, or if a NULL wl_buffer is applied. These rules apply recursively through the tree of surfaces.
The behaviour of the wl_surface.commit request on a sub-surface depends on the sub-surface's mode. The possible modes are synchronized and desynchronized; see wl_subsurface.set_sync and wl_subsurface.set_desync. Synchronized mode caches the wl_surface state to be applied when the parent's state gets applied, and desynchronized mode applies the pending wl_surface state directly. A sub-surface is initially in the synchronized mode.
Sub-surfaces also have another kind of state, which is managed by wl_subsurface requests, as opposed to wl_surface requests. This state includes the sub-surface position relative to the parent surface (wl_subsurface.set_position), and the stacking order of the parent and its sub-surfaces (wl_subsurface.place_above and .place_below). This state is applied when the parent surface's wl_surface state is applied, regardless of the sub-surface's mode. As an exception, set_sync and set_desync are effective immediately.
The main surface can be thought to be always in desynchronized mode, since it does not have a parent in the sub-surfaces sense. Even if a sub-surface is in desynchronized mode, it will behave as in synchronized mode if its parent surface behaves as in synchronized mode. This rule is applied recursively throughout the tree of surfaces. This means that one can set a sub-surface into synchronized mode and then assume that all its child and grand-child sub-surfaces are synchronized too, without explicitly setting them.
If the wl_surface associated with the wl_subsurface is destroyed, the wl_subsurface object becomes inert. Note that destroying either object takes effect immediately. If you need to synchronize the removal of a sub-surface to the parent surface update, unmap the sub-surface first by attaching a NULL wl_buffer, update the parent, and then destroy the sub-surface. If the parent wl_surface object is destroyed, the sub-surface is unmapped.
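The recursive rule for sub-surface modes (a sub-surface behaves as synchronized if it, or any sub-surface between it and the main surface, is in synchronized mode) can be sketched as a walk up the surface tree. The types and function names are hypothetical and only model the mode logic, not any compositor's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node in a tree of surfaces. */
struct surface_node {
    struct surface_node *parent; /* NULL for the main surface */
    bool sync;                   /* own mode; unused for the main surface */
};

/* Turn node into a sub-surface of parent. */
static void subsurface_init(struct surface_node *node, struct surface_node *parent)
{
    assert(node != NULL && parent != NULL);
    node->parent = parent;
    node->sync = true; /* a sub-surface is initially in synchronized mode */
}

/* True if the surface behaves as synchronized. The main surface has no
 * parent and is therefore always treated as desynchronized. */
static bool behaves_synchronized(const struct surface_node *s)
{
    for (; s != NULL && s->parent != NULL; s = s->parent)
        if (s->sync)
            return true;
    return false;
}
```

With this model, setting one sub-surface to synchronized mode makes behaves_synchronized return true for all of its descendants, whatever their own mode.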

4.5. Versioning

Every interface is versioned and every protocol object implements a particular version of its interface. For global objects, the maximum version supported by the server is advertised with the global and the actual version of the created protocol object is determined by the version argument passed to wl_registry.bind(). For objects that are not globals, their version is inferred from the object that created them.
In order to keep things sane, this has a few implications for interface versions:
  • The object creation hierarchy must be a tree. Otherwise, inferring object versions from the parent object becomes much more difficult to track properly.
  • When the version of an interface increases, so does the version of its parent (recursively until you get to a global interface).
  • A global interface's version number acts like a counter for all of its child interfaces. Whenever a child interface gets modified, the global parent's interface version number also increases (see above). The child interface then takes on the same version number as the new version of its parent global interface.
To illustrate the above, consider the wl_compositor interface. It has two children, wl_surface and wl_region. As of Wayland version 1.2, wl_surface and wl_compositor are both at version 3. If something is added to the wl_region interface, both wl_region and wl_compositor will get bumped to version 4. If, afterwards, wl_surface is changed, both wl_compositor and wl_surface will be at version 5. In this way the global interface version is used as a sort of "counter" for all of its child interfaces. This makes it very simple to know the version of the child given the version of its parent. The child is at the highest possible interface version that is less than or equal to its parent's version.
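The lookup described in the last sentence can be sketched as follows. The function name is hypothetical, and the version histories used in the usage note are assumptions that merely follow the example above (wl_region changing at version 4, wl_surface at version 5); they are not taken from wayland.xml.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given the list of versions at which a child
 * interface exists, return the highest one that does not exceed the
 * parent's version, or 0 if there is none. */
static uint32_t child_version_for(const uint32_t *child_versions, size_t n,
                                  uint32_t parent_version)
{
    assert(child_versions != NULL || n == 0);
    uint32_t best = 0;
    for (size_t i = 0; i < n; i++)
        if (child_versions[i] <= parent_version && child_versions[i] > best)
            best = child_versions[i];
    return best;
}
```

Assuming wl_region exists at versions {1, 4} and wl_surface at {1, 2, 3, 5}, a wl_compositor bound at version 4 would create wl_region objects at version 4 and wl_surface objects at version 3.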
It is worth noting a particular exception to the above versioning scheme. The wl_display (and, by extension, wl_registry) interface cannot change because it is the core protocol object and its version is never advertised nor is there a mechanism to request a different version.

4.6. Connect Time

There is no fixed connection setup information. Instead, the server emits multiple events at connect time to indicate the presence and properties of global objects: outputs, compositor, input devices.

4.7. Security and Authentication

  • mostly about access to underlying buffers; needs a new drm auth mechanism (the grant-to ioctl idea); need to check the cmd stream?
  • getting the server socket depends on the compositor type: it could be a system-wide name, it could be obtained through fd passing on the session dbus, or the client is forked by the compositor and the fd is already open.

4.8. Creating Objects

Each object has a unique ID. The IDs are allocated by the entity creating the object (either client or server). IDs allocated by the client are in the range [1, 0xfeffffff] while IDs allocated by the server are in the range [0xff000000, 0xffffffff]. The 0 ID is reserved to represent a null or non-existent object. For efficiency purposes, the IDs are densely packed in the sense that the ID N will not be used until N-1 has been used. Any ID allocation algorithm that does not maintain this property is incompatible with the implementation in libwayland.

4.9. Compositor

The compositor is a global object, advertised at connect time.

4.10. Surfaces

Surfaces are created by the client. Clients don't know the global position of their surfaces, and cannot access other clients' surfaces.
See Section A.14, “wl_surface - an onscreen surface” for the protocol description.

4.11. Input

A seat represents a group of input devices including mice, keyboards and touchscreens. It has a keyboard and pointer focus. Seats are global objects. Pointer events are delivered in surface local coordinates.
The compositor maintains an implicit grab when a button is pressed, to ensure that the corresponding button release event gets delivered to the same surface. But there is no way for clients to take an explicit grab. Instead, surfaces can be mapped as 'popup', which combines transient window semantics with a pointer grab.
To avoid race conditions, input events that are likely to trigger further requests (such as button presses, key events, pointer motions) carry serial numbers, and requests such as wl_surface.set_popup require that the serial number of the triggering event is specified. The server maintains a monotonically increasing counter for these serial numbers.
Input events also carry timestamps with millisecond granularity. Their base is undefined, so they can't be compared against system time (as obtained with clock_gettime or gettimeofday). They can be compared with each other though, and for instance be used to identify sequences of button presses as double or triple clicks.
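Comparing timestamps with each other is enough to detect multi-click sequences, as in the following sketch. The tracker type, the function name, and the 400 ms threshold are hypothetical choices; the sketch also assumes the timestamp is the 32-bit uint carried by the event, so unsigned subtraction is used to stay correct if the counter wraps around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MULTI_CLICK_MS 400u /* hypothetical threshold, not defined by the protocol */

struct click_tracker {
    bool have_last;     /* false until the first press has been seen */
    uint32_t last_time; /* timestamp of the previous press, in ms */
    uint32_t count;     /* length of the current click sequence */
};

/* Record a button press and return the click count (1 = single,
 * 2 = double, 3 = triple, ...). Only differences between event
 * timestamps are used; they are never compared against system time. */
static uint32_t click_register(struct click_tracker *t, uint32_t time_ms)
{
    assert(t != NULL);
    if (t->have_last && (uint32_t)(time_ms - t->last_time) <= MULTI_CLICK_MS)
        t->count++;
    else
        t->count = 1;
    t->have_last = true;
    t->last_time = time_ms;
    return t->count;
}
```

A real client would also check that the presses used the same button and landed on the same surface before treating them as one sequence.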
See Section A.15, “wl_seat - group of input devices” for the protocol description.
Talk about:
  • keyboard map, change events
  • xkb on Wayland
  • multi pointer Wayland
A surface can change the pointer image when the surface is the pointer focus of the input device. Wayland doesn't automatically change the pointer image when a pointer enters a surface, but expects the application to set the cursor it wants in response to the pointer focus and motion events. The rationale is that a client has to manage changing pointer images for UI elements within the surface in response to motion events anyway, so we'll make that the only mechanism for setting or changing the pointer image. If the server receives a request to set the pointer image after the surface loses pointer focus, the request is ignored. To the client this will look like it successfully set the pointer image.
The compositor will revert the pointer image back to a default image when no surface has the pointer focus for that device. Clients can revert the pointer image back to the default image by setting a NULL image.
What if the pointer moves from one window which has set a special pointer image to a surface that doesn't set an image in response to the motion event? The new surface will be stuck with the special pointer image. We can't just revert the pointer image on leaving a surface, since if we immediately enter a surface that sets a different image, the image will flicker. Broken app, I suppose.

4.12. Output

An output is a global object, advertised at connect time or as it comes and goes.
  • laid out in a big (compositor) coordinate system
  • basically xrandr over Wayland
  • geometry needs position in compositor coordinate system
  • events to advertise available modes, requests to move and change modes

4.13. Data sharing between clients

The Wayland protocol provides clients a mechanism for sharing data that allows the implementation of copy-paste and drag-and-drop. The client providing the data creates a wl_data_source object and the clients obtaining the data will see it as wl_data_offer object. This interface allows the clients to agree on a mutually supported mime type and transfer the data via a file descriptor that is passed through the protocol.
The next section explains the negotiation between data source and data offer objects. Section 4.13.2, “Data devices” explains how these objects are created and passed to different clients using the wl_data_device interface that implements copy-paste and drag-and-drop support.
MIME is defined in RFCs 2045-2049. A registry of MIME types is maintained by the Internet Assigned Numbers Authority (IANA).

4.13.1. Data negotiation

A client providing data to other clients will create a wl_data_source object and advertise the mime types for the formats it supports for that data through the wl_data_source.offer request. On the receiving end, the data offer object will generate one wl_data_offer.offer event for each supported mime type.
The actual data transfer happens when the receiving client sends a wl_data_offer.receive request. This request takes a mime type and a file descriptor as arguments. This request will generate a wl_data_source.send event on the sending client with the same arguments, and the latter client is expected to write its data to the given file descriptor using the chosen mime type.
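The negotiation and transfer described above can be simulated in a single process. This is only a sketch under several assumptions: all function names are hypothetical, no Wayland calls are made, the source is assumed to support only "text/plain", and a local pipe stands in for the file descriptor that would travel with wl_data_offer.receive and arrive in wl_data_source.send.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Receiver side: pick the first wanted mime type that was also offered. */
static const char *pick_mime_type(const char *const *offered, size_t n_offered,
                                  const char *const *wanted, size_t n_wanted)
{
    for (size_t w = 0; w < n_wanted; w++)
        for (size_t o = 0; o < n_offered; o++)
            if (strcmp(wanted[w], offered[o]) == 0)
                return offered[o];
    return NULL;
}

/* Source side: stands in for handling a send event. Writes the data in
 * the requested format to fd and closes it so the reader sees end of file. */
static int source_send(const char *mime_type, int fd, const char *text)
{
    int ok = -1;
    if (strcmp(mime_type, "text/plain") == 0) {
        size_t len = strlen(text);
        ok = write(fd, text, len) == (ssize_t)len ? 0 : -1;
    }
    close(fd);
    return ok;
}

/* Receiver side: stands in for issuing a receive request with the write
 * end of a pipe, then reading the result. Only suitable for payloads
 * smaller than the pipe buffer, since writer and reader share one thread. */
static ssize_t receive_text(const char *mime_type, const char *source_text,
                            char *buf, size_t cap)
{
    int fds[2];
    assert(buf != NULL && cap > 0);
    if (pipe(fds) != 0)
        return -1;
    if (source_send(mime_type, fds[1], source_text) != 0) {
        close(fds[0]);
        return -1;
    }
    ssize_t n = read(fds[0], buf, cap - 1);
    close(fds[0]);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```

In a real client the two sides live in different processes, so the receiver would read until end of file from its event loop rather than with a single blocking read.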

4.13.2. Data devices

Data devices glue data sources and offers together. A data device is associated with a wl_seat and is obtained by the clients using the wl_data_device_manager factory object, which is also responsible for creating data sources.
Clients are informed of new data offers through the wl_data_device.data_offer event. After this event is generated the data offer will advertise the available mime types. New data offers are introduced prior to their use for copy-paste or drag-and-drop.

4.13.2.1. Selection

Each data device has a selection data source. Clients create a data source object using the device manager and may set it as the current selection for a given data device. Whenever the current selection changes, the client with keyboard focus receives a wl_data_device.selection event. This event is also generated on a client immediately before it receives keyboard focus.
The data offer is introduced with wl_data_device.data_offer event before the selection event.

4.13.2.2. Drag and Drop

A drag-and-drop operation is started using the wl_data_device.start_drag request. This request causes a pointer grab that will generate enter, motion and leave events on the data device. A data source is supplied as an argument to start_drag, and data offers associated with it are supplied to client surfaces under the pointer in the wl_data_device.enter event. The data offer is introduced to the client prior to the enter event with the wl_data_device.data_offer event.
Clients are expected to provide feedback to the data sending client by calling the wl_data_offer.accept request with a mime type they accept. If none of the advertised mime types is supported by the receiving client, it should supply NULL to the accept request. The accept request causes the sending client to receive a wl_data_source.target event with the chosen mime type.
When the drag ends, the receiving client receives a wl_data_device.drop event at which it is expected to transfer the data using the wl_data_offer.receive request.