Pixel Kitty


Idara Viengxay


Aug 5, 2024, 3:34:07 AM

to wealthlatabre

The goal of this specification is to create a flexible and performant protocol that allows the program running in the terminal, hereafter called the client, to render arbitrary pixel (raster) graphics to the screen of the terminal emulator. The major design goals are:

The graphics should integrate with the text. In particular, it should be possible to draw graphics below as well as above the text, with alpha blending. The graphics should also scroll with the text automatically.

In order to know what size of images to display and how to position them, the client must be able to get the window size in pixels and the number of cells per row and column. The cell width is then the window width in pixels divided by the number of columns, and the cell height is the window height in pixels divided by the number of rows. This information can be obtained using the TIOCGWINSZ ioctl.
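A minimal Python sketch of the TIOCGWINSZ query, assuming a POSIX system whose `struct winsize` holds four unsigned shorts (rows, columns, width in pixels, height in pixels). The function names `window_size` and `cell_size` are illustrative and not part of the protocol.

```python
import fcntl
import struct
import sys
import termios


def window_size(fd=None):
    """Return (rows, cols, width_px, height_px) for the terminal on fd."""
    if fd is None:
        fd = sys.stdout.fileno()
    # struct winsize: ws_row, ws_col, ws_xpixel, ws_ypixel (unsigned shorts)
    buf = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
    return struct.unpack("HHHH", buf)


def cell_size(rows, cols, width_px, height_px):
    """Return (cell_width, cell_height) in pixels."""
    return width_px // cols, height_px // rows
```

Calling `cell_size(*window_size())` only works when stdout is a terminal; some terminals report zero for the pixel fields, in which case the division yields zero and the client needs a fallback.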

You can also use the CSI t escape code to get the screen size. Send <ESC>[14t to STDOUT and kitty will reply on STDIN with <ESC>[4;<height>;<width>t, where height and width are the window size in pixels. This escape code is supported in many terminals, not just kitty.
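The reply to the CSI 14 t query can be parsed with a short regular expression. This is only a sketch: `SIZE_QUERY` and `parse_size_reply` are hypothetical names, and reading the reply from the terminal (raw mode, timeouts) is left out.

```python
import re

SIZE_QUERY = "\x1b[14t"  # written to stdout to request the window size
_REPLY = re.compile(r"\x1b\[4;(\d+);(\d+)t")


def parse_size_reply(reply):
    """Return (height_px, width_px) from a CSI 4 ; height ; width t reply."""
    match = _REPLY.search(reply)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Note the order: the reply carries the height first and the width second.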

The control data is a comma-separated list of key=value pairs. The payload is arbitrary binary data, base64-encoded to prevent interoperation problems with legacy terminals that get confused by control codes within an APC code. The meaning of the payload is interpreted based on the control data.
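As an illustration, the control data and payload can be assembled as below. The sketch assumes the APC framing used by kitty's published protocol, `<ESC>_G<control data>;<payload><ESC>\`, which this post does not spell out; `graphics_command` is a hypothetical helper name.

```python
import base64


def graphics_command(payload=b"", **control):
    """Build one graphics escape code from key=value pairs and a raw payload."""
    keys = ",".join(f"{key}={value}" for key, value in control.items())
    return (
        b"\x1b_G"                              # APC introducer plus 'G' (assumed framing)
        + keys.encode("ascii")
        + b";"
        + base64.standard_b64encode(payload)   # payload is always base64
        + b"\x1b\\"                            # string terminator
    )
```

For instance, `graphics_command(b"abc", f=24, s=1, v=1)` wraps three raw bytes (one RGB pixel) as the base64 text `YWJj`.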

The first consideration when transferring data between the client and the terminal emulator is the format in which to do so. Since there is a vast and growing number of image formats in existence, it does not make sense to have every terminal emulator implement support for them. Instead, the client should send simple pixel data to the terminal emulator. The obvious downside to this is performance, especially when the client is running on a remote machine. Techniques for remedying this limitation are discussed later. The terminal emulator must understand pixel data in three formats: 24-bit RGB, 32-bit RGBA and PNG. This is specified using the f key in the control data. f=32 (which is the default) indicates 32-bit RGBA data, f=24 indicates 24-bit RGB data, and f=100 indicates PNG data. The PNG format is supported both for convenience and as a compact way of transmitting paletted images.

In the RGB and RGBA formats the pixel data is stored directly as 3 or 4 bytes per pixel, respectively. The colors in the data must be in the sRGB color space. When specifying images in these formats, the image dimensions must be sent in the control data.
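A small sketch of the size bookkeeping for raw pixel data follows. It assumes the width and height are sent with the `s` and `v` keys, as in kitty's published protocol (the key names do not appear in this post); `raw_control_data` is a hypothetical helper.

```python
BYTES_PER_PIXEL = {24: 3, 32: 4}  # f=24 is RGB, f=32 is RGBA


def raw_control_data(fmt, width, height, data):
    """Check the data length and return the control data for raw pixels."""
    expected = width * height * BYTES_PER_PIXEL[fmt]
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    # s = width in pixels, v = height in pixels (assumed key names)
    return f"f={fmt},s={width},v={height}"
```

A 2x2 image of opaque red RGB pixels is `bytes([255, 0, 0]) * 4`, which is 12 bytes and passes the check for `fmt=24`.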

The PNG format is specified using the f=100 key. The width and height of the image will be read from the PNG data itself. Note that if you use both PNG and compression, then you must provide the S key with the size of the PNG data.

The client can send compressed image data to the terminal emulator by specifying the o key. Currently, only RFC 1950 ZLIB based deflate compression is supported, which is specified using o=z.

With o=z the payload is compressed using deflate before it is base64-encoded. The terminal emulator will decompress it before interpreting the pixel data. You can specify compression for any format.
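The order of operations (compress first, then base64-encode) can be sketched in Python, whose `zlib.compress` produces an RFC 1950 stream. The escape code framing and the `s`/`v` size keys are assumptions taken from kitty's published protocol, and `compressed_command` is a hypothetical name.

```python
import base64
import zlib


def compressed_command(data, width, height):
    """Build an escape code carrying zlib-compressed 24-bit RGB data."""
    if len(data) != width * height * 3:
        raise ValueError("data length does not match the RGB dimensions")
    payload = base64.standard_b64encode(zlib.compress(data))  # deflate, then base64
    control = f"f=24,s={width},v={height},o=z".encode("ascii")
    return b"\x1b_G" + control + b";" + payload + b"\x1b\\"
```

For PNG data (f=100) the same compression step would apply, together with the S key mentioned above.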

A temporary file: the terminal emulator will delete the file after reading the pixel data. For security reasons, the terminal emulator should only delete the file if it is in a known temporary directory, such as /tmp, /dev/shm, the TMPDIR env var if present, and any platform-specific temporary directories, and the file has the string tty-graphics-protocol in its full file path.

A shared memory object, which on POSIX systems is a POSIX shared memory object and on Windows is a named shared memory object. The terminal emulator must read the data from the memory object and then unlink and close it on POSIX, and just close it on Windows.
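The deletion rule for temporary files described above could be checked on the terminal side roughly as follows. This is a sketch under stated assumptions: `may_delete` and `default_temp_dirs` are hypothetical names, only the POSIX directories named in the text are covered, and paths are normalized lexically rather than by resolving symlinks.

```python
import os

MARKER = "tty-graphics-protocol"


def default_temp_dirs():
    """Known temporary directories: /tmp, /dev/shm and $TMPDIR if set."""
    dirs = ["/tmp", "/dev/shm"]
    if os.environ.get("TMPDIR"):
        dirs.append(os.environ["TMPDIR"])
    return dirs


def may_delete(path, temp_dirs=None):
    """Return True only for marked files inside a known temporary directory."""
    path = os.path.abspath(path)  # also collapses ".." components
    if MARKER not in path:
        return False
    for directory in (default_temp_dirs() if temp_dirs is None else temp_dirs):
        directory = os.path.abspath(directory)
        if os.path.commonpath([path, directory]) == directory:
            return True
    return False
```

A client that wants its file cleaned up would therefore create it with the marker in the name, for example via `tempfile.NamedTemporaryFile(prefix="tty-graphics-protocol-", delete=False)`.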

When opening files, the terminal emulator must follow symlinks. In case of symlink loops or too many symlinks, it should fail and respond with an error, similar to reporting any other kind of I/O error. Since the file paths come from potentially untrusted sources, terminal emulators must refuse to read any device/socket/etc. special files. Only regular files are allowed. Additionally, terminal emulators may refuse to read files in sensitive parts of the filesystem, such as /proc, /sys, /dev/, etc.
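One way a terminal emulator might apply these checks is sketched below. The function name and the `forbidden` tuple are illustrative; a real implementation would return a specific error to the client rather than a boolean.

```python
import os
import stat


def is_readable_regular_file(path, forbidden=("/proc", "/sys", "/dev")):
    """Follow symlinks and accept only regular files outside forbidden trees."""
    try:
        status = os.stat(path)  # follows symlinks; fails on loops and missing files
    except OSError:
        return False
    if not stat.S_ISREG(status.st_mode):
        return False  # device, socket, FIFO, directory, ...
    real = os.path.realpath(path)
    return not any(real == d or real.startswith(d + "/") for d in forbidden)
```

The default `forbidden` tuple also covers /dev/shm, so a terminal that accepts temporary files there would need to exempt that directory.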

Remote clients, those that are unable to use the filesystem/shared memory to transmit data, must send the pixel data directly using escape codes. Since escape codes are of limited maximum length, the data will need to be chunked up for transfer. This is done using the m key. The pixel data must first be base64-encoded, then split into chunks no larger than 4096 bytes. All chunks, except the last, must have a size that is a multiple of 4. The client then sends the graphics escape code as usual, with the addition of an m key that must have the value 1 for all but the last chunk, where it must be 0. For example, data split into three chunks is sent as three escape codes, the first two with m=1 and the last with m=0.
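The chunking rules can be sketched as follows, assuming the `<ESC>_G...<ESC>\` framing from kitty's published protocol. `chunked_commands` is a hypothetical helper, and `control` stands for whatever keys the first escape code needs.

```python
import base64


def chunked_commands(data, control, chunk_size=4096):
    """Split base64-encoded data into escape codes carrying the m key."""
    if chunk_size % 4:
        raise ValueError("chunk_size must be a multiple of 4")
    encoded = base64.standard_b64encode(data)
    chunks = [encoded[i:i + chunk_size]
              for i in range(0, len(encoded), chunk_size)] or [b""]
    commands = []
    for index, chunk in enumerate(chunks):
        more = 0 if index == len(chunks) - 1 else 1   # m=0 only on the last chunk
        # Only the first escape code carries the full control data.
        keys = f"{control},m={more}" if index == 0 else f"m={more}"
        commands.append(b"\x1b_G" + keys.encode("ascii") + b";" + chunk + b"\x1b\\")
    return commands
```

With a 48x48 RGB image (6912 bytes, 9216 base64 characters) this yields three escape codes whose key sets are `f=24,s=48,v=48,m=1`, then `m=1`, then `m=0`.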

Note that only the first escape code needs to have the full set of control codes such as width, height, format, etc. Subsequent chunks must have only the m and optionally q keys. When sending animation frame data, subsequent chunks must also specify the a=f key. The client must finish sending all chunks for a single image before sending any other graphics-related escape codes. Note that the cursor position used to display the image must be the position when the final chunk is received. Finally, terminals must not display anything until the entire sequence is received and validated.

Since a client has no a priori knowledge of whether it shares a filesystem/shared memory with the terminal emulator, it can send an id with the control data, using the i key (which can be an arbitrary positive integer up to 4294967295; it must not be zero). If it does so, the terminal emulator will reply after trying to load the image, saying whether loading was successful or not.

In the reply, the i value will be the same as was sent by the client in the original request. The message data will be an ASCII-encoded string containing only printable characters and spaces. The string will be OK if reading the pixel data succeeded, or an error message otherwise.
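A reply of this shape can be split into its keys and message as sketched below. The framing `<ESC>_G<keys>;<message><ESC>\` is assumed from kitty's published protocol, and `parse_response` is a hypothetical name.

```python
def parse_response(raw):
    """Split a graphics reply into (keys, message)."""
    prefix, suffix = b"\x1b_G", b"\x1b\\"
    if not (raw.startswith(prefix) and raw.endswith(suffix)):
        raise ValueError("not a graphics response")
    body = raw[len(prefix):-len(suffix)].decode("ascii")
    control, _, message = body.partition(";")
    keys = dict(item.split("=", 1) for item in control.split(",") if item)
    return keys, message


def load_succeeded(raw, image_id):
    """True if the reply is for image_id and its message is exactly OK."""
    keys, message = parse_response(raw)
    return keys.get("i") == str(image_id) and message == "OK"
```

Anything other than `OK` in the message is treated as an error string to show or log.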

Sometimes, using an id is not appropriate, for example, if you do not want to replace a previously sent image with the same id, or if you are sending a dummy image and do not want it stored by the terminal emulator. In that case, you can use the query action, set a=q. Then the terminal emulator will try to load the image and respond with either OK or an error, as above, but it will not replace an existing image with the same id, nor will it store the image.

As of May 2023, kitty has a complete implementation of this protocol and WezTerm has a mostly complete implementation. Konsole and wayst have partial support. We intend that any terminal emulator that wishes to support it can do so. To check if a terminal emulator supports the graphics protocol, the best way is to send the above query action followed by a request for the primary device attributes. If you get back an answer for the device attributes without getting back an answer for the query action, the terminal emulator does not support the graphics protocol.

This means that terminal emulators that support the graphics protocol must reply to query actions immediately, without processing other input. Most terminal emulators handle input in a FIFO manner anyway.

If you get back a response to the graphics query, the terminal emulator supports the protocol. If you get back a response to the device attributes query without a response to the graphics query, it does not.
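The detection procedure can be sketched as a probe string plus a check on the bytes read back. The query below sends a dummy 1x1 RGB image with id 31; the `s`/`v` keys and the escape code framing are assumptions from kitty's published protocol, `<ESC>[c` is the standard primary device attributes request, and reading the reply from the terminal is left to the caller.

```python
QUERY = b"\x1b_Gi=31,s=1,v=1,a=q,f=24;AAAA\x1b\\"  # AAAA is one black RGB pixel
DEVICE_ATTRIBUTES = b"\x1b[c"
PROBE = QUERY + DEVICE_ATTRIBUTES  # write this to the terminal in one go


def supports_graphics(reply, image_id=31):
    """Check the bytes read up to the end of the device attributes reply."""
    marker = b"\x1b_Gi=%d;" % image_id
    return marker in reply
```

The caller reads until the device attributes reply arrives, so the check terminates even on terminals that silently ignore the graphics query.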

Every transmitted image can be displayed an arbitrary number of times on the screen, in different locations, using different parts of the source image, as needed. Each such display of an image is called a placement. You can either simultaneously transmit and display an image using the action a=T, or first transmit the image with an id, such as i=10, and then display it with a=p,i=10, which will display the previously transmitted image at the current cursor position. When specifying an image id, the terminal emulator will reply to the placement request with an acknowledgement code, which will be either OK when the image with the specified id was found, or an error message when the image with the specified id was not found. This is similar to the scheme described above for querying available transmission media, except that here we are querying if the image with the specified id is available or needs to be re-transmitted.

Since there can be many placements per image, you can also give placements an id. To do so, add the p key with a number between 1 and 4294967295. When you specify a placement id, it will be added to the acknowledgement code above. Every placement is uniquely identified by the pair of the image id and the placement id. If you specify a placement id for an image that does not have an id (i.e. has id=0), it will be ignored. In particular, this means there can exist multiple images with image id=0, placement id=0. Not specifying a placement id or using p=0 for multiple put commands (a=p) with the same non-zero image id results in multiple placements of the image.
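A put command with an optional placement id might be built like this. The sketch assumes the `<ESC>_G...<ESC>\` framing from kitty's published protocol and that the `;` separator can be dropped when there is no payload; `put_command` is a hypothetical name.

```python
MAX_ID = 4294967295


def put_command(image_id, placement_id=0):
    """Build an a=p escape code for a previously transmitted image."""
    if not 1 <= image_id <= MAX_ID:
        raise ValueError("image id must be between 1 and 4294967295")
    if not 0 <= placement_id <= MAX_ID:
        raise ValueError("placement id must be between 0 and 4294967295")
    keys = f"a=p,i={image_id}"
    if placement_id:
        keys += f",p={placement_id}"  # p=0 behaves like no placement id
    return b"\x1b_G" + keys.encode("ascii") + b"\x1b\\"
```

Sending `put_command(10)` twice creates two placements of image 10, whereas `put_command(10, 7)` addresses the single placement identified by the pair (10, 7).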

The image is rendered at the current cursor position, from the upper left corner of the current cell. You can also specify extra pixel offsets, for example X=3 and Y=4, to display from a different origin within the cell. Note that the offsets must be smaller than the size of the cell.

By default, the entire image will be displayed (images wider than the available width will be truncated on the right edge). You can choose a source rectangle (in pixels) as the part of the image to display. This is done with the keys x, y, w, h, which specify the top-left corner, width and height of the source rectangle. The displayed area is the intersection of the specified rectangle with the source image rectangle.
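The intersection rule and the offset constraint can be expressed as two small functions. This is an illustrative sketch: the function names are hypothetical, and treating a missing `w` or `h` as "up to the image edge" is an assumption made here for convenience.

```python
def source_rect(img_w, img_h, x=0, y=0, w=None, h=None):
    """Intersect the requested rectangle with the image; return (x, y, w, h)."""
    left = min(max(x, 0), img_w)
    top = min(max(y, 0), img_h)
    right = img_w if w is None else min(img_w, x + w)
    bottom = img_h if h is None else min(img_h, y + h)
    return left, top, max(0, right - left), max(0, bottom - top)


def offsets_valid(x_off, y_off, cell_w, cell_h):
    """The X and Y pixel offsets must be smaller than the cell size."""
    return 0 <= x_off < cell_w and 0 <= y_off < cell_h
```

For a 100x50 image, requesting x=80, y=10, w=40, h=100 displays only the 20x40 region that actually lies inside the image.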