Max rect size change for tight


RG

Jun 15, 2023, 10:17:23 AM
to TurboVNC Developer Discussion
Hi,

I have been trying to improve the performance of TurboVNC + noVNC in a loopback environment. I have 1000x1000 image updates at 30 fps and found that Tight compression split each update into ~15 rectangles of 1000x64. This in turn makes noVNC take time to read and write each image, and the sheer number of image allocations causes garbage collector issues.

I changed maxRectSize from 65,536 to 1,048,576 in tightConf (in tight.c), which sends the full image to noVNC as a single rectangle and improves both the timing and the garbage collector issue.
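
For reference, the change looks roughly like this (a paraphrased sketch; the actual TIGHT_CONF struct in tight.c has more fields, and I am only showing the two that matter here):

    /* Sketch of the tweak, not the actual tight.c code. */
    typedef struct {
      int maxRectSize;   /* max pixels per Tight rectangle */
      int maxRectWidth;  /* max width of a Tight rectangle */
    } TIGHT_CONF_SKETCH;

    /* Before: 65536 pixels => a 1000x1000 update is split into ~15
       strips of 1000x64.  After: 1048576 pixels => the update is sent
       to noVNC as a single rectangle. */
    static const TIGHT_CONF_SKETCH tightConfSketch = { 1048576, 2048 };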

I was wondering whether I am playing with fire and risking unintended effects. Is there another solution to force TurboVNC to send bigger chunks?

Regards,
Rémi

DRC

Jun 15, 2023, 11:34:08 AM
to turbovn...@googlegroups.com

The Tight encoding specification requires rectangles to be <= 2048 pixels in width, but there isn't any documented limit on the rectangle size.  I don't think that you're playing with fire necessarily, although the Tight encoder has never been tested with rectangles > 64k, so I can't guarantee that there aren't hidden bugs.  However, I question whether Tight encoding is the most appropriate way to transfer pixels in a loopback environment.  It seems like you might be better served by transferring the pixels using Raw encoding, which would require more bus bandwidth but less CPU time.

Referring to https://turbovnc.org/pmwiki/uploads/About/tighttoturbo.pdf, a big reason why TurboVNC itself doesn't use larger rectangles is that there's a tradeoff in terms of encoding efficiency.  The larger the rectangle, the more likely it is that the number of unique colors in the rectangle will exceed the Tight palette threshold.  Thus, as the rectangle size increases, you will reach a point at which only JPEG is used to encode the non-solid subrectangles.  At that point, there isn't much benefit to the complexity of Tight encoding, and you'd probably get better performance by simply encoding every rectangle as a pure JPEG image.  I am not sure whether 1 megapixel is beyond that point, but given that the palette threshold is low (24 or 96 colors, depending on the Tight compression level), it wouldn't surprise me if 1-megapixel rectangles are almost always encoded as JPEG.  In that case, the only real benefit you'd get from Tight encoding is a slight reduction in the bitstream size if there are huge areas of solid color, since the Tight encoder can encode those as a bounding box and fill color (whereas JPEG has a not-insignificant amount of overhead, both in terms of compute time and bitstream size, when encoding a single-color image.)  However, I don't know whether that benefit is worth the additional computational overhead of analyzing the rectangle, nor whether it is worth the additional bitstream size overhead of dividing non-solid areas of the rectangle into multiple JPEG subrectangles (as opposed to sending the whole rectangle as a single JPEG image.)
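
In rough pseudocode, the per-rectangle decision I am describing looks something like this (an illustrative sketch, not the actual tight.c logic):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Count unique pixel values, bailing out as soon as the count
       exceeds the palette threshold (24 or 96 colors, depending on the
       Tight compression level, so threshold is assumed to be <= 96.) */
    static bool fitsPalette(const uint32_t *pixels, size_t n, int threshold)
    {
      uint32_t palette[96];
      int nColors = 0;

      for (size_t i = 0; i < n; i++) {
        int j;
        for (j = 0; j < nColors; j++)
          if (palette[j] == pixels[i]) break;
        if (j == nColors) {
          if (nColors == threshold) return false;  /* too many colors */
          palette[nColors++] = pixels[i];
        }
      }
      return true;
    }

    /* The larger the rectangle (n), the more likely fitsPalette()
       returns false, in which case a non-solid subrectangle is encoded
       as JPEG rather than with indexed color. */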

That reasoning also explains why I wouldn't be willing to add a Tight compression level with a rectangle size of 1048576.  I would, however, be willing to support the pure RFB JPEG encoding type in TurboVNC, if it proves to be of any benefit.  That encoding type is dead simple and would involve merely passing every RFB rectangle directly to libjpeg-turbo.
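
In TurboJPEG terms, "dead simple" means something like the following sketch (assuming 32-bit RGBX pixels; error handling, the RFB framing, and handle reuse are omitted):

    #include <turbojpeg.h>

    /* Compress one RFB rectangle directly to a JPEG image.  The caller
       frees the returned buffer with tjFree().  A sketch, not the
       actual TurboVNC code. */
    unsigned char *encodeRectJPEG(const unsigned char *fb, int fbWidth,
                                  int x, int y, int w, int h,
                                  int quality, unsigned long *jpegSize)
    {
      tjhandle tj = tjInitCompress();
      unsigned char *jpegBuf = NULL;  /* tjCompress2() allocates it */
      const unsigned char *src = fb + (y * fbWidth + x) * 4;

      if (tj == NULL) return NULL;
      if (tjCompress2(tj, src, w, fbWidth * 4 /* pitch in bytes */, h,
                      TJPF_RGBX, &jpegBuf, jpegSize, TJSAMP_420, quality,
                      0) < 0)
        jpegBuf = NULL;
      tjDestroy(tj);
      return jpegBuf;
    }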

DRC


RG

Jul 7, 2023, 6:26:17 AM
to TurboVNC Developer Discussion
Thanks for this in-depth answer.

I have been playing with Raw encoding and found that its performance is similar to that of a single Tight rectangle + JPEG quality 80, but I see more latency variation from the network/communication (likely because my system also has other activity on localhost and there is a lot more data to transfer).

I am trying to implement pure JPEG, since, based on your explanation, I think it will be a cleaner solution than increasing the max rect size.

Also, RealVNC already seems to define a pure JPEG rectangle type (encoding 21), in case you ever decide to implement it.

Rémi

DRC

Jul 10, 2023, 1:45:14 PM
to turbovn...@googlegroups.com

I am not opposed to implementing pure JPEG encoding (especially if I can get funded development for it), but I probably wouldn't expose it in the TurboVNC Viewer GUI (yet), since it has a somewhat esoteric use case.  The implementation would be relatively simple.  I would just feed the rectangle directly into libjpeg-turbo using the same JPEG quality and subsampling settings that are currently used by the Tight encoder.  The pure JPEG encoding type specifies that the encoder can send motion-JPEG frames without Huffman or quantization tables, which would avoid some of the JPEG header overhead with smaller rectangles and might even eliminate some of the advantage of the hybrid Tight encoding methods.  However, the TurboJPEG API would need to be extended to support the creation of motion-JPEG frames.  Thus, in order to implement the feature the right way, I would ideally want to secure funding in order to extend the TurboJPEG API and do a cursory low-level study regarding the performance advantages of motion-JPEG frames.

As an aside, I am also interested in replacing the existing Lossless Tight encoding methods with TightPNG, but that would require extensive low-level research that I don't have the funding or time to conduct right now.  It would be great to completely revisit the encoder design from the ground up, obtaining completely new test datasets using modern applications.  (The 2D datasets in particular use obsolete applications, window managers, and X11 rendering paradigms.  Modern X11 applications are much more likely to use image-based rendering, which would likely tilt the performance scales more in favor of JPEG than indexed color encoding.)  However, I don't predict that anyone cares enough about that to pay for the labor required to do it.  Supporting pure JPEG encoding would be a lot more realistic in terms of funding, since that is probably more like a 15-25 hour project than a hundreds-of-hours project.

DRC

DRC

Aug 16, 2023, 5:11:16 PM
to turbovn...@googlegroups.com

I did some low-level experiments with the TurboVNC Benchmark Tools (https://github.com/TurboVNC/vncbenchtools), comparing the existing TurboVNC encoder, accelerated with the Intel zlib implementation and using the "Tight + Perceptually Lossless JPEG" and "Tight + Medium-Quality JPEG" presets, against pure JPEG encoding with the same JPEG quality and subsampling levels.  The results were interesting.


Perceptually Lossless JPEG:

As expected based on prior research (https://turbovnc.org/pmwiki/uploads/About/tighttoturbo.pdf), pure JPEG with no other modifications produced worse (often much worse) compression with all datasets except Lightscape, GLXSpheres, and Quake 3, which compressed about 4-5% better.  (Catia, Teamcenter Vis, Unigraphics NX, and Google Earth regressed by less than 10%.)  Pure JPEG with no other modifications also produced worse (often much worse) performance with all datasets except Pro/E (+8%) and Quake 3 (+4%).  (3D Studio Max, Ensight, Lightscape, and Google Earth regressed by less than 10%.)

However, modifying the TurboJPEG API library so that it generates "abbreviated image datastreams", i.e. JPEG images with no embedded tables (the equivalent of motion-JPEG frames), improved the performance of pure JPEG somewhat.  Now Lightscape (+5%), GLXSpheres (+6%), Google Earth (+9%), and Quake 3 (+16%) compressed better with pure JPEG than with the TurboVNC encoder.  (photos-24, kde-hearts-24, Catia, Teamcenter Vis, and Unigraphics NX regressed by less than 10%.)  As above, only Pro/E (+8%) and Quake 3 (+5%) were faster with pure JPEG.  (3D Studio Max, Ensight, Lightscape, and Google Earth regressed by less than 10%.)
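
For the curious, the underlying libjpeg API already supports abbreviated datastreams, so the TurboJPEG modification essentially boils down to calls like these (a sketch of the relevant API usage, not the actual patch; it assumes cinfo already has its image parameters and a destination manager set up, and a real implementation would direct each datastream to its own buffer):

    #include <stdio.h>
    #include <jpeglib.h>

    /* One-time: emit a tables-only datastream (Huffman + quantization
       tables and nothing else). */
    void writeTables(struct jpeg_compress_struct *cinfo)
    {
      jpeg_write_tables(cinfo);
    }

    /* Per frame: emit an abbreviated image datastream (the equivalent
       of a motion-JPEG frame) with no embedded tables. */
    void writeFrame(struct jpeg_compress_struct *cinfo, JSAMPARRAY rows,
                    JDIMENSION height)
    {
      jpeg_suppress_tables(cinfo, TRUE);
      jpeg_start_compress(cinfo, FALSE);  /* FALSE = don't re-emit tables */
      while (cinfo->next_scanline < height)
        jpeg_write_scanlines(cinfo, &rows[cinfo->next_scanline], 1);
      jpeg_finish_compress(cinfo);
    }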

I then repeated the same tests with the Interframe Comparison Engine enabled.  Interframe comparison significantly improves compression with the 2D datasets, with mixed results for the 3D datasets, but the relative differences between the TurboVNC encoder and pure JPEG were pretty similar with interframe comparison enabled vs. disabled.  With interframe comparison enabled, Pro/E now compressed better with pure JPEG than with the TurboVNC encoder, and Unigraphics NX compressed about the same.  However, all of the other datasets were generally in the same relative ranges as above (give or take.)


Medium-Quality JPEG:

When you reduce the JPEG quality, the size of JPEG rectangles and subrectangles decreases, but the size of indexed-color rectangles stays the same.  Thus, pure JPEG was more advantageous with medium-quality JPEG than it was with perceptually lossless JPEG.  In this case, I tested only with interframe comparison enabled (since the "Tight + Medium-Quality JPEG" preset always enables it) and only with abbreviated image datastreams.

kde-hearts-16 (+4%), photos-24 (+3%), kde-hearts-24 (+11%), Lightscape (+5%), Pro/E (+6%), Unigraphics NX (+4%), GLXSpheres (+12%), Google Earth (+24%), and Quake 3 (+39%) compressed better with pure JPEG than with the TurboVNC Encoder.  (Catia and Teamcenter Vis regressed by less than 10%.)  3D Studio Max (+13%), Ensight (+38%), and Pro/E (+13%) were faster with pure JPEG.  (Lightscape, SolidWorks, Google Earth, and Quake 3 regressed by less than 10%.)


AVX2 Instructions:

The initial tests were conducted on an older machine that lacks AVX2 instructions, so I re-ran the tests on a newer machine.  This gave pure JPEG more of a performance advantage, since the Intel zlib implementation cannot use AVX2 instructions but libjpeg-turbo can.

With perceptually lossless JPEG:  Ensight (+3%), Pro/E (+11%), and Quake 3 (+7%) were faster with pure JPEG than with the TurboVNC encoder, and 3D Studio Max was about the same.  (Lightscape and Google Earth regressed by less than 10%.)

With medium-quality JPEG:  3D Studio Max (+26%), Ensight (+57%), Pro/E (+18%), SolidWorks (+8%), and Quake 3 (+3%) were faster with pure JPEG, and Maya was about the same.  (kde-hearts-24, Catia, Lightscape, and Google Earth regressed by less than 10%.)


Caveat:

The datasets in question were captured in the early 2000s (the 3D datasets in 2008 and the 2D datasets years earlier), so many of them represent outdated workloads.  (Most modern X11 applications use some form of image-based rendering rather than X11 primitives.)  Also, because of limitations in the benchmark tools (which were inherited from TightVNC), the datasets had to be generated using a very old VNC server (TightVNC 1.3.9) and viewer (RealVNC 3.3.6) and an RFB proxy that sat between the two.  That infrastructure was slow and effectively dropped a lot of frames, so the session captures and benchmark tools are not the best simulation of the TurboVNC Server.  It would not surprise me if pure JPEG performs better in real-world usage than is reflected above.


General Conclusions:

Unsurprisingly, the TurboVNC encoder is the most advantageous, relative to pure JPEG, on older (X11-primitive-based) workloads, workloads with fewer unique colors, and workloads with large areas of solid color.  Pure JPEG is the most advantageous on image-based workloads, workloads with more unique colors, and workloads with few areas of solid color.  Pure JPEG also has more of an advantage when the JPEG quality is decreased and when AVX2 instructions are available.

It seems as if pure JPEG encoding is advantageous enough in enough cases to justify its existence.  I will look at including it in the next major release of TurboVNC, along with GUI modifications (https://github.com/TurboVNC/turbovnc/issues/70, as well as exposing the CompatGUI parameter in the GUI) that will make it more straightforward to enable non-Tight encodings.

I suspect that, if I were to completely revisit my analysis from 2008 and develop entirely new datasets, I would find little justification for indexed color subencoding with modern applications.  That would mean that most of the advantage of the TurboVNC encoder these days comes from its ability to send large areas of solid color using only a few bytes.

Both X11 and RFB were designed around the limitations of 1980s systems (including the need to support single-buffered graphics systems.)  Wayland jettisons the X11 legacy, but there is also a burning need for a more modern open source/open standard remote display protocol that is not beholden to the RFB legacy, preferably a protocol that is a better fit for image-based workloads, Wayland, GPU-resident framebuffers, and modern video codecs.  See https://www.reddit.com/r/linux_gaming/comments/yvjqby/comment/jvricah/?utm_source=reddit&utm_medium=web2x&context=3, https://github.com/TurboVNC/turbovnc/issues/18, https://github.com/TurboVNC/turbovnc/issues/19, and https://github.com/TurboVNC/turbovnc/issues/373 for more of my musings on that topic.

Do I think that anyone will ever fund that kind of blue-sky research in an open source project such as this?  Probably not.  TurboVNC is innovative compared to other VNC solutions, and maybe compared to most (but not all) open source remote display solutions, but there are proprietary solutions these days that do a lot of things that VNC will never be able to do.  (Let's start with streaming over UDP, which the RFB protocol could never support.)  People mostly use TurboVNC because it's free and good enough, so I don't foresee being able to do much more with the protocol other than minor tweaks like this that allow it to get out of the way of certain use cases.

Attachment: turbovnc_purejpeg.ods

DRC

Aug 17, 2023, 1:27:22 PM
to turbovn...@googlegroups.com

To clarify, and to tie the conclusions from my previous message to the comment I made earlier about rectangle sizes:

One reason why the 2D datasets, which mostly represent legacy (primitive-based and single-buffered, as opposed to image-based and double-buffered) X11 rendering workloads, perform best with the TurboVNC encoder is that they have relatively small framebuffer updates.  To put numbers on this, here are the average framebuffer update rectangle sizes (in pixels) of the various datasets:

slashdot-24:  3467
photos-24:  2962
kde-hearts-24:  1854

3dsmax-04-24:  998105
catia-02-24:  1005257
ensight-03-24:  837883
light-08-24:  681317
maya-02-24:  985356
proe-04-24:  843144
sw-01-24:  915438
tcvis-01-24:  786859
ugnx-01-24:  793831
glxspheres-24:  478242
googleearth-24:  8898
q3demo-24:  14875

The smaller the rectangle, the greater the chance that it will have a low enough number of unique colors to qualify for indexed color subencoding.  More modern image-based workloads usually double buffer, so the rendering occurs off-screen, and the entire back buffer is swapped to the X11 display in one throw.  Such workloads generally have large framebuffer update rectangles.

That type of workload is very similar to what VirtualGL does, so the 3D datasets (which consist of OpenGL applications and Viewperf datasets running with VirtualGL) are more reflective of modern X11 applications.  However, some of those Viewperf datasets simulate wireframe modes in certain CAD applications, so they have relatively low numbers of unique colors and still benefit from the TurboVNC encoder (relative to pure JPEG or modern video codecs such as H.264.)  Other Viewperf datasets have large areas of solid color and also benefit from the TurboVNC encoder.

Applications such as games or Google Earth that fill the whole screen, render a large number of unique colors, and render few areas of solid color are the best candidates for pure JPEG or video codecs.  The small rectangle size in the Google Earth and Quake 3 datasets is likely a result of the aforementioned ancient session capture infrastructure.  However, even with those small rectangles, both datasets generally benefited from pure JPEG encoding because of their high color counts.  On the flip side, some of the 3D datasets (Catia and Teamcenter Vis, for instance) never benefited from pure JPEG, despite having large rectangle sizes, because of their low color counts.

It is also worth mentioning that JPEG is designed to compress continuous-tone images, so it does a relatively poor job of compressing sharp features, such as those generated by wireframe modes in CAD applications.  Wireframe modes were once more common, because they provided a way to smoothly interact with models that couldn't otherwise be rendered in real time by the slow 3D accelerators available at the time.  (The first 3D accelerators I worked with in the mid 1990s, based on the 3Dlabs GLINT chip, could render about 300k polys/sec.)  Those modes are less common these days, but they still exist.

tl;dr: The TurboVNC encoder is a compromise that maximizes performance across all of those application categories as best it can, but there are specific application categories for which a more video-like encoder is a better solution.  The 2D datasets are the same datasets that Constantin used when designing the TightVNC encoder, so one of the goals of the TurboVNC encoder overhaul in 2008 was to provide similar compression ratios on those datasets relative to TightVNC 1.3.x (to convince TightVNC users that they could switch to TurboVNC without losing any performance on low-bandwidth networks) while providing optimal compression ratios and performance for 3D applications running with VirtualGL.

DRC


DRC

Jul 31, 2024, 9:52:42 PM
to TurboVNC Developer Discussion
Popping the stack on this.  It occurred to me that I could do something kind of sneaky to enable this, which is to use Compress Level 0 (which is currently unused when JPEG is enabled) to implement a mode in which the TurboVNC encoder only uses solid and JPEG subencoding (i.e. none of the subencodings that require zlib) and sets the max. rectangle size to 1048576.  In my testing, that approach tends to have more even across-the-board performance than pure JPEG.  The average performance is much better than pure JPEG with or without tables, and the average compression ratio is roughly the same as JPEG with tables.  I need to do some more testing with it, but the main advantage of it would be that it wouldn't require any upstream modifications to libjpeg-turbo.  It would sort of occupy the same niche as "Lossless Tight", which was a mode we came up with because some people were using gigabit networks with extremely slow (1990s-era) CPUs that couldn't decompress JPEG quickly enough to keep up with the network.
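
Conceptually, the "sneaky" mode boils down to a per-subrectangle decision like this (an illustrative sketch; the real logic is spread throughout tight.c, and the names below are made up):

    /* Sketch: Compress Level 0 + JPEG as a "solid + JPEG only" mode. */
    typedef enum { SUBENC_SOLID, SUBENC_INDEXED, SUBENC_JPEG } SubEnc;

    static SubEnc chooseSubEnc(int compressLevel, int jpegEnabled,
                               int isSolid, int fitsPalette)
    {
      if (isSolid)
        return SUBENC_SOLID;   /* bounding box + fill color, a few bytes */
      if (compressLevel == 0 && jpegEnabled)
        return SUBENC_JPEG;    /* skip all zlib-based subencodings */
      return fitsPalette ? SUBENC_INDEXED : SUBENC_JPEG;
    }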

DRC

DRC

Aug 2, 2024, 3:52:31 PM
to turbovn...@googlegroups.com

So, at this point, we have several potential solutions:

1) The easiest solution is simply to allow the Tight maximum rectangle size to be changed (perhaps via an environment variable.)

2) The next easiest solution is to disable all subencodings that use zlib (perhaps when the compression level is 0.)  This could be combined with Solution 1.

3) The hardest solution is to implement pure JPEG encoding.  Part of the challenge here is the fact that the JPEG RFB encoding doesn't define a header.  Per the spec, you have to keep reading until you find a JPEG EOI (end-of-image) marker, which is not only hard to implement but also represents a mild DoS risk.  (I say "mild" because the attack would have to be implemented in the server, and the risk would be limited to freezing any connected viewers.)  Also, to take full advantage of this feature, the TurboJPEG API would need to be extended to handle image-only datastreams.

If (1) is the minimum necessary to solve the problem from your point of view, then let's just do that.  I personally can't find any compelling overall performance advantage to (2) or (3), which makes me reluctant to spend any time implementing one of those solutions in an official capacity.  But I'm OK with an environment variable that remains undocumented until/unless it proves generally useful.


RG

Aug 30, 2024, 6:20:02 AM
to TurboVNC Developer Discussion
I think Solution 1 would be the best. I have played with the parameter and found that a 1000x1000 rectangle (my full buffer update size) didn't give the best performance; 1000x500 did, because we gain a bit of a performance boost from multi-threading without breaking the noVNC GC.

DRC

Sep 18, 2024, 10:11:43 AM
to turbovn...@googlegroups.com

I just pushed a commit to both main (3.1.x Stable) and dev (3.2 Evolving) that allows you to change the maximum Tight subrectangle size via an undocumented environment variable (TVNC_MAXTIGHTRECTSIZE).  Please let me know if that doesn't work for some reason.
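
Conceptually, the override amounts to nothing more than this sketch (illustrative names, not the actual commit):

    #include <stdlib.h>

    /* Override the Tight maximum rectangle size (in pixels) from the
       environment, falling back to the built-in default. */
    static int maxTightRectSize(int defaultSize)
    {
      const char *env = getenv("TVNC_MAXTIGHTRECTSIZE");
      if (env != NULL) {
        int size = atoi(env);
        if (size > 0)
          return size;  /* ignore unparsable or non-positive values */
      }
      return defaultSize;
    }

For example, setting TVNC_MAXTIGHTRECTSIZE=500000 in the TurboVNC Server's environment before starting the session should cap Tight rectangles at 500,000 pixels (approximately the 1000x500 size that worked best for you.)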

DRC
