Re: [Tigervnc-devel] H.264 Encoding in VNC


juergen...@gmail.com

Aug 4, 2014, 8:39:12 PM
to tigervn...@googlegroups.com
Hi
I found your results a little strange, so I tried to recreate them.
I somehow couldn't get your numbers (I get strange ones too), so I may have to look into it further. But I have to admit I'm not an H.264 expert.
Three strange things I found:

In slashdot-24.rfb you are creating around 19,000 frames, one frame for every update, but your code doesn't seem to set i_keyint_max, so it defaults to 250 if I remember correctly. So you are basically forcing H.264 to make a full screen update every 250th frame, even if H.264 detects no changes in the image. (Setting i_keyint_max to 20000 reduces the keyframes from 80 to 4 and the size from 21.6 MB at CQP 23 to 11 MB (hextile is at 9 MB? strange anyway); kde-hearts-24 is at 2.2 MB and photos at 900 KB at the same settings.)
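For reference, this is roughly what the setup looks like against the x264 C API (a minimal sketch; fb_width/fb_height and the preset are placeholders):

    #include <x264.h>

    /* Sketch: open an encoder with the settings discussed above.  The
       default i_keyint_max of 250 forces a keyframe every 250 frames,
       even if nothing on the screen has changed. */
    static x264_t *open_encoder(int fb_width, int fb_height)
    {
        x264_param_t param;
        x264_param_default_preset(&param, "medium", NULL);
        param.i_csp = X264_CSP_I420;
        param.i_width = fb_width;
        param.i_height = fb_height;
        param.i_keyint_max = 20000;          /* bigger than any test case's frame count */
        param.rc.i_rc_method = X264_RC_CQP;  /* constant quantizer */
        param.rc.i_qp_constant = 23;         /* the CQP 23 setting above */
        return x264_encoder_open(&param);
    }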

The output of the -o flag looks odd, but I can't verify it, because I haven't found a way to replay the FBS-dumped files from rfbsessions.tar.bz2.

"x264 provides several methods for controlling the size of the encoded stream, including specifying a constant frame rate and a constant bit rate."
Are you sure X264_RC_CQP is constant quality? And what is a constant frame rate?

DRC

Aug 4, 2014, 9:39:40 PM
to tigervn...@googlegroups.com
On 8/4/14 7:39 PM, juergen...@gmail.com wrote:
> Hi
> I found your results a little strange, so I tried to recreate them.
> I somehow couldn't get your numbers (I get strange ones too), so I may have to look into it further. But I have to admit I'm not an H.264 expert.
> Three strange things I found:
>
> In slashdot-24.rfb you are creating around 19,000 frames, one frame for every update, but your code doesn't seem to set i_keyint_max, so it defaults to 250 if I remember correctly. So you are basically forcing H.264 to make a full screen update every 250th frame, even if H.264 detects no changes in the image. (Setting i_keyint_max to 20000 reduces the keyframes from 80 to 4 and the size from 21.6 MB at CQP 23 to 11 MB (hextile is at 9 MB? strange anyway); kde-hearts-24 is at 2.2 MB and photos at 900 KB at the same settings.)

OK, that's good to know. I wasn't aware of the i_keyint_max setting,
but at the same time, setting it to such a high value is probably not
the best course of action for 3D apps.

The 2D datasets were created by Constantin Kaplinsky, the author of
TightVNC, quite a few years ago. He used them when designing the
encoding methods in TightVNC. I created the 3D datasets in 2008 when I
designed the encoding methods for TurboVNC. I tested and continue to
test Constantin's 2D datasets as well, primarily as a way of validating
that TurboVNC (and, by extension, TigerVNC) is capable of compressing
the output of those types of applications as well as TightVNC used to.
Whether the datasets are a relevant simulation of real-world application
workloads depends on the app, but I do know that (a) there are some
real-world app workloads that cause a lot of very small updates to be
sent (I have seen one oil & gas app, for instance, that draws 100,000
individual points on a grid without double buffering. Ugh.) and (b)
such app workloads are a poor fit for H.264 in general.


> The output of the -o flag looks odd, but I can't verify it, because I haven't found a way to replay the FBS-dumped files from rfbsessions.tar.bz2.

If you are generating a tight-encoded stream, then the easiest way to
play it back is to use the -benchmark option in the TurboVNC 1.2.x Java
viewer. For the googleearth and q3demo datasets, you will also have to
pass "-rgb" to compare-encodings when generating the capture file, in
order to flip the blue and red channels. If you are generating an H.264
stream, then you can play back the output in VLC or any other viewer
that supports FLV video files.


> "x264 provides several methods for controlling the size of the encoded stream, including specifying a constant frame rate and a constant bit rate."
> Are you sure X264_RC_CQP is constant quality? And what is a constant frame rate?

QP is the "quantization parameter", so yes, it has the same relation to
quality as the quality setting in JPEG encoders. Does it mean constant
perceptual quality or constant perceptual loss? No, because if you keep
the amount of quantization constant, a more complex image will incur
more perceptual loss than a less complex image. But the point is--
constant QP is similar enough to constant JPEG quality in its behavior
that I feel that it is a valid comparison.

juergen...@gmail.com

Aug 17, 2014, 9:00:12 AM
to tigervn...@googlegroups.com
>OK, that's good to know. I wasn't aware of the i_keyint_max setting,
>but at the same time, setting it to such a high value is probably not
>the best course of action for 3D apps.

This is just the upper bound after which a keyframe is forced, and it shouldn't affect 2D or 3D workloads at all; it just hurts your compression ratio. A bounded keyframe interval is good for "forward error correction" and for serving one stream in all situations.
The value should be infinity here, because you are already dealing with packet loss and you only have one stream per client.
The reason I picked 20000 is that it is bigger than the frame count of all the test cases, and I haven't found out how to set it to infinity, which should be possible, because at http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-ugly-plugins/html/gst-plugins-ugly-plugins-x264enc.html#GstX264Enc--key-int-max
it is set to infinity per default (by setting it to zero).

(I tried setting it to zero too and got a full disk: 19,000 keyframes :( ;) )

What's interesting about the slashdot test case is that the frame count isn't that high relative to its length (I got around 50 fps), yet the compression is bad compared to all the other codecs, even the lossless ones. But even though the compression ratio is bad, the bandwidth demands are low.

>(I have seen one oil & gas app, for instance, that draws 100,000
>individual points on a grid without double buffering. Ugh.)

In my opinion, frames are a good concept. And I think it is questionable whether you need "50 fps" when your network connection is the limiting factor and you are trying to reduce network load everywhere else (reducing resolution, reducing bitrate, ...).
(Same topic: https://www.youtube.com/watch?v=RIctzAQOe44 at minute 29.)

DRC

Aug 19, 2014, 1:44:29 PM
to tigervn...@googlegroups.com
I reproduced your results and rewrote the report to take them into account.

http://www.turbovnc.org/About/H264

I also discovered that I could get decent performance both with the 2D
datasets-- and with the 3D datasets that consist of a bunch of small
updates-- by putting a frame rate governor on the H.264 encoder. When
the governor is active, the encoder continues to collect new framebuffer
updates in its YUV holding buffer, but the buffer is only sent to the
x264 encoder periodically (for instance, 25 or 30 times/second.) In an
actual VNC server environment, it may be unnecessary to do this, since
there are already other mechanisms (deferred update timer, RFB flow
control extensions) that would effectively act as frame rate governors.
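In pseudo-C, the governor amounts to little more than this (a minimal sketch; Update, copy_update_into_yuv_buffer(), now_ms(), and encode_frame() are placeholders, and 25 fps is just an example):

    typedef struct Update Update;       /* placeholder for a framebuffer update */
    void copy_update_into_yuv_buffer(const Update *u);  /* placeholder */
    void encode_frame(void);            /* hands the YUV buffer to x264 */
    long now_ms(void);                  /* monotonic clock, milliseconds */

    #define FRAME_INTERVAL_MS (1000 / 25)   /* ~25 frames/second */

    static long last_encode_ms = 0;

    void on_framebuffer_update(const Update *u)
    {
        copy_update_into_yuv_buffer(u);     /* always accumulate updates */
        long now = now_ms();
        if (now - last_encode_ms >= FRAME_INTERVAL_MS) {
            encode_frame();                 /* only now does x264 run */
            last_encode_ms = now;
        }
    }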

At any rate, my conclusions have changed a bit in light of this. I
still find that H.264 doesn't compress as well as the TurboVNC encoder
(and, by extension, the TigerVNC encoder) for most workloads, so its
potential usefulness is going to be limited to only a subset of
applications. I still think that frame spoiling in 3D apps may affect
the ability of the H.264 encoder to detect inter-frame similarities,
which may limit its usefulness even further. I also still think that it
is generally too slow for most multi-user VNC deployments. For the
workloads that benefit from H.264, it can be CPU-limited even on
connections as slow as 3 Megabits/second. The whole point of H.264 is
to squeeze more frames/second out of WAN connections, but if it's
CPU-limited, then it doesn't matter that it's reducing the bandwidth--
the frame rate will stay the same. However, I think it could be
beneficial on very slow connections-- for instance, if I wanted to play
a game remotely over a DSL line, H.264 would be the way to do it. For
anything greater than 10 Megabits/second, though, you're probably going
to be better off with the TurboVNC/TigerVNC encoder.

juergen...@gmail.com

Aug 21, 2014, 6:33:24 PM
to tigervn...@googlegroups.com
Wow, good and fast work. My guess, looking at the 2D and especially the 3D test cases, is that your codec compresses low-color frames/updates much better. And of course it is much faster.
I agree that H.264 seems more useful for very slow connections and good CPUs.
(I wish my Internet bandwidth (up & down) would scale with CPU improvements.)

DRC

Aug 21, 2014, 7:22:41 PM
to tigervn...@googlegroups.com
I agree. The TurboVNC/TigerVNC variant on Tight encoding does several
things that are specifically designed to improve performance and
compression for non-photographic workloads:

-- For each framebuffer update rectangle, we first try to pick out
subrectangles of significant size that consist of just one color.
These can be sent as simply a bounding box and a fill color.

-- For the remaining subrectangles, each is encoded based on its color
count. If it has only 2 colors (a case that was more prevalent before
anti-aliased fonts became common), then it is encoded as a 1-bit
bitmap and a 2-color palette. If it has N colors, 3 <= N <= {24 or 96,
depending on the compression level}, then it is encoded using an 8-bit
bitmap and an N-color palette. If it has a higher number of colors, it
is encoded using JPEG.
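In rough pseudo-C, the per-subrectangle decision looks something like this (an illustrative sketch, not the actual TurboVNC code; Rect and count_colors() are placeholders, and the compression-level threshold is made up):

    typedef struct Rect Rect;                  /* placeholder */
    int count_colors(const Rect *r, int max);  /* placeholder: stops counting at max */

    enum SubEncoding { FILL, MONO, INDEXED, JPEG };

    enum SubEncoding choose_encoding(const Rect *r, int compressLevel)
    {
        /* 24 or 96 palette colors, depending on the compression level
           (the >= 6 threshold here is illustrative) */
        int maxColors = (compressLevel >= 6) ? 96 : 24;
        int n = count_colors(r, maxColors + 1);
        if (n == 1) return FILL;             /* bounding box + fill color */
        if (n == 2) return MONO;             /* 1-bit bitmap + 2-color palette */
        if (n <= maxColors) return INDEXED;  /* 8-bit bitmap + N-color palette */
        return JPEG;                         /* high color count */
    }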

Many hours were spent designing these encoding methods, mainly through
experimental trial and error. Thus, they are tuned specifically for the
types of images that one would typically encounter in a remote desktop
environment that deals with both 2D and 3D apps. The mix of apps
(the same ones used in this study) consists of some that do
primitive-based 2D drawing, some that do image-based 2D drawing, some
low-color 3D apps (CAD apps) and some high-color 3D apps, as well as
some "smooth" 3D content and some wireframe and "non-smooth" 3D content.
Not surprisingly, H.264 generally does well on the smooth, high-color
3D content and on the image-based 2D apps.

Matthew Lai

Mar 12, 2015, 6:06:39 PM
to tigervn...@googlegroups.com, juergen...@gmail.com
Sorry to dig up an ancient thread, but I just found this.

I was working on another H.264 desktop streaming program a while ago, and found that the biggest difference it makes from a usability perspective is smooth scrolling (which the motion estimator is obviously very good at encoding).

On websites with some images (which is virtually all websites nowadays), scrolling on VNC has always been extremely laggy, even over relatively fast connections.

Presumably that's because all the images essentially have to be sent for every scroll update, assuming the custom encoding does not do motion estimation.

Scrolling is also very common. Anyone using a browser through VNC would be scrolling a lot.

Do the datasets include those cases?

Matthew

Peter Astrand

Mar 13, 2015, 3:44:18 AM
to Matthew Lai, tigervn...@googlegroups.com, juergen...@gmail.com
On Thu, 12 Mar 2015, Matthew Lai wrote:

> Sorry to dig up an ancient thread, but I just found this.
> I was working on another H.264 desktop streaming program a while ago, and found that the biggest difference it makes from a usability
> perspective is smooth scrolling (which the motion estimator is obviously very good at encoding).
>
> On websites with some images (which is virtually all websites nowadays), scrolling on VNC has always been extremely laggy, even over relatively
> fast connections.

That's not our experience. Are you using WinVNC?


> Presumably that's because all the images essentially have to be sent for every scroll update, assuming the custom encoding does not do motion
> estimation.

No, there's a "CopyRect" encoding, which solves most of this problem.
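For reference, a CopyRect rectangle is tiny on the wire; the viewer simply copies pixels it already has. Per RFC 6143, section 7.7.2, it is the standard rectangle header followed by a source position (shown conceptually below; a real implementation reads and writes the big-endian fields individually rather than relying on struct layout):

    #include <stdint.h>

    struct RfbCopyRect {
        uint16_t x, y, w, h;    /* destination rectangle */
        int32_t  encoding;      /* 1 = CopyRect */
        uint16_t src_x, src_y;  /* where to copy from, within the
                                   framebuffer the viewer already has */
    };

So a full-window scroll costs a handful of bytes, plus a normal update for the newly exposed rows.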


Br,
Peter




---
Peter Astrand ThinLinc Chief Developer
Cendio AB https://cendio.com
Teknikringen 8 https://twitter.com/ThinLinc
583 30 Linkoping https://facebook.com/ThinLinc
Phone: +46-13-214600 https://google.com/+CendioThinLinc

Matthew Lai

Mar 13, 2015, 5:49:24 AM
to tigervn...@googlegroups.com, m...@matthewlai.ca, juergen...@gmail.com



> That's not our experience. Are you using WinVNC?

That is possible, though I tried quite a few VNC programs. It was quite a while ago, so I don't remember whether TigerVNC was one of them.
 
I'll give it a try some time.

Thanks!
Matthew

DRC

Mar 13, 2015, 2:12:01 PM
to tigervn...@googlegroups.com
On 3/12/15 5:06 PM, Matthew Lai wrote:
> Sorry to dig up an ancient thread, but I just found this.
>
> I was working on another H.264 desktop streaming program a while ago,
> and found that the biggest difference it makes from a usability
> perspective is smooth scrolling (which the motion estimator is obviously
> very good at encoding).
>
> On websites with some images (which is virtually all websites nowadays),
> scrolling on VNC has always been extremely laggy, even over relatively
> fast connections.
>
> Presumably that's because all the images essentially have to be sent for
> every scroll update, assuming the custom encoding does not do motion
> estimation.
>
> Scrolling is also very common. Anyone using a browser through VNC would
> be scrolling a lot.
>
> Do the datasets include those cases?

Not really, but the Google Earth dataset has some image scrolling
aspects to it.

The latest report is here:
http://www.turbovnc.org/About/H264

It's difficult to compare the region-based codec used by TigerVNC (which
is based on the TurboVNC codec, which itself is based on the TightVNC
codec) with a frame-based codec such as H.264. TurboVNC and TigerVNC
are very fast CPU-wise, so their performance will typically not become
CPU-limited unless the network is very fast (20-100 Mbps.) Thus, in a
bandwidth-constrained environment, the performance of TurboVNC and
TigerVNC is going to be purely a factor of network speed and compression
ratio. However, H.264's performance is more complicated. It hinges very
heavily on the ability to coalesce framebuffer updates into frames and
encode them at a fixed interval. This is already a very awkward fit for
VNC's architecture, and the report goes into more details as to why.
VNC has a mechanism for coalescing X11 updates over a certain period of
time into a single framebuffer update, but it doesn't re-encode the
entire screen if this framebuffer update only covers a portion of it.
H.264, on the other hand, requires that the entire screen be re-encoded
each time a new update of it is sent.

In order to make H.264 perform well, you have to govern its frame rate,
so basically you will end up coalescing all of the VNC framebuffer
updates over a particular period of time and sending them as a single
frame using H.264, even if all the user did during that period of time
was type a single character into a terminal window. H.264 does a
reasonable job of interframe compression, so the subsequent frame that
only contains that tiny change won't occupy anywhere near as much space
as the original frame that has all of the pixels, but unfortunately, it
will take just as long to compress it. That's really the issue with
H.264-- it causes every framebuffer update to take a roughly constant
time to encode, regardless of how big/small the update is.

There's no way to really compare this with the TurboVNC/TigerVNC codec
meaningfully at the low level, because there is no concept of frames in
VNC. What we'd really need in order to make a reasonable comparison
would be a VNC session capture of a full-screen application, such as a
game, so there would be a direct correlation between the number of
pixels in the dataset and the frame rate. As near as I can figure,
though, if we make some simplifying assumptions-- such as analyzing the
data for specific target networks (I typically think of 5 Megabits as
a slow network these days) and governing H.264 to a specific frame rate
(25 fps)-- then only the "game-like" datasets (GLXSpheres,
Google Earth, Quake) show any benefit from H.264. For instance, when
using Compression Level 2 and "low-quality JPEG" (Q=40 4:2:0, equivalent
roughly to Quality=2 in TigerVNC), the Google Earth dataset would
perform at about 5 Megapixels/sec on a 5-megabit network, whereas H.264
could deliver 25 fps on the same network. Again, the relationship
between Mpixels/sec and frames/sec is unclear, since the Google Earth 3D
region did not occupy the full screen and there were other GUI updates
involved in that dataset. However, I don't think that accounts for a 5x
difference, so my gut says that H.264 is still significantly better for
that dataset if you're on a slow network.

Matthew Lai

Mar 13, 2015, 3:02:59 PM
to tigervn...@googlegroups.com
Thanks for the detailed reply!

From a CPU usage perspective, I guess the main thing is whether hardware-accelerated encoding is available.

All Intel, NVIDIA, and AMD GPUs have included an H.264 hardware encoder for quite some time now (Intel Quick Sync, NVENC, and VCE, respectively), and since virtually all computers have at least accelerated decoding, it's possible for CPU usage to be lower than with the VNC-specific protocol/encodings if hardware acceleration is used, especially on the client, where battery life is more often a serious concern. In the very near future, it's likely that more or less all computers will have H.264 encoding and decoding. For older GPUs, there are also CUDA- and OpenCL-based solutions.

Newer NVIDIA GPUs also have a new API called NVFBC that can handle the entire capture + encode pipeline in hardware, capturing even high-end 3D games with no performance penalty and no CPU usage.

One significant downside of using hardware acceleration is the limited number of streams. Intel Quick Sync (on their CPUs with integrated graphics) only supports 5 1080p streams, for example. So this wouldn't work for people running machines with, say, 20 simultaneous users on a 16-core machine and very few GPUs.

Sending an H.264 frame per update is definitely a bad idea, as you mentioned in the report.

I originally thought about extending VNC to support H.264 instead of rolling my own solution from scratch, and it's the update-based (vs. frame-based) nature of VNC that made it seem much more difficult, so I definitely know where you are coming from.

For things like complex 3D games, I'm fairly certain H.264 will perform much better (quality/bandwidth-wise). If you look at YouTube videos of game footage at 1080p, for example, many of them are virtually indistinguishable from raw, and YouTube uses about 3 Mbps for 1080p.

In those cases the VNC method essentially degenerates to MJPEG, and H.264 is 7-10 times more bandwidth-efficient than MJPEG for video/game-like content.

DRC

Mar 13, 2015, 6:05:55 PM
to tigervn...@googlegroups.com
On 3/13/15 2:02 PM, Matthew Lai wrote:
> From a CPU usage perspective, I guess the main thing is whether
> hardware-accelerated encoding is available.
>
> All Intel, NVIDIA, and AMD GPUs have included an H.264 hardware encoder
> for quite some time now (Intel Quick Sync, NVENC, and VCE, respectively),
> and since virtually all computers have at least accelerated decoding,
> it's possible for CPU usage to be lower than with the VNC-specific
> protocol/encodings if hardware acceleration is used, especially on the
> client, where battery life is more often a serious concern. In the very
> near future, it's likely that more or less all computers will have H.264
> encoding and decoding. For older GPUs, there are also CUDA- and
> OpenCL-based solutions.

The problem with GPU-assisted encoding is: how to get the images to and
from the GPU, given that VNC's framebuffer is in main memory. TurboVNC
is installed in sites that are serving up 3840x1200 desktops and putting
50 users on a single server machine, so if all of those users are
constantly sending 4-megapixel images back and forth to the GPU, bus
bandwidth is going to run out pretty quickly, not to mention that the
transfer overhead may kill any performance advantage that would
otherwise be realized from using the GPU.


> Newer NVIDIA GPUs also have a new API called NVFBC that can handle the
> entire capture + encode pipeline in hardware, for capturing even high
> end 3D games with no performance penalty and no CPU usage.

Yeah, the ideal would be for VirtualGL to take advantage of that. The
issue, however, is how to get the encoded video stream to the client.
The most effective way I can think of seems to involve some sort of a
"compressed PutImage" extension, whereby VirtualGL tunnels the video
stream through the RFB protocol somehow, bypassing the VNC encoder.
Then the issue becomes how to handle it on the viewer end-- mainly how
to deal with compositing and window management, since the pixels are no
longer going through the X server. I know that IBM developed such
extensions (an X extension and an RFB extension) for RealVNC
Enterprise-- primarily because RealVNC wasn't fast enough to stream 3D
content on its own. I don't know if their RFB extension is officially
registered or not.

Another possibility is modifying VNC such that it stores its framebuffer
on the GPU, but I suspect that this would exhaust GPU memory pretty quickly.


> 1 significant downside of using hardware acceleration is limited number
> of streams. Intel Quick Sync (on their CPUs with integrated graphics)
> only supports 5 1080p streams for example. So this wouldn't work for
> people running machines with say 20 simultaneous users on a 16-core
> machine with very few GPUs.

I don't think any serious multi-user 3D servers are going to be running
Intel GPUs. Intel stuff is more consumer-oriented, so it would
primarily be a client-side thing.


> Sending an H.264 frame per update is definitely a bad idea, as you
> mentioned in the report.
>
> I originally thought about extending VNC to support H.264 instead of
> rolling my own solution from scratch, and it's the update-based (vs.
> frame-based) nature of VNC that made it seem much more difficult, so I
> definitely know where you are coming from.
>
> For things like complex 3D games, I'm fairly certain H.264 will perform
> much better (quality/bandwidth-wise). If you look at YouTube videos of
> game footage at 1080p, for example, many of them are virtually
> indistinguishable from raw, and YouTube uses about 3 Mbps for 1080p.
>
> In those cases the VNC method essentially degenerates to MJPEG, and
> H.264 is 7-10 times more bandwidth-efficient than MJPEG for
> video/game-like content.

Yeah, the other issue -- as stated in the report -- is that, in order to
minimize CPU usage with x264 as much as possible, I had to maintain a
YUV-encoded copy of the remote framebuffer (so as to avoid converting
the entire framebuffer from RGB to YUV each time a new H.264 frame was
encoded.) That wouldn't be an issue with GPU-based encoding. I have no
doubt that H.264 is going to be the best solution for games, and there
is already proof of this with companies like OnLive and Gaikai using
predictive codecs for their remote game streaming solutions. It's just
a matter of figuring out how best to implement it in VNC. Maybe the
deferred update timer could be leveraged as a frame rate governor. Much
testing would need to be done, which is going to require much money.
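To make that concrete, maintaining the YUV shadow just means converting the updated rectangle (rather than the whole framebuffer) into an I420 copy, something like the sketch below. It uses the common BT.601 integer approximation, assumes the rectangle is even-aligned (chroma is subsampled 2x2), assumes RGB byte order, and cheats by sampling chroma from the top-left pixel of each 2x2 block instead of averaging:

    #include <stdint.h>

    void update_yuv_shadow(const uint8_t *rgb, int rgb_stride,
                           uint8_t *y_plane, uint8_t *u_plane, uint8_t *v_plane,
                           int y_stride, int c_stride,
                           int x, int y, int w, int h)
    {
        /* Luma: one sample per pixel */
        for (int j = y; j < y + h; j++) {
            const uint8_t *p = rgb + j * rgb_stride + x * 3;
            uint8_t *yrow = y_plane + j * y_stride + x;
            for (int i = 0; i < w; i++, p += 3)
                yrow[i] = ((66 * p[0] + 129 * p[1] + 25 * p[2] + 128) >> 8) + 16;
        }
        /* Chroma: one sample per 2x2 block */
        for (int j = y; j < y + h; j += 2) {
            const uint8_t *p = rgb + j * rgb_stride + x * 3;
            uint8_t *urow = u_plane + (j / 2) * c_stride + x / 2;
            uint8_t *vrow = v_plane + (j / 2) * c_stride + x / 2;
            for (int i = 0; i < w; i += 2, p += 6) {
                urow[i / 2] = ((-38 * p[0] - 74 * p[1] + 112 * p[2] + 128) >> 8) + 128;
                vrow[i / 2] = ((112 * p[0] - 94 * p[1] - 18 * p[2] + 128) >> 8) + 128;
            }
        }
    }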

Matthew Lai

Mar 13, 2015, 6:40:19 PM
to tigervn...@googlegroups.com

> The problem with GPU-assisted encoding is: how to get the images to and
> from the GPU, given that VNC's framebuffer is in main memory. TurboVNC
> is installed in sites that are serving up 3840x1200 desktops and putting
> 50 users on a single server machine, so if all of those users are
> constantly sending 4-megapixel images back and forth to the GPU, bus
> bandwidth is going to run out pretty quickly, not to mention that the
> transfer overhead may kill any performance advantage that would
> otherwise be realized from using the GPU.

Transferring to the GPU shouldn't be a problem for a single user, even if the whole frame needs to be transmitted every time.

Even at 3840x1200x4 bytes/pixel, it's only ~18 MB per frame, and PCI-E 3.0 x16 bandwidth is about 16 GB/s. There is a ~3 ms per-GPU-call overhead, so we'd be looking at a few milliseconds of delay in total.

Transferring back would be even faster, since the encoded image is much smaller.

Also, it should be possible to keep a framebuffer in VRAM and use CUDA/OpenCL to update it with just the changed regions (though at the memory bandwidth we have, that's probably not necessary).

For multiple users, there would need to be multiple GPUs (each with its own bus), just as there need to be many CPUs. A high-end GPU should be able to comfortably support 3-5 users. All modern Xeon CPUs have 40 lanes of PCI-E, so 5 GPUs at x8 each should work and still provide enough bandwidth per GPU.

> Yeah, the ideal would be for VirtualGL to take advantage of that. The
> issue, however, is how to get the encoded video stream to the client.
> The most effective way I can think of seems to involve some sort of a
> "compressed PutImage" extension, whereby VirtualGL tunnels the video
> stream through the RFB protocol somehow, bypassing the VNC encoder.
> Then the issue becomes how to handle it on the viewer end-- mainly how
> to deal with compositing and window management, since the pixels are no
> longer going through the X server. I know that IBM developed such
> extensions (an X extension and an RFB extension) for RealVNC
> Enterprise-- primarily because RealVNC wasn't fast enough to stream 3D
> content on its own. I don't know if their RFB extension is officially
> registered or not.

I am not really familiar with RFB, but that sounds reasonable.
 

> Another possibility is modifying VNC such that it stores its framebuffer
> on the GPU, but I suspect that this would exhaust GPU memory pretty quickly.


Modern GPUs have many gigabytes of VRAM, so I would think encoding speed would become a problem before reaching that point (if we keep increasing the number of users per GPU). It would be a whole lot of work, though, especially to do it in a portable way. CUDA supports unified memory (unified virtual addressing, with the driver doing all the actual transfers behind the scenes as needed), but CUDA is NVIDIA-only.
 

>> One significant downside of using hardware acceleration is the limited
>> number of streams. Intel Quick Sync (on their CPUs with integrated
>> graphics) only supports 5 1080p streams, for example. So this wouldn't
>> work for people running machines with, say, 20 simultaneous users on a
>> 16-core machine and very few GPUs.

> I don't think any serious multi-user 3D servers are going to be running
> Intel GPUs. Intel stuff is more consumer-oriented, so it would
> primarily be a client-side thing.

Definitely. With high-end NVIDIA GPUs, encoding 5-10 streams at the same time shouldn't be a problem. It wouldn't increase GPU load significantly either, since H.264 encoding is done by dedicated circuitry that can't be used for anything else anyway (unless those users happen to want to encode H.264 streams on the GPU...).
 

> Yeah, the other issue -- as stated in the report -- is that, in order to
> minimize CPU usage with x264 as much as possible, I had to maintain a
> YUV-encoded copy of the remote framebuffer (so as to avoid converting
> the entire framebuffer from RGB to YUV each time a new H.264 frame was
> encoded.) That wouldn't be an issue with GPU-based encoding. I have no
> doubt that H.264 is going to be the best solution for games, and there
> is already proof of this with companies like OnLive and Gaikai using
> predictive codecs for their remote game streaming solutions. It's just
> a matter of figuring out how best to implement it in VNC. Maybe the
> deferred update timer could be leveraged as a frame rate governor. Much
> testing would need to be done, which is going to require much money.

Yeah, a GPU can do that very quickly, but it would be a problem for CPU-based encoding.

I wouldn't mind working on this when I have some free time, but I'm not sure when that will be. I have a lot of experience with H.264 (I did a lot of hardware encoder tuning at my last job) and GPUs, but I know next to nothing about VNC.

Matthew Lai

Mar 14, 2015, 1:05:28 PM
to tigervn...@googlegroups.com
The bigger problem, I believe, would be the GPU not being able to encode 50 H.264 streams simultaneously.

Most GPUs (even high-end ones) can only do 5-10, so computing power would become a limiting factor before bus bandwidth or VRAM usage. At least a few GPUs would be required to support 50 simultaneous H.264 streams.

The vast majority of 3D applications (even very texture-intensive games) don't currently use even half of the available PCI-E bandwidth, so while I can't say there is no case where it would be an issue (I'm sure there are a few applications out there that use bus bandwidth very inefficiently), I would say that would be a very rare situation.

On 2015-03-14 2:56 AM, DRC wrote:
> That's all really good info. With regards to bus bandwidth, though,
> what you didn't take into account is the frame rate. Per my previous
> message, there's no way to encode a partial frame with H.264, so if the
> remote desktop is 4MP, then all 4MP has to be downloaded to the GPU
> with every frame. 50 users * 4MP * 25 fps = 22.5 GB/s. OK, so we can
> make some simplifying assumptions-- in all likelihood, H.264 will be
> used only by employees connecting from remote locations and not a LAN,
> and with automatic desktop resizing, they will probably use a 3840x1200
> desktop at work and then resize it to something smaller at home. And
> not all of them will be using the system at the same time or using
> H.264 at the same time. However, the other thing to bear in mind is
> that there is a lot of 3D data transiting the bus as well, particularly
> when dealing with volume visualization applications. So while I don't
> know for a fact that it will be an issue, I can certainly envision a
> scenario in which it would be. Storing a YUV copy of the framebuffer
> in GPU memory seems to make the most sense, because then you only have
> to send the changed regions down to the GPU.

DRC

Mar 14, 2015, 2:07:30 PM
to Matthew Lai, tigervn...@googlegroups.com
VirtualGL and TurboVNC originally came out of the oil & gas industry, where multi-gigabyte seismic datasets are common. That was originally why the concept of server-side OpenGL rendering was concocted-- when you are passing a planar probe through one of these multi-gig datasets, it has to generate very large textures on the fly, and there is no data reuse between frames (unlike with games, in which there can be at least some reuse of textures.) Thus, for these types of volume viz apps, server-side rendering is absolutely necessary (otherwise, you end up sending all of these huge textures over the network.)

These are admittedly extreme examples, but I mention them to point out the fact that games are not really what I would be targeting with this solution (primarily because the money to develop it is likely to come from visualization users.)

Matthew Lai

Mar 17, 2015, 6:00:27 AM
to tigervn...@googlegroups.com
That makes sense, but for those kinds of applications, wouldn't it make more sense to store them as 3D textures in VRAM? I guess it would require more programming to swap 3D regions and scales in and out of VRAM.

DRC

Mar 17, 2015, 4:43:15 PM
to tigervn...@googlegroups.com
The textures are stored in VRAM, but they are not reused very much.
Imagine the Visible Human dataset, which is of a similar scale to a lot
of seismic datasets. The Visible Human is 2048 x 1216 x 1871 voxels, so
assuming you were passing a planar probe down the length of the dataset,
you'd be generating 7.5-megabyte (2048 x 1216 x 24 bits) textures on the
fly, and each subsequent texture would be different. You would only
reuse the textures if you stopped and rotated or zoomed the model
instead of passing the probe through it.

Again, I don't think this necessarily negates the usefulness of H.264.
Passing planar probes through large datasets is only one of many things
that people will be doing with the system, and they won't be doing that
all the time. It's just something I have to bear in mind, because
GPU-assisted H.264 will increase the bus load. My software is used in
some supercomputer centers where the usage will definitely tend more
toward the high-end volume rendering workloads. I think that storing a
YUV-encoded copy of the framebuffer on the GPU is probably the solution
to this. Then you'll only be sending incremental updates down to the
GPU instead of the whole frame.



Lew Palm

Mar 19, 2015, 10:41:46 AM
to tigervn...@googlegroups.com
Hi,

What do you think about H.264 as an additional enhancement for the Tight
encoding?

Our customers usually use VNC for web surfing. The Tight (Tiger, Turbo)
encoder is great for that. But from time to time they watch a YouTube
video, and that consumes a lot of bandwidth. That's the circumstance
where H.264 encoding would hit the spot.

My idea is: use the Tight encoding algorithm, but enhance it by using
H.264 I-frames (or I-slices) instead of JPEG compression for the
rectangles. And the VNC server should identify a movie area
automagically. How? It's the same rectangle (size, position) being
updated very often. After a 'movie rect' is identified, the subsequent
rectangles can be encoded as P-frames (or P-slices). This should save a
lot of bandwidth. I don't see the need for a constant frame rate here.
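In pseudo-C, the detection could be as simple as this (a rough sketch; the names and thresholds are made up for illustration, and a real implementation would track multiple rectangles and age out stale ones):

    typedef struct { int x, y, w, h, hits; long first_ms; } RectStat;

    /* Returns nonzero once the same rectangle geometry has been updated
       at least 10 times within one second. */
    int is_movie_rect(RectStat *s, int x, int y, int w, int h, long now_ms)
    {
        if (s->x != x || s->y != y || s->w != w || s->h != h) {
            s->x = x; s->y = y; s->w = w; s->h = h;
            s->hits = 1;
            s->first_ms = now_ms;
            return 0;
        }
        s->hits++;
        return s->hits >= 10 && (now_ms - s->first_ms) <= 1000;
    }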

What do you think about this approach? Is it worth a try?

Regards,
Lew

--
Dipl.-Inf. Lew Palm
Softwareentwicklung

m-privacy GmbH
Werner-Voß-Damm 62
12101 Berlin
Fon: +49 30 24342334
Fax: +49 30 99296856
http://www.m-privacy.de
GnuPG-Key-ID: 0xD51C760C

Amtsgericht Charlottenburg, HRB 84946
Geschäftsführer:
Dipl.-Kfm. Holger Maczkowsky,
Roman Maczkowsky

DRC

Mar 19, 2015, 12:45:44 PM
to tigervn...@googlegroups.com
On 3/19/15 9:41 AM, Lew Palm wrote:
> What do you think about H.264 as an additional enhancement for the Tight
> encoding?
>
> Our customers usually use VNC for web surfing. The Tight (Tiger, Turbo)
> encoder is great for that. But from time to time they watch a YouTube
> video, and that consumes a lot of bandwidth. That's the circumstance
> where H.264 encoding would hit the spot.
>
> My idea is: use the Tight encoding algorithm, but enhance it by using
> H.264 I-frames (or I-slices) instead of JPEG compression for the
> rectangles. And the VNC server should identify a movie area
> automagically. How? It's the same rectangle (size, position) being
> updated very often. After a 'movie rect' is identified, the subsequent
> rectangles can be encoded as P-frames (or P-slices). This should save a
> lot of bandwidth. I don't see the need for a constant frame rate here.
>
> What do you think about this approach? Is it worth a try?

As we have been discussing, without GPU-based compression, you would
very quickly run into a situation in which the server is CPU-limited,
and thus, even though H.264 might be doing a better job of compressing
the video region of the display, the overall performance would still
likely be slower. Let's put some numbers on this. Because you're
proposing not to use a constant frame rate with the H.264 encoder, we
can actually make a reasonable apples-to-apples comparison between H.264
and the TurboVNC encoder, using the quality-equivalent "low-quality"
settings described in the article:

http://www.turbovnc.org/About/H264

More specifically: Compression Level 2 with 4:2:0 Q=40 JPEG in the
TurboVNC encoder (approximately equal to Quality=2 in TigerVNC) and the
"High" profile with 1 thread and CQP=31 in the x264 encoder.

When you compare the TurboVNC encoder with H.264, given the restriction
of not using a constant frame rate, it quickly becomes apparent that
most of the datasets do not compress significantly better with H.264
(and some compress significantly worse.) The three that do compress
significantly better are GLXSpheres, Google Earth, and Quake 3 (not
surprising, since these are the most video-like of the datasets.)

Since the spreadsheet is available at the above link, I won't repost the
stats on the datasets here, but taking those stats, we can compute the
theoretical performance of the datasets on a 10 Mbps and a 5 Mbps
network, based on whether the encoder will be able to completely fill
the pipe or whether it will be CPU-limited:

Google Earth:

TurboVNC Mpixels/sec (10 Mbps network): 9.750 (network-limited)
TurboVNC Mpixels/sec (5 Mbps network): 4.875 (network-limited)

x264 Mpixels/sec (10 Mbps network): 0.780 (CPU-limited)
x264 Mpixels/sec (5 Mbps network): 0.780 (CPU-limited)

Quake 3:

TurboVNC Mpixels/sec (10 Mbps network): 11.289 (network-limited)
TurboVNC Mpixels/sec (5 Mbps network): 5.644 (network-limited)

x264 Mpixels/sec (10 Mbps network): 1.241 (CPU-limited)
x264 Mpixels/sec (5 Mbps network): 1.241 (CPU-limited)

GLX Spheres:

TurboVNC Mpixels/sec (10 Mbps network): 23.011 (network-limited)
TurboVNC Mpixels/sec (5 Mbps network): 11.505 (network-limited)

x264 Mpixels/sec (10 Mbps network): 19.297 (CPU-limited)
x264 Mpixels/sec (5 Mbps network): 19.297 (CPU-limited)
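For clarity, each of these numbers is simply the lesser of the network-limited rate and the CPU-limited rate, with the compressed bits/pixel and the encoding speed taken from the spreadsheet stats:

    /* Illustrative: Mbits/sec divided by compressed bits/pixel gives the
       network-limited rate in Mpixels/sec; the encoder's measured
       throughput (in Mpixels/sec) caps it. */
    double mpixels_per_sec(double net_mbps, double bits_per_pixel,
                           double cpu_mpixels_per_sec)
    {
        double net_limit = net_mbps / bits_per_pixel;
        return (net_limit < cpu_mpixels_per_sec) ? net_limit
                                                 : cpu_mpixels_per_sec;
    }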


So what's happening here? In the Google Earth and Quake 3 datasets,
there are some other things going on outside of the 3D region of the
display, and these smaller updates are killing performance (because,
each time one of them occurs, it requires re-encoding the entire remote
framebuffer.) This is what I meant when I said that limiting the frame
rate is critical to the success of H.264. But what you're proposing is
to just use H.264 for the video regions of the display, which is more
akin to the GLX Spheres workload (in that dataset, only the 3D window is
updating, so each update is approximately equal to one frame.) Even
then, x264 is still too CPU-limited to best TurboVNC except on a very
thin pipe.

Also, the assumption you're making (that the VNC server can
automagically identify a movie area) is not valid. The only sane way I
can think of that we could identify a video region of the display is by
using the X Video Extension, and that would assume that the movie player
has XV support (most standalone movie players do, but I seriously doubt
that playing a video in the web browser is going to use XV, since XV is
window-based.) And again, it wouldn't necessarily be faster unless
you're on a very low-bandwidth network.

We could venture an educated guess, such as: if we see more than a
certain number of "large" framebuffer updates of the same size occurring
in quick succession, we might be able to assume that a video is playing,
but the problem is: such a workload could just as easily be a 3D
application running in VirtualGL, and H.264 is not necessarily a good
thing to use for all 3D applications (for CAD application workloads, for
instance, it would not do as good a job of compression as the TurboVNC
encoder does.) And for videos and games, H.264 has too much overhead
for us to risk guessing wrong.

Lew Palm

Mar 24, 2015, 9:07:29 AM
to tigervn...@googlegroups.com
Hi DRC and Matthew,

Thank you very much for your opinions.

I see that CPU usage is the problem with my approach.

I will do some experiments anyway: I will replace TigerVNC's JPEG
decoder / encoder with an x264 still-image (I-frame-only) decoder /
encoder and do some benchmarks.

Regards,
Lew

--
Dipl.-Inf. Lew Palm
Softwareentwicklung

m-privacy GmbH
Werner-Voß-Damm 62
12101 Berlin
Fon: +49 30 24342334
Fax: +49 30 99296856 (new!)

DRC

Mar 24, 2015, 6:11:57 PM
to tigervn...@googlegroups.com
I'll save you the trouble. I hacked up a codec in the TurboVNC
Benchmark Tools
(http://sourceforge.net/p/turbovnc/code/HEAD/tree/vncbenchtools/trunk/)
that does what you are proposing (replacing JPEG-encoded subrectangles
with H.264 I-frames of varying sizes.) The results are attached, and--
no surprise-- H.264 does significantly worse than JPEG, both in terms of
compression ratio and performance.

The only dataset that realizes a marginal compression improvement is
Google Earth, and again, the CPU usage is so high that it can't even
fill a 2-megabit pipe. So the overall performance would be higher with
the existing (JPEG-based) codec.

Like I said before, in order to get decent performance out of H.264, we
would need to govern the frame rate somehow, so as to feed the encoder
large chunks of data at a time, and we'd need to allow it to do
interframe compression on its own. Otherwise, there is no point in
using a video codec. If you're just encoding I-frames, then you're
incurring a lot of overhead for nothing. You'd be better off using a
still-image codec-- which is exactly what we're doing already. The
entropy codec in H.264 (CAVLC, CABAC) is designed with video in mind,
not still images. If you are trying to squeeze the bandwidth down to
the bare minimum, then you'd be much better off using arithmetic coding
in libjpeg-turbo. That is known to reduce the size of JPEGs by as much
as 20-25%. It will also cause about a 5-6x drop in performance, but
that's compared to the 20x drop in performance incurred by x264. In
either case, though, I will re-emphasize that squeezing down the
datastream is really of no benefit if the encoder is so slow that it
can't fill up the available network bandwidth. You're just trading a
network performance limit for a CPU performance limit.
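For reference, enabling arithmetic coding in libjpeg-turbo is a one-field change on the compression object (assuming the library was built with arithmetic coding support):

    #include <stdio.h>
    #include <jpeglib.h>

    /* Call after jpeg_create_compress() and before jpeg_start_compress(). */
    void use_arithmetic_coding(struct jpeg_compress_struct *cinfo)
    {
        cinfo->arith_code = TRUE;  /* arithmetic entropy coding instead of Huffman */
    }
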
turbovnc_h264.ods

scottbu...@gmail.com

Dec 15, 2016, 10:48:32 AM
to TigerVNC Developer Discussion
You clearly have not used hardware-accelerated H.264, which is the entire point of trying to use this encoding algorithm. Of course H.264 will be worse if you try to do it in software! If you can manage to change your implementation so that it uses the hardware available in almost any modern computer, I would be shocked if you didn't see a marked improvement. Just look at what Steam and NVIDIA have done with In-Home Streaming: they can do 60 fps at 1920x1080 on a decent network connection with no issues.

DRC

Dec 15, 2016, 11:59:15 AM
to tigervn...@googlegroups.com
Yes, and of course I'm looking into GPU-based H.264 right now (but I
have a lot of other things competing for my time.) I've spent a lot of
time researching this already (http://www.turbovnc.org/About/H264), and
it's not just a matter of how fast or slow the codec is. It also has to
do with the nature of H.264. H.264 is designed for movie frames, so it
lends itself very nicely to the types of image workloads generated by
games. (Games tend to produce images in which most of the pixels in the
image change from frame to frame, and those changes tend to have a high
degree of inter-frame correlation.) However, H.264 does not lend itself
very well to other types of image workloads, such as those generated by
CAD applications. The adaptive encoding method used by TigerVNC (which
is based on the TurboVNC encoder I designed 8 years ago, which itself is
based on the TightVNC encoder) breaks down every framebuffer update into
sub-rectangles. It starts by finding large regions of solid color and
sending those first (as a bounding box + fill color), then it counts the
number of colors in each remaining sub-rectangle, encoding the ones with
2 colors using 1-bit indexed color + Zlib, encoding the ones with a
relatively low color count using 8-bit indexed color + Zlib, and
encoding the ones with high color counts using JPEG. This is, in
practice, a much more efficient way of encoding the images produced by
non-gaming workloads (word processors, browsers, image editing
applications, design applications, and even non-immersive 2D games like
Spider Solitaire.)

Imagine the extreme example of typing in an xterm. You might only be
changing 100 pixels every time you type a character, but H.264 would
still have to encode the entire screen in order to detect that only 100
pixels changed. Even with GPU compression, that is a *lot* of
overhead-- particularly when you realize that Xvnc would have to
download its entire framebuffer to the GPU, wait for the GPU to compress
it, then fetch the compressed H.264 frame each time you typed a
character in that xterm. If H.264 offered a way for applications to
give it "hints" as to which pixels have changed, then it would become
much more trivial to implement it in a VNC environment, but that is
unfortunately not the case. That's why I'm investigating hybrid
approaches, such as using H.264 only for the pixels generated by
VirtualGL (that is, only for 3D applications.) For instance, instead of
transferring the pixels from PBOs to the X proxy using MIT-SHM,
VirtualGL could potentially pass those PBO handles to the X proxy using
some as-yet-to-be-defined custom X11 extension (sort of like MIT-SHM but
using CUDA memory instead of SysV shared memory segments, essentially),
and the X proxy would take care of compressing those pixels on the GPU
and inserting them into the protocol stream sent to the client. In the
case of VNC, the non-3D pixels-- such as the characters you type into
the xterm-- would continue to be encoded using normal RFB methods.

https://github.com/TurboVNC/turbovnc/issues/19

gives more details regarding the challenges of such a solution and why
it's not as simple as just using the GPU. Right now, I'm in the process
of incorporating NVENC into the TurboVNC Benchmark Tools, so I can
ascertain whether a naive approach (simply implementing NVENC to encode
the full VNC framebuffer) is worth pursuing or whether the overhead of
such a solution would necessitate more of a hybrid approach. NVENC is
not exactly the easiest API to use, and it's made more difficult by the
fact that I'm forced to use it at an extremely low level to avoid the
proprietary nVidia wrapper code. I'll keep everyone apprised of my
progress, as all of my work is being conducted in open source.

The other limitation of a GPU-based solution is likely going to be the
fact that these GPUs only support a very limited number of simultaneous
H.264 encoder instances (even as low as 2 simultaneous instances in some
cases.) It is not unusual for dozens of users to share a GPU in a
large-scale VirtualGL server, and because the image encoding is
performed on the CPU, this is a very balanced configuration. If
suddenly those dozens of users are trying to compress H.264 frames on
the GPU as well as using it for real-time 3D rendering, I could easily
foresee issues with GPU contention. Steam in-home streaming is not an
"on-demand" solution that has to support dozens of simultaneous users on
the same GPU, so they can get away with monopolizing the GPU resources.
We can't.

In short, I think H.264 will benefit some use cases, but it's far from
being a panacea. I'm still trying to figure out how best to take
advantage of it, and given that Wayland is now making its way into the
mainstream, that throws another wrench into the works. It doesn't make
much sense to develop some complicated hybrid solution to support
GPU-based H.264 encoding in an X proxy when X proxies will eventually go
away. Since Wayland is all image-based, it lends itself much more to H.264.

kyleju...@gmail.com

Jun 12, 2017, 12:03:59 PM
to TigerVNC Developer Discussion, juergen...@gmail.com
I believe most modern remote control programs use H.264/H.265/VP8/VP9 codecs.

I am a Windows system admin.

The most popular ones for home users are Splashtop and Steam's streamer. They both use an H.264-like codec with variable bitrate (so during quick motion I can see the quality drop to keep the remote session smooth and responsive; when the image is steady, the picture quickly turns sharper again). The result is so stunning that users can actually play 3D shooting games via remote desktop.

For business use, I see NoMachine using VP9 in its NX protocol. And Sunlogin is similar to VNC in that it uses a "mirror driver" for capture and H.264 for transfer. The result is so good that I can play YouTube at 30 fps on the remote server and watch it from my home PC at 20 fps without noticeable quality loss. (The bad part is that most of these programs are heavily commercial and closed source, require logging in to the vendor's servers, keep popping up "consider buying" dialogs, and so on.)

The real deal is that the smoothness of H.264 for scrolling, drag and drop, big dialogs opening, etc. makes me enjoy my work much more. (The tiny lag of the VNC protocol creates a headache over time.)

DRC

Jun 12, 2017, 12:57:55 PM
to tigervn...@googlegroups.com
I can certainly appreciate that, but because of the nature of X11 and
the RFB protocol, Xvnc has always been an awkward fit for these video
codecs. X11 is primitive-based, not frame-based, so much of the
complexity in the interface between the X server and the VNC server
involves determining when the X server has updated the remote
framebuffer and which parts of that framebuffer should be transmitted as
framebuffer updates. Video codecs throw a monkey wrench into that,
because they eliminate the ability to send partial framebuffer updates.
If, for instance, H.264 allowed us to manually compute and specify the
inter-frame differences external to the codec, then this would all be
academic. As it stands, however, we would have to encode the entire
framebuffer as a video frame and allow the video codec to handle any
differences relative to previous frames. That means that there would be
a fixed computational overhead on the server for each framebuffer
update, and we'd incur that overhead regardless of whether 1 pixel had
changed or the entire framebuffer had changed. In order to counter
this, it would be necessary to coalesce framebuffer updates (perhaps
using the deferred update timer or a similar mechanism) to ensure that
the codec doesn't bog down on a bunch of tiny updates. This
is extremely inefficient, because we'd basically be ignoring all of the
hints that the X server is feeding the VNC server regarding which
regions of the framebuffer have changed. There are some applications
that can be shown to benefit from this (games, for instance), but for 2D
applications and certain types of 3D applications (CAD, for instance)
that modify much smaller subsets of the framebuffer, it definitely isn't
a win.

Also, it's potentially much easier to use an onboard GPU encoder like
NVENC to encode a physical framebuffer (which is already in GPU memory)
as opposed to a virtual framebuffer such as Xvnc uses (which is in main
memory.) This becomes even more thorny when we introduce VirtualGL into
the mix, because potentially the rendered pixels from a 3D application
could transit the bus three times (once when VirtualGL reads them back
and displays them using XShmPutImage(), another when the VNC server
sends them back to the GPU for encoding using NVENC, and a third to read
back the encoded video frame for transmission.) The "screen scraper"
architecture (which is inherently only single-user, which is why
multi-user on-demand solutions like ThinLinc and TurboVNC don't use it)
is actually better positioned to take advantage of these video codecs,
because (a) the pixels are already in GPU memory, and (b) screen
scrapers generally already encode the framebuffer on a fixed interval,
so the video codec is eliminating solution complexity, not adding it.

Wayland greatly simplifies a lot of this, because Wayland actually is
frame-based rather than primitive-based, and there are ways of building
a remote-display-enabled Wayland compositor that keeps its framebuffer
resident in GPU memory.

I'm in the process of figuring out how (or if it even makes sense) to
use NVENC within Xvnc, but figuring out how to program NVENC without
violating the GPL is challenging. It is basically requiring me to
access it at the lowest possible levels of the codec, bypassing nVidia's
user friendly (but unfortunately GPL-incompatible) wrapper code.