Display artifact due to "repair" of JPEG artifacts?

Mark Mielke

Jul 28, 2018, 1:47:10 AM
to tigervn...@googlegroups.com
Hi all:

I've upgraded my clients and servers to TigerVNC 1.9.0. They're working quite well. No problems so far. I want to start by saying good job, and thank you.

As I have been typing with TigerVNC 1.9.0, I noticed a very fine flickering (almost unnoticeable) of the pixels of the line or two of text in my terminal window just above the cursor. It doesn't happen in 1.8.0 or prior. I believe I am seeing an artifact of this feature from the release notes:

   - Automatic "repair" of JPEG artefacts on screen in all servers

It seems like it is very quickly displaying a lower resolution of the text that might introduce a small amount of greyscale, and then quickly fixing it immediately after.

If I had high latency, I could understand why this would be useful. But my latency is relatively low. If I turn off "Allow JPEG Compression", the artifact disappears and the text is crystal clear as I type, with no flickering and no lag.

I'm wondering if the sensitivity for when to use this feature might be too high. What is the threshold for this taking effect?

Also, I wonder why the few lines above might be flickering, since according to my understanding - they are not changing. Why isn't only the cursor and characters being inserted refreshing? Why might it stretch a row or two of text above? It makes me wonder if there might be a calculation problem here, where the JPEG image being applied is larger than it needs to be, and this makes the correction larger than it needs to be?

I suspect most people wouldn't notice this. I also have a particular configuration which might be unusual and might exaggerate the impact of artifacts. In my scenario, I typically have a VNC client on Windows 10, connecting to a CentOS 7 VM, and then I have a VPN connection from the CentOS 7 VM connecting to a remote Oracle Linux 7 server. However, I don't think this is the cause of the problem - it is just that my scenario might make it slightly more perceptible. Also, to be clear - I have TigerVNC 1.9.0 as client and server on all points.

Thanks!

--
Mark Mielke <mark....@gmail.com>

Mark Mielke

Jul 29, 2018, 3:27:44 AM
to tigervn...@googlegroups.com
My ability to detect this artifact has faded (or it has disappeared). Perhaps something was triggering higher sensitivity to latency in past sessions, and it is no longer passing the threshold today.
--
Mark Mielke <mark....@gmail.com>

Mark Mielke

Jul 29, 2018, 4:12:18 AM
to tigervn...@googlegroups.com
I spoke too soon. It's back. Especially in Emacs where it shimmers the text that I'm not changing, above the text that is changing.

Also, I'm noticing it when running "ls" and other commands in a terminal that refresh larger portions of the terminal. I had thought previously that this used "copy region" to scroll the pixels, but perhaps it was always refreshing the entire region? In any case, it now looks like it is sending a JPEG of the region being scrolled, followed immediately by a correction update, which causes the text to shimmer. It's again almost unnoticeable, but I see it and it is a little bit distracting.

--
Mark Mielke <mark....@gmail.com>

Pierre Ossman

Aug 13, 2018, 10:54:55 AM
to Mark Mielke, tigervn...@googlegroups.com
On 28/07/18 07:46, Mark Mielke wrote:
> Hi all:
>
> I've upgraded my clients and servers to TigerVNC 1.9.0. They're working
> quite well. No problems so far. I want to start by saying good job, and
> thank you.

You're welcome. :)

>
> As I have been typing with TigerVNC 1.9.0, I noticed a very fine flickering
> (almost unnoticeable) of the pixels of the line or two of text in my
> terminal window just above the cursor. It doesn't happen in 1.8.0 or prior.
> I believe I am seeing an artifact of this feature from the release notes:
>
> - Automatic "repair" of JPEG artefacts on screen in all servers
>

Yeah, that sounds the most likely. I am however unable to reproduce the
effect. I don't suppose you could make a recording?

What are your network conditions like?

Have you made any changes to the compression settings?

Which TigerVNC server is this?

>
> I'm wondering if the sensitivity for when to use this feature might be too
> high. What is the threshold for this taking effect?
>

The feature is always running as soon as JPEG is used. How quick it is
depends on how much bandwidth it is detecting. It tries to make sure the
repair is only using "spare" bandwidth.
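The "spare bandwidth" idea can be made concrete with a toy budget calculation. It assumes the server has some bandwidth estimate and knows how many bytes of ordinary updates are already queued. The function name and parameters are hypothetical, and this is not TigerVNC's actual implementation. It does show why repair speed would track the detected bandwidth: a higher estimate leaves more room after ordinary updates.

```python
def repair_budget(est_bandwidth_bps: float, interval_s: float, queued_bytes: int) -> int:
    """Bytes left over for lossless repair in one update interval (toy model).

    est_bandwidth_bps -- hypothetical current bandwidth estimate, in bits/s
    interval_s        -- length of the update interval, in seconds
    queued_bytes      -- bytes already committed to ordinary screen updates
    """
    capacity = int(est_bandwidth_bps * interval_s / 8)  # bits -> bytes
    # Repair only gets what ordinary updates leave unused; never negative.
    return max(0, capacity - queued_bytes)


# 10 Mbit/s for 100 ms is 125000 bytes; with 100000 already queued,
# 25000 bytes remain for repair.
print(repair_budget(10_000_000, 0.1, 100_000))  # 25000
# With a lower estimate the same queue leaves nothing, so repair waits.
print(repair_budget(1_000_000, 0.1, 100_000))   # 0
```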

> Also, I wonder why the few lines above might be flickering, since according
> to my understanding - they are not changing. Why isn't only the cursor and
> characters being inserted refreshing? Why might it stretch a row or two of
> text above? It makes me wonder if there might be a calculation problem
> here, where the JPEG image being applied is larger than it needs to be, and
> this makes the correction larger than it needs to be?
Sometimes a larger part of the screen is updated by the application than
strictly necessary. However a default configured TigerVNC will detect
such redundant changes, so it is a bit odd...
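Here is a rough sketch of what "detecting redundant changes" can look like. The application reports a damaged rectangle, and the server compares it block by block with the previous framebuffer contents, forwarding only the blocks that really differ. The block size, frame representation and function name are assumptions for illustration. TigerVNC's comparing update tracker is more involved than this.

```python
Frame = list[list[int]]  # rows of pixel values (toy framebuffer)


def changed_blocks(old: Frame, new: Frame, rect: tuple[int, int, int, int],
                   block: int = 4) -> list[tuple[int, int, int, int]]:
    """Return the (x, y, w, h) blocks inside rect whose pixels actually changed."""
    rx, ry, rw, rh = rect
    out = []
    for by in range(ry, ry + rh, block):
        for bx in range(rx, rx + rw, block):
            # Clip the last row/column of blocks to the rectangle's edge.
            bw = min(block, rx + rw - bx)
            bh = min(block, ry + rh - by)
            if any(old[y][bx:bx + bw] != new[y][bx:bx + bw]
                   for y in range(by, by + bh)):
                out.append((bx, by, bw, bh))
    return out


old = [[0] * 8 for _ in range(8)]
new = [row[:] for row in old]
new[5][6] = 1  # one pixel of a glyph changes

# The whole 8x8 area is reported as damaged, but only one 4x4 block differs.
print(changed_blocks(old, new, (0, 0, 8, 8)))  # [(4, 4, 4, 4)]
```

In this example 64 pixels are reported as changed ("in") and 16 are passed on ("out"). That in/out relationship is what the ComparingUpdateTracker statistics later in the thread summarize.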

> I suspect most people wouldn't notice this. I also have a particular
> configuration which might be unusual and might exaggerate the impact of
> artifacts. In my scenario, I typically have a VNC client on Windows 10,
> connecting to a CentOS 7 VM, and then I have a VPN connection from the
> CentOS 7 VM connecting to a remote Oracle Linux 7 server. However, I don't
> think this is the cause of the problem - it is just that my scenario might
> make it slightly more perceptible. Also, to be clear - I have TigerVNC
> 1.9.0 as client and server on all points.

So you run a second vncviewer on the CentOS machine to connect to a
second server on the Oracle machine?

Regards
--
Pierre Ossman Software Development
Cendio AB https://cendio.com
Teknikringen 8 https://twitter.com/ThinLinc
583 30 Linköping https://facebook.com/ThinLinc
Phone: +46-13-214600 https://plus.google.com/+CendioThinLinc

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Mark Mielke

Aug 13, 2018, 6:57:58 PM
to Pierre Ossman, tigervn...@googlegroups.com
On Mon, Aug 13, 2018 at 10:54 AM Pierre Ossman <oss...@cendio.se> wrote:
> On 28/07/18 07:46, Mark Mielke wrote:
> > As I have been typing with TigerVNC 1.9.0, I noticed a very fine flickering
> > (almost unnoticeable) of the pixels of the line or two of text in my
> > terminal window just above the cursor. It doesn't happen in 1.8.0 or prior.
> > I believe I am seeing an artifact of this feature from the release notes:
> >     - Automatic "repair" of JPEG artefacts on screen in all servers
> Yeah, that sounds the most likely. I am however unable to reproduce the
> effect. I don't suppose you could make a recording?

Will try. Never done that before.
 
> What are your network conditions like?

I have 1 Gbit/s to the home over broadband to connect to my work machine. The packet latency is around 50 ms, probably because the packets traverse through a hub in another city. The VPN software and appliance at work may constrain the bandwidth upstream from me, but for the bandwidth from VNC I think this normally shouldn't be an issue. However, if there was a presumption around packet latency implying reduced bandwidth, I could see this coming into play.
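Latency can masquerade as low bandwidth in one common way. If a sender only allows a fixed amount of unacknowledged data in flight, its throughput is capped at window / RTT no matter how fast the link is. The sketch below works through the numbers from this message, assuming the ~50 ms figure is a round-trip time. The 64 KiB window is purely hypothetical, and nothing here claims that TigerVNC's bandwidth estimator works this way.

```python
def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link busy."""
    return link_bps * rtt_s / 8


def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Throughput ceiling when at most window_bytes may be unacknowledged per RTT."""
    return window_bytes * 8 / rtt_s


# 1 Gbit/s with an assumed 50 ms round trip needs about 6.25 MB in flight.
print(bdp_bytes(1e9, 0.05))
# With a hypothetical 64 KiB window the same path tops out near 10.5 Mbit/s,
# which would look like a "slow link" to anything measuring throughput.
print(window_limited_bps(64 * 1024, 0.05) / 1e6)
```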

I don't see the behaviour locally, from my Windows 10 desktop to my Fedora 28 server sitting right next to me. However, after investigating the symptoms and options, I have since configured -CompressLevel 1 -NoJpeg for this local traffic, as there is no real benefit to JPEG for this use case.

I have seen another symptom which might be similar with users in Gurgaon connecting to Ottawa, where it seems to heavily favour JPEG, but this then makes the experience unusable. In a few of these cases I have recommended they turn off JPEG, and they say the experience is much better. This suggests to me that the decision of when to send a JPEG frame instead of the lossless types might have some worst case conditions where it decides wrong?
 
> Have you made any changes to the compression settings?

No changes when I originally tried it and saw the symptoms. This would have been with the default -QualityLevel of 8 I think, supposedly imperceptible? Not imperceptible to me it seems? :-)

I then tried adjusting QualityLevel and CompressLevel to minimize the impact, and investigated these options for the first time. I've since settled on -QualityLevel 4 -CompressLevel 1, which seems to be the best compromise I could come up with for fast screen refresh, low system overhead, and minimal visual artifacts. You would expect -QualityLevel 4 to make the artifacts more perceptible, but for my own purposes it seems to make things better. My best guess right now is that if it takes less time to compress the image and fewer packets to send it, then it can more quickly get to correcting the image, and as long as it corrects the part of the image I am looking at first, the issue is less perceptible to me. When I quickly scroll a rich web page in Firefox, or have logs streaming by in the background (things I can't really focus on properly in the first place), the lower quality setting seems better at providing the appearance of responsiveness, and when things slow down enough to focus, it quickly fixes the part of the screen I care about. Basically, I think this new feature may enable a lower QualityLevel.
 
> Which TigerVNC server is this?

It's now TigerVNC 1.9.0 on all points, started from systemd with "vncserver -autokill". So it's a virtual frame buffer (Xvnc), without any need for screen scraping.

 
> > I'm wondering if the sensitivity for when to use this feature might be too
> > high. What is the threshold for this taking effect?
> The feature is always running as soon as JPEG is used. How quick it is
> depends on how much bandwidth it is detecting. It tries to make sure the
> repair is only using "spare" bandwidth.

I have since additionally noticed a panning effect. In cases where large areas of the screen are refreshed, I've noticed exactly what you describe: it seems to repair in blocks. I think I remember the blocks being repaired left to right, top to bottom. It's all very quick, but I see it as a shimmer, with a short but clear gap between one block being processed and the next.
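The panning effect described here is what you would expect if lossy areas were repaired a few blocks at a time, in raster order, within a per-pass byte budget. This toy model only illustrates that behaviour. The per-block cost, the budget and the function name are hypothetical and do not describe TigerVNC's actual repair scheduling.

```python
def repair_passes(lossy_blocks, cost_per_block: int, budget_per_pass: int):
    """Group lossy (x, y) blocks into repair passes in raster order.

    Blocks are sorted top-to-bottom, then left-to-right, and each pass takes
    as many blocks as its byte budget allows (at least one, so repair always
    makes progress).
    """
    order = sorted(lossy_blocks, key=lambda b: (b[1], b[0]))  # sort by (y, x)
    per_pass = max(1, budget_per_pass // cost_per_block)
    return [order[i:i + per_pass] for i in range(0, len(order), per_pass)]


# Four 64x64 blocks left lossy by a JPEG update; the budget covers two per pass.
blocks = [(64, 64), (0, 0), (64, 0), (0, 64)]
print(repair_passes(blocks, cost_per_block=3000, budget_per_pass=6000))
# [[(0, 0), (64, 0)], [(0, 64), (64, 64)]]
```

Each pass would appear on screen as one step of the sweep, so a large lossy area is cleaned up as a visible top-to-bottom wave rather than all at once.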
 
> > Also, I wonder why the few lines above might be flickering, since according
> > to my understanding - they are not changing. Why isn't only the cursor and
> > characters being inserted refreshing? Why might it stretch a row or two of
> > text above? It makes me wonder if there might be a calculation problem
> > here, where the JPEG image being applied is larger than it needs to be, and
> > this makes the correction larger than it needs to be?
> Sometimes a larger part of the screen is updated by the application than
> strictly necessary. However a default configured TigerVNC will detect
> such redundant changes, so it is a bit odd...

This is the part that I really think is the biggest concern. I can regularly see a much larger part of the screen refreshing than I think should be necessary. I will see if I can figure out how to video it to show you. As I move the cursor, it clearly seems like a whole section shimmers/flashes as I type: from 3 or 4 lines above the cursor, and 5 or 10 characters to the left and right of it. I think if the clip region were smaller, I wouldn't be able to see anything.

Also, I don't understand why it would be sending JPEG for something this small? Is there a minimum JPEG size somewhere in there, and the logic to send JPEG is kicking in unnecessarily, and then the logic to repair the JPEG is kicking in unnecessarily as a result?

> > I suspect most people wouldn't notice this. I also have a particular
> > configuration which might be unusual and might exaggerate the impact of
> > artifacts. In my scenario, I typically have a VNC client on Windows 10,
> > connecting to a CentOS 7 VM, and then I have a VPN connection from the
> > CentOS 7 VM connecting to a remote Oracle Linux 7 server. However, I don't
> > think this is the cause of the problem - it is just that my scenario might
> > make it slightly more perceptible. Also, to be clear - I have TigerVNC
> > 1.9.0 as client and server on all points.
> So you run a second vncviewer on the CentOS machine to connect to a
> second server on the Oracle machine?

Yes:

1) User (me) on Windows 10, running TigerVNC 1.9.0 x86_64 from bintray.
2) Server on CentOS 7, running custom built TigerVNC 1.9.0 x86_64 from source.
3) Client on CentOS 7, inside the CentOS 7 session started in 2, running the same client built for 2.
4) Server on Oracle Linux 7, running custom built TigerVNC 1.9.0 x86_64 from source.

I could try RHEL as well if that would help, but I don't expect any difference as they are all really the same software versions in place. Since I last posted:

1) The connection from 1)-2) above is now -CompressLevel 1 -NoJpeg, which should eliminate that leg from being a factor.
2) I have applied a few additional patches since 1.9.0 that seemed important for other reasons. They have not made a difference to the symptoms I report above.

I suppose there is some possibility that the RHEL/CentOS/OL builds I am doing are using the GCC that comes with these, and it has a bug. That seemed a remote possibility to me though, and also difficult to test for. :-) I'm more hoping that by describing my symptoms, it will clue the experts such as yourself on what code to look at, and you'll easily spot the place where "-1" should be "+1" or some such accident. :-)

--
Mark Mielke <mark....@gmail.com>

Mark Mielke

Aug 13, 2018, 7:03:14 PM
to Pierre Ossman, tigervn...@googlegroups.com
One important clarification below which I'll correct in place... by "cursor", I mean "text cursor", not "mouse cursor". The "mouse cursor" has no artifacts.

This is the part that I really think is the biggest concern. I can regularly see a much larger part of the screen refreshing than I think should be necessary. I will see if I can figure out how to video it to show you. As I move the text cursor, it clearly seems like a whole section shimmers/flashes as I type: from 3 or 4 lines above the text cursor, and 5 or 10 characters to the left and right of it. I think if the clip region were smaller, I wouldn't be able to see anything.

--
Mark Mielke <mark....@gmail.com>

Mark Mielke

Aug 13, 2018, 7:19:02 PM
to Pierre Ossman, tigervn...@googlegroups.com
And here is the link to the screen capture:


I was able to capture some worst case examples here. When Emacs updates the cursor location in response to arrow keys, it also updates the line number in the status bar and often you can see the artifacts in both locations. Always brief and almost imperceptible - but I think you will easily see it in several places in this video clip.

Thanks for looking at this. Let me know how I can help!

--
Mark Mielke <mark....@gmail.com>

Pierre Ossman

Aug 15, 2018, 4:06:49 AM
to Mark Mielke, tigervn...@googlegroups.com
On 14/08/18 00:57, Mark Mielke wrote:
>
> I have seen another symptom which might be similar with users in Gurgaon
> connecting to Ottawa, where it seems to heavily favour JPEG, but this then
> makes the experience unusable. In a few of these cases I have recommended
> they turn off JPEG, and they say the experience is much better. This
> suggests to me that the decision of when to send a JPEG frame instead of
> the lossless types might have some worst case conditions where it decides
> wrong?
>

The algorithm isn't terribly complex and is definitely on the todo list
to be improved.

>
> No changes when I originally tried it and saw the symptoms. This would have
> been with the default -QualityLevel of 8 I think, supposedly imperceptible?
> Not imperceptible to me it seems? :-)
>

It was until it started blinking. ;)

>
> Also, I don't understand why it would be sending JPEG for something this
> small? Is there a minimum JPEG size somewhere in there, and the logic to
> send JPEG is kicking in unnecessarily, and then the logic to repair the
> JPEG is kicking in unnecessarily as a result?
>

The algorithm was written with the idea that CPU was the bottleneck and
using JPEG was generally faster. So it can be overly aggressive in using
JPEG.
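Here is a simplified picture of how a small text update can still end up as JPEG. An encoder picker that counts colours sends few-colour rectangles losslessly but treats anything with many colours as photo-like. Anti-aliased glyph edges produce many intermediate shades, so with JPEG allowed, such a heuristic picks the lossy encoder. The category names mirror those in the server logs in this thread, but the thresholds and the function are made up for illustration and are not TigerVNC's actual logic.

```python
def choose_encoding(pixels: list[int], jpeg_allowed: bool,
                    max_indexed_colours: int = 16) -> str:
    """Toy colour-count heuristic; thresholds are illustrative only."""
    colours = len(set(pixels))
    if colours == 1:
        return "solid"
    if colours == 2:
        return "bitmap"
    if colours <= max_indexed_colours:
        return "indexed"
    # Many colours look "photo-like"; a CPU-oriented heuristic reaches for
    # JPEG here, even if the rectangle is really anti-aliased text.
    return "jpeg" if jpeg_allowed else "full_colour"


plain_text = [0, 255] * 32             # hard-edged text: two colours
aa_text = [i * 6 for i in range(40)]   # anti-aliased edges: 40 grey levels
print(choose_encoding(plain_text, jpeg_allowed=True))  # bitmap
print(choose_encoding(aa_text, jpeg_allowed=True))     # jpeg
print(choose_encoding(aa_text, jpeg_allowed=False))    # full_colour
```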

>
> 1) The connection from 1)-2) above is now -CompressLevel 1 -NoJpeg, which
> should eliminate that leg from being a factor.

Do you also use -CompressLevel 1 for the second hop? Because your video
clearly shows that something isn't working with the comparison logic,
and that setting unfortunately has the side effect of disabling it.

Could you try using -CompressLevel 2 instead? Or start the server with
-CompareFB 1?
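The interaction described above can be written as a small decision function. This is a model based on this thread, not code from TigerVNC. It assumes that -CompareFB accepts 0 (never), 1 (always) and 2 (automatic, assumed to be the default), and that in automatic mode the comparison is skipped when the client asks for -CompressLevel 0 or 1. Only the level-1 case is confirmed in this thread; treating level 0 the same way is an assumption.

```python
from typing import Optional


def comparison_enabled(compare_fb: int, compress_level: Optional[int]) -> bool:
    """Model (from this thread) of whether framebuffer comparison runs.

    compare_fb     -- assumed -CompareFB semantics: 0 never, 1 always, 2 automatic
    compress_level -- the client's -CompressLevel, or None if it sent none
    """
    if compare_fb == 0:
        return False
    if compare_fb == 1:
        return True
    # Automatic mode: a very low compression level is taken as "save CPU",
    # which also switches the comparison off (level 0 assumed to match level 1).
    return compress_level is None or compress_level >= 2


print(comparison_enabled(2, 1))  # False: the -CompressLevel 1 setup
print(comparison_enabled(2, 2))  # True: switching the client to -CompressLevel 2
print(comparison_enabled(1, 1))  # True: starting the server with -CompareFB 1
```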

Mark Mielke

Aug 15, 2018, 4:26:43 AM
to Pierre Ossman, tigervn...@googlegroups.com
On Wed, Aug 15, 2018 at 4:06 AM Pierre Ossman <oss...@cendio.se> wrote:
> On 14/08/18 00:57, Mark Mielke wrote:
> > Also, I don't understand why it would be sending JPEG for something this
> > small? Is there a minimum JPEG size somewhere in there, and the logic to
> > send JPEG is kicking in unnecessarily, and then the logic to repair the
> > JPEG is kicking in unnecessarily as a result?
> The algorithm was written with the idea that CPU was the bottleneck and
> using JPEG was generally faster. So it can be overly aggressive in using
> JPEG.

Why such a big area for JPEG, though? Why not just the before and after of where the text cursor was, and the pixels that changed in the frame buffer from before and after? I'm feeling doubtful that xfce4-terminal or other is flashing the screen... but maybe it is?
 
> > 1) The connection from 1)-2) above is now -CompressLevel 1 -NoJpeg, which
> > should eliminate that leg from being a factor.
> Do you also use -CompressLevel 1 for the second hop? Because your video
> clearly shows that something isn't working with the comparison logic,
> and that setting unfortunately has the side effect of disabling it.

Yes. I thought CompressLevel 1 would minimize latency and CPU overhead, and it seemed to have the best results for me in reducing my perception of this artifact.

> Could you try using -CompressLevel 2 instead? Or start the server with
> -CompareFB 1?

If the -CompareFB 1 is important, I can try it. But I would lose my dozens of active windows and sessions, so if CompressLevel 2 is also satisfactory, please see this link:


On perception, I found CompressLevel 2 to make things worse. The MP4 has mostly captured the experience that I see. (I had been worried the MP4 would introduce its own artifacts, or soften the artifacts I did see... but it's pretty close...)

--
Mark Mielke <mark....@gmail.com>

Mark Mielke

Aug 15, 2018, 4:29:11 AM
to Pierre Ossman, tigervn...@googlegroups.com
Sorry... wrong link... that was the prior one...


--
Mark Mielke <mark....@gmail.com>

Pierre Ossman

Aug 15, 2018, 4:40:49 AM
to Mark Mielke, tigervn...@googlegroups.com
On 15/08/18 10:26, Mark Mielke wrote:
>
> Why such a big area for JPEG, though? Why not just the before and after of
> where the text cursor was, and the pixels that changed in the frame buffer
> from before and after? I'm feeling doubtful that xfce4-terminal or other is
> flashing the screen... but maybe it is?
>

Probably not the terminal, but rather your window manager. This is
common with compositing where the wm is lazy and just updates the entire
window (which is usually cheap since it is accelerated by a GPU, but
very troublesome for things like VNC).

>
> If the -CompareFB 1 is important, I can try it. But I would lose my dozens
> of active windows and sessions, so if CompressLevel 2 is also satisfactory,
> please see this link:
>

They should have the same effect, but apparently it's not working...

Just to make sure, could you try -CompressLevel 2 on the first client as
well?

And could you open the options dialogs on both clients while connected
to verify that the setting has truly taken effect?

You can also check the logs of the two VNC servers. The comparing
tracker will output some statistics on each disconnect. It should
indicate a 1:1 ratio (i.e. that it wasn't used).

Mark Mielke

Aug 15, 2018, 4:58:52 AM
to Pierre Ossman, tigervn...@googlegroups.com
On Wed, Aug 15, 2018 at 4:40 AM Pierre Ossman <oss...@cendio.se> wrote:
> On 15/08/18 10:26, Mark Mielke wrote:
> > If the -CompareFB 1 is important, I can try it. But I would lose my dozens
> > of active windows and sessions, so if CompressLevel 2 is also satisfactory,
> > please see this link:
> They should have the same effect, but apparently it's not working...
> Just to make sure, could you try -CompressLevel 2 on the first client as
> well?

Sorry, I should have been clear.

I changed the first hop to -CompressLevel 2 -NoJpeg, and the second hop to -CompressLevel 2 -QualityLevel 4.
 
> And could you open the options dialogs on both clients while connected
> to verify that the setting has truly taken effect?

I have F8 on the outer session, and F7 on the inner session, and they are both aligned with the above settings.
 
> You can also check the logs of the two VNC servers. The comparing
> tracker will output some statistics on each disconnect. It should
> indicate a 1:1 ratio (i.e. that it wasn't used).

The first hop server had:

Wed Aug 15 04:10:48 2018
 VNCSConnST:  Server default pixel format depth 24 (32bpp) little-endian rgb888
 VNCSConnST:  Client pixel format depth 24 (32bpp) little-endian rgb888

Wed Aug 15 04:17:38 2018
 Connections: closed: 192.168.1.140::50314 (Clean disconnection)
 EncodeManager: Framebuffer updates: 1273
 EncodeManager:   Tight:
 EncodeManager:     Solid: 1.605 krects, 16.6545 Mpixels
 EncodeManager:            25.0781 KiB (1:2594.92 ratio)
 EncodeManager:     Bitmap RLE: 262 rects, 52.233 kpixels
 EncodeManager:                 7.72363 KiB (1:26.8145 ratio)
 EncodeManager:     Indexed RLE: 2.443 krects, 11.844 Mpixels
 EncodeManager:                  2.70307 MiB (1:16.7252 ratio)
 EncodeManager:     Full Colour: 1.719 krects, 18.9703 Mpixels
 EncodeManager:                  8.48579 MiB (1:8.53021 ratio)
 EncodeManager:   Total: 6.029 krects, 47.5211 Mpixels
 EncodeManager:          11.2209 MiB (1:16.1616 ratio)
 TLS:         TLS session wasn't terminated gracefully
 ComparingUpdateTracker: 2.29322 Gpixels in / 46.1928 Mpixels out
 ComparingUpdateTracker: (1:49.6446 ratio)
Window manager warning: Invalid WM_TRANSIENT_FOR window 0x38000f1 specified for 0x38000f8 (Network Co).

The second hop server had:

Wed Aug 15 04:11:31 2018
 VNCSConnST:  Server default pixel format depth 24 (32bpp) little-endian rgb888
 VNCSConnST:  Client pixel format depth 24 (32bpp) little-endian rgb888

Wed Aug 15 04:17:29 2018
 Connections: closed: 10.176.154.179::52376 (Clean disconnection)
 EncodeManager: Framebuffer updates: 1229
 EncodeManager:   Tight:
 EncodeManager:     Solid: 243 rects, 3.14434 Mpixels
 EncodeManager:            3.79688 KiB (1:3235.67 ratio)
 EncodeManager:     Bitmap RLE: 106 rects, 26.722 kpixels
 EncodeManager:                 3.17969 KiB (1:33.2187 ratio)
 EncodeManager:     Indexed RLE: 1.184 krects, 4.42461 Mpixels
 EncodeManager:                  954.189 KiB (1:18.128 ratio)
 EncodeManager:     Full Colour: 370 rects, 1.35482 Mpixels
 EncodeManager:                  372.076 KiB (1:14.2352 ratio)
 EncodeManager:   Tight (JPEG):
 EncodeManager:     Full Colour: 753 rects, 6.42291 Mpixels
 EncodeManager:                  2.16377 MiB (1:11.3275 ratio)
 EncodeManager:   Total: 2.656 krects, 15.3734 Mpixels
 EncodeManager:          3.46577 MiB (1:16.93 ratio)
 TLS:         TLS session wasn't terminated gracefully
 ComparingUpdateTracker: 185.522 Mpixels in / 9.86783 Mpixels out
 ComparingUpdateTracker: (1:18.8007 ratio)
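As a quick check on how heavily the second hop leaned on JPEG, the per-encoding pixel counts from the log above can be converted into shares of the total. The numbers are copied from the log, and the script is just arithmetic. With these figures, roughly 42% of the pixels went out as JPEG.

```python
# Pixel counts in Mpixels, copied from the second-hop server log above.
second_hop = {
    "Solid": 3.14434,
    "Bitmap RLE": 0.026722,
    "Indexed RLE": 4.42461,
    "Full Colour": 1.35482,
    "Tight (JPEG) Full Colour": 6.42291,
}

total = sum(second_hop.values())  # about 15.3734, matching the logged total
for name, mpix in second_hop.items():
    # Share of all encoded pixels that used this encoding.
    print(f"{name:26s} {mpix / total:6.1%}")
```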

If it's stumping you... I'm starting to wonder about the RHEL/OL 7.5 side build. I use a variant of the RPM spec file that comes with the source. It results in:

-bash-4.2$ rpm -q xorg-x11-server-Xorg
xorg-x11-server-Xorg-1.19.5-5.el7.x86_64

-bash-4.2$ ldd /usr/bin/Xvnc | grep jpeg
libjpeg.so.62 => /usr/lib64/libjpeg.so.62 (0x00007effb3341000)

-bash-4.2$ rpm -q -f /usr/lib64/libjpeg.so.62
libjpeg-turbo-1.2.90-5.el7.x86_64

I'm now wondering if the RHEL/OL 7.5 libjpeg-turbo-1.2.90-5.el7 might introduce artifacts?

Tomorrow I'm going to go into work and see if I can reproduce the artifacts in reverse, connecting back home. Not sure if it'll prove or disprove anything, but it will add a data point for consideration.

--
Mark Mielke <mark....@gmail.com>

Pierre Ossman

Aug 16, 2018, 7:39:28 AM
to Mark Mielke, tigervn...@googlegroups.com
On 15/08/18 10:58, Mark Mielke wrote:
>
> The first hop server had:
>
> ComparingUpdateTracker: 2.29322 Gpixels in / 46.1928 Mpixels out
> ComparingUpdateTracker: (1:49.6446 ratio)
>

Brain fart. It should of course _not_ have a 1:1 ratio. And it doesn't
here, meaning the comparisons are running.
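The ratio in these lines is the number of pixels reported as changed divided by the number that actually differed after comparison. Anything well above 1:1 therefore means the comparison is discarding redundant updates. The small parser below reproduces the logged ratios. It is a hypothetical helper that assumes the log format shown above, with decimal k/M/G prefixes; that assumption is consistent with the logged ratios.

```python
import re

# Decimal SI prefixes as they appear in the "pixels in / pixels out" lines.
SI = {"": 1.0, "k": 1e3, "M": 1e6, "G": 1e9}


def cut_ratio(line: str) -> float:
    """Return pixels-in / pixels-out from a ComparingUpdateTracker log line."""
    m = re.search(r"([\d.]+) ([kMG]?)pixels in / ([\d.]+) ([kMG]?)pixels out", line)
    if m is None:
        raise ValueError("not a ComparingUpdateTracker statistics line")
    pixels_in = float(m.group(1)) * SI[m.group(2)]
    pixels_out = float(m.group(3)) * SI[m.group(4)]
    return pixels_in / pixels_out


print(round(cut_ratio("ComparingUpdateTracker: 2.29322 Gpixels in / 46.1928 Mpixels out"), 1))  # 49.6
print(round(cut_ratio("ComparingUpdateTracker: 185.522 Mpixels in / 9.86783 Mpixels out"), 1))  # 18.8
```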

> ComparingUpdateTracker: 185.522 Mpixels in / 9.86783 Mpixels out
> ComparingUpdateTracker: (1:18.8007 ratio)
>

Same thing here. So why are you seeing those artefacts...

Can you do a setup with just a single hop and see if the problem remains?

>
> Tomorrow I'm going to go into work and see if I can reproduce the artifacts
> in reverse, connecting back home. Not sure if it'll prove or disprove
> anything, but it will add a data point for consideration.
>

Please do. I have no good idea what's going on right now.

Mark Mielke

Aug 20, 2018, 1:01:52 AM
to Pierre Ossman, tigervn...@googlegroups.com
On Thu, Aug 16, 2018 at 7:39 AM Pierre Ossman <oss...@cendio.se> wrote:
> On 15/08/18 10:58, Mark Mielke wrote:
> >   ComparingUpdateTracker: 185.522 Mpixels in / 9.86783 Mpixels out
> >   ComparingUpdateTracker: (1:18.8007 ratio)
> Same thing here. So why are you seeing those artefacts...
> Can you do a setup with just a single hop and see if the problem remains?
 
> > Tomorrow I'm going to go into work and see if I can reproduce the artifacts
> > in reverse, connecting back home. Not sure if it'll prove or disprove
> > anything, but it will add a data point for consideration.
> Please do. I have no good idea what's going on right now.

I haven't finished investigating (in between other things). But some strangeness to keep you up-to-date on:

1) I tried from a Windows 7 notebook over VPN, connecting directly with only one hop. Result: I didn't see the artifacts.
2) I tried from a Windows 7 notebook (with integrated GPU?) over wireless, with the same two hops. Result: I didn't see the artifacts.
3) I tried from a Windows 10 desktop (with nVidia dedicated GPU) over a 1 Gbit/s network, with the same two hops. Result: Artifacts visible as before.

I don't know which is the variable here that is causing the artifacts, but I am planning to come up with tests to try and isolate:

1) Windows 7 vs Windows 10 as client?
2) Integrated GPU vs nVidia dedicated GPU? I tried to disable the 2D/3D "enhancements"... no change so far, but I'm also not sure I've successfully disabled them yet.
3) Monitor? (I tried to adjust the monitor "enhancements"... no change. It also wouldn't make sense, since I was able to record the problem, and the recording is captured before the image ever reaches the monitor...)
4) Wireless vs Wired networking? Could this muck with the algorithm for guessing available bandwidth? But, the problematic hop seems to be one hop removed from this?

I have a gut sense that there is a "flash" involved in one of the legs, where the screen goes blank and then gets filled, and this is long enough to trigger the update tracker to consider it a new frame, which is then long enough for it to require a JPEG update to refresh, which then causes the JPEG image to get corrected. But, this could be a whole lot of useless conjecture as I only barely know what I'm talking about. :-)

--
Mark Mielke <mark....@gmail.com>

Mark Mielke
Aug 20, 2018, 2:43:32 AM
to Pierre Ossman, tigervn...@googlegroups.com
On Mon, Aug 20, 2018 at 1:01 AM Mark Mielke <mark....@gmail.com> wrote:
> I haven't finished investigating (in between other things). But some strangeness to keep you up-to-date on:
>
> 1) I tried from Windows 7 notebook over VPN direct with one hop only. Result: I didn't see the artifacts.
> 2) I tried from Windows 7 notebook (with integrated GPU?) over wireless, with the same two hops, Result: I didn't see the artifacts.
> 3) I tried from Windows 10 desktop (with nVidia dedicated GPU) over 1 Gbit/s network, with the same two hops. Result: Artifacts visible as before.
>
> I don't know which is the variable here that is causing the artifacts, but I am planning to come up with tests to try and isolate:
>
> 1) Windows 7 vs Windows 10 as client?
> 2) Integrated GPU vs nVidia dedicated GPU? I tried to disable 2D/3D "enhancements"... no change so far, but I'm also not sure I successfully disabled it yet either.
> 3) Monitor? (I tried to adjust the monitoring "enhancements"... no change, and also that doesn't make sense that I could record the problem as that should be downstream of the monitor...)
> 4) Wireless vs Wired networking? Could this muck with the algorithm for guessing available bandwidth? But, the problematic hop seems to be one hop removed from this?

Another interesting symptom... If I turn off JPEG on the fly for the second hop using the options panel, the artifact disappears, as I mentioned earlier. But if I turn JPEG back on the same way, the artifact does not return. No more bad behaviour. What does this mean?

1) When it re-initialized the JPEG compression (noJpeg / qualityLevel), it does it better in some way the second time?
2) JPEG compression doesn't actually get re-activated?

I'm trying to understand the code, and I think JPEG should turn back on right away, which would make 1) the right explanation?

But I can't yet see where it might initialize better the second time...
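To reason about question 2), it helps to recall how a viewer asks for JPEG. In the RFB protocol the viewer lists its encodings in a SetEncodings message; JPEG quality is requested through the pseudo-encodings -32 to -23 (quality 0 to 9), and a server using Tight encoding is expected to use JPEG only when one of those is present. The sketch assumes the viewer maps noJpeg/qualityLevel onto that list in this way; the function name, parameter names and encoding order are hypothetical, not TigerVNC's code.

```python
# Sketch of how a viewer might build its SetEncodings list from noJpeg/qualityLevel.
# Encoding numbers follow the RFB protocol; the function itself is hypothetical.
from typing import List

RAW = 0
COPYRECT = 1
HEXTILE = 5
TIGHT = 7
ZRLE = 16
COMPRESS_LEVEL_0 = -256  # compression level pseudo-encodings: -256 .. -247
QUALITY_LEVEL_0 = -32    # JPEG quality level pseudo-encodings: -32 .. -23


def build_encodings(no_jpeg: bool, quality_level: int, compress_level: int = 2) -> List[int]:
    """Return the encodings a viewer would advertise, preferred encoding first."""
    if not 0 <= quality_level <= 9:
        raise ValueError("quality_level must be in 0..9")
    if not 0 <= compress_level <= 9:
        raise ValueError("compress_level must be in 0..9")
    # The order of the real encodings is an assumption for illustration.
    encodings = [TIGHT, COPYRECT, ZRLE, HEXTILE, RAW]
    encodings.append(COMPRESS_LEVEL_0 + compress_level)
    if not no_jpeg:
        # Advertising a quality level is what permits the server to use JPEG.
        encodings.append(QUALITY_LEVEL_0 + quality_level)
    return encodings


if __name__ == "__main__":
    print(build_encodings(no_jpeg=False, quality_level=8))
    print(build_encodings(no_jpeg=True, quality_level=8))
```

Under this model, turning JPEG off and back on produces exactly the same encoding list as the original connection, so the viewer would be asking for JPEG again. A lasting change after the toggle would then have to come from state kept elsewhere, for example on the server.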

Pierre Ossman
Aug 21, 2018, 9:00:35 AM
to Mark Mielke, tigervn...@googlegroups.com
On 20/08/18 07:01, Mark Mielke wrote:
>
> I haven't finished investigating (in between other things). But some
> strangeness to keep you up-to-date on:
>
> 1) I tried from Windows 7 notebook over VPN direct with one hop only.
> Result: I didn't see the artifacts.
> 2) I tried from Windows 7 notebook (with integrated GPU?) over wireless,
> with the same two hops, Result: I didn't see the artifacts.
> 3) I tried from Windows 10 desktop (with nVidia dedicated GPU) over 1
> Gbit/s network, with the same two hops. Result: Artifacts visible as before.
>

That's very odd. None of those things should have any effect. The only
thing I can think of is that the Windows 7 machine has poor contrast so
the artefacts aren't as noticeable.

It would have been nice if that Windows 10 machine could have been
tested with a single hop as well.

>
> I have a gut sense that there is a "flash" involved in one of the legs,
> where the screen goes blank and then gets filled, and this is long enough
> to trigger the update tracker to consider it a new frame, which is then
> long enough for it to require a JPEG update to refresh, which then causes
> the JPEG image to get corrected. But, this could be a whole lot of useless
> conjecture as I only barely know what I'm talking about. :-)
>

Possibly. But I'm not sure what component would trigger such a clear. We
don't do that in the viewer at least.

> Another interesting symptom... If I turn off JPEG on the fly for the second
> hop using the option panel, the artifact disappears as I mentioned earlier,
> but if I turn JPEG back on using the same mechanism, the artifact does not
> return. No more bad behaviour. What does this mean?
>
> 1) When it re-initialized the JPEG compression (noJpeg / qualityLevel), it
> does it better in some way the second time?
> 2) JPEG compression doesn't actually get re-activated?

That shouldn't happen, so sounds like some form of bug. Let's focus on
one thing at a time here, so avoid changing settings on the fly. :)

Have you verified that auto mode is properly disabled? Otherwise we
might be seeing some interaction with that.

Are you running the standard GNOME environment on both servers?
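The question about auto mode matters because an automatic mode can replace the manually chosen quality with one derived from measured bandwidth, so the value given on the command line may not be what is actually requested. A hypothetical model of that interaction follows; the `Settings` fields are illustrative, and the bandwidth threshold and returned quality values are placeholders, not TigerVNC's actual numbers.

```python
# Hypothetical model of an automatic mode overriding manual JPEG settings.
# The threshold and quality values below are PLACEHOLDERS for illustration only.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    auto_select: bool
    no_jpeg: bool
    quality_level: int


def effective_quality(settings: Settings, measured_kbps: int) -> Optional[int]:
    """Return the JPEG quality actually requested, or None if JPEG is off."""
    if not settings.auto_select:
        # Manual mode: the user's choice is used as-is.
        return None if settings.no_jpeg else settings.quality_level
    # Auto mode (in this model): measured bandwidth decides, the manual value is ignored.
    if measured_kbps > 16000:  # placeholder threshold
        return 8
    return 6


if __name__ == "__main__":
    manual = Settings(auto_select=False, no_jpeg=False, quality_level=4)
    auto = Settings(auto_select=True, no_jpeg=False, quality_level=4)
    print(effective_quality(manual, 100_000), effective_quality(auto, 100_000))
```

If auto mode were still active, a new bandwidth estimate after changing settings could shift the effective quality, which is one way toggling options might appear to change behaviour.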

Mark Mielke
Sep 29, 2018, 2:26:57 PM
to Pierre Ossman, tigervn...@googlegroups.com
On Tue, Aug 21, 2018 at 9:00 AM Pierre Ossman <oss...@cendio.se> wrote:
> On 20/08/18 07:01, Mark Mielke wrote:
>> Another interesting symptom... If I turn off JPEG on the fly for the second
>> hop using the option panel, the artifact disappears as I mentioned earlier,
>> but if I turn JPEG back on using the same mechanism, the artifact does not
>> return. No more bad behaviour. What does this mean?
>>
>> 1) When it re-initialized the JPEG compression (noJpeg / qualityLevel), it
>> does it better in some way the second time?
>> 2) JPEG compression doesn't actually get re-activated?
>
> That shouldn't happen, so sounds like some form of bug. Let's focus on
> one thing at a time here, so avoid changing settings on the fly. :)
>
> Have you verified that auto mode is properly disabled? Otherwise we
> might be seeing some interaction with that.
>
> Are you running the standard GNOME environment on both servers?

I noticed your recent patches related to the lossless refresh. I applied them, and at first there was no real change. However, after I removed the "-QualityLevel 4" that I had been using to try to minimize the symptoms, so far so good: I'm no longer seeing the brief obscured pixels around the cursor when it moves, or the changing tint of colors (particularly green?) in a larger screen of moving text.

I have still never managed to reproduce it with a single hop; it always takes the two hops. It always seems worse when moving the text cursor in my Xfce terminals than in Firefox or Chrome (where I normally don't notice any artifacts). I've also not heard a single complaint from any of the people who use this particular build, so my double-hop scenario may be quite unusual (and it could be unusual in other ways too?).

I haven't had much time to look into this myself, and with my symptoms possibly cleared by the default "-QualityLevel 8", the itch to scratch might be gone (as long as I don't reduce -QualityLevel in future!). It still seems like my scenario is able to trigger a bug, but I don't know for sure that it is in TigerVNC. Perhaps it is in libjpeg on RHEL 7, or in Xorg?

I will try to find time to see if I can tickle this a little more and figure out what causes it. If you can think of anything more for me to try testing, please let me know.

Thanks for the other fixes though. TigerVNC continues to be a great solution and I love to see it honed.

Mark Mielke
Sep 30, 2018, 12:17:48 PM
to Pierre Ossman, tigervn...@googlegroups.com
On Sat, Sep 29, 2018 at 2:26 PM Mark Mielke <mark....@gmail.com> wrote:
> I noticed your recent patches related to the lossless refresh. I applied them and at first, there was no real change. However, when I removed "-QualityLevel 4" that I had been using to try and minimize the symptoms, so far so good. I'm not seeing the brief obscured pixels around the cursor when moving, or changing tint of colors (particularly green?) in a larger screen of text that is moving.
>
> I still have never managed to reproduce with a single hop. It's always with the two hops. It always seems worse moving my text cursor in my Xfce terminals than Firefox or Chrome (for which I normally don't notice any artifacts). I've also not heard a single complaint from any of the people that use this particular build, so I may be very unique with my double hop scenario (and could be unique in other ways?).
>
> I haven't had a lot of time to look into this myself, and with my symptoms possibly cleared with the default of "-QualityLevel 8", the itch to scratch might be gone (as long as I don't reduce the -QualityLevel in future!). It still seems like my scenario is able to trigger a bug, but I don't know for sure it is in TigerVNC. Perhaps it is in libjpeg on RHEL 7, or Xorg?
>
> I will try to find time to see if I can tickle this a little more and figure out what causes it. If you can think of anything more for me to try testing, please let me know.

*big sigh*

I spoke too soon above. There were periods with no perceptible issue, but after enough use it came back again, just as disabling and re-enabling JPEG only makes it disappear for a while.

This is starting to smell like memory corruption to me now: a small number of wrong bits causing an intermittent effect. Perhaps something in the JPEG data structures? I wish I knew more about how that stuff worked...

Pierre Ossman
Oct 8, 2018, 4:05:56 AM
to Mark Mielke, tigervn...@googlegroups.com
On 30/09/2018 18:17, Mark Mielke wrote:
>
> *big sigh*
>
> I spoke too soon above. There were periods of no perceptible issue, but
> after sufficient use it came back again, similar to if I temporarily
> disable JPEG and re-enable it, it disappears for a period.
>
> This is starting to smell like memory corruption to me now. A small number
> of bits not right causing conditional impact. Perhaps something in the JPEG
> data structures? I wish I knew more about how that stuff worked...
>

General memory corruption usually results in worse symptoms than this,
so I don't think that is likely.

Do you think you can experiment with other configurations to pinpoint
when this happens? I'm trying to replicate it here but failing. So there
must be something specific about your setup that triggers this. E.g.
could you set up test machines with some other distribution?