vncviewer freezes sometimes


Daniel Pirkl

May 23, 2022, 3:48:37 AM
to TigerVNC Developer Discussion
Hello everybody,

I am using Slackware ARM 15.0 on a Raspberry Pi 4B. I am running vncviewer 1.12.0 on it; the Raspberry Pi works as a remote desktop to a workstation. The problem is that vncviewer freezes periodically after a few days. I have done some debugging and found that one thread is waiting in the producerCond->wait() function and four other threads are waiting in manager->consumerCond->wait(), while there is one message in the workQueue. I don't understand the code well, but I tried to make a simple patch.

--- /root/tigervnc-1.12.0/common/rfb/DecodeManager.cxx    2021-11-09 07:51:28.000000000 +0000
+++ DecodeManager.cxx    2022-05-19 07:38:33.752184358 +0000
@@ -190,7 +190,9 @@
 
   // We only put a single entry on the queue so waking a single
   // thread is sufficient
-  consumerCond->signal();
+  //XXX signal has been changed to broadcast
+  //consumerCond->signal();
+  consumerCond->broadcast();
 
   queueMutex->unlock();
 
@@ -202,6 +204,7 @@
   queueMutex->lock();
 
   while (!workQueue.empty())
+    //XXX one thread was waiting here
     producerCond->wait();
 
   queueMutex->unlock();
@@ -273,6 +276,7 @@
     entry = findEntry();
     if (entry == NULL) {
       // Wait and try again
+      //XXX four threads were waiting here
       manager->consumerCond->wait();
       continue;
     }

To my great surprise, the vncviewer stopped freezing. I don't fully understand the code, so I am informing you about this bug in the hope that you will make a proper fix.
 
Daniel Pirkl

Pierre Ossman

May 25, 2022, 3:04:23 AM
to Daniel Pirkl, TigerVNC Developer Discussion
On 23/05/2022 09:48, Daniel Pirkl wrote:
> Hello everybody,
>
> I am using Slackware ARM 15.0 on a Raspberry Pi 4B. I am running
> vncviewer 1.12.0 on it; the Raspberry Pi works as a remote desktop to a
> workstation. The problem is that vncviewer freezes periodically after a
> few days. I have done some debugging and found that one thread is
> waiting in producerCond->wait() and four other threads are waiting in
> manager->consumerCond->wait(), while there is one message in the
> workQueue. I don't understand the code well, but I tried to make a
> simple patch.
>
>

I'm afraid this is not an issue we've heard of before. And that code has
been around for many years. So it's a bit surprising to see issues now. :/

I don't think your patch fixes whatever issue is behind this. It's
likely only making it more rare, as more threads will be active and
fighting over work.

It's unfortunate that it takes so long to trigger. We would need more
details about the work queue to figure out why this happens. The
expected behaviour is that at least one of those threads should have
grabbed that remaining entry and processed it.

Do you have the details about what is in that last entry?

And what server are you using?

Regards
--
Pierre Ossman Software Development
Cendio AB https://cendio.com
Teknikringen 8 https://twitter.com/ThinLinc
583 30 Linköping https://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Daniel Pirkl

May 26, 2022, 5:28:04 AM
to TigerVNC Developer Discussion
On Wednesday, May 25, 2022 at 9:04:23 UTC+2, Pierre Ossman wrote:

> Do you have the details about what is in that last entry?
>
> And what server are you using?

I switched back to the original version. If vncviewer freezes again, I will send you more detailed info.

I am using the x11vnc server, version 0.9.16, on x86_64.

Daniel Pirkl

Jun 7, 2022, 1:23:43 AM
to TigerVNC Developer Discussion
The vncviewer froze again (it is running under gdb now). I am sending some info:

(gdb) info threads
  Id   Target Id                                Frame
* 1    Thread 0xb6fef5c0 (LWP 1866) "vncviewer" 0xb6bd9ed0 in __futex_abstimed_wait_cancelable64 () from /lib/libpthread.so.0
  2    Thread 0xb5c6a070 (LWP 1870) "vncviewer" 0xb6bd9ed0 in __futex_abstimed_wait_cancelable64 () from /lib/libpthread.so.0
  3    Thread 0xb5269070 (LWP 1871) "vncviewer" 0xb6bd9ed0 in __futex_abstimed_wait_cancelable64 () from /lib/libpthread.so.0
  4    Thread 0xb48ff070 (LWP 1872) "vncviewer" 0xb6bd9ed0 in __futex_abstimed_wait_cancelable64 () from /lib/libpthread.so.0
  5    Thread 0xb40fe070 (LWP 1873) "vncviewer" 0xb6bd9ed0 in __futex_abstimed_wait_cancelable64 () from /lib/libpthread.so.0
(gdb) bt
#0  0xb6bd9ed0 in __futex_abstimed_wait_cancelable64 () from /lib/libpthread.so.0
#1  0xb6bd2ab4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
#2  0x000735b0 in os::Condition::wait (this=<optimized out>) at /eldis/tigervnc-1.12.0/common/os/Mutex.cxx:123
#3  0x00054de4 in rfb::DecodeManager::flush (this=this@entry=0xf4a00) at /eldis/tigervnc-1.12.0/common/rfb/DecodeManager.cxx:208
#4  0x0004d5e8 in rfb::CConnection::framebufferUpdateEnd (this=this@entry=0xf4888) at /eldis/tigervnc-1.12.0/common/rfb/CConnection.cxx:493
#5  0x000333b4 in CConn::framebufferUpdateEnd (this=0xf4888) at /eldis/tigervnc-1.12.0/vncviewer/CConn.cxx:372
#6  0x000510f4 in rfb::CMsgReader::readMsg (this=0x124040) at /eldis/tigervnc-1.12.0/common/rfb/CMsgReader.cxx:204
#7  0x0004c748 in rfb::CConnection::processMsg (this=this@entry=0xf4888) at /eldis/tigervnc-1.12.0/common/rfb/CConnection.cxx:133
#8  0x000327ac in CConn::socketEvent (fd=4, data=0xf4888) at /eldis/tigervnc-1.12.0/vncviewer/CConn.cxx:254
#9  0xb6f20ed8 in ?? () from /usr/lib/libfltk.so.1.3
(gdb) up 3
#3  0x00054de4 in rfb::DecodeManager::flush (this=this@entry=0xf4a00) at /eldis/tigervnc-1.12.0/common/rfb/DecodeManager.cxx:208
208        producerCond->wait();
(gdb) print workQueue.size ()
$6 = 1
(gdb) print *workQueue.front ()
$7 = {active = false, rect = {tl = {x = 288, y = 320}, br = {x = 320, y = 324}}, encoding = 7, decoder = 0x127860, server = 0xf488c, pb = 0x124810, bufferStream = 0xfc618, affectedRegion = {rgn = 0x1afda8}}

Daniel Pirkl


Pierre Ossman

Jun 16, 2022, 5:21:27 AM
to Daniel Pirkl, TigerVNC Developer Discussion
On 07/06/2022 07:23, Daniel Pirkl wrote:
> (gdb) print *workQueue.front ()
> $7 = {active = false, rect = {tl = {x = 288, y = 320}, br = {x = 320, y =
> 324}}, encoding = 7, decoder = 0x127860, server = 0xf488c, pb = 0x124810,
> bufferStream = 0xfc618, affectedRegion = {rgn = 0x1afda8}}
>

That entry looks completely normal. I'm staring at the code, and I'm not
seeing any way where that entry can be left on the queue.

I can only think of two scenarios happening:

a) The queue is empty, all threads idle, that last rectangle is read by
the main thread, put on the queue, and somehow it fails to wake a thread
to consume it

b) This entry is blocked by another entry, which means at least one
thread is busy doing stuff. Once that thread finishes processing the
blocking entry, that previously blocked entry should be grabbed, but
isn't for some reason.


Scenario a) shouldn't be possible, as the main thread holds the queue
lock when adding the new entry, and keeps the lock whilst waking a
thread. So the entry and the signal should be safely paired.

On the other side, the threads only stop and wait for the signal after
holding the queue lock and checking if there are any entries. The lock
guarantees that this is done in a way where the signal cannot be missed.


Scenario b) should also not be possible. When the thread handling the
blocking rect finishes, it will both wake all other threads, and look at
the queue itself. The remaining entry is then at the front of the queue,
and not active. So the expected result is that the same thread will grab
the entry. And even if it doesn't, the three other threads should also
look at the queue and grab it.


Your workaround suggests that scenario a) is happening, as that's where
it will have the most effect. But I don't see how that's possible,
barring some system level (i.e. kernel or glibc) bug.


I'm not sure how to proceed here. Normally, some more diagnostic info
would be added (like prints). But that doesn't seem feasible, given how
long it takes to provoke the issue. :/