vglrun 25 minute delay but only with non-root user on AD-bound Linux server

55 views
Skip to first unread message

Ryan Salomon

unread,
Apr 21, 2022, 2:22:10 PM4/21/22
to VirtualGL User Discussion/Support
Hello! I have a Linux GPU server upon which I'm testing logging into via a TurboVNC session and running 3D apps via VirtualGL, which I've configured. 

I've tested vglrun -d :0 glxinfo and vglrun -d :0 glxgears , and both run perfectly fine as the root user, but run fine as me but with an insaaane 25 minute delay before they output anything. 

When running vglserver_config , I answered No to all of the questions because this host is sequestered sufficiently from the rest of our network (not to discuss security concerns, but to give background of the setup). 

Also for background of the setup, this is a CentOS 7.9 host, that is AD bound to our ActiveDirectory domain, and my user account that I tested with, as well as client accounts, are AD bound. 
I add this in case it is at all likely that the delay is from AD doing something silly. 

Ryan Salomon

unread,
Apr 27, 2022, 5:17:35 PM4/27/22
to VirtualGL User Discussion/Support
bump! Anyone have at least an ideal direction to look in for more information?

DRC

unread,
Apr 27, 2022, 6:34:26 PM4/27/22
to virtual...@googlegroups.com

Your message ended up in my spam folder for some reason, and I was out of the office last week anyhow.  Unfortunately I've never encountered those symptoms, so I have no good ideas.  The only symptoms I've observed that are even remotely similar are due to nVidia's HardDPMS feature, which causes 3D applications to run at 1 frame/second with VirtualGL if the screen saver is active on the 3D X server.  (Add

Option "HardDPMS" "false"

under the "Device" or "Screen" section in xorg.conf to work around that issue.)

Other shots in the dark:

- Double check that 'vglrun -d :0 glxinfo' reports a direct OpenGL context.  "Back in the day" (15 years ago), I seem to recall that, on some systems, insufficient 3D X server permissions resulted in an indirect OpenGL context rather than an error.  I can't imagine why that would cause a 25-minute delay, but it would almost certainly cause a delay.

- Try removing ~/.Xauthority and restarting the system, in case there is some cruft in that file that is causing problems.

- Try running 'vglrun /opt/VirtualGL/bin/glxspheres64' instead of 'vglrun glxgears' and observing whether the behavior is different.  That may provide a clue.

- Try running 'vglrun /opt/VirtualGL/bin/glxspheres64' and then 'vglrun -sp /opt/VirtualGL/bin/glxspheres64' and observing whether the behavior is different.  That may also provide a clue.

- On the off-hand chance that this is a TurboVNC problem, which version of TurboVNC are you using on the host and the client?  Can you provide more details about your network connection?  Try requesting a screen refresh from the TurboVNC Server (using Ctrl-Alt-Shift-R or the toolbar button.)

- Try setting VGL_PROBEGLX=0 in the environment prior to invoking vglrun, on the off-hand chance that VirtualGL's 2D X server GLX probing (which is unnecessary in a TurboVNC session) is causing issues.  (TurboVNC 3.0 will set that environment variable by default.)

DRC

Ryan Salomon

unread,
Apr 28, 2022, 10:59:13 AM4/28/22
to VirtualGL User Discussion/Support
The version of TurboVNC is TurboVNC Server v2.2.90 (build 20211222)

I had however tried version 3 and had encountered the same issues. 

Hmm. So far none of the suggested steps helped. 
Running vglrun glxspheres64, glxgears, etc results in a blank output window. 

I'm currently trying to run vglrun +tr glxinfo to see if it gives any useful output

Ryan Salomon

unread,
Apr 28, 2022, 3:54:35 PM4/28/22
to VirtualGL User Discussion/Support
Direct rendering says Yes. 

I ran vglrun +tr glxinfo, here's a snippet of the result, with a large gap that I had added manually by pressing enter when there was the extremely long delay. 
This might not help but it was my first thought in terms of possibly getting more visibility

 vglrun +tr glxinfo
[VGL 0x88d75800] XOpenDisplay (name=NULL dpy=0x00db5b00(:1.0) ) 2.102852 ms
name of display: :1.0
[VGL 0x88d75800] glXChooseVisual (dpy=0x00db5b00(:1.0) screen=0 attrib_list=[0x0004 0x0008=0x0001 0x0009=0x0001 0x000a=0x0001 0x000c=0x0001 0x000d=0x0001 0x000e=0x0001 0x000f=0x0001 0x0010=0x0001 0x0011=0x0001 0x0005 ] glxattribs=[0x000c=0x0001 0x000d=0x0001 0x000e=0x0001 0x000f=0x0001 0x0010=0x0001 0x0011=0x0001 0x0005=0x0001 0x0008=0x0001 0x0009=0x0001 0x000a=0x0001 0x8011=0x0001 0x8010=0x0006 ] [VGL] dlopen (filename=libGLX_nvidia.so.0 flag=1 retval=0x00dd2f30)
[VGL] dlopen (filename=libX11-xcb.so.1 flag=1 retval=0x00e392b0)
[VGL] dlopen (filename=libxcb.so.1 flag=1 retval=0x7fb888d78000)
[VGL] dlopen (filename=libxcb-glx.so.0 flag=1 retval=0x00e39290)
[VGL] dlopen (filename=libGLX_mesa.so.0 flag=1 retval=0x00e4ddc0)
[VGL] dlopen (filename=libGLX_mesa.so.0 flag=258 retval=0x00e4ddc0)
[VGL] dlopen (filename=/usr/lib64/dri/tls/swrast_dri.so flag=258 retval=0x00000000)
[VGL] dlopen (filename=/usr/lib64/dri/swrast_dri.so flag=258 retval=0x00e973b0)
vis=0x00e68f50(0x21) config=0x00e3bdd0(0x79) ) 117.562056 ms
[VGL 0x88d75800] glXChooseFBConfig (dpy=0x00db5b00(:1.0) screen=0 attrib_list=[0x8011=0x0001 0x0008=0x0001 0x0009=0x0001 0x000a=0x0001 0x0005=0x0000 ] glxattribs=[0x0005=0x0000 0x0008=0x0001 0x0009=0x0001 0x000a=0x0001 0x8011=0x0001 ] configs[0]=0x00e9dfc0(0x7f) configs[1]=0x00e9c520(0xab) configs[2]=0x00e66930(0x7b) configs[3]=0x00e65cc0(0xa7) configs[4]=0x00e5a340(0x83) configs[5]=0x00de8d90(0xaf) configs[6]=0x00ed1d60(0x89) configs[7]=0x00ea34c0(0xb5) configs[8]=0x00e37130(0x91) configs[9]=0x00dd4830(0xbd) configs[10]=0x00e9ef00(0x8b) configs[11]=0x00ea0ac0(0xb7) configs[12]=0x00e9e1a0(0x93) configs[13]=0x00e9c700(0xbf) configs[14]=0x00e9ae00(0x97) configs[15]=0x00e65ea0(0xc3) configs[16]=0x00e65270(0x9b) configs[17]=0x00e64640(0xc7) configs[18]=0x00e639e0(0x9f) configs[19]=0x00e62f50(0xcb) configs[20]=0x00e62360(0xa3) configs[21]=0x00e60a00(0xcf) configs[22]=0x00e5fd00(0x80) configs[23]=0x00e5f110(0xac) configs[24]=0x00e5e4a0(0x7c) configs[25]=0x00e5d840(0xa8) configs[26]=0x00e5cbe0(0x84) configs[27]=0x00e5bd80(0xb0) configs[28]=0x00e5b170(0x8a) configs[29]=0x00e5a520(0xb6) configs[30]=0x00e59820(0x92) configs[31]=0x00e39960(0xbe) configs[32]=0x00ea36b0(0x8c) configs[33]=0x00ea29a0(0xb8) configs[34]=0x00ea1d10(0x94) configs[35]=0x00e38060(0xc0) configs[36]=0x00e36760(0x98) configs[37]=0x00de9cb0(0xc4) configs[38]=0x00de8f70(0x9c) configs[39]=0x00e9ec90(0xc8) configs[40]=0x00e9d0d0(0xa0) configs[41]=0x00e9b810(0xcc) configs[42]=0x00e614e0(0xa4) configs[43]=0x00e9bac0(0xd0) *nelements=44 ) 0.370026 ms
[VGL 0x88d75800] glXGetProcAddressARB ((char *)procName=glXCreateContextAttribsARB [INTERPOSED]) 0.004768 ms
[VGL 0x88d75800] glXCreateContextAttribsARB (dpy=0x00db5b00(:1.0) config=0x00e9dfc0(0x7f) share_context=0x00000000 direct=1 attribs=[0x2091=0x0004 0x2092=0x0006 0x9126=0x0001 ] [VGL] dlopen (filename=NULL flag=1 retval=0x7fb888d9b150)
[VGL] dlopen (filename=NULL flag=1 retval=0x7fb888d9b150)
[VGL] dlopen (filename=libdbus-1.so.3 flag=1 retval=0x00f780f0)
[VGL] dlopen (filename=libdrm.so.2 flag=1 retval=0x00e57b70)
[VGL] dlopen (filename=liballocator.so.0 flag=1 retval=0x00000000)
ctx=0x00e80b48 ) 77.847004 ms
[VGL 0x88d75800] glXIsDirect (dpy=0x00db5b00(:1.0) ctx=0x00e80b48 direct=1 ) 0.001192 ms
[VGL 0x88d75800] glXGetVisualFromFBConfig (dpy=0x00db5b00(:1.0) config=0x00e9dfc0(0x7f) vis=0x0118e810(0x21) ) 0.023842 ms
[VGL 0x88d75800] XCreateWindow (dpy=0x00db5b00(:1.0) parent=0x000003af x=0 y=0 width=100 height=100 depth=24 c_class=1 visual=0x00dc1e70(0x21) win=0x02000002 ) 0.020027 ms
[VGL 0x88d75800] glXMakeCurrent (dpy=0x00db5b00(:1.0) drawable=0x02000002 ctx=0x00e80b48 [VGL] dlopen (filename=liballocator.so.0 flag=1 retval=0x00000000)








config=0x00e9dfc0(0x7f) drawable=0x00600002 renderer=NVIDIA A30/PCIe/SSE2 ) 1540881.658077 ms
[VGL 0x88d75800] glXGetProcAddressARB ((char *)procName=glGetProgramivARB [passed through]) 0.017166 ms
[VGL 0x88d75800] glXGetProcAddressARB ((char *)procName=glGetStringi [INTERPOSED]) 0.007868 ms
[VGL 0x88d75800] glXGetProcAddressARB ((char *)procName=glGetConvolutionParameteriv [passed through]) 0.009060 ms
display: :1  screen: 0
[VGL 0x88d75800] glXIsDirect (dpy=0x00db5b00(:1.0) ctx=0x00e80b48 direct=1 ) 0.003099 ms
direct rendering: Yes
server glx vendor string: VirtualGL
server glx version string: 1.4
server glx extensions:

DRC

unread,
Apr 28, 2022, 4:49:02 PM4/28/22
to virtual...@googlegroups.com

It's interesting that the delay occurs in the body of glXMakeCurrent().  Can you ascertain whether the delay occurs in the backend::makeCurrent() method (which, when using the GLX back end, is just a wrapper for glXMakeContextCurrent())?

DRC

--
You received this message because you are subscribed to the Google Groups "VirtualGL User Discussion/Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to virtualgl-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/virtualgl-users/fbe4168f-480b-40f3-b061-6d193bdd09d3n%40googlegroups.com.
Message has been deleted
Message has been deleted

Ryan Salomon

unread,
Apr 28, 2022, 8:52:59 PM4/28/22
to VirtualGL User Discussion/Support
Hmm, I'm not sure how to do that.

All I did was run vglrun +tr glxinfo

Ok, I tried running the same command but with strace at the start, and found something interesting during the start of the wait:

poll([{fd=4, events=POLLIN}], 1, -1)    = 1 ([{fd=4, revents=POLLIN}])
recvmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1\0023\0\0\0\0\0H\2@\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 32
getpid()                                = 16364
getpid()                                = 16364
getpid()                                = 16364
getpid()                                = 16364
getuid()                                = 37869
geteuid()                               = 37869
getgid()                                = 37869
getegid()                               = 37869
getuid()                                = 37869
geteuid()                               = 37869
getgid()                                = 37869
getegid()                               = 37869
stat("/home/username/.nv/GLCache", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
mkdir("/home/username/.nv/", 0700)      = -1 EEXIST (File exists)
mkdir("/home/username/.nv/GLCache", 0700) = -1 EEXIST (File exists)
mkdir("/home/username/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/", 0700) = -1 EEXIST (File exists)
mkdir("/home/username/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/cf07cf62221389dd/", 0700) = -1 EEXIST (File exists)
open("/home/username/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/cf07cf62221389dd/c1b003dd27d07ca9.toc", O_RDWR|O_CLOEXEC) = 15
fcntl(15, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_CUR, l_start=0, l_len=1}) = -1 EIO (Input/output error)
close(15)                               = 0
getuid()                                = 37869
geteuid()                               = 37869
getgid()                                = 37869
getegid()                               = 37869
getuid()                                = 37869
geteuid()                               = 37869
getgid()                                = 37869
getegid()                               = 37869
stat("/home/username/.nv/GLCache", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
mkdir("/home/username/.nv/", 0700)      = -1 EEXIST (File exists)
mkdir("/home/username/.nv/GLCache", 0700) = -1 EEXIST (File exists)
mkdir("/home/username/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/", 0700) = -1 EEXIST (File exists)
mkdir("/home/username/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/cf07cf62221389dd/", 0700) = -1 EEXIST (File exists)
open("/home/username/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/cf07cf62221389dd/c1b003dd27d07caa.toc", O_RDWR|O_CLOEXEC) = 15
fcntl(15, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_CUR, l_start=0, l_len=1}) = -1 EIO (Input/output error)

and it repeats from there

home directory edited here since this is public

DRC

unread,
Apr 28, 2022, 9:09:20 PM4/28/22
to virtual...@googlegroups.com

Seems like it may be an issue with nVidia's shader cache.  Try deleting ~/.nv or setting _GL_SHADER_DISK_CACHE=0 in the environment.

DRC

On 4/28/22 7:52 PM, Ryan Salomon wrote:
Hmm, I'm not sure how to do that.

All I did was run vglrun +tr glxinfo

Ok, I tried running the same command but with strace at the start, and found something interesting during the start of the wait:

poll([{fd=4, events=POLLIN}], 1, -1)    = 1 ([{fd=4, revents=POLLIN}])
recvmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1\0023\0\0\0\0\0H\2@\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 32
getpid()                                = 16364
getpid()                                = 16364
getpid()                                = 16364
getpid()                                = 16364
getuid()                                = 37869
geteuid()                               = 37869
getgid()                                = 37869
getegid()                               = 37869
getuid()                                = 37869
geteuid()                               = 37869
getgid()                                = 37869
getegid()                               = 37869
stat("/home/username/.nv/GLCache", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
mkdir("/home/username/.nv/", 0700)      = -1 EEXIST (File exists)
mkdir("/home/username/.nv/GLCache", 0700) = -1 EEXIST (File exists)
mkdir("/home/username/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/", 0700) = -1 EEXIST (File exists)
mkdir("/home/username/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/cf07cf62221389dd/", 0700) = -1 EEXIST (File exists)
open("/home/rsalomon/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/cf07cf62221389dd/c1b003dd27d07ca9.toc", O_RDWR|O_CLOEXEC) = 15

fcntl(15, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_CUR, l_start=0, l_len=1}) = -1 EIO (Input/output error)
close(15)                               = 0
getuid()                                = 37869
geteuid()                               = 37869
getgid()                                = 37869
getegid()                               = 37869
getuid()                                = 37869
geteuid()                               = 37869
getgid()                                = 37869
getegid()                               = 37869
stat("/home/username/.nv/GLCache", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
mkdir("/home/username/.nv/", 0700)      = -1 EEXIST (File exists)
mkdir("/home/username/.nv/GLCache", 0700) = -1 EEXIST (File exists)
mkdir("/home/username/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/", 0700) = -1 EEXIST (File exists)
mkdir("/home/username/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/cf07cf62221389dd/", 0700) = -1 EEXIST (File exists)
open("/home/username/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/cf07cf62221389dd/c1b003dd27d07caa.toc", O_RDWR|O_CLOEXEC) = 15
fcntl(15, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_CUR, l_start=0, l_len=1}) = -1 EIO (Input/output error)

and it repeats from there

home directory edited here since this is public

On Thursday, April 28, 2022 at 4:49:02 PM UTC-4 DRC wrote:

Ryan Salomon

unread,
Apr 29, 2022, 1:21:00 PM4/29/22
to VirtualGL User Discussion/Support
Huh, most of the way there now!!

i removed ~.nv, which then subsequent runs created .cache/  and so on from there and still had the issue. 

So I renamed that directory, and lo and behold, it works!! Also it works upon subsequent runs /within/ that terminal session in xterm, through the VNC client. 

...and then I close the xterm, open a new one, and same thing. It doesn't work until I recursively remove the .cache folder. 

DRC

unread,
Apr 29, 2022, 1:36:07 PM4/29/22
to virtual...@googlegroups.com

That really seems like a driver bug.  Did setting __GL_SHADER_DISK_CACHE=0 in the environment have any effect?  (Note that I typoed below.  There should be two underscores preceding the environment variable.)

DRC

DRC

unread,
Apr 29, 2022, 1:42:05 PM4/29/22
to virtual...@googlegroups.com

I'm not sure why the shader cache is so slow on your particular machine, but if your home directory is NFS-mounted, then that could be part of the problem.  You can use the __GL_SHADER_DISK_CACHE_PATH environment variable to relocate the cache to a local volume.  I've never seen this particular issue before, and nVidia would be better equipped to explain what is happening.  Note that there are also xorg.conf settings for disabling the shader disk cache, so perhaps enabling one of those settings on the 3D X server would fix this system-wide.

DRC

Ryan Salomon

unread,
Apr 29, 2022, 3:30:32 PM4/29/22
to VirtualGL User Discussion/Support
Ah I hadn't copy-pasted, didn't realize there were two underscores __ at the beginning of that environment variable.  
Exporting that to disable the cache works. 

Yes our homes are NFS mounted via AutoFS, so something strange might be happening there. 

I'll look into this, and will also pose the question to Nvidia via the thread that I have open with them. 

Thanks so much for your help! 
Also I'll update this thread once I get more info
Reply all
Reply to author
Forward
0 new messages