VirtualGL works with FastX but not generic TigerVNC install - keeps using llvmpipe


Jake Carroll

Aug 29, 2020, 10:56:14 PM8/29/20
to VirtualGL User Discussion/Support
Hi.

I think I need a little bit of VirtualGL help. 

We've got an installation of FastX running on our SLURM-controlled AMD Rome nodes. Each system contains 4 x NVIDIA T4 GPUs.

Using FastX + VirtualGL sessions works perfectly with MATE. So well, that users often say how happy they are with it.

However - we also run a custom TigerVNC-based platform called StrudelWeb, a local development. The problem we've got is that, despite the same xorg.conf and everything else we can think of, the TigerVNC sessions launched via Strudel do not seem to be able to use anything but the llvmpipe Mesa path. Setting environment variables such as VGL_LOGO=1 does make the "VGL" logo pop up in our X display windows over the Strudel TigerVNC sessions (glxspheres shows the VGL logo etc.), but rendering is absolutely going through the software renderer. What we can't figure out is why VirtualGL + TigerVNC won't pick up the NVIDIA hardware or Xorg config, while FastX with an identical xorg.conf works perfectly.

I'd post my xorg.conf, but I don't want to fill this post with mess until someone advises where I should start and what to look for first.

So far I've tried a few things, including this in the xorg.conf:

  Option "UseDisplayDevice" "none" 

Which seems to have broken everything entirely (the NVIDIA T4 is a headless GPU).

I also looked at this:


And thought it might help - but it assumes a setup without something like VirtualGL, so I wondered how relevant it was.

So - I'm trying to work out what might be wrong with my remotely launched TigerVNC session via Strudel.

For reference on what Strudel actually "is"...


Thank you for your time. 

Regards,

-jc

DRC

Aug 29, 2020, 11:41:13 PM8/29/20
to virtual...@googlegroups.com
xorg.conf only affects the 3D X server. It isn’t clear from your message whether TigerVNC is running on the same machines as FastX. If it is not, then a bad xorg.conf could be the problem on the TigerVNC machines. The first thing I would try is accessing the GPU through the 3D X server on those machines without using VGL (see the “Sanity Check” section in the User’s Guide.) If you meant that TigerVNC is running on the same machines as FastX, then perhaps, for some reason, the TigerVNC customizations set LD_LIBRARY_PATH to point to a Mesa implementation of libGL rather than the GPU-accelerated version. Also double check that the StrudelWeb environment isn’t doing something stupid like setting VGL_DISPLAY to the 2D X server rather than the 3D X server. 
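The sanity check described above can be sketched roughly as follows (display number :0 is an assumption for the GPU-attached 3D X server; adjust for your site, and see the User's Guide for the full procedure):

```shell
# Sanity check (sketch): query the 3D X server directly, bypassing VirtualGL.
# ":0" is assumed to be the GPU-attached display; adjust for your site.
if xdpyinfo -display :0 >/dev/null 2>&1; then
    # A healthy 3D X server should report the NVIDIA renderer, not llvmpipe.
    glxinfo -display :0 | grep "OpenGL renderer"
else
    echo "Cannot reach X display :0 from this shell"
fi
```

If this already reports llvmpipe (or fails), the problem is in the 3D X server itself, not in VirtualGL or the X proxy.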

Jake Carroll

Aug 30, 2020, 12:50:04 AM8/30/20
to VirtualGL User Discussion/Support
Hi.

Thanks for getting back to me. For clarity: the TigerVNC host/daemon does run on the same nodes, but only TigerVNC seems to have the problem. FastX (whatever it does/however it works!) does not have the issue, and it accelerates OpenGL out of the box just fine.

You mentioned LD_LIBRARY_PATH and the possibility that TigerVNC is referencing the wrong libs. I found this floating around...

From here and a few other places:


They recommend the following: 

sudo mv /usr/lib/xorg/modules/extensions/libglx.so /usr/lib/xorg/modules/extensions/libglx.so.orig 
sudo ln -s /usr/lib/xorg/modules/extensions/libglx.so.XXX.YY /usr/lib/xorg/modules/extensions/libglx.so 

Have you ever seen anything like this before? I haven't tried it yet.

Thanks again. 

DRC

Aug 30, 2020, 1:03:54 AM8/30/20
to virtual...@googlegroups.com
If the same 3D X server works with FastX and not with TigerVNC, then the problem is not with the 3D X server. That means that anything related to xorg.conf and the Xorg modules is probably a red herring. I would focus on the environment and the dynamic linker. Compare the output of ‘env’ in a FastX vs. a TigerVNC session. Compare ‘vglrun ldd /opt/VirtualGL/bin/glxspheres64’ in both sessions. Try explicitly setting VGL_GLLIB=/usr/lib/libGL.so.1 in the environment.
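The environment comparison above can be done mechanically, e.g. (file paths are illustrative; each capture is run inside the respective session type):

```shell
# Capture the environment in each session type, then diff the dumps.
env | sort > /tmp/env.fastx      # run this inside a FastX session
env | sort > /tmp/env.tigervnc   # run this inside a TigerVNC/Strudel session
# Any VGL_* or LD_* difference between the two sessions is a prime suspect.
diff /tmp/env.fastx /tmp/env.tigervnc || true
```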

Jake Carroll

Aug 30, 2020, 1:46:02 AM8/30/20
to VirtualGL User Discussion/Support
From the StrudelWeb/TigerVNC based session, which is currently not accelerated:

[me@gpunode-2-0 ~]$ vglrun ldd /opt/VirtualGL/bin/glxspheres64 
linux-vdso.so.1 =>  (0x00007fffc52b4000)
libdlfaker.so => /lib64/libdlfaker.so (0x00007ffa35f61000)
libvglfaker.so => /lib64/libvglfaker.so (0x00007ffa35c0b000)
libGL.so.1 => /lib64/libGL.so.1 (0x00007ffa35962000)
libX11.so.6 => /lib64/libX11.so.6 (0x00007ffa35624000)
libGLU.so.1 => /lib64/libGLU.so.1 (0x00007ffa353a4000)
libm.so.6 => /lib64/libm.so.6 (0x00007ffa350a2000)
libc.so.6 => /lib64/libc.so.6 (0x00007ffa34cd4000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007ffa34ad0000)
libXv.so.1 => /lib64/libXv.so.1 (0x00007ffa348cb000)
libXext.so.6 => /lib64/libXext.so.6 (0x00007ffa346b9000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ffa3449d000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffa36163000)
libGLX.so.0 => /lib64/libGLX.so.0 (0x00007ffa3426d000)
libGLdispatch.so.0 => /lib64/libGLdispatch.so.0 (0x00007ffa33f9a000)
libxcb.so.1 => /lib64/libxcb.so.1 (0x00007ffa33d72000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007ffa33a6b000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ffa33855000)
libXau.so.6 => /lib64/libXau.so.6 (0x00007ffa33651000)


Jake Carroll

Aug 30, 2020, 1:50:23 AM8/30/20
to VirtualGL User Discussion/Support
And from the FastX session that is/does accelerate correctly...

[me@gpunode-2-0 ~]$ vglrun ldd /opt/VirtualGL/bin/glxspheres64 
linux-vdso.so.1 =>  (0x00007fff0a4fa000)
libdlfaker.so => /lib64/libdlfaker.so (0x00007fb90b3ad000)
libvglfaker.so => /lib64/libvglfaker.so (0x00007fb90b057000)
libGL.so.1 => /lib64/libGL.so.1 (0x00007fb90adae000)
libX11.so.6 => /lib64/libX11.so.6 (0x00007fb90aa70000)
libGLU.so.1 => /lib64/libGLU.so.1 (0x00007fb90a7f0000)
libm.so.6 => /lib64/libm.so.6 (0x00007fb90a4ee000)
libc.so.6 => /lib64/libc.so.6 (0x00007fb90a120000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fb909f1c000)
libXv.so.1 => /lib64/libXv.so.1 (0x00007fb909d17000)
libXext.so.6 => /lib64/libXext.so.6 (0x00007fb909b05000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb9098e9000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb90b5af000)
libGLX.so.0 => /lib64/libGLX.so.0 (0x00007fb9096b9000)
libGLdispatch.so.0 => /lib64/libGLdispatch.so.0 (0x00007fb9093e6000)
libxcb.so.1 => /lib64/libxcb.so.1 (0x00007fb9091be000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fb908eb7000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fb908ca1000)
libXau.so.6 => /lib64/libXau.so.6 (0x00007fb908a9d000)


DRC

Aug 30, 2020, 10:16:40 AM8/30/20
to virtual...@googlegroups.com
What about the environment? Is VGL_DISPLAY set in one session but not the other? What about LD_PRELOAD? If not, then I have no explanation. VirtualGL works properly with unmodified TigerVNC, so if you can verify that that is the case on your systems, that would give you a baseline against which to compare StrudelWeb and determine where the problem is.

Jake Carroll

Aug 30, 2020, 4:55:49 PM8/30/20
to VirtualGL User Discussion/Support
Mmm. 

So, on the system that is _not_ accelerating:

[me@gpunode-2-0 ~]$ echo $LD_PRELOAD
libdlfaker.so:libvglfaker.so
[me@gpunode-2-0 ~]$ echo $VGL_DISPLAY
:1

When I check the system that IS accelerating correctly:

[me@gpunode-2-0 ~]$ echo $VGL_DISPLAY

[me@gpunode-2-0 ~]$ echo $LD_PRELOAD
libdlfaker.so:libvglfaker.so

Odd huh?

Does this point to anything specific? I note that things work on the session that does NOT have the display set in the variable. What the?

Jake Carroll

Aug 30, 2020, 5:08:07 PM8/30/20
to VirtualGL User Discussion/Support
That did it! 

When I export VGL_DISPLAY= 

...on the system which is NOT accelerating, it all works!

[me@gpunode-2-0 ~]$ export VGL_DISPLAY=
[me@gpunode-2-0 ~]$ echo $VGL_DISPLAY

[me@gpunode-2-0 ~]$ /opt/VirtualGL/bin/glxspheres64 
Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
GLX FB config ID of window: 0x7d (8/8/8/0)
Visual ID of window: 0x288
Context is Direct
OpenGL Renderer: Tesla T4/PCIe/SSE2
158.817812 frames/sec - 177.240678 Mpixels/sec
156.659550 frames/sec - 174.832058 Mpixels/sec
160.199829 frames/sec - 178.783009 Mpixels/sec
158.225050 frames/sec - 176.579156 Mpixels/sec
159.431940 frames/sec - 177.926045 Mpixels/sec
155.437629 frames/sec - 173.468394 Mpixels/sec
168.780167 frames/sec - 188.358666 Mpixels/sec
156.255911 frames/sec - 174.381597 Mpixels/sec
158.956569 frames/sec - 177.395531 Mpixels/sec
156.650750 frames/sec - 174.822237 Mpixels/sec
158.987188 frames/sec - 177.429702 Mpixels/sec
159.112906 frames/sec - 177.570004 Mpixels/sec
162.952582 frames/sec - 181.855082 Mpixels/sec
159.708156 frames/sec - 178.234302 Mpixels/sec

So the question is WHY this var gets set to :1 and what I can do about that, I guess...

:)

DRC

Aug 30, 2020, 7:02:51 PM8/30/20
to virtual...@googlegroups.com
Yes, VGL_DISPLAY should point to the 3D X server (the default value of that variable is :0.0, so you don’t need to set it if the GPU you wish to use is attached to X display 0, screen 0.) DISPLAY should point to the 2D X server. :1 is apparently an X proxy instance (TigerVNC or FastX), so pointing VGL_DISPLAY to :1 instructs VirtualGL to use that X proxy instance as a 3D X server. Since the X proxy only has software OpenGL, using it as a 3D X server effectively causes VirtualGL to unaccelerate OpenGL applications rather than accelerate them.

Long story short, whoever set the VGL_DISPLAY environment variable to :1 did not read the VirtualGL documentation. Unfortunately, that misunderstanding regarding VGL_DISPLAY is a common one, and I don’t understand why it occurs. I can’t fathom why someone would set an environment variable without first verifying that the system works without setting it.
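The fix in the Strudel/TigerVNC session launch script can be expressed as follows (the script location is site-specific; the default value :0.0 is VirtualGL's documented fallback):

```shell
# In the session launch script, stop pointing VGL_DISPLAY at the X proxy.
# Either drop the variable entirely:
unset VGL_DISPLAY            # VirtualGL then uses its default, :0.0
# ...or set it explicitly to the GPU-attached 3D X server:
export VGL_DISPLAY=:0.0
echo "3D X server: ${VGL_DISPLAY:-:0.0}"   # prints "3D X server: :0.0"
```

Either form leaves VirtualGL rendering on the 3D X server while DISPLAY continues to point at the X proxy.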
