Are there features to help detect bad data?


Alexandre Vaillancourt

Mar 13, 2020, 4:44:16 AM
to OpenSceneGraph Users
Hello all!

We've recently been having some intermittent crashes ("hangs": Windows shuts down the process after a 2-second freeze in the graphics API) occurring in nVidia's OpenGL DLL. Google searches revealed that this can occur if we supply bad data to OpenGL.

Are there features in OSG to detect such incorrect data in our models? Concurrent vertices, bad normals, bad UVs, etc.--stuff that could be correct or incorrect if defaulted (the crash is intermittent)?

We're using OSG 3.6.3. We read .flt files, convert them to .osgb with a conversion tool of our own (using OSG), and then read those .osgb files in our app.

Thanks!

--
Vaillancourt

OpenSceneGraph Users

Mar 13, 2020, 5:19:18 AM
to OpenSceneGraph Users
Hi Vaillancourt

On Fri, 13 Mar 2020 at 09:00, OpenSceneGraph Users <osg-...@lists.openscenegraph.org> wrote:
> We've recently been having some intermittent crashes ("hangs"--Windows shuts down the process after a 2 seconds freeze in the graphics API) occurring in the nVidia's OpenGL dll. Google searches revealed that this can occur if we supply bad data to OpenGL.

Do you see any OpenGL errors reported?  The OSG checks once per frame for errors.
 

> Are there features in OSG to detect such incorrect data in our models? Concurrent vertices, bad normals, bad UVs, etc.--stuff that could be correct or incorrect if defaulted (the crash is intermittent)?

There are places where values are checked, but there isn't a specific class for general checking of the validity of data. Such errors are so open-ended that I don't think you could have one validator to rule them all.

One way of checking things is to convert the data to the ASCII .osgt format using osgconv and then inspect the data to see if there are any oddities.  The files could be quite large, so I'd suggest starting with small models that you know are causing problems and having a look at these first.
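
As a complement to inspecting the .osgt text by eye, the same kinds of oddities can be looked for programmatically. The following is only a minimal sketch: `MeshData` and `findProblems` are hypothetical names, not part of OSG, and the sketch assumes the vertex, normal and index arrays have already been copied out of the scene graph (for example by a visitor, not shown here). It checks three things only: non-finite vertex components, normals that are not unit length, and indices that point past the end of the vertex array.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical plain-data stand-in for one mesh; in an OSG application these
// arrays would have to be copied out of the geometry first (not shown).
using Vec3 = std::array<float, 3>;

struct MeshData {
    std::vector<Vec3> vertices;
    std::vector<Vec3> normals;          // expected to be unit length
    std::vector<unsigned int> indices;  // each must be < vertices.size()
};

// Returns one human-readable message per problem found. An empty result only
// means that these particular checks found nothing.
std::vector<std::string> findProblems(const MeshData& mesh,
                                      float normalTolerance = 1e-3f)
{
    std::vector<std::string> problems;

    // NaN or infinite vertex components.
    for (std::size_t i = 0; i < mesh.vertices.size(); ++i) {
        for (float c : mesh.vertices[i]) {
            if (!std::isfinite(c)) {
                problems.push_back("vertex " + std::to_string(i) +
                                   " has a NaN or infinite component");
                break;  // report each vertex at most once
            }
        }
    }

    // Normals whose length is not (approximately) 1.
    for (std::size_t i = 0; i < mesh.normals.size(); ++i) {
        const Vec3& n = mesh.normals[i];
        const float len = std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
        // A NaN length fails this comparison, so NaN normals are reported too.
        if (!(std::fabs(len - 1.0f) <= normalTolerance)) {
            problems.push_back("normal " + std::to_string(i) +
                               " is not unit length");
        }
    }

    // Indices that reference a vertex that does not exist.
    for (std::size_t i = 0; i < mesh.indices.size(); ++i) {
        if (mesh.indices[i] >= mesh.vertices.size()) {
            problems.push_back("index " + std::to_string(i) +
                               " is out of range");
        }
    }

    return problems;
}
```

Running something like this over the small models that are suspected of causing problems first keeps the output manageable; whether any of these conditions is actually what upsets the driver is a separate question that the checks cannot answer.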

It would also be worth trying things out on a different OS, hardware, drivers etc, to see if there are any correlations.

In essence, you need to try out different strategies to tease out the cause of the problem. Once you understand the specifics of the problem, you can start to craft a solution.

Robert.

Chris Djali / AnyOldName3

Mar 13, 2020, 9:01:03 AM
to OpenSceneGraph Users
I've found the Windows Nvidia drivers to be really bad about actually reporting OpenGL errors unless you create a debug context. It's possible that doing that will reveal something. You might have to tweak OSG itself to do that, though. My application creates the context with SDL2 and then connects it to OSG, so I've not had to do it with push alone.

Also, when I've wanted to know the cause of (rather than just the existence of) an error, it's been easier to do so via a debug callback than basic error checking as you get the full call stack and sometimes a descriptive error string. This page explains how to set that up with OSG: http://thermalpixel.github.io/opengl/osg/2014/02/06/gl-khr-debug-in-osg.html
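
For illustration, the general shape of such a debug callback can be sketched without any OpenGL or OSG headers. This is an assumption-laden sketch rather than the code from the linked page: `formatDebugMessage`, `severityName` and `onGlDebugMessage` are hypothetical names, the enum values are copied from the KHR_debug extension so that the block compiles on its own, and plain C++ types stand in for `GLenum`, `GLuint`, `GLsizei` and `GLchar`. A real callback would use the GL types, would need the `APIENTRY` calling convention on Windows, and would be registered with `glDebugMessageCallback` on a debug context.

```cpp
#include <cstdio>
#include <string>

// Enum values from the KHR_debug extension, repeated here so the sketch
// compiles without OpenGL headers; a real build would take them from the
// GL headers or the extension loader instead.
constexpr unsigned int kSeverityHigh         = 0x9146; // GL_DEBUG_SEVERITY_HIGH
constexpr unsigned int kSeverityMedium       = 0x9147; // GL_DEBUG_SEVERITY_MEDIUM
constexpr unsigned int kSeverityLow          = 0x9148; // GL_DEBUG_SEVERITY_LOW
constexpr unsigned int kSeverityNotification = 0x826B; // GL_DEBUG_SEVERITY_NOTIFICATION
constexpr unsigned int kTypeError            = 0x824C; // GL_DEBUG_TYPE_ERROR

// Maps a severity enum to a short label for the log line.
const char* severityName(unsigned int severity)
{
    switch (severity) {
        case kSeverityHigh:         return "high";
        case kSeverityMedium:       return "medium";
        case kSeverityLow:          return "low";
        case kSeverityNotification: return "notification";
        default:                    return "unknown";
    }
}

// Hypothetical helper: builds the single line that gets logged per message.
std::string formatDebugMessage(unsigned int type, unsigned int id,
                               unsigned int severity, const char* message)
{
    std::string line = "[GL ";
    line += severityName(severity);
    line += (type == kTypeError) ? " error" : " message";
    line += " id=" + std::to_string(id) + "] ";
    line += message ? message : "(no text)";
    return line;
}

// Same parameter list as a KHR_debug callback, written with plain types.
// Registration with the driver is deliberately left out of this sketch.
void onGlDebugMessage(unsigned int source, unsigned int type, unsigned int id,
                      unsigned int severity, int length, const char* message,
                      const void* userParam)
{
    (void)source; (void)length; (void)userParam;  // unused in this sketch
    std::fprintf(stderr, "%s\n",
                 formatDebugMessage(type, id, severity, message).c_str());
}
```

With `GL_DEBUG_OUTPUT_SYNCHRONOUS` enabled, a breakpoint placed inside the callback is hit on the thread that made the offending GL call, which is what makes the call stack useful.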

Chris Djali / AnyOldName3

Mar 13, 2020, 9:02:47 AM
to OpenSceneGraph Users
s/push/OSG/

Don't forget to proofread when writing things on your phone, kids.

Cheers,

Chris

Alexandre Vaillancourt

Mar 16, 2020, 2:24:55 PM
to OpenSceneGraph Users
Thanks Robert and Chris!


> Do you see any OpenGL errors reported?

Unfortunately, no. The symptoms are a "freeze", then app shutdown with return code 0xC0000409. If there is a debugger attached, the error appears to be coming from the nVidia OpenGL DLL; if there is no debugger attached, no other error is displayed. There are a few more clues in the Windows Event Viewer:


------------------
The description for Event ID 1 from source NVIDIA OpenGL Driver cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

A TDR has been detected.
The application must close.


Error code: 7
 (pid=17844 tid=14328 [APPNAME].exe 32bit)

Visit http://nvidia.custhelp.com/app/answers/detail/a_id/3633 for more information.

The message resource is present but the message was not found in the message table
-----------

So it looks like we give the driver something it has trouble handling.

> I'd suggest starting with small models that you know are causing problems and have a look at these first.

Yeah, I think the next step is to figure out which model (or models) causes the issue and inspect it.

> It would also be worth trying things out on a different OS, hardware, drivers etc, to see if there are any correlations.

Some of my colleagues have not been able to reproduce the issue on their workstations. The only difference is the hardware. That's an annoyance.

> I've found the Windows Nvidia drivers to be really bad about actually reporting OpenGL errors unless you create a debug context.

I was not aware that there was such a thing as a debug context, I'll check into this.

Again, thanks for the help provided!

--
Vaillancourt

David Heitbrink

Mar 16, 2020, 6:33:39 PM
to OpenSceneGraph Users
For a good long time, I had an error that sounds similar to this. We had timeBeginPeriod set pretty low, to something like 1 ms. It turns out this was causing a lot of deadlocking in the NVidia driver.

I have also found NVIDIA Nsight to be pretty useful for tracking down some issues with hangs. We had another issue where a 3rd-party lib we were using made a few unneeded glGetInt calls every frame. This seemed to cause some intermittent long hangs when we ran multiple instances of our program at the same time. I was able to find this one pretty quickly with Nsight.