[osg-users] Deadlock when loading osg.dll, singletons are evil

35 views
Skip to first unread message

Tanguy Fautre

unread,
Jul 29, 2009, 11:51:43 AM7/29/09
to OpenSceneGraph Users, Neil Groves
Hi,

We've encountered several times a random deadlock that would happen when
starting our application. After some playing around, we've identified
the problem to be quite a subtle one.

Here is what's happening. When osg.dll (in our case the full name is
osg59-osgd.dll) gets loaded, it initializes several singletons directly
and indirectly. These initializations lead to a call to
DisplaySettings::readEnvironmentalVariables(), which in turn calls
getenv().

All would be fine if it weren't for two subtle details:

1. The singleton initializations are indirectly called from DllMain().
Unfortunately, DllMain() is fairly limited in what you can do in it. For
example, calling synchronization primitives is usually a very bad idea.

2. getenv() calls a lock (a synchronization primitive) on the
environment.

The result of combining these two points is as given in the title of
this post: a nice random deadlock.

More about DllMain:
http://msdn.microsoft.com/en-us/library/ms682583%28VS.85%29.aspx
http://msdn.microsoft.com/en-us/library/aa290048%28VS.71%29.aspx#vcconmi
xeddllloadingproblemanchor3

A few months ago, there was a heated discussion about singletons in
OpenSceneGraph. Since then, this issue as well as others have now
convinced me that singletons are not simply a bad practice, but are
actually completely broken (i.e. they do not work).

Singletons are generally considered to be an anti-pattern due to the
hidden dependencies they introduce. Google is full of papers on this,
and I can attest from our personal experience with OSG that this has
been a problem on numerous occasions. For example, to improved resource
usage, we have to keep the screengraph alive but destroy the OpenGL
context when it's not being rendered. This proved to be an extremely
difficult endeavour as we had to track all these hidden dependencies by
trial and error, and manually reset them one by one.

The aforementioned deadlock however leads me to the conclusion that
singletons are not just an anti-pattern but are completely broken.
Because of the DllMain() limitations, any C/C++ standard function is
potentially unsafe. Therefore, any singleton using a C/C++ standard
function is itself potentially unsafe if used in a DLL. All it takes is
for one of these standard functions implementation to call a
synchronization primitive that is illegal in DllMain().

Regards,

Tanguy


_______________________________________________
osg-users mailing list
osg-...@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org

Jean-Sébastien Guay

unread,
Jul 29, 2009, 12:28:11 PM7/29/09
to OpenSceneGraph Users
Hi Tanguy,

> We've encountered several times a random deadlock that would happen when
> starting our application. After some playing around, we've identified
> the problem to be quite a subtle one.

What you're describing makes sense in some way, but I have two questions:

1. Why have there not been more discussions about random deadlocks in
the recent past? For example, we have not experienced anything
resembling what you describe. I am not an expert in Win32 programming,
so what you describe may well be true, but we haven't encountered this,
and I'm pretty sure if it were such a widespread problem we'd have heard
of it on this list before. Of course, that's not proof in itself that
the issue is not there, but it's an indication that it might be
something on your side rather than a widespread problem.

2. If singletons are bad practice as you say, what do you propose to
replace them? I'm always a bit skeptical of sentences like "Google is
full of papers [saying that singletons are broken]". People have a
tendency to be extreme in their judgements, and if someone finds an
instance where singletons give bad results, they will often start saying
that they are bad in all cases (and inevitably post to their blog about
it). It's a tool for a job, and the developer has to know when to use it
and when not to. That's the developer's responsibility, not the tool's.

I'm not saying that all uses of singletons in OSG are good ones (most
obviously because I've probably not personally seen all of them) but
singletons are useful (they give a single point of access to something -
data, functionality) and that has to be replaced by something else if
singletons have a downside in some/most/all(?) situations.

J-S
--
______________________________________________________
Jean-Sebastien Guay jean-seba...@cm-labs.com
http://www.cm-labs.com/
http://whitestar02.webhop.org/

Tanguy Fautre

unread,
Jul 29, 2009, 1:36:38 PM7/29/09
to OpenSceneGraph Users
Hi J-S,

Your questions are quite legitimate. I'll try to answer them to my best knowledge.

1.

The DllMain problem is quite well documented (cf. the MSDN links from my previous post). The thing is that an illegal use of DllMain is quite similar to a race condition. Most of the time, everything just seems to work. Our application has already about 10 threads up and running before osg.dll is being loaded, significantly increasing the chance of a deadlock. I would expect most OSG-based applications to do just fine as they start using OSG (and therefore osg.dll) when still single threaded.

Because our application is quite multithreaded, it's not the first time we've uncovered thread-safety bugs in what are supposed to be quite stable libraries. Recently, we've just found a couple of thread-safety issues with LibCurl and OpenSSL (their developers have accepted these as genuine defects and are fixing these issues). Google found a few (not many) posts related to these issues, but due to the lack of proper investigations they often got dismissed.

2.

Just going to http://en.wikipedia.org/wiki/Singleton_pattern you'll see that the first paragraph has already 6 anti-singleton references.

A few other references I have at hand include:
http://accu.org/index.php/journals/337
http://code.google.com/p/google-singleton-detector/wiki/WhySingletonsAreControversial
http://blogs.msdn.com/scottdensmore/archive/2004/05/25/140827.aspx

How can you replace singletons?

If one still insists on using singletons, the first thing one can do to alleviate the problem is to provide init/destroy functions (note that they have to be thread-safe, re-entrant, and should be able to cope with multiple initializations). For example: initOsg() and destroyOsg() (or better: an object that calls init in its constructor and destroy in its destructor). These two functions control the lifetime of the whole library. This is a common approach, and for good reasons. Because you give the user control on the lifetime of the library, it effectively resolves the DllMain problem.

A better way is not to use singletons at all. Just provide explicitly exposed objects and classes. The benefit is that you get rid of any hidden dependency. It also gives even more control to the user as far as the objects (and thus the library) lifetime is concerned.


A lot of people (myself included) find singletons convenient because they do hide implicit dependency, which results in less code to be written. Unfortunately, I'm starting to think that it's a bad decision in most (all?) cases, especially in a library that is going to be used by others. As they force the user to give up lifetime control and dependency control, singletons just don't go well with some of C++ most common idioms (e.g. RAII).

Furthermore, singletons tend to be a fuzzy concept. It states that you have a single instance of an object, but it does not state per what. With a naive "static Object" approach, you can end up having several singletons per thread (e.g. thread local storage), several singletons per process (one per DLL/EXE), and of course several singletons per applications (if there are several processes). Stretch it even more and imagine you're working on a distributed system. It's not uncommon to have several singletons when you're only supposed to have one (and it's very easy to fail to see this).


Cheers,

Tanguy

Thrall, Bryan

unread,
Jul 29, 2009, 2:04:11 PM7/29/09
to OpenSceneGraph Users
Tanguy Fautre wrote on Wednesday, July 29, 2009 12:37 PM:
> The DllMain problem is quite well documented (cf. the MSDN links from my
> previous post). The thing is that an illegal use of DllMain is quite similar
> to a race condition. Most of the time, everything just seems to work. Our
> application has already about 10 threads up and running before osg.dll is
> being loaded, significantly increasing the chance of a deadlock. I would
> expect most OSG-based applications to do just fine as they start using OSG
> (and therefore osg.dll) when still single threaded.
>
> Because our application is quite multithreaded, it's not the first time we've
> uncovered thread-safety bugs in what are supposed to be quite stable
> libraries. Recently, we've just found a couple of thread-safety issues with
> LibCurl and OpenSSL (their developers have accepted these as genuine defects
> and are fixing these issues). Google found a few (not many) posts related to
> these issues, but due to the lack of proper investigations they often got
> dismissed.

The links you posted explicitly said that creating synchronization primitives is allowed in DllMain; the problem is acquiring them when other threads are trying to acquire the loader lock (to enter some DllMain function). Are you saying that the problem is the mutex used in creating the singletons, or is there some other limitation of DllMain that you think OSG is violating?

A simple test case demonstrating the problem you're describing would be helpful, I think.

--
Bryan Thrall
FlightSafety International
bryan....@flightsafety.com

Aymeric Barthe

unread,
Jul 30, 2009, 2:46:04 AM7/30/09
to osg-...@lists.openscenegraph.org

bthrall wrote:
>
> The links you posted explicitly said that creating synchronization primitives is allowed in DllMain; the problem is acquiring them when other threads are trying to acquire the loader lock (to enter some DllMain function). Are you saying that the problem is the mutex used in creating the singletons, or is there some other limitation of DllMain that you think OSG is violating?
>


The violation is that no C function should be called from DllMain().

The problem is that not only the loader lock is taken (because in DllMain), but since you use a standard C function, another lock might be taken by the CRT. In this specific case, getenv() is called and the Microsoft CRT takes the CRT lock _ENV_LOCK (getenv.c at line 81).

In the meantime, we have another thread that uses CURL, and is trying to use the C function stat(). It so happens that stat() will take the CRT locks _TIME_LOCK (tzset.c at line 90) and _ENV_LOCK (tzset.c at line 135). Then GetTimeZoneInformation() is called and decides it needs to load ntdll to do the job and thus attempts to get the loader lock... boom!

Classical deadlock.
1-> Thread-A: loaderlock taken by LoadLibrary() of OSG
2-> Thread-B: use stat()
3-> Thread-B: stat() takes _ENV_LOCK
4-> Thread-B: LoadLibrary() fails to take loaderlock (see 1)
5-> Thread-A: DllMain of Osg calls getenv() which fails to get _ENV_LOCK (see 3)

Aymeric Barthe

------------------
Read this topic online here:
http://forum.openscenegraph.org/viewtopic.php?p=15624#15624

Aymeric Barthe

unread,
Jul 30, 2009, 6:02:08 AM7/30/09
to osg-...@lists.openscenegraph.org

aymericb wrote:
>
> The violation is that no C function should be called from DllMain().
>


Sorry. Small/Big mistake here. Forgot one important word. I meant "The violation is that no C library function should be called from DllMain()."

The deadlock problem is clearly similar to the one explained by Raymond Chen:
http://blogs.msdn.com/oldnewthing/archive/2004/01/28/63880.aspx

Since CRT functions are likely to lock things internally, you may end up in the same situation than his "g_csGlobal" example, but with a CRT lock instead.

I think the fix should be to do what MSDN documentation suggest, instead of relying on obscure undocumented implementation details:

>
> Warning There are serious limits on what you can do in a DLL entry point. To provide more complex initialization, create an initialization routine for the DLL. You can require applications to call the initialization routine before calling any other routines in the DLL.
>


Kind Regards,
Aymeric Barthe

------------------
Read this topic online here:

http://forum.openscenegraph.org/viewtopic.php?p=15636#15636

Tanguy Fautre

unread,
Aug 3, 2009, 12:09:14 PM8/3/09
to osg-...@lists.openscenegraph.org
We've tried to temporarily fix the problem on our side by commenting out
osg::DisplaySettings::readEnvironmentalVariables() implementation.

But the deadlock was not gone, now it locks on the following line:

Glextensions.cpp, line 42 (oh, the irony)
static const char* envVar = getenv("OSG_GL_EXTENSION_DISABLE");

I don't think it can get any simpler than that.


Aymeric, are we still trying to make a small example program? I was
thinking that you could spawn several processes of an unsafe program
(that would show this problem) in a loop until one of them deadlocks.


Tanguy


-----Original Message-----
From: osg-user...@lists.openscenegraph.org
[mailto:osg-user...@lists.openscenegraph.org] On Behalf Of Aymeric
Barthe
Sent: 30 July 2009 07:46
To: osg-...@lists.openscenegraph.org
Subject: Re: [osg-users] Deadlock when loading osg.dll, singletons are
evil

Tanguy Fautre

unread,
Aug 3, 2009, 2:34:22 PM8/3/09
to OpenSceneGraph Users
Hi all,

With the help of Aymeric, we've been able to create a small program
reproducing the problem. On my machine (Core i7, Vista x64, VC8 and VC9,
Debug and Release) it deadlocks 99% of the time. Other machines have
shown to deadlock less often (Aymeric's machine deadlocks roughly 20% of
time, Dave's only about 10 %).

The example program is very simple. While you'll find the full program
with its cmake files attached, I've copy/pasted the heart of the program
directly into this post. The full explanation of why it is deadlocking
has already been posted by Aymeric
(http://forum.openscenegraph.org/viewtopic.php?p=15636).

Cheers,

Tanguy

// Library.cpp (to be compiled in Library.dll)

static const char * env =
std::getenv("LIBRARY_UBER_OPTION_ENV_VARIABLE");

// Main.cpp (to be compiled in Main.exe)

void libraryThreadMain(void *)
{
LoadLibraryA("Library.dll");
}

void threadMain(void * argv)
{
const char * prog_name = ((char **)argv)[0];

struct stat sd;
stat(prog_name, &sd);
}

int main(int argc, char * argv)
{
HANDLE thread_a = (HANDLE) _beginthread(&libraryThreadMain, 0,
0);
HANDLE thread_b = (HANDLE) _beginthread(&threadMain, 0, argv);

WaitForSingleObject(thread_a, INFINITE);
WaitForSingleObject(thread_b, INFINITE);

CloseHandle(thread_a);
CloseHandle(thread_b);
}

SingletonDeadlock.zip

Robert Osfield

unread,
Aug 5, 2009, 3:56:01 AM8/5/09
to OpenSceneGraph Users
Hi Tanguy,

I don't have a windows box to test against so can't do anything first
hand. It does very much sound like Windows threading issue that will
require a specific Windows mechanism in your application set up to
avoid hitting these problems. I don't what what the solution might
be, but perhaps it would require serialization of all system calls
like getenv.

The solution will not be ditching singletons. They are extremely
useful to design and implementation of the OSG. You simply can't
replace them without major surgery of API and implementation, and even
if you did you'd loose significant flexibility and function -
something which is currently loosely coupled (yes a great OO asset)
would end up be far more tightly coupled than it is now.

Please remember that the OSG is used heavily by thousands of
developers on thousands of applications across dozens of platforms,
with many applications which are heavily threaded - and it's been
under decades development without this issue cause problems. The
problems you are seeing are serious, but they aren't the norm, it's
because of your particular application usage model and platform. I'm
certainly open to finding an OSG solution, but it'll have to be
unobtrusive and require minimal code and API changes, otherwise it
simply won't be merged as the cost and risks to existing users will be
too great.

As a general note, please abstain from using immotive language to try
and make your point. I have found over the years that the quality of
reasoning if often inversely proportional to how immotive the language
is used by a poster, so once I see such language I know it's a early
warning of what may well be an unreasonable request. If you have a
good point, you don't need trumped up rhetoric, the facts speak for
themselves.

Robert.

Tanguy Fautre

unread,
Aug 5, 2009, 7:55:06 AM8/5/09
to OpenSceneGraph Users
Hi Robert,

It is indeed a Windows specific issue. If you get the chance to test the example on a Windows machine, it would really be appreciated. Aymeric and I have spent 3 days tracking, understanding and reproducing the problem. As a consequence, a fair amount of detailed technical explanation and MSDN references have been posted on this discussion to illustrate it.

It is indeed a problem that requires a specific set of conditions to be triggered. It mainly requires a Windows application to be already running several threads when loading another dll (in the example, the dll is explicitly loaded to easily to reproduce the problem). It is a form of subtle and intricate race condition.

We've had that potential race condition in our code for many months, but it was not until recently that the we saw that deadlock occurs in a consistent manner. Hence, I'm not surprised that nobody reported this issue before.

I apologize if I've appeared too emotional. The fact that the discussion has been mixing two points (the deadlock technical details and the singleton design critic) is probably not helping.

Although I will not hide what I think about singletons. Just in the last two weeks, not counting the deadlock with osg.dll, we've spent quite a long time tracking two thread-safety issues in OpenSSL and LibCurl that are actually related to the singleton design (for the story, the OpenSSL developers have acknowledged it as an API limitation and the LibCurl developers have already committed a fix). Hence, it may not be surprising that I appear emotional about singletons. My experience leads me to disagree on the statement that singletons help in having a loosely coupled design. I think that singletons are actually a hidden and tightly coupling design.


To come back to the deadlock issue, I don't think there is any way to fix it without breaking the current OSG API compatibility. While I would favour removing the singletons (and would heavily suggest other designs for OSG 3.0), I perfectly understand and agree with you that such an approach is unacceptable in the short term.

The least disruptive solution I can think of (while being quite robust) would be to introduce an initOsg() and a destroyOsg() function. It's a fairly common approach and is in fact the one mandated by MSDN regarding the limitations of DllMain (and the deadlocks that may follow if violated).

initOsg() would initialize and construct all the global variables of Osg when called, while destroyOsg() would take care of the destruction of such objects.

The benefits are twofold:
- The user would have a better control on the lifetime of the OSG and its global variables (among other things, solving the deadlock issue exposed in this discussion).
- The user would have the ability to reset the library in a predictable manner, which is currently impossible.

A few points should be observed:
1. init/destroyOsg needs to be referenced counted (allowing multiple and re-entrant initializations)
2. init/destroyOsg needs to be thread-safe
3. init/destroyOsg needs to be aware that osg is divided into several components (e.g. osg, osgDB, osgViewer, etc). It would probably be needed that the user can in fact select which components of osg he wants to init/destroy (in which case points 1 and 2 need to be done per component).


Regards,

Tanguy

-----Original Message-----
From: osg-user...@lists.openscenegraph.org [mailto:osg-user...@lists.openscenegraph.org] On Behalf Of Robert Osfield
Sent: 05 August 2009 08:56
To: OpenSceneGraph Users
Subject: Re: [osg-users] Deadlock when loading osg.dll, singletons are evil

Robert Osfield

unread,
Aug 5, 2009, 8:25:52 AM8/5/09
to OpenSceneGraph Users
Hi Tanguy,

On Wed, Aug 5, 2009 at 12:55 PM, Tanguy
Fautre<tan...@aristechnologies.com> wrote:
> To come back to the deadlock issue, I don't think there is any way to fix it without breaking the current OSG API compatibility. While I would favour removing the singletons (and would heavily suggest other designs for OSG 3.0), I perfectly understand and agree with you that such an approach is unacceptable in the short term.

Um... I have yet to see a compelling reason for not using singletons
like Registry or a decent proposal for a replacement. For OSG-3.0 my
plan is to focus on refactoring just the core scene graph and how
rendering is implemented, for the rest of the API changes we should
keep as minimal as possible to try and reduce to headache of porting
from OSG-2.x to OSG-3.x. We can't avoid refactoring the core scene
graph state management to be able to do shader composition, but the
rest of changes we minimize.

> The least disruptive solution I can think of (while being quite robust) would be to introduce an initOsg() and a destroyOsg() function. It's a fairly common approach and is in fact the one mandated by MSDN regarding the limitations of DllMain (and the deadlocks that may follow if violated).

My gut reaction is YUK. Design wise, implementation wise and support
wise I know it's a hack.


> initOsg() would initialize and construct all the global variables of Osg when called, while destroyOsg() would take care of the destruction of such objects.

OK. So this uber initOsg() method would need to initialize all global
variables and hence know about all global variance and hence be
tightly coupled with all global variables. Now if you have a very
simply set of libs to work with, and everything is fixed in it's
relationship this might work, but... if you have a loosely coupled
design that allows you to extend it at runtime what then?? For
instance NodeKits... who inits these? Do we have explict init's for
each NodeKit the user might use. What about NodeKits that are pulled
in at runtime to enable the loading of a particular databases?


> The benefits are twofold:
> - The user would have a better control on the lifetime of the OSG and its global variables (among other things, solving the deadlock issue exposed in this discussion).
> - The user would have the ability to reset the library in a predictable manner, which is currently impossible.
>
> A few points should be observed:
> 1. init/destroyOsg needs to be referenced counted (allowing multiple and re-entrant initializations)
> 2. init/destroyOsg needs to be thread-safe
> 3. init/destroyOsg needs to be aware that osg is divided into several components (e.g. osg, osgDB, osgViewer, etc). It would probably be needed that the user can in fact select which components of osg he wants to init/destroy (in which case points 1 and 2 need to be done per component).

OK state all possible benefits and ignore the monstrous downsides such
as tight coupling and blowing out of the water the extensibility that
the OSG is known for.

I sure want to see a good solution to the problems you are seeing but
I won't idly stand by while the OSG is made less easy to use and less
extensible. You go explain to all the thousands of OSG users that
having to re-factor they code for no gain, whilst compromising in
extensibility that just supporting your particular usage pattern.

What we have at hand is very specific issue that occurs when a very
specific set of things are done in a client application, it's not a
general problem that requires a major re-factor. Careful initiation
at your end may well be the solution.

Robert.

Tanguy Fautre

unread,
Aug 6, 2009, 7:03:07 AM8/6/09
to OpenSceneGraph Users

Hi Robert,

You do have to understand that what is currently happening is an
implicit init/destroyOsg through DllMain. The problem is that DllMain is
full of limitations. Only a handful of kernel functions can be safely
called in DllMain. Worse, Microsoft in all their wisdom does not provide
the list of safe functions.

According to DllMain documentation on MSDN, and due to the fact that
it's an implementation detail, calling any C standard functions in
DllMain is undefined behaviour. Our "particular" scenario is just one
possible way things can go wrong. MSDN documentation states that an
access violation error is also possible.


I've been thinking last night to try finding a solution to the problem
that is less intrusive on the API than an explicit init/destroyOsg. I
was thinking of using a "lazy initialisation" on any singleton that
makes a C function call. For example:

static const char * env = getenv("OSG_GL_EXTENSION_DISABLE");

could be replaced by

static Mutex s_mutex;

const char * getSingletonGlExtensionDisable()
{
ScopedLock lock(s_mutex);

static const char * env = getenv("OSG_GL_EXTENSION_DISABLE");

return env;
}


Such a solution guarantees that no C functions is called in DllMain
during DLL loading (note that the Mutex construction is guaranteed to be
safe by MSDN doc). Although, destruction of the singletons still occurs
in DllMain, so it should be checked that the singleton destructors do
not call any C functions.

I'm aware that such a solution does have a performance impact because of
the mutex (the mutex is mandatory to guarantee thread safety). However,
if an object needs to often access a singleton, it can cache the
returned pointer (once an object's got the singleton, it is safe to use
the returned pointer without having to call getSingleton() again and
again, except of course when DllMain is called to unload the DLL).

It's not a perfect solution, but I think it's better than the current
undefined behaviour. What do you think?

Regards,

Tanguy


-----Original Message-----
From: osg-user...@lists.openscenegraph.org
[mailto:osg-user...@lists.openscenegraph.org] On Behalf Of Robert
Osfield
Sent: 05 August 2009 13:26
To: OpenSceneGraph Users
Subject: Re: [osg-users] Deadlock when loading osg.dll, singletons are
evil

Daniel Trstenjak

unread,
Aug 6, 2009, 7:23:51 AM8/6/09
to OpenSceneGraph Users

Hi Tanguy,

> static Mutex s_mutex;
>
> const char * getSingletonGlExtensionDisable()
> {
> ScopedLock lock(s_mutex);
>
> static const char * env = getenv("OSG_GL_EXTENSION_DISABLE");
>
> return env;
> }

This doesn't make sense, because 'getenv' is called only once, during
the initialization of the static variable 'env'. After that the
value of 'env' doesn't change, but for every call to
'getSingletonGlExtensionDisable' a lock is used.

If the value of 'OSG_GL_EXTENSION_DISABLE' shouldn't change during
runtime:

static Mutex s_mutex;

const char* getSingletonGlExtensionDisable()
{
static char* env = 0L;
if (env == 0L) {
ScopedLock lock(s_mutex);


env = getenv("OSG_GL_EXTENSION_DISABLE");
}

return env;
}


Greetings,
Daniel


--

Daniel Trstenjak Tel : +49 (0)7071-9457-264
science + computing ag FAX : +49 (0)7071-9457-511
Hagellocher Weg 73 mailto: Daniel.T...@science-computing.de
D-72070 Tübingen WWW : http://www.science-computing.de/
--
Vorstand/Board of Management:
Dr. Bernd Finkbeiner, Dr. Roland Niemeier,
Dr. Arno Steitz, Dr. Ingrid Zech
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
Michel Lepert
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196

Tanguy Fautre

unread,
Aug 6, 2009, 7:40:55 AM8/6/09
to OpenSceneGraph Users
Hi Daniel,

Your proposed solution looks like a variant of the Double Checked Locking. Unfortunately, this design pattern is very subtly broken and not thread safe. See http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf for more information.

I've however shown this example just to illustrate a more general problem, as most of the getenv() are (in)directly called from within class constructors. Hence, what we're looking as is something more similar to the following.

static Mutex s_mutex;

Object & getSingletonObject()
{
ScopedLock lock(s_mutex);

static Object object;

return object;
}


Cheers,

Tanguy


-----Original Message-----
From: osg-user...@lists.openscenegraph.org [mailto:osg-user...@lists.openscenegraph.org] On Behalf Of Daniel Trstenjak
Sent: 06 August 2009 12:24
To: OpenSceneGraph Users
Subject: Re: [osg-users] Deadlock when loading osg.dll, singletons are evil

Robert Osfield

unread,
Aug 6, 2009, 7:48:04 AM8/6/09
to OpenSceneGraph Users
Hi Tanguy,

I was wondering about the possibility of wrapping up the access to the
sensitive C functions using a Mutex. We couldn't do it in local code,
but would need to do it via a single custom getenv() etc. function
implementation, otherwise we'd still end up with multiple functions
potentially accessing standard C functions like getenv in a parallel.

Like Daniel's suggestion of moving the Mutex into the getenv init
avoiding the overuse of the locking the mutex, doing the lock in
central function would also achieve this aim.

So perhaps we co do something like:

static Mutex s_mutex;

const char* serialized_getenv(const char* str)
{
ScopedLock lock(s_mutex);
return getenv(str);
}

void* serialized_otherSensitiveCFunction(...)
{
ScopedLock lock(s_mutex);
return otherSensitiveCFunction(...);
}

Then all the init code would call these methods. Note the
implementations would all be in the .cpp's.

Thoughts?

Tanguy Fautre

unread,
Aug 6, 2009, 7:53:37 AM8/6/09
to OpenSceneGraph Users
Hi Robert,

Unfortunately serializing the access to getenv() is not likely to solve
the issue. The problem is not that OSG calls several getenv() in
parallel, the problem is that getenv() should not be called *at all* in
DllMain().

Serializing getenv() in OSG does not guarantee that another C function
will not be called in another thread unrelated to OSG. In fact, this is
exactly what is happening to us.

Wrt to Daniel's suggestion, it's a variant of the double checked
locking, which is unfortunately subtly broken (cf. previous post).


Regards,

Tanguy


-----Original Message-----
From: osg-user...@lists.openscenegraph.org
[mailto:osg-user...@lists.openscenegraph.org] On Behalf Of Robert
Osfield
Sent: 06 August 2009 12:48
To: OpenSceneGraph Users
Subject: Re: [osg-users] Deadlock when loading osg.dll, singletons are
evil

Daniel Trstenjak

unread,
Aug 6, 2009, 8:03:14 AM8/6/09
to OpenSceneGraph Users

Hi Tanguy,

> Your proposed solution looks like a variant of the Double Checked
> Locking. Unfortunately, this design pattern is very subtly broken and
> not thread safe. See
> http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf for more
> information.

It depends what you want to make thread safe. If the call to
'getSingletonGlExtensionDisable' should be thread safe, than
you're right. If it's enough to make the call to 'getenv'
thread safe, than the solution should be alright.

Robert Osfield

unread,
Aug 6, 2009, 8:13:40 AM8/6/09
to OpenSceneGraph Users
Hi Tanguy,

On Thu, Aug 6, 2009 at 12:53 PM, Tanguy
Fautre<tan...@aristechnologies.com> wrote:
> Unfortunately serializing the access to getenv() is not likely to solve
> the issue. The problem is not that OSG calls several getenv() in
> parallel, the problem is that getenv() should not be called *at all* in
> DllMain().
>
> Serializing getenv() in OSG does not guarantee that another C function
> will not be called in another thread unrelated to OSG. In fact, this is
> exactly what is happening to us.

OK... but as yet I'm still not clear on why you suggestion avoids the
init. Is it simple that static's in the global scope are getting
initialized automatically, while ones inside methods are only
initialized on first call... So you've moved the static initializer
into the method...

Tanguy Fautre

unread,
Aug 6, 2009, 8:23:38 AM8/6/09
to OpenSceneGraph Users
Hi Robert,

As you said, doing (2)

const char * getSingletonGlExtensionDisable()
{
ScopedLock lock(s_mutex);
static const char * env = getenv("OSG_GL_EXTENSION_DISABLE");
return env;
}

instead of (1)

static const char * env = getenv("OSG_GL_EXTENSION_DISABLE");

Ensures that the singleton initialization occurs only when the function
getSingletonGlExtensionDisable is called.

The problem is that solution (1) implies that the initialization is done
in DllMain: when osg.dll is loaded, the singleton will be automatically
initialized, which leads to getenv() being called in DllMain.

Solution (2) delays the initialization until the function getSingleton()
is called. If there is no global singleton left (or more precisely, if
there is no global scoped variable initialization code calling
getSingleton()), then you have the guarantee that the singleton will
only be initialized when the user starts explicitly calling osg
functions/classes.

In other words, solution (2) guarantees that DllMain has long been
executed before any singleton initialization occurs. Which is exactly
want we want.

Regards,

Tanguy


-----Original Message-----
From: osg-user...@lists.openscenegraph.org
[mailto:osg-user...@lists.openscenegraph.org] On Behalf Of Robert
Osfield
Sent: 06 August 2009 13:14
To: OpenSceneGraph Users
Subject: Re: [osg-users] Deadlock when loading osg.dll, singletons are
evil

Robert Osfield

unread,
Aug 6, 2009, 8:27:33 AM8/6/09
to OpenSceneGraph Users
Hi Tanguy,

As a first step I've just moved the GLExtension.cpp code that does the
static getenv() into the osg::getGLExtensionDisableString() method so
it now reads:

std::string& osg::getGLExtensionDisableString()
{


static const char* envVar = getenv("OSG_GL_EXTENSION_DISABLE");

static std::string
s_GLExtensionDisableString(envVar?envVar:"Nothing defined");

return s_GLExtensionDisableString;
}

This isn't yet thread safe, but since the problem in hand isn't
directly a threading one this should be at least a little step towards
helping solve the problem. Modified file attached.

Could you try this modification out at your end to see if it is
appropriate step forward. A quick search shows other places where
there is static getenv in the global scope so I don't expect this one
tweak to be a complete solution, but if it does look like a way
forward then we can start rolling the fix out more generally.

Cheers,
Robert.

GLExtensions.cpp

Tanguy Fautre

unread,
Aug 6, 2009, 8:58:58 AM8/6/09
to OpenSceneGraph Users
Hi Robert,

We've got a custom build of OSG where we've commented out all the unsafe
getenv (we do not use env variables in our application anyway).

I'm gonna give your patch a few runs. In theory, it should not deadlock
(considering all the other unsafe getenv are already commented out).


Cheers,

Tanguy


-----Original Message-----
From: osg-user...@lists.openscenegraph.org
[mailto:osg-user...@lists.openscenegraph.org] On Behalf Of Robert
Osfield
Sent: 06 August 2009 13:28
To: OpenSceneGraph Users
Subject: Re: [osg-users] Deadlock when loading osg.dll, singletons are
evil

Hi Tanguy,

return s_GLExtensionDisableString;
}

Cheers,
Robert.

Tanguy Fautre

unread,
Aug 7, 2009, 10:14:01 AM8/7/09
to OpenSceneGraph Users
Hi Robert,

I had the chance to test your patch. As expected, it fixes the problem.
The debugger shows the singleton is only initialized well after DllMain
has been called. No deadlock to report.

As you stated, we need to fix all the getenv() that can be called from
constructing a global singleton. We've seen our app deadlocking on them
before.

Not sure if you want to apply the patch as it stands though, as
thread-safety is quite important to us.

Cheers,

T

Robert Osfield

unread,
Aug 7, 2009, 10:51:01 AM8/7/09
to OpenSceneGraph Users
Hi Tanguy,

On Fri, Aug 7, 2009 at 3:14 PM, Tanguy
Fautre<tan...@aristechnologies.com> wrote:
> I had the chance to test your patch. As expected, it fixes the problem.
> The debugger shows the singleton is only initialized well after DllMain
> has been called. No deadlock to report.

Great, a good first step then.

> As you stated, we need to fix all the getenv() that can be called from
> constructing a global singleton. We've seen our app deadlocking on them
> before.

With your other work I presume you've not enumerated all the files
that have this global static's that use getenv? Do you have a list?

My thought is to look at the usage pattern and then come up with
standard implementation pattern for these variables. Clearly it'll
involve a singleton access method/function, but this might be
something we can use macro's or templates to help keep regular.

> Not sure if you want to apply the patch as it stands though, as
> thread-safety is quite important to us.

Init is usually done single threaded, is there something specific
about the threading model you have that concerns you? There may be
cases which threading may be an issue, but most cases I'd guess as not
being threaded, we'll want to avoid adding mutex locks across the
board.

Tanguy Fautre

unread,
Aug 7, 2009, 12:56:48 PM8/7/09
to OpenSceneGraph Users
Hi Robert,

>From our SVN log I've extracted a unified diff of the modifications I've
had to make on our branch of OSG (i.e. a stabilized trunk snapshot taken
around OSG 2.9.3). This should give a fair idea of the places where I
found a potentially dangerous getenv(). See attached file.

Be careful! It's not impossible I've missed a couple (especially in
other modules than osg.dll).

But as importantly, I've probably commented out a couple false positives
(i.e. getenv() that are perfectly safe). I can probably help you making
a more exhaustive and safer list by using the debugger to see exactly
which getenv() are really getting called in DllMain() (note that's what
I already did for most, but not all).


> Init is usually done single threaded, is there something specific
> about the threading model you have that concerns you? There may be
> cases which threading may be an issue, but most cases I'd guess as not
> being threaded, we'll want to avoid adding mutex locks across the
> board.

The problem with the getSingleton() solution is that you cannot tell
when they will be called. Hence, even with the best intentions in the
world you cannot guarantee that two threads won't call getSingleton() at
the same time. I'm afraid that if we don't make the getSingleton()
functions thread-safe, the fix will actually introduce more threading
bugs than it actually fixes.

I guess that what is needed to reduce the impact of locking in
getSingleton is to:

1. Check whether a particular getSingleton() is potentially called
frequently (if it is not, we can probably afford the lock).

2. Check whether the call to getSingleton() can be cached by the
object(s) using it. For example, if an object needs to call getSingleton
quite enough, then it should call it in its constructor and cache the
returned address/reference in a member variable.

3. See whether the call to getSingleton() is really necessary and
whether it could be moved somewhere else where it does not have to be
called that often. (same idea as 2.)


Cheers,

Tanguy

-----Original Message-----
From: osg-user...@lists.openscenegraph.org
[mailto:osg-user...@lists.openscenegraph.org] On Behalf Of Robert
Osfield
Sent: 07 August 2009 15:51
To: OpenSceneGraph Users
Subject: Re: [osg-users] Deadlock when loading osg.dll, singletons are
evil

FilesToBeModified.txt
Reply all
Reply to author
Forward
0 new messages