exceptions across DSO boundary - unrecognised when caught

mitch

Nov 11, 2009, 1:18:57 PM

Hi all

Would much appreciate a little help with a problem. In short, my
executable is linked (implicitly) to a shared library X.so. X.so uses

dlopen(path.c_str(), RTLD_NOW | RTLD_DEEPBIND | RTLD_GLOBAL)

to open Y.so, then dlsym() to obtain an exported C function from Y.
The exported C function returns an object of class P, and calling a
member function of P raises an exception. This exception leaves Y as a
known type, and is caught as (...), i.e. unrecognised, in X.
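
For readers following along, the Y side of this arrangement might look roughly like the sketch below. The names make_p and run, the message member, and the choice to return a pointer rather than an object are all assumptions made for illustration; the post does not give them.

```cpp
#include <string>

namespace brahms { namespace error {
// Exception type shared by X and Y. The message member is hypothetical.
struct __attribute__((visibility("default"))) Error {
    std::string message;
    explicit Error(const std::string& m) : message(m) {}
};
} }

// Stand-in for the class the post calls P; run() is a hypothetical name.
class P {
public:
    void run() { throw brahms::error::Error("raised inside Y"); }
};

// Hypothetical exported C entry point that X would look up with dlsym().
extern "C" __attribute__((visibility("default"))) P* make_p() {
    return new P();
}
```

Within a single image, a catch (const brahms::error::Error&) around p->run() matches as expected; the trouble described here only appears once the throw and the catch live in different DSOs.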

I think I understand the principles of this problem, but try as I
might (and I have tried a lot), I just can't get an exception thrown
in Y to be caught as the same type in X. I know this is sort of an
FAQ, and I've read a few how-tos about it over the past couple of
days, but I must still be doing something wrong. Perhaps someone can
point out my silly mistake?

The exception class (brahms::error::Error) is declared in error.h, and
defined in error.cpp, which gets compiled into X.so. I run:

nm -C -D X.so | grep error | grep typeinfo

and get...

000afb84 V typeinfo for brahms::error::Error
00096b5b V typeinfo name for brahms::error::Error

I compile Y.so, which includes error.h, and run the same nm line on
that to give

0002ae44 V typeinfo for brahms::error::Error
0002409d V typeinfo name for brahms::error::Error

That is, the brahms::error::Error typeinfo symbol is "V" in both DSOs,
as is "typeinfo name". I can't decode "typeinfo name", nor can I
really say I understand the significance of "V", as it is described in
man nm (it will defer to a "normally defined" symbol, but I can't find
a definition of that, nor any indication of how to generate one for a
typeinfo). My compile lines are:

g++ -D_GNU_SOURCE -fPIC -Werror -Wall -ffast-math -pthread -O3 \
    -fvisibility=hidden -fvisibility-inlines-hidden X.cpp -o X.so.0.7.3

g++ -D_GNU_SOURCE -fPIC -Werror -Wall -ffast-math -pthread -O3 \
    -fvisibility=hidden -fvisibility-inlines-hidden -shared Y.cpp -o Y.so.0.7.3

(some local information elided). I am using the "hidden" flags on the
advice of http://gcc.gnu.org/wiki/Visibility, but I was having this
problem before I tried that approach, so I am confident it's unrelated.
Since adding the hidden flags, my exception class is declared with an
attribute, thus:

struct __attribute__ ((visibility("default"))) Error
{
Error();
...
};

in both builds (i.e. there's no macro magic about that attribute, it's
exactly as written above).

Running the following:

LD_DEBUG=files,symbols,bindings myapp > out 2>&1

generates an awful lot of data, of which I imagine the following is
relevant:

...
27234: symbol=_ZTSN6brahms5error5ErrorE; lookup in file=brahms-execute [0]
27234: symbol=_ZTSN6brahms5error5ErrorE; lookup in file=/usr/lib/libXt.so.6 [0]
27234: symbol=_ZTSN6brahms5error5ErrorE; lookup in file=/usr/lib/libXaw.so.7 [0]
27234: symbol=_ZTSN6brahms5error5ErrorE; lookup in file=/lib/tls/i686/cmov/librt.so.1 [0]
27234: symbol=_ZTSN6brahms5error5ErrorE; lookup in file=X.so [0]
27234: binding file X.so [0] to X.so [0]: normal symbol `_ZTSN6brahms5error5ErrorE'
...
27234: symbol=_ZTSN6brahms5error5ErrorE; lookup in file=/home/user/SystemML/bin/Y.so [0]
27234: binding file /home/user/SystemML/bin/Y.so [0] to /home/user/SystemML/bin/Y.so [0]: normal symbol `_ZTSN6brahms5error5ErrorE'
...
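
The mangled symbol in that output can be decoded programmatically, which may help when matching LD_DEBUG lines against the nm -C listing. A minimal sketch, assuming a GCC-compatible toolchain that provides <cxxabi.h>; the helper name demangle is made up for this example.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Returns the demangled form of a symbol, or the input unchanged if
// the demangler rejects it.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result
    // with malloc, so it must be released with free.
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

Passing "_ZTSN6brahms5error5ErrorE" yields "typeinfo name for brahms::error::Error", the same text nm -C prints; the _ZTS prefix marks the type's name string and _ZTI marks the typeinfo object itself.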

I note there is a path on the second binding but not the first;
relevant? I've no idea. In any case, it seems to agree with the
unrecognised-exception problem: from X the symbol is bound to the copy
in X, and from Y to the copy in Y. What I think I'm seeking is for the
symbol in Y to get "bound" (in truth, I'm not 100% sure exactly what
that means) to the one in X, so that both agree on the address-based
typeinfo comparison.
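
To make the "address-based" point concrete, here is a toy model of two DSOs each carrying their own copy of the type's name string. FakeTypeInfo is a stand-in invented for this sketch, not the real std::type_info, and whether a particular libstdc++ build compares by pointer or by string is a configuration detail not asserted here.

```cpp
#include <cstring>

// Toy stand-in for a typeinfo record: just a pointer to the name string.
struct FakeTypeInfo {
    const char* name;
};

// Two separate copies of the same name, as if one lived in X.so and
// the other in Y.so and the dynamic linker had not merged them.
static const char name_in_x[] = "N6brahms5error5ErrorE";
static const char name_in_y[] = "N6brahms5error5ErrorE";
static const FakeTypeInfo error_in_x = { name_in_x };
static const FakeTypeInfo error_in_y = { name_in_y };

// Pointer comparison: only succeeds if both sides bind to one copy.
bool same_by_address(const FakeTypeInfo& a, const FakeTypeInfo& b) {
    return a.name == b.name;
}

// String comparison: succeeds even when the copies are distinct.
bool same_by_name(const FakeTypeInfo& a, const FakeTypeInfo& b) {
    return std::strcmp(a.name, b.name) == 0;
}
```

With pointer comparison the two copies look like different types, which is the symptom described above; binding both DSOs to a single copy of the symbol makes the pointers equal.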

My hope is that I'm just passing a wrong flag to the compiler, or some
such?
Any help much appreciated :)

Ben

mitch

Nov 11, 2009, 1:21:49 PM

> My compile lines are:
>
> g++ -D_GNU_SOURCE -fPIC -Werror -Wall -ffast-math -pthread -O3 \
>     -fvisibility=hidden -fvisibility-inlines-hidden X.cpp -o X.so.0.7.3

(in reply to myself)

Beg pardon - this compile line also has "-shared" (i.e. they both do).

mitch

Nov 15, 2009, 6:06:12 PM

Finally got to the bottom of this, so I'll post my mistake here in
case it helps another.

Trouble was that I was specifying RTLD_DEEPBIND to dlopen(), which
"[places] the lookup scope of the symbols in this library ahead of the
global scope" (man dlopen). Hence the output from "LD_DEBUG..." has
the second binding made immediately, as the symbol is found in the
first library searched (Y.so). I've lost track in the mists of time of
why I was using DEEPBIND at all, and given the reason for its original
introduction - "This can be used to work around problems of morons who
cannot keep ABIs stable and have the same symbol name in different
DSOs." (Ulrich Drepper) - I don't see that it's needed here.
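
A sketch of what the loading side in X could look like with the fix applied, i.e. the same dlopen() call minus RTLD_DEEPBIND. The names call_plugin, Outcome and entry_fn, and the void() signature of the exported function, are assumptions for illustration only; older glibc versions also need -ldl at link time.

```cpp
#include <dlfcn.h>
#include <iostream>
#include <string>

namespace brahms { namespace error {
// Stand-in for the shared exception type declared in error.h.
struct __attribute__((visibility("default"))) Error {};
} }

// Hypothetical signature of the C function exported by the plugin.
typedef void (*entry_fn)();

enum class Outcome { LoadFailed, NoSymbol, Ok, CaughtTyped, CaughtUnknown };

Outcome call_plugin(const std::string& path, const char* symbol) {
    // No RTLD_DEEPBIND: the plugin's typeinfo lookups go through the
    // global scope first, so they can bind to the copy already loaded.
    void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_GLOBAL);
    if (handle == nullptr) {
        const char* why = dlerror();
        std::cerr << (why ? why : "dlopen failed") << '\n';
        return Outcome::LoadFailed;
    }
    void* sym = dlsym(handle, symbol);
    if (sym == nullptr) {
        dlclose(handle);
        return Outcome::NoSymbol;
    }
    entry_fn fn = reinterpret_cast<entry_fn>(sym);
    try {
        fn();
        return Outcome::Ok;
    } catch (const brahms::error::Error&) {
        return Outcome::CaughtTyped;    // the type was recognised
    } catch (...) {
        return Outcome::CaughtUnknown;  // the symptom from the first post
    }
}
```

The handle is deliberately left open on the success path, since unloading the plugin while its code or typeinfo may still be in use would invite other problems.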

Any chance someone here is in a position to add a short note to this
effect to http://gcc.gnu.org/faq.html#dso ? No matter how many times I
read that FAQ, it didn't click...

cheers
ben
