I'm having this problem that was earlier described by Eric Hughes on  
this list:
http://thread.gmane.org/gmane.comp.encryption.cryptopp/2305
Here is the buildbot (automated unit tester) which shows how the  
problem manifests when I run my unit tests:
http://allmydata.org/buildbot-pycryptopp/builders/gutsy-syslib/builds/20/steps/test/logs/stdio
I've built a dynamic library named 'ecdsa.so' which depends on  
libcryptopp.so.  When I try to use it I get this error:
terminate called after throwing an instance of 'CryptoPP::NameValuePairs::ValueTypeMismatch'
   what():  NameValuePairs: type mismatch for 'InputBuffer', stored 'N8CryptoPP23ConstByteArrayParameterE', trying to retrieve 'N8CryptoPP23ConstByteArrayParameterE'
The problem as I currently understand it is that the  
ConstByteArrayParameter class is defined in algparam.h, which is  
(transitively) #include'd by my code, and so when I build my  
ecdsa.so, any RTTI code in ecdsa.so (for example the typeid operator)  
will identify the copy of ConstByteArrayParameter that was built into  
ecdsa.so.  If at run-time I want to compare type information, such as  
for the named-argument features which Crypto++ uses, then if one of  
the arguments came from libcryptopp.so and the other came from  
ecdsa.so, or equivalently if libcryptopp.so is throwing an exception  
of a given type and ecdsa.so is catching exceptions of that type,  
then the types will not compare equal and the program will fail.
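
To make the failure mode concrete, here is a minimal sketch of a named-parameter slot that records the type_info of whatever was stored and refuses retrieval under any other type_info. DemoNamedValue and its members are hypothetical stand-ins written for this illustration, not the Crypto++ API. Within a single binary the check behaves as expected; the error quoted above is what such a check would report if the storing and retrieving code each held their own copy of the type_info for the same class and the two copies did not compare equal.

```cpp
#include <stdexcept>
#include <string>
#include <typeinfo>
#include <utility>

// Hypothetical stand-in for a named-parameter slot; NOT the Crypto++ API.
class DemoNamedValue {
public:
    template <class T>
    DemoNamedValue(std::string name, const T& value)
        : name_(std::move(name)), type_(&typeid(T)), value_(&value) {}

    // Copies the stored value into 'out' only if the type_info of the
    // requested type compares equal to the type_info recorded at store time.
    template <class T>
    void Get(T& out) const {
        if (*type_ != typeid(T)) {
            throw std::runtime_error(
                "type mismatch for '" + name_ + "', stored '" +
                type_->name() + "', trying to retrieve '" +
                typeid(T).name() + "'");
        }
        out = *static_cast<const T*>(value_);
    }

private:
    std::string name_;
    const std::type_info* type_;  // type identity as seen by the storing code
    const void* value_;           // non-owning; the caller keeps the object alive
};
```

Storing an int and reading it back as an int succeeds, while reading it back as a long throws. The cross-DSO case cannot be reproduced in one file, since it needs two separately linked modules.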
I've spent about a day learning about dynamic linking and visibility  
and so forth, and I don't see a clean way to fix this without  
changing the Crypto++ source code.  There is an unclean way to fix  
it, which is to set the RTLD_GLOBAL flag before dlopening ecdsa.so.   
This was offered as a solution by Geoff Beier:
http://allmydata.org/pipermail/tahoe-dev/2009-February/001153.html
http://www.opengroup.org/onlinepubs/009695399/functions/dlopen.html
I think what this does is make it so that symbols defined in  
ecdsa.so when it is loaded are used to resolve libcryptopp.so's needs  
for symbols.  This happens to fix the two known problems that my  
project currently encounters, but it seems fragile.  For example, if  
ecdsa.so is the one raising the exception and libcryptopp.so is the  
one trying to catch it, this won't work, will it?  Also if some  
program has unluckily loaded libcryptopp.so before it loads my  
ecdsa.so, then it will be too late for my RTLD_GLOBAL to take effect.
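
For reference, a rough sketch of the RTLD_GLOBAL workaround at the dlopen() level, plus a probe for the "already loaded" case mentioned above. The helper names (DemoLoadResult, demo_load_global, demo_already_loaded) are made up for this illustration. RTLD_NOLOAD is a glibc extension rather than part of POSIX, so the probe is an assumption about the platform, and older glibc versions need -ldl at link time.

```cpp
#include <dlfcn.h>
#include <string>

// Result of a load attempt: a handle on success, otherwise an error message.
struct DemoLoadResult {
    void* handle;
    std::string error;
};

// Opens 'path' with RTLD_GLOBAL, so that the symbols it defines are made
// available for resolving references in shared objects loaded afterwards.
inline DemoLoadResult demo_load_global(const char* path) {
    dlerror();  // clear any stale error state
    void* handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle) return {handle, ""};
    const char* message = dlerror();
    return {nullptr, message ? message : "unknown dlopen failure"};
}

// Reports whether 'soname' is already resident in this process.
// RTLD_NOLOAD returns a handle only if the library is already loaded.
inline bool demo_already_loaded(const char* soname) {
    void* handle = dlopen(soname, RTLD_LAZY | RTLD_NOLOAD);
    if (!handle) return false;
    dlclose(handle);  // drop the extra reference taken by the probe
    return true;
}
```

A loader could call demo_already_loaded("libcryptopp.so") before opening ecdsa.so and at least warn when the ordering concern applies; it does not make the ordering problem go away.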
The only clean, long-term solution I can think of is to change  
Crypto++ to remove definitions from header files so that code like  
mine will get only undefined symbols by #include'ing Crypto++ header  
files.  For example, if ConstByteArrayParameter were declared as  
"class ConstByteArrayParameter;" in algparam.h and defined only in  
some implementation file such as "algparam.cpp", then the compiler  
and linker, when building my ecdsa.so, would know that they didn't  
know the actual type of ConstByteArrayParameter, and any RTTI code  
would be compiled to use an indirect symbol that would not be  
resolved until load time.  Is that right?
Is there any easier solution for me?  I'd like to build ecdsa.so so  
that it requires libcryptopp.so at load-time, and so that ecdsa.so  
gets all Crypto++ symbols resolved at load-time from libcryptopp.so's  
definitions of those symbols.
Thanks!
Regards,
Zooko
I'm not sure I understand what you're suggesting. If algparam.h only 
contains "class ConstByteArrayParameter;", how do users call its members?
Is this still a Mac specific problem, BTW?
 
> I'm not sure I understand what you're suggesting. If algparam.h  
> only contains "class ConstByteArrayParameter;", how do users call  
> its members?
Oh, I'm sorry.  I was vaguely thinking of the way in C that you can  
separate definitions from declarations, but as my knowledge of C++  
slowly and painfully swaps back into my head, I remember that it  
isn't so simple.
How about this question:  is there a way in C++ to say that the  
following identifier denotes something which *must* not be treated as  
defined in the current compilation module?  This is what "extern"  
does in C, but "extern" doesn't apply to classes.
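
One partial answer, as far as I understand the Itanium C++ ABI that g++ follows: for a class with virtual functions, the vtable and type_info are emitted only in the translation unit that defines the class's "key function", meaning the first virtual function that is not pure and not defined inline. A sketch follows, with DemoMismatch as a hypothetical class. This does not help for classes without virtual functions, whose type_info is emitted as a weak symbol wherever typeid is used.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception class. In a real library this declaration would
// live in a header that client code #includes.
class DemoMismatch : public std::runtime_error {
public:
    explicit DemoMismatch(const std::string& what)
        : std::runtime_error(what) {}

    // Declared here but defined out of line, which makes it the key
    // function: translation units that only see this header reference the
    // vtable and type_info as undefined symbols instead of emitting copies.
    ~DemoMismatch() noexcept override;
};

// In a real library this definition would live in exactly one .cpp file
// inside the shared library. Defaulting it outside the class keeps it
// non-inline.
DemoMismatch::~DemoMismatch() noexcept = default;

// Throws and catches within one module, just to show the class is usable.
inline bool demo_throw_and_catch() {
    try {
        throw DemoMismatch("type mismatch");
    } catch (const DemoMismatch&) {
        return true;
    } catch (...) {
        return false;
    }
}
```

Whether this is enough for throw and catch to agree across two DSOs also depends on symbol visibility and on how the DSOs are loaded, which a single-file example cannot show.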
> Is this still a Mac specific problem, BTW?
No, it seems to be general to gcc.  It is blocking me from getting  
Tahoe-LAFS into Debian and Fedora.  It also strikes on Mac.
Regards,
Zooko
It is as I feared -- using the RTLD_GLOBAL flag for dlopen solves the  
problem in question but leads to other problems.
My project produces four different .so files, each of which is built  
by including some of the .o files from Crypto++.  If I set  
RTLD_GLOBAL then if more than one of these .so files is loaded into  
the same process, the second and later ones to be loaded have  
something messed up which quickly leads to a crash.  Attached is the  
output of valgrind showing the details of one such crash.  (This is  
all 100% reproducible using the pycryptopp unit test suite.)
What I really want is for there to exist some way in C++ that you can  
express the following alternatives:
1.  For a given symbol, for example the type_info of an exception  
class, any code which #included that symbol, when loaded at  
run-time, will get the same unique value so that throw and catch and  
name-based arguments will work between any pair of DSOs.
2.  For other symbols, any code which #includes that code will get  
its own separate address (symbol value) at run-time so that changes  
made to the value stored in that address (symbol) by one DSO won't  
affect other DSOs.
I *think* that this is the intent of __declspec(dllexport) and its  
brethren.
However, not understanding how to make this work with g++, if it is  
even possible, or if it is even a coherent thing to want, my next  
step is to declare that you can't have more than one DSO which uses  
Crypto++ code in your process, so I'll refactor my pycryptopp library  
to build all of the four features (AES, SHA256, RSA, and ECDSA), as  
well as upcoming features (XSalsa20, Tiger) in one DSO which is  
linked by including .o files from Crypto++.  This will work as long  
as nobody tries to use my DSO along with another DSO which also uses  
Crypto++.
Oh, there's another alternative open to me -- make the pycryptopp  
build system build a custom Crypto++ DSO (or maybe just the standard  
Crypto++ DSO as specified by the Crypto++ GNUmakefile), and then  
build four DSOs each of which dynamically links to that shared DSO  
and continue to use RTLD_GLOBAL.  This is actually already shown to  
work for our current unit test suite, but I don't trust it because I  
don't understand why RTLD_GLOBAL causes these crashes in other  
situations.
Regards,
Zooko
On May 23, 2009, at 21:03 PM, Zooko Wilcox-O'Hearn wrote:
> my next step is to declare that you can't have more than one DSO  
> which uses Crypto++ code in your process, so I'll refactor my  
> pycryptopp library to build all of the four features (AES, SHA256,  
> RSA, and ECDSA), as well as upcoming features (XSalsa20, Tiger) in  
> one DSO which is linked by including .o files from Crypto++.  This  
> will work as long as nobody tries to use my DSO along with another  
> DSO which also uses Crypto++.
This was imprecise.  There are two known problems.  One is if you do  
*not* turn on the RTLD_GLOBAL flag for dlopen(), and you try to pass  
a type_info between DSOs, such as by throwing an exception from  
libcryptopp.so and catching that exception in rsa.so, or such as by  
using the named-arguments feature.  (I'm not sure precisely how that  
latter one results in type_info crossing a DSO boundary, but  
apparently it does.)
This is the problem that Eric Hughes reported two and a half years  
ago [1] and that I started trying to solve a week ago [2].
The other is if you *do* turn on the RTLD_GLOBAL flag for dlopen(),  
and you try to load multiple DSOs which use symbols by the same name  
(because they each separately #included those symbols from Crypto++  
header files), but which are supposed to be private to the DSO.  This  
is the second sort of failure that I reported yesterday along with a  
stack trace from valgrind: [3].
So, if I go the first route, leaving RTLD_GLOBAL off and packing  
together all my crypto functionality into one DSO, then probably no  
harm will result because exceptions and named-arguments are not part  
of the API of my modules, therefore presumably nobody will ever try  
to catch exceptions thrown from my DSO.
The sticking point here is that Debian and Fedora have a policy that  
any code which uses a library *must* be linked against the  
system-provided shared library.  The Tahoe-LAFS project, if it is to  
be included in Debian and Fedora, is not allowed to build its own  
copy of Crypto++ internally -- it is required to re-use the  
system-provided shared library of Crypto++.
Hm.  I'm not sure, but I think that means I will have to implement  
*both* of these workarounds.  I'll have to turn on RTLD_GLOBAL so  
that I can link against the system-provided libcryptopp.so on those  
two operating systems, and I'll also have to bundle my crypto code  
together into a single DSO in order to avoid the symbol collisions  
caused by turning on RTLD_GLOBAL.
Sigh.  I really feel like there must be a general solution to this.   
I suspect that the __declspec(dllexport) machinery that is already baked  
into Crypto++ for building DSOs on Windows (DLLs) could probably be  
used to solve my problem if only I understood it better.  See also  
http://gcc.gnu.org/wiki/Visibility .
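
As I read that wiki page, the g++ counterpart of the Windows export machinery is a visibility attribute, usually wrapped in a macro. Below is a minimal sketch of the two alternatives I listed earlier: a type meant to be shared across DSOs, and state meant to stay private to one DSO. DEMO_API, DEMO_LOCAL, DemoSharedError, demo_private_counter and demo_bump are all hypothetical names, and I have not verified that this resolves the Crypto++ case.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical export macros in the style described on the GCC wiki.
#if defined(_WIN32)
#  define DEMO_API __declspec(dllexport)
#  define DEMO_LOCAL
#elif defined(__GNUC__) && __GNUC__ >= 4
#  define DEMO_API __attribute__((visibility("default")))
#  define DEMO_LOCAL __attribute__((visibility("hidden")))
#else
#  define DEMO_API
#  define DEMO_LOCAL
#endif

// Alternative 1: a type whose type_info should be shared between DSOs,
// so that a throw in one DSO can match a catch clause in another.
class DEMO_API DemoSharedError : public std::runtime_error {
public:
    explicit DemoSharedError(const std::string& what)
        : std::runtime_error(what) {}
};

// Alternative 2: state that should stay private to the DSO defining it.
// Hidden symbols are left out of the dynamic symbol table, so another
// DSO's symbol of the same name cannot be bound in their place.
DEMO_LOCAL int demo_private_counter = 0;

DEMO_LOCAL int demo_bump() {
    return ++demo_private_counter;
}
```

In practice this is usually combined with building the library with -fvisibility=hidden, so that only what is explicitly marked DEMO_API is exported.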
Thanks!
Regards,
Zooko
[1] http://thread.gmane.org/gmane.comp.encryption.cryptopp/2305
[2] http://groups.google.com/group/cryptopp-users/browse_thread/thread/eb815f228db50380
[3] http://groups.google.com/group/cryptopp-users/msg/1a5553410c6976e5
>  I suspect that the __declspec(dllexport) machinery that is already baked
>  into Crypto++ for building DSOs on Windows (DLLs) could probably be
>  used to solve my problem....
__declspec(dllexport) is used to export variables, functions, and
classes [1]. In C++, functions are exported with mangled names, so
plain functions are often also declared 'extern "C"'. (Also of
interest might be 'Using dllimport and dllexport in C++ Classes' [2].)
Richter gives the subject a very nice treatment in 'Programming
Applications for Microsoft Windows' and its successor 'Windows via
C/C++'. If the topic had been covered by W. Richard Stevens, the Unix
programming series would be a great reference. Unfortunately, shared
objects were not around when those books were written.
[1] http://msdn.microsoft.com/en-us/library/3y1sfaz2.aspx
[2] http://msdn.microsoft.com/en-us/library/81h27t8c.aspx
> Zooko, have you tried asking for advice on other mailing lists?
Okay, some googling about showed me a mailing list that is likely to  
help -- the Python cplusplus-sig list.  Here is my summary of the  
problem and the four possible solutions that I can think of:
http://mail.python.org/pipermail/cplusplus-sig/2009-May/014531.html
Please read it yourself in case doing so provides some flash of  
insight that you can share with me.
Thank you,
By the way, who added the features to GNUmakefile to build a "dll"  
using gcc?  And what is it for?
I'm guessing that this is for building a DLL on Windows where "gcc"  
means mingw.  I've been tinkering with porting it to Linux in order  
to build a dynamic library (.so) on Linux which exports only the  
symbols marked by "CRYPTOPP_DLL".  There appears to be some bitrot,  
for example the DLLSRCS variable in GNUmakefile seems to omit  
some .cpp files that are necessary, possibly because those .cpp files  
were added after this feature of the GNUmakefile was added.
Regards,
Zooko