I'm having the problem that Eric Hughes described earlier on this
list:
http://thread.gmane.org/gmane.comp.encryption.cryptopp/2305
Here is the buildbot (automated unit tester) which shows how the
problem manifests when I run my unit tests:
http://allmydata.org/buildbot-pycryptopp/builders/gutsy-syslib/builds/20/steps/test/logs/stdio
I've built a dynamic library named 'ecdsa.so' which depends on
libcryptopp.so. When I try to use it I get this error:
    terminate called after throwing an instance of 'CryptoPP::NameValuePairs::ValueTypeMismatch'
      what(): NameValuePairs: type mismatch for 'InputBuffer', stored 'N8CryptoPP23ConstByteArrayParameterE', trying to retrieve 'N8CryptoPP23ConstByteArrayParameterE'
The problem as I currently understand it is that the
ConstByteArrayParameter class is defined in algparam.h, which my code
(transitively) #includes. So when I build ecdsa.so, any RTTI code in
ecdsa.so (for example, the typeid operator) refers to the copy of
ConstByteArrayParameter's type information that was built into
ecdsa.so. If at run-time the program compares type information, as
Crypto++'s named-argument feature does, and one of the arguments came
from libcryptopp.so while the other came from ecdsa.so, then the types
will not compare equal and the program will fail. The same happens if
libcryptopp.so throws an exception of a given type and ecdsa.so tries
to catch exceptions of that type.
I've spent about a day learning about dynamic linking and visibility
and so forth, and I don't see a clean way to fix this without
changing the Crypto++ source code. There is an unclean way to fix
it, which is to set the RTLD_GLOBAL flag before dlopening ecdsa.so.
This was offered as a solution by Geoff Beier:
http://allmydata.org/pipermail/tahoe-dev/2009-February/001153.html
http://www.opengroup.org/onlinepubs/009695399/functions/dlopen.html
I think what this does is make the symbols that ecdsa.so defines, once
it is loaded, available for resolving libcryptopp.so's symbol
references. This happens to fix the two known problems that my
project currently encounters, but it seems fragile. For example, if
ecdsa.so is the one raising the exception and libcryptopp.so is the
one trying to catch it, this won't work, will it? Also if some
program has unluckily loaded libcryptopp.so before it loads my
ecdsa.so, then it will be too late for my RTLD_GLOBAL to take effect.
The only clean, long-term solution I can think of is to change
Crypto++ to remove definitions from header files so that code like
mine gets only undefined symbols by #include'ing Crypto++ header
files. For example, if ConstByteArrayParameter were declared as
"class ConstByteArrayParameter;" in algparam.h and defined only in
some implementation file such as "algparam.cpp", then the compiler and
linker building my ecdsa.so would know that they didn't know the
actual type of ConstByteArrayParameter, and any RTTI code would be
compiled to use an indirect symbol that would not be resolved until
load time. Is that right?
Is there any easier solution for me? I'd like to build ecdsa.so so
that it requires libcryptopp.so at load-time and gets all Crypto++
symbols resolved at load-time from libcryptopp.so's definitions of
those symbols.
Thanks!
Regards,
Zooko
I'm not sure I understand what you're suggesting. If algparam.h only
contains "class ConstByteArrayParameter;", how do users call its members?
Is this still a Mac-specific problem, BTW?
> I'm not sure I understand what you're suggesting. If algparam.h
> only contains "class ConstByteArrayParameter;", how do users call
> its members?
Oh, I'm sorry. I was vaguely thinking of the way in C that you can
separate definitions from declarations, but as my knowledge of C++
slowly and painfully swaps back into my head, I remember that it
isn't so simple.
How about this question: is there a way in C++ to say that the
following identifier denotes something which *must* not be treated as
defined in the current compilation module? This is what "extern"
does in C, but "extern" doesn't apply to classes.
> Is this still a Mac specific problem, BTW?
No, it seems to be general to gcc. It is blocking me from getting
Tahoe-LAFS into Debian and Fedora. It also strikes on Mac.
Regards,
Zooko
It is as I feared -- using the RTLD_GLOBAL flag for dlopen solves the
problem in question but leads to other problems.
My project produces four different .so files, each of which is built
by including some of the .o files from Crypto++. If I set
RTLD_GLOBAL then if more than one of these .so files is loaded into
the same process, the second and later ones to be loaded have
something messed up which quickly leads to a crash. Attached is the
output of valgrind showing the details of one such crash. (This is
all 100% reproducible using the pycryptopp unit test suite.)
What I really want is for there to exist some way in C++ that you can
express the following alternatives:
1. For a given symbol, for example the type_info of an exception
class, any code which #includes that symbol gets, when loaded at
run-time, the same unique value, so that throw and catch and
name-based arguments work between any pair of DSOs.
2. For other symbols, any code which #includes that symbol gets
its own separate address (symbol value) at run-time, so that changes
one DSO makes to the value stored at that address (symbol) don't
affect other DSOs.
I *think* that this is the intent of __declspec(dllexport) and its
brethren.
However, since I don't understand how to make this work with g++, or
whether it is even possible or even a coherent thing to want, my next
step is to declare that you can't have more than one DSO which uses
Crypto++ code in your process, so I'll refactor my pycryptopp library
to build all of the four features (AES, SHA256, RSA, and ECDSA), as
well as upcoming features (XSalsa20, Tiger) in one DSO which is
linked by including .o files from Crypto++. This will work as long
as nobody tries to use my DSO along with another DSO which also uses
Crypto++.
Oh, there's another alternative open to me -- make the pycryptopp
build system build a custom Crypto++ DSO (or maybe just the standard
Crypto++ DSO as specified by the Crypto++ GNUmakefile), and then
build four DSOs each of which dynamically links to that shared DSO
and continue to use RTLD_GLOBAL. This is actually already shown to
work for our current unit test suite, but I don't trust it because I
don't understand why RTLD_GLOBAL causes these crashes in other
situations.
Regards,
Zooko
On May 23, 2009, at 21:03, Zooko Wilcox-O'Hearn wrote:
> my next step is to declare that you can't have more than one DSO
> which uses Crypto++ code in your process, so I'll refactor my
> pycryptopp library to build all of the four features (AES, SHA256,
> RSA, and ECDSA), as well as upcoming features (XSalsa20, Tiger) in
> one DSO which is linked by including .o files from Crypto++. This
> will work as long as nobody tries to use my DSO along with another
> DSO which also uses Crypto++.
This was imprecise. There are two known problems. One is if you do
*not* turn on the RTLD_GLOBAL flag for dlopen(), and you try to pass
a type_info between DSOs, such as by throwing an exception from
libcryptopp.so and catching that exception in rsa.so, or such as by
using the named-arguments feature. (I'm not sure precisely how that
latter one results in type_info crossing a DSO boundary, but
apparently it does.)
This is the problem that Eric Hughes reported two and a half years
ago [1] and that I started trying to solve a week ago [2].
The other is if you *do* turn on the RTLD_GLOBAL flag for dlopen(),
and you try to load multiple DSOs which use symbols by the same name
(because they each separately #included those symbols from Crypto++
header files), but which are supposed to be private to the DSO. This
is the second sort of failure that I reported yesterday along with a
stack trace from valgrind: [3].
So, if I go the first route, leaving RTLD_GLOBAL off and packing
together all my crypto functionality into one DSO, then probably no
harm will result because exceptions and named-arguments are not part
of the API of my modules, so presumably nobody will ever try
to catch exceptions thrown from my DSO.
The sticking point here is that Debian and Fedora have a policy that
any code which uses a library *must* be linked against the system-
provided shared library. The Tahoe-LAFS project, if it is to be
included in Debian and Fedora, is not allowed to build its own copy
of Crypto++ internally -- it is required to re-use the system-
provided shared library of Crypto++.
Hm. I'm not sure, but I think that means I will have to implement
*both* of these workarounds. I'll have to turn on RTLD_GLOBAL so
that I can link against the system-provided libcryptopp.so on those
two operating systems, and I'll also have to bundle my crypto code
together into a single DSO in order to avoid the symbol collisions
caused by turning on RTLD_GLOBAL.
Sigh. I really feel like there must be a general solution to this.
I suspect that the __declspec(dllexport) machinery that is already baked
into Crypto++ for building DSOs on Windows (DLLs) could probably be
used to solve my problem if only I understood it better. See also
http://gcc.gnu.org/wiki/Visibility .
Thanks!
Regards,
Zooko
[1] http://thread.gmane.org/gmane.comp.encryption.cryptopp/2305
[2] http://groups.google.com/group/cryptopp-users/browse_thread/thread/eb815f228db50380
[3] http://groups.google.com/group/cryptopp-users/msg/1a5553410c6976e5
> I suspect that the __declspec(dllexport) machinery that is already baked
> into Crypto++ for building DSOs on Windows (DLLs) could probably be
> used to solve my problem....
__declspec(dllexport) is used to export variables, functions, and
classes [1]. In C++, functions are exported with mangled names, so
they are usually accompanied by extern "C". (Also of interest might be
'Using dllimport and dllexport in C++ Classes' [2].)
Richter gives the subject a very nice treatment in 'Programming
Applications for Microsoft Windows' and its successor 'Windows via
C/C++'. If W. Richard Stevens had covered the topic, his Unix
programming series would be a great reference. Unfortunately, SOs were
not around when those books were written.
[1] http://msdn.microsoft.com/en-us/library/3y1sfaz2.aspx
[2] http://msdn.microsoft.com/en-us/library/81h27t8c.aspx
> Zooko, have you tried asking for advice on other mailing lists?
Okay, some googling about showed me a mailing list that is likely to
help -- the Python cplusplus-sig list. Here is my summary of the
problem and the four possible solutions that I can think of:
http://mail.python.org/pipermail/cplusplus-sig/2009-May/014531.html
Please read it yourself in case doing so provides some flash of
insight that you can share with me.
Thank you,
By the way, who added the features to GNUmakefile to build a "dll"
using gcc? And what is it for?
I'm guessing that this is for building a DLL on Windows where "gcc"
means mingw. I've been tinkering with porting it to Linux in order
to build a dynamic library (.so) on Linux which exports only the
symbols marked by "CRYPTOPP_DLL". There appears to be some bitrot;
for example, the DLLSRCS variable in GNUmakefile seems to omit
some .cpp files that are necessary, possibly because those .cpp files
were added after this feature of the GNUmakefile was added.
Regards,
Zooko