that problem with comparing types across DLLs

Zooko Wilcox-O'Hearn

May 22, 2009, 12:56:30 PM
to Crypto++ Users
Folks:

I'm having this problem that was earlier described by Eric Hughes on
this list:

http://thread.gmane.org/gmane.comp.encryption.cryptopp/2305

Here is the buildbot (automated unit tester) which shows how the
problem manifests when I run my unit tests:

http://allmydata.org/buildbot-pycryptopp/builders/gutsy-syslib/builds/20/steps/test/logs/stdio

I've built a dynamic library named 'ecdsa.so' which depends on
libcryptopp.so. When I try to use it I get this error:

terminate called after throwing an instance of
'CryptoPP::NameValuePairs::ValueTypeMismatch'
what(): NameValuePairs: type mismatch for 'InputBuffer', stored
'N8CryptoPP23ConstByteArrayParameterE', trying to retrieve
'N8CryptoPP23ConstByteArrayParameterE'

The problem as I currently understand it is that the
ConstByteArrayParameter class is defined in algparam.h, which is
(transitively) #include'd by my code, and so when I build my
ecdsa.so, any RTTI code in ecdsa.so (for example the typeid operator)
will identify the copy of ConstByteArrayParameter that was built into
ecdsa.so. If at run-time I want to compare type information, such as
for the named-argument features which Crypto++ uses, then if one of
the arguments came from libcryptopp.so and the other came from
ecdsa.so, or equivalently if libcryptopp.so is throwing an exception
of a given type and ecdsa.so is catching exceptions of that type,
then the types will not compare equal and the program will fail.
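
To make the failure mode concrete, here is a minimal C++ sketch of a name-based comparison. It assumes, as the error message above suggests, that the two type_info objects carry the same mangled name but are distinct objects in libcryptopp.so and ecdsa.so. The helper SameTypeByName and the stand-in struct are hypothetical and are not part of Crypto++.

```cpp
#include <cstring>
#include <typeinfo>

// Stand-in for a header-defined class such as ConstByteArrayParameter.
struct ByteArrayParamStandIn {};

// Hypothetical helper: treat two type_info objects as the same type when
// either the normal comparison succeeds or their implementation-defined
// names are identical strings. The second test is what would still succeed
// if each shared library carried its own copy of the type_info object.
inline bool SameTypeByName(const std::type_info& a, const std::type_info& b)
{
    return a == b || std::strcmp(a.name(), b.name()) == 0;
}
```

Comparing names is only a workaround sketch: it could wrongly equate distinct types that happen to share a name, such as types declared in unnamed namespaces of different translation units.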

I've spent about a day learning about dynamic linking and visibility
and so forth, and I don't see a clean way to fix this without
changing the Crypto++ source code. There is an unclean way to fix
it, which is to set the RTLD_GLOBAL flag before dlopening ecdsa.so.
This was offered as a solution by Geoff Beier:

http://allmydata.org/pipermail/tahoe-dev/2009-February/001153.html
http://www.opengroup.org/onlinepubs/009695399/functions/dlopen.html

I think what this does is make it so that symbols defined in
ecdsa.so when it is loaded are used to resolve libcryptopp.so's needs
for symbols. This happens to fix the two known problems that my
project currently encounters, but it seems fragile. For example, if
ecdsa.so is the one raising the exception and libcryptopp.so is the
one trying to catch it, this won't work, will it? Also if some
program has unluckily loaded libcryptopp.so before it loads my
ecdsa.so, then it will be too late for my RTLD_GLOBAL to take effect.
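
For reference, the flag in question can be sketched in C++ against the POSIX dlopen interface. The helper name LoadWithGlobalSymbols is an assumption for illustration. A Python extension module would normally be loaded by the interpreter itself (Python exposes sys.setdlopenflags for adjusting the flags), so this only shows what the flag combination looks like at the C level.

```cpp
#include <dlfcn.h>   // POSIX: dlopen, dlerror, RTLD_NOW, RTLD_GLOBAL
#include <cstdio>

// Hypothetical helper: load a shared object and place its symbols in the
// global lookup scope, so that objects loaded afterwards can resolve
// their undefined symbols against it.
// Returns the dlopen handle, or a null pointer on failure.
inline void* LoadWithGlobalSymbols(const char* path)
{
    void* handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == nullptr)
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return handle;
}
```

On some systems this needs to be linked with -ldl. Note that the flag only affects objects loaded after the call, which is the ordering concern raised above.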

The only clean, long-term solution I can think of
is to change Crypto++ to remove definitions from header files so that
code like mine will get only undefined symbols by #include'ing Crypto++
header files. For example, if ConstByteArrayParameter were
declared as "class ConstByteArrayParameter;" in algparam.h and
defined only in some implementation file such as "algparam.cpp", then
the compiler and linker when building my ecdsa.so would know that they
didn't know the actual type of ConstByteArrayParameter and any RTTI
code would be compiled to use an indirect symbol that would not be
resolved until load time. Is that right?

Is there any easier solution for me? I'd like to build ecdsa.so so that
it requires libcryptopp.so at load-time, and that ecdsa.so gets all
Crypto++ symbols resolved at load-time from libcryptopp.so's
definitions of those symbols.

Thanks!

Regards,

Zooko

Wei Dai

May 22, 2009, 6:08:49 PM
to Crypto++ Users, Zooko Wilcox-O'Hearn
Zooko wrote:
> The only clean, long-term solution I can think of
> is to change Crypto++ to remove definitions from header files so that
> code like mine will get only undefined symbols by #include'ing Crypto++
> header files. For example, if ConstByteArrayParameter were
> declared as "class ConstByteArrayParameter;" in algparam.h and
> defined only in some implementation file such as "algparam.cpp", then
> the compiler and linker when building my ecdsa.so would know that they
> didn't know the actual type of ConstByteArrayParameter and any RTTI
> code would be compiled to use an indirect symbol that would not be
> resolved until load time. Is that right?

I'm not sure I understand what you're suggesting. If algparam.h only
contains "class ConstByteArrayParameter;", how do users call its members?

Is this still a Mac-specific problem, BTW?

Zooko Wilcox-O'Hearn

May 22, 2009, 6:48:36 PM
to Wei Dai, Crypto++ Users
On May 22, 2009, at 16:08 PM, Wei Dai wrote:

> I'm not sure I understand what you're suggesting. If algparam.h
> only contains "class ConstByteArrayParameter;", how do users call
> its members?

Oh, I'm sorry. I was vaguely thinking of the way in C that you can
separate definitions from declarations, but as my knowledge of C++
slowly and painfully swaps back into my head, I remember that it
isn't so simple.

How about this question: is there a way in C++ to say that the
following identifier denotes something which *must* not be treated as
defined in the current compilation module? This is what "extern"
does in C, but "extern" doesn't apply to classes.
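
One mechanism that comes close, at least under the Itanium C++ ABI that g++ follows on Linux, is the "key function": when a polymorphic class declares a virtual function that is not defined inline, the vtable and type_info for that class are emitted only in the translation unit that defines the function, and every other user gets an undefined reference. The sketch below uses a hypothetical class ByteArrayParam, not the real ConstByteArrayParameter, and shows both halves in one file for brevity.

```cpp
#include <cstddef>
#include <typeinfo>

// --- would live in the public header (hypothetical params.h) ---
class ByteArrayParam {
public:
    ByteArrayParam(const unsigned char* data, std::size_t size)
        : m_data(data), m_size(size) {}

    // Declared here but NOT defined inline. As the first non-inline
    // virtual function, this destructor is the class's key function.
    virtual ~ByteArrayParam();

    const unsigned char* data() const { return m_data; }
    std::size_t size() const { return m_size; }

private:
    const unsigned char* m_data;
    std::size_t m_size;
};

// --- would live in exactly one .cpp file built into the shared library ---
// The translation unit containing this definition is the one that emits
// the vtable and type_info for ByteArrayParam.
ByteArrayParam::~ByteArrayParam() {}
```

Whether adding a virtual function to a small value class is acceptable is a separate design question; the sketch only illustrates the mechanism.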

> Is this still a Mac-specific problem, BTW?

No, it seems to be general to gcc. It is blocking me from getting
Tahoe-LAFS into Debian and Fedora. It also strikes on Mac.

Regards,

Zooko

Zooko Wilcox-O'Hearn

May 23, 2009, 11:03:29 PM
to Zooko Wilcox-O'Hearn, Wei Dai, Crypto++ Users
Dear Wei Dai et al.:

It is as I feared -- using the RTLD_GLOBAL flag for dlopen solves the
problem in question but leads to other problems.

My project produces four different .so files, each of which is built
by including some of the .o files from Crypto++. If I set
RTLD_GLOBAL then if more than one of these .so files is loaded into
the same process, the second and later ones to be loaded have
something messed up which quickly leads to a crash. Attached is the
output of valgrind showing the details of one such crash. (This is
all 100% reproducible using the pycryptopp unit test suite.)

What I really want is for there to exist some way in C++ that you can
express the following alternatives:

1. For a given symbol, for example the type_info of an exception
class, any code which #included that symbol, when loaded at run-time,
will get the same unique value so that throw and catch and name-based
arguments will work between any pair of DSOs.

2. For other symbols, any code which #includes that code will get
its own separate address (symbol value) at run-time so that changes
made to the value stored in that address (symbol) by one DSO won't
affect other DSOs.

I *think* that this is the intent of __declspec(dllexport) and its
brethren.
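
On the g++ side, the closest counterpart appears to be symbol visibility attributes. The sketch below, with hypothetical macro names SHARED_API and LOCAL_API, marks an exception class as exported (alternative 1) and a helper with private state as hidden (alternative 2). It assumes a GCC version that supports the visibility attribute and is not how Crypto++ itself is configured.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical export macros; the names are made up for this sketch.
#if defined(_WIN32)
#  define SHARED_API __declspec(dllexport)
#  define LOCAL_API
#elif defined(__GNUC__)
#  define SHARED_API __attribute__((visibility("default")))
#  define LOCAL_API  __attribute__((visibility("hidden")))
#else
#  define SHARED_API
#  define LOCAL_API
#endif

// Alternative 1: a type whose symbols (including its type_info) are
// exported, so that other DSOs can refer to the same definition.
class SHARED_API SharedError : public std::runtime_error {
public:
    explicit SharedError(const std::string& what)
        : std::runtime_error(what) {}
};

// Alternative 2: a hidden function with static state. Each DSO that
// compiles this header keeps its own private copy of the counter.
LOCAL_API inline int& PrivateCounter()
{
    static int counter = 0;
    return counter;
}
```

Default visibility only makes a symbol eligible for sharing; whether two libraries actually end up using one copy still depends on how they are loaded.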

However, not understanding how to make this work with g++, if it is
even possible, or if it is even a coherent thing to want, my next
step is to declare that you can't have more than one DSO which uses
Crypto++ code in your process, so I'll refactor my pycryptopp library
to build all of the four features (AES, SHA256, RSA, and ECDSA), as
well as upcoming features (XSalsa20, Tiger) in one DSO which is
linked by including .o files from Crypto++. This will work as long
as nobody tries to use my DSO along with another DSO which also uses
Crypto++.

Oh, there's another alternative open to me -- make the pycryptopp
build system build a custom Crypto++ DSO (or maybe just the standard
Crypto++ DSO as specified by the Crypto++ GNUmakefile), and then
build four DSOs each of which dynamically links to that shared DSO
and continue to use RTLD_GLOBAL. This is actually already shown to
work for our current unit test suite, but I don't trust it because I
don't understand why RTLD_GLOBAL causes these crashes in other
situations.

Regards,

Zooko
---
Tahoe, the Least-Authority Filesystem -- http://allmydata.org
store your data: $10/month -- http://allmydata.com/?tracking=zsig
I am available for work -- http://zooko.com/résumé.html

crash.txt

Zooko Wilcox-O'Hearn

May 24, 2009, 9:28:29 AM
to Zooko Wilcox-O'Hearn, Wei Dai, Crypto++ Users
[following-up to my own post]

On May 23, 2009, at 21:03 PM, Zooko Wilcox-O'Hearn wrote:

> my next step is to declare that you can't have more than one DSO
> which uses Crypto++ code in your process, so I'll refactor my
> pycryptopp library to build all of the four features (AES, SHA256,
> RSA, and ECDSA), as well as upcoming features (XSalsa20, Tiger) in
> one DSO which is linked by including .o files from Crypto++. This
> will work as long as nobody tries to use my DSO along with another
> DSO which also uses Crypto++.

This was imprecise. There are two known problems. One is if you do
*not* turn on the RTLD_GLOBAL flag for dlopen(), and you try to pass
a type_info between DSO's, such as by throwing an exception from
libcryptopp.so and catching that exception in rsa.so, or such as by
using the named-arguments feature. (I'm not sure precisely how that
latter one results in type_info crossing a DSO boundary, but
apparently it does.)
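
A simplified picture of how a named-argument lookup can end up comparing type_info objects: the sketch below is a hypothetical stand-in (the class NamedValues and its methods are invented for illustration and are much simpler than Crypto++'s NameValuePairs). The store records typeid(T) at insertion and checks it at retrieval, so if the two typeid expressions were evaluated in different DSOs with separate type_info copies, the check could fail even though the names printed in the error are identical.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <typeinfo>

// Hypothetical, simplified named-argument container.
class NamedValues {
    struct Entry {
        const std::type_info* type;  // type recorded by the storing side
        const void* value;           // not owned; caller keeps it alive
    };
    std::map<std::string, Entry> m_entries;

public:
    // Remember a pointer to the value together with typeid(T) as seen
    // by the code that stores it.
    template <class T>
    void Set(const std::string& name, const T& value)
    {
        m_entries[name] = Entry{&typeid(T), &value};
    }

    // Retrieve the value, checking that typeid(T) as seen by the code
    // that retrieves it matches what was stored.
    template <class T>
    const T& Get(const std::string& name) const
    {
        auto it = m_entries.find(name);
        if (it == m_entries.end())
            throw std::out_of_range("no value named '" + name + "'");
        if (!(*it->second.type == typeid(T)))
            throw std::runtime_error(
                "type mismatch for '" + name + "', stored '" +
                it->second.type->name() + "', trying to retrieve '" +
                typeid(T).name() + "'");
        return *static_cast<const T*>(it->second.value);
    }
};
```

Within a single program the check behaves as expected: retrieving with the stored type succeeds and retrieving with a different type throws.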

This is the problem that Eric Hughes reported two and a half years
ago [1] and that I started trying to solve a week ago [2].

The other is if you *do* turn on the RTLD_GLOBAL flag for dlopen(),
and you try to load multiple DSOs which use symbols by the same name
(because they each separately #included those symbols from Crypto++
header files), but which are supposed to be private to the DSO. This
is the second sort of failure that I reported yesterday along with a
stack trace from valgrind: [3].

So, if I go the first route, leaving RTLD_GLOBAL off and packing
together all my crypto functionality into one DSO, then probably no
harm will result because exceptions and named-arguments are not part
of the API of my modules, therefore presumably nobody will ever try
to catch exceptions thrown from my DSO.

The sticking point here is that Debian and Fedora have a policy that
any code which uses a library *must* be linked against the
system-provided shared library. The Tahoe-LAFS project, if it is to be
included in Debian and Fedora, is not allowed to build its own copy
of Crypto++ internally -- it is required to re-use the
system-provided shared library of Crypto++.

Hm. I'm not sure, but I think that means I will have to implement
*both* of these workarounds. I'll have to turn on RTLD_GLOBAL so
that I can link against the system-provided libcryptopp.so on those
two operating systems, and I'll also have to bundle my crypto code
together into a single DSO in order to avoid the symbol collisions
caused by turning on RTLD_GLOBAL.

Sigh. I really feel like there must be a general solution to this.
I suspect that the __declspec(dllexport) machinery that is already baked
into Crypto++ for building DSOs on Windows (DLLs) could probably be
used to solve my problem if only I understood it better. See also
http://gcc.gnu.org/wiki/Visibility .

Thanks!

Regards,

Zooko

[1] http://thread.gmane.org/gmane.comp.encryption.cryptopp/2305
[2] http://groups.google.com/group/cryptopp-users/browse_thread/thread/eb815f228db50380
[3] http://groups.google.com/group/cryptopp-users/msg/1a5553410c6976e5

Wei Dai

May 24, 2009, 11:29:02 AM
to zo...@zooko.com, Crypto++ Users
Zooko, have you tried asking for advice on other mailing lists? If this isn't a Mac-only problem, many people must have been bitten by it. What did they do? You can also try asking for help from the Debian package maintainer for Crypto++. I know he subscribes to this mailing list, but maybe he hasn't noticed this thread because the subject says "DLLs" which is usually a Windows term?




Jeffrey Walton

May 24, 2009, 12:18:32 PM
to Crypto++ Users
Hi Zooko,

> I suspect that the __declspec(dllexport) machinery that is already baked
> into Crypto++ for building DSOs on Windows (DLLs) could probably be
> used to solve my problem....

__declspec(dllexport) is used to export variables, functions, and
classes [1]. In C++, the functions are exported with mangled names, so
they are usually accompanied by 'extern "C"'. (Also of interest might be
'Using dllimport and dllexport in C++ Classes' [2].)

Richter gives the subject a very nice treatment in 'Programming
Applications for Microsoft Windows' and its successor 'Windows via
C/C++'. If the topic were covered by W. Richard Stevens, the Unix
programming series would be a great reference. Unfortunately, SOs were
not around when the books were written.

[1] http://msdn.microsoft.com/en-us/library/3y1sfaz2.aspx
[2] http://msdn.microsoft.com/en-us/library/81h27t8c.aspx

Zooko Wilcox-O'Hearn

May 24, 2009, 7:31:27 PM
to Wei Dai, Crypto++ Users
On May 24, 2009, at 9:29 AM, Wei Dai wrote:

> Zooko, have you tried asking for advice on other mailing lists?

Okay, some googling about showed me a mailing list that is likely to
help -- the Python cplusplus-sig list. Here is my summary of the
problem and the four possible solutions that I can think of:

http://mail.python.org/pipermail/cplusplus-sig/2009-May/014531.html

Please read it yourself in case doing so provides some flash of
insight that you can share with me.

Thank you,

Wei Dai

May 26, 2009, 7:22:37 AM
to Zooko Wilcox-O'Hearn, Crypto++ Users
Zooko, do Debian and Fedora include Crypto++ in static library form (.a
file) in addition to shared library (.so) form? If so, can you link to it
instead? If not, can we petition for them to include it?

Unfortunately I have little knowledge of how shared libraries work on
Linux/Mac. Crypto++ includes many cryptographic algorithms, and any
application will only use a fraction of them. A static library seems to make
more sense as it allows the linker to discard the unused code. I didn't
originally intend to support Crypto++ as a shared library, and that's why
the makefile doesn't include an option to compile to shared library.
(Crypto++ is supported as a Windows DLL, but that was forced on me by a FIPS
requirement to have clear "cryptographic boundaries".)

Maybe there is a case to be made for supporting Crypto++ as a shared
library, but I haven't heard it yet. The distributions that included
Crypto++ as a shared library did so without consulting me...


Zooko Wilcox-O'Hearn

May 26, 2009, 11:03:07 AM
to Wei Dai, Crypto++ Users
Dear Wei Dai, et al.:

By the way, who added the features to GNUmakefile to build a "dll"
using gcc? And what is it for?

I'm guessing that this is for building a DLL on Windows where "gcc"
means mingw. I've been tinkering with porting it to Linux in order
to build a dynamic library (.so) on Linux which exports only the
symbols marked by "CRYPTOPP_DLL". There appears to be some bitrot,
for example the DLLSRCS variable in GNUmakefile seems to omit
some .cpp files that are necessary, possibly because those .cpp files
were added after this feature of the GNUmakefile was added.

Regards,

Zooko

Wei Dai

May 26, 2009, 4:57:35 PM
to Zooko Wilcox-O'Hearn, Crypto++ Users
That's the remains of a failed experiment several years ago to see if I
could build a Windows DLL using GCC/Cygwin. I ended up getting a lot of
linker errors and didn't pursue it further, but left the entries in the
GNUmakefile in case I or someone else wanted to try again later.

The problem with this approach is that the symbols marked CRYPTOPP_DLL cover
only the part of Crypto++ that is FIPS-approved. If you want to use
non-FIPS-approved algorithms, such as curve25519, you'll also need to link
with a static library that includes the non-FIPS-approved parts of Crypto++.
It's probably easier to link with just the full static library than to get
this unconventional library structure accepted into Linux distributions.

If linking with the static library form of Crypto++ doesn't work, I think
your workaround option 2 (in your post to cplusplus-sig) is best. Perhaps if
you file a bug on RTLD_GLOBAL, it will be fixed by the time someone wants to
import two different python modules that link to libcryptopp.so.
http://gcc.gnu.org/faq.html#dso also says to use RTLD_GLOBAL, BTW. It also
says you have to use "-Wl,-E". Are you doing that already?
