Solaris 11 x86_64 crash (was: Please Test Crypto++ 5.6.4 Release Candidate)

Jeffrey Walton

Sep 12, 2016, 5:32:59 AM
to Andrew Marlow, Crypto++ Users
On Mon, Sep 12, 2016 at 5:02 AM, Andrew Marlow <marlow...@gmail.com> wrote:
> Many thanks for releasing version 564. I see there are lots of improvements
> :-)
>

Yeah, the Solaris folks got a lot of improvements. You guys deserve it
for enduring SunCC.

SunCC is a good compiler when it's not fighting back or crashing.
In the Crypto++ context I find it produces slightly worse code than
GCC/ICPC, but better than Clang and MSC.

> I had a go testing it on Solaris 11 with Sun Studio 12.4 in 64 bit and have
> to report a few problems:
>
> 1) There could be more info in the Readme.txt to explain how to build it on
> solaris.

OK, for this, I think we need a wiki page. Wiki pages get indexed by
search engines, so users will find the information more easily.

Wiki pages are on my TODO list. I need a 32-hour day....

> 2) I had to hack the makefile to remove use of the -pipe option, near where
> it says: Add -pipe for everything except ARM (allow ARM-64 because they
> seems to have > 1 GB of memory)

OK, so this is going to be tough but doable with tradeoffs. The
complete remediation would be to get VM stats in the makefile, but
that's not going to happen.

When I experienced the problem, I was surprised to learn a
dev-workstation with 8 GB of RAM and 100+ GB of storage did not have
enough virtual memory to compile a program. I added more virtual
memory to solve the problem; see "Verify there's enough memory and
storage to compile a file?", http://superuser.com/q/1098800 .

The tradeoff is to disable -pipe for Solaris. It may fix the problem
but it will surely slow down compiles. My question to you is: what is
it you want us to do? Do you want us to disable -pipe for Solaris?
Just SunCC? Maybe something else?

> 3) The test program core dumps when run with the v option.
> Testing MessageDigest algorithm SHA-384.
> ..signal BUS (invalid address alignment) in CryptoPP::SHA512::Transform at
> line 27 in file "sha.cpp"
> 27 #define blk0(i) (W[i] = data[i])
> (dbx) print data
> data = 0xffffffff7fffc1ec
> (dbx) print i
> dbx: "i" is not defined in the scope
> `cryptest.exe`sha.cpp`CryptoPP::SHA512::Transform(unsigned long long*,const
> unsigned long long*)`
> dbx: see `help scope' for details
> (dbx) where
> =>[1] CryptoPP::SHA512::Transform(state = 0x1010f1980, data =
> 0xffffffff7fffc1ec) (optimized), at 0x1006041a8 (line ~27) in "sha.cpp"
> [2] CryptoPP::IteratedHashWithStaticTransform<unsigned long
> long,CryptoPP::EnumToType<CryptoPP::ByteOrder,1>,128U,64U,CryptoPP::SHA384,48U,false>::HashEndianCorrectedBlock(this
> = 0x1010f18d0, data = 0xffffffff7fffc1ec) (optimized), at 0x1004b30b8 (line
> ~89) in "iterhash.h"
> [3] CryptoPP::IteratedHashBase<unsigned long
> long,CryptoPP::HashTransformation>::HashMultipleBlocks(this = 0x1010f18d0,
> input = 0xffffffff7fffc1ec, length = <value unavailable>) (optimized), at
> 0x1005cff6c (line ~91) in "iterhash.cpp"

OK, this is good stuff here. I can't duplicate in my modest test
environment, but it's obvious 'data = 0xffffffff7fffc1ec' is only
aligned to 4 bytes (the low byte is 0xec), while you likely need
8-byte or 16-byte alignment due to SSE2.
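
As a quick sanity check on the address in the trace, the alignment can be read straight off its low bits. The sketch below is illustrative only: the helper names AddressAlignment and IsAddressAligned are made up here and are not Crypto++ functions. The first isolates the lowest set bit of an address, which is the largest power of two the address is a multiple of.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Crypto++): returns the largest power of
// two that divides a non-zero address, i.e. the alignment it actually has.
inline std::uint64_t AddressAlignment(std::uint64_t addr)
{
    // addr & -addr isolates the lowest set bit; the negation is written
    // with unsigned arithmetic so there is no signed overflow.
    return addr & (~addr + 1);
}

// True if addr is a multiple of alignment (alignment must be a power of two).
inline bool IsAddressAligned(std::uint64_t addr, std::uint64_t alignment)
{
    return (addr & (alignment - 1)) == 0;
}
```

For the pointer in the trace, AddressAlignment(0xffffffff7fffc1ecULL) gives 4 and IsAddressAligned(0xffffffff7fffc1ecULL, 8) is false, so a 64-bit load through that pointer can fault on hardware that enforces natural alignment.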

Here's the first - and only - thing you should do at the moment:

gmake distclean
cp config.recommend config.h

gmake -j 4 ...

./cryptest.exe v
./cryptest.exe tv all

That should isolate it to the known undefined behavior we are
[currently] carrying around. If it fixes the issue, then problem
solved until we can make config.recommend the default (Crypto++ 5.7
when it arrives).

The hairier result is, it does not fix the problem. In this case, we
will need to investigate why the caller is not using
OptimalDataAlignment(). Also see
https://www.cryptopp.com/docs/ref/class_s_h_a3.html.

Jeff

Jeffrey Walton

Sep 12, 2016, 6:17:54 AM
to Andrew Marlow, Crypto++ Users
> ...

Here's more of the back story...

> That should isolate it to the known undefined behavior we are
> [currently] carrying around. If it fixes the issue, then problem
> solved until we can make config.recommend the default (Crypto++ 5.7
> when it arrives).

We wanted to cut-over to config.recommend for regular users; but
withhold the cut-over for distros like Debian and Ubuntu. I inquired
how to detect a package build so we could supply the different
configuration on one of the Debian mailing lists. I got scolded by the
Debian admin for wanting to do such a thing.

So everyone gets the backwards compatible configuration, and users who
want to avoid undefined behavior (like the unaligned data accesses you
are witnessing) must do something special. That has never sat well
with me, but we can't risk breaking millions of distro users.
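
For readers unfamiliar with why an unaligned access is undefined behavior: dereferencing a word64 pointer that is not suitably aligned is not guaranteed to work in C++, even on CPUs that tolerate it. The usual portable alternative is to copy the bytes out with memcpy. The following is a minimal sketch of that idiom, not code taken from the library; LoadWord64 is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Crypto++ code): read a 64-bit word, in host byte
// order, from a byte pointer that may have any alignment.
inline std::uint64_t LoadWord64(const unsigned char* p)
{
    std::uint64_t w;
    // memcpy has no alignment requirement on its source, so this is
    // well-defined where *reinterpret_cast<const std::uint64_t*>(p) is not.
    std::memcpy(&w, p, sizeof(w));
    return w;
}
```

Most optimizing compilers turn the memcpy into a single load on targets that permit unaligned access, so the safe form need not cost anything there.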

> The hairier result is, it does not fix the problem. In this case, we
> will need to investigate why the caller is not using
> OptimalDataAlignment(). Also see
> https://www.cryptopp.com/docs/ref/class_s_h_a3.html.

If you want to trace what's going on, then OptimalDataAlignment()
eventually references this piece of goodness
(http://github.com/weidai11/cryptopp/blob/master/misc.h#L871):

template <class T>
inline unsigned int GetAlignmentOf(T *dummy=NULL) // VC60 workaround
{
// GCC 4.6 (circa 2008) and above aggressively uses vectorization.
#if defined(CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS)
    if (sizeof(T) < 16)
        return 1;
#endif
    CRYPTOPP_UNUSED(dummy);
#if defined(CRYPTOPP_CXX11_ALIGNOF)
    return alignof(T);
#elif (_MSC_VER >= 1300)
    return __alignof(T);
#elif defined(__GNUC__)
    return __alignof__(T);
#elif CRYPTOPP_BOOL_SLOW_WORD64
    return UnsignedMin(4U, sizeof(T));
#else
    return sizeof(T);
#endif
}

One of the things config.recommend does is squash
CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS.
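
To make the effect of that macro concrete, here is a simplified model of the function quoted above. It is a sketch, not the library's code: ModelAlignmentOf is a made-up name, the preprocessor switch is replaced by a bool argument so both outcomes can be compared in one build, and only the C++11 alignof branch is kept.

```cpp
#include <cstddef>

typedef unsigned long long word64;  // stand-in for the library's word64

// Simplified model of the function quoted above (assumption: C++11 alignof
// is available). allowUnalignedAccess plays the role of
// CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS being defined.
template <class T>
inline unsigned int ModelAlignmentOf(bool allowUnalignedAccess)
{
    // With unaligned access allowed, any type smaller than 16 bytes reports
    // an alignment of 1, so callers see no reason to realign their buffers.
    if (allowUnalignedAccess && sizeof(T) < 16)
        return 1;
    return static_cast<unsigned int>(alignof(T));
}
```

In this model ModelAlignmentOf<word64>(true) returns 1 while ModelAlignmentOf<word64>(false) returns alignof(word64), which is the difference a caller would observe between the default configuration and config.recommend.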

GetAlignmentOf coupled with CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS used
to cause me so much aggravation... it's the cause of the failure, but
it never shows up in the back traces. The damage was done long before
the faulting function gets fingered in a back trace.

Now I can spot the troubles it causes from a mile away. I've
experienced them on nearly every platform, from i686 and x86_64 to
MIPS and ARM. As soon as I see a SIGBUS or alignment issue, I jump
over to config.recommend to isolate it.

Jeff

Jeffrey Walton

Sep 12, 2016, 7:01:12 AM
to Andrew Marlow, Crypto++ Users
>> 1) There could be more info in the Readme.txt to explain how to build it on
>> solaris.
>
> OK, for this, I think we need a wiki page. Wiki pages get indexed by
> search engines, so users will find the information more easily.
>
> Wiki pages are on my TODO list. I need a 32-hour day....

I started working on this:
http://cryptopp.com/wiki/Solaris_(Command_Line). It should be ready in
the next day or two.

We really needed a wiki article because there's a lot more to Solaris
than a couple of blurbs in the README. When the wiki article is
mature, I will add information to the README leading readers to the
wiki.

Jeff

Andrew Marlow

Sep 12, 2016, 10:49:10 AM
to Crypto++ Users, marlow...@gmail.com, nolo...@gmail.com


On Monday, 12 September 2016 10:32:59 UTC+1, Jeffrey Walton wrote:
> On Mon, Sep 12, 2016 at 5:02 AM, Andrew Marlow <marlow...@gmail.com> wrote:
> > Many thanks for releasing version 564. I see there are lots of improvements
> > :-)
> >
>
> Yeah, the Solaris folks got a lot of improvements. You guys deserve it
> for enduring SunCC.

Gee, thanks! It's not my choice of compiler. The compiler is forced on us. We are using another third party library where the vendor does not support GCC, only the Sun compiler.

[snip]

> > 3) The test program core dumps when run with the v option.
> > Testing MessageDigest algorithm SHA-384.
> > ..signal BUS (invalid address alignment) in CryptoPP::SHA512::Transform at
> > line 27 in file "sha.cpp"

[stack trace snipped]

> OK, this is good stuff here. I can't duplicate in my modest test
> environment, but it's obvious 'data = 0xffffffff7fffc1ec' is only
> aligned to 4 bytes, while you likely need 8-byte or 16-byte alignment
> due to SSE2.
>
> Here's the first - and only - thing you should do at the moment:
>
>     gmake distclean
>     cp config.recommend config.h
>
>     gmake -j 4 ...
>
>     ./cryptest.exe v

I just tried that. It broke in the same place in the same way.

>     ./cryptest.exe tv all
>
> That should isolate it to the known undefined behavior we are
> [currently] carrying around. If it fixes the issue, then problem
> solved until we can make config.recommend the default (Crypto++ 5.7
> when it arrives).
>
> The hairier result is, it does not fix the problem. In this case, we
> will need to investigate why the caller is not using
> OptimalDataAlignment(). Also see
> https://www.cryptopp.com/docs/ref/class_s_h_a3.html.

I am out of my depth here. I see that for the failing test SHA3 is being used and that OptimalDataAlignment returns GetAlignmentOf<word64>();. It looks like this returns sizeof(word64). I can't see what's wrong....

> Jeff

Jeffrey Walton

Sep 12, 2016, 2:53:10 PM
to Andrew Marlow, Crypto++ Users
>> > 3) The test program core dumps when run with the v option.
>> > Testing MessageDigest algorithm SHA-384.
>> > ..signal BUS (invalid address alignment) in CryptoPP::SHA512::Transform
>> > at
>> > line 27 in file "sha.cpp"
>
> [stack trace snipped]
>>
>> OK, this is good stuff here. I can't duplicate in my modest test
>> environment, but it's obvious 'data = 0xffffffff7fffc1ec' is only
>> aligned to 4 bytes, while you likely need 8-byte or 16-byte alignment
>> due to SSE2.
>>
>> Here's the first - and only - thing you should do at the moment:
>>
>> gmake distclean
>> cp config.recommend config.h
>>
>>
>> gmake -j 4 ...
>>
>> ./cryptest.exe v
>
>
> I just tried that. It broke in the same place in the same way.

Back story... Crypto++ 5.6.3 and below only provided a C/C++
implementation, and it ran kind of slow (relative to other Intel
platforms). Effectively the library ran with CRYPTOPP_DISABLE_ASM.

Sun/Oracle began supporting GCC-style inline assembly at Sun Studio
12 (cf., http://blogs.oracle.com/x86be/entry/gcc_style_asm_inlining_support).
Early testing on Intel platforms showed a lot of promise so I enabled
it.

Enabling the ASM brought in more maintenance for the library because
SunCC is fragile at times (q.v.), but I think it's worth it.

(A) The next step in troubleshooting is to disable all ASM.

gmake distclean

CXXFLAGS="-DNDEBUG -g3 -xO2 -DCRYPTOPP_DISABLE_ASM"
CXXFLAGS="$CXXFLAGS" gmake -j 4

./cryptest.exe v

*If* that resolves the problem, then the next step is to disable ASM
in SHA only. That SHA has its own "disable ASM" macro should tell you
how much trouble it's given me on Intel platforms. There are a handful
of other offenders like SHA.

(B) Disable ASM in SHA

gmake distclean

CXXFLAGS="-DNDEBUG -g3 -xO2 -DCRYPTOPP_DISABLE_SHA_ASM"
CXXFLAGS="$CXXFLAGS" gmake -j 4

./cryptest.exe v

I've got a feeling one of these two will squash the problem for you. I
don't know why you need them when I don't. I suspect it's due to
different versions of the compiler. I suspect you likely use the
latest/patched version while I have the free version without updates.

I don't recall if you stated it, but does G++ have the same issue? I
suspect not. If you don't know and are curious, then you can:

gmake distclean

CXX=/bin/g++ gmake -j 4

./cryptest.exe v

Jeff

Andrew Marlow

Sep 13, 2016, 3:20:25 AM
to Crypto++ Users, marlow...@gmail.com, nolo...@gmail.com
On Monday, 12 September 2016 19:53:10 UTC+1, Jeffrey Walton wrote:
> >> > 3) The test program core dumps when run with the v option.
> >> > Testing MessageDigest algorithm SHA-384.
> >> > ..signal BUS (invalid address alignment) in CryptoPP::SHA512::Transform at
> >> > line 27 in file "sha.cpp"
> >
> > [stack trace snipped]

[snip]

> Enabling the ASM brought in more maintenance for the library because
> SunCC is fragile at times (q.v.), but I think it's worth it.
>
> (A) The next step in troubleshooting is to disable all ASM.
>
>     gmake distclean
>     CXXFLAGS="-DNDEBUG -g3 -xO2 -DCRYPTOPP_DISABLE_ASM"
>     CXXFLAGS="$CXXFLAGS" gmake -j 4
>     ./cryptest.exe v

This did not resolve the problem.

> *If* that resolves the problem, then the next step is to disable ASM
> in SHA only. That SHA has its own "disable ASM" macro should tell you
> how much trouble it's given me on Intel platforms. There are a handful
> of other offenders like SHA.
>
> (B) Disable ASM in SHA
>
>     gmake distclean
>     CXXFLAGS="-DNDEBUG -g3 -xO2 -DCRYPTOPP_DISABLE_SHA_ASM"
>     CXXFLAGS="$CXXFLAGS" gmake -j 4
>     ./cryptest.exe v

I tried this flag also. Same result, core dumps in the same place.

> I've got a feeling one of these two will squash the problem for you. I
> don't know why you need them when I don't. I suspect it's due to
> different versions of the compiler. I suspect you likely use the
> latest/patched version while I have the free version without updates.

I think I am also using the free one on Solaris 11 but we get the same problems on Solaris 10 where I think it is the licensed version.

> I don't recall if you stated it, but does G++ have the same issue? I
> suspect not.

There are no problems with GCC on RHEL 5.11 or 6.8. We only get the problems on Solaris 10 and 11 with the solaris compiler. I tried building with GCC, even though we can't use it on solaris for our project, just to see what would happen. Unfortunately, I got compilation errors.

g++ -DNDEBUG -g2 -O2 -fPIC -pipe -c pkcspad.cpp
cryptlib.cpp:51:95: error: init_priority attribute is not supported on this platform
 const std::string DEFAULT_CHANNEL __attribute__ ((init_priority (CRYPTOPP_INIT_PRIORITY + 25))) = "";
                                                                                               ^
cryptlib.cpp:52:91: error: init_priority attribute is not supported on this platform
 const std::string AAD_CHANNEL __attribute__ ((init_priority (CRYPTOPP_INIT_PRIORITY + 26))) = "AAD";
                                                                                           ^
cryptlib.cpp:76:120: error: init_priority attribute is not supported on this platform
 const simple_ptr<NullNameValuePairs> s_pNullNameValuePairs __attribute__ ((init_priority (CRYPTOPP_INIT_PRIORITY + 30))) = new NullNameValuePairs;
                                                                                                                        ^
g++ -DNDEBUG -g2 -O2 -fPIC -pipe -c cmac.cpp
g++ -DNDEBUG -g2 -O2 -fPIC -pipe -c gf256.cpp
g++ -DNDEBUG -g2 -O2 -fPIC -pipe -c xtrcrypt.cpp
g++ -DNDEBUG -g2 -O2 -fPIC -pipe -c queue.cpp
g++ -DNDEBUG -g2 -O2 -fPIC -pipe -c mars.cpp
g++ -DNDEBUG -g2 -O2 -fPIC -pipe -c rc5.cpp
default.cpp: In constructor CryptoPP::DefaultEncryptorWithMAC::DefaultEncryptorWithMAC(const char*, CryptoPP::BufferedTransformation*):
default.cpp:220:39: warning: DefaultEncryptor is deprecated (declared at default.h:29): DefaultEncryptor will be changing in the near future because the algorithms are no longer secure [-Wdeprecated-declarations]
  SetFilter(new HashFilter(*m_mac, new DefaultEncryptor(passphrase), true));

This is with g++ version 4.8.2. I know it's an old version, but that's what is available without going off and building gcc from source.


Jeffrey Walton

Sep 13, 2016, 7:04:35 AM
to Andrew Marlow, Crypto++ Users
>> >> > 3) The test program core dumps when run with the v option.
>> >> > Testing MessageDigest algorithm SHA-384.
>> >> > ..signal BUS (invalid address alignment) in
>> >> > CryptoPP::SHA512::Transform
>> >> > at
>> >> > line 27 in file "sha.cpp"
>> >
>> > [stack trace snipped]
>> [snip]
>>
>> Enabling the ASM brought in more maintenance for the library because
>> SunCC is fragile at times (q.v.), but I think it's worth it.
>>
>> (A) The next step in troubleshooting is to disable all ASM.
>>
>> gmake distclean
>> CXXFLAGS="-DNDEBUG -g3 -xO2 -DCRYPTOPP_DISABLE_ASM"
>> CXXFLAGS="$CXXFLAGS" gmake -j 4
>> ./cryptest.exe v
>
> This did not resolve the problem.

That's a _really_ bad sign. We are quickly approaching the "compiler
bug" wall since you just tried to use a straight C/C++ implementation.
If it's not a compiler bug, then it could be a hardware issue. But my
money is on a compiler bug. The compiler is aligning to 4 bytes and we
are not doing anything special that needs the alignment increased.

One last thing to try... Use the Crypto++ 5.6.2 makefile to build the
latest sources. This uses the older CXXFLAGS from 5.6.2; and not the
newer ones from 5.6.3.

I added instructions for using the 5.6.2 makefile to build latest
sources at https://cryptopp.com/wiki/Solaris_(Command_Line)#It_used_to_work.21.21.21
.

Jeff

Andrew Marlow

Sep 13, 2016, 10:17:24 AM
to Crypto++ Users, marlow...@gmail.com, nolo...@gmail.com
On Tuesday, 13 September 2016 12:04:35 UTC+1, Jeffrey Walton wrote:
[snip]

> > This did not resolve the problem.
>
> That's a _really_ bad sign. We are quickly approaching the "compiler
> bug" wall since you just tried to use a straight C/C++ implementation.

Looks that way. :-(

> If it's not a compiler bug, then it could be a hardware issue. But my
> money is on a compiler bug.

So is mine.

> The compiler is aligning to 4 bytes and we
> are not doing anything special that needs the alignment increased.
>
> One last thing to try... Use the Crypto++ 5.6.2 makefile to build the
> latest sources. This uses the older CXXFLAGS from 5.6.2; and not the
> newer ones from 5.6.3.

I tried just building 562 and that gives the same problem, same place.
Then I tried copying the 562 makefile to 564 but that gave a build error - No rule to make target 'bench.o'.
The compiler flags were: -DNDEBUG -O -g0 -native -template=no%extdef -m64 -c so I tried the 564 makefile with those options.
This gave me the pipe warning back but I ignored it. This gave the same error.

This is very unlikely to be a hardware problem because I see the same core dump on our solaris 10 machine as well as on the newer solaris 11 machine where I am working now.


 

Jeffrey Walton

Sep 13, 2016, 10:58:56 AM
to Andrew Marlow, Crypto++ Users
On Tue, Sep 13, 2016 at 10:17 AM, Andrew Marlow <marlow...@gmail.com> wrote:
> On Tuesday, 13 September 2016 12:04:35 UTC+1, Jeffrey Walton wrote:
>>
>> [snip]
>> > This did not resolve the problem.
>>
>> That's a _really_ bad sign. We are quickly approaching the "compiler
>> bug" wall since you just tried to use a straight C/C++ implementation.
>
>
> Looks that way. :-(

Ok, I went through my lab book and I see some notes that may apply
from a few years ago... let's try one more thing...

Open sha.h and find the template definitions for SHA384 and SHA512.
The last expression is "(CRYPTOPP_BOOL_X86|CRYPTOPP_BOOL_X32)". That's
a template argument that controls 16-byte aligned/unaligned. Change it
to "true".

That should cause OptimalDataAlignment() to always return 16.

In the meantime, I'll look through sha.cpp for potential suspects.

Jeff

Andrew Marlow

Sep 13, 2016, 1:26:58 PM
to Crypto++ Users, marlow...@gmail.com, nolo...@gmail.com
On Tuesday, 13 September 2016 15:58:56 UTC+1, Jeffrey Walton wrote:
[snip]

> Ok, I went through my lab book and I see some notes that may apply
> from a few years ago... let's try one more thing...
>
> Open sha.h and find the template definitions for SHA384 and SHA512.
> The last expression is "(CRYPTOPP_BOOL_X86|CRYPTOPP_BOOL_X32)". That's
> a template argument that controls 16-byte aligned/unaligned. Change it
> to "true".
>
> That should cause OptimalDataAlignment() to always return 16.

Indeed. I tried it. Unfortunately it gave the same result. In desperation to see something working on Solaris I rebuilt 564 with gcc. I had to hack the source, ifdef'ing out HAVE_GCC_INIT_PRIORITY since my gcc doesn't have it. Once it all built, the tests all ran OK.
We cannot use the GNU compiler on solaris, even though it is there and cryptopp works with it. This is because we have to link with third party libraries that were compiled with the Sun compiler. The vendors will not provide gcc versions unless they are paid especially for it.

> In the meantime, I'll look through sha.cpp for potential suspects.
>
> Jeff

I hope you find something, it's beginning to look bleak for Solaris SPARC.
