Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

NSS 3.12.5: Error '-8023' ... how to track it down?

1,190 views
Skip to first unread message

Bernhard Thalmayr

unread,
Jan 11, 2011, 11:36:40 AM1/11/11
to dev-tec...@lists.mozilla.org
Hi experts,

I have apache httpd running with a shared lib using NSS/NSPR.

The lib talks to an SSL enabled server using PR_WRITE. Occasionally
PR_WRITE returns error '-8023'.

OS is CentOS 5.5 64bit.
NSS: @(#)NSS 3.12.5.0 Aug 3 2010 17:15:02
NSPR: @(#)NSPR 4.8.2 2010-08-03 17:13:30

I've tried 'ltrace' to trace the NSS/NSPR library calls but without
success ... 'ltrace' does not show anything.

AFAIH understood the source '-8023' could be
'SEC_ERROR_PKCS11_DEVICE_ERROR' or 'CKR_DEVICE_ERROR'.

There's no other PKCS11 module involved but the internal one.

How can I track down this issue?

TIA,
Bernhard

John Dennis

unread,
Jan 11, 2011, 12:08:20 PM1/11/11
to mozilla's crypto code discussion list, Bernhard Thalmayr

Install the debugging symbols and source code for nss and nspr

yum install nss-debuginfo nspr-debuginfo

Then run the process under gdb or attach gdb to a running process and
set a break point on PR_SetError, you will probably need to issue this
gdb command (or put it in a .gdbinit file):

set breakpoint pending on

You should break in the debugger with full source code when an error is
set, you can then look around to see why.

--
John Dennis <jde...@redhat.com>

Looking to carve out IT costs?
www.redhat.com/carveoutcosts/

Honza Bambas

unread,
Jan 11, 2011, 12:13:12 PM1/11/11
to dev-tec...@lists.mozilla.org
On 1/11/2011 5:36 PM, Bernhard Thalmayr wrote:
> Hi experts,
>
> I have apache httpd running with a shared lib using NSS/NSPR.
>
> The lib talks to an SSL enabled server using PR_WRITE. Occasionally
> PR_WRITE returns error '-8023'.
>
> OS is CentOS 5.5 64bit.
> NSS: @(#)NSS 3.12.5.0 Aug 3 2010 17:15:02
> NSPR: @(#)NSPR 4.8.2 2010-08-03 17:13:30
>
> I've tried 'ltrace' to trace the NSS/NSPR library calls but without
> success ... 'ltrace' does not show anything.
>
> AFAIH understood the source '-8023' could be
> 'SEC_ERROR_PKCS11_DEVICE_ERROR' or 'CKR_DEVICE_ERROR'.
>
> There's no other PKCS11 module involved but the internal one.
>
> How can I track down this issue?
>
> TIA,
> Bernhard
Some occurrences of this error might get logged by one of these functions:
http://mxr.mozilla.org/mozilla-central/source/security/nss/cmd/pk11mode/pk11mode.c#1603

Check the console and/or error output if you find something that could
lead us.

-hb-

Bernhard Thalmayr

unread,
Jan 11, 2011, 1:41:50 PM1/11/11
to mozilla's crypto code discussion list
Thanks for the hints Dennis,

unfortunately the server is running remotely ... inaccessible for debugging.

Do you know why 'ltrace' does not work when using '-p'?

Thanks,
Bernhard


On 01/11/2011 06:08 PM, John Dennis wrote:


> On 01/11/2011 11:36 AM, Bernhard Thalmayr wrote:
>> Hi experts,
>>
>> I have apache httpd running with a shared lib using NSS/NSPR.
>>
>> The lib talks to an SSL enabled server using PR_WRITE. Occasionally
>> PR_WRITE returns error '-8023'.
>>
>> OS is CentOS 5.5 64bit.
>> NSS: @(#)NSS 3.12.5.0 Aug 3 2010 17:15:02
>> NSPR: @(#)NSPR 4.8.2 2010-08-03 17:13:30
>>
>> I've tried 'ltrace' to trace the NSS/NSPR library calls but without
>> success ... 'ltrace' does not show anything.
>>
>> AFAIH understood the source '-8023' could be
>> 'SEC_ERROR_PKCS11_DEVICE_ERROR' or 'CKR_DEVICE_ERROR'.
>>
>> There's no other PKCS11 module involved but the internal one.
>>
>> How can I track down this issue?
>

Bernhard Thalmayr

unread,
Jan 11, 2011, 1:47:11 PM1/11/11
to mozilla's crypto code discussion list
On 01/11/2011 06:13 PM, Honza Bambas wrote:

> On 1/11/2011 5:36 PM, Bernhard Thalmayr wrote:
>> Hi experts,
>>
>> I have apache httpd running with a shared lib using NSS/NSPR.
>>
>> The lib talks to an SSL enabled server using PR_WRITE. Occasionally
>> PR_WRITE returns error '-8023'.
>>
>> OS is CentOS 5.5 64bit.
>> NSS: @(#)NSS 3.12.5.0 Aug 3 2010 17:15:02
>> NSPR: @(#)NSPR 4.8.2 2010-08-03 17:13:30
>>
>> I've tried 'ltrace' to trace the NSS/NSPR library calls but without
>> success ... 'ltrace' does not show anything.
>>
>> AFAIH understood the source '-8023' could be
>> 'SEC_ERROR_PKCS11_DEVICE_ERROR' or 'CKR_DEVICE_ERROR'.
>>
>> There's no other PKCS11 module involved but the internal one.
>>
>> How can I track down this issue?
>>
>> TIA,
>> Bernhard
> Some occurrences of this error might get logged by one of these functions:
> http://mxr.mozilla.org/mozilla-central/source/security/nss/cmd/pk11mode/pk11mode.c#1603

Thanks for the pointer Honza ....


>
>
> Check the console and/or error output if you find something that could

a private lib within Apache httpd is using NSS/NSRP ... nothing in any log.

The error is caught and printed in a custom log (using PR_GetError).

-Bernhard
> lead us.
>
> -hb-
>

Wan-Teh Chang

unread,
Jan 11, 2011, 2:28:48 PM1/11/11
to mozilla's crypto code discussion list
Hi Bernhard,

The best way to debug this is to find out where NSS's internal PKCS
#11 module sets the error code CKR_DEVICE_ERROR. Unfortunately there
are a lot of possible places (all the files in the
mozilla/security/nss/lib/softoken directory):
http://mxr.mozilla.org/security/ident?i=CKR_DEVICE_ERROR

The best way to track this down is to run your Apache httpd in a
debugger with debug symbols for NSS, or run your Apache httpd with NSS
shared libraries that you build from source code and insert printf
statements.

Wan-Teh

Bernhard Thalmayr

unread,
Jan 11, 2011, 3:51:08 PM1/11/11
to dev-tec...@lists.mozilla.org
Hi Wan-Teh,

thanks for your reply.

Will it be helpfull to use the 'PKCS #11 Module Logger' before starting
with 'printfs'?

I tried that and get at least some output in the specified log.

-Bernhard

Robert Relyea

unread,
Jan 11, 2011, 3:59:10 PM1/11/11
to mozilla's crypto code discussion list, Bernhard Thalmayr
On 01/11/2011 12:51 PM, Bernhard Thalmayr wrote:
> Hi Wan-Teh,
>
> thanks for your reply.
>
> Will it be helpfull to use the 'PKCS #11 Module Logger' before
> starting with 'printfs'?
>
> I tried that and get at least some output in the specified log.
>
> -Bernhard
>
yes, that will tell you which PKCS #11 call returned a CKR_DEVICE_ERROR.

bob

Bernhard Thalmayr

unread,
Jan 12, 2011, 4:26:58 PM1/12/11
to mozilla's crypto code discussion list
So here we go ... the PCKS#11 logger shows the following

331569088[1bd1610]: C_DigestUpdate
331569088[1bd1610]: hSession = 0x88
331569088[1bd1610]: pPart = 0x6e580a4
331569088[1bd1610]: ulPartLen = 70
331569088[1bd1610]: rv = CKR_OK
331569088[1bd1610]: C_GetMechanismInfo
331569088[1bd1610]: slotID = 0x1
331569088[1bd1610]: type = 0x132
331569088[1bd1610]: pInfo = 0x7fffcd592e70
331569088[1bd1610]: rv = CKR_OK
331569088[1bd1610]: C_UnwrapKey
331569088[1bd1610]: hSession = 0x6
331569088[1bd1610]: pMechanism = 0x7fffcd592ea0
331569088[1bd1610]: hUnwrappingKey = 0x8
331569088[1bd1610]: pWrappedKey = 0x36ac9bc
331569088[1bd1610]: ulWrappedKeyLen = 48
331569088[1bd1610]: pTemplate = 0x7fffcd592cf0
331569088[1bd1610]: ulAttributeCount = 6
331569088[1bd1610]: phKey = 0x36bfd58
331569088[1bd1610]: CKA_SIGN = CK_TRUE [1]
331569088[1bd1610]: CKA_VERIFY = CK_TRUE [1]
331569088[1bd1610]: CKA_CLASS = CKO_SECRET_KEY [8]
331569088[1bd1610]: CKA_KEY_TYPE = 0x10 [8]
331569088[1bd1610]: CKA_DERIVE = CK_TRUE [1]
331569088[1bd1610]: CKA_VALUE_LEN = 3000000000000000 [8]
331569088[1bd1610]: mechanism = CKM_DES3_ECB
331569088[1bd1610]: *phKey = 0x54
331569088[1bd1610]: rv = CKR_OK
331569088[1bd1610]: C_DeriveKey
331569088[1bd1610]: slotID = 0x1
331569088[1bd1610]: flags = 0x4
331569088[1bd1610]: pApplication = 0331569088331569088[1bd1610]:
Notify = 0x13231f333331569088[1bd1610]: phSession =
0x7fffc331569088[1bd1610]: phKey = 0x36c1618
331569088[1bd1610]: CKA_CLASS = CKO_SECRET_KEY [8]
331569088[1bd1610]: CKA_KEY_TYPE = 0x12 [8]
331569088[1bd1610]: *phSession = 0x132317f8
331569088[1bd1610]: rv = CKR_DEVICE_ERROR
331569088[1bd1610]: mechanism = CKM_TLS_KEY_AND_MAC_DERIVE
331569088[1bd1610]: C_DigestInit
331569088[1bd1610]: hSession = 0x1
331569088[1bd1610]: pMechanism = 0x7fffcd593290
331569088[1bd1610]: mechanism = CKM_MD5
331569088[1bd1610]: rv = CKR_DEVICE_ERROR
331569088[1bd1610]: C_OpenSession
331569088[1bd1610]: slotID = 0x1
331569088[1bd1610]: flags = 0x4
331569088[1bd1610]: pApplication = 0x4bdff70
331569088[1bd1610]: Notify = 0x13231f30
331569088[1bd1610]: phSession = 0x7fffcd593228
331569088[1bd1610]: *phSession = 0x132317f8
331569088[1bd1610]: rv = CKR_DEVICE_ERROR

Could someone please help me understand the trace?


I've also gathered an SSLTRACE log if needed.

TIA,
Bernhard

Nelson B Bolyard

unread,
Jan 12, 2011, 4:50:55 PM1/12/11
to mozilla's crypto code discussion list
Bernhard wrote:

> 331569088[1bd1610]: flags = 0x4
> 331569088[1bd1610]: pApplication = 0331569088331569088[1bd1610]:
> Notify = 0x13231f333331569088[1bd1610]: phSession =
> 0x7fffc331569088[1bd1610]: phKey = 0x36c1618
> 331569088[1bd1610]: CKA_CLASS = CKO_SECRET_KEY [8]

Was that a copy and paste error?
Or does the file really look like that?
Should look like this...

> 331569088[1bd1610]: pApplication = 0331569088
> 331569088[1bd1610]: Notify = 0x13231f333
> 331569088[1bd1610]: phSession = 0x7fffc
> 331569088[1bd1610]: phKey = 0x36c1618

Bernhard Thalmayr

unread,
Jan 12, 2011, 4:59:16 PM1/12/11
to dev-tec...@lists.mozilla.org
On 01/12/2011 10:50 PM, Nelson B Bolyard wrote:
> Bernhard wrote:
>
>> 331569088[1bd1610]: flags = 0x4
>> 331569088[1bd1610]: pApplication = 0331569088331569088[1bd1610]:
>> Notify = 0x13231f333331569088[1bd1610]: phSession =
>> 0x7fffc331569088[1bd1610]: phKey = 0x36c1618
>> 331569088[1bd1610]: CKA_CLASS = CKO_SECRET_KEY [8]
>
> Was that a copy and paste error?

No (at least I don't think so)


> Or does the file really look like that?

Yes

$grep 0331569088331569088 nspr.log2


331569088[1bd1610]: pApplication = 0331569088331569088[1bd1610]:
Notify = 0x13231f333331569088[1bd1610]: phSession =
0x7fffc331569088[1bd1610]: phKey = 0x36c1618

Bernhard Thalmayr

unread,
Jan 12, 2011, 5:02:27 PM1/12/11
to mozilla's crypto code discussion list
On 01/12/2011 10:50 PM, Nelson B Bolyard wrote:
> Bernhard wrote:
>
>> 331569088[1bd1610]: flags = 0x4
>> 331569088[1bd1610]: pApplication = 0331569088331569088[1bd1610]:
>> Notify = 0x13231f333331569088[1bd1610]: phSession =
>> 0x7fffc331569088[1bd1610]: phKey = 0x36c1618
>> 331569088[1bd1610]: CKA_CLASS = CKO_SECRET_KEY [8]
>
> Was that a copy and paste error?
> Or does the file really look like that?
> Should look like this...
>
>> 331569088[1bd1610]: pApplication = 0331569088
>> 331569088[1bd1610]: Notify = 0x13231f333
>> 331569088[1bd1610]: phSession = 0x7fffc
>> 331569088[1bd1610]: phKey = 0x36c1618


Am'I wright that 'C_DeriveKey' is actually 'NSC_DeriveKey' in
http://mxr.mozilla.org/security/source/security/nss/lib/softoken/pkcs11c.c ?

Wan-Teh Chang

unread,
Jan 12, 2011, 5:11:44 PM1/12/11
to mozilla's crypto code discussion list
On Wed, Jan 12, 2011 at 2:02 PM, Bernhard Thalmayr
<bernhard...@painstakingminds.com> wrote:
>
> Am'I wright that 'C_DeriveKey' is actually 'NSC_DeriveKey' in
> http://mxr.mozilla.org/security/source/security/nss/lib/softoken/pkcs11c.c ?

Yes. C_DeriveKey is a function pointer. It points to the
NSC_DeriveKey function.

Wan-Teh

Robert Relyea

unread,
Jan 12, 2011, 5:38:54 PM1/12/11
to dev-tec...@lists.mozilla.org
On 01/12/2011 01:26 PM, Bernhard Thalmayr wrote:
> So here we go ... the PCKS#11 logger shows the following
>
> 331569088[1bd1610]: C_DigestUpdate
> 331569088[1bd1610]: hSession = 0x88
> 331569088[1bd1610]: pPart = 0x6e580a4
> 331569088[1bd1610]: ulPartLen = 70
> 331569088[1bd1610]: rv = CKR_OK

This says you called C_DigestUpdate with a session handle of 0x88 and to
digest 70 bytes of data at memory location 0x63580a4. The function
returned CKR_OK.

The rest are pretty similar, I'll add additional notes as appropriate....


> 331569088[1bd1610]: C_GetMechanismInfo
> 331569088[1bd1610]: slotID = 0x1
> 331569088[1bd1610]: type = 0x132
> 331569088[1bd1610]: pInfo = 0x7fffcd592e70
> 331569088[1bd1610]: rv = CKR_OK

From this, it appears you are using the internal crypto slot rather than
the internal key/certificate store slot. This would be expected if you
are doing SSL, which is what appears to be happening here.


> 331569088[1bd1610]: C_UnwrapKey
> 331569088[1bd1610]: hSession = 0x6
> 331569088[1bd1610]: pMechanism = 0x7fffcd592ea0
> 331569088[1bd1610]: hUnwrappingKey = 0x8
> 331569088[1bd1610]: pWrappedKey = 0x36ac9bc
> 331569088[1bd1610]: ulWrappedKeyLen = 48
> 331569088[1bd1610]: pTemplate = 0x7fffcd592cf0
> 331569088[1bd1610]: ulAttributeCount = 6
> 331569088[1bd1610]: phKey = 0x36bfd58
> 331569088[1bd1610]: CKA_SIGN = CK_TRUE [1]
> 331569088[1bd1610]: CKA_VERIFY = CK_TRUE [1]

> 331569088[1bd1610]: CKA_CLASS = CKO_SECRET_KEY [8]

> 331569088[1bd1610]: CKA_KEY_TYPE = 0x10 [8]
> 331569088[1bd1610]: CKA_DERIVE = CK_TRUE [1]
> 331569088[1bd1610]: CKA_VALUE_LEN = 3000000000000000 [8]
> 331569088[1bd1610]: mechanism = CKM_DES3_ECB
> 331569088[1bd1610]: *phKey = 0x54

The extra data here is dumping out interesting values inside the
parameters passed to C_UnwrapKey. The list of CKA_XXXX = XXXXX are
dumping out the template sent in pTemplate. CKA_VALUE_LEN looks wrong to
me, I don't know if tehre is some byteswap issue here.

mechansim CKM_DES3_ECB means you are using DES to do unwrap, so this
appears to be a restart operation, rather than a full handshake.


> 331569088[1bd1610]: rv = CKR_OK
> 331569088[1bd1610]: C_DeriveKey
> 331569088[1bd1610]: slotID = 0x1

> 331569088[1bd1610]: flags = 0x4

> 331569088[1bd1610]: pApplication = 0331569088
> 331569088[1bd1610]: Notify = 0x13231f333
> 331569088[1bd1610]: phSession = 0x7fffc
> 331569088[1bd1610]: phKey = 0x36c1618

> 331569088[1bd1610]: CKA_CLASS = CKO_SECRET_KEY [8]

> 331569088[1bd1610]: CKA_KEY_TYPE = 0x12 [8]
> 331569088[1bd1610]: *phSession = 0x132317f8
> 331569088[1bd1610]: rv = CKR_DEVICE_ERROR
> 331569088[1bd1610]: mechanism = CKM_TLS_KEY_AND_MAC_DERIVE

Here you are deriving the SSL session keys. A session handle of
0x132317f8 is pretty large, but not unexpected in a server that has been
running a while.

CKR_DEVICE_ERROR can happen in a CKM_TLS_KEY_AND_MAC_DERIVE in a couple
of pathological cases:

1) if the underlying object structure is hosed.
2) if the key object looks bogus (a token Object that isn't known or
stored, for instance).

This object should be a token object. Unfortunately it looks like the
logging program did not out put hKey (or it did, but we ran into some
overwrite do to threading... the missing new lines after the
pApplication, Nofiy, and phSession) seems to indicate that. (In fact
Notify should be pNotify, and that looks like part of a C_OpenSession()
call, not a C_DeriveKey()... which makes some of the conclusions we are
trying to draw questionable.

> 331569088[1bd1610]: C_DigestInit
> 331569088[1bd1610]: hSession = 0x1
> 331569088[1bd1610]: pMechanism = 0x7fffcd593290
> 331569088[1bd1610]: mechanism = CKM_MD5
> 331569088[1bd1610]: rv = CKR_DEVICE_ERROR

I can see no place where NSC_Digest can even return CKR_DEVICE_ERROR


> 331569088[1bd1610]: C_OpenSession
> 331569088[1bd1610]: slotID = 0x1

> 331569088[1bd1610]: flags = 0x4

> 331569088[1bd1610]: pApplication = 0x4bdff70
> 331569088[1bd1610]: Notify = 0x13231f30
> 331569088[1bd1610]: phSession = 0x7fffcd593228
> 331569088[1bd1610]: *phSession = 0x132317f8
> 331569088[1bd1610]: rv = CKR_DEVICE_ERROR

This looks like a normal C_OpenSession call...).


> OS is CentOS 5.5 64bit.

Arg, I just remembered. CentOS 5.5 will likely be following RHEL 5.5 in
having an NSS 3.11 softoken. I've just checked and NSC_DigestInit still
cannot return CKR_DEVICE_ERROR in NSS 3.11 -- unless you are running in
FIPS mode. This is possible, but seem extremely unlikely since you are
using slotid=1 (FIPS slot ID is 3). In FIPS mode, any call can fail with
CKR_DEVICE_ERROR if the token is in 'fatalError' mode. That happens if
any of the following fail:
1) the Random number generator.
2) the Seeding the random number generator.
3) generating a key pair fails with CKR_GENERAL_ERROR
4) power on self check fails at init time.
It can be reset by reinitializing the token.
If this is what is happening, you should be getting NSS messages in your
syslog. I don't think FIPS is your issue here looking at the logs, however.


bob

Bernhard Thalmayr

unread,
Jan 13, 2011, 5:53:54 AM1/13/11
to dev-tec...@lists.mozilla.org
Thanks a lot for the detailed explanation Robert - much appreciated.

Please see my comments in line, some stuff deleted for brevity.

On 01/12/2011 11:38 PM, Robert Relyea wrote:
--- snip ---


>> 331569088[1bd1610]: C_UnwrapKey 331569088[1bd1610]: hSession =
>> 0x6 331569088[1bd1610]: pMechanism = 0x7fffcd592ea0
>> 331569088[1bd1610]: hUnwrappingKey = 0x8 331569088[1bd1610]:
>> pWrappedKey = 0x36ac9bc 331569088[1bd1610]: ulWrappedKeyLen = 48
>> 331569088[1bd1610]: pTemplate = 0x7fffcd592cf0
>> 331569088[1bd1610]: ulAttributeCount = 6 331569088[1bd1610]:
>> phKey = 0x36bfd58 331569088[1bd1610]: CKA_SIGN = CK_TRUE [1]
>> 331569088[1bd1610]: CKA_VERIFY = CK_TRUE [1]
>> 331569088[1bd1610]: CKA_CLASS = CKO_SECRET_KEY [8]
>> 331569088[1bd1610]: CKA_KEY_TYPE = 0x10 [8] 331569088[1bd1610]:
>> CKA_DERIVE = CK_TRUE [1] 331569088[1bd1610]: CKA_VALUE_LEN =
>> 3000000000000000 [8] 331569088[1bd1610]: mechanism =
>> CKM_DES3_ECB 331569088[1bd1610]: *phKey = 0x54
> The extra data here is dumping out interesting values inside the
> parameters passed to C_UnwrapKey. The list of CKA_XXXX = XXXXX are
> dumping out the template sent in pTemplate. CKA_VALUE_LEN looks wrong
> to me, I don't know if tehre is some byteswap issue here.

I've checked nearly every C_UnwrapKey function trace in the log ... each
of them is showing 'CKA_VALUE_LEN = 3000000000000000 [8]'.
Every C_DeriveKey function trace shows 'CKA_VALUE_LEN = 1000000000000000
[8]'

>
> mechansim CKM_DES3_ECB means you are using DES to do unwrap, so this
> appears to be a restart operation, rather than a full handshake.

Looking at the SSLTrace file this seems to be the case. I don't know if
I can correlate the SSLTRACE file with the PKCS#11 log file, but it
seems the issue only appears in restart operations. Full handshakes
don't show the issue.

from SSLTRACE log
=================

10891: SSL[107778448]: sending client-hello
SSL: Lookup1: sid=0x36a9080
10891: SSL[107778448]: client, found session-id: [Len: 16]
20 4d 2d a6 3a 3c 67 1a 60 c8 e5 18 0d ea d4 e2 M-.:<g.`.......
10891: SSL3[107778448]: send client_hello handshake
10891: SSL3[107778448]: start handshake hashes
10891: SSL[107778448]: SecureSend: returning -1 count, error -8023
10891: SSL[107778448]: closing, rv=0 errno=-8023


10891: SSL[107778448]: sending client-hello
SSL: Lookup1: sid=0x36a9080
10891: SSL[107778448]: client, found session-id: [Len: 16]
20 4d 2d a6 3a 3c 67 1a 60 c8 e5 18 0d ea d4 e2 M-.:<g.`.......
10891: SSL3[107778448]: send client_hello handshake
10891: SSL3[107778448]: start handshake hashes
10891: SSL[107778448]: SecureSend: returning -1 count, error -8023
10891: SSL[107778448]: closing, rv=0 errno=-5966

This looks also interesting: PR_NO_ACCESS_RIGHTS_ERROR ??

It might be helpfull if SSLTRACE and PKCS#11 could log a timestamp to
help in correlation. Or is there already a way to correlate it?

>> 331569088[1bd1610]: rv = CKR_OK 331569088[1bd1610]: C_DeriveKey
>> 331569088[1bd1610]: slotID = 0x1 331569088[1bd1610]: flags =
>> 0x4 331569088[1bd1610]: pApplication = 0331569088
>> 331569088[1bd1610]: Notify = 0x13231f333 331569088[1bd1610]:
>> phSession = 0x7fffc 331569088[1bd1610]: phKey = 0x36c1618
>> 331569088[1bd1610]: CKA_CLASS = CKO_SECRET_KEY [8]
>> 331569088[1bd1610]: CKA_KEY_TYPE = 0x12 [8] 331569088[1bd1610]:
>> *phSession = 0x132317f8 331569088[1bd1610]: rv =
>> CKR_DEVICE_ERROR 331569088[1bd1610]: mechanism =
>> CKM_TLS_KEY_AND_MAC_DERIVE
> Here you are deriving the SSL session keys. A session handle of
> 0x132317f8 is pretty large, but not unexpected in a server that has
> been running a while.

Well the server is only running 22 minutes until the issue arises ...
might this be 'a while'?

During this time only 28 'send client_hello handshake' messages appear
in the SSLTRACE log. 19 of them show 'closing, rv=0 errno=0', 1
'closing, rv=0 errno=-5966', the others 'closing, rv=0 errno=-8023'.

So no much connection at all.


>
> CKR_DEVICE_ERROR can happen in a CKM_TLS_KEY_AND_MAC_DERIVE in a
> couple of pathological cases:
>
> 1) if the underlying object structure is hosed. 2) if the key object
> looks bogus (a token Object that isn't known or stored, for
> instance).
>
> This object should be a token object. Unfortunately it looks like
> the logging program did not out put hKey (or it did, but we ran into
> some overwrite do to threading... the missing new lines after the
> pApplication, Nofiy, and phSession) seems to indicate that. (In fact
> Notify should be pNotify, and that looks like part of a
> C_OpenSession() call, not a C_DeriveKey()... which makes some of the
> conclusions we are trying to draw questionable.
>
>> 331569088[1bd1610]: C_DigestInit 331569088[1bd1610]: hSession =
>> 0x1 331569088[1bd1610]: pMechanism = 0x7fffcd593290
>> 331569088[1bd1610]: mechanism = CKM_MD5 331569088[1bd1610]:
>> rv = CKR_DEVICE_ERROR
> I can see no place where NSC_Digest can even return CKR_DEVICE_ERROR

So I can not trust the log?

>> 331569088[1bd1610]: C_OpenSession 331569088[1bd1610]: slotID =
>> 0x1 331569088[1bd1610]: flags = 0x4 331569088[1bd1610]:
>> pApplication = 0x4bdff70 331569088[1bd1610]: Notify = 0x13231f30
>> 331569088[1bd1610]: phSession = 0x7fffcd593228
>> 331569088[1bd1610]: *phSession = 0x132317f8 331569088[1bd1610]:
>> rv = CKR_DEVICE_ERROR
> This looks like a normal C_OpenSession call...).
>
>
>> OS is CentOS 5.5 64bit.
>
> Arg, I just remembered. CentOS 5.5 will likely be following RHEL 5.5
> in having an NSS 3.11 softoken.

This is only the OS ... the library is not using NSS/NSPR from the OS.
It is using a bundled one.

strings libsoftokn3.so | grep -i header
$Header: NSS 3.12.4.5 (debug) Jan 10 2011 17:14:10


I've just checked and NSC_DigestInit
> still cannot return CKR_DEVICE_ERROR in NSS 3.11 -- unless you are
> running in FIPS mode. This is possible, but seem extremely unlikely
> since you are using slotid=1 (FIPS slot ID is 3). In FIPS mode, any
> call can fail with CKR_DEVICE_ERROR if the token is in 'fatalError'
> mode. That happens if any of the following fail: 1) the Random number
> generator. 2) the Seeding the random number generator. 3) generating
> a key pair fails with CKR_GENERAL_ERROR 4) power on self check fails
> at init time. It can be reset by reinitializing the token. If this is
> what is happening, you should be getting NSS messages in your syslog.
> I don't think FIPS is your issue here looking at the logs, however.

I don't think it's running in FIPS mode .. this has to be configured
AFAIK and noone did it.

What could be a next step to narrow it down? Give 3.12.9 a try?

TIA,
Bernhard

Bernhard Thalmayr

unread,
Jan 13, 2011, 1:46:22 PM1/13/11
to dev-tec...@lists.mozilla.org
Hi again,

today I a built a debug version of NSS 3.12.8 (as I haven't found 3.12.9
yet)

The issue is still there, but occours much later then with 3.12.5.

Server (with lib using NSS) ran about 1.5 hours before the issue
occoured. During this time 911 SSL connections have been done. The last
6 failed (and then I stopped).

It seems SSLTRACE log has been truncated ... is there a limit? (It shows
much less connections as the lib's log does).

PKCS#11 logger now seems to be ok, but shows 'CKR_DEVICE_ERRORS' for
other functions as for 3.12.5.

822564800[1581690]: C_OpenSession
822564800[1581690]: slotID = 0x1
822564800[1581690]: flags = 0x4
822564800[1581690]: pApplication = 0x1884150
822564800[1581690]: Notify = 0x30468315
822564800[1581690]: phSession = 0x7fffb77fd208
822564800[1581690]: *phSession = 0x1884150
822564800[1581690]: rv = CKR_DEVICE_ERROR
822564800[1581690]: C_DigestInit
822564800[1581690]: hSession = 0x1
822564800[1581690]: pMechanism = 0x7fffb77fd260
822564800[1581690]: mechanism = CKM_MD5
822564800[1581690]: rv = CKR_DEVICE_ERROR
822564800[1581690]: C_OpenSession
822564800[1581690]: slotID = 0x1
822564800[1581690]: flags = 0x4
822564800[1581690]: pApplication = 0x1884150
822564800[1581690]: Notify = 0x30468315
822564800[1581690]: phSession = 0x7fffb77fd208
822564800[1581690]: *phSession = 0x1884150
822564800[1581690]: rv = CKR_DEVICE_ERROR
822564800[1581690]: C_DigestInit
822564800[1581690]: hSession = 0x1
822564800[1581690]: pMechanism = 0x7fffb77fd260
822564800[1581690]: mechanism = CKM_SHA_1
822564800[1581690]: rv = CKR_DEVICE_ERROR
...
..

Once the 'CKR_DEVICE_ERROR' occoured it seems to be unrecoverable.

Could this be a file-system problem?

TIA,
Bernhard

Robert Relyea

unread,
Jan 13, 2011, 4:40:20 PM1/13/11
to mozilla's crypto code discussion list, Bernhard Thalmayr
On 01/13/2011 10:46 AM, Bernhard Thalmayr wrote:
> Hi again,
>
> today I a built a debug version of NSS 3.12.8 (as I haven't found
> 3.12.9 yet)
I wouldn't expect 3.12.9 to fix the problem, as you seem to be running
into a unique issue.

>
> The issue is still there, but occours much later then with 3.12.5.
Hmm, I'm not sure what changes between 3.12.8 and 3.12.5 would generate
this.

What is the actual client software you are running? I sounds like
someone is overwriting some buffers and causing very bizarre behavior.

That sounds like exactly the behavior of the FIPS token if it hits the
unrecoverable error state, but clearly you are using Slot 1, not Slot 3
(the FIPS slot). None of this matches anything I would expect to see
with NSS talking to softoken.;(.
>
> TIA,
> Bernhard


Wan-Teh Chang

unread,
Jan 13, 2011, 5:58:30 PM1/13/11
to mozilla's crypto code discussion list
On Wed, Jan 12, 2011 at 2:38 PM, Robert Relyea <rre...@redhat.com> wrote:
> On 01/12/2011 01:26 PM, Bernhard Thalmayr wrote:
>> 331569088[1bd1610]: C_UnwrapKey
>> 331569088[1bd1610]:   hSession = 0x6
>> 331569088[1bd1610]:   pMechanism = 0x7fffcd592ea0
>> 331569088[1bd1610]:   hUnwrappingKey = 0x8
>> 331569088[1bd1610]:   pWrappedKey = 0x36ac9bc
>> 331569088[1bd1610]:   ulWrappedKeyLen = 48
>> 331569088[1bd1610]:   pTemplate = 0x7fffcd592cf0
>> 331569088[1bd1610]:   ulAttributeCount = 6
>> 331569088[1bd1610]:   phKey = 0x36bfd58
>> 331569088[1bd1610]:     CKA_SIGN = CK_TRUE [1]
>> 331569088[1bd1610]:     CKA_VERIFY = CK_TRUE [1]
>> 331569088[1bd1610]:     CKA_CLASS = CKO_SECRET_KEY [8]
>> 331569088[1bd1610]:     CKA_KEY_TYPE = 0x10 [8]
>> 331569088[1bd1610]:     CKA_DERIVE = CK_TRUE [1]
>> 331569088[1bd1610]:     CKA_VALUE_LEN = 3000000000000000 [8]
>> 331569088[1bd1610]:       mechanism = CKM_DES3_ECB
>> 331569088[1bd1610]:   *phKey = 0x54
> The extra data here is dumping out interesting values inside the
> parameters passed to C_UnwrapKey. The list of CKA_XXXX = XXXXX are
> dumping out the template sent in pTemplate. CKA_VALUE_LEN looks wrong to
> me, I don't know if tehre is some byteswap issue here.

I looked into the CKA_VALUE_LEN issue. I found that our PKCS #11
logger prints CKA_VALUE_LEN as a hex dump. So
3000000000000000 [8]
means a sequence of 8 bytes:
0x30 0x00 0x00 0x00 0x00 0x00 0x00 0x00

On a little-endian system (such as x86/x64), this is 48 decimal, which
is the length of the SSL/TLS master secret.

I filed an NSS bug https://bugzilla.mozilla.org/show_bug.cgi?id=625491
to improve the logging of CKA_VALUE_LEN, with a patch.

Wan-Teh

Wan-Teh Chang

unread,
Jan 13, 2011, 6:03:25 PM1/13/11
to mozilla's crypto code discussion list
On Thu, Jan 13, 2011 at 2:53 AM, Bernhard Thalmayr
<bernhard...@painstakingminds.com> wrote:
>
> It might be helpfull if SSLTRACE and PKCS#11 could log a timestamp to help
> in correlation.

You can add 'timestamp' to the NSPR_LOG_MODULES environment variable. See
http://www.mozilla.org/projects/nspr/reference/html/prlog.html#25306

This should add timestamps to PKCS #11 logging. I don't know if
SSLTRACE is based on NSPR logging...

Wan-Teh

Bernhard Thalmayr

unread,
Jan 14, 2011, 4:24:14 AM1/14/11
to mozilla's crypto code discussion list
Thanks again Robert, please see my comments inline ...


On 01/13/2011 10:40 PM, Robert Relyea wrote:
--snip--

> What is the actual client software you are running?


the 'client' is the OpenSSO web-agent (a lib) used by Apache httpd.

It merley does the following ...

PR_SetConcurrency(4);

while (len > 0) {
PRInt32 bytesToWrite;
PRInt32 bytesWritten;

if (len > static_cast<std::size_t>(MAX_READ_WRITE_LEN)) {
bytesToWrite = MAX_READ_WRITE_LEN;
} else {
bytesToWrite = static_cast<PRInt32>(len);
}

bytesWritten = PR_Write(socket, data, bytesToWrite);
if (bytesWritten < 0) {
status = AM_NSPR_ERROR;
break;
}
data += bytesWritten;
len -= static_cast<std::size_t>(bytesWritten);
}

the socket referenced is created using the following function

PRFileDesc *createSocket(const PRNetAddr& address, bool useSSL,
const std::string &certDBPasswd,
const std::string &certNickName,
bool alwaysTrustServerCert);

char *growBuffer(char *oldBuffer, std::size_t oldBufferLen,
std::size_t newBufferLen);
am_status_t read(char *buffer, std::size_t& bufferLen);

static bool initialized;
PRFileDesc *socket;
char *certdbpasswd;
char *certnickname;
};


I sounds like
> someone is overwriting some buffers and causing very bizarre behavior.

Hmm, why could this 'buffer overwriting' only occour on this
configuration .. it does not happen on any other?!?

-- snip --

Is the output of 'PKCS#11' reliable?

AFAI have understood the code 'NSC_OpenSession' does not return
'CKR_DEVICE_ERROR' ... but I'm not a C-programmer.

Unfortunately 'printfs' are not passed through by Apache httpd ... I'll
try to use PR_LOG, but the question is will this be helpfull...


>>
>> TIA,
>> Bernhard
>
>

Kaspar Brand

unread,
Jan 16, 2011, 6:16:30 AM1/16/11
to dev-tec...@lists.mozilla.org
On 14.01.2011 10:24, Bernhard Thalmayr wrote:
> the 'client' is the OpenSSO web-agent (a lib) used by Apache httpd.

Just to be sure: we're talking of this code here, right?

http://sources.forgerock.org/browse/openam/trunk/opensso/products/webagents/am/source/connection.cpp?r=HEAD&content=true

Maybe Bob or Nelson can spot a problem when skimming over that code...?
I have another question/thought, however: what version of Apache httpd
and what MPM are you using? Is it possible that the
Connection::initialized boolean might not be shared among the httpd
processes, resulting in multiple (concurrent) NSS initializations?

Also, are "CertDir" and "dbPrefix" set in your configuration? If not,
Connection::initialize() would call NSS_NoDB_Init, which isn't intended
for SSL operations, from what I understand
(http://www.mozilla.org/projects/security/pki/nss/ref/ssl/sslfnc.html#1234224).

Kaspar

Bernhard Thalmayr

unread,
Jan 17, 2011, 7:38:24 AM1/17/11
to dev-tec...@lists.mozilla.org
Thanks for your reply Kaspar, please see my comments inline.

On 01/16/2011 12:16 PM, Kaspar Brand wrote:
> On 14.01.2011 10:24, Bernhard Thalmayr wrote:
>> the 'client' is the OpenSSO web-agent (a lib) used by Apache httpd.
>
> Just to be sure: we're talking of this code here, right?

yes

> I have another question/thought, however: what version of Apache httpd

Apache httpd 2.2.17

> and what MPM are you using?

Worker MPM is used , but is configured so start multiple processes (default)

Is it possible that the
> Connection::initialized boolean might not be shared among the httpd
> processes, resulting in multiple (concurrent) NSS initializations?

Would this be a problem? The Agent codes has always worked this way for
multi-process servers.

>
> Also, are "CertDir" and "dbPrefix" set in your configuration?

They are not set.

If not,
> Connection::initialize() would call NSS_NoDB_Init, which isn't intended
> for SSL operations, from what I understand
> (http://www.mozilla.org/projects/security/pki/nss/ref/ssl/sslfnc.html#1234224).


Hmm if this is true then the Agent code should exit and should not call
'NSS_NoDB_Init'. However how could 911 handshakes be made if this holds
true?

Thanks,
Bernhard


>
> Kaspar

Kaspar Brand

unread,
Jan 17, 2011, 12:28:04 PM1/17/11
to dev-tec...@lists.mozilla.org
On 17.01.2011 13:38, Bernhard Thalmayr wrote:
> Apache httpd 2.2.17
>
>> and what MPM are you using?
>
> Worker MPM is used , but is configured so start multiple processes (default)
>
> Is it possible that the
>> Connection::initialized boolean might not be shared among the httpd
>> processes, resulting in multiple (concurrent) NSS initializations?
>
> Would this be a problem? The Agent codes has always worked this way for
> multi-process servers.

Not sure. But when you say "has always worked this way for multi-process
servers", does this mean it worked with previous Apache httpd versions?
Did you also try with the prefork MPM (which is the default for the
CentOS-bundled version of httpd)?

>> Also, are "CertDir" and "dbPrefix" set in your configuration?
>
> They are not set.
>
> If not,
>> Connection::initialize() would call NSS_NoDB_Init, which isn't intended
>> for SSL operations, from what I understand
>> (http://www.mozilla.org/projects/security/pki/nss/ref/ssl/sslfnc.html#1234224).
>
>
> Hmm if this is true then the Agent code should exit and should not call
> 'NSS_NoDB_Init'. However how could 911 handshakes be made if this holds
> true?

Well, unless I'm completely misreading the following code,
Connection::initialize will proceed with NSS_NoDB_Init if certDir is not
set:

> if (certDir.length() != 0) {
> Log::log(Log::ALL_MODULES, Log::LOG_DEBUG, "Connection::initialize() "
> "calling NSS_Initialize() with directory = \"%s\" and "
> "prefix = \"%s\"", certDir.c_str(), dbPrefix.c_str());
> nssMethodName = "NSS_Initialize";
> secStatus = NSS_Initialize(certDir.c_str(), dbPrefix.c_str(),
> dbPrefix.c_str(), "secmod.db",
> NSS_INIT_READONLY|NSS_INIT_FORCEOPEN);
> } else {
> Log::log(Log::ALL_MODULES, Log::LOG_DEBUG, "Connection::initialize() "
> "CertDir and dbPrefix EMPTY -- Calling NSS_NoDB_Init");
> nssMethodName = "NSS_NoDB_Init";
> secStatus = NSS_NoDB_Init(NULL);
> }

Kaspar

Bernhard Thalmayr

unread,
Jan 18, 2011, 6:29:00 AM1/18/11
to dev-tec...@lists.mozilla.org
On 01/17/2011 06:28 PM, Kaspar Brand wrote:
> On 17.01.2011 13:38, Bernhard Thalmayr wrote:
>> Apache httpd 2.2.17
>>
>>> and what MPM are you using?
>>
>> Worker MPM is used , but is configured so start multiple processes (default)
>>
>> Is it possible that the
>>> Connection::initialized boolean might not be shared among the httpd
>>> processes, resulting in multiple (concurrent) NSS initializations?
>>
>> Would this be a problem? The Agent codes has always worked this way for
>> multi-process servers.
>
> Not sure. But when you say "has always worked this way for multi-process
> servers", does this mean it worked with previous Apache httpd versions?

I had been employed at Sun/iPlanet for last 10 years and I've never run
into this issue ....


> Did you also try with the prefork MPM (which is the default for the
> CentOS-bundled version of httpd)?

Not yet...


I meant it might be a bug in Agent code to call 'NSS_NoDBInit' ...
however this code has been there for some years already.

If this is a bug ... how can SSL handshake work at all? It does not fail
over time, but it fails somehwere in time.

If the error once occured NSS can not recover from it.

I'm not 100% sure but it seems the error does only occur on SSL restarts
.. not on full handshakes.

-Bernhard
>
> Kaspar

Kaspar Brand

unread,
Jan 18, 2011, 11:16:14 AM1/18/11
to dev-tec...@lists.mozilla.org
On 18.01.2011 12:29, Bernhard Thalmayr wrote:
> I meant it might be a bug in Agent code to call 'NSS_NoDBInit' ...
> however this code has been there for some years already.

One explanation I can think of is that it would only break with more
recent versions of NSS, due to stricter application behavior being
enforced? (https://bugzilla.mozilla.org/show_bug.cgi?id=331096 comes to
mind, e.g. - but that's just a [very] wild guess).

I'll defer to the real experts (Bob, Nelson, Wan-Teh) for authoritative
answers on this, however.

> I'm not 100% sure but it seems the error does only occur on SSL restarts
> .. not on full handshakes.

Perhaps it can do full handshakes even if it has been init'ed with
NSS_NoDBInit, but no session resumptions? (Again, just guessing.)

What you could also try is creating an empty DB (certutil -N -d somedir)
and configure that for the agent. Would be interesting to see if it
makes a difference.

Kaspar

Bernhard Thalmayr

unread,
Jan 18, 2011, 11:23:17 AM1/18/11
to dev-tec...@lists.mozilla.org
On 01/18/2011 05:16 PM, Kaspar Brand wrote:
> On 18.01.2011 12:29, Bernhard Thalmayr wrote:
>> I meant it might be a bug in Agent code to call 'NSS_NoDBInit' ...
>> however this code has been there for some years already.
>
> One explanation I can think of is that it would only break with more
> recent versions of NSS, due to stricter application behavior being
> enforced?

Not really ... I've already created a debug build for NSS 3.12.8 (as
this seems to fix some PKCS#11 logger issues).

The issue does occour later, but it occurs.


(https://bugzilla.mozilla.org/show_bug.cgi?id=331096 comes to
> mind, e.g. - but that's just a [very] wild guess).
>
> I'll defer to the real experts (Bob, Nelson, Wan-Teh) for authoritative
> answers on this, however.
>
>> I'm not 100% sure but it seems the error does only occur on SSL restarts
>> .. not on full handshakes.
>
> Perhaps it can do full handshakes even if it has been init'ed with
> NSS_NoDBInit, but no session resumptions? (Again, just guessing.)

Well, the error does not occour on every restart ... that's the problem.

It's not reproducable ... the agent just sends requests to a secure
server and on some point in time NSS can not recover.

>
> What you could also try is creating an empty DB (certutil -N -d somedir)
> and configure that for the agent. Would be interesting to see if it
> makes a difference.


I'll give this a try.

Thanks,
Bernhard
>
> Kaspar

0 new messages