Potential Nightmare: CryptoAPI issues running SMS 2003 on Windows 2000 Server

to...@ovod-everett.org

Jan 13, 2005, 1:35:41 AM
If you are running SMS 2003 on top of Windows 2000 Server, I would
strongly recommend that you consider contacting Microsoft and
requesting access to the hotfix described at
http://support.microsoft.com/?id=884872 . This hotfix will defend
against a potentially nightmarish scenario. This is a Pre-SP5 hotfix
for Windows 2000, but since Microsoft has publicly announced that there
will be no SP5 for Windows 2000, the only way to get your hands on it
will be to call Microsoft and request it. It will not, however, help
you pick up the pieces afterwards if the problem described happens to
you before you apply the hotfix. This documentation may help you
recover, and you may decide to simply run your server without the
hotfix and keep these recovery instructions in mind.

I have a single site with one server. The server runs both SQL Server
2000 and SMS 2003 SP1, along with SUS (yes, I know this is not the
preferred configuration, but with careful use of IISLockD, it is
workable) and RIS. I have enabled signing and encryption for all
communication (bring up Properties on the Site, go to the Advanced tab,
check every checkbox in sight!).

On Jan 7 at 5:59 PM, my server booted up after a boot-time
defragmentation of the RIS partition. Everything seemed OK, but when I
remoted into work on Jan 8 I noticed that inventory was not
processing. I figured I'd look at it on Jan 10 (Monday), so I
continued doing some defragmentation on the RIS partition. Around 9:00
PM on Jan 8th I rebooted for a second time. The next morning, I
noticed that IIS was no longer running. The displayed error message
was "Could not start the WWW Publishing Service. Error 1008. An attempt
was made to reference a token that does not exist." The only response
to this scenario that I could find was
http://support.microsoft.com/?id=328617 and numerous other pages with
similar instructions. I drove into the office on the afternoon of Jan
9 to start troubleshooting. First, I attempted restoring the
metabase.bin file from System State backups (useful note: you can
restore System State backups to an alternate location - although you
cannot pick which files you restore, it is fairly easy to pick through
the restored tree and grab the file you want). None of the
metabase.bin files worked. My presumption was either some form of
undetected corruption or an error in the backup. I would later learn
that there was nothing wrong with the metabase.bin file, but I had no
way of knowing this at the time. I followed the second instruction in
the 328617 article and proceeded to uninstall and reinstall IIS. This
got IIS functioning again, at which point I turned my attention towards
getting SUS and SMS happy. There were other problems I stumbled
through along the way, including password synchronization on the IUSR
and IWAM accounts (which may have been a red herring), but eventually I
managed to get SUS
reinstalled and the SMS Management Points reinstalled and everything
happy.

Or so I thought. My clients, on the other hand, were reporting "Both
the trusted key and the mp certificate have changed on server The
client cannot validate the authentication information" in the
LocationServices.log file. At this point I still didn't know what was
going on, but I presumed it was somehow related to my reinstall of the
Management Point, as indicated in
http://groups-beta.google.com/group/microsoft.public.sms.setup/browse_thread/thread/210df960c4ac34f2/8b82381dff7721a7
. I forged on and picked a test box and triggered CCMSETUP.EXE
RESETKEYINFORMATION=TRUE on that machine. From that point on, the
"Both the trusted key and the mp certificate . . ." error disappeared
from its logs, but the inventory still wasn't flowing. At this point I
gave up on it and decided I would call Microsoft Monday morning.

During contact with Microsoft, we began to zero in on some interesting
log events. The first were centered around 5:59 PM on Jan 7:

Entering Certificate Maintenance CcmExec 1/7/2005 5:59:38 PM
CCMSignData failed (0x8009200b). CcmExec 1/7/2005 5:59:40 PM
Creating Signing Certificate... CcmExec 1/7/2005 5:59:40 PM
Successfully created certifcate CcmExec 1/7/2005 5:59:43 PM

These indicated that upon starting up, SMS suddenly decided to use a
new TrustedRootKey and MPCertificate. This explained the "Both the
trusted key and the mp certificate . . ." errors and the sudden failure
to continue processing inventory. We were still at a loss for an
explanation on why the metabase.bin file got "corrupted" a day later,
but at least we had some idea what had happened to SMS. However, after
successfully giving the new TrustedRootKey and MPCertificate to a
machine, the inventory wouldn't process. We tracked that down to the
errors in C:\SMS_CCM\Logs\ccmexec.log looking like so:

EndpointMessage(Queue='MP_HinvEndpoint',
ID={CC96DB74-AD08-48DD-B422-0E06EC5A243C}): Will be discarded
(0x8009000d). CcmExec 1/11/2005 10:12:16 AM 3944 (0x0F68)

I was able to deduce that the 0x8009000d was a crypto error of some
kind - it was NTE_NO_KEY, and I could even find some API calls using
MSDN Online that returned it, but we still couldn't figure out what was
wrong with the TrustedRootKey and/or MPCertificate. Eventually I
realized on the afternoon of Jan 11 (Tuesday) that I could turn off
Encryption for inventory data (Advanced tab of the Site Properties
dialog box) and then updated machines could successfully report
inventory. Shortly after I started running scripts to update the
TrustedRootKey and MPCertificate on the clients (I'll tell you the
trick for that shortly), my Microsoft contact sent me
http://support.microsoft.com/?id=884872 . That was the key to
understanding everything. Suddenly I knew _WHERE_ to look -
"C:\Documents and Settings\All Users\Application
Data\Microsoft\Crypto\RSA\Machine Keys". Within the next few hours,
everything fell into place. Restoring the c23 key was pointless - I
had already mucked with IIS so much that I was better off sticking with
my existing configuration than trying to restore an old metabase.bin
file that no longer matched the files in C:\Inetpub\wwwroot. However,
with a little more detective work I noticed the following. One of the
pre-nightmare keys, which started with "ca6f", had a timestamp that
matched when I installed SMS 2003. Another, which started with "19c5",
matched when I upgraded to SMS 2003 SP1. Furthermore, the file size
of the post-nightmare ca6f key matched the pre-nightmare key, but the
post-nightmare 19c5 key was 1,166 bytes smaller than the pre-nightmare
key!!! This was evidence that when SMS had created a new TrustedRootKey
and MPCertificate, it had failed to create them properly and had hosed
up some critical part - probably the MP encryption key. My guess is
that there is some sort of update done to the 19c5 key when SP1 is
applied to a server, but for some reason the SP1 server code does not
trigger that update when the keys get damaged and it decides to create
new ones.
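
If you want to do the same detective work, a few lines of script make
it easy to eyeball the key store. Here is a rough sketch (Python is
just my choice for illustration; the path is the one from above):

# Survey the CryptoAPI machine key store: print each key file's
# last-modified time and size so they can be matched to events
# (SMS install, SP1 upgrade, the nightmare reboot, and so on).
import os
import time

MACHINE_KEYS = (r"C:\Documents and Settings\All Users"
                r"\Application Data\Microsoft\Crypto\RSA\MachineKeys")

for name in sorted(os.listdir(MACHINE_KEYS)):
    info = os.stat(os.path.join(MACHINE_KEYS, name))
    stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(info.st_mtime))
    print("%s  %10d bytes  %s" % (stamp, info.st_size, name))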

After a couple more hours of trial and error, I successfully followed
the instructions in http://support.microsoft.com/?id=884872, but
instead of renaming the c23 key (which was instrumental for IIS, but I
already had IIS working), I renamed the ca6f and 19c5 keys. I then had
to uninstall and reinstall the Management Point on the server
(following proper procedure by unchecking the box in the Administrator
console and watching C:\SMS\Logs\MPSetup.log for the uninstall to finish
before rechecking the box). After I had taken both of those steps (the
first reverted SMS to the old keys, the second forced an update on all
of the records that list the MPCertificate), clients that were still
using the old TrustedRootKey and MPCertificate values started working,
even with encryption for inventory turned back on! The clients that I
had updated to start using the post-nightmare keys had to be reverted,
but that wasn't too painful.

So, if this happens to you, DON'T PANIC!!! :)

Go to http://support.microsoft.com/?id=884872 and use your old c23,
ca6f, and 19c5 keys to replace your new ones. I would strongly
recommend making a complete backup of the MachineKeys directory before
doing anything interesting. I would also recommend looking at
timestamps and building a mental model of which key is used for what,
in case the key names vary from system to system. After you
restart all your services, uninstall and reinstall your management
point and keep your fingers crossed!
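
For the backup itself, something as simple as this sketch will do
(again Python for illustration; the destination path is arbitrary, and
it must be run under an account that can read the key files):

# Copy the whole MachineKeys directory aside before renaming or
# replacing any key files, so that every step is reversible.
import shutil
import time

SRC = (r"C:\Documents and Settings\All Users"
       r"\Application Data\Microsoft\Crypto\RSA\MachineKeys")
DST = r"C:\Backups\MachineKeys-" + time.strftime("%Y%m%d-%H%M%S")

shutil.copytree(SRC, DST)
print("Machine keys copied to " + DST)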

--Toby Ovod-Everett

P.S. I promised you a trick for updating the TrustedRootKey and
MPCertificate on machines remotely. You shouldn't need this in this
case, but there may be other scenarios where this comes in handy.
Using your scripting language of choice, connect to WMI remotely and
look at root\ccm\locationservices and then enumerate the TrustedRootKey
instances. Update the MPCertificate and TrustedRootKey values
(surprise, surprise) to values you retrieve by either looking at a
functioning machine or by using
http://servername/SMS_MP/.sms_aut?mpkeyinformation and viewing source
and then reformatting the XML into something coherent. I restarted
ccmexec after doing this. Note that if you have multiple MPs, you may
have multiple MPCertificate values depending upon which management
point the machine in question has most recently contacted. Actually,
you may not need to update the MPCertificate - I think it will
download a new one on the next policy refresh if the signatures on the
certificate can be validated using the newly correct TrustedRootKey.
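
Here is a rough sketch of what such a script might look like. Python
with the third-party wmi module is just my choice for illustration;
the host name and key values are placeholders that you would fill in
from a functioning machine or from the mpkeyinformation XML:

# Push a known-good TrustedRootKey and MPCertificate to a remote SMS
# client via WMI, then restart CcmExec so it picks up the new values.
import time
import wmi

HOST = "clientbox"            # placeholder client machine name
NEW_TRUSTED_ROOT_KEY = "..."  # paste the value from a good machine
NEW_MP_CERTIFICATE = "..."    # paste the value from a good machine

# Enumerate the TrustedRootKey instances in root\ccm\LocationServices
# and overwrite both values (assignments are written back through WMI).
loc = wmi.WMI(computer=HOST, namespace=r"root\ccm\LocationServices")
for trk in loc.TrustedRootKey():
    trk.TrustedRootKey = NEW_TRUSTED_ROOT_KEY
    trk.MPCertificate = NEW_MP_CERTIFICATE  # possibly optional, see above

# Restart the SMS Agent Host service so the new key material is used.
cimv2 = wmi.WMI(computer=HOST)
for svc in cimv2.Win32_Service(Name="CcmExec"):
    svc.StopService()
    time.sleep(15)  # crude; a real script would poll the service state
    svc.StartService()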

Joseph Calabig [MSFT]

Jan 13, 2005, 8:35:00 PM
Glad your encryption problems got fixed.
The next time you get into client authentication problems, you can delete
the clientkeydata table entry for a client in SQL.
This will allow all inventory to be accepted until the next DDR with the
new client key gets processed.
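
For example (a sketch only - the SMSID column and the SMS_ABC database
name below are assumptions, so check the actual ClientKeyData schema in
your site database and take a database backup first):

# Delete one client's ClientKeyData row so its inventory is accepted
# until the next DDR registers the new client key.
import pyodbc

conn = pyodbc.connect("DRIVER={SQL Server};SERVER=smsserver;"
                      "DATABASE=SMS_ABC;Trusted_Connection=yes")
cur = conn.cursor()
cur.execute("DELETE FROM ClientKeyData WHERE SMSID = ?",  # assumed column
            "GUID:12345678-1234-1234-1234-123456789012")  # client's GUID
conn.commit()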
--
Thanks,
Joseph Calabig

This posting is provided "AS IS" with no warranties, and confers no rights.

<to...@ovod-everett.org> wrote in message
news:1105598141....@z14g2000cwz.googlegroups.com...

to...@ovod-everett.org

Jan 14, 2005, 1:45:15 PM

Joseph Calabig [MSFT] wrote:
> Glad your encryption problems got fixed.
> The next time you get into client authentication problems, you can
> delete the clientkeydata table entry for a client in SQL.
> This will allow all inventory to be accepted until the next DDR with
> the new client key gets processed.

Actually, that wouldn't solve the problem here. I did look at the
ClientKeyData along the way, but the problem here wasn't client
signing, it was _server_ signing and client _encryption_. My Microsoft
support engineer thought that client signing might be the issue at the
beginning, but I noticed that DDRs (which are signed, but not
encrypted) _were_ being processed (I could tell because the client
heartbeats were updating), but inventory (which is both signed and
encrypted) was _not_ being processed. There was no problem with the
ClientKeyData (although I probably could have used your trick later
when I needed to get the server to start accepting its own inventory
data, since the server's ClientKeyData was out-of-whack after all of
this fun and games). The problem was that the encryption key being
used by the clients (i.e. the management point's encryption public key)
didn't match a known private key on the server - hence the NTE_NO_KEY
error, which comes out of the CryptoAPI calls that look for a key in
the CryptoAPI key storage system. The ClientKeyData is maintained in
the SQL database, outside of the CryptoAPI, so a ClientKeyData problem
would never surface as an NTE_NO_KEY error. If there is an error in
the ClientKeyData, you get the errors
described in http://support.microsoft.com/?id=886013. We weren't
seeing any of those errors (and I never ended up making any
modifications to the ClientKeyData).
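
To make the NTE_NO_KEY point concrete, the error is easy to reproduce
by asking CryptoAPI for a key that its key store does not hold. A
small sketch (Python ctypes; the container name is a throwaway):

# Create an empty CryptoAPI machine key container and ask it for its
# exchange key; CryptGetUserKey fails with NTE_NO_KEY (0x8009000D),
# the same error SMS hit when the MP's encryption key went missing.
import ctypes
from ctypes import wintypes

advapi32 = ctypes.WinDLL("advapi32", use_last_error=True)

PROV_RSA_FULL = 1
CRYPT_NEWKEYSET = 0x00000008
CRYPT_DELETEKEYSET = 0x00000010
CRYPT_MACHINE_KEYSET = 0x00000020
AT_KEYEXCHANGE = 1
NTE_NO_KEY = 0x8009000D

hprov = wintypes.HANDLE()
if not advapi32.CryptAcquireContextW(
        ctypes.byref(hprov), u"ThrowawayContainer", None,
        PROV_RSA_FULL, CRYPT_MACHINE_KEYSET | CRYPT_NEWKEYSET):
    raise ctypes.WinError(ctypes.get_last_error())

hkey = wintypes.HANDLE()
if not advapi32.CryptGetUserKey(hprov, AT_KEYEXCHANGE, ctypes.byref(hkey)):
    err = ctypes.get_last_error() & 0xFFFFFFFF
    print("CryptGetUserKey failed: 0x%08X (NTE_NO_KEY: %s)"
          % (err, err == NTE_NO_KEY))

# Clean up the throwaway container.
advapi32.CryptReleaseContext(hprov, 0)
advapi32.CryptAcquireContextW(None, u"ThrowawayContainer", None,
                              PROV_RSA_FULL,
                              CRYPT_MACHINE_KEYSET | CRYPT_DELETEKEYSET)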

--Toby Ovod-Everett
