
Cluster service crashing on failover after antivirus installation


Neil

Nov 28, 2003, 5:14:50 AM
Hi,

I have a Windows 2000 SP3 cluster. We have clustered a number of
services, including MSDTC and MSMQ.

Everything was working fine until CA eTrust Antivirus v7 (supposedly
compatible with clusters) was installed on the cluster. The quorum
drives have been excluded from the virus scan, although the software
is still scanning reads and writes to the disks.

Now, occasionally, the Cluster service crashes on failover. Sometimes
this takes down the whole node.

I've been through the cluster log. When the cluster fails over to a
node, everything is OK until it tries to write an MSMQ checkpoint
file to drive F: (the quorum). This seems to fail, and then the
cluster runs Chkdsk on F: and the Cluster service restarts. Here's
what I make of the log:

// Cluster node starting
00000fb4.00000b84::2003/11/27-14:39:01.359 Physical Disk <Disk F:>:
MountieVerify: DriveLetters mask is now 00000020.
000007d8.00000d98::2003/11/27-14:39:01.859 [FM] FmpRmOnlineResource:
release quolock/group lock and wait on ghQuoOnlineEvent
000007d8.00000d98::2003/11/27-14:39:02.359 [FM] FmpRmOnlineResource:
release quolock/group lock and wait on ghQuoOnlineEvent
000007d8.000010cc::2003/11/27-14:39:02.359 [CP] CppRegNotifyThread
checkpointing key SOFTWARE\Microsoft\MSMQ\Clustered
QMs\MSMQ$MSMQ\Parameters to id 1 due to timer
000007d8.000010cc::2003/11/27-14:39:02.406 [CP] CpSaveData:
checkpointing data id 1 to quorum node 1

// MSMQ Writing checkpoint file
000007d8.000010cc::2003/11/27-14:39:02.421 [CP] CppWriteCheckpoint
checkpointing file C:\DOCUME~1\SERVER~1\LOCALS~1\Temp\CLS99.tmp to
file F:\MSCS\\23fceb64-56eb-4ff7-87fe-7060f5c7daf2\00000001.CPT

// Here the cluster is waiting for something - this line is repeated a
// number of times.
000007d8.00000d98::2003/11/27-14:39:02.859 [FM] FmpRmOnlineResource:
release quolock/group lock and wait on ghQuoOnlineEvent

// Disk F: is doing something here (not sure what)
000007d8.00000d98::2003/11/27-14:39:03.359 [FM] FmpRmOnlineResource:
release

00000fb4.00000b90::2003/11/27-14:39:17.531 Physical Disk <Disk F:>:
[DiskArb] CompletionRoutine, status 0.
00000fb4.00000b90::2003/11/27-14:39:17.531 Physical Disk <Disk F:>:
[DiskArb] posting AsyncCheckReserve request.
00000fb4.00000b90::2003/11/27-14:39:17.531 Physical Disk <Disk F:>:
[DiskArb] error checking disk reservation thread, error 995.

// Here things seem to start to go wrong with the Quorum F:
00000fb4.00000b90::2003/11/27-14:39:17.531 Physical Disk <Disk F:>:
[DiskArb] CompletionRoutine: reservation lost!
000007d8.000007e8::2003/11/27-14:39:17.562 [EVT] s_ApiEvPropEvents:
Calling into EvpPropPendingEvents, size=648...
000007d8.000007e8::2003/11/27-14:39:17.562 [EVT] s_ApiEvPropEvents:
Called EvpPropPendingEvents...

// Here things go badly wrong
00000fb4.00000b90::2003/11/27-14:39:17.562 [RM] RmpLostQuorumResource,
cluster service terminated...
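
For anyone wanting to go through a longer cluster.log the same way, here is a rough Python sketch that joins the wrapped lines back into entries and reports the longest silence between two consecutive entries. It assumes each entry starts with `process.thread::date-time` as in the excerpt above; the function names `parse_entries` and `largest_gap` are made up for this example and are not part of any Microsoft tool.

```python
import re
from datetime import datetime

# Header of one entry, as seen in the excerpt: "pid.tid::YYYY/MM/DD-HH:MM:SS.mmm message"
ENTRY = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+(?P<msg>.*)$"
)


def parse_entries(lines):
    """Return a list of (timestamp, pid, tid, message) tuples.

    Lines that do not start a new entry are treated as wrapped
    continuations of the previous one. Blank lines and "//"
    annotations (added by hand in the post) are skipped.
    """
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        m = ENTRY.match(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y/%m/%d-%H:%M:%S.%f")
            entries.append([ts, m["pid"], m["tid"], m["msg"]])
        elif entries:
            # Continuation of a wrapped entry: append to the last message.
            entries[-1][3] += " " + line
    return [tuple(e) for e in entries]


def largest_gap(entries):
    """Return (seconds, before, after) for the longest pause between entries."""
    pairs = zip(entries, entries[1:])
    before, after = max(pairs, key=lambda p: p[1][0] - p[0][0])
    return (after[0] - before[0]).total_seconds(), before, after
```

Run against the excerpt as posted, the longest pause is the one between 14:39:03.359 and 14:39:17.531, but repeated lines were trimmed from the excerpt, so this only means something on the full log. Separately, Win32 error 995 is ERROR_OPERATION_ABORTED (an I/O operation was aborted).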

These problems have now happened with two different versions of the
antivirus software. Can anyone confirm why the antivirus software is
doing this, and what the recommended method of installing antivirus
software on a Win 2K cluster would be?

I am also aware of a Knowledge Base article which recommends removing
the antivirus software's filter drivers. Unfortunately this is not an
option, as this company is adamant that antivirus software be
installed on the cluster.

thanks,

Neil

anon...@discussions.microsoft.com

Nov 30, 2003, 6:27:59 PM
Putting antivirus software on a cluster is not recommended
if you can help it. If you do, configure it so that it does
not scan the quorum disk or, say, SQL data/log disks.
You're already doing that, but my recommendation is to
remove or disable the AV software. A KB article clearly
outlines that AV can cause problems with a cluster.