We ran into a weird problem where mongodump-ing a 2.0.1 database with
a 1.6.5 version of the mongodump utility caused the server to SEGFAULT
and hang in a weird state:
https://jira.mongodb.org/browse/SERVER-4190
At the time, the server being dumped was the master in a replica set.
We brought it back up and it rejoined as a slave. We're still able to
reliably cause *that* server to die, but only that server. If we dump
(with 1.6.5) on either the new master or the other slave we don't have
any problems.
Ignoring for a second that we can crash the server at all, I wondered
why we can only do it for one of the members of the replica set and not
the others. So I started thinking that maybe its data is corrupted
in some unique way. I did a mongodump using the 2.0.1 version of the
utility for all of the members of the replica set. I then compared
all of the collections dumped, the counts of objects reported as dumped,
and the file sizes of the resulting .bson files, and they all appear to
match.
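For reference, a comparison along these lines could be scripted roughly as in the sketch below. It is not the exact procedure used above. It assumes each member was dumped into its own directory (the function names and the directory layout are hypothetical), and it checks file sizes plus a SHA-256 digest of each .bson file. One caveat: a digest mismatch does not necessarily mean the data differs, since two members could return the same documents in a different order.

```python
import hashlib
from pathlib import Path


def summarize_dump(dump_dir):
    """Map each .bson file (path relative to dump_dir) to (size, sha256 hex)."""
    root = Path(dump_dir)
    summary = {}
    for path in sorted(root.rglob("*.bson")):
        digest = hashlib.sha256()
        size = 0
        with path.open("rb") as fh:
            # Read in 1 MiB chunks so large dumps are not loaded into memory.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
                size += len(chunk)
        summary[path.relative_to(root).as_posix()] = (size, digest.hexdigest())
    return summary


def compare_dumps(dir_a, dir_b):
    """Return {relative path: description} for every file that differs."""
    a = summarize_dump(dir_a)
    b = summarize_dump(dir_b)
    report = {}
    for name in sorted(set(a) | set(b)):
        if name not in a:
            report[name] = "missing in first dump"
        elif name not in b:
            report[name] = "missing in second dump"
        elif a[name][0] != b[name][0]:
            report[name] = "size differs"
        elif a[name][1] != b[name][1]:
            report[name] = "same size, different content"
    return report
```

Calling compare_dumps("dump-master", "dump-slave1") (hypothetical directory names) returns an empty dict when every .bson file matches byte for byte.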
Two questions:
First, is there a better way to compare dumps between the master and
slave? Such a strategy is suggested in
http://www.mongodb.org/display/DOCS/Durability+and+Repair
but it doesn't suggest how to do the comparison.
Second, assuming the mongodumps check out and are the same between
master and slave, is it possible that state internal to the
server (maybe the data files *.0, *.1, ... or *.ns files?) is
corrupted on my one server in a way that it's not corrupted on the
other server? Is there any way to tell that? It would sure suck to
think that my replica set has integrity when it doesn't.
Thanks,
Andy