Failure of Replica Sets

30 Aufrufe
Direkt zur ersten ungelesenen Nachricht

Chase

ungelesen,
12.12.2010, 15:54:1012.12.10
an mongodb-user
Hello - Due to the disk growth issues we attempted to repair the DB of
one our servers in a replica set. When the repair failed (out of
space). We removed the dbpath and restarted. Now the replica will
never resync:

Mon Dec 13 04:49:19 [rs_sync] replSet initial sync failing, error
applying oplog 11000 E11000 duplicate key error index: cache.session.
$pk_idx dup key: { : { key:
"abfe227f-5c59-4855-9c46-1d1803cb7508" } }

This is very odd - the fact is that even if a dup key was received on
the master it should never have been written to the oplog of the
master but it seems to be there. Now we are stuck with an outage to
try and sync up the servers - anyone else have any other ideas?

Eliot Horowitz

ungelesen,
12.12.2010, 20:16:1412.12.10
an mongod...@googlegroups.com
Can you check the master to see if there are indeed duplicates?

> --
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
>

Chase

ungelesen,
12.12.2010, 22:20:4512.12.10
an mongodb-user
Hi the client gets dup key errors and they seem to correlate. It is
like the dup keys insert statement is put into the oplog on the
master. No dup keys in the actual DB.

On Dec 12, 6:16 pm, Eliot Horowitz <eliothorow...@gmail.com> wrote:
> Can you check the master to see if there are indeed duplicates?
>

Eliot Horowitz

ungelesen,
12.12.2010, 22:35:4212.12.10
an mongod...@googlegroups.com
Can you send the whole slave log?

Chase

ungelesen,
13.12.2010, 13:12:2613.12.10
an mongodb-user
Hello - i have pasted the log in. I have changed some of the
collection names but it is the the log from the slave:


Mon Dec 13 06:17:22 [rs_sync] replSet initial sync failed during
applyoplog
Mon Dec 13 06:17:22 [rs_sync] replSet cleaning up [1]Mon Dec 13
06:17:25 [rs_sync] replSet cleaning up [2]
Mon Dec 13 06:17:29 [rs_sync] replSet initial sync pendingMon Dec 13
06:17:34 [rs_sync] replSet initial sync drop all databases
Mon Dec 13 06:17:34 [rs_sync] dropAllDatabasesExceptLocal 3Mon Dec 13
06:17:34 [rs_sync] replSet initial sync cloning db: collectiona
Mon Dec 13 06:17:34 allocating new datafile /data/sessions.1/
collectiona.ns, filling with zeroes...Mon Dec 13 06:17:34 done
allocating datafile /data/sessions.1/collectiona.ns, size: 16MB, took
0.02 secs
Mon Dec 13 06:17:34 allocating new datafile /data/sessions.1/
collectiona.0, filling with zeroes...Mon Dec 13 06:17:35 done
allocating datafile /data/sessions.1/collectiona.0, size: 64MB, took
0.091 secs
Mon Dec 13 06:17:35 allocating new datafile /data/sessions.1/
collectiona.1, filling with zeroes...Mon Dec 13 06:17:35 done
allocating datafile /data/sessions.1/collectiona.1, size: 128MB, took
0.206 secs
Mon Dec 13 06:17:35 allocating new datafile /data/sessions.1/
collectiona.2, filling with zeroes...Mon Dec 13 06:17:36 done
allocating datafile /data/sessions.1/collectiona.2, size: 256MB, took
0.363 secs
Mon Dec 13 06:17:36 allocating new datafile /data/sessions.1/
collectiona.3, filling with zeroes...Mon Dec 13 06:17:37 done
allocating datafile /data/sessions.1/collectiona.3, size: 512MB, took
0.696 secs
Mon Dec 13 06:17:38 [rs_sync] building new index on { _id: 1 } for
collectiona.accountMon Dec 13 06:17:40 [rs_sync] done for 580857
records 2.344secs
Mon Dec 13 06:17:40 [rs_sync] building new index on { subscriberId:
1 } for collectiona.account
Mon Dec 13 06:17:42 allocating new datafile /data/sessions.1/
collectiona.4, filling with zeroes...
Mon Dec 13 06:17:43 [rs_sync] done for 580857 records 2.424secs
Mon Dec 13 06:17:43 [rs_sync] building new index on { nextEvalTime:
1 } for collectiona.account
Mon Dec 13 06:17:44 done allocating datafile /data/sessions.1/
collectiona.4, size: 1024MB, took 1.334 secs
Mon Dec 13 06:17:44 [rs_sync] done for 580857 records 1.525secs
Mon Dec 13 06:17:44 [rs_sync] building new index on { reservationIds:
1 } for collectiona.account
Mon Dec 13 06:17:46 [rs_sync] done for 580857 records 1.514secs
Mon Dec 13 06:17:46 [rs_sync] replSet initial sync cloning db:
collectionb
Mon Dec 13 06:17:46 allocating new datafile /data/sessions.1/
collectionb.ns, filling with zeroes...
Mon Dec 13 06:17:46 done allocating datafile /data/sessions.1/
collectionb.ns, size: 16MB, took 0.02 secs
Mon Dec 13 06:17:46 allocating new datafile /data/sessions.1/
collectionb.0, filling with zeroes...
Mon Dec 13 06:17:46 done allocating datafile /data/sessions.1/
collectionb.0, size: 64MB, took 0.103 secs
Mon Dec 13 06:17:46 allocating new datafile /data/sessions.1/
collectionb.1, filling with zeroes...
Mon Dec 13 06:17:46 done allocating datafile /data/sessions.1/
collectionb.1, size: 128MB, took 0.205 secs
Mon Dec 13 06:17:46 [rs_sync] building new index on { _id: 1 } for
collectionb.sess
Mon Dec 13 06:17:47 [rs_sync] done for 23177 records 0.08secs
Mon Dec 13 06:17:47 [rs_sync] building new index on { _id: 1 } for
collectionb.inactive_d
Mon Dec 13 06:17:47 [rs_sync] done for 0 records 0secs
Mon Dec 13 06:17:47 [rs_sync] building new index on { key: 1 } for
collectionb.sess
Mon Dec 13 06:17:47 [rs_sync] done for 23177 records 0.149secs
Mon Dec 13 06:17:47 [rs_sync] building new index on { tags: 1 } for
collectionb.sess
Mon Dec 13 06:17:47 allocating new datafile /data/sessions.1/
collectionb.2, filling with zeroes...
Mon Dec 13 06:17:48 [rs_sync] done for 23177 records 0.972secs
Mon Dec 13 06:17:48 [rs_sync] building new index on { nextEvalTime:
1 } for collectionb.session
Mon Dec 13 06:17:48 [rs_sync] done for 23177 records 0.073secs
Mon Dec 13 06:17:48 [rs_sync] building new index on { key: 1 } for
collectionb.inactive_d
Mon Dec 13 06:17:48 [rs_sync] done for 0 records 0secs
Mon Dec 13 06:17:48 [rs_sync] building new index on { tags: 1 } for
collectionb.inactive_d
Mon Dec 13 06:17:48 [rs_sync] done for 0 records 0secs
Mon Dec 13 06:17:48 [rs_sync] building new index on { nextEvalTime:
1 } for collectionb.inactive_d
Mon Dec 13 06:17:48 [rs_sync] done for 0 records 0secs
Mon Dec 13 06:17:48 [rs_sync] replSet initial sync cloning db: admin
Mon Dec 13 06:17:48 [rs_sync] replSet initial sync cloning db:
collectionc
Mon Dec 13 06:17:48 [rs_sync] replSet initial sync cloning db: test
Mon Dec 13 06:17:48 [rs_sync] replSet initial sync query minValid
Mon Dec 13 06:17:48 [rs_sync] replSet initial sync copy+apply oplog
Mon Dec 13 06:17:48 done allocating datafile /data/sessions.1/
collectionb.2, size: 256MB, took 0.33 secs
Mon Dec 13 06:17:50 [rs_sync] replSet initialSyncOplogApplication
100000
Mon Dec 13 06:17:52 [rs_sync] replSet initialSyncOplogApplication
200000
Mon Dec 13 06:17:54 [rs_sync] replSet initialSyncOplogApplication
300000
Mon Dec 13 06:17:56 [rs_sync] replSet initialSyncOplogApplication
400000
Mon Dec 13 06:17:58 [rs_sync] replSet initialSyncOplogApplication
500000
Mon Dec 13 06:18:00 [rs_sync] replSet initialSyncOplogApplication
600000
Mon Dec 13 06:18:03 [rs_sync] replSet initialSyncOplogApplication
700000
Mon Dec 13 06:18:05 [rs_sync] replSet initialSyncOplogApplication
800000
Mon Dec 13 06:18:07 [rs_sync] replSet initialSyncOplogApplication
900000
Mon Dec 13 06:18:09 [rs_sync] replSet initialSyncOplogApplication
1000000
Mon Dec 13 06:18:11 [rs_sync] replSet initialSyncOplogApplication
1100000
Mon Dec 13 06:18:13 [rs_sync] replSet initialSyncOplogApplication
1200000
Mon Dec 13 06:18:15 [rs_sync] replSet initialSyncOplogApplication
1300000
Mon Dec 13 06:18:17 [rs_sync] replSet initialSyncOplogApplication
1400000
Mon Dec 13 06:18:19 [rs_sync] replSet initialSyncOplogApplication
1500000
Mon Dec 13 06:18:21 [rs_sync] replSet initialSyncOplogApplication
1600000
Mon Dec 13 06:18:23 [rs_sync] replSet initialSyncOplogApplication
1700000
n Dec 13 06:18:28 [rs_sync] replSet initialSyncOplogApplication
1900000
Mon Dec 13 06:18:28 [rs_sync] replSet initial sync failing, error
applying oplog 11000 E11000 duplicate key error index:
collectionb.sess.$pk_idx dup key: { : { key: "4ca21362-3ff5-4faf-a528-
c03b1b20c806" } }
Mon Dec 13 06:18:28 [rs_sync] replSet initial sync failed during
applyoplog
Mon Dec 13 06:18:28 [rs_sync] replSet cleaning up [1]
Mon Dec 13 06:18:52 [rs_sync] replSet cleaning up [2]
Mon Dec 13 06:18:55 [rs_sync] replSet initial sync pending
Mon Dec 13 06:19:00 [rs_sync] replSet initial sync drop all databases

Chase

ungelesen,
13.12.2010, 15:37:0113.12.10
an mongodb-user
Eliot as a separate note. I ran a similar test in another environment
and I got a duplicate key index

Mon Dec 13 13:26:50 [rs_sync] warning: exception cloning object in
collectiona.system.indexes E11000 duplicate key error index:
collectiona.sess.$pk_idx dup key: { : { key: "121.127.99.139" } } obj:
{ name: "pk_idx", ns: "collectiona.sess", unique: true, key: { key:
1 } }

This was during the cloning operation of the oplog. I then ran
validate on the master and it showed everything was fine for the
collection. In this scenario it seemed to work fine and replica set
resynced. In our other environment that was not the case.

Chase


On Dec 12, 8:35 pm, Eliot Horowitz <eliothorow...@gmail.com> wrote:
> Can you send the whole slave log?
>

Kristina Chodorow

ungelesen,
14.12.2010, 12:17:0014.12.10
an mongod...@googlegroups.com
I think this is a bug with initial sync (it shouldn't halt if there are duplicate key exceptions).  I've created a bug you can track: https://jira.mongodb.org/browse/SERVER-2219

Is the error message always 'collectionb.sess.$pk_idx  dup key: { : { key: "4ca21362-3ff5-4faf-a528-c03b1b20c806" } }' or does the duplicate key change?

Does it work if you try your initial sync when your app is less busy?

Chase

ungelesen,
14.12.2010, 14:49:3714.12.10
an mongodb-user
The key always changes and if I stop processing transactions to mongo
it does sync.
> > mongodb-user...@googlegroups.com<mongodb-user%2Bunsubscribe@google groups.com>
> > .
> > > >> > For more options, visit this group athttp://
> > groups.google.com/group/mongodb-user?hl=en.
>
> > > > --
> > > > You received this message because you are subscribed to the Google
> > Groups "mongodb-user" group.
> > > > To post to this group, send email to mongod...@googlegroups.com.
> > > > To unsubscribe from this group, send email to
> > mongodb-user...@googlegroups.com<mongodb-user%2Bunsubscribe@google groups.com>
> > .
> > > > For more options, visit this group athttp://
> > groups.google.com/group/mongodb-user?hl=en.
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "mongodb-user" group.
> > To post to this group, send email to mongod...@googlegroups.com.
> > To unsubscribe from this group, send email to
> > mongodb-user...@googlegroups.com<mongodb-user%2Bunsubscribe@google groups.com>
> > .

Kristina Chodorow

ungelesen,
14.12.2010, 16:52:3214.12.10
an mongod...@googlegroups.com
Okay, that makes sense, then.  I'll fix it ASAP.


To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.

Chase

ungelesen,
15.12.2010, 15:19:1915.12.10
an mongodb-user
Will this be backported to 1.6 ? Seems like a large problem for the
replication.

Kristina Chodorow

ungelesen,
15.12.2010, 15:51:2415.12.10
an mongod...@googlegroups.com
Doubtful.  You could use 1.7 to sync and then switch back to 1.6 for normal operations. 


To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.

Chase

ungelesen,
16.12.2010, 20:44:4516.12.10
an mongodb-user
Will the same Problem prevent syncing if a repair is done on the slave
and it then restarts normally?

Kristina Chodorow

ungelesen,
17.12.2010, 15:51:0517.12.10
an mongod...@googlegroups.com
If it's not an initial sync, a duplicate key exception shouldn't halt replication.


To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.

Chase

ungelesen,
01.01.2011, 10:13:0001.01.11
an mongodb-user
Hi Kristina - We also noticed the exact same issue in replicated
capped collections. The difference was the master did an insert and
then updated the record to null out a field. This reported an error
on initial resync on the slave. The error said you cannot change the
size of a record in a capped collection.
Chase

Nat

ungelesen,
01.01.2011, 11:02:5301.01.11
an mongodb-user
If you can replicate it, can you list down what's in the oplog for
that particular key?
Allen antworten
Antwort an Autor
Weiterleiten
0 neue Nachrichten