syncThread: 11000 E11000 duplicate key error index: on Slave Nodes


Wilton

Apr 29, 2011, 2:02:10 AM
to mongodb-user
Hi All,

I am writing because I am getting a number of strange exceptions that
I don't think I should be getting.  I am using Mongo 1.8.x and
have a replica set with three or four nodes (depending on what I'm doing
at the time).

A few days ago I started getting the following messages in the logs of
the slaves.

syncThread: 11000 E11000 duplicate key error index: robots.market.
$market_1_tic_1_close_date_-1 dup key: { : "NASD", : "RADS", : new
Date(1302825600000) }

This happens repeatedly and it's always the same record. All my
replicas report the error. In addition, the index (market, tic,
close_date) is a unique index.

To fix this, I brought down each of the slaves individually, deleted
all the data files, and brought them back up to do a full resync --
unfortunately the problem is back.

Any advice?
Wilton

Scott Hernandez

Apr 29, 2011, 2:06:48 AM
to mongod...@googlegroups.com
If you query the slave, does this document exist?
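For example, something like this (a sketch -- the robots.market namespace and field names are taken from your log line; point the shell at the slave directly rather than at the replica set):

// mongo <slave-host>/robots
db.getMongo().setSlaveOk()  // allow reads on a secondary
db.market.find({ market: "NASD", tic: "RADS", close_date: new Date(1302825600000) })
db.market.getIndexes()      // also worth double-checking the unique index definition on that node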

Has the master had any unclean shutdowns? Are you using 1.8.0 and journalling, or were you ever?




Wilton

Apr 29, 2011, 2:12:26 AM
to mongodb-user

Regarding unclean shutdowns -- I don't *think* it has had any -- I always use
the Red Hat "service mongod start/stop" commands to bring these up and
down.

I am using 1.8.0, although I'm probably not using journalling, unless
it is turned on by default.
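As a quick way to check: if I understand correctly, serverStatus() only reports a "dur" section when journaling is actually on, so something like this should tell me either way:

printjson(db.serverStatus().dur)  // journaling stats if --journal/--dur is enabled, undefined otherwise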

The document does not appear to exist in the slave.

I'm also getting this message in the slaves now:

syncThread: 1000 replSet source for syncing doesn't seem to be await
capable -- is it an older version of mongodb?

Thanks for the quick response!

Scott Hernandez

Apr 29, 2011, 2:30:07 AM
to mongod...@googlegroups.com
Can you connect to the primary and replicas (just one or two) with the javascript shell and run the following on each:

db.serverStatus()
db.printReplicationInfo()
rs.status()

Please post that to gist/pastie/pastebin or some other source that doesn't line-wrap.

Wilton

Apr 29, 2011, 2:36:49 AM
to mongodb-user

In response to your last note, I took down one of the slaves, added
'--journal' to the startup options, and restarted it.  Now *all* of my
slaves are suddenly reporting the same message:

[replica set sync] replSet error RS102 too stale to catch up, at least
from hal1.xxx.com
Fri Apr 29 06:34:02 [replica set sync] replSet our last optime : Apr
26 23:03:36 4db74f48:50e
Fri Apr 29 06:34:02 [replica set sync] replSet oldest at
hal1.xxx.com : Apr 26 23:03:41 4db74f4d:1301
Fri Apr 29 06:34:02 [replica set sync] replSet See
http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member
Fri Apr 29 06:34:02 [replica set sync] replSet error RS102 too stale
to catch up

It looks like I'm going to have to resync these all tonight -- it will
take a few hours. I'll send an update when I'm done.

W.

Scott Hernandez

Apr 29, 2011, 2:41:06 AM
to mongod...@googlegroups.com
The error is because you ran out of oplog on the primary; it rolled over past the last sync'd point on all the replicas, so they could no longer use the oplog to catch up.

That is unrelated to journalling.

I would suggest upgrading to 1.8.1 while you are rebuilding things.

Wilton

Apr 29, 2011, 2:48:51 AM
to mongodb-user

Ok. Good to know. Seems strange that it's never happened before (in
the three months I've been using mongo) and all of a sudden it just
happened.

You think I should increase the oplog size?



Scott Hernandez

Apr 29, 2011, 11:33:00 PM
to mongod...@googlegroups.com
On Thu, Apr 28, 2011 at 11:48 PM, Wilton <risen...@gmail.com> wrote:

> Ok.  Good to know.  Seems strange that it's never happened before (in
> the three months I've been using mongo) and all of a sudden it just
> happened.

It would only be an issue when replication stops or backs up.

> You think I should increase the oplog size?

Generally you want an oplog size of 2-3x the maximum period you think the replicas will need for maintenance or downtime. I'd suggest a window of 2-3 days if you can spare the space.
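You can check how big your current window is from the shell -- something like this (run the first on the primary, the second on any member):

db.printReplicationInfo()       // shows the oplog size and "log length start to end", i.e. the window
db.printSlaveReplicationInfo()  // shows how far behind each secondary currently is

Note that the oplog is a capped collection sized when it is first created (--oplogSize <MB> on the command line, or oplogSize in the config file), so changing the setting doesn't resize an existing member's oplog -- and the window that matters here is the primary's.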

Andrew Armstrong

Apr 29, 2011, 11:50:50 PM
to mongodb-user
It would be neat, I think, to know whether the oplog appears to be too
small to cope with a high write demand; I wonder if mongo could write
log alerts or something to indicate that it's coming close to falling
behind the master due to write activity, etc.?
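In the meantime something like this could be cron'd against rs.status() -- a rough sketch (the 10-minute threshold is just an example, and it assumes each member reports an optimeDate; comparing against the oldest entry in the primary's oplog would be even closer to what I mean):

var s = rs.status();
var primaryOptime = null;
s.members.forEach(function (m) { if (m.state == 1) primaryOptime = m.optimeDate; });
s.members.forEach(function (m) {
  if (m.state == 2 && primaryOptime) {
    var lagSecs = (primaryOptime - m.optimeDate) / 1000;
    if (lagSecs > 600) print("WARNING: " + m.name + " is " + lagSecs + "s behind the primary");
  }
});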

- Andrew


Wilton

Apr 30, 2011, 12:54:06 AM
to mongodb-user
I was able to get the slaves synced last night -- they're all running
with --journal, although the master is still running without it.  So far
there is no problem.  I'll check back in if things change for the
worse.

Chase

May 1, 2011, 5:28:45 PM
to mongodb-user
Did you run a repair on the master (via db.repairDatabase())?  If you
did, then that can cause the error you are seeing.  We got this error
by running repairDatabase().  It happens because repairDatabase()
seems to lose some operations (in our case some delete calls).