oplog problems with changed time


Erez Zarum

unread,
Sep 1, 2011, 6:10:55 AM9/1/11
to mongodb-user
I shut down a server for maintenance and replaced the motherboard; the new board came up with a different time offset, which NTP did not handle well.
At first it started as a secondary to catch up with the master, then a few hours later I received this: Thu Sep  1 06:15:18 [conn22215] Assertion: 13312:replSet error : logOp() but not primary?

and then exceptions:
Thu Sep  1 06:15:19 [conn22215] update storage.object  query: { _id: 2397448961 } exception 13312 replSet error : logOp() but not primary? 127ms
Thu Sep  1 06:15:19 [conn22531] Assertion: 13312:replSet error : logOp() but not primary?
0x55f5aa 0x713191 0x711052 0x66a467 0x66cd30 0x7588bf 0x75b251 0x8a8fce 0x8bb630 0x3022c0673d 0x30220d44bd
Thu Sep  1 06:15:19 [initandlisten] connection accepted from xx.xx.xx.237:53888 #23793
Thu Sep  1 06:15:19 [conn23790] end connection xx.xx.xx.232:60094
 /usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x12a) [0x55f5aa]
 /usr/bin/mongod [0x713191]
 /usr/bin/mongod(_ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_Pb+0x42) [0x711052]
 /usr/bin/mongod(_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugEPNS_11RemoveSaverE+0x1327) [0x66a467]
 /usr/bin/mongod(_ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugE+0x130) [0x66cd30]
 /usr/bin/mongod(_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x47f) [0x7588bf]
 /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x1941) [0x75b251]
 /usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8a8fce]
 /usr/bin/mongod(thread_proxy+0x80) [0x8bb630]
 /lib64/libpthread.so.0 [0x3022c0673d]
 /lib64/libc.so.6(clone+0x6d) [0x30220d44bd]

Then the primary went into recovery mode and could not roll back because of the time difference.
I have changed the server time, but this does not help.

shard0001:PRIMARY> db.printReplicationInfo()
configured oplog size:   48958.344531250004MB
log length start to end: 463340secs (128.71hrs)
oplog first event time:  Sat Aug 27 2011 08:23:01 GMT+0000 (GMT)
oplog last event time:   Thu Sep 01 2011 17:05:21 GMT+0000 (GMT)
now:                     Thu Sep 01 2011 10:04:51 GMT+0000 (GMT)

shard0001:ROLLBACK> db.printReplicationInfo()      
configured oplog size:   44682.4892578125MB
log length start to end: 384524secs (106.81hrs)
oplog first event time:  Sat Aug 27 2011 19:22:49 GMT+0000 (GMT)
oplog last event time:   Thu Sep 01 2011 06:11:33 GMT+0000 (GMT)
now:                     Thu Sep 01 2011 10:07:26 GMT+0000 (GMT)
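The clock problem is visible in the primary's output above: the oplog's last event time (17:05:21) is about seven hours ahead of "now" (10:04:51), which should never happen with a healthy clock. A minimal Python sketch checking the skew, with the two timestamps copied from the PRIMARY output above:

```python
from datetime import datetime

# Timestamps copied from shard0001:PRIMARY's printReplicationInfo() above.
last_event = datetime(2011, 9, 1, 17, 5, 21)   # oplog last event time
now        = datetime(2011, 9, 1, 10, 4, 51)   # "now" as the server sees it

skew = last_event - now
print(skew)  # the newest oplog entry is ~7 hours "in the future" relative to the clock
```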

Currently the only thing I can do is delete the data on the problematic node, but I understand that the oplog (the local db) on the primary needs to be deleted as well, as it is causing problems.

What really bothers me is the way MongoDB handles these time differences; if a server's clock goes out of sync, it can cause a lot of problems.

Is there any other way to fix it?


Kristina Chodorow

unread,
Sep 1, 2011, 11:07:57 AM9/1/11
to mongodb-user
The steps you need to take are:

1. Fix the oplog on the primary
2. Resync the secondary

To fix the oplog on the primary (call the server P), make sure the
clock is set correctly. Shut down P and start it up on a different
port without the --replSet option (so it's a stand-alone server).
Drop and recreate the oplog with:

> use local
> db.oplog.rs.drop()
> db.createCollection("oplog.rs", {size : newOplogSizeInMB, capped : true})

Replace "newOplogSizeInMB" with the oplog size.
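One detail worth double-checking here: db.createCollection takes its size option in bytes, while printReplicationInfo reports the configured oplog size in MB, so the figure needs converting. A small illustrative sketch (the ~48958 MB value is taken from the printReplicationInfo output earlier in the thread):

```python
def oplog_mb_to_bytes(mb):
    """Convert the MB figure reported by printReplicationInfo() to the
    byte count expected by db.createCollection's `size` option."""
    return int(mb * 1024 * 1024)

# e.g. the ~48958 MB configured oplog size reported for shard0001's primary
size_bytes = oplog_mb_to_bytes(48958.344531250004)
```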

Now shut down P and the secondary. Delete all of the data files on
the secondary.

Start both servers back up (make sure P is started on the correct port
with --replSet). The secondary will resync from P.

This bug (oplog not handling time differences) is fixed for 2.0.

Erez Zarum

unread,
Sep 1, 2011, 11:16:58 AM9/1/11
to mongod...@googlegroups.com
I did something else:
I stopped both members and the arbiter, in order to create a new replica set.
I deleted local.* on the primary and deleted all the data on the secondary.
I preallocated the local files on both of them,
then started the primary and ran rs.initiate(),
then added the second member using rs.add(),
then I accidentally added the arbiter using rs.add(), so I removed it and used rs.addArb() instead (see my second question on the mailing list).

Is that OK? If not, can I do something else?

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.


Kristina Chodorow

unread,
Sep 1, 2011, 11:23:14 AM9/1/11
to mongodb-user
That's fine.


Azat Khuzhin

unread,
May 20, 2012, 6:39:57 PM5/20/12
to mongod...@googlegroups.com
Hi all, I have the same issue, but without a changed time: the primary went down (Got signal: 6 (Aborted); I've posted a bug report about this), and after I restarted it, it has the status "ROLLBACK".

Is there a way to recover without deleting all the data and re-initiating replication?

Also, how is this possible:

fb-backend:ROLLBACK> db.printReplicationInfo()
configured oplog size:   33566.70703125MB
....
fb-backend:ROLLBACK> db.getCollection("oplog.rs").stats().size
33586278860

fb-backend:ROLLBACK> db.getCollection("oplog.rs").stats().max
2147483647

How can size > max (on the primary node, likewise, size > max)?

Scott Hernandez

unread,
May 20, 2012, 6:49:17 PM5/20/12
to mongod...@googlegroups.com
On Sun, May 20, 2012 at 6:39 PM, Azat Khuzhin <a3at...@gmail.com> wrote:
> Hi all, I have the same issue, but not changed time - primary goes down (
> Got signal: 6 (Aborted), I'v post bug report about this ) and after I
> restart it it have status "ROLLBACK"
>
> Is there way without delete all data and re initiate replication?

Not really, and not easily.

The docs might help you understand better:
http://docs.mongodb.org/manual/core/replication/#replica-set-rollbacks
http://www.mongodb.org/display/DOCS/Replica+Sets+-+Rollbacks
> And also how this is possible:
>
> fb-backend:ROLLBACK> db.printReplicationInfo()
> configured oplog size:   33566.70703125MB
> ....
> fb-backend:ROLLBACK> db.getCollection("oplog.rs").stats().size
> 33586278860
>
> fb-backend:ROLLBACK> db.getCollection("oplog.rs").stats().max
> 2147483647
>
> size > max (on the primary node analogically size > max)?

They are different things: size is the allocated size in bytes, and max is the maximum document count for the capped collection.
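To make the units concrete: the size figure Azat pasted is in bytes, while max (2147483647) is a document-count cap, so comparing the two numbers directly is meaningless. A sketch using the numbers from the thread:

```python
size_bytes = 33586278860   # oplog.rs stats().size from the output above (bytes)
max_docs   = 2147483647    # stats().max from the output above (document count)

size_mb = size_bytes / (1024 * 1024)
# ~32 thousand MB of oplog data vs. a cap of ~2.1 billion documents --
# different units, so size > max is not a contradiction.
```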

Azat Khuzhin

unread,
May 21, 2012, 2:46:41 AM5/21/12
to mongod...@googlegroups.com
I've read those links, thanks, but I can't understand how I can roll back without deleting all the data on the ROLLBACK node.
--
Azat Khuzhin

Azat Khuzhin

unread,
May 21, 2012, 3:01:46 AM5/21/12
to mongod...@googlegroups.com
Also, the link http://docs.mongodb.org/manual/core/replication/#replica-set-rollbacks does not apply to mongo >= 2.1, because it does not write the rollback data to "/mongo/db/path/rollback";
instead it writes the rollback data to "local.someCollection".
--
Azat Khuzhin

Azat Khuzhin

unread,
May 21, 2012, 3:13:11 AM5/21/12
to mongod...@googlegroups.com
In the log I have this message:
Mon May 21 03:03:46 [rsSync] replSet syncThread: 13410 replSet too much data to roll back

If I just recompile mongo to drop the 300 MiB limit, would that fix it?
I don't want to do this, though; if there is a way to fix this situation manually without changing the mongo sources, could you suggest it?

Thanks.
--
Azat Khuzhin

Scott Hernandez

unread,
May 21, 2012, 8:12:07 AM5/21/12
to mongod...@googlegroups.com
The best way to keep this from happening is to make sure that when you
write, you wait for another replica (or, better, a majority of replicas)
to acknowledge the write, so there is no chance that a rollback will
ever happen.

Changing the limit is generally never the correct solution.
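In this era of MongoDB, the acknowledgement Scott describes was requested via the getLastError command with a w value. A minimal sketch of the command document a driver would send after a write (the wtimeout value here is an illustrative assumption, not a recommendation):

```python
# Shape of the getLastError command a 2012-era driver issues after a write
# when replica acknowledgement ("safe mode" with w) is requested.
def get_last_error_cmd(w="majority", wtimeout_ms=5000):
    return {"getlasterror": 1, "w": w, "wtimeout": wtimeout_ms}

cmd = get_last_error_cmd()  # wait for a majority of the set to acknowledge
```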

Azat Khuzhin

unread,
May 21, 2012, 8:27:33 AM5/21/12
to mongod...@googlegroups.com
Thanks, but I can't use safe writes.
Some queries (UPDATEs) take a long time to run, but there are not many UPDATEs, so they do not hurt mongodb performance.

Scott Hernandez

unread,
May 21, 2012, 8:45:17 AM5/21/12
to mongod...@googlegroups.com
It sounds like you have a general issue with keeping up with writes
across your replica set. Are you monitoring this replica set in MMS,
if so what is the group name?

If you aren't, can you run mongostat --discover on the primary and
collect stats for a few minutes at the same time as "iostat -xm 2"?

While changing the rollback limit is easy to do, aside from compiling
your own binaries, this is really not the cause of the issue but a
symptom; I'd like to better understand the current behavior before
suggesting a solution.

Azat Khuzhin

unread,
May 21, 2012, 8:57:37 AM5/21/12
to mongod...@googlegroups.com
I can't monitor it via MMS.

In fact there are about 20,000 UPDATEs per day (which can be slow, 5-10 secs), about 3,000,000 INSERTs per day (fast), and many FINDs (fast).
But if I run every update in safe mode, it will block the current client, which is a bad choice for me.

mongostat: http://pastebin.com/Rr11DCw4

iostat: http://pastebin.com/TffxqnXc

Azat Khuzhin

unread,
May 21, 2012, 9:01:55 AM5/21/12
to mongod...@googlegroups.com
Is that enough, or do you really need stats for a few minutes?
--
Azat Khuzhin

Kristina Chodorow

unread,
May 23, 2012, 5:08:23 PM5/23/12
to mongodb-user
A few minutes would be more helpful; the servers look like they're
keeping up okay, but only during those few seconds of time.

If you get a rollback, look at the logs and make sure it is proceeding
(if a rollback fails, it will retry, fail, retry, fail, and so on
forever). Assuming it isn't trying to roll back over and over, let it
finish rolling back and it will catch up with the other members. If it
is stuck, you'll have to wipe the data and reinitialize.



Azat Khuzhin

unread,
May 23, 2012, 5:35:20 PM5/23/12
to mongod...@googlegroups.com
It can't roll back the data because the data set is more than 300 MB (as I already wrote); I saw that it tried to roll back again & again (but I can't imagine a situation in which that is useful).
I did not want to re-initialize the data (the whole database is ~300 GB), so I just recompiled mongodb without this restriction (300 MB) and successfully rolled back the data. (I understand that this is a hack.)

If you could explain what this 300 MB limit is for, that would be great (maybe it could be set in the config file?), and also why it tries to roll back again and again after failing at the limit.

As Scott wrote, yes, it is just a symptom; maybe mongodb did not shut down properly because of the "nofile" limit.
At shutdown it waits to write the log and close connections.

Here is a piece of the mongodb log: http://pastebin.com/xNM6W8fx
--
Azat Khuzhin

Kristina Chodorow

unread,
May 24, 2012, 6:47:20 AM5/24/12
to mongodb-user
Suppose you had a 3-node set, X, Y, Z, where X is primary, Y is
secondary, and Z is an arbiter. Y is a week behind, about to go
stale. Then X goes down, so Y becomes primary for a few minutes and
does some writes. Then X comes back up. Should X roll back a week's
worth of data automatically? The 300 MB limit should probably be more
of an advanced, tunable setting, but it's there to prevent that type
of thing.
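The scenario above amounts to a simple guard: refuse automatic rollback once too much diverged data has accumulated. A sketch of the idea (the 300 MB figure is from the thread; the function is illustrative, not MongoDB's actual code):

```python
ROLLBACK_LIMIT_BYTES = 300 * 1024 * 1024  # the 300 MB limit discussed above

def can_auto_rollback(diverged_bytes):
    """Refuse automatic rollback when the former primary holds too much
    data the rest of the set never saw (e.g. a week's worth of writes)."""
    return diverged_bytes <= ROLLBACK_LIMIT_BYTES

# A few minutes of writes: roll back automatically.
# A week of writes: refuse and require a manual decision/resync.
```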

I agree that having it endlessly loop on failure is a bad idea. It
should just exit or do something smarter.

I don't see how it could be a symptom of how you shut it down; it's
caused by a lot of data not being written to the secondary, which
means the secondary was behind.



Azat Khuzhin

unread,
May 24, 2012, 12:29:46 PM5/24/12
to mongod...@googlegroups.com
Thanks for the explanation.

But I think that this 300 MB limit should be calculated from things such as the "syncdelay" mongod option and the sysctl "vm.dirty_expire_centisecs" (the flushing parameters).
It should also be settable via a config directive.

In my case this 300 MB limit is probably not enough, because I use "syncdelay=180" and "vm.dirty_expire_centisecs=12000".
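Azat's proposal amounts to sizing the rollback limit from how much unacknowledged write volume the flush settings let accumulate. A hypothetical back-of-the-envelope sketch (both the formula and the 2 MB/s write rate are assumptions for illustration, not anything MongoDB implements):

```python
def proposed_rollback_limit_mb(write_mb_per_sec, syncdelay_sec, dirty_expire_centisecs):
    """Hypothetical: allow at least as much rollback as could accumulate
    during one flush window (syncdelay plus the kernel's dirty-page expiry)."""
    window_sec = syncdelay_sec + dirty_expire_centisecs / 100.0
    return write_mb_per_sec * window_sec

# With Azat's settings (syncdelay=180, vm.dirty_expire_centisecs=12000)
# and an assumed sustained write rate of 2 MB/s:
limit = proposed_rollback_limit_mb(2, 180, 12000)  # exceeds the fixed 300 MB limit
```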




--
Azat Khuzhin

Kristina Chodorow

unread,
May 24, 2012, 4:04:01 PM5/24/12
to mongodb-user
Yeah, you might want to file a ticket at https://jira.mongodb.org/browse/SERVER
for that; I don't think anyone has asked for that before (but it would
be a good idea).



Azat Khuzhin

unread,
May 25, 2012, 4:06:16 AM5/25/12
to mongod...@googlegroups.com
Yeah, I created two issues: https://jira.mongodb.org/browse/SERVER-5930 and https://jira.mongodb.org/browse/SERVER-5929




--
Azat Khuzhin

Kristina Chodorow

unread,
May 25, 2012, 11:08:35 AM5/25/12
to mongodb-user
Cool. If you "Watch" them, you'll get emails when we update the issues.

