Bug in interrupted initial sync


Damon P. Cortesi

Sep 5, 2010, 1:59:16 PM
to mongodb-user
I think I've come across a bug in the initial sync code that I was
hoping to verify and find a way around so I don't have to do a full
resync.

I started an initial sync on a new slave on Sept 2. Unfortunately,
performance on the master was affected so much I had to stop the slave
temporarily. I started it back up on Sept 4, but the slave replication
info still had the syncedTo date as Sept 2, even though it restarted
from scratch.

So the slave finished syncing, but of course the oplog had passed the
Sept 2 date, but not the Sept 4 date. So the slave /is/ actually
synced to Sept 4, but it halted replication because the stored
syncedTo date is incorrect. My assumption is that the syncedTo date
should have been updated when the slave restarted the resync by
itself.

But now that I'm in this state, can I manually set the syncedTo date
and have the slave start replication at the proper place? I imagine I
could try to find the last object replicated, but I'm not sure how I
would go about finding that and the appropriate timestamp in the
oplog.

Any suggestions would be welcome - I'd prefer not to have to do a full
resync as it takes about 36 hours.

Thanks,

Damon

Eliot Horowitz

Sep 5, 2010, 10:54:35 PM
to mongod...@googlegroups.com
It probably did restart from where it left off, so there might be data from as far back as Sept. 2.
Are you sure it really restarted from scratch?


--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.


Damon P. Cortesi

Sep 6, 2010, 12:42:44 AM
to mongodb-user
All collections appeared to be dropped and re-synced. Also, when
trying to find the oldest piece of data that had replicated, it
appeared to be from the point at which I restarted mongod (Sept 4).

I updated the syncedTo date to be approximately the time when I
restarted the sync - here's hoping I don't shoot myself in the foot...

Damon

Eliot Horowitz

Sep 6, 2010, 8:50:18 AM
to mongod...@googlegroups.com
I would recommend running count() on all the collections to make sure they're consistent.
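A minimal sketch of that consistency check, assuming you have already collected per-collection document counts from both the master and the slave (for example with PyMongo's `Database.list_collection_names()` and `Collection.count_documents({})` on each server). The helper name `find_count_mismatches` is hypothetical; it just reports collections whose counts differ or that exist on only one side.

```python
from typing import Dict, Optional, Tuple


def find_count_mismatches(
    master: Dict[str, int], slave: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return {collection: (master_count, slave_count)} for every collection
    whose counts differ. A collection missing on one side shows up as None."""
    mismatches: Dict[str, Tuple[Optional[int], Optional[int]]] = {}
    # Walk the union of collection names so one-sided collections are caught too.
    for name in sorted(set(master) | set(slave)):
        m = master.get(name)
        s = slave.get(name)
        if m != s:
            mismatches[name] = (m, s)
    return mismatches
```

If the master is still taking writes, busy collections may differ by a few documents simply because the slave lags behind; it is worth re-checking those once replication has caught up before treating them as real gaps.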

Damon P. Cortesi

Sep 6, 2010, 8:15:35 PM
to mongodb-user
Looks like everything came over just fine, thankfully.

The one tricky part was updating the syncedTo timestamp - not sure how
that's represented, but I ended up having to use the perl driver to
actually update it properly.
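On the representation question: `syncedTo` holds a BSON Timestamp, not a Date, which may be why a plain date assignment doesn't behave. A BSON Timestamp is a 64-bit value whose high 32 bits are seconds since the Unix epoch and whose low 32 bits are an increment that orders operations within the same second. The stdlib-only sketch below, assuming you want the timestamp for a given UTC time with increment 0, shows how the pieces fit; the function name `to_bson_timestamp` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Tuple


def to_bson_timestamp(when: datetime, increment: int = 0) -> Tuple[int, int, int]:
    """Return (seconds, increment, packed_64bit) for a BSON Timestamp.

    Hypothetical helper: the packed value puts seconds in the high 32 bits
    and the increment in the low 32 bits."""
    if when.tzinfo is None:
        # Refuse naive datetimes so local-time offsets can't silently skew the result.
        raise ValueError("use a timezone-aware datetime (e.g. tzinfo=timezone.utc)")
    seconds = int(when.timestamp())
    if not (0 <= seconds < 2**32 and 0 <= increment < 2**32):
        raise ValueError("seconds and increment must each fit in 32 bits")
    packed = (seconds << 32) | increment
    return seconds, increment, packed
```

With the Python driver, `bson.timestamp.Timestamp(seconds, 0)` builds the equivalent typed value to write back. In legacy master/slave setups the field lives in the slave's `local.sources` document, though the exact layout is worth verifying against your server version. Choosing a time slightly before the restart, as Damon did, errs toward re-applying a few operations rather than skipping any; oplog operations are designed to be idempotent, but a count() comparison afterward is still a sensible check.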

Damon P. Cortesi

Sep 6, 2010, 8:17:53 PM
to mongodb-user
I forgot to mention, should I open a JIRA on this? It seems like when
the initial sync was interrupted, then started again two days later,
it should have updated the syncedTo value. As far as I can tell, it did
drop the collections and start resyncing from scratch. I know it's an
edge case, though...

Eliot Horowitz

Sep 6, 2010, 8:31:18 PM
to mongod...@googlegroups.com
Yeah, go ahead and open a JIRA.
Ideally with the entire log file.