Slave increasing lag ?


Erwan A.

Nov 12, 2009, 5:30:53 AM
to mongodb-user
Hello again,

Still looking for a proper data migration plan, I tried a "snapshot
slave data + promotion" scenario like the following:

- shut down the backup slave
- take an EBS snapshot of the volume
- restart the slave
- create an EBS volume from the snapshot and attach it to the new host
- start the new server as a slave and let it synchronize with the master
- once synchronized, make sure no more writes/updates happen
- stop all slaves
- restart the synchronized server with --master (being able to do that
online would be great)
- change the slaves' configuration to point at the new server
- enjoy the new, beefier master
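
The two mongod invocations implied by the steps above can be sketched
as argument lists. This is only an illustration, assuming the legacy
master/slave options (--master, --slave, --source, --dbpath); the host
name and data path are hypothetical placeholders, not values from this
setup.

```python
import shlex


def slave_command(master_host, dbpath):
    """Arguments to start mongod as a slave pulling from master_host."""
    return ["mongod", "--slave", "--source", master_host, "--dbpath", dbpath]


def master_command(dbpath):
    """Arguments to restart the synchronized server as the new master."""
    return ["mongod", "--master", "--dbpath", dbpath]


if __name__ == "__main__":
    # Hypothetical host and path, for illustration only.
    print(shlex.join(slave_command("old-master.example.net:27017", "/data/db")))
    print(shlex.join(master_command("/data/db")))
```

Building the commands as lists keeps them easy to pass to a process
launcher without shell quoting issues.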

That's when I realized that the slaves were no longer synchronizing
properly with the master. The frontend application kept showing old
data, and no new operations seemed to be applied, while the slave log
file showed an increasing lag between the actual time and the syncedTo
timestamp. It started around 10:25 UTC and has been slowly but surely
increasing:

Mon Nov 9 10:25:09 repl: end sync_pullOpLog syncedTo: Mon Nov 9 10:25:08 2009 4af7ee04:a
Mon Nov 9 10:25:12 replMain: sleep 3 before next pass
Mon Nov 9 10:25:12 pull: ma...@mongodb.silentale.net
...
Mon Nov 9 14:04:08 repl: end sync_pullOpLog syncedTo: Mon Nov 9 13:00:15 2009 4af8125f:c
Mon Nov 9 14:04:44 pull: applied 428 operations
Mon Nov 9 14:04:44 repl: end sync_pullOpLog syncedTo: Mon Nov 9 13:00:23 2009 4af81267:18

By Nov 10 00:00 the slave was lagging by 7 hours. I find it strange
that there is no replication-related message at all for about an hour
after getting an "old cursor isDead":

Mon Nov 9 23:48:51 pull: ma...@mongodb.silentale.net
Mon Nov 9 23:48:51 pull: old cursor isDead, initiating a new one
...
Tue Nov 10 00:37:25 pull: applied 97 operations
Tue Nov 10 00:37:25 repl: end sync_pullOpLog syncedTo: Mon Nov 9 17:23:32 2009 4af85014:4
Tue Nov 10 00:38:40 pull: applied 165 operations
...
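
To put numbers on the lag, the "end sync_pullOpLog" lines can be
parsed directly. The sketch below is a rough illustration, not a
MongoDB tool: it assumes each log entry sits on a single line, that
the log timestamps are in UTC, and that the log line falls in the same
year as the one printed after syncedTo. It also relies on the hex
value before the colon being seconds since the Unix epoch, which is
consistent with the three samples above. The function name and regex
are my own.

```python
import re
from datetime import datetime, timezone

# Matches e.g.
# "Mon Nov 9 10:25:09 repl: end sync_pullOpLog syncedTo: Mon Nov 9 10:25:08 2009 4af7ee04:a"
LINE_RE = re.compile(
    r"^\w{3} (?P<logged>\w{3} +\d+ \d\d:\d\d:\d\d) repl: end sync_pullOpLog "
    r"syncedTo: .* (?P<year>\d{4}) (?P<secs>[0-9a-f]+):(?P<inc>[0-9a-f]+)$"
)


def replication_lag(line):
    """Return the lag (timedelta) for an 'end sync_pullOpLog' line, else None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Assumption: the log line's year equals the year printed after syncedTo.
    logged = datetime.strptime(
        f"{m['logged']} {m['year']}", "%b %d %H:%M:%S %Y"
    ).replace(tzinfo=timezone.utc)
    # Assumption: the hex part before ':' is seconds since the Unix epoch.
    synced = datetime.fromtimestamp(int(m["secs"], 16), tz=timezone.utc)
    return logged - synced


if __name__ == "__main__":
    samples = [
        "Mon Nov 9 10:25:09 repl: end sync_pullOpLog syncedTo: Mon Nov 9 10:25:08 2009 4af7ee04:a",
        "Mon Nov 9 14:04:08 repl: end sync_pullOpLog syncedTo: Mon Nov 9 13:00:15 2009 4af8125f:c",
        "Tue Nov 10 00:37:25 repl: end sync_pullOpLog syncedTo: Mon Nov 9 17:23:32 2009 4af85014:4",
    ]
    for s in samples:
        print(replication_lag(s))
```

Run over the whole slave log, this gives one lag value per sync pass,
which makes it easy to see when the lag started growing.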

Today this slave is lagging by more than 24 hours, and I made sure
that nothing else (concurrent read accesses, etc.) gets in the way.
I've tried restarting the slave and restarting the replication process,
to no avail. The only active process on this host is mongod, and it's
doing I/O (reads) at the maximum rate possible.

Another slave with the same dataset had been lagging too, but caught
up with the master around midnight today. I can't figure out any
reason why the lag would grow this large.

Any suggestion would be appreciated. I'm posting a JIRA ticket with
more details.

Thanks,

Erwan


Eliot Horowitz

Nov 12, 2009, 9:13:15 AM
to mongod...@googlegroups.com
I'm going to move this conversation to the jira ticket as there is
more info there.
http://jira.mongodb.org/browse/SERVER-415