I set up master-slave replication about 1 month ago and I was thinking
that it works good, but recently I've noticed that probably something is
broken.
I have "mdstore" db on master, but it has been missing on slave. So I
was trying to resync slave, but mongoDB shell tells me that everything
is alright:
> db.runCommand( {resync:1} )
{ "errmsg" : "not dead, no need to resync", "ok" : 0 }
Below you can see a result of db.printReplicationInfo() for my master
and db.printSlaveReplicationInfo() for my slave and also content of
their dbpath directory:
> db.printReplicationInfo()
configured oplog size: 1048.576MB
log length start to end: 47962secs (13.32hrs)
oplog first event time: Thu Nov 25 2010 23:19:51 GMT+0000 (UTC)
oplog last event time: Fri Nov 26 2010 12:39:13 GMT+0000 (UTC)
now: Mon Dec 27 2010 10:54:22 GMT+0000 (UTC)
-rw------- 1 mongodb nogroup 64M 2010-11-25 23:20 local.0
-rw------- 1 mongodb nogroup 128M 2010-11-25 23:19 local.1
-rw------- 1 mongodb nogroup 1.0G 2010-12-27 10:25 local.2
-rw------- 1 mongodb nogroup 16M 2010-11-26 12:39 local.ns
-rw------- 1 mongodb nogroup 64M 2010-12-20 13:02 mdstore.0
-rw------- 1 mongodb nogroup 128M 2010-11-26 16:17 mdstore.1
-rw------- 1 mongodb nogroup 16M 2010-12-20 13:02 mdstore.ns
-rwxr-xr-x 1 mongodb nogroup 5 2010-11-26 15:02 mongod.lock
-rw------- 1 mongodb nogroup 64M 2010-11-25 23:33 mydb.0
-rw------- 1 mongodb nogroup 128M 2010-11-30 14:58 mydb.1
-rw------- 1 mongodb nogroup 16M 2010-11-25 23:33 mydb.ns
drwxr-xr-x 2 mongodb nogroup 6 2010-12-20 13:02 _tmp
> db.printSlaveReplicationInfo()
source: masterhost
syncedTo: Fri Nov 26 2010 12:39:13 GMT+0000 (UTC)
= 2672021secs ago (742.23hrs)
-rw------- 1 mongodb nogroup 64M 2010-12-27 10:50 local.0
-rw------- 1 mongodb nogroup 128M 2010-11-26 23:24 local.1
-rw------- 1 mongodb nogroup 16M 2010-11-25 23:24 local.ns
-rwxr-xr-x 1 mongodb nogroup 0 2010-12-27 10:50 mongod.lock
-rw------- 1 mongodb nogroup 64M 2010-11-25 23:33 mydb.0
-rw------- 1 mongodb nogroup 128M 2010-12-20 10:26 mydb.1
-rw------- 1 mongodb nogroup 16M 2010-11-25 23:33 mydb.ns
I thought that slave can't resync, because its data is too stale, so I
stopped slave, removed all files from its dbpath directory and started
it again.
It seems that it helped a bit, because now I have "mdstore" db on slave.
Unfortunately db.printSlaveReplicationInfo() still tells me that my
slave is synced to Fri Nov 26 2010 12:39:13 GMT+0000 and
db.printReplicationInfo() still shows me info about master's oplog event
times in November.
I inserted some data into testing "mydb" db on master in order to check
if replication works, but I can't see them on slave:
> db.foo.save( { 'b' : '2' } )
magic == 0x41424344
>
> db.foo.find()
{ "_id" : ObjectId("4ceef235be088fcabdc27864"), "a" : 1 }
{ "_id" : ObjectId("4d187752bd265e1fe7d22d8b"), "b" : "2" }
> db.foo.find()
{ "_id" : ObjectId("4ceef235be088fcabdc27864"), "a" : 1 }
It's rather strange, because I can see a lot of following lines in the
slave's log file:
Mon Dec 27 12:52:27 [replslave] repl: from host:masterhost
Mon Dec 27 12:52:32 [replslave] repl: from host:masterhost
Mon Dec 27 12:52:36 [replslave] repl: from host:masterhost
Mon Dec 27 12:52:40 [replslave] repl: from host:masterhost
Mon Dec 27 12:52:44 [replslave] repl: from host:masterhost
What's wrong with my mongoDB? How can I fix it?
Here are the details about my system:
* kernel: Linux 2.6.32-5-xen-amd64
* distro: Ubuntu Lucid 10.04 LTS
* mongoDB: stable 20101123
My best regards,
Pawel
2010/12/27 Paweł Tęcza <P.T...@icm.edu.pl>:
> --
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
>
>
Thanks a lot for your reply, Eliot!
This is simple master-slave replication with one master and only one
slave. Below is my configuration:
dbpath=/srv/mongodb
logpath=/var/log/mongodb/mongodb.log
logappend=true
master = true
oplogSize = 1000
dbpath=/srv/mongodb
logpath=/var/log/mongodb/mongodb.log
logappend=true
slave = true
source = masterhost
Unfortunately the slave's log in not rotated and it has about 32MB, so
it's not good idea to send it to the group... But I can send it to you,
if you agree, of course :)
Cheers,
Pawel
2010/12/27 Paweł Tęcza <P.T...@icm.edu.pl>:
OK, but I'm affraid the log is too big for uploading it there. Of course
I can try with Rapidshare or Mediafire :)
I wrote you that it has ~32MB. Now I can add that it contains ~559k
lines at all and ~548k lines of "[replslave] repl: from
host:masterhost"...
Do you really need entire log?
P.
The key is probably the other lines, but hard to tell without looking...
2010/12/27 Paweł Tęcza <P.T...@icm.edu.pl>:
> Do you really need entire log?
I took a deeper look at my log file and I've just discovered several
following entries there:
Sun Dec 5 12:32:10 MongoDB starting : pid=526 port=27017
dbpath=/srv/mongodb slave=1 64-bit
Sun Dec 5 12:32:11 db version v1.6.4, pdfile version 4.5
Sun Dec 5 12:32:11 git version:
4f5c02f8d92ff213b71b88f5eb643b7f62b50abc
Sun Dec 5 12:32:11 sys info: Linux domU-12-31-39-06-79-A1
2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009
x86_64 BOOST_LIB_VERSION=1_41
**************
old lock file: /srv/mongodb/mongod.lock. probably means unclean
shutdown
recommend removing file and running --repair
see: http://dochub.mongodb.org/core/repair for more information
*************
Sun Dec 5 12:32:11 exception in initAndListen std::exception: old lock
file, terminating
Sun Dec 5 12:32:11 dbexit:
Sun Dec 5 12:32:11 shutdown: going to close listening sockets...
Sun Dec 5 12:32:11 shutdown: going to flush oplog...
Sun Dec 5 12:32:11 shutdown: going to close sockets...
Sun Dec 5 12:32:11 shutdown: waiting for fs preallocator...
Sun Dec 5 12:32:11 shutdown: closing all files...
Sun Dec 5 12:32:11 closeAllFiles() finished
Sun Dec 5 12:32:11 dbexit: really exiting now
Do you think it can be related to replication issues?
P.
2010/12/27 Paweł Tęcza <P.T...@icm.edu.pl>:
I've created a new issue on your Jira:
http://jira.mongodb.org/browse/SERVER-2290
The slave log is attached, of course.
P.
It seems that slave took the data from master when today I have stopped
the slave, removed its files and started it again. But the log entries
about repairing mongoDB are older...
P.
2010/12/27 Paweł Tęcza <P.T...@icm.edu.pl>:
No, I've never used replica pairs. I configured regular master/slave
replication using details from your web site:
http://www.mongodb.org/display/DOCS/Master+Slave
This is my configuration:
dbpath=/srv/mongodb
logpath=/var/log/mongodb/mongodb.log
logappend=true
master = true
oplogSize = 1000
dbpath=/srv/mongodb
logpath=/var/log/mongodb/mongodb.log
logappend=true
slave = true
source = masterhost
I haven't been using another settings for my mongoDB, probably except of
dbpath. But I believe it doesn't matter, because I'm sure I stopped
mongoDB before, next moved old data directory to new place and finally
started mongoDB one more time with fixed dbpath.
BTW, do you recommend me to upgrade to replica sets? Now I have very
simple scenario: only one master and only one slave. Of course,
automatic failover and recovery are great features for me, but I'm not
sure whether I need more nodes, for example for an arbiter.
P.
Sure! I copied and pasted the output at the Jira:
http://jira.mongodb.org/browse/SERVER-2290
P.
Hi Nat,
Thank you for your response! Probably you're right, because I could see
the following messages in the master's log file:
Tue Dec 28 11:43:35 [replmaster] caught exception in replMasterThread()
Tue Dec 28 11:43:45 [replmaster] Assertion failure magic == 0x41424344
db/pdfile.h 241
0x534a61 0x54163f 0x6e56db 0x6e59e1 0x6e9c2a 0x6dfbc5 0x6dfc0a 0x72247f
0x6bb591 0x6babcf 0x650932 0x837550 0x7f1b50e7b9ca 0x7f1b5042a70d
/usr/bin/mongod(_ZN5mongo12sayDbContextEPKc+0xb1) [0x534a61]
/usr/bin/mongod(_ZN5mongo8assertedEPKcS1_j+0x10f) [0x54163f]
/usr/bin/mongod(_ZNK5mongo16NamespaceDetails11inCapExtentERKNS_7DiskLocE
+0xfb) [0x6e56db]
/usr/bin/mongod(_ZN5mongo16NamespaceDetails10__capAllocEi+0x101)
[0x6e59e1]
/usr/bin/mongod(_ZN5mongo16NamespaceDetails11cappedAllocEPKci+0x21a)
[0x6e9c2a]
/usr/bin/mongod(_ZN5mongo16NamespaceDetails6_allocEPKci+0x25) [0x6dfbc5]
/usr/bin/mongod(_ZN5mongo16NamespaceDetails5allocEPKciRNS_7DiskLocE
+0x3a) [0x6dfc0a]
/usr/bin/mongod(_ZN5mongo11DataFileMgr17fast_oplog_insertEPNS_16NamespaceDetailsEPKci+0x17f) [0x72247f]
/usr/bin/mongod() [0x6bb591]
/usr/bin/mongod(_ZN5mongo12logKeepaliveEv+0x4f) [0x6babcf]
/usr/bin/mongod() [0x650932]
/usr/bin/mongod(thread_proxy+0x80) [0x837550]
/lib/libpthread.so.0(+0x69ca) [0x7f1b50e7b9ca]
/lib/libc.so.6(clone+0x6d) [0x7f1b5042a70d]
Tue Dec 28 11:43:45 [replmaster] caught exception in replMasterThread()
I stopped the master (`/etc/init.d/mongodb stop`), purged its local.*
files and started again (`/usr/bin/mongod --config /etc/mongodb.conf`).
I had some troubles with it, because here the init script kills mongoDB
daemon, but its lock file still exists... Probably I should create a new
thread for reporting that issue.
Now it seems that master-slave replication works fine for me :D
> db.printReplicationInfo()
configured oplog size: 1048.576MB
log length start to end: 1314secs (0.37hrs)
oplog first event time: Tue Dec 28 2010 11:49:19 GMT+0000 (UTC)
oplog last event time: Tue Dec 28 2010 12:11:13 GMT+0000 (UTC)
now: Tue Dec 28 2010 12:11:15 GMT+0000 (UTC)
> db.printSlaveReplicationInfo()
source: masterhost
syncedTo: Tue Dec 28 2010 12:11:03 GMT+0000 (UTC)
= 13secs ago (0hrs)
Thanks a lot for your help!
My best regards,
Pawel
> I stopped the master (`/etc/init.d/mongodb stop`), purged its local.*
> files and started again (`/usr/bin/mongod --config /etc/mongodb.conf`).
> I had some troubles with it, because here the init script kills mongoDB
> daemon, but its lock file still exists... Probably I should create a new
> thread for reporting that issue.
>
> Now it seems that master-slave replication works fine for me :D
I forgot to add I had to resync my slave:
> use admin
switched to db admin
> db.runCommand({resync: 1})
{ "info" : "triggered resync for all sources", "ok" : 1 }
I think it can be useful tip for another MongoDB users :)
P.
2010/12/28 Paweł Tęcza <P.T...@icm.edu.pl>: