Halted replication on 1.8.0-rc2

Benoît Larroque

Mar 15, 2011, 8:15:02 PM

to mongodb-user

Hello everyone,

I have a 1.8.0-rc2 secondary in a replica set with a 1.7.5-pre primary.

Everything seemed to be going well, but this morning I discovered that
the secondary no longer seems to be syncing...

On the replset status page, I had:

primary : optime : 4d7f3613:1
secondary : optime : 4d7ded8e:6
messages : "syncThread: 13629 can't have undefined in a query expression"

There is no delayed replication configured.

Operation 4d7ded8e:6 is: { create:
"tmp.mr.downloads_tmp.96b7706e-4e25-11e0-a302-001cc0a62c0b_2" }
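
To put a number on how far behind the secondary is, the two optimes can be decoded. This is only a sketch: it assumes the status page prints optimes as "<seconds since the Unix epoch, in hex>:<increment, in hex>", which this thread does not confirm, and the helper name parse_optime is made up for illustration.

```ruby
# Decode a replica-set status optime such as "4d7f3613:1".
# ASSUMPTION: both fields are hexadecimal, and the first one is
# seconds since the Unix epoch.
def parse_optime(str)
  secs_hex, inc_hex = str.split(":")
  { time: Time.at(secs_hex.to_i(16)).utc, inc: inc_hex.to_i(16) }
end

primary   = parse_optime("4d7f3613:1")
secondary = parse_optime("4d7ded8e:6")

# Time - Time yields seconds as a Float; truncate for display.
lag_seconds = (primary[:time] - secondary[:time]).to_i

puts "primary:   #{primary[:time]}"
puts "secondary: #{secondary[:time]}"
puts "lag:       #{lag_seconds} s (~#{(lag_seconds / 3600.0).round(1)} h)"
```

Under that assumption the secondary is 84101 seconds, a bit over 23 hours, behind the primary.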

This message seems to be thrown by db/matcher.cpp around line 310. It
was added in commit
https://github.com/mongodb/mongo/commit/c31c7f322943034bd294993e832f51383799ee5f

It may not be a bug, since my primary is an unstable version (certainly
from before this commit). Still, in my opinion, the replica set
status is not explicit at all and should be improved. It should at the
very least clearly signal that one of its members is in a degraded
state...

Restarting the secondary seemed to clear the block, but it keeps
getting stuck again (at another optime, but with the same message).

Best regards,

Benoit Larroque

Kristina Chodorow

Mar 16, 2011, 10:40:44 AM

to mongod...@googlegroups.com, Benoît Larroque

Sounds like a bug. What are the options you're running MapReduce with?

Can you send a dump of the "create" entries? e.g., on the stuck secondary run:

$ mongodump -d local -c oplog.rs -q '{"o" : "c"}'

...and attach the output? (Or the whole oplog, if you're comfortable with that... you can also upload it to https://jira.mongodb.org/browse/SUPPORT and only the 10gen devs will be able to see it.)

Do you know if it's getting stuck on every MapReduce you're running?

Benoît Larroque

Mar 16, 2011, 11:43:22 AM

to Kristina Chodorow, mongod...@googlegroups.com

Hi Kristina,

Your query didn't return any records.

I altered it to:
$ mongoexport -d local -c oplog.rs -q '{"o.create" : {$exists:true}}'
connected to: 127.0.0.1
{ "ts" : { "t" : 1300230192000 , "i" : 4 }, "h" : 7774912460141041017,
"op" : "c", "ns" : "FEEDBOOKS_production_downloads.$cmd", "o" : {
"create" : "tmp.mr.downloads_tmp.1742de52-4f58-11e0-898a-001cc0a62c0b_101"
} }
{ "ts" : { "t" : 1300230215000 , "i" : 1140 }, "h" :
8817460142386671296, "op" : "c", "ns" :
"FEEDBOOKS_production_downloads.$cmd", "o" : { "create" :
"tmp.mr.downloads_tmp.749bb88a-4f58-11e0-a923-001cc0a62c0b_102" } }
exported 2 records

The oplog ends 2 operations after the last create:
{ "ts" : { "t" : 1300230215000 , "i" : 1140 }, "h" :
8817460142386671296, "op" : "c", "ns" :
"FEEDBOOKS_production_downloads.$cmd", "o" : { "create" :
"tmp.mr.downloads_tmp.749bb88a-4f58-11e0-a923-001cc0a62c0b_102" } }
{ "ts" : { "t" : 1300230215000 , "i" : 1141 }, "h" :
-6596835512270406651, "op" : "i", "ns" :
"FEEDBOOKS_production_downloads.tmp.mr.downloads_tmp.1742de52-4f58-11e0-898a-001cc0a62c0b_101",
"o" : { "_id" : "15087-userbook-format-epub", "value" : { "count" : 3,
"type" : "format", "name" : "epub", "itype" : "userbook", "iid" :
15087 } } }
{ "ts" : { "t" : 1300230215000 , "i" : 1142 }, "h" :
-4842537260107437227, "op" : "i", "ns" :
"FEEDBOOKS_production_downloads.tmp.mr.downloads_tmp.1742de52-4f58-11e0-898a-001cc0a62c0b_101",
"o" : { "_id" : "15088-userbook-client-aldiko", "value" : { "count" :
1, "type" : "client", "name" : "aldiko", "itype" : "userbook", "iid" :
15088 } } }
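
As a side note on why the first filter matched nothing: in the entries above, "o" holds the operation document while "op" holds the operation type. A rough Ruby sketch over two of those entries (the insert's "o" document is shortened here for brevity) mimics the filters in plain Ruby. It does not talk to MongoDB, and the variable names are illustrative only.

```ruby
require "json"

# Two oplog entries copied from the export above.
# The "o" document of the insert is abbreviated for this sketch.
lines = [
  '{ "ts" : { "t" : 1300230215000 , "i" : 1140 }, "h" : 8817460142386671296, "op" : "c", "ns" : "FEEDBOOKS_production_downloads.$cmd", "o" : { "create" : "tmp.mr.downloads_tmp.749bb88a-4f58-11e0-a923-001cc0a62c0b_102" } }',
  '{ "ts" : { "t" : 1300230215000 , "i" : 1141 }, "h" : -6596835512270406651, "op" : "i", "ns" : "FEEDBOOKS_production_downloads.tmp.mr.downloads_tmp.1742de52-4f58-11e0-898a-001cc0a62c0b_101", "o" : { "_id" : "15087-userbook-format-epub" } }'
]
entries = lines.map { |line| JSON.parse(line) }

# Like {"o" : "c"}: "o" is a document, never the string "c".
by_o = entries.select { |e| e["o"] == "c" }

# Like {"op" : "c"}: selects every command entry, create or otherwise.
by_op = entries.select { |e| e["op"] == "c" }

# Like {"o.create" : {$exists: true}}: selects only create commands.
creates = entries.select { |e| e["o"].is_a?(Hash) && e["o"].key?("create") }

puts "o == 'c':        #{by_o.size}"
puts "op == 'c':       #{by_op.size}"
puts "o.create exists: #{creates.size}"
```

On this two-entry sample the first filter selects nothing, while the other two both select the single create command.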

I've uploaded the whole oplog to JIRA (I'm not very comfortable with
it going public...), ref #: SUPPORT-102

About the MapReduce jobs: I don't know if they are all getting stuck.
We had an incident this weekend and had to promote the unstable
secondary (1.7) to master while upgrading the old primary from 1.6.X
to 1.8-rc2. Since the "out" parameter changed between the two
versions, the MapReduce calls stopped working. I updated our code base
afterwards to get MapReduce working again, and here we are.

I call MapReduce from Ruby with this kind of call:
map_reduce(map, reduce, {'query' => {'date' => {'$gt' => bd.to_i, '$lt' => ed.to_i}}, 'out' => collname})
where:
- collname is always created from a UUID (like
tmp.1742de52-4f58-11e0-898a-001cc0a62c0b)
- bd.to_i and ed.to_i can't be nil

Regards

Kristina Chodorow

Mar 16, 2011, 1:24:15 PM

to mongod...@googlegroups.com, Benoît Larroque

Whoops, thanks for fixing it. I'll take a look.
