Replica secondaries keep "stopping" after re-indexing system.js _id


Paul Harvey

May 8, 2011, 5:51:47 AM
to mongodb-user
We keep getting the dreaded "error RS102 too stale to catch up" - and
it's driving me nuts. Installation is backing a wiki site so it's
mostly queries, not updates. Well, it's happened four times now, one
of those times was during a bulk load of our data (20-60 docs/sec) -
that's the only time it's even remotely write-heavy.

mongod --version
db version v1.8.1, pdfile version 4.5
Sun May 8 19:30:59 git version:
a429cd4f535b2499cc4130b06ff7c26f41c00f04

4GB physical RAM on each of the three VMs in our replicaSet, two
running Ubuntu 9.04 and one Ubuntu 10.04 LTS, each are dedicated to
running MongoDB.

We don't have a challenging quantity of data by any stretch - for now,
just over 100k documents across ~80 databases, totalling ~7GB
according to the "show dbs" GB figures (we use --noprealloc and
--smallfiles).

The "working set" seems to fit inside ~3GB RAM (well, that's what the
RES figure climbs to according to top). We have service monitoring
tracking iowait figures, and I don't see any spikes that would
indicate disk thrashing.

This time before deleting the data off the secondaries prior to
syncing from scratch, I took note of the optime according to
rs.status(). At precisely that moment (to the second, 15:37:08), I
found these log entries:

Thu May 5 15:37:08 [replica set sync] building new index on { _id:
1 } for foswiki__TaxonProfile.Relationships.system.js
Thu May 5 15:37:08 [replica set sync] done for 0 records 0.007secs
Thu May 5 15:37:08 [replica set sync] replSet syncThread: 10156
cannot update system collection:
foswiki__TaxonProfile.Relationships.system.indexes q: { ns:
"foswiki__TaxonProfile.Relationships.current", key: { _topic: 1 },
name: "_topic", background: true, v: 0 } u: { ns:
"foswiki__TaxonProfile.Relationships.current", key: { _topic: 1 },
name: "_topic", background: true, v: 0 }
Thu May 5 15:37:15 [conn4712] end connection wiki.example.org:47840
Thu May 5 15:37:19 [replica set sync] replSet syncThread: 10156
cannot update system collection:
foswiki__TaxonProfile.Relationships.system.indexes q: { ns:
"foswiki__TaxonProfile.Relationships.current", key: { _topic: 1 },
name: "_topic", background: true, v: 0 } u: { ns:
"foswiki__TaxonProfile.Relationships.current", key: { _topic: 1 },
name: "_topic", background: true, v: 0 }

The "cannot update system collection" error appeared in the errmsg of
both SECONDARYs according to rs.status(), but the state on both was
still SECONDARY. When I restarted the PRIMARY, the SECONDARYs changed
to RECOVERING with errmsg "error RS102 too stale to catch up".

Questions:
* Could this be something to do with our use of --noprealloc and
--smallfiles?
* Why did the SECONDARYs keep their status as SECONDARY - isn't
that misleading? According to their optime, they hadn't taken any
updates from the PRIMARY for almost 3 days! (yes, I need to fix our
monitoring)
* The SECONDARY's status changed to RECOVERING with errmsg "error
RS102 too stale to catch up" only after restarting the PRIMARY - why
is that? Shouldn't the secondaries have detected their stale condition
without me having to kick it?
* Any ideas on how I can further debug the stoppage in the first
place? Do I need to do a custom build of mongod with some debugging
options enabled?

Cheers

- Paul Harvey

Scott Hernandez

May 8, 2011, 9:43:20 AM
to mongod...@googlegroups.com
On Sun, May 8, 2011 at 2:51 AM, Paul Harvey <Paul.W...@csiro.au> wrote:
We keep getting the dreaded "error RS102 too stale to catch up" - and
it's driving me nuts. Installation is backing a wiki site so it's

As you know generally this is because the replicas (secondaries) fall too far behind the primary. It is best to address the causes of those lags but you can also increase the oplog size to give you more time to bring the secondaries up to date.

How big is your oplog? You can run db.printReplicationInfo() to get the details from each node.
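In 1.8 the oplog is a capped collection whose size is fixed when it is first allocated, so growing it on an existing node generally means resyncing that node with a larger --oplogSize. A rough sketch of the steps (the paths and sizes here are illustrative, not from this thread):

```shell
# In the mongo shell, check each node's replication window first:
#   db.printReplicationInfo()
# When resyncing a node from scratch, start it with a larger oplog
# (--oplogSize is in megabytes; 4096 = ~4GB):
mongod --replSet seta --dbpath /var/lib/mongodb --oplogSize 4096
```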
  * Could this be something to do with our use of --noprealloc and
--smallfiles?

Possible but unlikely.

  * Why did the SECONDARYs keep their status as SECONDARY - isn't
that misleading? According to their optime, they hadn't taken any
updates from the PRIMARY for almost 3 days! (yes, I need to fix our
monitoring)

Yes, a secondary will stay in that state even if it gets very stale, as long as it is not too stale.
 
  * The SECONDARY's status changed to RECOVERING with errmsg "error
RS102 too stale to catch up" only after restarting the PRIMARY - why
is that? Shouldn't the secondaries have detected their stale condition
without me having to kick it?

No, it needs manual intervention instead of putting lots of load on the other replicas. When it gets so stale that it requires a resync or such, then it is an admin task, since different solutions affect the other replicas in different ways.
 
  * Any ideas on how I can further debug the stoppage in the first
place? Do I need to do a custom build of mongod with some debugging
options enabled?

You can run with -vvvvv to get more logging.

Does your database name have "." in it? What operating system/filesystem are you using?
 

Cheers

- Paul Harvey

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.


Paul Harvey

May 8, 2011, 8:37:52 PM
to mongodb-user
Thanks Scott for your thoughts,

On May 8, 11:43 pm, Scott Hernandez <scotthernan...@gmail.com> wrote:
> On Sun, May 8, 2011 at 2:51 AM, Paul Harvey <Paul.W.Har...@csiro.au> wrote:
> > We keep getting the dreaded "error RS102 too stale to catch up" - and
> > it's driving me nuts. Installation is backing a wiki site so it's
>
> As you know generally this is because the replicas (secondaries) fall too
> far behind the primary. It is best to address the causes of those lags but
> you can also increase the oplog size to give you more time to bring the
> secondaries up to date.

I am at a bit of a loss as to why both my secondaries - one of which
is in a different DC - stopped syncing at exactly the same time. In
the 5 hours leading up to the stoppage, there were only 82 updates. I
have trawled through our application logs on this occasion and the
last, and couldn't associate the stoppage with any significant event
in the application, other than these log messages in mongo itself
regarding the system.js collection.

This might be irrelevant to the current problem, but we're dealing
with a DB-agnostic query language our application uses (Foswiki's
QuerySearch) and it isn't always possible to transform these queries
into native mongo queries, so in some circumstances they are mapped
into db.eval expressions. Some pages of the wiki result in queries
that rely heavily on db.eval, and these seem to provoke the most
problems from mongod - some of them I've protected from guest users
(crawlers). Given this and the difficulty of designing indexes for
sorted, ad-hoc queries on truly schemaless data, MongoDB probably
isn't a very good fit for our problem - one day I'd like to consider
neo4j - but we're happy to stick with mongo for now...

.. however I wonder if some of the db.eval's are causing the PRIMARY
to stop sending ops to the secondaries.

> How big is your oplog? You can run db.printReplicationInfo() to get the
> details from each node.

1.2GB, 1.7GB, 2.2GB. We get very few updates, except during the bulk-
load process - in the 5 hours leading up to the "stoppage" there were
only 82 save operations. Surely an > 1GB oplog is big enough to cope
with 16 saves per hour? :)

> No, it needs manual intervention instead of putting lots of load on the
> other replicas. When it gets so stale that it requires a resync or such,
> then it is an admin task, since different solutions affect the other
> replicas in different ways.

Fair enough.

> You can run with -vvvvv to get more logging.

Thanks, I'll set that up now.

> Does you database name have "." in it? What operating system/fs are you
> using?

No, the db names are filtered to replace such characters with
underscores.

Ubuntu 9.04 (ext3) and 10.04 LTS (ext4). Filesystems have ~50% space
free.

- Paul

Scott Hernandez

May 8, 2011, 10:47:28 PM
to mongod...@googlegroups.com
The replicas actually pull data from the primary, not the other way around. The oplog on the primary stores all changes and the replicas tail that and apply those operations.
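The pull-based tailing Scott describes can be poked at by hand from the mongo shell; a rough sketch against the primary (1.8-era collection layout, purely illustrative):

```javascript
// local.oplog.rs is a capped collection holding every replicated write.
// A secondary effectively remembers the timestamp of the last operation
// it applied and issues a tailable query for everything after it:
var oplog = db.getSiblingDB("local").oplog.rs;
var last = oplog.find().sort({ $natural: -1 }).limit(1).next().ts;
var cursor = oplog.find({ ts: { $gt: last } })
                  .addOption(DBQuery.Option.tailable)
                  .addOption(DBQuery.Option.awaitData);
// Iterating this cursor yields new operations as they are written.
```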
 

> How big is your oplog? You can run db.printReplicationInfo() to get the
> details from each node.

1.2GB, 1.7GB, 2.2GB. We get very few updates, except during the bulk-
load process - in the 5 hours leading up to the "stoppage" there were
only 82 save operations. Surely an > 1GB oplog is big enough to cope
with 16 saves per hour? :)

What does printReplicationInfo() say? 

> No, it needs manual intervention instead of putting lots of load on the
> other replicas. When it gets so stale that it requires a resync or such,
> then it is an admin task, since different solutions affect the other
> replicas in different ways.

Fair enough.

> You can run with -vvvvv to get more logging.

Thanks, I'll set that up now.

Great, that might help.
 

> Does you database name have "." in it? What operating system/fs are you
> using?

No, the db names are filtered to replace such characters with
underscores.

It seems like you do from the logs (I am running on 3-6 hours of sleep in almost 72 hours so the blurry screen could be lying to me):
"foswiki__TaxonProfile.Relationships.system.indexes"
db = foswiki__TaxonProfile.Relationships
coll = system.indexes

What does "show dbs" say? What about "ls" in the dbpath dir?

 

Ubuntu 9.04 (ext3) and 10.04 LTS (ext4). Filesystems have ~50% space
free.

- Paul

--

Paul Harvey

May 9, 2011, 12:43:47 AM
to mongodb-user, sven.dow...@gmail.com
> What does printReplicationInfo() say?

On one of the secondaries:
seta:SECONDARY> db.printReplicationInfo()
configured oplog size: 1718.1447265625002MB
log length start to end: 66670secs (18.52hrs)
oplog first event time: Sun May 08 2011 20:01:22 GMT+1000 (EST)
oplog last event time: Mon May 09 2011 14:32:32 GMT+1000 (EST)
now: Mon May 09 2011 14:34:54 GMT+1000 (EST)
seta:SECONDARY>

On the current primary:
seta:PRIMARY> db.printReplicationInfo()
configured oplog size: 1609.58359375MB
log length start to end: 94075secs (26.13hrs)
oplog first event time: Sun May 08 2011 12:24:37 GMT+1000 (EST)
oplog last event time: Mon May 09 2011 14:32:32 GMT+1000 (EST)
now: Mon May 09 2011 14:33:10 GMT+1000 (EST)
seta:PRIMARY>

> > > Does you database name have "." in it? What operating system/fs are you
> > > using?
>
> > No, the db names are filtered to replace such characters with
> > underscores.
>
> It seems like you do from the logs (I am running on 3-6 hours of sleep in
> almost 72 hours so the blurry screen could be lying to me):
> "foswiki__TaxonProfile.Relationships.system.indexes"
> db = foswiki__TaxonProfile.Relationships
> coll = system.indexes
>
> What does "show dbs" say? What about "ls" in the dbpath dir?

seta:SECONDARY> show dbs
admin (empty)
...
foswiki__TaxonProfile 0.125GB
foswiki__TaxonProfile__Builder 0.0625GB
foswiki__TaxonProfile__Definitions 0.25GB
foswiki__TaxonProfile__Relationships 0.125GB
foswiki__TaxonProfile__Relationships__Superannuated 0.03125GB
foswiki__TaxonProfile__Samples 0.03125GB
foswiki__TaxonProfile__Samples__Eremophila 0.03125GB
foswiki__TaxonProfile__Superannuated 0.0625GB

An 'ls' shows directory names (we're using directoryperdb = true) like
foswiki__TaxonProfile__Relationships

Thanks again for your help... maybe the strange database/collection
name you've seen in the logs is something we should dig around for in
our own code? I've CC'd our developer, cheers :-)

- Paul

Nat

May 9, 2011, 1:09:00 AM
to mongodb-user
By default, db.eval takes a write lock and thus can block oplog reads
from the secondary. Instead of db.eval, why don't you use $where where
possible?

Paul Harvey

May 9, 2011, 3:17:40 AM
to mongodb-user
Hi Nat,

On May 9, 3:09 pm, Nat <nat.lu...@gmail.com> wrote:
> By default, db.eval takes a write lock and thus can block oplog reads
> from the secondary. Instead of db.eval, why don't you use $where where
> possible?

Now, I'm not the developer doing the mongodb integration, but I do
know that we use $where where possible. But Foswiki has a few
expressions which are hard to reproduce in mongo without designing for
temporary collections to be generated (and managed, Eg. expiry,
dependency tracking, while respecting ACLs) to hold interim results
(via map/reduce or whatever). One such QuerySearch pattern is a sort
of "ref" query - an indirect query on one set of documents by a
property of some other document that is merely linked (i.e.
de-reference a dbref).

Example:
%SEARCH{
    "form.name='TaskForm' AND State='Confirmed' AND ReportedBy/Email~'*example.company.org'"
    type="query"
}%

The ReportedBy field holds the address/name of another document (a
user profile document) which contains an Email field.
And you can "walk" these refs as much as you want, e.g.
"ReportedBy/Country/Population < 1000000"

Clearly, this is a horrible abuse of nosql, but we came to MongoDB for
its claim to be a "schemaless" database, and our data seems to fit
into JSON quite naturally. It wasn't because we had big data or needed
1000-requests-per-second scalability. So when we found db.eval
actually worked, we were happy... and so far the logs are telling me
that mongo rarely spends more than 1s resolving such queries.

But obviously, if this abuse of db.eval causes mongo to break, we need
to find a different solution (maybe we avoid ever sending db.eval's to
the primary).

Nat

May 9, 2011, 3:51:09 AM
to mongod...@googlegroups.com
One way to work around it for now is to use the nolock option with db.eval, especially if the db.eval statement doesn't need a consistent read.
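For reference, nolock is passed via the eval command form rather than the db.eval() helper; a minimal sketch (the function body is a placeholder, not Foswiki's actual query code):

```javascript
db.runCommand({
    eval: function (state) {
        // read-only work that can tolerate concurrent writes
        return db.topics.count({ State: state });
    },
    args: ["Confirmed"],
    nolock: true  // skip the global write lock; reads may not be consistent
});
```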

Paul Harvey

May 9, 2011, 5:56:54 AM
to mongodb-user
On May 9, 5:51 pm, "Nat" <nat.lu...@gmail.com> wrote:
> One way to work around it for now is to use the nolock option with db.eval, especially if the db.eval statement doesn't need a consistent read.

That sounds very useful; I'll pass it on, thanks!

I'm still a little bothered that our replicas, aren't... but we'll see
if it happens again.

Paul Harvey

May 23, 2011, 3:38:41 AM
to mongodb-user
We're still getting crashes/segfaults, but the replication hasn't
"paused" again like I described earlier in this thread.

Filed https://jira.mongodb.org/browse/SERVER-3131

- Paul

Nat

May 23, 2011, 3:43:16 AM
to mongod...@googlegroups.com
Why do you have a lot of $where: " (1) "? If you don't use $where unnecessarily, you should reduce the chance of encountering JS errors.

Paul Harvey

May 23, 2011, 10:06:01 AM
to mongodb-user
Hi Nat,

On May 23, 5:43 pm, Nat <nat.lu...@gmail.com> wrote:
> Why do you have a lot of $where: " (1) "? If you don't use $where
> unnecessarily, you should reduce the chance of encountering JS errors.

Thanks, I'll pass your suggestion on to the developer.

To answer your question, it's the result of transforming a Foswiki
QuerySearch expression into something that mongo can execute. Foswiki
can use other query engines with their own native query language, but
we've stuck with MongoDB because it claims to be schemaless, and the
data fits much more naturally than SQL tables.

So far we have been aiming for correctness, and haven't yet tried to
optimise cases such as these.

As I'm starting to get pretty desperate, I've built mongodb from the
current git master ff53662 against v8 2.2.18. If the site is still up
in the morning, I'll see if it survives the business day.

But perhaps someone can tell me if I should be using a more recent v8
release than 2.2.18.

Nat

May 23, 2011, 10:37:27 AM
to mongod...@googlegroups.com
I looked at the Foswiki code. It looks like it builds a dynamic $where clause based on the query. There is a known memory-leak problem because of that. Native queries should be used where possible.

Paul Harvey

May 23, 2011, 11:20:15 AM
to mongodb-user, sven.dow...@gmail.com
Is it possible this is specific to spidermonkey? Will my v8 build work
around this particular problem?

I'm not sure what the current Foswiki code is up to these days, but at
some point in the past we *were* doing native queries where possible.
But Foswiki's query language has a lot of stuff that just can't be
efficiently mapped to native mongo queries... and the JS was a neat
solution (apart from the crashes!)

Thanks again. I'll check with Sven (Foswiki developer).

Sven Dowideit

May 23, 2011, 8:24:29 PM
to mongod...@googlegroups.com, Paul.W Harvey

let me begin by saying, I'm not using $where clauses most of the time -
though I'm sure there are more that can be removed.

I didn't see any docco suggesting that there might be a limit to the
number of $where clauses that could be run before mongodb crashes, so I
have focused more on completeness (we're still not able to generate the
full set of foswiki queries into MongoDB).

I have specifically coded $where clauses when there are multiple AND
clauses on the same key (yup, users do that), when I need to query
multiple documents (getSisterDB type stuff), or when there's a mismatch
between how foswiki (and its perl-oriented evaluation) and
javascript/mongo think about things, and various other little corners of
weirdness on both sides.

The remainder is when the generator gets confused (or when someone
changes the parse tree it's generated from and I've not yet updated
enough code to make a 'good' mongodb query).

It's worth noting that using _only_ $where queries is already so much
of a speed improvement for foswiki that converting those to native
queries is almost insignificant - if it weren't for these crashes.

(I do love the $where: '(1)' queries that I'm generating :( )

I guess the question from my perspective (and wrt getting the foswiki
on mongodb project functioning reliably) is: should I focus on
completing the foswiki query to mongodb converter, or should I drop
that and work on this memory leak in mongodb? Both are show-stoppers
for us :/ as MongoDB's query language is simply not expressive enough
to avoid $where in the complex queries.

Sven


--
________________________________________
Professional Wiki Innovation and Support
Enterprise Support Contracts for Foswiki
Sven Dowideit http://fosiki.com

Paul Harvey

May 23, 2011, 11:56:15 PM
to mongodb-user
It turns out that the v8 build of
ff53662a8f44862ca9430a7a6e7e83904e9d7b1d is no better (against
libv8-2.5.9.9 anyway).

I've reverted back to 1.8.1, scheduled a cron job to restart our
mongod's & apache every morning... hoping we can ride this out until
SERVER-3131 or at least SERVER-3012 are fixed :-(

- Paul

Nat

May 24, 2011, 12:04:17 AM
to mongod...@googlegroups.com
Paul,

If you restart only apache but not mongod, did it help at all?

Sven Dowideit

May 24, 2011, 12:56:39 AM
to mongod...@googlegroups.com

mmm, wow -

Nat, what are you suggesting? That the MongoDB JS engine's out-of-memory
state gets reset if we close and reopen the connection?

If that really is an option, can't we just make our fastCGI backends
have a shorter life?

Sven

On 24/05/11 14:04, Nat wrote:
> Paul,
>
> If you restart only apache but not mongod, did it help at all?
>


--
________________________________________
Professional Wiki Innovation and Support
Enterprise Support Contracts for Foswiki
Sven Dowideit http://fosiki.com

Paul Harvey

May 24, 2011, 1:17:06 AM
to mongodb-user
Hi Nat,

Thanks for the suggestion. When impatient users are telling me the
site is broken, I just restart everything :-) I'll try apache first
next time.

- PH

Nat

May 24, 2011, 1:35:06 AM
to mongod...@googlegroups.com
If you see the same thing as I'm experiencing, restarting either should help. If not, restarting just mongod should help. Let me know then.
-----Original Message-----
From: Paul Harvey <Paul.W...@csiro.au>
Sender: mongod...@googlegroups.com
Date: Mon, 23 May 2011 22:17:06
To: mongodb-user<mongod...@googlegroups.com>
Reply-To: mongod...@googlegroups.com
Subject: [mongodb-user] Re: Replica secondaries keep "stopping"
after re-indexing system.js _id
