Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Assertion related to TTLMonitor when running RS on Windows 64bit
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  10 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Assaf Lavie  
View profile  
 More options Oct 17 2012, 11:14 am
From: Assaf Lavie <assafla...@gmail.com>
Date: Wed, 17 Oct 2012 08:14:19 -0700 (PDT)
Local: Wed, Oct 17 2012 11:14 am
Subject: Assertion related to TTLMonitor when running RS on Windows 64bit

After upgrading Mongo to 2.3 on my local Windows dev machine, suddenly the
replica set console windows report an assertion whenever any query is made
against the DB:

[slaveTracking] update local.slaves query: { _id:
ObjectId('4fba5ef4bc2c86368476beed'), host: "127.0.0.1", ns:
 "local.oplog.rs" } update: { $set: { syncedTo: Timestamp 1350486453000|1 }

} keyUpdates:0 exception: Invalid BSONObj size: 0 (0x0

0000000) first element: EOO code:10334 locks(micros) w:44430 44ms

[TTLMonitor] assertion 0 assertion
d:\slave\windows_64bit_v2.2\mongo\src\mongo\db\pdfile.h:360 ns:local.system
.indexes query:{ expireAfterSeconds: { $exists: true } }

TTLMonitor] problem detected during query over local.system.indexes : {
$err: "assertion d:\slave\windows_64b
it_v2.2\mongo\src\mongo\db\pdfile.h:360" }

[TTLMonitor] ERROR: error processing ttl for db: local 10065 invalid
parameter: expected an object ()

Any idea what could cause this? All 3 nodes in teh RS are on the same
machine and aside from these errors the queries themselves seem to succeed.

Assaf


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Tad Marshall  
View profile  
 More options Oct 19 2012, 3:53 pm
From: Tad Marshall <t...@10gen.com>
Date: Fri, 19 Oct 2012 12:53:09 -0700 (PDT)
Local: Fri, Oct 19 2012 3:53 pm
Subject: Re: Assertion related to TTLMonitor when running RS on Windows 64bit

Hi Assaf,

The error messages from TTLMonitor may be  
https://jira.mongodb.org/browse/SERVER-6911 , which is fixed in version
2.2.1-rc1. Version 2.2.1-rc1 is close to release but not released as of
this writing.  You could try a recent nightly build in the 2.2 branch (
http://downloads.mongodb.org/win32/mongodb-win32-x86_64-v2.2-latest.zip )
to see if it makes the TTLMonitor messages go away.

The [slaveTracking] message may be something different.  Was there a stack
trace following this line?  This would help pin down where the error is
coming from.

If there was no stack trace, could you try either restarting mongod.exe
with an added "--traceExceptions" command line switch or issue a command in
the shell to enable stack traces for all exceptions:

db.adminCommand( { setParameter: 1, traceExceptions: true} )

On Windows, you need to have the mongod.pdb file in the same directory as
mongod.exe to get usable stack traces.

Let us know is this works for you, thanks!

Tad


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Assaf Lavie  
View profile  
 More options Oct 21 2012, 4:07 am
From: Assaf Lavie <assafla...@gmail.com>
Date: Sun, 21 Oct 2012 10:07:00 +0200
Subject: Re: [mongodb-user] Re: Assertion related to TTLMonitor when running RS on Windows 64bit

Thanks Tad.

Running the recent nightly yield a different exception every second
(regardless of actual queries):

Sun Oct 21 09:59:38 [rsSyncNotifier] kernel32.dll    BaseThreadInitThunk+0xd
Sun Oct 21 09:59:38 [rsSyncNotifier] replset tracking exception: exception:
0 assertion d:\slave\windows_64bit_v2.2\mongo\src\mongo\db\pdfile.h:360
Sun Oct 21 09:59:39 [rsSyncNotifier] replset setting oplog notifier to
localhost:27017
Sun Oct 21 09:59:39 [rsSyncNotifier]  local.me Assertion failure isOk()
d:\slave\windows_64bit_v2.2\mongo\src\mongo\db\pdfile.h 360

However, this only happens on one of the secondary instances.

If I run mongod with --traceExceptions I don't see an exception every
second - I only see the following exceptions whenever a query is performed:

Sun Oct 21 10:02:56 [slaveTracking] update local.slaves query: { _id:
ObjectId('4fba5e0d0c61cd1166308cac'), host: "127.0.0.1", ns: "local.oplog.rs"

} update: { $set: { syncedTo: Timestamp 1350806576000|1 } } keyUpdates:0

exception: Invalid BSONObj size: 0 (0x00000000) first element: EOO
code:10334 locks(micros) w:88764 88ms
Sun Oct 21 10:02:56 [slaveTracking] Assertion: 10334:Invalid BSONObj size:
0 (0x00000000) first element: EOO
Sun Oct 21 10:02:56 [slaveTracking] mongod.exe      ???
...
Sun Oct 21 10:02:56 [slaveTracking] mongod.exe      ???
Sun Oct 21 10:02:56 [slaveTracking] kernel32.dll    BaseThreadInitThunk+0xd
Sun Oct 21 10:02:56 [slaveTracking] warning: DBException thrown :: caused
by :: 10334 Invalid BSONObj size: 0 (0x00000000) first element: EOO

Assaf


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Tad Marshall  
View profile  
 More options Oct 21 2012, 6:33 am
From: Tad Marshall <t...@10gen.com>
Date: Sun, 21 Oct 2012 03:33:06 -0700 (PDT)
Local: Sun, Oct 21 2012 6:33 am
Subject: Re: [mongodb-user] Re: Assertion related to TTLMonitor when running RS on Windows 64bit

Hi Assaf,

The assertion pointing to pdfile.h line 360 is indicating corruption.  This
line verifies that an extent contains a valid signature, and the test is
failing.

This was also happening in the earlier log snippet you posted, which I
should have noticed, but I focused on the TTLMonitor aspect and missed the
corruption part.  The second log post adds another indication of
corruption; Assertion: 10334:Invalid BSONObj size.

The collections that are named are in your 'local' database; 'local.me' and
'local.slaves'.  It seems that both are corrupt.

The messages every second are triggered by a loop that tries to reconnect
to the server that the secondary is trying to sync from.  It is failing
each time, so you get repeating messages.  At log level 1 you would see an
additional message; "replset could not connect to localhost:27017".

You are not getting a good stack trace (indicated by the "???" lines).  Is
mongod.pdb in the same directory as mongod.exe?  It needs to
be find-able by mongod.exe to get information about mongod.exe in the stack
trace.

Has this server had hard shutdowns?  Can you check the Windows Event Log
for NTFS errors?

Tad


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Assaf Lavie  
View profile  
 More options Oct 21 2012, 7:10 am
From: Assaf Lavie <assafla...@gmail.com>
Date: Sun, 21 Oct 2012 13:09:36 +0200
Local: Sun, Oct 21 2012 7:09 am
Subject: Re: [mongodb-user] Re: Assertion related to TTLMonitor when running RS on Windows 64bit

Hi Tad,

I rebuilt the RS from a backup and the assertions are now gone, so it
definitely looks like corruption.

The mongod instances are running on my development Windows machine and are
usually killed off by closing their console windows. There haven't been
many hard shutdowns (e.g. power loss) at all - can't remember when one
happened (there's a UPS here).

In addition, the drive that contains the data is actually a mirrored RAID
drive, so the chances of HW problems are very slim indeed.

Nothing in Window's event log related to NTFS.

I should mention, though, that almost every time I re-launch the mongod's
of the RS I get the following:

 preallocating a journal file d:/data/mongosets/set3/journal/prealloc.2
52428800/1073741824     4%
...

Which usually takes about a minute. This happens even now, after completely
rebuilding the RS.

Assaf


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Assaf Lavie  
View profile  
 More options Oct 21 2012, 7:11 am
From: Assaf Lavie <assafla...@gmail.com>
Date: Sun, 21 Oct 2012 13:10:24 +0200
Local: Sun, Oct 21 2012 7:10 am
Subject: Re: [mongodb-user] Re: Assertion related to TTLMonitor when running RS on Windows 64bit

And, yes, mongod.pdb is there (part of the download package you linked to).

Assaf


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Tad Marshall  
View profile  
 More options Oct 21 2012, 12:34 pm
From: Tad Marshall <t...@10gen.com>
Date: Sun, 21 Oct 2012 09:34:43 -0700 (PDT)
Local: Sun, Oct 21 2012 12:34 pm
Subject: Re: [mongodb-user] Re: Assertion related to TTLMonitor when running RS on Windows 64bit

Hi Assaf,

Closing the console window by clicking the red X or using Close from the
system menu will give you a clean shutdown, so that's fine.

NTFS problems are not necessarily the same as disk hardware problems,
though of course hardware problems can cause NTFS problems.  We have seen
one case of corruption that was accompanied by an NTFS error stating that
NTFS was unable to flush buffers to disk, so it is worth looking at; thanks
for checking.

mongod tests the speed of writing to a new file with the speed of
overwriting the same file a second time and if the overwrite is faster it
will preallocate journal files on the theory that this will make journaling
faster.  If it is taking one minute to allocate three 1 GB files, it looks
like this heuristic isn't working right on your system.  You can use the
--nopreallocj option to disable preallocation of the journal files.

What is your RAID setup?  Is it using Intel's RAID chipset and Rapid
Storage Technology software, Windows' software RAID, or some other solution?

What version of Windows are you running?

You may need to have your current working directory be the directory that
mongod.exe is in for the pdb file to be found.  I just did some
experimenting and that may be the missing ingredient.  I'll post a Jira
ticket to get that fixed.

Tad


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Assaf Lavie  
View profile  
 More options Oct 22 2012, 1:40 am
From: Assaf Lavie <assafla...@gmail.com>
Date: Mon, 22 Oct 2012 07:39:26 +0200
Local: Mon, Oct 22 2012 1:39 am
Subject: Re: [mongodb-user] Re: Assertion related to TTLMonitor when running RS on Windows 64bit

Thanks for all the explanations.

I'm running Windows 7 64 bit, and using Windows' software RAID to mirror
two HDs.

Alas, I can't recreate the problem any more since I've rebuilt my RS, so
running from the bin directory is something I could try only next time
something similar happens.

Thanks,
Assaf

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Tad Marshall  
View profile  
 More options Oct 22 2012, 8:19 am
From: Tad Marshall <t...@10gen.com>
Date: Mon, 22 Oct 2012 05:19:32 -0700 (PDT)
Local: Mon, Oct 22 2012 8:19 am
Subject: Re: [mongodb-user] Re: Assertion related to TTLMonitor when running RS on Windows 64bit

Hi Assaf,

I filed  https://jira.mongodb.org/browse/SERVER-7435  to get the Windows
stack traces to work right when starting mongod.exe in a directory other
than the current working directory.

I'll do some testing with Windows' software RAID and see if I can duplicate
the corruption you saw.  Was the corruption just on a secondary, and if so,
had it been a primary at an earlier time?  Are you using capped collections
or only normal collections?  How large are your disk drives?  They are in
RAID 1 mode (mirroring), correct?  How large is your database, and does it
consist of a single collection or some other number?  When writing to the
database, are you mostly inserting or do you do a lot of updates?  If you
do updates, are they mostly in-place updates that would not change the size
of a record, or are they additions of new information that would force the
records to move to a new location?  How many indexes do you have on your
collections?  Do you index just simple fields, or do you have indexes on
arrays?

I'm just probing for the parameters I should use in trying to reproduce
your problem.  Thanks in advance!

Tad

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Assaf Lavie  
View profile  
 More options Oct 22 2012, 8:55 am
From: Assaf Lavie <assafla...@gmail.com>
Date: Mon, 22 Oct 2012 14:54:55 +0200
Local: Mon, Oct 22 2012 8:54 am
Subject: Re: [mongodb-user] Re: Assertion related to TTLMonitor when running RS on Windows 64bit

Thanks, Tad.
I'll answer inline.

On Mon, Oct 22, 2012 at 2:19 PM, Tad Marshall <t...@10gen.com> wrote:
> Hi Assaf,

> I filed  https://jira.mongodb.org/browse/SERVER-7435  to get the Windows
> stack traces to work right when starting mongod.exe in a directory other
> than the current working directory.

Appreciated.

> I'll do some testing with Windows' software RAID and see if I can
> duplicate the corruption you saw.  Was the corruption just on a secondary,
> and if so, had it been a primary at an earlier time?

The errors I was getting were from secondary instances. However, I have no
idea where exactly the corruption was. All I know is that I deleted all
data directories (of all 3 nodes) and restored the data from a dump.

> Are you using capped collections or only normal collections?

Yes, some capped collections for logging as well.

> How large are your disk drives?  They are in RAID 1 mode (mirroring),
> correct?

Yes. 2x2TB in a mirror raid (1).

> How large is your database, and does it consist of a single collection or
> some other number?

The data dump is about 4GB in size and is comprised of about 10
collections.

> When writing to the database, are you mostly inserting or do you do a lot
> of updates?

Very few updates.

> If you do updates, are they mostly in-place updates that would not change
> the size of a record, or are they additions of new information that would
> force the records to move to a new location?

We do update some documents (seldom), and when we do we might very well
change the size of the content in some field, but not by much. Only very
tiny documents are updated (few 1..2KB...)

> How many indexes do you have on your collections?

Typically no more than a few per collection. Most of the times just one
field indexed.

> Do you index just simple fields, or do you have indexes on arrays?

No indexes on arrays.

> I'm just probing for the parameters I should use in trying to reproduce
> your problem.  Thanks in advance!

No problem. I appreciate the thorough investigation.

Assaf

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »