After upgrading Mongo to 2.3 on my local Windows dev machine, suddenly the replica set console windows report an assertion whenever any query is made against the DB:
TTLMonitor] problem detected during query over local.system.indexes : { $err: "assertion d:\slave\windows_64b it_v2.2\mongo\src\mongo\db\pdfile.h:360" }
[TTLMonitor] ERROR: error processing ttl for db: local 10065 invalid parameter: expected an object ()
Any idea what could cause this? All 3 nodes in teh RS are on the same machine and aside from these errors the queries themselves seem to succeed.
The [slaveTracking] message may be something different. Was there a stack trace following this line? This would help pin down where the error is coming from.
If there was no stack trace, could you try either restarting mongod.exe with an added "--traceExceptions" command line switch or issue a command in the shell to enable stack traces for all exceptions:
On Wednesday, October 17, 2012 11:14:20 AM UTC-4, Assaf Lavie wrote:
> After upgrading Mongo to 2.3 on my local Windows dev machine, suddenly the > replica set console windows report an assertion whenever any query is made > against the DB:
Running the recent nightly yield a different exception every second
(regardless of actual queries):
Sun Oct 21 09:59:38 [rsSyncNotifier] kernel32.dll BaseThreadInitThunk+0xd
Sun Oct 21 09:59:38 [rsSyncNotifier] replset tracking exception: exception:
0 assertion d:\slave\windows_64bit_v2.2\mongo\src\mongo\db\pdfile.h:360
Sun Oct 21 09:59:39 [rsSyncNotifier] replset setting oplog notifier to
localhost:27017
Sun Oct 21 09:59:39 [rsSyncNotifier] local.me Assertion failure isOk()
d:\slave\windows_64bit_v2.2\mongo\src\mongo\db\pdfile.h 360
However, this only happens on one of the secondary instances.
If I run mongod with --traceExceptions I don't see an exception every
second - I only see the following exceptions whenever a query is performed:
Sun Oct 21 10:02:56 [slaveTracking] update local.slaves query: { _id:
ObjectId('4fba5e0d0c61cd1166308cac'), host: "127.0.0.1", ns: "local.oplog.rs"
exception: Invalid BSONObj size: 0 (0x00000000) first element: EOO
code:10334 locks(micros) w:88764 88ms
Sun Oct 21 10:02:56 [slaveTracking] Assertion: 10334:Invalid BSONObj size:
0 (0x00000000) first element: EOO
Sun Oct 21 10:02:56 [slaveTracking] mongod.exe ???
...
Sun Oct 21 10:02:56 [slaveTracking] mongod.exe ???
Sun Oct 21 10:02:56 [slaveTracking] kernel32.dll BaseThreadInitThunk+0xd
Sun Oct 21 10:02:56 [slaveTracking] warning: DBException thrown :: caused
by :: 10334 Invalid BSONObj size: 0 (0x00000000) first element: EOO
> The [slaveTracking] message may be something different. Was there a stack
> trace following this line? This would help pin down where the error is
> coming from.
> If there was no stack trace, could you try either restarting mongod.exe
> with an added "--traceExceptions" command line switch or issue a command in
> the shell to enable stack traces for all exceptions:
> On Windows, you need to have the mongod.pdb file in the same directory as
> mongod.exe to get usable stack traces.
> Let us know is this works for you, thanks!
> Tad
> On Wednesday, October 17, 2012 11:14:20 AM UTC-4, Assaf Lavie wrote:
>> After upgrading Mongo to 2.3 on my local Windows dev machine, suddenly
>> the replica set console windows report an assertion whenever any query is
>> made against the DB:
>> TTLMonitor] problem detected during query over local.system.indexes : {
>> $err: "assertion d:\slave\windows_64b
>> it_v2.2\mongo\src\mongo\db\**pdfile.h:360" }
>> [TTLMonitor] ERROR: error processing ttl for db: local 10065 invalid
>> parameter: expected an object ()
>> Any idea what could cause this? All 3 nodes in teh RS are on the same
>> machine and aside from these errors the queries themselves seem to succeed.
>> Assaf
> --
> You received this message because you are subscribed to the Google
> Groups "mongodb-user" group.
> To post to this group, send email to mongodb-user@googlegroups.com
> To unsubscribe from this group, send email to
> mongodb-user+unsubscribe@googlegroups.com
> See also the IRC channel -- freenode.net#mongodb
The assertion pointing to pdfile.h line 360 is indicating corruption. This line verifies that an extent contains a valid signature, and the test is failing.
This was also happening in the earlier log snippet you posted, which I should have noticed, but I focused on the TTLMonitor aspect and missed the corruption part. The second log post adds another indication of corruption; Assertion: 10334:Invalid BSONObj size.
The collections that are named are in your 'local' database; 'local.me' and 'local.slaves'. It seems that both are corrupt.
The messages every second are triggered by a loop that tries to reconnect to the server that the secondary is trying to sync from. It is failing each time, so you get repeating messages. At log level 1 you would see an additional message; "replset could not connect to localhost:27017".
You are not getting a good stack trace (indicated by the "???" lines). Is mongod.pdb in the same directory as mongod.exe? It needs to be find-able by mongod.exe to get information about mongod.exe in the stack trace.
Has this server had hard shutdowns? Can you check the Windows Event Log for NTFS errors?
>> The [slaveTracking] message may be something different. Was there a >> stack trace following this line? This would help pin down where the error >> is coming from.
>> If there was no stack trace, could you try either restarting mongod.exe >> with an added "--traceExceptions" command line switch or issue a command in >> the shell to enable stack traces for all exceptions:
>> On Windows, you need to have the mongod.pdb file in the same directory as >> mongod.exe to get usable stack traces.
>> Let us know is this works for you, thanks!
>> Tad
>> On Wednesday, October 17, 2012 11:14:20 AM UTC-4, Assaf Lavie wrote:
>>> After upgrading Mongo to 2.3 on my local Windows dev machine, suddenly >>> the replica set console windows report an assertion whenever any query is >>> made against the DB:
>>> TTLMonitor] problem detected during query over local.system.indexes : { >>> $err: "assertion d:\slave\windows_64b >>> it_v2.2\mongo\src\mongo\db\**pdfile.h:360" }
>>> [TTLMonitor] ERROR: error processing ttl for db: local 10065 invalid >>> parameter: expected an object ()
>>> Any idea what could cause this? All 3 nodes in teh RS are on the same >>> machine and aside from these errors the queries themselves seem to succeed.
>>> Assaf
>> -- >> You received this message because you are subscribed to the Google >> Groups "mongodb-user" group. >> To post to this group, send email to mongod...@googlegroups.com<javascript:> >> To unsubscribe from this group, send email to >> mongodb-user...@googlegroups.com <javascript:> >> See also the IRC channel -- freenode.net#mongodb
I rebuilt the RS from a backup and the assertions are now gone, so it
definitely looks like corruption.
The mongod instances are running on my development Windows machine and are
usually killed off by closing their console windows. There haven't been
many hard shutdowns (e.g. power loss) at all - can't remember when one
happened (there's a UPS here).
In addition, the drive that contains the data is actually a mirrored RAID
drive, so the chances of HW problems are very slim indeed.
Nothing in Window's event log related to NTFS.
I should mention, though, that almost every time I re-launch the mongod's
of the RS I get the following:
preallocating a journal file d:/data/mongosets/set3/journal/prealloc.2
52428800/1073741824 4%
...
Which usually takes about a minute. This happens even now, after completely
rebuilding the RS.
On Sun, Oct 21, 2012 at 12:33 PM, Tad Marshall <t...@10gen.com> wrote:
> Hi Assaf,
> The assertion pointing to pdfile.h line 360 is indicating corruption.
> This line verifies that an extent contains a valid signature, and the test
> is failing.
> This was also happening in the earlier log snippet you posted, which I
> should have noticed, but I focused on the TTLMonitor aspect and missed the
> corruption part. The second log post adds another indication of
> corruption; Assertion: 10334:Invalid BSONObj size.
> The collections that are named are in your 'local' database; 'local.me'
> and 'local.slaves'. It seems that both are corrupt.
> The messages every second are triggered by a loop that tries to reconnect
> to the server that the secondary is trying to sync from. It is failing
> each time, so you get repeating messages. At log level 1 you would see an
> additional message; "replset could not connect to localhost:27017".
> You are not getting a good stack trace (indicated by the "???" lines). Is
> mongod.pdb in the same directory as mongod.exe? It needs to
> be find-able by mongod.exe to get information about mongod.exe in the stack
> trace.
> Has this server had hard shutdowns? Can you check the Windows Event Log
> for NTFS errors?
> Tad
> On Sunday, October 21, 2012 4:07:58 AM UTC-4, Assaf Lavie wrote:
>> Thanks Tad.
>> Running the recent nightly yield a different exception every second
>> (regardless of actual queries):
>> Sun Oct 21 09:59:38 [rsSyncNotifier] kernel32.dll
>> BaseThreadInitThunk+0xd
>> Sun Oct 21 09:59:38 [rsSyncNotifier] replset tracking exception:
>> exception: 0 assertion d:\slave\windows_64bit_v2.2\**
>> mongo\src\mongo\db\pdfile.h:**360
>> Sun Oct 21 09:59:39 [rsSyncNotifier] replset setting oplog notifier to
>> localhost:27017
>> Sun Oct 21 09:59:39 [rsSyncNotifier] local.me Assertion failure isOk()
>> d:\slave\windows_64bit_v2.2\**mongo\src\mongo\db\pdfile.h 360
>> However, this only happens on one of the secondary instances.
>> If I run mongod with --traceExceptions I don't see an exception every
>> second - I only see the following exceptions whenever a query is performed:
>> Sun Oct 21 10:02:56 [slaveTracking] update local.slaves query: { _id:
>> ObjectId('**4fba5e0d0c61cd1166308cac'), host: "127.0.0.1", ns: "
>> local.oplog.rs" } update: { $set: { syncedTo: Timestamp 1350806576000|1
>> } } keyUpdates:0 exception: Invalid BSONObj size: 0 (0x00000000) first
>> element: EOO code:10334 locks(micros) w:88764 88ms
>> Sun Oct 21 10:02:56 [slaveTracking] Assertion: 10334:Invalid BSONObj
>> size: 0 (0x00000000) first element: EOO
>> Sun Oct 21 10:02:56 [slaveTracking] mongod.exe ???
>> ...
>> Sun Oct 21 10:02:56 [slaveTracking] mongod.exe ???
>> Sun Oct 21 10:02:56 [slaveTracking] kernel32.dll
>> BaseThreadInitThunk+0xd
>> Sun Oct 21 10:02:56 [slaveTracking] warning: DBException thrown :: caused
>> by :: 10334 Invalid BSONObj size: 0 (0x00000000) first element: EOO
>> Assaf
>> On Fri, Oct 19, 2012 at 9:53 PM, Tad Marshall <t...@10gen.com> wrote:
>>> The [slaveTracking] message may be something different. Was there a
>>> stack trace following this line? This would help pin down where the error
>>> is coming from.
>>> If there was no stack trace, could you try either restarting mongod.exe
>>> with an added "--traceExceptions" command line switch or issue a command in
>>> the shell to enable stack traces for all exceptions:
>>> On Windows, you need to have the mongod.pdb file in the same directory
>>> as mongod.exe to get usable stack traces.
>>> Let us know is this works for you, thanks!
>>> Tad
>>> On Wednesday, October 17, 2012 11:14:20 AM UTC-4, Assaf Lavie wrote:
>>>> After upgrading Mongo to 2.3 on my local Windows dev machine, suddenly
>>>> the replica set console windows report an assertion whenever any query is
>>>> made against the DB:
>>>> TTLMonitor] problem detected during query over local.system.indexes : {
>>>> $err: "assertion d:\slave\windows_64b
>>>> it_v2.2\mongo\src\mongo\db\**pdf**ile.h:360" }
>>>> [TTLMonitor] ERROR: error processing ttl for db: local 10065 invalid
>>>> parameter: expected an object ()
>>>> Any idea what could cause this? All 3 nodes in teh RS are on the same
>>>> machine and aside from these errors the queries themselves seem to succeed.
>>>> Assaf
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "mongodb-user" group.
>>> To post to this group, send email to mongod...@googlegroups.com
>>> To unsubscribe from this group, send email to
>>> mongodb-user...@**googlegroups.com
>>> See also the IRC channel -- freenode.net#mongodb
>> --
> You received this message because you are subscribed to the Google
> Groups "mongodb-user" group.
> To post to this group, send email to mongodb-user@googlegroups.com
> To unsubscribe from this group, send email to
> mongodb-user+unsubscribe@googlegroups.com
> See also the IRC channel -- freenode.net#mongodb
On Sun, Oct 21, 2012 at 1:09 PM, Assaf Lavie <assafla...@gmail.com> wrote:
> Hi Tad,
> I rebuilt the RS from a backup and the assertions are now gone, so it
> definitely looks like corruption.
> The mongod instances are running on my development Windows machine and are
> usually killed off by closing their console windows. There haven't been
> many hard shutdowns (e.g. power loss) at all - can't remember when one
> happened (there's a UPS here).
> In addition, the drive that contains the data is actually a mirrored RAID
> drive, so the chances of HW problems are very slim indeed.
> Nothing in Window's event log related to NTFS.
> I should mention, though, that almost every time I re-launch the mongod's
> of the RS I get the following:
> Which usually takes about a minute. This happens even now, after
> completely rebuilding the RS.
> Assaf
> On Sun, Oct 21, 2012 at 12:33 PM, Tad Marshall <t...@10gen.com> wrote:
>> Hi Assaf,
>> The assertion pointing to pdfile.h line 360 is indicating corruption.
>> This line verifies that an extent contains a valid signature, and the test
>> is failing.
>> This was also happening in the earlier log snippet you posted, which I
>> should have noticed, but I focused on the TTLMonitor aspect and missed the
>> corruption part. The second log post adds another indication of
>> corruption; Assertion: 10334:Invalid BSONObj size.
>> The collections that are named are in your 'local' database; 'local.me'
>> and 'local.slaves'. It seems that both are corrupt.
>> The messages every second are triggered by a loop that tries to reconnect
>> to the server that the secondary is trying to sync from. It is failing
>> each time, so you get repeating messages. At log level 1 you would see an
>> additional message; "replset could not connect to localhost:27017".
>> You are not getting a good stack trace (indicated by the "???" lines).
>> Is mongod.pdb in the same directory as mongod.exe? It needs to
>> be find-able by mongod.exe to get information about mongod.exe in the stack
>> trace.
>> Has this server had hard shutdowns? Can you check the Windows Event Log
>> for NTFS errors?
>> Tad
>> On Sunday, October 21, 2012 4:07:58 AM UTC-4, Assaf Lavie wrote:
>>> Thanks Tad.
>>> Running the recent nightly yield a different exception every second
>>> (regardless of actual queries):
>>> Sun Oct 21 09:59:38 [rsSyncNotifier] kernel32.dll
>>> BaseThreadInitThunk+0xd
>>> Sun Oct 21 09:59:38 [rsSyncNotifier] replset tracking exception:
>>> exception: 0 assertion d:\slave\windows_64bit_v2.2\**
>>> mongo\src\mongo\db\pdfile.h:**360
>>> Sun Oct 21 09:59:39 [rsSyncNotifier] replset setting oplog notifier to
>>> localhost:27017
>>> Sun Oct 21 09:59:39 [rsSyncNotifier] local.me Assertion failure isOk()
>>> d:\slave\windows_64bit_v2.2\**mongo\src\mongo\db\pdfile.h 360
>>> However, this only happens on one of the secondary instances.
>>> If I run mongod with --traceExceptions I don't see an exception every
>>> second - I only see the following exceptions whenever a query is performed:
>>> Sun Oct 21 10:02:56 [slaveTracking] update local.slaves query: { _id:
>>> ObjectId('**4fba5e0d0c61cd1166308cac'), host: "127.0.0.1", ns: "
>>> local.oplog.rs" } update: { $set: { syncedTo: Timestamp 1350806576000|1
>>> } } keyUpdates:0 exception: Invalid BSONObj size: 0 (0x00000000) first
>>> element: EOO code:10334 locks(micros) w:88764 88ms
>>> Sun Oct 21 10:02:56 [slaveTracking] Assertion: 10334:Invalid BSONObj
>>> size: 0 (0x00000000) first element: EOO
>>> Sun Oct 21 10:02:56 [slaveTracking] mongod.exe ???
>>> ...
>>> Sun Oct 21 10:02:56 [slaveTracking] mongod.exe ???
>>> Sun Oct 21 10:02:56 [slaveTracking] kernel32.dll
>>> BaseThreadInitThunk+0xd
>>> Sun Oct 21 10:02:56 [slaveTracking] warning: DBException thrown ::
>>> caused by :: 10334 Invalid BSONObj size: 0 (0x00000000) first element: EOO
>>> Assaf
>>> On Fri, Oct 19, 2012 at 9:53 PM, Tad Marshall <t...@10gen.com> wrote:
>>>> The [slaveTracking] message may be something different. Was there a
>>>> stack trace following this line? This would help pin down where the error
>>>> is coming from.
>>>> If there was no stack trace, could you try either restarting mongod.exe
>>>> with an added "--traceExceptions" command line switch or issue a command in
>>>> the shell to enable stack traces for all exceptions:
>>>> On Windows, you need to have the mongod.pdb file in the same directory
>>>> as mongod.exe to get usable stack traces.
>>>> Let us know is this works for you, thanks!
>>>> Tad
>>>> On Wednesday, October 17, 2012 11:14:20 AM UTC-4, Assaf Lavie wrote:
>>>>> After upgrading Mongo to 2.3 on my local Windows dev machine, suddenly
>>>>> the replica set console windows report an assertion whenever any query is
>>>>> made against the DB:
>>>>> TTLMonitor] problem detected during query over local.system.indexes :
>>>>> { $err: "assertion d:\slave\windows_64b
>>>>> it_v2.2\mongo\src\mongo\db\**pdf**ile.h:360" }
>>>>> [TTLMonitor] ERROR: error processing ttl for db: local 10065 invalid
>>>>> parameter: expected an object ()
>>>>> Any idea what could cause this? All 3 nodes in teh RS are on the same
>>>>> machine and aside from these errors the queries themselves seem to succeed.
>>>>> Assaf
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "mongodb-user" group.
>>>> To post to this group, send email to mongod...@googlegroups.com
>>>> To unsubscribe from this group, send email to
>>>> mongodb-user...@**googlegroups.com
>>>> See also the IRC channel -- freenode.net#mongodb
>>> --
>> You received this message because you are subscribed to the Google
>> Groups "mongodb-user" group.
>> To post to this group, send email to mongodb-user@googlegroups.com
>> To unsubscribe from this group, send email to
>> mongodb-user+unsubscribe@googlegroups.com
>> See also the IRC channel -- freenode.net#mongodb
Closing the console window by clicking the red X or using Close from the system menu will give you a clean shutdown, so that's fine.
NTFS problems are not necessarily the same as disk hardware problems, though of course hardware problems can cause NTFS problems. We have seen one case of corruption that was accompanied by an NTFS error stating that NTFS was unable to flush buffers to disk, so it is worth looking at; thanks for checking.
mongod tests the speed of writing to a new file with the speed of overwriting the same file a second time and if the overwrite is faster it will preallocate journal files on the theory that this will make journaling faster. If it is taking one minute to allocate three 1 GB files, it looks like this heuristic isn't working right on your system. You can use the --nopreallocj option to disable preallocation of the journal files.
What is your RAID setup? Is it using Intel's RAID chipset and Rapid Storage Technology software, Windows' software RAID, or some other solution?
What version of Windows are you running?
You may need to have your current working directory be the directory that mongod.exe is in for the pdb file to be found. I just did some experimenting and that may be the missing ingredient. I'll post a Jira ticket to get that fixed.
On Sunday, October 21, 2012 7:11:18 AM UTC-4, Assaf Lavie wrote:
> And, yes, mongod.pdb is there (part of the download package you linked to).
> Assaf
> On Sun, Oct 21, 2012 at 1:09 PM, Assaf Lavie <assaf...@gmail.com<javascript:> > > wrote:
>> Hi Tad,
>> I rebuilt the RS from a backup and the assertions are now gone, so it >> definitely looks like corruption.
>> The mongod instances are running on my development Windows machine and >> are usually killed off by closing their console windows. There haven't been >> many hard shutdowns (e.g. power loss) at all - can't remember when one >> happened (there's a UPS here).
>> In addition, the drive that contains the data is actually a mirrored RAID >> drive, so the chances of HW problems are very slim indeed.
>> Nothing in Window's event log related to NTFS.
>> I should mention, though, that almost every time I re-launch the mongod's >> of the RS I get the following:
>> Which usually takes about a minute. This happens even now, after >> completely rebuilding the RS.
>> Assaf
>> On Sun, Oct 21, 2012 at 12:33 PM, Tad Marshall <t...@10gen.com<javascript:> >> > wrote:
>>> Hi Assaf,
>>> The assertion pointing to pdfile.h line 360 is indicating corruption. >>> This line verifies that an extent contains a valid signature, and the test >>> is failing.
>>> This was also happening in the earlier log snippet you posted, which I >>> should have noticed, but I focused on the TTLMonitor aspect and missed the >>> corruption part. The second log post adds another indication of >>> corruption; Assertion: 10334:Invalid BSONObj size.
>>> The collections that are named are in your 'local' database; 'local.me' >>> and 'local.slaves'. It seems that both are corrupt.
>>> The messages every second are triggered by a loop that tries to >>> reconnect to the server that the secondary is trying to sync from. It is >>> failing each time, so you get repeating messages. At log level 1 you would >>> see an additional message; "replset could not connect to localhost:27017".
>>> You are not getting a good stack trace (indicated by the "???" lines). >>> Is mongod.pdb in the same directory as mongod.exe? It needs to >>> be find-able by mongod.exe to get information about mongod.exe in the stack >>> trace.
>>> Has this server had hard shutdowns? Can you check the Windows Event Log >>> for NTFS errors?
>>> Tad
>>> On Sunday, October 21, 2012 4:07:58 AM UTC-4, Assaf Lavie wrote:
>>>> Thanks Tad.
>>>> Running the recent nightly yield a different exception every second >>>> (regardless of actual queries):
>>>> Sun Oct 21 09:59:38 [rsSyncNotifier] kernel32.dll >>>> BaseThreadInitThunk+0xd >>>> Sun Oct 21 09:59:38 [rsSyncNotifier] replset tracking exception: >>>> exception: 0 assertion d:\slave\windows_64bit_v2.2\** >>>> mongo\src\mongo\db\pdfile.h:**360 >>>> Sun Oct 21 09:59:39 [rsSyncNotifier] replset setting oplog notifier to >>>> localhost:27017 >>>> Sun Oct 21 09:59:39 [rsSyncNotifier] local.me Assertion failure >>>> isOk() d:\slave\windows_64bit_v2.2\**mongo\src\mongo\db\pdfile.h 360
>>>> However, this only happens on one of the secondary instances.
>>>> If I run mongod with --traceExceptions I don't see an exception every >>>> second - I only see the following exceptions whenever a query is performed:
>>>> Sun Oct 21 10:02:56 [slaveTracking] update local.slaves query: { _id: >>>> ObjectId('**4fba5e0d0c61cd1166308cac'), host: "127.0.0.1", ns: " >>>> local.oplog.rs" } update: { $set: { syncedTo: Timestamp >>>> 1350806576000|1 } } keyUpdates:0 exception: Invalid BSONObj size: 0 >>>> (0x00000000) first element: EOO code:10334 locks(micros) w:88764 88ms >>>> Sun Oct 21 10:02:56 [slaveTracking] Assertion: 10334:Invalid BSONObj >>>> size: 0 (0x00000000) first element: EOO >>>> Sun Oct 21 10:02:56 [slaveTracking] mongod.exe ??? >>>> ... >>>> Sun Oct 21 10:02:56 [slaveTracking] mongod.exe ??? >>>> Sun Oct 21 10:02:56 [slaveTracking] kernel32.dll >>>> BaseThreadInitThunk+0xd >>>> Sun Oct 21 10:02:56 [slaveTracking] warning: DBException thrown :: >>>> caused by :: 10334 Invalid BSONObj size: 0 (0x00000000) first element: EOO
>>>> Assaf
>>>> On Fri, Oct 19, 2012 at 9:53 PM, Tad Marshall <t...@10gen.com> wrote:
>>>>> The [slaveTracking] message may be something different. Was there a >>>>> stack trace following this line? This would help pin down where the error >>>>> is coming from.
>>>>> If there was no stack trace, could you try either restarting >>>>> mongod.exe with an added "--traceExceptions" command line switch or issue a >>>>> command in the shell to enable stack traces for all exceptions:
>>>>> On Windows, you need to have the mongod.pdb file in the same directory >>>>> as mongod.exe to get usable stack traces.
>>>>> Let us know is this works for you, thanks!
>>>>> Tad
>>>>> On Wednesday, October 17, 2012 11:14:20 AM UTC-4, Assaf Lavie wrote:
>>>>>> After upgrading Mongo to 2.3 on my local Windows dev machine, >>>>>> suddenly the replica set console windows report an assertion whenever any >>>>>> query is made against the DB:
>>>>>> TTLMonitor] problem detected during query over local.system.indexes : >>>>>> { $err: "assertion d:\slave\windows_64b >>>>>> it_v2.2\mongo\src\mongo\db\**pdf**ile.h:360" }
>>>>>> [TTLMonitor] ERROR: error processing ttl for db: local 10065 invalid >>>>>> parameter: expected an object ()
>>>>>> Any idea what could cause this? All 3 nodes in teh RS are on the same >>>>>> machine and aside from these errors the queries themselves seem to succeed.
>>>>>> Assaf
>>>>> -- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "mongodb-user" group. >>>>> To post to this group, send email to mongod...@googlegroups.com
>>>>> To unsubscribe from this group, send email to >>>>> mongodb-user...@**googlegroups.com
>>>>> See also the IRC channel -- freenode.net#mongodb
>>>> -- >>> You received this message because you are subscribed to the Google >>> Groups "mongodb-user" group. >>> To post to this group, send email to mongod...@googlegroups.com<javascript:> >>> To unsubscribe from this group, send email to >>> mongodb-user...@googlegroups.com <javascript:> >>> See also the IRC channel -- freenode.net#mongodb
I'm running Windows 7 64 bit, and using Windows' software RAID to mirror
two HDs.
Alas, I can't recreate the problem any more since I've rebuilt my RS, so
running from the bin directory is something I could try only next time
something similar happens.
On Sun, Oct 21, 2012 at 6:34 PM, Tad Marshall <t...@10gen.com> wrote:
> Hi Assaf,
> Closing the console window by clicking the red X or using Close from the
> system menu will give you a clean shutdown, so that's fine.
> NTFS problems are not necessarily the same as disk hardware problems,
> though of course hardware problems can cause NTFS problems. We have seen
> one case of corruption that was accompanied by an NTFS error stating that
> NTFS was unable to flush buffers to disk, so it is worth looking at; thanks
> for checking.
> mongod tests the speed of writing to a new file with the speed of
> overwriting the same file a second time and if the overwrite is faster it
> will preallocate journal files on the theory that this will make journaling
> faster. If it is taking one minute to allocate three 1 GB files, it looks
> like this heuristic isn't working right on your system. You can use the
> --nopreallocj option to disable preallocation of the journal files.
> What is your RAID setup? Is it using Intel's RAID chipset and Rapid
> Storage Technology software, Windows' software RAID, or some other solution?
> What version of Windows are you running?
> You may need to have your current working directory be the directory that
> mongod.exe is in for the pdb file to be found. I just did some
> experimenting and that may be the missing ingredient. I'll post a Jira
> ticket to get that fixed.
> Tad
> On Sunday, October 21, 2012 7:11:18 AM UTC-4, Assaf Lavie wrote:
>> And, yes, mongod.pdb is there (part of the download package you linked
>> to).
>> Assaf
>> On Sun, Oct 21, 2012 at 1:09 PM, Assaf Lavie <assaf...@gmail.com> wrote:
>>> Hi Tad,
>>> I rebuilt the RS from a backup and the assertions are now gone, so it
>>> definitely looks like corruption.
>>> The mongod instances are running on my development Windows machine and
>>> are usually killed off by closing their console windows. There haven't been
>>> many hard shutdowns (e.g. power loss) at all - can't remember when one
>>> happened (there's a UPS here).
>>> In addition, the drive that contains the data is actually a mirrored
>>> RAID drive, so the chances of HW problems are very slim indeed.
>>> Nothing in Window's event log related to NTFS.
>>> I should mention, though, that almost every time I re-launch the
>>> mongod's of the RS I get the following:
>>> Which usually takes about a minute. This happens even now, after
>>> completely rebuilding the RS.
>>> Assaf
>>> On Sun, Oct 21, 2012 at 12:33 PM, Tad Marshall <t...@10gen.com> wrote:
>>>> Hi Assaf,
>>>> The assertion pointing to pdfile.h line 360 is indicating corruption.
>>>> This line verifies that an extent contains a valid signature, and the test
>>>> is failing.
>>>> This was also happening in the earlier log snippet you posted, which I
>>>> should have noticed, but I focused on the TTLMonitor aspect and missed the
>>>> corruption part. The second log post adds another indication of
>>>> corruption; Assertion: 10334:Invalid BSONObj size.
>>>> The collections that are named are in your 'local' database; 'local.me'
>>>> and 'local.slaves'. It seems that both are corrupt.
>>>> The messages every second are triggered by a loop that tries to
>>>> reconnect to the server that the secondary is trying to sync from. It is
>>>> failing each time, so you get repeating messages. At log level 1 you would
>>>> see an additional message; "replset could not connect to localhost:27017".
>>>> You are not getting a good stack trace (indicated by the "???" lines).
>>>> Is mongod.pdb in the same directory as mongod.exe? It needs to
>>>> be find-able by mongod.exe to get information about mongod.exe in the stack
>>>> trace.
>>>> Has this server had hard shutdowns? Can you check the Windows Event
>>>> Log for NTFS errors?
>>>> Tad
>>>> On Sunday, October 21, 2012 4:07:58 AM UTC-4, Assaf Lavie wrote:
>>>>> Thanks Tad.
>>>>> Running the recent nightly yield a different exception every second
>>>>> (regardless of actual queries):
>>>>> Sun Oct 21 09:59:38 [rsSyncNotifier] kernel32.dll
>>>>> BaseThreadInitThunk+0xd
>>>>> Sun Oct 21 09:59:38 [rsSyncNotifier] replset tracking exception:
>>>>> exception: 0 assertion d:\slave\windows_64bit_v2.2\**mo**
>>>>> ngo\src\mongo\db\pdfile.h:**360
>>>>> Sun Oct 21 09:59:39 [rsSyncNotifier] replset setting oplog notifier to
>>>>> localhost:27017
>>>>> Sun Oct 21 09:59:39 [rsSyncNotifier] local.me Assertion failure
>>>>> isOk() d:\slave\windows_64bit_v2.2\**mo**ngo\src\mongo\db\pdfile.h 360
>>>>> However, this only happens on one of the secondary instances.
>>>>> If I run mongod with --traceExceptions I don't see an exception every
>>>>> second - I only see the following exceptions whenever a query is performed:
>>>>> Sun Oct 21 10:02:56 [slaveTracking] update local.slaves query: { _id:
>>>>> ObjectId('**4fba5e0d0c61cd116630**8cac'), host: "127.0.0.1", ns: "
>>>>> local.oplog.rs" } update: { $set: { syncedTo: Timestamp
>>>>> 1350806576000|1 } } keyUpdates:0 exception: Invalid BSONObj size: 0
>>>>> (0x00000000) first element: EOO code:10334 locks(micros) w:88764 88ms
>>>>> Sun Oct 21 10:02:56 [slaveTracking] Assertion: 10334:Invalid BSONObj
>>>>> size: 0 (0x00000000) first element: EOO
>>>>> Sun Oct 21 10:02:56 [slaveTracking] mongod.exe ???
>>>>> ...
>>>>> Sun Oct 21 10:02:56 [slaveTracking] mongod.exe ???
>>>>> Sun Oct 21 10:02:56 [slaveTracking] kernel32.dll
>>>>> BaseThreadInitThunk+0xd
>>>>> Sun Oct 21 10:02:56 [slaveTracking] warning: DBException thrown ::
>>>>> caused by :: 10334 Invalid BSONObj size: 0 (0x00000000) first element: EOO
>>>>> Assaf
>>>>> On Fri, Oct 19, 2012 at 9:53 PM, Tad Marshall <t...@10gen.com> wrote:
>>>>>> The [slaveTracking] message may be something different. Was there a
>>>>>> stack trace following this line? This would help pin down where the error
>>>>>> is coming from.
>>>>>> If there was no stack trace, could you try either restarting
>>>>>> mongod.exe with an added "--traceExceptions" command line switch or issue a
>>>>>> command in the shell to enable stack traces for all exceptions:
>>>>>> On Windows, you need to have the mongod.pdb file in the same
>>>>>> directory as mongod.exe to get usable stack traces.
>>>>>> Let us know is this works for you, thanks!
>>>>>> Tad
>>>>>> On Wednesday, October 17, 2012 11:14:20 AM UTC-4, Assaf Lavie wrote:
>>>>>>> After upgrading Mongo to 2.3 on my local Windows dev machine,
>>>>>>> suddenly the replica set console windows report an assertion whenever any
>>>>>>> query is made against the DB:
>>>>>>> TTLMonitor] problem detected during query over local.system.indexes
>>>>>>> : { $err: "assertion d:\slave\windows_64b
>>>>>>> it_v2.2\mongo\src\mongo\db\**pdf****ile.h:360" }
>>>>>>> [TTLMonitor] ERROR: error processing ttl for db: local 10065 invalid
>>>>>>> parameter: expected an object ()
>>>>>>> Any idea what could cause this? All 3 nodes in teh RS are on the
>>>>>>> same machine and aside from these errors the queries themselves seem to
>>>>>>> succeed.
>>>>>>> Assaf
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "mongodb-user" group.
>>>>>> To post to this group, send email to mongod...@googlegroups.com
>>>>>> To unsubscribe from this group, send email to
>>>>>> mongodb-user...@**googlegroups.**com
>>>>>> See also the IRC channel -- freenode.net#mongodb
>>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "mongodb-user" group.
>>>> To post to this group, send email to mongod...@googlegroups.com
>>>> To unsubscribe from this group, send email to
>>>> mongodb-user...@**googlegroups.com
>>>> See also the IRC channel -- freenode.net#mongodb
>> --
> You received this message because you are subscribed to the Google
> Groups "mongodb-user"
I filed https://jira.mongodb.org/browse/SERVER-7435 to get the Windows stack traces to work right when starting mongod.exe in a directory other than the current working directory.
I'll do some testing with Windows' software RAID and see if I can duplicate the corruption you saw. Was the corruption just on a secondary, and if so, had it been a primary at an earlier time? Are you using capped collections or only normal collections? How large are your disk drives? They are in RAID 1 mode (mirroring), correct? How large is your database, and does it consist of a single collection or some other number? When writing to the database, are you mostly inserting or do you do a lot of updates? If you do updates, are they mostly in-place updates that would not change the size of a record, or are they additions of new information that would force the records to move to a new location? How many indexes do you have on your collections? Do you index just simple fields, or do you have indexes on arrays?
I'm just probing for the parameters I should use in trying to reproduce your problem. Thanks in advance!
On Monday, October 22, 2012 1:40:35 AM UTC-4, Assaf Lavie wrote:
> Thanks for all the explanations.
> I'm running Windows 7 64 bit, and using Windows' software RAID to mirror > two HDs.
> Alas, I can't recreate the problem any more since I've rebuilt my RS, so > running from the bin directory is something I could try only next time > something similar happens.
> Thanks, > Assaf
> On Sun, Oct 21, 2012 at 6:34 PM, Tad Marshall <t...@10gen.com<javascript:> > > wrote:
>> Hi Assaf,
>> Closing the console window by clicking the red X or using Close from the >> system menu will give you a clean shutdown, so that's fine.
>> NTFS problems are not necessarily the same as disk hardware problems, >> though of course hardware problems can cause NTFS problems. We have seen >> one case of corruption that was accompanied by an NTFS error stating that >> NTFS was unable to flush buffers to disk, so it is worth looking at; thanks >> for checking.
>> mongod tests the speed of writing to a new file with the speed of >> overwriting the same file a second time and if the overwrite is faster it >> will preallocate journal files on the theory that this will make journaling >> faster. If it is taking one minute to allocate three 1 GB files, it looks >> like this heuristic isn't working right on your system. You can use the >> --nopreallocj option to disable preallocation of the journal files.
>> What is your RAID setup? Is it using Intel's RAID chipset and Rapid >> Storage Technology software, Windows' software RAID, or some other solution?
>> What version of Windows are you running?
>> You may need to have your current working directory be the directory that >> mongod.exe is in for the pdb file to be found. I just did some >> experimenting and that may be the missing ingredient. I'll post a Jira >> ticket to get that fixed.
>> Tad
>> On Sunday, October 21, 2012 7:11:18 AM UTC-4, Assaf Lavie wrote:
>>> And, yes, mongod.pdb is there (part of the download package you linked >>> to).
>>> Assaf
>>> On Sun, Oct 21, 2012 at 1:09 PM, Assaf Lavie <assaf...@gmail.com> wrote:
>>>> Hi Tad,
>>>> I rebuilt the RS from a backup and the assertions are now gone, so it >>>> definitely looks like corruption.
>>>> The mongod instances are running on my development Windows machine and >>>> are usually killed off by closing their console windows. There haven't been >>>> many hard shutdowns (e.g. power loss) at all - can't remember when one >>>> happened (there's a UPS here).
>>>> In addition, the drive that contains the data is actually a mirrored >>>> RAID drive, so the chances of HW problems are very slim indeed.
>>>> Nothing in Window's event log related to NTFS.
>>>> I should mention, though, that almost every time I re-launch the >>>> mongod's of the RS I get the following:
>>>> Which usually takes about a minute. This happens even now, after >>>> completely rebuilding the RS.
>>>> Assaf
>>>> On Sun, Oct 21, 2012 at 12:33 PM, Tad Marshall <t...@10gen.com> wrote:
>>>>> Hi Assaf,
>>>>> The assertion pointing to pdfile.h line 360 is indicating corruption. >>>>> This line verifies that an extent contains a valid signature, and the test >>>>> is failing.
>>>>> This was also happening in the earlier log snippet you posted, which I >>>>> should have noticed, but I focused on the TTLMonitor aspect and missed the >>>>> corruption part. The second log post adds another indication of >>>>> corruption; Assertion: 10334:Invalid BSONObj size.
>>>>> The collections that are named are in your 'local' database; 'local.me' >>>>> and 'local.slaves'. It seems that both are corrupt.
>>>>> The messages every second are triggered by a loop that tries to >>>>> reconnect to the server that the secondary is trying to sync from. It is >>>>> failing each time, so you get repeating messages. At log level 1 you would >>>>> see an additional message; "replset could not connect to localhost:27017".
>>>>> You are not getting a good stack trace (indicated by the "???" lines). >>>>> Is mongod.pdb in the same directory as mongod.exe? It needs to >>>>> be find-able by mongod.exe to get information about mongod.exe in the stack >>>>> trace.
>>>>> Has this server had hard shutdowns? Can you check the Windows Event >>>>> Log for NTFS errors?
>>>>> Tad
>>>>> On Sunday, October 21, 2012 4:07:58 AM UTC-4, Assaf Lavie wrote:
>>>>>> Thanks Tad.
>>>>>> Running the recent nightly yield a different exception every second >>>>>> (regardless of actual queries):
>>>>>> Sun Oct 21 09:59:38 [rsSyncNotifier] kernel32.dll >>>>>> BaseThreadInitThunk+0xd >>>>>> Sun Oct 21 09:59:38 [rsSyncNotifier] replset tracking exception: >>>>>> exception: 0 assertion d:\slave\windows_64bit_v2.2\**mo** >>>>>> ngo\src\mongo\db\pdfile.h:**360 >>>>>> Sun Oct 21 09:59:39 [rsSyncNotifier] replset setting oplog notifier >>>>>> to localhost:27017 >>>>>> Sun Oct 21 09:59:39 [rsSyncNotifier] local.me Assertion failure >>>>>> isOk() d:\slave\windows_64bit_v2.2\**mo**ngo\src\mongo\db\pdfile.h >>>>>> 360
>>>>>> However, this only happens on one of the secondary instances.
>>>>>> If I run mongod with --traceExceptions I don't see an exception every >>>>>> second - I only see the following exceptions whenever a query is performed:
>>>>>> Sun Oct 21 10:02:56 [slaveTracking] update local.slaves query: { _id: >>>>>> ObjectId('**4fba5e0d0c61cd116630**8cac'), host: "127.0.0.1", ns: " >>>>>> local.oplog.rs" } update: { $set: { syncedTo: Timestamp >>>>>> 1350806576000|1 } } keyUpdates:0 exception: Invalid BSONObj size: 0 >>>>>> (0x00000000) first element: EOO code:10334 locks(micros) w:88764 88ms >>>>>> Sun Oct 21 10:02:56 [slaveTracking] Assertion: 10334:Invalid BSONObj >>>>>> size: 0 (0x00000000) first element: EOO >>>>>> Sun Oct 21 10:02:56 [slaveTracking] mongod.exe ??? >>>>>> ... >>>>>> Sun Oct 21 10:02:56 [slaveTracking] mongod.exe ??? >>>>>> Sun Oct 21 10:02:56 [slaveTracking] kernel32.dll >>>>>> BaseThreadInitThunk+0xd >>>>>> Sun Oct 21 10:02:56 [slaveTracking] warning: DBException thrown :: >>>>>> caused by :: 10334 Invalid BSONObj size: 0 (0x00000000) first element: EOO
>>>>>> Assaf
>>>>>> On Fri, Oct 19, 2012 at 9:53 PM, Tad Marshall <t...@10gen.com> wrote:
>>>>>>> The [slaveTracking] message may be something different. Was there a >>>>>>> stack trace following this line? This would help pin down where the error >>>>>>> is coming from.
>>>>>>> If there was no stack trace, could you try either restarting >>>>>>> mongod.exe with an added "--traceExceptions" command line switch or issue a >>>>>>> command in the shell to enable stack traces for all exceptions:
>>>>>>> On Windows, you need to have the mongod.pdb file in the same >>>>>>> directory as mongod.exe to get usable stack traces.
>>>>>>> Let us know is this works for you, thanks!
>>>>>>> Tad
>>>>>>> On Wednesday, October 17, 2012 11:14:20 AM UTC-4, Assaf Lavie wrote:
>>>>>>>> After upgrading Mongo to 2.3 on my local Windows dev machine, >>>>>>>> suddenly the replica set console windows report an assertion whenever any >>>>>>>> query is made against the DB:
On Mon, Oct 22, 2012 at 2:19 PM, Tad Marshall <t...@10gen.com> wrote:
> Hi Assaf,
> I filed https://jira.mongodb.org/browse/SERVER-7435 to get the Windows
> stack traces to work right when starting mongod.exe in a directory other
> than the current working directory.
Appreciated.
> I'll do some testing with Windows' software RAID and see if I can
> duplicate the corruption you saw. Was the corruption just on a secondary,
> and if so, had it been a primary at an earlier time?
The errors I was getting were from secondary instances. However, I have no
idea where exactly the corruption was. All I know is that I deleted all
data directories (of all 3 nodes) and restored the data from a dump.
> Are you using capped collections or only normal collections?
Yes, some capped collections for logging as well.
> How large are your disk drives? They are in RAID 1 mode (mirroring),
> correct?
Yes. 2x2TB in a mirror raid (1).
> How large is your database, and does it consist of a single collection or
> some other number?
The data dump is about 4GB in size and is comprised of about 10
collections.
> When writing to the database, are you mostly inserting or do you do a lot
> of updates?
Very few updates.
> If you do updates, are they mostly in-place updates that would not change
> the size of a record, or are they additions of new information that would
> force the records to move to a new location?
We do update some documents (seldom), and when we do we might very well
change the size of the content in some field, but not by much. Only very
tiny documents are updated (few 1..2KB...)
> How many indexes do you have on your collections?
Typically no more than a few per collection. Most of the times just one
field indexed.
> Do you index just simple fields, or do you have indexes on arrays?
No indexes on arrays.
> I'm just probing for the parameters I should use in trying to reproduce
> your problem. Thanks in advance!
No problem. I appreciate the thorough investigation.
> On Monday, October 22, 2012 1:40:35 AM UTC-4, Assaf Lavie wrote:
>> Thanks for all the explanations.
>> I'm running Windows 7 64 bit, and using Windows' software RAID to mirror
>> two HDs.
>> Alas, I can't recreate the problem any more since I've rebuilt my RS, so
>> running from the bin directory is something I could try only next time
>> something similar happens.
>> Thanks,
>> Assaf
>> On Sun, Oct 21, 2012 at 6:34 PM, Tad Marshall <t...@10gen.com> wrote:
>>> Hi Assaf,
>>> Closing the console window by clicking the red X or using Close from the
>>> system menu will give you a clean shutdown, so that's fine.
>>> NTFS problems are not necessarily the same as disk hardware problems,
>>> though of course hardware problems can cause NTFS problems. We have seen
>>> one case of corruption that was accompanied by an NTFS error stating that
>>> NTFS was unable to flush buffers to disk, so it is worth looking at; thanks
>>> for checking.
>>> mongod tests the speed of writing to a new file with the speed of
>>> overwriting the same file a second time and if the overwrite is faster it
>>> will preallocate journal files on the theory that this will make journaling
>>> faster. If it is taking one minute to allocate three 1 GB files, it looks
>>> like this heuristic isn't working right on your system. You can use the
>>> --nopreallocj option to disable preallocation of the journal files.
>>> What is your RAID setup? Is it using Intel's RAID chipset and Rapid
>>> Storage Technology software, Windows' software RAID, or some other solution?
>>> What version of Windows are you running?
>>> You may need to have your current working directory be the directory
>>> that mongod.exe is in for the pdb file to be found. I just did some
>>> experimenting and that may be the missing ingredient. I'll post a Jira
>>> ticket to get that fixed.
>>> Tad
>>> On Sunday, October 21, 2012 7:11:18 AM UTC-4, Assaf Lavie wrote:
>>>> And, yes, mongod.pdb is there (part of the download package you linked
>>>> to).
>>>> Assaf
>>>> On Sun, Oct 21, 2012 at 1:09 PM, Assaf Lavie <assaf...@gmail.com>wrote:
>>>>> Hi Tad,
>>>>> I rebuilt the RS from a backup and the assertions are now gone, so it
>>>>> definitely looks like corruption.
>>>>> The mongod instances are running on my development Windows machine and
>>>>> are usually killed off by closing their console windows. There haven't been
>>>>> many hard shutdowns (e.g. power loss) at all - can't remember when one
>>>>> happened (there's a UPS here).
>>>>> In addition, the drive that contains the data is actually a mirrored
>>>>> RAID drive, so the chances of HW problems are very slim indeed.
>>>>> Nothing in Window's event log related to NTFS.
>>>>> I should mention, though, that almost every time I re-launch the
>>>>> mongod's of the RS I get the following:
>>>>> Which usually takes about a minute. This happens even now, after
>>>>> completely rebuilding the RS.
>>>>> Assaf
>>>>> On Sun, Oct 21, 2012 at 12:33 PM, Tad Marshall <t...@10gen.com> wrote:
>>>>>> Hi Assaf,
>>>>>> The assertion pointing to pdfile.h line 360 is indicating corruption.
>>>>>> This line verifies that an extent contains a valid signature, and the test
>>>>>> is failing.
>>>>>> This was also happening in the earlier log snippet you posted, which
>>>>>> I should have noticed, but I focused on the TTLMonitor aspect and missed
>>>>>> the corruption part. The second log post adds another indication of
>>>>>> corruption; Assertion: 10334:Invalid BSONObj size.
>>>>>> The collections that are named are in your 'local' database; '
>>>>>> local.me' and 'local.slaves'. It seems that both are corrupt.
>>>>>> The messages every second are triggered by a loop that tries to
>>>>>> reconnect to the server that the secondary is trying to sync from. It is
>>>>>> failing each time, so you get repeating messages. At log level 1 you would
>>>>>> see an additional message; "replset could not connect to localhost:27017".
>>>>>> You are not getting a good stack trace (indicated by the "???"
>>>>>> lines). Is mongod.pdb in the same directory as mongod.exe? It needs to
>>>>>> be find-able by mongod.exe to get information about mongod.exe in the stack
>>>>>> trace.
>>>>>> Has this server had hard shutdowns? Can you check the Windows Event
>>>>>> Log for NTFS errors?
>>>>>> Tad
>>>>>> On Sunday, October 21, 2012 4:07:58 AM UTC-4, Assaf Lavie wrote:
>>>>>>> Thanks Tad.
>>>>>>> Running the recent nightly yield a different exception every second
>>>>>>> (regardless of actual queries):
>>>>>>> Sun Oct 21 09:59:38 [rsSyncNotifier] kernel32.dll
>>>>>>> BaseThreadInitThunk+0xd
>>>>>>> Sun Oct 21 09:59:38 [rsSyncNotifier] replset tracking exception:
>>>>>>> exception: 0 assertion d:\slave\windows_64bit_v2.2\**mo****
>>>>>>> ngo\src\mongo\db\pdfile.h:**360
>>>>>>> Sun Oct 21 09:59:39 [rsSyncNotifier] replset setting oplog notifier
>>>>>>> to localhost:27017
>>>>>>> Sun Oct 21 09:59:39 [rsSyncNotifier] local.me Assertion failure
>>>>>>> isOk() d:\slave\windows_64bit_v2.2\**mo****ngo\src\mongo\db\pdfile.h
>>>>>>> 360
>>>>>>> However, this only happens on one of the secondary instances.
>>>>>>> If I run mongod with --traceExceptions I don't see an exception
>>>>>>> every second - I only see the following exceptions whenever a query is
>>>>>>> performed:
>>>>>>> Sun Oct 21 10:02:56 [slaveTracking] update local.slaves query: {
>>>>>>> _id: ObjectId('**4fba5e0d0c61cd116630****8cac'), host: "127.0.0.1",
>>>>>>> ns: "local.oplog.rs" } update: { $set: { syncedTo: Timestamp
>>>>>>> 1350806576000|1 } } keyUpdates:0 exception: Invalid BSONObj size: 0
>>>>>>> (0x00000000) first element: EOO code:10334 locks(micros) w:88764 88ms
>>>>>>> Sun Oct 21 10:02:56 [slaveTracking] Assertion: 10334:Invalid BSONObj
>>>>>>> size: 0 (0x00000000) first element: EOO
>>>>>>> Sun Oct 21 10:02:56 [slaveTracking] mongod.exe ???
>>>>>>> ...
>>>>>>> Sun Oct 21 10:02:56 [slaveTracking] mongod.exe ???
>>>>>>> Sun Oct 21 10:02:56 [slaveTracking] kernel32.dll
>>>>>>> BaseThreadInitThunk+0xd
>>>>>>> Sun Oct 21 10:02:56 [slaveTracking] warning: DBException thrown ::
>>>>>>> caused by :: 10334 Invalid BSONObj size: 0 (0x00000000) first element: EOO
>>>>>>> Assaf
>>>>>>> On Fri, Oct 19, 2012 at 9:53 PM, Tad Marshall <t...@10gen.com>wrote:
>>>>>>>> The [slaveTracking] message may be something different. Was there
>>>>>>>> a stack trace following this line? This would help pin down where the
>>>>>>>> error is coming from.
>>>>>>>> If there was no stack trace, could you try either restarting
>>>>>>>> mongod.exe with an added "--traceExceptions" command line switch or