Got signal: 7 (Bus error)


Markku

Jul 17, 2012, 3:36:53 AM
to mongod...@googlegroups.com
What causes the following crash?


ubuntu@ip-10-204-113-153:/ebs/trivian/scripts/web$ sudo mongod --version
db version v2.0.3, pdfile version 4.5
Tue Jul 17 07:36:18 git version: 05bb8aa793660af8fce7e36b510ad48c27439697

Tue Jul 17 07:28:54 [conn126315] profile: warning ns trivian.system.profile does not exist
Tue Jul 17 07:28:56 [snapshotthread] cpu: elapsed:4000  writelock: 9%
Tue Jul 17 07:29:00 [snapshotthread] cpu: elapsed:4000  writelock: 8%
Tue Jul 17 07:29:04 [snapshotthread] cpu: elapsed:4000  writelock: 8%
Tue Jul 17 07:29:08 [snapshotthread] cpu: elapsed:4000  writelock: 0%
Tue Jul 17 07:29:12 [snapshotthread] cpu: elapsed:4000  writelock: 0%
Tue Jul 17 07:29:14 [conn126516] profile: warning ns trivian.system.profile does not exist
Tue Jul 17 07:29:16 [snapshotthread] cpu: elapsed:4000  writelock: 0%
Tue Jul 17 07:29:20 [snapshotthread] cpu: elapsed:4005  writelock: 0%
Tue Jul 17 07:29:24 [snapshotthread] cpu: elapsed:4000  writelock: 0%
Tue Jul 17 07:29:28 [snapshotthread] cpu: elapsed:4000  writelock: 0%
Tue Jul 17 07:29:32 [snapshotthread] cpu: elapsed:4000  writelock: 0%
Tue Jul 17 07:29:36 [snapshotthread] cpu: elapsed:4000  writelock: 0%
Tue Jul 17 07:29:37 [conn126516] query trivian.places ntoreturn:100 nscanned:6711484 scanAndOrder:1 nreturned:100 reslen:17279 22242ms
Tue Jul 17 07:29:37 Invalid access at address: 0x7f6a16dfa094

Tue Jul 17 07:29:37 Got signal: 7 (Bus error).

Tue Jul 17 07:29:37 Backtrace:
0xa90d79 0xa91350 0x7f6a31172cb0 0x7799bc 0x7792b5 0x88e3ec 0xaa37d6 0x637497 0x7f6a3116ae9a 0x7f6a306884bd
 mongod(_ZN5mongo10abruptQuitEi+0x399) [0xa90d79]
 mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x220) [0xa91350]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f6a31172cb0]
 mongod(_ZN5mongo14NamespaceIndex7detailsEPKc+0x11c) [0x7799bc]
 mongod(_ZN5mongo7profileERKNS_6ClientERNS_5CurOpE+0xbf5) [0x7792b5]
 mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x9fc) [0x88e3ec]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x76) [0xaa37d6]
 mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x287) [0x637497]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f6a3116ae9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f6a306884bd]

Logstream::get called in uninitialized state
Tue Jul 17 07:29:37 ERROR: Client::~Client _context should be null but is not; client:conn
Logstream::get called in uninitialized state
Tue Jul 17 07:29:37 ERROR: Client::shutdown not called: conn
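
The mangled symbols in the backtrace above are easier to read once decoded. Here is a minimal sketch that pulls out only the leading namespace/class/function name of each frame. It assumes plain Itanium-ABI nested names of the form `_ZN<len><name>...E` and ignores argument types, templates and substitutions; `nested_name` and `frames_from_backtrace` are hypothetical helper names, and a real demangler such as `c++filt` is the better tool when it is available.

```python
import re

# Matches "(<mangled symbol>+0x<offset>)" as printed in the backtrace lines.
FRAME_RE = re.compile(r"\((_Z\w+)\+0x[0-9a-fA-F]+\)")


def nested_name(mangled):
    """Decode only the leading nested name of a symbol such as _ZN5mongo7profileE..."""
    if not mangled.startswith("_ZN"):
        return None
    pos, parts = 3, []
    # Each component is a decimal length followed by that many characters.
    while pos < len(mangled) and mangled[pos].isdigit():
        end = pos
        while end < len(mangled) and mangled[end].isdigit():
            end += 1
        length = int(mangled[pos:end])
        parts.append(mangled[end:end + length])
        pos = end + length
    return "::".join(parts) if parts else None


def frames_from_backtrace(text):
    """Return the decoded names of all mangled frames found in a backtrace dump."""
    names = (nested_name(symbol) for symbol in FRAME_RE.findall(text))
    return [name for name in names if name is not None]


if __name__ == "__main__":
    sample = (
        " mongod(_ZN5mongo14NamespaceIndex7detailsEPKc+0x11c) [0x7799bc]\n"
        " mongod(_ZN5mongo7profileERKNS_6ClientERNS_5CurOpE+0xbf5) [0x7792b5]\n"
    )
    for name in frames_from_backtrace(sample):
        print(name)
    # prints:
    # mongo::NamespaceIndex::details
    # mongo::profile
```

Read this way, the frames just below the signal handler suggest the fault happened in `NamespaceIndex::details`, called from `profile`.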

Markku

Jul 17, 2012, 3:38:58 AM
to mongod...@googlegroups.com
Nice.

Now MongoDB does not start up anymore.

Tue Jul 17 07:37:00 [initandlisten] MongoDB starting : pid=25677 port=27017 dbpath=/ebs/trivian/mongodb/data 64-bit host=ip-10-204-113-153
Tue Jul 17 07:37:00 [initandlisten] db version v2.0.3, pdfile version 4.5
Tue Jul 17 07:37:00 [initandlisten] git version: 05bb8aa793660af8fce7e36b510ad48c27439697
Tue Jul 17 07:37:00 [initandlisten] build info: Linux ip-10-110-9-236 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
Tue Jul 17 07:37:00 [initandlisten] options: { bind_ip: "127.0.0.1,10.244.18.96", cpu: true, dbpath: "/ebs/trivian/mongodb/data", fork: true, journal: true, logpath: "/ebs/trivian/mongodb/logs/mongodb.log", nohttpinterface: true, profile: 2, quiet: true, slowms: 100 }
Tue Jul 17 07:37:00 [initandlisten] journal dir=/ebs/trivian/mongodb/data/journal
Tue Jul 17 07:37:00 [initandlisten] recover begin
Tue Jul 17 07:37:00 [initandlisten] recover lsn: 45301907
Tue Jul 17 07:37:00 [initandlisten] recover /ebs/trivian/mongodb/data/journal/j._22
Tue Jul 17 07:37:00 [initandlisten] recover skipping application of section seq:42087140 < lsn:45301907
Tue Jul 17 07:37:00 [initandlisten] recover skipping application of section seq:42145461 < lsn:45301907
Tue Jul 17 07:37:01 [initandlisten] recover skipping application of section seq:42201781 < lsn:45301907
Tue Jul 17 07:37:01 [initandlisten] recover skipping application of section seq:42260565 < lsn:45301907
Tue Jul 17 07:37:01 [initandlisten] recover skipping application of section seq:42317114 < lsn:45301907
Tue Jul 17 07:37:02 [initandlisten] recover skipping application of section seq:42375803 < lsn:45301907
Tue Jul 17 07:37:02 [initandlisten] recover skipping application of section seq:42433694 < lsn:45301907
Tue Jul 17 07:37:03 [initandlisten] recover skipping application of section seq:42491068 < lsn:45301907
Tue Jul 17 07:37:03 [initandlisten] recover skipping application of section seq:42547245 < lsn:45301907
Tue Jul 17 07:37:03 [initandlisten] recover skipping application of section more...
Tue Jul 17 07:37:12 [initandlisten] recover /ebs/trivian/mongodb/data/journal/j._23
Tue Jul 17 07:37:17 [initandlisten] recover cleaning up
Tue Jul 17 07:37:17 [initandlisten] removeJournalFiles
Tue Jul 17 07:37:17 [initandlisten] recover done
Tue Jul 17 07:37:17 [initandlisten] profile: warning ns local.system.profile does not exist
Tue Jul 17 07:37:17 [initandlisten] bad .ns file: /ebs/trivian/mongodb/data/trivian.ns
Tue Jul 17 07:37:17 [initandlisten] exception in initAndListen: 10079 bad .ns file length, cannot open database, terminating
Tue Jul 17 07:37:17 dbexit:
Tue Jul 17 07:37:17 [initandlisten] shutdown: going to close listening sockets...
Tue Jul 17 07:37:17 [initandlisten] shutdown: going to flush diaglog...
Tue Jul 17 07:37:17 [initandlisten] shutdown: going to close sockets...
Tue Jul 17 07:37:17 [initandlisten] shutdown: waiting for fs preallocator...
Tue Jul 17 07:37:17 [initandlisten] shutdown: lock for final commit...
Tue Jul 17 07:37:17 [initandlisten] shutdown: final commit...
Tue Jul 17 07:37:17 [initandlisten] shutdown: closing all files...
Tue Jul 17 07:37:17 [initandlisten] closeAllFiles() finished
Tue Jul 17 07:37:17 [initandlisten] journalCleanup...
Tue Jul 17 07:37:17 [initandlisten] removeJournalFiles
Tue Jul 17 07:37:17 [initandlisten] shutdown: removing fs lock...
Tue Jul 17 07:37:17 dbexit: really exiting now

-Markku

Markku

Jul 17, 2012, 3:42:52 AM
to mongod...@googlegroups.com
Repair does not work.

ubuntu@ip-10-204-113-153:/ebs/trivian/scripts/web$ sudo mongod --repair --dbpath=/ebs/trivian/mongodb/data --quiet --nohttpinterface --profile=2 --slowms=100 --journal --cpu --fork --bind_ip 127.0.0.1,10.244.18.96 --logpath=/ebs/trivian/mongodb/logs/mongodb.lo
forked process: 25743
all output going to: /ebs/trivian/mongodb/logs/mongodb.lo
ubuntu@ip-10-204-113-153:/ebs/trivian/scripts/web$ sudo mongod --repair --dbpath=/ebs/trivian/mongodb/data
Tue Jul 17 07:42:06 [initandlisten] MongoDB starting : pid=25750 port=27017 dbpath=/ebs/trivian/mongodb/data 64-bit host=ip-10-204-113-153
Tue Jul 17 07:42:06 [initandlisten] db version v2.0.3, pdfile version 4.5
Tue Jul 17 07:42:06 [initandlisten] git version: 05bb8aa793660af8fce7e36b510ad48c27439697
Tue Jul 17 07:42:06 [initandlisten] build info: Linux ip-10-110-9-236 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
Tue Jul 17 07:42:06 [initandlisten] options: { dbpath: "/ebs/trivian/mongodb/data", repair: true }
Tue Jul 17 07:42:06 [initandlisten] journal dir=/ebs/trivian/mongodb/data/journal
Tue Jul 17 07:42:06 [initandlisten] recover : no journal files present, no recovery needed
Tue Jul 17 07:42:06 [initandlisten] bad .ns file: /ebs/trivian/mongodb/data/trivian.ns
Tue Jul 17 07:42:06 [initandlisten] exception in initAndListen: 10079 bad .ns file length, cannot open database, terminating
Tue Jul 17 07:42:06 dbexit:
Tue Jul 17 07:42:06 [initandlisten] shutdown: going to close listening sockets...
Tue Jul 17 07:42:06 [initandlisten] shutdown: going to flush diaglog...
Tue Jul 17 07:42:06 [initandlisten] shutdown: going to close sockets...
Tue Jul 17 07:42:06 [initandlisten] shutdown: waiting for fs preallocator...
Tue Jul 17 07:42:06 [initandlisten] shutdown: lock for final commit...
Tue Jul 17 07:42:06 [initandlisten] shutdown: final commit...
Tue Jul 17 07:42:06 [initandlisten] shutdown: closing all files...
Tue Jul 17 07:42:06 [initandlisten] closeAllFiles() finished
Tue Jul 17 07:42:06 [initandlisten] journalCleanup...
Tue Jul 17 07:42:06 [initandlisten] removeJournalFiles
Tue Jul 17 07:42:06 [initandlisten] shutdown: removing fs lock...
Tue Jul 17 07:42:06 dbexit: really exiting now

Markku

Jul 17, 2012, 3:48:14 AM
to mongod...@googlegroups.com
Anything suspicious here?

ubuntu@ip-10-204-113-153:~$ ls -la /ebs/trivian/mongodb/data/
total 18964548
drwxr-xr-x 3 root root       4096 Jul 16 18:17 .
drwxr-xr-x 4 root root       4096 Jul  3 12:44 ..
-rw------- 1 root root   67108864 Jul 16 13:27 *.0
drwxr-xr-x 2 root root       4096 Jul 17 07:42 journal
-rw------- 1 root root   67108864 Jul 12 12:25 local.0
-rw------- 1 root root   16777216 Jul 12 12:25 local.ns
-rwxr-xr-x 1 root root          0 Jul 17 07:42 mongod.lock
-rw------- 1 root root   16777216 Jul 12 17:24 *.ns
-rw------- 1 root root   67108864 Jul 16 20:59 trivian.0
-rw------- 1 root root  134217728 Jul 17 07:37 trivian.1
-rw------- 1 root root 2146435072 Jul 16 14:44 trivian.10
-rw------- 1 root root 2146435072 Jul 16 14:46 trivian.11
-rw------- 1 root root 2146435072 Jul 17 07:37 trivian.12
-rw------- 1 root root  268435456 Jul 16 14:46 trivian.2
-rw------- 1 root root  536870912 Jul 16 14:46 trivian.3
-rw------- 1 root root 1073741824 Jul 17 07:37 trivian.4
-rw------- 1 root root 2146435072 Jul 17 07:37 trivian.5
-rw------- 1 root root 2146435072 Jul 17 07:37 trivian.6
-rw------- 1 root root 2146435072 Jul 17 07:37 trivian.7
-rw------- 1 root root 2146435072 Jul 17 07:37 trivian.8
-rw------- 1 root root 2146435072 Jul 16 14:44 trivian.9
-rw------- 1 root root          2 Jul 17 07:29 trivian.ns

Sougata Pal.

Jul 17, 2012, 3:49:54 AM
to mongod...@googlegroups.com
Why is trivian.ns so small (2 bytes)?

--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb



--
Thanks
Sougata Pal.
Chief Architect, Techunits
http://in.linkedin.com/in/skallpaul

Markku

Jul 17, 2012, 6:17:01 AM
to mongod...@googlegroups.com
I don't know. Can someone explain this?

-Markku


markh

Jul 17, 2012, 9:25:05 AM
to mongod...@googlegroups.com
Hi,

Is this part of a replica set? If so, you could step this node down (assuming the data on the secondary is good), clear the data on the primary and perform a full resync. This would be quite simple.

On the other hand, looking at your "ls -la" output -

-rw------- 1 root root   16777216 Jul 12 17:24 *.ns
-rw------- 1 root root   67108864 Jul 16 20:59 trivian.0
........
-rw------- 1 root root 2146435072 Jul 16 14:44 trivian.9
-rw------- 1 root root          2 Jul 17 07:29 trivian.ns

It looks like a mistake may have been made earlier: the original trivian.ns appears to have been moved or renamed to "*.ns", which has the correct 16 MB size for a namespace file. Can you check whether that is what happened, or whether the file was created some other way?
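
The size check described above can be automated. This is a minimal sketch, assuming the default 16 MB (16777216-byte) namespace file size seen for local.ns in the listing; a server started with a custom `--nssize` would use a different size. `find_suspect_ns_files` is a hypothetical helper name, not part of MongoDB.

```python
import os
import tempfile

# Assumption: 16 MiB is the expected namespace file size (it matches local.ns
# in the listing); adjust if the server was started with a custom --nssize.
EXPECTED_NS_BYTES = 16 * 1024 * 1024


def find_suspect_ns_files(dbpath, expected=EXPECTED_NS_BYTES):
    """Return sorted (filename, size) pairs for .ns files whose size != expected."""
    suspects = []
    for name in os.listdir(dbpath):
        if name.endswith(".ns"):
            size = os.path.getsize(os.path.join(dbpath, name))
            if size != expected:
                suspects.append((name, size))
    return sorted(suspects)


if __name__ == "__main__":
    # Demo on a throwaway directory that mimics the listing: one healthy
    # 16 MiB file and one 2-byte file.
    with tempfile.TemporaryDirectory() as demo:
        with open(os.path.join(demo, "local.ns"), "wb") as f:
            f.truncate(EXPECTED_NS_BYTES)
        with open(os.path.join(demo, "trivian.ns"), "wb") as f:
            f.write(b"\x00\x00")
        print(find_suspect_ns_files(demo))  # [('trivian.ns', 2)]
```

Pointed at the real data directory, it would be called as `find_suspect_ns_files("/ebs/trivian/mongodb/data")`.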

I can recreate the issue manually by renaming the .ns file for one of my databases, and resolve it by renaming it back to the original. However, I'm not sure the same fix will work in your case.
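
A rough sketch of the rename-back step, with the damaged file set aside first so nothing is lost. It assumes mongod is stopped (the sketch does not check this) and that the stray file really is the original namespace file, which is unconfirmed in this thread. `restore_ns_file` and the `backup_dir` argument are hypothetical names introduced for illustration.

```python
import os
import shutil


def restore_ns_file(dbpath, stray_name, db_name, backup_dir):
    """Rename dbpath/stray_name to dbpath/<db_name>.ns.

    Any existing <db_name>.ns is first moved to backup_dir as <db_name>.ns.bad.
    Returns (target_path, backup_path); backup_path is None if nothing was moved.
    """
    if stray_name == db_name + ".ns":
        raise ValueError("stray file and target are the same file")
    stray = os.path.join(dbpath, stray_name)
    target = os.path.join(dbpath, db_name + ".ns")
    if not os.path.isfile(stray):
        raise FileNotFoundError(stray)
    os.makedirs(backup_dir, exist_ok=True)
    backup = None
    if os.path.exists(target):
        backup = os.path.join(backup_dir, db_name + ".ns.bad")
        if os.path.exists(backup):
            # Refuse to overwrite an earlier backup.
            raise FileExistsError(backup)
        shutil.move(target, backup)
    os.rename(stray, target)
    return target, backup
```

For the layout in this thread the call would look like `restore_ns_file("/ebs/trivian/mongodb/data", "*.ns", "trivian", backup_dir)`, where `backup_dir` is any directory outside the data path. Python treats `"*.ns"` as a literal file name here; in a shell the same name has to be quoted to keep it from being expanded as a glob.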

Mark

Markku

Jul 18, 2012, 5:06:20 AM
to mongod...@googlegroups.com
I have always wondered what this *.ns file is all about. Anyway, I have already removed the data folder. Next time I will try this trick. I am sure MongoDB will hang again soon. It has been super unreliable in the Amazon cloud.

-Markku

markh

Jul 18, 2012, 5:39:21 AM
to mongod...@googlegroups.com
One possibility is that it may have been created by mongodump - there's an old SERVER ticket here. This is very much hypothetical as we don't know what commands were run in the past.

Regarding MongoDB's unreliability in EC2, I'm a little confused because there are a lot of MongoDB users successfully running in the Amazon cloud infrastructure. Have you followed the various documentation on EC2 on the MongoDB website?
