Re: [mongodb-user] can't get db response, not connected

707 views
Skip to first unread message

Jason Lewis

unread,
Nov 12, 2012, 11:19:48 PM11/12/12
to mongod...@googlegroups.com
It occurred to me after sending this that things were working fine until I upgraded to 2.2.1.  Is it possible that 2.2.1 introduced something that's causing issues?   

Is it possible to downgrade?  Did anything change that might not be backwards compatible?


On Mon, Nov 12, 2012 at 12:35 PM, Jason Lewis <jle...@packetnexus.com> wrote:
I have a 3 server replica set and I'm using the perl driver to load data in batches.  When attempting to insert the first batch of records, I get the following:   "can't get db response, not connected"

This was working fine and suddenly stopped working.  The logs show normal replica set operation, but I'm unsure where to start looking to troubleshoot.  I normall include all servers in the connect config, but thought maybe it was having issues with a slave, replacing the connect string with only the master doesn't change anything.

{
        "set" : "replset1",
        "date" : ISODate("2012-11-12T17:32:29Z"),
        "myState" : 2,
        "syncingTo" : "mongo2:27017",
        "members" : [
                {
                        "_id" : 0,
                        "name" : "mongo1:27017",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 18010,
                        "optime" : Timestamp(1352741357000, 2),
                        "optimeDate" : ISODate("2012-11-12T17:29:17Z"),
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "mongo2:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 18009,
                        "optime" : Timestamp(1352741357000, 2),
                        "optimeDate" : ISODate("2012-11-12T17:29:17Z"),
                        "lastHeartbeat" : ISODate("2012-11-12T17:32:28Z"),
                        "pingMs" : 99
                },
                {
                        "_id" : 2,
                        "name" : "mongo3:27017",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 18007,
                        "optime" : Timestamp(1352741357000, 2),
                        "optimeDate" : ISODate("2012-11-12T17:29:17Z"),
                        "lastHeartbeat" : ISODate("2012-11-12T17:32:26Z"),
                        "pingMs" : 91
                }
        ],
        "ok" : 1
}

Ideas?

--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb

Sam Helman

unread,
Nov 13, 2012, 1:14:53 PM11/13/12
to mongod...@googlegroups.com
It shouldn't have - what connection string are you using to connect to the replica set?  Did you upgrade the perl driver as well?

Jason Lewis

unread,
Nov 13, 2012, 5:42:26 PM11/13/12
to mongod...@googlegroups.com
my $conn = MongoDB::Connection->new( "host" => "mongodb://$mongo1:$port,$mongo2:$port,$mongo3:$port",
                                     "query_timeout" => "-1",
                                     )

Module: 0.501.1

I was wondering if I could enable debugging to see more info, but I haven't seen any debug options. Would verbose logging on mongodb give me anything more?

Sam Helman

unread,
Nov 13, 2012, 6:05:01 PM11/13/12
to mongod...@googlegroups.com
It might, but not if the connection isn't even reaching the dbs.  What happens if you try to connect via the shell, using the same parameters?

Jason Lewis

unread,
Nov 13, 2012, 7:15:23 PM11/13/12
to mongod...@googlegroups.com
I'm not sure what you mean?  Via telnet to the listening port?   Yes that works.


On Tue, Nov 13, 2012 at 6:05 PM, Sam Helman <sam.h...@10gen.com> wrote:
It might, but not if the connection isn't even reaching the dbs.  What happens if you try to connect via the shell, using the same parameters?

--

Jason Lewis

unread,
Nov 15, 2012, 8:13:16 AM11/15/12
to mongod...@googlegroups.com
I enabled verbose logging and I've now I'm getting the following in the logs:

Thu Nov 15 13:07:37 [conn291] Assertion: 10334:Invalid BSONObj size:
1767835392 (0x00075F69) first element: : ?type=100

0xaf8c41 0xabedb9 0xabef3c 0x571fb7 0x7abc20 0x7b2de1 0x56fe42
0xae6ed1 0x7f809dfe8e9a 0x7f809d2fbcbd

/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xaf8c41]

/usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x99) [0xabedb9]

/usr/bin/mongod() [0xabef3c]

/usr/bin/mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x497) [0x571fb7]

/usr/bin/mongod(_ZN5mongo14receivedInsertERNS_7MessageERNS_5CurOpE+0x200)
[0x7abc20]

/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x10a1)
[0x7b2de1]

/usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x82)
[0x56fe42]

/usr/bin/mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x411) [0xae6ed1]

/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f809dfe8e9a]

/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f809d2fbcbd]

Thu Nov 15 13:07:37 [conn291] insert Public.2012_10_data keyUpdates:0
exception: Invalid BSONObj size: 1767835392 (0x00075F69) first
element: : ?type=100 code:10334 0ms

Thu Nov 15 13:07:37 [conn291] recv(): message len 1767835392 is too
large1767835392


Would this cause the DB connection to go away, resulting in the "can't
get db response" error?

Sam Helman

unread,
Nov 16, 2012, 12:01:08 PM11/16/12
to mongod...@googlegroups.com
This is interesting, because it looks like corrupted data is being sent to the database.  Could you post the output of using the mongosniff utility (http://docs.mongodb.org/manual/reference/mongosniff/)
to sniff what the driver is sending to the database?

Jason Lewis

unread,
Nov 16, 2012, 4:30:13 PM11/16/12
to mongod...@googlegroups.com
I don't see a place to download or where it might be in the packages.
I'm using the 10gen ubuntu repo. Where can I find the actual binary?

Jason Lewis

unread,
Nov 17, 2012, 8:59:57 AM11/17/12
to mongod...@googlegroups.com
I ended up compiling from source to get mongosniff. I'm seeing a lot
of these warnings, so there may be a network issue. I'll have to
investigate.

Received too many bytes to complete message, resetting buffer
Received too many bytes to complete message, resetting buffer
Received too many bytes to complete message, resetting buffer
Warning: sequence # mismatch, there may be dropped packets
Invalid message start, skipping packet.
Warning: sequence # mismatch, there may be dropped packets
Invalid message start, skipping packet.
Invalid message start, skipping packet.
Invalid message start, skipping packet.
Invalid message start, skipping packet.
Invalid message start, skipping packet.
Warning: sequence # mismatch, there may be dropped packets
Warning: sequence # mismatch, there may be dropped packets
Warning: sequence # mismatch, there may be dropped packets
Warning: sequence # mismatch, there may be dropped packets

Jason Lewis

unread,
Nov 17, 2012, 10:29:15 PM11/17/12
to mongod...@googlegroups.com
I've been doing testing, trying to figure out what is causing issues.
I moved to a single, non-replica instance of mongo for testing.

I enabled BigInt based on some google searches and now I'm getting this:

SV = PVMG(0x4599410) at 0x38946f0
REFCNT = 1
FLAGS = (ROK)
IV = 0
NV = 0
RV = 0x458e848
SV = PVHV(0x4560ad8) at 0x458e848
REFCNT = 1
FLAGS = (OBJECT,OVERLOAD,SHAREKEYS)
STASH = 0x2c90e50 "Math::BigInt"
ARRAY = 0x4836e50 (0:6, 1:2)
hash quality = 125.0%
KEYS = 2
FILL = 2
MAX = 7
RITER = -1
EITER = 0x0
Elt "value" HASH = 0x1e720953
SV = RV(0x458ebb8) at 0x458eba8
REFCNT = 1
FLAGS = (ROK)
RV = 0x4586118
SV = PVAV(0x4542b38) at 0x4586118
REFCNT = 1
FLAGS = ()
ARRAY = 0x4836ea0
FILL = 0
MAX = 3
ARYLEN = 0x0
FLAGS = (REAL)
Elt No. 0
SV = NV(0x1f9bc28) at 0x458e500
REFCNT = 1
FLAGS = (NOK,pNOK)
NV = 0
Elt "sign" HASH = 0xe7455c8f
SV = PV(0x46203d8) at 0x4593340
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x4836cc0 "+"\0
CUR = 1
LEN = 8
PV = 0x458e848 ""
CUR = 0
LEN = 0
failed to fetch BigInt element at
/usr/local/lib/perl/5.10.1/MongoDB/Collection.pm line 299

What exactly has happened?

Jason Lewis

unread,
Nov 18, 2012, 12:49:18 PM11/18/12
to mongod...@googlegroups.com
I'm now using Math::BigInt and I'm not getting any errors or
disconnects. I'm not sure if that's the fix or it's a coincidence.

Sam Helman

unread,
Nov 19, 2012, 4:15:36 PM11/19/12
to mongod...@googlegroups.com
My guess is the perl driver wasn't able to serialize BigInts correctly - as long as the new version is working, go with it.

Jason Lewis

unread,
Nov 28, 2012, 9:05:21 PM11/28/12
to mongod...@googlegroups.com
I'm still seeing "can't get db response, not connected". I'm at a
loss as to what I'm doing to cause it. I'm loading documents via
batch insert in sets of 500.

IDK if this will be coherent, but here is the output of mongostat
during the batch load.

insert query update delete getmore command flushes mapped vsize
res faults locked db idx miss % qr|qw ar|aw
netIn netOut conn time

0 0 0 0 0 735 0 42.2g 84.6g
12.5g 0 local:0.0% 0 0|0
0|0 49k 254k 2 19:05:33

2 0 0 0 0 710 0 42.2g 84.6g
12.5g 0 local:0.0% 0 0|0
0|0 256k 246k 3 19:05:34

0 0 0 0 0 36 1 42.2g 84.6g
12.5g 0 Test_DB_2012:0.5% 0 0|0
0|1 3m 16k 3 19:05:35

3500 0 0 0 0 31 0 42.2g 84.6g
12.6g 0 Test_DB_2012:9.4% 0 0|0
0|1 3m 12k 3 19:05:36

12000 0 0 0 0 25 0 42.2g 84.6g
12.5g 0 Test_DB_2012:34.1% 0 0|0
0|0 2m 5k 3 19:05:37

8501 0 0 0 0 20 0 42.2g 84.6g
12.6g 0 Test_DB_2012:24.6% 0 0|0
0|0 1m 5k 3 19:05:38

12000 0 0 0 0 25 0 42.2g 84.6g
12.6g 0 Test_DB_2012:32.7% 0 0|0
0|0 2m 5k 3 19:05:39

12000 8 0 0 0 25 0 42.2g 84.6g
12.6g 0 Test_DB_2012:34.0% 0 0|0
0|1 2m 5k 3 19:05:40

12500 0 0 0 0 26 0 42.2g 84.6g
12.6g 0 Test_DB_2012:34.3% 0 0|0
0|0 2m 5k 3 19:05:41

11501 0 0 0 0 26 0 42.2g 84.6g
12.6g 0 Test_DB_2012:30.7% 0 0|0
0|0 2m 5k 3 19:05:42

insert query update delete getmore command flushes mapped vsize
res faults locked db idx miss % qr|qw ar|aw
netIn netOut conn time

12000 0 0 0 0 25 0 42.2g 84.6g
12.6g 0 Test_DB_2012:35.1% 0 1|0
0|1 2m 5k 3 19:05:43

12500 0 0 0 0 26 0 42.2g 84.6g
12.6g 0 Test_DB_2012:34.3% 0 0|0
0|0 2m 5k 3 19:05:44

12000 0 0 0 0 25 0 42.2g 84.6g
12.6g 0 Test_DB_2012:35.5% 0 0|0
0|1 2m 5k 3 19:05:45

11501 0 0 0 0 26 0 42.2g 84.6g
12.6g 0 Test_DB_2012:33.0% 0 0|0
0|0 2m 5k 3 19:05:46

11500 0 0 0 0 24 0 42.2g 84.6g
12.6g 0 Test_DB_2012:31.8% 0 0|0
0|0 2m 5k 3 19:05:47

12000 0 0 0 0 25 0 42.2g 84.6g
12.6g 0 Test_DB_2012:34.6% 0 0|0
0|0 2m 5k 3 19:05:48

11500 0 0 0 0 24 0 42.2g 84.6g
12.6g 0 Test_DB_2012:33.9% 0 0|0
0|1 2m 5k 3 19:05:49

12000 0 0 0 0 25 0 42.2g 84.6g
12.6g 0 Test_DB_2012:33.4% 0 0|0
0|0 2m 5k 3 19:05:50

11501 0 0 0 0 26 0 42.2g 84.6g
12.6g 0 Test_DB_2012:31.1% 0 0|0
0|0 2m 5k 3 19:05:51

12000 0 0 0 0 25 0 42.2g 84.6g
12.6g 0 Test_DB_2012:35.7% 0 0|0
0|1 2m 5k 3 19:05:52

insert query update delete getmore command flushes mapped vsize
res faults locked db idx miss % qr|qw ar|aw
netIn netOut conn time

12500 0 0 0 0 26 0 42.2g 84.6g
12.7g 0 Test_DB_2012:33.3% 0 0|0
0|1 2m 5k 3 19:05:53

11566 0 0 0 0 26 0 42.2g 84.6g
12.7g 0 Test_DB_2012:33.1% 0 0|0
0|0 2m 5k 3 19:05:54

11935 0 0 0 0 25 0 42.2g 84.6g
12.7g 1 Test_DB_2012:33.4% 0 0|0
0|0 2m 5k 3 19:05:55

12000 0 0 0 0 25 0 42.2g 84.6g
12.7g 0 Test_DB_2012:33.6% 0 0|0
0|1 2m 5k 3 19:05:56

12500 0 0 0 0 26 0 42.2g 84.6g
12.7g 0 Test_DB_2012:36.2% 0 0|0
0|0 2m 5k 3 19:05:57

12000 0 0 0 0 25 0 42.2g 84.6g
12.7g 0 Test_DB_2012:33.0% 0 0|0
0|1 2m 5k 3 19:05:58

11501 0 0 0 0 26 0 42.2g 84.6g
12.7g 0 Test_DB_2012:32.7% 0 0|0
0|1 2m 5k 3 19:05:59

12000 0 0 0 0 25 0 42.2g 84.6g
12.7g 0 Test_DB_2012:32.2% 0 0|0
0|0 2m 5k 3 19:06:00

11544 0 0 0 0 25 0 42.2g 84.6g
12.7g 0 Test_DB_2012:33.3% 0 0|0
0|0 2m 5k 3 19:06:01

11500 0 0 0 0 24 0 42.2g 84.6g
12.7g 0 Test_DB_2012:33.3% 0 0|0
0|0 2m 5k 3 19:06:02

insert query update delete getmore command flushes mapped vsize
res faults locked db idx miss % qr|qw ar|aw
netIn netOut conn time

11500 0 0 0 0 24 0 42.2g 84.6g
12.7g 0 Test_DB_2012:34.4% 0 0|0
0|0 2m 5k 3 19:06:03

11501 0 0 0 0 26 0 42.2g 84.6g
12.7g 0 Test_DB_2012:31.8% 0 0|0
0|0 2m 5k 3 19:06:04

12000 0 0 0 0 25 0 42.2g 84.6g
12.7g 0 Test_DB_2012:33.1% 0 0|0
0|0 2m 5k 3 19:06:05

12000 0 0 0 0 25 0 42.2g 84.6g
12.7g 1 Test_DB_2012:34.0% 0 0|0
0|0 2m 5k 3 19:06:06

12500 0 0 0 0 26 0 42.2g 84.6g
12.7g 0 Test_DB_2012:33.9% 0 0|0
0|0 2m 5k 3 19:06:07

10501 0 0 0 0 24 0 42.2g 84.6g
12.7g 0 Test_DB_2012:31.6% 0 0|0
0|0 2m 5k 3 19:06:08

2000 0 0 0 0 5 0 42.2g 84.6g
12.8g 0 Test_DB_2012:5.1% 0 0|0
0|1 425k 3k 3 19:06:11

500 0 0 0 0 2 0 42.2g 84.7g
12.7g 3 Test_DB_2012:87.3% 0 0|0
0|0 104k 3k 1 19:06:12

0 0 0 0 0 1 0 42.2g 84.7g
12.7g 0 local:0.0% 0 0|0
0|0 62b 3k 1 19:06:13

0 0 0 0 0 1 0 42.2g 84.7g
12.7g 0 local:0.0% 0 0|0
0|0 62b 3k 1 19:06:14

insert query update delete getmore command flushes mapped vsize
res faults locked db idx miss %

Sam Helman

unread,
Nov 29, 2012, 4:51:11 PM11/29/12
to mongod...@googlegroups.com
Could you try inserting some of the same batches of documents that created problems via the mongo shell instead?  That could help determine if it is a problem with the perl driver.

Jason Lewis

unread,
Nov 30, 2012, 1:34:15 PM11/30/12
to mongod...@googlegroups.com
I saw this thread originally, but didn't think much of it, because I'm
not seeing failover happen and I could replicate the behavior in a
standalone mongo test. Is it possible it *is* related to what I'm
seeing?

On Fri, Nov 30, 2012 at 2:44 AM, Zhihong Zhang
<zhihon...@trustgo.com> wrote:
> https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/4TDQKoo0Cd4
>
> https://jira.mongodb.org/browse/PERL-196
>
> In mongo_link.c, line 31, instead of
>
> tv.tv_sec = 1;
> it probably must be:
>
> tv.tv_sec = timeout;
>
> change, install manually.
> will fix.
>
>
>
> 在 2012年11月13日星期二UTC+8上午1时35分00秒,Jason Lewis写道:

Jason Lewis

unread,
Dec 1, 2012, 6:48:41 PM12/1/12
to mongod...@googlegroups.com
Well, I made the suggested timeout change, rebuilt the perl driver and
it appears to be working correctly.

I voted up the JIRA issue
Reply all
Reply to author
Forward
0 new messages