Can a query last over 14 days?

Shi Shei

unread,

Apr 5, 2012, 12:47:04 PM4/5/12

to mongodb-user

Since a few hours I'm profiling queries which take more time than 3
seconds.
According to mongo, the longest query took over 14 days (1271309097
ms):
{
"_id" : ObjectId("4f7dc7dc559a38abe85a91cd"),
"ts" : "2012-04-05T14:24:59Z",
"op" : "getmore",
"ns" : "offerStore.offer",
"query" : {
"missingSince" : null,
"shopId" : 5040
},
"cursorid" : NumberLong("1817499255547817100"),
"nreturned" : 68132,
"responseLength" : 3883544,
"millis" : 1271309097,
"client" : "#.#.#.#",
"user" : ""
}

Is this reliable?

We're running mongo Linux64 Bit 2.0.x on 4 Shards, each consisting of
3 nodes. Apart some peeks (which I'm hunting) the system seems not so
bad.
We're using MMS as well and couldn't see massive failures.

Mathias Stearn

unread,

Apr 5, 2012, 5:59:40 PM4/5/12

to mongod...@googlegroups.com

While it is theoretically possible for a query to take 14 days it is unlikely. Since we use end_time - start_time to compute duration, a more likely cause is that the system time was changed while the command was running.

Glenn Maynard

unread,

Apr 6, 2012, 11:40:34 AM4/6/12

to mongod...@googlegroups.com

On Thu, Apr 5, 2012 at 4:59 PM, Mathias Stearn <mat...@10gen.com> wrote:

While it is theoretically possible for a query to take 14 days it is unlikely. Since we use end_time - start_time to compute duration, a more likely cause is that the system time was changed while the command was running.

Don't use the system clock to calculate durations. That's what the monotonic timers are for; they're not affected by the system clock. See clock_gettime(CLOCK_MONOTONIC) and (when available) clock_gettime(CLOCK_MONOTONIC_RAW).

--
Glenn Maynard

Shi Shei

unread,

Apr 6, 2012, 12:09:40 PM4/6/12

to mongodb-user

I'm pretty sure that the system time did not change.
I rather think that it might be a mongo bug since I experienced a
similar issue where explain() told me that the query took "only" 45
seconds even though it took about 10 minutes. Also, during the query I
executed db.currentOp() which displayed already 453 seconds for my
query. So, mongo explain was definitively wrong.
In this thread I'm speaking of this phenomenon:
http://groups.google.com/group/nosql-databases/browse_thread/thread/e93fca6472faef76/5314bd1bc9114be9

Perhaps I should open an JIRA for this but I'm not sure how to
reproduce.

On Apr 6, 5:40 pm, Glenn Maynard <gl...@zewt.org> wrote:

Shi Shei

unread,

Apr 6, 2012, 12:13:21 PM4/6/12

to mongodb-user

This is the better link to the above mentioned thread:
http://groups.google.com/group/mongodb-user/browse_thread/thread/e93fca6472faef76/36db550250941c10

On Apr 6, 6:09 pm, Shi Shei <QTRAURFUI...@spammotel.com> wrote:
> I'm pretty sure that the system time did not change.
> I rather think that it might be a mongo bug since I experienced a
> similar issue where explain() told me that the query took "only" 45
> seconds even though it took about 10 minutes. Also, during the query I
> executed db.currentOp() which displayed already 453 seconds for my
> query. So, mongo explain was definitively wrong.

> In this thread I'm speaking of this phenomenon:http://groups.google.com/group/nosql-databases/browse_thread/thread/e...

Mathias Stearn

unread,

Apr 9, 2012, 11:49:08 AM4/9/12

to mongod...@googlegroups.com

What is the group and host name in MMS?

Shi Shei

unread,

Apr 9, 2012, 5:33:57 PM4/9/12

to mongodb-user

The group name is idealo.
The host name of one of the mongos is sx210:27018 which points to the
4 shards going from offerStore to offerStoreDE4.

On Apr 9, 5:49 pm, Mathias Stearn <math...@10gen.com> wrote:
> What is the group and host name in MMS?
>
>
>
>
>
>
>
> On Friday, April 6, 2012 12:13:21 PM UTC-4, Shi Shei wrote:
>
> > This is the better link to the above mentioned thread:
>

> >http://groups.google.com/group/mongodb-user/browse_thread/thread/e93f...

Shi Shei

unread,

Apr 10, 2012, 11:24:43 AM4/10/12

to mongodb-user

I wrote a script to copy system.profile collections from every mongod
into one globally accessible profiling collection. I added to each
document its mongod connection point to know on which node run the
query.

It turns out that I get more and more of these 14 days queries. Here
are the top 5:

mongos>
db.profiling.find().sort({millis:-1}).limit(5).forEach(printjson)
{
"_id" : ObjectId("4f844705559a774c0f1f1700"),
"ts" : "2012-04-10T14:10:09Z",
"op" : "query",
"ns" : "offerStore.offer",
"query" : {
"_id" : 858680314
},
"idhack" : true,
"responseLength" : 1393,
"millis" : 1271309469,
"client" : "#.#.#.#",
"user" : "",
"mongod" : "s132.ipx:27018"
}
{
"_id" : ObjectId("4f8446fc559a774c0f1f0693"),

"ts" : "2012-04-05T14:24:59Z",
"op" : "getmore",
"ns" : "offerStore.offer",
"query" : {
"missingSince" : null,
"shopId" : 5040
},
"cursorid" : NumberLong("1817499255547817100"),
"nreturned" : 68132,
"responseLength" : 3883544,
"millis" : 1271309097,
"client" : "#.#.#.#",

"user" : "",
"mongod" : "s143.ipx:27018"
}
{
"_id" : ObjectId("4f8446fa559a774c0f1f053d"),
"ts" : "2012-04-08T02:08:08Z",
"op" : "query",

"ns" : "offerStore.offer",
"query" : {
"missingSince" : null,

"categoryBokey" : "212795:StGQqkQ8AV45NElQbaFNfw"
},
"responseLength" : 20,
"millis" : 1271308447,
"client" : "#.#.#.#",
"user" : "",
"mongod" : "s126.ipx:27018"
}
{
"_id" : ObjectId("4f8446fb559a774c0f1f0644"),
"ts" : "2012-04-09T02:08:03Z",
"op" : "query",

"ns" : "offerStore.offer",
"query" : {
"missingSince" : null,

"categoryBokey" : "207283:07ofhAIn/kJWFuKDIlQYXQ"
},
"nscanned" : 8,
"nreturned" : 8,
"responseLength" : 514,
"millis" : 1271303702,
"client" : "#.#.#.#",
"user" : "",
"mongod" : "s131.ipx:27018"
}
{
"_id" : ObjectId("4f844701559a774c0f1f10d7"),
"ts" : "2012-04-08T02:04:02Z",

"op" : "getmore",
"ns" : "offerStore.offer",
"query" : {

"shopId" : 22307
},
"cursorid" : NumberLong("7522793855964665641"),
"nreturned" : 35053,
"responseLength" : 2303689,
"millis" : 1271303044,
"client" : "#.#.#.#",
"user" : "",
"mongod" : "s115.ipx:27018"
}

I verified in the system.profile collection of the concerned mongod if
the slow query really existed or if my script copied something wrong.
It seems that it really exists:

connecting to: s132:27018/offerStore
SECONDARY> rs.slaveOk()
SECONDARY> db.system.profile.findOne({"query._id":858680314})
{
"ts" : ISODate("2012-04-10T12:10:09.218Z"),
"op" : "query",
"ns" : "offerStore.offer",
"query" : {
"_id" : NumberLong(858680314)
},
"idhack" : true,
"responseLength" : 1393,
"millis" : 1271309469,
"client" : "#.#.#.#",
"user" : ""
}

As you can see, the timestamp is of today even though the shard run
well today.
Btw. is the timestamp the start or the end of the query?

Shi Shei

unread,

Apr 11, 2012, 3:47:49 PM4/11/12

to mongodb-user

Mathias Stearn, did you find something about?
Or somebody else?
Thank you!

Shi Shei

unread,

Apr 12, 2012, 6:35:47 AM4/12/12

to mongodb-user

I observed that all these long durations last more or less 1271303044
ms. Could it be that these queries were executed with a noTimeOut
option, so they would live "forever" (max ~1271303044 ms) on the mongo
server?

What happens when an application creates a noTimeOut cursor without
ever closing it? Does mongo terminate the cursor/query after a certain
time? Could it be an (unintentional) overflow of some mongo counter
which results in termination after around 1271303044 ms?

Mathias Stearn

unread,

Apr 26, 2012, 2:43:27 PM4/26/12

to mongod...@googlegroups.com

It looks like there is a bug somewhere, but after a quick examination of the codepath that generates those timings, nothing jumps out at me. Could you file a jira ticket to track this down?

Shi Shei

unread,

May 2, 2012, 12:10:58 PM5/2/12

to mongodb-user

Done: https://jira.mongodb.org/browse/SERVER-5745

Reply all

Reply to author

Forward