date time sort problem

1,720 views
Skip to first unread message

Mason

unread,
Mar 7, 2012, 10:20:11 PM3/7/12
to mongodb-user
Hi,

I am using pymongo to sort a set of collection and only return a
subset of the results. The code looks like

start = 1
rows = self.members.find()\
.sort(online, -1)\
.skip(start)\
.limit(24)

Here is what the data look like
{ "profile_id" : 3, "online" : ISODate("2012-03-01T23:57:00.216Z") }
{ "profile_id" : 4, "online" : ISODate("2012-03-01T23:57:23.383Z") }
{ "profile_id" : 5, "online" : ISODate("2012-03-01T23:57:27.087Z") }
{ "profile_id" : 6, "online" : ISODate("2012-03-01T23:57:30.573Z") }
{ "profile_id" : 7, "online" : ISODate("2012-03-01T23:57:38.350Z") }
{ "profile_id" : 8, "online" : ISODate("2012-03-01T23:57:44.896Z") }
{ "profile_id" : 9, "online" : ISODate("2012-03-01T23:57:51.812Z") }
{ "profile_id" : 10, "online" : ISODate("2012-03-01T23:57:58.373Z") }
{ "profile_id" : 11, "online" : ISODate("2012-03-01T23:58:01.266Z") }
{ "profile_id" : 12, "online" : ISODate("2012-03-01T23:58:06.501Z") }
{ "profile_id" : 13, "online" : ISODate("2012-03-01T23:58:12.630Z") }
{ "profile_id" : 14, "online" : ISODate("2012-03-01T23:58:15.287Z") }
{ "profile_id" : 15, "online" : ISODate("2012-03-01T23:58:21.077Z") }
{ "profile_id" : 16, "online" : ISODate("2012-03-01T23:58:28.554Z") }
{ "profile_id" : 17, "online" : ISODate("2012-03-01T23:58:29.478Z") }
{ "profile_id" : 18, "online" : ISODate("2012-03-01T23:58:31.980Z") }
{ "profile_id" : 19, "online" : ISODate("2012-03-01T23:58:39.369Z") }
{ "profile_id" : 21, "online" : ISODate("2012-03-01T23:58:48.962Z") }
{ "profile_id" : 20, "online" : ISODate("2012-03-01T23:58:44.759Z") }

On the first set of result, everything looks normal, the online field
is sorted (ie the latest online time is 2012-03-03 11:30:17.664000 and
the oldest 2012-03-02 16:53:23.698000) . However, when I try to get
the next page of result by changing start to 24, I go results that are
still in the time range of 2012-03-03 11:30:17.664000 and 2012-03-02
16:53:23.698000. Some of the results show up on 2nd set even though
their online datetime is inside the range of the first result and last
result of the first page. I tried to add an index to 'online', but
that didn't solve my issue. My understanding of find() is you can get
a set of results, sort it and based on your skip, it omit the first
'skip' number of results. Any pointer on what my issue might be?

Thanks,
Mason

Mason

unread,
Mar 7, 2012, 10:39:56 PM3/7/12
to mongodb-user
btw, this issue only show up when I have the sort('online': -1). If I
don't include that, I won't see duplicate results and result that are
supposed to be in page 1 (based on online time ordering) show up in
page 2

Nat

unread,
Mar 7, 2012, 11:08:54 PM3/7/12
to mongod...@googlegroups.com
Can you attach explain plan and indicate version you use as well?

Mason

unread,
Mar 7, 2012, 11:43:37 PM3/7/12
to mongodb-user
I am using mongo 2.0.2. Is there a way that I can see what the query
looks like when using pymongo? For example, in a log file? I can
regenerate what I have in the shell, but would like to see what the
query actually is

Dan Crosta

unread,
Mar 8, 2012, 12:03:39 AM3/8/12
to mongod...@googlegroups.com
On Mar 7, 2012, at 10:20 PM, Mason wrote:
            rows = self.members.find()\
                                     .sort(online, -1)\
                                     .skip(start)\
                                     .limit(24)

What is the value of the `online` variable here? If it's anything other than the string "online", this probably isn't doing what you expect, and could explain why you're not seeing things sorted in the right order. Try:

    self.members.find().sort("online", -1).skip(start).limit(24)

You should also know that for large numbers, skip() can be slow, as it will force MongoDB to iterate through the result set on the server side before it can begin returning results to you. If you are only skipping up to a few hundred results, you probably won't notice a difference, but once you get to thousands or more, it can become prohibitively costly.

- Dan

Mason

unread,
Mar 8, 2012, 12:20:13 AM3/8/12
to mongodb-user
Hi Dan,

The 'online' is a datetime. It looks like "online" :
ISODate("2012-03-01T23:58:52.984Z") in mongo. I changed sort(online,
-1) to sort('online', -1), still the same problem. I am going to
limit skip() to no more than 1000 to keep things running

Thanks,
Mason

Dan Crosta

unread,
Mar 8, 2012, 12:29:22 AM3/8/12
to mongod...@googlegroups.com
Can you post the results of the first and second queries? I think the change to passing a string to sort() should have helped.

- Dan

> --
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
>

Mason

unread,
Mar 8, 2012, 12:41:43 AM3/8/12
to mongodb-user
I had online set to 'online' before, so I just use took it out and
used sort('online', 1) instead. Here is a bit more info

I ran the query in mongo shell. Below it got me a list of 20 results
order by online descending

PRIMARY> db.members.find({loc: {$near: {lat: 37.36883, lon:
-122.03635}, $maxDistance: 7.246376811594203}, sex: {$in: [2]}},
{online: 1, _id: 0, profile_id: 1}).sort({online: -1}).limit(20)
{ "online" : ISODate("2012-03-07T19:44:50.542Z"), "profile_id" :
203368 }
{ "online" : ISODate("2012-03-06T13:13:25.613Z"), "profile_id" :
1599080 }
{ "profile_id" : 1594412, "online" :
ISODate("2012-03-06T00:38:12.645Z") }
{ "profile_id" : 1578720, "online" :
ISODate("2012-03-06T00:18:04.414Z") }
{ "profile_id" : 1558740, "online" :
ISODate("2012-03-05T23:52:19.920Z") }
{ "profile_id" : 1469377, "online" :
ISODate("2012-03-05T21:57:23.946Z") }
{ "profile_id" : 1273984, "online" :
ISODate("2012-03-05T11:32:12.708Z") }
{ "profile_id" : 1199571, "online" :
ISODate("2012-03-05T06:25:16.338Z") }
{ "profile_id" : 1095081, "online" :
ISODate("2012-03-04T23:31:26.013Z") }
{ "profile_id" : 1037746, "online" :
ISODate("2012-03-04T19:45:16.794Z") }
{ "profile_id" : 649387, "online" :
ISODate("2012-03-03T19:06:11.979Z") }
{ "profile_id" : 524967, "online" :
ISODate("2012-03-03T11:30:17.664Z") }
{ "profile_id" : 511416, "online" :
ISODate("2012-03-03T10:40:51.435Z") }
{ "profile_id" : 511167, "online" :
ISODate("2012-03-03T10:39:53.159Z") }
{ "profile_id" : 510794, "online" :
ISODate("2012-03-03T10:38:32.797Z") }
{ "profile_id" : 480659, "online" :
ISODate("2012-03-03T08:47:41.846Z") }
{ "profile_id" : 444847, "online" :
ISODate("2012-03-03T06:32:47.830Z") }
{ "profile_id" : 420227, "online" :
ISODate("2012-03-03T05:03:22.089Z") }
{ "profile_id" : 350536, "online" :
ISODate("2012-03-03T00:54:25.091Z") }
{ "profile_id" : 217742, "online" :
ISODate("2012-03-02T16:59:22.217Z") }

And then, I ran the query again with .skip(5).limit(10) and got

PRIMARY> db.members.find({loc: {$near: {lat: 37.36883, lon:
-122.03635}, $maxDistance: 7.246376811594203}, sex: {$in: [2]}},
{online: 1, _id: 0, profile_id: 1}).sort({online:
-1}).skip(5).limit(10)
{ "profile_id" : 1199571, "online" :
ISODate("2012-03-05T06:25:16.338Z") }
{ "profile_id" : 1095081, "online" :
ISODate("2012-03-04T23:31:26.013Z") }
{ "profile_id" : 649387, "online" :
ISODate("2012-03-03T19:06:11.979Z") }
{ "profile_id" : 524967, "online" :
ISODate("2012-03-03T11:30:17.664Z") }
{ "profile_id" : 511416, "online" :
ISODate("2012-03-03T10:40:51.435Z") }
{ "profile_id" : 511167, "online" :
ISODate("2012-03-03T10:39:53.159Z") }
{ "profile_id" : 510794, "online" :
ISODate("2012-03-03T10:38:32.797Z") }
{ "profile_id" : 480659, "online" :
ISODate("2012-03-03T08:47:41.846Z") }
{ "profile_id" : 420227, "online" :
ISODate("2012-03-03T05:03:22.089Z") }
{ "profile_id" : 350536, "online" :
ISODate("2012-03-03T00:54:25.091Z") }

I expected it to skip the first 5 entires and return 10 entries start
from profile_id 1558740. However, it is not doing that. Also, of the
10 results returned, it skips 1037746 which is supposed to be the 3rd
entry returned from my list of 10

Mason

unread,
Mar 8, 2012, 12:43:39 AM3/8/12
to mongodb-user
btw, here are the explain() for the 2 queries

PRIMARY> db.members.find({loc: {$near: {lat: 37.36883, lon:
-122.03635}, $maxDistance: 7.246376811594203}, sex: {$in: [2]}},
{online: 1, _id: 0, profile_id: 1}).sort({online:
-1}).limit(20).explain()
{
"cursor" : "GeoSearchCursor",
"nscanned" : 20,
"nscannedObjects" : 20,
"n" : 20,
"scanAndOrder" : true,
"millis" : 81,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {

}
}
PRIMARY>

}
PRIMARY> rs.find({loc: {$near: {lat: 37.36883, lon: -122.03635},
$maxDistance: 7.246376811594203}, sex: {$in: [2]}}, {online: 1, _id:
0, profile_id: 1}).sort({online: -1}).skip(5).limit(10).explain()
{
"cursor" : "GeoSearchCursor",
"nscanned" : 15,
"nscannedObjects" : 15,
"n" : 15,
"scanAndOrder" : true,
"millis" : 76,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {

}
}

Nat

unread,
Mar 8, 2012, 1:00:33 AM3/8/12
to mongod...@googlegroups.com
Did you have the same problem if you don't use geo query?

Mason

unread,
Mar 8, 2012, 1:17:21 AM3/8/12
to mongodb-user
Seems to be okay without geo query. Is there something going on using
the geo query?

Mason

unread,
Mar 8, 2012, 1:20:31 AM3/8/12
to mongodb-user
I did another test with geo query. I expect it to skip the first 5
entries, but it is skipping a lot more. With geo query, is it using
more than 'online' to sort?

PRIMARY> db.members.find({loc: {$near: {lat: 37.36883, lon:
-122.03635}, $maxDistance: 7.246376811594203}, sex: {$in: [2]}},
PRIMARY> db.members.find({loc: {$near: {lat: 37.36883, lon:
-122.03635}, $maxDistance: 7.246376811594203}, sex: {$in: [2]}},
{online: 1, _id: 0, profile_id: 1}).sort({online:
-1}).skip(5).limit(5)
{ "profile_id" : 511167, "online" :
ISODate("2012-03-03T10:39:53.159Z") }
{ "profile_id" : 510794, "online" :
ISODate("2012-03-03T10:38:32.797Z") }
{ "profile_id" : 480659, "online" :
ISODate("2012-03-03T08:47:41.846Z") }
{ "profile_id" : 420227, "online" :
ISODate("2012-03-03T05:03:22.089Z") }
{ "profile_id" : 350536, "online" :
ISODate("2012-03-03T00:54:25.091Z") }

Nat

unread,
Mar 8, 2012, 1:36:25 AM3/8/12
to mongod...@googlegroups.com
Try using $within instead since $near doesn't make a lot of sense when you sort by something else other than distance.
-----Original Message-----
From: Mason <rhyme...@gmail.com>

Mason

unread,
Mar 8, 2012, 1:34:01 PM3/8/12
to mongodb-user
Thanks Nat, using '$within' fixed my problem. I was doing a
geospatial search using "$near". According to the doc, it returns me
results already sorted by distance. By applying .sort('online',
-1).skip(5).limit(10), does that mean I am got a subset of results
(the 6th to 16th docs) and only sort that by online now? I wasn't
doing a sort using online datetime on the entire set and only return
the 6th to 16th docs after the sort

Thanks,
Mason
Reply all
Reply to author
Forward
0 new messages