Perl MongoDB cursor->count always returning 0

687 views
Skip to first unread message

Sven Dowideit

unread,
Nov 4, 2010, 1:24:56 AM11/4/10
to mongod...@googlegroups.com, Kristina Chodorow
Hello there.

In working towards a MongoDB backend for Foswiki, I've come across something
that I hope is a bug.

When I call $cursor->count, I always get zero - has_next and next return
results, but count() does not.

Additionally, is $cursor->explain supposed to be destructive? Once I've run
it, $cursor->next returns a cursor, not an item :/

Cheers
Sven

Kristina Chodorow

unread,
Nov 4, 2010, 5:59:40 PM11/4/10
to mongod...@googlegroups.com, Sven Dowideit
What does $db->run_command({"count" => $collection_name}) return?  Does the count work in the shell?

$cursor->explain resets the cursor (I've made a note about that in the documentation about that).  It shouldn't destroy the cursor, though, it should just re-query the db if you do $cursor->next.  Unfortunately, it looks like explain doesn't clear its flags when it resets, so you'll have to say

undef $cursor->_query->{'$explain'};

before calling next if you don't want another explain (I just fixed this in master, so you won't have to in 0.40).

Sven Dowideit

unread,
Nov 4, 2010, 10:36:35 PM11/4/10
to Kristina Chodorow, mongod...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

excellent :) thanks for the fix for explain()

I've thrown together some output from insider foswiki - though I hope
this issue is not environmental :/


use Data::Dumper;
print STDERR "GET THE count for $collectionName:
".Dumper($database->run_command({"count" => $collectionName}))."\n";

returns

GET THE count for current: $VAR1 = {
'ok' => '1',
'n' => '79169'
};

from the console I get:

> db.current.find({ _web: "TemporarySEARCHTestWebSEARCH" },
{_topic:1}).sort({})
{ "_id" : ObjectId("4cd33224809feb0d31bc8215"), "_topic" : "OkATopic" }
{ "_id" : ObjectId("4cd33224809feb0d31bc8216"), "_topic" : "OkBTopic" }
{ "_id" : ObjectId("4cd33224809feb0d31bc8214"), "_topic" : "OkTopic" }
{ "_id" : ObjectId("4cd33224809feb0d31bc8213"), "_topic" :
"TestTopicSEARCH" }
{ "_id" : ObjectId("4cd33223809feb0d31bc8212"), "_topic" :
"WebPreferences" }
> db.current.find({ _web: "TemporarySEARCHTestWebSEARCH" },
{_topic:1}).sort({}).count()
5


on the other hand
print STDERR "searching mongo : ".Dumper($ixhQuery)." ,
".Dumper($queryAttrs)."\n" if MONITOR;
my $cursor = $collection->query($ixhQuery, $queryAttrs);
print STDERR "found " . $cursor->count . " _BUT_ has_next is
".($cursor->has_next()?'true':'false')."\n" if MONITOR;

results in

searching mongo : $VAR1 = bless( [
{
'_raw_text' => '1',
'_web' => '0'
},
[
'_web',
'_raw_text'
],
[
'TemporarySEARCHTestWebSEARCH',

qr/(?i-xsm:%META:TOPICPARENT[{].*name=\"(TemporarySEARCHTestWebSEARCH\.)?TestCaseAutoFormattedSearch\".*[}]%)/
],
0
], 'Tie::IxHash' );
, $VAR1 = {
'sort_by' => {
'_topic' => 1
}
};

found 0 _BUT_ has_next is true

Sven

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkzTNXAACgkQPAwzu0QrW+lojACeJFW7TuqDGYtDSVJ8tbuo9g25
yJEAnRIFjxgdjp+FE3RIOJyfyQHttT1w
=90iz
-----END PGP SIGNATURE-----

Kristina Chodorow

unread,
Nov 8, 2010, 10:49:58 PM11/8/10
to Sven Dowideit, mongod...@googlegroups.com
The expression you're running in the shell is not the same as the one you're running from Perl.  The equivalent expression in the shell is:

db.current.find({ _web: "TemporarySEARCHTestWebSEARCH", _topic:/someRegex/}).sort({topic:1}).count()

I might be mistaken, but it looks like your regex is TestCaseAutoFormattedSearch, which doesn't match any of the results displayed in your shell output (hence, count is 0).

Sven Dowideit

unread,
Nov 9, 2010, 3:03:12 AM11/9/10
to mongod...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Unfortunately, you're not right :)

the count returns zero, while has_next returns true (and then next()
returns the values i'm expecting)

the output you see is a Data::Dumper of an ixHash, so its a tad wonkey

Sven

iEYEARECAAYFAkzYuekACgkQPAwzu0QrW+mjBQCgnNyQK6byTLYK2mTvwhNljiYP
hKQAoLkpnOwgR1rk5A2g3xBKswvpCxH9
=B45I
-----END PGP SIGNATURE-----

Kristina Chodorow

unread,
Nov 12, 2010, 4:19:49 AM11/12/10
to mongod...@googlegroups.com
Whoops, you're correct. Could you run:

$db->run_command({"profile" => 2});
# do count
print Dumper($db->get_collection("system.profile")->find->all);

and paste it here?  (You should turn the profiling off again afterwards or it'll slow things down, you can do so with run_command({profile=>0}).)



--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.


Sven Dowideit

unread,
Dec 20, 2010, 3:46:20 AM12/20/10
to mongod...@googlegroups.com, Kristina Chodorow
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Argh,

I finally got back to this, and added the profiling.

before i did it, I was still experiencing the problem, and since turning
profile=>2, and then back again, count seems to be returning the right
number.

so... there is something odd happening, but I will probably have to
delete this mongo instance and start fresh again anyway, and then I'll
come back with more details (hopefully it was/is just a problem with
this old and upgraded db)


Cheers&Beers
Sven


On 12/11/10 14:19, Kristina Chodorow wrote:
> Whoops, you're correct. Could you run:
>
> $db->run_command({"profile" => 2});
> # do count
> print Dumper($db->get_collection("system.profile")->find->all);
>
> and paste it here? (You should turn the profiling off again afterwards or
> it'll slow things down, you can do so with run_command({profile=>0}).)
>
>
>>>

iEYEARECAAYFAk0O0YsACgkQPAwzu0QrW+n+jACeJG+VYqRWuH5/ySUnEfXZmrc0
6a0An3j26NQcCGxfbXkAOzvkszcn94FD
=ThyU
-----END PGP SIGNATURE-----

Sven Dowideit

unread,
Feb 25, 2011, 4:55:57 AM2/25/11
to mongod...@googlegroups.com, Kristina Chodorow
sadly, its still happening

> db.version()
1.6.5

and

MongoDB 0.39 0.42
KRISTINA/MongoDB-0.42.tar.gz


(additionally, there seems to be a missing dependency for 0.42 -
Try::Tiny isn't installed on this test box)


[Fri Feb 25 14:35:09 2011] [error] [client 192.168.1.51]
****** starting MongoDBPlugin..

----------------------------------------------------------------------------------
get_collection(system.profile)$VAR1 = {
'info' => 'query foswiki.system.profile reslen:36
nscanned:0
query: {} nreturned:0 bytes:20',
'millis' => '0',
'ts' => bless( {
'local_rd_secs' => 16480,
'local_rd_days' => 734193,
'rd_nanosecs' => 0,
'locale' => bless( {

'default_time_format_length' => 'medium',
'native_territory' =>
'United States',
'native_language' =>
'English',
'native_complete_name'
=> 'English United States',
'en_language' =>
'English',
'id' => 'en_US',

'default_date_format_length' => 'medium',
'en_complete_name' =>
'English United States',
'en_territory' =>
'United States'
},
'DateTime::Locale::en_US' ),
'local_c' => {
'hour' => 4,
'second' => 40,
'month' => 2,
'quarter' => 1,
'day_of_year' => 56,
'day_of_quarter' => 56,
'minute' => 34,
'day' => 25,
'day_of_week' => 5,
'year' => 2011
},
'utc_rd_secs' => 16480,
'formatter' => undef,
'tz' => bless( {
'name' => 'UTC'
},
'DateTime::TimeZone::UTC' ),
'utc_year' => 2012,
'utc_rd_days' => 734193,
'offset_modifier' => 0
}, 'DateTime' )
};
$VAR2 = {
'info' => 'query foswiki.$cmd ntoreturn:1 command: { profile:
0 } reslen:74 bytes:58',
'millis' => '0',
'ts' => bless( {
'local_rd_secs' => 16480,
'local_rd_days' => 734193,
'rd_nanosecs' => 0,
'locale' => $VAR1->{'ts'}{'locale'},
'local_c' => {
'hour' => 4,
'second' => 40,
'month' => 2,
'quarter' => 1,
'day_of_year' => 56,
'day_of_quarter' => 56,
'minute' => 34,
'day' => 25,
'day_of_week' => 5,
'year' => 2011
},
'utc_rd_secs' => 16480,
'formatter' => undef,
'tz' => bless( {
'name' => 'UTC'
},
'DateTime::TimeZone::UTC' ),
'utc_year' => 2012,
'utc_rd_days' => 734193,
'offset_modifier' => 0
}, 'DateTime' )
};
$VAR3 = {
'info' => 'query foswiki.system.profile reslen:280
nscanned:2
query: {} nreturned:2 bytes:264',
'millis' => '0',
'ts' => bless( {
'local_rd_secs' => 16507,
'local_rd_days' => 734193,
'rd_nanosecs' => 0,
'locale' => $VAR1->{'ts'}{'locale'},
'local_c' => {
'hour' => 4,
'second' => 7,
'month' => 2,
'quarter' => 1,
'day_of_year' => 56,
'day_of_quarter' => 56,
'minute' => 35,
'day' => 25,
'day_of_week' => 5,
'year' => 2011
},
'utc_rd_secs' => 16507,
'formatter' => undef,
'tz' => bless( {
'name' => 'UTC'
},
'DateTime::TimeZone::UTC' ),
'utc_year' => 2012,
'utc_rd_days' => 734193,
'offset_modifier' => 0
}, 'DateTime' )
};
$VAR4 = {
'info' => 'query foswiki.$cmd ntoreturn:1 command: { profile:
0 } reslen:74 bytes:58',
'millis' => '0',
'ts' => bless( {
'local_rd_secs' => 16507,
'local_rd_days' => 734193,
'rd_nanosecs' => 0,
'locale' => $VAR1->{'ts'}{'locale'},
'local_c' => {
'hour' => 4,
'second' => 7,
'month' => 2,
'quarter' => 1,
'day_of_year' => 56,
'day_of_quarter' => 56,
'minute' => 35,
'day' => 25,
'day_of_week' => 5,
'year' => 2011
},
'utc_rd_secs' => 16507,
'formatter' => undef,
'tz' => bless( {
'name' => 'UTC'
},
'DateTime::TimeZone::UTC' ),
'utc_year' => 2012,
'utc_rd_days' => 734193,
'offset_modifier' => 0
}, 'DateTime' )
};

----------------------------------------------------------------------------------
found 0
ERROR: cursor count == 0, but cursor->has_next is true

Sven Dowideit

unread,
Feb 25, 2011, 5:05:10 AM2/25/11
to mongod...@googlegroups.com, Kristina Chodorow
Issue is also still present after upgrading the driver to 0.42


basic code path is

#debugging for upstream
print STDERR
"----------------------------------------------------------------------------------\n";
my $connection = $self->_connect();
my $database = $connection->get_database( $self->{database} );
$database->run_command({"profile" => 2});

my $cursor = $collection->query( $ixhQuery, $queryAttrs );


print STDERR "found "
. $cursor->count
. " _BUT_ has_next is "
. ( $cursor->has_next() ? 'true' : 'false' ) . "\n"
if MONITOR;

#more debugging
print STDERR
"get_collection(system.profile)".Dumper($database->get_collection("system.profile")->find->all)."\n";
$database->run_command({"profile" => 0});
print STDERR
"----------------------------------------------------------------------------------\n";


and then later, to get an actualy count, I have to iterate over the
collection - slow, nasty and worse, means I have to iterate over the
large data twice - presumably hitting the MongoDB server twice.

I'm still presuming that I'm doing something wrong - so if you want to
poke the code, its all at

http://svn.foswiki.org/trunk/MongoDBPlugin/lib/

(though it is a small part of a rather large project)

Cheers& Beers
Sven

Kristina Chodorow

unread,
Feb 25, 2011, 6:37:27 PM2/25/11
to Sven Dowideit, mongod...@googlegroups.com
Weird, can you send the output of Perl -V?

You can always run $database->run_command({"count" => $collectionName, "query" => $ixhQuery}) to get the count, $cursor->count just calls that. 

Sven Dowideit

unread,
Feb 27, 2011, 5:46:01 AM2/27/11
to Kristina Chodorow, mongod...@googlegroups.com
interesting.

using the run_command does in fact get a number

found 0 (long_count = $VAR1 = {
'ok' => '1',
'n' => '95'
};
) _BUT_ has_next is true


I presume this is not something that is safe when running multiple
independent clients (as we are)

I've attached the perl -V - this is a reasonably clean debian system -
although by necessity I've used cpan to install the drivers and some
other dependencies.

Sven

perl_V.txt

Sven Dowideit

unread,
Feb 27, 2011, 6:03:59 AM2/27/11
to mongod...@googlegroups.com, Kristina Chodorow
and very very sadly, adding the run_command adds close to 200ms to the
query time - which is slower than just iterating over this small result
set twice.

(each of the 95 entries is a tad large, but retrieving the entire
document makes some sense in this context.

Sven

Kristina Chodorow

unread,
Feb 28, 2011, 8:38:53 PM2/28/11
to Sven Dowideit, mongod...@googlegroups.com
Running this command is actually identical to running $cursor->count() ($cursor->count just calls this command), if this command takes 200ms, so does $cursor->count. 

I have no idea why it would be turning 95 into 0, can you run:
use Devel::Peek;
Dump($db->run_command({count => ...}));

Sven Dowideit

unread,
Mar 2, 2011, 5:09:09 AM3/2/11
to Kristina Chodorow, mongod...@googlegroups.com
ok, I'm surprised - so you're saying that

my $cursor = $collection->query( $ixhQuery, $queryAttrs );

$cursor->count

causes the MongoDB server to go off and do the same query twice?
(intentionally????)

__HOWEVER__ it also appears not to be what is actually happening on my
server (more boring details after the Devel Dump)

----------------------------------------------------------------------------------
SV = RV(0xa8d2a5c) at 0xa8d2a50
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0xa8d2bf0
SV = PVHV(0xa9397b4) at 0xa8d2bf0
REFCNT = 1
FLAGS = (SHAREKEYS)
ARRAY = 0xa5db6c8 (0:6, 1:2)
hash quality = 125.0%
KEYS = 2
FILL = 2
MAX = 7
RITER = -1
EITER = 0x0
Elt "ok" HASH = 0x7b427928
SV = NV(0x96de648) at 0xa8d2a20
REFCNT = 1
FLAGS = (NOK,pNOK)
NV = 1
Elt "n" HASH = 0xd98832f1
SV = NV(0x96de5d8) at 0xa8d2860
REFCNT = 1
FLAGS = (NOK,pNOK)
NV = 95

----------------------------------------------------------------------------------

When the code says:
my $cursor = $collection->query( $ixhQuery, $queryAttrs );

my $real_count = $cursor->count;

$real_count is zero (the error we're tracking), and the mongodb.log has

Wed Mar 2 14:43:57 [initandlisten] connection accepted from
127.0.0.1:44183 #9
Wed Mar 2 14:43:57 [conn9] query foswiki.current reslen:377057
nscanned:25670 { orderby: { _topic: 1 }, query: { _web:
"Lauries/GlossaryData", FIELD.Base.value: /^.*para.*$/ } } nreturned:95
197ms
Wed Mar 2 14:43:58 [conn9] end connection 127.0.0.1:44183

whereas if I add the run_command:


my $cursor = $collection->query( $ixhQuery, $queryAttrs );

my $long_count = $database->run_command({"count" => 'current',
"query" => $ixhQuery});
my $real_count = $cursor->count;

real_count = zero and long_count =95 the log file has

Wed Mar 2 14:47:14 [initandlisten] connection accepted from
127.0.0.1:44184 #10
Wed Mar 2 14:47:14 [conn10] query foswiki.$cmd ntoreturn:1 command:
{ count: "current", query: { _web: "Lauries/GlossaryData",
FIELD.Base.value: /^.*para.*$/ } } reslen:64 192ms
Wed Mar 2 14:47:14 [conn10] query foswiki.current reslen:377057
nscanned:25670 { orderby: { _topic: 1 }, query: { _web:
"Lauries/GlossaryData", FIELD.Base.value: /^.*para.*$/ } } nreturned:95
197ms
Wed Mar 2 14:47:15 [conn10] end connection 127.0.0.1:44184


now given that at the time that the log of the intended query is
written, we know that n=95, why doesn't the perl driver?

Sven

Sven Dowideit

unread,
Mar 2, 2011, 5:20:48 AM3/2/11
to Kristina Chodorow, mongod...@googlegroups.com
I think we might have just looked at something i hope is irrelevant.

basically, I am expecting to be able to write

my $cursor = $collection->query( $ixhQuery, $queryAttrs );

and have the cursor _know_ the count without doing yet another query on
the server.

given that calling

my $long_count = $database->run_command({"count" => 'current',
"query" => $ixhQuery});

generates an almost identical log entry, with an almost identical
timeing (in the query i use for testing 200ms), I'm presuming that its
essentially re-running the same thing

so, do i simply have an incorrect presumption (and a $cursor->count that
spuriously returns 0 due to not calling the run command?)?


Sven


On Mon, 2011-02-28 at 15:38 -0500, Kristina Chodorow wrote:

Kristina Chodorow

unread,
Mar 2, 2011, 5:58:05 AM3/2/11
to Sven Dowideit, mongod...@googlegroups.com
Unfortunately, the cursor can't _know_ what the count is.  The cursor just returns a batch of results... what if a terabyte of data matches your query?  MongoDB won't return it all at once and so the cursor isn't going to know about it all at the same time.

Doing a count isn't really the same as doing the query a second time.  You might want to consider adding an index if it's too slow.


so, do i simply have an incorrect presumption (and a $cursor->count that
spuriously returns 0 due to not calling the run command?)?

Did your code possibly override $cursor->count?  You can look at the source code: https://github.com/mongodb/mongo-perl-driver/blob/master/lib/MongoDB/Cursor.pm#L435 is the line where $cursor->count runs the database command. 

Sven Dowideit

unread,
Mar 2, 2011, 7:00:52 AM3/2/11
to mongod...@googlegroups.com, Kristina Chodorow
perhaps count could and should know when you _don't_ have a terabyte of
matches

i'm quite willing to accept that there are some instances where it won't
know due to scale

or do you mean to say that database->query() actually does almost
nothing, and then the actual query evaluation happens during the
cursor->has_next call?

>RANT MODE -
wrt adding index's - turns out that mongo is pretty hopeless at large
scale adhoc queries on adhoc documents - as there is a (really tiny)
limit on the number of indexes, and it refuses to give any results if
there is no index and the match set is really big.

I don't have the luxury of adding indexes for everything, as mongodb
only provides a tiny set - and users are able to create queries on
anything, and order by anything

so coming back to something we can work on now:
1 there appears to be a bug in cursor->count
2 calling query and then run_command(count works, but each of those
commands takes the mongodb server ~200ms
* is this expected - if so, i'll use it and move on with other
work
3 I'm still curious why the mongodb.log knows how many results I have
(presuming that the match set isn't too massive), but the cursor does
not

interestingly, calling run_command(count twice incurs the same penalty -
so i presume there is some scope for optimisation there?

Wed Mar 2 14:54:45 [conn12] query foswiki.$cmd ntoreturn:1 command:


{ count: "current", query: { _web: "Lauries/GlossaryData",
FIELD.Base.value: /^.*para.*$/ } } reslen:64 192ms

Wed Mar 2 14:54:46 [conn12] query foswiki.current reslen:377057


nscanned:25670 { orderby: { _topic: 1 }, query: { _web:
"Lauries/GlossaryData", FIELD.Base.value: /^.*para.*$/ } } nreturned:95
197ms

Wed Mar 2 14:54:46 [conn12] query foswiki.$cmd ntoreturn:1 command:


{ count: "current", query: { _web: "Lauries/GlossaryData",

FIELD.Base.value: /^.*para.*$/ } } reslen:64 193ms


Sven

> --
> You received this message because you are subscribed to the Google
> Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user

> +unsub...@googlegroups.com.

Kristina Chodorow

unread,
Mar 2, 2011, 3:29:13 PM3/2/11
to Sven Dowideit, mongod...@googlegroups.com
perhaps count could and should know when you _don't_ have a terabyte of
matches

I could give you access to the number of documents returned if there were only one batch of results.  However, suppose you do a query, insert some new documents, and then do a count?  It seems counter-intuitive to have the new documents not show up in the count.

However, I'm willing to implement this as some separate function if people want.

or do you mean to say that database->query() actually does almost
nothing, and then the actual query evaluation happens during the
cursor->has_next call?

This is how it works, but somewhat unrelated to the problem you're having.  Queries are executed lazily.
 
>RANT MODE -
wrt adding index's - turns out that mongo is pretty hopeless at large
scale adhoc queries on adhoc documents - as there is a (really tiny)
limit on the number of indexes, and it refuses to give any results if
there is no index and the match set is really big.

I don't have the luxury of adding indexes for everything, as mongodb
only provides a tiny set - and users are able to create queries on
anything, and order by anything

This is true of any database... if you're doing random reads across a bigger data set than can fit in memory, you're going to get bound by disk seeks and paging.  If you can design your schema or application to have most queries hit certain fields or in a certain pattern, you can create an index and take advantage of that.
 
so coming back to something we can work on now:
  1 there appears to be a bug in cursor->count

Would you be willing to try a debug build of the driver?  I'm kind of mystified about this.
 
  2 calling query and then run_command(count works, but each of those
commands takes the mongodb server ~200ms
     * is this expected - if so, i'll use it and move on with other
work

It depends on your data... see my comment below about running explain().
 
  3 I'm still curious why the mongodb.log knows how many results I have
(presuming that the match set isn't too massive), but the cursor does
not

If you had, say 700 results, mongod wold say nreturned: 95 (or whatever fit in the first response), not the total 700.
 
interestingly, calling run_command(count twice incurs the same penalty -
so i presume there is some scope for optimisation there?

Wed Mar  2 14:54:45 [conn12] query foswiki.$cmd ntoreturn:1 command:
{ count: "current", query: { _web: "Lauries/GlossaryData",
FIELD.Base.value: /^.*para.*$/ } } reslen:64 192ms
Wed Mar  2 14:54:46 [conn12] query foswiki.current reslen:377057
nscanned:25670 { orderby: { _topic: 1 }, query: { _web:
"Lauries/GlossaryData", FIELD.Base.value: /^.*para.*$/ } }  nreturned:95
197ms
Wed Mar  2 14:54:46 [conn12] query foswiki.$cmd ntoreturn:1 command:
{ count: "current", query: { _web: "Lauries/GlossaryData",
FIELD.Base.value: /^.*para.*$/ } } reslen:64 193ms

It sounds like it just has to go through a lot of data. If you paste an explain() on the query, we can see if there are any optimizations that could be made.


 

Sven Dowideit

unread,
Mar 3, 2011, 4:03:24 AM3/3/11
to Kristina Chodorow, mongod...@googlegroups.com
oh my :)

to summarise - yes, I run debian 32bit on my test box, with 1.6 atm, I'm
willing to run a debug build, or whatever else we can to to try to track
this down..

Sven


On Wed, 2011-03-02 at 10:29 -0500, Kristina Chodorow wrote:
>
> perhaps count could and should know when you _don't_ have a
> terabyte of
> matches
>
> I could give you access to the number of documents returned if there
> were only one batch of results. However, suppose you do a query,
> insert some new documents, and then do a count? It seems
> counter-intuitive to have the new documents not show up in the count.
>
> However, I'm willing to implement this as some separate function if
> people want.

I would __love__ to have query return whatever count it has in the first
batch, and a boolean to say, this is all - or not (as an incomplete
number is still useful.

actually, if it could say 3 things
1 count of matches found so far
2 count of documents considered
3 count of documents not yet considered

then we'd have the ability to show meaningful info to the user / creator
of the adhoc query

ish. there is huge difference between what boring databases do - which
is to slowly generate a result, and what mongodb does - which is to
throw an error and say 'its too hard'

giving up is a problem for adhoc queries on adhoc document sets - there
is no way to pre-optimise - users are able to import _anything_ into the
system, and then make complex queries on any combination of that.

(and yes, i'm ok with this being something that is a longer term project
for mongodb - as I fully understand how complex a use case it is to deal
with - I've worked in that area before - in an enterprise 4GL sense

>
> so coming back to something we can work on now:
> 1 there appears to be a bug in cursor->count
>
> Would you be willing to try a debug build of the driver? I'm kind of
> mystified about this.

yes, I'm very willing to do anything that might help us make Mongo more
capable :)

>
> 2 calling query and then run_command(count works, but each
> of those
> commands takes the mongodb server ~200ms
> * is this expected - if so, i'll use it and move on with
> other
> work
>
> It depends on your data... see my comment below about running
> explain().
>
> 3 I'm still curious why the mongodb.log knows how many
> results I have
> (presuming that the match set isn't too massive), but the
> cursor does
> not
>
> If you had, say 700 results, mongod wold say nreturned: 95 (or
> whatever fit in the first response), not the total 700.

ah, fair enough - see above for the 3 info elements that would help me


>
> interestingly, calling run_command(count twice incurs the same
> penalty -
> so i presume there is some scope for optimisation there?
>
> Wed Mar 2 14:54:45 [conn12] query foswiki.$cmd ntoreturn:1
> command:
> { count: "current", query: { _web: "Lauries/GlossaryData",
> FIELD.Base.value: /^.*para.*$/ } } reslen:64 192ms
>
> Wed Mar 2 14:54:46 [conn12] query foswiki.current
> reslen:377057
> nscanned:25670 { orderby: { _topic: 1 }, query: { _web:
> "Lauries/GlossaryData", FIELD.Base.value: /^.*para.*$/ } }
> nreturned:95
> 197ms
>
> Wed Mar 2 14:54:46 [conn12] query foswiki.$cmd ntoreturn:1
> command:
> { count: "current", query: { _web: "Lauries/GlossaryData",
>
> FIELD.Base.value: /^.*para.*$/ } } reslen:64 193ms
>
> It sounds like it just has to go through a lot of data. If you paste
> an explain() on the query, we can see if there are any optimizations
> that could be made.

y, I'll have to do that some time - though optimising this particular
query doesn't help if it then disadvantages any other query (and we
don't get a priori knowledge of the queries)

Kristina Chodorow

unread,
Mar 3, 2011, 10:42:45 PM3/3/11
to Sven Dowideit, mongod...@googlegroups.com
Great!  I've pushed a version with a bunch of debugging output to https://github.com/mongodb/mongo-perl-driver/tree/debug.  You'll need to have Data::Dumper and Devel::Peek installed.  Can you install it, run a count that returns 0 incorrectly, and send the output?


actually, if it could say 3 things
  1 count of matches found so far
  2 count of documents considered
  3 count of documents not yet considered

I can add 1 and let you know when the cursor has fetched the last batch of results. 

For 2 and 3, do you mean all documents in the collection?  You can get that by doing a count with no criteria, which should be very fast.  If you mean just the matches... MongoDB finds those as you need them, so the driver can't tell you ahead of time.


ish. there is huge difference between what boring databases do - which
is to slowly generate a result, and what mongodb does - which is to
throw an error and say 'its too hard'

giving up is a problem for adhoc queries on adhoc document sets - there
is no way to pre-optimise - users are able to import _anything_ into the
system, and then make complex queries on any combination of that.

I'm not sure what you mean here... if queries are timing out you can remove the cursor timeout.

Sven Dowideit

unread,
Mar 4, 2011, 5:02:55 AM3/4/11
to Kristina Chodorow, mongod...@googlegroups.com
Thanks - I'll test it in the next week - when you're busy, there's
always more new things todo :)

I've added a little more inline to continue the 42 issues and
conversations :)

Sven
I have to say - thankyou very very much for the support


On Thu, 2011-03-03 at 17:42 -0500, Kristina Chodorow wrote:
> Great! I've pushed a version with a bunch of debugging output to
> https://github.com/mongodb/mongo-perl-driver/tree/debug. You'll need
> to have Data::Dumper and Devel::Peek installed. Can you install it,
> run a count that returns 0 incorrectly, and send the output?
>
> actually, if it could say 3 things
> 1 count of matches found so far
> 2 count of documents considered
> 3 count of documents not yet considered
>
> I can add 1 and let you know when the cursor has fetched the last
> batch of results.
>
> For 2 and 3, do you mean all documents in the collection? You can get
> that by doing a count with no criteria, which should be very fast. If
> you mean just the matches... MongoDB finds those as you need them, so
> the driver can't tell you ahead of time.

1 and 'fetched last' is good - what I meant with 2, is that each batch
will have considered and rejected a certain number of documents -
knowing that and the total number will allow more detail than just
'fetched last' - but it presumes the server tracks that.


>
> ish. there is huge difference between what boring databases do
> - which
> is to slowly generate a result, and what mongodb does - which
> is to
> throw an error and say 'its too hard'
>
> giving up is a problem for adhoc queries on adhoc document
> sets - there
> is no way to pre-optimise - users are able to import
> _anything_ into the
> system, and then make complex queries on any combination of
> that.
>

> I'm not sure what you mean here... if queries are timing out you can
> remove the cursor timeout.

ah, thankyou - almost :) What I'm running into is (i think) more
insidious:

Could not perform search. Error was: query error: too much data for
sort() with no index. add an index or specify a smaller limit

and in my context, if a user asks for it, I should darn well give it to
them (clearly, worst case is to limit to something small and chunk it on
the client)

Kristina Chodorow

unread,
Mar 4, 2011, 10:27:05 PM3/4/11
to Sven Dowideit, mongod...@googlegroups.com
1 and 'fetched last' is good - what I meant with 2, is that each batch
will have considered and rejected a certain number of documents -
knowing that and the total number will allow more detail than just
'fetched last' - but it presumes the server tracks that.

I committed a MongoDB::Cursor::info method to which will give you all of the information the cursor has available (http://api.mongodb.org/perl/MongoDB/Cursor.html#info).  It'll be available in 0.43.  You're probably interested in cursor_id and num fields.  The database does keep track of 2, but unfortunately it doesn't pass that info to the driver.


ah, thankyou - almost :) What I'm running into is (i think) more
insidious:

Could not perform search. Error was: query error: too much data for
sort() with no index. add an index or specify a smaller limit

and in my context, if a user asks for it, I should darn well give it to
them (clearly, worst case is to limit to something small and chunk it on
the client)

Oh, I see.  Yes, that's because MongoDB does un-indexed sorts in memory.
Reply all
Reply to author
Forward
0 new messages