In working towards a MongoDB backend for Foswiki, I've come across something
that I hope is a bug.
When I call $cursor->count, I always get zero - has_next and next return
results, but count() does not.
Additionally, is $cursor->explain supposed to be destructive? Once I've run
it, $cursor->next returns a cursor, not an item :/
Cheers
Sven
excellent :) thanks for the fix for explain()
I've thrown together some output from insider foswiki - though I hope
this issue is not environmental :/
use Data::Dumper;
print STDERR "GET THE count for $collectionName:
".Dumper($database->run_command({"count" => $collectionName}))."\n";
returns
GET THE count for current: $VAR1 = {
'ok' => '1',
'n' => '79169'
};
from the console I get:
> db.current.find({ _web: "TemporarySEARCHTestWebSEARCH" },
{_topic:1}).sort({})
{ "_id" : ObjectId("4cd33224809feb0d31bc8215"), "_topic" : "OkATopic" }
{ "_id" : ObjectId("4cd33224809feb0d31bc8216"), "_topic" : "OkBTopic" }
{ "_id" : ObjectId("4cd33224809feb0d31bc8214"), "_topic" : "OkTopic" }
{ "_id" : ObjectId("4cd33224809feb0d31bc8213"), "_topic" :
"TestTopicSEARCH" }
{ "_id" : ObjectId("4cd33223809feb0d31bc8212"), "_topic" :
"WebPreferences" }
> db.current.find({ _web: "TemporarySEARCHTestWebSEARCH" },
{_topic:1}).sort({}).count()
5
on the other hand
print STDERR "searching mongo : ".Dumper($ixhQuery)." ,
".Dumper($queryAttrs)."\n" if MONITOR;
my $cursor = $collection->query($ixhQuery, $queryAttrs);
print STDERR "found " . $cursor->count . " _BUT_ has_next is
".($cursor->has_next()?'true':'false')."\n" if MONITOR;
results in
searching mongo : $VAR1 = bless( [
{
'_raw_text' => '1',
'_web' => '0'
},
[
'_web',
'_raw_text'
],
[
'TemporarySEARCHTestWebSEARCH',
qr/(?i-xsm:%META:TOPICPARENT[{].*name=\"(TemporarySEARCHTestWebSEARCH\.)?TestCaseAutoFormattedSearch\".*[}]%)/
],
0
], 'Tie::IxHash' );
, $VAR1 = {
'sort_by' => {
'_topic' => 1
}
};
found 0 _BUT_ has_next is true
Sven
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAkzTNXAACgkQPAwzu0QrW+lojACeJFW7TuqDGYtDSVJ8tbuo9g25
yJEAnRIFjxgdjp+FE3RIOJyfyQHttT1w
=90iz
-----END PGP SIGNATURE-----
Unfortunately, you're not right :)
the count returns zero, while has_next returns true (and then next()
returns the values i'm expecting)
the output you see is a Data::Dumper of an ixHash, so its a tad wonkey
Sven
iEYEARECAAYFAkzYuekACgkQPAwzu0QrW+mjBQCgnNyQK6byTLYK2mTvwhNljiYP
hKQAoLkpnOwgR1rk5A2g3xBKswvpCxH9
=B45I
-----END PGP SIGNATURE-----
--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
Argh,
I finally got back to this, and added the profiling.
before i did it, I was still experiencing the problem, and since turning
profile=>2, and then back again, count seems to be returning the right
number.
so... there is something odd happening, but I will probably have to
delete this mongo instance and start fresh again anyway, and then I'll
come back with more details (hopefully it was/is just a problem with
this old and upgraded db)
Cheers&Beers
Sven
On 12/11/10 14:19, Kristina Chodorow wrote:
> Whoops, you're correct. Could you run:
>
> $db->run_command({"profile" => 2});
> # do count
> print Dumper($db->get_collection("system.profile")->find->all);
>
> and paste it here? (You should turn the profiling off again afterwards or
> it'll slow things down, you can do so with run_command({profile=>0}).)
>
>
>>>
iEYEARECAAYFAk0O0YsACgkQPAwzu0QrW+n+jACeJG+VYqRWuH5/ySUnEfXZmrc0
6a0An3j26NQcCGxfbXkAOzvkszcn94FD
=ThyU
-----END PGP SIGNATURE-----
> db.version()
1.6.5
and
MongoDB 0.39 0.42
KRISTINA/MongoDB-0.42.tar.gz
(additionally, there seems to be a missing dependency for 0.42 -
Try::Tiny isn't installed on this test box)
[Fri Feb 25 14:35:09 2011] [error] [client 192.168.1.51]
****** starting MongoDBPlugin..
----------------------------------------------------------------------------------
get_collection(system.profile)$VAR1 = {
'info' => 'query foswiki.system.profile reslen:36
nscanned:0
query: {} nreturned:0 bytes:20',
'millis' => '0',
'ts' => bless( {
'local_rd_secs' => 16480,
'local_rd_days' => 734193,
'rd_nanosecs' => 0,
'locale' => bless( {
'default_time_format_length' => 'medium',
'native_territory' =>
'United States',
'native_language' =>
'English',
'native_complete_name'
=> 'English United States',
'en_language' =>
'English',
'id' => 'en_US',
'default_date_format_length' => 'medium',
'en_complete_name' =>
'English United States',
'en_territory' =>
'United States'
},
'DateTime::Locale::en_US' ),
'local_c' => {
'hour' => 4,
'second' => 40,
'month' => 2,
'quarter' => 1,
'day_of_year' => 56,
'day_of_quarter' => 56,
'minute' => 34,
'day' => 25,
'day_of_week' => 5,
'year' => 2011
},
'utc_rd_secs' => 16480,
'formatter' => undef,
'tz' => bless( {
'name' => 'UTC'
},
'DateTime::TimeZone::UTC' ),
'utc_year' => 2012,
'utc_rd_days' => 734193,
'offset_modifier' => 0
}, 'DateTime' )
};
$VAR2 = {
'info' => 'query foswiki.$cmd ntoreturn:1 command: { profile:
0 } reslen:74 bytes:58',
'millis' => '0',
'ts' => bless( {
'local_rd_secs' => 16480,
'local_rd_days' => 734193,
'rd_nanosecs' => 0,
'locale' => $VAR1->{'ts'}{'locale'},
'local_c' => {
'hour' => 4,
'second' => 40,
'month' => 2,
'quarter' => 1,
'day_of_year' => 56,
'day_of_quarter' => 56,
'minute' => 34,
'day' => 25,
'day_of_week' => 5,
'year' => 2011
},
'utc_rd_secs' => 16480,
'formatter' => undef,
'tz' => bless( {
'name' => 'UTC'
},
'DateTime::TimeZone::UTC' ),
'utc_year' => 2012,
'utc_rd_days' => 734193,
'offset_modifier' => 0
}, 'DateTime' )
};
$VAR3 = {
'info' => 'query foswiki.system.profile reslen:280
nscanned:2
query: {} nreturned:2 bytes:264',
'millis' => '0',
'ts' => bless( {
'local_rd_secs' => 16507,
'local_rd_days' => 734193,
'rd_nanosecs' => 0,
'locale' => $VAR1->{'ts'}{'locale'},
'local_c' => {
'hour' => 4,
'second' => 7,
'month' => 2,
'quarter' => 1,
'day_of_year' => 56,
'day_of_quarter' => 56,
'minute' => 35,
'day' => 25,
'day_of_week' => 5,
'year' => 2011
},
'utc_rd_secs' => 16507,
'formatter' => undef,
'tz' => bless( {
'name' => 'UTC'
},
'DateTime::TimeZone::UTC' ),
'utc_year' => 2012,
'utc_rd_days' => 734193,
'offset_modifier' => 0
}, 'DateTime' )
};
$VAR4 = {
'info' => 'query foswiki.$cmd ntoreturn:1 command: { profile:
0 } reslen:74 bytes:58',
'millis' => '0',
'ts' => bless( {
'local_rd_secs' => 16507,
'local_rd_days' => 734193,
'rd_nanosecs' => 0,
'locale' => $VAR1->{'ts'}{'locale'},
'local_c' => {
'hour' => 4,
'second' => 7,
'month' => 2,
'quarter' => 1,
'day_of_year' => 56,
'day_of_quarter' => 56,
'minute' => 35,
'day' => 25,
'day_of_week' => 5,
'year' => 2011
},
'utc_rd_secs' => 16507,
'formatter' => undef,
'tz' => bless( {
'name' => 'UTC'
},
'DateTime::TimeZone::UTC' ),
'utc_year' => 2012,
'utc_rd_days' => 734193,
'offset_modifier' => 0
}, 'DateTime' )
};
----------------------------------------------------------------------------------
found 0
ERROR: cursor count == 0, but cursor->has_next is true
basic code path is
#debugging for upstream
print STDERR
"----------------------------------------------------------------------------------\n";
my $connection = $self->_connect();
my $database = $connection->get_database( $self->{database} );
$database->run_command({"profile" => 2});
my $cursor = $collection->query( $ixhQuery, $queryAttrs );
print STDERR "found "
. $cursor->count
. " _BUT_ has_next is "
. ( $cursor->has_next() ? 'true' : 'false' ) . "\n"
if MONITOR;
#more debugging
print STDERR
"get_collection(system.profile)".Dumper($database->get_collection("system.profile")->find->all)."\n";
$database->run_command({"profile" => 0});
print STDERR
"----------------------------------------------------------------------------------\n";
and then later, to get an actualy count, I have to iterate over the
collection - slow, nasty and worse, means I have to iterate over the
large data twice - presumably hitting the MongoDB server twice.
I'm still presuming that I'm doing something wrong - so if you want to
poke the code, its all at
http://svn.foswiki.org/trunk/MongoDBPlugin/lib/
(though it is a small part of a rather large project)
Cheers& Beers
Sven
using the run_command does in fact get a number
found 0 (long_count = $VAR1 = {
'ok' => '1',
'n' => '95'
};
) _BUT_ has_next is true
I presume this is not something that is safe when running multiple
independent clients (as we are)
I've attached the perl -V - this is a reasonably clean debian system -
although by necessity I've used cpan to install the drivers and some
other dependencies.
Sven
(each of the 95 entries is a tad large, but retrieving the entire
document makes some sense in this context.
Sven
my $cursor = $collection->query( $ixhQuery, $queryAttrs );
$cursor->count
causes the MongoDB server to go off and do the same query twice?
(intentionally????)
__HOWEVER__ it also appears not to be what is actually happening on my
server (more boring details after the Devel Dump)
----------------------------------------------------------------------------------
SV = RV(0xa8d2a5c) at 0xa8d2a50
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0xa8d2bf0
SV = PVHV(0xa9397b4) at 0xa8d2bf0
REFCNT = 1
FLAGS = (SHAREKEYS)
ARRAY = 0xa5db6c8 (0:6, 1:2)
hash quality = 125.0%
KEYS = 2
FILL = 2
MAX = 7
RITER = -1
EITER = 0x0
Elt "ok" HASH = 0x7b427928
SV = NV(0x96de648) at 0xa8d2a20
REFCNT = 1
FLAGS = (NOK,pNOK)
NV = 1
Elt "n" HASH = 0xd98832f1
SV = NV(0x96de5d8) at 0xa8d2860
REFCNT = 1
FLAGS = (NOK,pNOK)
NV = 95
----------------------------------------------------------------------------------
When the code says:
my $cursor = $collection->query( $ixhQuery, $queryAttrs );
my $real_count = $cursor->count;
$real_count is zero (the error we're tracking), and the mongodb.log has
Wed Mar 2 14:43:57 [initandlisten] connection accepted from
127.0.0.1:44183 #9
Wed Mar 2 14:43:57 [conn9] query foswiki.current reslen:377057
nscanned:25670 { orderby: { _topic: 1 }, query: { _web:
"Lauries/GlossaryData", FIELD.Base.value: /^.*para.*$/ } } nreturned:95
197ms
Wed Mar 2 14:43:58 [conn9] end connection 127.0.0.1:44183
whereas if I add the run_command:
my $cursor = $collection->query( $ixhQuery, $queryAttrs );
my $long_count = $database->run_command({"count" => 'current',
"query" => $ixhQuery});
my $real_count = $cursor->count;
real_count = zero and long_count =95 the log file has
Wed Mar 2 14:47:14 [initandlisten] connection accepted from
127.0.0.1:44184 #10
Wed Mar 2 14:47:14 [conn10] query foswiki.$cmd ntoreturn:1 command:
{ count: "current", query: { _web: "Lauries/GlossaryData",
FIELD.Base.value: /^.*para.*$/ } } reslen:64 192ms
Wed Mar 2 14:47:14 [conn10] query foswiki.current reslen:377057
nscanned:25670 { orderby: { _topic: 1 }, query: { _web:
"Lauries/GlossaryData", FIELD.Base.value: /^.*para.*$/ } } nreturned:95
197ms
Wed Mar 2 14:47:15 [conn10] end connection 127.0.0.1:44184
now given that at the time that the log of the intended query is
written, we know that n=95, why doesn't the perl driver?
Sven
basically, I am expecting to be able to write
my $cursor = $collection->query( $ixhQuery, $queryAttrs );
and have the cursor _know_ the count without doing yet another query on
the server.
given that calling
my $long_count = $database->run_command({"count" => 'current',
"query" => $ixhQuery});
generates an almost identical log entry, with an almost identical
timeing (in the query i use for testing 200ms), I'm presuming that its
essentially re-running the same thing
so, do i simply have an incorrect presumption (and a $cursor->count that
spuriously returns 0 due to not calling the run command?)?
Sven
On Mon, 2011-02-28 at 15:38 -0500, Kristina Chodorow wrote:
so, do i simply have an incorrect presumption (and a $cursor->count that
spuriously returns 0 due to not calling the run command?)?
i'm quite willing to accept that there are some instances where it won't
know due to scale
or do you mean to say that database->query() actually does almost
nothing, and then the actual query evaluation happens during the
cursor->has_next call?
>RANT MODE -
wrt adding index's - turns out that mongo is pretty hopeless at large
scale adhoc queries on adhoc documents - as there is a (really tiny)
limit on the number of indexes, and it refuses to give any results if
there is no index and the match set is really big.
I don't have the luxury of adding indexes for everything, as mongodb
only provides a tiny set - and users are able to create queries on
anything, and order by anything
so coming back to something we can work on now:
1 there appears to be a bug in cursor->count
2 calling query and then run_command(count works, but each of those
commands takes the mongodb server ~200ms
* is this expected - if so, i'll use it and move on with other
work
3 I'm still curious why the mongodb.log knows how many results I have
(presuming that the match set isn't too massive), but the cursor does
not
interestingly, calling run_command(count twice incurs the same penalty -
so i presume there is some scope for optimisation there?
Wed Mar 2 14:54:45 [conn12] query foswiki.$cmd ntoreturn:1 command:
{ count: "current", query: { _web: "Lauries/GlossaryData",
FIELD.Base.value: /^.*para.*$/ } } reslen:64 192ms
Wed Mar 2 14:54:46 [conn12] query foswiki.current reslen:377057
nscanned:25670 { orderby: { _topic: 1 }, query: { _web:
"Lauries/GlossaryData", FIELD.Base.value: /^.*para.*$/ } } nreturned:95
197ms
Wed Mar 2 14:54:46 [conn12] query foswiki.$cmd ntoreturn:1 command:
{ count: "current", query: { _web: "Lauries/GlossaryData",
FIELD.Base.value: /^.*para.*$/ } } reslen:64 193ms
Sven
> --
> You received this message because you are subscribed to the Google
> Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user
perhaps count could and should know when you _don't_ have a terabyte of
matches
or do you mean to say that database->query() actually does almost
nothing, and then the actual query evaluation happens during the
cursor->has_next call?
>RANT MODE -
wrt adding index's - turns out that mongo is pretty hopeless at large
scale adhoc queries on adhoc documents - as there is a (really tiny)
limit on the number of indexes, and it refuses to give any results if
there is no index and the match set is really big.
I don't have the luxury of adding indexes for everything, as mongodb
only provides a tiny set - and users are able to create queries on
anything, and order by anything
so coming back to something we can work on now:
1 there appears to be a bug in cursor->count
2 calling query and then run_command(count works, but each of those
commands takes the mongodb server ~200ms
* is this expected - if so, i'll use it and move on with other
work
3 I'm still curious why the mongodb.log knows how many results I have
(presuming that the match set isn't too massive), but the cursor does
not
interestingly, calling run_command(count twice incurs the same penalty -
so i presume there is some scope for optimisation there?
Wed Mar 2 14:54:45 [conn12] query foswiki.$cmd ntoreturn:1 command:
{ count: "current", query: { _web: "Lauries/GlossaryData",Wed Mar 2 14:54:46 [conn12] query foswiki.current reslen:377057
FIELD.Base.value: /^.*para.*$/ } } reslen:64 192ms
nscanned:25670 { orderby: { _topic: 1 }, query: { _web:Wed Mar 2 14:54:46 [conn12] query foswiki.$cmd ntoreturn:1 command:
"Lauries/GlossaryData", FIELD.Base.value: /^.*para.*$/ } } nreturned:95
197ms
{ count: "current", query: { _web: "Lauries/GlossaryData",FIELD.Base.value: /^.*para.*$/ } } reslen:64 193ms
to summarise - yes, I run debian 32bit on my test box, with 1.6 atm, I'm
willing to run a debug build, or whatever else we can to to try to track
this down..
Sven
On Wed, 2011-03-02 at 10:29 -0500, Kristina Chodorow wrote:
>
> perhaps count could and should know when you _don't_ have a
> terabyte of
> matches
>
> I could give you access to the number of documents returned if there
> were only one batch of results. However, suppose you do a query,
> insert some new documents, and then do a count? It seems
> counter-intuitive to have the new documents not show up in the count.
>
> However, I'm willing to implement this as some separate function if
> people want.
I would __love__ to have query return whatever count it has in the first
batch, and a boolean to say, this is all - or not (as an incomplete
number is still useful.
actually, if it could say 3 things
1 count of matches found so far
2 count of documents considered
3 count of documents not yet considered
then we'd have the ability to show meaningful info to the user / creator
of the adhoc query
ish. there is huge difference between what boring databases do - which
is to slowly generate a result, and what mongodb does - which is to
throw an error and say 'its too hard'
giving up is a problem for adhoc queries on adhoc document sets - there
is no way to pre-optimise - users are able to import _anything_ into the
system, and then make complex queries on any combination of that.
(and yes, i'm ok with this being something that is a longer term project
for mongodb - as I fully understand how complex a use case it is to deal
with - I've worked in that area before - in an enterprise 4GL sense
>
> so coming back to something we can work on now:
> 1 there appears to be a bug in cursor->count
>
> Would you be willing to try a debug build of the driver? I'm kind of
> mystified about this.
yes, I'm very willing to do anything that might help us make Mongo more
capable :)
>
> 2 calling query and then run_command(count works, but each
> of those
> commands takes the mongodb server ~200ms
> * is this expected - if so, i'll use it and move on with
> other
> work
>
> It depends on your data... see my comment below about running
> explain().
>
> 3 I'm still curious why the mongodb.log knows how many
> results I have
> (presuming that the match set isn't too massive), but the
> cursor does
> not
>
> If you had, say 700 results, mongod wold say nreturned: 95 (or
> whatever fit in the first response), not the total 700.
ah, fair enough - see above for the 3 info elements that would help me
>
> interestingly, calling run_command(count twice incurs the same
> penalty -
> so i presume there is some scope for optimisation there?
>
> Wed Mar 2 14:54:45 [conn12] query foswiki.$cmd ntoreturn:1
> command:
> { count: "current", query: { _web: "Lauries/GlossaryData",
> FIELD.Base.value: /^.*para.*$/ } } reslen:64 192ms
>
> Wed Mar 2 14:54:46 [conn12] query foswiki.current
> reslen:377057
> nscanned:25670 { orderby: { _topic: 1 }, query: { _web:
> "Lauries/GlossaryData", FIELD.Base.value: /^.*para.*$/ } }
> nreturned:95
> 197ms
>
> Wed Mar 2 14:54:46 [conn12] query foswiki.$cmd ntoreturn:1
> command:
> { count: "current", query: { _web: "Lauries/GlossaryData",
>
> FIELD.Base.value: /^.*para.*$/ } } reslen:64 193ms
>
> It sounds like it just has to go through a lot of data. If you paste
> an explain() on the query, we can see if there are any optimizations
> that could be made.
y, I'll have to do that some time - though optimising this particular
query doesn't help if it then disadvantages any other query (and we
don't get a priori knowledge of the queries)
actually, if it could say 3 things
1 count of matches found so far
2 count of documents considered
3 count of documents not yet considered
ish. there is huge difference between what boring databases do - which
is to slowly generate a result, and what mongodb does - which is to
throw an error and say 'its too hard'
giving up is a problem for adhoc queries on adhoc document sets - there
is no way to pre-optimise - users are able to import _anything_ into the
system, and then make complex queries on any combination of that.
I've added a little more inline to continue the 42 issues and
conversations :)
Sven
I have to say - thankyou very very much for the support
On Thu, 2011-03-03 at 17:42 -0500, Kristina Chodorow wrote:
> Great! I've pushed a version with a bunch of debugging output to
> https://github.com/mongodb/mongo-perl-driver/tree/debug. You'll need
> to have Data::Dumper and Devel::Peek installed. Can you install it,
> run a count that returns 0 incorrectly, and send the output?
>
> actually, if it could say 3 things
> 1 count of matches found so far
> 2 count of documents considered
> 3 count of documents not yet considered
>
> I can add 1 and let you know when the cursor has fetched the last
> batch of results.
>
> For 2 and 3, do you mean all documents in the collection? You can get
> that by doing a count with no criteria, which should be very fast. If
> you mean just the matches... MongoDB finds those as you need them, so
> the driver can't tell you ahead of time.
1 and 'fetched last' is good - what I meant with 2, is that each batch
will have considered and rejected a certain number of documents -
knowing that and the total number will allow more detail than just
'fetched last' - but it presumes the server tracks that.
>
> ish. there is huge difference between what boring databases do
> - which
> is to slowly generate a result, and what mongodb does - which
> is to
> throw an error and say 'its too hard'
>
> giving up is a problem for adhoc queries on adhoc document
> sets - there
> is no way to pre-optimise - users are able to import
> _anything_ into the
> system, and then make complex queries on any combination of
> that.
>
> I'm not sure what you mean here... if queries are timing out you can
> remove the cursor timeout.
ah, thankyou - almost :) What I'm running into is (i think) more
insidious:
Could not perform search. Error was: query error: too much data for
sort() with no index. add an index or specify a smaller limit
and in my context, if a user asks for it, I should darn well give it to
them (clearly, worst case is to limit to something small and chunk it on
the client)
1 and 'fetched last' is good - what I meant with 2, is that each batch
will have considered and rejected a certain number of documents -
knowing that and the total number will allow more detail than just
'fetched last' - but it presumes the server tracks that.
ah, thankyou - almost :) What I'm running into is (i think) more
insidious:
Could not perform search. Error was: query error: too much data for
sort() with no index. add an index or specify a smaller limit
and in my context, if a user asks for it, I should darn well give it to
them (clearly, worst case is to limit to something small and chunk it on
the client)