Mongdb - C# Driver - limiting results from distinct function

4,255 views
Skip to first unread message

Sam Martin

unread,
Apr 19, 2012, 7:20:42 PM4/19/12
to mongod...@googlegroups.com
Take this example,

var query = Query.EQ("x","123");
database.GetCollection<MyDoc>("MyCollection").Distinct("Field",query);

Distinct returns an IEnumerable<BSonValue> - does anyone know how to do run this using a cursor, so that I can take only the first x rows?
Thanks in advance,
Sam

Robert Stam

unread,
Apr 20, 2012, 12:13:16 AM4/20/12
to mongod...@googlegroups.com
At a lower level distinct is a database command and not a regular query. See:


What that means is that all the distinct values are returned at once in a single document, so there is no way to use the cursor to limit the number of distinct values returned.

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To view this discussion on the web visit https://groups.google.com/d/msg/mongodb-user/-/Pyg12gUpryMJ.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.

Sam Martin

unread,
Apr 20, 2012, 4:17:43 AM4/20/12
to mongod...@googlegroups.com
Hi Robert,  thanks for answer - I was aware of the that doc.

It's obviously not a C# driver issue, but are you aware of any alternative technique that would give me the same result?

It doesn't make sense to take only a handful of results from 100,000's of records.

If there are no additional parameters to the distinct command, then the only other way I can think of doing it is managing a distinct collection generated from a map/reduce?

any thoughts/tips on this?

Robert Stam

unread,
Apr 20, 2012, 8:41:57 AM4/20/12
to mongod...@googlegroups.com
If there are 100,000 distinct values and you only want 100 of them which 100 do you want? A random 100?

There's probably something you can do with map/reduce, but how is going depend a lot to your answer to the above question.

Sam Martin

unread,
Apr 20, 2012, 9:24:38 AM4/20/12
to mongod...@googlegroups.com
for example, the top 100 ordered by an indexed property. 


On Friday, April 20, 2012 1:41:57 PM UTC+1, Robert Stam wrote:
If there are 100,000 distinct values and you only want 100 of them which 100 do you want? A random 100?

There's probably something you can do with map/reduce, but how is going depend a lot to your answer to the above question.

On Fri, Apr 20, 2012 at 4:17 AM, Sam Martin <sambo...@gmail.com> wrote:
Hi Robert,  thanks for answer - I was aware of the that doc.

It's obviously not a C# driver issue, but are you aware of any alternative technique that would give me the same result?

It doesn't make sense to take only a handful of results from 100,000's of records.

If there are no additional parameters to the distinct command, then the only other way I can think of doing it is managing a distinct collection generated from a map/reduce?

any thoughts/tips on this?
On 20 April 2012 05:13, Robert Stam <rob...@10gen.com> wrote:
At a lower level distinct is a database command and not a regular query. See:


What that means is that all the distinct values are returned at once in a single document, so there is no way to use the cursor to limit the number of distinct values returned.
On Thu, Apr 19, 2012 at 7:20 PM, Sam Martin <sambo...@gmail.com> wrote:
Take this example,

var query = Query.EQ("x","123");
database.GetCollection<MyDoc>("MyCollection").Distinct("Field",query);

Distinct returns an IEnumerable<BSonValue> - does anyone know how to do run this using a cursor, so that I can take only the first x rows?
Thanks in advance,
Sam

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To view this discussion on the web visit https://groups.google.com/d/msg/mongodb-user/-/Pyg12gUpryMJ.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user+unsubscribe@googlegroups.com.

For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user+unsubscribe@googlegroups.com.

For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user+unsubscribe@googlegroups.com.

Robert Stam

unread,
Apr 20, 2012, 9:33:30 AM4/20/12
to mongod...@googlegroups.com
So for example if your documents contain { ..., ZipCode: "30040", ... }, you have an index on { ZipCode : 1 }, and you want the the first 100 distinct ZipCodes in sorted order?

Something like: "10010", "10017", "20432", etc...

Not sure that's going to be possible, but still trying to understand exactly what you are asking for.

To view this discussion on the web visit https://groups.google.com/d/msg/mongodb-user/-/D-Xuh5O11BIJ.

To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.

Sam Martin

unread,
Apr 20, 2012, 4:01:32 PM4/20/12
to mongod...@googlegroups.com
Hi Robert,

Yes you're pretty much correct.

In this particular case I'm performing an autocomplete type function by querying a distinct list of matching values. I obviously would only need the first n records.

If its not possible to limit the rows returned from a distinct command/query on the server, the next best options are to hold the distinct list in memory in the app or to serialise the distinct values from a map/reduce into its own collection.

Any further suggestions?

(thanks for your help btw)

Sam Martin

Robert Stam

unread,
Apr 22, 2012, 9:44:42 AM4/22/12
to mongod...@googlegroups.com
I can't think of any easy way to limit the result of the distinct command to the first n values in version 2.0 or earlier of the server, but this is pretty easy to do using the aggregation framework.

Given the collection:

> db.test.find()
{ "_id" : ObjectId("4f94099937456547ba4b9458"), "x" : 1, "y" : 1 }
{ "_id" : ObjectId("4f94099d37456547ba4b9459"), "x" : 1, "y" : 2 }
{ "_id" : ObjectId("4f94099f37456547ba4b945a"), "x" : 1, "y" : 3 }
{ "_id" : ObjectId("4f9409a037456547ba4b945b"), "x" : 1, "y" : 4 }
{ "_id" : ObjectId("4f9409a437456547ba4b945c"), "x" : 1, "y" : 5 }
{ "_id" : ObjectId("4f9409ab37456547ba4b945d"), "x" : 2, "y" : 6 }
{ "_id" : ObjectId("4f9409ad37456547ba4b945e"), "x" : 2, "y" : 7 }
>

I could use the aggregation framework to find the first 3 distinct values of y where { x : 1 } using this pipeline:

> db.test.aggregate( {$match:{x:1}}, {$group:{_id:"$y"}}, {$sort:{_id:1}}, {$limit:3} )
{
        "result" : [
                {
                        "_id" : 1
                },
                {
                        "_id" : 2
                },
                {
                        "_id" : 3
                }
        ],
        "ok" : 1
}
>

You can play with the aggregation framework in the pre-release versions (I used 2.1.0 for the above test) and it will be in the 2.2 release.
Reply all
Reply to author
Forward
0 new messages