[mongodb-user] Complex Composite Query

73 views
Skip to first unread message

Izak

unread,
May 3, 2010, 9:18:09 AM5/3/10
to mongodb-user
Hi All

I am rather new to mongo and just would like to find out how should I
construct a composite query given the following mongo object:

{
"_id":"123",
"name":"object1"
"fields": [
{
"fieldType": "GroupField",
"fields": [
{
"label":"country",
"value":"1",
},
{
"label":"region",
"value": "23",
}
]
},
....
]
}

If I have a collection of such objects, how for example do I retrieve
all the objects where the country is 1, and say region is 23. Since I
first of all need to check whether the label field is county and then
associate that value with whether the value label is 1 and then add
that with whether the region label value is region and its associated
value is 23. I'm am not certain into which direction I must look,
QueryBuilder, group or is it a simple find? The existing documentation
doesn't clearly cover this I feel.

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.

Kyle Banker

unread,
May 3, 2010, 10:49:17 AM5/3/10
to mongod...@googlegroups.com
I think you could model your data a little more naturally like so:

{
"_id":"123",
"name":"object1"
"fields": [
{
"fieldType": "GroupField",
"country":1
"region": 23
},
....
]
}

Then you could use $elemMatch:

db.foo.find({fields: {'$elemMatch': {'country': 1, region: 23}}});

Izak

unread,
May 3, 2010, 11:45:59 AM5/3/10
to mongodb-user
Hi Kyle

Unfortunately this is how the structure is supposed to look, there is
a reason to why it is structured into groups the way it is. I don't
suspect that this structure will/should limit any operations,
following the dynamicness of mongo/JSON strings. All I really need to
know is how do I group the data even if i group them for 2 values then
group those, for example :

Mongo mongo = MongoHelper.getMongo();
DB db = mongo.getDB(settings.getMongoDb());
DBCollection col = db.getCollection("acollection");

// Section A
//this will find wheter the specific object has a country value for
label
DBObject countryQuery = new BasicDBObject("fields.fields.label",
"country");
// I now need to combine <AND> it with the next statement by some
means, which will check that the value field is 1
DBObject countryValue = new BasicDBObject("fields.fields.value", "1");

// Section B
// Then <AND> them with the following ones...
DBObject regionQuery = new BasicDBObject("fields.fields.label",
"region");
DBObject regionValue = new BasicDBObject("fields.fields.value", "23");

// Group Section A and Section B

//and then add them to a cursor by some means
DBCursor cursor = ???

On May 3, 4:49 pm, Kyle Banker <k...@10gen.com> wrote:
> I think you could model your data a little more naturally like so:
>
> {
>    "_id":"123",
>    "name":"object1"
>    "fields": [
>     {
>           "fieldType": "GroupField",
>           "country":1
>           "region": 23
>     },
>     ....
>   ]
>
> }
>
> Then you could use $elemMatch:
>
> db.foo.find({fields: {'$elemMatch': {'country': 1, region: 23}}});
>
> > For more options, visit this group athttp://groups.google.com/group/mongodb-user?hl=en.
>
> --
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group athttp://groups.google.com/group/mongodb-user?hl=en.

Guillaume Bodi

unread,
May 4, 2010, 4:18:17 AM5/4/10
to mongodb-user
Hi all,

I'm having a similar problem.
My data has a very similar model and I am trying to find a way to
query it efficiently.

Note: I am using Mongo 1.4.2 Windows x64 on Win7 Pro.

Sample document:
{
meta_data_1: 'aaa',
meta_data_2: 'bbb',
data_bag: [
{
key: 'color',
value: 'red'
},
{
key: 'shape',
value: 'square'
},
{
key: 'color',
value: 'green'
}
]
}

The point is I cannot adopt a
{
color: 'red',
shape: 'square'
}

format for the databag elements since several colors could possibly
saved in the bag.
I cannot either store the values as arrays

{
color: ['red', 'green'],
shape: ['square']
}

since the databag elements may possibly hold other data of their own.

Now, as for the querying goes, if I have only one condition, I can
run:

find({
data_bag: {$elemMatch: {key: 'shape', value: 'square'}}
});

And this works.
However, trying to find documents with both square shape and a red
color cannot seem to be done.

find({
data_bag: {$elemMatch: {key: 'shape', value: 'square'}},
data_bag: {$elemMatch: {key: 'color', value: 'red'}}
});

This does not work since we have a query object with two identical
keys (data_bag).
Thus, the client will only retain the last one (red color) and ignore
the previous.

A possible way to write this elegantly would have been

find({
data_bag: [
{$elemMatch: {key: 'shape', value: 'square'}},
{$elemMatch: {key: 'color', value: 'red'}}
]
});

Which, sadly, does not work (returns an empty result set).

A similar issue seem to have been discussed in:
http://groups.google.com/group/mongodb-user/browse_thread/thread/7067d95e669366fa/a633a40f8a488ec8?hl=en&lnk=gst&q=multiple+conditions#a633a40f8a488ec8

where dwight_10gen said in July 2009:
> probably not possible in that way. this is a limitation of the query
> objects at this point - fully generalized boolean and & or aren't
> there yet (this is very likely long term though)

> however, the is another way to do it - $where - which is a good way to
> do generalized expressions. You'll generally want one term that is
> not in the $where clause, so that the query optimizer has something to
> go by.

I was just wondering if someone found a workaround _without_ using the
$where clause or javascript function evaluation since it would impact
the performance of the application strongly (the application I am
working on is aiming for several million documents in this
collection).

Any pointers on the issue?

Kyle Banker

unread,
May 4, 2010, 10:21:35 AM5/4/10
to mongod...@googlegroups.com
I'd love to help you on this, but it's hard to see exactly what you're
trying to model. Can you be more concrete (instead of using colors and
shapes)? What type of data are you trying to represent? What kinds of
queries do you need to do on it? There may be a completely different
way of modeling the data.

My initial reaction is that data structures of this format

data_bag: [
{
key: 'color',
value: 'red'
}
]

don't make much sense in MongoDB, and I don't quite follow your
reasoning about why this

{
color: ['red', 'green'],
shape: ['square']
}

is not possible.

Guillaume Bodi

unread,
May 4, 2010, 8:09:44 PM5/4/10
to mongodb-user
Hi Kyle,

Thanks for your support.

What I am trying to do is to find a way to capture arbitrary data,
coming from different sources under a key-value scheme, organized by
user sessions.
I have no way of knowing for sure what the key nor the values will be
since it's left at the appreciation of a third party client
application.

So, my color/shape sample is just as possible as anything else.

Also, since we will likely to be interested in saving extra data for
each key-value pair, I feel like the following representation would be
impractical

data_bag: {
color: ['red', 'green'],
shape: ['square']
}

If I want to have some contextual data, let's say a timestamp, in
addition for each key value pair, I would need to have something like

data_bag: {
color: [{value: 'red', timestamp:'2010-05-04 10:25:32+0200'}, {value:
'green', timestamp:'2010-05-04 10:27:58+0200'}],
shape: [{value: 'square', timestamp:'2010-05-04 10:25:35+0200'}]
}

Now, this modelization would most likely solve part of the problem,
since I figure I can use $elemMatch to ignore the timestamp part when
querying.
However, I can't think of a way to ask for all documents matching the
following condition: (color: 'red' AND shape: 'square') OR (color:
'blue' AND shape: 'triangle').
Additionally, the reason why I initially thought of a key: 'color',
value: 'red' design is that I would like to set an index to improve
querying speed.
And with

data_bag: {
color: [{value: 'red', timestamp:'2010-05-04 10:25:32+0200'}, {value:
'green', timestamp:'2010-05-04 10:27:58+0200'}],
shape: [{value: 'square', timestamp:'2010-05-04 10:25:35+0200'}]
}

I would need to index data_bag.color, data_bag.shape and so on. Since
I don't know what the keys will be, I cannot define proper indexing.
Unless setting an index on data_bag would allow me to speed up queries
on data_bag's properties (how would an index on an array of objects be
handled?).

I also considered a different modelization where a key-value pair
would be a single document
{
session_key: 1,
key: 'color',
value: 'red',
timestamp:'2010-05-04 10:25:32+0200'
},
{
session_key:1,
key: 'shape',
value: 'square,
timestamp:'2010-05-04 10:25:35+0200'
},
{
session_key:1,
key: 'color',
value: 'green,
timestamp:'2010-05-04 10:27:58+0200'
}

By indexing {key:1, value:1}, I can get high speed querying even on
very large collections.
Now, I needed to add this session_key in order to group all the same
session information together. But I need to have session specific meta
data, let's say geographical information.
If it was a RDBMS, I would just have a separate table and perform a
join.
In a document database, I think the correct way would be to have this
data in each document (denormalizing the info). Gobbling up the disk
space, but fair enough, it's an acceptable trade-off.

But here when I want to query for all sessions with color: 'red' AND
shape: 'square', I would need to run an aggregation (map/reduce I
gather), which will inevitably deeply impact speed.

Now, this is my perception of the problem, I would be glad to share
your thoughts on this.

Thanks,

Guillaume
> >http://groups.google.com/group/mongodb-user/browse_thread/thread/7067...

Kyle Banker

unread,
May 4, 2010, 11:54:07 PM5/4/10
to mongod...@googlegroups.com
Guillaume,

I don't have a definitive answer for you at the moment, but here are a
few thoughts / responses:

- The AND query could be accomplished if you modeled your data like this:

{bucket: [{color: 'red'}, {shape: 'square'}]

You could have an index on 'bucket' in this case, but you wouldn't be
able to store the timestamp metadata since each entire object would be
indexed as a whole. But this would allow for a kind of OR query since
you could use an $in operator:

db.foo.find({bucket: {'$in': [{color: 'red}, {'shape': 'square'}]});

- I think that the one-object-per-key-value-pair approach might be the
most flexible; you can always query for sessions by geography and then
do a second query on the attributes: this wouldn't be terrible.

- I agree that the 'key' and 'value' fields are okay given that you
don't know in advance the kinds of data you'll be receiving nor the
sort of queries you'll be performing.

- If you go with the original embedded approach, you may just have to
perform two queries and merge the results client-side. Again, not a
terrible solution for the moment.

- you may want to watch this issue: http://jira.mongodb.org/browse/SERVER-205

- you may want to open an issue for $or with $elemMatch

Just some thoughts. Let us know what you decide.

Guillaume Bodi

unread,
May 5, 2010, 2:23:16 AM5/5/10
to mongodb-user
Kyle,

Thanks for your feedback.
I'm afraid we will need the timestamp metadata so your first
suggestion won't cut it.

I'll try to evaluate how well it can fare when having to handle part
of the processing at the application layer, but this will be a painful
task, given the high volume of info we will be likely to manipulate.

Thanks for the pointer on the $or operator issue.
Instead of opening an issue for a $or with $elemMatch (which would be
depending on the $or issue), I opened a new feature request for an
$and operator, which is also needed: http://jira.mongodb.org/browse/SERVER-1089

Not having an $or & $and operator turns out to be a crippling issue
for our project.

On May 5, 1:54 pm, Kyle Banker <k...@10gen.com> wrote:
> Guillaume,
>
> I don't have a definitive answer for you at the moment, but here are a
> few thoughts / responses:
>
> - The AND query could be accomplished if you modeled your data like this:
>
> {bucket: [{color: 'red'}, {shape: 'square'}]
>
> You could have an index on 'bucket' in this case, but you wouldn't be
> able to store the timestamp metadata since each entire object would be
> indexed as a whole. But this would allow for a kind of OR query since
> you could use an $in operator:
>
> db.foo.find({bucket: {'$in': [{color: 'red}, {'shape': 'square'}]});
>
> - I think that the one-object-per-key-value-pair approach might be the
> most flexible; you can always query for sessions by geography and then
> do a second query on the attributes: this wouldn't be terrible.
>
> - I agree that the 'key' and 'value' fields are okay given that you
> don't know in advance the kinds of data you'll be receiving nor the
> sort of queries you'll be performing.
>
> - If you go with the original embedded approach, you may just have to
> perform two queries and merge the results client-side. Again, not a
> terrible solution for the moment.
>
> - you may want to watch this issue:http://jira.mongodb.org/browse/SERVER-205
>
> - you may want to open an issue for $or with $elemMatch
>
> Just some thoughts. Let us know what you decide.
>
> On Tue, May 4, 2010 at 8:09 PM, Guillaume Bodi
>
> ...
>
> read more »
Reply all
Reply to author
Forward
0 new messages