Help with index construction

61 views
Skip to first unread message

Daniel Galinkin

unread,
May 8, 2012, 5:58:20 PM5/8/12
to mongodb-user
Hello,

We have a problem with our application backed by MongoDB, we were
wondering if you could give us some hints as to how to proceed.

Our application is as follows: each document has a number of fields
that can be included in a query issued by the user. When the user
queries for the documents, he can either choose a specific value for a
field, or say he does not care what the value of this field is.

For example: suppose we have the fields "f1", "f2" and "f3". Each of
those can have 3 possible values: 1, 2 or 3. A possible query the user
can make is: find me the documents where f1=2, f2=1 and f3=2. Another
possible query is: find me the documents in which f1=2, f3=3 and the
value of f2 can be anything.

We are having some problems indexing this collection for this query.
We have many fields (10+), so making a compound index with all the
fields is too expensive, as the index gets way too big, and since we
do not use every field in the query every time, it is also inneficient
to use.

We are currently trying to implement a fulltext search-like solution,
in which we created a array field "a" in each document, that contains
"tags" in it representing the values of our fields. For example, a
document that has f1=1, f2=3 and f3=1 would have this array containg
the tags ["f1-1", "f2-3", "f3-1"]. Then, our idea was to use the $all
operator based on this. Using the previously mentioned example query:
find me the documents in which f1=2, f3=3 and the value of f2 can be
anything. This would be translated to {"a" : {$all : ["f1-2",
"f3-3"]}. We also indexed this field, following the suggestions listed
in http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo. When
making this query, using this index, the results were not good. Using
the explain() feature, we saw that the database is going through all
documents in which f1=2, and searches in them the ones that have f3=3.
For example, if there are 2000 documents with f1=2, and out of those
just 2 also have f3=3, mongo scans all the 2000 documents to return
just those 2.

We came across this jira ticket (https://jira.mongodb.org/browse/
SERVER-1000) that seems to be directly related to our problem, so it
seems mongo still does not support the solution we tried to use.

Do you guys have a suggestion of how to solve this problem?
Thanks in advance.

Kyle Banker

unread,
May 9, 2012, 1:55:08 PM5/9/12
to mongod...@googlegroups.com
First, I'd recommend commenting and voting and adding yourself as a watcher on SERVER-1000.

Another technique to try would be to store each field along with the document id it references in a separate collection. So, your entries would look like this:
{_id: ObjectId, d_id: ObjectId, k: "f1", v: 1}

Next, build the following compound index:
{k: 1, v: 1, d_id: 1}

Then you can do a query like this:
db.docs.find({$all: [{k: "f1", v: 1}]}, {_id: 0, d_id: 1})

This should do an index-only scan and return a set of document ids corresponding to your original document. I know this isn't ideal, but the probably the next best strategy until SERVER-1000 is fixed.

Daniel Galinkin

unread,
May 11, 2012, 8:17:16 AM5/11/12
to mongodb-user
Thanks for the answer!
I just voted for and began watching this ticket.

As for the suggestion, I didn't quite understand how this will enable
us to query for documents that match a set of criterias, such as f1=1
AND f2=3. Also, what's the advantage of using a separate collection
and index for this, instead of indexing the original collection?

Thanks once more for your trouble.
> > inhttp://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo. When
> > making this query, using this index, the results were not good. Using
> > the explain() feature, we saw that the database is going through all
> > documents in which f1=2, and searches in them the ones that have f3=3.
> > For example, if there are 2000 documents with f1=2, and  out of those
> > just 2 also have f3=3, mongo scans all the 2000 documents to return
> > just those 2.
>
> > We came across this jira ticket (https://jira.mongodb.org/browse/
> > SERVER-1000 <https://jira.mongodb.org/browse/SERVER-1000>) that seems to

Daniel Galinkin

unread,
May 11, 2012, 8:51:07 AM5/11/12
to mongodb-user
Just to record it here, we have tried another (failed) approach, that
was to use a binary mapping field.

Let's use the same example. Suppose we have thre fields, "f1", "f2"
and "f3", each of them with three possible values, 1, 2 and 3.

We would then map those values to bits.
1=01
2=10
3=11

And then, we built a new field in the documents, "b", that was a
binary string representing the values of the three fields ("f1", "f2",
"f3"). The first two bits represent the "f1" value, the next two
represent the "f2" value, and the last two represent the "f3" value.

A document that has "f1"=1, "f2"=3 and "f3"=2 would have the field
"b"=011110.

Our ideia was to index this field, and then we would have a simple
index on a single field.

This would work fine if the user always queried for a specific value
in all of the three fields. However, as we said in the first post, the
user can also ask for a "anything" value for a specific field, a
"don't care" value.

So, if the user asks for documents in which f1=1,f2=1 and f3=2, we
would just make a query like db.docs.find({b:'010110'}). However, if
this user then decides that f2 can be anything, we would have to
search for documents that match a certain bitmask. This operation is
only possible with a map reduce operation, which is expensive, as we
would have to scan the whole collection.

We then tried to expand the possibilities, and query using an $in
operation. If the user queries for documents in which f1=1, f2 can be
anything and f3=2, he would actually be querying for documents in
which "b" is either 010110, 011010 or 011110. However, our actual
application has 10+ fields, and since the user can set any number of
fields to 'anything', the number of combinations would get huge, and
the query would get too expensive again.

I just listed this hoping to make this problem clearer, and to record
our attempts to tackle it.

Kyle Banker

unread,
May 11, 2012, 9:49:59 AM5/11/12
to mongod...@googlegroups.com
The advantage of the approach I gave you is that you can do an index-only scan to resolve the query. That's pretty efficient. Have you tried it yet?

Daniel Galinkin

unread,
May 11, 2012, 10:32:59 AM5/11/12
to mongodb-user
Thanks again for the fast answer.

I still don't quite understand how this approach enables us to make
queries that have more than one field in them (f1=2 AND f3=1, for
example).

As for the index-only scan, I don't think that's possible in our case.
When formulating our problem, I tried to simplify it, and ommited some
aspects of it. One of the aspects in question is that we also limit
our results to a certain range based on another field, and we also
sort the results on this field. I believe it is not possible to
combine index-only searches with sorted queries limited by range, is
it? As said in your book MongoDB in Action we bought some time ago,
"The main thing to remember is that a compound-key index can
efficiently serve just a single range or sort per query".

Kyle Banker

unread,
May 15, 2012, 4:15:01 PM5/15/12
to mongod...@googlegroups.com
You can sort and limit on a range and have it be indexOnly. See here:

MongoDB shell version: 2.0.3
connecting to: test
> db.foo.save({a: 1})
> db.foo.save({a: 2})
> db.foo.save({a: 3})
> db.foo.save({a: 4})
> db.foo.save({a: 5})

> db.foo.find({a: {$gt: 1}}, {_id: 0, a: 1}).sort({a: -1}).limit(2).explain()
{
"cursor" : "BtreeCursor a_1 reverse",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : true,
"indexBounds" : {
"a" : [
[
1.7976931348623157e+308,
1
]
]
}
}

> db.foo.ensureIndex({a: 1})

The key is that the sort and range have to be on the same field. If they're on different fields, then
this won't work.

But if you're additionally querying on these other fields (e.g., f1, f2), then you're right: an index only query isn't possible.

Luiz Felipe Mendes

unread,
May 31, 2012, 2:00:13 PM5/31/12
to mongodb-user
Hello Kyle,

I am a developer in the same project as Daniel Galinkin. In that
specific case that my operation uses the same field sorting and in a
range query ( a ) and querying on other fields (e.g., f1, f2) what is
the best approach?

Let me show an example, there are 4 fields (Z,a,b,c) and we have to
get them using something like this:

"Z": ObjectId("someID"),
"a": {
"$gte": ISODate("2012-04-25T03: 00: 00.0Z"),
"$lte": ISODate("2012-05-03T02: 59: 59.0Z")
},
"b": {
"$in": {
"0": 0,
"1": 1,
"2": 2
}
},
"c": {
"$in": {
"0": 2,
"1": 0
}
}
},
"orderby": {
"a": -1
},

What is the best index for this case?

{ -a,Z,b,c }
{ Z,-a,b,c }
{ Z,b,c,-a }

Thanks
Reply all
Reply to author
Forward
0 new messages