Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Help with index construction
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  8 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Daniel Galinkin  
View profile  
 More options May 8 2012, 5:58 pm
From: Daniel Galinkin <danielgalin...@gmail.com>
Date: Tue, 8 May 2012 14:58:20 -0700 (PDT)
Local: Tues, May 8 2012 5:58 pm
Subject: Help with index construction
Hello,

We have a problem with our application backed by MongoDB, we were
wondering if you could give us some hints as to how to proceed.

Our application is as follows: each document has a number of fields
that can be included in a query issued by the user. When the user
queries for the documents, he can either choose a specific value for a
field, or say he does not care what the value of this field is.

For example: suppose we have the fields "f1", "f2" and "f3". Each of
those can have 3 possible values: 1, 2 or 3. A possible query the user
can make is: find me the documents where f1=2, f2=1 and f3=2. Another
possible query is: find me the documents in which f1=2, f3=3 and the
value of f2 can be anything.

We are having some problems indexing this collection for this query.
We have many fields (10+), so making a compound index with all the
fields is too expensive, as the index gets way too big, and since we
do not use every field in the query every time, it is also inneficient
to use.

We are currently trying to implement a fulltext search-like solution,
in which we created a array field "a" in each document, that contains
"tags" in it representing the values of our fields. For example, a
document that has f1=1, f2=3 and f3=1 would have this array containg
the tags ["f1-1", "f2-3", "f3-1"]. Then, our idea was to use the $all
operator based on this. Using the previously mentioned example query:
find me the documents in which f1=2, f3=3 and the value of f2 can be
anything. This would be translated to {"a" : {$all : ["f1-2",
"f3-3"]}. We also indexed this field, following the suggestions listed
in http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo. When
making this query, using this index, the results were not good. Using
the explain() feature, we saw that the database is going through all
documents in which f1=2, and searches in them the ones that have f3=3.
For example, if there are 2000 documents with f1=2, and  out of those
just 2 also have f3=3, mongo scans all the 2000 documents to return
just those 2.

We came across this jira ticket (https://jira.mongodb.org/browse/
SERVER-1000) that seems to be directly related to our problem, so it
seems mongo still does not support the solution we tried to use.

Do you guys have a suggestion of how to solve this problem?
Thanks in advance.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Kyle Banker  
View profile  
 More options May 9 2012, 1:55 pm
From: Kyle Banker <k...@10gen.com>
Date: Wed, 9 May 2012 10:55:08 -0700 (PDT)
Local: Wed, May 9 2012 1:55 pm
Subject: Re: Help with index construction

First, I'd recommend commenting and voting and adding yourself as a watcher
on SERVER-1000.

Another technique to try would be to store each field along with the
document id it references in a separate collection. So, your entries would
look like this:
{_id: ObjectId, d_id: ObjectId, k: "f1", v: 1}

Next, build the following compound index:
{k: 1, v: 1, d_id: 1}

Then you can do a query like this:
db.docs.find({$all: [{k: "f1", v: 1}]}, {_id: 0, d_id: 1})

This should do an index-only scan and return a set of document ids
corresponding to your original document. I know this isn't ideal, but the
probably the next best strategy until SERVER-1000 is fixed.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Daniel Galinkin  
View profile  
 More options May 11 2012, 8:17 am
From: Daniel Galinkin <danielgalin...@gmail.com>
Date: Fri, 11 May 2012 05:17:16 -0700 (PDT)
Local: Fri, May 11 2012 8:17 am
Subject: Re: Help with index construction
Thanks for the answer!
I just voted for and began watching this ticket.

As for the suggestion, I didn't quite understand how this will enable
us to query for documents that match a set of criterias, such as f1=1
AND f2=3. Also, what's the advantage of using a separate collection
and index for this, instead of indexing the original collection?

Thanks once more for your trouble.

On May 9, 2:55 pm, Kyle Banker <k...@10gen.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Daniel Galinkin  
View profile  
 More options May 11 2012, 8:51 am
From: Daniel Galinkin <danielgalin...@gmail.com>
Date: Fri, 11 May 2012 05:51:07 -0700 (PDT)
Local: Fri, May 11 2012 8:51 am
Subject: Re: Help with index construction
Just to record it here, we have tried another (failed) approach, that
was to use a binary mapping field.

Let's use the same example. Suppose we have thre fields, "f1", "f2"
and "f3", each of them with three possible values, 1, 2 and 3.

We would then map those values to bits.
1=01
2=10
3=11

And then, we built a new field in the documents, "b", that was a
binary string representing the values of the three fields ("f1", "f2",
"f3"). The first two bits represent the "f1" value, the next two
represent the "f2" value, and the last two represent the "f3" value.

A document that has "f1"=1, "f2"=3 and "f3"=2 would have the field
"b"=011110.

Our ideia was to index this field, and then we would have a simple
index on a single field.

This would work fine if the user always queried for a specific value
in all of the three fields. However, as we said in the first post, the
user can also ask for a "anything" value for a specific field, a
"don't care" value.

So, if the user asks for documents in which f1=1,f2=1 and f3=2, we
would just make a query like db.docs.find({b:'010110'}). However, if
this user then decides that f2 can be anything, we would have to
search for documents that match a certain bitmask. This operation is
only possible with a map reduce operation, which is expensive, as we
would have to scan the whole collection.

We then tried to expand the possibilities, and query using an $in
operation. If the user queries for documents in which f1=1, f2 can be
anything and f3=2, he would actually be querying for documents in
which "b" is either 010110, 011010 or 011110. However, our actual
application has 10+ fields, and since the user can set any number of
fields to 'anything', the number of combinations would get huge, and
the query would get too expensive again.

I just listed this hoping to make this problem clearer, and to record
our attempts to tackle it.

On May 11, 9:17 am, Daniel Galinkin <danielgalin...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Kyle Banker  
View profile  
 More options May 11 2012, 9:49 am
From: Kyle Banker <kyleban...@gmail.com>
Date: Fri, 11 May 2012 06:49:59 -0700 (PDT)
Local: Fri, May 11 2012 9:49 am
Subject: Re: Help with index construction

The advantage of the approach I gave you is that you can do an index-only
scan to resolve the query. That's pretty efficient. Have you tried it yet?


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Daniel Galinkin  
View profile  
 More options May 11 2012, 10:32 am
From: Daniel Galinkin <danielgalin...@gmail.com>
Date: Fri, 11 May 2012 07:32:59 -0700 (PDT)
Local: Fri, May 11 2012 10:32 am
Subject: Re: Help with index construction
Thanks again for the fast answer.

I still don't quite understand how this approach enables us to make
queries that have more than one field in them (f1=2 AND f3=1, for
example).

As for the index-only scan, I don't think that's possible in our case.
When formulating our problem, I tried to simplify it, and ommited some
aspects of it. One of the aspects in question is that we also limit
our results to a certain range based on another field, and we also
sort the results on this field. I believe it is not possible to
combine index-only searches with sorted queries limited by range, is
it? As said in your book MongoDB in Action we bought some time ago,
"The main thing to remember is that a compound-key index can
efficiently serve just a single range or sort per query".

On May 11, 10:49 am, Kyle Banker <kyleban...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Kyle Banker  
View profile  
 More options May 15 2012, 4:15 pm
From: Kyle Banker <kyleban...@gmail.com>
Date: Tue, 15 May 2012 13:15:01 -0700 (PDT)
Local: Tues, May 15 2012 4:15 pm
Subject: Re: Help with index construction

You can sort and limit on a range and have it be indexOnly. See here:

MongoDB shell version: 2.0.3
connecting to: test

> db.foo.save({a: 1})
> db.foo.save({a: 2})
> db.foo.save({a: 3})
> db.foo.save({a: 4})
> db.foo.save({a: 5})
> db.foo.find({a: {$gt: 1}}, {_id: 0, a: 1}).sort({a:

-1}).limit(2).explain()
{
"cursor" : "BtreeCursor a_1 reverse",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : true,
"indexBounds" : {
"a" : [
[
1.7976931348623157e+308,
1
]
]

}
}
> db.foo.ensureIndex({a: 1})

The key is that the sort and range have to be on the same field. If they're
on different fields, then
this won't work.

But if you're additionally querying on these other fields (e.g., f1, f2),
then you're right: an index only query isn't possible.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Luiz Felipe Mendes  
View profile  
 More options May 31 2012, 2:00 pm
From: Luiz Felipe Mendes <lfomen...@gmail.com>
Date: Thu, 31 May 2012 11:00:13 -0700 (PDT)
Local: Thurs, May 31 2012 2:00 pm
Subject: Re: Help with index construction
Hello Kyle,

  I am a developer in the same project as Daniel Galinkin. In that
specific case that my operation uses the same field sorting and in a
range query ( a ) and querying on other fields (e.g., f1, f2) what is
the best approach?

  Let me show an example, there are 4 fields (Z,a,b,c) and we have to
get them using something like this:

     "Z": ObjectId("someID"),
     "a": {
         "$gte": ISODate("2012-04-25T03: 00: 00.0Z"),
         "$lte": ISODate("2012-05-03T02: 59: 59.0Z")
      },
       "b": {
         "$in": {
           "0": 0,
           "1": 1,
           "2": 2
        }
      },
       "c": {
         "$in": {
           "0": 2,
           "1": 0
        }
      }
    },
     "orderby": {
       "a": -1
    },

What is the best index for this case?

 { -a,Z,b,c }
 { Z,-a,b,c }
 { Z,b,c,-a }

Thanks

On 15 Maio, 17:15, Kyle Banker <kyleban...@gmail.com> wrote:

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »