Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Message from discussion Help with index construction
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Kyle Banker  
View profile  
 More options May 11 2012, 9:49 am
From: Kyle Banker <kyleban...@gmail.com>
Date: Fri, 11 May 2012 06:49:59 -0700 (PDT)
Local: Fri, May 11 2012 9:49 am
Subject: Re: Help with index construction

The advantage of the approach I gave you is that you can do an index-only
scan to resolve the query. That's pretty efficient. Have you tried it yet?

On Friday, May 11, 2012 8:51:07 AM UTC-4, Daniel Galinkin wrote:

> Just to record it here, we have tried another (failed) approach, that
> was to use a binary mapping field.

> Let's use the same example. Suppose we have thre fields, "f1", "f2"
> and "f3", each of them with three possible values, 1, 2 and 3.

> We would then map those values to bits.
> 1=01
> 2=10
> 3=11

> And then, we built a new field in the documents, "b", that was a
> binary string representing the values of the three fields ("f1", "f2",
> "f3"). The first two bits represent the "f1" value, the next two
> represent the "f2" value, and the last two represent the "f3" value.

> A document that has "f1"=1, "f2"=3 and "f3"=2 would have the field
> "b"=011110.

> Our ideia was to index this field, and then we would have a simple
> index on a single field.

> This would work fine if the user always queried for a specific value
> in all of the three fields. However, as we said in the first post, the
> user can also ask for a "anything" value for a specific field, a
> "don't care" value.

> So, if the user asks for documents in which f1=1,f2=1 and f3=2, we
> would just make a query like db.docs.find({b:'010110'}). However, if
> this user then decides that f2 can be anything, we would have to
> search for documents that match a certain bitmask. This operation is
> only possible with a map reduce operation, which is expensive, as we
> would have to scan the whole collection.

> We then tried to expand the possibilities, and query using an $in
> operation. If the user queries for documents in which f1=1, f2 can be
> anything and f3=2, he would actually be querying for documents in
> which "b" is either 010110, 011010 or 011110. However, our actual
> application has 10+ fields, and since the user can set any number of
> fields to 'anything', the number of combinations would get huge, and
> the query would get too expensive again.

> I just listed this hoping to make this problem clearer, and to record
> our attempts to tackle it.

> On May 11, 9:17 am, Daniel Galinkin <danielgalin...@gmail.com> wrote:
> > Thanks for the answer!
> > I just voted for and began watching this ticket.

> > As for the suggestion, I didn't quite understand how this will enable
> > us to query for documents that match a set of criterias, such as f1=1
> > AND f2=3. Also, what's the advantage of using a separate collection
> > and index for this, instead of indexing the original collection?

> > Thanks once more for your trouble.

> > On May 9, 2:55 pm, Kyle Banker <k...@10gen.com> wrote:

> > > First, I'd recommend commenting and voting and adding yourself as a
> watcher
> > > on SERVER-1000.

> > > Another technique to try would be to store each field along with the
> > > document id it references in a separate collection. So, your entries
> would
> > > look like this:
> > > {_id: ObjectId, d_id: ObjectId, k: "f1", v: 1}

> > > Next, build the following compound index:
> > > {k: 1, v: 1, d_id: 1}

> > > Then you can do a query like this:
> > > db.docs.find({$all: [{k: "f1", v: 1}]}, {_id: 0, d_id: 1})

> > > This should do an index-only scan and return a set of document ids
> > > corresponding to your original document. I know this isn't ideal, but
> the
> > > probably the next best strategy until SERVER-1000 is fixed.

> > > On Tuesday, May 8, 2012 5:58:20 PM UTC-4, Daniel Galinkin wrote:

> > > > Hello,

> > > > We have a problem with our application backed by MongoDB, we were
> > > > wondering if you could give us some hints as to how to proceed.

> > > > Our application is as follows: each document has a number of fields
> > > > that can be included in a query issued by the user. When the user
> > > > queries for the documents, he can either choose a specific value for
> a
> > > > field, or say he does not care what the value of this field is.

> > > > For example: suppose we have the fields "f1", "f2" and "f3". Each of
> > > > those can have 3 possible values: 1, 2 or 3. A possible query the
> user
> > > > can make is: find me the documents where f1=2, f2=1 and f3=2.
> Another
> > > > possible query is: find me the documents in which f1=2, f3=3 and the
> > > > value of f2 can be anything.

> > > > We are having some problems indexing this collection for this query.
> > > > We have many fields (10+), so making a compound index with all the
> > > > fields is too expensive, as the index gets way too big, and since we
> > > > do not use every field in the query every time, it is also
> inneficient
> > > > to use.

> > > > We are currently trying to implement a fulltext search-like
> solution,
> > > > in which we created a array field "a" in each document, that
> contains
> > > > "tags" in it representing the values of our fields. For example, a
> > > > document that has f1=1, f2=3 and f3=1 would have this array containg
> > > > the tags ["f1-1", "f2-3", "f3-1"]. Then, our idea was to use the
> $all
> > > > operator based on this. Using the previously mentioned example
> query:
> > > > find me the documents in which f1=2, f3=3 and the value of f2 can be
> > > > anything. This would be translated to {"a" : {$all : ["f1-2",
> > > > "f3-3"]}. We also indexed this field, following the suggestions
> listed
> > > > inhttp://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo.
> When
> > > > making this query, using this index, the results were not good.
> Using
> > > > the explain() feature, we saw that the database is going through all
> > > > documents in which f1=2, and searches in them the ones that have
> f3=3.
> > > > For example, if there are 2000 documents with f1=2, and  out of
> those
> > > > just 2 also have f3=3, mongo scans all the 2000 documents to return
> > > > just those 2.

> > > > We came across this jira ticket (https://jira.mongodb.org/browse/
> > > > SERVER-1000 <https://jira.mongodb.org/browse/SERVER-1000>) that
> seems to
> > > > be directly related to our problem, so it
> > > > seems mongo still does not support the solution we tried to use.

> > > > Do you guys have a suggestion of how to solve this problem?
> > > > Thanks in advance.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.