We have a problem with our application backed by MongoDB, we were
wondering if you could give us some hints as to how to proceed.
Our application is as follows: each document has a number of fields
that can be included in a query issued by the user. When the user
queries for the documents, he can either choose a specific value for a
field, or say he does not care what the value of this field is.
For example: suppose we have the fields "f1", "f2" and "f3". Each of
those can have 3 possible values: 1, 2 or 3. A possible query the user
can make is: find me the documents where f1=2, f2=1 and f3=2. Another
possible query is: find me the documents in which f1=2, f3=3 and the
value of f2 can be anything.
We are having some problems indexing this collection for this query.
We have many fields (10+), so making a compound index with all the
fields is too expensive, as the index gets way too big, and since we
do not use every field in the query every time, it is also inneficient
to use.
We are currently trying to implement a fulltext search-like solution,
in which we created a array field "a" in each document, that contains
"tags" in it representing the values of our fields. For example, a
document that has f1=1, f2=3 and f3=1 would have this array containg
the tags ["f1-1", "f2-3", "f3-1"]. Then, our idea was to use the $all
operator based on this. Using the previously mentioned example query:
find me the documents in which f1=2, f3=3 and the value of f2 can be
anything. This would be translated to {"a" : {$all : ["f1-2",
"f3-3"]}. We also indexed this field, following the suggestions listed
in http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo. When
making this query, using this index, the results were not good. Using
the explain() feature, we saw that the database is going through all
documents in which f1=2, and searches in them the ones that have f3=3.
For example, if there are 2000 documents with f1=2, and out of those
just 2 also have f3=3, mongo scans all the 2000 documents to return
just those 2.
We came across this jira ticket (https://jira.mongodb.org/browse/ SERVER-1000) that seems to be directly related to our problem, so it
seems mongo still does not support the solution we tried to use.
Do you guys have a suggestion of how to solve this problem?
Thanks in advance.
First, I'd recommend commenting and voting and adding yourself as a watcher on SERVER-1000.
Another technique to try would be to store each field along with the document id it references in a separate collection. So, your entries would look like this: {_id: ObjectId, d_id: ObjectId, k: "f1", v: 1}
Next, build the following compound index: {k: 1, v: 1, d_id: 1}
Then you can do a query like this: db.docs.find({$all: [{k: "f1", v: 1}]}, {_id: 0, d_id: 1})
This should do an index-only scan and return a set of document ids corresponding to your original document. I know this isn't ideal, but the probably the next best strategy until SERVER-1000 is fixed.
On Tuesday, May 8, 2012 5:58:20 PM UTC-4, Daniel Galinkin wrote:
> Hello,
> We have a problem with our application backed by MongoDB, we were > wondering if you could give us some hints as to how to proceed.
> Our application is as follows: each document has a number of fields > that can be included in a query issued by the user. When the user > queries for the documents, he can either choose a specific value for a > field, or say he does not care what the value of this field is.
> For example: suppose we have the fields "f1", "f2" and "f3". Each of > those can have 3 possible values: 1, 2 or 3. A possible query the user > can make is: find me the documents where f1=2, f2=1 and f3=2. Another > possible query is: find me the documents in which f1=2, f3=3 and the > value of f2 can be anything.
> We are having some problems indexing this collection for this query. > We have many fields (10+), so making a compound index with all the > fields is too expensive, as the index gets way too big, and since we > do not use every field in the query every time, it is also inneficient > to use.
> We are currently trying to implement a fulltext search-like solution, > in which we created a array field "a" in each document, that contains > "tags" in it representing the values of our fields. For example, a > document that has f1=1, f2=3 and f3=1 would have this array containg > the tags ["f1-1", "f2-3", "f3-1"]. Then, our idea was to use the $all > operator based on this. Using the previously mentioned example query: > find me the documents in which f1=2, f3=3 and the value of f2 can be > anything. This would be translated to {"a" : {$all : ["f1-2", > "f3-3"]}. We also indexed this field, following the suggestions listed > in http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo. When > making this query, using this index, the results were not good. Using > the explain() feature, we saw that the database is going through all > documents in which f1=2, and searches in them the ones that have f3=3. > For example, if there are 2000 documents with f1=2, and out of those > just 2 also have f3=3, mongo scans all the 2000 documents to return > just those 2.
Thanks for the answer!
I just voted for and began watching this ticket.
As for the suggestion, I didn't quite understand how this will enable
us to query for documents that match a set of criterias, such as f1=1
AND f2=3. Also, what's the advantage of using a separate collection
and index for this, instead of indexing the original collection?
Thanks once more for your trouble.
On May 9, 2:55 pm, Kyle Banker <k...@10gen.com> wrote:
> First, I'd recommend commenting and voting and adding yourself as a watcher
> on SERVER-1000.
> Another technique to try would be to store each field along with the
> document id it references in a separate collection. So, your entries would
> look like this:
> {_id: ObjectId, d_id: ObjectId, k: "f1", v: 1}
> Next, build the following compound index:
> {k: 1, v: 1, d_id: 1}
> Then you can do a query like this:
> db.docs.find({$all: [{k: "f1", v: 1}]}, {_id: 0, d_id: 1})
> This should do an index-only scan and return a set of document ids
> corresponding to your original document. I know this isn't ideal, but the
> probably the next best strategy until SERVER-1000 is fixed.
> On Tuesday, May 8, 2012 5:58:20 PM UTC-4, Daniel Galinkin wrote:
> > Hello,
> > We have a problem with our application backed by MongoDB, we were
> > wondering if you could give us some hints as to how to proceed.
> > Our application is as follows: each document has a number of fields
> > that can be included in a query issued by the user. When the user
> > queries for the documents, he can either choose a specific value for a
> > field, or say he does not care what the value of this field is.
> > For example: suppose we have the fields "f1", "f2" and "f3". Each of
> > those can have 3 possible values: 1, 2 or 3. A possible query the user
> > can make is: find me the documents where f1=2, f2=1 and f3=2. Another
> > possible query is: find me the documents in which f1=2, f3=3 and the
> > value of f2 can be anything.
> > We are having some problems indexing this collection for this query.
> > We have many fields (10+), so making a compound index with all the
> > fields is too expensive, as the index gets way too big, and since we
> > do not use every field in the query every time, it is also inneficient
> > to use.
> > We are currently trying to implement a fulltext search-like solution,
> > in which we created a array field "a" in each document, that contains
> > "tags" in it representing the values of our fields. For example, a
> > document that has f1=1, f2=3 and f3=1 would have this array containg
> > the tags ["f1-1", "f2-3", "f3-1"]. Then, our idea was to use the $all
> > operator based on this. Using the previously mentioned example query:
> > find me the documents in which f1=2, f3=3 and the value of f2 can be
> > anything. This would be translated to {"a" : {$all : ["f1-2",
> > "f3-3"]}. We also indexed this field, following the suggestions listed
> > inhttp://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo. When
> > making this query, using this index, the results were not good. Using
> > the explain() feature, we saw that the database is going through all
> > documents in which f1=2, and searches in them the ones that have f3=3.
> > For example, if there are 2000 documents with f1=2, and out of those
> > just 2 also have f3=3, mongo scans all the 2000 documents to return
> > just those 2.
Just to record it here, we have tried another (failed) approach, that
was to use a binary mapping field.
Let's use the same example. Suppose we have thre fields, "f1", "f2"
and "f3", each of them with three possible values, 1, 2 and 3.
We would then map those values to bits.
1=01
2=10
3=11
And then, we built a new field in the documents, "b", that was a
binary string representing the values of the three fields ("f1", "f2",
"f3"). The first two bits represent the "f1" value, the next two
represent the "f2" value, and the last two represent the "f3" value.
A document that has "f1"=1, "f2"=3 and "f3"=2 would have the field
"b"=011110.
Our ideia was to index this field, and then we would have a simple
index on a single field.
This would work fine if the user always queried for a specific value
in all of the three fields. However, as we said in the first post, the
user can also ask for a "anything" value for a specific field, a
"don't care" value.
So, if the user asks for documents in which f1=1,f2=1 and f3=2, we
would just make a query like db.docs.find({b:'010110'}). However, if
this user then decides that f2 can be anything, we would have to
search for documents that match a certain bitmask. This operation is
only possible with a map reduce operation, which is expensive, as we
would have to scan the whole collection.
We then tried to expand the possibilities, and query using an $in
operation. If the user queries for documents in which f1=1, f2 can be
anything and f3=2, he would actually be querying for documents in
which "b" is either 010110, 011010 or 011110. However, our actual
application has 10+ fields, and since the user can set any number of
fields to 'anything', the number of combinations would get huge, and
the query would get too expensive again.
I just listed this hoping to make this problem clearer, and to record
our attempts to tackle it.
On May 11, 9:17 am, Daniel Galinkin <danielgalin...@gmail.com> wrote:
> Thanks for the answer!
> I just voted for and began watching this ticket.
> As for the suggestion, I didn't quite understand how this will enable
> us to query for documents that match a set of criterias, such as f1=1
> AND f2=3. Also, what's the advantage of using a separate collection
> and index for this, instead of indexing the original collection?
> Thanks once more for your trouble.
> On May 9, 2:55 pm, Kyle Banker <k...@10gen.com> wrote:
> > First, I'd recommend commenting and voting and adding yourself as a watcher
> > on SERVER-1000.
> > Another technique to try would be to store each field along with the
> > document id it references in a separate collection. So, your entries would
> > look like this:
> > {_id: ObjectId, d_id: ObjectId, k: "f1", v: 1}
> > Next, build the following compound index:
> > {k: 1, v: 1, d_id: 1}
> > Then you can do a query like this:
> > db.docs.find({$all: [{k: "f1", v: 1}]}, {_id: 0, d_id: 1})
> > This should do an index-only scan and return a set of document ids
> > corresponding to your original document. I know this isn't ideal, but the
> > probably the next best strategy until SERVER-1000 is fixed.
> > On Tuesday, May 8, 2012 5:58:20 PM UTC-4, Daniel Galinkin wrote:
> > > Hello,
> > > We have a problem with our application backed by MongoDB, we were
> > > wondering if you could give us some hints as to how to proceed.
> > > Our application is as follows: each document has a number of fields
> > > that can be included in a query issued by the user. When the user
> > > queries for the documents, he can either choose a specific value for a
> > > field, or say he does not care what the value of this field is.
> > > For example: suppose we have the fields "f1", "f2" and "f3". Each of
> > > those can have 3 possible values: 1, 2 or 3. A possible query the user
> > > can make is: find me the documents where f1=2, f2=1 and f3=2. Another
> > > possible query is: find me the documents in which f1=2, f3=3 and the
> > > value of f2 can be anything.
> > > We are having some problems indexing this collection for this query.
> > > We have many fields (10+), so making a compound index with all the
> > > fields is too expensive, as the index gets way too big, and since we
> > > do not use every field in the query every time, it is also inneficient
> > > to use.
> > > We are currently trying to implement a fulltext search-like solution,
> > > in which we created a array field "a" in each document, that contains
> > > "tags" in it representing the values of our fields. For example, a
> > > document that has f1=1, f2=3 and f3=1 would have this array containg
> > > the tags ["f1-1", "f2-3", "f3-1"]. Then, our idea was to use the $all
> > > operator based on this. Using the previously mentioned example query:
> > > find me the documents in which f1=2, f3=3 and the value of f2 can be
> > > anything. This would be translated to {"a" : {$all : ["f1-2",
> > > "f3-3"]}. We also indexed this field, following the suggestions listed
> > > inhttp://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo. When
> > > making this query, using this index, the results were not good. Using
> > > the explain() feature, we saw that the database is going through all
> > > documents in which f1=2, and searches in them the ones that have f3=3.
> > > For example, if there are 2000 documents with f1=2, and out of those
> > > just 2 also have f3=3, mongo scans all the 2000 documents to return
> > > just those 2.
On Friday, May 11, 2012 8:51:07 AM UTC-4, Daniel Galinkin wrote:
> Just to record it here, we have tried another (failed) approach, that > was to use a binary mapping field.
> Let's use the same example. Suppose we have thre fields, "f1", "f2" > and "f3", each of them with three possible values, 1, 2 and 3.
> We would then map those values to bits. > 1=01 > 2=10 > 3=11
> And then, we built a new field in the documents, "b", that was a > binary string representing the values of the three fields ("f1", "f2", > "f3"). The first two bits represent the "f1" value, the next two > represent the "f2" value, and the last two represent the "f3" value.
> A document that has "f1"=1, "f2"=3 and "f3"=2 would have the field > "b"=011110.
> Our ideia was to index this field, and then we would have a simple > index on a single field.
> This would work fine if the user always queried for a specific value > in all of the three fields. However, as we said in the first post, the > user can also ask for a "anything" value for a specific field, a > "don't care" value.
> So, if the user asks for documents in which f1=1,f2=1 and f3=2, we > would just make a query like db.docs.find({b:'010110'}). However, if > this user then decides that f2 can be anything, we would have to > search for documents that match a certain bitmask. This operation is > only possible with a map reduce operation, which is expensive, as we > would have to scan the whole collection.
> We then tried to expand the possibilities, and query using an $in > operation. If the user queries for documents in which f1=1, f2 can be > anything and f3=2, he would actually be querying for documents in > which "b" is either 010110, 011010 or 011110. However, our actual > application has 10+ fields, and since the user can set any number of > fields to 'anything', the number of combinations would get huge, and > the query would get too expensive again.
> I just listed this hoping to make this problem clearer, and to record > our attempts to tackle it.
> On May 11, 9:17 am, Daniel Galinkin <danielgalin...@gmail.com> wrote: > > Thanks for the answer! > > I just voted for and began watching this ticket.
> > As for the suggestion, I didn't quite understand how this will enable > > us to query for documents that match a set of criterias, such as f1=1 > > AND f2=3. Also, what's the advantage of using a separate collection > > and index for this, instead of indexing the original collection?
> > Thanks once more for your trouble.
> > On May 9, 2:55 pm, Kyle Banker <k...@10gen.com> wrote:
> > > First, I'd recommend commenting and voting and adding yourself as a > watcher > > > on SERVER-1000.
> > > Another technique to try would be to store each field along with the > > > document id it references in a separate collection. So, your entries > would > > > look like this: > > > {_id: ObjectId, d_id: ObjectId, k: "f1", v: 1}
> > > Next, build the following compound index: > > > {k: 1, v: 1, d_id: 1}
> > > Then you can do a query like this: > > > db.docs.find({$all: [{k: "f1", v: 1}]}, {_id: 0, d_id: 1})
> > > This should do an index-only scan and return a set of document ids > > > corresponding to your original document. I know this isn't ideal, but > the > > > probably the next best strategy until SERVER-1000 is fixed.
> > > On Tuesday, May 8, 2012 5:58:20 PM UTC-4, Daniel Galinkin wrote:
> > > > Hello,
> > > > We have a problem with our application backed by MongoDB, we were > > > > wondering if you could give us some hints as to how to proceed.
> > > > Our application is as follows: each document has a number of fields > > > > that can be included in a query issued by the user. When the user > > > > queries for the documents, he can either choose a specific value for > a > > > > field, or say he does not care what the value of this field is.
> > > > For example: suppose we have the fields "f1", "f2" and "f3". Each of > > > > those can have 3 possible values: 1, 2 or 3. A possible query the > user > > > > can make is: find me the documents where f1=2, f2=1 and f3=2. > Another > > > > possible query is: find me the documents in which f1=2, f3=3 and the > > > > value of f2 can be anything.
> > > > We are having some problems indexing this collection for this query. > > > > We have many fields (10+), so making a compound index with all the > > > > fields is too expensive, as the index gets way too big, and since we > > > > do not use every field in the query every time, it is also > inneficient > > > > to use.
> > > > We are currently trying to implement a fulltext search-like > solution, > > > > in which we created a array field "a" in each document, that > contains > > > > "tags" in it representing the values of our fields. For example, a > > > > document that has f1=1, f2=3 and f3=1 would have this array containg > > > > the tags ["f1-1", "f2-3", "f3-1"]. Then, our idea was to use the > $all > > > > operator based on this. Using the previously mentioned example > query: > > > > find me the documents in which f1=2, f3=3 and the value of f2 can be > > > > anything. This would be translated to {"a" : {$all : ["f1-2", > > > > "f3-3"]}. We also indexed this field, following the suggestions > listed > > > > inhttp://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo. > When > > > > making this query, using this index, the results were not good. > Using > > > > the explain() feature, we saw that the database is going through all > > > > documents in which f1=2, and searches in them the ones that have > f3=3. > > > > For example, if there are 2000 documents with f1=2, and out of > those > > > > just 2 also have f3=3, mongo scans all the 2000 documents to return > > > > just those 2.
I still don't quite understand how this approach enables us to make
queries that have more than one field in them (f1=2 AND f3=1, for
example).
As for the index-only scan, I don't think that's possible in our case.
When formulating our problem, I tried to simplify it, and ommited some
aspects of it. One of the aspects in question is that we also limit
our results to a certain range based on another field, and we also
sort the results on this field. I believe it is not possible to
combine index-only searches with sorted queries limited by range, is
it? As said in your book MongoDB in Action we bought some time ago,
"The main thing to remember is that a compound-key index can
efficiently serve just a single range or sort per query".
On May 11, 10:49 am, Kyle Banker <kyleban...@gmail.com> wrote:
> The advantage of the approach I gave you is that you can do an index-only
> scan to resolve the query. That's pretty efficient. Have you tried it yet?
> On Friday, May 11, 2012 8:51:07 AM UTC-4, Daniel Galinkin wrote:
> > Just to record it here, we have tried another (failed) approach, that
> > was to use a binary mapping field.
> > Let's use the same example. Suppose we have thre fields, "f1", "f2"
> > and "f3", each of them with three possible values, 1, 2 and 3.
> > We would then map those values to bits.
> > 1=01
> > 2=10
> > 3=11
> > And then, we built a new field in the documents, "b", that was a
> > binary string representing the values of the three fields ("f1", "f2",
> > "f3"). The first two bits represent the "f1" value, the next two
> > represent the "f2" value, and the last two represent the "f3" value.
> > A document that has "f1"=1, "f2"=3 and "f3"=2 would have the field
> > "b"=011110.
> > Our ideia was to index this field, and then we would have a simple
> > index on a single field.
> > This would work fine if the user always queried for a specific value
> > in all of the three fields. However, as we said in the first post, the
> > user can also ask for a "anything" value for a specific field, a
> > "don't care" value.
> > So, if the user asks for documents in which f1=1,f2=1 and f3=2, we
> > would just make a query like db.docs.find({b:'010110'}). However, if
> > this user then decides that f2 can be anything, we would have to
> > search for documents that match a certain bitmask. This operation is
> > only possible with a map reduce operation, which is expensive, as we
> > would have to scan the whole collection.
> > We then tried to expand the possibilities, and query using an $in
> > operation. If the user queries for documents in which f1=1, f2 can be
> > anything and f3=2, he would actually be querying for documents in
> > which "b" is either 010110, 011010 or 011110. However, our actual
> > application has 10+ fields, and since the user can set any number of
> > fields to 'anything', the number of combinations would get huge, and
> > the query would get too expensive again.
> > I just listed this hoping to make this problem clearer, and to record
> > our attempts to tackle it.
> > On May 11, 9:17 am, Daniel Galinkin <danielgalin...@gmail.com> wrote:
> > > Thanks for the answer!
> > > I just voted for and began watching this ticket.
> > > As for the suggestion, I didn't quite understand how this will enable
> > > us to query for documents that match a set of criterias, such as f1=1
> > > AND f2=3. Also, what's the advantage of using a separate collection
> > > and index for this, instead of indexing the original collection?
> > > Thanks once more for your trouble.
> > > On May 9, 2:55 pm, Kyle Banker <k...@10gen.com> wrote:
> > > > First, I'd recommend commenting and voting and adding yourself as a
> > watcher
> > > > on SERVER-1000.
> > > > Another technique to try would be to store each field along with the
> > > > document id it references in a separate collection. So, your entries
> > would
> > > > look like this:
> > > > {_id: ObjectId, d_id: ObjectId, k: "f1", v: 1}
> > > > Next, build the following compound index:
> > > > {k: 1, v: 1, d_id: 1}
> > > > Then you can do a query like this:
> > > > db.docs.find({$all: [{k: "f1", v: 1}]}, {_id: 0, d_id: 1})
> > > > This should do an index-only scan and return a set of document ids
> > > > corresponding to your original document. I know this isn't ideal, but
> > the
> > > > probably the next best strategy until SERVER-1000 is fixed.
> > > > On Tuesday, May 8, 2012 5:58:20 PM UTC-4, Daniel Galinkin wrote:
> > > > > Hello,
> > > > > We have a problem with our application backed by MongoDB, we were
> > > > > wondering if you could give us some hints as to how to proceed.
> > > > > Our application is as follows: each document has a number of fields
> > > > > that can be included in a query issued by the user. When the user
> > > > > queries for the documents, he can either choose a specific value for
> > a
> > > > > field, or say he does not care what the value of this field is.
> > > > > For example: suppose we have the fields "f1", "f2" and "f3". Each of
> > > > > those can have 3 possible values: 1, 2 or 3. A possible query the
> > user
> > > > > can make is: find me the documents where f1=2, f2=1 and f3=2.
> > Another
> > > > > possible query is: find me the documents in which f1=2, f3=3 and the
> > > > > value of f2 can be anything.
> > > > > We are having some problems indexing this collection for this query.
> > > > > We have many fields (10+), so making a compound index with all the
> > > > > fields is too expensive, as the index gets way too big, and since we
> > > > > do not use every field in the query every time, it is also
> > inneficient
> > > > > to use.
> > > > > We are currently trying to implement a fulltext search-like
> > solution,
> > > > > in which we created a array field "a" in each document, that
> > contains
> > > > > "tags" in it representing the values of our fields. For example, a
> > > > > document that has f1=1, f2=3 and f3=1 would have this array containg
> > > > > the tags ["f1-1", "f2-3", "f3-1"]. Then, our idea was to use the
> > $all
> > > > > operator based on this. Using the previously mentioned example
> > query:
> > > > > find me the documents in which f1=2, f3=3 and the value of f2 can be
> > > > > anything. This would be translated to {"a" : {$all : ["f1-2",
> > > > > "f3-3"]}. We also indexed this field, following the suggestions
> > listed
> > > > > inhttp://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo.
> > When
> > > > > making this query, using this index, the results were not good.
> > Using
> > > > > the explain() feature, we saw that the database is going through all
> > > > > documents in which f1=2, and searches in them the ones that have
> > f3=3.
> > > > > For example, if there are 2000 documents with f1=2, and out of
> > those
> > > > > just 2 also have f3=3, mongo scans all the 2000 documents to return
> > > > > just those 2.
On Friday, May 11, 2012 10:32:59 AM UTC-4, Daniel Galinkin wrote:
> Thanks again for the fast answer.
> I still don't quite understand how this approach enables us to make > queries that have more than one field in them (f1=2 AND f3=1, for > example).
> As for the index-only scan, I don't think that's possible in our case. > When formulating our problem, I tried to simplify it, and ommited some > aspects of it. One of the aspects in question is that we also limit > our results to a certain range based on another field, and we also > sort the results on this field. I believe it is not possible to > combine index-only searches with sorted queries limited by range, is > it? As said in your book MongoDB in Action we bought some time ago, > "The main thing to remember is that a compound-key index can > efficiently serve just a single range or sort per query".
> On May 11, 10:49 am, Kyle Banker <kyleban...@gmail.com> wrote: > > The advantage of the approach I gave you is that you can do an > index-only > > scan to resolve the query. That's pretty efficient. Have you tried it > yet?
> > On Friday, May 11, 2012 8:51:07 AM UTC-4, Daniel Galinkin wrote:
> > > Just to record it here, we have tried another (failed) approach, that > > > was to use a binary mapping field.
> > > Let's use the same example. Suppose we have thre fields, "f1", "f2" > > > and "f3", each of them with three possible values, 1, 2 and 3.
> > > We would then map those values to bits. > > > 1=01 > > > 2=10 > > > 3=11
> > > And then, we built a new field in the documents, "b", that was a > > > binary string representing the values of the three fields ("f1", "f2", > > > "f3"). The first two bits represent the "f1" value, the next two > > > represent the "f2" value, and the last two represent the "f3" value.
> > > A document that has "f1"=1, "f2"=3 and "f3"=2 would have the field > > > "b"=011110.
> > > Our ideia was to index this field, and then we would have a simple > > > index on a single field.
> > > This would work fine if the user always queried for a specific value > > > in all of the three fields. However, as we said in the first post, the > > > user can also ask for a "anything" value for a specific field, a > > > "don't care" value.
> > > So, if the user asks for documents in which f1=1,f2=1 and f3=2, we > > > would just make a query like db.docs.find({b:'010110'}). However, if > > > this user then decides that f2 can be anything, we would have to > > > search for documents that match a certain bitmask. This operation is > > > only possible with a map reduce operation, which is expensive, as we > > > would have to scan the whole collection.
> > > We then tried to expand the possibilities, and query using an $in > > > operation. If the user queries for documents in which f1=1, f2 can be > > > anything and f3=2, he would actually be querying for documents in > > > which "b" is either 010110, 011010 or 011110. However, our actual > > > application has 10+ fields, and since the user can set any number of > > > fields to 'anything', the number of combinations would get huge, and > > > the query would get too expensive again.
> > > I just listed this hoping to make this problem clearer, and to record > > > our attempts to tackle it.
> > > On May 11, 9:17 am, Daniel Galinkin <danielgalin...@gmail.com> wrote: > > > > Thanks for the answer! > > > > I just voted for and began watching this ticket.
> > > > As for the suggestion, I didn't quite understand how this will > enable > > > > us to query for documents that match a set of criterias, such as > f1=1 > > > > AND f2=3. Also, what's the advantage of using a separate collection > > > > and index for this, instead of indexing the original collection?
> > > > Thanks once more for your trouble.
> > > > On May 9, 2:55 pm, Kyle Banker <k...@10gen.com> wrote:
> > > > > First, I'd recommend commenting and voting and adding yourself as > a > > > watcher > > > > > on SERVER-1000.
> > > > > Another technique to try would be to store each field along with > the > > > > > document id it references in a separate collection. So, your > entries > > > would > > > > > look like this: > > > > > {_id: ObjectId, d_id: ObjectId, k: "f1", v: 1}
> > > > > Then you can do a query like this: > > > > > db.docs.find({$all: [{k: "f1", v: 1}]}, {_id: 0, d_id: 1})
> > > > > This should do an index-only scan and return a set of document ids > > > > > corresponding to your original document. I know this isn't ideal, > but > > > the > > > > > probably the next best strategy until SERVER-1000 is fixed.
> > > > > On Tuesday, May 8, 2012 5:58:20 PM UTC-4, Daniel Galinkin wrote:
> > > > > > Hello,
> > > > > > We have a problem with our application backed by MongoDB, we > were > > > > > > wondering if you could give us some hints as to how to proceed.
> > > > > > Our application is as follows: each document has a number of > fields > > > > > > that can be included in a query issued by the user. When the > user > > > > > > queries for the documents, he can either choose a specific value > for > > > a > > > > > > field, or say he does not care what the value of this field is.
> > > > > > For example: suppose we have the fields "f1", "f2" and "f3". > Each of > > > > > > those can have 3 possible values: 1, 2 or 3. A possible query > the > > > user > > > > > > can make is: find me the documents where f1=2, f2=1 and f3=2. > > > Another > > > > > > possible query is: find me the documents in which f1=2, f3=3 and > the > > > > > > value of f2 can be anything.
> > > > > > We are having some problems indexing this collection for this > query. > > > > > > We have many fields (10+), so making a compound index with all > the > > > > > > fields is too expensive, as the index gets way too big, and > since we > > > > > > do not use every field in the query every time, it is also > > > inneficient > > > > > > to use.
> > > > > > We are currently trying to implement a fulltext search-like > > > solution, > > > > > > in which we created a array field "a" in each document, that > > > contains > > > > > > "tags" in it representing the values of our fields. For example, > a > > > > > > document that has f1=1, f2=3 and f3=1 would have this array > containg > > > > > > the tags ["f1-1", "f2-3", "f3-1"]. Then, our idea was to use the > > > $all > > > > > > operator based on this. Using the previously mentioned example > > > query: > > > > > > find me the documents in which f1=2, f3=3 and the value of f2 > can be > > > > > > anything. This would be translated to {"a" : {$all : ["f1-2", > > > > > > "f3-3"]}. We also indexed this field, following the suggestions > > > listed > > > > > > inhttp://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo.
> > > When > > > > > > making this query, using this index, the results were not good. > > > Using > > > > > > the explain() feature, we saw that the database is going through > all > > > > > > documents in which f1=2, and searches in them the ones that have > > > f3=3. > > > > > > For example, if there are 2000 documents with f1=2, and out of > > > those > > > > > > just 2 also have f3=3, mongo scans all the 2000 documents to > return > > > > > > just those 2.
> > > > > > We came across this jira ticket ( > https://jira.mongodb.org/browse/ > > > > > > SERVER-1000 <https://jira.mongodb.org/browse/SERVER-1000>) that > > > seems to > > > > > > be directly related to our problem, so it > > > > > > seems mongo still does not support the solution we tried to use.
> > > > > > Do you guys have a suggestion of how to solve this problem? > > > > > > Thanks in advance.
I am a developer in the same project as Daniel Galinkin. In that
specific case that my operation uses the same field sorting and in a
range query ( a ) and querying on other fields (e.g., f1, f2) what is
the best approach?
Let me show an example, there are 4 fields (Z,a,b,c) and we have to
get them using something like this:
> The key is that the sort and range have to be on the same field. If they're
> on different fields, then
> this won't work.
> But if you're additionally querying on these other fields (e.g., f1, f2),
> then you're right: an index only query isn't possible.
> On Friday, May 11, 2012 10:32:59 AM UTC-4, Daniel Galinkin wrote:
> > Thanks again for the fast answer.
> > I still don't quite understand how this approach enables us to make
> > queries that have more than one field in them (f1=2 AND f3=1, for
> > example).
> > As for the index-only scan, I don't think that's possible in our case.
> > When formulating our problem, I tried to simplify it, and ommited some
> > aspects of it. One of the aspects in question is that we also limit
> > our results to a certain range based on another field, and we also
> > sort the results on this field. I believe it is not possible to
> > combine index-only searches with sorted queries limited by range, is
> > it? As said in your book MongoDB in Action we bought some time ago,
> > "The main thing to remember is that a compound-key index can
> > efficiently serve just a single range or sort per query".
> > On May 11, 10:49 am, Kyle Banker <kyleban...@gmail.com> wrote:
> > > The advantage of the approach I gave you is that you can do an
> > index-only
> > > scan to resolve the query. That's pretty efficient. Have you tried it
> > yet?
> > > On Friday, May 11, 2012 8:51:07 AM UTC-4, Daniel Galinkin wrote:
> > > > Just to record it here, we have tried another (failed) approach, that
> > > > was to use a binary mapping field.
> > > > Let's use the same example. Suppose we have thre fields, "f1", "f2"
> > > > and "f3", each of them with three possible values, 1, 2 and 3.
> > > > We would then map those values to bits.
> > > > 1=01
> > > > 2=10
> > > > 3=11
> > > > And then, we built a new field in the documents, "b", that was a
> > > > binary string representing the values of the three fields ("f1", "f2",
> > > > "f3"). The first two bits represent the "f1" value, the next two
> > > > represent the "f2" value, and the last two represent the "f3" value.
> > > > A document that has "f1"=1, "f2"=3 and "f3"=2 would have the field
> > > > "b"=011110.
> > > > Our ideia was to index this field, and then we would have a simple
> > > > index on a single field.
> > > > This would work fine if the user always queried for a specific value
> > > > in all of the three fields. However, as we said in the first post, the
> > > > user can also ask for a "anything" value for a specific field, a
> > > > "don't care" value.
> > > > So, if the user asks for documents in which f1=1,f2=1 and f3=2, we
> > > > would just make a query like db.docs.find({b:'010110'}). However, if
> > > > this user then decides that f2 can be anything, we would have to
> > > > search for documents that match a certain bitmask. This operation is
> > > > only possible with a map reduce operation, which is expensive, as we
> > > > would have to scan the whole collection.
> > > > We then tried to expand the possibilities, and query using an $in
> > > > operation. If the user queries for documents in which f1=1, f2 can be
> > > > anything and f3=2, he would actually be querying for documents in
> > > > which "b" is either 010110, 011010 or 011110. However, our actual
> > > > application has 10+ fields, and since the user can set any number of
> > > > fields to 'anything', the number of combinations would get huge, and
> > > > the query would get too expensive again.
> > > > I just listed this hoping to make this problem clearer, and to record
> > > > our attempts to tackle it.
> > > > On May 11, 9:17 am, Daniel Galinkin <danielgalin...@gmail.com> wrote:
> > > > > Thanks for the answer!
> > > > > I just voted for and began watching this ticket.
> > > > > As for the suggestion, I didn't quite understand how this will
> > enable
> > > > > us to query for documents that match a set of criterias, such as
> > f1=1
> > > > > AND f2=3. Also, what's the advantage of using a separate collection
> > > > > and index for this, instead of indexing the original collection?
> > > > > Thanks once more for your trouble.
> > > > > On May 9, 2:55 pm, Kyle Banker <k...@10gen.com> wrote:
> > > > > > First, I'd recommend commenting and voting and adding yourself as
> > a
> > > > watcher
> > > > > > on SERVER-1000.
> > > > > > Another technique to try would be to store each field along with
> > the
> > > > > > document id it references in a separate collection. So, your
> > entries
> > > > would
> > > > > > look like this:
> > > > > > {_id: ObjectId, d_id: ObjectId, k: "f1", v: 1}
> > > > > > Then you can do a query like this:
> > > > > > db.docs.find({$all: [{k: "f1", v: 1}]}, {_id: 0, d_id: 1})
> > > > > > This should do an index-only scan and return a set of document ids
> > > > > > corresponding to your original document. I know this isn't ideal,
> > but
> > > > the
> > > > > > probably the next best strategy until SERVER-1000 is fixed.
> > > > > > On Tuesday, May 8, 2012 5:58:20 PM UTC-4, Daniel Galinkin wrote:
> > > > > > > Hello,
> > > > > > > We have a problem with our application backed by MongoDB, we
> > were
> > > > > > > wondering if you could give us some hints as to how to proceed.
> > > > > > > Our application is as follows: each document has a number of
> > fields
> > > > > > > that can be included in a query issued by the user. When the
> > user
> > > > > > > queries for the documents, he can either choose a specific value
> > for
> > > > a
> > > > > > > field, or say he does not care what the value of this field is.
> > > > > > > For example: suppose we have the fields "f1", "f2" and "f3".
> > Each of
> > > > > > > those can have 3 possible values: 1, 2 or 3. A possible query
> > the
> > > > user
> > > > > > > can make is: find me the documents where f1=2, f2=1 and f3=2.
> > > > Another
> > > > > > > possible query is: find me the documents in which f1=2, f3=3 and
> > the
> > > > > > > value of f2 can be anything.
> > > > > > > We are having some problems indexing this collection for this
> > query.
> > > > > > > We have many fields (10+), so making a compound index with all
> > the
> > > > > > > fields is too expensive, as the index gets way too big, and
> > since we
> > > > > > > do not use every field in the query every time, it is also
> > > > inneficient
> > > > > > > to use.
> > > > > > > We are currently trying to implement a fulltext search-like
> > > > solution,
> > > > > > > in which we created a array field "a" in each document, that
> > > > contains
> > > > > > > "tags" in it representing the values of our fields. For example,
> > a
> > > > > > > document that has f1=1, f2=3 and f3=1 would have this array
> > containg
> > > > > > > the tags ["f1-1", "f2-3", "f3-1"]. Then, our idea was to use the
> > > > $all
> > > > > > > operator based on this. Using the previously mentioned example
> > > > query:
> > > > > > > find me the documents in which f1=2, f3=3 and the value of f2
> > can be
> > > > > > > anything. This would be translated to {"a" : {$all : ["f1-2",
> > > > > > > "f3-3"]}. We also indexed this field, following the suggestions
> > > > listed
> > > > > > > inhttp://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo.
> > > > When
> > > > > > > making this query, using this index, the results were not good.
> > > > Using
> > > > > > > the explain() feature, we saw that the database is going through
> > all
> > > > > > > documents in which f1=2, and searches in them the ones that have
> > > > f3=3.
> > > > > > > For example, if there are 2000 documents with f1=2, and out of
> > > > those
> > > > > > > just 2 also have f3=3, mongo scans all the 2000 documents to
> > return
> > > > > > > just those 2.