performance issues filtering a query with a long $nin


henrique matias

Apr 30, 2015, 11:06:59 PM
to mongod...@googlegroups.com
Hello Guys,

I'm currently having an issue with a system I'm designing. Basically I need to filter my query with a $nin, but the $nin list for each user grows every day and it won't stop growing.

So basically, sooner or later these queries will start to smell and leak..

Is there any "well known pattern" for filtering results out of a query like this?

As an example: 500k users do some "advanced search" every day, but they should never see the same article twice.

Articles are constantly being added, and new users are constantly registering.

My current solution is to store the _ids in an array owned by the user document, but with this design things are going to blow up when users start to query with thousands of values in the $nin..
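
Roughly, my query today looks like this (mongo shell, with made-up collection and field names, my real schema differs a bit):

var user = db.users.findOne({ _id: userId }, { seen_article_ids: 1 });

db.articles.find({
    _id: { $nin: user.seen_article_ids }  // this array only ever grows
});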


Any ideas?

peace
--
time isn't passing, it's you passing.


henrique matias

May 1, 2015, 2:05:11 AM
to mongod...@googlegroups.com
One way I thought of is to push the user id to a "viewed_by" property on the article. Then, instead of doing two queries (one to find the viewed articles and another to find the unread ones), I'd do a single query looking for articles whose 'viewed_by' property doesn't contain the user's id.
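
Something like this (just a sketch, field and variable names made up):

// when a user reads an article, record it on the article itself
db.articles.update(
    { _id: articleId },
    { $addToSet: { viewed_by: userId } }
);

// the search then only returns articles this user hasn't seen yet
db.articles.find({
    viewed_by: { $ne: userId }
    // ...plus the rest of the "advanced search" criteria
});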

The problem then is that this "viewed_by" property will become bigger and bigger, and it might also become slow at some point?

Is there such a thing as storing the values in a "sorted" way, so that when executing $ne it won't need to look at every value of the viewed_by array (using binary search or some other fast lookup)?

any advice is highly appreciated,

thanks

peace

Rodrigo Jose Villalba Otto

Apr 25, 2016, 1:34:34 PM
to mongodb-user

Hi,

I'm facing a similar issue. Did you manage to solve the problem?

Rhys Campbell

Apr 26, 2016, 4:19:02 AM
to mongodb-user
Couldn't you find the highest contiguous id and query with $gt on that?

For example, a user has the following list of seen ids...

1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,27,99,101,110

We could reduce this to...

15,27,99,101,110

In SQL we would say:

SELECT *
FROM article
WHERE id > 15
AND id NOT IN (27, 99, 101, 110)

I'd also consider adding a datetime restriction, or perhaps limiting it to just the last 10K articles as well.
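
In MongoDB terms that would be roughly the following (collection and field names are just for illustration, assuming a numeric id):

db.article.find({
    id: { $gt: 15, $nin: [27, 99, 101, 110] }
    // optionally plus a date filter, e.g. published_at: { $gte: <some cutoff> }
});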

Wan Bachtiar

May 6, 2016, 1:19:01 AM
to mongodb-user

Hi,

Depending on your use case and application, one solution is to segment off the data - a similar approach to the hybrid schema design. You could try to use either time or application pagination to segment and reduce the searchable dataset. For example, search only articles from the past 3 days to reduce the amount of data to be filtered.
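
A rough sketch of such a time-based segment in the mongo shell (field and variable names are illustrative only):

// only consider articles published in the past 3 days, so the exclusion
// list only needs to hold the ids the user has seen within that window
var threeDaysAgo = new Date(Date.now() - 3 * 24 * 60 * 60 * 1000);

db.articles.find({
    published_at: { $gte: threeDaysAgo },
    _id: { $nin: seenWithinWindow }
});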

It's worth noting that $nin is not very selective, since it often matches a large portion of the index. As a result, a $nin query with an index may perform no better than one that must scan all documents in the collection.

If possible, it is preferable to change the document structure so that queries can be performed on values that do exist, rather than on documents that don't contain specific values. In addition, you can add other selective filters (e.g. a date/timestamp) to increase the selectivity of the query alongside $nin.
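
As an illustrative sketch only (a simplified fan-out design, not specific to your schema), one way to query on values that exist is to maintain a per-user bucket of unseen article ids:

// when an article is published, fan its id out to each user's bucket
db.unseen.update(
    { user_id: userId },
    { $addToSet: { article_ids: newArticleId } },
    { upsert: true }
);

// the search then becomes a positive $in instead of a growing $nin
var bucket = db.unseen.findOne({ user_id: userId }, { article_ids: 1 });
db.articles.find({ _id: { $in: bucket.article_ids } /* plus search criteria */ });

// when the user reads an article, remove it from the bucket
db.unseen.update(
    { user_id: userId },
    { $pull: { article_ids: articleId } }
);

Combined with the time or pagination segmentation above, the bucket stays bounded.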

You may also find Socialite: Feed Service a useful resource.

Regards,

Wan.
