Efficient way to find first unmatched item from a list

238 views
Skip to first unread message

Harley Burton

unread,
Aug 26, 2018, 6:00:13 PM8/26/18
to mongodb-user
I'm looking for advice on the most efficient way to find the first item from a list that doesn't exist in a collection.

It's hard to explain without an example.

Assume that I have a list objects. Say an array of strings:

checkoutItems = [ "item1", "item2", ... ]

And I have a collection of items that are checked out that looks something like:
{
    "_id": {
        "$oid": "5b..."
    },
    "name": {
        "item1"
    }
}

I want to find the first item from the checkoutItems array where the collection doesn't contain a document with a name property that matches.

My first instinct, because I'm fairly new to mongo, is to create a loop (or map) over checkoutItems and do a search for each one until I find one with no match.

Something like (not a real language):

foreach(item in checkoutItems){
  
  result = SearchMongoCollection( name, item );
  if(result is empty) return item;

}

// Handle no return from above

I may have some slight mistakes in that, but I can make that approach work. However, I'm going to be iterating over a lot of items at times, and I'm willing to bet that there is a better approach. Maybe I just need to know what to search for in the documentation. If anyone can show me a good way to do this, or point me to the correct place to look in the documentation, I would appreciate it.

Thanks.

Oleg Toropov

unread,
Aug 26, 2018, 7:53:24 PM8/26/18
to mongod...@googlegroups.com
Look at the $nin operator

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/9f732ff3-971b-44f0-bd2c-bb8ceb0253c3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
//
// Oleg V.Toropov, MCP | oleg.t...@gmail.com
// -----------------------------------------------------------------
// I have discovered that if I read enough stuff that's over my head,
// I actually begin to understand some of it.
//

Harley Burton

unread,
Aug 26, 2018, 10:21:01 PM8/26/18
to mongodb-user
Thank you for your response. I think I did a poor job explaining what I was doing.

I will be given an array of strings, and I need to find the first one that doesn't exist as a property in any document in a collection. 

I was hoping for something built in to mongo (or mongoose <- I should have mentioned) that would let me use the array to more efficiently do that, as apposed to iterating over the array and doing a query for each string from the array.

I think I have a better example.

Assume I have a collection of static IP Address assignments.

{
  ip: "172.16.0.10",
  name: "beardGuy PC"
},
{
  ip: "172.16.0.150",
  name: "cakeLady Printer"
}

and for some reason, the dba didn't didn't prepopulate the collection with all usable subnet addresses, but only the ones that have been assigned.

Now I need to find one that hasn't been assigned, and I'm given a list of all possible IP Addresses on a subnet, in the form of an array of strings.
["172.16.0.1", ... , "172.16.0.254"]

Instead of iterating over the whole array (254 items) and doing a query for each one to see if I have a document with { 'ip': '172.16.0.1' }, { 'ip': '172.16.0.2' }, { 'ip': '172.16.0.3' } etc, I was hoping that mongodb or the mongoose library may have a mechanism that would allow me to be more efficient.

In this example, the solution would be to go ahead and create documents for each valid address in the subnet, and then I could just get the first one with no "name" property, but assume there was no way to know all possible values.

I don't expect the number of items in the array to ever exceed ~2000, and it isn't a query that will run often (literally less than daily), so, based on some simple benchmarking in python with pymongo, I don't think it will be a major issue, but I would still be curious if there is some build-in mechanism that would make this kind of thing more efficient.

Thanks again.

Wan Bachtiar

unread,
Sep 11, 2018, 9:43:59 PM9/11/18
to mongodb-user

I will be given an array of strings, and I need to find the first one that doesn’t exist as a property in any document in a collection.

Hi Harley,

Have you tried what Oleg has mentioned above ? The $nin operator seems to be what you’re looking for. For example, given below example documents:

{ "_id" : 1, "ip" : "172.16.0.1", "name" : "cname" }
{ "_id" : 2, "ip" : "172.16.0.2", "name" : "piuuf" }
{ "_id" : 3, "ip" : "172.16.0.3", "name" : "tbooe" }
{ "_id" : 4, "ip" : "172.16.0.4", "name" : "gleap" }

Given the array ["172.16.0.1", "172.16.0.2", "172.16.0.4", "172.16.0.5"], you can use the below query to find "172.16.0.3":

db.collection.findOne({"ip":{
                            "$nin":["172.16.0.1", "172.16.0.2", "172.16.0.4", "172.16.0.5"]
                            }
});

I don’t expect the number of items in the array to ever exceed ~2000, and it isn’t a query that will run often

Depending on the size of array in $nin, you may reached the BSON Document Size limitation of 16MB while constructing the query. If that’s the case you could create a temporary collection to store the array, and use $lookup stage to find the document that doesn’t matched.

For example, store the array in a temporary collection called excluded_ip, then:

db.collection.aggregate([
    { "$lookup": { 
            "from":"excluded_ip", 
            "localField":"ip", 
            "foreignField":"ip", 
            "as":"found"}
    }, 
    { "$match": { "found": { "$eq":[] } } }
]);

Regards,
Wan.

Oleg Toropov

unread,
Sep 11, 2018, 9:50:15 PM9/11/18
to mongod...@googlegroups.com
Wan, this is great!

I did not think about $lookup approach. Great solution for big arrays.

Thanks!

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.

For more options, visit https://groups.google.com/d/optout.

Harley Burton

unread,
Sep 11, 2018, 10:01:51 PM9/11/18
to mongodb-user
Oh. Wow. Yes. Thank you both. 

I completely missed this when torus_ot gave me the answer. I'm sorry. I thought you missed the question. I read the documentation on $nin, and got the impression that I could only use it to find what didn't exist in a collection. I had no Idea I could apply it this way. This is perfect.

I did play around with creating a temporary collection similar to what you did with excluded_ip, but I couldn't make it work the ways that I tried it, and it seemed like it would be even more inefficient in my case. Probably, as you say, only a good idea if you have a special need, like some size limitations.

Again, thank you both. This is significantly cleaner code if it doesn't save a single cycle.
Reply all
Reply to author
Forward
0 new messages