nested array comparisons with null and empty array

nonagon

unread,

Oct 31, 2017, 10:22:13 AM10/31/17

to mongodb-user

Hi,

I have some documents with nested arrays in Mongo (3.2.11):

> db.test.find()
{ "a" : [ ] }  // doc number 1
{ "a" : [ { "b" : [ ] } ] }  // doc number 2
{ "a" : [ { "b" : [ { "c" : 1 } ] } ] }  // doc number 3
{ "a" : [ { "b" : [ { "c" : 1 }, { "d" : 1 } ] } ] }  // doc number 4
{ "a" : [ { "b" : [ { "c" : 1 } ] }, { "d" : 1 } ] }  // doc number 5

I'd like to understand how comparisons with null and the empty array work on this collection.

I think this one queries for all docs with an array "a" containing at least one element with a null or missing "b" element:

> db.test.find({'a.b':null})
{ "a" : [ { "b" : [ { "c" : 1 } ] }, { "d" : 1 } ] }

but then shouldn't this one query for all docs with an array "a" containing at least one element with a "b" element which is not missing, not null and not the empty array? I'd expect it to match docs 3, 4 and 5, but it matches docs 2, 3 and 4 instead.

> db.test.find({'a.b':{$nin:[null,[]]}})
{ "a" : [ ] }
{ "a" : [ { "b" : [ { "c" : 1 } ] } ] }
{ "a" : [ { "b" : [ { "c" : 1 }, { "d" : 1 } ] } ] }

Regards,

Jeremy

Kevin Adistambha

unread,

Nov 8, 2017, 12:06:19 AM11/8/17

to mongodb-user

Hi Jeremy

The output you see is correct. This is because the query is:

{'a.b':{$nin:[null,[]]}} // 'a.b' is neither null nor []

Looking at the requested condition on the example documents you posted, I believe this is how the matching works:

{ "a" : [ ] }, // 'a.b' doesn't exist, strictly speaking, 'a.b' is neither null nor [] (match)
{ "a" : [ { "b" : [ ] } ] }, // 'a.b' is [] (not a match)
{ "a" : [ { "b" : [ { "c" : 1 } ] } ] }, // 'a.b' is neither null nor [] (it contains 1 array element) (match)
{ "a" : [ { "b" : [ { "c" : 1 }, { "d" : 1 } ] } ] }, // 'a.b' is neither null nor [] (it contains 1 array element) (match)
{ "a" : [ { "b" : [ { "c" : 1 } ] }, { "d" : 1 } ] } // 'a.b' is null (missing) on sub-document {d: 1} (not a match)

Therefore the three documents satisfy the query.

More information is available in Query an Array and Query for Null or Missing Fields

Best regards
Kevin

nonagon

unread,

Nov 8, 2017, 9:47:34 AM11/8/17

to mongodb-user

Hi Kevin,

In the docs on querying an array, it says that the entire array will match if it contains at least one element which matches. So wouldn't that mean my document number 5 should be included in the results (since the first element of array 'a' matches the query)?

I'm also struggling with what I believe to be a very important inconsistency regarding {$ne:null} (or, in my example, {$nin:[null,...]}). When applied to a top-level field I believe these constructs will not match a missing value. For example {q:{$ne:null}} will *not* match a document which has no field named 'q'. When applied to array elements, it seems that {$ne:null} *will* match a missing field (e.g. my doc number 1 above). The $ne docs don't mention this at all but it has led to some difficult bugs for me in the field.

Regards,

Jeremy

Kevin Adistambha

unread,

Nov 9, 2017, 1:27:50 AM11/9/17

to mongodb-user

Hi Jeremy

I guess the confusion stems from a combination of null, $ne, and arrays; although I would think that {$ne: null} cannot match a missing element: it is specifically looking for a field that is not null (i.e. a field with a value in it), so it can’t be missing.

null can be considered either the literal null value, or a non-existent field. For example:

> db.test.find()
{ "_id": ObjectId("..."), "a": 1 }
{ "_id": ObjectId("..."), "a": 1, "b": null }
{ "_id": ObjectId("..."), "a": 1, "b": 1 }

> db.test.find({b:null})
{ "_id": ObjectId("..."), "a": 1 }
{ "_id": ObjectId("..."), "a": 1, "b": null }

> db.test.find({b:{$ne:null}})
{ "_id": ObjectId("..."), "a": 1, "b": 1 }

In the above example, testing for {b: null} matches both documents where b both doesn’t exist and contains an actual null value. Searching for b: {$ne: null} will return the document where b is not null. If a is an array:

> db.test2.find()
{ "_id": ObjectId("..."), "a": [ { "b": 1 } ] }
{ "_id": ObjectId("..."), "a": [ { "b": 1 }, { "c": 1 } ] }

> db.test2.find({'a.b':null})
{ "_id": ObjectId("..."),
  "a": [ 
    { "b": 1 },
    { "c": 1 }  // {'a.b': null} matches this array element
  ] }

> db.test2.find({'a.b':{$ne:null}})
{ "_id": ObjectId("..."),
  "a": [
     { "b": 1 }  // {'a.b': {$ne: null}} matches this array element
  ] }

I guess one can argue that both queries should return both documents in the test2 collection. This is being investigated in SERVER-31876.

Regarding your question, the fifth document doesn’t match the condition of {'a.b': {$nin: [null, []]}}, because a contains an array with two sub-documents:

{ "a" : [ 
    { "b" : [{ "c" : 1 }] },  // 1st 'a' element
    { "d" : 1 }                 // 2nd 'a' element
]}

If we ignore the 1st element, the document can be thought of as:

{ "a": [ { "d" : 1 } ] }

in this case, since a.b doesn’t exist, it can be considered as null (as per the very first example of db.test.find({b:null})), hence the document doesn’t satisfy 'a.b' not being null or [].

I guess the gist is, you should be explicit in your schema, and not use null as placeholder for both non-existent field and an actual null value. It’s probably better to set the field to some known value, since it will be easier to reason about, and likely easier to index as well.

Best regards,
Kevin

Reply all

Reply to author

Forward