> db.test.find()
{ "a" : [ ] } // doc number 1
{ "a" : [ { "b" : [ ] } ] } // doc number 2
{ "a" : [ { "b" : [ { "c" : 1 } ] } ] } // doc number 3
{ "a" : [ { "b" : [ { "c" : 1 }, { "d" : 1 } ] } ] } // doc number 4
{ "a" : [ { "b" : [ { "c" : 1 } ] }, { "d" : 1 } ] } // doc number 5
> db.test.find({'a.b':null})
{ "a" : [ { "b" : [ { "c" : 1 } ] }, { "d" : 1 } ] }
> db.test.find({'a.b':{$nin:[null,[]]}})
{ "a" : [ ] }
{ "a" : [ { "b" : [ { "c" : 1 } ] } ] }
{ "a" : [ { "b" : [ { "c" : 1 }, { "d" : 1 } ] } ] }
Hi Jeremy
The output you see is correct. This is because the query is:
{'a.b':{$nin:[null,[]]}} // 'a.b' is neither null nor []
Looking at the requested condition on the example documents you posted, I believe this is how the matching works:
{ "a" : [ ] }, // 'a.b' doesn't exist, strictly speaking, 'a.b' is neither null nor [] (match)
{ "a" : [ { "b" : [ ] } ] }, // 'a.b' is [] (not a match)
{ "a" : [ { "b" : [ { "c" : 1 } ] } ] }, // 'a.b' is neither null nor [] (it contains 1 array element) (match)
{ "a" : [ { "b" : [ { "c" : 1 }, { "d" : 1 } ] } ] }, // 'a.b' is neither null nor [] (it contains 1 array element) (match)
{ "a" : [ { "b" : [ { "c" : 1 } ] }, { "d" : 1 } ] } // 'a.b' is null (missing) on sub-document {d: 1} (not a match)
Therefore the three documents satisfy the query.
More information is available in Query an Array and Query for Null or Missing Fields
Best regards
Kevin
Hi Jeremy
I guess the confusion stems from a combination of null, $ne, and arrays; although I would think that {$ne: null} cannot match a missing element: it is specifically looking for a field that is not null (i.e. a field with a value in it), so it can’t be missing.
null can be considered either the literal null value, or a non-existent field. For example:
> db.test.find()
{ "_id": ObjectId("..."), "a": 1 }
{ "_id": ObjectId("..."), "a": 1, "b": null }
{ "_id": ObjectId("..."), "a": 1, "b": 1 }
> db.test.find({b:null})
{ "_id": ObjectId("..."), "a": 1 }
{ "_id": ObjectId("..."), "a": 1, "b": null }
> db.test.find({b:{$ne:null}})
{ "_id": ObjectId("..."), "a": 1, "b": 1 }
In the above example, testing for {b: null} matches both documents where b both doesn’t exist and contains an actual null value. Searching for b: {$ne: null} will return the document where b is not null. If a is an array:
> db.test2.find()
{ "_id": ObjectId("..."), "a": [ { "b": 1 } ] }
{ "_id": ObjectId("..."), "a": [ { "b": 1 }, { "c": 1 } ] }
> db.test2.find({'a.b':null})
{ "_id": ObjectId("..."),
"a": [
{ "b": 1 },
{ "c": 1 } // {'a.b': null} matches this array element
] }
> db.test2.find({'a.b':{$ne:null}})
{ "_id": ObjectId("..."),
"a": [
{ "b": 1 } // {'a.b': {$ne: null}} matches this array element
] }
I guess one can argue that both queries should return both documents in the test2 collection. This is being investigated in SERVER-31876.
Regarding your question, the fifth document doesn’t match the condition of {'a.b': {$nin: [null, []]}}, because a contains an array with two sub-documents:
{ "a" : [
{ "b" : [{ "c" : 1 }] }, // 1st 'a' element
{ "d" : 1 } // 2nd 'a' element
]}
If we ignore the 1st element, the document can be thought of as:
{ "a": [ { "d" : 1 } ] }
in this case, since a.b doesn’t exist, it can be considered as null (as per the very first example of db.test.find({b:null})), hence the document doesn’t satisfy 'a.b' not being null or [].
I guess the gist is, you should be explicit in your schema, and not use null as placeholder for both non-existent field and an actual null value. It’s probably better to set the field to some known value, since it will be easier to reason about, and likely easier to index as well.
Best regards,
Kevin