How to query a non existing field

2,139 views
Skip to first unread message

Nick

unread,
Jun 5, 2012, 10:46:01 AM6/5/12
to rav...@googlegroups.com
Let's say my document has not the Field "CaughtDate". I am doing an update of my web app where new documents are created with this field.
In a query, I want to get the documents that have the field set to null, lower than a specific date or that don't have the field.

The static index is like this:

Map = plants => from plant in plants
                where (!plant.Status.Contains(ReleasedPlant.Status_Archived) && plant.Status.Contains(ReleasedPlant.Status_CaughtConfirmed) && plant.Enabled)
                select new { CaughtDate = plant.CaughtDate };

My query is like this:

var plants = session.Query<ReleasedPlant, CaughtConfirmedNonArchivedPlantsIndex>()
    .Where(x => (x.CaughtDate == null) || (x.CaughtDate < date));

And guess what, it doesn't return the documents that have not the field. The rest of the query works well.
How can I get the documents with the missing field?

Thanks.

Nick

unread,
Jun 6, 2012, 8:04:00 AM6/6/12
to rav...@googlegroups.com
up

Kijana Woodard

unread,
Jun 6, 2012, 8:18:26 AM6/6/12
to rav...@googlegroups.com
Does this help?

I'm assuming CaughtDate is a nullable DateTime?

Nick

unread,
Jun 6, 2012, 8:24:27 AM6/6/12
to rav...@googlegroups.com
hummm no, I don't this how it relates to a missing field.
Yes CaughtDate is nullable DT.

Nick

unread,
Jun 6, 2012, 8:24:45 AM6/6/12
to rav...@googlegroups.com
Read: I don't see...

Kijana Woodard

unread,
Jun 6, 2012, 8:35:15 AM6/6/12
to rav...@googlegroups.com
Just to make sure, by "missing field", do you mean "CaughtDate is set to null" or something else?

I was curious, so I tried this and it passed:

Kijana Woodard

unread,
Jun 6, 2012, 8:37:54 AM6/6/12
to rav...@googlegroups.com
What is the type on plant.Status?
Try commenting out the where clause to see if null values start showing up. 
I've seen several other people have trouble with Contains. I *think* the normal suggestion is to use Any() instead.

Nick

unread,
Jun 6, 2012, 8:45:52 AM6/6/12
to rav...@googlegroups.com
Missing means that the field is not existing in the DB.
I already removed the other part of the filter and I still get no results. My Contains works well btw.

Kijana Woodard

unread,
Jun 6, 2012, 8:58:48 AM6/6/12
to rav...@googlegroups.com
If it's not in the DB, then "CaughtDate != null". Null and "not there" aren't the same.
Now I'm curious, how did you store the Plant document without the CaughtDate property?
I can imagine how you could do that storing json directly, but I don't see how to do it storing a CLR object in the normal manner.

Nick

unread,
Jun 6, 2012, 9:00:03 AM6/6/12
to rav...@googlegroups.com
The entity version 1 had not this class field. Now I'm adding it in entity version 2.

Kijana Woodard

unread,
Jun 6, 2012, 9:09:47 AM6/6/12
to rav...@googlegroups.com
Ahhhh. Good. I'll wait to see what's posted. 
I'm curious about migrations issues. 

I'm still in development so it's easy just to blow away the data file and start over.

Wallace Turner

unread,
Jun 6, 2012, 9:42:56 AM6/6/12
to rav...@googlegroups.com
> And guess what, it doesn't return the documents that have not the field. The rest of the query works well.
How can I get the documents with the missing field? 

You can't, without looping over all the documents in your database and *then* applying your query. eg:
var plants = MethodToGetAllPlants( ).Where(x => (x.CaughtDate == null) || (x.CaughtDate < date)); 
*calling Oren* Personally I don't like this and think it should change as its inconsistent - from a callee's perspective the field is null I dont really care if it exists in the database or not, null is null. I think this question is going to crop a lot more.

Kijana Woodard

unread,
Jun 6, 2012, 10:00:38 AM6/6/12
to rav...@googlegroups.com
I have to disagree with "null is null"
Goto SQL Server studio and enter
select 1 where null=null
select 1 where 1=1

The first has no results, the second has one result. "null is not null"

I think it would be "interesting" (meaning difficult), to interpret "== null" as "actually null or if the field doesn't actually exist". I imagine some unintended consequences from doing this. Now watch one of the HR folks jump in and say "fixed, next build". :-)

Maybe they could have an extension method like .PropertyExists(). Yuck.

Reading through this:

Maybe you should do this:

Add the CaughtDate property to all existing Plant documents as a "one time job".

Matt Warren

unread,
Jun 6, 2012, 12:47:16 PM6/6/12
to rav...@googlegroups.com
Yeah patching is one way to go and it's clean because all the docs in the dBase now match the V2 format.

However, you can do it purely in the index, but it's a bit funky:

    Map = @"from doc in docs.Customers
	select new {
		Name = doc.Name, 
		Age = ((decimal)doc.Age),
		CaughtDate = doc.Inner.ContainsKey(""CaughtDate"") ? doc.CaughtDate : ""[MISSING_FIELD]""
	})"

If would be nicer if the last line could be replaced by a PropertyExists(..) method.

Doing it this way you end up with the following data in the index:
  • "[MISSING_FIELD]" if the field doesn't exist in the Json doc
  • "NULL_VALUE" if it's there but null
  • The actual date (e.g. "20120223000000000") if the field is there and has a value
Then to find only the docs that have missing fields, you can write a query like this (unfortunately you have to use LuceneQuery):
    var missingFieldMatches = session.Advanced.LuceneQuery<CustomerV1>("CustomerIndex")
					.WhereEquals("CaughtDate"@"[MISSING_FIELD]")
					.ToList();

And this to find the ones with CaughtDate = null, you just do this:
       var nullMatches = session.Query<CustomerV2>("CustomerIndex")
					.Where(x => x.CaughtDate == null)
					.ToList();

But I think it would be nicer if RavenDB handled some or all of it for you, kinda like the way it does for null values at the moment. So I'm not saying this is a elegant workaround, just that it's possible ;-)

Kijana Woodard

unread,
Jun 6, 2012, 1:01:51 PM6/6/12
to rav...@googlegroups.com
Cool. Thanks.Good to know what's possible. I didn't know about that doc.Inner jazz.

I think it might be tough for Raven to do it for you without some kind of keyword. Imagine mongo or some other even more schema free NoSQL store. If you had a random assortment of json docs in a collection, x.CaughtDate == null might bring you some weird results when it brings you documents you were considering because they "don't have a caught date". Could be really confusing. (No I only want the ones where CaughtDate is really there an is set to null. Aaarrrgghhhhh).

This all boils down to us programmers never squaring away null vs nil vs nonexistent. We hack so much around these concepts we don't even think about it. It's an int, but they haven't set it, uhh we'll use 0. Oh wait 0 is a valid choice, -1. MinDate is null date. 

You always find weird multiple use cases with null: Is it null because they haven't looked at it yet and the value initialized to null? Have they looked at the field, but aren't sure what the value is yet? Have they looked at the value and are sure that it is nothing/null/nil? Who knows, so we write more and more code trying to divine the situation.

Matt Warren

unread,
Jun 6, 2012, 3:46:08 PM6/6/12
to rav...@googlegroups.com
Yeah that doc.Inner stuff relies on knowing the inner workings of Raven and so it not very robuts/elegant.

Actually RavenDB can handle mull/missing quite well because all data has to end up as text in the Lucene index. That means that you can have "[NULL_VALUE]" as it already does, plus "[MISSING_FIELD]" as I added and actual dates (or ints, longs etc) all indexed against a field.

But I think RavenDB should allow a nicer syntax, something like (this would be strongly typed lambdas):

   from doc in docs.Customers
	select new {
		Name = doc.Name, 
		Age = doc.Age,
		CaughtDate = doc.CaughtDate.StoreMissingFieldInIndex()
	})"

Where "StoreMissingFieldInIndex()" does the same as the original I posted above. This way you can specify this behaviour per-field, but only when you need it. The default behaviour covers most use cases, but being able to do this makes migrations easier (if you care about the difference between "null" and "missing")

Nick

unread,
Jun 6, 2012, 5:04:36 PM6/6/12
to rav...@googlegroups.com
In the meantime, I had developped a one time migration task, loading all the entities and calling SaveChanges, which adds the missing field, set to null.
But a simple solution supported by the API, as proposed by you, would be a great addition.

Matt Warren

unread,
Jun 6, 2012, 5:23:24 PM6/6/12
to rav...@googlegroups.com
Yeah that's probably the best way to do, especially if you don't care about the difference between "null" and "missing" when you query.

Wallace Turner

unread,
Jun 6, 2012, 8:44:09 PM6/6/12
to rav...@googlegroups.com
> I have to disagree with "null is null"
>Goto SQL Server studio and enter

I'll stop you right there. :) why did you mention SQL Server, this is RavenDB?

Just think about a common scenario and how much this 'gotcha' will irritate/confuse the average c# consumer of ravenDB.

1) Add entity to database
2) Modifiy entity by adding new field
3) Update entities with null field  (equivalent to sql's UPDATE Foo SET Field1='bar' WHERE Field1 IS NULL)

So step3 doesnt work then the end user has to work out how to load ALL the entities and then apply their query (loading all the entities isnt much fun ootb either)


Kijana Woodard

unread,
Jun 6, 2012, 10:47:44 PM6/6/12
to rav...@googlegroups.com
Your premise was null is null. SQL Server was "a more familiar example" to show that to statement to be false.

You don't have to load them all. Do a set operation. 
http://ravendb.net/docs/client-api/set-based-operations 

That's fairly standard DB migrations work. RavenDB actually makes this process easier since most "schema" changes just work.

The problem with equating null and missing is what if I don't want to see documents that haven't upgraded yet. I'm stuck if "== null" finds them automatiically. If someone puts raw json in Raven instead of going through CLR classes, they could be in a world of pain trying to figure out why these "other documents" keep appearing.

Kijana Woodard

unread,
Jun 6, 2012, 11:18:54 PM6/6/12
to rav...@googlegroups.com
What happens with a query like "x.Value != 3". Does that match "x.Value == null"? Does that match every document that doesn't have a Value property?

The tricky thing with a robust product like RavenDB is they have to take into account all the cases, not just the one that seems odd at first blush. Giving us set based operations to handle these sorts of migration issues is fantastic. There's no need for them to clutter the api for edge cases.

But, as I said, for all I know, someone who actually works on the team may pop in here and declare that "null == missing", and that will be that. :-)

I'm glad their tool makes me productive enough to be able to chime in on a forum and still get work done!

Matt Warren

unread,
Jun 7, 2012, 5:34:11 AM6/7/12
to rav...@googlegroups.com
> What happens with a query like "x.Value != 3". Does that match "x.Value == null"? Does that match every document that 
> doesn't have a Value property? 

Yeah you're right, I just checked and if you issue the query "x.Value != 3" when there are docs with "Value = null" then they are returned as part of the results.

But I think that for this scenario you should be issuing the query below instead, i.e. it's up to the dev to handle it rather than Raven doing it for you.

      x.Value != null && x.Value != 3 

Kijana Woodard

unread,
Jun 7, 2012, 10:00:03 AM6/7/12
to rav...@googlegroups.com
Hmmmm. I think that might be a bug. I wouldn't expect to have to explicitly exclude null since null usually means "maybe it's 3, maybe it's not, so let's just not consider them in results. Maybe for RavenDB, null has the more solid "no value" meaning.

I think this shows that raven should definitely *not* bring back documents where the property is missing. How would you explicitly exclude them...without some special keyword like you put forth Matt.

Oren Eini (Ayende Rahien)

unread,
Jun 7, 2012, 3:55:36 PM6/7/12
to rav...@googlegroups.com
select new { CaughtDate = plant.CaughtDate };

This is a missing value, and since it is the only value of the index, we see that this is an invalid result,and won't index it.

                select new { CaughtDate = plant.CaughtDate, CaughtDateExists = plant.CaughtDate  != null };

Would actually give you results when the value isn't there, note that if you want to query for missing, you would have to check for the Exists value.

Wallace Turner

unread,
Jun 7, 2012, 8:13:53 PM6/7/12
to rav...@googlegroups.com
Great example, assuming you meant users/4 instead of users/1 for Name: "D" ?

> You can't do a query on a missing field, but you can query on its absence. 

For my guide and others who will come across this, how does one retrieve these rows, either from c# or Lucene?

On Friday, 8 June 2012 04:03:49 UTC+8, Oren Eini wrote:
Guys,
RavenDB has different behavior for missing vs. null.

users/1 - { "Name": "A" }
users/2 - { "Name": "B", "StoppedBeatingTheWife": false }
users/3 - { "Name": "C", "StoppedBeatingTheWife": true }
users/1 - { "Name": "D", "StoppedBeatingTheWife": null }

Now, let us see what sort of queries we can do on this data set.

Note that "StoppedBeatingTheWife" is a boolean (non nullable) property. I intentionally choose a this example, mind.
Now, what should be the behavior of asking "give all users who stopped beating their wives?" 
StoppedBeatingTheWife:true would return C

StoppedBeatingTheWife:false would return B

Now, we may have users that declined to answer, in which case we can do:

StoppedBeatingTheWife:[[NULL_VALUE]] and get D

But in the case of A, it isn't that the value is null, is it that there isn't a value at all.
We can't check for null equality.  In most cases, it is a sign of schema migration, but we have a lot of users that use RavenDB for sparse schema, in which null properties are simply not written to the document.

And even if it is just additional property, RavenDB doesn't know what to put it, and null isn't appropriate, since it is already a valid value.

As a result, we make a distinction between null the value and missing value.

You can't do a query on a missing field, but you can query on its absence.


Oren Eini (Ayende Rahien)

unread,
Jun 7, 2012, 8:16:22 PM6/7/12
to rav...@googlegroups.com
You define an index with:

select new {  StoppedBeatingTheWifeExists =  doc.StoppedBeatingTheWife != null }

And then query on  StoppedBeatingTheWifeExists:false

Matt Warren

unread,
Jun 8, 2012, 5:36:53 AM6/8/12
to rav...@googlegroups.com
I'm not seeing the same behaviour, it still treats "null" and missing as the same, I have docs that match yours:
    users/1 - { "Name": "A" }
    users/2 - { "Name": "B", "StoppedBeatingTheWife": false }
    users/3 - { "Name": "C", "StoppedBeatingTheWife": true }
    users/4 - { "Name": "D", "StoppedBeatingTheWife": null }

with an index like this:
        Map = @"from doc in docs.Users
		select new {
				Name = doc.Name, 
				StoppedBeatingTheWifeExists = doc.StoppedBeatingTheWife != null
			})"

But it treats missing and null as the same, if I do the query "StoppedBeatingTheWifeExists:true", I get "users/2" & "users/3"

If I do the query "StoppedBeatingTheWifeExists:false", I get "users/1" & "users/4" (should only by users/1)?

If I change the index to the one below it works fine, but this is a bit hacky:

 Map = @"from doc in docs.Users
	  select new {
	    Name = doc.Name, 
	    StoppedBeatingTheWifeExists = doc.Inner.ContainsKey(""StoppedBeatingTheWife"") ? true : false,	  })"

Full code sample is here https://gist.github.com/2894675, am I missing something? 

I did have to change StoppedBeatingTheWife to a nullable, so that it could be stored in json as null, but I can't see how that makes a difference because I end up exactly the same Json you describe?

On Thursday, 7 June 2012 21:03:49 UTC+1, Oren Eini wrote:
Guys,
RavenDB has different behavior for missing vs. null.

users/1 - { "Name": "A" }
users/2 - { "Name": "B", "StoppedBeatingTheWife": false }
users/3 - { "Name": "C", "StoppedBeatingTheWife": true }
users/1 - { "Name": "D", "StoppedBeatingTheWife": null }

Now, let us see what sort of queries we can do on this data set.

Note that "StoppedBeatingTheWife" is a boolean (non nullable) property. I intentionally choose a this example, mind.
Now, what should be the behavior of asking "give all users who stopped beating their wives?" 
StoppedBeatingTheWife:true would return C

StoppedBeatingTheWife:false would return B

Now, we may have users that declined to answer, in which case we can do:

StoppedBeatingTheWife:[[NULL_VALUE]] and get D

But in the case of A, it isn't that the value is null, is it that there isn't a value at all.
We can't check for null equality.  In most cases, it is a sign of schema migration, but we have a lot of users that use RavenDB for sparse schema, in which null properties are simply not written to the document.

And even if it is just additional property, RavenDB doesn't know what to put it, and null isn't appropriate, since it is already a valid value.

As a result, we make a distinction between null the value and missing value.

You can't do a query on a missing field, but you can query on its absence.

Oren Eini (Ayende Rahien)

unread,
Jun 8, 2012, 5:40:44 AM6/8/12
to rav...@googlegroups.com
Oh, sorry, of course.
If you need to make the distinction, you do:

doc.StoppedBeatingTheWife.IsExplicitNull == false

Nick

unread,
Jun 8, 2012, 8:52:51 AM6/8/12
to rav...@googlegroups.com
Trying to catch up with all what was said.

I added the Exists field in the index as per your instructions. But my query still returns no result:

            var plants = RavenSession.Query<ReleasedPlant, CaughtConfirmedNonArchivedPlantsIndex>()

                .Where(x => (x.CaughtDate == null) || (x.CaughtDate < date));

If I understand well, I have to add the Exists field in my query but since this is an index field, I have to transform it to a Lucene query, right?
What would be this query?

Nick

unread,
Jun 8, 2012, 9:34:36 AM6/8/12
to rav...@googlegroups.com
Oh, and from where does this IsExplicitNull comes from?



On Friday, June 8, 2012 5:40:44 AM UTC-4, Oren Eini wrote:

Oren Eini (Ayende Rahien)

unread,
Jun 8, 2012, 9:01:28 AM6/8/12
to rav...@googlegroups.com
You need to check ON the exists field.
      var plants = RavenSession.Advanced.LuceneQuery<ReleasedPlant, CaughtConfirmedNonArchivedPlantsIndex>()
                .WhereEquals("CuaghtDateExists", true);

Nick

unread,
Jun 8, 2012, 10:20:09 AM6/8/12
to rav...@googlegroups.com
Yes I know. My question was more about how to format the full date query but I found some info.
So I have this:

var q = RavenSession.Advanced.LuceneQuery<ReleasedPlant, CaughtConfirmedNonArchivedPlantsIndex>()
    .WaitForNonStaleResultsAsOfNow()
    .Where("CaughtDateExists:false OR CaughtDate:[[NULL_VALUE]] OR CaughtDate:{* TO 20120605134639713}")
    .ToList();

However, is it possible to have a more "linq-style" query, especially for the date "less than" filter?

And what is your ExplicitNull var?



On Friday, June 8, 2012 9:01:28 AM UTC-4, Oren Eini wrote:
You need to check ON the exists field.
      var plants = RavenSession.Advanced.LuceneQuery<ReleasedPlant, CaughtConfirmedNonArchivedPlantsIndex>()
                .WhereEquals("CuaghtDateExists", true);

Oren Eini (Ayende Rahien)

unread,
Jun 8, 2012, 12:16:08 PM6/8/12
to rav...@googlegroups.com
It is a property of the null object that ravendb uses.

Oren Eini (Ayende Rahien)

unread,
Jun 8, 2012, 12:18:17 PM6/8/12
to rav...@googlegroups.com
session.Advanced.LuceneQuery< ReleasedPlant, CaughtConfirmedNonArchivedPlantsIndex >()
   .WhereEquals("CaughtDateExists", false)
.OrElse()
.WhereEquals(x=>x.CaughtDate, null)
.OrElse()
.WhereLessThan(x=>x.CaughtDate, DateTime.Today)

Nick

unread,
Jun 8, 2012, 6:12:54 PM6/8/12
to rav...@googlegroups.com
Got it. Thanks.


On Friday, June 8, 2012 12:18:17 PM UTC-4, Oren Eini wrote:
session.Advanced.LuceneQuery< ReleasedPlant, CaughtConfirmedNonArchivedPlantsIndex >()
   .WhereEquals("CaughtDateExists", false)
.OrElse()
.WhereEquals(x=>x.CaughtDate, null)
.OrElse()
.WhereLessThan(x=>x.CaughtDate, DateTime.Today)


Ben Wilde

unread,
Aug 26, 2014, 4:32:06 PM8/26/14
to rav...@googlegroups.com
So how about just doing this (using the code from the original question):

Keep the same query, and in the index:
select new { CaughtDate = plant.CaughtDate ?? null };

Then yon can query for null and get it when the field is either missing or null.



On Tuesday, June 5, 2012 9:46:01 AM UTC-5, Nick wrote:
Let's say my document has not the Field "CaughtDate". I am doing an update of my web app where new documents are created with this field.
In a query, I want to get the documents that have the field set to null, lower than a specific date or that don't have the field.

The static index is like this:

Map = plants => from plant in plants
                where (!plant.Status.Contains(ReleasedPlant.Status_Archived) && plant.Status.Contains(ReleasedPlant.Status_CaughtConfirmed) && plant.Enabled)
                select new { CaughtDate = plant.CaughtDate };

My query is like this:

var plants = session.Query<ReleasedPlant, CaughtConfirmedNonArchivedPlantsIndex>()

    .Where(x => (x.CaughtDate == null) || (x.CaughtDate < date));

Oren Eini (Ayende Rahien)

unread,
Aug 26, 2014, 4:34:48 PM8/26/14
to ravendb
That would be valid as well



Oren Eini

CEO


Mobile: + 972-52-548-6969

Office:  + 972-4-622-7811

Fax:      + 972-153-4622-7811





--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Oren Eini (Ayende Rahien)

unread,
Jun 7, 2012, 4:03:49 PM6/7/12
to rav...@googlegroups.com
Guys,
RavenDB has different behavior for missing vs. null.

users/1 - { "Name": "A" }
users/2 - { "Name": "B", "StoppedBeatingTheWife": false }
users/3 - { "Name": "C", "StoppedBeatingTheWife": true }
users/1 - { "Name": "D", "StoppedBeatingTheWife": null }

Now, let us see what sort of queries we can do on this data set.

Note that "StoppedBeatingTheWife" is a boolean (non nullable) property. I intentionally choose a this example, mind.
Now, what should be the behavior of asking "give all users who stopped beating their wives?" 
StoppedBeatingTheWife:true would return C

StoppedBeatingTheWife:false would return B

Now, we may have users that declined to answer, in which case we can do:

StoppedBeatingTheWife:[[NULL_VALUE]] and get D

But in the case of A, it isn't that the value is null, is it that there isn't a value at all.
We can't check for null equality.  In most cases, it is a sign of schema migration, but we have a lot of users that use RavenDB for sparse schema, in which null properties are simply not written to the document.

And even if it is just additional property, RavenDB doesn't know what to put it, and null isn't appropriate, since it is already a valid value.

As a result, we make a distinction between null the value and missing value.

You can't do a query on a missing field, but you can query on its absence.

Reply all
Reply to author
Forward
0 new messages