Indexing and querying events


ZNS

Aug 14, 2012, 6:55:08 PM
to rav...@googlegroups.com
Hello,

I'm having trouble figuring out how to index and query our events. Each event is a document and can have multiple dates, for example Jan 21, Jan 23, Feb 4 and March 8. I need to query these documents by date range, for example to find all events happening between Jan 24 and Feb 8. I've tried a couple of solutions, but none of them have worked. I would be very grateful for any ideas.

Oren Eini (Ayende Rahien)

Aug 15, 2012, 2:58:36 AM
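public class Event
{
  // All dates on which the event takes place.
  public DateTime[] Dates;
}

// Matches an event when any of its dates falls between first and second.
session.Advanced.LuceneQuery<Event>().WhereBetween(x => x.Dates, first, second).ToList();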
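
A range query over a multi-valued field such as Dates matches a document when any one of its values falls inside the range. The sketch below reproduces that rule with plain LINQ over in-memory objects. It does not call RavenDB, the helper name HasDateBetween is made up for illustration, and the exclusive bounds are an assumption about how WhereBetween treats its end points.

```csharp
using System;
using System.Linq;

public class Event
{
    public string Name;
    public DateTime[] Dates;
}

public static class RangeMatch
{
    // True when at least one of the event's dates lies strictly between
    // first and second (exclusive bounds are an assumption, see above).
    public static bool HasDateBetween(Event e, DateTime first, DateTime second)
    {
        return e.Dates.Any(d => d > first && d < second);
    }
}

public static class Program
{
    public static void Main()
    {
        var concert = new Event
        {
            Name = "concert",
            Dates = new[]
            {
                new DateTime(2012, 1, 21), new DateTime(2012, 1, 23),
                new DateTime(2012, 2, 4), new DateTime(2012, 3, 8)
            }
        };

        // Feb 4 lies between Jan 24 and Feb 8, so the event matches.
        Console.WriteLine(RangeMatch.HasDateBetween(
            concert, new DateTime(2012, 1, 24), new DateTime(2012, 2, 8)));
    }
}
```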

ZNS

Aug 15, 2012, 9:47:52 AM

So easy, thanks! This brings up a new problem, though, which I unfortunately did not foresee in my first question. In the same index we have documents that have no dates. I think these are ignored now, since I added "from dates in mydoc.Dates" to the index definition. Is this solvable, or do I need separate indexes?

Oren Eini (Ayende Rahien)

Aug 15, 2012, 9:54:08 AM
Those are currently ignored for this query, yes

ZNS

Aug 15, 2012, 10:09:45 AM

Thank you. I added DefaultIfEmpty() to the dates-collection and that seems to have solved it.
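
The effect of DefaultIfEmpty() can be seen with plain LINQ: iterating an empty collection in a nested "from" produces no rows, so the parent document disappears, while DefaultIfEmpty() supplies a single placeholder value. The sketch below is an in-memory illustration only. Doc, IndexEntry and the FanOut helper are hypothetical names, and the nullable date stands in for whatever the index stores for a missing value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Doc
{
    public string Id;
    public DateTime[] Dates;
}

public class IndexEntry
{
    public string Id;
    public DateTime? Date; // null when the document has no dates
}

public static class FanOut
{
    // One entry per date; a document with no dates yields no entry at all.
    public static List<IndexEntry> Map(IEnumerable<Doc> docs)
    {
        return (from doc in docs
                from date in doc.Dates
                select new IndexEntry { Id = doc.Id, Date = date }).ToList();
    }

    // Same map, but an empty Dates collection still yields one entry.
    public static List<IndexEntry> MapKeepingEmpty(IEnumerable<Doc> docs)
    {
        return (from doc in docs
                from date in doc.Dates.Select(d => (DateTime?)d).DefaultIfEmpty()
                select new IndexEntry { Id = doc.Id, Date = date }).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var docs = new List<Doc>
        {
            new Doc { Id = "events/1", Dates = new[] { new DateTime(2012, 1, 21), new DateTime(2012, 2, 4) } },
            new Doc { Id = "events/2", Dates = new DateTime[0] }
        };

        Console.WriteLine(FanOut.Map(docs).Count);             // prints 2
        Console.WriteLine(FanOut.MapKeepingEmpty(docs).Count); // prints 3
    }
}
```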

ZNS

Aug 16, 2012, 8:03:38 AM

Alright, problem solved, new problem arrives. Paging (to any given page) does not seem to be supported with this kind of index (according to https://groups.google.com/d/topic/ravendb/7rIgoPbVJ5g/discussion). Is this correct, and is some kind of workaround possible?

Oren Eini (Ayende Rahien)

Aug 16, 2012, 11:54:58 AM
You need to make use of the SkippedResults value in the statistics for this query

ZNS

Aug 16, 2012, 12:02:49 PM

Yes, but I need to be able to fetch any page. If I want to fetch page 8 I can of course first request pages 1-7 and sum the SkippedResults values, but that doesn't seem very efficient. Also, there is no way to fetch the actual total number of hits, and sorting did not work with this "loop" solution.

I have now tried simply inserting an array of all the dates for an event to a single field instead. That seems to work and I can simply query the field with a "between" query. The problem then is sorting though..

Oren Eini (Ayende Rahien)

Aug 16, 2012, 12:12:20 PM
That is the solution for this issue, yes.

ZNS

Aug 16, 2012, 12:29:10 PM

First of all, thanks for taking the time. I guess you mean the page-loop request is the solution? I could probably live with the loop if the sorting worked and I could get the total result count; otherwise it's impossible to create a pager in a UI. The problem with sorting is that if I sort by the date field it doesn't work. I think this has something to do with which duplicates are removed by Raven.

Oren Eini (Ayende Rahien)

Aug 16, 2012, 12:29:48 PM
No, I mean, use an array there.

ZNS

Aug 16, 2012, 1:32:26 PM

Ah. Yeah, the problem is sorting though. No use showing a list of events that are not sorted by date.

Oren Eini (Ayende Rahien)

Aug 16, 2012, 1:43:28 PM
The problem is, which date?
What happens if you have more than one date in your list that matches?

ZNS

Aug 16, 2012, 2:57:20 PM

Yes, which date is the problem. Say I get all events between August and December; I want to show the events that have a date in August first. So I pretty much want to sort by the earliest available date within the queried range.
Now I've "solved" it by adding the first date of the event to a new field; however, I'd need to reindex all the events every day for that to work.

ZNS

Aug 16, 2012, 5:03:34 PM

Alright, I had an idea. I can add a field for each month the event has dates for using CreateField. The value of each field will be the first day in that month the event has. I can then sort on these, like for a search between august and october I'll order by the fields 20128, 20129, 201210. I tried adding this in my map like this:

_ = prod.Dates
.GroupBy((Func<dynamic, dynamic>)(x => x.Date.Year.ToString() + x.Date.Month.ToString()))
.Select((Func<IGrouping<dynamic, dynamic>, dynamic>)(g => CreateField(g.Key, g.Min(x => x.Date.Day), false, false)))

However, the DynamicList does not allow GroupBy. I guess I can do something like this using map/reduce, but I really don't know how, because I don't really want to reduce; I just want to create extra fields.
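
The grouping itself, separated from the index definition, can be sketched with ordinary LINQ over a typed list: group the dates by year and month, then take the smallest day in each group. This says nothing about how to get GroupBy to work on a DynamicList or how CreateField behaves; MonthFields and FirstDayPerMonth are hypothetical names, and the result is just a dictionary of field name to value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MonthFields
{
    // Key: year followed by unpadded month (for example "20128" for August 2012).
    // Value: the earliest day of that month on which the event has a date.
    public static Dictionary<string, int> FirstDayPerMonth(IEnumerable<DateTime> dates)
    {
        return dates
            .GroupBy(d => d.Year.ToString() + d.Month.ToString())
            .ToDictionary(g => g.Key, g => g.Min(d => d.Day));
    }
}

public static class Program
{
    public static void Main()
    {
        var dates = new[]
        {
            new DateTime(2012, 8, 14), new DateTime(2012, 8, 3), new DateTime(2012, 10, 20)
        };

        Dictionary<string, int> fields = MonthFields.FirstDayPerMonth(dates);
        Console.WriteLine(fields["20128"]);  // prints 3
        Console.WriteLine(fields["201210"]); // prints 20
    }
}
```

A fixed-width name (four-digit year followed by a two-digit month) would also sort chronologically as text, which the unpadded form does not; whether that matters depends on how the fields are used.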

Oren Eini (Ayende Rahien)

Aug 17, 2012, 12:33:24 AM
What build are you using?

ZNS

Aug 17, 2012, 3:58:21 AM

I'm using the latest, build 960.

ZNS

Aug 17, 2012, 3:36:52 PM

I'm trying to move this to Stack Overflow because I really need to solve it...
http://stackoverflow.com/questions/12010845/unsolvable-events-scenario-in-ravendb

Kijana Woodard

Aug 17, 2012, 4:00:39 PM
You might discourage answers when the title says "Unsolvable" - :-D

ZNS

Aug 19, 2012, 11:33:29 AM

I wanted to encourage users to solve it by adding a question mark at the end ;) Anyway, JBland on Stack Overflow seems to have a good solution; I can't quite wrap my head around the sorting though..

ZNS

Aug 22, 2012, 3:39:08 AM

Alright, this is sincerely starting to drive me insane, mainly because this is pretty much a deal-breaking issue for us. Would it be possible to have someone at RavenDB look at it?

Oren Eini (Ayende Rahien)

Aug 22, 2012, 3:51:19 AM
We are here.
What is your question?

ZNS

Aug 22, 2012, 4:40:50 AM

Thanks! :) I have a good summary of it on Stack Overflow:
http://stackoverflow.com/questions/12010845/unsolvable-events-scenario-in-ravendb

Oren Eini (Ayende Rahien)

Aug 22, 2012, 4:50:14 AM
And the answer remains the same.
You have two choices:


from evt in docs.Events
from date in evt.Dates
select new { Dates = date }

Which gives you the ability to sort by that particular date, but means you have to take into account SkippedResults.

Or, 

from evt in docs.Events
select new { Dates = evt.Dates }

You don't have to worry about SkippedResults, but sorting on this will be done based on the min/max values.


And my recommendation would be to go with the first option.

ZNS

Aug 22, 2012, 5:29:18 AM

The issue I have with the first option is that I really must be able to page it, since I'm building a REST web service on top of this query. So the API must give users the option of getting, for example, page 6. Also, when I tried this (looping and requesting pages 1-5 to get page 6), the sorting did not work: page 6 returned the wrong results.

With the second solution I will not be able to sort, as you say. If I have one event with dates 2012-01-01 and 2012-03-05 and another with 2012-01-05 and 2012-03-02, and I query for all events in March sorted by start date, they will be sorted in the wrong order.
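
The ordering mismatch described here can be reproduced in memory with the two events from the post. The sketch below compares sorting by the smallest of all of an event's dates (one reading of "min/max values", taken as an assumption) with sorting by the earliest date that falls inside the queried range. It uses plain LINQ rather than RavenDB, and EventOrdering and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Event
{
    public string Name;
    public DateTime[] Dates;
}

public static class EventOrdering
{
    // One entry per event: filter on any date in range, sort by the minimum of ALL dates.
    public static List<string> ByMinOfAllDates(IEnumerable<Event> events, DateTime start, DateTime end)
    {
        return events
            .Where(e => e.Dates.Any(d => d >= start && d <= end))
            .OrderBy(e => e.Dates.Min())
            .Select(e => e.Name)
            .ToList();
    }

    // Desired behaviour: sort by the earliest date that lies inside the range.
    public static List<string> ByEarliestInRange(IEnumerable<Event> events, DateTime start, DateTime end)
    {
        return events
            .Select(e => new { e.Name, InRange = e.Dates.Where(d => d >= start && d <= end).ToList() })
            .Where(x => x.InRange.Count > 0)
            .OrderBy(x => x.InRange.Min())
            .Select(x => x.Name)
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var events = new List<Event>
        {
            new Event { Name = "A", Dates = new[] { new DateTime(2012, 1, 1), new DateTime(2012, 3, 5) } },
            new Event { Name = "B", Dates = new[] { new DateTime(2012, 1, 5), new DateTime(2012, 3, 2) } }
        };
        var marchStart = new DateTime(2012, 3, 1);
        var marchEnd = new DateTime(2012, 3, 31);

        // A sorts first because of its January date, although B is earlier in March.
        Console.WriteLine(string.Join(",", EventOrdering.ByMinOfAllDates(events, marchStart, marchEnd)));   // prints "A,B"
        Console.WriteLine(string.Join(",", EventOrdering.ByEarliestInRange(events, marchStart, marchEnd))); // prints "B,A"
    }
}
```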

Oren Eini (Ayende Rahien)

Aug 22, 2012, 5:30:21 AM
Can you show the code you used to get to page 6?

Oren Eini (Ayende Rahien)

Aug 22, 2012, 5:30:44 AM
Also, note that in most REST APIs, you don't give people access to any page, you give them access to the next page.

ZNS

Aug 22, 2012, 5:40:04 AM

That could be acceptable, but then the application would need to remember all requests made to the api, to know which one is the next page for this specific request? Or am I missing something?

As for the paging code, I scrapped it, but it was based on this post:
https://groups.google.com/d/topic/ravendb/G0JEdU3nHdQ/discussion
I looped through and requested pages 1, 2, 3, etc., summing up the SkippedResults count..
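
The loop described above can be modelled in memory to show why page N cannot be fetched directly: its starting offset depends on the skipped counts of every earlier page. This is a simplified model, not RavenDB's implementation. The index is reduced to a list of document ids (one per date), Page and FanOutPaging are hypothetical names, pages are zero-based, and the model makes no attempt to suppress an id that already appeared on an earlier page.

```csharp
using System;
using System.Collections.Generic;

public class Page
{
    public List<string> Ids = new List<string>();
    public int SkippedResults; // duplicate entries passed over while filling the page
}

public static class FanOutPaging
{
    // Scans index entries from 'start' and collects up to pageSize distinct ids.
    public static Page Query(IList<string> entries, int start, int pageSize)
    {
        var page = new Page();
        var seen = new HashSet<string>();
        for (int i = start; i < entries.Count && page.Ids.Count < pageSize; i++)
        {
            if (seen.Add(entries[i])) page.Ids.Add(entries[i]);
            else page.SkippedResults++;
        }
        return page;
    }

    // Requests pages 0..pageNumber-1 only to learn their skipped counts,
    // then uses the running total to find where pageNumber starts.
    public static Page GetPage(IList<string> entries, int pageNumber, int pageSize)
    {
        int skipped = 0;
        for (int p = 0; p < pageNumber; p++)
        {
            skipped += Query(entries, p * pageSize + skipped, pageSize).SkippedResults;
        }
        return Query(entries, pageNumber * pageSize + skipped, pageSize);
    }
}

public static class Program
{
    public static void Main()
    {
        // Each id appears once per date, as in a fan-out index.
        var entries = new List<string> { "a", "a", "b", "c", "c", "d" };
        Page second = FanOutPaging.GetPage(entries, 1, 2);
        Console.WriteLine(string.Join(",", second.Ids)); // prints "c,d"
    }
}
```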

Oren Eini (Ayende Rahien)

Aug 22, 2012, 5:41:07 AM
You just need to point to the next page, you don't need to remember everything.

And your paging code should work; if you can show me something that doesn't, we will fix it.

ZNS

Aug 22, 2012, 5:48:41 AM

Ah, I could of course pass a URL for getting the next page with the API.

The paging worked but the result was sorted incorrectly. I'll see if I can write a test for this.
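
A next-page link of the kind discussed above lets the client carry the offset, so the server does not have to remember earlier requests. The sketch below builds such a link from the current page's starting offset, the page size and the skipped count reported for that page. The URL shape, the query parameter names and the NextPageLink helper are all hypothetical.

```csharp
using System;

public static class NextPageLink
{
    // 'start' is the index-entry offset at which the current page began.
    // The next page begins after the entries returned plus the ones skipped.
    public static string Build(string baseUrl, int start, int pageSize, int skippedResults)
    {
        int nextStart = start + pageSize + skippedResults;
        return baseUrl + "?start=" + nextStart + "&pageSize=" + pageSize;
    }
}

public static class Program
{
    public static void Main()
    {
        // First page started at 0, returned 25 events and skipped 3 duplicates.
        Console.WriteLine(NextPageLink.Build("/events", 0, 25, 3)); // prints "/events?start=28&pageSize=25"
    }
}
```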

ZNS

Aug 22, 2012, 6:31:58 PM

Here is a failing test for this scenario, using stable build 960.
https://gist.github.com/3430075
It also fails pretty badly, so I'm wondering whether the test is written correctly, but I can't find anything wrong with it. The strange thing is that the SkippedResults value when getting all results as one page is very different from the sum of SkippedResults when paging.

Oren Eini (Ayende Rahien)

Aug 23, 2012, 1:13:27 AM
Confirmed, looking at this now

Oren Eini (Ayende Rahien)

Aug 23, 2012, 3:09:45 AM
Okay, I think I fixed this.
We might have just invalidated the whole concept of Skipped Results :-)
Will be in the next build

ZNS

Aug 23, 2012, 5:39:57 AM

Alright, sorry about that ;)
My problem now is that I really would like to run this in the stable build, because the new version only supports .NET 4, right? And I still need to be able to support 3.5 for a while longer. Would it be possible to get this fix as a patch or something, so I can compile it into the 960 build myself?

Oren Eini (Ayende Rahien)

Aug 23, 2012, 10:00:33 AM
You could backport the fix, sure.
But the 3.5 support is just the client API; you should still be able to create one from 1.2.

ZNS

Aug 23, 2012, 10:10:06 AM

But if I run server 1.2 won't I need the client from that version also (because of date-format-changes etc)? Or do you mean I can compile the 1.2 client for .net 3.5?

Oren Eini (Ayende Rahien)

Aug 23, 2012, 10:11:33 AM
You can create a 3.5 version of the client fairly easily.

Chris Marisic

Aug 23, 2012, 10:39:32 AM
The real solution is to upgrade from legacy .NET; being 2 major versions behind is serious technological dead weight in an organization.

Whatever the reasons 3.5 is being used, I personally would attempt to trail-blaze the path to 4.5.

ZNS

Aug 24, 2012, 4:32:15 AM

I see that this fix is available in 1.2 now, and it says that the client does not need to care about SkippedResults anymore. Does that mean that you can just page as usual now?

Oren Eini (Ayende Rahien)

Aug 24, 2012, 8:25:46 AM
Yes
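
Paging "as usual" then reduces to ordinary skip/take arithmetic over de-duplicated results. The sketch below is an in-memory model that assumes duplicates are removed before the page window is applied; that ordering is an assumption made for illustration, not a description of the server's internals, and PlainPaging is a made-up name. Pages are zero-based.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlainPaging
{
    // Removes repeated ids while keeping first-seen order.
    public static List<string> Dedupe(IEnumerable<string> entries)
    {
        var seen = new HashSet<string>();
        return entries.Where(id => seen.Add(id)).ToList();
    }

    // Any page can be computed directly; no skipped counts are involved.
    public static List<string> GetPage(IEnumerable<string> entries, int pageNumber, int pageSize)
    {
        return Dedupe(entries).Skip(pageNumber * pageSize).Take(pageSize).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var entries = new List<string> { "a", "a", "b", "c", "c", "d" };
        Console.WriteLine(string.Join(",", PlainPaging.GetPage(entries, 1, 2))); // prints "c,d"
        Console.WriteLine(PlainPaging.Dedupe(entries).Count);                    // prints 4
    }
}
```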

ZNS

Aug 24, 2012, 8:33:22 AM

Alright, did you get my test to pass with the 2071 build? Because when I try it I get an exception:

Raven.Database.Indexing.Index.IndexQueryOperation.RecordResultsAlreadySeenForDistinctQuery(Searchable indexSearcher, TopDocs search, Boolean adjustStart, Int32& start) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\Indexing\Index.cs: line 952

Oren Eini (Ayende Rahien)

Aug 24, 2012, 8:33:53 AM
What IS the exception?

ZNS

Aug 24, 2012, 8:54:55 AM

Sorry, bad copy/paste. I've changed the test so that it does not pass the sum of skipped results to the .Skip() method anymore.

System.IndexOutOfRangeException: Index was outside the bounds of the array.


Raven.Database.Indexing.Index.IndexQueryOperation.RecordResultsAlreadySeenForDistinctQuery(Searchable indexSearcher, TopDocs search, Boolean adjustStart, Int32& start) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\Indexing\Index.cs: line 952
Raven.Database.Indexing.Index.IndexQueryOperation.<Query>d__48.MoveNext() in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\Indexing\Index.cs: line 827
System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
System.Collections.Generic.List`1.InsertRange(Int32 index, IEnumerable`1 collection)
Raven.Database.DocumentDatabase.<>c__DisplayClass88.<Query>b__7e(IStorageActionsAccessor actions) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\DocumentDatabase.cs: line 935
Raven.Storage.Managed.TransactionalStorage.ExecuteBatch(Action`1 action) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\Storage\Managed\TransactionalStorage.cs: line 131
Raven.Storage.Managed.TransactionalStorage.Batch(Action`1 action) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\Storage\Managed\TransactionalStorage.cs: line 112
Raven.Database.DocumentDatabase.Query(String index, IndexQuery query) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Database\DocumentDatabase.cs: line 878
Raven.Client.Embedded.EmbeddedDatabaseCommands.Query(String index, IndexQuery query, String[] includes, Boolean metadataOnly) in c:\Builds\RavenDB-Unstable-v1.2\Raven.Client.Embedded\EmbeddedDatabaseCommands.cs: line 373
Raven.Client.Document.AbstractDocumentQuery`2.ExecuteActualQuery() in c:\Builds\RavenDB-Unstable-v1.2\Raven.Client.Lightweight\Document\AbstractDocumentQuery.cs: line 487
Raven.Client.Document.AbstractDocumentQuery`2.InitSync() in c:\Builds\RavenDB-Unstable-v1.2\Raven.Client.Lightweight\Document\AbstractDocumentQuery.cs: line 469
Raven.Client.Document.AbstractDocumentQuery`2.GetEnumerator() in c:\Builds\RavenDB-Unstable-v1.2\Raven.Client.Lightweight\Document\AbstractDocumentQuery.cs: line 646
System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
TestProject1.MyTests.Can_project_InternalId_from_transformResults() in D:\Projekt\BaseToolAPI\BaseTool.Public\src\BaseTool.Public.Tests\Can_project_InternalId_from_transformResults.cs: line 103

Oren Eini (Ayende Rahien)

Aug 24, 2012, 9:02:41 AM
This is a passing test
ZNS.cs

ZNS

Aug 24, 2012, 11:17:47 AM

Works like a charm! Thank you very much for all the help.
Is the source code for the unstable builds available? I can only find the source for stable builds on github. I would like to try to compile the client for 3.5 or backport the fix you made.

Oren Eini (Ayende Rahien)

Aug 24, 2012, 12:00:55 PM

ZNS

Aug 24, 2012, 5:17:14 PM

Sorry, I'm back. The error I got before (see my earlier post) is back. I can reproduce it using this updated test:
https://gist.github.com/3455691
NOTE: This test will pass sometimes, but most of the times I've run it, it has thrown the same exception as before.

Oren Eini (Ayende Rahien)

Aug 24, 2012, 5:37:00 PM
What build are you seeing this error on?

ZNS

Aug 24, 2012, 5:41:00 PM

I have only tested it on build 2071

Oren Eini (Ayende Rahien)

Aug 24, 2012, 11:21:36 PM
Can you try 2072? This test passes for me.

ZNS

Aug 25, 2012, 5:11:03 AM

As I said, it passes sometimes, but I have a failure rate of about 50%, with 2072 as well.

Oren Eini (Ayende Rahien)

Aug 25, 2012, 5:35:56 AM
Thanks, reproduced now.
Fixed in the next build.