Intermittent error – Read past EOF with Intersect query


Murray

May 3, 2012, 4:31:45 PM
to rav...@googlegroups.com

(Using build 918)

I have an index built on a Dictionary<string,string>:

Map = statitems => from statitem in statitems
                   from d in statitem.Dimensions
                   select new
                   {
                       Dimensions_Key = d.Key,
                       Dimensions_Value = d.Value
                   };


This index is being queried with an Intersect(), for example:

var result = session.Query<StatItem, StatItem_ByDimensions>()
    .Where(o => o.Dimensions.Any(t => t.Key == "foo" && t.Value == "bar"))
    .OrderBy(o => o.Id)
    .Intersect()
    .Where(o => o.Dimensions.Any(t => t.Key == "fiz" && t.Value == "bin"))
    .ToList();

About 1 in 5 queries fails with:

System.IO.IOException: read past EOF
   at Lucene.Net.Store.BufferedIndexInput.Refill() in z:\Libs\lucene.net\src\core\Store\BufferedIndexInput.cs:line 179
   at Lucene.Net.Store.BufferedIndexInput.ReadByte() in z:\Libs\lucene.net\src\core\Store\BufferedIndexInput.cs:line 42
   at Lucene.Net.Store.IndexInput.ReadInt() in z:\Libs\lucene.net\src\core\Store\IndexInput.cs:line 76
   at Lucene.Net.Store.IndexInput.ReadLong() in z:\Libs\lucene.net\src\core\Store\IndexInput.cs:line 102
   at Lucene.Net.Index.FieldsReader.Doc(Int32 n, FieldSelector fieldSelector) in z:\Libs\lucene.net\src\core\Index\FieldsReader.cs:line 231
   at Raven.Database.Indexing.IntersectionCollector.Collect(Int32 doc) in c:\Builds\RavenDB-Unstable\Raven.Database\Indexing\IntersectionCollector.cs:line 39
   at Lucene.Net.Search.BooleanScorer2.Score(Collector collector) in z:\Libs\lucene.net\src\core\Search\BooleanScorer2.cs:line 391
   ...

Deleting and recreating the index has no effect. Inserting new documents sometimes changes which combinations of key/value pairs fail, and sometimes does not.

The index is not stale, and the current development database has fewer than 500 documents of this class and fewer than 1000 documents overall.

Is my usage of Intersect() incorrect? Am I missing a setting? Is this a known bug, and is there a workaround?


Itamar Syn-Hershko

May 3, 2012, 4:50:20 PM
to rav...@googlegroups.com
Can you provide a failing test?

Murray

May 3, 2012, 7:00:53 PM
to rav...@googlegroups.com
I've attached a test that creates 750 documents and queries them, which reproduces the error.


IntersectQueryFailsWithEOF.txt

Matt Warren

May 4, 2012, 6:54:21 AM
to rav...@googlegroups.com
Itamar/Oren,

I've had a look at this and found the problem. The code here: https://github.com/ayende/ravendb/blob/master/Raven.Database/Indexing/IntersectionCollector.cs#L39

should just be:

var document = currentReader.Document(doc);  // Note: no need to add currentBase to doc

I misunderstood how the currentBase value is used. You don't need to apply it when getting the document from currentReader, only when calculating the docID across all the readers (if you need that info). The doc passed to Collect(..) is already relative to the current reader. See http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/all/org/apache/lucene/search/Collector.html for more info:

"NOTE: The doc that is passed to the collect method is relative to the current reader. If your collector needs to resolve this to the docID space of the Multi*Reader, you must re-base it by recording the docBase from the most recent setNextReader call."
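
To illustrate the re-basing rule, here is a minimal, self-contained C# sketch that does not use Lucene.Net at all. SegmentReader, RebasingCollector and Demo are hypothetical stand-ins invented for this illustration; they are not RavenDB or Lucene.Net types. Only the idea is taken from the note above: Collect receives a segment-relative id, and the base is added only when an index-wide id is needed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a per-segment reader (NOT a Lucene.Net type).
// It only knows its own documents, addressed 0..MaxDoc-1.
public sealed class SegmentReader
{
    private readonly string[] docs;

    public SegmentReader(params string[] docs) { this.docs = docs; }

    public int MaxDoc { get { return docs.Length; } }

    public string Document(int segmentDocId)
    {
        if (segmentDocId < 0 || segmentDocId >= docs.Length)
            throw new ArgumentOutOfRangeException("segmentDocId", "read past end of segment");
        return docs[segmentDocId];
    }
}

// Hypothetical collector that follows the rule quoted above.
public sealed class RebasingCollector
{
    private SegmentReader currentReader;
    private int currentBase;

    public readonly List<string> Documents = new List<string>();
    public readonly List<int> GlobalDocIds = new List<int>();

    public void SetNextReader(SegmentReader reader, int docBase)
    {
        currentReader = reader;
        currentBase = docBase;
    }

    public void Collect(int doc)
    {
        // doc is already relative to currentReader: use it as-is to load the document.
        Documents.Add(currentReader.Document(doc));
        // Add the base only when an index-wide id is wanted.
        GlobalDocIds.Add(currentBase + doc);
    }
}

public static class Demo
{
    public static RebasingCollector Run()
    {
        var segments = new[]
        {
            new SegmentReader("a", "b", "c"), // global ids 0..2
            new SegmentReader("d", "e")       // global ids 3..4
        };
        var collector = new RebasingCollector();
        int docBase = 0;
        foreach (var segment in segments)
        {
            collector.SetNextReader(segment, docBase);
            collector.Collect(segment.MaxDoc - 1); // pretend the last doc of each segment matched
            docBase += segment.MaxDoc;
        }
        return collector;
    }

    public static void Main()
    {
        var collector = Run();
        for (int i = 0; i < collector.Documents.Count; i++)
            Console.WriteLine(collector.Documents[i] + " (global id " + collector.GlobalDocIds[i] + ")");
    }
}
```

In this toy model, calling currentReader.Document(currentBase + doc) instead would ask the second segment for document 4 when it holds only 2, and throw, while hits in the first segment (base 0) would still load correctly. That may be why only some queries failed, though nothing in this thread confirms it.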

Murray

May 4, 2012, 1:02:33 PM
to rav...@googlegroups.com
Thank you.  The Intersect() queries do work with that change.  We will watch for builds including this change.


Matt Warren

May 4, 2012, 4:40:36 PM
to rav...@googlegroups.com
Cool, glad it works for you as well.

Oren Eini (Ayende Rahien)

May 5, 2012, 7:28:57 AM
to rav...@googlegroups.com
Matt, can you send a pull request for that?

Matt Warren

May 5, 2012, 3:02:50 PM
to rav...@googlegroups.com
Yeah, sure. It probably won't be until Monday, though.

Matt Warren

May 5, 2012, 3:03:56 PM
to rav...@googlegroups.com
Also, I'll add the unit test that Murray supplied; it'll be useful to have more tests covering the Intersect functionality.