FilePostingReader / IntersectionMatcher IndexError Exception

Jeremy Slater

Mar 4, 2011, 10:11:09 AM
to who...@googlegroups.com
I am using whoosh 1.7.6. I have a fairly large index, 2+ million
entries, ~250MB. I have one particular search that fails with an
IndexError, see below.

The issue is happening in the IntersectionMatcher here:

while a.is_active() and b.is_active() and aq + bq <= minquality:
    if aq < bq:
        skipped += a.skip_to_quality(minquality - bq)
    else:
        skipped += b.skip_to_quality(minquality - aq)
    if a.id() != b.id():
        self._find_next()
    aq = a.block_quality()
    bq = b.block_quality()

The problem is that the b.skip_to_quality() call reads to the end of
b's blocks looking for one with a high enough quality (I think). That
leaves b inactive, and the subsequent call to b.id() then fails with
the "index out of range" error. I assume there is some underlying issue
here. I tried changing the line:

if a.id() != b.id():

to:

if a.is_active() and b.is_active() and a.id() != b.id():

which eliminates the exception. This may be the solution, but I can't
confirm it yet: I'm also having another issue with ANDMAYBE and search
limits that is masking it.
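The failure mode described above can be reproduced outside Whoosh with a minimal sketch. This is not Whoosh code: `ToyMatcher`, `find_next` (standing in for `self._find_next()`), `intersection_skip_to_quality`, and its `guard` flag are hypothetical names that only mimic the matcher methods named in the post (`is_active()`, `id()`, `block_quality()`, `skip_to_quality()`), under the assumption that each posting block carries a single quality value. When `b.skip_to_quality()` skips every remaining block, `b` becomes inactive, and the unguarded `a.id() != b.id()` comparison raises `IndexError`; with the proposed guard the loop simply exits.

```python
# Hypothetical toy model, NOT Whoosh's implementation: it only mimics the
# matcher methods named in the post to show why the guard is needed.


class ToyMatcher:
    """A posting list split into blocks; each block has one quality value."""

    def __init__(self, blocks):
        # blocks: list of (ids, quality) pairs; ids ascend across blocks
        self.blocks = blocks
        self.bi = 0  # index of the current block
        self.i = 0   # position within the current block

    def is_active(self):
        return self.bi < len(self.blocks)

    def id(self):
        # Like FilePostingReader.id() in the trace, reading past the end
        # raises IndexError instead of returning a sentinel
        return self.blocks[self.bi][0][self.i]

    def block_quality(self):
        return self.blocks[self.bi][1] if self.is_active() else 0.0

    def next(self):
        self.i += 1
        if self.i >= len(self.blocks[self.bi][0]):
            self.bi += 1
            self.i = 0

    def skip_to(self, target):
        while self.is_active() and self.id() < target:
            self.next()

    def skip_to_quality(self, minquality):
        # Skip whole blocks whose quality is <= minquality. This can run
        # past the last block, leaving the matcher inactive.
        skipped = 0
        while self.is_active() and self.block_quality() <= minquality:
            self.bi += 1
            self.i = 0
            skipped += 1
        return skipped


def find_next(a, b):
    # Advance both matchers until they sit on the same id or one runs out
    while a.is_active() and b.is_active() and a.id() != b.id():
        if a.id() < b.id():
            a.skip_to(b.id())
        else:
            b.skip_to(a.id())


def intersection_skip_to_quality(a, b, minquality, guard=True):
    # Mirrors the loop quoted above; guard=True applies the proposed check
    skipped = 0
    aq, bq = a.block_quality(), b.block_quality()
    while a.is_active() and b.is_active() and aq + bq <= minquality:
        if aq < bq:
            skipped += a.skip_to_quality(minquality - bq)
        else:
            skipped += b.skip_to_quality(minquality - aq)
        if guard:
            mismatch = a.is_active() and b.is_active() and a.id() != b.id()
        else:
            mismatch = a.id() != b.id()  # IndexError if a or b is exhausted
        if mismatch:
            find_next(a, b)
        aq, bq = a.block_quality(), b.block_quality()
    return skipped


if __name__ == "__main__":
    a = ToyMatcher([([1, 5], 3.0), ([9], 3.0)])
    b = ToyMatcher([([1, 2], 1.0), ([5], 1.0)])
    # b skips both of its blocks and becomes inactive; the guard avoids b.id()
    print(intersection_skip_to_quality(a, b, 5.0))  # prints 2
```

Running the same call with `guard=False` on fresh matchers raises `IndexError` at the id comparison, which matches the trace. The sketch only shows that the guard prevents the crash; it does not show whether the guard is the complete fix.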

Jeremy


Trace:


/Library/Python/2.6/site-packages/Whoosh-1.7.6-py2.6.egg/whoosh/searching.pyc in search(self, q, limit, sortedby, reverse, groupedby, optimize, scored, filter, collector)
    481 collector.scored = scored
    482
--> 483 return collector.search(self, q, filter=filter)
    484
    485

/Library/Python/2.6/site-packages/Whoosh-1.7.6-py2.6.egg/whoosh/searching.pyc in search(self, searcher, q, filter)
    582 self.add_searcher(s, q)
    583 else:
--> 584 self.add_searcher(searcher, q)
    585
    586 if self.timer:

/Library/Python/2.6/site-packages/Whoosh-1.7.6-py2.6.egg/whoosh/searching.pyc in add_searcher(self, searcher, q)
    608 """
    609
--> 610 self.add_matches(searcher, q.matcher(searcher))
    611
    612 def score(self, searcher, matcher):

/Library/Python/2.6/site-packages/Whoosh-1.7.6-py2.6.egg/whoosh/searching.pyc in add_matches(self, searcher, matcher)
    653 return self.add_all_matches(searcher, matcher)
    654 else:
--> 655 return self.add_top_matches(searcher, matcher)
    656
    657 def add_top_matches(self, searcher, matcher):

/Library/Python/2.6/site-packages/Whoosh-1.7.6-py2.6.egg/whoosh/searching.pyc in add_top_matches(self, searcher, matcher)
    669 greedy = self.greedy
    670
--> 671 for id, quality in self.pull_matches(matcher, usequality):
    672 if timelimited and not greedy and self.timesup:
    673 raise TimeLimit

/Library/Python/2.6/site-packages/Whoosh-1.7.6-py2.6.egg/whoosh/searching.pyc in pull_matches(self, matcher, usequality)
    759 # required quality
    760 if usequality and checkquality and self.minquality is not None:
--> 761 matcher.skip_to_quality(self.minquality)
    762 # Skipping ahead might have moved the matcher to the end of the
    763 # posting list

/Library/Python/2.6/site-packages/Whoosh-1.7.6-py2.6.egg/whoosh/matching.pyc in skip_to_quality(self, minquality)
    981 else:
    982 skipped += b.skip_to_quality(minquality - aq)
--> 983 if a.id() != b.id():
    984 self._find_next()
    985 aq = a.block_quality()

/Library/Python/2.6/site-packages/Whoosh-1.7.6-py2.6.egg/whoosh/filedb/filepostings.pyc in id(self)
    150
    151 def id(self):
--> 152 return self.block.ids[self.i]
    153
    154 def items_as(self, astype):

IndexError: array index out of range

Matt Chaput

Mar 4, 2011, 10:20:15 AM
to who...@googlegroups.com
On 04/03/2011 10:11 AM, Jeremy Slater wrote:
> I am using whoosh 1.7.6. I have a fairly large index, 2+ million
> entries, ~250MB. I have one particular search that fails with an
> IndexError, see below.

Filed as issue #121. I'm going to try to get caught up and release a
version with recent bug fixes this weekend.

https://bitbucket.org/mchaput/whoosh/issue/121/

Thanks!

Matt
