Montezuma 0.1.3a "term out of order" error


Yoni Rabkin

Feb 23, 2009, 7:54:03 AM
to montezuma-dev
Hello,

I'm on SBCL 1.0.25 with Montezuma 0.1.3a.

Previously (with version 0.1.1), I had been successfully indexing and
searching my 5000+ document corpus. With version 0.1.3a I'm receiving
the following error (the first "none" is terminated by one #x0
character, and the second "none" by two):

term out of order: #S(MONTEZUMA::TERM
                      :FIELD #1="link"
                      :TEXT "none") < #S(MONTEZUMA::TERM
                                          :FIELD #1#
                                          :TEXT "none")
   [Condition of type SIMPLE-ERROR]

Restarts:
  0: [RETRY] Retry SLIME REPL evaluation request.
  1: [ABORT] Return to SLIME's top level.
  2: [TERMINATE-THREAD] Terminate this thread (#<THREAD "repl-thread" RUNNING {AB8CE69}>)

Backtrace:
  0: ((SB-PCL::FAST-METHOD MONTEZUMA::ADD-TERM (MONTEZUMA::TERM-INFOS-WRITER T T)) ..)
      Locals:
        SB-DEBUG::ARG-0 = #(0 NIL 6 NIL 2 NIL ...)
        SB-DEBUG::ARG-1 = :<NOT-AVAILABLE>
        SB-DEBUG::ARG-2 = #<MONTEZUMA::TERM-INFOS-WRITER {12D076B1}>
        SB-DEBUG::ARG-3 = #S(MONTEZUMA::TERM :FIELD "link" :TEXT "none")
        SB-DEBUG::ARG-4 = #<MONTEZUMA::TERM-INFO df=1:fp=50991:pp=109093:so=2 {1140D9E9}>
  1: ((SB-PCL::FAST-METHOD MONTEZUMA::MERGE-TERM-INFOS (MONTEZUMA::SEGMENT-MERGER)) #(8 NIL 3 NIL) #<unavailable argument> #<MONTEZUMA::SEGMENT-MERGER {1140D951}>)
      Locals:
        SB-DEBUG::ARG-0 = #(8 NIL 3 NIL)
        SB-DEBUG::ARG-1 = :<NOT-AVAILABLE>
        SB-DEBUG::ARG-2 = #<MONTEZUMA::SEGMENT-MERGER {1140D951}>
  2: ((SB-PCL::FAST-METHOD MONTEZUMA::MERGE-TERMS (MONTEZUMA::SEGMENT-MERGER)) #(0 NIL 4 NIL 5 NIL ...) #<unavailable argument> #<MONTEZUMA::SEGMENT-MERGER {1140D951}>)
      Locals:
        SB-DEBUG::ARG-0 = #(0 NIL 4 NIL 5 NIL ...)
        SB-DEBUG::ARG-1 = :<NOT-AVAILABLE>
        SB-DEBUG::ARG-2 = #<MONTEZUMA::SEGMENT-MERGER {1140D951}>
  3: ((SB-PCL::FAST-METHOD MONTEZUMA::MERGE (MONTEZUMA::SEGMENT-MERGER)) #(4 NIL) #<unavailable argument> #<MONTEZUMA::SEGMENT-MERGER {1140D951}>)
      Locals:
        SB-DEBUG::ARG-0 = #(4 NIL)
        SB-DEBUG::ARG-1 = :<NOT-AVAILABLE>
        SB-DEBUG::ARG-2 = #<MONTEZUMA::SEGMENT-MERGER {1140D951}>
  4: ((SB-PCL::FAST-METHOD MONTEZUMA::MERGE-SEGMENTS (MONTEZUMA:INDEX-WRITER T)) #(0 NIL 12 NIL 11 NIL ...) #<unused argument> #<MONTEZUMA:INDEX-WRITER {A916BA1}> 8 #<unavailable argument>)
      Locals:
        SB-PCL::.PV. = #(0 NIL 12 NIL 11 NIL ...)
        MONTEZUMA::MAX-SEGMENT = :<NOT-AVAILABLE>
        MONTEZUMA::MAX-SEGMENT-SUPPLIED-P = NIL
        MONTEZUMA::MIN-SEGMENT = 8
        MONTEZUMA::SELF = #<MONTEZUMA:INDEX-WRITER {A916BA1}>
  5: ((SB-PCL::FAST-METHOD MONTEZUMA::MAYBE-MERGE-SEGMENTS (MONTEZUMA:INDEX-WRITER)) #(6 NIL 4 NIL 5 NIL ...) #<unavailable argument> #<MONTEZUMA:INDEX-WRITER {A916BA1}>)
      Locals:
        SB-DEBUG::ARG-0 = #(6 NIL 4 NIL 5 NIL ...)
        SB-DEBUG::ARG-1 = :<NOT-AVAILABLE>
        SB-DEBUG::ARG-2 = #<MONTEZUMA:INDEX-WRITER {A916BA1}>
  6: ((SB-PCL::FAST-METHOD MONTEZUMA:ADD-DOCUMENT-TO-INDEX (MONTEZUMA:INDEX T)) #(6 NIL 8 NIL 2 NIL ...) #<unused argument> #<MONTEZUMA::INDEX-CORRAL {F354159}> #<unavailable argument> NIL)
      Locals:
        SB-PCL::.PV. = #(6 NIL 8 NIL 2 NIL ...)
        MONTEZUMA:ANALYZER = NIL
        MONTEZUMA:DOC = :<NOT-AVAILABLE>
        MONTEZUMA::SELF = #<MONTEZUMA::INDEX-CORRAL {F354159}>
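
The two terms in the error print identically because trailing #x0 characters are not visible in the printed output. Here is a minimal sketch of how such strings can be inspected and compared. It assumes plain strings stand in for the two :TEXT values; the names *term-a*, *term-b* and show-codes are hypothetical and not part of Montezuma.

```lisp
;; Hypothetical stand-ins for the two :TEXT values from the error message:
;; "none" followed by one NUL, and "none" followed by two NULs.
(defparameter *term-a*
  (concatenate 'string "none" (string (code-char 0))))
(defparameter *term-b*
  (concatenate 'string "none" (string (code-char 0)) (string (code-char 0))))

(defun show-codes (text)
  "Return the character codes of TEXT so invisible characters become visible."
  (map 'list #'char-code text))

;; Both strings may look like "none" when printed, but their codes differ.
(format t "~S~%~S~%" (show-codes *term-a*) (show-codes *term-b*))

;; STRING< returns a true value (the mismatch index) when the first string
;; sorts before the second; a proper prefix sorts first.
(format t "a < b: ~S~%" (string< *term-a* *term-b*))
```

Dumping the character codes this way is one option for confirming that two seemingly equal terms really differ.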

Leslie P. Polzer

Feb 23, 2009, 11:33:29 AM
to montezuma-dev
> term out of order: #S(MONTEZUMA::TERM
>                       :FIELD #1="link"
>                       :TEXT "none") < #S(MONTEZUMA::TERM
>                                           :FIELD #1#
>                                           :TEXT "none")
>    [Condition of type SIMPLE-ERROR]

I couldn't reproduce that. Are you able to isolate a test case
or send me your corpus?


>         MONTEZUMA::SELF = #<MONTEZUMA::INDEX-CORRAL {F354159}>

Any special things in that index class?

Leslie P. Polzer

Feb 23, 2009, 1:41:49 PM
to montezuma-dev
And another question, does this occur with 0.1.2a, too?

Yoni Rabkin

Feb 24, 2009, 2:38:06 PM
to montezuma-dev
On Feb 23, 8:41 pm, "Leslie P. Polzer" <s...@viridian-project.de>
wrote:
> And another question, does this occur with 0.1.2a, too?

Good question. Does 0.1.2 appear on [http://code.google.com/p/
montezuma/downloads/list]? I can't find it there.

Yoni Rabkin

Feb 24, 2009, 2:42:38 PM
to montezuma-dev
Thank you, Google, for breaking the URL...

Yoni Rabkin

Feb 24, 2009, 2:52:21 PM
to montezuma-dev
On Feb 23, 6:33 pm, "Leslie P. Polzer" <s...@viridian-project.de>
wrote:
> > term out of order: #S(MONTEZUMA::TERM
> >                       :FIELD #1="link"
> >                       :TEXT "none") < #S(MONTEZUMA::TERM
> >                                           :FIELD #1#
> >                                           :TEXT "none")
> >    [Condition of type SIMPLE-ERROR]
>
> I couldn't reproduce that, are you able to isolate a test case
> or send me your corpus?

I'll post something here if I manage to isolate it with vanilla
montezuma.

> >         MONTEZUMA::SELF = #<MONTEZUMA::INDEX-CORRAL {F354159}>
>
> Any special things in that index class?

Yes, but they are the same special things that don't break 0.1.1. I'll
try to use my own montezuma-indexfiles package
(http://yrk.livejournal.com/235234.html) to isolate the bug on an
unmodified version.

Leslie P. Polzer

Feb 24, 2009, 4:03:19 PM
to montezuma-dev
On Feb 24, 8:38 pm, Yoni Rabkin <yonirab...@gmail.com> wrote:

> Good question. Does 0.1.2 appear on [http://code.google.com/p/
> montezuma/downloads/list]? I can't find it there.

No, you need to check out the tag directly:

svn checkout http://montezuma.googlecode.com/svn/tags/release-candidate-0.1.2a

Yoni Rabkin

Feb 24, 2009, 4:57:52 PM
to montezuma-dev
On Feb 24, 11:03 pm, "Leslie P. Polzer" <s...@viridian-project.de>
wrote:
With the exact same corpus and my modified code (sorry, I didn't get
around to indexing with a vanilla Montezuma yet), I get the "term out
of order" error only with version 0.1.3a.

For good measure, I cleaned out all fasls for all of the dependencies
between the two tests.

I'll post more detailed info if I manage to isolate some more useful
info.

Leslie P. Polzer

Feb 25, 2009, 3:35:56 AM
to montez...@googlegroups.com

> I'll post more detailed info if I manage to isolate some more useful
> info.

Thanks, you're being very helpful. It's especially good to know
that 0.1.3a is responsible.

Yoni Rabkin

Feb 25, 2009, 11:06:17 AM
to montezuma-dev
> It's especially good to know that 0.1.3a is responsible.

I broke the Reuters corpus into 18,000+ files (1 per report) and
0.1.3a indexed it fine. My conclusion is that Montezuma indexing is
currently undefined if the corpus isn't ASCII clean.

Next I'll try to "poison" the Reuters corpus and try to re-create the
term-out-of-order bug.

Leslie P. Polzer

Mar 19, 2009, 4:13:05 PM
to montezuma-dev
Yoni,

can you try this patch, please:

diff --git a/lib/src/montezuma/src/util/strings.lisp b/lib/src/montezuma/src/util/strings.lisp
index 5cc6f1f..91cf611 100644
--- a/lib/src/montezuma/src/util/strings.lisp
+++ b/lib/src/montezuma/src/util/strings.lisp
@@ -12,9 +12,12 @@
     s))
 
 
-(defun string-to-bytes (string &key (start 0) (end (length string)))
+(defun string-to-bytes (string &key (start 0) end)
   "Converts a string to a sequence of bytes (unsigned-byte 8) using
 the implementation's default character encoding."
+  (let ((s (sb-ext:string-to-octets string)))
+    (subseq s start (or end (length s))))
+  #+(or)
   (let ((s (subseq string start end)))
     (sb-ext:string-to-octets s))
   #+(or)
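
The patch changes when START and END are applied. Before it, they index characters of the string; after it, they index octets of the encoded result. For ASCII text the two agree, but for multi-byte text they diverge. Here is a minimal sketch of the difference, assuming SBCL and a UTF-8 external format. The function names bytes-by-char-offsets and bytes-by-octet-offsets and the variable *sample* are hypothetical, chosen only for illustration.

```lisp
;; SBCL-only sketch; the external format is fixed to UTF-8 as an assumption.
(defun bytes-by-char-offsets (string &key (start 0) end)
  "Slice by character positions, then encode (the pre-patch behaviour)."
  (sb-ext:string-to-octets (subseq string start end) :external-format :utf-8))

(defun bytes-by-octet-offsets (string &key (start 0) end)
  "Encode the whole string, then slice by octet positions (the patched behaviour)."
  (let ((octets (sb-ext:string-to-octets string :external-format :utf-8)))
    (subseq octets start (or end (length octets)))))

;; Two Hebrew letters (alef, bet) followed by ASCII "ab".
(defparameter *sample*
  (coerce (list (code-char #x5D0) (code-char #x5D1) #\a #\b) 'string))

;; Each Hebrew letter occupies two octets in UTF-8, so the same START
;; selects different data depending on what the offset counts.
(format t "char offsets:  ~S~%" (bytes-by-char-offsets *sample* :start 2))
(format t "octet offsets: ~S~%" (bytes-by-octet-offsets *sample* :start 2))
```

Which convention is right depends on what the callers pass in. The thread itself only establishes that the octet-based version made the error go away.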

Yoni Rabkin

Mar 25, 2009, 7:43:24 AM
to montezuma-dev
On Mar 19, 10:13 pm, "Leslie P. Polzer" <s...@viridian-project.de>
wrote:

>     (let ((s (subseq string start end)))
>       (sb-ext:string-to-octets s))
>     #+(or)

Sorry for the delay.

A couple of things: the first is that I don't completely understand
the above patch. 0.1.3a of Montezuma calls babel, not sb-ext. What
version of the code is that patch against?

More importantly I've since updated and recompiled all of the
dependencies and can now successfully index and search Hebrew with
0.1.3a with no errors (and without the above patch). Perhaps the "term
out of order" error was a result of a slightly incompatible version of
a library? This isn't a very satisfying resolution to the bug, but
once I start integrating the Hebrew index and search into my work
project I might know more (the loads on the system will be a lot
higher).

For the benefit of posterity, here are the library versions that worked
for me:
sbcl 1.0.25
montezuma-0.1.3a
cl-ppcre-2.0.1
cl-fad-0.6.2
babel_0.3.0

Leslie P. Polzer

Mar 25, 2009, 10:33:02 AM
to montez...@googlegroups.com

> On Mar 19, 10:13 pm, "Leslie P. Polzer" <s...@viridian-project.de>
> wrote:
>
>>     (let ((s (subseq string start end)))
>>       (sb-ext:string-to-octets s))
>>     #+(or)
>
> Sorry for the delay.
>
> A couple of things: the first is that I don't completely understand
> the above patch. 0.1.3a of Montezuma calls babel, not sb-ext. What
> version of the code is that patch against?

Oh, sorry. I have an older version of that code around here
that still uses SB-EXT. Just replace sb-ext with babel and you
should be fine.


> More importantly I've since updated and recompiled all of the
> dependencies and can now successfully index and search Hebrew with
> 0.1.3a with no errors (and without the above patch). Perhaps the "term
> out of order" error was a result of a slightly incompatible version of
> a library? This isn't a very satisfying resolution to the bug, but
> once I start integrating the Hebrew index and search into my work
> project I might know more (the loads on the system will be a lot
> higher).

I'm still suspicious. Well, we'll see what you get. :)

Yoni Rabkin

Mar 26, 2009, 1:51:20 PM
to montezuma-dev
It works! I made sure to clean out fasls and start with a clean Lisp
image before each indexing and am getting the "term out of order"
error only *without* the patch. Library versions remain as I posted
above.

> Oh, sorry. I have an older version of that code around here
> that still uses SB-EXT. Just replace sb-ext with babel and you
> should be fine.

I manually entered the code instead. I don't understand the #+(or)
conditional compilation thingy, so I might have written it wrong (but
hey, it works). My string-to-bytes now looks like this:

(defun string-to-bytes (string &key (start 0) end)
  "Converts a string to a sequence of bytes (unsigned-byte 8) using
the implementation's default character encoding."
  (let ((s (sb-ext:string-to-octets string)))
    (subseq s start (or end (length s))))
  #+(or)
  (let ((s (subseq string start end)))
    (sb-ext:string-to-octets s))
  #+(or)
  (let ((s (subseq string start end)))
    (babel:string-to-octets s)))
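
On the #+(or) forms: #+ is a reader conditional that reads the following form only when its feature expression is true. (or) with no arguments is always false, so the form right after #+(or) is skipped by the reader. In effect it comments out a single form. A minimal sketch (the function name demo is hypothetical):

```lisp
(defun demo ()
  "Return the value of the only body form the reader keeps."
  :kept
  ;; The feature expression (or) is never true, so the next form is
  ;; discarded at read time and never reaches the compiler.
  #+(or)
  :skipped-by-the-reader)

(format t "~S~%" (demo))
```

By the same rule, only the first let form in the string-to-bytes definition above is compiled. The two forms following #+(or) are ignored.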

Leslie P. Polzer

Mar 26, 2009, 2:46:05 PM
to montez...@googlegroups.com

> It works! I made sure to clean out fasls and start with a clean Lisp
> image before each indexing and am getting the "term out of order"
> error only *without* the patch. Library versions remain as I posted
> above.

Great, I'm going to release the beta version with the fix soon then!
