Possible bug with a large index


thecat

Dec 16, 2009, 4:38:48 PM
to montezuma-dev
Hello,
Below is a stack trace from an error that occurred in my software.
I am trying to index a _lot_ of small texts (one sentence or a few words
each). This takes a while on my machine, so I let it run overnight. In
the morning I found the stack trace below.
Note that two files in the index are around 2 GB, so I suspect a 2 GB
limit somewhere.

(I do not know yet whether the problem really comes from Montezuma or
from something else ...)

Thank you for working on this great project.
Christophe.

canard@linux-4u9g:~/devel/lisp/dbpedia/fulltextindex> ls -lh
total 4.1G
-rw-r--r-- 1 canard users 98K 2009-12-16 00:59 _2dqe.f0
-rw-r--r-- 1 canard users 98K 2009-12-16 00:59 _2dqe.f1
-rw-r--r-- 1 canard users 3.9M 2009-12-16 00:44 _2dqe.fdt
-rw-r--r-- 1 canard users 782K 2009-12-16 00:44 _2dqe.fdx
-rw-r--r-- 1 canard users 14 2009-12-16 00:44 _2dqe.fnm
-rw-r--r-- 1 canard users 2.1G 2009-12-16 00:59 _2dqe.frq
-rw-r--r-- 1 canard users 556K 2009-12-16 00:59 _2dqe.prx
-rw-r--r-- 1 canard users 18K 2009-12-16 00:59 _2dqe.tii
-rw-r--r-- 1 canard users 1.4M 2009-12-16 00:59 _2dqe.tis
-rw-r--r-- 1 canard users 2.1G 2009-12-16 01:18 _2dqe.tmp
-rw-r--r-- 1 canard users 4 2009-12-16 00:59 deletable
-rw-r--r-- 1 canard users 30 2009-12-16 00:59 segments

Argument Y is not a NUMBER: NIL
[Condition of type SIMPLE-TYPE-ERROR]

Restarts:
0: [ABORT] Return to SLIME's top level.
1: [ABORT] Exit debugger, returning to top level.

Backtrace:
0: (SB-KERNEL:TWO-ARG-= 2147483648 NIL)
1: ("no debug information for frame")
2: ((SB-PCL::FAST-METHOD MONTEZUMA::READ-BYTES (MONTEZUMA::BUFFERED-INDEX-INPUT T T T)) ..)
Locals:
SB-DEBUG::ARG-0 = #(3 NIL 4 NIL 1 NIL ...)
SB-DEBUG::ARG-1 = :<NOT-AVAILABLE>
SB-DEBUG::ARG-2 = #<MONTEZUMA::FS-INDEX-INPUT {C648B01}>
SB-DEBUG::ARG-3 = #(16 16 16 16 16 16 ...)
SB-DEBUG::ARG-4 = 0
SB-DEBUG::ARG-5 = 1024
3: ((SB-PCL::FAST-METHOD MONTEZUMA::COPY-FILE (MONTEZUMA::COMPOUND-FILE-WRITER T T)) ..)
Locals:
SB-DEBUG::ARG-0 = #(0 NIL)
SB-DEBUG::ARG-1 = :<NOT-AVAILABLE>
SB-DEBUG::ARG-2 = #<MONTEZUMA::COMPOUND-FILE-WRITER {C4B43F9}>
SB-DEBUG::ARG-3 = #<MONTEZUMA::COMPOUND-FILE-WRITER-FILE-ENTRY {C4B4D99}>
SB-DEBUG::ARG-4 = #<MONTEZUMA::FS-INDEX-OUTPUT {C4B4FD9}>
4: ((SB-PCL::FAST-METHOD MONTEZUMA:CLOSE (MONTEZUMA::COMPOUND-FILE-WRITER)) #(0 NIL 3 NIL 1 NIL ...) #<unavailable argument> #<MONTEZUMA::COMPOUND-FILE-WRITER {C4B43F9}>)
Locals:
SB-DEBUG::ARG-0 = #(0 NIL 3 NIL 1 NIL ...)
SB-DEBUG::ARG-1 = :<NOT-AVAILABLE>
SB-DEBUG::ARG-2 = #<MONTEZUMA::COMPOUND-FILE-WRITER {C4B43F9}>
5: ((SB-PCL::FAST-METHOD MONTEZUMA::CREATE-COMPOUND-FILE (MONTEZUMA::SEGMENT-MERGER T)) #(0 NIL 4 NIL 1 NIL) #<unavailable argument> #<MONTEZUMA::SEGMENT-MERGER {C621351}> "_2dqe.tmp")
Locals:
SB-DEBUG::ARG-0 = #(0 NIL 4 NIL 1 NIL)
SB-DEBUG::ARG-1 = :<NOT-AVAILABLE>
SB-DEBUG::ARG-2 = #<MONTEZUMA::SEGMENT-MERGER {C621351}>
SB-DEBUG::ARG-3 = "_2dqe.tmp"
6: ((SB-PCL::FAST-METHOD MONTEZUMA::MERGE-SEGMENTS (MONTEZUMA:INDEX-WRITER T)) #(0 NIL 12 NIL 11 NIL ...) #<unused argument> #<MONTEZUMA:INDEX-WRITER {B25A209}> 0 #<unavailable argument>)
Locals:
SB-PCL::.PV. = #(0 NIL 12 NIL 11 NIL ...)
MONTEZUMA::MAX-SEGMENT = :<NOT-AVAILABLE>
MONTEZUMA::MAX-SEGMENT-SUPPLIED-P = NIL
MONTEZUMA::MIN-SEGMENT = 0
MONTEZUMA::SELF = #<MONTEZUMA:INDEX-WRITER {B25A209}>
7: ((SB-PCL::FAST-METHOD MONTEZUMA::MAYBE-MERGE-SEGMENTS (MONTEZUMA:INDEX-WRITER)) #(6 NIL 4 NIL 5 NIL ...) #<unavailable argument> #<MONTEZUMA:INDEX-WRITER {B25A209}>)
Locals:
SB-DEBUG::ARG-0 = #(6 NIL 4 NIL 5 NIL ...)
SB-DEBUG::ARG-1 = :<NOT-AVAILABLE>
SB-DEBUG::ARG-2 = #<MONTEZUMA:INDEX-WRITER {B25A209}>
8: ((SB-PCL::FAST-METHOD MONTEZUMA:ADD-DOCUMENT-TO-INDEX (MONTEZUMA:INDEX T)) #(6 NIL 8 NIL 2 NIL ...) #<unused argument> #<MONTEZUMA:INDEX {AAD8F21}> #<unavailable argument> NIL)
Locals:
SB-PCL::.PV. = #(6 NIL 8 NIL 2 NIL ...)
MONTEZUMA:ANALYZER = NIL
MONTEZUMA:DOC = :<NOT-AVAILABLE>
MONTEZUMA::SELF = #<MONTEZUMA:INDEX {AAD8F21}>
9: (RESOLVE-OBJECT-ID "\"A1 volleyball league (Portugal)\"@en")
10: (LOAD-NTZ-FILE "/home/canard/devel/lisp/data/articles_label_en.nt.gz")
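
For readers of the trace: frame 0 compares 2147483648 with NIL, and 2147483648 is exactly 2^31, which would fit the 2 GB suspicion. The sketch below only reproduces that kind of failure in isolation. It is not Montezuma code: `at-limit-p` and `comparison-outcome` are hypothetical names, and where the NIL actually comes from inside READ-BYTES is not established by this thread. It assumes an implementation such as SBCL that signals a TYPE-ERROR when `=` receives a non-number.

```lisp
;; The first argument in frame 0 of the backtrace.
(defparameter *position-in-trace* 2147483648)

(defun two-to-the-31-p (n)
  "True when N is exactly 2^31, one more than the largest signed 32-bit integer."
  (= n (expt 2 31)))

;; Hypothetical stand-in for the failing comparison: LIMIT plays the
;; role of the value that was NIL in the backtrace.
(defun at-limit-p (position limit)
  (= position limit))

(defun comparison-outcome (position limit)
  "Return :TYPE-ERROR if comparing POSITION with LIMIT signals a TYPE-ERROR;
otherwise return the boolean result of the comparison."
  (handler-case (at-limit-p position limit)
    (type-error () :type-error)))

;; (two-to-the-31-p *position-in-trace*)   ; true
;; (comparison-outcome 2147483648 nil)     ; the failure mode seen in frame 0
```

A numeric comparison against a value that is unexpectedly NIL fails this way regardless of file size, so the 2^31 position is suggestive rather than conclusive.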

Michael McDermott

Apr 5, 2010, 3:16:39 PM
to montez...@googlegroups.com
I have been hitting this same problem (running off of the latest
subversion checkout, SBCL 1.0.36, Ubuntu) with a large number of
documents (mostly paragraph length), despite having an index with
files no larger than 12M. What I found was that indexing in larger
chunks, thereby decreasing the quantity of documents, worked. I intend
to keep playing around with this, but does anyone have any ideas what
could be done to fix it?
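
The workaround described above (fewer, larger documents) could be sketched roughly as follows. This is only an illustration under assumptions: `batch-texts`, `join-lines`, and `index-in-batches` are hypothetical helpers, not part of Montezuma, and the actual indexing call is passed in as `index-fn` so the sketch does not guess at Montezuma's API.

```lisp
(defun batch-texts (texts batch-size)
  "Group TEXTS (a list of strings) into lists of at most BATCH-SIZE strings."
  (loop while texts
        collect (loop repeat batch-size
                      while texts
                      collect (pop texts))))

(defun join-lines (strings)
  "Concatenate STRINGS into one string, separated by newlines."
  (format nil "~{~A~^~%~}" strings))

(defun index-in-batches (texts batch-size index-fn)
  "Call INDEX-FN once per batch, passing the batch joined into a single text.
INDEX-FN is supplied by the caller and stands in for the real indexing call."
  (dolist (batch (batch-texts texts batch-size))
    (funcall index-fn (join-lines batch))))

;; Example: collect the merged documents instead of indexing them.
;; (let ((docs '()))
;;   (index-in-batches '("a" "b" "c") 2 (lambda (doc) (push doc docs)))
;;   (reverse docs))
```

One trade-off to keep in mind: once several small texts are merged into one document, a search hit identifies the whole batch rather than the individual sentence.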

Yoni Rabkin

Apr 5, 2010, 3:56:54 PM
to montezuma-dev
> but does anyone have any ideas what could be done to fix it?

Nothing obvious until someone finds time to dive into
/store/buffered-index-io.lisp and track down the bug.
