The problem was that the loop in `read-term-vector' expects the
"common" part of the sequence from the previous term to be available
in the buffer when it reads the new term.
The fix works for me. If it makes sense I'll post a test as well.
P.S. Perhaps a more comprehensive fix would take into account the
FIXME note for `read-char'.
On Oct 26, 10:19 pm, Yoni Rabkin <yonirab...@gmail.com> wrote:
> I ran into another UTF-8 encoding bug and wrote a one-liner fix for
> it: http://montezuma-dev.googlegroups.com/web/fix-carry-over-bug.patch.
> The more octets used to encode characters, the more likely this bug is
> to happen (and is therefore rare in plain English).
I'm happy you managed to dig out so many hidden multi-byte character
bugs. :)
> The problem was that the loop in `read-term-vector' expects the
> "common" part of the sequence from the previous term to be available
> in the buffer when it reads the new term.
Is it never available (as I would guess from your patch) or only
sometimes?
> P.S. Perhaps a more comprehensive fix would take into account the
> FIXME note for `read-char'.
What note are you referring to? Do you mean READ-CHARS by any chance?
> On Oct 26, 10:19 pm, Yoni Rabkin <yonirab...@gmail.com> wrote:
> > I ran into another UTF-8 encoding bug and wrote a one-liner fix for
> > it: http://montezuma-dev.googlegroups.com/web/fix-carry-over-bug.patch.
> > The more octets used to encode characters, the more likely this bug is
> > to happen (and is therefore rare in plain English).
> I'm happy you managed to dig out so many hidden multi-byte character
> bugs. :)
> > The problem was that the loop in `read-term-vector' expects the
> > "common" part of the sequence from the previous term to be available
> > in the buffer when it reads the new term.
> Is it never available (as I would guess from your patch) or only
> sometimes?
With the original 15-element buffer, whether the old content mattered
depended on the value of START, that is, on how much the current term
had in common with the previous term.
But when the 15-element buffer wasn't long enough, the code created a
completely new buffer and didn't copy over any "common" parts
from the smaller one. So in this case the common parts were
never available.
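
To make the failure mode concrete, here is a minimal sketch of the idea,
not the actual Montezuma code. The names ENSURE-TERM-BUFFER and
DECODE-TERMS are hypothetical, and I'm assuming each stored term is a
(START . SUFFIX) pair, where START is the number of leading characters
shared with the previous term. The point is only that a replacement
buffer has to inherit the first START characters of the old one.

```lisp
;; Hypothetical helper, not taken from term-vectors-io.lisp.
(defun ensure-term-buffer (buffer start total-length)
  "Return a string of at least TOTAL-LENGTH characters whose first START
characters match BUFFER. BUFFER itself is returned when it is big enough."
  (if (<= total-length (length buffer))
      buffer
      (let ((new-buffer (make-string total-length
                                     :initial-element (code-char 0))))
        ;; Carry over the prefix shared with the previous term.
        ;; Leaving this step out leaves null characters in the prefix.
        (replace new-buffer buffer :end2 start)
        new-buffer)))

;; Hypothetical decoder over (START . SUFFIX) pairs, for illustration only.
(defun decode-terms (entries &optional (initial-size 15))
  "Rebuild full terms from prefix-compressed ENTRIES."
  (let ((buffer (make-string initial-size :initial-element (code-char 0)))
        (terms '()))
    (dolist (entry entries (nreverse terms))
      (destructuring-bind (start . suffix) entry
        (let ((total (+ start (length suffix))))
          (setf buffer (ensure-term-buffer buffer start total))
          ;; Write the new suffix right after the shared prefix.
          (replace buffer suffix :start1 start)
          (push (subseq buffer 0 total) terms))))))
```

With a 15-character initial buffer, the second term below forces a new
buffer and still comes out whole:

    (decode-terms '((0 . "international") (13 . "ization")))
    ;; => ("international" "internationalization")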
> > P.S. Perhaps a more comprehensive fix would take into account the
> > FIXME note for `read-char'.
> What note are you referring to? Do you mean READ-CHARS by any chance?
My apologies for being so vague; yes, I mean `read-chars' from line 50
of store/index-io.lisp.
(I tend to agree with the FIXME note there. Forcing callers to
use the return value seems unlispy to me.)
On Oct 29, 6:00 pm, Yoni Rabkin <yonirab...@gmail.com> wrote:
> I definitely don't get that with the patch installed. Can you post the
> rest of the "evaluated to ..."? The null chars cause the copy-paste to
> truncate.
The output is actually truncated at the console already.
A printed form of the string in (aref (terms tv) 1), after putting it
through string-to-octets, yields:
That's the exact output I would expect without the term-vectors-io.lisp
patch. Could you please post your version of
(defmethod read-term-vector ((self term-vectors-reader) ...) from
term-vectors-io.lisp?
On Oct 30, 3:53 pm, Yoni Rabkin <yonirab...@gmail.com> wrote:
> That's the exact output I would expect without the term-vectors-io.lisp
> patch. Could you please post your version of
> (defmethod read-term-vector ((self term-vectors-reader) ...) from
> term-vectors-io.lisp?
Committed as r420 after having resolved the problem on #lisp.