A short fix for a buffer "carry over" bug

Yoni Rabkin

unread,

Oct 26, 2009, 5:19:13 PM10/26/09

to montezuma-dev

I ran into another UTF-8 encoding bug and wrote a one-liner fix for
it: http://montezuma-dev.googlegroups.com/web/fix-carry-over-bug.patch.
The more octets used to encode characters, the more likely this bug is
to happen (and is therefore rare in plain English).

The problem was that the loop in `read-term-vector' expects the
"common" part of the sequence from the previous term to be available
in the buffer when it reads the new term.

The fix works for me. If it makes sense I'll post a test as well.

P.S. Perhaps a more comprehensive fix would take into account the
FIXME note for `read-char'.

Leslie P. Polzer

unread,

Oct 27, 2009, 5:30:17 AM10/27/09

to montezuma-dev

On Oct 26, 10:19 pm, Yoni Rabkin <yonirab...@gmail.com> wrote:
> I ran into another UTF-8 encoding bug and wrote a one-liner fix for
> it:http://montezuma-dev.googlegroups.com/web/fix-carry-over-bug.patch.
> The more octets used to encode characters, the more likely this bug is
> to happen (and is therefore rare in plain English).

I'm happy you manage to dig out so many hidden multi-byte character
bugs. :)

> The problem was that the loop in `read-term-vector' expects the
> "common" part of the sequence from the previous term to be available
> in the buffer when it reads the new term.

Is it never available (as I would guess from your patch) or only
sometimes?

> P.S. Perhaps a more comprehensive fix would take into account the
> FIXME note for `read-char'.

What note are you referring to? Do you mean READ-CHARS by any chance?

Leslie

Yoni Rabkin

unread,

Oct 27, 2009, 6:17:46 AM10/27/09

to montezuma-dev

On Oct 27, 11:30 am, "Leslie P. Polzer" <s...@viridian-project.de>
wrote:

> On Oct 26, 10:19 pm, Yoni Rabkin <yonirab...@gmail.com> wrote:
>
> > I ran into another UTF-8 encoding bug and wrote a one-liner fix for
> > it:http://montezuma-dev.googlegroups.com/web/fix-carry-over-bug.patch.
> > The more octets used to encode characters, the more likely this bug is
> > to happen (and is therefore rare in plain English).
>
> I'm happy you manage to dig out so many hidden multi-byte character
> bugs. :)
>
> > The problem was that the loop in `read-term-vector' expects the
> > "common" part of the sequence from the previous term to be available
> > in the buffer when it reads the new term.
>
> Is it never available (as I would guess from your patch) or only
> sometimes?

When using the original 15-long buffer the content mattered or
not depending on the value of START, that is, how much in common
the current term had with the previous term.

But when the 15-long buffer wasn't long enough the code created a
completely new buffer, and didn't copy over any "common" parts
from the smaller one. So in this case the common parts where
never available.

> > P.S. Perhaps a more comprehensive fix would take into account the
> > FIXME note for `read-char'.
>
> What note are you referring to? Do you mean READ-CHARS by any chance?

My apologies for being so vague; Yes, I mean `read-chars' from line 50
of
store/index-io.lisp.

(I tend to agree with the FIXME note there. Forcing callers to
use the return value seems unlispy to me.)

Leslie P. Polzer

unread,

Oct 27, 2009, 7:31:24 AM10/27/09

to montez...@googlegroups.com

Yoni Rabkin wrote:

> (I tend to agree with the FIXME note there. Forcing callers to
> use the return value seems unlispy to me.)

Yes, CLHS has enough precedence cases to make me agree with that.
Let's remove the FIXME and replace it with a check for the array
bounds.

The original patch looks appropriate now that I understand the
issue better.

Leslie

Yoni Rabkin

unread,

Oct 29, 2009, 3:24:11 AM10/29/09

to montezuma-dev

> Let's remove the FIXME and replace it with a check for the array
> bounds.
>
> The original patch looks appropriate now that I understand the
> issue better.

OK, how about these?

http://groups.google.com/group/montezuma-dev/web/test-fix-carry-over-bug.patch
http://groups.google.com/group/montezuma-dev/web/read-chars-add-bounds-check.patch

Leslie P. Polzer

unread,

Oct 29, 2009, 6:12:08 AM10/29/09

to montez...@googlegroups.com

Yoni Rabkin wrote:

> http://groups.google.com/group/montezuma-dev/web/test-fix-carry-over-bug.patch
> http://groups.google.com/group/montezuma-dev/web/read-chars-add-bounds-check.patch

Hmm, the test case fails for me even after rebuilding all object files:

;; MONTEZUMA::TEST-TERM-VECTOR-IO-ADD-FIELDS-BMP .........
WARNING: FAILURE:
Test MONTEZUMA::TEST-TERM-VECTOR-IO-ADD-FIELDS-OVERFLOW-BUFFER-10:
(AREF (MONTEZUMA::TERMS MONTEZUMA::TV)
1) evaluated to " RUNNING {A989C49}>:
Condition MONTEZUMA::TEST-FAILURE was signalled.

Yoni Rabkin

unread,

Oct 29, 2009, 1:00:11 PM10/29/09

to montezuma-dev

On Oct 29, 12:12 pm, "Leslie P. Polzer" <s...@viridian-project.de>
wrote:
> Yoni Rabkin wrote:
> >http://groups.google.com/group/montezuma-dev/web/test-fix-carry-over-...
> >http://groups.google.com/group/montezuma-dev/web/read-chars-add-bound...

>
> Hmm, the test case fails for me even after rebuilding all object files:
>
> ;; MONTEZUMA::TEST-TERM-VECTOR-IO-ADD-FIELDS-BMP .........
> WARNING: FAILURE:
> Test MONTEZUMA::TEST-TERM-VECTOR-IO-ADD-FIELDS-OVERFLOW-BUFFER-10:
> (AREF (MONTEZUMA::TERMS MONTEZUMA::TV)
> 1) evaluated to " RUNNING {A989C49}>:
> Condition MONTEZUMA::TEST-FAILURE was signalled.

I definitely don't get that with the patch installed. Can you post the
rest of the "evaluated to ..." The null chars cause the copy-paste to
truncate.

Leslie P. Polzer

unread,

Oct 30, 2009, 9:28:06 AM10/30/09

to montezuma-dev

On Oct 29, 6:00 pm, Yoni Rabkin <yonirab...@gmail.com> wrote:

> I definitely don't get that with the patch installed. Can you post the
> rest of the "evaluated to ..." The null chars cause the copy-paste to
> truncate.

The output is actually truncated at the console already.

A print form of the string in (aref (terms tv) 1) after putting it
through string-to-octets
yields:

#(0 0 0 0 0 0 112 115 101 117 100 111 104 121 112 111 112 97 114 97
116 104 121
114 111 105 100 105 115 109)

The exact form I put in before test-term-vector-io-add-fields-overflow-
buffer-10 is:

(pprint (babel:string-to-octets (aref (terms tv) 1)))

Hope that helps...

Yoni Rabkin

unread,

Oct 30, 2009, 10:53:18 AM10/30/09

to montezuma-dev

That's the exact output I would expect without the term-vectors-
io.lisp patch. Could you please post your version of (defmethod read-
term-vector ((self term-vectors-reader) ...) from term-vectors-
io.lisp?

Leslie P. Polzer

unread,

Oct 30, 2009, 11:08:42 AM10/30/09

to montezuma-dev

Committed as r420 after having resolved the problem on #lisp.

Reply all

Reply to author

Forward