Bug#663189: buffer overflow in python-pyfribidi

Ralf Schmitt

Mar 9, 2012, 4:20:02 AM

Package: python-pyfribidi
Architecture: i386
Source: pyfribidi
Version: 0.10.0-2

There's a buffer overflow in pyfribidi:

# python2.6 -c 'import pyfribidi; pyfribidi.log2vis(unichr(0x10000)*5)'
Segmentation fault

The reason is the following (see
https://github.com/pediapress/pyfribidi/issues/2):

fribidi_utf8_to_unicode consumes at most 3 bytes for a single Unicode
character, i.e. it does not handle Unicode characters above 0xffff. For a
4-byte UTF-8 sequence it will generate 2 Unicode characters, which
overflows the logical buffer.
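
To make the size mismatch concrete, here is a small Python 3 sketch (not part of pyfribidi; `utf8_len` is a name made up for this illustration). It prints the UTF-8 length of one code point from each range: anything above 0xffff needs 4 bytes, one more than the largest case the decoder handles.

```python
# Illustration only: utf8_len is a hypothetical helper, not a pyfribidi API.
def utf8_len(code_point):
    """Return the number of bytes UTF-8 needs for one code point."""
    return len(chr(code_point).encode("utf-8"))

# One sample per range: ASCII, 2-byte, 3-byte, and above 0xffff.
for cp in (0x41, 0x5D0, 0x20AC, 0x10000):
    print(hex(cp), utf8_len(cp))
# prints:
# 0x41 1
# 0x5d0 2
# 0x20ac 3
# 0x10000 4
```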

It's fixed with
https://github.com/pediapress/pyfribidi/commit/d2860c655357975e7b32d84e6b45e98f0dcecd7a
(or with pyfribidi 0.11 from pypi)

IMHO the issue is security relevant.

--
Cheers
Ralf

Jakub Wilk

Mar 9, 2012, 6:40:01 AM

severity 663189 grave
tags 663189 + confirmed security
thanks

* Ralf Schmitt <ra...@systemexit.de>, 2012-03-09, 10:11:

># python2.6 -c 'import pyfribidi; pyfribidi.log2vis(unichr(0x10000)*5)'
>Segmentation fault
>
>The reason is the following (see
>https://github.com/pediapress/pyfribidi/issues/2):
>
>fribidi_utf8_to_unicode consumes at most 3 bytes for a single unicode
>character, i.e. it does not handle unicode character above 0xffff.

As far as I can see this is not true. In Debian, we allocate 4 bytes per
character. (The upstream version, which the Debian package is based on,
is completely broken in this respect: it allocates a buffer of static
size. See bug #570068.)

>For a 4 byte utf-8 sequence it will generate 2 unicode characters,
>which overflows the logical buffer.

I'm confused. What is "it" in your sentence? Why 2 Unicode characters?

Anyway, I tried doubling the buffer size (8 bytes per character of the
original string), but this didn't fix the crash. So the problem likely
lies somewhere else.

--
Jakub Wilk

Ralf Schmitt

Mar 9, 2012, 7:00:01 AM

Jakub Wilk <jw...@debian.org> writes:

>>The reason is the following (see
>>https://github.com/pediapress/pyfribidi/issues/2):
>>
>> fribidi_utf8_to_unicode consumes at most 3 bytes for a single
>> unicode character, i.e. it does not handle unicode character above
>> 0xffff.
>
> As far as I can see this is not true. In Debian, we allocate 4 bytes
> per characters. (An upstream version, which the Debian package is
> based on, is completely broken in this respect: it allocates a buffer
> of static size. See bug #570068)

Upstream is pretty much dead in this case. I've published our version on
PyPI. However, I didn't ask or inform the original authors about that.

>
>> For a 4 byte utf-8 sequence it will generate 2 unicode characters,
>> which overflows the logical buffer.
>
> I'm confused. What is "it" in your sentence? Why 2 Unicode characters?

"It" refers to the 4-byte UTF-8 sequence.

Here's the inner loop of "fribidi_utf8_to_unicode" from
fribidi-char-sets-utf8.c:

,----
| length = 0;
| while ((FriBidiStrIndex) (s - t) < len)
|   {
|     register unsigned char ch = *s;
|     if (ch <= 0x7f) /* one byte */
|       {
|         *us++ = *s++;
|       }
|     else if (ch <= 0xdf) /* 2 byte */
|       {
|         *us++ = ((*s & 0x1f) << 6) + (*(s + 1) & 0x3f);
|         s += 2;
|       }
|     else /* 3 byte */
|       {
|         *us++ =
|           ((int) (*s & 0x0f) << 12) +
|           ((*(s + 1) & 0x3f) << 6) + (*(s + 2) & 0x3f);
|         s += 3;
|       }
|     length++;
|   }
`----

Assume you have a 4-byte UTF-8 sequence. One loop step consumes at most
3 bytes of it (there's no "4 byte" case), leaving 1 byte of that
sequence for further processing. This leftover byte generates another
Unicode character. pyfribidi uses the length of the Python unicode
string as the buffer size, which is less than the number of characters
fribidi_utf8_to_unicode generates. And there you have your buffer
overflow.
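
The loop's bookkeeping can be replayed in Python 3 to count how many output characters it produces. This is only a sketch: `legacy_decode_count` is a made-up name, and the function mirrors the branch structure of the C loop (how far `s` advances and how often `length` is incremented) without computing the decoded values.

```python
# Sketch only: legacy_decode_count is a hypothetical name, not a fribidi API.
def legacy_decode_count(data):
    """Count the characters the 1/2/3-byte loop would emit for `data`."""
    s = 0
    length = 0
    while s < len(data):
        ch = data[s]
        if ch <= 0x7F:    # one byte
            s += 1
        elif ch <= 0xDF:  # "2 byte" branch; stray bytes 0x80..0xBF land here too
            s += 2
        else:             # "3 byte" branch; 4-byte lead bytes (0xF0 and up) land here
            s += 3
        length += 1
    return length

text = chr(0x10000) * 5
encoded = text.encode("utf-8")
print(len(text), len(encoded), legacy_decode_count(encoded))
# prints: 5 20 10
```

For this input the loop reports 10 characters, while a buffer sized from the string length only has room for 5.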

To confirm the issue, you can add an assert that checks that
fribidi_utf8_to_unicode's return value (the length of the string) equals
unicode_length.
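
The same invariant can be sketched outside the C code. The Python 3 decoder below is hypothetical (`decode_utf8` is a made-up name, it is not the code from the fix commit, and it assumes well-formed input). It adds a 4-byte branch, and the asserts express the check: one output character per input character.

```python
# Sketch only: decode_utf8 is a hypothetical helper and assumes well-formed UTF-8.
def decode_utf8(data):
    """Decode UTF-8 bytes into a list of code points, including 4-byte sequences."""
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch <= 0x7F:        # 1 byte
            cp, n = ch, 1
        elif ch <= 0xDF:      # 2 bytes
            cp, n = ch & 0x1F, 2
        elif ch <= 0xEF:      # 3 bytes
            cp, n = ch & 0x0F, 3
        else:                 # 4 bytes: the case the 3-branch loop lacks
            cp, n = ch & 0x07, 4
        # Fold in the 6 payload bits of each continuation byte.
        for b in data[i + 1:i + n]:
            cp = (cp << 6) | (b & 0x3F)
        out.append(cp)
        i += n
    return out

text = chr(0x10000) * 5
decoded = decode_utf8(text.encode("utf-8"))
# The invariant: the decoder must not produce more characters than the input has.
assert len(decoded) == len(text)
assert decoded == [ord(c) for c in text]
```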

>
> Anyway I tried to double the buffer size (8 bytes per characters of
> original string) but this didn't fix the crash. So likely the problem
> lies somewhere else.

I'm pretty sure my analysis is correct, and I'm not quite sure what
you did here.

--
Cheers
Ralf

Jakub Wilk

Mar 9, 2012, 7:00:01 AM

* Ralf Schmitt <ra...@systemexit.de>, 2012-03-09, 10:11:

>It's fixed with
>https://github.com/pediapress/pyfribidi/commit/d2860c655357975e7b32d84e6b45e98f0dcecd7a
>(or with pyfribidi 0.11 from pypi)

Right, 0.11 on pypi looks much saner than the current one. Thanks.

--
Jakub Wilk

Jakub Wilk

Mar 9, 2012, 8:10:02 AM

* Ralf Schmitt <ra...@systemexit.de>, 2012-03-09, 12:49:

>>>fribidi_utf8_to_unicode consumes at most 3 bytes for a single unicode
>>>character, i.e. it does not handle unicode character above 0xffff.

Now that I've woken up, I finally understand what you meant here. :)
Sorry for the noise.

>here's the inner loop of "fribidi_utf8_to_unicode" from
>fribidi-char-sets-utf8.c:
>
>,----
>| length = 0;
>| while ((FriBidiStrIndex) (s - t) < len)
>|   {
>|     register unsigned char ch = *s;
>|     if (ch <= 0x7f) /* one byte */
>|       {
>|         *us++ = *s++;
>|       }
>|     else if (ch <= 0xdf) /* 2 byte */
>|       {
>|         *us++ = ((*s & 0x1f) << 6) + (*(s + 1) & 0x3f);
>|         s += 2;
>|       }
>|     else /* 3 byte */
>|       {
>|         *us++ =
>|           ((int) (*s & 0x0f) << 12) +
>|           ((*(s + 1) & 0x3f) << 6) + (*(s + 2) & 0x3f);
>|         s += 3;
>|       }
>|     length++;
>|   }
>`----

Ugh. That's so broken...

--
Jakub Wilk

أحمد المحمودي‎

Mar 10, 2012, 4:40:02 AM

On Fri, Mar 09, 2012 at 12:49:16PM +0100, Jakub Wilk wrote:
> Right, 0.11 on pypi looks much saner than the current one. Thanks.

---end quoted text---

The package is ready at:
http://mentors.debian.net/debian/pool/main/p/pyfribidi/pyfribidi_0.11.0-1.dsc

--
‎أحمد المحمودي (Ahmed El-Mahmoudy)
Digital design engineer
GPG KeyID: 0xEDDDA1B7
GPG Fingerprint: 8206 A196 2084 7E6D 0DF8 B176 BC19 6A94 EDDD A1B7
