editor fails on cyrillic symbols

27 views
Skip to first unread message

Nikita Egorov

unread,
Dec 12, 2016, 9:36:27 AM12/12/16
to fltk.coredev
Hi, guys!

I'm again facing with old problem: debug assertion window popups when cyrillic symbols are used in Fl_Text_Editor. [Win7, VS2015 Debug build]

A simple sequence to show this error: start test/editord.exe, copy 'oй' and paste into editor, press Ctrl+Left (previous word, etc) and the application fails!

IIRC, it was Albrecht who fixed previous cases with this annoying bug. I'm afraid, the error needs some more complex solution, otherwise it will appear again...

Nikita Egorov
editor_fails.png

Albrecht Schlosser

unread,
Dec 12, 2016, 12:35:05 PM12/12/16
to fltkc...@googlegroups.com
On 12.12.2016 15:36 Nikita Egorov wrote:
> Hi, guys!
>
> I'm again facing with old problem: debug assertion window popups when
> cyrillic symbols are used in Fl_Text_Editor. [Win7, VS2015 Debug build]
>
> A simple sequence to show this error: start test/editord.exe, copy 'oй'
> and paste into editor, press Ctrl+Left (previous word, etc) and the
> application fails!

Yep, confirmed :-(

> IIRC, it was Albrecht who fixed previous cases with this annoying bug.

I don't remember. Can you give me a hint which bug this was and when it
was fixed?

> I'm afraid, the error needs some more complex solution, otherwise it
> will appear again...

You found a bug where a Microsoft assert() fails. BTW: a Release build
does not crash, there is supposedly no assert() included.

The code is indeed faulty because it is not yet fully adapted to UTF-8:

static inline int fl_isseparator(unsigned int c) {
// FIXME: this does not take UCS-4 encoding into account

I'm afraid there can be more such issues hidden in the code. :-(


However, here's a simple patch that works for me:

diff --git a/src/Fl_Text_Display.cxx b/src/Fl_Text_Display.cxx
index 59582b1..db85659 100644
--- a/src/Fl_Text_Display.cxx
+++ b/src/Fl_Text_Display.cxx
@@ -1581,7 +1581,7 @@ int Fl_Text_Display::rewind_lines(int startPos,
int nLines) {

static inline int fl_isseparator(unsigned int c) {
// FIXME: this does not take UCS-4 encoding into account
- return c != '$' && c != '_' && (isspace(c) || ispunct(c));
+ return c <= 255 && c != '$' && c != '_' && (isspace(c) || ispunct(c));
}


--- end of patch, cut here ---

Use 'patch -p1' to apply (note the two empty lines).

Please file a bug report (STR) against 1.3.4 with HIGH priority for
reference, even if this patch solves the issue for you, and please
confirm (here) if it works for you. Thanks.


Note 1: This minimal patch assumes that all word separators are in the
allowed range of isspace(), ispunct() and others which is "the value of
an unsigned char or EOF" according to the man page. Other Unicode word
separators (are there any?) would not be recognized with this patch, but
this is obviously better than to risk a crash.


Note 2: Behavior of isspace() etc. is not defined for values outside
this range, so a crash is one of many outcomes to be expected.


PS: Question to devs: is this worth a 1.3.5 release? I hope not, but a
crash is really, really bad. :-(

Ian MacArthur

unread,
Dec 12, 2016, 1:30:07 PM12/12/16
to fltkc...@googlegroups.com

On Mon Dec 12 2016 17:35:04, Albrecht Schlosser wrote:
>
> I'm afraid there can be more such issues hidden in the code. :-(


Yes - and I’m not sure how best we find them all. (Maybe Nikita will find them all the hard way!)

Nikita, do you happen to know what the “actual values” are for the glyphs that triggered the failure (I suppose I can look them up myself...)

I was wondering if there was some easy pattern we can use to trigger this failure elsewhere in the code?


>
> Note 1: This minimal patch assumes that all word separators are in the allowed range of isspace(), ispunct() and others which is "the value of an unsigned char or EOF" according to the man page. Other Unicode word separators (are there any?)

Yes, there are other Unicode code points higher up the range that serve as “spaces” of various types for other character sets beyond LGC. The same is true of ispunct() too, for that matter, there are other punctuations we would not catch here.


> would not be recognized with this patch, but this is obviously better than to risk a crash.

Indeed, I guess we need to strive to “fail safe” rather than crash. Wrong but still running is likely the best we can hope for!


>
> Note 2: Behavior of isspace() etc. is not defined for values outside this range, so a crash is one of many outcomes to be expected.


Yes: we probably could make our own isspace() function that tries to do the Right Thing, but I’m not sure it would be right... I’m envisaging something like our existing UTF8 support functions here, such as:

unsigned short XUtf8IsNonSpacing(unsigned int ucs);

et al (though not testing for the non-spacing case here of course!)

Beyond that, it is very likely that the various platform text engines (and pango, maybe even XFT/FC etc.) have a fairly robust idea about whether a code point is a space or not, so maybe we need to create some code to wrap the platform specific API’s and get at the “right” answer that way?


>
> PS: Question to devs: is this worth a 1.3.5 release? I hope not, but a crash is really, really bad. :-(


Hmm. I *want* to say no; but I suspect the right answer might be yes.
But I suppose we should try to find a few more of these cases first, just in case!




Albrecht Schlosser

unread,
Dec 12, 2016, 2:01:10 PM12/12/16
to fltkc...@googlegroups.com
On 12.12.2016 19:30 Ian MacArthur wrote:
>
> On Mon Dec 12 2016 17:35:04, Albrecht Schlosser wrote:
>>
>> I'm afraid there can be more such issues hidden in the code. :-(
>
>
> Yes - and I’m not sure how best we find them all. (Maybe Nikita will find them all the hard way!)
>
> Nikita, do you happen to know what the “actual values” are for the glyphs that triggered the failure (I suppose I can look them up myself...)

Since I happened to test this ...

It was the letter 'й' that triggered the assert() to fail.

Decimal value 1081, U+0439, "CYRILLIC SMALL LETTER SHORT I".

http://unicode.org/cldr/utility/character.jsp?a=0439

Every letter "above" ISO-8859-1 (U+0000 .. U+00FF) would do.

BTW: the assertion in my Visual Studio 2015 looks slightly different
although likely equivalent. See attached image.

Editor_Crash_Cyrillic.png

Nikita

unread,
Dec 12, 2016, 4:37:31 PM12/12/16
to fltkc...@googlegroups.com
12.12.2016 20:35, Albrecht Schlosser wrote:
> On 12.12.2016 15:36 Nikita Egorov wrote:
>
>> IIRC, it was Albrecht who fixed previous cases with this annoying bug.
>
> I don't remember. Can you give me a hint which bug this was and when
> it was fixed?
First time it was STR #2726 and I'm sure this trouble occured one more
time at least.
>
>> I'm afraid, the error needs some more complex solution, otherwise it
>> will appear again...
>
> You found a bug where a Microsoft assert() fails. BTW: a Release build
> does not crash, there is supposedly no assert() included.
Yes, I know. Linking with release of library is only way to get rid of
this crash...
>
> The code is indeed faulty because it is not yet fully adapted to UTF-8:
>
> static inline int fl_isseparator(unsigned int c) {
> // FIXME: this does not take UCS-4 encoding into account
>
> I'm afraid there can be more such issues hidden in the code. :-(

You are right, alas.

>
> However, here's a simple patch that works for me:
>
Yes, it works, but... (see below)

On 12.12.2016 19:30 Ian MacArthur wrote:

> (Maybe Nikita will find them all the hard way!)

All right, you brought this on yourself :)

1) Double click at any cyrillic word makes crash in Fl_Text_Editor, you
can test it with my previos example, even after Albrecht's patch.

2) The same action in Fl_Input gives the same result (inputd.exe helps you).

3) Trying to put cyrillic symbol after first @ in label makes crash too
(I used Fluid).

4) Trying to generate code in Fluid when fl file in in russian. E.g.
'Тест.fl'

In other words, I found places where isspace() (isalpha(), etc) is used
without mask ( & 255) and checked them. They are very suspicious ones.

Nikita Egorov
Reply all
Reply to author
Forward
0 new messages