On 19.03.2015 10:38 MacArthur, Ian (Selex ES, UK) wrote:
>
>> Someone sent me this on private email today..
>>
>> I don't have time to look into this; looks like an assertion
>> is triggered when double clicking on right-to-left text
>> (which I don't think FLTK supports?)
>
> We do kind of fake-up RTL text though, in a very limited way. *Very* limited.
>
> That said, I don't get this assert; but then I don't have the MS tools, I'm using mingw/msys.
>
>
> Nikego reports "It's well known bug of fltk" but I have to say this is the first time I heard of it.
>
> It only occurs in debug builds, I assume?
Reading the MS docs, I'd say yes. See more below.
> Nikego's analysis says:
>
>> (debug build for Windows) occurring inside isspace() function.
>>
>> _ASSERTE((unsigned)(c + 1) <= 256);
>>
>> The debug assert window arises if argument is less than zero.
>
> Which may be true, but my reading of isspace() (which may well be flawed) was that it recognises a (potentially locale specific) set of "whitespace" characters, so the signed-ness or otherwise of the parameter really ought not matter, it either is one of the "space" values or it is not...
The Linux man page says:
"These functions check whether c, which must have the value of an
unsigned char or EOF, falls into a certain character class according to
the current locale."
http://linux.die.net/man/3/isspace
Feeding it arbitrary bytes of UTF-8 characters [stored as (signed) char]
certainly does not satisfy this condition, because they can be negative
without being EOF (which is most likely -1).
The man page does not mention what happens if the input is not one of
the accepted characters.
Whereas `man toupper' (which is related) says:
"If c is not an unsigned char value, or EOF, the behavior of these
functions is undefined."
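
To make the man page requirement concrete, here is a minimal sketch of a
call that stays inside the defined range. The helper name is_space_byte()
is hypothetical (it is not part of FLTK); the only point is the conversion
to unsigned char before the value reaches isspace():

```cpp
#include <cctype>

// Hypothetical helper, not an FLTK function: classify one byte of a
// 'char' string without ever handing isspace() a negative value.
inline bool is_space_byte(char c) {
  // Converting to unsigned char first keeps the argument in the range
  // 0..UCHAR_MAX, which is what the C library requires.
  return std::isspace(static_cast<unsigned char>(c)) != 0;
}
```

This only avoids the undefined behavior (and thus the debug assertion); the
result still depends on the current locale and still looks at a single byte.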
> Do we need to find everywhere we call isspace et al. and explicitly cast the char parameter to unsigned?
> That might be a bit of a pain (and potentially confusing on platforms where a char defaults to unsigned anyway.)
It's even worse than that. Strictly speaking we don't have a 'locale' we
can use, and we can't use any of these functions the way we do now, i.e.
by parsing strings byte by byte (!), because we have UTF-8 strings and
one character can be more than one byte.
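
As a rough illustration of "one character can be more than one byte", the
following sketch steps through a UTF-8 string by sequence rather than by
byte. Both function names are hypothetical and this is not how FLTK does
it internally; it also does not validate overlong or otherwise malformed
sequences:

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with 'lead'. Returns 0 for a continuation byte or an invalid lead byte.
inline int utf8_sequence_length(unsigned char lead) {
  if (lead < 0x80) return 1;            // plain ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;                             // 10xxxxxx or invalid
}

// Hypothetical helper: count characters (not bytes) in a
// NUL-terminated UTF-8 string.
inline std::size_t utf8_count(const char *s) {
  std::size_t n = 0;
  while (*s) {
    int len = utf8_sequence_length(static_cast<unsigned char>(*s));
    if (len == 0) len = 1;              // count a stray byte as one unit
    int i = 1;
    while (i < len && s[i] != '\0') ++i; // never step past the terminator
    s += i;
    ++n;
  }
  return n;
}
```

For a label made of '@' plus one Cyrillic letter, utf8_count() sees two
characters, whereas a byte-by-byte loop would call isspace() three times,
twice with a byte that is >= 0x80.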
Here is one example of a handle() method that retrieves one "character"
in fluid/menutype.cxx:
  int Shortcut_Button::handle(int e) {
    ...
    int v = Fl::event_text()[0];
    if ( (v > 32 && v < 0x7f) || (v > 0xa0 && v <= 0xff) ) {
    ...
    // usage of isupper() and tolower() follows
As I see it, Fl::event_text()[0] can be the first byte of a UTF-8
character and as such it can be > 127 (if read as unsigned), but
event_text() returns 'const char *' (see FL/Fl.H):
  static const char* event_text() {return e_text;}
Question: can the 2nd part of the 'or' expression
'(v > 0xa0 && v <= 0xff)'
ever be true if 'char' is signed?
This is only one example I picked, but unfortunately there are lots of
similar issues. These are remnants of the FLTK 1.1 code base, i.e. code
that is not (yet) fully ported to UTF-8.
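
To illustrate the question about '(v > 0xa0 && v <= 0xff)': the sketch
below copies the range test into a hypothetical free function and reads
the first byte in two ways. The function names are made up for this mail,
and 'signed char' is used explicitly to simulate a platform where plain
char is signed:

```cpp
// Hypothetical stand-in for the range test quoted from
// Shortcut_Button::handle().
inline bool in_printable_range(int v) {
  return (v > 32 && v < 0x7f) || (v > 0xa0 && v <= 0xff);
}

// Mirrors 'int v = Fl::event_text()[0];' on a platform where plain char
// is signed: bytes >= 0x80 are sign-extended to negative ints.
inline int first_byte_signed(const char *text) {
  return static_cast<signed char>(text[0]);
}

// Reads the same byte as an unsigned value in the range 0..255.
inline int first_byte_unsigned(const char *text) {
  return static_cast<unsigned char>(text[0]);
}
```

With a Cyrillic letter such as "\xD0\x96", the signed read yields a
negative value, so the second half of the 'or' can never be true. The
unsigned read yields 0xD0 and passes the test, but 0xD0 is then only the
lead byte of a two-byte character, so the cast alone does not make the
code UTF-8 aware.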
>> For example, see attached image with crashed Fluid -
>> I have created button with label of two symbols - '@'
>> and any cyrillic symbol.
>
> Hmm, tried this; didn’t crash. What tools etc. do you build with?
AFAICT he used MS tools in Debug mode. The MS docs at
https://msdn.microsoft.com/library/y13z34da.aspx
say:
"Determines whether an integer represents a space character.
...
The behavior of isspace and _isspace_l is undefined if c is not
EOF or in the range 0 through 0xFF, inclusive. When a debug CRT
library is used and c is not one of these values, the functions
raise an assertion."
Although this is somewhat questionable, I must admit that the MS behavior
is (IMHO) legitimate, because they define otherwise 'undefined' behavior.
That said, I believe that calls to isspace(), isupper(), toupper(),
tolower() etc. should no longer be in the FLTK code base, since they are
all defined for "the current locale", which is not applicable to UTF-8
encoding [1]. For UTF-8 text we have, for instance, fl_tolower() for UCS
characters and fl_utf_tolower() for UTF-8 strings.
[1] Although they may work for the ASCII _subset_ of UTF-8 and thus for
most English text.
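
Regarding [1]: where only the ASCII subset matters, one option is a pair of
locale-independent helpers that leave every byte >= 0x80 alone. This is
only a sketch; ascii_isspace() and ascii_tolower() are hypothetical names,
not existing FLTK functions, and the code assumes an ASCII-compatible
execution character set:

```cpp
// Hypothetical helper: true only for the six ASCII whitespace characters
// (space, \t, \n, \v, \f, \r). Never consults the locale and is safe for
// negative 'char' values, which simply compare as "not whitespace".
inline bool ascii_isspace(char c) {
  return c == ' ' || (c >= '\t' && c <= '\r');
}

// Hypothetical helper: lower-case 'A'..'Z' only; every other byte,
// including all bytes of multi-byte UTF-8 sequences, is returned unchanged.
inline char ascii_tolower(char c) {
  return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}
```

Because lead and continuation bytes of UTF-8 are all >= 0x80, these helpers
cannot corrupt a multi-byte character, but they also do nothing for
non-ASCII letters.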