Problem with non-ASCII characters


Eduard Tikhenko

Jun 5, 2013, 11:50:41 AM
to fltkg...@googlegroups.com
Hello to all!
I have recently started to learn FLTK, and I have a little problem. Here is a small piece of code: the user selects a folder on his PC, and then the directory name is written into the text field.

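#include <FL/Fl.H>
#include <FL/Fl_Window.H>
#include <FL/Fl_Button.H>
#include <FL/Fl_Output.H>
#include <FL/Fl_Native_File_Chooser.H>

// Called when "Browse" is pressed; obj is the output field to fill in.
static void cb (Fl_Widget*, void* obj)
{
    Fl_Output* x = static_cast<Fl_Output*> (obj);
    // A stack object avoids leaking a new chooser on every click.
    Fl_Native_File_Chooser dialog (Fl_Native_File_Chooser::BROWSE_DIRECTORY);
    if (!dialog.show()) {            // show() returns 0 when the user picked something
        x->value (dialog.filename()); // Fl_Output copies the string
    }
}

int main()
{
    Fl_Window* win = new Fl_Window (400, 30, "non-ASCII");
    Fl_Output* field = new Fl_Output (1, 1, 300, 29);
    Fl_Button* btn = new Fl_Button (field->x() + field->w(), field->y(),
        win->w() - field->w() - 2, field->h(), "Browse");
    btn->callback (cb, field);
    win->end();
    win->show();
    return Fl::run();
}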

However, the non-Latin characters are displayed incorrectly. Please advise how to fix it.

Eduard Tikhenko

Jun 5, 2013, 11:52:32 AM
to fltkg...@googlegroups.com
I forgot to mention: I use FLTK v1.3.

MacArthur, Ian (Selex ES, UK)

Jun 5, 2013, 12:34:38 PM
to fltkg...@googlegroups.com
> However, the non-Latin characters are displayed
> incorrectly. Please, advise how to fix it.

Hard to say; not enough info.

You don't say what platform you are on - though I'm guessing some Windows variant. It would also help if you could show us the UTF8 text for the expected string, and how you expect it to appear.

And perhaps also check that the font face you are loading does provide the required glyphs.
Though the string that is rendered in the screen grab clearly is not using the "missing glyph" cell, so it does seem most likely that this is an issue with conversions between UTF8 and some other character representation.

Do you know anything about how the aberrant PC is configured? Is it set to some "code page" or is it set for Unicode behaviour, that sort of thing?




Selex ES Ltd
Registered Office: Sigma House, Christopher Martin Road, Basildon, Essex SS14 3EL
A company registered in England & Wales. Company no. 02426132
********************************************************************
This email and any attachments are confidential to the intended
recipient and may also be privileged. If you are not the intended
recipient please delete it from your system and notify the sender.
You should not copy it or use it for any purpose nor disclose or
distribute its contents to any other person.
********************************************************************

MacArthur, Ian (Selex ES, UK)

Jun 5, 2013, 1:12:55 PM
to fltkg...@googlegroups.com
> However, the non-Latin characters are displayed incorrectly.
> Please, advise how to fix it.

Hmmm, this looks like it *might* be a bug in Fl_Native_File_Chooser_WIN32.cxx, maybe around line 435 or so, in the method Fl_Native_File_Chooser::showfile(), I suspect.

It looks like, if you do a BROWSE_DIRECTORY, then the returned string is *not* transliterated to UTF8 by calling wchartoutf8(...) on it.

c.f. BROWSE_FILE, for example, which does call wchartoutf8(...) on the strings it finds.

If you adjust your program to do a BROWSE_FILE instead, then I think you will find that it works OK and returns the expected string (provided you select a file with a non-ASCII name, of course, since the BROWSE_FILE option will not let you return a directory name...)

Well, it worked for me: I had a test folder containing folders and files with names in Chinese characters, and the BROWSE_FILE option works to return Chinese and mixed-text file names, but the BROWSE_DIRECTORY option *does not* work for returning directory names...

Might be good if we could get Greg's opinion on this, since he knows more about the native file chooser code than anyone!

Cheers,
--
Ian

Ian MacArthur

Jun 5, 2013, 1:18:08 PM
to fltkg...@googlegroups.com, ian.ma...@selex-es.com
Actually, it looks more likely that it is Fl_Native_File_Chooser::showdir() that needs to be tweaked.

Though I'm not quite sure what to do to it!

Eduard Tikhenko

Jun 5, 2013, 4:19:31 PM
to fltkg...@googlegroups.com
Yes, you are right: if the code is changed to BROWSE_FILE, then the string is displayed correctly: http://storage3.static.itmages.ru/i/13/0606/h_1370462787_5162865_e37f7e6711.png
The operating system is Windows, the compiler is g++ from MinGW, and the characters are Cyrillic (Russian), in encoding cp1251 or 866 (I do not know exactly, but it is the standard encoding of the Windows Russian locale).

The expected string -   C:\Users\Eduard\Desktop\Что-то на русском
The resulting string -    C:\Users\Eduard\Desktop\×òî-òî íà ðóññêîì
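The two strings above are consistent with the directory name coming back as cp1251 bytes that are then shown one byte per character, as if they were Latin-1/cp1252 (FLTK appears to fall back to cp1252 for bytes that are not valid UTF8, though that is my reading, not something confirmed in this thread). A minimal, portable sketch that reproduces the effect; the function names are hypothetical helpers, not FLTK API, and only ASCII plus the main Cyrillic block (cp1251 bytes 0xC0..0xFF, i.e. U+0410..U+044F) is handled:

```cpp
#include <cassert>
#include <string>

// Append one Basic Multilingual Plane code point to a UTF-8 string.
static void append_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0xFFFF);  // this sketch only needs the BMP
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: decode cp1251 bytes to UTF-8.
// Only ASCII and 0xC0..0xFF (U+0410..U+044F) are mapped; other bytes become '?'.
std::string cp1251_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char b : in) {
        if (b < 0x80)
            append_utf8(out, b);
        else if (b >= 0xC0)
            append_utf8(out, 0x0410 + (b - 0xC0));
        else
            out += '?';
    }
    return out;
}

// Hypothetical helper: read the same bytes as Latin-1, i.e. each byte is
// taken as the code point with the same numeric value.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char b : in)
        append_utf8(out, b);
    return out;
}
```

Fed the cp1251 bytes of "Что-то на русском", latin1_to_utf8() produces exactly the "resulting string" above, while cp1251_to_utf8() gives back the expected one.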

If I set, for example, a button label with non-ASCII characters, it is displayed correctly. I am writing the program interface in both Russian and English, and there this problem does not occur.

Ian MacArthur

Jun 5, 2013, 4:28:38 PM
to fltkg...@googlegroups.com
Yes - looks like a bug in fltk:

In the file Fl_Native_File_Chooser_WIN32.cxx at line 536 we have: 

     add_pathname(path);

which I suspect should be

     add_pathname(some_conversion_function(path));

Where "some_conversion_function()" converts the string from the format returned by SHGetPathFromIDList(); into an array of UTF8 characters for fltk to use...

But, I'm not really sure what that function would actually be; certainly I don't think it is the wchartoutf8() function used elsewhere in this file for similar conversions...

I'm hoping maybe Greg or Manolo will have ideas here...

Cheers,
-- 
Ian




nikego

Jun 5, 2013, 6:21:44 PM
to fltkg...@googlegroups.com
Hello, Eduard!
Unfortunately, Fl_Native_File_Chooser has bugs. It often works with characters in the native Windows encoding instead of UTF8.
For now, to display the filename you first have to convert the string to UTF8. I changed your code and it works correctly now (VS2010, Windows 7 64-bit).

#include <FL/Fl.H>
#include <FL/Fl_Window.H>
#include <FL/Fl_Button.H>
#include <FL/Fl_Output.H>
#include <FL/Fl_Native_File_Chooser.H>
#include <FL/fl_utf8.h>   // fl_locale_to_utf8()
#include <windows.h>      // CP_ACP
#include <string.h>       // strlen()

static void cb (Fl_Widget*, void* obj)
{
   Fl_Output* x = static_cast<Fl_Output*> (obj);
   Fl_Native_File_Chooser dialog (Fl_Native_File_Chooser::BROWSE_DIRECTORY);

   dialog.title("Выбор"); // "Selection": this string will be displayed in Russian, unlike in the other fltk widgets, see below

   if (!dialog.show()) {
       const char* acp_name = dialog.filename(); // returns the filename in the default Windows encoding
       char* utf8_name = fl_locale_to_utf8(acp_name, (int)strlen(acp_name), CP_ACP);
       x->value (utf8_name);
   }
}

int main()
{
   extern unsigned int fl_codepage;
   fl_codepage = CP_ACP; // work around a bug in fl_locale_to_utf8

   Fl_Window* win = new Fl_Window (400, 30, "non-ASCII");
   Fl_Output* field = new Fl_Output (1, 1, 300, 29);
   Fl_Button* btn = new Fl_Button (field->x() + field->w(), field->y(),
       win->w() - field->w() - 2, field->h(), "Нажми"); // "Press": displayed as a mess of symbols because it's not UTF8, at least in VS
   btn->callback (cb, field);
   win->end();
   win->show();
   return Fl::run();
}

Greg Ercolano

Jun 5, 2013, 7:38:10 PM
to fltkg...@googlegroups.com
On 06/05/13 13:28, Ian MacArthur wrote:
> Yes - looks like a bug in fltk:
> In the file Fl_Native_File_Chooser_WIN32.cxx at line 536 we have:
> add_pathname(path);
> which I suspect should be
> add_pathname(some_conversion_function(path));
> Where "some_conversion_function()" converts the string from the format returned by SHGetPathFromIDList(); into an array of UTF8 characters for fltk to use...
> But, I'm not really sure what that function would actually be; certainly I don't think it is the wchartoutf8() function used elsewhere in this file for similar conversions...
> I'm hoping maybe Greg or Manolo will have ideas here...

I don't have a multilingual OS configured, so I don't think I can
properly dev a solution.

Looks like FLTK provides these translation functions for windows:

char *fl_utf8_to_locale(const char *s, int len, unsigned int codepage);
char *fl_locale_to_utf8(const char *s, int len, unsigned int codepage);

I'm not sure what codepage value should be in this case though.
Looks like Nikego shows an example using the above with CP_ACP
as the code page.

I really don't know anything about the native codepage stuff..
Probably best thing would be for someone to provide patches.

I take it we want to be able to feed utf8 into the widget,
and have it immediately translated into whatever the windows
native widget wants, and vice-versa..? I hope nothing gets
'lost in translation', because any translation done wrong
will cause a lot of confusion.





Eduard Tikhenko

Jun 6, 2013, 1:03:14 AM
to fltkg...@googlegroups.com
Many thanks for your help!
Unfortunately, your example had no effect (Windows 7 x64, MinGW): http://storage5.static.itmages.ru/i/13/0606/h_1370494624_1828347_34130f60db.png
However, I changed this line:

/* fl_codepage = CP_ACP; */
fl_codepage = 1251;

and, oh happiness, it worked! =)  http://storage3.static.itmages.ru/i/13/0606/h_1370494849_2479759_c160ca48ae.png

Eduard Tikhenko

Jun 6, 2013, 1:43:56 AM
to fltkg...@googlegroups.com
Or I can just modify the callback function and leave main() untouched. I like this version more, because the patch is smaller.

#include <FL/Fl.H>
#include <FL/Fl_Window.H>
#include <FL/Fl_Button.H>
#include <FL/Fl_Output.H>
#include <FL/Fl_Native_File_Chooser.H>
#include <FL/fl_utf8.h>   // fl_locale_to_utf8()
#include <string.h>       // strlen()

static void cb (Fl_Widget*, void* obj)
{
    Fl_Output* x = static_cast<Fl_Output*> (obj);
    Fl_Native_File_Chooser dialog (Fl_Native_File_Chooser::BROWSE_DIRECTORY);

    if (!dialog.show()) {
        /*
        x->value (dialog.filename());
        Instead of the simple solution, do a little hack =)
        */
        // BROWSE_DIRECTORY returns the name in cp1251 here, so convert it to UTF8
        const char* acp_name = dialog.filename();
        char* utf8_name = fl_locale_to_utf8(acp_name, (int)strlen(acp_name), 1251);
        x->value (utf8_name);
    }
}

int main()
{
    Fl_Window* win = new Fl_Window (400, 30, "non-ASCII");
    Fl_Output* field = new Fl_Output (1, 1, 300, 29);
    Fl_Button* btn = new Fl_Button (field->x() + field->w(), field->y(),
        win->w() - field->w() - 2, field->h(), "Browse");
    btn->callback (cb, field);
    win->end();
    win->show();
    return Fl::run();
}

And it works correctly: http://storage7.static.itmages.ru/i/13/0606/h_1370497328_6535013_aef569da33.png
Thanks again for help.

Nikita Egorov

Jun 6, 2013, 3:41:40 AM
to fltkg...@googlegroups.com
Hi, Eduard,
Yes, you are right: "1251" works on all systems, whereas CP_ACP (the
default) works only on Windows with a Russian UI. Of course, changing
fl_codepage is not necessary in this case.

2013/6/6 Eduard Tikhenko <aqua...@gmail.com>:
> Or I can just modify the function of the event, and the main left untouched.
> I like this version more, because smaller patches.

I hope you understood that Fl_Native_File_Chooser::title() needs a string
in the native encoding (1251 in our case) too. Be careful.

--
Nikita Egorov

MacArthur, Ian (Selex ES, UK)

Jun 6, 2013, 4:57:15 AM
to fltkg...@googlegroups.com

Nikita (et al),

We need to be careful here about conflating multiple features into the one bug; I think there are in fact several (related) things going on here:

0: Recall that, internally, fltk now uses UTF8 encoded strings exclusively, not code-page encoded strings (though there may be bugs…)

1: Fl_Native_File_Chooser_WIN32 attempts to convert any strings it reads from the OS into UTF8 before returning them to fltk, BUT in the BROWSE_DIRECTORY case it is getting that wrong (a fltk bug) and returning the codepage string instead, which we then try to display as if it were UTF8, and it looks like garbage!

2: Nikita’s example code is “wrong” in so far as, when the label is set on the button, a codepage encoded string is passed to a button that expects a UTF8 label. I don’t know if it is possible to get the VS editor to write UTF8 encoded strings; I don’t use it because I couldn’t get UTF8 out of it many years ago. I use Sublime2 at present and it handles UTF8 strings OK.

3: There appears to be another bug in that fltk is NOT converting the dialog title correctly. Again, this is supposed to be a UTF8 string, not a codepage string…

So, in summary: we need to make sure all strings are UTF8 encoded. Then we need to fix fltk so that it actually does UTF8 correctly on WIN32…


MacArthur, Ian (Selex ES, UK)

Jun 6, 2013, 4:58:54 AM
to fltkg...@googlegroups.com

> Or I can just modify the function of the event, and the main left untouched. I like this version more, because smaller patches.

I think the better option would be to fix Fl_Native_File_Chooser_WIN32, however, rather than making non-portable workarounds in your own code.

Albrecht Schlosser

Jun 6, 2013, 5:35:36 AM
to fltkg...@googlegroups.com
On 06.06.2013 10:57, MacArthur, Ian (Selex ES, UK) wrote:

> Nikita (et al),
>
> We need to be careful here about conflating multiple features into the
> one bug; I think there are in fact several (related) things going on here:-

Agreed, 100%. I tested this with Russian (cyrillic) characters (as the
OP showed) and additionally with German Umlaut characters in file and
directory names, and I extended the test program for more tests. What I
found is that all FLTK BUGS mentioned below only appear in
BROWSE_DIRECTORY mode - BROWSE_FILE mode seems to work well in all cases.

> 0: Recall that, internally, fltk now uses UTF8 encoded strings
> exclusively, not code-page encoded strings (though there may be bugs…)
>
> 1: Fl_Native_File_Chooser_WIN32 attempts to convert any strings it reads
> from the OS into UTF8 before returning them to fltk, BUT in the
> BROWSE_DIRECTORY case it is getting that wrong (a fltk bug) and
> returning the codepage string instead, which we then try and display as
> if it were UTF8 and it looks like garbage!

Yep, confirmed; BROWSE_FILE mode is okay, however.
This is bug #1.

> 2: Nikita’s example code is “wrong” in so far as when the label is set
> on the button, a codepage encoded string is passed to a button that
> expects a UTF8 label. I don’t know if it is possible to get the VS
> editor to write UTF8 encoded strings, I don’t use it because I couldn’t
> get UTF8 out of it many years ago. I use Sublime2 at present and it
> handles UTF8 strings OK.

Yep, confirmed as well. UTF-8 strings (and I used the given example
string) work well if encoded correctly in the file (Windows 7, MinGW,
gcc/g++). So this needs to be fixed in the application code. In the
worst case you'd need to encode the string in hex values, but if VS
insisted on the wrong encoding of strings, I'd try to use one of FLTK's
wide-character-to-utf-8 conversion functions.

> 3: There appears to be another bug in that fltk is NOT converting the
> dialog title correctly. Again, this is supposed to be a UTF8 string, not
> a codepage string…

Yep, this is bug #2. Again, this works correctly with UTF-8 encoded
strings in BROWSE_FILE mode in my tests.

> So, in summary; we need to make sure all strings are UTF8 encoded. Then
> we need to fix fltk so that it actually does UTF8 correctly on WIN32…

True, and I want to add that all *workarounds* done now will later
(after FLTK has been fixed) lead to garbage! So please take care to mark
local workarounds clearly so that they will be easy to remove later.
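One way to keep such a local workaround from producing garbage once FLTK is fixed might be to convert only when the returned string is not already valid UTF-8. A minimal sketch of that check (a hypothetical helper, not an FLTK function). It is only a heuristic: some short cp1251 byte sequences happen to be valid UTF-8, e.g. "Рё" is 0xD0 0xB8 in cp1251, which decodes as "и".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if s is well-formed UTF-8
// (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF).
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }        // plain ASCII
        std::size_t len;
        char32_t cp;
        if (c >= 0xC2 && c <= 0xDF)      { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;   // stray continuation byte, 0xC0/0xC1 or > 0xF4
        if (i + len > n) return false;          // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len == 3 && cp < 0x800) return false;                    // overlong
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;              // surrogate
        i += len;
    }
    return true;
}
```

A clearly marked workaround could then run its codepage conversion only when is_valid_utf8() returns false for the chooser's result, so it turns into a no-op once the chooser returns UTF-8.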

Albrecht

PS: I didn't find a bug report for this issue, maybe the OP could file
one... ?
http://www.fltk.org/str.php

Albrecht Schlosser

Jun 6, 2013, 5:53:30 AM
to fltkg...@googlegroups.com
On 06.06.2013 01:38, Greg Ercolano wrote:
> On 06/05/13 13:28, Ian MacArthur wrote:
>> Yes - looks like a bug in fltk:
>> In the file Fl_Native_File_Chooser_WIN32.cxx at line 536 we have:
>> add_pathname(path);
>> which I suspect should be
>> add_pathname(some_conversion_function(path));
>> Where "some_conversion_function()" converts the string from the format returned by SHGetPathFromIDList(); into an array of UTF8 characters for fltk to use...
>> But, I'm not really sure what that function would actually be; certainly I don't think it is the wchartoutf8() function used elsewhere in this file for similar conversions...
>> I'm hoping maybe Greg or Manolo will have ideas here...
>
> I don't have a multilingual OS configured, so I can't really properly
> dev a solution I don't think.
>
> Looks like FLTK provides these translation functions for windows:
>
> char *fl_utf8_to_locale(const char *s, int len, unsigned int codepage);
> char *fl_locale_to_utf8(const char *s, int len, unsigned int codepage);
>
> I'm not sure what codepage value should be in this case though.
> Looks like Nikego shows an example using the above with CP_ACP
> as the code page.

I learned by reading docs and testing: none of these functions should be
used anywhere when converting file names internally. The reason is
simple: if a filename string were in any locale encoding, then you
would restrict it to the (at most 256) characters of that locale. This
does not work with multi-language file names.
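To make that restriction concrete: a single-byte codepage such as cp1251 has only 256 byte values, so a name mixing Cyrillic and Chinese simply has no cp1251 form. A minimal sketch with a hypothetical function name, using a deliberately partial table (ASCII plus U+0410..U+044F only):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode UTF-32 text as cp1251 bytes into `out`.
// Only ASCII and U+0410..U+044F (А..я) are in this partial table; the real
// cp1251 table adds Ё, ё, Ukrainian letters and so on, but it still has
// only 256 byte values in total.
// Returns false as soon as a character has no cp1251 byte.
bool to_cp1251(const std::u32string& text, std::string& out) {
    out.clear();
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp >= 0x0410 && cp <= 0x044F) {
            out += static_cast<char>(0xC0 + (cp - 0x0410));
        } else {
            return false;  // e.g. Chinese characters cannot be represented
        }
    }
    return true;
}
```

U"Что-то" encodes fine, but U"Что-то 中文" fails, which is exactly the multi-language case above.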

Windows stores file names on NTFS volumes in Unicode (UTF-16) encoding,
but on FAT volumes in the local codepage (single byte characters). I'd
hope that you would always get wide character strings (wchar_t) from all
OS-related functions, hence we should only have to convert between OS
representation = wchar_t (UTF-16) and FLTK representation = UTF-8.
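The wchar_t-to-UTF-8 direction is a small, well-defined transformation; FLTK has its own functions for it (e.g. fl_utf8fromwc()). The portable sketch below only illustrates what such a conversion does. It uses std::u16string instead of wchar_t so it also compiles where wchar_t is 32-bit, the name is hypothetical, and replacing unpaired surrogates with U+FFFD is one choice among several.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-16 code units to UTF-8.
// Surrogate pairs are combined; an unpaired surrogate becomes U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;                                  // low surrogate consumed too
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;                          // unpaired surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Because every UTF-16 string has a UTF-8 form, this direction never loses characters, unlike a detour through a codepage.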

> I really don't know anything about the native codepage stuff..
> Probably best thing would be for someone to provide patches.

If nobody comes up with patches, I may perhaps take a look at the weekend,
if I can find the time, since I have a valid test case.

> I take it we want to be able to feed utf8 into the widget,
> and have it immediately translated into whatever the windows
> native widget wants, and vice-versa..? I hope nothing gets
> 'lost in translation', because any translation done wrong
> will cause a lot of confusion.

Absolutely true. But as I said above, there should be only two different
representations, and since everything works well in the BROWSE_FILE
case, this should be doable. However, I didn't look into the code yet.

Albrecht

Ian MacArthur

Jun 6, 2013, 6:12:53 AM
to fltkg...@googlegroups.com
(OK, I'm posting this via the web interface, so I've *no idea* how mangled it may or may not get in transit!)

Like Albrecht, I have had a quick hack at this; my example code is posted below for consideration.

This was tested on Win7, 32-bit, built with mingw. The source was edited with Sublime2 and *was* UTF8 encoded when I wrote it (though who knows what format it will be in by the time you receive this message!)
The file and directory names being browsed were stored on an NTFS formatted volume. (I don't have any FAT32 volumes here to test.)

There are strings in ASCII, Cyrillic and Chinese. Please don't get hung up on the actual strings used; I just cut-n-pasted them from samples I had lying around...

Observations:

All the buttons display as expected; there is no corruption.
The tooltip displays as expected; there is no corruption.

This seems to indicate that, so long as UTF8 encoded strings are used, buttons, tooltips (and hopefully other general widgets) display the strings correctly.

Native File Chooser WIN32 seems to work OK in BROWSE_FILE mode; both the setting of the dialog title and the returning of selected strings are correctly UTF8 encoded and display fine.

However, in BROWSE_DIRECTORY mode, neither the title nor the returned strings work... Both appear to need codepage handling. This is unfortunate and looks like a fltk bug.

Anyway, here is the sample code for others to try, if you are playing along at home...
----------------------------

/* Test UTF8 handling in native file chooser on Win32 */
/* fltk-config --compile test.cxx */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>   // strlen()
#include <FL/Fl.H>
#include <FL/Fl_Window.H>
#include <FL/Fl_Button.H>
#include <FL/Fl_Output.H>
#include <FL/Fl_Native_File_Chooser.H>
static Fl_Window* win = 0;
static Fl_Output* field = 0;
static Fl_Output* f2 = 0;
static void cb (Fl_Widget*, void* obj)
{
    Fl_Output* x = static_cast<Fl_Output*>(obj);
    int mode = Fl_Native_File_Chooser::BROWSE_DIRECTORY;
    if (x == f2) {
        mode = Fl_Native_File_Chooser::BROWSE_FILE;
    }
    Fl_Native_File_Chooser dialog(mode);   // stack object: no leak per click
    // Setting title works OK for browse-file case, but not for browse-dir case
    // I guess browse-dir is expecting local-codepage, and I have UTF8, but
    // browse-file must be accepting UTF8 string OK.
    dialog.title("Выбор Что-то на русском");
    if (dialog.show() == 0) {
        int count = dialog.count();
        printf("selected %d items\n", count);
        if (count) {
            const char *nm = dialog.filename();
            printf("got %s which is %d bytes long\n", nm, (int)strlen(nm));
            x->value (nm);
        }
        fflush(stdout);
    }
}
static void cb_q(Fl_Widget*, void*)
{
    if (win) win->hide();
}
int main(int argc, char **argv)
{
    win = new Fl_Window (400, 300, "non-ASCII title 中文測試");
    win->begin();
    field = new Fl_Output (1, 1, 300, 30);

    Fl_Button* btn = new Fl_Button (field->x() + field->w(), field->y(),
        win->w() - field->w() - 2, field->h(), "Dir");
    btn->callback (cb, field);
    f2 = new Fl_Output (1, 40, 300, 30);
    Fl_Button* b2 = new Fl_Button (f2->x() + f2->w(), f2->y(),
        win->w() - f2->w() - 2, f2->h(), "File");
    b2->callback (cb, f2);
    Fl_Button *b3 = new Fl_Button(  5, 80, 80, 30, "ASCII");
    Fl_Button *b4 = new Fl_Button( 90, 80, 80, 30, "中文測試");
    Fl_Button *b5 = new Fl_Button(175, 80, 80, 30, "Нажми");
    Fl_Button *quit = new Fl_Button(5, win->h() - 35, win->w() - 10, 30, "QUIT Что-то на русском");
    quit->tooltip("Tooltip 中文測試 Нажми");
    quit->callback(cb_q);
    win->end();
    win->show(argc, argv);
    return Fl::run();
}
/* end of file */

MacArthur, Ian (Selex ES, UK)

Jun 6, 2013, 6:40:43 AM
to fltkg...@googlegroups.com
> Windows stores file names on NTFS volumes in Unicode (UTF-16) encoding,
> but on FAT volumes in the local codepage (single byte characters). I'd
> hope that you would always get wide character strings (wchar_t) from all
> OS-related functions, hence we should only have to convert between OS
> representation = wchar_t (UTF-16) and FLTK representation = UTF-8.


I guess that may mean our fl_utf8from_mb() or (perhaps better) fl_utf8fromwc() functions (from fl_utf.c) might be the right thing for us to use in adjusting the strings returned by the file chooser in BROWSE_DIR mode, then?

We also have fl_utf8toUtf16() which is widely used internally in fltk for going the "other way", i.e. when passing strings to the WIN32 API, so I guess that's what we'd want to call when setting the dialog title.
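For the "other way", this is roughly what a UTF-8 to UTF-16 conversion such as fl_utf8toUtf16() has to do before a string (e.g. the dialog title) can be handed to a wide-character Win32 function. A portable sketch with a hypothetical name; it assumes mostly well-formed input and does not reject overlong forms.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-8 to UTF-16 code units.
// Invalid lead bytes, bad continuation bytes and truncated sequences
// are replaced with U+FFFD; overlong forms are not rejected.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        std::size_t len = c < 0x80 ? 1
                        : (c >> 5) == 0x06 ? 2
                        : (c >> 4) == 0x0E ? 3
                        : (c >> 3) == 0x1E ? 4 : 0;
        if (len == 0 || i + len > in.size()) {   // bad lead byte or truncated
            out += u'\uFFFD';
            ++i;
            continue;
        }
        char32_t cp = len == 1 ? c : (c & (0x7F >> len));  // payload bits of lead
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (!ok) { out += u'\uFFFD'; ++i; continue; }
        i += len;
        if (cp > 0x10FFFF) cp = 0xFFFD;          // beyond Unicode
        if (cp >= 0x10000) {                     // needs a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<char16_t>(cp);
        }
    }
    return out;
}
```

Feeding it cp1251 bytes instead of UTF-8 yields U+FFFD replacement characters rather than Cyrillic, which is one way a codepage string passed where UTF-8 is expected becomes visible.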


> > I take it we want to be able to feed utf8 into the widget,
> > and have it immediately translated into whatever the windows
> > native widget wants, and vice-versa..? I hope nothing gets
> > 'lost in translation', because any translation done wrong
> > will cause a lot of confusion.
>
> Absolutely true. But as I said above, there should be only two different
> representations, and since everything works well in the BROWSE_FILE
> case, this should be doable. However, I didn't look into the code yet.

Yes, for the most part it looks like (at least in my testing) that so long as you have UTF8 in your fltk code, it does the Right Thing.

Except in the Win32 native Browse Dir case, where we seem to have missed the conversions to/from the native representation (which is probably UTF16, at least on my system, and that ties in with Albrecht's finding too, I think...)

However, the test code that Nikita posted exhibits other failures that complicate matters, but they seem to be because the source strings being passed to fltk are *not* UTF8 encoded - certainly when I test that program, the strings all work fine for me once I have Sublime2 save the file in UTF8 encoding.

Albrecht Schlosser

Jun 6, 2013, 6:50:42 AM
to fltkg...@googlegroups.com
On 06.06.2013 12:12, Ian MacArthur wrote:
> (OK, I'm posting this via the web interface, so I've *no idea* how
> mangled it may or may not get in transit!)

Worked well for me. I could cut'n'paste it from thunderbird into a
notepad++ editor session after I changed the editor's encoding to UTF-8
(w/o BOM).

> Like Albrecht, I have had a quick hack at this; my example code posted
> below for consideration.

Looks good, pretty much what I did, but *better* with your additional
buttons.

> This was tested on Win7, 32-bit, built with mingw. The source was edited
> with Sublime2 and *was* UTF8 encoded when I wrote it (though who knows
> what format it will be by the time you receive this message!)

Tested here on Win7, 32-bit, built with mingw (but as 32-bit executable,
anyway, so there shouldn't be any difference).

> The file and directory names being browsed were stored on a NTFS
> formatted volume. (I don't have any FAT32 volumes here to test.)

Any USB stick should do, most are formatted as FAT32.

> Observations:
> All the buttons display as expected; there is no corruption.
> The tooltip displays as expected, there is no corruption.
> This seems to indicate that, so long as UTF8 encoded strings are used,
> buttons, tooltips (and hopefully other general widgets) display the
> strings correctly.
> Native File Chooser WIN32 seems to work OK in BROWSE_FILE mode; both the
> setting of the dialog title and the returning of selected strings are
> correctly UTF8 encoded and display fine.
> However, in BROWSE_DIRECTORY mode, neither the title nor the returned
> strings work... Both appear to need codepage handling. This is
> unfortunate and looks like a fltk bug.

Same here. Yes, I agree it's a fltk bug (two different bugs, to be precise).

One more observation: the file chooser seems to remember the last
directory visited, whereas the directory chooser (BROWSE_DIRECTORY)
always seems to start (open) the "Computer" top-level directory. This is
a minor issue and should probably be fixed as well, to make it
consistent with the file chooser mode.

Albrecht

Nikita Egorov

Jun 6, 2013, 7:51:44 AM
to fltkg...@googlegroups.com
Ian, I have known about these bugs in Native_File_Chooser for a few
years. To be honest, I thought I had filed an STR about it among
other ones... :(

> However, the test code that Nikita posted exhibits other failures that complicate matters, but they seem to be because the source strings being passed to fltk are *not* UTF8 encoded - certainly when I test that program, the strings all work fine for me once I have Sublime2 save the file in UTF8 encoding.

Yes, there is a problem in VS with UTF8 encoding. VS2010 displays UTF8
files properly (e.g. your latest test sample), but it converts the
strings during compilation. I attached screenshots of your test:

* there are no Russian labels at all
* the file browser returns UTF8
* the folder browser returns 1251
* the title of the file browser is corrupted
* the title of the folder browser is OK

--
Nikita Egorov
ian_test_dir.jpg
ian_test_file.jpg
ian_test1.jpg

MacArthur, Ian (Selex ES, UK)

Jun 6, 2013, 8:13:24 AM
to fltkg...@googlegroups.com
All,

I've committed a "fix" at r9932 that I think fixes this, at least for the BROWSE_DIRECTORY case and seems to work OK with my test program.

However, it would be fair to say I have no real clue what I am doing with the WIN32 file chooser stuff, so I would appreciate some review of the code and perhaps feedback/corrections if any are needed!

Cheers,
--
Ian

MacArthur, Ian (Selex ES, UK)

Jun 6, 2013, 9:03:25 AM
to fltkg...@googlegroups.com
> > The file and directory names being browsed were stored on a NTFS
> > formatted volume. (I don't have any FAT32 volumes here to test.)
>
> Any USB stick should do, most are formatted as FAT32.

Yup; though I'm working on a secured work PC right now, and USB is inhibited, so...



> Same here. Yes, I agree it's a fltk bug (two different bugs, to be
> precise).
>
> One more observation: the file chooser seems to remember the last
> directory visited, whereas the directory chooser (BROWSE_DIRECTORY)
> always seems to start (open) the "Computer" top-level directory. This is
> a minor issue and should probably be fixed as well, to make it
> consistent with the file chooser mode.

OK, I *think* I have this working in r9933, except for the default location that the dir chooser opens in.

To be honest, I couldn't easily see how to set that, so I kinda just left it...

MacArthur, Ian (Selex ES, UK)

Jun 6, 2013, 9:12:12 AM
to fltkg...@googlegroups.com
> Ian, I knew about these bugs inside Native_File_Chooser for a few
> years. To be honest I thought I wrote the STR message about it among
> others ones... :(

Oh well; I think I have "fixed" it for now - or at least, I have made it use UTF8 for setting/getting the strings to the BROWSE_DIRECTORY option...

Actually, if your toolchain is *not* using UTF8, that probably makes things even worse for you, not better!


> > However, the test code that Nikita posted exhibits other failures
> > that complicate matters, but they seem to be because the source strings
> > being passed to fltk are *not* UTF8 encoded - certainly when I test
> > that program, the strings all work fine for me once I have Sublime2
> > save the file in UTF8 encoding.
>
> Yes, there is a problem in VS with UTF8 encoding. VS2010 displays utf8
> files properly (e.g. your latest test sample), but it converts the
> strings during compiling. I attached screenshots of your test:
>
> * there is no russian labels at all
> * the file browser returns utf8
> * the folder browser returns 1251
> * title of the file browser is corrupted
> * title of the folder browser is OK

Urgh; that is not good.
There must be some way to get VS to save the files as UTF8 rather than converting them?
This is the second decade of the 21st Century after all, not 1980, so surely by now...?

If you edit/save the files with a UTF8 aware editor (I use Sublime, looks like Albrecht uses notepad++) and then compile it...
Is it the VS *editor* that is messing things up, or is it the MS compiler?
I have to assume it is the editor and not the compiler, surely?
--
Ian

Nikita Egorov

Jun 6, 2013, 9:54:54 AM
to fltkg...@googlegroups.com
> There must be some way to get VS to save the files as UTF8 rather than converting them?
> This is the second decade of the 21st Century after all, not 1980, so surely by now...?
>
> If you edit/save the files with a UTF8 aware editor (I use Sublime, looks like Albrecht uses notepad++) and then compile it...
> Is it the VS *editor* that is messing things up, or is it the MS compiler?
> I have to assume it is the editor and not the compiler, surely?

At last I've found a solution (so far I have used the gettext utility in my
projects; it's more flexible):
http://support.microsoft.com/kb/980263/en-us

#pragma execution_character_set("utf-8")

Perhaps it would be useful to put this information in the FLTK docs, since
FLTK works with UTF8 strings only.
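A cheap guard against the VS2010 behaviour described in this thread (UTF8 literals converted to the ANSI codepage at compile time) might be a compile-time check on the size of a non-ASCII literal. A minimal sketch with a made-up constant name: U+0427 (Ч) takes 2 bytes in UTF-8 but 1 byte in cp1251, so sizeof tells the two apart. Writing it as the escape \u0427 rather than typing Ч tests only the execution character set, not how the editor saved the file. As far as I know, later Visual Studio versions also offer a /utf-8 compiler option for the same purpose; check your compiler's documentation.

```cpp
#include <cassert>
#include <cstddef>

// "\u0427" is CYRILLIC CAPITAL LETTER CHE. With a UTF-8 execution character
// set the literal is 2 bytes plus the terminating NUL; with cp1251 it would
// be 1 byte plus NUL.
constexpr std::size_t kCheLiteralSize = sizeof("\u0427");

static_assert(kCheLiteralSize == 3,
              "narrow string literals are not UTF-8 encoded, but FLTK expects UTF-8; "
              "check the compiler's execution character set settings");
```

Dropped into a header that every translation unit includes, the build fails early instead of producing garbled labels at run time.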

MacArthur, Ian (Selex ES, UK)

Jun 6, 2013, 10:35:33 AM
to fltkg...@googlegroups.com
> > If you edit/save the files with a UTF8 aware editor (I use Sublime, looks like Albrecht uses notepad++) and then compile it...
> > Is it the VS *editor* that is messing things up, or is it the MS compiler?
> > I have to assume it is the editor and not the compiler, surely?
>
> At last I've found solution (so far I use the gettext utility in my
> projects, it's more flexible)
> http://support.microsoft.com/kb/980263/en-us

Hmm, interesting. The note specifically refers to VS 2008 SP1 in the article, but you reckon it still holds?

In any case, it does sound like it is the *compiler* that is messing up, not the editor. Not what I'd expected...



> #pragma execution_character_set("utf-8")
>
> Perhaps it would be useful to put this information in the FLTK docs, since
> FLTK works with UTF8 strings only.

Or, do you think it is safe for us to go a step beyond that, and put this string into (for example) Fl.H, with some sort of "#ifdef MSVC" check around it, thereby "forcing" it on for all users?

Or would that cause more problems than it solves...?

Greg Ercolano

Jun 6, 2013, 1:03:46 PM6/6/13
to fltkg...@googlegroups.com
On 06/06/13 07:35, MacArthur, Ian (Selex ES, UK) wrote:
>> #pragma execution_character_set("utf-8")
> Or, do you think it is safe for us to go a step beyond that, and put this string into (for example) Fl.H, with some sort of "#ifdef MSVC" check around it, thereby "forcing" it on for all users?
> Or would that cause more problems than it solves...?

I'm -1 on that. As good as it sounds in the current context,
it could break old code that depends on whatever the default behavior is.

It should probably be documented in the fltk manual under "OS ISSUES",
and perhaps in a separate section on internationalization & UTF-8.

Nikita Egorov

Jun 6, 2013, 1:26:57 PM6/6/13
to fltkg...@googlegroups.com
> Hmm, interesting. The note specifically refers to VS 2008 SP1 in the article, but you reckon it still holds?
Of course I tested it on VS 2010. It works :)

> In any case, it does sound like it is the *compiler* that is messing up, not the editor. Not what I'd expected...

I wrote at the beginning that the editor works absolutely correctly: I can
load, view, edit and save UTF8 files without any problems. The trouble
starts when I compile the file.

> Or, do you think it is safe for us to go a step beyond that, and put this string into (for example) Fl.H, with some sort of "#ifdef MSVC" check around it, thereby "forcing" it on for all users?

No, that would be overkill. I meant only a comment or note in the
FLTK docs about UTF8.

--
Nikita Egorov

Ian MacArthur

Jun 6, 2013, 1:49:55 PM6/6/13
to fltkg...@googlegroups.com
On 6 Jun 2013, at 18:03, Greg Ercolano wrote:

> On 06/06/13 07:35, MacArthur, Ian (Selex ES, UK) wrote:
>>> #pragma execution_character_set("utf-8")
>> Or, do you think it is safe for us to go a step beyond that, and put this string into (for example) Fl.H, with some sort of "#ifdef MSVC" check around it, thereby "forcing" it on for all users?
>> Or would that cause more problems than it solves...?
>
> I'm -1 on that. As good as it sounds in the current context,
> it could break old code that depends on whatever the default behavior is.

Me too...

Though maybe for a slightly different reason: searching on MSDN, it is apparent that, although this pragma worked with VS 2008 and VS 2010, it seems to be currently broken in VS2012, and will be fixed "in a later issue"...

So even if we did add it, it would only work for a limited subset of VS users anyway.


>
> It should probably be documented in the fltk manual under "OS ISSUES",
> and perhaps in a separate section on internationalization & UTF-8.

I guess so; though I think that there really ought to be a better way of telling the compiler that a string is explicitly meant to be UTF8.

Of course, C++11 does provide the u8"string" mechanism for this, but it looks as if it is not widely supported, certainly not by VS at any rate...
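For completeness, a sketch of the C++11 route, assuming a compiler that supports `u8` literals (as far as I know, VS only gained this later, in VS2015). The prefix fixes the *execution* encoding of the literal; the compiler must still read the source file as UTF-8 in the first place, which is where the BOM question discussed in this thread comes in. `folder_label` is a hypothetical name.

```cpp
#include <cstring>

// C++11 u8 prefix: the literal is stored as UTF-8 regardless of the
// compiler's execution character set (assuming the source file itself is
// decoded correctly). In C++20 the element type changed from char to
// char8_t, so the cast keeps this usable with char-based APIs such as
// FLTK's Fl_Widget::label(). Before C++20 the cast is a no-op.
static const char* const folder_label =
    reinterpret_cast<const char*>(u8"Папка");
```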



Greg Ercolano

Jun 6, 2013, 2:34:04 PM6/6/13
to fltkg...@googlegroups.com
On 06/06/13 10:49, Ian MacArthur wrote:
> Of course, C++11 does provide the u8"string" mechanism for this, but it looks as if it is not widely supported, certainly not by VS at any rate...

I should read up on C++11.

The u8"string" sounds similar to techniques I've seen in scripting languages
like Python, where small prefixes on string literals give hints to the
interpreter/compiler about special handling of the string payload.

In Python, for instance, you can use r"some text" to suppress backslash escape
processing and such.

Albrecht Schlosser

Jun 7, 2013, 4:45:36 AM6/7/13
to fltkg...@googlegroups.com
On 06.06.2013 14:13, MacArthur, Ian (Selex ES, UK) wrote:

> I've committed a "fix" at r9932 that I think fixes this, at least for the BROWSE_DIRECTORY case and seems to work OK with my test program.
>
> However, it would be fair to say I have no real clue what I am doing with the WIN32 file chooser stuff, so I would appreciate some review of the code and perhaps feedback/corrections if any are needed!

My feedback: I looked at the source code and couldn't find any errors or
missing points, but I concentrated on the parts you changed. I don't
know much (anything?) about the win32 API used here, so this may not
mean much. Anyway, I wanted to let you know...

Albrecht

Ian MacArthur

Jun 7, 2013, 4:16:59 PM6/7/13
to fltkg...@googlegroups.com
Cheers Albrecht,

I think the change I made is OK; I've tried it on a few systems, from XP to Win7, and it appears to be doing the Right Thing, and I'm not seeing any regressions either, so...

The change is pretty localised in the file, and I tried to just replicate, for the BROWSE_DIR case, what was already being done "successfully" for the BROWSE_FILE case, so I'm hopeful that it is fine!
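I haven't looked at r9932 itself, so this is not that code. Assuming the fix follows the common Win32 pattern (call the wide-character API, then convert the returned UTF-16 path to UTF-8 before handing it to FLTK), the conversion step amounts to something like this portable sketch. On Windows one would normally call `WideCharToMultiByte(CP_UTF8, ...)` or FLTK 1.3's `fl_utf8fromwc()` rather than hand-rolling it; `utf16_to_utf8()` is a hypothetical name used only for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-16 (as returned by the wide Win32 APIs)
// to UTF-8. Unpaired surrogates are replaced with U+FFFD.
static std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High + low surrogate pair -> supplementary-plane code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        if (cp < 0x80) {                    // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {            // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {          // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                            // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting string's `c_str()` is what would then be handed to `Fl_Output::value()` in the example from the start of the thread.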

Cheers,
--
Ian


MacArthur, Ian (Selex ES, UK)

Jun 10, 2013, 10:22:55 AM6/10/13
to fltkg...@googlegroups.com
So here's a bit of a story; and note that I *do not* have the VS tools installed, so this is *all* speculation and hearsay...


Based on Nikita's experiences in practice, and what I have gleaned from googling about over the weekend, it appears that:

- The MS compilers have a (perhaps annoying) habit of (re-)encoding strings, into the binary object, in the encoding of the codepage that is active for the process space in which the compiler is running.

- So, if the input source file is in UTF8, the compiler translates the UTF8 strings into the active codepage representation. (I find this somewhat unexpected...)

- If you have VS2008-sp1 or VS2010-sp-something, you can set the

#pragma execution_character_set("utf-8")

in your source file, and any strings will be read verbatim by the compiler, without applying the codepage re-encoding... But this pragma possibly does not work for VS2005 nor VS2012.

- It is *rumoured*, at least with VS2005, that if you save your source file in UTF8, but ensure there is NO BOM on the file, then the VS2005 compiler (at least, possibly others) can not tell what encoding the source is in so passes the strings through *without* re-encoding them. Which would work for us I guess?
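To make the first two points concrete, here is a rough simulation of that re-encoding, assuming the active code page is 1251 (Cyrillic) as in Nikita's setup. It maps only ASCII and the basic Cyrillic letters and assumes well-formed input; `utf8_to_cp1251()` is a hypothetical name and is certainly not what the compiler runs internally.

```cpp
#include <cstddef>
#include <string>

// Hypothetical illustration: re-encode UTF-8 text into code page 1251.
// Only ASCII and basic Cyrillic are mapped; anything else becomes '?'.
// Simplified: continuation bytes are not validated.
static std::string utf8_to_cp1251(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (c < 0x80)                { cp = c;        len = 1; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; }
        else                         { out += '?'; ++i; continue; }
        if (i + len > in.size()) { out += '?'; break; }  // truncated sequence
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;

        if (cp < 0x80)                          out += static_cast<char>(cp);
        else if (cp >= 0x0410 && cp <= 0x044F)  out += static_cast<char>(0xC0 + (cp - 0x0410));  // A..ya
        else if (cp == 0x0401)                  out += static_cast<char>(0xA8);  // Yo
        else if (cp == 0x0451)                  out += static_cast<char>(0xB8);  // yo
        else                                    out += '?';
    }
    return out;
}
```

The output for "Папка" is the byte sequence CF E0 EF EA E0, which is not valid UTF-8; since FLTK 1.3 treats label text as UTF-8, such a string cannot be displayed correctly, which is consistent with the symptoms reported at the start of the thread.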

So, it would be interesting to know if that actually does work, i.e. build a fltk test program (containing non-ASCII widget labels) with e.g. VS2010, saving the source file in UTF8 with NO BOM, and see if the compiler re-encodes the strings (hence breaking them) or whether that actually works...



For comparison, so far as I can tell, gcc-mingw seems to pass the UTF8 strings through verbatim, so that pretty much Just Works for me every time!

Nikita Egorov

Jun 10, 2013, 11:02:52 AM6/10/13
to fltkg...@googlegroups.com
> - It is *rumoured*, at least with VS2005, that if you save your source file in UTF8, but ensure there is NO BOM on the file, then the VS2005 compiler (at least, possibly others) can not tell what encoding the source is in so passes the strings through *without* re-encoding them. Which would work for us I guess?
>
> So, it would be interesting to know if that actually does work, i.e. build a fltk test program (containing non-ASCII widget labels) with e.g. VS2010, saving the source file in UTF8 with NO BOM, and see if the compiler re-encodes the strings (hence breaking them) or whether that actually works...

Hi, Ian
Yes, that's true. If the UTF8 source file has no BOM, then the
compiler works correctly.
But the BOM reappears every time the VS editor saves the file. (I
used notepad++ to remove the BOM.)
So in practice this feature isn't much use.
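Since the VS editor puts the BOM back on each save, one hypothetical workaround is a small pre-build step that strips the three-byte UTF-8 BOM (EF BB BF) instead of doing it by hand in notepad++. A sketch, assuming the file fits in memory; `has_utf8_bom()` and `strip_utf8_bom()` are made-up names, and how well this fits into a given VS build setup is untested.

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <string>

// True if the buffer starts with the UTF-8 byte order mark EF BB BF.
static bool has_utf8_bom(const std::string& data) {
    return data.size() >= 3 &&
           static_cast<unsigned char>(data[0]) == 0xEF &&
           static_cast<unsigned char>(data[1]) == 0xBB &&
           static_cast<unsigned char>(data[2]) == 0xBF;
}

// Hypothetical pre-build step: rewrite `path` without its UTF-8 BOM.
// Returns true only if a BOM was found and the file was rewritten.
static bool strip_utf8_bom(const std::string& path) {
    std::string data;
    {
        std::ifstream in(path, std::ios::binary);
        if (!in) return false;
        data.assign(std::istreambuf_iterator<char>(in),
                    std::istreambuf_iterator<char>());
    }  // input stream closed here, before the file is rewritten
    if (!has_utf8_bom(data)) return false;
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(data.data() + 3, static_cast<std::streamsize>(data.size() - 3));
    return static_cast<bool>(out);
}
```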

--
Best Regards
Nikita Egorov

Ian MacArthur

Jun 10, 2013, 4:45:36 PM6/10/13
to fltkg...@googlegroups.com
On 10 Jun 2013, at 16:02, Nikita Egorov wrote:

>> - It is *rumoured*, at least with VS2005, that if you save your source file in UTF8, but ensure there is NO BOM on the file, then the VS2005 compiler (at least, possibly others) can not tell what encoding the source is in so passes the strings through *without* re-encoding them. Which would work for us I guess?
>>
>> So, it would be interesting to know if that actually does work, i.e. build a fltk test program (containing non-ASCII widget labels) with e.g. VS2010, saving the source file in UTF8 with NO BOM, and see if the compiler re-encodes the strings (hence breaking them) or whether that actually works...
>
> Hi, Ian
> Yes, that's true. If the UTF8 source file has no BOM, then the
> compiler works correctly.


OK - well that's useful to know; at least if others are reading this, we now know that using the pragma can help (on supported VS compiler versions) -OR- saving the file as UTF8 with NO BOM will work too.

Or, you know, don't use the VS compiler, I guess...


> But the BOM reappears every time the VS editor saves the file. (I
> used notepad++ to remove the BOM.)


Huh, so there's no way to tell the VS editor to *not* write a BOM?
That's hopeless...

Just checked, and so far as I can see, all the editors I have on this Mac (and that's quite a few...) have options for whether or not to write a BOM into a UTF8 file.

I wonder why I ever stopped using VS...


> So in practice this feature isn't much use.


Oh well, I suppose if folks are using the VS compiler, but with other editors (and that might describe Greg's workflow, I suspect) then that can work for them.
But not with the VS IDE itself, it seems!




Nikita Egorov

Jun 10, 2013, 5:41:46 PM6/10/13
to fltkg...@googlegroups.com
>
> Huh, so there's no way to tell the VS editor to *not* write a BOM?
> That's hopeless...

Yes, Google tells me it's possible :). Use "File->Advanced Save
Options..." and, in the dialog that opens, select the item "Unicode
(UTF8 without signature) - Codepage 65001".
There is a small drawback: you have to apply it to every file in
your project!
And at the moment I don't know how to set it as a global option
for all files...

--
Nikita Egorov