On 19.11.2016 22:59 Jean-Yves Garneau wrote:
> Tell me, if FLUID use UTF-8 internally, it's easy to add general option
> to generate UTF-8 file from FLUID, with or without BOM? It's just a
> fwrite(), no?
No, it's not just a fwrite(). A string can always contain characters that
must be quoted: control characters (decimal 0-31, e.g. 10 = 0x0a = <LF> =
'\n') and DEL (decimal 127). The current fluid code also quotes all
character values in the range 128 to 255.
I did not write the code, so I can only assume that this is always safe
for all compilers, as I wrote before.
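
To illustrate the kind of quoting described above, here is a minimal sketch. It is not the actual fluid code; the function name quote_octal() is made up for this example, and I assume three-digit octal escapes so that a following digit cannot be absorbed into the escape.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, NOT the actual fluid code: returns the body of a
// C string literal for `s`. Control characters (0-31), DEL (127) and all
// bytes in the range 128-255 are written as three-digit octal escapes.
std::string quote_octal(const std::string &s) {
  std::string out;
  for (unsigned char c : s) {            // read bytes as unsigned (0-255)
    if (c == '"' || c == '\\') {         // must be escaped inside a literal
      out += '\\';
      out += static_cast<char>(c);
    } else if (c < 32 || c >= 127) {     // control chars, DEL, 128-255
      char buf[5];                       // backslash + 3 digits + NUL
      std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(c));
      out += buf;
    } else {
      out += static_cast<char>(c);       // printable ASCII, written as-is
    }
  }
  return out;
}
```

With this sketch, a line feed becomes \012 and the two UTF-8 bytes of U+00E9 (0xC3 0xA9) become \303\251, which is pure 7-bit ASCII in the generated source.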
> A patch can be fine for me, but everybody today want to use universal
> caracter coding without code page and no 7 bits ASCII.
I agree, but it's still problematic. The attached patch should work for
all Unicode characters if the compiler you use interprets strings as UTF-8.
> VS2015 support utf-8 and compile well.
I don't know the Visual Studio compilers very well, but I know of their
option to define UNICODE (not sure, maybe something similar?) and use
the TEXT macro for strings to distinguish ASCII and "Unicode"
compilations. In case of "Unicode" they expect "their own" wide
character encoding (UTF-16), AFAICT. I'm not sure about the
implications, but if you don't define UNICODE then it should just work.
> Thank you for your support!
Welcome.
Now to the patch: I attach three files to this post for later reference:
(1) test.fl: a modified version of your fluid file with all ISO-8859-1
characters encoded as UTF-8 (only the extended range, not the ASCII part;
Unicode range U+00a0 to U+00ff). This is also a subset of Microsoft's
Windows Codepage 1252 ("Western").
(2) main.cxx: a main program to compile test.cxx. This #include's
test.cxx and indirectly test.h generated by fluid from test.fl (I didn't
want to add a main program to your test.fl file).
(3) fluid_write_code_utf8.patch: the patch against FLTK 1.3.4 (stable
release).
This patch basically does three things:
- Fix reading character string bytes "unsigned", i.e. in range 0-255.
- Don't limit line length, to avoid breaking lines inside UTF-8 characters.
- Write all ASCII and UTF-8 characters literally, i.e. without quoting.
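
The first and third points can be sketched roughly as follows. This is only an illustration under my own assumptions, not the patch itself; quote_literal_utf8() and is_control() are hypothetical names. The cast to unsigned char matters because with a signed char a byte like 0xC3 would typically compare as a negative value and be mistaken for a control character.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: true for bytes that still must be quoted.
bool is_control(unsigned char c) { return c < 32 || c == 127; }

// Hypothetical sketch of the patched behavior, NOT the patch itself:
// printable ASCII and all bytes 128-255 (UTF-8 sequences) are written
// literally; only control characters, '"' and '\\' are quoted.
std::string quote_literal_utf8(const std::string &s) {
  std::string out;
  for (char ch : s) {
    // Convert first, so that the value is in the range 0-255.
    unsigned char c = static_cast<unsigned char>(ch);
    if (c == '"' || c == '\\') {
      out += '\\';
      out += ch;
    } else if (is_control(c)) {
      char buf[5];                       // backslash + 3 digits + NUL
      std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(c));
      out += buf;
    } else {
      out += ch;                         // ASCII or UTF-8 byte, unchanged
    }
  }
  return out;
}
```

The result is only correct if the compiler reads the generated source file as UTF-8, as noted above.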
You may use this patch if it works for you. Note that this was tested
with your test case and my modified one, but I'm not sure if this
will be okay for all users and compilers. Please report if it works for you.
Note: this will not be integrated in FLTK 1.3.x because FLTK 1.3 is
closed for new features. If you want this to be in FLTK 1.4 please file
an STR with status RFE (Request for Enhancement) for FLTK 1.4
("1.4-feature") on our "Bugs & Features" page:
http://www.fltk.org/str.php
Note 2: A "complete" solution would split strings (limit line length)
w/o breaking inside UTF-8 characters and would presumably have an option
to switch literal UTF-8 output on and off (on: literal/new vs. off:
octal-quoted/old behavior).
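
The splitting part of such a "complete" solution could look roughly like this. It is a sketch under my own assumptions (valid UTF-8 input, a limit counted in bytes) and not code from fluid; split_utf8() and is_continuation() are hypothetical names. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so a chunk may only end in front of a byte that is not a continuation byte.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// UTF-8 continuation bytes have the form 10xxxxxx.
bool is_continuation(unsigned char c) { return (c & 0xC0) == 0x80; }

// Hypothetical helper: split `s` into chunks of at most `max_bytes`
// bytes without breaking inside a UTF-8 character. If one character is
// longer than `max_bytes`, it is emitted whole in its own chunk.
std::vector<std::string> split_utf8(const std::string &s,
                                    std::size_t max_bytes) {
  std::vector<std::string> chunks;
  std::size_t pos = 0;
  while (pos < s.size()) {
    std::size_t end = pos + max_bytes;
    if (end >= s.size()) {
      end = s.size();                    // the rest fits into one chunk
    } else {
      // Back up until `end` points at the first byte of a character.
      while (end > pos &&
             is_continuation(static_cast<unsigned char>(s[end])))
        --end;
      if (end == pos) {                  // character exceeds the limit
        end = pos + 1;
        while (end < s.size() &&
               is_continuation(static_cast<unsigned char>(s[end])))
          ++end;
      }
    }
    chunks.push_back(s.substr(pos, end - pos));
    pos = end;
  }
  return chunks;
}
```

Each chunk could then be written as its own string literal on a separate line, and the on/off option would only decide which quoting is applied to the bytes of each chunk.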