On Tuesday, 6 June 2017 17:31:00 UTC+3, Alf P. Steinbach wrote:
> On 06-Jun-17 10:33 AM, Öö Tiib wrote:
> > On Tuesday, 6 June 2017 04:40:51 UTC+3, Christiano wrote:
> >>
> >> 1- Verify if your source is encoded with UTF-8 using an Hex editor (example: HxD )
> >> Verify the BOM of the source file [2]
> >
> > Software putting BOM to UTF-8 is doing it wrong.
> > According to the Unicode standard, the BOM for UTF-8 files is neither
> > required nor recommended:
>
> In earlier years (I think pre 2015) you needed a BOM for UTF-8 to
> identify the encoding for the Visual C++ compiler, even for a single
> user code file translation unit.
Microsoft is fully capable of dropping its megalomania and adapting to
reality. In my experience it is far more capable of that than Apple, for
example. Yes, it has been a source of pain that the big players can
insist on doing it wrong.
>
> Now MSVC can be informed of the encoding of the main file via an option,
> but you still need that BOM to identify UTF-8 as such in included
> headers, when they can be included from files with other encodings.
The second source of nonsense has always been the people who use the full
rainbow of possibilities at once. Why use every available encoding in a
single project? Maybe also change the encoding of each file now and then?
Such people IMHO deserve it when the tools, starting with the repo, slow
down and occasionally hiccup over that.
> Unfortunately g++ doesn't do such encoding detection: AFAIK it's unable
> to handle different source encodings in the same translation unit.
> Earlier g++ was even unable to handle BOM in an UTF-8 file, which was a
> huge problem: g++ couldn't handle it, while MSVC required it…
That is the third source of pain ... the people with ultra-narrow minds
who speak only ASCII.
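For concreteness, the encoding options involved look roughly like this
(option names as documented for MSVC and GCC; note that -finput-charset
applies to the whole translation unit, so it does not by itself solve the
mixed-encoding-headers case described above):

```shell
# MSVC: treat both the source and execution character sets as UTF-8
cl /utf-8 main.cpp

# MSVC: or set them separately
cl /source-charset:utf-8 /execution-charset:utf-8 main.cpp

# g++: declare the input encoding of the translation unit
g++ -finput-charset=utf-8 main.cpp
```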
>
> So, a BOM in UTF-8 files has certain clear advantages, over and above
> being the Windows convention, and it has no problems except with some
> old *nix tools, which at one time included the g++ compiler. The Unicode
> standard's wording is unfortunate because many *nix fanboys read “not
> recommended” as “recommended to abstain from”, so that the sorry lack of
> support in many *nix tools, at one time, could be argued as being
> positive standard-conformance rather than the negative low quality it
> was.
I did not mean that software handling UTF-8 should be incapable of
handling a BOM. I meant that software accepting text should support
UTF-8 without a BOM. UTF-8 is right now about 90% of Internet text
content, so people won't buy the excuse that a tool shows garbage
because there was no BOM. A tool that rejects everything except
UTF-8 would likely be more acceptable.
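A tolerant reader can accept UTF-8 both with and without a BOM by simply
stripping the three BOM bytes (EF BB BF) if they are present. A minimal
sketch (the function name is mine, not from any library):

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Read a file as raw bytes and drop a leading UTF-8 BOM, if any.
// Text without a BOM is returned unchanged.
std::string read_without_bom(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::string data((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    if (data.size() >= 3 &&
        static_cast<unsigned char>(data[0]) == 0xEF &&
        static_cast<unsigned char>(data[1]) == 0xBB &&
        static_cast<unsigned char>(data[2]) == 0xBF)
    {
        data.erase(0, 3);  // strip the BOM
    }
    return data;
}
```

A writer that wants to interoperate with BOM-requiring tools can prepend
the same three bytes; the point is that neither side should *reject* the
other form.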
> Microsoft is a founding member of the Unicode consortium and the
> UTF-8 BOM convention is crucial for some of their APIs and tools,
> including Visual C++, so the “recommended to abstain from”
> interpretation is very unlikely to be the single intended meaning. At a
> guess the wording was intentionally ambiguous, a political thing.
Microsoft is actually the most capable of adapting.