In fact I had the problem too. Some crappy editors (like Notepad++) may
add a BOM as the first character.
If one uses the “short” title form (= My document), asciidoc doesn't parse
it correctly because the first character on the line is not the =.
It'd be nice if asciidoc could skip the BOM (it could save people time,
because it's hard to tell what is going wrong), but it's obviously a problem
in the editor, and special-casing everything might be a little hard.
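
To illustrate the failure described above, here is a minimal Python sketch. The looks_like_title function is a hypothetical stand-in for a one-line title check, not asciidoc's actual parser; it only shows why a leading BOM defeats a "line starts with =" test.

```python
import codecs


def looks_like_title(line: str) -> bool:
    # Hypothetical check for the one-line title form; NOT asciidoc's real code.
    return line.startswith("= ")


# Bytes as an editor might save them: UTF-8 BOM followed by the title line.
raw = codecs.BOM_UTF8 + b"= My document\n"

# The plain "utf-8" codec keeps the BOM as the character U+FEFF.
text = raw.decode("utf-8")

print(looks_like_title("= My document\n"))  # title without a BOM
print(looks_like_title(text))               # same title, BOM in front
print(repr(text[0]))                        # the invisible first character
```

The BOM is invisible in most editors, which is why the failure is hard to spot from the text alone.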
I'm not clear what the problem is. Are you saying that it's caused by an editor
automatically inserting a BOM character at the start of a document?
How do you reproduce the problem?
Yes, some editors insert a BOM at the start of UTF-8 documents (Notepad++
does, with the default configuration, iirc). I don't have a Windows machine
so I can't reproduce it, but I'm sure someone else could provide a real
document (or I could just manually insert a BOM into a text document using
a hex editor).
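
For anyone without Windows or a hex editor at hand, a test document can also be generated with a few lines of Python. This is only a sketch: the helper name write_with_bom and the file name bom-title.txt are made up for illustration.

```python
import codecs
import os
import tempfile


def write_with_bom(path: str, body: str) -> None:
    # Hypothetical helper: write the UTF-8 BOM, then the UTF-8 encoded body.
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8)
        f.write(body.encode("utf-8"))


# Illustrative file name in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "bom-title.txt")
write_with_bom(path, "= My document\n\nSome text.\n")

# Show the first three bytes of the file in hex.
with open(path, "rb") as f:
    print(f.read(3).hex())
```

Running asciidoc on the resulting file should be enough to check whether the title is still recognised.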
I saved 3 files in MS Notepad as UTF-8; each file started with the
UTF-8 BOM (ef bb bf).
If the solution is simply to strip the leading UTF-8 BOM from the file then this
would be straightforward. Are there any implications to doing this?
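
Stripping the leading BOM could look roughly like the sketch below. It assumes the input is available as raw bytes before any decoding; strip_utf8_bom is a hypothetical name and this is not taken from asciidoc's source.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    # Hypothetical helper: drop the three bytes ef bb bf if they lead the
    # input, and return everything else unchanged.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


print(strip_utf8_bom(b"\xef\xbb\xbf= My document\n"))
print(strip_utf8_bom(b"= My document\n"))
```

Only a BOM at the very start of the input is removed, so files saved without one pass through untouched.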
I don't think so. Even in the original .txt file the BOM shouldn't be
needed (afaik UTF-8 doesn't need it at all, since UTF-8 has no byte-order
ambiguity). The generated file shouldn't be affected at all, nor should
the running script, since it will still accept valid UTF-8 correctly.
I don't know exactly what you should do when you encounter a BOM in a
UTF-8 file, but “ignore it” seems valid to me.
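
As one way of "ignoring it" at decode time, Python's standard "utf-8-sig" codec skips a leading BOM if present and otherwise behaves like plain UTF-8. The sketch below only demonstrates the codec; whether it fits how asciidoc reads its input is an assumption.

```python
with_bom = b"\xef\xbb\xbf= My document\n"
without_bom = b"= My document\n"

# "utf-8-sig" drops a leading BOM and leaves BOM-free input alone.
decoded_with = with_bom.decode("utf-8-sig")
decoded_without = without_bom.decode("utf-8-sig")

print(decoded_with == decoded_without)
print(decoded_with.startswith("="))
```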
Next time, just use a hexadecimal editor like bvi :)