Hi guys,
to my understanding, current versions of Perl have UTF-8 support
activated by default, so that UTF-8 files are recognized and treated
accordingly. Even variables containing UTF-8 encoded characters are
marked internally and treated as such in matching operations etc. So I
believe reopening the file will not be necessary. Many web pages still
refer to older Perl versions that did not have the current level of
UTF-8 support, and therefore propagate all manner of workarounds that
are no longer necessary.
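In case the decoding does have to be switched on explicitly, here is a
minimal sketch of doing so on a handle that is already open, i.e.
without reopening the file. The temporary sample file and its contents
are made up purely for illustration; nothing here is taken from
Gedcom.pm.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Made-up sample file: a UTF-8 BOM followed by one line with a non-ASCII name.
my ($out, $path) = tempfile();
binmode $out;                                   # write raw bytes
print {$out} "\xEF\xBB\xBF", "1 NAME J\xC3\xBCrgen\n";
close $out or die "close: $!";

# Open as raw bytes first, so the BOM can be inspected undecoded.
open my $fh, '<:raw', $path or die "open $path: $!";
read $fh, my $head, 3;                          # peek at the first three bytes
my $skip = $head eq "\xEF\xBB\xBF" ? 3 : 0;     # 3 if a UTF-8 BOM is present
seek $fh, $skip, 0 or die "seek: $!";           # position just past the BOM
binmode $fh, ':encoding(UTF-8)';                # decode from here on, same handle

my $line = <$fh>;                               # now a decoded character string
close $fh;
unlink $path;
```

The point of the sketch is only that binmode changes the layer on the
existing handle, so the BOM check and the decoding can share one open.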
The BOM is permissible but superfluous in UTF-8 files, unless they
originated in a different encoding and are being altered and returned
to the sender. So unless Gedcom.pm intends to be able to export
encodings other than ASCII or UTF-8, it should be perfectly safe to
simply delete or ignore the BOM when the file is ASCII- or
UTF-8-encoded.
I personally wouldn't ignore it when the file is encoded otherwise, as
that might lead to problems later on when processing the actual data. In
those cases an error upon processing the BOM should be just what the
doctor ordered.
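To make the suggestion concrete, here is a rough sketch of the policy
described above: drop a UTF-8 BOM, and refuse to continue on a UTF-16 or
UTF-32 BOM. The helper name strip_bom is hypothetical and not part of
Gedcom.pm; it assumes it is handed the raw, undecoded bytes from the
start of the file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Gedcom.pm): takes the raw bytes from
# the start of a file and returns them without a UTF-8 BOM, or dies if
# the BOM indicates UTF-16 or UTF-32.
sub strip_bom {
    my ($bytes) = @_;

    # UTF-8 BOM is EF BB BF: harmless, so simply remove it.
    return $bytes if $bytes =~ s/\A\xEF\xBB\xBF//;

    # Check UTF-32 before UTF-16, because the UTF-32LE BOM (FF FE 00 00)
    # begins with the UTF-16LE BOM (FF FE).
    die "UTF-32 BOM found, refusing to process\n"
        if $bytes =~ /\A(?:\x00\x00\xFE\xFF|\xFF\xFE\x00\x00)/;
    die "UTF-16 BOM found, refusing to process\n"
        if $bytes =~ /\A(?:\xFE\xFF|\xFF\xFE)/;

    # No BOM at all: plain ASCII or BOM-less UTF-8, pass through unchanged.
    return $bytes;
}
```

Calling strip_bom on the first line read from a raw handle would then
either hand back clean data or stop with an error before any real
processing starts.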
I'm not an authority on UTF-8; this is simply how I understand
http://www.perlmonks.org/?node_id=599720
https://en.wikipedia.org/wiki/Byte_Order_Mark
http://www.unicode.org/faq/utf_bom.html#22
applied to this situation.
If I am wrong I would appreciate being corrected, as I will be
spending some time processing UTF-8 GEDCOM files with Gedcom.pm in
the immediate future.
Michael