Thoughts On Integrating File::BOM

Brandon McCaig

Jun 13, 2017, 6:24:17 PM
to ack users
I wonder if it would be useful to integrate File::BOM behind a special option, something like --file-bom or --bom, that attempts to decode Unicode files (in their various encodings) before grepping the text. Since ack is a Perl program, it should be relatively easy to piggyback on File::BOM. Relying on the BOM should also be a relatively safe mechanism that avoids false positives (and if the user asked for it, they are obviously OK with whatever risks are involved).

I typically work with code points in the 0x00-0x1F range. I can match them in UTF-16LE files if I prefix each byte of the pattern with /\x{00}?/, but that's a pain to do. It would be convenient if ack would check for a BOM and decode automatically, falling back to not decoding if it can't figure out the encoding.

While UTF-8 is far superior for code, some tools, especially on Windows, insist on using UTF-16LE by default and even make it difficult to change this. It seems on-topic to look for a solution to this problem, unless somebody already knows of one. Thoughts?
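To make the proposed behavior concrete, here is a minimal sketch in Python rather than Perl, so it does not use File::BOM itself. It assumes the behavior described above: look for a BOM, decode accordingly, and fall back to treating the file as undecoded bytes when there is no BOM or the bytes after it don't decode. The names `decode_with_bom` and `grep` are hypothetical and are not part of ack or File::BOM. Once the text is decoded, a pattern like `\x1f` matches directly, with no `\x{00}?` prefixes.

```python
import codecs
import re
from typing import List, Optional, Tuple

# Known byte order marks, checked longest first: the UTF-32LE BOM
# (FF FE 00 00) begins with the UTF-16LE BOM (FF FE), so order matters.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(data: bytes) -> Tuple[str, Optional[str]]:
    """Decode data according to its BOM, or fall back to the raw bytes.

    Returns the text and the detected encoding, or None when no BOM was
    found or the bytes after the BOM did not decode cleanly.
    """
    for bom, encoding in BOMS:
        if data.startswith(bom):
            try:
                return data[len(bom):].decode(encoding), encoding
            except UnicodeDecodeError:
                break  # the BOM was misleading; treat the file as raw bytes
    # latin-1 maps every byte 0x00-0xFF to the code point of the same value,
    # so this fallback behaves like "not decoding" at all.
    return data.decode("latin-1"), None


def grep(data: bytes, pattern: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs whose decoded text matches pattern."""
    text, _ = decode_with_bom(data)
    regex = re.compile(pattern)
    # Split on "\n" only: str.splitlines() would also break on control
    # characters such as \x0b, \x0c, \x1c, \x1d and \x1e.
    return [
        (number, line)
        for number, line in enumerate(text.split("\n"), start=1)
        if regex.search(line)
    ]


if __name__ == "__main__":
    # A UTF-16LE file with a BOM, as some Windows tools write by default.
    sample = "name\x1fvalue\r\nplain line\r\n"
    raw = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")
    print(grep(raw, r"\x1f"))
```

Splitting only on "\n" is deliberate here, because the control characters being searched for would otherwise be treated as line boundaries.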