Hi Ben,
On Sun, Oct 15, 2023 at 02:22:43AM -0700, Ben Wong wrote:
> Package: dos2unix
> Version: 7.5.1-1
> Severity: normal
> X-Debbugs-Cc: bugs.de...@wongs.net
>
> Dear Maintainer,
>
> The dos2unix man page claims that the default mode is "ASCII" and that
> in ASCII mode only line endings will be changed. This is no longer
> true. In the default mode, UTF-16 is converted to UTF-8 and the BOM is
> removed.
>
> I do not know if this is still considered an "ASCII" mode or if the
> default is some new UTF-8 mode. Please consider updating the
> documentation to match the current behavior.

Thank you for your bug report.

I believe the portion of the manpage you are referring to is:
    CONVERSION MODES
        ascii
            In mode "ascii" only line breaks are converted. This is the default
            conversion mode. [**Missing information about UTF-16 behavior.**]
            Although the name of this mode is ASCII, which is a 7 bit standard,
            the actual mode is 8 bit. Use always this mode when converting
            Unicode UTF-8 files.
Is this where you are expecting to see the manpage updated?

It is perhaps somewhat hidden in the manpage, but I think this at least
partially addresses the use case you describe:
    -u, --keep-utf16
        Keep the original UTF-16 encoding of the input file. The output
        file will be written in the same UTF-16 encoding, little or big
        endian, as the input file. This prevents transformation to UTF-8.
        An UTF-16 BOM will be written accordingly. This option can be
        disabled with the "-ascii" option.
That is, the use of -ascii (the default) negates --keep-utf16 and thus
*does* perform the transformation to UTF-8 and *does not* write the
UTF-16 BOM.
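
For illustration, the behavior described above can be modeled roughly as
follows. This is only a minimal Python sketch of what the report and the
manpage excerpts describe, not dos2unix's actual implementation; the
function name `convert` and the `keep_utf16` parameter are hypothetical
stand-ins for the default mode and for -u/--keep-utf16.

```python
import codecs


def convert(data: bytes, keep_utf16: bool = False) -> bytes:
    """Toy model of the behavior discussed in this thread (not dos2unix code)."""
    # Detect a UTF-16 BOM (assumption: input is only treated as UTF-16
    # when it starts with one).
    if data.startswith(codecs.BOM_UTF16_LE):
        bom, enc = codecs.BOM_UTF16_LE, "utf-16-le"
    elif data.startswith(codecs.BOM_UTF16_BE):
        bom, enc = codecs.BOM_UTF16_BE, "utf-16-be"
    else:
        # No UTF-16 BOM: 8-bit data, only the line breaks are changed.
        return data.replace(b"\r\n", b"\n")

    # Strip the BOM, decode, and convert DOS line breaks to Unix ones.
    text = data[len(bom):].decode(enc).replace("\r\n", "\n")

    if keep_utf16:
        # Modeled on -u/--keep-utf16: same endianness, BOM written back.
        return bom + text.encode(enc)

    # Modeled on the default: output is UTF-8 and no BOM is written.
    return text.encode("utf-8")
```

In this model, a UTF-16 input with a BOM comes out as BOM-less UTF-8 unless
`keep_utf16=True` is passed, which matches the behavior reported above.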
I will forward the report to the upstream author.

Thank you,
tony