change file encoding in batch file

sean

Oct 11, 2016, 6:40:36 PM
Hi All,

I have a need to change file encoding from whatever it is now to utf-16
and I'd like to do this programmatically without much hassle. Doing a
few searches, I came across chcp and various non-bundled windows tools.
I'd prefer if there were no external dependencies, but if that's not
easily doable, bundle as few dependencies as necessary in my batch file.

The searches I performed reference chcp 1201 and chcp 65001 but doing
the former in the command prompt, I get invalid code page error.

Any pointers on changing the file type will be greatly appreciated!

Best,
sean

David Solimano

Oct 15, 2016, 1:47:44 AM
On Tue, 11 Oct 2016 15:40:34 -0700, sean
<se...@sean.eternal-september.org> wrote:

>Hi All,
>
>I have a need to change file encoding from whatever it is now to utf-16
>and I'd like to do this programmatically without much hassle. Doing a
>few searches, I came across chcp and various non-bundled windows tools.
>I'd prefer if there were no external dependencies, but if that's not
>easily doable, bundle as few dependencies as necessary in my batch file.
>
>The searches I performed reference chcp 1201 and chcp 65001 but doing
>the former in the command prompt, I get invalid code page error.

chcp changes the code page of your console, so I don't think that
would help you.

>
>Any pointers on changing the file type will be greatly appreciated!

I'd recommend iconv - https://en.wikipedia.org/wiki/Iconv. I don't
know of any bundled tool off the top of my head . . .

Are you on DOS, or Windows in a command prompt? PowerShell makes this
possible -
http://stackoverflow.com/questions/18684793/powershell-batch-change-files-encoding-to-utf-8

But only for Windows, obviously.

>
>Best,
>sean
--
David Solimano
da...@solimano.org

sean

Oct 15, 2016, 2:36:40 AM
On 10/14/2016 10:47 PM, David Solimano wrote:
> On Tue, 11 Oct 2016 15:40:34 -0700, sean
> <se...@sean.eternal-september.org> wrote:
>
>> Hi All,
>>
>> I have a need to change file encoding from whatever it is now to utf-16
>> and I'd like to do this programmatically without much hassle. Doing a
>> few searches, I came across chcp and various non-bundled windows tools.
>> I'd prefer if there were no external dependencies, but if that's not
>> easily doable, bundle as few dependencies as necessary in my batch file.
>>
>> The searches I performed reference chcp 1201 and chcp 65001 but doing
>> the former in the command prompt, I get invalid code page error.
>
> chcp changes the code page of your console, so I don't think that
> would help you.

Ah, that definitely won't help in my case.

>
>>
>> Any pointers on changing the file type will be greatly appreciated!
>
> I'd recommend iconv - https://en.wikipedia.org/wiki/Iconv. I don't
> know of any bundled tool off the top of my head . . .

iconv looks pretty promising. It somewhat looks like I need to specify
the existing file type, but in a quick test I did on freebsd, it looks
like I can fake it.

% touch david
% echo "hello david" > david
% file david
david: ASCII text
% iconv -f UTF-8 -t UTF-16 < david > d
% file d
d: Big-endian UTF-16 Unicode text


>
> Are you on DOS, or Windows in a command prompt? PowerShell makes this
> possible -
> http://stackoverflow.com/questions/18684793/powershell-batch-change-files-encoding-to-utf-8
>
> But only for Windows, obviously.
>

Yes, Windows. Clients vary (Windows 7 or XP) and the server also varies
(2008 or 2012).

Thanks for pointing me to the SO post, too.

sean

Oct 15, 2016, 5:16:54 AM
On 10/14/2016 11:36 PM, sean wrote:
> iconv looks pretty promising. It somewhat looks like I need to specify
> the existing file type, but in a quick test I did on freebsd, it looks
> like I can fake it.


This isn't the case in Windows from my testing.

If a file is already utf-16 and I convert it to utf-16, the new file is
blank.

iconv: c:\temp\bin\temp.xml: cannot convert

So knowing this, is there a way to detect the file type so I can skip
files that are already correct?
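
A likely reason the "fake it" approach worked on the ASCII file but not on
the file that was already UTF-16: ASCII is a subset of UTF-8, so telling a
converter the input is UTF-8 is harmless for ASCII, while UTF-16 bytes are
generally not valid UTF-8 at all. A minimal Python sketch of that difference
(Python is only used here for illustration, and can_decode is a made-up
helper name, not part of iconv):

```python
def can_decode(data: bytes, encoding: str) -> bool:
    """Return True if the bytes are valid in the claimed encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Plain ASCII text: every byte is also valid UTF-8.
ascii_bytes = b"hello david\n"

# The same text already stored as UTF-16 (Python's "utf-16" codec adds a BOM).
utf16_bytes = "hello david\n".encode("utf-16")

print(can_decode(ascii_bytes, "utf-8"))  # True
print(can_decode(utf16_bytes, "utf-8"))  # False: 0xFF/0xFE never occur in UTF-8
```

So a converter that is told "-f UTF-8" has good reason to give up on input
that is really UTF-16.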

Thanks,
sean

David Solimano

Oct 15, 2016, 10:37:56 AM
Looks like GnuWin32 ships with 'file' which can determine that.
There's also a bit of powershell magic here you can plug into.

http://stackoverflow.com/a/28079177/58074

It just looks for a BOM to see if it's UTF. I think UTF-16 should
always have a BOM, so you should be good there.
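
The BOM check described above can be sketched in a few lines. This is only
an illustration in Python, not the code from the linked answer, and the
names sniff_bom and is_utf16_with_bom are hypothetical. It can only
recognise files that actually start with a BOM:

```python
import codecs

# Longest signatures first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE), so it has to be tested before it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def is_utf16_with_bom(path: str) -> bool:
    """Read only the first four bytes of the file and check for a UTF-16 BOM."""
    with open(path, "rb") as handle:
        return sniff_bom(handle.read(4)) in ("utf-16-le", "utf-16-be")
```

A batch job could call is_utf16_with_bom on each file and skip the ones that
return True.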
--
David Solimano
da...@solimano.org

foxidrive

Oct 15, 2016, 12:03:39 PM
On 12/10/2016 09:40, sean wrote:

> I have a need to change file encoding from whatever it is now to utf-16
> and I'd like to do this programmatically without much hassle.
>
> Any pointers on changing the file type will be greatly appreciated!

I'm not so clued up on UTF encoding but reading around it indicates that
your files may have a BOM that you can check for.

A BOM isn't mandatory and it would depend on the files you're processing to
see if they all have one, and also the form of the BOM.
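
To make "the form of the BOM" concrete: for UTF-16 it is the two bytes
FE FF (big-endian) or FF FE (little-endian), and an encoder is free to leave
it out. A small Python illustration (the variable names are arbitrary):

```python
import codecs

text = "hi"

# Big-endian UTF-16 with an explicit BOM in front.
with_bom = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

# The same text without a BOM: still UTF-16, but nothing marks it as such.
without_bom = text.encode("utf-16-be")

print(with_bom)     # b'\xfe\xff\x00h\x00i'
print(without_bom)  # b'\x00h\x00i'

# With a BOM the generic "utf-16" decoder works out the byte order itself;
# without one, the reader has to be told the byte order explicitly.
print(with_bom.decode("utf-16"))        # hi
print(without_bom.decode("utf-16-be"))  # hi
```

A BOM-only check would treat the second form as "not UTF-16", which is why
it matters whether all the files being processed carry one.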


sean

Oct 15, 2016, 10:48:50 PM
I believe I have something that I can work with!

this is change.ps1:
(Get-Content "C:\file\settings.xml") | Out-File -Force -Encoding BigEndianUnicode "C:\file\settings.xml"

In another script, I have xcopy copy the above script to a temp
directory and then with psexec, I execute this:

bin\psexec -accepteula -nobanner -u %USER_NAME% -p %PASSWORD% \\%%A powershell -inputformat none -ExecutionPolicy ByPass -File "\\%%A\c$\temp\bin\change.ps1"

the various variables are set at the top of the script, as usual.

The out-file encoding doesn't care what the current format is so I don't
have to worry about it like I was with iconv.
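
For comparison, the same "convert whatever it is to UTF-16 big-endian" idea
as a rough Python sketch. This is not what change.ps1 does internally; the
function names are hypothetical, and the fallback to UTF-8 for input without
a BOM is an assumption (as far as I know, Get-Content in Windows PowerShell
falls back to the system ANSI code page instead, so non-ASCII text without a
BOM may come out differently there):

```python
import codecs


def to_utf16_be(data: bytes, fallback: str = "utf-8") -> bytes:
    """Return the bytes re-encoded as UTF-16 big-endian with a leading BOM."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data  # already in the target form, leave untouched
    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[2:].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF8):
        text = data[3:].decode("utf-8")
    else:
        # Assumption: input without a BOM is UTF-8 (which also covers ASCII).
        text = data.decode(fallback)
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


def convert_file(path: str) -> bool:
    """Rewrite the file in place; return False if it was already correct."""
    with open(path, "rb") as handle:
        original = handle.read()
    converted = to_utf16_be(original)
    if converted == original:
        return False
    with open(path, "wb") as handle:
        handle.write(converted)
    return True
```

Running it twice on the same file is harmless, because the second pass sees
the big-endian BOM and skips the file. UTF-32 input is not handled here.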

Of course this may fail on Windows XP machines, but if powershell.exe is
just a single binary, maybe I can ship it with my script and it will
work. I don't know if that's the case, seems too easy to be true.

Thanks for the pointers to powershell and iconv!

sean

Oct 15, 2016, 10:54:03 PM
The company has a policy to use utf-16, no idea if that's with a BOM or
not, but there's no technical reason utf-8 can't be used. I first encountered
this problem when trying to parse the files with
http://xmlstar.sourceforge.net/

There may be (and I still should look into it) a way to get xmlstar to
parse XML files regardless of the encoding. I think what really upset it
was that the file declared itself as UTF-16 but the actual encoding was
UTF-8.
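
That kind of mismatch (the XML declaration claims one encoding, the bytes
are in another) can be spotted before handing the file to a parser. A
minimal Python sketch, with made-up function names and the simplifying
assumption that anything without a UTF-16 BOM is UTF-8 or ASCII:

```python
import codecs
import re

# Matches the encoding="..." part of an XML declaration.
_DECL = re.compile(
    r"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def actual_family(data: bytes) -> str:
    """Guess the real encoding family from the BOM alone."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return "utf-8"  # assumption: no UTF-16 BOM means UTF-8 or ASCII


def declared_encoding(data: bytes):
    """Return the lower-cased encoding named in the XML declaration, if any."""
    head = data[:200].decode(actual_family(data), errors="replace")
    match = _DECL.search(head)
    return match.group(1).lower() if match else None


def declaration_matches(data: bytes) -> bool:
    """False when the declaration names a different family than the bytes."""
    declared = declared_encoding(data)
    if declared is None:
        return True  # nothing declared, so nothing to contradict
    return declared.startswith(actual_family(data))


# The situation described above: UTF-8 bytes that claim to be UTF-16.
mislabeled = b'<?xml version="1.0" encoding="utf-16"?><a/>'
print(declaration_matches(mislabeled))  # False
```

Files for which declaration_matches returns False are the ones worth
re-encoding (or relabeling) first.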

thanks,
sean