
Converting UTF-8 to UTF-16


Frank P. Westlake

Aug 4, 2009, 5:21:45 AM
The follow-up message contains the script "UTF8to16.cmd" which will write a UTF-16LE file from a UTF-8 file. This might be useful if it is desirable to display Unicode from a UTF-8 file in the local codepage instead of in the UTF-8 codepage.

The script is also contained in:

<http://geocities.com/fp.westlake/utftools.zip>


Frank

Frank P. Westlake

Aug 4, 2009, 7:10:52 AM
:: BEGIN FILE ::::::::::::::::::::::::::::::::::::::::::::::::::::
:: UTF8to16.cmd
:: Write a UTF-16 file from a UTF-8 file.
:: Frank P. Westlake, 2009-07-27
@Echo OFF
SetLocal ENABLEEXTENSIONS ENABLEDELAYEDEXPANSION
If /I "%1" EQU "/?" (
  Echo.Writes UTF-16LE from a UTF-8 file.
  Echo.
  Echo. %0 filein fileout
  Echo.
  Echo. filein Name of the existing UTF-8 file.
  Echo. fileout Name of the new UTF-16 file.
  Echo.
  Echo.Example:
  Echo. %0 UTF8.txt UTF16.txt
  Goto :EOF
)
Set "Me=%~n0"
Set "FileIn="
Set "FileOut="
:: Alterable environment:
Set "MyDir=%temp%\ASCII"
Set "TmpFile=%TEMP%\%Me%"
:: End alterable environment
If "%1" EQU ":WriteBinaryFiles" (Shift & Goto :WriteBinaryFiles)
:args
If /I "%1" EQU "/NOCI" (
  Set "CI="
  Shift
) Else If DEFINED FileOut (
  Echo.%Me%: Too many filenames. >&2
  Goto :EOF
) Else If DEFINED FileIn (
  Set "FileOut=%1"
  Shift
) Else (
  Set "FileIn=%1"
  Shift
)
IF "%1" NEQ "" Goto :args
If "%FileIn%" EQU "" (
Set /P "FileIn=%Me%: Please enter the name of the existing UTF-8 file: " >&2
)
If "%FileOut%" EQU "" (
Set /P "FileOut=%Me%: Please enter the name of the new UTF-16 file: " >&2
)
If "%FileIn%" EQU "" (Echo.%Me%: Aborting. Need input filename. >&2 & Goto :EOF)
If "%FileOut%" EQU "" (Echo.%Me%: Aborting. Need output filename. >&2 & Goto :EOF)
For %%f in (%FileIn%) Do (Set "FileIn=%%~ff" & Set "fs=%%~zf")
For %%f in (%FileOut%) Do (Set "FileOut=%%~ff")
Set "TmpFile=%TEMP%\%~n0.tmp"
Set "HX=0123456789ABCDEF"
Start "" /wait /MIN %Me% :WriteBinaryFiles %MyDir%
ChDir /d %MyDir%
Set "FSUtil=1"
If NOT EXIST FSUTIL.EXE (
  For %%f in (FSUTIL.EXE) Do (
    REM Quoted so the test stays well-formed when FSUTIL.EXE is not on the PATH.
    If NOT EXIST "%%~$PATH:f" (
      Set "FSUtil="
    )
  )
)
If DEFINED FSUTIL (
  FSUtil FILE CREATENEW %TmpFile%.fc %fs% >NUL:
) Else (
  TYPE NUL: >%TmpFile%.fc
  For /L %%i in (1 1 %fs%) Do TYPE ASCII00.0 >>%TmpFile%.fc
)
Set /a b=-1, U=0, n=0
Type NUL: >%FileOut%
For /F "skip=1 tokens=1,2 delims=: " %%a in (
  'FC /b %FileIn% %TmpFile%.fc') Do (
  Set /A "byte=0x%%b"
  If !byte! LSS 0x80 ( REM ASCII, Byte 1 of 1
    Set /A "U=byte, b=1, n=1"
  ) Else If !byte! LSS 0xC0 ( REM Byte 2, 3 or 4
    Set /A "U<<=6, byte&=0x3F, U|=byte, b+=1"
  ) Else If !byte! LSS 0xC2 ( REM Overlong
    Echo.%Me%: Aborting. Overlong encoding of ASCII character at %%ah. >&2
    Exit /B 1
  ) Else If !byte! LSS 0xE0 ( REM Byte 1 of 2
    Set /A "U=byte&0x1F, n=2, b=1"
  ) Else If !byte! LSS 0xF0 ( REM Byte 1 of 3
    Set /A "U=byte&0x0F, n=3, b=1"
  ) Else If !byte! LSS 0xF5 ( REM Byte 1 of 4
    Set /A "U=byte&0x07, n=4, b=1"
  ) Else ( REM Restricted or undefined.
    Echo.%Me%: Aborting. Restricted or undefined character at %%ah. >&2
    Exit /B 2
  )
  If !b! EQU !n! (
    If !U! GTR 0xFFFF (
      Set /A "U-=0x10000, UL=U&0x3FF, UL|=0xDC00"
      Set /A "U>>=10, U&=0x3FF, U|= 0xD800"

      Set /A "L=U&0x00FF, U>>=8, n=0"
      TYPE ASCII??.!L! >>%FileOut% 2>NUL:
      TYPE ASCII??.!U! >>%FileOut% 2>NUL:

      Set /A "L=UL&0x00FF, UL>>=8"
      TYPE ASCII??.!L! >>%FileOut% 2>NUL:
      TYPE ASCII??.!UL! >>%FileOut% 2>NUL:
    ) Else (
      Set /A "L=U&0x00FF, U>>=8, n=0"
      TYPE ASCII??.!L! >>%FileOut% 2>NUL:
      TYPE ASCII??.!U! >>%FileOut% 2>NUL:
    )
  )
)
For %%x in (fc) Do Erase %TmpFile%.%%x
Goto :EOF
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:WriteBinaryFiles path
SetLocal
MkDir %MyDir% >NUL: 2>&1
ChDir /d %MyDir%
FOR /L %%i in (0 1 0x7F) Do If NOT EXIST ASCII*.%%i (
  Set /A "h1=(%%i&0xF0)>>4, h2=(%%i&0x0F)"
  Call Set "h=%%HX:~!h1!,1%%%%HX:~!h2!,1%%"
  (
    CALL Echo N ascii%%h%%.%%i
    CALL Echo E 0000 %%h%%
    Echo R CX
    Echo 1
    Echo W 0
    Echo Q
  ) | DEBUG >NUL:
)
Exit
Goto :EOF
:: END FILE ::::::::::::::::::::::::::::::::::::::::::::::::::::::
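
For readers who find the Set /A arithmetic hard to follow, the decoding loop can be sketched in another language. The following is a minimal Python sketch, not the author's code: the function name utf8_to_utf16le is hypothetical, the counters u, b and n stand in for the script's U, b and n, and b starts at 0 rather than -1. Like the script, it classifies each byte by its leading bits, accumulates six bits per continuation byte, and splits code points above 0xFFFF into a surrogate pair. It does not add validation the script lacks (for example, truncated sequences are not reported).

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """Hypothetical helper: convert UTF-8 bytes to UTF-16LE bytes, no BOM added."""
    out = bytearray()
    u = 0  # code point being assembled (the script's U)
    b = 0  # position of the current byte within its sequence (the script's b)
    n = 0  # expected length of the current sequence (the script's n)
    for offset, byte in enumerate(data):
        if byte < 0x80:              # ASCII, byte 1 of 1
            u, b, n = byte, 1, 1
        elif byte < 0xC0:            # continuation byte 2, 3 or 4
            u = (u << 6) | (byte & 0x3F)
            b += 1
        elif byte < 0xC2:            # 0xC0 and 0xC1 can only start overlong forms
            raise ValueError(f"overlong encoding of ASCII character at {offset:08X}h")
        elif byte < 0xE0:            # byte 1 of 2
            u, b, n = byte & 0x1F, 1, 2
        elif byte < 0xF0:            # byte 1 of 3
            u, b, n = byte & 0x0F, 1, 3
        elif byte < 0xF5:            # byte 1 of 4
            u, b, n = byte & 0x07, 1, 4
        else:                        # 0xF5..0xFF never appear in UTF-8
            raise ValueError(f"restricted or undefined character at {offset:08X}h")
        if b == n:                   # sequence complete: emit UTF-16LE code units
            if u > 0xFFFF:
                u -= 0x10000
                high = 0xD800 | ((u >> 10) & 0x3FF)   # high (lead) surrogate
                low = 0xDC00 | (u & 0x3FF)            # low (trail) surrogate
                out += high.to_bytes(2, "little") + low.to_bytes(2, "little")
            else:
                out += u.to_bytes(2, "little")
            n = 0
    return bytes(out)


# U+0041 and U+20AC (the euro sign) become 41 00 and AC 20.
print(utf8_to_utf16le("A\u20ac".encode("utf-8")).hex())  # prints 4100ac20
```

The low byte is written before the high byte for every 16-bit unit, which is what makes the output UTF-16LE; the script does the same with its two TYPE commands per unit.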

foxidrive

Aug 4, 2009, 8:52:06 AM


Page not found error, Frank.

Herbert Kleebauer

Aug 4, 2009, 10:51:20 AM
"Frank P. Westlake" wrote:

> CALL Echo N ascii%%h%%.%%i
> CALL Echo E 0000 %%h%%
> Echo R CX
> Echo 1
> Echo W 0
> Echo Q
> ) | DEBUG >NUL:


I think this code is a problem because 16 bit code like debug
can't be executed in 64 bit Windows. And in a short time even the
cheapest PCs will be equipped with more than 4 Gbyte RAM and
therefore will use 64 bit Windows. I wasn't able to find a
way to generate binary files from a batch program without using
16 bit code. And when you need an additional external program
(like a hex2bin converter), then it doesn't make any sense to
use a UTF8to16.cmd, because then you can just as well distribute
a UTF8to16.exe directly instead of a hex2bin.exe.

Frank P. Westlake

Aug 5, 2009, 6:48:47 AM
"Frank P. Westlake" news:h594vs$iv2$1...@news.albasani.net...
> :WriteBinaryFiles path

> FOR /L %%i in (0 1 0x7F) Do If NOT EXIST ASCII*.%%i (

The above FOR statement from the :WriteBinaryFiles routine should have the parameters "(0 1 0xFF)".

Frank

Frank P. Westlake

Aug 5, 2009, 7:09:08 AM
"foxidrive" news:tmbg75hrjobos5a9t...@4ax.com...
>> <http://geocities.com/fp.westlake/utftools.zip>

> Page not found error, Frank.

It might have been stored in upper case. I uploaded a lower case file this time.

Frank


Frank P. Westlake

Aug 5, 2009, 7:18:29 AM
"Herbert Kleebauer" news:4A784AE8...@unibwm.de...

> ... 16 bit code like debug can't be executed in 64 bit Windows.

I know; but the necessary files can be easily created manually with a hex editor.

These scripts are intended to be used locally by a human operator. In some cases they can be used in an automated environment.

Frank
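
As an alternative to a hex editor, the one-byte helper files that :WriteBinaryFiles produces through DEBUG could be generated by any tool able to write raw bytes. Below is a minimal Python sketch under stated assumptions: Python is not part of the original thread, the function name write_byte_files is hypothetical, and the naming ASCIIhh.n (two uppercase hex digits, decimal value as extension) is inferred from the script's "N ascii%%h%%.%%i" line and its "TYPE ASCII??.!L!" lookups. It covers all values 0 through 0xFF.

```python
import tempfile
from pathlib import Path


def write_byte_files(directory) -> None:
    """Hypothetical helper: create ASCII00.0 .. ASCIIFF.255, one byte each."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    for value in range(0x100):
        # Two hex digits in the base name, the decimal value as the extension,
        # so a lookup such as "TYPE ASCII??.65" matches exactly one file.
        name = f"ASCII{value:02X}.{value}"
        (target / name).write_bytes(bytes([value]))


# Demo in a throwaway directory; the script itself expects %TEMP%\ASCII.
demo_dir = Path(tempfile.mkdtemp()) / "ASCII"
write_byte_files(demo_dir)
print(len(list(demo_dir.iterdir())))  # prints 256
```

Once the files exist in the directory named by MyDir, the script's "If NOT EXIST ASCII*.%%i" test skips the DEBUG step for each of them.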

foxidrive

Aug 5, 2009, 7:38:29 AM
On Wed, 5 Aug 2009 04:09:08 -0700, "Frank P. Westlake"
<frank.w...@yahoo.com> wrote:

>>> <http://geocities.com/fp.westlake/utftools.zip>


>
>It might have been stored in upper case. I uploaded a lower case file this time.

Thanks Frank, it works fine.

Frank P. Westlake

Aug 5, 2009, 9:38:11 AM
"Herbert Kleebauer" news:4A784AE8...@unibwm.de...
> ... 16 bit code like debug can't be executed in 64 bit Windows.

Sometimes when MS removes something from the OS they later add it back. For example, Windows Vista is distributed without the ability to display the old style *.HLP files, but they now offer the program to display them as a download from their servers. A couple more examples are FORFILES.EXE and CHOICE.EXE, which were removed from previous OSs (NT4 at least) but are now distributed with Windows Vista.

Hopefully, perhaps after enough complaints, they will add a 32-bit or 64-bit debug. I suspect they have long ago rewritten the program but decided to not include it with 32-bit and later OSs.

Can cscript be used to write individual bytes of any value (00-FFh)?

Frank

Herbert Kleebauer

Aug 5, 2009, 2:47:06 PM
"Frank P. Westlake" wrote:
> "Herbert Kleebauer" news:4A784AE8...@unibwm.de...

> > ... 16 bit code like debug can't be executed in 64 bit Windows.
>
> Sometimes when MS removes something from the OS they later add
> it back. For example, Windows Vista is distributed without the
> ability to display the old style *.HLP files, but they now offer
> the program to display them as a download from their servers. A
> couple more examples are FORFILES.EXE and CHOICE.EXE, which were
> removed from previous OSs (NT4 at least) but are now distributed
> with Windows Vista.

For me a batch solution only makes sense when the batch can
be executed in a standard installation. If you first have to
install additional, optional software to run the batch, then
in most cases it is easier to not use batch code at all but
use an exe which directly does the job.



> Hopefully, perhaps after enough complaints, they will add a 32-bit
> or 64-bit debug. I suspect they have long ago rewritten the program
> but decided to not include it with 32-bit and later OSs.

A 32 bit debug.exe wouldn't be of much help. Debug is mostly
used to generate small 16 bit com files, but 16 bit code doesn't
run in 64 bit mode. And if you use debug only for writing
binary bytes to a file, then a simple extension to the echo
command which would allow echoing binary bytes (\nnn) would be
much better. The bad guy is AMD, which doesn't support V86 mode
when the CPU is in 64 bit mode. And I suppose Microsoft was quite
happy to get rid of 16 bit legacy code this way. And I doubt that
MS will add a software emulated V86 mode so you can continue to
use your collection of 16 bit utilities. Instead, in Windows 7 you
can optionally install XP in a virtual machine, where the installed
XP software is also available from Windows 7 by transparently
executing the program in the virtual XP machine.

Also, the end of cmd.exe seems to be not too far off. Windows 7 comes with
PowerShell installed and you can download it for XP/Vista.



> Can cscript be used to write individual bytes of any value (00-FFh)?

I don't know, but if you use cscript, why use batch code at all?

Todd Vargo

Aug 5, 2009, 5:08:34 PM
Frank P. Westlake wrote:
snip...

>
> Can cscript be used to write individual bytes of any value (00-FFh)?

It sure can.

--
Todd Vargo
(Post questions to group only. Remove "z" to email personal messages)

Frank P. Westlake

Aug 6, 2009, 7:14:49 AM
"Herbert Kleebauer" news:4A79D3AA...@unibwm.de...

> For me a batch solution only makes sense when the batch can
> be executed in a standard installation.

There are a great many different ways to use a computer. It appears that your focus is on multiple users in a business or educational environment. Mine is single user, single computer.

This newsgroup and CMD.EXE aren't particular.

Frank

Herbert Kleebauer

Aug 6, 2009, 2:51:04 PM
"Frank P. Westlake" wrote:
> "Herbert Kleebauer" news:4A79D3AA...@unibwm.de...
>
> > For me a batch solution only makes sense when the batch can
> > be executed in a standard installation.
>
> There are a great many different ways to use a computer. It
> appears that your focus is on multiple users in a business
> or educational environment.

My focus is to solve a given problem with minimal effort.
And this in the first place means selecting the proper language
or combination of languages. If speed doesn't matter, many things
can be done with a few lines of batch code. But sometimes you need
some additional support which isn't provided by the tools available
in a standard Windows installation. In 32 bit Windows this isn't
a problem, because you can embed small programs either in ASCII form
or by a debug script; in 64 bit Windows this is no longer possible.
And if a batch program needs an external program which has to be
provided separately, then I see no advantage compared to the use
of an external, more powerful shell instead of cmd.exe.


> Mine is single user, single computer.

But what then is your motivation to implement a UTF conversion
in batch code instead of, say, in C? In this group some people
mentioned that they are not allowed to use external programs,
so they have to do it in pure batch, but if it's your own computer,
you can install anything you like. I also don't understand the
logic of allowing people to copy and paste (or manually retype)
some batch code posted here but not allowing them to copy and paste
some C or assembly code and compile it to an executable. Well-written
C code is much more readable than batch code, and for security it
doesn't matter whether the source code is interpreted (like batch
code) or compiled and then executed (like C code).
