
CMD unicode check?

JJ

Jul 24, 2012, 4:00:05 AM
I've seen an old thread somewhere in a newsgroup about determining
whether CMD is running in Unicode or ANSI mode, but I can't seem to
find it again. I can't remember whether it was posted here, either. It
might have been in the 4DOS newsgroup, in a discussion of 4NT rather than CMD.

Basically, I need it for my batch file. Since I need to work with REG
registry files, which are in Unicode, my batch file needs to switch to
Unicode mode automatically using "CMD /U /C mybatch.cmd". But without a
Unicode mode check, the batch file goes into an endless loop. So,
could anyone explain it to me (again)?

Sorry for the trouble.

frank.w...@gmail.com

Jul 24, 2012, 6:43:23 AM
From JJ :
>...my batch file need to enter
>Unicode automatically using "CMD /U /C mybatch.cmd". But
>without the Unicode mode check, the batch file would go into endless
>loop.

I use a command line parameter to avoid that. Something
like:

If "%1" neq "/subprocess" (
"%comspec%" /c:"/subprocess *%"
goto :eof
)

Frank

frank.w...@gmail.com

Jul 24, 2012, 8:17:17 AM
From frank.westlake:

> If "%1" neq "/subprocess" (
> "%comspec%" /c:"/subprocess *%"
> goto :eof
> )

Oops. More like this

If "%~1" neq "/subprocess" (
"%comspec%" /u /s /c ""%~f0" /subprocess %*"
goto :eof
)

Frank
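Here is why the marker argument stops the endless loop: the relaunched copy receives the marker as its first argument and takes the other branch. The sketch below illustrates only that decision, in Python. The function name `relaunch_command` is hypothetical, and the function only builds the argument list; it does not start CMD. Like the batch `If "%~1" neq ...` test, the comparison is case-sensitive.

```python
from typing import List, Optional

MARKER = "/subprocess"  # same marker string as the batch example


def relaunch_command(script: str, args: List[str]) -> Optional[List[str]]:
    """Return the argument list for relaunching `script` under CMD /U,
    or None when this is already the relaunched copy."""
    if args and args[0] == MARKER:
        # Second invocation: the marker is present, so do the real work
        # instead of relaunching again (this is what prevents recursion).
        return None
    # First invocation: cmd /u /c <script> /subprocess <original arguments>
    return ["cmd", "/u", "/c", script, MARKER, *args]


if __name__ == "__main__":
    print(relaunch_command("mybatch.cmd", ["a", "b"]))
    # ['cmd', '/u', '/c', 'mybatch.cmd', '/subprocess', 'a', 'b']
    print(relaunch_command("mybatch.cmd", ["/subprocess", "a", "b"]))
    # None
```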

JJ

Jul 25, 2012, 12:06:41 AM
Thanks, it solves my problem, although it's not a check of the
Unicode mode.

Does CMD in Vista and Windows 7 still run in ANSI mode by
default?

Frank P. Westlake

Jul 25, 2012, 9:57:55 AM
On 2012-07-24 21:06, JJ wrote:
> Thanks, it solves my problem, although it's not a check of the
> Unicode mode.

I don't think there is any Registry or any system setting that can be
examined. It is an internal CMD setting and there is no console command
that exposes this setting.

> Does CMD in Vista and Windows 7 still run in ANSI mode by
> default?

How do we know? I think the only way you can determine which mode the
script's CMD is in is to pipe some Unicode into something else and
examine what has been received. I don't know which internal commands to
do this with, where to get guaranteed Unicode output from, and how to
examine the received text. The examination would probably require
writing a file and looking for 00 with FC.

The default might be language dependent, so I think you are better off
making this test every time, if you can find out how to make the test.
If you can't, or if the test is awkward, and you don't need to retain
environment variables, then it is probably best to CMD/U your script
each time. Since the test would probably require writing temporary files,
if you need to retain environment variables you could write them
into a file and read them back into the parent CMD's environment.

Is there someone with CMD/U experience willing to develop a procedure?

Frank
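Frank's idea of looking for 00 bytes can be sketched in Python for illustration; CMD itself cannot run this. The sketch assumes, as the rest of the thread bears out, that redirected output under CMD /U is UTF-16LE. In that case plain ASCII text comes out with a 0x00 byte after every character, while ANSI output of ordinary text contains no NUL bytes. The helper name is hypothetical.

```python
def looks_like_utf16(data: bytes) -> bool:
    """Heuristic: ASCII text written as UTF-16LE contains 0x00 bytes,
    while ordinary single-byte (ANSI) text normally contains none."""
    return b"\x00" in data


if __name__ == "__main__":
    ansi = "\r\n".encode("cp1252")     # bytes of an empty ECHO line in ANSI mode
    wide = "\r\n".encode("utf-16-le")  # the same line as UTF-16LE (assumed CMD /U)
    print(looks_like_utf16(ansi))  # False
    print(looks_like_utf16(wide))  # True
```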

Herbert Kleebauer

Jul 25, 2012, 11:57:58 AM
On 25.07.2012 15:57, Frank P. Westlake wrote:

> How do we know? I think the only way you can determine which mode the
> script's CMD is in is to pipe some Unicode into something else and
> examine what has been received. I don't know which internal commands to
> do this with, where to get guaranteed Unicode output from, and how to
> examine the received text. The examination would probably require
> writing a file and looking for 00 with FC.

With an external file you only need to check the size:

echo.>tmp.txt
for %%i in (tmp.txt) do if %%~zi==2 (echo no unicode) else if %%~zi==4 (echo unicode) else echo ????
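The numbers in this test follow directly from the encodings. ECHO. writes just CR LF, which is one byte per character in ANSI and two bytes per character in UTF-16LE (with no BOM, per JJ's later observation). Below is a small Python sketch of the same classification, for illustration only; `classify_echo_file` is a made-up name.

```python
def classify_echo_file(size: int) -> str:
    """Classify the size of a file written by `echo.>tmp.txt`."""
    ansi_size = len("\r\n".encode("cp1252"))     # 2: CR and LF, one byte each
    wide_size = len("\r\n".encode("utf-16-le"))  # 4: CR and LF, two bytes each
    if size == ansi_size:
        return "no unicode"
    if size == wide_size:
        return "unicode"
    return "????"


if __name__ == "__main__":
    for size in (2, 4, 6):
        print(size, classify_echo_file(size))
    # 2 no unicode
    # 4 unicode
    # 6 ????
```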

Todd Vargo

Jul 25, 2012, 5:19:22 PM
This is a good test. I was going to post something similar but yours was
already there. I would have used ASCII instead of "no unicode".

--
Todd Vargo
(Post questions to group only. Remove "z" to email personal messages)

frank.w...@gmail.com

Jul 26, 2012, 6:28:56 AM
From Herbert Kleebauer :
>echo.>tmp.txt
>for %%i in (tmp.txt) do if %%~zi==2 (echo no unicode) else if %%~zi==4 (echo unicode) else echo ????

Thanks doc. I'll turn it into a portable test:

Set "unicode="
Set "T=%temp%\%0.%random%.txt"
For %%i in ("%T%") do (
ERASE "%T%"
If %%~zi==4 Set "unicode=true"
)
Set "T="

Use 'If DEFINED unicode' to determine if CMD is using
Unicode.

Frank

frank.w...@gmail.com

Jul 26, 2012, 6:46:02 AM
From frank.westlake:
> Set "unicode="
> Set "T=%temp%\%0.%random%.txt"
> For %%i in ("%T%") do (
> ERASE "%T%"
> If %%~zi==4 Set "unicode=true"
> )
> Set "T="

>Use 'If DEFINED unicode' to determine if CMD is using
>Unicode.

Oops. I'm on an Android and am not able to check for
errors. This might work better:

Set "unicode="
Set "T=%temp%\%~n0.%random%.txt"
Echo.>"%T%"
For %%i in ("%T%") do (
ERASE "%T%"
If %%~zi==4 Set "unicode=true"
)
Set "T="


Frank

JJ

Jul 26, 2012, 10:58:06 AM
Herbert Kleebauer <kl...@unibwm.de> wrote:

> With an external file you only need to check the size:
>
> echo.>tmp.txt
> for %%i in (tmp.txt) do if %%~zi==2 (echo no unicode) else if %%~zi==4 (echo unicode) else echo ????

Thanks, I forgot that CMD has variable modifiers that act directly
on some file properties.

Though I found that working in Unicode mode is more troublesome
than I thought, because CMD won't recognize the Unicode files generated
by its own redirection. E.g.:

VER>TST
REM vvv This will display file contents as if it is in ANSI format.
TYPE TST
REM vvv This won't display anything. Same reason as above.
FOR /F %%A IN (TST) DO ECHO %%A

Another unexpected result is that a Unicode REG registry file can
be handled properly even when not in Unicode mode, with certain
commands. E.g.:

REM vvv This works.
TYPE EXPORTED.REG
REM vvv But this won't.
FOR /F %%A IN (EXPORTED.REG) DO ECHO %%A

At first, I thought it may be due to the presence of UTF16 BOM
(0xFEFF), but the result is kind of unpredictable, at least for me.
I mean, some commands work, some don't.

When in Unicode mode, files generated from redirections don't have
UTF16 BOM. I haven't checked whether UTF16 BOM presence matters for
input redirection.

I'll just have to experiment with this on my own.

BTW, anyone know how to generate a UTF16 BOM programmatically other
than via DEBUG or prebuilt file?

Todd Vargo

Jul 26, 2012, 4:33:20 PM
Hmmm, looks like it should work but it was not working for me. I had to
disable the erase command to discover it was 6 characters. I quickly
discovered that when I copy/paste the snippet, every line had a trailing
space which cause Echo.>"%T%" to add 2 extra characters. ISTM, a safer
way (where posting is concerned) would be to enclose the line in
parenthesis so any trailing spaces introduced will not become part of
the command.

(Echo.>"%T%")
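The byte counts show why a stray space matters. The rough Python sketch below rests on three assumptions: the copied trailing space ends up in ECHO's output before the CR LF, ANSI mode writes one byte per character, and Unicode mode writes UTF-16LE with no BOM. `echo_file_size` is a hypothetical helper.

```python
def echo_file_size(text: str, unicode_mode: bool) -> int:
    """Bytes in a file holding `text` followed by CR LF, written either
    one byte per character (ANSI) or as UTF-16LE (Unicode, no BOM)."""
    encoding = "utf-16-le" if unicode_mode else "cp1252"
    return len((text + "\r\n").encode(encoding))


if __name__ == "__main__":
    for trailing in ("", " ", "  "):
        print(repr(trailing), echo_file_size(trailing, False), echo_file_size(trailing, True))
    # '' 2 4
    # ' ' 3 6
    # '  ' 4 8
```

Under these assumptions, one stray space gives 3 or 6 bytes. Two stray spaces in ANSI mode give 4 bytes, which the exact-size test would misreport as Unicode. This is why it matters to keep stray spaces out of the ECHO text.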

Liviu

Jul 26, 2012, 7:48:37 PM
"JJ" <jaejunks_at@_googlemail_dot._com> wrote...
>
> I haven't checked whether UTF16 BOM presence matters for
> input redirection.

Does not, AFAIK. There is no support for unicode input, and
all "|" pipes work on "narrow" characters in the active codepage.
I am not aware of any tricks or workarounds, though I'd be very
happy to be proven wrong ;-)

On the upside, environment variables and file/dir for loops fully
support unicode strings.

> BTW, anyone know how to generate a UTF16 BOM programmatically
> other than via DEBUG or prebuilt file?

(set /p =ÿþ) <nul >utf16.txt 2>nul
cmd /u /c echo more unicode text >>utf16.txt

Note that the two odd characters on the first line must be literally
0xFF 0xFE regardless of how they display in your editor, which depends
on the assumed codepage, fonts etc.

Liviu

P.S. Regarding the original question, you may have been thinking of some
variation of the trick checking %cmdcmdline% for an embedded "/u".
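For reference, the byte layout that Liviu's two lines aim for can be modelled in Python. The file starts with the two bytes FF FE (the UTF-16LE byte-order mark), followed by text encoded as UTF-16LE. This is only an illustrative sketch. `utf16_with_bom` is a hypothetical helper, and the sketch assumes CMD /U appends UTF-16LE with no BOM of its own, as JJ observed.

```python
import codecs


def utf16_with_bom(text: str) -> bytes:
    """Bytes of a UTF-16LE file with a BOM: FF FE, then `text` as
    little-endian 16-bit code units."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


if __name__ == "__main__":
    data = utf16_with_bom("more unicode text\r\n")
    print(data[:2].hex())  # fffe
    # Python's generic "utf-16" codec reads the BOM and strips it.
    print(repr(data.decode("utf-16")))  # 'more unicode text\r\n'
```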

frank.w...@gmail.com

Jul 27, 2012, 6:59:26 AM
From Todd Vargo :

>a safer way (where posting is concerned) would be to
>enclose the line in parenthesis so any trailing spaces
>introduced will not become part of the command.

>(Echo.>"%T%")

I agree, although I find it is often better to place the
redirection outside

(Echo.)>"%T%"

so that a digit at the end of the output is not taken as a
handle number for the redirection, as in

(Echo Take 2>"%T%")

Take 3 (from an Android):

Set "unicode="
Set "T=%temp%\%0.%random%.txt"
(Echo(>)"%T%"
For %%i in ("%T%") do (
ERASE "%T%"
If %%~zi==4 Set "unicode=true"
)
Set "T="

frank.w...@gmail.com

Jul 27, 2012, 8:25:10 AM
From frank.westlake:
> Set "unicode="
> Set "T=%temp%\%0.%random%.txt"
> (Echo(>)"%T%"
> For %%i in ("%T%") do (
> ERASE "%T%"
> If %%~zi==4 Set "unicode=true"
> )
> Set "T="

I should stop trying to script CMD on an Android.

Take 4:

Set "unicode="
Set "T=%temp%\%~n0.%random%.txt"
(Echo()>"%T%"
For %%i in ("%T%") do (
ERASE "%T%"
If %%~zi==4 Set "unicode=true"
)
Set "T="

Frank P. Westlake

Jul 27, 2012, 10:29:56 AM
On 2012-07-26 16:48, Liviu wrote:
> P.S. Regarding the original question, you may have been thinking at some
> variation of the trick checking %cmdcmdline% for an embedded "/u".

That could easily fail. On my systems I often start CMD with a shortcut
set to

%comspec% /K"NewConsole.cmd"

If '/K' runs a program that takes the switch '/U', then you could have a
problem. For example

%comspec% /K"NewConsole.cmd /U Frank"

The variable CMDCMDLINE would be 'C:\windows\system32\cmd.exe
/K"NewConsole.cmd /U Frank"' and your parsing routine would have to be
able to determine that the '/U' was intended for CMD and not anything else.

Frank



Frank P. Westlake

Jul 27, 2012, 11:20:30 AM
On 2012-07-26 07:58, JJ wrote:
> VER>TST
> REM vvv This will display file contents as if it is in ANSI format.
> TYPE TST
> REM vvv This won't display anything. Same reason as above.
> FOR /F %%A IN (TST) DO ECHO %%A

Sometimes a Unicode file can be usefully converted to ASCII by piping it
through MORE and concatenating the result. This should produce the same
output for both CMD and CMD/U:

@Echo OFF
SetLocal EnableExtensions EnableDelayedExpansion
ver>tst
Set "ver="
For /F "delims=" %%A in ('TYPE TST^|MORE') Do Set "ver=!ver!%%A"
Echo(%ver%

Frank
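Here is a rough Python analogue of the concatenation step, for readers who want to see the logic outside CMD. It decodes the UTF-16LE bytes, skips the empty lines (FOR /F ignores blank lines), and appends each remaining line to one string, as `Set "ver=!ver!%%A"` does. It works under stated assumptions: the input is UTF-16LE without a BOM, the sample VER text is made up, and FOR /F details such as the default `eol=;` are ignored.

```python
def concat_nonempty_lines(data: bytes, encoding: str = "utf-16-le") -> str:
    """Decode `data` and join its non-empty lines into one string,
    roughly like FOR /F "delims=" ... Set "ver=!ver!%%A"."""
    text = data.decode(encoding)
    return "".join(line for line in text.splitlines() if line)


if __name__ == "__main__":
    sample = "\r\nMicrosoft Windows [Version 0.0.0000]\r\n"  # made-up VER output
    print(concat_nonempty_lines(sample.encode("utf-16-le")))
    # Microsoft Windows [Version 0.0.0000]
```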

Liviu

Jul 27, 2012, 11:43:43 AM
"Frank P. Westlake" <frank.w...@gmail.com> wrote...
> On 2012-07-26 16:48, Liviu wrote:
>
>> [...] the trick checking %cmdcmdline% for an embedded "/u".
>
> That could easily fail [...] and your parsing routine would have to
> be able to determine that the '/U' was intended for CMD

Yes, just checking for "/u" anywhere in %cmdcmdline% is not safe.
But the following (oversimplified) example is safe in the sense that
it will never go into infinite recursion, and it does always
guarantee that the @echo part runs in unicode-output mode.

@if "%cmdcmdline:~1,5%" neq "md /u" (
cmd /u /s /c ""%~0" %*"
goto :eof
)
@echo *** this always runs under /u ***

The above may sometimes miss an already unicode session (e.g.
cmdcmdline = cmd /v /u) and launch a secondary 'cmd /u' shell,
but that's a matter of performance rather than safety, and can be
technically solved with smarter parsing.

Liviu
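Below is one possible shape for the "smarter parsing" Liviu mentions, sketched in Python for clarity rather than in batch. It skips the executable name, then scans CMD's own switches only up to /C or /K, because everything after those belongs to the command being run. That is exactly Frank's /K"NewConsole.cmd /U Frank" case. The sketch assumes switches are separated by whitespace and that /U is not glued to another switch; quoting is handled only for the executable path. The function name is hypothetical.

```python
def cmd_has_unicode_switch(cmdcmdline: str) -> bool:
    """Return True if /U appears among CMD's own switches, i.e. before
    any /C or /K (whatever follows those is the command to run)."""
    line = cmdcmdline.strip()
    # Skip the executable name, which may be quoted and contain spaces.
    if line.startswith('"'):
        end = line.find('"', 1)
        rest = line[end + 1:] if end != -1 else ""
    else:
        parts = line.split(None, 1)
        rest = parts[1] if len(parts) > 1 else ""
    for token in rest.split():
        switch = token.lower()
        if switch.startswith("/c") or switch.startswith("/k"):
            return False  # the rest belongs to the command, e.g. NewConsole.cmd /U
        if switch == "/u":
            return True
    return False


if __name__ == "__main__":
    print(cmd_has_unicode_switch('cmd /u /s /c ""mybatch.cmd" a b"'))  # True
    print(cmd_has_unicode_switch(r'C:\windows\system32\cmd.exe /K"NewConsole.cmd /U Frank"'))  # False
```

Other CMD switches such as /D, /Q or /V:ON simply fall through the loop, since only /C, /K and /U matter for this decision.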


frank.w...@gmail.com

Jul 27, 2012, 11:54:48 AM
From "Liviu" :
>Yes, just checking for "/u" anywhere in %cmdcmdline% is not safe.
>But the following (oversimplified) example is safe in the sense that
>it will never go into infinite recursion, and it does always
>guarantee that the @echo part runs in unicode-output mode.

>@if "%cmdcmdline:~1,5%" neq "md /u" (
> cmd /u /s /c ""%~0" %*"
> goto :eof
>)
>@echo *** this always runs under /u ***

>The above may sometimes miss ...

Agreed. Why characters 1,5 and not 0,6? Wouldn't '/I'
permit 0,6?

@if /I "%cmdcmdline:~0,6%" neq "cmd /u" (

Frank

Liviu

Jul 27, 2012, 12:04:00 PM
<frank.w...@gmail.com> wrote...
> From "Liviu" :
>
>>@if "%cmdcmdline:~1,5%" neq "md /u" (
>
> Agreed. Why characters 1,5 and not 0,6?

Just a quick hack. Character 0 could be a quote and complicate
the 'if' syntax more than suitable for an "oversimplified" example.

Liviu
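Liviu's offsets and Frank's 0,6 alternative are easy to check with a small Python model of CMD's %var:~start,length% syntax. For non-negative start and length, that syntax picks the same characters as a slice. Negative values, which CMD also accepts, are not modelled. The helper name and sample command lines are made up.

```python
def cmd_substring(value: str, start: int, length: int) -> str:
    """Model of %value:~start,length% for non-negative start and length."""
    return value[start:start + length]


if __name__ == "__main__":
    plain = 'cmd /u /s /c ""mybatch.cmd" "'
    quoted = '"C:\\Windows\\system32\\cmd.exe" /u'
    print(cmd_substring(plain, 1, 5))   # md /u
    print(cmd_substring(plain, 0, 6))   # cmd /u
    print(cmd_substring(quoted, 0, 6))  # "C:\Wi
```

In the last case, the extracted text starts with a double quote. Placed inside if "..." neq "...", that character unbalances the quoting, which is the complication Liviu mentions.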



JJ

Jul 27, 2012, 4:44:00 PM
"Frank P. Westlake" <frank.w...@gmail.com> wrote:

> Sometimes a Unicode file can be usefully converted to ASCII by
> piping it through MORE and concatenating the result. This should
> produce the same output for both CMD and CMD/U:
>
> @Echo OFF
> SetLocal EnableExtensions EnableDelayedExpansion
> ver>tst
> Set "ver="
> For /F "delims=" %%A in ('TYPE TST^|MORE') Do Set "ver=!ver!%%A"
> Echo(%ver%

Ha, that really works! I just hope that I won't have to work with
large Unicode files.

JJ

Jul 27, 2012, 4:47:06 PM
frank.w...@gmail.com wrote:

> I should stop trying to script CMD on an Android.

Or try to find CMD for *nix (sic).

Todd Vargo

Jul 27, 2012, 9:10:47 PM
On 7/27/2012 6:59 AM, frank.w...@gmail.com wrote:
> From Todd Vargo :
>
>> a safer way (where posting is concerned) would be to
>> enclose the line in parenthesis so any trailing spaces
>> introduced will not become part of the command.
>
>> (Echo.>"%T%")
>
> I agree, although I find it is often better to place redirection outside
>
> (Echo.)>"%T%"

To each his own I guess.

>
> to avoid the redirection thinking it should be from a number in the output
>
> (Echo Take 2>"%T%")

This is pointless to mention because it has nothing to do with the file
size test above.

>
> Take 3 (from an Android);
>
> Set "unicode="
> Set "T=%temp%\%0.%random%.txt"
> (Echo(>)"%T%"
^
This is just plain wrong syntax to propose. I have never liked seeing
people use these junk delimiters with ECHO, and where the
code includes nested parentheses, it just promotes bad habits for
copy/paste readers. JMO-YMMV.


> For %%i in ("%T%") do (
> ERASE "%T%"
> If %%~zi==4 Set "unicode=true"
> )
> Set "T="
>
> Frank

Frank P. Westlake

Jul 28, 2012, 8:34:22 AM
On 2012-07-27 13:44, JJ wrote:
> Ha, that really works!

I think it may only be useful for Unicoded-ASCII (I can't recall the
correct term), and you lose linefeed characters. Will someone with a
multibyte character language please tell us what happens to 16-bit
characters? I suspect that they will be translated to the two 8-bit
characters.

Frank

Liviu

Jul 28, 2012, 6:23:59 PM
"Frank P. Westlake" <frank.w...@gmail.com> wrote...
>
> Will someone with a multibyte character language please tell us
> what happens to 16-bit characters?

( To fill in the missing context, this is about piping unicode output
through "| more". )

16b "wide" chars are translated to 8b "narrow" chars based on
the active codepage. The mapping is neither lossless nor univocal
e.g. the (c) copyright symbol U+00A9 translates to a plain "c" in
codepage 437, see for example
http://www.dostips.com/forum/viewtopic.php?p=13030#p13030.

> I suspect that they will be translated to the two 8-bit characters.

No. A single "wide" char is mapped to a single "narrow" char (not
going to touch on unicode surrogates and combining chars here).

If no mapping is defined (for example with arbitrary CJK/asian texts),
then the "narrow" character is set to some conventional placeholder,
usually displayed as "?" in the console or an empty rectangle in GUIs.

Liviu
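Liviu's description of the conversion can be mimicked in Python for illustration. The sketch decodes the UTF-16LE bytes and re-encodes each character as one byte in a single-byte codepage, substituting '?' where no mapping exists. One caveat: Python's codecs do not implement Windows "best fit" mappings. As a result, U+00A9 comes out as '?' here rather than the plain "c" Liviu reports from Windows. The helper name is hypothetical.

```python
def to_narrow(wide: bytes, codepage: str = "cp437") -> bytes:
    """Convert UTF-16LE bytes to a single-byte codepage: one narrow byte
    per wide character, '?' where the codepage has no mapping.
    Unlike Windows, no best-fit look-alikes are used."""
    return wide.decode("utf-16-le").encode(codepage, errors="replace")


if __name__ == "__main__":
    # e-acute (in cp437), copyright sign (not in cp437), a CJK character
    sample = "\u00e9\u00a9\u4e2d".encode("utf-16-le")
    print(to_narrow(sample))  # b'\x82??'
```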


Todd Vargo

Jul 28, 2012, 8:03:58 PM
On 7/27/2012 9:10 PM, Todd Vargo wrote:
> On 7/27/2012 6:59 AM, frank.w...@gmail.com wrote:
>> From Todd Vargo :
>>
>>> a safer way (where posting is concerned) would be to
>>> enclose the line in parenthesis so any trailing spaces
>>> introduced will not become part of the command.
>>
>>> (Echo.>"%T%")
>>
>> I agree, although I find it is often better to place redirection outside
>>
>> (Echo.)>"%T%"
>
> To each his own I guess.

BTW, Frank, I have noticed that you have responded to a number of my
posts indirectly through a different post. Is that an android feature?

Tom Del Rosso

Jul 29, 2012, 8:33:24 PM
Sorry but I have a lot of questions about this. :)

What's the open-parenthesis on the last line for?

Where does unicode come in? The output of ver is ANSI, and has CRLF before
and after the text which is removed.

I don't understand what this does other than get one line of ver output into
a variable, which can be done more easily like so:

for /f "delims=" %%a in ('ver') do set v=%%a


--

Reply in group, but if emailing add one more
zero, and remove the last word.


frank.w...@gmail.com

Jul 30, 2012, 6:24:16 AM
From "Liviu" :
>16b "wide" chars are translated to 8b "narrow" chars based on
>the active codepage. The mapping is neither lossless nor univocal

OK, then it appears that CMD runs the file through the
Windows API routine WideCharToMultiByte(). Good. Better
than directly using the 8-bit characters: we get an
approximation of the original character.

Frank

frank.w...@gmail.com

Jul 30, 2012, 6:32:47 AM
From Todd Vargo :
>On 7/27/2012 9:10 PM, Todd Vargo wrote:
>> On 7/27/2012 6:59 AM, frank.w...@gmail.com wrote:
>>> From Todd Vargo :
>>>
>>>> a safer way (where posting is concerned) would be to
>>>> enclose the line in parenthesis so any trailing spaces
>>>> introduced will not become part of the command.
>>>
>>>> (Echo.>"%T%")
>>>
>>> I agree, although I find it is often better to place redirection outside
>>>
>>> (Echo.)>"%T%"
>>
>> To each his own I guess.

>BTW, Frank, I have noticed that you have responded to a number of my
>posts indirectly through a different post. Is that an android feature?

I wrote my own mail user-agent in JavaScript. Perhaps my
script isn't including all of the message ids in the
references header. I'll examine it later -- thanks. I
don't view a message tree with this application so that
would be why I haven't noticed.

Frank

frank.w...@gmail.com

Jul 30, 2012, 8:14:11 AM
From "Tom Del Rosso" :

>> Echo(%ver%

>What's the open-parenthesis on the last line for?

To avoid the "Echo is off" if %ver% happens to be empty.
According to Jeb that is the best character to use to
avoid most problems. Jeb probably knows which character
is best for which circumstance but I don't.

>Where does unicode come in? The output of ver is ANSI, and has CRLF before
>and after the text which is removed.

It's all in the discussion but I'll summarise.

'VER>file' outputs Unicode when CMD is started with the
switch '/U'. A Unicode file fails with FOR unless it is
piped through MORE, but the nulls in the ASCII
characters cause each character to be read on a separate
line. To get them all back on one line append them to
one variable.

Frank

Dr J R Stockton

Jul 31, 2012, 2:23:23 PM
In alt.msdos.batch.nt message <138d777c9b5$frank.w...@gmail.com>,
Mon, 30 Jul 2012 10:32:47, frank.w...@gmail.com posted:

>I wrote my own mail user-agent in JavaScript. Perhaps my script isn't
>including all of the message ids in the references header. I'll
>examine it later -- thanks.

IIRC, RFCs require retaining all of the Message-IDs if reasonably
possible, and certainly retaining the first M and the last N of them.
Memory suggests that M and N may be 1 and 3 or 3 and 1.

--
(c) John Stockton, Surrey, UK. Reply addr on Home Page. Turnpike v6.05 MIME.
Web <http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)
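The threading issue Frank describes comes down to how a reply's References header is built. That header is normally the parent's References followed by the parent's Message-ID, and a user agent that drops IDs there breaks the thread tree. The Python below builds that list. The trimming policy (keep the first ID plus the last three) is an assumption based on one of the two readings John recalls, not a quotation of the RFC. The function name and `limit` parameter are hypothetical.

```python
from typing import List


def build_references(parent_refs: List[str], parent_id: str, limit: int) -> List[str]:
    """Message-IDs for a reply's References header: the parent's
    References followed by the parent's own Message-ID. If that exceeds
    `limit` IDs, keep the first ID and the most recent limit - 1 IDs,
    which always retains the first and the last three (assumed rule)."""
    if limit < 4:
        raise ValueError("limit must allow the first and the last three IDs")
    refs = parent_refs + [parent_id]
    if len(refs) > limit:
        refs = refs[:1] + refs[-(limit - 1):]
    return refs


if __name__ == "__main__":
    chain = ["<1@x>", "<2@x>", "<3@x>", "<4@x>", "<5@x>"]
    print(" ".join(build_references(chain, "<6@x>", 4)))
    # <1@x> <4@x> <5@x> <6@x>
```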