
Why garbled text with out-file?

molipha

Sep 9, 2009, 4:21:23 PM
$f = "d:\realbuzios.com\private\fun.js"
$t = "Hello World"
$t | Out-File $f

Hi,

Does anyone know why this outputs:
ÿþH e l l o W o r l d
?

Many thanks

PaulChavez

Sep 9, 2009, 5:11:02 PM
Out-File defaults to Unicode.

Read the detailed help entries for Out-File, Add-Content and Set-Content to
determine the right cmdlet for you.
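
As a rough way to compare the two cmdlets, the sketch below writes the same string with Out-File and with Set-Content and prints the resulting file sizes. The temp-file names are hypothetical, and the result depends on the PowerShell version: in Windows PowerShell, Out-File defaults to Unicode while Set-Content uses the system's ANSI code page, whereas newer cross-platform versions default both to UTF-8 without a BOM.

```powershell
# Hypothetical temp paths, for illustration only
$dir        = [IO.Path]::GetTempPath()
$outFile    = Join-Path $dir 'outfile-demo.txt'
$setContent = Join-Path $dir 'setcontent-demo.txt'

# Same text, two cmdlets, default encodings
'Hello World' | Out-File $outFile
'Hello World' | Set-Content $setContent

# A file written as UTF-16 is roughly twice the size of a single-byte one
'Out-File:    {0} bytes' -f (Get-Item $outFile).Length
'Set-Content: {0} bytes' -f (Get-Item $setContent).Length
```

If the two sizes differ, the cmdlets are using different default encodings on that system.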

Paul

Larry__Weiss

Sep 9, 2009, 5:53:27 PM

Which Unicode is the default, UTF-16?

- Larry

PaulChavez

Sep 9, 2009, 7:08:02 PM
This is from the CTP3 help. I'm assuming UTF-16 is the default, based on the
other options available:

Valid values are "Unicode", "UTF7", "UTF8", "UTF32", "ASCII",
"BigEndianUnicode", "Default", and "OEM". "Unicode" is the default. "Default"
uses the encoding of the system's current ANSI code
page. "OEM" uses the current original equipment manufacturer code page
identifier for the operating system.
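
One way to see what a few of these values actually produce is to write the same string with each and look at the first bytes of the file. This is only a sketch; the temp-file name is hypothetical, and whether "UTF8" emits a BOM depends on the PowerShell version.

```powershell
# Hypothetical temp path, for illustration only
$path = Join-Path ([IO.Path]::GetTempPath()) 'encoding-demo.txt'

foreach ($enc in 'Unicode', 'BigEndianUnicode', 'UTF8', 'ASCII') {
    'Hello World' | Out-File $path -Encoding $enc

    # Read the raw bytes and format the first four as hex
    $bytes = [IO.File]::ReadAllBytes($path)
    $head  = ($bytes[0..3] | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
    '{0,-16} {1,3} bytes, starts with {2}' -f $enc, $bytes.Length, $head
}

Remove-Item $path
```

A file that starts with FF FE is little-endian UTF-16, and one that starts with FE FF is big-endian UTF-16.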

-Paul

Larry__Weiss

Sep 9, 2009, 8:09:07 PM
Kind of strange that they don't allow you to be specific and select "UTF16"
explicitly.

- Larry

Robert Robelo

Sep 10, 2009, 12:02:25 AM
By default Out-File writes content in Unicode (UTF-16, little-endian). If the application you use to read the file's content reads it in a single-byte encoding, such as ASCII or an ANSI code page, instead of UTF-16, you'll get the output you describe: every character is followed by a null byte, which shows up as a gap. The same happens if you use the -Append switch to write Unicode encoded content to a file previously saved in a single-byte encoding.
To fix this, pass the encoding you prefer to Out-File's -Encoding parameter.

$t | Out-File $f -Encoding UTF8
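
The -Append case mentioned above can be reproduced with a small sketch. The temp-file name is hypothetical, and the encodings are passed explicitly so that the mismatch does not depend on any version's defaults.

```powershell
# Hypothetical temp path, for illustration only
$path = Join-Path ([IO.Path]::GetTempPath()) 'append-demo.txt'

# First line saved as single-byte ASCII, second line appended as UTF-16
'first line'  | Out-File $path -Encoding ASCII
'second line' | Out-File $path -Encoding Unicode -Append

# The ASCII part contains no NUL bytes; the appended UTF-16 part has one per character
$bytes = [IO.File]::ReadAllBytes($path)
$nuls  = @($bytes | Where-Object { $_ -eq 0 }).Count
'{0} bytes in total, {1} of them NUL' -f $bytes.Length, $nuls
```

A non-zero NUL count means the file now mixes two encodings, which most editors will display as garbled text.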

The ÿþ at the start of the first line represents Unicode's BOM (Byte Order Mark):

# compare
[Int[]][Char[]]'ÿþ'

# to this
[Text.Encoding]::Unicode.GetPreamble()

The BOM should not be visible, but it sometimes is when the content is written in one encoding and read in another, for example UTF-16 content read as a single-byte code page.
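
The garbled output from the original question can be approximated in memory, without touching a file. This is a sketch that assumes the reading application used a Latin-1 style single-byte code page; ISO-8859-1 stands in for it here.

```powershell
# Encode the text as UTF-16LE and prepend the BOM, as a Unicode file would have it
$utf16 = [Text.Encoding]::Unicode
[byte[]]$bytes = $utf16.GetPreamble() + $utf16.GetBytes('Hello World')

# Decode those same bytes as ISO-8859-1 (assumed stand-in for the reader's code page)
$latin1  = [Text.Encoding]::GetEncoding('iso-8859-1')
$garbled = $latin1.GetString($bytes)

# Show the NUL characters as spaces so the gaps are visible
$garbled.Replace([char]0, [char]' ')
```

The BOM bytes 255 and 254 map to ÿ and þ in that code page, and the high byte of each UTF-16 character becomes the gap between letters.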

If you'd like to read about this subject, here is a good start:
http://msdn.microsoft.com/en-us/library/ms404377.aspx

--
Robert

Larry__Weiss

Sep 10, 2009, 11:06:51 AM
To paraphrase Indiana Jones: "Unicode! Why did it have to be Unicode?"

- Larry <grin>
