How do I open a text file with Unicode encoding? (Should've asked this before.)

Samuel A. Winchenbach

Aug 15, 2000
Currently I have code that works, but I figure there's probably an easier
way than creating a WideChar array.

Ben Knight
St. George Consulting Group, Inc.

P.S. Can anyone (read Peter Below) tell me where the ByteOrderMark := 65279
comes from?


procedure TMainForm.Open1Click(Sender: TObject);
var
  s : String;
  wc : array of WideChar;
  fs : TFileStream;
  ByteOrderMark : Word;
  x : integer;
begin
  If OpenDialog1.Execute then
  begin
    ByteOrderMark:=65279;
    fs:=TFileStream.Create(OpenDialog1.FileName, fmOpenRead);
    try
      // Read (and skip) the 2-byte byte order mark at the start of the file
      fs.ReadBuffer(ByteOrderMark, 2);
      SetLength(wc, (fs.size-2) div 2);
      // Read exactly as many bytes as the array can hold
      fs.ReadBuffer(wc[0], Length(wc)*SizeOf(WideChar));
      s:='';
      // High(wc) is the last valid index of the array
      for x:=0 to High(wc) do s:=s+wc[x];
      RichEdit1.Text:=s;
    finally
      fs.free
    end;
  end;
end;

Peter Below (TeamB)

Aug 16, 2000
In article <399994eb_2@dnews>, Samuel A. Winchenbach wrote:
> P.S. Can anyone (read Peter Below) tell me where the ByteOrderMark := 65279
> comes from?
>

65279 is the decimal value of $FEFF, the Unicode byte order mark. Starting a
Unicode text file with it is the recommended practice, documented in
win32.hlp (topic "Byte-order Mark"). Its purpose is to handle Unicode files
written on other platforms that use a different byte order (big-endian instead
of the Intel little-endian). If you read the first word of a Unicode file and
see $FFFE instead of $FEFF, you know that you have to swap the bytes in each
word you read to get a valid WideChar for your platform. Delphi has a Swap
function that performs this byte order switch.
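
To make the byte order point concrete, here is a minimal console sketch (the
program name and variables are only illustrative, and it assumes a
little-endian Intel machine). It shows how the two bytes at the start of a
file turn into the word value 65279, and what the same mark looks like when
it comes from a big-endian file:

```pascal
program BomBytesDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

var
  Bytes: array[0..1] of Byte;
  W: Word;
begin
  // A little-endian Unicode file starts with the bytes FF FE on disk.
  Bytes[0] := $FF;
  Bytes[1] := $FE;
  Move(Bytes, W, SizeOf(W));
  // On an Intel (little-endian) machine this prints: FEFF = 65279
  Writeln(IntToHex(W, 4), ' = ', W);

  // A big-endian Unicode file starts with the bytes FE FF instead.
  Bytes[0] := $FE;
  Bytes[1] := $FF;
  Move(Bytes, W, SizeOf(W));
  // Read on the same Intel machine, the mark now shows up as FFFE ...
  Writeln(IntToHex(W, 4));
  // ... and Swap exchanges the two bytes, giving FEFF again.
  Writeln(IntToHex(Swap(W), 4));
end.
```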

So if you read a Unicode file you have to be prepared to deal with three
kinds of files: those that have the correct byte order mark for your platform,
those whose bytes need to be swapped, and those that do not have a byte order
mark at all (since it is recommended but not enforceable, of course).
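
The three cases can be told apart from the first word of the file alone. A
small sketch of that decision, assuming Delphi (the type TBomKind and the
function ClassifyFirstWord are made-up names for illustration, not part of
the RTL):

```pascal
program BomClassifyDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical names, chosen for this example only
  TBomKind = (bkNative, bkSwapped, bkNone);

// Decide how to treat a file from its first 16-bit word,
// as read on a little-endian (Intel) machine.
function ClassifyFirstWord(FirstWord: Word): TBomKind;
begin
  case FirstWord of
    $FEFF: Result := bkNative;   // mark in our byte order: just strip it
    $FFFE: Result := bkSwapped;  // reversed mark: swap every word first
  else
    Result := bkNone;            // no mark: nothing to strip
  end;
end;

begin
  if ClassifyFirstWord($FEFF) = bkNative then
    Writeln('native order, drop the mark');
  if ClassifyFirstWord($FFFE) = bkSwapped then
    Writeln('swap every word, then drop the mark');
  if ClassifyFirstWord(Ord('A')) = bkNone then
    Writeln('no mark, keep the text as read');
end.
```

Note that a file without a mark gives no hint about its byte order, so in
that case the sketch simply assumes the text is already in the platform's
order.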

Your routine could be modified to deal with this as follows (untested):

Procedure SwapBytesInWideString( Var ws: WideString );
var
  i: Integer;
begin
  for i:= 1 to Length( ws ) do
    // Swap works on integer types, so cast the WideChar to Word and back
    ws[i] := WideChar( Swap( Word( ws[i] )));
end;

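procedure TMainForm.Open1Click(Sender: TObject);
var
  ws : WideString;
  fs : TFileStream;
begin
  If OpenDialog1.Execute then
  begin
    fs:=TFileStream.Create(OpenDialog1.FileName, fmOpenRead);
    try
      SetLength( ws, fs.size div 2 );
      // An empty string has no ws[1], so skip files shorter than one WideChar
      If Length( ws ) > 0 Then
      begin
        // Read only whole WideChars; a trailing odd byte is ignored
        fs.ReadBuffer( ws[1], Length( ws ) * SizeOf( WideChar ));
        // Reversed byte order mark: the file uses the other byte order
        If ws[1] = #$FFFE Then
          SwapBytesInWideString( ws );
        // Strip the byte order mark so it is not displayed
        If ws[1] = #$FEFF Then
          Delete( ws, 1, 1 );
      end;
      RichEdit1.Text:= ws;
    finally
      fs.free
    end;
  end;
end;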

Peter Below (TeamB) 10011...@compuserve.com)
No replies in private e-mail, please, unless explicitly requested!

