Read UNICODE/ANSI/ASCII Text File to WideString


Morgan

Jan 23, 2006, 2:31:57 AM
What is the fastest way to load a text file into a WideString? I need to
be able to support both ASCII/ANSI and UNICODE (UTF8, UCS2, and UCS4) with
some routines that can read the UNICODE file headers to determine the byte
order encoding of the text data. Are there any ready-made code samples
(free, open source, etc.) out there I could use?

Ralf Junker - http://www.yunqa.de/delphi/

Jan 23, 2006, 5:27:29 AM
Hello Morgan,

DIUnicode: http://www.yunqa.de/delphi/unicode/

A full-blown Unicode package with Unicode reader and writer
classes. See the ReadBOM function for reading the byte order
information. DIUnicode supports 70 character sets / encodings,
including the ones you need. Can be linked against DIConverters
(see below) to support more than 130 conversions.

DIConverters: http://www.yunqa.de/delphi/converters/

A free package of 130+ Unicode <--> MultiByte in-memory
character conversion functions. Can be used to create
your own converter functions and classes.

Regards,

Ralf

Morgan <mor...@nospam.com> wrote:

---
The Delphi Inspiration
http://www.yunqa.de/delphi/

Mike Shkolnik

Jan 23, 2006, 6:17:05 AM
See the sample code:

var
  fs: TFileStream;
  w: Word;
  ws: WideString;
  S: string;
  i: Integer;
begin
  {open file}
  fs := TFileStream.Create(yourFileName, fmOpenRead);

  {stream can contain unicode characters - we must check before parse}
  fs.Read(w, SizeOf(w));
  case w of
    $FEFF, {UTF-16 little endian}
    $FFFE: {UTF-16 big endian}
      begin
        if (fs.Size > fs.Position) then
        begin
          i := fs.Size - fs.Position;
          SetLength(ws, i div 2);
          fs.Read(ws[1], i);
          if (w = $FFFE) then
          begin
            for i := 1 to Length(ws) do
              ws[i] := WideChar(Swap(Word(ws[i])));
          end;
        end;
      end;
  else
    {restore position}
    fs.Seek(-SizeOf(w), soFromCurrent);
    SetString(S, nil, intSize);
    fs.Read(Pointer(S)^, intSize);
    ws := S
  end;

  {close file}
  fs.Free
end;

--
With best regards, Mike Shkolnik
EMail: mshk...@scalabium.com
http://www.scalabium.com

"Morgan" <mor...@nospam.com> wrote in message
news:43d485bf$1...@newsgroups.borland.com...

Cristian Nicola

Jan 23, 2006, 6:25:19 AM
Does this code work?

Cristian Nicola

"Mike Shkolnik" <mshkol...@ukr.net> wrote in message
news:43d4ba56$1...@newsgroups.borland.com...

Cristian Nicola

Jan 23, 2006, 6:56:57 AM
Yes, it does. My mistake :D
I did not realize ws is a WideString; I assumed it was a string, and
everything went wrong from there...

Cristian

"Cristian Nicola" <n_cri...@hotmail.com> wrote in message
news:43d4bbb1$1...@newsgroups.borland.com...

Mike Shkolnik

Jan 23, 2006, 9:39:50 AM
> Does this code work?
Yes, I posted the code from my current project.

Xavier Ind.

Jan 23, 2006, 3:13:36 PM

Missed UTF-8 (EFBBBF) there.

Morgan

Jan 23, 2006, 9:17:05 PM
Xavier Ind. wrote:
> Missed UTF-8 (EFBBBF) there.

I think the two UCS4 encodings are missing as well, but this is a
reasonable starting point.

I will also take a look at the DIUnicode stuff, even though I think it
is overkill for what I need, and I would prefer a small/free method (just a
few lines of code).

Thanks everyone for your suggestions!
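
For reference, the signatures noted as missing (UTF-8 EF BB BF and the two UCS4/UTF-32 forms) could be checked alongside the UTF-16 ones roughly as follows. This is only a sketch: TBomKind and DetectBom are hypothetical names that do not come from the posted code or from DIUnicode, and the caller is assumed to have read up to the first four bytes of the file into a buffer.

```pascal
type
  { Hypothetical enumeration of the byte order marks discussed in this thread. }
  TBomKind = (bkNone, bkUtf8, bkUtf16LE, bkUtf16BE, bkUtf32LE, bkUtf32BE);

{ Hypothetical helper: inspects up to the first four bytes in Buf and reports
  which BOM, if any, they form. Count is the number of valid bytes in Buf;
  BomLen receives the BOM length in bytes (0 when no BOM is found). }
function DetectBom(const Buf: array of Byte; Count: Integer;
  out BomLen: Integer): TBomKind;
begin
  Result := bkNone;
  BomLen := 0;
  { never look past the end of the buffer }
  if Count > Length(Buf) then
    Count := Length(Buf);

  { UTF-32 LE is tested before UTF-16 LE because both start with FF FE }
  if (Count >= 4) and (Buf[0] = $FF) and (Buf[1] = $FE) and
     (Buf[2] = $00) and (Buf[3] = $00) then
  begin
    Result := bkUtf32LE;
    BomLen := 4;
  end
  else if (Count >= 4) and (Buf[0] = $00) and (Buf[1] = $00) and
     (Buf[2] = $FE) and (Buf[3] = $FF) then
  begin
    Result := bkUtf32BE;
    BomLen := 4;
  end
  else if (Count >= 3) and (Buf[0] = $EF) and (Buf[1] = $BB) and
     (Buf[2] = $BF) then
  begin
    Result := bkUtf8;
    BomLen := 3;
  end
  else if (Count >= 2) and (Buf[0] = $FF) and (Buf[1] = $FE) then
  begin
    Result := bkUtf16LE;
    BomLen := 2;
  end
  else if (Count >= 2) and (Buf[0] = $FE) and (Buf[1] = $FF) then
  begin
    Result := bkUtf16BE;
    BomLen := 2;
  end;
end;
```

Note that FF FE 00 00 is ambiguous: it could also be a UTF-16 little-endian file whose first character is U+0000. The sketch treats it as UTF-32, which is the usual convention. Files without a BOM return bkNone and would still need a default such as ANSI.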

Kim S

Feb 1, 2006, 9:22:06 AM
Mike Shkolnik wrote:

Where does "intSize" come from?
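
intSize is not declared in the snippet as posted. It presumably holds the number of bytes to read in the non-Unicode branch, that is, the full stream size once the position has been restored. A minimal sketch under that assumption follows; LoadTextAsWide and ByteCount are hypothetical names, not taken from the original project. It also wraps the stream in try/finally and, like the posted code, handles only UTF-16 with a BOM and ANSI text.

```pascal
uses
  Classes, SysUtils;

{ Hypothetical helper: loads a UTF-16 (with BOM) or ANSI text file into a
  WideString. ByteCount stands in for the undeclared intSize. }
function LoadTextAsWide(const FileName: string): WideString;
var
  fs: TFileStream;
  w: Word;
  S: AnsiString;
  i, ByteCount: Integer;
begin
  Result := '';
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    { w stays 0 (not a BOM) if the file is shorter than two bytes }
    w := 0;
    fs.Read(w, SizeOf(w));
    if (w = $FEFF) or (w = $FFFE) then
    begin
      { UTF-16: read the bytes that follow the BOM }
      ByteCount := fs.Size - fs.Position;
      SetLength(Result, ByteCount div 2);
      if Length(Result) > 0 then
        fs.ReadBuffer(Result[1], Length(Result) * SizeOf(WideChar));
      { $FFFE means the file is big endian: swap each code unit }
      if w = $FFFE then
        for i := 1 to Length(Result) do
          Result[i] := WideChar(Swap(Word(Result[i])));
    end
    else
    begin
      { no BOM: rewind and read the whole file as ANSI text }
      fs.Position := 0;
      ByteCount := fs.Size;
      SetLength(S, ByteCount);
      if ByteCount > 0 then
        fs.ReadBuffer(Pointer(S)^, ByteCount);
      { implicit conversion uses the system ANSI code page }
      Result := S;
    end;
  finally
    fs.Free;
  end;
end;
```

A call such as ws := LoadTextAsWide(yourFileName) would then replace the inline block.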
