Reading Unicode text files.

David Webber

Jan 14, 2010, 7:25:10 AM
I am getting increasingly frustrated at the shortcomings of CStdioFile for
reading Unicode text files (unless I've missed something).

I've found a nice looking class on CodeProject
<http://www.codeproject.com/KB/files/textfiledocument.aspx> but it is
several years old.

Has something been introduced into MFC to do this sort of thing in the last
couple of Visual Studio releases? If not, is there something in the Win32
API or CRT?

Dave
--
David Webber
Mozart Music Software
http://www.mozart.co.uk
For discussion and support see
http://www.mozart.co.uk/mozartists/mailinglist.htm

Giovanni Dicanio

Jan 14, 2010, 7:34:07 AM
"David Webber" <da...@musical-dot-demon-dot-co.uk> wrote in message
news:uvxUuSRl...@TK2MSFTNGP04.phx.gbl...

> I am getting increasingly frustrated at the shortcomings of CStdioFile for
> reading Unicode text files (unless I've missed something).
> I've found a nice looking class on CodeProject
> <http://www.codeproject.com/KB/files/textfiledocument.aspx> but it is
> several years old.
>
> Has something been introduced into MFC to do this sort of thing in the
> last couple of Visual Studio releases? If not, is there something in
> the Win32 API or CRT?

I would suggest CStdioFileEx on CodeProject:

http://www.codeproject.com/KB/files/stdiofileex.aspx

Moreover, since VS2005 the CRT has Unicode support in _open and _wopen;
see the _O_U16TEXT, _O_U8TEXT and _O_WTEXT flags.

http://msdn.microsoft.com/en-us/library/z0kc8e3z(VS.80).aspx

And, FWIW, some time ago, I shared some code to read and write UTF8 on
CodeGallery as well:

http://code.msdn.microsoft.com/UTF8Helpers

HTH,
Giovanni

David Webber

Jan 14, 2010, 12:35:06 PM
"Giovanni Dicanio" <giovanniD...@REMOVEMEgmail.com> wrote in message
news:OpENjXRl...@TK2MSFTNGP02.phx.gbl...

> "David Webber" <da...@musical-dot-demon-dot-co.uk> wrote in message
> news:uvxUuSRl...@TK2MSFTNGP04.phx.gbl...
>
>> I am getting increasingly frustrated at the shortcomings of CStdioFile
>> for reading Unicode text files (unless I've missed something).
>> I've found a nice looking class on CodeProject
>> <http://www.codeproject.com/KB/files/textfiledocument.aspx> but it is
>> several years old.
>>
>> Has something been introduced into MFC to do this sort of thing in the
>> last couple of Visual Studio releases? If not, is there something in
>> the Win32 API or CRT?
>
> I would suggest CStdioFileEx on CodeProject:
>
> http://www.codeproject.com/KB/files/stdiofileex.aspx

Thanks Giovanni, I'll look at that one too.

> Moreover, since VS2005 the CRT has Unicode support in _open and _wopen;
> see the _O_U16TEXT, _O_U8TEXT and _O_WTEXT flags.

Thanks - I've just discovered it in fopen() and friends too, with an
enhanced 'mode' parameter. The documentation is about as clear as mud when
it comes to binary or text mode, but a bit of experimentation should sort
that out.

> http://msdn.microsoft.com/en-us/library/z0kc8e3z(VS.80).aspx
>
> And, FWIW, some time ago, I shared some code to read and write UTF8 on
> CodeGallery as well:
>
> http://code.msdn.microsoft.com/UTF8Helpers

I'll try to remember that too, but at the moment I only need to read
single-byte text files with the Western European character set, or UTF-16
(in fact more specifically UCS2). (I have a database of musical instrument
properties in a .txt file which comes in different language versions. So
far single byte stuff with the Western character set has been fine, but I
want to add a Czech version. UCS2 will be fine for this. All I need is
code to read either encoding transparently into wchar_t strings.)
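
The requirement above (read either encoding transparently into wchar_t strings) can be sketched in portable C++ as follows. This is only an illustration, not code from any of the libraries mentioned in the thread: the names DecodeTextBytes and ReadTextFile are hypothetical, UTF-16LE is recognised purely by its FF FE byte-order mark, and single-byte text is widened as Latin-1, which is an assumption (Windows-1252 differs from Latin-1 in the range 0x80 to 0x9F). Surrogate pairs are not combined, so it covers UCS-2 only.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): decode a complete file image.
// A leading FF FE byte-order mark selects UTF-16LE; anything else is treated
// as single-byte text and widened byte for byte (assumption: Latin-1).
std::wstring DecodeTextBytes(const std::vector<unsigned char>& bytes)
{
    std::wstring text;
    const bool isUtf16le =
        bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE;

    if (isUtf16le) {
        // Skip the BOM, then join each low/high byte pair into one code unit.
        // A trailing odd byte, if any, is ignored.
        for (std::size_t i = 2; i + 1 < bytes.size(); i += 2) {
            text.push_back(
                static_cast<wchar_t>(bytes[i] | (bytes[i + 1] << 8)));
        }
    } else {
        for (unsigned char b : bytes) {
            text.push_back(static_cast<wchar_t>(b));
        }
    }
    return text;
}

// Hypothetical helper: read the file in binary mode so that no newline or
// character-set translation happens before the BOM check.
// Returns an empty string if the file cannot be opened.
std::wstring ReadTextFile(const char* path)
{
    assert(path != nullptr);
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return std::wstring();
    }
    std::vector<unsigned char> bytes(
        (std::istreambuf_iterator<char>(in)),
        std::istreambuf_iterator<char>());
    return DecodeTextBytes(bytes);
}
```

Because the file is read in binary mode, "\r\n" line endings stay in the result and would need to be handled when splitting the text into lines.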

zeromem

Jan 14, 2010, 1:16:07 PM
Also worth a look is EZUTF, which is non-MFC.

http://www.codeproject.com/KB/files/EZUTF.aspx

Tom Serface

Jan 15, 2010, 1:31:42 AM
Hi David,

I agree and I also use the CStdioFileEx from
http://www.codeproject.com/KB/files/stdiofileex.aspx although I've redone a
lot of it because I needed UTF-8 as well (it is a really good starting
point). I wish they'd put something like this into MFC's version.

Tom

David Webber

Jan 15, 2010, 2:26:00 PM
Thanks again everyone for the various replies.

CStdioFileEx looks good and I plan to explore the other one too. For the
moment, I know the lines in the files I'm interested in have a finite
maximum length, and I have adopted the quick and dirty solution, which, for
the record, is:

FILE *fp = _wfopen( pszFilePath, L"rt, ccs=UNICODE" );

wchar_t szBuffer[512];

and then

fgetws( szBuffer, 511, fp );

until finished.

Not for C++ purists, I know <vbg>, but opening the file in this mode reads
an ANSI file if there's no BOM and a UTF-16LE file if the appropriate BOM
is there. This is sufficient for my current purposes.
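
For readers who want the "until finished" loop spelled out, here is a minimal sketch built around the same fgetws call. The function name ReadAllLines is hypothetical, and the stream is assumed to have been opened already (on MSVC, for example, with the _wfopen call shown above). Lines longer than the buffer are returned in several pieces, which matches the stated assumption that line length is bounded.

```cpp
#include <cassert>
#include <cstdio>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): collect every line from an
// already-open text stream, dropping the trailing newline of each line.
//
// Possible MSVC-only usage, following the post above:
//   FILE* fp = _wfopen(pszFilePath, L"rt, ccs=UNICODE");
//   if (fp) { auto lines = ReadAllLines(fp); fclose(fp); }
std::vector<std::wstring> ReadAllLines(std::FILE* fp)
{
    assert(fp != nullptr);

    std::vector<std::wstring> lines;
    wchar_t szBuffer[512];

    // fgetws stores at most count - 1 characters plus the terminator, so the
    // full buffer size can be passed. It returns a null pointer at end of
    // file or on error, which ends the loop.
    while (std::fgetws(szBuffer, 512, fp) != nullptr) {
        std::wstring line(szBuffer);
        if (!line.empty() && line.back() == L'\n') {
            line.pop_back();
        }
        lines.push_back(line);
    }
    return lines;
}
```

Checking the return value of fgetws, rather than testing feof before each read, avoids processing a stale buffer after the last line.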

Giovanni Dicanio

Jan 15, 2010, 4:30:19 PM

"David Webber" <da...@musical-dot-demon-dot-co.uk> wrote in message
news:exQ4Ojhl...@TK2MSFTNGP04.phx.gbl...

> Not for C++ purists, I know <vbg>, but opening the file in this mode
> reads an ANSI file if there's no BOM and a UTF-16LE file if the
> appropriate BOM is there. This is sufficient for my current purposes.

The C++ I/O stream library is very slow compared with both the CRT (which
you are using) and, of course, Win32 memory-mapped files.

I see nothing wrong in using C FILE* instead of C++ I/O streams in C++ code:
just use the right tool for the job :)

Giovanni

Tom Serface

Jan 15, 2010, 6:44:29 PM
I say, whatever works for you and keeps you using MFC ... works for me :o)

Tom