Unicode Implementation
Issac Alphonso
Newsgroups: comp.lang.c++.moderated
From: Issac Alphonso <alpho...@isip.msstate.edu>
Date: 1999/01/16
Subject: Unicode Implementation

Hi,

Our group  is moving towards using  unicode characters for  all i/o in
our systems. We are looking for a standard implementation of a unicode
class which handles  all of the basic string  functions in the context
of unicode characters. Could any of you point us to such a class or to
any other information regarding this?

Thanks for all your help in advance.

Best regards,

      Issac Alphonso
      Institute for Signal and Information Processing
      WWW:   http://www.isip.msstate.edu/

      [ Send an empty e-mail to c++-h...@netlab.cs.rpi.edu for info ]
      [ about comp.lang.c++.moderated. First time posters: do this! ]


Paul Grealish
Newsgroups: comp.lang.c++.moderated
From: Paul Grealish <paul.greal...@uk.geopak-tms.com>
Date: 1999/01/18
Subject: Re: Unicode Implementation

Issac Alphonso wrote:

> Hi,

> Our group  is moving towards using  unicode characters for  all i/o in
> our systems. We are looking for a standard implementation of a unicode
> class which handles  all of the basic string  functions in the context
> of unicode characters. Could any of you point us to such a class or to
> any other information regarding this?

Have you looked at std::wstring?
It's a specialization of template class
basic_string for elements of type wchar_t.
wchar_t is the 16-bit wide character (aka
Unicode) data type.
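
For readers who have not used it, a minimal sketch of what working with std::wstring looks like. The helper names count_char and join are made up for this illustration, and nothing here depends on how wide wchar_t happens to be on a given platform.

```cpp
#include <cstddef>
#include <string>

// Count how many times a wide character occurs in a wide string.
std::size_t count_char(const std::wstring& text, wchar_t ch) {
    std::size_t n = 0;
    for (std::wstring::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == ch) ++n;
    }
    return n;
}

// Join two wide strings with a separator, exactly as with std::string.
std::wstring join(const std::wstring& a, const std::wstring& b) {
    return a + L", " + b;
}
```

The interface is the same as std::string's; only the element type and the L prefix on literals differ.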

--
+---------------------------------+
|         Paul Grealish           |
|       GEOPAK-TMS Limited        |
|       Cambridge, England        |
| paul.greal...@uk.geopak-tms.com |
+---------------------------------+

james.kanze
Newsgroups: comp.lang.c++.moderated
From: James.Ka...@dresdner-bank.com
Date: 1999/01/18
Subject: Re: Unicode Implementation
In article <36A32C29....@uk.geopak-tms.com>,

Correction: wchar_t may be a 16-bit wide character.  It may also be an
8-bit wide character.  The standard makes no guarantees.

Realistically, *if* the implementation supports Unicode, I would expect
it to use wchar_t to do so.
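
Since the width is implementation-defined, code that needs 16 bits can check rather than assume. A small sketch (the function names are invented for this example, and constexpr requires a C++11 or later compiler):

```cpp
#include <climits>
#include <cstddef>

// Number of bits in one wchar_t on this implementation.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// True if a single wchar_t is wide enough for any 16-bit code unit.
constexpr bool wchar_holds_16_bits() {
    return wchar_bits() >= 16;
}
```

A build that depends on the answer could put wchar_holds_16_bits() in a static_assert so that an unsuitable platform fails at compile time.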

--
James Kanze                                           GABI Software, Sàrl
Conseils en informatique orienté objet  --
                          --  Beratung in industrieller Datenverarbeitung
mailto: ka...@gabi-soft.fr          mailto: James.Ka...@dresdner-bank.com

Sean Dynan
Newsgroups: comp.lang.c++.moderated
From: sdy...@cccgroup.co.uk (Sean Dynan)
Date: 1999/01/19
Subject: Re: Unicode Implementation

You wrote:
> > Our group  is moving towards using  unicode characters for  all i/o in
> > our systems. We are looking for a standard implementation of a unicode
> > class which handles  all of the basic string  functions in the context
> > of unicode characters. Could any of you point us to such a class or to
> > any other information regarding this?

> Have you looked at std::wstring?
> It's a specialization of template class
> basic_string for elements of type wchar_t.
> wchar_t is the 16-bit wide character (aka
> Unicode) data type.

Using the wchar_t type is fine, but hard-wires the finished binary for
Unicode strings.

Are you developing for Windows NT? If so, you can write code which can be
rebuilt to cater for either ANSI or Unicode strings with the flip of a
couple of define's.

You could start by typedef'ing some string classes like this:

typedef std::basic_string<_TCHAR> TString;
typedef std::basic_ostringstream<_TCHAR> TStringStream;

If _UNICODE is defined (e.g. project settings or hard-wired into the
source code), TString and TStringStream become Unicode string classes and
all template operations work a treat.  If _UNICODE is undefined, TString
and TStringStream become 8-bit string classes as per usual.

If _UNICODE is defined, UNICODE needs to be defined too so the Win32
Unicode API gets called instead of the ANSI API.

The string-handling C run-time functions should be replaced with their
text-mapped equivalents (e.g., strlen() is replaced by _tcslen()).
Character variables should be declared using the _TCHAR type (e.g. _TCHAR
strBuf[64]).  String literals should be wrapped in the _T() or _TEXT()
text-mapping macros (e.g. _TCHAR mystring[] = _T("This is a string")).  On
occasion you may find yourself having to convert between 8-bit and 16-bit
character arrays (and vice-versa) using the "wcstombs()" and "mbstowcs()"
run time routines.  Using the above approach, defining _UNICODE and
rebuilding will generate a Unicode binary.  Undefining _UNICODE will
result in an ANSI string binary.
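
The two conversion routines mentioned above are standard C, so they can be sketched without any Windows headers. The wrapper names widen and narrow are invented for this example; both rely on the current C locale and stop at the first embedded null character.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Multibyte -> wide, using the current C locale.
// Returns an empty string if the input is not valid in that locale.
std::wstring widen(const std::string& in) {
    std::vector<wchar_t> buf(in.size() + 1);  // at most one wchar_t per byte
    std::size_t n = std::mbstowcs(&buf[0], in.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::wstring();
    return std::wstring(&buf[0], n);
}

// Wide -> multibyte, using the current C locale.
// Returns an empty string if a character cannot be represented.
std::string narrow(const std::wstring& in) {
    std::vector<char> buf(in.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(&buf[0], in.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::string();
    return std::string(&buf[0], n);
}
```

In the default "C" locale only plain ASCII text is certain to round-trip; anything else depends on the locale selected with setlocale().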

There is lots of help regarding this in the MSDN help libraries, although
you have to trawl through them for a while to make sense of it all.

Good luck.
__________
Sean Dynan
Senior Software Analyst
C-C-C Technology Ltd
sdy...@cccgroup.co.uk

John Duncan
Newsgroups: comp.lang.c++.moderated
From: "John Duncan" <jdds...@srg.psych.pitt.edu>
Date: 1999/01/19
Subject: Re: Unicode Implementation
Er, the typical wchar_t is actually UCS-2, or a two-octet character set,
which is able to represent most of the world's characters. The standard
also defines UCS-4, a four-octet character set, which does represent
all of the characters in the world.

Once you have everything represented internally in UCS-n, you have to
provide I/O in UTF-8 format for compatibility with legacy tools and
with other UTF-8 sources and sinks. UTF-8 is an encoding method that
is able to encode all 128 ASCII characters using one octet apiece
and contains escape bits to represent the remaining combinations,
with some characters reaching 5 or 6 octets (I can't remember). This
provides a compact and compatible solution. I'm not sure if it is
compatible with IBM's DBCS, but it would be nice if it were.
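
To make the encoding concrete, here is a sketch of a UTF-8 encoder for a single code point. The function name utf8_encode is invented for this example. It follows the present-day definition of UTF-8, which stops at U+10FFFF and therefore needs at most four octets; the original ISO 10646 scheme allowed sequences of up to six.

```cpp
#include <string>

// Encode one Unicode code point (U+0000..U+10FFFF) as UTF-8.
// Returns an empty string for surrogate values and out-of-range input.
std::string utf8_encode(unsigned long cp) {
    std::string out;
    if (cp <= 0x7F) {
        // ASCII: one octet, value unchanged.
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // Two octets: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are not characters
        // Three octets: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // Four octets: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For instance, utf8_encode(0x41) yields the single octet "A", while U+20AC (the euro sign) becomes the three octets E2 82 AC.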

The unicode standard is on the web at:

http://www.unicode.org/

and there is a C++ library called Rosette at:

http://unicode.basistech.com/

I haven't evaluated it, but it supports UCS-2 and UTF-n
formats, and also conversion between a variety of character sets.

-John

Alex Martelli
Newsgroups: comp.lang.c++.moderated
From: "Alex Martelli" <a...@magenta.com>
Date: 1999/01/20
Subject: Re: Unicode Implementation
Alan Bellingham wrote in message <36aaafbb.340110...@news.lspace.org>...

    [snip]

>>Are you developing for Windows NT? If so, you can write code which can be
>>rebuilt to cater for either ANSI or Unicode strings with the flip of a
>>couple of define's.

>But why bother? If you've coped with the concept of UTF-16 [1], why not
>stick with it?

>If you've decided there's benefit in going up to 16-bit chars, why not
>stick with it. Allowing a compiler switch to flip between them is going
>to lead to some very subtle and hard to find bugs, if you're not very
>careful.

For most projects, at any time you might find yourself faced with
a need to port the code to environments that do not support UTF-16
as well as one would wish; for example, code developed for NT might
with short notice be required to be ported to Win98, etc etc.  If you
are using "bare" wchar_t, the porting can then be very troublesome.

This seems like a classic situation for using #ifdef:

#include <string>

#ifdef NO_UNICODE
typedef char char_t;
#else
typedef wchar_t char_t;
#endif
typedef std::basic_string<char_t> string_t;
// and so on

A few typedef's, and perhaps a few templates with specialization,
definition of const's, etc, can make it decently easy to port.

The NT tricks are a bit less elegant (lot of preprocessor use,
since C is supported as well as C++), so, more care may indeed
be needed to avoid "subtle bugs", but the basic idea is the same,
and quite usable (the advantage is that all of the boilerplate code
has been written for you already, in <tchar.h> etc...).  Still, rolling
your own will ease possible future porting to non-Win32 platforms,
and, since ease of porting is the whole idea here, the investment
(which is not all that much) can easily repay itself.
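
A sketch of how the roll-your-own typedefs might be used in practice. The literal macro TEXT_LIT and the function greet are invented for this illustration; the macro plays the role that _T() plays in <tchar.h> and, as written, only works when its argument is a string or character literal.

```cpp
#include <string>

#ifdef NO_UNICODE
typedef char char_t;
#define TEXT_LIT(s) s        // narrow build: literal left as-is
#else
typedef wchar_t char_t;
#define TEXT_LIT(s) L##s     // wide build: paste an L prefix onto the literal
#endif
typedef std::basic_string<char_t> string_t;

// Application code is written once against char_t and string_t.
string_t greet(const string_t& name) {
    return string_t(TEXT_LIT("Hello, ")) + name;
}
```

Compiling with or without NO_UNICODE changes the element type of string_t, but the source of greet stays the same.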

Alex

Paul Grealish
Newsgroups: comp.lang.c++.moderated
From: Paul Grealish <paul.greal...@uk.geopak-tms.com>
Date: 1999/01/21
Subject: Re: Unicode Implementation

I think you'd be better off using the preprocessor
in the same way that the <tchar.h> header does.
Create a header file that maps all the character-type
C++ entities (example given at end).

> If _UNICODE is defined, UNICODE needs to be defined too so the Win32
> Unicode API gets called instead of the ANSI API.

You should not define the symbol UNICODE (with no
underscore) yourself.  You should only ever set
the symbol _UNICODE (with underscore).  The Win32
API headers will internally set UNICODE depending
on whether _UNICODE is set.
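
Whether a particular set of headers derives one symbol from the other depends on the toolchain, so a project that wants to be sure can make the pairing explicit in a prefix header included before anything else. The following is only a sketch and is not taken from any SDK; the two query functions are invented so the result can be inspected, and constexpr needs a C++11 or later compiler.

```cpp
// The C run-time's <tchar.h> mappings key off _UNICODE, while the
// Win32 API headers key off UNICODE. Define whichever one is missing
// so the two can never disagree within a translation unit.
#if defined(_UNICODE) && !defined(UNICODE)
#define UNICODE
#endif
#if defined(UNICODE) && !defined(_UNICODE)
#define _UNICODE
#endif

// Hypothetical helpers reporting which switches are in effect.
constexpr bool unicode_defined() {
#ifdef UNICODE
    return true;
#else
    return false;
#endif
}

constexpr bool underscore_unicode_defined() {
#ifdef _UNICODE
    return true;
#else
    return false;
#endif
}

static_assert(unicode_defined() == underscore_unicode_defined(),
              "UNICODE and _UNICODE must be set together");
```

With this in place, setting either symbol in the project settings is enough, and a mismatch cannot slip through.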

--
+---------------------------------+
|         Paul Grealish           |
|       GEOPAK-TMS Limited        |
|       Cambridge, England        |
| paul.greal...@uk.geopak-tms.com |
+---------------------------------+

#ifdef _UNICODE
#define _tstring                std::wstring
#define _tcin                   std::wcin
#define _tcout                  std::wcout
#define _tcerr                  std::wcerr
#define _tclog                  std::wclog
#define _tios                   std::wios
#define _tstreambuf             std::wstreambuf
#define _tistream               std::wistream
#define _tostream               std::wostream
#define _tiostream              std::wiostream
#define _tstringbuf             std::wstringbuf
#define _tistringstream         std::wistringstream
#define _tostringstream         std::wostringstream
#define _tstringstream          std::wstringstream
#define _tfilebuf               std::wfilebuf
#define _tifstream              std::wifstream
#define _tofstream              std::wofstream
#define _tfstream               std::wfstream
#else
#define _tstring                std::string
#define _tcin                   std::cin
#define _tcout                  std::cout
#define _tcerr                  std::cerr
#define _tclog                  std::clog
#define _tios                   std::ios
#define _tstreambuf             std::streambuf
#define _tistream               std::istream
#define _tostream               std::ostream
#define _tiostream              std::iostream
#define _tstringbuf             std::stringbuf
#define _tistringstream         std::istringstream
#define _tostringstream         std::ostringstream
#define _tstringstream          std::stringstream
#define _tfilebuf               std::filebuf
#define _tifstream              std::ifstream
#define _tofstream              std::ofstream
#define _tfstream               std::fstream
#endif

Sean Dynan
Newsgroups: comp.lang.c++.moderated
From: sdy...@cccgroup.co.uk (Sean Dynan)
Date: 1999/01/21
Subject: Re: Unicode Implementation

You wrote:
> sdy...@cccgroup.co.uk (Sean Dynan) wrote:

> >Using the wchar_t type is fine, but hard-wires the finished binary for
> >Unicode strings.

> >Are you developing for Windows NT? If so, you can write code which can be
> >rebuilt to cater for either ANSI or Unicode strings with the flip of a
> >couple of define's.

> But why bother? If you've coped with the concept of UTF-16 [1], why not
> stick with it?

Because it's one less headache when the marketing dept., for example,
decides the product should also run on Windows 95.
--
__________
Sean Dynan
Senior Software Analyst
C-C-C Technology Ltd
sdy...@cccgroup.co.uk

John Duncan
Newsgroups: comp.lang.c++.moderated
From: "John Duncan" <jdds...@srg.psych.pitt.edu>
Date: 1999/01/21
Subject: Re: Unicode Implementation

>But why bother? If you've coped with the concept of UTF-16 [1], why not
>stick with it?

You must mean UCS-2. I don't believe that NT uses UTF-16 very much.
NT tends to use UTF-8 for storage, so that the transformation to
ANSI display terminals is relatively straightforward. Remember that
Windows 95 supports ANSI and MBCS but not Unicode. Conversion from
ANSI to UTF-8 is direct.

UCS-2 is the two-octet character set. UTF-16 is a 16-bit transformation
format for all unicode encodings, including UCS-1, which does not
support much internationalization.

-John

Thiemo Seufer
Newsgroups: comp.lang.c++.moderated
From: "Thiemo Seufer" <seu...@csv.ica.uni-stuttgart.de>
Date: 1999/01/22
Subject: Re: Unicode Implementation

John Duncan wrote in message <78593a$lq...@usenet01.srv.cis.pitt.edu>...
>>But why bother? If you've coped with the concept of UTF-16 [1], why not
>>stick with it?

>You must mean UCS-2. I don't believe that NT uses UTF-16 very much.
>NT tends to use UTF-8 for storage, so that the transformation to
>ANSI display terminals is relatively straightforward. Remember that
>Windows 95 supports ANSI and MBCS but not Unicode. Conversion from
>ANSI to UTF-8 is direct.

No. Conversion from ASCII (with MSB unset) to UTF-8 is direct. Characters
not fitting in there are mapped to a multi-byte representation.

Thiemo Seufer

B. K. Oxley (binkley) at Home
Newsgroups: comp.lang.c++.moderated
From: "B. K. Oxley (binkley) at Home" <bink...@bigfoot.com>
Date: 1999/01/28
Subject: Re: Unicode Implementation

Even though you are discussing specifically Win32 platforms, I should
still point out for the benefit of others reading this thread that most
other 32-bit operating systems define wchar_t to be 32 bits wide, not
16 bits.

If you are considering writing cross-platform code, one of the great
shortcomings of standard C++ is the lack of a UNICODE character type
(such as runes on Plan 9, for example).  Of course, wchar_t pretty much
precedes the gradual standardization on UNICODE.

Further, the issue of 16- v. 32-bit representation of UTF16 overlooks
important issues such as support for surrogates (such as Klingon :-) and
gaiji characters (Japanese), which fall outside of the first plane in
UNICODE, and thus require the full UCS-4 (32-bit) representation (unless
one is willing to use multi-character 16-bit sequences).

This particular problem has bitten my employer (Inso Corporation) in
supporting XML across platforms.  I finally recommended using a
specialization of basic_string combined with an inhouse-defined unsigned
16-bit integral type.
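
A sketch of what a fixed 16-bit string looks like when code points beyond the first plane have to be stored as multi-character sequences (surrogate pairs). It uses char16_t and std::u16string, which were added to the language in C++11 and stand in here for the in-house 16-bit type and basic_string specialization; the function name append_utf16 is invented for this example.

```cpp
#include <string>

// Append one code point to a UTF-16 string.
// Returns false, appending nothing, for surrogate values and for
// anything beyond U+10FFFF.
bool append_utf16(std::u16string& out, char32_t cp) {
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;
    if (cp > 0x10FFFF) return false;
    if (cp < 0x10000) {
        // First plane: one 16-bit code unit.
        out += static_cast<char16_t>(cp);
        return true;
    }
    // Other planes: split the 20-bit offset into a surrogate pair.
    cp -= 0x10000;
    out += static_cast<char16_t>(0xD800 + (cp >> 10));    // high surrogate
    out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
    return true;
}
```

The consequence for string code is that size() counts 16-bit units, not characters: a string holding "A" followed by U+1D11E has a size of three.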

Take a gander at the UNICODE Consortium's standard (version 2.1 is
current) for more details at http://www.unicode.org.

--binkley
