I am a novice to this newsgroup so forgive me if I'm just warming up
something that has been discussed at length already (in which case I'd
love to get a pointer to those discussions).
The standard (or at least the last draft) says that there must be a
char_traits specialization for each char type used with basic_string
(and presumably also basic_istream etc.). Two such specializations are
required, namely for char and wchar_t. I assume that the aim is that
more specializations can be provided as needed.
The char_traits class provides typedefs for char_type (the character
type itself) and int_type (a 'larger' type that can hold additional
values distinct from any legal character value, i.e. EOF). That implies
to me that char_traits specializations for 'super-wide' characters that
have the value range of a long are not possible. So in a typical
implementation with 32 bit long and 8 bit char, wchar_t must be less
than 32 bit or else the standard can not be obeyed. If I wanted to
provide a super_wchar_t to hold ISO 32-bit characters, I could not do
it in a standard-conforming way. Am I right or am I missing something?
Would it not be a good idea to prevent the definition of strings or
streams for a character type that has no char_traits specialization? Or
is this impractical? My VC++ 6.0 SP3 allows
std::basic_ifstream<unsigned char> definitions without a hiccup, but
with an int_type that is the same as the char_type, such that EOF has a
representation that is also a legal character value. The consequence is
that reading a 0xFF byte from the file gives me an end-of-file
condition. Would you call that a bug or is that in fact legal?
Isn't it rather inelegant to require char_traits to define an int_type
anyway, given that there is no automatic way to get the next-larger
type? It means that each potential character type needs its own
explicit specialization of the char_traits class. I know that using a
larger type to hold characters plus special conditions has a legacy in
the C library (getchar() returns an int), but in C++ land I would
consider that an ugly hack. Why has it been done that way?
Thanks for any wisdom you can share.
Stefan
Sent via Deja.com http://www.deja.com/
Share what you know. Learn what you don't.
[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
You're missing two things:
1) It's permissible for char_type and int_type to be synonyms, so long
as the EOF code is not a valid character code. Admittedly, this is
graceless, since it is not transparent, but there's precedent for it. The
C Standard permits wchar_t and wint_t to be synonyms if there is a
spare code (such as (wchar_t)-1) to use for WEOF.
2) Template class char_traits is one of those half-witted place holders
you find from time to time in the Standard C++ library. Explicit specializations
give you the right answers for char and wchar_t, the two types widely
supported in the library. But an arbitrary specialization of the template is
likely to be only approximately useful. You're more or less expected to
explicitly specialize this template for any character type you use.
>Would it not be a good idea to prevent the definition of strings or
>streams for a character type that has no char_traits specialization? Or
>is this impractical? My VC++ 6.0 SP3 allows
>std::basic_ifstream<unsigned char> definitions without a hiccup, but
>with an int_type that is the same as the char_type, such that EOF has a
>representation that is also a legal character value. The consequence is
>that reading a 0xFF byte from the file gives me an end-of-file
>condition. Would you call that a bug or is that in fact legal?
It's a feature.
>Isn't it rather inelegant to require char_traits to define an int_type
>any way, given that there is no automatic way to get the next-larger
>type? It means that each potential character type needs its own
>explicit specialization of the char_traits class. I know that using a
>larger type to hold characters plus special conditions has a legacy in
>the C library (getchar() returns an int), but in C++ land I would
>consider that an ugly hack. Why has it been done that way?
It was done as a way to templatize the iostreams and string classes
as expeditiously as possible. I think it's a pretty good first (or even
second) draft, but it was standardized before it was tried out in real
life. A bit more refining in the field would doubtless have led to a
better spec.
P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
> Hi,
> I am a novice to this newsgroup so forgive me if I'm just warming up
> something that has been discussed at length already (In which case I'd
> love to get a pointer to those discussions).
> The standard (or at least the last draft) says that there must be a
> char_traits specialization for each char type used with basic_string
> (and presumably also basic_istream etc.). Two such specializations are
> required, namely for char and wchar_t. I assume that the aim is that
> more specializations can be provided as needed.
Right
> The char_traits class provides typedefs for char_type (the character
> type itself) and int_type (a 'larger' type that can hold additional
> values distinct from any legal character value, i.e. EOF). That implies
> to me that char_traits specializations for 'super-wide' characters that
> have the value range of a long are not possible. So in a typical
> implementation with 32 bit long and 8 bit char, wchar_t must be less
> than 32 bit or else the standard can not be obeyed. If I wanted to
> provide a super_wchar_t to hold ISO 32-bit characters, I could not do
> it in a standard-conforming way. Am I right or am I missing something?
You missed something quite trivial - int_type must be something that can
hold values distinct from any char_type value. That's all. There is no
requirement whatsoever that states that int_type should be a built-in
type here, so if it is a matching_int_type<iso_char_t> you defined before,
it is just fine.
[snip]
> Isn't it rather inelegant to require char_traits to define an int_type
> any way, given that there is no automatic way to get the next-larger
> type? It means that each potential character type needs its own
> explicit specialization of the char_traits class. I know that using a
> larger type to hold characters plus special conditions has a legacy in
> the C library (getchar() returns an int), but in C++ land I would
> consider that an ugly hack. Why has it been done that way?
Exactly because it is impossible for basic_string to determine an
appropriate int_type class, char_traits is required. It provides anyone
who absolutely, positively needs to add a char type with the means to
do so. Try to find an alternative, if you like. The requirements are
that you can express the extension char type in portable C++ without
compiler aid. We need to store all possible values that a variable of
new type char_T can hold, plus some. What type can hold it?
char[sizeof(char_T)+1] is not good enough - char_T could be POD, but
relying on that is not too smart an idea. Writing
char_traits<char_T>::int_type makes it portable.
Michiel Salters
So I have two choices when I write a new char_traits specialization:
1. Make char_type and int_type synonyms
2. Make int_type 'larger' than char_type (if possible)
The drawback of the first choice is that not all character sets will be
compatible with it since they may have a legal character that overlaps
with the value for eof(). It means that the character set(s) must be
known in advance to be able to make that decision. (BTW is 0xFFFF a
legal Unicode character? I don't know Unicode well enough...)
The drawback of the second choice is that my desired char_type may
already be as 'large' as the largest integer type, making the choice
impossible.
> 2) Template class char_traits is one of those half-witted place holders
> you find from time to time in the Standard C++ library. Explicit
> specializations give you the right answers for char and wchar_t, the two
> types widely supported in the library. But an arbitrary specialization
> of the template is likely to be only approximately useful. You're more
> or less expected to explicitly specialize this template for any
> character type you use.
My feeling is that it is not the char_traits template mechanism that is
half-witted, but the decision to continue with the C habit of encoding
EOF as a special value into a character. The C library started with
that when there were no wide characters and an int could be expected to
be larger than a char. I can sort of understand why the I18N amendments
to the C standard decided to apply the same trick to wide characters
(it keeps everything uniform), but why they allowed wchar_t and wint_t
to be synonyms I don't know, since it makes getchar() and getwchar()
behave differently. The only explanation I have is that in some odd
cases int and char also might be the same size (thereby possibly
breaking code using getchar()). So, ok, the problem stems from the C
library, but was it a good decision to repeat it in the C++ library?
> >Would it not be a good idea to prevent the definition of strings or
> >streams for a character type that has no char_traits specialization?
> >Or is this impractical? My VC++ 6.0 SP3 allows
> >std::basic_ifstream<unsigned char> definitions without a hiccup, but
> >with an int_type that is the same as the char_type, such that EOF has
> >a representation that is also a legal character value. The consequence
> >is that reading a 0xFF byte from the file gives me an end-of-file
> >condition. Would you call that a bug or is that in fact legal?
>
> It's a feature.
A feature I find difficult to like.
> >Isn't it rather inelegant to require char_traits to define an int_type
> >any way, given that there is no automatic way to get the next-larger
> >type? It means that each potential character type needs its own
> >explicit specialization of the char_traits class. I know that using a
> >larger type to hold characters plus special conditions has a legacy in
> >the C library (getchar() returns an int), but in C++ land I would
> >consider that an ugly hack. Why has it been done that way?
>
> It was done as a way to templatize the iostreams and string classes
> as expeditiously as possible. I think it's a pretty good first (or even
> second) draft, but it was standardized before it was tried out in real
> life. A bit more refining in the field would doubtless have led to a
> better spec.
The library ain't that bad, really, but the trouble is that now that it
is a standard, it is rather hard to sort out those rough edges.
Stefan
I think the real problem is that the distinction between text IO and
binary IO is not clear enough. For text IO, using a singular value for
EOF works well enough, as long as a value is available. In practice, it
can easily be made available for narrow character sets, by using a wider
type for int_type, and it is hard to imagine a character set with 16 or
more bits which effectively uses every single code possible. For binary
IO, generally, only byte oriented IO makes sense at the lowest
(streambuf) level, and for this, int_type can be made sufficiently
large. However, for binary IO, the use of a singular EOF is normally
less appropriate.
As Plauger points out, the solution is not particularly elegant. In C++,
the "elegant" solution would be to define a Fallible class (as per
Barton and Nackman), and use Fallible< charT > instead of int_type. On
the other hand, this would have a noticeable effect on performance in
most implementations, and the singular value for EOF has enormous
historical precedent, and works in practice.
> >Would it not be a good idea to prevent the definition of strings or
> >streams for a character type that has no char_traits specialization?
> >Or is this impractical? My VC++ 6.0 SP3 allows
> >std::basic_ifstream<unsigned char> definitions without a hiccup, but
> >with an int_type that is the same as the char_type, such that EOF has
> >a representation that is also a legal character value. The consequence
> >is that reading a 0xFF byte from the file gives me an end-of-file
> >condition. Would you call that a bug or is that in fact legal?
>
> It's a feature.
Which we could do without:-).
Seriously, I think that basic_filebuf< unsigned char > *is* the most
reasonable way to read and write raw data. However, to work, it is
necessary to provide a reasonable specialization for char_traits -- one
with int_type the same as char_type is IMHO *not* reasonable in this
case. (It's interesting to note that the standard doesn't actually
specify what the non-specialized template of char_traits should look
like.)
Perhaps a future version of the standard could require a third
specialization, on unsigned char, precisely for raw data.
> >Isn't it rather inelegant to require char_traits to define an int_type
> >any way, given that there is no automatic way to get the next-larger
> >type? It means that each potential character type needs its own
> >explicit specialization of the char_traits class. I know that using a
> >larger type to hold characters plus special conditions has a legacy in
> >the C library (getchar() returns an int), but in C++ land I would
> >consider that an ugly hack. Why has it been done that way?
>
> It was done as a way to templatize the iostreams and string classes
> as expeditiously as possible. I think it's a pretty good first (or even
> second) draft, but it was standardized before it was tried out in real
> life. A bit more refining in the field would doubtless have led to a
> better spec.
Let's say that it could be more elegant, but it works.
--
James Kanze mailto:
James...@dresdner-bank.com
Conseils en informatique orientée objet/
Beratung in objekt orientierter
Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49 (069) 63 19 86 27
Does that mean that if I want to specialize char_traits for a 32-bit
character set on a machine where long is 32 bit I could do the
following:
typedef long super_wchar_t;
template<typename ChT> struct Xint_t {
bool eof;
ChT ch;
Xint_t() : eof(true), ch() {}
Xint_t(bool b, const ChT& c) : eof(b), ch(c) {}
};
template<> struct char_traits<super_wchar_t> {
typedef super_wchar_t char_type;
typedef Xint_t<char_type> int_type;
static char_type to_char_type(const int_type& i)
{ return i.ch; }
static int_type to_int_type(const char_type& c)
{ return int_type(false,c); }
static bool eq_int_type(const int_type& a, const int_type& b)
{ return a.eof==b.eof && a.ch==b.ch; }
static int_type eof()
{ return int_type(); }
// ...
};
Or is there more to it than meets the eye? On browsing through the
standard I could not find anything saying that the above was illegal.
Would the stream templates then automatically work with this kind of
characters or is there more work needed?
> > Isn't it rather inelegant to require char_traits to define an
int_type
> > any way, given that there is no automatic way to get the next-larger
> > type? It means that each potential character type needs its own
> > explicit specialization of the char_traits class. I know that using
a
> > larger type to hold characters plus special conditions has a legacy
in
> > the C library (getchar() returns an int), but in C++ land I would
> > consider that an ugly hack. Why has it been done that way?
>
> Exactly because it is impossible for basic_string to determine an
> appropriate int_type class, char_traits is required. It allows anyone
> who positively absolutely needs to add a char type, with the means to
> do so. Try to find an alternative, if you like. The requirements are
> that you can express the extension char type in portable C++ without
> compiler aid. We need to store all possible values that a variable of
> new type char_T can hold, plus some. What type can hold it?
> char[sizeof(char_T)+1] is not good enough - char_T could be POD, but
> relying on that is not too smart an idea. Writing
> char_traits<char_T>::int_type makes it portable.
Well, if my above example worked, it would be the alternative, wouldn't
it?
BTW what does POD mean?
Stefan
Wouldn't the following be allowed?
struct ulong_char_int_type
{
unsigned long char_value;
bool is_eof;
}; // valid values are { n, false } and { 0, true }
// where n denotes any valid unsigned long value.
namespace std {
template<> struct char_traits<unsigned long>
{
typedef unsigned long char_type;
typedef ulong_char_int_type int_type;
// other required types
static int_type not_eof(int_type e) { e.is_eof=false; return e; }
static char_type to_char_type(int_type e) { return e.char_value; }
static int_type to_int_type(char_type c)
{ int_type e = { c, false }; return e; }
static bool eq_int_type(int_type e, int_type f)
{ return e.is_eof == f.is_eof && e.char_value == f.char_value; }
static int_type eof() { int_type e = { 0, true }; return e; }
// functions not related to int_type
};
}
BTW, can anyone explain why types like off_type got into
char_traits, despite them having nothing to do with the
characters themselves? (I can easily imagine two different
types of streams working with the same character, but
using different offset types; OTOH, they should both be
able to use the same string type for I/O, and char_traits
are part of the string type.)
Indeed, in most cases I'd expect off_type to be completely
independent of the character type.
Even int_type and eof() seem not appropriate; however they
might be considered a char type with added "null value"
which just happens to be used to indicate EOF by streams;
with this interpretation, they nicely fit into char_traits.
A "no such char" value makes sense even without streams
(the naming "eof" isn't a good choice for that, though).
[...]
--Bill Seymour
Exactly. Let me stress again that when you don't know what particular
character set you are dealing with (only how many bits per character
you have), then you have to pretend you are dealing with binary data
and you can not assume that any particular value will be available for
EOF. This means that if you have a situation where char_type and
int_type are synonyms, you have to know which character set you are
dealing with. In this case, I'd prefer to name the corresponding
char_traits specialization after the character set used to make the
dependency clear. For example:
typedef unsigned short unicode_t;
template<> struct char_traits<unicode_t> {
typedef unicode_t char_type;
typedef unicode_t int_type;
// ...
};
Here, I can know which value I want to reserve for EOF and the two
types can be made synonyms. Making them synonyms without caring about
the character set is asking for trouble.
Stefan
{SNIP} // How int_type should be defined for large char-like types
> > There is no
> > requirement whatsoever that states that int_type should be a build-in
> > here, so if it is matching_int_type<iso_char_t> you defined before,
> > it is just fine.
> Does that mean that if I want to specialize char_traits for a 32-bit
> character set on a machine where long is 32 bit I could do the
> following:
Actually, this is portable, not dependent on any particular machine.
> typedef long super_wchar_t;
> template<typename ChT> struct Xint_t {
> bool eof;
> ChT ch;
> Xint_t() : eof(true), ch() {}
> Xint_t(bool b, const ChT& c) : eof(b), ch(c) {}
> };
Nitpick - you might want to make eof private. You might even want
to make everything private and char_traits<ChT> a friend.
> template<> struct char_traits<super_wchar_t> {
> typedef super_wchar_t char_type;
> typedef Xint_t<char_type> int_type;
> static char_type to_char_type(const int_type& i)
> { return i.ch; }
> static int_type to_int_type(const char_type& c)
> { return int_type(false,c); }
> static bool eq_int_type(const int_type& a, const int_type& b)
> { return a.eof==b.eof && a.ch==b.ch; }
> static int_type eof()
> { return int_type(); }
> // ...
> };
Unfortunately you can't provide this as the template<typename ChT> struct
char_traits {} primary template, even though it makes perfect sense. The
reason of course is that an implementation, to provide char_traits<char>,
must provide a non-specialized version, even though that version is not
required to function.
> Or is there more to it than meets the eye? On browsing through the
> standard I could not find anything saying that the above was illegal.
> Would the stream templates then automatically work with this kind of
> characters or is there more work needed?
AFAIK nothing is missing, but that is a dangerous trap. It's indeed
better to check it against the relevant clauses in the standard; it is
quite easy for just about anyone in this group to overlook a requirement.
> > The requirements are
> > that you can express the extension char type in portable C++ without
> > compiler aid. We need to store all possible values that a variable of
> > new type char_T can hold, plus some. What type can hold it?
> > char[sizeof(char_T)+1] is not good enough - char_T could be POD, but
> > relying on that is not too smart an idea. Writing
> > char_traits<char_T>::int_type makes it portable.
> Well, if my above example worked, it would be the alternative, wouldn't
> it?
You wrote char_traits<super_wchar_t>::int_type so it can hold eof, so I'd
say the design is ok. Of course, in the end the only relevant question is
whether it compiles and runs as intended.
> BTW what does POD mean?
Plain Ol' Data - see dejanews for recent threads for explanation. But in
short - dumb objects (no ctors etc)
Michiel Salters
Granted. That's not the only thing that's wrong with it, either. It was
meant to illustrate the idea only. I didn't even run it through a
compiler.
> > template<> struct char_traits<super_wchar_t> {
> > typedef super_wchar_t char_type;
> > typedef Xint_t<char_type> int_type;
>
> > static char_type to_char_type(const int_type& i)
> > { return i.ch; }
> > static int_type to_int_type(const char_type& c)
> > { return int_type(false,c); }
> > static bool eq_int_type(const int_type& a, const int_type& b)
> > { return a.eof==b.eof && a.ch==b.ch; }
> > static int_type eof()
> > { return int_type(); }
> > // ...
> > };
>
> Unfortunately you can't provide this as template<typename ChT> struct
> char_traits {}, even though it makes perfectly sense. The reason of
> course is that an implementation, to provide char_traits<char> must
> provide a non-specialized version, even though that is not required
> to function.
Well, the VC++ library does provide a 'functional' non-specialized
version (which gave me the grief that started this thread) along with
two specialized versions for char and wchar_t. I don't see a reason why
the non-specialized version couldn't instead use my Xint_t type above
rather than making char_type and int_type synonyms. I'd have to try it
first, though.
Of course, the idea that emerged from this thread would be for a
library to provide a non-specialized functional char_traits template
along the lines of the code shown above together with two (optional)
specializations for char and wchar_t. The specializations are optional
because they only provide higher efficiency. The idea is for the
default to emphasize safety over efficiency.
If it works that way, it can be made completely portable and I'd
advocate putting the non-specialized template in the standard as a
requirement, rather than leaving it open whether a non-specialized
version is provided or not. This would be backwards compatible and it
would avoid the kind of surprises I had when trying to do binary I/O.
So, provided that a closer investigation concludes that it does indeed
work as intended, would that be a case for a defect report to the
standard committee? What's the right way to proceed here?
Stefan
[SNIP - suggestions to make it more usable]
> > Unfortunately you can't provide this as template<typename ChT> struct
> > char_traits {}, even though it makes perfectly sense. The reason of
> > course is that an implementation, to provide char_traits<char> must
> > provide a non-specialized version, even though that is not required
> > to function.
> Well, the VC++ library does provide a 'functional' non-specialized
> version (which gave me the grief that started this thread) along with
> two specialized versions for char and wchar_t. I don't see a reason why
> the non-specialized version couldn't instead use my Xint_t type above
> rather than making char_type and int_type synonyms. I'd have to try it
> first, though.
Yep. The problem is you can't provide your own version of built-in things
(unless this is foreseen, and templates/virtual functions are used).
This is a good thing because it makes the language immutable (see D&E for
discussions about this topic), but it causes some problems for this specific
application.
> Of course, the idea that emerged from this thread would be for a
> library to provide a non-specialized functional char_traits template
> along the lines of the code shown above together with two (optional)
> specializations for char and wchar_t. The specializations are optional
> because they only provide higher efficiency. The idea is for the
> default to emphasize safety over efficiency.
I agree, but I'd call it not safety, but (logical) correctness.
> If it works that way, it can be made completely portable and I'd
> advocate putting the non-specialized template in the standard as a
> requirement, rather than leaving it open whether a non-specialized
> version is provided or not. This would be backwards compatible and it
> would avoid the kind of surprises I had when trying to do binary I/O.
I think the standard implicitly requires a non-specialized version,
by requiring specialized versions. But the non-specialized could
(I think) be an abstract (base) class, forcing anyone who
introduces a new character type to provide an appropriate
char_traits specialization. I admit, this is bending the rules.
You should derive from, not specialize an ABC. But it serves
the function, and carries the same design philosophy ("it should
look like this, but you should specialize it")
Binary I/O is quite tricky IMHO. I don't like the eof hack in
combination with data, especially given the number of problems
which could be introduced by it. (We're drifting a bit here)
Can an implementation support 32 bit characters, and 32 bit
ints? E.g. sizeof(int)==sizeof(short)==sizeof(char)==1 ?
The problem here would be getchar. It can't return an int,
since there is no appropriate bit pattern.
> So, provided that a closer investigation concludes that it does indeed
> work as intended, would that be a case for a defect report to the
> standard committee? What's the right way to proceed here?
> Stefan
I'd say, suggest this to the gcc/egcs folks, (If they don't already do
this, I don't have a gcc here. ) and Microsoft. The first are more
likely to respond, and if you help them to implement it, they won't
resist it too much :-). You don't even need their assistance for the
first experimentation. On the other hand, you might need this in
your environment, for which MSVC is required.
Now, if either one implements it, and shows it works, this may end up
as an extension of the C++ standard. It won't be fast, but since there
is a workaround (clumsy, but good enough) I don't think a defect
report is appropriate.
Michiel Salters
>Well, the VC++ library does provide a 'functional' non-specialized
>version (which gave me the grief that started this thread) along with
>two specialized versions for char and wchar_t. I don't see a reason why
>the non-specialized version couldn't instead use my Xint_t type above
>rather than making char_type and int_type synonyms. I'd have to try it
>first, though.
The standard says in 21.1.1/2 that the following will be provided as a basis
for specialization:
template<class T> struct char_traits;
This seems to imply that a generic definition should not be provided. Then
again, 21.1/4 seems to contradict this.
BTW, there's a flaw in either the standard's definition of char_traits or in
every implementation I've seen. char_traits<char>::lt() is defined to be the
built-in operator< (21.1.3.1/6). Most implementations, however, use memcmp()
to implement char_traits<char>::compare(), while the standard requires it to
be implemented in terms of lt(). If plain chars are signed, as they usually
are, then using memcmp() violates the standard, because the return value of
memcmp() is based on the difference between unsigned chars. To be consistent
with the C functions memcmp, strcmp, etc., char_traits<char>::compare()
should be defined to compare unsigned chars, which then makes lt() seem a
little odd, or char_traits<char>::lt() should be defined to compare unsigned
chars, which will only be surprising to someone who thinks characters should
be able to take on negative values.
Also, looking at SGI's STL, I think they have a flaw in
char_traits<char>::to_int_type(), which does not convert the char to
unsigned char. You can make a case for this conversion solely from the
Standard's definition of char_traits::to_int_type(), but it would have been
a lot nicer had the Standard been explicit in its intent for treatment of
plain chars that are signed.
--
Doug Harrison
dHar...@worldnet.att.net
As a user, I can provide a char_traits specialization for the types I
use. For the moment, that's the only decent way to avoid surprises as
far as I can see.
The only exception is the predefined char_traits specialization for
wchar_t, which may be defined in a way that makes its char_type and its
int_type synonyms.
However, I wanted to take the higher ground and ask:
1. What should a sensible library implementer provide
2. What should the standard (in its next iteration) require
[...]
> I think the standard implicitly requires a non-specialized version,
> by requiring specialized versions. But the non-specialized could
> (I think) be an abstract (base) class, forcing anyone who
> introduces a new character type to provide an appropriate
> char_traits specialization. I admit, this is bending the rules.
> You should derive from, not specialize an ABC. But it serves
> the function, and carries the same design philosophy ("it should
> look like this, but you should specialize it")
The non-specialized version has to be there, but AFAIK it could be
empty:
template<typename T> struct char_traits {};
Derivation is different from specialization. The standard says you have
to specialize, not derive the char_traits template for your character
types.
> Binary I/O is quite tricky IMHO. I don't like the eof hack in
> combination with data, especially given the number of problems
> which could be introduced by it. (We're drifting a bit here)
> Can an implementation support 32 bit characters, and 32 bit
> ints? E.g. sizeof(int)==sizeof(short)==sizeof(char)==1 ?
> The problem here would be getchar. It can't return an int,
> since there is no appropriate bit pattern.
Yes, but that's a defect of standard C which C++ inherits. I don't
think it can be rectified in the C standard in a backwards compatible
way. I just want to keep C++ from making the same mistake again in its
own domain.
> > So, provided that a closer investigation concludes that it does indeed
> > work as intended, would that be a case for a defect report to the
> > standard committee? What's the right way to proceed here?
>
> > Stefan
>
> I'd say, suggest this to the gcc/egcs folks, (If they don't already do
> this, I don't have a gcc here. ) and Microsoft. The first are more
> likely to respond, and if you help them to implement it, they won't
> resist it too much :-). You don't even need their assistance for the
> first experimentation. On the other hand, you might need this in
> your environment, for which MSVC is required.
[...]
Perhaps P.J. Plauger or other folks at Dinkumware still follow this
thread...
Stefan
Sent via Deja.com http://www.deja.com/
Share what you know. Learn what you don't.
[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
> > I think the standard implicitly requires a non-specialized version,
> > by requiring specialized versions. But the non-specialized could
> > (I think) be an abstract (base) class, forcing anyone who
> > introduces a new character type to provide an appropriate
> > char_traits specialization. I admit, this is bending the rules.
> > You should derive from, not specialize an ABC. But it serves
> > the function, and carries the same design philosophy ("it should
> > look like this, but you should specialize it")
> The non-specialized version has to be there, but AFAIK it could be
> empty:
> template<typename T> struct char_traits {};
> Derivation is different from specialization. The standard says you have
> to specialize, not derive the char_traits template for your character
> types.
I think you missed my point. I know the common use of an abstract base class
is to force derivation. I just wanted to point out that it can also be used
to enforce specialization. Since that is a different concept -as you noted-
which can be implemented with the same mechanism, I thought it might be
something not commonly known. I only thought it up when this thread came up,
so it might even be a new trick altogether.
The idea is of course that using an ABC prevents the use of char_traits<T> for
any type T, unless a specialized version has been provided. The catch is that
the standard library must enforce this by actually creating a char_traits
object, I think - or at least contain code that does so, but will not be
executed.
Michiel Salters
What advantage would that render? If the non-specialized char_traits
template is empty, using a character type for which there is no
specialized version will lead to compile-time errors. You are forced to
provide a specialization, which is what you want, isn't it?
Stefan
I only have the draft standard from 12/96 and it says in 21.1/4:
"This subclause specifies a struct template, char_traits<charT>, with
no members, to appear in the header <string> along with two explicit
specializations of it, char_traits<char> and char_traits<wchar_t> which
satisfy the requirements below."
21.1.2/2 says:
"The struct template
template<class charT> struct char_traits { };
shall be provided in the header <string> as a basis for explicit
specializations."
To me, both clauses clearly mean that the non-specialized template
should be empty. I have not seen a clause 21.1.1/2, so I suspect the
final version of the standard differs from the draft when it comes to
char_traits. Do you have access to the final standard?
[...]
> Also, looking at SGI's STL, I think they have a flaw in
> char_traits<char>::to_int_type(), which does not convert the char to
> unsigned char. You can make a case for this conversion solely from the
> Standard's definition of char_traits::to_int_type(), but it would have
> been a lot nicer had the Standard been explicit in its intent for
> treatment of plain chars that are signed.
Judging from the draft standard, it seems to me that it would be fine
to not convert to unsigned char provided that eof() produces a value
that is different from all legal character values. In particular, it
would have to be different from -1.
Table 37 says for X::eof():
"yields: a value e such that X::eq_int_type(e,X::to_int_type(c)) is
false for all values c"
21.1.3/2 says for typedef INT_T int_type:
"Requires: For a certain character container type char_type, a related
container type INT_T shall be a type or class which can represent all
of the valid characters converted from the corresponding char_type
values, as well as an end-of-file value, eof()."
It now says:
"This subclause specifies a struct template, char_traits<charT>, and two
explicit specializations of it, char_traits<char> and char_traits<wchar_t>,
all of which appear in the header <string> and satisfy the requirements
below."
>21.1.2/2 says:
>
>"The struct template
> template<class charT> struct char_traits { };
>shall be provided in the header <string> as a basis for explicit
>specializations."
>
>To me, both clauses clearly mean that the non-specialized template
>should be empty. I have not seen a clause 21.1.1/2, so I suspect the
>final version of the standard differs from the draft when it comes to
>char_traits. Do you have access to the final standard?
The requirement that the template be empty has come and gone.
The final version is more in keeping with the original intent, to provide
at least a reasonable approximation for a traits class absent any
explicit specialization. The C++ library shipped with VC++ is consistent
with the final C++ Standard.
P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Yes. That was the reasoning that led to the original design.
Unfortunately, the design was hastily changed at the last minute,
right before the standard was ratified, at the behest of people
who hadn't taken the time to study the implications.
--
Nathan Myers
n...@nospam.cantrip.org http://www.cantrip.org/
>I only have the draft standard from 12/96 and it says in 21.1/4:
>
>"This subclause specifies a struct template, char_traits<charT>, with
>no members, to appear in the header <string> along with two explicit
>specializations of it, char_traits<char> and char_traits<wchar_t> which
>satisfy the requirements below."
The Standard doesn't have the "with no members" bit.
>21.1.2/2 says:
>
>"The struct template
> template<class charT> struct char_traits { };
>shall be provided in the header <string> as a basis for explicit
>specializations."
The Standard has:
template<class charT> struct char_traits;
I think that's what it intends for implementations to have, but then it goes
and says that this plus the two required specializations "all satisfy the
requirements below," so it's a little ambiguous.
>To me, both clauses clearly mean that the non-specialized template
>should be empty. I have not seen a clause 21.1.1/2, so I suspect the
>final version of the standard differs from the draft when it comes to
>char_traits. Do you have access to the final standard?
Yes, the PDF version, which unfortunately disables copying. I believe the
comp.std.c++ FAQ has instructions on where you can get it; it's cheap as
these things go, costing $18, IIRC.
>Judging from the draft standard, it seems to me that it would be fine
>to not convert to unsigned char provided that eof() produces a value
>that is different from all legal character values. In particular, it
>would have to be different from -1.
Exactly. char_traits<char>::eof() returns EOF which is typically #defined to
-1; I've never seen an implementation that used a different value, and I
seriously doubt that will ever change. Assume you have an 8 bit character
set and signed plain chars, a very common (if unfortunate) situation, also
one I doubt will ever change. to_int_type must cast to unsigned char in such
an implementation. But then eq_int_type(-1,c), where c has the value -1,
will return false, which could be considered surprising, but it's necessary
to meet the requirements of eof().
I would like to think of char_traits<char>::to_int_type as returning a sane
integer value for the character, i.e. non-negative. When I think of
converting a char to an "int type", it's in this sort of context:
// From <ctype.h>
toupper(char_traits<char>::to_int_type(c)) // Safe?
But if plain chars are signed, c is negative, and to_int_type doesn't cast
to unsigned char, this is undefined. Another C precedent is the return value
of getchar().
--
Doug Harrison
dHar...@worldnet.att.net
I had a go, but it said "The product you requested is currently not
available". Maybe their PDF files are out of stock ;-)
> >I only have the draft standard from 12/96 and it says in 21.1/4:
> >
> >"This subclause specifies a struct template, char_traits<charT>, with
> >no members, to appear in the header <string> along with two explicit
> >specializations of it, char_traits<char> and char_traits<wchar_t> which
> >satisfy the requirements below."
>
> The Standard doesn't have the "with no members" bit.
>
> >21.1.2/2 says:
> >
> >"The struct template
> > template<class charT> struct char_traits { };
> >shall be provided in the header <string> as a basis for explicit
> >specializations."
>
> The Standard has:
>
> template<class charT> struct char_traits;
>
> I think that's what it intends for implementations to have, but then it
> goes and says that this plus the two required specializations "all satisfy
> the requirements below," so it's a little ambiguous.
Ok, this sort of agrees with what P.J. Plauger responded. So the standard
allows - or even requires - a "functional" non-specialized char_traits.
I don't mind that at all. I can see the point of having "at least a
reasonable approximation for a traits class absent any explicit
specialization" (Plauger). It could make life simpler for people who
aren't experts. The real question, I think, is what the non-specialized
char_traits should look like.
My feeling is that the standard should be more specific than it
currently is. It should require an implementation of the non-
specialized char_traits that keeps eof() separate from any possible
character value, no matter what character type is used. This means that
char_type and int_type can not be synonyms. How that can be done has
already been discussed in this thread. Nonoptimal performance would
have to be suffered as the cost of generality (genericity?). Explicit
specializations could then be made optional by the standard.
Explicit specializations would be used to provide more efficient
solutions for a particular char_type, such as a "larger" integer type
as int_type.
As the requirements for the char_traits struct currently specify that
int_type must be able to hold the eof() value in addition to all valid
characters of char_type, int_type and char_type can only be synonyms
where it is known that the representation of eof() is a bit pattern
that can not occur in a character. I hold that this can only be known
when the character set is known. As the non-specialized char_traits
struct has to provide the definitions for the most general case, no
assumption about the character set can be made, hence char_type and
int_type can not be synonyms. Only a char_traits specialization which
knows about the character set used can make char_type and int_type
synonyms. For this case, I would prefer to make the dependency explicit
by naming the character type after the character set.
For example, if a char_traits specialization for unicode characters
chooses to make char_type and int_type synonyms, and use the value
0xFFFF for eof(), forbidding its use as a valid character value, I'd
prefer a definition like this:
typedef unsigned short unicode_char;
template<> struct char_traits<unicode_char> {...};
over one like this:
typedef unsigned short wide_char;
template<> struct char_traits<wide_char> {...};
although they are semantically equivalent.
I believe the standard's wording should be changed to make the issue
clearer. Currently, library providers can provide implementations which
result in eof() representing a valid character while still claiming
standard compliance (as in the VC++ library).
I agree completely.
Stefan
> For example, if a char_traits specialization for unicode characters
> chooses to make char_type and int_type synonyms, and use the value
> 0xFFFF for eof(), forbidding its use as a valid character value, I'd
> prefer a definition like this:
>
> typedef unsigned short unicode_char;
> template<> struct char_traits<unicode_char> {...};
>
> over one like this:
>
> typedef unsigned short wide_char;
> template<> struct char_traits<wide_char> {...};
>
> although they are semantically equivalent.
In fact, both define exactly the same type as char_traits< unsigned
short >. And only one can exist in a given program: you cannot have
char_traits< unicode_char > and char_traits< local_wide_char > if both
unicode_char and local_wide_char are typedefs for unsigned short.
There are several ways of handling this. The most obvious is to use
enums instead of unsigned short:
enum unicode_char { unicode_min = 0 , unicode_max = 0xffff } ;
enum local_wide_char { wide_min = 0 , wide_max = 0xffff } ;
Another alternative is to declare a struct which contains an unsigned
short.
But the easiest solution is just to define a new class with the desired
interface; there is nothing in the standard which requires the actual
traits to be a specialization of a template. The main advantage of
using the specialization is within other templates:
template< typename T > class X { basic_string< T > ... } ;
Even here, you can write:
template< typename T > class X { basic_string< T, char_traits< T > > ... } ;
But without an additional parameter for the outer template, you still
only get the specializations of the template.
Nevertheless, in the case of string, I think the templating of traits
is probably added complexity for nothing. In practice, the number of
*types* involved will be quite small: char and unsigned short. Whereas
you typically *will* need several different traits classes for char:
8859-1, 8859-2... (A "quality" implementation would provide all of
these. Regretfully, there isn't a quality implementation currently
available.) This means that the trick of using a template to provide
the default traits for a type doesn't really buy much.
(While I'm at it: one thing I've often wondered about. Given that, in
fact, all of the necessary information is in traits, and that using a
template for traits doesn't buy that much, why make the character type a
template parameter at all? Why not just template< typename Traits >
class basic_string?)
> I believe the standard's wording should be changed to make the issue
> clearer. Currently, library providers can provide implementations which
> result in eof() representing a valid character while still claiming
> standard compliance (as in the VC++ library).
Ideally, of course, the int_type would be Fallible< char_type >, and the
EOF problem solves itself. Practically, however, the spirit of C rules,
and I would not really complain about an implementation which used long
and -1.
--
James Kanze mailto:
James...@dresdner-bank.com
Conseils en informatique orientée objet/
Beratung in objekt orientierter
Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49 (069) 63 19 86 27