
What is the correct behaviour?


Jonne Lehtinen

Oct 28, 2006, 9:34:11 PM
Hello, I just stumbled upon two different behaviours with two different
compilers. Here's a sample code....

std::string a( "-432" );
unsigned long b = 0;
std::stringstream stream;

if( !(stream << a && stream >> b) ) {
// Should fail?
}

On g++ 4.1.2 (20061026 (Debian "prerelease")) and g++ 3.4.2 it does
indeed fail but on my school's modified g++ 4.1 (uses stlport 5.0.1) and
on Visual C++ 2005 it doesn't fail. Which behaviour is correct?

This also affects boost::lexical_cast.
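For anyone who wants to reproduce this, here is a self-contained
version of the snippet (the includes and main() are added here; the
printed messages are only illustrative):

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::string a( "-432" );
    unsigned long b = 0;
    std::stringstream stream;

    if( !(stream << a && stream >> b) ) {
        std::cout << "extraction failed\n";   // g++ 3.4.2 / 4.1.2
    } else {
        std::cout << "extraction succeeded, b = " << b << '\n';
        // stlport 5.0.1, Visual C++ 2005
    }
}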

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Carl Barron

Oct 29, 2006, 2:51:38 AM
In article <ei0k7a$ksh$1...@news.cc.tut.fi>, Jonne Lehtinen
<jonne.l...@tut.fi.invalid> wrote:

> Hello, I just stumbled upon two different behaviours with two different
> compilers. Here's a sample code....
>
> std::string a( "-432" );
> unsigned long b = 0;
> std::stringstream stream;
>
> if( !(stream << a && stream >> b) ) {
> // Should fail?
> }
>
> On g++ 4.1.2 (20061026 (Debian "prerelease")) and g++ 3.4.2 it does
> indeed fail but on my school's modified g++ 4.1 (uses stlport 5.0.1) and
> on Visual C++ 2005 it doesn't fail. Which behaviour is correct?
>
> This also affects boost::lexical_cast.

My guess is that it should fail, as -432 is not an unsigned long.
They should all work if a is "432". Details are in the specification
of basic_stringbuf. There should be no problem with the if statement
itself if a contained a positive integral representation. [-432 is
not].

Details eventually get down to basic_stringbuf, which adds appended
chars, as above, to the 'input sequence'.

Alf P. Steinbach

Oct 29, 2006, 10:31:21 AM
* Jonne Lehtinen:

> Hello, I just stumbled upon two different behaviours with two different
> compilers. Here's a sample code....
>
> std::string a( "-432" );
> unsigned long b = 0;
> std::stringstream stream;
>
> if( !(stream << a && stream >> b) ) {
> // Should fail?
> }
>
> On g++ 4.1.2 (20061026 (Debian "prerelease")) and g++ 3.4.2 it does
> indeed fail but on my school's modified g++ 4.1 (uses stlport 5.0.1) and
> on Visual C++ 2005 it doesn't fail. Which behaviour is correct?
>
> This also affects boost::lexical_cast.

I'm too lazy to check, but I think this probably falls under the
Undefined Behavior of fscanf & family, which iostream input is defined
in terms of. And if so, what you're up against is Undefined Behavior.
Which means there's no particular correct behavior: any and all
behaviors, including reformatting your hard disk, are then correct.

Physics has its "open dirty secret" of direct conflict between general
relativity and quantum mechanics, that one theory must be utterly wrong.

C++ programming has its "open dirty secret" of Undefined Behavior in the
standard library's stream classes, that they're triple-ungood beasties.

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

James Kanze

Oct 29, 2006, 11:06:02 AM
Jonne Lehtinen wrote:
> Hello, I just stumbled upon two different behaviours with two
> different compilers. Here's a sample code....

> std::string a( "-432" );
> unsigned long b = 0;
> std::stringstream stream;

> if( !(stream << a && stream >> b) ) {
> // Should fail?
> }

> On g++ 4.1.2 (20061026 (Debian "prerelease")) and g++ 3.4.2 it
> does indeed fail but on my school's modified g++ 4.1 (uses
> stlport 5.0.1) and on Visual C++ 2005 it doesn't fail. Which
> behaviour is correct?

It's undefined behavior, but from a quality of implementation
point of view, I would expect failure. You have the value -432,
which isn't representable in an unsigned long.

--
James Kanze Gabi Software email: kanze...@neuf.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Alberto Ganesh Barbati

Oct 29, 2006, 11:31:22 PM
James Kanze wrote:

> Jonne Lehtinen wrote:
>
>> std::string a( "-432" );
>> unsigned long b = 0;
>> std::stringstream stream;
>
>> if( !(stream << a && stream >> b) ) {
>> // Should fail?
>> }
>
> It's undefined behavior, but from a quality of implementation
> point of view, I would expect failure. You have the value -432,
> which isn't representable in an unsigned long.
>

I can't see what makes the behaviour undefined. Could you please
elaborate? What would be necessary to make the behavior of that code
defined?

Ganesh

Seungbeom Kim

Oct 30, 2006, 3:06:14 AM
Alberto Ganesh Barbati wrote:
> James Kanze wrote:
>> Jonne Lehtinen wrote:
>>
>>> std::string a( "-432" );
>>> unsigned long b = 0;
>>> std::stringstream stream;
>>> if( !(stream << a && stream >> b) ) {
>>> // Should fail?
>>> }
>> It's undefined behavior, but from a quality of implementation
>> point of view, I would expect failure. You have the value -432,
>> which isn't representable in an unsigned long.
>>
>
> I can't see what makes the behaviour undefined. Could you please
> elaborate? What would be necessary to make the behavior of that code
> defined?

27.6.1.2.2 [lib.istream.formatted.arithmetic] defers the behaviour to
22.2.2.1 [lib.locale.num.get], which defers the behaviour to scanf (with
the conversion specifier "%lu" in this case), and the C standard states:
"If this object does not have an appropriate type, or if the result of
the conversion cannot be represented in the object, the behavior is
undefined."(7.9.6.2/10 in C90, 7.19.6.2/10 in C99).

Yes, it does sound harsh -- then how could we write a program that
behaves only in defined ways in response to arbitrary inputs? Extra
checking before calling scanf() or operator>>() (which isn't always
feasible)?
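One sketch of such a pre-check, assuming the input can be read into a
string first (the helper name here is invented for illustration):

#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Extract a token, reject anything that isn't a plain run of digits,
// and only then hand it to the stream extractor. Overflow detection
// is still left to the quality of the implementation.
bool read_unsigned( std::istream& in, unsigned long& out )
{
    std::string token;
    if( !(in >> token) || token.empty() )
        return false;
    for( std::string::size_type i = 0; i < token.size(); ++i )
        if( !std::isdigit( static_cast<unsigned char>( token[i] ) ) )
            return false;               // no sign, no stray characters
    std::istringstream conv( token );
    conv >> out;
    return !conv.fail();
}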

--
Seungbeom Kim

Alberto Ganesh Barbati

Oct 30, 2006, 5:42:24 AM
Seungbeom Kim wrote:

> Alberto Ganesh Barbati wrote:
>> James Kanze wrote:
>>> Jonne Lehtinen wrote:
>>>
>>>> std::string a( "-432" );
>>>> unsigned long b = 0;
>>>> std::stringstream stream;
>>>> if( !(stream << a && stream >> b) ) {
>>>> // Should fail?
>>>> }
>>> It's undefined behavior, but from a quality of implementation
>>> point of view, I would expect failure. You have the value -432,
>>> which isn't representable in an unsigned long.
>>>
>> I can't see what makes the behaviour undefined. Could you please
>> elaborate? What would be necessary to make the behavior of that code
>> defined?
>
> 27.6.1.2.2 [lib.istream.formatted.arithmetic] defers the behaviour to
> 22.2.2.1 [lib.locale.num.get], which defers the behaviour to scanf (with
> the conversion specifier "%lu" in this case), and the C standard states:
> "If this object does not have an appropriate type, or if the result of
> the conversion cannot be represented in the object, the behavior is
> undefined."(7.9.6.2/10 in C90, 7.19.6.2/10 in C99)

That's completely irrelevant. The sentence "If this object does not have
an appropriate type" in the C standard refers to the fact that the type
of the object whose pointer is passed to *scanf shall match the type of
the specifier. For example:

long l;
scanf("%ld", &l); // OK: %ld matches a long
scanf("%lu", &l); // Undefined behaviour: %lu expects an unsigned long

In C++ we never need to care about this, because the overload rules
allow the type system to select the right conversion automatically.

Ganesh

Seungbeom Kim

Oct 30, 2006, 10:05:59 AM
Alberto Ganesh Barbati wrote:
> Seungbeom Kim wrote:

>> 27.6.1.2.2 [lib.istream.formatted.arithmetic] defers the behaviour to
>> 22.2.2.1 [lib.locale.num.get], which defers the behaviour to scanf (with
>> the conversion specifier "%lu" in this case), and the C standard states:
>> "If this object does not have an appropriate type, or if the result of
>> the conversion cannot be represented in the object, the behavior is
>> undefined."(7.9.6.2/10 in C90, 7.19.6.2/10 in C99)
>
> That's completely irrelevant.

Are you sure?

> The sentence "If this object does not have
> an appropriate type" in the C standard refers to the fact that the type
> of the object whose pointer is passed to *scanf shall match the type of
> the specifier. For example:
>
> long l
> scanf("%ld", &l); // OK: %ld matches a long
> scanf("%lu", &l); // Undefined behaviour: %lu matches an unsigned long
>
> In C++ we never need to care about this, because overload rules allows
> the type system to automatically select the right specifier.

Entirely true, but what about the second condition I quoted: "or
if the result of the conversion cannot be represented in the object"?
Negative values cannot be represented in an unsigned integer object.

--
Seungbeom Kim

Alf P. Steinbach

Oct 30, 2006, 11:04:44 AM
* Alberto Ganesh Barbati:

> Seungbeom Kim wrote:
>> Alberto Ganesh Barbati wrote:
>>> James Kanze wrote:
>>>> Jonne Lehtinen wrote:
>>>>
>>>>> std::string a( "-432" );
>>>>> unsigned long b = 0;
>>>>> std::stringstream stream;
>>>>> if( !(stream << a && stream >> b) ) {
>>>>> // Should fail?
>>>>> }
>>>> It's undefined behavior, but from a quality of implementation
>>>> point of view, I would expect failure. You have the value -432,
>>>> which isn't representable in an unsigned long.
>>>>
>>> I can't see what makes the behaviour undefined. Could you please
>>> elaborate? What would be necessary to make the behavior of that code
>>> defined?
>> 27.6.1.2.2 [lib.istream.formatted.arithmetic] defers the behaviour to
>> 22.2.2.1 [lib.locale.num.get], which defers the behaviour to scanf (with
>> the conversion specifier "%lu" in this case), and the C standard states:
>> "If this object does not have an appropriate type, or if the result of
>> the conversion cannot be represented in the object, the behavior is
>> undefined."(7.9.6.2/10 in C90, 7.19.6.2/10 in C99)
>
> That's completely irrelevant.

Presumably you mean, some part of the above quoted text, which you've
focused on but fail to specify other than by the anywhere-pointing
designator "that", is completely irrelevant.

It's not unusual that not all of a paragraph from a standard is relevant
for the issue at hand.

So I'd say that's completely irrelevant (heh ;-)).


> The sentence "If this object does not have
> an appropriate type" in the C standard refers to the fact that the type
> of the object whose pointer is passed to *scanf shall match the type of
> the specifier. For example:
>
> long l
> scanf("%ld", &l); // OK: %ld matches a long
> scanf("%lu", &l); // Undefined behaviour: %lu matches an unsigned long
>
> In C++ we never need to care about this, because overload rules allows
> the type system to automatically select the right specifier.

The above is correct, I think, and may perhaps be what the earlier
"that" referred to.

On the other hand, the relevant portion of the quoted part of the C
standard's text is "or if the result of the conversion cannot be
represented in the object".

Some partial workarounds exist. For example, one may use std::getline
to read one line into a std::string, and then convert using the
better-defined std::strtod, std::strtol or std::strtoul from the C
library. The problem is AFAICS still an open issue, having simmered
there for eight years or so (could this be political?); see <url:
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#23>.
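A minimal sketch of that workaround (note that strtoul itself still
accepts a leading '-' and negates the result, so a caller who wants to
reject negative input must check for the sign explicitly):

#include <cerrno>
#include <cstdlib>
#include <iostream>
#include <string>

int main()
{
    std::string line;
    if( std::getline( std::cin, line ) ) {
        errno = 0;
        char* end = 0;
        unsigned long value = std::strtoul( line.c_str(), &end, 10 );
        if( end == line.c_str() || *end != '\0' )
            std::cout << "not a number\n";   // nothing converted,
                                             // or trailing garbage
        else if( errno == ERANGE )
            std::cout << "out of range\n";   // overflow: ULONG_MAX
        else
            std::cout << "value = " << value << '\n';
    }
}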


--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?


Nicola Musatti

Oct 30, 2006, 11:05:22 AM

Alberto Ganesh Barbati wrote:
> Seungbeom Kim wrote:
[...]

> > 27.6.1.2.2 [lib.istream.formatted.arithmetic] defers the behaviour to
> > 22.2.2.1 [lib.locale.num.get], which defers the behaviour to scanf (with
> > the conversion specifier "%lu" in this case), and the C standard states:
> > "If this object does not have an appropriate type, or if the result of
> > the conversion cannot be represented in the object, the behavior is
> > undefined."(7.9.6.2/10 in C90, 7.19.6.2/10 in C99)
>
> That's completely irrelevant. The sentence "If this object does not have
> an appropriate type" in the C standard refers to the fact that the type
> of the object whose pointer is passed to *scanf shall match the type of
> the specifier.
[...]

I suspect you missed the "or if the result of the conversion cannot be
represented in the object" part of Seungbeom Kim's quote.

Cheers,
Nicola Musatti

Jiang

Oct 30, 2006, 11:07:34 AM

James Kanze wrote:
> Jonne Lehtinen wrote:
> > Hello, I just stumbled upon two different behaviours with two
> > different compilers. Here's a sample code....
>
> > std::string a( "-432" );
> > unsigned long b = 0;
> > std::stringstream stream;
>
> > if( !(stream << a && stream >> b) ) {
> > // Should fail?
> > }
>
> > On g++ 4.1.2 (20061026 (Debian "prerelease")) and g++ 3.4.2 it
> > does indeed fail but on my school's modified g++ 4.1 (uses
> > stlport 5.0.1) and on Visual C++ 2005 it doesn't fail. Which
> > behaviour is correct?
>
> It's undefined behavior, but from a quality of implementation
> point of view

IMHO, this is implementation-defined behavior.

In 4.7 (Integral conversions), the standard says:

If the destination type is signed, the value is unchanged if it
can be represented in the destination type (and bit-field width);
otherwise, the value is implementation-defined.

The istream extractors use num_get, which handles
numeric formatting and conversion.

> , I would expect failure. You have the value -432,
> which isn't representable in an unsigned long.
>

Indeed, to my mind the conversion should fail.

I tested the above code using boost::lexical_cast, and it does
report the failure by throwing a bad_lexical_cast exception:

#include <boost/lexical_cast.hpp>
#include <iostream>
#include <string>

int main()
{
    using boost::lexical_cast;
    using boost::bad_lexical_cast;

    try {
        std::string str("-232");
        unsigned long i = lexical_cast<unsigned long>(str);
    }
    catch (bad_lexical_cast& e) {
        std::cout << e.what() << std::endl;
    }
}

$ ./blc
bad lexical cast: source type value could not be interpreted as target

James Kanze

Oct 30, 2006, 11:05:43 AM
Seungbeom Kim wrote:
> Alberto Ganesh Barbati wrote:
> > James Kanze ha scritto:
> >> Jonne Lehtinen wrote:

> >>> std::string a( "-432" );
> >>> unsigned long b = 0;
> >>> std::stringstream stream;
> >>> if( !(stream << a && stream >> b) ) {
> >>> // Should fail?
> >>> }
> >> It's undefined behavior, but from a quality of implementation
> >> point of view, I would expect failure. You have the value -432,
> >> which isn't representable in an unsigned long.

> > I can't see what makes the behaviour undefined. Could you please
> > elaborate? What would be necessary to make the behavior of that code
> > defined?

> 27.6.1.2.2 [lib.istream.formatted.arithmetic] defers the behaviour to
> 22.2.2.1 [lib.locale.num.get], which defers the behaviour to scanf (with
> the conversion specifier "%lu" in this case), and the C standard states:
> "If this object does not have an appropriate type, or if the result of
> the conversion cannot be represented in the object, the behavior is
> undefined."(7.9.6.2/10 in C90, 7.19.6.2/10 in C99).

That's exactly the passage I was thinking of. In the special
case of conversion to unsigned long, however, we have to
consider that the "conversion" of the text string is deferred to
strtoul, and strtoul does define a result when a minus sign is
present. A result which, by definition, can be represented in
an unsigned long. So I'm no longer sure that this particular
example is undefined behavior, although the problem exists in
general. (And the defined behavior is, in fact,
counter-intuitive, and not what I would want.)

> Yes, it does sound harsh -- then how could we write a program that
> behaves only in defined ways in response to arbitrary inputs? Extra
> checking before calling scanf() or operator>>() (which isn't always
> feasible)?

I think that in general you have to count a little bit on quality
of implementation, and refuse to use libraries which don't have
correct error handling. If we exclude the special case of an
explicitly negative value being assigned to an unsigned (because
of the above considerations), all of the libraries I have access
to except STLport detect overflow correctly, setting failbit.
For unsigned, both the standard Sun CC library (Rogue Wave, I
think) and g++ reject negative numbers; VC++ (Dinkumware) seems
to interpret the above statements concerning the behavior of
strtoul literally, and accepts them, with the same results as if
the results were ULONG_MAX + 1 - value. (But it is still
inconsistent, in that it doesn't signal an error if I input
-32768 into an unsigned short: the result of strtoul for this
string is defined to be 4294934528 on my machines, and that
value isn't representable in an unsigned short.)

And even STLport doesn't do anything worse than returning a
fantasy value with no error---I'm pretty sure that it won't
reformat your hard disk.

But yes, it would be nice if all implementations agreed as to
what the desired behavior was, regarding negative values read
into an unsigned. (The behavior of Dinkumware can only be
intentional---I can't imagine a programming error allowing
-32768 to be read into an unsigned short, but signaling an error
for 65536.) And it would be even nicer if the standard
specified it black on white.

--
James Kanze (GABI Software) email:james...@gmail.com


Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


Alberto Ganesh Barbati

Oct 30, 2006, 2:56:52 PM
Seungbeom Kim wrote:

> Alberto Ganesh Barbati wrote:
>> Seungbeom Kim wrote:
>>> 27.6.1.2.2 [lib.istream.formatted.arithmetic] defers the behaviour to
>>> 22.2.2.1 [lib.locale.num.get], which defers the behaviour to scanf (with
>>> the conversion specifier "%lu" in this case), and the C standard states:
>>> "If this object does not have an appropriate type, or if the result of
>>> the conversion cannot be represented in the object, the behavior is
>>> undefined."(7.9.6.2/10 in C90, 7.19.6.2/10 in C99)
>> That's completely irrelevant.
>
> Are you sure?
>
>> The sentence "If this object does not have
>> an appropriate type" in the C standard refers to the fact that the type
>> of the object whose pointer is passed to *scanf shall match the type of
>> the specifier. For example:
>>
>> long l
>> scanf("%ld", &l); // OK: %ld matches a long
>> scanf("%lu", &l); // Undefined behaviour: %lu matches an unsigned long
>>
>> In C++ we never need to care about this, because overload rules allows
>> the type system to automatically select the right specifier.
>
> Entirely true, but what about the second condition I quoted: "or
> if the result of the conversion cannot be represented in the object"?
> Negative values cannot be represented in an unsigned integer object.
>

Right, I agree that there's UB in this case and I apologize for not
having read all of your post carefully.

However, one may raise the question: what is the "result of the
conversion"? This is not as stupid as it may seem, because if strtoul
were used to make the conversion then the result would always be
non-negative. In fact §7.20.1.4/5 says: "If the subject sequence begins
with a minus sign, the value resulting from the conversion is negated
(in the return type)." However, as footnote 228 suggests and despite the
references to strtol-like functions in §7.19.6.2/12, fscanf is not
required to perform the %lu conversion through strtoul.

Ganesh


James Kanze

Oct 31, 2006, 8:59:37 AM
Alberto Ganesh Barbati wrote:
> Seungbeom Kim wrote:
> > Alberto Ganesh Barbati wrote:
> >> Seungbeom Kim wrote:
> >>> 27.6.1.2.2 [lib.istream.formatted.arithmetic] defers the behaviour to
> >>> 22.2.2.1 [lib.locale.num.get], which defers the behaviour to scanf (with
> >>> the conversion specifier "%lu" in this case), and the C standard states:
> >>> "If this object does not have an appropriate type, or if the result of
> >>> the conversion cannot be represented in the object, the behavior is
> >>> undefined." (7.9.6.2/10 in C90, 7.19.6.2/10 in C99)
> >> That's completely irrelevant.

[...]


> However, one may raise the question: what is the "result of the
> conversion"?

Which has an intuitive answer: the result of converting "-432"
to a numeric value is the value -432. But of course, you really
cannot count on intuition when interpreting the standard.

> This is not as stupid as it may seem, because if strtoul were
> used to make the conversion then the result would always be
> non-negative. In fact §7.20.1.4/5 says: "If the subject sequence begins
> with a minus sign, the value resulting from the conversion is negated
> (in the return type)." However, as footnote 228 suggests and despite the
> references to strtol-like functions in §7.19.6.2/12, fscanf is not
> required to perform the %lu conversion through strtoul.

I think that the issue is not 100% clear. I suspect that the
intent of the description in the C standard is that *if* strtoul
returns the result of a conversion, that is what is used, but if
it returns a fixed value as a result of an error, the
implementation is free to do something else. But I'm not sure:
it could also mean that fscanf should always have the same
results as strtoul (i.e. ULONG_MAX in case of overflow). Or
perhaps something vaguer, and the reference to strtoul is just
to not have to repeat how the "correct" value is established.

But I'll admit that the reason I suspect the first is because
this is what Dinkumware does; it most obviously does so
intentionally (since it handles overflow correctly in general),
and Plauger played an important role in the writing of the C
standard which is being referred to here. It's not the point of
view I'd prefer, but it is coherent with the concept that
unsigned values are not cardinal numbers, but modulo numbers.
(But then, of course, the logical implication is that overflow
is not possible. But that creates more than a few
implementation problems, and may have been rejected because of
this.) All the rationale for C says is:

The specification of fscanf is based in part on these principles:

[...]
* The conversions performed by fscanf are compatible with
those performed by strtod and strtol.

(Presumably, the intent here is to include strtoul as well.
This is the rationale, of course, and not the standard, and
isn't as precise.)

This also supports the idea that it is intentional that -432 be
required to result in the value modulo, and not an error. It
still leaves open (IMHO, both in C and in C++) the question of
what happens if the target is an unsigned short. The result of
converting "-432" using strtoul on my machines is either
4294966864 or 18446744073709551184, neither of which fits into
an unsigned short. Which means that we should have undefined
behavior. But it seems inconsistent to accept -432 for unsigned
long, but to have undefined behavior for unsigned short.

--
James Kanze (GABI Software) email:james...@gmail.com

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

James Kanze

Oct 31, 2006, 9:00:06 AM
Jiang wrote:
> James Kanze wrote:
> > Jonne Lehtinen wrote:
> > > Hello, I just stumbled upon two different behaviours with two
> > > different compilers. Here's a sample code....

> > > std::string a( "-432" );
> > > unsigned long b = 0;
> > > std::stringstream stream;

> > > if( !(stream << a && stream >> b) ) {
> > > // Should fail?
> > > }

> > > On g++ 4.1.2 (20061026 (Debian "prerelease")) and g++ 3.4.2 it
> > > does indeed fail but on my school's modified g++ 4.1 (uses
> > > stlport 5.0.1) and on Visual C++ 2005 it doesn't fail. Which
> > > behaviour is correct?

> > It's undefined behavior, but from a quality of implementation
> > point of view

> IMHO, this is implementation-defined behavior.

> In 4.7(Integral conversions), the standard says:

> If the destination type is signed, the value is unchanged if it
> can be represented in the destination type (and bit-field width);
> otherwise, the value is implementation-defined.

There's no integral conversion involved. Integral conversions
are between integral types, and do not use the standard library.

> The istream extractors use num_get, which handles
> numeric formatting and conversion.

I know. And they define the semantics by reference to fscanf in
the C standard. And the C standard says that if the converted
value isn't representable in the target type, the behavior is
undefined.

As it happens, however, the C standard defines the semantics of
the conversion "as if" strtoul was used. And strtoul defines a
conversion of "-432" to unsigned long; the results must be equal
to ULONG_MAX+1-432. I'd missed this little detail before, but
it would seem that if we take the standard literally, the
conversion is well defined, with well defined results, and an
implementation which reports an error is not conformant.
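To illustrate the strtoul semantics in question (a sketch; on an
implementation following the reading above, both lines should print
the same number, with no error reported):

#include <climits>
#include <cstdlib>
#include <iostream>

int main()
{
    char* end = 0;
    unsigned long v = std::strtoul( "-432", &end, 10 );
    std::cout << v << '\n';                    // the negated value
    std::cout << ULONG_MAX - 432 + 1 << '\n';  // ULONG_MAX + 1 - 432
}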

I don't know for sure whether this is intentional or simply an
accidental result of applying the general principle used to
define signed conversions to unsigned conversions. At any rate,
I'm not yet to the point of posting a bug report to g++ because
they generate an error.

> > , I would expect failure. You have the value -432,
> > which isn't representable in an unsigned long.

> Indeed the conversion should fail in my mind.

> I test the above code using Boost::lexical_cast, it does
> report the fail by throwing bad_lexical_cast exception:

I suspect that boost::lexical_cast doesn't do anything special,
but simply reacts to errors detected in the stringstream it
uses. So whether you get bad_lexical_cast will depend on the
compiler: of the four libraries I have access to, two generate
an error, and two don't (but one that doesn't misses a lot of
other errors as well).

> #include <boost/lexical_cast.hpp>
> #include <iostream>

> int main()
> {
>
> using boost::lexical_cast;
> using boost::bad_lexical_cast;

> try {
> std::string str("-232");
> unsigned long i = lexical_cast<unsigned long>(str);
> }
> catch(bad_lexical_cast& e) {
> std::cout << e.what() << std::endl;;
> }
> }

> $ ./blc
> bad lexical cast: source type value could not be interpreted as target

For which compiler? Which standard library?

--
James Kanze (GABI Software) email:james...@gmail.com

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Alf P. Steinbach

Oct 31, 2006, 11:06:38 AM
* James Kanze, about stream extractor for unsigned value:

>
> As it happens, however, the C standard defines the semantics of
> the conversion "as if" strtoul was used. And strtoul defines a
> conversion of "-432" to unsigned long; the results must be equal
> to ULONG_MAX+1-432. I'd missed this little detail before, but
> it would seem that if we take the standard literally, the
> conversion is well defined, with well defined results, and an
> implementation which reports an error is not conformant.

Yes, I saw this in the discussion for active issue 23 (ref [1]).

But surely it never was the intention that a C++ standard library
stream, providing strong typing for novices, should successfully produce
ULONG_MAX+1-432 for input "-432"?

If it was then the streams are IMO even more than triple-ungood, not
just inadvertently complex, cryptic, error-prone, limited, and
inefficient, but by design delivering unusable, unexpected results...


Notes:
[1] <url: http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#23>

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?


Alberto Ganesh Barbati

Oct 31, 2006, 2:21:29 PM
James Kanze wrote:

> Alberto Ganesh Barbati wrote:
>> This is not as stupid as it may seem, because if strtoul were
>> used to make the conversion then the result would always be
>> non-negative. In fact §7.20.1.4/5 says: "If the subject sequence begins
>> with a minus sign, the value resulting from the conversion is negated
>> (in the return type)." However, as footnote 228 suggests and despite the
>> references to strtol-like functions in §7.19.6.2/12, fscanf is not
>> required to perform the %lu conversion through strtoul.
>
> I think that the issue is not 100% clear. I suspect that the
> intent of the description in the C standard is that *if* strtoul
> returns the result of a conversion, that is what is used, but if
> it returns a fixed value as a result of an error, the
> implementation is free to do something else. But I'm not sure:
> it could also mean that fscanf should always have the same
> results as strtoul (i.e. ULONG_MAX in case of overflow). Or
> perhaps something vaguer, and the reference to strtoul is just
> to not have to repeat how the "correct" value is established.

A question just crossed my mind. I probably already know the answer,
but... here it is: why is %lu allowed to parse a negative number in the
first place? If the first non-whitespace char is '-' then we could just
stop immediately and return failure. IMHO, that would be better than
trying to parse a negative number and then causing undefined behavior
because it won't fit the destination variable. It would also fit the
I/O attitude of scanf-like functions. For strtoul, it makes more sense
to parse negative numbers, because the application has the whole string:
it doesn't need to extract it one char at a time, with the potential
need to push an offending char back into the stream.
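A sketch of what stopping at the sign could look like at the user
level, wrapped around the normal extractor (the helper name is
invented for illustration):

#include <istream>

// Skip whitespace, fail immediately on a leading '-', and otherwise
// let the normal extractor do the work.
std::istream& get_unsigned( std::istream& in, unsigned long& out )
{
    if( (in >> std::ws).peek() == '-' )
        in.setstate( std::ios::failbit );
    else
        in >> out;
    return in;
}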

> This also supports the idea that it is intentional that -432 be
> required to result in the value modulo, and not an error. It
> still leaves open (IMHO, and both in C and in C++) the question
> if the target is an unsigned short. The result of converting
> "-432" using strtoul on my machines is either 4294966864 or
> 18446744073709551184, neither of which fits into an unsigned
> short. Which means that we should have undefined behavior. But
> it seems inconsistent to accept -432 for unsigned long, but have
> it undefined behavior for unsigned short.

Good point. If we take the statement "If the subject sequence
begins with a minus sign, the value resulting from the conversion
is negated (in the return type)." to the letter, then we would
first load 432 into an unsigned short and then negate the result.
However, it's just speculation... the standard definitely isn't
clear enough.

I feel very uncomfortable knowing that every time I read an unsigned
integer from a user-provided stream I risk undefined behavior. Of course
I would prefer the behavior to always be defined, but at least I think
it's reasonable to expect it to be implementation-defined or
unspecified. Don't you think?

Ganesh

James Kanze

Oct 31, 2006, 2:20:19 PM
Alf P. Steinbach wrote:
> * James Kanze, about stream extractor for unsigned value:

> > As it happens, however, the C standard defines the semantics of
> > the conversion "as if" strtoul was used. And strtoul defines a
> > conversion of "-432" to unsigned long; the results must be equal
> > to ULONG_MAX+1-432. I'd missed this little detail before, but
> > it would seem that if we take the standard literally, the
> > conversion is well defined, with well defined results, and an
> > implementation which reports an error is not conformant.

> Yes, I saw this in the discussion for active issue 23 (ref [1]).

> But surely it never was the intention that a C++ standard library
> stream, providing strong typing for novices, should successfully produce
> ULONG_MAX+1-432 for input "-432"?

One would hope so, but there is also apparently an intention to
conform to C in this respect. Even when C does something
completely ridiculous like saying that overflow is undefined
behavior.

I'll admit that I don't really get it either. I'm just quoting
the standard.

> If it was then the streams are IMO even more than triple-ungood, not
> just inadvertently complex, cryptic, error-prone, limited, and
> inefficient, but by design delivering unusable, unexpected results...

I see you've never used C I/O. Speak about cryptic, error-prone
and limited.

> [1] <url: http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#23>

There's still the little awkwardness as to whether -432 is a
valid value for an unsigned. I don't think that aspect was
considered in the open issue. And as long as the semantics of
the basic conversion are defined in terms of strtoul, it's going
to be an open question.

Personally, given the semantics of unsigned types, I could very
well accept either of two resolutions:

-- The actual value of the string presented must be between 0
and std::numeric_limits<T>::max() (inclusive). This would
make unsigned work like signed for input (even if it doesn't
work the same anywhere else).

-- The actual value is reduced modulo
std::numeric_limits<T>::max()+1. In this case, negative
values would be accepted, but so would excessively large
values. This has the advantage of corresponding to the
semantics of unsigned types in C++, but I rather suspect
that it would be extremely difficult to implement.

Of course, a perhaps even simpler solution would be to modify
stage 2 in §22.2.2.1.2/3 to remove + and - from the list of
atoms if the target type is unsigned. This would at least
eliminate the ambiguity concerning negative unsigned values.

--
James Kanze (GABI Software) email:james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34



Alf P. Steinbach

Oct 31, 2006, 8:45:06 PM
* James Kanze:

> Alf P. Steinbach wrote:
>>
>> But surely it never was the intention that a C++ standard library
>> stream, providing strong typing for novices, should successfully produce
>> ULONG_MAX+1-432 for input "-432"?
>>
>> If it was then the streams are IMO even more than triple-ungood, not
>> just inadvertently complex, cryptic, error-prone, limited, and
>> inefficient, but by design delivering unusable, unexpected results...
>
> I see you've never used C I/O. Speak about cryptic, error-prone
> and limited.

When I think of C i/o I think of 'creat', the biggest mistake in the
design of original Unix (namely, "I'd spell creat with an e"), and the
slow-reader program for Deep Blue. Something bad happened during
standardization of C: the standardized functionality isn't the de facto
real & useful one. And something bad happened during standardization of
C++, but since C only allows you to inadvertently shoot yourself in the
foot, while C++ allows you to inadvertently level a whole city block,
that something bad that happened for C++ was orders of magnitude worse.

In medium-short, it seems to me that the Wanton Abstractionists and
Generalizationists won, for languages meant for serious systems-level
programming where one needs control, with the result that the
standardized i/o functionality for C is handicapped (thus, Posix) and
for C++ lobotomized while in either case not being any more portable,
and actually just as system-dependent as it would have been with a more
complete, less over-abstracted and actually useful toolset.

Just my humble opinion, of course.


[snip]


> Of course, a perhaps even simpler solution would be to modify
> stage 2 in §22.2.2.1.2/3 to remove + and - from the list of
> atoms if the target type is unsigned. This would at least
> eliminate the ambiguity concerning negative unsigned values.

That sounds like a kludge, chewing-gum and elastic bands applied to stop
the rattling in one special case, instead of a general solution.

I remember when I once jokingly remarked -- was it here? -- that we
ditch the whole standard library except the original STL parts (which
weren't even an original basis but added in quite late, IIRC).

Tellingly, and a revelation for me, most of the first responses took
that seriously instead of as a Good Joke, arguing seriously for and
against the proposition. But then, as Piet Hein remarked, one who takes
a joke only as a joke, and serious stuff only seriously, has actually
understood equally little of both. Of course, in Danish it rhymed! ;-)

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?


Jiang

Oct 31, 2006, 9:21:51 PM
James Kanze wrote:
> Jiang wrote:
> > James Kanze wrote:
> > > Jonne Lehtinen wrote:
> > > > Hello, I just stumbled upon two different behaviours with two
> > > > different compilers. Here's a sample code....
>
> > > > std::string a( "-432" );
> > > > unsigned long b = 0;
> > > > std::stringstream stream;
>
> > > > if( !(stream << a && stream >> b) ) {
> > > > // Should fail?
> > > > }
>
> > > > On g++ 4.1.2 (20061026 (Debian "prerelease")) and g++ 3.4.2 it
> > > > does indeed fail but on my school's modified g++ 4.1 (uses
> > > > stlport 5.0.1) and on Visual C++ 2005 it doesn't fail. Which
> > > > behaviour is correct?
>
> > > It's undefined behavior, but from a quality of implementation
> > > point of view
>
> > IMHO, this is implementation-defined behavior.
>
> > In 4.7(Integral conversions), the standard says:
>
> > If the destination type is signed, the value is unchanged if it
> > can be represented in the destination type (and bit-field width);
> > otherwise, the value is implementation-defined.
>
> There's no integral conversion involved. Integral conversions
> are between integral types, and do not use the standard library.
>


But in the end the standard library needs a conversion, because
we do not start with an integer, right?


> > The istream extractors use num_get, which handles
> > numeric formatting and conversion.
>
> I know. And they define the semantics by reference to fscanf in
> the C standard. And the C standard says that if the converted
> value isn't representable in the target type, the behavior is
> undefined.
>


Actually, it feels strange to me that we need the C language standard
to deduce the C++ standard's rules (although I did quote C99 in
this newsgroup for some reasons). Maybe I will post a message
about this issue in comp.std.c++.


> As it happens, however, the C standard defines the semantics of
> the conversion "as if" strtoul was used. And strtoul defines a
> conversion of "-432" to unsigned long; the results must be equal
> to ULONG_MAX+1-432. I'd missed this little detail before, but
> it would seem that if we take the standard literally, the
> conversion is well defined, with well defined results, and an
> implementation which reports an error is not conformant.
>
> I don't know for sure whether this is intentional or simply an
> accidental result of applying the general principle used to
> define signed conversions to unsigned conversions. At any rate,
> I'm not yet to the point of posting a bug report to g++ because
> they generate an error.
>


This issue is not clear to me, because the C++ standard
does not say explicitly that for this issue we should refer to
the C standard (C90, since the C++ standard predates C99).


> > > , I would expect failure. You have the value -432,
> > > which isn't representable in an unsigned long.
>
> > Indeed the conversion should fail in my mind.
>
> > I test the above code using Boost::lexical_cast, it does
> > report the fail by throwing bad_lexical_cast exception:
>
> I suspect that boost::lexical_cast doesn't do anything special,
> but simply reacts to errors detected in the stringstream it
> uses. So whether you get bad_lexical_cast will depend on the
> compiler: of the four libraries I have access to, two generate
> an error, and two don't (but one that doesn't misses a lot of
> other errors as well).
>


Correct; please see my test results below.


> > #include <boost/lexical_cast.hpp>
> > #include <iostream>
>
> > int main()
> > {
> >
> > using boost::lexical_cast;
> > using boost::bad_lexical_cast;
>
> > try {
> > std::string str("-232");
> > unsigned long i = lexical_cast<unsigned long>(str);
> > }
> > catch(bad_lexical_cast& e) {
> > std::cout << e.what() << std::endl;;
> > }
> > }
>
> > $ ./blc
> > bad lexical cast: source type value could not be interpreted as target
>
> For which compiler? Which standard library?
>

For the above boost::lexical_cast test case, the following compilers
throw the bad_lexical_cast exception.

1. GCC 4.0.3 ( ubuntu build, libstdc++.so.6 )

2. GCC 3.4.4 ( cygwin build, libstdc++-v3 cygwin port. newlib)

3. Borland c++ 5.5.1

However, the following implementations do not throw an exception,
and the cast result is exactly what you said.

1. MS vc7.1

2. MS vc8.0

3. Intel icl 9.0

4. Comeau 4.3.3 ( vc7.1 back-end + libcomo )

Undefined behavior? Implementation-defined behavior?
Or simply compiler bugs?

James Kanze

Nov 1, 2006, 11:32:43 AM
Alberto Ganesh Barbati wrote:
> James Kanze wrote:
> > Alberto Ganesh Barbati wrote:
> >> This is not as stupid as it may seem, because if strtoul were
> >> used to make the conversion then the result would always be
> >> non-negative. In fact §7.20.1.4/5 says: "If the subject sequence begins
> >> with a minus sign, the value resulting from the conversion is negated
> >> (in the return type)." However, as footnote 228 suggests and despite
> >> the references to strtol-like functions in §7.19.6.2/12, fscanf is not
> >> required to perform the %lu conversion through strtoul.

> > I think that the issue is not 100% clear. I suspect that the
> > intent of the description in the C standard is that *if* strtoul
> > returns the result of a conversion, that is what is used, but if
> > it returns a fixed value as a result of an error, the
> > implementation is free to do something else. But I'm not sure:
> > it could also mean that fscanf should always have the same
> > results as strtoul (i.e. ULONG_MAX in case of overflow). Or
> > perhaps something vaguer, and the reference to strtoul is just
> > to not have to repeat how the "correct" value is established.

> A question just crossed my mind. I probably already know the answer,
> but... here it is: why is %lu allowed to parse a negative number in the
> first place?

That's really a very good question. Logically, one would not
expect a minus sign to be part of a legal representation for an
unsigned. But... strtoul specifically allows it, and defines
what it means. And C++ doesn't have a separate list of "atoms"
for the unsigned types, and so automatically has it.

As to what is preferable, one can argue both ways: target is
unsigned, so no - sign allowed, or target is a numeric type, so
minus sign is allowed, but if the resulting value isn't
representable in the type (and it won't be if the minus sign is
present and the type is unsigned), an error occurs.

The difference in the two cases is largely in the type of error
one gets: overflow, or syntax error. And since C++ doesn't
allow distinguishing them:-)...

> If the first non-white char is '-' then we could just stop
> immediately and return failure. IMHO, it would be better than trying to
> parse a negative number and then cause undefined behavior because it
> won't fit the destination variable.

From what I understand from the defect report someone posted,
the intent of the committee is to require an error if the result
of the conversion isn't representable in the target type. We
still have the problem that as currently defined, the conversion
is the equivalent of %u in an fscanf, which means that -432
converts to a large positive value, and not a negative value.

(Again: don't shoot the messenger. I don't like it any more than
you do.)

> That fits the I/O attitude of
> scanf-like functions. For strtoul, it makes more sense to parse
> negative numbers, because the application has the whole string: it
> doesn't need to extract it one char at a time, with the potential need
> to push an offending char back in the stream.

It's still a bit weird, IMHO.

Part of the problem, I think, is that the specification for
these functions was written by compiler writers, with potential
use in their compiler in mind. And of course, when writing C or
C++, it's fairly frequent to initialize an unsigned with -1, to
set all bits to one, or to obtain the maximum value. So they
decided to support the same thing in their input conversion
routines, even if it doesn't make sense in general.

For the time being: it's just one more reason to avoid unsigned.
The standard integral type is int; just use int, and everything
will be fine (at least with most implementations).

> > This also supports the idea that it is intentional that -432 be
> > required to result in the value modulo, and not an error. It
> > still leaves open (IMHO, and both in C and in C++) the question
> > if the target is an unsigned short. The results of converting
> > "-432" using strtoul on my machines is either 4294966864 or
> > 18446744073709551184, neither of which fits into an unsigned
> > short. Which means that we should have undefined behavior. But
> > it seems inconsistent to accept -432 for unsigned long, but have
> > it undefined behavior for unsigned short.

> Good point. If we want to take to the letter the statement "If the
> subject sequence begins with a minus sign, the value resulting from
> the conversion is negated (in the return type)." then we would first
> load 432 into an unsigned short and then negate the result.

I suspect that that is the intent. If only because that's what
Dinkumware does, and Plauger was very active in the
standardization of this. However, if you do this, you can't
simply forward to the C standard, or at least not to fscanf; you
can't use strtoul directly, because it defines the results of
the conversion as negating an unsigned long, and once you've
done that, you don't have a negative value to play with.

> However,
> it's just speculation... the standard definitely isn't clear enough.

And doesn't necessarily make sense where it is clear.

> I feel very uncomfortable knowing that every time I read an unsigned
> integer from a user-provided stream I risk undefined behavior. Of course
> I would prefer the behavior to always be defined, but at least I think
> it's reasonable to expect it to be implementation-defined or
> unspecified. Don't you think?

What do I think about it? I can see absolutely no reason not to
require failbit to be set. And the corresponding error code in
errno, given that we have no other means of reliably reporting
the type of error. And I can see no logical reason for
accepting a negative value when reading an unsigned, even if
this is what C intentionally does.

(It's a shame Plauger hasn't intervened in this thread. He was
very active in the standardization of the C library, and could
doubtlessly give us vital insights into the intent and the why
of the semantics of strtoul and the fscanf definitions. Not
that I think he'd convince me to change my position with regards
to C++, but I am curious as to what they were thinking of. As
you say, the result is that practically any program which reads
user provided input has undefined behavior.)

--
James Kanze (Gabi Software) email: james...@gmail.com


Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

James Kanze

Nov 1, 2006, 11:47:14 AM
Alf P. Steinbach wrote:
> * James Kanze:
> > Alf P. Steinbach wrote:

> >> But surely it never was the intention that a C++ standard library
> >> stream, providing strong typing for novices, should successfully
> >> produce ULONG_MAX+1-432 for input "-432"?

> >> If it was then the streams are IMO even more than triple-ungood, not
> >> just inadvertently complex, cryptic, error-prone, limited, and
> >> inefficient, but by design delivering unusable, unexpected results...

> > I see you've never used C I/O. Speak about cryptic, error-prone
> > and limited.

> When I think of C i/o I think of 'creat', the biggest mistake in the
> design of original Unix (namely, "I'd spell creat with an e"),

I think that was meant to be a joke. I can't imagine anyone
being that arrogant.

> and the
> slow-reader program for Deep Blue. Something bad happened during
> standardization of C: the standardized functionality isn't the de facto
> real & useful one.

Well, the de facto real and useful one was subtly different
between Unix and CP/M. Also, the de facto real and useful one
was only really usable at the lowest level, and couldn't
possibly be ported to mainframes of the day (and of today). The
problem is that at that level, you really are OS dependent.

> And something bad happened during standardization of
> C++, but since C only allows you to inadvertently shoot yourself in the
> foot, while C++ allows you to inadvertently level a whole city block,
> that something bad that happened for C++ was orders of magnitude worse.

> In medium-short, it seems to me that the Wanton Abstractionists and
> Generalizationists won, for languages meant for serious systems-level
> programming where one needs control, with the result that the
> standardized i/o functionality for C is handicapped (thus, Posix) and
> for C++ lobotomized while in either case not being any more portable,
> and actually just as system-dependent as it would have been with a more
> complete, less over-abstracted and actually useful toolset.

> Just my humble opinion, of course.

I'd partially agree.

The problem is that one size doesn't fit all, and trying to make
all possible I/O abstractions fit into one model is doomed to
failure. The C++ I/O streams are an excellent abstraction for
streamed text input and output, but the functionality necessary
for things like seeking, bi-directional I/O, binary I/O, etc. is
a hack, more tacked on than part of the basic abstraction, and
not very easily used.

Posix is NOT an alternative, even on Posix systems. Posix is
(intentionally) a much, much lower level abstraction. It is
something to build on---when I need an I/O abstraction other
than streamed text, I write my own, building on Posix. (But of
course, I don't have to be 100% portable. As far as my customer
is concerned, if it runs under Solaris, it's fine, and if it
also runs under Linux, we can consider it "portable".)

> [snip]
> > Of course, a perhaps even simpler solution would be to modify
> > stage 2 in §22.2.2.1.2/3 to remove + and - from the list of
> > atoms if the target type is unsigned. This would at least
> > eliminate the ambiguity concerning negative unsigned values.

> That sounds like a kludge, chewing-gum and elastic bands applied to stop
> the rattling in one special case, instead of a general solution.

It is a kludge. (Although I think you also suggested, in
another posting, that we shouldn't accept a - sign if we were
inputting to an unsigned type.) However, if we suppose the TR
results in requiring failbit to be set if the converted value
isn't representable, it sounds like the smallest modification of
the standard which would define a reasonable behavior for
unsigned values. And I don't think that the committee is in a
mood to rewrite all of the specifications for numeric
conversions for the next release of the standard. (The goal is
for it to be C++0x, and given the bookkeeping overhead necessary
between the final draft and the standard itself, I think this
means only two or three more meetings to finalize all of the
wording. And a lot of very interesting and useful proposals are
still very far from a first draft of the wording in the
standard.)

> I remember when I once jokingly remarked -- was it here? -- that we
> ditch the whole standard library except the original STL parts (which
> weren't even an original basis but added in quite late, IIRC).

Why keep the STL parts? They don't work very well in
application code, and are certainly more poorly designed than
iostream.

I think that there is a problem in general with the library, in
that it does try to be all things to all people. The more I
think of it, the more I think that maybe we need several
different standard libraries (in different namespaces?) for
different levels: what is good for an applications programmer
isn't necessarily good for someone working at the lower levels.
Having a general facility for streamed text I/O, for example, is
considered almost essential for an application programmer, but
that facility doesn't lend itself at all to being a base for
building up other abstractions.

--
James Kanze (Gabi Software) email: james...@gmail.com


Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34



James Kanze

Nov 1, 2006, 12:32:37 PM

This is perhaps a bit confusing, because the word "conversion"
is being used for two things that are technically very
different. The standard library must implement a conversion of
text to a numerical value. There is no such conversion in the
language proper. §4.7 discusses conversions in the language
itself, and not those implemented by the library. And not all
of those: the very first words in the section are "An rvalue of
an integer type can be converted to an rvalue of another integer
type. An rvalue of an enumeration type can be converted to an
rvalue of an integer type." And those are the conversions which
are meant by "Integral conversions". There is no integral
conversion here; the standard also specifies (in various library
sections, and by references to the C library) how a string of
text is converted (at run-time) to an integral type, but
integral conversions have nothing to do with this, except in so
far as the standard says they do.

> > > The istream extractors use num_get, which handles
> > > numeric formatting and conversion.

> > I know. And they define the semantics by reference to fscanf in
> > the C standard. And the C standard says that if the converted
> > value isn't representable in the target type, the behavior is
> > undefined.

> Actually, it feels strange to me that we need the C language standard
> to deduce the C++ standard's rules (although I did quote C99 in
> this newsgroup for some reasons). Maybe I will post a message
> about this issue in comp.std.c++.

The reason is simple. It was definitely the intent of the
committee that the results of such conversions be the same in C
and in C++. And what better way to ensure this than by referring
to the C standard in C++.

> > As it happens, however, the C standard defines the semantics of
> > the conversion "as if" strtoul was used. And strtoul defines a
> > conversion of "-432" to unsigned long; the results must be equal
> > to ULONG_MAX+1-432. I'd missed this little detail before, but
> > it would seem that if we take the standard literally, the
> > conversion is well defined, with well defined results, and an
> > implementation which reports an error is not conformant.

> > I don't know for sure whether this is intentional or simply an
> > accidental result of applying the general principle used to
> > define signed conversions to unsigned conversions. At any rate,
> > I'm not yet to the point of posting a bug report to g++ because
> > they generate an error.

> This issue is not clear to me, because the C++ standard
> does not say explicitly that for this issue we should refer to
> the C standard (C90, since the C++ standard predates C99).

Yes it does. The description of the semantics of the num_get
virtual functions, in §22.2.2.1.2, makes explicit reference to
the stdio conversion specifiers, and in stage three, it says "A
sequence of chars has been accumulated in stage 2 that is
converted (according to the rules of scanf) to a value of the
type of val [...] The sequence of chars accumulated in stage 2
would have caused scanf to report an input failure." The
semantics of the conversion are defined by what scanf would have
done.

[...]


> For the above boost::lexical_cast test case, the following compilers
> throw the bad_lexical_cast exception.

> 1. GCC 4.0.3 ( ubuntu build, libstdc++.so.6 )

> 2. GCC 3.4.4 ( cygwin build, libstdc++-v3 cygwin port. newlib)

> 3. Borland c++ 5.5.1

> However, the following implementations do not throw an exception,
> and the cast result is exactly what you said.

> 1. MS vc7.1

> 2. MS vc8.0

> 3. Intel icl 9.0

> 4. Comeau 4.3.3 ( vc7.1 back-end + libcomo )

> Undefined behavior? Implementation-defined behavior?
> Or simply compiler bugs?

As currently worded, I don't think an error is allowed in this
specific case. I'm far from sure that this was the intent,
however, and I, personally, would prefer the error.

--
James Kanze (Gabi Software) email: james...@gmail.com


Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
