floating issue/idea: istream >> x "fail" case suppresses eof

62 views
Skip to first unread message

antho...@gmail.com

unread,
Feb 25, 2015, 11:29:37 PM2/25/15
to std-pr...@isocpp.org
Hello,

This is a "Float the idea" post per the https://isocpp.org/std/submit-a-proposal steps, though my idea may break old code so I'm currently just seeking to see if I'm missing something, there's some existing proposal in this space, interest levels, and hopefully a better idea!

The task: read numbers from a stream, print error message if there's bad input

Commonly recommended approach:

    int n;
    while (stream >> n)
        ;
    if (!stream && !stream.eof())
        std::cerr << "choked on bad input\n";

Failing case: stream contains say "-", at least on the GCC compiler at ideone.com the stream >> n operation sets both failbit and eofbit, as '-' is consumed as a potentially valid part of a number that never materialises.

Available solution: you can use a controlling while (!(stream >> std::ws).eof) then attempt stream >> n inside the loop, but that frustrates the current ability to concisely express more complex multi-variable input operations as chains of >> and short-circuited &&s with getline() and tests of just-read variables.  Exceptions might be a valid solution here too, but IMHO it's clearly undesirable that there not be an exception-free alternative.

(Doomed) Proposal: the while (stream >> n) approach above could be made more robust if the standard-library-provided overloads set only the failbit in these circumstances, and left eof to be reported only if another input operation is attempted.

Then for user-defined overloads, it'd be nice if beginners could be taught something simple like:

    struct X { int a_, b_; };

    std::istream& operator>>(std::istream& is, const X& x)
    {
        return is >> x.a_ >> x.b_;
    }

But this may also leave the stream in a fail+eof condition - e.g. istringstream("12") - where the caller can't differentiate "a" parsed successfully but "b" failed from a simple eof for an empty/whitespace stream.

Correct code - assuming the standard library's overload for int is adjusted as proposed above - might be:

    std::istream& operator>>(std::istream& is, const X& x)
    {
        if (is >> x.a_)
            if (!(is >> x.b_))
            {
               is.clear(); // in case eof was set
               is.setstate(std::ios_base::failbit); // "just" fail
            }
        return is;
    }

Perhaps an iomanipulator could help here:

    std::istream& operator>>(std::istream& is, const X& x)
    {
        return is >> x.a_ >> x.b_ >> if_fail_clear_eof;
    }

And the doom and gloom: this would be a breaking change, so is probably a non-started.  For example, someone dead-certain that their input was a string of at least one space-separated numbers (with no trailing whitespace), might currently get away with:

    while (!stream.eof()) { stream >> n; use(n); }

With the proposal above, the eof() wouldn't become true immediately after streaming the last number.

Perhaps someone will have a better idea.  It bothers me that the common, intuitive, expressive and concise approach can fail to notice parsing errors....

Cheers,
Tony

David Krauss

unread,
Feb 26, 2015, 12:18:13 AM2/26/15
to std-pr...@isocpp.org
On 2015–02–26, at 12:29 PM, antho...@gmail.com wrote:

Hello,

This is a "Float the idea" post per the https://isocpp.org/std/submit-a-proposal steps, though my idea may break old code so I'm currently just seeking to see if I'm missing something, there's some existing proposal in this space, interest levels, and hopefully a better idea!

The task: read numbers from a stream, print error message if there's bad input

Commonly recommended approach:

    int n;
    while (stream >> n)
        ;
    if (!stream && !stream.eof())
        std::cerr << "choked on bad input\n”;

By whom? The most common recommendation I recall on this topic is, “eof() probably isn’t what you want.”

To check for a failure due to bad input or parsing, use fail(). (And to check for an I/O error, use bad().) To check that the stream is emptied without error in general, use rdstate() == std::ios::eofbit.

Failing case: stream contains say "-", at least on the GCC compiler at ideone.com the stream >> n operation sets both failbit and eofbit, as '-' is consumed as a potentially valid part of a number that never materialises.

Unless you recognize the failure and seek the stream back before the dash (or somehow unget it), the condition is irrecoverable.

Available solution: you can use a controlling while (!(stream >> std::ws).eof)

In a loop, checking eof() is especially not to be encouraged.

(Doomed) Proposal: the while (stream >> n) approach above could be made more robust if the standard-library-provided overloads set only the failbit in these circumstances, and left eof to be reported only if another input operation is attempted.

Perhaps iostreams would be more robust if failed operations didn’t consume characters. It’s not particularly robust, or fast. Maybe it can be replaced some day with something better.

But what robustness is added by clearing eof when no input remains? Any attempt to recover and re-read will just set eof() again.

    while (!stream.eof()) { stream >> n; use(n); }

This is exactly what you should never do.

With the proposal above, the eof() wouldn't become true immediately after streaming the last number.

Perhaps someone will have a better idea.  It bothers me that the common, intuitive, expressive and concise approach can fail to notice parsing errors….

Try this:

while (stream) { stream >> n; use(n); }
if (stream.fail()) std::cerr << "choked on bad input\n";
if (stream.bad()) std::cerr << "could not read the input\n";

Or more safely and lazily:

stream.exceptions( std::ios::failbit | std::ios::badbit );
while (stream) { stream >> n; use(n); }

Perhaps more folks would use automatic stream exceptions if they were more friendly. Here are some ideas:

1. Specify or recommend better default error messages for uncaught ios_base::failure exceptions.
2. Add a hook to ios_base::failure to add a user-visible file or stream name, let the standard streams initialize it in an unspecified way.
3. Create a sentry guard so you can set exceptions() at the beginning of a try block, and have it restored when it exits.

antho...@gmail.com

unread,
Feb 26, 2015, 2:26:19 AM2/26/15
to std-pr...@isocpp.org


On Thursday, February 26, 2015 at 2:18:13 PM UTC+9, David Krauss wrote:

On 2015–02–26, at 12:29 PM, antho...@gmail.com wrote:

Hello,

This is a "Float the idea" post per the https://isocpp.org/std/submit-a-proposal steps, though my idea may break old code so I'm currently just seeking to see if I'm missing something, there's some existing proposal in this space, interest levels, and hopefully a better idea!

The task: read numbers from a stream, print error message if there's bad input

Commonly recommended approach:

    int n;
    while (stream >> n)
        ;
    if (!stream && !stream.eof())
        std::cerr << "choked on bad input\n”;

By whom? The most common recommendation I recall on this topic is, “eof() probably isn’t what you want.”

To check for a failure due to bad input or parsing, use fail(). (And to check for an I/O error, use bad().) To check that the stream is emptied without error in general, use rdstate() == std::ios::eofbit.
 
The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input.
 
Failing case: stream contains say "-", at least on the GCC compiler at ideone.com the stream >> n operation sets both failbit and eofbit, as '-' is consumed as a potentially valid part of a number that never materialises.

Unless you recognize the failure and seek the stream back before the dash (or somehow unget it), the condition is irrecoverable.

Exactly.  My point is: isn't that undesirable?  Shouldn't the library make it easy to see why the loop terminated?
Available solution: you can use a controlling while (!(stream >> std::ws).eof)

In a loop, checking eof() is especially not to be encouraged.

 Agreed - it's in desperation that coders are forced back to this ugly and error-prone mess.  Hence my original post.
(Doomed) Proposal: the while (stream >> n) approach above could be made more robust if the standard-library-provided overloads set only the failbit in these circumstances, and left eof to be reported only if another input operation is attempted.

Perhaps iostreams would be more robust if failed operations didn’t consume characters. It’s not particularly robust, or fast. Maybe it can be replaced some day with something better.

But what robustness is added by clearing eof when no input remains? Any attempt to recover and re-read will just set eof() again.

The robustness improvement comes from differentiating a stream of good input from a stream terminated by bad (likely truncated in the case of "-") input.  It's not about being able to get other input from the stream afterwards; rather, one might print an error and terminate until the user fixes the input instead of presenting results calculated from bogus/incomplete input.
 
    while (!stream.eof()) { stream >> n; use(n); }

This is exactly what you should never do.

That's not a particularly useful perspective.  I've presented it only to illustrate currently working (but extremely fragile) code that would be broken by this proposal.  Sure we can deride anyone silly enough to write code like this (I certainly would in a code review), but it's still important for me to acknowledge and illustrate how the proposal is a breaking change.
With the proposal above, the eof() wouldn't become true immediately after streaming the last number.

Perhaps someone will have a better idea.  It bothers me that the common, intuitive, expressive and concise approach can fail to notice parsing errors….

Try this:

while (stream) { stream >> n; use(n); }
if (stream.fail()) std::cerr << "choked on bad input\n";
if (stream.bad()) std::cerr << "could not read the input\n";

 
Tried here: http://ideone.com/BpKisk

Result: no warning for istringstream("2 3 -");
 
Or more safely and lazily:

stream.exceptions( std::ios::failbit | std::ios::badbit );
while (stream) { stream >> n; use(n); }
 
That doesn't seem to work either - same terminating stream state for "3 4 -" and "3 4" - see http://ideone.com/UYKQiS
 
Perhaps more folks would use automatic stream exceptions if they were more friendly. Here are some ideas:

1. Specify or recommend better default error messages for uncaught ios_base::failure exceptions.
2. Add a hook to ios_base::failure to add a user-visible file or stream name, let the standard streams initialize it in an unspecified way.
3. Create a sentry guard so you can set exceptions() at the beginning of a try block, and have it restored when it exits.

Who could argue with better error messages?  Scope guards sound good too - but easily implemented outside the Standard.  For 2 - if the exception's caught in the scope containing the stream then it's not necessary, and if not the stream's already destroyed.

Thanks for the input.

David Krauss

unread,
Feb 26, 2015, 3:19:29 AM2/26/15
to std-pr...@isocpp.org
On 2015–02–26, at 3:26 PM, antho...@gmail.com wrote:

The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input.

There was bad input, the dash character. And your proposal is that eofbit sometimes not be set at the EOF, not that failbit not be set after a conversion failure.

If you’re concerned with conversion failures and not with eof, check failbit/fail() and not eofbit/eof().

In a loop, checking eof() is especially not to be encouraged.

 Agreed - it's in desperation that coders are forced back to this ugly and error-prone mess.  Hence my original post.

There are right answers, albeit buried among wrong answers. That’s the way iostreams is, but this proposal seems to be shaking things up without making correct code easier overall.

Again, though, eof() is seldom the right solution.

The robustness improvement comes from differentiating a stream of good input from a stream terminated by bad (likely truncated in the case of "-") input.  It's not about being able to get other input from the stream afterwards; rather, one might print an error and terminate until the user fixes the input instead of presenting results calculated from bogus/incomplete input.

Streams with bad input are differentiated by failbit. Period.

    while (!stream.eof()) { stream >> n; use(n); }

This is exactly what you should never do.

That's not a particularly useful perspective.  I've presented it only to illustrate currently working (but extremely fragile) code that would be broken by this proposal.  Sure we can deride anyone silly enough to write code like this (I certainly would in a code review), but it's still important for me to acknowledge and illustrate how the proposal is a breaking change.

Okay, but it looks similar to your first example, in that eof is being used as a proxy for something else.

In this case, any conversion error leads to an infinite loop, which is extra bad.

Tried here: http://ideone.com/BpKisk 

Result: no warning for istringstream("2 3 -“);

The warning is printed there on the webpage under “stderr.”

Or more safely and lazily:

stream.exceptions( std::ios::failbit | std::ios::badbit );
while (stream) { stream >> n; use(n); }
 
That doesn't seem to work either - same terminating stream state for "3 4 -" and "3 4" - see http://ideone.com/UYKQiS

Ah, I forgot that converting the stream to bool only tests failbit, not eofbit. Also, you need to manually eat the trailing space if it’s going to make a difference to the loop condition.

This is correct:

stream.exceptions( std::ios::failbit | std::ios::badbit );
while (stream.good()) { stream >> n >> std::ws; use(n); }

Yes, iostreams is finicky…

Perhaps more folks would use automatic stream exceptions if they were more friendly. Here are some ideas:

1. Specify or recommend better default error messages for uncaught ios_base::failure exceptions.
2. Add a hook to ios_base::failure to add a user-visible file or stream name, let the standard streams initialize it in an unspecified way.
3. Create a sentry guard so you can set exceptions() at the beginning of a try block, and have it restored when it exits.

Who could argue with better error messages?  Scope guards sound good too - but easily implemented outside the Standard.  For 2 - if the exception's caught in the scope containing the stream then it's not necessary, and if not the stream's already destroyed.

#2 is more useful for proper I/O, not stringstreams. It can help a program generate a somewhat sensible error message for the user: “Expected a number, got ‘z’ in HTTP POST input”.

Point is, if the what() strings were more sensible, there would be less need to process the exceptions at all. Either generically catch after aborting and print exc.what(), or don’t catch and let the program terminate, still getting a reasonable diagnostic log entry.

Jean-Marc Bourguet

unread,
Feb 26, 2015, 3:28:59 AM2/26/15
to std-pr...@isocpp.org, antho...@gmail.com
Le jeudi 26 février 2015 08:26:19 UTC+1, antho...@gmail.com a écrit :


On Thursday, February 26, 2015 at 2:18:13 PM UTC+9, David Krauss wrote:

On 2015–02–26, at 12:29 PM, antho...@gmail.com wrote:

Hello,

This is a "Float the idea" post per the https://isocpp.org/std/submit-a-proposal steps, though my idea may break old code so I'm currently just seeking to see if I'm missing something, there's some existing proposal in this space, interest levels, and hopefully a better idea!

The task: read numbers from a stream, print error message if there's bad input

Commonly recommended approach:

    int n;
    while (stream >> n)
        ;
    if (!stream && !stream.eof())
        std::cerr << "choked on bad input\n”;

By whom? The most common recommendation I recall on this topic is, “eof() probably isn’t what you want.”

To check for a failure due to bad input or parsing, use fail(). (And to check for an I/O error, use bad().) To check that the stream is emptied without error in general, use rdstate() == std::ios::eofbit.
 
The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input.

You tried to input something, that input failed.  There is indeed no difference between failing due to missing data and bad data with >> if you automatically skip over the whitespace. If you want to do so, you'll have to skip the white space manually (and disable the possibility to skip automatically, ws is using the sentry which set the error)

    stream >> std::noskipws;
    while (stream >> std::ws && !stream.eof() && stream >> n) {
        std::cout << "got " << n << '\n';
    }
    if (stream.bad()) std::cout << "IO error\n";
    if (stream.fail()) std::cout << "Bad format\n";

Usually, the distinction doesn't worth the pain (maybe because I share with the IOStream designers an Unix bias, Unix considers `\n` as line terminator so non terminated last lines are oddities which can be ignored when IOStream formatted input is an adequate mechanism). When I use formatted IOstream input, I use this kind of loop:

    while (stream >> n) {
        std::cout << "got " << n << '\n';
    }
    if (stream.bad()) std::cout << "IO error\n";
    if (!stream.eof()) std::cout << "Bad format\n";

 
Failing case: stream contains say "-", at least on the GCC compiler at ideone.com the stream >> n operation sets both failbit and eofbit, as '-' is consumed as a potentially valid part of a number that never materialises.

Unless you recognize the failure and seek the stream back before the dash (or somehow unget it), the condition is irrecoverable.

Exactly.  My point is: isn't that undesirable?  Shouldn't the library make it easy to see why the loop terminated?

If you go to the point where error recovery is needed, IOStream formatted input is probably no more an adequate solution.


Try this:

while (stream) { stream >> n; use(n); }

That's wrong: never use the result of an IO operation without being sure that the IO operation succeeded.

Yours,

-- 
Jean-Marc

David Krauss

unread,
Feb 26, 2015, 3:34:30 AM2/26/15
to std-pr...@isocpp.org
On 2015–02–26, at 4:28 PM, Jean-Marc Bourguet <jm.bo...@gmail.com> wrote:

while (stream) { stream >> n; use(n); }

That's wrong: never use the result of an IO operation without being sure that the IO operation succeeded.

Yikes, sorry, missed that when editing the example.

Failure conditions are much better handled as exceptions IMHO.

Nevin Liber

unread,
Feb 26, 2015, 3:36:57 AM2/26/15
to std-pr...@isocpp.org
On 26 February 2015 at 09:34, David Krauss <pot...@gmail.com> wrote:
Failure conditions are much better handled as exceptions IMHO.

What about logging from destructors? 
--
 Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com(847) 691-1404

Jean-Marc Bourguet

unread,
Feb 26, 2015, 3:46:59 AM2/26/15
to std-pr...@isocpp.org
Le jeudi 26 février 2015 09:19:29 UTC+1, David Krauss a écrit :

This is correct:

Well ;-) 


stream.exceptions( std::ios::failbit | std::ios::badbit );
while (stream.good()) { stream >> n >> std::ws; use(n); }

Yes, iostreams is finicky…

Tricky indeed.  You have to remove the possibility to skip whitespace automatically as std::ws is using the sentry which will first skip and then set the failbit at eof and thus throw an exception before ws skip without setting failbit.

stream >> std::noskipws;
stream.exceptions( std::ios::failbit | std::ios::badbit );
while (stream >> std::ws >> n) { use(n); }

seems good.

-- 
Jean-Marc

Jean-Marc Bourguet

unread,
Feb 26, 2015, 3:49:39 AM2/26/15
to std-pr...@isocpp.org


Le jeudi 26 février 2015 09:36:57 UTC+1, Nevin ":-)" Liber a écrit :
On 26 February 2015 at 09:34, David Krauss <pot...@gmail.com> wrote:
Failure conditions are much better handled as exceptions IMHO.

What about logging from destructors? 


Wrap the destructor in a try {} catch block? (Logging rarely ask for input parsing BTW, and the error condition for output are even better handled by exception than those for input, they are order of magnitude rarer and more difficult to test).

Yours,

-- 
Jean-Marc

David Krauss

unread,
Feb 26, 2015, 3:54:05 AM2/26/15
to std-pr...@isocpp.org
On 2015–02–26, at 4:36 PM, Nevin Liber <ne...@eviloverlord.com> wrote:

On 26 February 2015 at 09:34, David Krauss <pot...@gmail.com> wrote:
Failure conditions are much better handled as exceptions IMHO.

What about logging from destructors? 

What about it? If you mean that it’s dangerous for std::clog << foo to throw, then just don’t set clog.exceptions!

On the other hand, if you have a transaction object which wishes to log something from its destructor, then it might be easier to reliably detect the failure during unwinding, as opposed to upon exiting the non-exceptional scope with a neutered stream and a bunch of uninitialized variables.

Nevin Liber

unread,
Feb 26, 2015, 4:04:57 AM2/26/15
to std-pr...@isocpp.org
It is a truly horrible idea to encourage people to use exceptions in libraries that are likely to be called from destructors.  Wrapping everything in a destructor in a try..catch block is a usability nightmare.  And making error handling inconsistent input vs. output for the same library (it is supposed to be one combined library) would be a strictly worse choice.

Might as well just go back to using printf/scanf...

David Krauss

unread,
Feb 26, 2015, 4:06:45 AM2/26/15
to std-pr...@isocpp.org
On 2015–02–26, at 4:46 PM, Jean-Marc Bourguet <jm.bo...@gmail.com> wrote:

Tricky indeed.  You have to remove the possibility to skip whitespace automatically as std::ws is using the sentry which will first skip and then set the failbit at eof and thus throw an exception before ws skip without setting failbit.

stream >> std::noskipws;
stream.exceptions( std::ios::failbit | std::ios::badbit );
while (stream >> std::ws >> n) { use(n); }

seems good.

What if there’s whitespace after the number; is that also failure?

It looks like the manual whitespace handling in this example is only doing the same thing as the sentry would have.

How about:

while ( (stream >> std::ws).good() && stream >> n ) use( n );

This loop will exit upon EOF, with stream == true, or failure, with stream == false.

Or, how about not using iostreams where tolerance and precise distinctions are needed ;P .

Jean-Marc Bourguet

unread,
Feb 26, 2015, 4:13:38 AM2/26/15
to std-pr...@isocpp.org
Le jeudi 26 février 2015 10:06:45 UTC+1, David Krauss a écrit :
How about:

while ( (stream >> std::ws).good() && stream >> n ) use( n );

This loop will exit upon EOF, with stream == true, or failure, with stream == false.

Or, how about not using iostreams where tolerance and precise distinctions are needed ;P .
 
We agree for sure on that.

-- 
Jean-Marc

David Krauss

unread,
Feb 26, 2015, 4:16:48 AM2/26/15
to std-pr...@isocpp.org
On 2015–02–26, at 5:04 PM, Nevin Liber <ne...@eviloverlord.com> wrote:

It is a truly horrible idea to encourage people to use exceptions in libraries that are likely to be called from destructors.

What are you saying? Not to use iostreams from destructors? Not to throw exceptions on failure conditions?

In custom stream inserters and extractors, just use stream.clear(std::ios::failbit), which throws according to user preferences. I’m not suggesting for users to invent or throw their own iostreams exceptions. Extensibility within ios::failure would be nice though.

Wrapping everything in a destructor in a try..catch block is a usability nightmare.  And making error handling inconsistent input vs. output for the same library (it is supposed to be one combined library) would be a strictly worse choice.

It’s not a library-wide decision. You choose which streams throw and which don’t.

antho...@gmail.com

unread,
Feb 26, 2015, 4:31:48 AM2/26/15
to std-pr...@isocpp.org, antho...@gmail.com


On Thursday, February 26, 2015 at 5:28:59 PM UTC+9, Jean-Marc Bourguet wrote:
Le jeudi 26 février 2015 08:26:19 UTC+1, antho...@gmail.com a écrit :


On Thursday, February 26, 2015 at 2:18:13 PM UTC+9, David Krauss wrote:

On 2015–02–26, at 12:29 PM, antho...@gmail.com wrote:

Hello,

This is a "Float the idea" post per the https://isocpp.org/std/submit-a-proposal steps, though my idea may break old code so I'm currently just seeking to see if I'm missing something, there's some existing proposal in this space, interest levels, and hopefully a better idea!

The task: read numbers from a stream, print error message if there's bad input

Commonly recommended approach:

    int n;
    while (stream >> n)
        ;
    if (!stream && !stream.eof())
        std::cerr << "choked on bad input\n”;

By whom? The most common recommendation I recall on this topic is, “eof() probably isn’t what you want.”

To check for a failure due to bad input or parsing, use fail(). (And to check for an I/O error, use bad().) To check that the stream is emptied without error in general, use rdstate() == std::ios::eofbit.
 
The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input.

You tried to input something, that input failed.  There is indeed no difference between failing due to missing data and bad data with >> if you automatically skip over the whitespace. If you want to do so, you'll have to skip the white space manually (and disable the possibility to skip automatically, ws is using the sentry which set the error)

    stream >> std::noskipws;
    while (stream >> std::ws && !stream.eof() && stream >> n) {
        std::cout << "got " << n << '\n';
    }
    if (stream.bad()) std::cout << "IO error\n";
    if (stream.fail()) std::cout << "Bad format\n";

It's the need for such code that precipitated my post.  I don't think C++ newbies should be expected to write such code in order to do something as simple as parse numbers from a stream robustly.
 
Usually, the distinction doesn't worth the pain (maybe because I share with the IOStream designers an Unix bias, Unix considers `\n` as line terminator so non terminated last lines are oddities which can be ignored when IOStream formatted input is an adequate mechanism).

Same background here, but if istreams were necessarily files (and everyone used sane editors) it would be less of a concern, but std::istringstreams initialised from string literals, network input etc. are quite likely not to be whitespace terminated.
 
When I use formatted IOstream input, I use this kind of loop:

    while (stream >> n) {
        std::cout << "got " << n << '\n';
    }
    if (stream.bad()) std::cout << "IO error\n";
    if (!stream.eof()) std::cout << "Bad format\n";

Yup - I've tended to settle for that too - would be nice if it handled this "bad input at end" corner-case.
 
Failing case: stream contains say "-", at least on the GCC compiler at ideone.com the stream >> n operation sets both failbit and eofbit, as '-' is consumed as a potentially valid part of a number that never materialises.

Unless you recognize the failure and seek the stream back before the dash (or somehow unget it), the condition is irrecoverable.

Exactly.  My point is: isn't that undesirable?  Shouldn't the library make it easy to see why the loop terminated?

If you go to the point where error recovery is needed, IOStream formatted input is probably no more an adequate solution.

Currently yes, but could a proposal like this help rectify the situation?
 

Try this:

while (stream) { stream >> n; use(n); }

That's wrong: never use the result of an IO operation without being sure that the IO operation succeeded.

Yours,

-- 
Jean-Marc

Cheers,
Tony

antho...@gmail.com

unread,
Feb 26, 2015, 4:44:13 AM2/26/15
to std-pr...@isocpp.org


On Thursday, February 26, 2015 at 5:19:29 PM UTC+9, David Krauss wrote:

On 2015–02–26, at 3:26 PM, antho...@gmail.com wrote:

The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input.

There was bad input, the dash character. And your proposal is that eofbit sometimes not be set at the EOF, not that failbit not be set after a conversion failure.

What I mean by [[ "The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input." ]] is that even when there's a simple eof condition (no longer talking about a hyphen), finding it while attempting stream >> n sets both failbit and eofbit... after >> it's too late to differentiate.
 
If you’re concerned with conversion failures and not with eof, check failbit/fail() and not eofbit/eof().

We seem to be going round in circles.  As above, that doesn't work because both bits are set for both an eof and a bad-input condition, given a loop controlled by stream >> n.
In a loop, checking eof() is especially not to be encouraged.

 Agreed - it's in desperation that coders are forced back to this ugly and error-prone mess.  Hence my original post.

There are right answers, albeit buried among wrong answers. That’s the way iostreams is, but this proposal seems to be shaking things up without making correct code easier overall.

How is changing standard behaviour to make the following "work" as might be naively expected not easier overall?

    while (stream >> n)
          ;
    if (stream.fail() && !stream.eof())
        ...report bad input...
 
Again, though, eof() is seldom the right solution.

I'm not suggesting eof() be used... my proposal is to remove the need to use it to handle this situation.
The robustness improvement comes from differentiating a stream of good input from a stream terminated by bad (likely truncated in the case of "-") input.  It's not about being able to get other input from the stream afterwards; rather, one might print an error and terminate until the user fixes the input instead of presenting results calculated from bogus/incomplete input.

Streams with bad input are differentiated by failbit. Period.

Again, the problem is failbit also happens when you try a final stream >> n when there's nothing (but possibly whitespace) left in the stream.
    while (!stream.eof()) { stream >> n; use(n); }

This is exactly what you should never do.

That's not a particularly useful perspective.  I've presented it only to illustrate currently working (but extremely fragile) code that would be broken by this proposal.  Sure we can deride anyone silly enough to write code like this (I certainly would in a code review), but it's still important for me to acknowledge and illustrate how the proposal is a breaking change.

Okay, but it looks similar to your first example, in that eof is being used as a proxy for something else.

In this case, any conversion error leads to an infinite loop, which is extra bad.

It's hideous, but if the programmer trusts their inputs (e.g. they've generated them elsewhere in the program), they may not care.  I'd still be breaking working core irrespective of how misguided writing that code was.
Tried here: http://ideone.com/BpKisk 

Result: no warning for istringstream("2 3 -“);

The warning is printed there on the webpage under “stderr.”

Oh yup - sorry - every month or three that website throws me doing that.  Again, the problem is that the same message is reported for an end-of-file condition - as demonstrated at http://ideone.com/E1ajY6
Or more safely and lazily:

stream.exceptions( std::ios::failbit | std::ios::badbit );
while (stream) { stream >> n; use(n); }
 
That doesn't seem to work either - same terminating stream state for "3 4 -" and "3 4" - see http://ideone.com/UYKQiS

Ah, I forgot that converting the stream to bool only tests failbit, not eofbit. Also, you need to manually eat the trailing space if it’s going to make a difference to the loop condition.

This is correct:

stream.exceptions( std::ios::failbit | std::ios::badbit );
while (stream.good()) { stream >> n >> std::ws; use(n); }

Yes, iostreams is finicky…

Agreed, so - can it be less so with minimal changes - like those proposed?
Perhaps more folks would use automatic stream exceptions if they were more friendly. Here are some ideas:

1. Specify or recommend better default error messages for uncaught ios_base::failure exceptions.
2. Add a hook to ios_base::failure to add a user-visible file or stream name, let the standard streams initialize it in an unspecified way.
3. Create a sentry guard so you can set exceptions() at the beginning of a try block, and have it restored when it exits.

Who could argue with better error messages?  Scope guards sound good too - but easily implemented outside the Standard.  For 2 - if the exception's caught in the scope containing the stream then it's not necessary, and if not the stream's already destroyed.

#2 is more useful for proper I/O, not stringstreams. It can help a program generate a somewhat sensible error message for the user: “Expected a number, got ‘z’ in HTTP POST input”.

Point is, if the what() strings were more sensible, there would be less need to process the exceptions at all. Either generically catch after aborting and print exc.what(), or don’t catch and let the program terminate, still getting a reasonable diagnostic log entry.

Definitely in favour of whatever anyone can do to get better error messages, though there's the risk of making the exception-throwing stream state less performant too... lots to balance out.

David Krauss

unread,
Feb 26, 2015, 5:34:23 AM2/26/15
to std-pr...@isocpp.org
On 2015–02–26, at 5:44 PM, antho...@gmail.com wrote:



On Thursday, February 26, 2015 at 5:19:29 PM UTC+9, David Krauss wrote:

On 2015–02–26, at 3:26 PM, antho...@gmail.com wrote:

The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input.

There was bad input, the dash character. And your proposal is that eofbit sometimes not be set at the EOF, not that failbit not be set after a conversion failure.

What I mean by [[ "The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input." ]] is that even when there's a simple eof condition (no longer talking about a hyphen), finding it while attempting stream >> n sets both failbit and eofbit... after >> it's too late to differentiate.

It sets failbit because n remains uninitialized. It sets eofbit because the stream is at eof.

You’re essentially trying to specialize stream >> n for loop conditions. Although the proposal “fixes” the given case, the semantics become inconsistent and confusing for other cases. Usually eof is set whenever the end of the stream is touched. If exceptions(std::ios::eofbit) is set, then the extractor might even lose control before observing the eof condition. Do you propose that be worked around?

The solution is to use a more sophisticated loop.

I'm not suggesting eof() be used... my proposal is to remove the need to use it to handle this situation. 

Your proposal is that eof shouldn’t be set when failbit is set, when the extractor already extracted a non-whitespace character. Given that, eof still needs to be called to differentiate failure due to no input at all (the “3 4” case, where you still expect eof to be set), from failure due to unexpected input at the end (the “3 4 -” case, where the state of eof is surprising to you).

Streams with bad input are differentiated by failbit. Period.

Again, the problem is failbit also happens when you try a final stream >> n when there's nothing (but possibly whitespace) left in the stream. 

There’s no way that failbit won’t be set by that operation: n remains uninitialized, so it’s a failure due to bad input.

The grammar expressed by while ( stream >> n ) is impossible to satisfy in finite time. If there’s no other exit condition, it must end in failure.

You can try making your proposal. I’m only one person here, and not a committee member. But my sensibility would be to fix the grammar and the loop, not the library. “Most users” just want to avoid endless loops and UB, so your use case doesn’t sound as representative as it’s presented to be.

Definitely in favour of whatever anyone can do to get better error messages, though there's the risk of making the exception-throwing stream state less performant too... lots to balance out. 

The best way to optimize the unexceptional state is to throw exceptions and not waste time testing error flags. As for iostreams, it does the error flag checks whether you enable exceptions or not, so it’s permanently pessimized.

I don’t want to suggest anything to make any streams more heavyweight, and some Unix fstream implementations indeed can’t recover a filename from an open stream. On the other hand, some (Linux at least) can, at least to an approximation that would be sufficient for an error message. In any case, hopefully a hook could be added without too much ABI disturbance. After all, error codes were just recently added.

antho...@gmail.com

unread,
Feb 26, 2015, 11:21:40 AM2/26/15
to std-pr...@isocpp.org


On Thursday, February 26, 2015 at 7:34:23 PM UTC+9, David Krauss wrote:

On 2015–02–26, at 5:44 PM, antho...@gmail.com wrote:



On Thursday, February 26, 2015 at 5:19:29 PM UTC+9, David Krauss wrote:

On 2015–02–26, at 3:26 PM, antho...@gmail.com wrote:

The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input.

There was bad input, the dash character. And your proposal is that eofbit sometimes not be set at the EOF, not that failbit not be set after a conversion failure.

What I mean by [[ "The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input." ]] is that even when there's a simple eof condition (no longer talking about a hyphen), finding it while attempting stream >> n sets both failbit and eofbit... after >> it's too late to differentiate.

It sets failbit because n remains uninitialized. It sets eofbit because the stream is at eof.

You’re essentially trying to specialize stream >> n for loop conditions. Although the proposal “fixes” the given case, the semantics become inconsistent and confusing for other cases. Usually eof is set whenever the end of the stream is touched.

"touched" is vague.  If you simply open an ifstream on an empty file, 27.9.1.9 guarantees not to set eof(), and it's only after an I/O operation on the file is attempted that the fail / eof flags come into play.  Still, overall I agree the breaks with current behaviour aren't acceptable.
 
If exceptions(std::ios::eofbit) is set, then the extractor might even lose control before observing the eof condition. Do you propose that be worked around?

I haven't given any thought to the exception case beyond consideration of your earlier code.  It does sound problematic.
 
The solution is to use a more sophisticated loop.

I'm not suggesting eof() be used... my proposal is to remove the need to use it to handle this situation. 

Your proposal is that eof shouldn’t be set when failbit is set, when the extractor already extracted a non-whitespace character. Given that, eof still needs to be called to differentiate failure due to no input at all (the “3 4” case, where you still expect eof to be set), from failure due to unexpected input at the end (the “3 4 -” case, where the state of eof is surprising to you).

It doesn't surprise me in the least... I'm saying it's inconveniently ambiguous. 
Streams with bad input are differentiated by failbit. Period.

Again, the problem is failbit also happens when you try a final stream >> n when there's nothing (but possibly whitespace) left in the stream. 

There’s no way that failbit won’t be set by that operation: n remains uninitialized, so it’s a failure due to bad input.

No dispuiting that... the proposal was about effectively serialising the reporting of failbit and eofbit.
 
The grammar expressed by while ( stream >> n ) is impossible to satisfy in finite time. If there’s no other exit condition, it must end in failure.

You can try making your proposal. I’m only one person here, and not a committee member. But my sensibility would be to fix the grammar and the loop, not the library. “Most users” just want to avoid endless loops and UB, so your use case doesn’t sound as representative as it’s presented to be.

It's hardly uncommon to want to tell the user when their input can't be parsed.  The proposal for types like int was largely to simplify the implementation needed to detect corrupt or incomplete objects when writing operator>> overloads for ones own types (explained in my original post).
 

All that said, I'm starting to think if anything were to be done about this, it might be better to propose a new flag (e.g. "partial", "incomplete") rather than change the reporting of the current flags.  It's pretty obvious how that would help the simple looping case, and for custom overloads it might require something like...

    return is >> x.a_ >> x.set_partial >> x.b_ >> x.clear_partial;

...to return a proper value for the new flag.

Jean-Marc Bourguet

unread,
Feb 26, 2015, 12:03:29 PM2/26/15
to std-pr...@isocpp.org, antho...@gmail.com
Le jeudi 26 février 2015 10:31:48 UTC+1, antho...@gmail.com a écrit :

You tried to input something, that input failed.  There is indeed no difference between failing due to missing data and bad data with >> if you automatically skip over the whitespace. If you want to do so, you'll have to skip the white space manually (and disable the possibility to skip automatically, ws is using the sentry which set the error)

    stream >> std::noskipws;
    while (stream >> std::ws && !stream.eof() && stream >> n) {
        std::cout << "got " << n << '\n';
    }
    if (stream.bad()) std::cout << "IO error\n";
    if (stream.fail()) std::cout << "Bad format\n";

It's the need for such code that precipitated my post.  I don't think C++ newbies should be expected to write such code in order to do something as simple as parse numbers from a stream robustly.
 
Usually, the distinction doesn't worth the pain (maybe because I share with the IOStream designers an Unix bias, Unix considers `\n` as line terminator so non terminated last lines are oddities which can be ignored when IOStream formatted input is an adequate mechanism).

Same background here, but if istreams were necessarily files (and everyone used sane editors) it would be less of a concern, but std::istringstreams initialised from string literals, network input etc. are quite likely not to be whitespace terminated.

Manually adding a space is easier than the above contortion for stringstreams.

Exactly.  My point is: isn't that undesirable?  Shouldn't the library make it easy to see why the loop terminated?

If you go to the point where error recovery is needed, IOStream formatted input is probably no more an adequate solution.

Currently yes, but could a proposal like this help rectify the situation?

IMHO, it address a minor corner case which doesn't make me drop IOStream.  Robust parsing or even just lexing is something far harder.

Yours,

-- 
Jean-Marc

Jean-Marc Bourguet

unread,
Feb 26, 2015, 12:42:12 PM2/26/15
to std-pr...@isocpp.org, antho...@gmail.com
Le jeudi 26 février 2015 17:21:40 UTC+1, antho...@gmail.com a écrit :


On Thursday, February 26, 2015 at 7:34:23 PM UTC+9, David Krauss wrote:

On 2015–02–26, at 5:44 PM, antho...@gmail.com wrote:



On Thursday, February 26, 2015 at 5:19:29 PM UTC+9, David Krauss wrote:

On 2015–02–26, at 3:26 PM, antho...@gmail.com wrote:

The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input.

There was bad input, the dash character. And your proposal is that eofbit sometimes not be set at the EOF, not that failbit not be set after a conversion failure.

What I mean by [[ "The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input." ]] is that even when there's a simple eof condition (no longer talking about a hyphen), finding it while attempting stream >> n sets both failbit and eofbit... after >> it's too late to differentiate.

It sets failbit because n remains uninitialized. It sets eofbit because the stream is at eof.

You’re essentially trying to specialize stream >> n for loop conditions. Although the proposal “fixes” the given case, the semantics become inconsistent and confusing for other cases. Usually eof is set whenever the end of the stream is touched.

"touched" is vague.  If you simply open an ifstream on an empty file, 27.9.1.9 guarantees not to set eof(), and it's only after an I/O operation on the file is attempted that the fail / eof flags come into play.  Still, overall I agree the breaks with current behaviour aren't acceptable.


There’s no way that failbit won’t be set by that operation: n remains uninitialized, so it’s a failure due to bad input.

No dispuiting that... the proposal was about effectively serialising the reporting of failbit and eofbit.

eofbit means "The streambuf has returned EOF and should no more be queried, excepted if the user clear the condition".  I don't really like that unget() clears eof , I've wondered in the past if the state wouldn't have been better if the sentry had check eof && egptr==gptr instead of just eof, and unget avoided clearing eof; that would also probably need some other similar adjustments elsewhere.

All that said, I'm starting to think if anything were to be done about this, it might be better to propose a new flag (e.g. "partial", "incomplete") rather than change the reporting of the current flags.  It's pretty obvious how that would help the simple looping case, and for custom overloads it might require something like...

    return is >> x.a_ >> x.set_partial >> x.b_ >> x.clear_partial;

...to return a proper value for the new flag.

I don't see how you can make it work for implementing >> using other >> without using a scoped object holding the desired state for the desired level.  But then, implementing << using << is also broken if you don't pay attention to width handling.

-- 
Jean-Marc Bourguet

Reply all
Reply to author
Forward
0 new messages