On 2015–02–26, at 12:29 PM, antho...@gmail.com wrote:
Hello,
This is a "Float the idea" post per the https://isocpp.org/std/submit-a-proposal steps, though my idea may break old code so I'm currently just seeking to see if I'm missing something, there's some existing proposal in this space, interest levels, and hopefully a better idea!
The task: read numbers from a stream, print error message if there's bad input
Commonly recommended approach:
int n;
while (stream >> n)
;
if (!stream && !stream.eof())
std::cerr << "choked on bad input\n”;
Failing case: stream contains say "-", at least on the GCC compiler at ideone.com the stream >> n operation sets both failbit and eofbit, as '-' is consumed as a potentially valid part of a number that never materialises.
Available solution: you can use a controlling while (!(stream >> std::ws).eof)
(Doomed) Proposal: the while (stream >> n) approach above could be made more robust if the standard-library-provided overloads set only the failbit in these circumstances, and left eof to be reported only if another input operation is attempted.
while (!stream.eof()) { stream >> n; use(n); }
With the proposal above, the eof() wouldn't become true immediately after streaming the last number.
Perhaps someone will have a better idea. It bothers me that the common, intuitive, expressive and concise approach can fail to notice parsing errors….
On 2015–02–26, at 12:29 PM, antho...@gmail.com wrote:Hello,
This is a "Float the idea" post per the https://isocpp.org/std/submit-a-proposal steps, though my idea may break old code so I'm currently just seeking to see if I'm missing something, there's some existing proposal in this space, interest levels, and hopefully a better idea!
The task: read numbers from a stream, print error message if there's bad input
Commonly recommended approach:
int n;
while (stream >> n)
;
if (!stream && !stream.eof())
std::cerr << "choked on bad input\n”;By whom? The most common recommendation I recall on this topic is, “eof() probably isn’t what you want.”To check for a failure due to bad input or parsing, use fail(). (And to check for an I/O error, use bad().) To check that the stream is emptied without error in general, use rdstate() == std::ios::eofbit.
Failing case: stream contains say "-", at least on the GCC compiler at ideone.com the stream >> n operation sets both failbit and eofbit, as '-' is consumed as a potentially valid part of a number that never materialises.Unless you recognize the failure and seek the stream back before the dash (or somehow unget it), the condition is irrecoverable.
Available solution: you can use a controlling while (!(stream >> std::ws).eof)In a loop, checking eof() is especially not to be encouraged.
(Doomed) Proposal: the while (stream >> n) approach above could be made more robust if the standard-library-provided overloads set only the failbit in these circumstances, and left eof to be reported only if another input operation is attempted.Perhaps iostreams would be more robust if failed operations didn’t consume characters. It’s not particularly robust, or fast. Maybe it can be replaced some day with something better.But what robustness is added by clearing eof when no input remains? Any attempt to recover and re-read will just set eof() again.
while (!stream.eof()) { stream >> n; use(n); }This is exactly what you should never do.
With the proposal above, the eof() wouldn't become true immediately after streaming the last number.
Perhaps someone will have a better idea. It bothers me that the common, intuitive, expressive and concise approach can fail to notice parsing errors….Try this:while (stream) { stream >> n; use(n); }if (stream.fail()) std::cerr << "choked on bad input\n";
if (stream.bad()) std::cerr << "could not read the input\n";
Or more safely and lazily:stream.exceptions( std::ios::failbit | std::ios::badbit );while (stream) { stream >> n; use(n); }
Perhaps more folks would use automatic stream exceptions if they were more friendly. Here are some ideas:1. Specify or recommend better default error messages for uncaught ios_base::failure exceptions.2. Add a hook to ios_base::failure to add a user-visible file or stream name, let the standard streams initialize it in an unspecified way.3. Create a sentry guard so you can set exceptions() at the beginning of a try block, and have it restored when it exits.
On 2015–02–26, at 3:26 PM, antho...@gmail.com wrote:The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input.
In a loop, checking eof() is especially not to be encouraged.
Agreed - it's in desperation that coders are forced back to this ugly and error-prone mess. Hence my original post.
The robustness improvement comes from differentiating a stream of good input from a stream terminated by bad (likely truncated in the case of "-") input. It's not about being able to get other input from the stream afterwards; rather, one might print an error and terminate until the user fixes the input instead of presenting results calculated from bogus/incomplete input.
while (!stream.eof()) { stream >> n; use(n); }This is exactly what you should never do.
That's not a particularly useful perspective. I've presented it only to illustrate currently working (but extremely fragile) code that would be broken by this proposal. Sure we can deride anyone silly enough to write code like this (I certainly would in a code review), but it's still important for me to acknowledge and illustrate how the proposal is a breaking change.
Or more safely and lazily:stream.exceptions( std::ios::failbit | std::ios::badbit );while (stream) { stream >> n; use(n); }That doesn't seem to work either - same terminating stream state for "3 4 -" and "3 4" - see http://ideone.com/UYKQiS
Perhaps more folks would use automatic stream exceptions if they were more friendly. Here are some ideas:1. Specify or recommend better default error messages for uncaught ios_base::failure exceptions.2. Add a hook to ios_base::failure to add a user-visible file or stream name, let the standard streams initialize it in an unspecified way.3. Create a sentry guard so you can set exceptions() at the beginning of a try block, and have it restored when it exits.
Who could argue with better error messages? Scope guards sound good too - but easily implemented outside the Standard. For 2 - if the exception's caught in the scope containing the stream then it's not necessary, and if not the stream's already destroyed.
On Thursday, February 26, 2015 at 2:18:13 PM UTC+9, David Krauss wrote:On 2015–02–26, at 12:29 PM, antho...@gmail.com wrote:Hello,
This is a "Float the idea" post per the https://isocpp.org/std/submit-a-proposal steps, though my idea may break old code so I'm currently just seeking to see if I'm missing something, there's some existing proposal in this space, interest levels, and hopefully a better idea!
The task: read numbers from a stream, print error message if there's bad input
Commonly recommended approach:
int n;
while (stream >> n)
;
if (!stream && !stream.eof())
std::cerr << "choked on bad input\n”;By whom? The most common recommendation I recall on this topic is, “eof() probably isn’t what you want.”To check for a failure due to bad input or parsing, use fail(). (And to check for an I/O error, use bad().) To check that the stream is emptied without error in general, use rdstate() == std::ios::eofbit.The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input.
Failing case: stream contains say "-", at least on the GCC compiler at ideone.com the stream >> n operation sets both failbit and eofbit, as '-' is consumed as a potentially valid part of a number that never materialises.Unless you recognize the failure and seek the stream back before the dash (or somehow unget it), the condition is irrecoverable.
Exactly. My point is: isn't that undesirable? Shouldn't the library make it easy to see why the loop terminated?
Try this:while (stream) { stream >> n; use(n); }
On 2015–02–26, at 4:28 PM, Jean-Marc Bourguet <jm.bo...@gmail.com> wrote:while (stream) { stream >> n; use(n); }That's wrong: never use the result of an IO operation without being sure that the IO operation succeeded.
Failure conditions are much better handled as exceptions IMHO.
This is correct:
stream.exceptions( std::ios::failbit | std::ios::badbit );while (stream.good()) { stream >> n >> std::ws; use(n); }Yes, iostreams is finicky…
On 26 February 2015 at 09:34, David Krauss <pot...@gmail.com> wrote:Failure conditions are much better handled as exceptions IMHO.What about logging from destructors?
On 2015–02–26, at 4:36 PM, Nevin Liber <ne...@eviloverlord.com> wrote:On 26 February 2015 at 09:34, David Krauss <pot...@gmail.com> wrote:Failure conditions are much better handled as exceptions IMHO.What about logging from destructors?
On 2015–02–26, at 4:46 PM, Jean-Marc Bourguet <jm.bo...@gmail.com> wrote:Tricky indeed. You have to remove the possibility to skip whitespace automatically as std::ws is using the sentry which will first skip and then set the failbit at eof and thus throw an exception before ws skip without setting failbit.stream >> std::noskipws;stream.exceptions( std::ios::failbit | std::ios::badbit );while (stream >> std::ws >> n) { use(n); }seems good.
How about:
while ( (stream >> std::ws).good() && stream >> n ) use( n );This loop will exit upon EOF, with stream == true, or failure, with stream == false.Or, how about not using iostreams where tolerance and precise distinctions are needed ;P .
On 2015–02–26, at 5:04 PM, Nevin Liber <ne...@eviloverlord.com> wrote:It is a truly horrible idea to encourage people to use exceptions in libraries that are likely to be called from destructors.
Wrapping everything in a destructor in a try..catch block is a usability nightmare. And making error handling inconsistent input vs. output for the same library (it is supposed to be one combined library) would be a strictly worse choice.
Le jeudi 26 février 2015 08:26:19 UTC+1, antho...@gmail.com a écrit :
On Thursday, February 26, 2015 at 2:18:13 PM UTC+9, David Krauss wrote:On 2015–02–26, at 12:29 PM, antho...@gmail.com wrote:Hello,
This is a "Float the idea" post per the https://isocpp.org/std/submit-a-proposal steps, though my idea may break old code so I'm currently just seeking to see if I'm missing something, there's some existing proposal in this space, interest levels, and hopefully a better idea!
The task: read numbers from a stream, print error message if there's bad input
Commonly recommended approach:
int n;
while (stream >> n)
;
if (!stream && !stream.eof())
std::cerr << "choked on bad input\n”;By whom? The most common recommendation I recall on this topic is, “eof() probably isn’t what you want.”To check for a failure due to bad input or parsing, use fail(). (And to check for an I/O error, use bad().) To check that the stream is emptied without error in general, use rdstate() == std::ios::eofbit.The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input.You tried to input something, that input failed. There is indeed no difference between failing due to missing data and bad data with >> if you automatically skip over the whitespace. If you want to do so, you'll have to skip the white space manually (and disable the possibility to skip automatically, ws is using the sentry which set the error)stream >> std::noskipws;while (stream >> std::ws && !stream.eof() && stream >> n) {std::cout << "got " << n << '\n';}if (stream.bad()) std::cout << "IO error\n";if (stream.fail()) std::cout << "Bad format\n";
Usually, the distinction doesn't worth the pain (maybe because I share with the IOStream designers an Unix bias, Unix considers `\n` as line terminator so non terminated last lines are oddities which can be ignored when IOStream formatted input is an adequate mechanism).
When I use formatted IOstream input, I use this kind of loop:while (stream >> n) {std::cout << "got " << n << '\n';}if (stream.bad()) std::cout << "IO error\n";if (!stream.eof()) std::cout << "Bad format\n";
Failing case: stream contains say "-", at least on the GCC compiler at ideone.com the stream >> n operation sets both failbit and eofbit, as '-' is consumed as a potentially valid part of a number that never materialises.Unless you recognize the failure and seek the stream back before the dash (or somehow unget it), the condition is irrecoverable.
Exactly. My point is: isn't that undesirable? Shouldn't the library make it easy to see why the loop terminated?If you go to the point where error recovery is needed, IOStream formatted input is probably no more an adequate solution.
Try this:while (stream) { stream >> n; use(n); }That's wrong: never use the result of an IO operation without being sure that the IO operation succeeded.Yours,--Jean-Marc
On 2015–02–26, at 3:26 PM, antho...@gmail.com wrote:The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input.There was bad input, the dash character. And your proposal is that eofbit sometimes not be set at the EOF, not that failbit not be set after a conversion failure.
If you’re concerned with conversion failures and not with eof, check failbit/fail() and not eofbit/eof().
In a loop, checking eof() is especially not to be encouraged.
Agreed - it's in desperation that coders are forced back to this ugly and error-prone mess. Hence my original post.There are right answers, albeit buried among wrong answers. That’s the way iostreams is, but this proposal seems to be shaking things up without making correct code easier overall.
Again, though, eof() is seldom the right solution.
The robustness improvement comes from differentiating a stream of good input from a stream terminated by bad (likely truncated in the case of "-") input. It's not about being able to get other input from the stream afterwards; rather, one might print an error and terminate until the user fixes the input instead of presenting results calculated from bogus/incomplete input.Streams with bad input are differentiated by failbit. Period.
while (!stream.eof()) { stream >> n; use(n); }This is exactly what you should never do.
That's not a particularly useful perspective. I've presented it only to illustrate currently working (but extremely fragile) code that would be broken by this proposal. Sure we can deride anyone silly enough to write code like this (I certainly would in a code review), but it's still important for me to acknowledge and illustrate how the proposal is a breaking change.Okay, but it looks similar to your first example, in that eof is being used as a proxy for something else.In this case, any conversion error leads to an infinite loop, which is extra bad.
The warning is printed there on the webpage under “stderr.”
Or more safely and lazily:stream.exceptions( std::ios::failbit | std::ios::badbit );while (stream) { stream >> n; use(n); }That doesn't seem to work either - same terminating stream state for "3 4 -" and "3 4" - see http://ideone.com/UYKQiSAh, I forgot that converting the stream to bool only tests failbit, not eofbit. Also, you need to manually eat the trailing space if it’s going to make a difference to the loop condition.This is correct:stream.exceptions( std::ios::failbit | std::ios::badbit );while (stream.good()) { stream >> n >> std::ws; use(n); }Yes, iostreams is finicky…
Perhaps more folks would use automatic stream exceptions if they were more friendly. Here are some ideas:1. Specify or recommend better default error messages for uncaught ios_base::failure exceptions.2. Add a hook to ios_base::failure to add a user-visible file or stream name, let the standard streams initialize it in an unspecified way.3. Create a sentry guard so you can set exceptions() at the beginning of a try block, and have it restored when it exits.
Who could argue with better error messages? Scope guards sound good too - but easily implemented outside the Standard. For 2 - if the exception's caught in the scope containing the stream then it's not necessary, and if not the stream's already destroyed.#2 is more useful for proper I/O, not stringstreams. It can help a program generate a somewhat sensible error message for the user: “Expected a number, got ‘z’ in HTTP POST input”.Point is, if the what() strings were more sensible, there would be less need to process the exceptions at all. Either generically catch after aborting and print exc.what(), or don’t catch and let the program terminate, still getting a reasonable diagnostic log entry.
On 2015–02–26, at 5:44 PM, antho...@gmail.com wrote:
On Thursday, February 26, 2015 at 5:19:29 PM UTC+9, David Krauss wrote:The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input.There was bad input, the dash character. And your proposal is that eofbit sometimes not be set at the EOF, not that failbit not be set after a conversion failure.
What I mean by [[ "The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input." ]] is that even when there's a simple eof condition (no longer talking about a hyphen), finding it while attempting stream >> n sets both failbit and eofbit... after >> it's too late to differentiate.
I'm not suggesting eof() be used... my proposal is to remove the need to use it to handle this situation.
Streams with bad input are differentiated by failbit. Period.
Again, the problem is failbit also happens when you try a final stream >> n when there's nothing (but possibly whitespace) left in the stream.
Definitely in favour of whatever anyone can do to get better error messages, though there's the risk of making the exception-throwing stream state less performant too... lots to balance out.
On 2015–02–26, at 5:44 PM, antho...@gmail.com wrote:
On Thursday, February 26, 2015 at 5:19:29 PM UTC+9, David Krauss wrote:The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input.There was bad input, the dash character. And your proposal is that eofbit sometimes not be set at the EOF, not that failbit not be set after a conversion failure.
What I mean by [[ "The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input." ]] is that even when there's a simple eof condition (no longer talking about a hyphen), finding it while attempting stream >> n sets both failbit and eofbit... after >> it's too late to differentiate.It sets failbit because n remains uninitialized. It sets eofbit because the stream is at eof.You’re essentially trying to specialize stream >> n for loop conditions. Although the proposal “fixes” the given case, the semantics become inconsistent and confusing for other cases. Usually eof is set whenever the end of the stream is touched.
If exceptions(std::ios::eofbit) is set, then the extractor might even lose control before observing the eof condition. Do you propose that be worked around?
The solution is to use a more sophisticated loop.I'm not suggesting eof() be used... my proposal is to remove the need to use it to handle this situation.Your proposal is that eof shouldn’t be set when failbit is set, when the extractor already extracted a non-whitespace character. Given that, eof still needs to be called to differentiate failure due to no input at all (the “3 4” case, where you still expect eof to be set), from failure due to unexpected input at the end (the “3 4 -” case, where the state of eof is surprising to you).
Streams with bad input are differentiated by failbit. Period.
Again, the problem is failbit also happens when you try a final stream >> n when there's nothing (but possibly whitespace) left in the stream.There’s no way that failbit won’t be set by that operation: n remains uninitialized, so it’s a failure due to bad input.
The grammar expressed by while ( stream >> n ) is impossible to satisfy in finite time. If there’s no other exit condition, it must end in failure.You can try making your proposal. I’m only one person here, and not a committee member. But my sensibility would be to fix the grammar and the loop, not the library. “Most users” just want to avoid endless loops and UB, so your use case doesn’t sound as representative as it’s presented to be.
You tried to input something, that input failed. There is indeed no difference between failing due to missing data and bad data with >> if you automatically skip over the whitespace. If you want to do so, you'll have to skip the white space manually (and disable the possibility to skip automatically, ws is using the sentry which set the error)stream >> std::noskipws;while (stream >> std::ws && !stream.eof() && stream >> n) {std::cout << "got " << n << '\n';}if (stream.bad()) std::cout << "IO error\n";if (stream.fail()) std::cout << "Bad format\n";
It's the need for such code that precipitated my post. I don't think C++ newbies should be expected to write such code in order to do something as simple as parse numbers from a stream robustly.
Usually, the distinction doesn't worth the pain (maybe because I share with the IOStream designers an Unix bias, Unix considers `\n` as line terminator so non terminated last lines are oddities which can be ignored when IOStream formatted input is an adequate mechanism).
Same background here, but if istreams were necessarily files (and everyone used sane editors) it would be less of a concern, but std::istringstreams initialised from string literals, network input etc. are quite likely not to be whitespace terminated.
Exactly. My point is: isn't that undesirable? Shouldn't the library make it easy to see why the loop terminated?If you go to the point where error recovery is needed, IOStream formatted input is probably no more an adequate solution.
Currently yes, but could a proposal like this help rectify the situation?
On Thursday, February 26, 2015 at 7:34:23 PM UTC+9, David Krauss wrote:On 2015–02–26, at 5:44 PM, antho...@gmail.com wrote:
On Thursday, February 26, 2015 at 5:19:29 PM UTC+9, David Krauss wrote:The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input.There was bad input, the dash character. And your proposal is that eofbit sometimes not be set at the EOF, not that failbit not be set after a conversion failure.
What I mean by [[ "The point is that fail() is set after stream >> n hits eof, without there being "bad"/un-parsable input." ]] is that even when there's a simple eof condition (no longer talking about a hyphen), finding it while attempting stream >> n sets both failbit and eofbit... after >> it's too late to differentiate.It sets failbit because n remains uninitialized. It sets eofbit because the stream is at eof.You’re essentially trying to specialize stream >> n for loop conditions. Although the proposal “fixes” the given case, the semantics become inconsistent and confusing for other cases. Usually eof is set whenever the end of the stream is touched."touched" is vague. If you simply open an ifstream on an empty file, 27.9.1.9 guarantees not to set eof(), and it's only after an I/O operation on the file is attempted that the fail / eof flags come into play. Still, overall I agree the breaks with current behaviour aren't acceptable.
There’s no way that failbit won’t be set by that operation: n remains uninitialized, so it’s a failure due to bad input.No dispuiting that... the proposal was about effectively serialising the reporting of failbit and eofbit.
All that said, I'm starting to think if anything were to be done about this, it might be better to propose a new flag (e.g. "partial", "incomplete") rather than change the reporting of the current flags. It's pretty obvious how that would help the simple looping case, and for custom overloads it might require something like...return is >> x.a_ >> x.set_partial >> x.b_ >> x.clear_partial;...to return a proper value for the new flag.