On 17 November 2012 13:36, Jason McKesson <jmck...@gmail.com> wrote:
> C++ used the << method because the alternatives were less flexible. Boost.Format and other systems show that C++03 did not really have to use this mechanism to achieve the extensibility features that iostreams provide.
Boost.Format came out in 2002. C++03 (which is basically C++98) was standardized in the 90s. Short of building a time machine, I fail to see how Boost.Format showed C++03 anything.
> What do you think? Are there other issues in iostreams that need to be mentioned?
Not really, no. Ragging on iostreams is easy, and has been done plenty of times already. Coming up with a proposal to replace it is hard and time consuming. I don't see any proposal here. Are you looking to write one?
On 17/11/2012 20:36, Jason McKesson wrote:
> The Iostreams library in C++ has a problem. We have real, reasonable,
> legitimate C++ professional, who like C++ and use modern C++ idioms,
> telling people to not use iostreams. This is not due to differing
> ideas on C++ or C-in-classes-style development, but the simple
> practical realities of the situation.
>
There are mostly two points where I disagree with your analysis:
- Performance: If performance really matters, granted, I will not use
iostream, but I will not use C I/O facilities either. I will use a
platform-specific API that can deliver maximum performance.
- Usability: I find printf format really hard to use (and very error
prone). It's another language, and an obscure one. I genuinely have no
idea what 0x%08x meant in your message. I was not even sure if it
expected one argument or several. But this is not my main point. My main
point is that your comparison is unfair: Most of the time, when doing
I/O, I don't care about format (when I care, then I use a UI library
such as Qt, or I generate HTML, or LaTeX, or whatever, but I don't use
iostream). And in this case, iostreams are not more verbose:
os << "Line " << line << ": Error(" << code << "): " << msg;
printf("Line %??: Error(%??): %??", line, code, msg);
The difference is not that big, even when using only basic types (and,
as you said, the difference is in the other direction when dealing with
user defined types).
For me, the biggest issue I have with iostream is localisation, and the
possibility to have a whole sentence in one block, and to be able to
swap arguments. And boost format really helps here.
--
Loïc
On 11/17/2012 08:36 PM, Jason McKesson wrote:
I think that it is a failure in design that getting the string out of the stringstream is by value.
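To make the by-value complaint concrete, here is a minimal sketch, assuming the classic (pre-C++20) interface in which std::ostringstream::str() hands back a copy of the buffer contents. The helper name copy_is_independent is made up for this illustration.

```cpp
#include <sstream>
#include <string>

// Returns true if the string obtained from str() is an independent copy,
// i.e. modifying it leaves the stream's own contents untouched.
bool copy_is_independent() {
    std::ostringstream out;
    out << "hello";
    std::string copy = out.str();  // str() returns std::string by value
    copy += " world";              // changes only the local copy
    return out.str() == "hello" && copy == "hello world";
}
```

Every call to str() therefore copies the whole buffer, which is the cost being objected to.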
> It’s very compact, for one. Once you understand the basic syntax of it, it’s very easy to see what’s going on. Especially for complex formatting. Just consider the physical size difference between these two:
> snprintf(..., "0x%08x", integer);
> stream << "0x" << std::right << std::hex << std::setw(8) << iVal << std::endl;
> It may take a bit longer to become used to the printf version, but this is something you can easily look up in a reference.
Again, logic of a person for whom recursion is as easy to understand and use as iteration.
--
> It’s very compact, for one. Once you understand the basic syntax of it, it’s very easy to see what’s going on. Especially for complex formatting. Just consider the physical size difference between these two:
> snprintf(..., "0x%08x", integer);
> stream << "0x" << std::right << std::hex << std::setw(8) << iVal << std::endl;
> It may take a bit longer to become used to the printf version, but this is something you can easily look up in a reference.
a) ever heard of "type safety"?
--
Your most recent replies have been getting somewhat inflammatory. I think you should take a break.
> Your most recent replies have been getting somewhat inflammatory. I think you should take a break.
Fair enough, but interestingly, you didn't say anything to the guy who claimed that my suggestion is idiotic. I believe that you should either apply the rules (of correct manners etc.) to everyone, and I am more than happy with that, or not apply them at all. Telling just one guy (me) to ease off while saying nothing to another guy who, I believe, behaved far worse than I did (calling someone's suggestion "idiotic") is simply not fair. I would like you to note that I wasn't the first guy who posted "somewhat" inflammatory posts. Some people here are passive-aggressive, and this is bad too, yet you don't mind them doing so. And also, please note that I didn't use any offensive words, like calling someone's suggestion "idiotic", for example.
The performance problem of iostreams is the locale support.
If you remove the locale support then everything can be nicely inlined into nothingness and run in circles around printf.
Remember that printf does parse the format string every time it runs, so there is pretty big wiggle room if that is what you wish to beat.
> There’s one real problem with this logic, and it is exactly why people
> suggest C-standard file IO. Iostreams violates a fundamental precept of
> C++: pay only for what you use.
Yes. See above.
> Consider this suite of benchmarks. This code doesn’t do file IO; it writes
> directly to a string. All it’s doing is measuring the time it takes to append
> 4-characters to a string. A lot. It uses a `char[]` as a useful control. It also
> tests the use of `vector<char>` (presumably `basic_string` would have
> similar results). Therefore, this is a solid test for the efficiency of the
> iostreams codebase itself.
>
> Obviously there will be some efficiency loss. But consider the numbers in
> the results.
I did download the tests and ran them using g++ -O2 <filename>.cpp
My g++ is g++-4.7.2 on linux.
All tests run in about the same time, save for 'putting binary data into a vector<char> using back_inserter', which took about 6x the time of the rest, and, contradicting your analysis, 'putting binary data directly into stringbuf', which took about half the time of the rest.
If I were to remove the -O2 flag, telling the compiler to not optimize the code, then my test results show some similarity to yours (Worst case 15x) but who compiles benchmarks without optimization?
/MF
"idiotic" is not an offensive word. More importantly, he called your suggestion idiotic, which is very different from calling you idiotic. Attacks against your suggestion are going to happen; that's what this discussion forum is about. Attacking you as a person is what we wouldn't allow; attacking a suggestion is perfectly reasonable.
Plus, the "idiotic" comment came after an extended period of discussion where you continued to use the same reasoning over and over, without showing the slightest sense that you understood the opposing argument. Nor did you display any recognition or understanding of the simple fact that the standard doesn't cover what you were talking about. Given the substance of the discussion, I think it was a perfectly reasonable assessment of your suggestion.
Please do explain what this response has to do with "the failures of iostreams"?
The problem with iostreams is that locales are part of the streambuf, not merely the formatting stream. The streambuf should be about basic "byte" input/output to/from a stream, not locale-specific constructs.
Where should newline translation be done?
The primary problem with that is that "Text file" isn't actually a well-defined platform-independent concept because of the newlines.
set the newline sequence,
stay with an inheritance design
You wouldn't need a separate sink for them.
convert it into the platform-specific equivalent
The primary problem with that is that "Text file" isn't actually a well-defined platform-independent concept because of the newlines.
Yours,
--
Jean-Marc Bourguet
We have iterators for that. Hell, you can do that right now.
> set the newline sequence,
And if I want to support multiple? Does that mean I have to be precognitive?
So we can continue to feel the pain of multiple inheritance?
I'm not sure if that variety is still relevant, but C and C++ IO were designed to handle them (for instance, spaces before end of line may disappear when rereading a text file, NUL characters may appear at end of file for binary files). Before designing a replacement which is unable to handle them, I'd suggest making sure that they are no longer relevant (start by looking at z/OS) and bringing in people aware of the IO models of the OSes you want to support early enough that you don't have to restart your design because it is not portable enough.
this is about finding out where iostreams went wrong, not how to fix it
On Saturday, November 24, 2012 11:07:08 AM UTC-8, Jean-Marc Bourguet wrote:
> I'm not sure if that variety is still relevant, but C and C++ IO were designed to handle them (for instance, spaces before end of line may disappear when rereading a text file, NUL characters may appear at end of file for binary files). Before designing a replacement which is unable to handle them, I'd suggest making sure that they are no longer relevant (start by looking at z/OS) and bringing in people aware of the IO models of the OSes you want to support early enough that you don't have to restart your design because it is not portable enough.
Why do we have to support those?
Try doing that with iterators. The only way you could do it is by loading the entire file into memory, or by using those 'silly' input iterators.
You'll notice I only suggested setting the newline sequence for output streams. For input streams, a sane default would be to swallow \r characters, unless the user is expecting a certain sequence.
I agree that input iterators are really just functions in disguise, but they do work and not badly either. There is no reason why an input-iterator based solution could not work just fine. More relevantly, an input-iterator based solution would actually be remotely generic- I could decompress a file I had already loaded into memory, for example.
> You'll notice I only suggested setting the newline sequence for output streams. For input streams, a sane default would be to swallow \r characters, unless the user is expecting a certain sequence.
You have no idea what the user is expecting. Only they know that.
Oh for heaven's sake, are you seriously taking me to task for suggesting that users who use line endings other than \n, \n\r, or \r\n, would have to stoop so low as to override a default option?
1) The end of an input_iterator is a wasteful hack.
2) You can't differentiate between 'no data' and 'end of data'.
3) Iterators don't take ownership, so you have additional lifetime management.
It's because one-directional ranges and streams are effectively the same thing.
Well, yes. I'm saying that the stream should not eat data unless explicitly asked for.
> 2) You can't differentiate between 'no data' and 'end of data'.
I don't see the difference. In either case, there is no more data to be had.
Okay, you misunderstood some of the stuff I was saying about iterators, but I'm pretty sure it's moot. We both agree that ranges are a superior solution to iterators, and, while our confidence levels differ, we both think ranges have the potential for making good streams, so we can stop talking about iterators now, right?
The binary and text streams would likely comprise separate classes, so there'd be no need for a flag. As for operator<< and codecvt_facet, they are almost certain to be removed.
--Beman
I've read through this thread and would like to mention a few points. This information is the fruit of my experience in implementing the Boost serialization library. For "binary" archives performance was the supreme consideration. At the same time I wanted/needed it to be built on top of the standard library constructs.
a) The std::ios::binary flag was necessary to keep the i/o stream from munching characters. Unfortunately, there is no way to inquire (e.g. i/ostream.is_binary()) how a stream has been opened, so that certain user errors can be detected.
b) The << and >> interfaces turned out to be performance killers. But the functionality provided by these operators was totally unused. So later versions of the library just used the streambuf interface directly. The constructor for a binary archive can take as an argument either a streambuf or a stream. If passed a stream, the associated streambuf is used directly. This results in a huge performance boost #1.
c) unfortunately, the streambuf implements the codecvt interface. A performance hit and not a good match for binary i/o. So I made a custom codecvt facet which does nothing. Another performance improvement.
On Sunday, 25 November 2012 17:26:15 UTC+1, robertmac...@gmail.com wrote:
> I've read through this thread and would like to mention a few points. This information is the fruit of my experience in implementing the Boost serialization library. For "binary" archives performance was the supreme consideration. At the same time I wanted/needed it to be built on top of the standard library constructs.
> a) The std::ios::binary flag was necessary to keep the i/o stream from munching characters. Unfortunately, there is no way to inquire (e.g. i/ostream.is_binary()) how a stream has been opened, so that certain user errors can be detected.
Never thought about that, but I've been in situations where I'd have used one if it had been available.
> b) The << and >> interfaces turned out to be performance killers. But the functionality provided by these operators was totally unused. So later versions of the library just used the streambuf interface directly. The constructor for a binary archive can take as an argument either a streambuf or a stream. If passed a stream, the associated streambuf is used directly. This results in a huge performance boost #1.
<< and >> are about providing a formatting interface. There is an unformatted API on streams, but like you I usually resort to streambuf as I find it more convenient (but I usually use only streams in my public interface and use their error-reporting interface). One aspect I don't like about streambuf from a performance POV is that sgetn and sputn directly call xsgetn and xsputn, which are virtual functions, even if there is room in the corresponding area, and that prevents them from being inlined when used with small lengths.
> c) unfortunately, the streambuf implements the codecvt interface. A performance hit and not a good match for binary i/o. So I made a custom codecvt facet which does nothing. Another performance improvement.
Wouldn't imbuing locale::classic() be enough? That's what I do on my binary streams, but I've never thought about measuring whether there was a win in imbuing a custom codecvt.
On Sunday, November 25, 2012 10:09:21 AM UTC-8, Jean-Marc Bourguet wrote:
> On Sunday, 25 November 2012 17:26:15 UTC+1, robertmac...@gmail.com wrote:
>> I've read through this thread and would like to mention a few points. This information is the fruit of my experience in implementing the Boost serialization library. For "binary" archives performance was the supreme consideration. At the same time I wanted/needed it to be built on top of the standard library constructs.
>> a) The std::ios::binary flag was necessary to keep the i/o stream from munching characters. Unfortunately, there is no way to inquire (e.g. i/ostream.is_binary()) how a stream has been opened, so that certain user errors can be detected.
> Never thought about that, but I've been in situations where I'd have used one if it had been available.
>> b) The << and >> interfaces turned out to be performance killers. But the functionality provided by these operators was totally unused. So later versions of the library just used the streambuf interface directly. The constructor for a binary archive can take as an argument either a streambuf or a stream. If passed a stream, the associated streambuf is used directly. This results in a huge performance boost #1.
> << and >> are about providing a formatting interface. There is an unformatted API on streams, but like you I usually resort to streambuf as I find it more convenient (but I usually use only streams in my public interface and use their error-reporting interface). One aspect I don't like about streambuf from a performance POV is that sgetn and sputn directly call xsgetn and xsputn, which are virtual functions, even if there is room in the corresponding area, and that prevents them from being inlined when used with small lengths.
Note that one is permitted to make his own streambuf implementation as well. That is another path that should be exhausted before starting to think about a whole new library. I don't think that in my code I actually use these put/get functions, but I could be wrong, I forget.
I was responding to the suggestion that in many cases they aren't convenient to use and any other alternative could dispense with these. My view is that you don't have to use them, and if you want to make your own "raw_ostream" it doesn't have to support them if you feel this way. My real point is that it's premature to think about a new library when the possibilities of the current one haven't been exhausted. It's also possible that attempts to make a "raw_i/ostream" class might work just fine except for some small thing that could be addressed with a small tweak to the current library; an implementation of is_binary() would be an example.
>> c) unfortunately, the streambuf implements the codecvt interface. A performance hit and not a good match for binary i/o. So I made a custom codecvt facet which does nothing. Another performance improvement.
> Wouldn't imbuing locale::classic() be enough? That's what I do on my binary streams, but I've never thought about measuring whether there was a win in imbuing a custom codecvt.
lol - truth is I don't know the answer to this. I did this because I thought it would make a difference. I likely concluded this by tracing into library code. It was an easy fix so I implemented and forgot about it.
To reiterate my point,
a) the main concern of the original post was that streams have performance issues and that a new library might be needed to address this.
b) Another (secondary concern) was the interface.
c) My view is that these ideas should be "Tested" by making some derivations/ehancements to the current libraries to address these concerns.
The Iostreams library in C++ has a problem. We have real, reasonable, legitimate C++ professionals, who like C++ and use modern C++ idioms, telling people to not use iostreams. This is not due to differing ideas on C++ or C-in-classes-style development, but the simple practical realities of the situation.
This kind of thing is indicative of a real problem in iostreams. In order to eventually solve that problem, we must first identify exactly what the problems are. This discussion should be focused on exactly that: identifying the problems with the library. Once we know what the real problems are, we can be certain that any new system that is proposed addresses them.
Note that this is about problems within iostreams. This is not about a list of things you wish it could do. This is about what iostreams actually tries to do but fails at in some way. So stuff like async file IO doesn’t go here, since iostreams doesn’t try to provide that.
Feel free to add to this list other flaws you see in iostreams. Or if you think that some of them are not real flaws, feel free to explain why.
Performance
This is the big one, generally the #1 reason why people suggest using C-standard file IO rather than iostreams.
Oftentimes, when people defend iostreams performance, they will say something to the effect of, “iostreams does far more than C-standard file IO.” And that’s true. With iostreams, you have an extensible mechanism for writing any type directly to a stream. You can “easily” write new streambufs that (via runtime polymorphism) work with existing code, thus allowing you to leverage your file IO code for other forms of IO. You could even use a network pipe as an input or output stream.
There’s one real problem with this logic, and it is exactly why people suggest C-standard file IO. Iostreams violates a fundamental precept of C++: pay only for what you use.
Consider this suite of benchmarks. This code doesn’t do file IO; it writes directly to a string. All it’s doing is measuring the time it takes to append 4-characters to a string. A lot. It uses a `char[]` as a useful control. It also tests the use of `vector<char>` (presumably `basic_string` would have similar results). Therefore, this is a solid test for the efficiency of the iostreams codebase itself.
Obviously there will be some efficiency loss. But consider the numbers in the results.
The ostringstream is more than a full order of magnitude slower than the control. It’s almost 100x in some cases. Note that it’s not using << to write to the stream; it’s using `ostream::write()`.
Note that the vector<char> implementations are fairly comparable to the control, usually taking around 1x-4x the time. So clearly this is something in ostringstream.
Now, you might say that one could use the stringbuf directly. And that was done. While it does improve performance over the ostringstream case substantially (generally taking half to a quarter of the time), it’s still over 10x slower than the control or most vector<char> implementations.
Why? The stringbuf operations ought to be a thin wrapper over std::string. After all, that’s what was asked for.
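For readers without the benchmark suite at hand, the three kinds of append being compared look roughly like this. This is not the original benchmark code, only a sketch of its shape under the description above; the function names and the 4-byte chunk are assumptions, and no timing is done here.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

static const char kChunk[4] = {'a', 'b', 'c', 'd'};

// vector<char> filled through back_inserter.
std::vector<char> append_with_back_inserter(std::size_t n) {
    std::vector<char> out;
    for (std::size_t i = 0; i < n; ++i) {
        std::copy(kChunk, kChunk + 4, std::back_inserter(out));
    }
    return out;
}

// ostringstream, using unformatted write() rather than operator<<.
std::string append_with_ostringstream(std::size_t n) {
    std::ostringstream out;
    for (std::size_t i = 0; i < n; ++i) {
        out.write(kChunk, 4);
    }
    return out.str();
}

// stringbuf used directly, skipping the stream layer.
std::string append_with_stringbuf(std::size_t n) {
    std::stringbuf buf;
    for (std::size_t i = 0; i < n; ++i) {
        buf.sputn(kChunk, 4);
    }
    return buf.str();
}
```

All three produce the same bytes; any difference in running time comes from the machinery between the call site and the underlying storage.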
Where does this inefficiency come from? I haven’t done any extensive profiling analysis, but my educated guesses are from two places: virtual function overhead and an interface that does too much.
ostringstream is supposed to be able to be used as an ostream for runtime-polymorphism. But here’s where the C++ maxim comes into play. Runtime-polymorphism is not being used here. Every function call should be able to be statically dispatched. And it is, but all of the virtual machinery comes from within ostringstream.
This problem seems to come mostly from the fact that basic_ostream, which does most of the leg-work for ostringstream, has no specific knowledge of its stream type. Therefore it's always a virtual call. And it may be doing many such virtual calls.
You can achieve the same runtime polymorphism (being able to overload operator<< for any stream) by using a static set of stream classes, tightly coupled to their specific streambufs, and a single “anystream” type that those streams can be converted into. It would use std::function-style type erasure to remember the original type and feed function calls to it. It would use a single function call to initiate each write operation, rather than what appears to be many virtual calls within each write.
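A very rough sketch of that idea, under the assumption that a "stream" is reduced to a single write operation: string_sink is a concrete class with no virtual functions, and any_sink is a type-erased handle built on std::function. Both names are hypothetical; this is one possible reading of the "anystream" suggestion, not a worked-out design.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// A concrete sink with no virtual functions; calls on it are statically
// dispatched and can be inlined.
class string_sink {
public:
    void write(const char* data, std::size_t size) { buffer_.append(data, size); }
    const std::string& str() const { return buffer_; }
private:
    std::string buffer_;
};

// Type-erased handle: one indirect call per write operation.
// It refers to the sink and does not own it, so the sink must outlive it.
class any_sink {
public:
    template <typename Sink>
    explicit any_sink(Sink& sink)
        : write_([&sink](const char* d, std::size_t n) { sink.write(d, n); }) {}

    void write(const char* data, std::size_t size) { write_(data, size); }
private:
    std::function<void(const char*, std::size_t)> write_;
};

// Code written once against any_sink works with every sink type.
void write_greeting(any_sink& out) { out.write("hello", 5); }
```

Code that knows it has a string_sink pays nothing for the erasure; only code that needs to accept arbitrary sinks goes through any_sink.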
Then, there’s the fact that streambuf itself is overdesigned. stringbuf ought to be a simple interface wrapper around a std::string, but it’s not. It’s a complex thing. It has locale support of all things. Why? Isn’t that something that should be handled at the stream level?
This API has no way to get a low-level interface to a file/string/whatever. There’s no way to just open a filebuf and blast the file into some memory, or to shove some memory out of a filebuf. It will always employ the locale machinery even if you didn’t ask for it. It will always make these internal virtual calls, even if they are completely statically dispatched.
With iostreams, you are paying for a lot of stuff that you don’t frequently use. At the stream level, it makes sense that you’re paying for certain machinery (though again, some way to say that you’re not using some of it would be nice). At the buffer level, it does not, since that is the lowest level you’re allowed to use.
Utility
While performance is the big issue, it’s not the only one.
The biggest selling point for iostreams is the ability to extend its formatted writing functionality. You can overload operator<< for various types and simply use them. You can’t do that with fprintf. And thanks to ADL, it will work just fine for classes in namespaces. You can create new streambuf types and even streams if you like. All relatively easily.
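For reference, the extension mechanism described here looks like this in practice. The namespace geometry and the type point are invented for the example.

```cpp
#include <ostream>
#include <sstream>
#include <string>

namespace geometry {  // hypothetical namespace for the example

struct point {
    int x;
    int y;
};

// Declared next to the type, so argument-dependent lookup finds it.
std::ostream& operator<<(std::ostream& os, const point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}

}  // namespace geometry

// Outside the namespace, with no using-directive: ADL still finds operator<<.
std::string describe(const geometry::point& p) {
    std::ostringstream out;
    out << "point " << p;
    return out.str();
}
```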
Here’s the problem, and it is admittedly one that is subjective: printf is really nice syntax.
It’s very compact, for one. Once you understand the basic syntax of it, it’s very easy to see what’s going on. Especially for complex formatting. Just consider the physical size difference between these two:
snprintf(..., "0x%08x", integer);
stream << "0x" << std::right << std::hex << std::setw(8) << iVal << std::endl;
It may take a bit longer to become used to the printf version, but this is something you can easily look up in a reference.
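A compilable side-by-side of the two styles, as a sketch with made-up function names. One detail worth noting: to match the zero padding of %08x, the iostream version also needs std::setfill('0'), since std::setw alone pads with spaces.

```cpp
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// printf-style: the whole format lives in one pattern string.
std::string hex_with_snprintf(unsigned int value) {
    char buffer[16];  // "0x" + 8 hex digits + terminator fits comfortably
    std::snprintf(buffer, sizeof buffer, "0x%08x", value);
    return buffer;
}

// iostream-style: the format is spread over several manipulators.
std::string hex_with_iostream(unsigned int value) {
    std::ostringstream out;
    out << "0x" << std::hex << std::setfill('0') << std::setw(8) << value;
    return out.str();
}
```

Also note that std::hex and std::setfill stay in effect on the stream after this statement, while std::setw applies only to the next insertion.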
Plus, it makes it much easier to do translations on formatted strings. You can look the pattern string up in a table that changes from language to language. This is rather more difficult in iostreams, though not impossible. Granted, pattern changes may not be enough, as some languages have different subject/verb/object grammars that would require reshuffling patterns around. However, there are printf-style systems that do allow for reshuffling, whereas no such mechanism exists for iostream-style.
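To show what "reshuffling" buys a translator, here is a toy positional formatter. It is purely illustrative: format_positional and its "{N}" placeholder syntax are invented for this sketch and do not correspond to Boost.Format or any standard facility.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replaces each "{N}" (N a single digit) in `pattern` with args[N].
// Placeholders with an out-of-range index are copied through unchanged.
std::string format_positional(const std::string& pattern,
                              const std::vector<std::string>& args) {
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const bool is_placeholder =
            pattern[i] == '{' && i + 2 < pattern.size() &&
            pattern[i + 1] >= '0' && pattern[i + 1] <= '9' &&
            pattern[i + 2] == '}';
        if (is_placeholder) {
            const std::size_t index =
                static_cast<std::size_t>(pattern[i + 1] - '0');
            if (index < args.size()) {
                out += args[index];
                i += 2;  // skip the digit and the closing brace
                continue;
            }
        }
        out += pattern[i];
    }
    return out;
}
```

With the pattern looked up per language, "Error in {0} at line {1}" and "Line {1}: error in {0}" can both be served by the same call site, which a chain of << insertions cannot do without changing code.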
C++ used the << method because the alternatives were less flexible. Boost.Format and other systems show that C++03 did not really have to use this mechanism to achieve the extensibility features that iostreams provide.
What do you think? Are there other issues in iostreams that need to be mentioned?
From the given performance tests, it would appear that "derivations/ehancements(sic)" will be insufficient to resolve this problem.
It's an interface problem, and you can't solve an interface problem by continuing to use the same interface.
On Sunday, November 25, 2012 12:52:45 PM UTC-8, Nicol Bolas wrote:
> From the given performance tests, it would appear that "derivations/ehancements(sic)" will be insufficient to resolve this problem.
I don't think the tests show that.
> It's an interface problem, and you can't solve an interface problem by continuing to use the same interface.
I guess that's where we disagree. Of course without an alternative interface to test it's really hard to know.
On Sunday, November 25, 2012 2:16:48 PM UTC-8, robertmac...@gmail.com wrote:
> On Sunday, November 25, 2012 12:52:45 PM UTC-8, Nicol Bolas wrote:
>> From the given performance tests, it would appear that "derivations/ehancements(sic)" will be insufficient to resolve this problem.
> I don't think the tests show that.
Well, how else can you explain it? vector::push_back is over 10x faster than doing the equivalent task with basic_stringbuf directly when they are doing the exact same thing.
On Sunday, November 25, 2012 3:57:03 PM UTC-8, Nicol Bolas wrote:
> On Sunday, November 25, 2012 2:16:48 PM UTC-8, robertmac...@gmail.com wrote:
>> On Sunday, November 25, 2012 12:52:45 PM UTC-8, Nicol Bolas wrote:
>>> From the given performance tests, it would appear that "derivations/ehancements(sic)" will be insufficient to resolve this problem.
>> I don't think the tests show that.
> Well, how else can you explain it? vector::push_back is over 10x faster than doing the equivalent task with basic_stringbuf directly when they are doing the exact same thing.
The stream implementation has to consider code conversion, \r\n translation, etc.
> The Iostreams library in C++ has a problem. We have real, reasonable, legitimate C++ professional, who like C++ and use modern C++ idioms, telling people to not use iostreams. ...
See what happened with the proposition to remove trigraphs.
Other aspects such as Unicode/Localisation should be dealt with by other libraries - it doesn't seem like it should be the purpose of a stream to do this, and not every user of the streaming library should have to pay the cost by default, as at present. It seems perverse at the moment that one can improve the performance of iostreams only by introducing more complex code - the inverse of the more typical equation where more code => more functionality.
If we can pin down/agree a more suitable interface, I'm certainly happy to help provide a sample implementation for testing and improvement. My ideas above can be discarded if they are not appropriate - I'm not wedded to them, but I thought they might provide a hook for criticism!