Java inspired iostream replacement idea.

Ben M.

Oct 17, 2016, 2:23:48 AM
to ISO C++ Standard - Future Proposals
Recently I watched quite a few talks from 2013 and 2016, and iostreams came up quite a bit. I haven't done much research into it yet, but seeing how not much has been done, I assume it hasn't been discussed, and a quick search of this forum didn't yield many results. My idea for an IOStreams replacement is to model it on how Java does it.

// These are strictly binary streams. No conversion of any kind should be done
// at this level. Needs <cstddef> for size_t; SomeException is a placeholder.
class InputStream {
public:
    // Movable, not copyable.

    virtual ~InputStream() {}

    /** @returns bytes actually read into buffer */
    virtual size_t read(void *data, size_t size) = 0;

    // Like this for no exceptions. Returns false on a short read.
    template<typename T>
    bool read(T &output) noexcept {
        return read(static_cast<void *>(&output), sizeof(T)) == sizeof(T);
    }

    // Throwing variant.
    template<typename T>
    T read() {
        T output;
        bool success = read<T>(output);
        if(!success){
            throw SomeException();
        }
        return output;
    }

    // Variadic form: reads each argument in order and stops at the first failure.
    // Caution: a call such as read(buf, n) with char *buf and size_t n lvalues
    // matches this template better than the raw overload above.
    template<typename T, typename U, typename ...Rest>
    bool read(T &first, U &second, Rest &...rest) {
        if(read(first)) {
            return read(second, rest...);
        }
        return false;
    }

    // Have the default do nothing in the base class. Assumes by default
    // there is no buffer. If you want a buffer, wrap in a BufferedInputStream.
    virtual void flush() {}
    // No close function, as this would add some unintuitive expectations.
    // If we have close it will need to be virtual, and calling virtual functions
    // in destructors is a bad idea.
};

// Add a specialization by doing something like
template<>
bool InputStream::read<FancyType>(FancyType &output) noexcept {
    // custom deserialization goes here; return whether it succeeded
    return false;
}

class OutputStream {
public:
    // Movable, not copyable.

    /** @returns bytes actually written */
    virtual size_t write(const void *data, size_t size) = 0;

    // Similar idea as InputStream.
    virtual void flush() = 0;
    // Template overloads similar to the ones in InputStream.
};

// For both OutputStream & InputStream, have a read/write specialization for std::string.
// The specialization treats it as a UTF-8 string by default, with absolutely no analysis or
// conversion of "\r\n" or "\n" of any kind. Just read it as is, converting UTF-8 to what std::string needs/expects.


For text streams it's a similar idea. The text streams will hold a handle to a binary input/output stream. In Java these are called Reader/Writer; they deal with text, handle text file encoding, and manage other related parameters. In C++ it would be a similar deal, and this level would not handle any form of localization, or only very minimal localization. On top of that there is the Printer, which will handle the localization-specific features and formatting, such as floating point numbers.

For seeking there is to be a Seekable class with one virtual function, seek(), similar to fseek, and another Tellable class for reporting the file position, similar to ftell. Of course this means dynamic_cast<>() would be needed to use these two features. I don't want them in InputStream/OutputStream because a stream may not have such an ability, for example a network socket. What I like about Java's stream design is its simplicity and how easy it is to extend and compose streams.
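A minimal sketch of how the Seekable/Tellable split might look in practice. Only seek(), tell() and the use of dynamic_cast come from the description above; MemoryInputStream, ZeroInputStream, tryRewind and the absolute-offset signature of seek() are assumptions made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Binary input with no positioning, as described in the post.
class InputStream {
public:
    virtual ~InputStream() {}
    virtual std::size_t read(void *data, std::size_t size) = 0;
};

// Optional capabilities, each with a single virtual function.
class Seekable {
public:
    virtual ~Seekable() {}
    virtual bool seek(std::size_t offsetFromStart) = 0;  // assumption: absolute offsets only
};

class Tellable {
public:
    virtual ~Tellable() {}
    virtual std::size_t tell() const = 0;
};

// Hypothetical in-memory stream that supports all three interfaces.
class MemoryInputStream : public InputStream, public Seekable, public Tellable {
public:
    explicit MemoryInputStream(std::string bytes) : mBytes(std::move(bytes)), mPos(0) {}

    std::size_t read(void *data, std::size_t size) override {
        const std::size_t n = std::min(size, mBytes.size() - mPos);
        std::memcpy(data, mBytes.data() + mPos, n);
        mPos += n;
        return n;
    }
    bool seek(std::size_t offsetFromStart) override {
        if (offsetFromStart > mBytes.size()) return false;
        mPos = offsetFromStart;
        return true;
    }
    std::size_t tell() const override { return mPos; }

private:
    std::string mBytes;
    std::size_t mPos;
};

// Hypothetical stream that cannot seek, standing in for a network socket.
class ZeroInputStream : public InputStream {
public:
    std::size_t read(void *data, std::size_t size) override {
        std::memset(data, 0, size);
        return size;
    }
};

// Rewinds the stream if it happens to be seekable; reports false otherwise.
bool tryRewind(InputStream &in) {
    if (Seekable *seekable = dynamic_cast<Seekable *>(&in)) {
        return seekable->seek(0);
    }
    return false;
}
```

Code that only has an InputStream & must probe for the capability at runtime, which is the cost of keeping seek() out of the base class.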

What are some thoughts on this? And if a standards committee member is here, has doing something Java-inspired been discussed?

Thank you.

Nicol Bolas

Oct 17, 2016, 10:34:27 AM
to ISO C++ Standard - Future Proposals
Um no. You used a virtual interface as your means of specializing how streams work. If IOStreams has proven anything, it is that this is not the correct way to do that.

C++ is not Java. If you want to create a better C++ stream interface, then I would suggest looking at Boost.IOStreams for some ideas and concepts. Not that Boost.IOStream is a great idea, but it's a decent starting place.

Ville Voutilainen

Oct 17, 2016, 10:37:51 AM
to ISO C++ Standard - Future Proposals
On 17 October 2016 at 17:34, Nicol Bolas <jmck...@gmail.com> wrote:
> Um no. You used a virtual interface as your means of specializing how
> streams work. If IOStreams has proven anything, it is that this is not the
> correct way to do that.

Java itself proved the same thing.

benz...@gmail.com

Oct 21, 2016, 1:03:25 AM
to ISO C++ Standard - Future Proposals
OK, so I updated it: no virtual functions, and taking some inspiration from Boost.IOStreams. See attached. With the approach I'm going with, you get something like

// binary is always default
GZipInput<FileInputStream> input(FileInputStream("someFile.gz"));

For comparison, Boost.IOStreams does it like so

    ifstream file("hello.z", ios_base::in | ios_base::binary);
    filtering_streambuf<input> in;
    in.push(zlib_decompressor());
    in.push(file);
    boost::iostreams::copy(in, cout);



Also have lineReader so you can do something like this

for(auto &line : lineReader(input)) {
    // do something for each line
}

The idea would be to have readers take care of text-related concepts on top of the input streams, which will be pure binary. What are your thoughts on this direction?
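A rough sketch of how a lineReader usable in a range-based for loop could sit on top of a pure binary input. This is not the code from the attachment; MemoryInput and LineRange are hypothetical, the only thing assumed about the input is a read(void *, size_t) member returning the bytes read, and lines are split on '\n' only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical binary input used only to drive the example.
class MemoryInput {
public:
    explicit MemoryInput(std::string bytes) : mBytes(std::move(bytes)), mPos(0) {}
    std::size_t read(void *data, std::size_t size) {
        const std::size_t n = std::min(size, mBytes.size() - mPos);
        std::memcpy(data, mBytes.data() + mPos, n);
        mPos += n;
        return n;
    }
private:
    std::string mBytes;
    std::size_t mPos;
};

// Single-pass range of lines over any type with read(void *, size_t).
template <typename Input>
class LineRange {
public:
    explicit LineRange(Input &in) : mIn(in) {}

    class iterator {
    public:
        iterator() : mRange(nullptr) {}  // end iterator
        explicit iterator(LineRange *range) : mRange(range) { ++*this; }
        const std::string &operator*() const { return mLine; }
        iterator &operator++() {
            // Becomes the end iterator once the input is exhausted.
            if (mRange && !mRange->next(mLine)) mRange = nullptr;
            return *this;
        }
        bool operator!=(const iterator &other) const { return mRange != other.mRange; }
    private:
        LineRange *mRange;
        std::string mLine;
    };

    iterator begin() { return iterator(this); }
    iterator end() { return iterator(); }

private:
    // Reads up to the next '\n'. Returns false only when no bytes were left.
    bool next(std::string &line) {
        line.clear();
        char c;
        bool any = false;
        while (mIn.read(&c, 1) == 1) {
            any = true;
            if (c == '\n') return true;
            line.push_back(c);
        }
        return any;
    }

    Input &mIn;
};

template <typename Input>
LineRange<Input> lineReader(Input &in) {
    return LineRange<Input>(in);
}
```

Reading one byte per call is only reasonable here because the stand-in input is in memory; a real reader would presumably sit on a buffered stream.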

Thank you.
iostream.hpp

Nicol Bolas

Oct 21, 2016, 11:05:07 AM
to ISO C++ Standard - Future Proposals, benz...@gmail.com

I considered something similar a while back. However, I ran into a number of issues.

First, one of the important benefits of the current iostream-based approach is that you can overload stream input/output for specific types (ie: `operator<<` or `operator>>`) without knowing or caring exactly where your data was going. You didn't care if someone used `operator<<` to a file/string/whatever. You just overloaded `operator<<(std::ostream &os, const MyType&)` and you were covered for every kind of output.

That is incredibly important from a usability standpoint. But at the same time, it hurts performance, since it's all based on a virtual interface. Now, for streamed output of types, that's probably not important. Where it is important is the fact that it enforces this overhead onto everyone, even people who are just using the low-level "write some bytes" routines.

Of course, the obvious answer in your case is to use templates. Which leads into:

Second, Boost.IOStreams is a runtime-based system of filters. And while that does make it a bit slower, it has one very important benefit: it's highly flexible. If you're saving a file, and the user chooses one type of zip archive rather than another, that's just changing a runtime filter. It doesn't require vastly different code-paths or anything.

My point is that there is this tension between runtime flexibility and low-level performance. Your template-based approach is good for the latter, but not very good for the former. A comprehensive stream system ought to be able to handle the needs of both.

For example, there could be a `filtered_input_stream` template, which is variadic. The first parameter is the source of the input, and the following parameters are the set of compile-time defined filters. However, there would also be an `any_input_filter` type, which type-erases input filters.

So you could have `filtered_input_stream<file_input, any_input_filter>`, which would read from a file and filter the data with a runtime-defined filter.
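One way the `filtered_input_stream` / `any_input_filter` pairing could be sketched. Only those two names come from the text above. `memory_input` stands in for `file_input`, and the filter interface (an in-place `apply(char *, size_t)`) plus the two sample filters are assumptions chosen to keep the example short; real filters such as decompressors change the byte count and would need a richer interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical source: reads from memory instead of a file.
class memory_input {
public:
    explicit memory_input(std::string bytes) : bytes_(std::move(bytes)), pos_(0) {}
    std::size_t read(char *out, std::size_t n) {
        n = std::min(n, bytes_.size() - pos_);
        std::memcpy(out, bytes_.data() + pos_, n);
        pos_ += n;
        return n;
    }
private:
    std::string bytes_;
    std::size_t pos_;
};

// Compile-time filters: any type with a const apply(char *, size_t).
struct upper_filter {
    void apply(char *data, std::size_t n) const {
        for (std::size_t i = 0; i < n; ++i)
            data[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(data[i])));
    }
};

struct digit_mask_filter {
    void apply(char *data, std::size_t n) const {
        for (std::size_t i = 0; i < n; ++i)
            if (std::isdigit(static_cast<unsigned char>(data[i]))) data[i] = '#';
    }
};

// Type-erased filter: wraps any filter type behind one runtime interface.
class any_input_filter {
public:
    template <typename Filter,
              typename = decltype(std::declval<const Filter &>().apply(
                  static_cast<char *>(nullptr), std::size_t(0)))>
    any_input_filter(Filter filter)
        : apply_([filter](char *data, std::size_t n) { filter.apply(data, n); }) {}

    void apply(char *data, std::size_t n) const { apply_(data, n); }

private:
    std::function<void(char *, std::size_t)> apply_;
};

// Source first, then the filters fixed at compile time, applied in order.
template <typename Source, typename... Filters>
class filtered_input_stream {
public:
    explicit filtered_input_stream(Source source, Filters... filters)
        : source_(std::move(source)), filters_(std::move(filters)...) {}

    std::size_t read(char *out, std::size_t n) {
        const std::size_t got = source_.read(out, n);
        apply_all(out, got, std::index_sequence_for<Filters...>{});
        return got;
    }

private:
    template <std::size_t... I>
    void apply_all(char *out, std::size_t n, std::index_sequence<I...>) {
        // The braced list forces left-to-right evaluation of the pack.
        int order[] = {0, (std::get<I>(filters_).apply(out, n), 0)...};
        (void)order;
    }

    Source source_;
    std::tuple<Filters...> filters_;
};
```

With `filtered_input_stream<memory_input, upper_filter>` the filter call can be inlined; with `filtered_input_stream<memory_input, any_input_filter>` the same stream type accepts whichever filter is picked at runtime, at the cost of an indirect call per read.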

Also, on a personal note, I don't like the idea of having the platform-dependent text translation stuff be part of the file input/output itself. It seems to me like that should be a filter on top of an always-binary file input/output object. Though admittedly it may be more efficient to do it that way.

Third, your current design has `InputStreamHolder` based on very simple entrypoints: read, seek, and tell. IOstream's `stream_buf` type uses a much more complex series of functions. Obviously, the designers of IOstreams added that complexity for a reason. Is your simple API sufficient to encapsulate these needs? I'm not saying it isn't; it's just that IOstream's complexity either has a purpose or it does not. And it'd be good to find out which it is.

Also, it would be good to figure out if there is room for a `vector::reserve`-style interface here. I haven't done any really low-level file IO stuff, but with buffering and all, it might not be unreasonable to be able to say, "I'm going to write something that's about 200 characters, so do any allocations you might need for that right now."

Lastly, if we're going to revamp stream IO, then we need to remember that this is the 21st century. So maybe we should investigate ways of handling streams that allow them to execute asynchronously. At the very least, there ought to be a way to create a low-level async file input/output object that works with `future`s and the like. Filters should probably run in the originating thread, on the assumption that the performance-limiting part is the actual IO, not the filters.
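As a very small illustration of the `future`-based idea, the sketch below starts a blocking read on another thread with `std::async` and runs the filter in the originating thread once the data arrives. `memory_source`, `async_read` and `to_upper` are hypothetical names; a real design would more likely use the platform's native asynchronous IO than a thread per read.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical low-level source; stands in for a file or socket.
class memory_source {
public:
    explicit memory_source(std::string bytes) : bytes_(std::move(bytes)), pos_(0) {}
    std::size_t read(char *out, std::size_t n) {
        n = std::min(n, bytes_.size() - pos_);
        std::memcpy(out, bytes_.data() + pos_, n);
        pos_ += n;
        return n;
    }
private:
    std::string bytes_;
    std::size_t pos_;
};

// Starts reading up to n bytes on another thread.
// The caller must keep `source` alive and leave it alone until the future is ready.
template <typename Source>
std::future<std::vector<char>> async_read(Source &source, std::size_t n) {
    return std::async(std::launch::async, [&source, n]() {
        std::vector<char> buffer(n);
        buffer.resize(source.read(buffer.data(), n));  // shrink to what was actually read
        return buffer;
    });
}

// Example filter, run by the caller after the IO has completed.
std::string to_upper(const std::vector<char> &raw) {
    std::string text(raw.begin(), raw.end());
    for (char &c : text)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return text;
}
```

The caller does other work between `async_read(...)` and `get()`, and the filter cost lands on the calling thread, matching the assumption that IO rather than filtering is the bottleneck.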

David Krauss

Oct 22, 2016, 3:44:58 AM
to std-pr...@isocpp.org, Nicol Bolas, benz...@gmail.com
On 2016–10–21, at 11:05 PM, Nicol Bolas <jmck...@gmail.com> wrote:

> First, one of the important benefits of the current iostream-based approach is that you can overload stream input/output for specific types (ie: `operator<<` or `operator>>`) without knowing or caring exactly where your data was going. You didn't care if someone used `operator<<` to a file/string/whatever. You just overloaded `operator<<(std::ostream &os, const MyType&)` and you were covered for every kind of output.
>
> That is incredibly important from a usability standpoint. But at the same time, it hurts performance, since it's all based on a virtual interface.

That flexibility comes from the stream buffer, and adding a virtual call to buffer reallocation or flushing should never be a bottleneck.

Virtual calls into the locale do occur very often in the formatting routines, but the branches are typically predictable. It’s one factor… iostreams is slow because it’s bogged down by many little details of complexity. But overflow() should be allowed to be slow and its polymorphism is not superfluous. (And if the user might be hitting it too often, they should have an easy and portable way to prevent that by adjusting the buffer size.)

benz...@gmail.com

Oct 22, 2016, 9:15:00 PM
to ISO C++ Standard - Future Proposals, benz...@gmail.com

Hi, thank you for your feedback. I used it to adapt my ideas further; see below. Because of the length and the time it took me, I wrote this in Markdown and then copied and pasted it into the Google Groups editor, which is why it looks different than usual.


Overload << >> for different types without needing to care about the underlying stream class.

InputStreamHolder<> is for this purpose. It allows a compile-time way to anchor templates. There would be an equivalent OutputStreamHolder<> for output streams. Another approach is to create a template like so

// needs <ios>, <type_traits> and <utility>
template<class T, class POD,
    typename std::enable_if<
        std::is_same<
            std::streamsize,
            // T must have read(void *, std::streamsize) returning std::streamsize
            decltype(std::declval<T &>().read(nullptr, std::streamsize(0)))
        >::value
    >::type* = nullptr
>
T &operator>>(T &stream, POD &output) {
    stream.read(&output, sizeof(POD));
    return stream;
}

There will need to be a special trait like is_input_stream<> to make the above easier. Reader & Writer will not have a read/write method like the above. Instead they will have one that accepts a character count and returns a std::string of that many characters, not bytes, as Reader/Writer should be pure character streams.
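A possible shape for such an is_input_stream<> trait, written in C++11 style. The detection rule (a read(void *, std::streamsize) member returning std::streamsize) is taken from the SFINAE condition above. memory_stream is a hypothetical stand-in, and the extra std::is_trivially_copyable check is an added assumption so that only memcpy-safe types are read as raw bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <ios>
#include <string>
#include <type_traits>
#include <utility>

// Primary template: not an input stream unless the specialization matches.
template <typename T, typename = void>
struct is_input_stream : std::false_type {};

// Matches types with read(void *, std::streamsize) returning std::streamsize.
template <typename T>
struct is_input_stream<
    T, typename std::enable_if<std::is_same<
           std::streamsize,
           decltype(std::declval<T &>().read(static_cast<void *>(nullptr),
                                             std::streamsize(0)))>::value>::type>
    : std::true_type {};

// Binary extraction for any qualifying stream.
// Like the original snippet, this ignores short reads.
template <typename Stream, typename Pod,
          typename std::enable_if<is_input_stream<Stream>::value &&
                                      std::is_trivially_copyable<Pod>::value,
                                  int>::type = 0>
Stream &operator>>(Stream &stream, Pod &output) {
    stream.read(&output, static_cast<std::streamsize>(sizeof(Pod)));
    return stream;
}

// Hypothetical stream used to exercise the trait.
class memory_stream {
public:
    explicit memory_stream(std::string bytes) : bytes_(std::move(bytes)), pos_(0) {}
    std::streamsize read(void *out, std::streamsize n) {
        const std::streamsize left = static_cast<std::streamsize>(bytes_.size() - pos_);
        const std::streamsize got = std::min(n, left);
        std::memcpy(out, bytes_.data() + pos_, static_cast<std::size_t>(got));
        pos_ += static_cast<std::size_t>(got);
        return got;
    }
private:
    std::string bytes_;
    std::size_t pos_;
};
```

Because the trait is a single named condition, the same check can be reused for every overload instead of repeating the decltype expression.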


Runtime flexibility. Trade off between flexibility vs performance

For this I was thinking of having a VirtualInputStreamBase. It would work like so

class VirtualInputStreamBase {
public:
    // movable & not copyable
    // ...
    virtual ~VirtualInputStreamBase(){};
    virtual std::streamsize read(void *data, std::streamsize sizeBytes)=0;
};

template<typename T>
class VirtualInputStream : public VirtualInputStreamBase {
public:
    explicit VirtualInputStream(T stream) : mStream(std::move(stream)) {}
    std::streamsize read(void* data, std::streamsize sizeBytes) override {
        return mStream.read(data, sizeBytes);
    }
private:
    T mStream;
};

// now I can do VirtualInputStream<MyVeryCustomStream> stream(MyVeryCustomStream(...));
// I don't know if there is a better way. Either a pointer or reference needs to be used.
std::unique_ptr<VirtualInputStreamBase> someFunction();
// now inputstreams can go beyond with a type-erasure like feature. And something similar to filters
// in boost can be implemented.
 

Keep basic streams fully binary (no text translations). Text & platform specific translations should be in a higher level filter.

Yes, I completely agree with this. The approach I like is having “Reader” / “Writer” classes that handle text and platform-specific translations.


IOstream’s stream_buf has a wide array of use cases. How will these use cases fit into the new IOStreams library?

I assume you are referring to basic_streambuf. Breaking it down, basic_streambuf does the following:

  • Locale
  • Buffering/no buffering/Various details about how to buffer/flush/sync
  • Seeking
  • Peeking at the next byte
  • Putting a single byte back
  • I don’t fully understand the rest. To me it seems everything is related to the above.

I think this is too much for one class to handle, and it should be split across several classes.

  • Locale should be handled at a higher level, like a Reader<> / Writer<>.
  • For buffering, have BufferedInputStream<>, which does only buffering. No peeking. Seeking is OK: since it is a template class, if the underlying stream doesn’t support seek, a compiler error is generated at the seek() call site.
  • PeakInputStream<> handles peeking and putting bytes back. Peeking should work like so

      PeakInputStream<SomeStream> stream;

      stream.mark(); // mark this spot. The stream will begin recording from this point on
      // do some reading and other processing.
      stream.markSize(); // essentially returns the number of bytes from the mark point to now,
      // which means you can go backwards by stream.markSize()
      stream.markSeek(stream.markSize()); // go backwards by this amount. This goes all the way back to the last mark() call.

Which leads to the obvious usability issue you mentioned. Below is what happens if you want everything.

LocaleReader<BufferedInputStream<PeakInputStream<AsyncInputStream<SomeCustomInputStream>>>> reader;

I think something like the above should still be possible while also allowing the approach taken by Boost.IOStreams with its filters. The filters will utilize VirtualInputStream or VirtualOutputStream.
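To make the mark()/markSize()/markSeek() behaviour described earlier concrete, here is a hedged sketch of one possible PeakInputStream<>. The three method names and their meaning come from the post; the recording strategy (a std::string that is replayed after seeking back) and MemorySource are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical underlying stream with no seek support.
class MemorySource {
public:
    explicit MemorySource(std::string bytes) : mBytes(std::move(bytes)), mPos(0) {}
    std::size_t read(void *data, std::size_t size) {
        const std::size_t n = std::min(size, mBytes.size() - mPos);
        std::memcpy(data, mBytes.data() + mPos, n);
        mPos += n;
        return n;
    }
private:
    std::string mBytes;
    std::size_t mPos;
};

template <typename Stream>
class PeakInputStream {
public:
    explicit PeakInputStream(Stream stream)
        : mStream(std::move(stream)), mReplayPos(0), mMarked(false) {}

    // Start recording here; anything recorded before this point is dropped.
    void mark() {
        mRecorded.erase(0, mReplayPos);
        mReplayPos = 0;
        mMarked = true;
    }

    // Number of bytes between the mark point and the current position.
    std::size_t markSize() const { return mReplayPos; }

    // Go backwards by `back` bytes, at most back to the mark point.
    void markSeek(std::size_t back) {
        assert(back <= mReplayPos);
        mReplayPos -= back;
    }

    std::size_t read(void *data, std::size_t size) {
        char *out = static_cast<char *>(data);
        // First replay bytes that were recorded and then seeked over.
        const std::size_t replay = std::min(size, mRecorded.size() - mReplayPos);
        std::memcpy(out, mRecorded.data() + mReplayPos, replay);
        mReplayPos += replay;
        // Then pull fresh bytes, recording them while a mark is active.
        std::size_t fresh = 0;
        if (replay < size) {
            fresh = mStream.read(out + replay, size - replay);
            if (mMarked) {
                mRecorded.append(out + replay, fresh);
                mReplayPos += fresh;
            }
        }
        return replay + fresh;
    }

private:
    Stream mStream;
    std::string mRecorded;   // bytes read since the last mark()
    std::size_t mReplayPos;  // current position within mRecorded
    bool mMarked;
};
```

The recording buffer grows until the next mark() call, so a real implementation would probably want a way to clear the mark.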


vector::reserve style buffering

Something like a buffered stream? Or telling the OS to preallocate a section on the hard drive for a file?


Asynchronous streams.

There is read-ahead asynchronous IO, which can be done with an AsyncInputStream<> / AsyncOutputStream<> kind of class. There is also asynchronous IO similar to Boost.Asio, which is pretty much callbacks from a user's perspective, although under the hood it is very complex. Going the Asio approach will require classes such as FileHandle and SocketHandle, and letting the library do any buffering. The Boost.Asio style is more powerful, and read-ahead can be implemented with a Boost.Asio-style library. In either case thread executors will be needed, specifically a thread pool executor.

  • Do my ideas make sense? Thoughts, opinions, directions?
  • Which do you like more: InputStreamHolder<> or using SFINAE for << >> overloading? I haven’t fully worked out the advantages and disadvantages of both forms. I like the SFINAE direction more, as it’s a simpler approach for a library user. With the SFINAE approach, Readers/Writers must not have binary reading/writing methods, and >> << will need to be overloaded for binary streams. For Reader/Writer I think a direction like Boost.Format is the way to go.
  • For asynchronous IO, should it be like Boost.Asio, just read-ahead, or both? I think the obvious answer is both.

Looking at Boost.Asio, I think it’s good. If it were to become part of the C++ standard, an obvious change would be to integrate it better with C++’s threading library.


Thank you.

Sergey Zubkov

Oct 23, 2016, 3:47:24 PM
to ISO C++ Standard - Future Proposals, benz...@gmail.com

> I assume you are referring to basic_streambuf. Breaking it down, basic_streambuf does the following:
>
>   • I don’t fully understand the rest. To me it seems everything is related to the above.

Don't miss bulk I/O. I always found it ingenious how istream::read and ostream::write turn into a single system call each (such as POSIX read and write) with no intermediate buffering or locale calls (in the most common cases).
 

benz...@gmail.com

Oct 23, 2016, 8:27:42 PM
to ISO C++ Standard - Future Proposals, benz...@gmail.com
Can you elaborate on bulk I/O? Is the ability to disable buffering enough? Thank you.

Sergey Zubkov

Oct 23, 2016, 8:46:37 PM
to ISO C++ Standard - Future Proposals, benz...@gmail.com
> Can you elaborate bulk I/O? Is ability to disabling buffering enough? Thank you.

No, buffering is still there, but if I'm writing 1M and the buffer is 4K, C++ streams do not write in 4K chunks (when implemented as intended, e.g. libstdc++). As the standard puts it, xsputn can achieve the effect of overflow through other means.
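The bulk-write behaviour described here can be illustrated with a small buffered writer that hands large writes straight to the sink instead of copying them through its buffer in pieces. This is only a sketch of the idea, not how libstdc++ implements xsputn; BufferedOutput and CountingSink are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sink that counts calls, standing in for a write() system call.
struct CountingSink {
    std::size_t calls = 0;
    std::size_t bytes = 0;
    void write(const char *, std::size_t n) {
        ++calls;
        bytes += n;
    }
};

template <typename Sink>
class BufferedOutput {
public:
    BufferedOutput(Sink &sink, std::size_t capacity) : mSink(sink), mCapacity(capacity) {
        assert(capacity > 0);
        mBuffer.reserve(capacity);
    }
    ~BufferedOutput() { flush(); }

    void write(const char *data, std::size_t n) {
        if (n >= mCapacity) {
            // Bulk path: flush what is pending, then pass the caller's
            // memory to the sink in a single call.
            flush();
            mSink.write(data, n);
            return;
        }
        if (mBuffer.size() + n > mCapacity) flush();
        mBuffer.insert(mBuffer.end(), data, data + n);
    }

    void flush() {
        if (!mBuffer.empty()) {
            mSink.write(mBuffer.data(), mBuffer.size());
            mBuffer.clear();
        }
    }

private:
    Sink &mSink;
    std::size_t mCapacity;
    std::vector<char> mBuffer;
};
```

With a 4K buffer, a 1M write costs one flush of whatever was pending plus one direct sink call, rather than 256 separate 4K writes.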

benz...@gmail.com

Nov 20, 2016, 12:54:50 PM
to ISO C++ Standard - Future Proposals, benz...@gmail.com
If anyone is wondering about progress, I have included a zip file of what I have been working on. If anyone wants to join in, let me know and I'll add you to the Bitbucket repository; I'll make it public when I feel it's more ready. In summary, the basic idea so far: all pure binary streams end in _istream, _ostream, or _iostream, e.g. file_istream, file_iostream. Locale-based streams end in _istring, _ostring, or _iostring, and they hold on to a respective binary stream. There is a chain_istream and a chain_ostream to chain streams together, or you can chain by using templates; the choice of how to chain is left to the user. There are also endian conversion functions and a block device stream that guarantees reading a block at a time. The current unknown is what to do with errors: throw them, return error codes, or something else. Also, I don't use the C++ locale features, so it would help if someone could design and/or guide that part. See the attachment for more details: the readme includes a more detailed summary, and basic examples are in tests/test.cpp.
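The attachment is not reproduced here, so the following is only a guess at what endian conversion helpers for a binary stream library might look like. All four function names are hypothetical. They assemble values byte by byte, which makes them independent of the host's byte order.

```cpp
#include <cassert>
#include <cstdint>

// Reads a 32-bit value stored most significant byte first.
inline std::uint32_t load_be32(const unsigned char *p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Reads a 32-bit value stored least significant byte first.
inline std::uint32_t load_le32(const unsigned char *p) {
    return (static_cast<std::uint32_t>(p[3]) << 24) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           static_cast<std::uint32_t>(p[0]);
}

// Writes a 32-bit value most significant byte first.
inline void store_be32(std::uint32_t value, unsigned char *p) {
    p[0] = static_cast<unsigned char>((value >> 24) & 0xFF);
    p[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    p[2] = static_cast<unsigned char>((value >> 8) & 0xFF);
    p[3] = static_cast<unsigned char>(value & 0xFF);
}

// Writes a 32-bit value least significant byte first.
inline void store_le32(std::uint32_t value, unsigned char *p) {
    p[0] = static_cast<unsigned char>(value & 0xFF);
    p[1] = static_cast<unsigned char>((value >> 8) & 0xFF);
    p[2] = static_cast<unsigned char>((value >> 16) & 0xFF);
    p[3] = static_cast<unsigned char>((value >> 24) & 0xFF);
}
```

A stream's typed read could then fetch four raw bytes and pass them through load_be32 or load_le32, depending on the file format.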

Attached is a zip file. Just rename the extension. I don't know why I could not upload it just says error please try again.
iostream.fun