
Re: Best practices (like for file I/O)


Alf P. Steinbach

Dec 23, 2015, 1:03:05 PM
On 12/23/2015 5:20 PM, Stefan Ram wrote:
> I am thinking about writing a program that reads from a text file.
>
> I wonder what the best practices are to do this in C++.

That I don't know, but there are lots of UNGOOD practices.


> For example, I could use »getchar« and assume that the program
> is used from the command line with shell redirection of stdin
> via »<« or I could accept the path of the file via »argc« and
> then construct an »ifstream« object. Which possibility is better?

None of those are good for portable code.

In both cases the main problem is dealing properly with text encodings.

Using the "main" arguments to denote filenames fails in general in
Windows, because for the general case they simply can't encode the
filenames. The C++ standard suggests that the "main" arguments should be
multibyte-encoded, and that could have worked, but the one and only
"main" argument encoding convention in Windows before that and after
that is Windows ANSI, which in Western countries is some single byte per
character encoding. I.e. the standardization process did not standardize
or accommodate the existing practice, but innovated in a
Windows-unfriendly way. There are other oddities resulting from that
Unixland/Microsoft political conflict, e.g. that a standard C++ program
can't copy its standard input exactly to its standard output except in
Unixland, and that Windows' wide characters are in direct conflict with
the standard's requirements: there can't be a practically useful and
also fully conforming C++ implementation for Windows, ever. If these
victims of the conflict had been important then it would have been
tragic indeed. Happily, as it is it's just very absurd, something we can
look at and say, "what idiots!", nothing more.


> Here is an example of a program I wrote to remove all »x«
> while copying the contents of a file:
>
> #include <iostream>
> #include <istream>
> #include <ostream>
> #include <fstream>
> #include <iterator>
>
> int main()
> { ::std::ifstream i{ "C:\\example\\source.txt" }; if( i )

Just use the forward slash, "/", as path separator. ;-)


> { ::std::ofstream o{ "C:\\example\\target.txt" }; if( o )
> { ::std::istreambuf_iterator< char >O;
> ::std::istreambuf_iterator< char >p{ i };
> ::std::ostreambuf_iterator< char >q{ o };
> while( p != O )
> { char const ch = *p;
> if( ch != 'x' )*q = ch;
> ++q; ++p; }}}}
>
> You might find my placements of »if( i )« strange, but apart
> from the formatting, are there any deficits in the code?

Well it's a bit inefficient.

I would read one line at a time using "getline", and remove the x-es in
each line before outputting it.

That also makes it easier and more efficient to check and deal with a
read or write failure.

Regarding the formatting, that can easily be changed via e.g. AStyle.

However, changing it back to your preference might be harder, because I
don't think any editor or formatter supports that... :-o)


> (When I was young I used to carefully wrap stream
> constructions into »try« blocks and release resources
> and closed files explicitly. Then I learned more about
> RAII. I now hope that the above program without
> »try« blocks or »close« calls still handles all
> resources correctly. Please let me know if it
> would be better to call »close« explicitly.)

The stream destructors handle that.


[snip]

Cheers, & hth., & Merry Christmas,

- Alf


Ian Collins

Dec 23, 2015, 2:45:35 PM
Stefan Ram wrote:
> I am thinking about writing a program that reads from a text file.
>
> I wonder what the best practices are to do this in C++.
>
> For example, I could use »getchar« and assume that the program
> is used from the command line with shell redirection of stdin
> via »<« or I could accept the path of the file via »argc« and
> then construct an »ifstream« object. Which possibility is better?

Use command options (or special file names such as '-') to select the
input and output source.

> Here is an example of a program I wrote to remove all »x«
> while copying the contents of a file:
>
> #include <iostream>
> #include <istream>
> #include <ostream>
> #include <fstream>
> #include <iterator>
>
> int main()
> { ::std::ifstream i{ "C:\\example\\source.txt" }; if( i )
> { ::std::ofstream o{ "C:\\example\\target.txt" }; if( o )
> { ::std::istreambuf_iterator< char >O;
> ::std::istreambuf_iterator< char >p{ i };
> ::std::ostreambuf_iterator< char >q{ o };
> while( p != O )
> { char const ch = *p;
> if( ch != 'x' )*q = ch;
> ++q; ++p; }}}}
>
> You might find my placements of »if( i )« strange, but apart
> from the formatting, are there any deficits in the code?

The joint most important characteristic (along with correctness) for
code is clarity! So drop the superfluous colons and add in some whitespace.

You should make use of the standard library where possible, so I would
have written this more like:

#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iterator>

int main( int argc, char** argv )
{
    if( argc < 3 ) return EXIT_FAILURE;

    std::ifstream in {argv[1]};

    if( in )
    {
        std::ofstream out {argv[2]};

        if( out )
        {
            using inIterator = std::istreambuf_iterator<char>;
            using outIterator = std::ostreambuf_iterator<char>;

            std::copy_if( inIterator(in), inIterator(), outIterator(out),
                          []( char c ){ return c != 'x'; } );
        }
    }

    return EXIT_SUCCESS;
}

Using copy_if makes it clear what the code is doing.

> (When I was young I used to carefully wrap stream
> constructions into »try« blocks and release resources
> and closed files explicitly. Then I learned more about
> RAII. I now hope that the above program without
> »try« blocks or »close« calls still handles all
> resources correctly. Please let me know if it
> would be better to call »close« explicitly.)

Streams are a manifestation of RAII, so everything will get released.

> I wish there was a collection of C++ snippets that can
> be used as templates for common situations! But those
> snippets should really reflect best practices. Rosetta
> code might be such a collection, but it might not reflect
> best practices (for C++17). Sometimes, I try to look up
> what I want to do in a book by Stroustrup or Sutter, but
> they might have simplified some example code for
> pedagogical reasons. Boost should be a good read for
> high-quality example code, but I am not sure whether
> I can find code for a simple file I/O situation there.

The standard algorithms are a good place to start.

> I think user code should never call library functions
> directly, but always wrap them. For example, don't call
> »getchar«, but »mygetchar«, and implement »mygetchar«
> to forward to »getchar« (we got perfect forwarding!).
> Then, when one decides later to read from a string
> rather than from a file, it is easy to just change »mygetchar«.

Why? If you need to do that there must be something unsuitable with the
library function.

> But above, I do not use a function, but an operator.

You used neither, just normal code.

> So the preceding paragraph then would say that one
> should never use a library /class/ directly, but always
> a custom wrapper for it. Writing such a custom wrapper
> seems to be more difficult for a class than for a function,
> but with the new possibility to inherit constructors,
> it might be quite easy.

No matter how easy, it would still be pointless.

--
Ian Collins

Ian Collins

Dec 23, 2015, 2:52:16 PM
Alf P. Steinbach wrote:
>
> Well it's a bit inefficient.
>
> I would read one line at a time using "getline", and remove the x-es in
> each line before outputting it.

Without measuring, I wouldn't make that claim. During some earlier
testing, I found using streambuf iterators to be an efficient way to
read through a file. Being able to use them with standard algorithms is
a bonus!

--
Ian Collins

Ian Collins

Dec 23, 2015, 3:25:09 PM
Stefan Ram wrote:
> Ian Collins <ian-...@hotmail.com> writes:
>> The standard algorithms are a good place to start.
>
> I assume you mean the source code of an implementation
> of the standard algorithms?

I meant learning and using them, for example using copy_if instead of
your hand written equivalent.

> Is there a »canonical« (gcc or llvm) download location
> where one can get an archive file with the source code for
> the standard library?

No.

>>> I think user code should never call library functions
>>> directly, but always wrap them. For example, don't call
>>> »getchar«, but »mygetchar«, and implement »mygetchar«
>>> to forward to »getchar« (we got perfect forwarding!).
>>> Then, when one decides later to read from a string
>>> rather than from a file, it is easy to just change »mygetchar«.
>> Why? If you need to do that there must be something unsuitable with the
>> library function.
>
> No, there does not have to be something wrong.
>
> I am thinking about cases like, for example:
>
> ::std::cerr << "Warning: 'x' is deprecated.\n"s;
>
> Here the standard operator »<<« is called directly.
>
> Now, I write a thin wrapper:
>
> ::std::ostream & warning
> ( ::std::ostream & err, ::std::string const text )
> { return err << text; }
>
> to be used like this:
>
> warning( ::std::cerr, "Warning: 'x' is deprecated.\n"s );
>
> . Then, if I later want to add a bell sound ("\a")
> to all my warning messages, it will be easy to just
> change my custom function »warning«. It would be more
> difficult to look up every usage of »<<« and decide
> whether it's a warning and then conditionally add a bell.
>

Google "Yagni" :)

--
Ian Collins

Louis Krupp

Dec 23, 2015, 10:34:21 PM
On 23 Dec 2015 16:20:40 GMT, r...@zedat.fu-berlin.de (Stefan Ram)
wrote:

<snip>
>#include <iostream>
>#include <istream>
>#include <ostream>
>#include <fstream>
>#include <iterator>
>
>int main()
>{ ::std::ifstream i{ "C:\\example\\source.txt" }; if( i )
> { ::std::ofstream o{ "C:\\example\\target.txt" }; if( o )
> { ::std::istreambuf_iterator< char >O;
> ::std::istreambuf_iterator< char >p{ i };
> ::std::ostreambuf_iterator< char >q{ o };
> while( p != O )
> { char const ch = *p;
> if( ch != 'x' )*q = ch;
> ++q; ++p; }}}}
>
> You might find my placements of »if( i )« strange, but apart
> from the formatting, are there any deficits in the code?

More conventional formatting might have saved you from what looks like
a bug: You're incrementing q whether or not you've assigned anything
to *q.

Louis

mark

Dec 24, 2015, 8:11:23 AM
On 2015-12-23 20:52, Ian Collins wrote:
> Alf P. Steinbach wrote:
>>
>> Well it's a bit inefficient.
>>
>> I would read one line at a time using "getline", and remove the x-es in
>> each line before outputting it.
>
> Without measuring, I wouldn't make that claim. During some earlier
> testing, I found using streambuf iterators to be an efficient way to
> read through a file.

Efficient??? What??? Streams suck.

GCC 5.2 (g++ -O2 -std=c++14):
fread Elapsed: 213.867 ms
fread Transfer rate: 1943.43 MB/s
streambuf_iterator Elapsed: 3875 ms
streambuf_iterator Transfer rate: 107.261 MB/s

VC++ 2015 (cl -O2):
fread Elapsed: 223.228 ms
fread Transfer rate: 1861.93 MB/s
streambuf_iterator Elapsed: 7184.81 ms
streambuf_iterator Transfer rate: 57.8492 MB/s




#include <string>
#include <cstdio>
#include <cerrno>
#include <chrono>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <fstream>
#include <streambuf>

// code from
// http://insanecoding.blogspot.de/2011/11/how-to-read-in-file-in-c.html

std::string get_file_contents_fread(const char *filename) {
    std::FILE *fp = std::fopen(filename, "rb");
    if(fp) {
        std::string contents;
        std::fseek(fp, 0, SEEK_END);
        contents.resize(std::ftell(fp));
        std::rewind(fp);
        std::fread(&contents[0], 1, contents.size(), fp);
        std::fclose(fp);
        return(contents);
    }
    throw(errno);
}

std::string get_file_contents_stream(const char *filename) {
    std::ifstream in(filename, std::ios::in | std::ios::binary);
    if(in) {
        std::string contents;
        in.seekg(0, std::ios::end);
        contents.reserve(in.tellg());
        in.seekg(0, std::ios::beg);
        std::copy((std::istreambuf_iterator<char>(in)),
                  std::istreambuf_iterator<char>(),
                  std::back_inserter(contents));
        in.close();
        return(contents);
    }
    throw(errno);
}

auto time_fn = [](auto fn, const char* desc, const char* filename) {
    typedef std::chrono::high_resolution_clock clock;
    typedef std::chrono::duration<float, std::milli> duration;
    clock::time_point start = clock::now();
    auto res = fn(filename);
    duration elapsed = clock::now() - start;
    std::cout << desc << " Elapsed: " << elapsed.count() << " ms\n"
              << desc << " Transfer rate: "
              << res.size() / (1000.0 * elapsed.count())
              << " MB/s\n";
};

int main() {
    auto filename = "testfile.txt";
    time_fn(get_file_contents_fread, "fread", filename);
    time_fn(get_file_contents_stream, "streambuf_iterator", filename);
    return 0;
}

Daniel

Dec 24, 2015, 10:49:44 AM
On Thursday, December 24, 2015 at 8:11:23 AM UTC-5, mark wrote:
> On 2015-12-23 20:52, Ian Collins wrote:

> I found using streambuf iterators to be an efficient way to
> > read through a file.
>
> Efficient??? What??? Streams suck.
>
Indeed :-) C++ streams appear to be an argument in favor of "premature optimization": they seem intrinsically slow by design, and there is little implementations can do about it. The picture is even worse when comparing std::istringstream with strtod, or std::ostringstream with sprintf. There is certainly no "you don't pay for what you don't use" with streams.

Daniel

Ian Collins

Dec 24, 2015, 3:01:04 PM
mark wrote:
> On 2015-12-23 20:52, Ian Collins wrote:
>> Alf P. Steinbach wrote:
>>>
>>> Well it's a bit inefficient.
>>>
>>> I would read one line at a time using "getline", and remove the x-es in
>>> each line before outputting it.
>>
>> Without measuring, I wouldn't make that claim. During some earlier
>> testing, I found using streambuf iterators to be an efficient way to
>> read through a file.
>
> Efficient??? What??? Streams suck.

Modify your code to solve the original problem rather than doing a file
copy. Then run it on a file bigger than half of your RAM.

--
Ian Collins

Louis Krupp

Dec 24, 2015, 5:10:34 PM
On 24 Dec 2015 20:13:22 GMT, r...@zedat.fu-berlin.de (Stefan Ram)
wrote:

>Louis Krupp <lkr...@nospam.pssw.com.invalid> writes:
>>>if( ch != 'x' )*q = ch;
>>>++q; ++p;
>>More conventional formatting might have saved you from what looks like
>>a bug: You're incrementing q whether or not you've assigned anything
>>to *q.
>
> It is a bug with regard to the intended source text
> in terms of readability. You are right. What I did
> intend to say, was indeed:
>
>if( ch != 'x' )*q++ = ch;
>++p;
>
> . It is not a bug with regard to the actual program
> execution, because for an »ostreambuf_iterator« object both
> »*« and »++« are actually NOPs (pure identity functions),
> and the code is equivalent to
>
>if( ch != 'x' )q = ch;
>++p;

I had no idea it worked like that, but it makes sense.

Thank you.

Louis

Paavo Helde

Dec 24, 2015, 6:19:21 PM
Ian Collins <ian-...@hotmail.com> wrote in
news:de3176...@mid.individual.net:
You mean, a file larger than 6 GB? Don't have so many text files like
that, sorry.

BTW, I see your moves with getline and fread, and raise you mmap(). No
numbers and non-standard, but if you care about file I/O performance
that's the way to go. Doesn't consume RAM either.

Cheers
Paavo

Ian Collins

Dec 25, 2015, 3:05:01 AM
Paavo Helde wrote:
> Ian Collins <ian-...@hotmail.com> wrote in
> news:de3176...@mid.individual.net:
>
>> mark wrote:
>>> On 2015-12-23 20:52, Ian Collins wrote:
>>>> Alf P. Steinbach wrote:
>>>>>
>>>>> Well it's a bit inefficient.
>>>>>
>>>>> I would read one line at a time using "getline", and remove the
>>>>> x-es in each line before outputting it.
>>>>
>>>> Without measuring, I wouldn't make that claim. During some earlier
>>>> testing, I found using streambuf iterators to be an efficient way to
>>>> read through a file.
>>>
>>> Efficient??? What??? Streams suck.
>>
>> Modify your code to solve the original problem rather than doing a
>> file copy. Then run in on a file bigger than half of your RAM.
>
> You mean, a file larger than 6 GB? Don't have so many text files like
> that, sorry.

Try something like a VM image :)

> BTW, I see your moves with getline and fread, and raise you mmap(). No
> numbers and non-standard, but if you care about file I/O performance
> that's the way to go. Doesn't consume RAM either.

Yes, and they can play nicely with iostreams if you use a simple
streambuf class with a mapped file.

I just ran a quick test and at least on my system with g++, as I
expected, there's no significant difference between copying a file
rejecting 'x' by walking a mapped file or by using an
istreambuf_iterator on a streambuf created from the mapping.

--
Ian Collins

Jorgen Grahn

Dec 29, 2015, 10:22:20 AM
On Wed, 2015-12-23, Stefan Ram wrote:
> I am thinking about writing a program that reads from a text file.
>
> I wonder what the best practices are to do this in C++.
>
> For example, I could use »getchar« and assume that the program
> is used from the command line with shell redirection of stdin
> via »<« or I could accept the path of the file via »argc« and
> then construct an »ifstream« object. Which possibility is better?

That's not a C++ question; that's a question of user interface,
regardless of language (as a user I don't care what language the
program is written in).

For Unix programs, I prefer to do what cat(1), Perl's "while(<>)"
construct and many others do. That is, if it makes sense for your
program to take multiple files and see them more or less as one long
text stream.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Jorgen Grahn

Dec 31, 2015, 1:33:18 PM
On Wed, 2015-12-23, Stefan Ram wrote:
> Ian Collins <ian-...@hotmail.com> writes:
>>The standard algorithms are a good place to start.
>
> I assume you mean the source code of an implementation
> of the standard algorithms?
>
> Is there a »canonical« (gcc or llvm) download location
> where one can get an archive file with the source code for
> the standard library?

I don't know what "canonical" means in this context, but since it's
free software, of course the source code is available. See

https://gcc.gnu.org/

>>>I think user code should never call library functions
>>>directly, but always wrap them.

Frankly, that strikes me as a terrible idea. The thing about library
functions is that they are well known and documented. If you wrap
them, I must assume it's to make them do something else.

>>>For example, don't call
>>>»getchar«, but »mygetchar«, and implement »mygetchar«
>>>to forward to »getchar« (we got perfect forwarding!).
>>>Then, when one decides later to read from a string
>>>rather than from a file, it is easy to just change »mygetchar«.

>>Why? If you need to do that there must be something unsuitable with the
>>library function.
>
> No, there does not have to be something wrong.
>
> I am thinking about cases like, for example:
>
> ::std::cerr << "Warning: 'x' is deprecated.\n"s;
>
> Here the standard operator »<<« is called directly.
>
> Now, I write a thin wrapper:
>
> ::std::ostream & warning
> ( ::std::ostream & err, ::std::string const text )
> { return err << text; }
>
> to be used like this:
>
> warning( ::std::cerr, "Warning: 'x' is deprecated.\n"s );
>
> . Then, if I later want to add a bell sound ("\a")
> to all my warning messages, it will be easy to just
> change my custom function »warning«. It would be more
> difficult to look up every usage of »<<« and decide
> whether it's a warning and then conditionally add a bell.

For that purpose (to be able to just go through the warnings) I think
I'd prefer to do something like

std::cerr << Warning << "'x' is deprecated.\n";

where 'Warning' is an object which prints as the right text.