Counting stuff


bunbun

Oct 18, 2006, 5:26:43 PM
to cpp-cookbook
I was very interested to see a cookbook for C++ because I really like
the Perl and Python cookbooks, and C++ is my native tongue. Flipping
randomly through it, the first example I ran across was your code to
count lines, words, etc. Is this really the best way to count things in
a file?
I would have thought that using the streambuf rather than a stream
would be much more efficient.

A cursory web search

http://groups.google.co.uk/group/alt.comp.lang.learn.c-c++/browse_thread/thread/2b7cdb9383357f8a/2f119d5473d5f66b?lnk=st&q=count+line+file+c%2B%2B+streambuf+fast&rnum=1&hl=en#2f119d5473d5f66b

finds this from Dietmar Kuehl:

    std::ifstream in("large.file");
    std::count(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>(), '\n');

James Kanze is right that memory-mapped files are much faster (at least
3-4 times on my system). Happily, there is now a system-independent
I/O library from boost.org (Boost.Iostreams) that provides this facility.

Either way the code is considerably faster than counting using get().

Am I being pedantic?
Is the point of the C++ cookbook to present easy-to-understand code
rather than practical/efficient code?
Thanks

Llew Goodstadt


My version of the code would be:

#include <boost/iostreams/code_converter.hpp>
#include <boost/iostreams/device/mapped_file.hpp>
#include <algorithm>
#include <iterator>   // std::istreambuf_iterator
#include <string>     // std::string

#include <iostream>
#include <fstream>
#include <stdexcept>
#include <boost/filesystem/path.hpp>
#include <boost/filesystem/convenience.hpp>
#include "progress_indicator.h"
#include "count_lines_in_file.h"

namespace io = boost::iostreams;

// Fast path: memory-map the file and count eol characters directly.
// count_lines_in_file_std is presumably declared in count_lines_in_file.h.
std::streamsize count_lines_in_file(
        std::string& file_name,
        char eol,
        const t_progress_indicator& seq_progress)
{
    try
    {
        io::mapped_file_source mapfile;
        mapfile.open(file_name);
        return std::count(mapfile.data(),
                          mapfile.data() + mapfile.size(), eol);
    }
    catch (const std::ios::failure&)
    {
        // Probably the file is too large, or it does not map for
        // whatever reason.
    }

    // Fall back to the safer but slower stream-based code.
    return count_lines_in_file_std(
            file_name,
            eol,
            seq_progress);
}

// Slow path: count eol characters through the stream's streambuf.
std::streamsize count_lines_in_file_std(
        std::string& file_name,
        char eol,
        const t_progress_indicator& seq_progress)
{
    std::ifstream in(file_name.c_str());
    if (!in)
        throw std::runtime_error("Could not open " + file_name);
    return std::count(std::istreambuf_iterator<char>(in),
                      std::istreambuf_iterator<char>(), eol);
}

bunbun

Oct 19, 2006, 9:41:03 AM
to cpp-cookbook
Sorry:
The last parameter in the function was for debugging and can be ignored.
The function signature should be
std::streamsize count_lines_in_file(const std::string& file_name, char eol);
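
For reference, a minimal sketch of what a function with that corrected signature could look like if only the standard-library fallback path is kept. Dropping the Boost memory-mapped fast path is an assumption made here purely so the sketch needs nothing beyond the standard library; it is not the poster's full version.

```cpp
#include <algorithm>
#include <fstream>
#include <ios>
#include <iterator>
#include <stdexcept>
#include <string>

// Sketch only: stream-based counting with the corrected signature.
// The memory-mapped fast path from the original post is omitted
// (assumption, to keep this free of Boost).
std::streamsize count_lines_in_file(const std::string& file_name, char eol)
{
    std::ifstream in(file_name.c_str());
    if (!in)
        throw std::runtime_error("Could not open " + file_name);
    // Count eol characters by iterating over the underlying streambuf.
    return std::count(std::istreambuf_iterator<char>(in),
                      std::istreambuf_iterator<char>(), eol);
}
```

Taking the file name by const reference also lets callers pass a string literal or a temporary, which the non-const reference in the original post would not allow.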
