
C++ is Slow?


nw

Feb 4, 2008, 6:19:17 PM
Hi all,

I'm constantly confronted with the following two techniques, which I
believe often produce less readable code, but I am told are faster
therefore better. Can anyone help me out with counter examples and
arguments?

1. The STL is slow.

More specifically vector. The argument goes like this:

"Multidimensional arrays should be allocated as large contiguous
blocks. This is so that when you are accessing the array and reach the
end of a row, the next row will already be in the cache. You also
don't need to spend time navigating pointers when accessing the array.
So a 2 dimensional array of size 100x100 should be created like this:

const int xdim=100;
const int ydim=100;

int *myarray = malloc(xdim*ydim*sizeof(int));

and accessed like this:

myarray[xdim*ypos+xpos] = avalue;

Is this argument reasonable? (Sounds reasonable to me, though the
small tests I've performed don't usually show any significant
difference).

To me this syntax looks horrible, am I wrong? Is vector the wrong
container to use? (My usual solution would be a vector<vector<int> >).
Would using a valarray help?

2. iostream is slow.

I've encountered this at work recently. I'd not considered it before,
I like the syntax and don't do so much IO generally... I'm just now
starting to process terabytes of data, so it'll become an issue. Is
iostream slow? Specifically, I encountered the following example while
googling around. The stdio version runs in around 1 second, the
iostream version takes 8 seconds. Is this just down to a poor iostream
implementation? (gcc 4 on OS X.) Or are there reasons why iostream is
fundamentally slower for certain operations? Are there things I should
be keeping in mind to speed up IO?

// stdio version
#include <cstdio>
using namespace std;
const int NULA = 0;
int main (void) {
for( int i = 0; i < 100000000; ++i )
printf( "a" );
return NULA;
}

//cout version
#include <iostream>
using namespace std;
const int NULA = 0;
int main (void) {
std::ios_base::sync_with_stdio(false);

for( int i = 0; i < 100000000; ++i )
cout << "a" ;
return NULA;
}

Victor Bazarov

Feb 4, 2008, 6:24:46 PM
nw wrote:
> I'm constantly confronted with the following two techniques, which I
> believe often produce less readable code, but I am told are faster
> therefore better. Can anyone help me out with counter examples and
> arguments?
>
> 1. The STL is slow.
>
> More specifically vector. The argument goes like this:
>
> "Multidimensional arrays should be allocated as large contiguous
> blocks. This is so that when you are accessing the array and reach the
> end of a row, the next row will already be in the cache. You also
> don't need to spend time navigating pointers when accessing the array.
> So a 2 dimensional array of size 100x100 should be created like this:
>
> const int xdim=100;
> const int ydim=100;
>
> int *myarray = malloc(xdim*ydim*sizeof(int));
>
> and accessed like this:
>
> myarray[xdim*ypos+xpos] = avalue;
>
> Is this argument reasonable? (Sounds reasonable to me, though the
> small tests I've performed don't usually show any significant
> difference).

If the tests you've performed don't show any significant difference,
then the argument is not reasonable. The only reasonable argument
is the results of the test and your standards for significance.

> To me this syntax looks horrible, am I wrong? Is vector the wrong
> container to use? (My usual solution would be a vector<vector<int> >).
> Would using a valarray help?

You can wrap your N-dimensional dynamic memory in a class and
add an overloaded operator () with N arguments, which will make
the syntax more acceptable.
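[A sketch of that suggestion for N = 3; the class name and layout here are made up for illustration, not a standard facility:]

```cpp
#include <cstddef>
#include <vector>

// Contiguous 3-D storage behind an overloaded operator(), as suggested.
template <typename T>
class Array3D {
public:
    Array3D(std::size_t nx, std::size_t ny, std::size_t nz)
        : ny_(ny), nz_(nz), data_(nx * ny * nz) {}

    T& operator()(std::size_t x, std::size_t y, std::size_t z)
    {
        // one chain of multiplications instead of two pointer hops
        return data_[(x * ny_ + y) * nz_ + z];
    }

private:
    std::size_t ny_, nz_;
    std::vector<T> data_;
};
```

Usage is then `Array3D<int> a(10, 20, 30); a(1, 2, 3) = 5;` instead of triple-nested vectors.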

> 2. iostream is slow.
>
> I've encountered this at work recently. I'd not considered it before,
> I like the syntax and don't do so much IO generally... I'm just now
> starting to process terabytes of data, so it'll become an issue. Is
> iostream slow? Specifically, I encountered the following example while
> googling around. The stdio version runs in around 1 second, the
> iostream version takes 8 seconds. Is this just down to a poor iostream
> implementation? (gcc 4 on OS X.)

Most likely.

> Or are there reasons why iostream is
> fundamentally slower for certain operations?

There is no reason, AFAICT.

> Are there things I should
> be keeping in mind to speed up io?

The fewer conversions the better.

>
> // stdio version
> #include <cstdio>
> using namespace std;
> const int NULA = 0;
> int main (void) {
> for( int i = 0; i < 100000000; ++i )
> printf( "a" );
> return NULA;
> }
>
> //cout version
> #include <iostream>
> using namespace std;
> const int NULA = 0;
> int main (void) {
> std::ios_base::sync_with_stdio(false);
>
> for( int i = 0; i < 100000000; ++i )
> cout << "a" ;
> return NULA;
> }

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask


Alf P. Steinbach

Feb 4, 2008, 6:32:38 PM
* nw:

> Hi all,
>
> I'm constantly confronted with the following two techniques, which I
> believe often produce less readable code, but I am told are faster
> therefore better. Can anyone help me out with counter examples and
> arguments?
>
> 1. The STL is slow.
>
> More specifically vector. The argument goes like this:
>
> "Multidimensional arrays should be allocated as large contiguous
> blocks. This is so that when you are accessing the array and reach the
> end of a row, the next row will already be in the cache. You also
> don't need to spend time navigating pointers when accessing the array.
> So a 2 dimensional array of size 100x100 should be created like this:
>
> const int xdim=100;
> const int ydim=100;
>
> int *myarray = malloc(xdim*ydim*sizeof(int));
>
> and accessed like this:
>
> myarray[xdim*ypos+xpos] = avalue;
>
> Is this argument reasonable? (Sounds reasonable to me, though the
> small tests I've performed don't usually show any significant
> difference).
>
> To me this syntax looks horrible, am I wrong?

The syntax is horrible, and the code is brittle.


> Is vector the wrong
> container to use?

No, not necessarily. Doing the above with vector gives you about the
same performance, but is less unsafe (no manual memory management) and
more convenient. Using a library matrix class is better still.


> (My usual solution would be a vector<vector<int> >).
> Would using a valarray help?

I don't think valarray would help; rather the contrary. Also
consider that it's essentially a not-quite-complete, not-quite-kosher
part of the standard library, and one that will likely never be completed.

> 2. iostream is slow.
>
> I've encountered this at work recently. I'd not considered it before,
> I like the syntax and don't do so much IO generally... I'm just now
> starting to process terabytes of data, so it'll become an issue. Is
> iostream slow?

Depends very much on the implementation. As a general rule, naive code
using C library i/o will be faster than equally naive code using
iostreams, or at least it was that way some years ago. However, you can
probably speed up things considerably for iostreams by using less naive
code, essentially buying speed by paying in complexity and code size.


> Specifically, I encountered the following example while
> googling around. The stdio version runs in around 1 second, the
> iostream version takes 8 seconds. Is this just down to a poor iostream
> implementation? (gcc 4 on OS X.) Or are there reasons why
> iostream is fundamentally slower for certain operations? Are there
> things I should be keeping in mind to speed up IO?
>
> // stdio version
> #include <cstdio>
> using namespace std;
> const int NULA = 0;
> int main (void) {
> for( int i = 0; i < 100000000; ++i )
> printf( "a" );
> return NULA;
> }
>
> //cout version
> #include <iostream>
> using namespace std;
> const int NULA = 0;
> int main (void) {
> std::ios_base::sync_with_stdio(false);
>
> for( int i = 0; i < 100000000; ++i )
> cout << "a" ;
> return NULA;
> }

Guideline: reserve all uppercase names for macros.

The C++ macro denoting success return value for main is EXIT_SUCCESS.


Cheers, & hth.,

- Alf

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Ioannis Vranos

Feb 4, 2008, 6:48:09 PM
nw wrote:
> Hi all,
>
> I'm constantly confronted with the following two techniques, which I
> believe often produce less readable code, but I am told are faster
> therefore better. Can anyone help me out with counter examples and
> arguments?
>
> 1. The STL is slow.
>
> More specifically vector. The argument goes like this:
>
> "Multidimensional arrays should be allocated as large contiguous
> blocks. This is so that when you are accessing the array and reach the
> end of a row, the next row will already be in the cache. You also
> don't need to spend time navigating pointers when accessing the array.
> So a 2 dimensional array of size 100x100 should be created like this:
>
> const int xdim=100;
> const int ydim=100;
>
> int *myarray = malloc(xdim*ydim*sizeof(int));
>
> and accessed like this:
>
> myarray[xdim*ypos+xpos] = avalue;
>
> Is this argument reasonable? (Sounds reasonable to me, though the
> small tests I've performed don't usually show any significant
> difference).
>
> To me this syntax looks horrible, am I wrong? Is vector the wrong
> container to use? (My usual solution would be a vector<vector<int> >).
> Would using a valarray help?


In normal cases you should use vector, as you mentioned.

I suppose it is an implementation issue; the two programs should take
about the same time. If you remove
"std::ios_base::sync_with_stdio(false);", is the output slower?

Also check the compiler optimisation switches; they can improve
run-time performance.

Daniel T.

Feb 4, 2008, 7:49:10 PM
nw <n...@soton.ac.uk> wrote:

> I'm constantly confronted with the following two techniques, which I
> believe often produce less readable code, but I am told are faster
> therefore better. Can anyone help me out with counter examples and
> arguments?
>
> 1. The STL is slow.
>
> More specifically vector. The argument goes like this:

Vector is no slower than manual dynamic array allocation. As for the
other containers, I once challenged an office mate (a C programmer who
had extensive knowledge of assembler) to write a double link list class
that was faster than the std::list implementation that came with the
compiler. He claimed he could do it because he was able to optimize his
list to work specifically with the data we were storing. He insisted
that the std::list couldn't possibly be as fast because it was "too
general". Despite his best efforts, std::list was a full 5% faster than
his code... using his own test suite!

> "Multidimensional arrays should be allocated as large contiguous
> blocks. This is so that when you are accessing the array and reach the
> end of a row, the next row will already be in the cache. You also
> don't need to spend time navigating pointers when accessing the array.
> So a 2 dimensional array of size 100x100 should be created like this:
>
> const int xdim=100;
> const int ydim=100;
>
> int *myarray = malloc(xdim*ydim*sizeof(int));
>
> and accessed like this:
>
> myarray[xdim*ypos+xpos] = avalue;
>
> Is this argument reasonable? (Sounds reasonable to me, though the
> small tests I've performed don't usually show any significant
> difference).
>
> To me this syntax looks horrible, am I wrong? Is vector the wrong
> container to use? (My usual solution would be a vector<vector<int> >).

I wouldn't use a vector<vector<int> > unless I needed a ragged array.
Even then, I would likely bury it in a class so I can change the
container without having to edit the entire code base.

Look at the Matrix class in the FAQ. Something like this:

template < typename T >
class Matrix {
public:
   Matrix( unsigned rows, unsigned cols ) :
      cols_( cols ),
      data_( rows * cols )
   { }

   T& operator() ( unsigned row, unsigned col )
   {
      // might want to consider error checking
      return data_[cols_ * row + col];
   }

   const T& operator() ( unsigned row, unsigned col ) const
   {
      // might want to consider error checking
      return data_[cols_ * row + col];
   }

   // other methods to taste
private:
   unsigned cols_;
   vector<T> data_;
};

It's very simple to use:

Matrix<int> myarray( xdim, ydim);

accessed like:

myarray( xpos, ypos ) = avalue;


 
> 2. iostream is slow.
>
> I've encountered this at work recently. I'd not considered it before,
> I like the syntax and don't do so much IO generally... I'm just now
> starting to process terabytes of data, so it'll become an issue. Is
> iostream slow?

This I don't know about. I don't deal with the iostream library much.

Ioannis Vranos

Feb 4, 2008, 8:27:12 PM
nw wrote:
>
> More specifically vector. The argument goes like this:>
> "Multidimensional arrays should be allocated as large contiguous
> blocks. This is so that when you are accessing the array and reach the
> end of a row, the next row will already be in the cache. You also
> don't need to spend time navigating pointers when accessing the array.
> So a 2 dimensional array of size 100x100 should be created like this:
>
> const int xdim=100;
> const int ydim=100;
>
> int *myarray = malloc(xdim*ydim*sizeof(int));
>
> and accessed like this:
>
> myarray[xdim*ypos+xpos] = avalue;
>
> Is this argument reasonable? (Sounds reasonable to me, though the
> small tests I've performed don't usually show any significant
> difference).
>
> To me this syntax looks horrible, am I wrong? Is vector the wrong
> container to use? (My usual solution would be a vector<vector<int> >).
> Would using a valarray help?


I think your code is incorrect. Your approach corrected:


#include <cstdlib>

int main()
{
using namespace std;

const int XDIM= 100;

const int YDIM= 200;


int (*my_array)[YDIM]= static_cast<int (*)[200]> ( malloc(XDIM* YDIM
* **my_array) );

if(my_array== 0)
return EXIT_FAILURE;

for(size_t i= 0; i< XDIM; ++i)
for(size_t j= 0; j< YDIM; ++j)
my_array[i][j]= i+j;


// ...

free(my_array);

// ...
}


The equivalent C++ style:


#include <cstdlib>

int main()
{
using namespace std;

const int XDIM= 100;

const int YDIM= 200;


int (*my_array)[YDIM]= new int[XDIM][YDIM];

for(size_t i= 0; i< XDIM; ++i)
for(size_t j= 0; j< YDIM; ++j)
my_array[i][j]= i+j;

// ...

delete[] my_array;
}

The proper C++ approach:

#include <cstdlib>
#include <vector>

int main()
{
using namespace std;

const int XDIM= 100;

const int YDIM= 200;

vector<vector<int> > my_array(XDIM, vector<int>(YDIM));



for(vector<vector<int> >::size_type i= 0; i< my_array.size(); ++i)
for(vector<int>::size_type j= 0; j< my_array[i].size(); ++j)
my_array[i][j]= i+j;


// ...

// No need to clean up your memory or any other resource
// - RAII (Resource Acquisition Is Initialisation)

}

Daniel T.

Feb 4, 2008, 9:09:05 PM
Ioannis Vranos <ivr...@nospam.no.spamfreemail.gr> wrote:

> The proper C++ approach:

> vector<vector<int> > my_array(XDIM, vector<int>(YDIM));

That is not necessarily the proper approach. The above creates XDIM + 1
separate blocks of memory, which may or may not be a good idea.

terminator

Feb 5, 2008, 12:27:08 AM
On Feb 5, 4:27 am, Ioannis Vranos <ivra...@nospam.no.spamfreemail.gr>
wrote:

I would rather:

class myclass{
vector <int> data;
public:
myclass(size_t Xdim, size_t Ydim):
data(Xdim*Ydim) // one contiguous block, like int data[Ydim*Xdim]
{}
...
};

regards,
FM.

Jim Langston

Feb 5, 2008, 2:52:13 AM
nw wrote:
> Hi all,
>
> I'm constantly confronted with the following two techniques, which I
> believe often produce less readable code, but I am told are faster
> therefore better. Can anyone help me out with counter examples and
> arguments?
>
> 1. The STL is slow.
>
> More specifically vector. The argument goes like this:
>
> "Multidimensional arrays should be allocated as large contiguous
> blocks. This is so that when you are accessing the array and reach the
> end of a row, the next row will already be in the cache. You also
> don't need to spend time navigating pointers when accessing the array.
> So a 2 dimensional array of size 100x100 should be created like this:
>
> const int xdim=100;
> const int ydim=100;
>
> int *myarray = malloc(xdim*ydim*sizeof(int));
>
> and accessed like this:
>
> myarray[xdim*ypos+xpos] = avalue;
>
> Is this argument reasonable? (Sounds reasonable to me, though the
> small tests I've performed don't usually show any significant
> difference).
>
> To me this syntax looks horrible, am I wrong? Is vector the wrong
> container to use? (My usual solution would be a vector<vector<int> >).
> Would using a valarray help?

std::vector is not necessarily slower than a manual way of doing the same
thing. Consider your manual malloc, for example. You could do the same
thing with a std::vector:

std::vector<int> myarray( xdim * ydim );
myarray[xdim*ypos + xpos] = avalue;

A std::vector is simply a dynamic array and poses all the limitations of a
dynamic array. A
std::vector<std::vector<int> >
isn't going to be necessarily faster, or slower, than a manual way of doing
it, such as an array of pointers which are allocated for each row of data.

If you find that a std::vector<std::vector<int> > is a bottleneck for you,
you could wrap a std::vector<int> in a class and get the speed benefits of a
contiguously allocated memory block without the headache of doing the math
each time. Perhaps create an at( int row, int col ) that returns a reference
to the correct element.

stl containers are generic enough to be used easily, but general enough
that sometimes you might want to optimize your code to be faster if you need
to.

I had a programmer friend tell me that stl was horribly slow. I had him send
me his code and I found he was using .resize() needlessly over and over. I
restructured his code and the final version came out about as fast as what
he was doing without stl.

As far as cout vs. printf, they are both output routines. I've never found
much need to optimize user output, as that is usually the bottleneck anyway:
showing information to the user and waiting for user response. In your
example you are stating 1 second for 100 million iterations versus 8 seconds
for 100 million iterations, meaning each iteration takes 8/100,000,000ths of
a second, i.e. about 80 nanoseconds. For something that fast and not used
that often, do we care? You might want to look up premature optimization.


--
Jim Langston
tazm...@rocketmail.com


Alex Vinokur

Feb 5, 2008, 4:01:59 AM

Ioannis Vranos

Feb 5, 2008, 6:31:41 AM
Code correction:

==> int (*my_array)[YDIM]= static_cast<int (*)[YDIM]> ( malloc(XDIM*
YDIM* sizeof(**my_array)) );

nw

Feb 5, 2008, 7:48:02 AM
> In your example you are stating 1 second for 100 million iterations
> versus 8 seconds for 100 million iterations, meaning each iteration
> takes about 80 nanoseconds. For something that fast and not used that
> often, do we care? You might want to look up premature optimization.

That's just an example, in reality I'm looking at 19.2Gb (compressed
size) of text files, which I'll have to parse on a regular basis.

James Kanze

Feb 5, 2008, 10:51:01 AM
On Feb 5, 12:19 am, nw <n...@soton.ac.uk> wrote:

> I'm constantly confronted with the following two techniques, which I
> believe often produce less readable code, but I am told are faster
> therefore better. Can anyone help me out with counter examples and
> arguments?

> 1. The STL is slow.

> More specifically vector. The argument goes like this:

> "Multidimensional arrays should be allocated as large contiguous
> blocks. This is so that when you are accessing the array and reach the
> end of a row, the next row will already be in the cache. You also
> don't need to spend time navigating pointers when accessing the array.
> So a 2 dimensional array of size 100x100 should be created like this:

> const int xdim=100;
> const int ydim=100;

> int *myarray = malloc(xdim*ydim*sizeof(int));

> and accessed like this:

> myarray[xdim*ypos+xpos] = avalue;

> Is this argument reasonable? (Sounds reasonable to me, though the
> small tests I've performed don't usually show any significant
> difference).

It depends.

First, of course, this argument has nothing to do with
std::vector. Whether you use std::vector<int> or malloc in
this case probably won't change the timing at all, but vector
will ensure that the memory is correctly freed in case of an
exception.

Second, whether it is faster to multiply, or to chase pointers,
depends very heavily on the machine architecture. When I did a
test like this on the original Intel 8086, chasing pointers won
hands down.

Third, whatever you do, you should wrap it in a class, so you
can change it later, if it turns out that the implementation
isn't optimal, and is creating a bottleneck.

> To me this syntax looks horrible, am I wrong? Is vector the
> wrong container to use? (My usual solution would be a
> vector<vector<int> >). Would using a valarray help?

Wrap it in a class, and don't worry about it until the profiler
says you have to. At that point, try the different solutions on
the actual target hardware.

> 2. iostream is slow.

> I've encountered this at work recently. I'd not considered it
> before, I like the syntax and don't do so much IO generally...
> I'm just now starting to process terabytes of data, so it'll
> become an issue. Is iostream slow?

Again, it depends on the implementation. At least one person,
in the past, created an implementation which was faster than
stdio. For the most part, however, current implementations are
"fast enough", and implementors haven't bothered improving them,
even when faster implementations exist. Regardless of what you
do, if you're processing terabytes, getting the terabytes
physically into and out of memory will dominate runtimes.

> Specifically, I encountered the following example while googling
> around. The stdio version runs in around 1 second, the iostream
> version takes 8 seconds. Is this just down to a poor iostream
> implementation? (gcc 4 on OS X.) Or are there reasons why
> iostream is fundamentally slower for certain operations? Are
> there things I should be keeping in mind to speed up IO?

The standards committee added support for code translation to
filebuf, which unless the implementation is very, very careful,
can slow things down significantly. In the past, Dietmar Kuehl
worked out some meta-programming technique which avoided the
cost as long as the translation was the identity function. I
don't think any implementation uses it, however.

Note that under g++ 2.95.2, which used the classical iostream,
iostream was actually faster than stdio. On the whole, though,
iostream implementations aren't as mature as those of stdio
(which, after all, has been around a long, long time). And I
don't find anywhere near that great of difference on a Sun
Sparc.

--
James Kanze (GABI Software) email:james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Erik Wikström

Feb 5, 2008, 11:34:12 AM

Depending on what kind of processing/parsing you are going to do (and
what kind of hardware you are running) it is very likely that disk I/O
or decompression will be the bottleneck.

--
Erik Wikström

Puppet_Sock

Feb 5, 2008, 12:33:03 PM

I'm going to (I hope) echo Erik's thoughts here.

Whenever you get to questions like "which is better" or "which
is faster" it is important to have a specification of what
really is better. "I like it that way" isn't a spec unless
the person saying it will give you money to do it his way.

And you need to measure it using several typical cases.
Presuming it is execution speed, you need a stopwatch.
And you need to compare various methods and see which
is faster and if it is significant. And you need to make
some tests to see what part of the app is using the time
and what parts are not making much of a difference.

Many stories are appropriate here, but long-winded and boring.
Just insert the usual shaggy-dog story of a "bug" that could
not be found when execution speed was perceived as a
problem, and a long attack on optimizing the code followed,
only to end when the app was found to spend more than
90 percent of its time doing something other than the
part of the task being optimized. As Erik suggests,
disk I/O might be using most of the time, and there
isn't all that much you can do about it.
Socks

Juha Nieminen

Feb 5, 2008, 3:58:31 PM

While that might sound plausible in theory, unfortunately my own
real-life experience tells otherwise: when reading and parsing input and
printing formatted output (usually to a file), switching from C++
streams to C streams usually gives a considerable speedup with all the
compilers I have tried. We are talking about at least twice the
speed, if not more, which is not a small difference.

If you are reading and parsing hundreds of megabytes of input and
outputting hundreds of megabytes as well, the difference could well be
very considerable (e.g. 20 seconds instead of 1 minute).

Juha Nieminen

Feb 5, 2008, 4:07:41 PM
nw wrote:
> Is iostream slow?

My own practical experience has shown that, for example, if you are
reading and parsing tons of formatted data (in ASCII) and/or
outputting tons of formatted data (in ASCII), switching from C++ streams
to C streams can produce a very considerable speedup (at least
twice as fast) with all compilers I have tried. I have been in several
such projects where parsing of large ASCII input files was necessary,
and in each case, with different compilers, switching to C streams gave
a very large speedup.

I haven't tested what happens if you simply read/write a large block
of binary data with fread()/fwrite() or the iostream equivalents, but I
assume that in this case the difference should be minimal, if there is any.

> Or are there reasons why iostream is
> fundamentally slower for certain operations?

While in theory iostream could be even faster than C streams for
certain operations (e.g. printf() vs. std::cout), because in the latter
the typing can be resolved at compile time while in the former it's done
at runtime (by parsing the format string), in practice most iostream
implementations are considerably slower than the C equivalents. One
reason for this might be that most iostream operations go through
virtual functions, which usually cannot be inlined. Another reason may
be that compiler makers simply haven't optimized iostream as well as the
C stream functions have been.

James Kanze

Feb 6, 2008, 5:11:00 AM

A quick test on the implementations I happen to have handy,
using the proposed benchmark, showed a little less than twice,
but not much. However, I'm not sure that even that means much:
if the disk were mounted on a slow network, the difference would
doubtlessly be less. (My experience is that SMB is almost an
order of magnitude slower than NFS, so if you're accessing a
remote disk under Windows, you really can forget about anything
but the I/O times.) And floating point formatting and parsing
can be very expensive in themselves---if iostream manages to
somehow do it better (e.g. because it specializes for float,
rather than parsing a double, then converting), then that could
make up of inefficiencies elsewhere. Alternatively, most
iostream implementations are less mature, and so there is a
distinct possibility that the floating point conversions are
less optimized, and the difference greater.

In sum, while it wouldn't surprise me if printf were faster in a
given implementation, I'd measure exactly what I needed before
making any decisions. Also, if speed is a criterion, I'd
consider using lower level I/O. Back in my pre-C++ days, I once
speeded a program up by over 60% by just using Unix level system
I/O rather than stdio.h. And mmap or its equivalent under
Windows can make even more of a difference.

In the end, you'll have to experiment. If iostream is fast
enough, there's no point in trying anything else. If it's not,
using stdio.h probably worth a try. And if even that's not fast
enough, you may have to go even lower. (In my experience, every
time iostream has been too slow, switching to stdio.h hasn't
been sufficient either, and we've had to go even lower. But
that doesn't mean that your experience will be identical.)
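["Going lower" on a POSIX system usually means the unbuffered read(2)/write(2) calls (or mmap). A minimal sketch with only token error handling; the function name and the 64 KiB chunk size are illustrative tuning choices, not anything prescribed:]

```cpp
#include <fcntl.h>
#include <unistd.h>

// Count (or process) the bytes of a file using raw POSIX I/O.
long long process_file(const char* path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char buf[1 << 16];   // 64 KiB chunks; size is a tuning knob
    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;      // real code would parse buf[0..n) here

    close(fd);
    return total;
}
```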

Jim Langston

Feb 6, 2008, 5:57:49 AM

I don't know how you got yours to run in 1 second and 8 seconds. On my
platform it was taking too long, so I reduced the iterations. Here is a
test program I wrote, with results:

#include <ctime>
#include <cstdio>
#include <iostream>

const int iterations = 1000000; // 100000000

// stdio version
int mainC()
{
    std::ios_base::sync_with_stdio(false);

    for( int i = 0; i < iterations; ++i )
        printf( "a" );
    return 0;
}

// cout version
int mainCPP()
{
    std::ios_base::sync_with_stdio(false);

    for( int i = 0; i < iterations; ++i )
        std::cout << "a";
    return 0;
}

int main()
{
    clock_t start;
    clock_t stop;

    start = clock();
    mainC();
    stop = clock();
    clock_t ctime = stop - start;

    start = clock();
    mainCPP();
    stop = clock();
    clock_t cpptime = stop - start;

    std::cout << "C: " << ctime << " C++: " << cpptime << "\n";
    std::cout << static_cast<double>( cpptime ) / ctime << "\n";
}

after a bunch of aaaa's...

C: 20331 C++: 23418
1.15184

This is showing the stl to be about 15% slower than the C code.

Microsoft Visual C++ .net 2003
Windows XP Service Pack 2

Unfortunately optimizations are disabled on my compiler, so I don't know
how it would fare optimized. But it is not an 8x difference.

I would be curious about the output of different compilers. Note that
clock() on Microsoft platforms measures total elapsed (wall-clock) time,
not just CPU time.

--
Jim Langston
tazm...@rocketmail.com


Mirek Fidler

Feb 6, 2008, 7:49:09 AM
> I don't know how you got yours to run in 1 second and 8 seconds. On my
> platform it was taking too long, so I reduced the iterations. Here is a
> after a bunch of aaaa's...
>
> C: 20331 C++: 23418
> 1.15184
>
> This is showing the stl to be about 15% slower than the C code.

Actually, this only proves that in this particular library both the C
and C++ implementations are inefficient...

Mirek

nw

Feb 6, 2008, 9:28:14 AM
Thanks for all the comments and suggestions, they've been very useful.

The consensus seems to be that yes, iostream is slower than stdio, but
that it's largely down to poor implementations. I guess if I want to
know exactly why this is so, I'd need to dig around in the
implementations. Probably the best way to go is to use iostreams for
small amounts of IO and write my own custom, possibly platform-specific,
IO code when speed is critical.

The array versus vector discussion seems more problematic. A couple of
people made the point that actually it's not down to using vectors but
how you use them: it's basically down to allocating large blocks versus
lots of small ones. It's a good point and I'll keep it in mind.

The point was also made that if I decide to allocate large blocks and
calculate indexes then I should wrap this functionality in a class. I
guess it would also make sense to wrap it in a class that has a
similar interface to vector, assuming this can be done? This way I
could pass vector or my own object as a template parameter and easily
switch between the two.

For my own VectorContiguous, I guess operator[] would have to return a
dummy object which could be automatically type converted to the
storage type of the VectorContiguous object? Has this already been
done?

It sounds like, if pushed, I should be able to create a drop-in
replacement for vector that contiguously allocates memory. Whether
this is valuable or not seems to be an open question, and probably
architecture dependent.

Alf P. Steinbach

unread,
Feb 6, 2008, 9:45:32 AM2/6/08
to
* nw:

>
> For my own VectorContiguous, I guess operator[] would have to return a
> dummy object which could be automatically type converted to the
> storage type of the VectorContiguous object? Has this already been
> done?
>
> It sounds like if pushed I should be able to create a drop in
> replacement for vector that contiguously allocates memory. Whither
> this is valuable or not seems to be an open question, and probably
> architecture dependent.

Not sure what you're talking about here, but std::vector does allocate
contigious memory. Guaranteed by the (C++03) standard. And so does
std::string, in practice, but that won't be guaranteed until C++0x.

nw

unread,
Feb 6, 2008, 10:36:48 AM2/6/08
to
> Not sure what you're talking about here, but std::vector does allocate
> contigious memory. Guaranteed by the (C++03) standard. And so does
> std::string, in practice, but that won't be guaranteed until C++0x.

I'm talking about contiguous allocation of multidimensional vectors.

Victor Bazarov

unread,
Feb 6, 2008, 10:40:07 AM2/6/08
to

There is no such thing. There can be a vector of vectors, but it
is not a "multidimensional vector", unfortunately. The main reason
it isn't one is that the [second-tier] vectors are all able to have
different sizes.

Daniel T.

unread,
Feb 6, 2008, 11:02:30 AM2/6/08
to
nw <n...@soton.ac.uk> wrote:

> The array versus vector discussion seems more problematic. A couple
> of people made the point that actually it's not down to using
> vectors, it's how you use them, it's basically down to allocating
> large blocks versus lots of small ones. It's a good point and I'll
> keep it in mind.
>
> The point was also made that if I decide to allocate large blocks
> and calculate indexes then I should wrap this functionally in a
> class.

Actually, you should wrap the functionality in a class no matter how you
decide to implement it, that way you can change the implementation
easily.

> I guess it would also make sense to wrap it in a class that has a
> similar interface to vector, assuming this can be done? This way I
> could pass vector or my own object as a template parameter and
> easily switch between the two.
>
> For my own VectorContiguous, I guess operator[] would have to
> return a dummy object which could be automatically type converted
> to the storage type of the VectorContiguous object? Has this
> already been done?
>
> It sounds like if pushed I should be able to create a drop in
> replacement for vector that contiguously allocates memory. Whither
> this is valuable or not seems to be an open question, and probably
> architecture dependent.

Having op[] return a dummy object is a poor idea, it makes it harder to
drop in different implementations. Only do something like that if you
absolutely have to. See the FAQ for more about this subject.

http://www.parashift.com/c++-faq-lite/operator-overloading.html#faq-13.10

nw

unread,
Feb 6, 2008, 12:04:34 PM2/6/08
to
> Having op[] return a dummy object is a poor idea, it makes it harder to
> drop in different implementations. Only do something like that if you
> absolutely have to. See the FAQ for more about this subject.
>
> http://www.parashift.com/c++-faq-lite/operator-overloading.html#faq-1...

hmm... ok, the FAQ makes a good point. But raises a couple of
questions.

Firstly is it possible to create a generalizable n-dimensional matrix
using this method?

Secondly is there anyway to make this compatible with (have a similar
interface to) a STL vector? It would be nice to use a STL container
for now and then implement my own classes later if required and select
between them using a template parameter.

Basically does this mean I should never use a multi-dimensional
vector, because I'm cutting myself off from the possibility of
changing the container later?

nw

unread,
Feb 6, 2008, 12:06:43 PM2/6/08
to
>> I'm talking about contiguous allocation of multidimensional vectors.


> There is no such thing. There can be a vector of vectors, but it
> is not a "multidimensional vector", unfortunately. The main reason
> it isn't that is the ability of all [second-tier] vectors to have
> different sizes.

ok, I'm talking about implementing a vector-like multidimensional
data structure that uses contiguous memory (see initial message for
details of the problem I'm addressing).

Juha Nieminen

unread,
Feb 6, 2008, 12:21:24 PM2/6/08
to
Jim Langston wrote:
> Unfortunately with my compiler my optimizations are disabled so I don't know
> how it would be optimized.

You *can't* make any speed comparisons between things when compiling
without optimizations. In debug mode the speed of the code will be
completely random, depending on what debug checks the compiler chooses
to insert there.

Moreover, printing to a console makes the comparison pointless as
well, because printing to a console has an enormous overhead which
nullifies any speed difference there may be between different I/O
libraries. You have to print to a file to get a better comparison.
After all, the question was about reading (and presumably parsing)
enormous amounts of data, and outputting enormous amounts of data. These
things are never done with a console terminal as the output, but a file.
Writing to a file is quite a fast operation (probably hundreds of times
faster than printing to a console), and thus the speed difference
between the different I/O routines will have more effect.

Daniel T.

unread,
Feb 6, 2008, 12:29:39 PM2/6/08
to
nw <n...@soton.ac.uk> wrote:

> > Having op[] return a dummy object is a poor idea, it makes it harder to
> > drop in different implementations. Only do something like that if you
> > absolutely have to. See the FAQ for more about this subject.
> >
> > http://www.parashift.com/c++-faq-lite/operator-overloading.html#faq-1...
>
> hmm... ok, the FAQ makes a good point. But raises a couple of
> questions.
>
> Firstly is it possible to create a generalizable n-dimensional matrix
> using this method?

Of course.

> Secondly is there anyway to make this compatible with (have a similar
> interface to) a STL vector? It would be nice to use a STL container
> for now and then implement my own classes later if required and select
> between them using a template parameter.

I'm guessing your nomenclature is confused again. Do you mean a vector
of vectors, when you say "vector" above?

If a Matrix (2 dimensional array, whatever you want to call it) is part
of the problem space and can be implemented in more than one way, then
it should be represented as a class. It would not be "nice" to use a
vector of vectors now and then try to shoehorn some other implementation
into the code later.

> Basically does this mean I should never use a multi-dimensional
> vector, because I'm cutting myself off from the possibility of
> changing the container later?

By all means, use a vector of vectors when you need the functionality
that such a construct would provide. A ragged array, for example, not a
2 dimensional array.

Ioannis Vranos

unread,
Feb 6, 2008, 1:12:00 PM2/6/08
to


Where computational efficiency is a primary concern, you should use
valarray and its facilities (slice_array etc).

Ioannis Vranos

unread,
Feb 6, 2008, 1:14:01 PM2/6/08
to
nw wrote:
>
> Firstly is it possible to create a generalizable n-dimensional matrix
> using this method?


Where computational efficiency is a primary concern you should use
valarray, and use its facilities for "simulating" n-dimensional matrices.

Erik Wikström

unread,
Feb 6, 2008, 5:14:50 PM2/6/08
to

I disagree, unless you have some commercial library which includes
specific optimisations for valarray using vector will be just as good.
The reason being that valarray was designed to allow optimisations but
does not require them, and since valarray was a bit of a failure, no
implementation that I know of performs any kind of optimisation. The
only advantage of valarray is that it comes with a few mathematical
operators, but at least in the implementation that comes with Visual C++
they are implemented using normal loops.

In fact I would argue that writing your own might quite easily achieve
greater performance for some operations using OpenMP.

As for implementing a two-dimensional array I would do something like this:

#include <cstddef>
#include <vector>

template<typename T>
class Matrix
{
    std::vector<T> m_vec;
    std::size_t m_rows, m_cols;
public:
    Matrix(std::size_t r, std::size_t c)
      : m_vec(r * c), m_rows(r), m_cols(c)
    { }
    T& operator()(std::size_t r, std::size_t c)
    {
        // Perhaps check if r < m_rows && c < m_cols
        return m_vec[r * m_cols + c];
    }
};

I might have messed up the calculation of where the element is (mixed
rows and columns) but I think that the general idea is clear. To make it
kind of usable with standard algorithms and containers you can offer
begin() and end() which just returns m_vec.begin()/end(), or you can
make more advanced stuff like allowing iterating over a row or column.

--
Erik Wikström

Erik Wikström

unread,
Feb 6, 2008, 5:17:46 PM2/6/08
to

That is why you should wrap it all in a class, that way you should be
able to change the internal workings without having to change the
class's external interface. That means that functions/classes using your
wrapper class can not tell the difference between the implementations.

--
Erik Wikström

Ioannis Vranos

unread,
Feb 6, 2008, 7:55:13 PM2/6/08
to
Erik Wikström wrote:
>> Where computational efficiency is a primary concern, you should use
>> valarray and its facilities (slice_array etc).
>
> I disagree, unless you have some commercial library which includes
> specific optimisations for valarray using vector will be just as good.
> The reason being that valarray was designed to allow optimisations but
> they do not require it, and since valarray was a bit of a failure no
> implementations that I know of perform any kinds of optimisations. The
> only advantage of valarray is that it comes with a few mathematical
> operators, but at least in the implementation that comes with Visual C++
> they are implemented using normal loops.


Although this may be the case for the implementations you know, valarray
is still the type intended for high performance computations.

If we want to use the right tool for the right job here, that tool is
valarray.

Under usual circumstances valarray will never be slower than vector,
while it can be faster than vector in a given implementation, or in a
future version of a given implementation. Also, its facilities allow us
to "simulate" n-dimensional matrices without having to define our own
types, elegantly and efficiently.

nw

unread,
Feb 7, 2008, 4:42:50 AM2/7/08
to
> Under usual circumstances, valarray will never be slower than vector,
> while it can be faster than vector in a given implementation, or in a
> future version of a given implementation. Also its facilities allow us
> to "simulate" n-dimensional matrices without having to define our types,
> elegantly and efficiently.

I'm finding it difficult to locate a good example of using valarray to
simulate a multi-dimensional array. Can you give me an example or point
me towards one?

Ioannis Vranos

unread,
Feb 7, 2008, 5:43:28 AM2/7/08
to

"The C++ Programming Language" 3rd Edition or Special Edition by Bjarne
Stroustrup (the creator of C++), Chapter 22 "Numerics" (on page 657).

James Kanze

unread,
Feb 7, 2008, 5:45:45 AM2/7/08
to

Or more likely, he has a very slow disk or IO bus. The more
time you spend in I/O, the less difference the CPU makes when
expressed as a percentage.

Not really related to the original question (we use Posix level
I/O for the critical parts in our application), but we've
noticed this a lot when migrating from Sparc/Solaris to
Linux/PC. Our Sparc fleet is old and slow---the new PC's are
often more than ten times as fast. But the Sparc I/O bus was
apparently a lot faster: our application actually runs at about
the same speed on both systems---with close to 100% use of the
CPU on the Sparcs, but less than 15% on the PC's.

My experience also suggests that SMB makes significantly less
efficient use of the network than NFS (or maybe it is just Samba
which is a lot slower than the NFS servers), which means that
accessing a remote file under Windows will be really, really
slow. And of course, any difference between stdio.h and
iostream will then be negligible, because even a poor and slow
implementation won't use measurable CPU compared to data
transfer times.

James Kanze

unread,
Feb 7, 2008, 5:53:20 AM2/7/08
to
On Feb 6, 3:28 pm, nw <n...@soton.ac.uk> wrote:
> The array versus vector discussion seems more problematic. A
> couple of people made the point that actually it's not down to
> using vectors, it's how you use them, it's basically down to
> allocating large blocks versus lots of small ones. It's a good
> point and I'll keep it in mind.

> The point was also made that if I decide to allocate large
> blocks and calculate indexes then I should wrap this
> functionally in a class. I guess it would also make sense to
> wrap it in a class that has a similar interface to vector,
> assuming this can be done? This way I could pass vector or my
> own object as a template parameter and easily switch between
> the two.

The trick is for operator[] to return a proxy object, which
supports operator[] to access the final value. As it happens,
T* fits the bill (although there'll be no bounds checking), so
implementing two dimensional arrays is particularly simple.

> For my own VectorContiguous, I guess operator[] would have to
> return a dummy object which could be automatically type
> converted to the storage type of the VectorContiguous object?
> Has this already been done?

#include <cstddef>
#include <vector>

class Vector2D
{
public:
    Vector2D( std::size_t i, std::size_t j )
        : m( i )
        , n( j )
        , data( i * j )
    {
    }
    double* operator[]( std::size_t i )
    {
        return &data[ i * n ] ;  // row i starts at i * (columns per row)
    }
    // ...

private:
    std::size_t m ;  // rows
    std::size_t n ;  // columns
    std::vector< double > data ;
} ;

For starters.

> It sounds like if pushed I should be able to create a drop in
> replacement for vector that contiguously allocates memory.
> Whither this is valuable or not seems to be an open question,
> and probably architecture dependent.

The nice thing about using a class like the above is that you
can change the implementation at will without any changes in the
client code.

Ioannis Vranos

unread,
Feb 7, 2008, 5:59:40 AM2/7/08
to


You may also check
http://technet.microsoft.com/en-us/library/3tbs0f4w(VS.80).aspx

slice_array, gslice_array etc. act as n-dimensional views which we can
assign to a new valarray.

James Kanze

unread,
Feb 7, 2008, 6:02:01 AM2/7/08
to
On Feb 6, 5:02 pm, "Daniel T." <danie...@earthlink.net> wrote:
> nw <n...@soton.ac.uk> wrote:

[...]


> > For my own VectorContiguous, I guess operator[] would have to
> > return a dummy object which could be automatically type converted
> > to the storage type of the VectorContiguous object? Has this
> > already been done?

> > It sounds like if pushed I should be able to create a drop in
> > replacement for vector that contiguously allocates memory. Whither
> > this is valuable or not seems to be an open question, and probably
> > architecture dependent.

> Having op[] return a dummy object is a poor idea, it makes it harder to
> drop in different implementations. Only do something like that if you
> absolutely have to. See the FAQ for more about this subject.

> http://www.parashift.com/c++-faq-lite/operator-overloading.html#faq-1...

The FAQ is wrong about this. The choice between [i][j] and
(i,j) should depend on personal preference and local
conventions; both offer exactly the same possibilities for
optimization and supporting different implementations.

nw

unread,
Feb 7, 2008, 6:06:49 AM2/7/08
to
On Feb 7, 10:43 am, Ioannis Vranos <ivra...@nospam.no.spamfreemail.gr>
wrote:

<reads> ok.. I see. So the message here is to encapsulate a valarray in
a Matrix object and use the slice operations to access it. The advantage
to using valarray over vector here is that it provides slice operations,
so I don't need to implement my own? Or are there other efficiency gains
to using valarray?

I can't see a huge advantage to encapsulating a valarray rather than a
vector here.

nw

unread,
Feb 7, 2008, 6:26:32 AM2/7/08
to

> > Having op[] return a dummy object is a poor idea, it makes it harder to
> > drop in different implementations. Only do something like that if you
> > absolutely have to. See the FAQ for more about this subject.
> >http://www.parashift.com/c++-faq-lite/operator-overloading.html#faq-1...
>
> The FAQ is wrong about this. The choice between [i][j] and
> (i,j) should depend on personal preference and local
> conventions; both offer exactly the same possibilities for
> optimization and supporting different implementations.

My reading was that the FAQ indicates that you can use [i][j] but
tries to steer you away from it because it will be harder to implement.
I'm veering towards a Matrix object with an operator(), but I find it
unfortunate that the STL doesn't already provide such an object; doing
so would provide a standardized interface which would let people create
compatible Matrix objects optimized for different platforms, sparse
matrices etc.

Are there any plans for this to be added to a future standard?

Ioannis Vranos

unread,
Feb 7, 2008, 6:52:16 AM2/7/08
to
nw wrote:
>
>> "The C++ Programming Language" 3rd Edition or Special Edition by Bjarne
>> Stroustrup (the creator of C++), Chapter 22 "Numerics" (on page 657).
>
> <reads> ok.. I see. So the message here is encapsulate a valarray in a
> Matrix object and use the slice operations to access it. The advantage
> to using valarray over vector here is that that it provides slice
> operations, I don't need to implement my own? Or are there other
> efficiency gains to using valarray?
>
> I can't see a huge advantage to encapsulating a valarray to a vector
> here.


I haven't used valarray in practice myself; however, valarray can be
aggressively optimised, so if you have serious run-time concerns you
should use valarray with slice or gslice or whatever other valarray
auxiliary type fits best.

If you do not have serious run-time concerns but only the usual ones,
you may use the usual vector<vector<whatever> > combinations and get
the job done.

nw

unread,
Feb 7, 2008, 7:04:52 AM2/7/08
to
> I haven't used valarray in practice myself, however valarray can be
> aggressively optimised so if you have serious run-time concerns you
> should use valarray with slice or gslice or whatever other valarray
> auxiliary type fits better.

What is it about valarray that allows it to be aggressively optimised?
Is it because it has a fixed size? So that a compiler could choose to
store it in a small but fast region of memory and be sure that it
would never be necessary to grow the array?

I'm trying to understand why I need this data structure, as others
have noted it's usually not well optimised and doesn't seem to have
been developed very well in the standard.

Ioannis Vranos

unread,
Feb 7, 2008, 12:32:43 PM2/7/08
to


Have a look on the entire chapter 22 of TC++PL3, it provides some info
on this.

For example on page 663 it is mentioned:

"The valarray and its auxiliary facilities were designed for high-speed
computing. This is reflected in a few constraints on users and by a few
liberties granted to implementers. Basically, an implementer of valarray
is allowed to use just about every optimization technique you can think
of. For example, operations may be inlined and the valarray operations
are assumed to be free of side effects (except on their explicit
arguments of course). Also, valarrays are assumed to be alias free, and
the introduction of auxiliary types and *the elimination of temporaries*
is allowed as long as the basic semantics are maintained. Thus, the
declarations in <valarray> may look somewhat different from what you
find here (and in the standard), but they should provide the same
operations with the same meaning for code that doesn't go out of the way
to break rules. In particular, the elements of a valarray should have
the usual copy semantics (17.1.4)".


I do not know if you need valarray though. This is up to you to decide.
vector is efficient for most applications.

Have a look on 17.1.2 of TC++PL3 where the costs of the various
containers are mentioned.

Erik Wikström

unread,
Feb 7, 2008, 2:57:45 PM2/7/08
to
On 2008-02-07 18:32, Ioannis Vranos wrote:
> nw wrote:
>>> I haven't used valarray in practice myself, however valarray can be
>>> aggressively optimised so if you have serious run-time concerns you
>>> should use valarray with slice or gslice or whatever other valarray
>>> auxiliary type fits better.

If you have serious run-time concerns then you should use a library
which offers high performance multi-dimensional arrays, there are a few
around. However be aware that I have never heard anyone speak of the
valarray as being one of them except in theoretical situations.

>> What is it about valarray that allows it to be aggressively optimised?
>> Is it because it has a fixed size? So that a compiler could choose to
>> store it in a small but fast region of memory and be sure that it
>> would be necessary to grow the array?
>>
>> I'm trying to understand why I need this data structure, as others
>> have noted it's usually not well optimised and doesn't seem to have
>> been developed very well in the standard.
>
>
> Have a look on the entire chapter 22 of TC++PL3, it provides some info
> on this.
>
> For example on page 663 it is mentioned:
>
> "The valarray and its auxiliary facilities were designed for high-speed
> computing. This is reflected in a few constraints on users and by a few
> liberties granted to implementers. Basically, an implementer of valarray
> is allowed to use just about every optimization technique you can think
> of. For example, operations may be inlined and the valarray operations
> are assumed to be free of side effects (except on their explicit
> arguments of course). Also, valarrays are assumed to be alias free, and
> the introduction of auxiliary types and *the elimination of temporaries*
> is allowed as long as the basic semantics are maintained. Thus, the
> declarations in <valarray> may look somewhat different from what you
> find here (and in the standard), but they should provide the same
> operations with the same meaning for code that doesn't go out of the way
> to break rules. In particular, the elements of a valarray should have
> the usual copy semantics (17.1.4)".

A little history of valarray: It was developed at a time when vector-
computers were the latest and greatest in scientific computing, and it
was meant to give C++ developers easy access to the power of these
vector-machines. Unfortunately there are a few problems with it, one
being that the number of vector-computers your code will ever run on
is, with very great probability, none. Another problem was that the
guy (or guys) who were working on valarray stopped working on it
somewhere half-way through. So the result is probably what they had
come up with so far, perhaps finished off by someone who was not very
familiar with the problem. And lastly, I have heard that the assumed
freedom from aliasing cannot be fulfilled in a language that allows
pointers (unless, perhaps, you have something like restrict in C).

I am quite sure that many in the C++ committee would like to remove it,
but you are not allowed to do that in an ISO standard (at least not very
easily).

--
Erik Wikström

Erik Wikström

unread,
Feb 7, 2008, 3:02:44 PM2/7/08
to

No, (or rather probably not, I am not a member of the committee so I do
not know). It has always been the purpose of the standard library to
supply generic containers and algorithms. For specialised purposes,
third-party libraries are generally recommended. If you want a library you can
use to perform vector and matrix algebra there are a number of them
available, http://www.oonumerics.org/oon/ have a good list.

--
Erik Wikström

Ioannis Vranos

unread,
Feb 7, 2008, 4:39:13 PM2/7/08
to
Erik Wikström wrote:
>
> I am quite sure that many in the C++ committee would like to remove it,
> but you are not allowed to do that in an ISO standard (at least not very
> easily).


If this is the case they could deprecate it. It wasn't deprecated in
C++03, and I am not sure this will happen in "C++0x" either.

I agree that there are special-purpose, very efficient, C++ math
libraries out there, but I think we should stick with ISO C++ as much as
we can in our code.

If I had efficiency concerns and valarray didn't do the job, I would
check those 3rd party libraries of course.

Maxx

unread,
Feb 8, 2008, 3:07:38 AM2/8/08
to
On my FreeBSD platform, iostream takes about 3 seconds more than stdio
to compile a program that is approximately 23MB big. And that lag is
probably because of the large amount of library code that iostream
pulls in.

Ian Collins

unread,
Feb 8, 2008, 3:14:21 AM2/8/08
to

Who makes these compilers (iostream and stdio)?

--
Ian Collins.

Lionel B

unread,
Feb 8, 2008, 5:03:11 AM2/8/08
to
On Thu, 07 Feb 2008 19:57:45 +0000, Erik Wikström wrote:

> On 2008-02-07 18:32, Ioannis Vranos wrote:
>> nw wrote:
>>>> I haven't used valarray in practice myself, however valarray can be
>>>> aggressively optimised so if you have serious run-time concerns you
>>>> should use valarray with slice or gslice or whatever other valarray
>>>> auxiliary type fits better.

[...]

> And lastly I have heard that the assumed freedom from
> aliases can not be fulfilled (I have heard that it cannot be in a
> language that allows pointers (unless, perhaps you have something like
> restrict in C)).

FWIW, GNU g++ (or at least the version on my system) implements a
__restrict__ specifier which, checking the valarray code, is sprinkled
liberally about. My experience with g++ is that more recent versions do
manage a certain degree of vectorisation under optimisation, although
whether this is in any way facilitated by restricting aliasing I really
don't know. The Intel compiler icpc (which on my system uses the g++ std
libs) appears to vectorise more aggressively than g++. On the whole, I've
never seen much of a speed-up using valarray rather than std::vector.

> I am quite sure that many in the C++ committee would like to remove it,
> but you are not allowed to do that in an ISO standard (at least not very
> easily).

Yes, valarray is a bit of a curate's egg.

--
Lionel B

James Kanze

unread,
Feb 8, 2008, 5:59:08 AM2/8/08
to
On Feb 7, 12:26 pm, nw <n...@soton.ac.uk> wrote:
> > > Having op[] return a dummy object is a poor idea, it makes
> > > it harder to drop in different implementations. Only do
> > > something like that if you absolutely have to. See the FAQ
> > > for more about this subject.
> > >http://www.parashift.com/c++-faq-lite/operator-overloading.html#faq-1...

> > The FAQ is wrong about this. The choice between [i][j] and
> > (i,j) should depend on personal preference and local
> > conventions; both offer exactly the same possibilities for
> > optimization and supporting different implementations.

> My reading was that the FAQ indicates that you can use [i][j]
> but tries to steer you away from it because it will be harder
> to implement.

Not significantly. My reading was that it also raised
performance issues, which aren't present either.

> I'm veering towards a Matrix object with a operator() but I
> find it unfortunate that the STL doesn't already provide such
> an object, doing so would provide a standardized interface
> which would let people create compatible Matrix objects
> optimized for different platforms, spare matrices etc.

> Are there are any plans for this to be added to a future standard?

It's probably true that there is a need for true
multi-dimensional arrays. To date, however, I don't think that
there has been a proposal.

James Kanze

unread,
Feb 8, 2008, 6:01:39 AM2/8/08
to

More likely the difference is due to the fact that iostream is a
template.

Daniel T.

unread,
Feb 8, 2008, 6:19:17 AM2/8/08
to
James Kanze <james...@gmail.com> wrote:
> nw <n...@soton.ac.uk> wrote:

> > > > Having op[] return a dummy object is a poor idea, it makes
> > > > it harder to drop in different implementations. Only do
> > > > something like that if you absolutely have to. See the FAQ
> > > > for more about this subject.
> > >

> > > The FAQ is wrong about this. The choice between [i][j] and
> > > (i,j) should depend on personal preference and local
> > > conventions; both offer exactly the same possibilities for
> > > optimization and supporting different implementations.
> >
> > My reading was that the FAQ indicates that you can use [i][j]
> > but tries to steer you away from it because it will be harder
> > to implement.
>
> Not significantly. My reading was that it also raised
> performance issues, which aren't present either.

I recall having an extended discussion about this sometime last year. I
don't remember if James participated, but the conclusion was that any
class that supported the [x][y] syntax and was as flexible as one that
supported the at( x, y ) syntax, ended up supporting both. Make of that
what you will...

nw

unread,
Feb 8, 2008, 9:23:24 AM2/8/08
to
On Feb 8, 10:59 am, James Kanze <james.ka...@gmail.com> wrote:
> On Feb 7, 12:26 pm, nw <n...@soton.ac.uk> wrote:
>
> > > > Having op[] return a dummy object is a poor idea, it makes
> > > > it harder to drop in different implementations. Only do
> > > > something like that if you absolutely have to. See the FAQ
> > > > for more about this subject.
> > > >http://www.parashift.com/c++-faq-lite/operator-overloading.html#faq-1...
> > > The FAQ is wrong about this. The choice between [i][j] and
> > > (i,j) should depend on personal preference and local
> > > conventions; both offer exactly the same possibilities for
> > > optimization and supporting different implementations.
> > My reading was that the FAQ indicates that you can use [i][j]
> > but tries to steer you away from it because it will be harder
> > to implement.
>
> Not significantly. My reading was that it also raised
> performance issues, which aren't present either.

From the FAQ: "If you have a decent compiler and if you judiciously
use inlining, the compiler should optimize away the temporary objects.
In other words, the operator[]-approach above will hopefully not be
slower than what it would have been if you had directly called
Matrix::operator()(unsigned row, unsigned col) in the first place. Of
course you could have made your life simpler and avoided most of the
above work by directly calling Matrix::operator()(unsigned row,
unsigned col) in the first place. So you might as well directly call
Matrix::operator()(unsigned row, unsigned col) in the first place."

I guess he's saying that your compiler /should/ be able to optimize
your performance issues away, but perhaps that won't always be the
case.

> It's probably true that there is a need for true
> multi-dimensional arrays. To date, however, I don't think that
> there has been a proposal.

Seeing as there is a lot of numerical code going into TR1, perhaps it
wouldn't be a bad idea if this were suggested for TR2? Or does anyone
have an opinion as to why this would be a bad idea?

James Kanze

unread,
Feb 8, 2008, 2:10:22 PM2/8/08
to
On Feb 8, 12:19 pm, "Daniel T." <danie...@earthlink.net> wrote:

Supporting both is also an option:-). Basically, all I'm saying
is that you shouldn't let supposed performance issues or
simplicity of implementation influence you. Present the options
to your users, and let them decide.

Daniel T.

unread,
Feb 8, 2008, 2:22:26 PM2/8/08
to
James Kanze <james...@gmail.com> wrote:

> Basically, all I'm saying is that you shouldn't let supposed
> performance issues or simplicity of implementation influence you.
> Present the options to your users, and let them decide.

Trying to be all interfaces to all users is what produced the
std::string class. I'm not sure if that is such a good idea.

Hans Mull

unread,
Feb 8, 2008, 4:58:31 PM2/8/08
to
Ian Collins wrote:
Compilers?

Hans Mull

unread,
Feb 8, 2008, 4:59:23 PM2/8/08
to
Maxx wrote:
Some C++ functions use dynamic memory allocation, which is not as
fast as static memory allocation.

Kind regards, Hans

Ian Collins

unread,
Feb 8, 2008, 5:15:13 PM2/8/08
to

To quote "iostream takes about 3sec more than stdio to compile a program"

Compilers compile.

--
Ian Collins.

Juha Nieminen

unread,
Feb 9, 2008, 1:48:55 PM2/9/08
to
Hans Mull wrote:
> Maxx wrote:

> Some C++ functions use dynamic memory allocation, which is not as
> fast as static memory allocation.

Do you actually have some reference to support this?

terminator

unread,
Feb 13, 2008, 4:21:32 AM2/13/08
to
On Feb 6, 5:28 pm, nw <n...@soton.ac.uk> wrote:
> Thanks for all the comments and suggestions, they've been very useful.
>
> The consensus seems to be that yes, iostream is slower than stdio but
> that it's largely down to poor implementations. I guess if I want to
> know exactly why this is so, I'd need to dig around in the
> implementations. Probably the best way to go is use iostreams for
> small amounts of IO and write my own custom, possibly platform
> specific, IO code when speed is critical.

>
> The array versus vector discussion seems more problematic. A couple of
> people made the point that actually it's not down to using vectors,
> it's how you use them, it's basically down to allocating large blocks
> versus lots of small ones. It's a good point and I'll keep it in mind.
>
> The point was also made that if I decide to allocate large blocks and
> calculate indexes then I should wrap this functionally in a class.

Based on my own recollection, vector construction/destruction is no
more than 15 percent slower than dynamic allocation/deallocation of a
contiguous sequence of objects, and with iteration I could not observe
any meaningful difference between an array and a vector. To my
surprise, valarray iteration was 50 percent slower than both of the
above. All these results were of course platform specific, but I see
no general reason to expect vector to be slower than bare arrays,
except for putting the array on the stack (that is, totally avoiding
dynamic allocation/deallocation) in very special cases.

> I
> guess it would also make sense to wrap it in a class that has a
> similar interface to vector, assuming this can be done? This way I
> could pass vector or my own object as a template parameter and easily
> switch between the two.
>

I doubt that in this case you will get faster than the STL: all the
STL is, in essence, is a set of generic classes that wrap different
techniques for representing arrays.

> For my own VectorContiguous, I guess operator[] would have to return a
> dummy object which could be automatically type converted to the
> storage type of the VectorContiguous object? Has this already been
> done?
>

You can find a similar proxy class nested in std::bitset, but it
generally slows the program down, and I would not add one to any class
unless it were the only way.

> It sounds like if pushed I should be able to create a drop in
> replacement for vector that contiguously allocates memory. Whither
> this is valuable or not seems to be an open question, and probably
> architecture dependent.

vector ***is*** a contiguous array (the very thing you want to write a
class for) on many platforms (I think all of them), so I guess your
class won't beat it.

regards,
FM.
