Help improving this little function, if necessary.

Noob

unread,

Nov 18, 2015, 4:30:54 PM11/18/15

to

Hi there,

I'm new to C++, coming from Fortran. I'd like some help writing the
following function in a way to improve speed, if it turns out to be to
badly written.

int eval_lgk(const std::vector<int> coords,
const std::vector<double>factsgk) {

unsigned int nvars = coords.size();
std::vector<double> temp(coords.size()-1);

for (unsigned int i = 0; i < nvars-2; i++)
{
temp[i] = factsgk[i] * ( coords[i] - 1.0 );
}

return coords[nvars-1] +
std::accumulate(temp.begin(),temp.end(),0.0);

}

In real cases, factsgk and coords will both have size of about 3 to
5, say, but this function will be called many, many times inside loops
and I'd like to write it in a way I can get good speed.

Is it possible to get rid of the temp vector? Would it help?

Thank you for any help.

P.s: I'd also be grateful for any general advices about this little
piece of code.

Noob

unread,

Nov 18, 2015, 4:37:01 PM11/18/15

to

Oops: The types of factsgk and of temp should be int.

Paavo Helde

unread,

Nov 18, 2015, 4:47:57 PM11/18/15

to

Noob <dont...@me.com> wrote in news:n2iqhi$40c$1...@dont-email.me:

> Hi there,
>
> I'm new to C++, coming from Fortran. I'd like some help writing the
> following function in a way to improve speed, if it turns out to be to
> badly written.
>
> int eval_lgk(const std::vector<int> coords,
> const std::vector<double>factsgk) {

For speed, you probably want to use references instead of copying the
whole vectors (const std::vector<int>& coords, etc).

>
> unsigned int nvars = coords.size();

Wrong type. The correct type (in practice) is size_t, which is even less
to type than 'unsigned int'.

> std::vector<double> temp(coords.size()-1);

This will fail if coords is empty. One should have either a check-and-
throw-exception or an assert for that. If speed is critical, use an
assert which is compiled only in debug builds.

>
> for (unsigned int i = 0; i < nvars-2; i++)

Will fail if nvars<2; again, check or assert. Also, it seems that the
last element of temp is not used, so there was no need to allocate it.

> {
> temp[i] = factsgk[i] * ( coords[i] - 1.0 );
> }
>
> return coords[nvars-1] +
> std::accumulate(temp.begin(),temp.end(),0.0);

You are collecting something in the temp array, only to sum it up later.
If the arrays are large, this means unneeded memory footprint and reduced
performance. Why don't you sum it during the loop?

>
> }
>
> In real cases, factsgk and coords will both have size of about 3 to
> 5, say, but this function will be called many, many times inside loops
> and I'd like to write it in a way I can get good speed.
>
> Is it possible to get rid of the temp vector? Would it help?

Sure you can get rid of it, it has no use. If it helps with speed - no
idea, you have to measure.

Ian Collins

unread,

Nov 18, 2015, 4:56:22 PM11/18/15

to

Noob wrote:
> Hi there,
>
> I'm new to C++, coming from Fortran. I'd like some help writing the
> following function in a way to improve speed, if it turns out to be to
> badly written.
>
> int eval_lgk(const std::vector<int> coords,
> const std::vector<double>factsgk) {

It would be more idiomatic and maybe faster (depending on how well the
code gets optimised or inlined) to pass by const reference here.

> unsigned int nvars = coords.size();

A good candidate for "const auto"

> std::vector<double> temp(coords.size()-1);

Does this add any value? Can't you sum as you go in the loop?

> for (unsigned int i = 0; i < nvars-2; i++)

What if nvars < 2??

--
Ian Collins

Noob

unread,

Nov 18, 2015, 5:12:27 PM11/18/15

to

Thank you Paavo and Ian.

The expression nvars-2 was a last minute typo, but I got the idea
about asserts. I'll also be using size_t from now on.

About "temp" and the loop, I am curious about some possible one-liner
for my return statement, eliminating "temp" and that loop, and possibly
letting the compiler do a better job. Any STL solution for that?

Thank you again!

Ian Collins

unread,

Nov 18, 2015, 5:17:18 PM11/18/15

to

Maybe now would be a good to for you to restate your requirements and
present fresh code?

--
Ian Collins

Paavo Helde

unread,

Nov 18, 2015, 5:29:05 PM11/18/15

to

Noob <dont...@me.com> wrote in news:n2isve$e5f$1...@dont-email.me:

> About "temp" and the loop, I am curious about some possible one-liner
> for my return statement, eliminating "temp" and that loop, and possibly
> letting the compiler do a better job.

double sum = 0.0;
// ...
return sum;

> Any STL solution for that?

STL algorithms are mostly meant for STL containers, but here there is no
need to create the 'temp' STL containar, meaning there is no need to use a
"STL solution".

Noob

unread,

Nov 18, 2015, 5:40:13 PM11/18/15

to

Yes.

What I have now is

int eval_lgk(std::vector<int>& coords, std::vector<int>& factsgk) {

assert( factsgk.size() == coords.size() && coords.size() >= 1 );

size_t nvars = coords.size();
int partial = coords.back();

for (size_t i = 0; i < nvars-1; i++)
{
partial+= factsgk[i] * ( coords[i] - 1 );
}

return partial;

}

I still have an explicit loop that I'd like to get rid of, if possible.
I understand somewhere something will perform this loop, but I'm
looking for something that will do it efficiently without me coding it.

Thanks!

Noob

unread,

Nov 18, 2015, 5:43:22 PM11/18/15

to

I see, but I still have std::vectors to operate on.

Thanks!

Alf P. Steinbach

unread,

Nov 18, 2015, 7:46:15 PM11/18/15

to

On 11/18/2015 10:30 PM, Noob wrote:
>
> I'm new to C++, coming from Fortran. I'd like some help writing the
> following function in a way to improve speed, if it turns out to be to
> badly written.

For numerical operations using Fortran used to be a good way to speed up
things. Most numerical libraries started out as Fortran libraries. I
don't know how that is today, but I think going from Fortran to C++ is
not likely to /improve/ the speed significantly.

Anyway, the first and also second rule of optimization is to MEASURE.

Be sure to measure the right thing, though, i.e. be sure to ask your
compiler to optimize for speed. Usually that's an option like "-O2". But
consult your compiler's or IDE's documentation or GUI. Also define
"NDEBUG" to remove assertion checking. It's generally a good idea to
leave assertion checking on, but not in time-critical code.

> int eval_lgk(const std::vector<int> coords,
> const std::vector<double>factsgk)

Here you should

* hint to the compiler to inline calls to this function, by using the
keyword "inline",

* pass the arguments by reference to const, not by value.

Passing by value the arguments are (at least logically) copied, and such
copying involves expensive dynamic allocations unless it's optimized
away by the implementation.

> {
>
> unsigned int nvars = coords.size();

For clarity it's best to declare "nvars" as "const". Sprinkle "const"
liberally throughout your code, wherever possible. That way you can see
at a glance, up front, that that value will not be changing, and this
helps to understand the code faster, and avoid silly mistakes.

Also, at this point add

assert( nvars >= 2 );

Include the "<assert.h>" header for that.

If nvars is 0 then the declaration below will declare a really huge
"temp" vector, and if nvars <= 1 then the loop will likewise loop a
really large number of times.

> std::vector<double> temp(coords.size()-1);
>
> for (unsigned int i = 0; i < nvars-2; i++)
> {
> temp[i] = factsgk[i] * ( coords[i] - 1.0 );
> }

Either this loop is incorrect, or else the declaration of "temp" is
incorrect (apart from the item type which you have explained in a
follow-up should be "int"). "temp" has nvars-1 items, but only nvars-2
items are assigned to.

By the way, although the compiler is almost sure to optimize this,
introducing a name for "coords.size()" and then calling "coord.size()"
again is a lost nano-optimization opportunity.

> return coords[nvars-1] +
> std::accumulate(temp.begin(),temp.end(),0.0);

Are you sure that the "factsgk[nvars-1]" value doesn't exist or should
be ignored?

Additionally, instead of using a temporary vector just accumulate the
sum in the loop.

The temporary vector is expensive because unlike `std::string`, a vector
cannot use the short buffer optimization, and so the above code incurs a
dynamic allocation -- really expensive in this context.

>
> }
>
> In real cases, factsgk and coords will both have size of about 3 to
> 5, say, but this function will be called many, many times inside loops
> and I'd like to write it in a way I can get good speed.

A general speedup might possibly be achieved by deferring this work
until you have zillion or two postponed calls. Then you could do them
all in parallel in the graphics card. A little dirty yes but still.

But first, make the code clearly correct.

The "assert" macro is a good tool to employ for that.

Cheers & hth.,

- Alf

Ian Collins

unread,

Nov 18, 2015, 8:02:28 PM11/18/15

to

Stefan Ram wrote:

> "Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
>
>> Here you should
>> * hint to the compiler to inline calls to this function, by using the
>> keyword "inline",
>

> Which might also actually slow down the code, when it
> will not fit into a cache anymore after the inlining.

Which is why the inline keyword is pointless in this context.

--
Ian Collins

Noob

unread,

Nov 18, 2015, 8:18:22 PM11/18/15

to

Thank you very much for all your remarks!

I apologize for posting a code with logic errors prior to C++ mistakes.
I know it is no excuse but the whole deal about array indexing starting
at 1 in Fortran and 0 in C++ lead me to some mistakes when I wrote the
code for my post. The temp array was there for no logic reason other
than the fact that it remained from another version of the snippet
I typed before posting the final version.

What I have at the moment now is this

int eval_lgk(std::vector<int>& coords, std::vector<int>& factsgk) {

assert( factsgk.size() == coords.size() && coords.size() >= 1 );

int partial = coords.back();

for (size_t i = 0; i < coords.size()-1; i++)
{
partial+= factsgk[i] * ( coords[i] - 1 );
}

return partial;

}

For the moment, I can say that working with arrays is easier in Fortran,
but then there are a lot of other things C++ is allowing me to do much
cleaner and quicker.

I suppose I can start afresh from this. Having this construct: An
element-wise vector multiplication, followed by a "reduction", is this
a good way to code it or can I use some more compact syntax which will
also let the compiler do possibly a better job?

Thanks!

Noob

unread,

Nov 18, 2015, 8:19:20 PM11/18/15

to

On 18/11/2015 20:36, Stefan Ram wrote:

> Noob <dont...@me.com> writes:
>> I'd like to write it in a way I can get good speed.
>

> You already have good speed.
>
> If you really want to optimize: Start by using a profiler.
>

Will do.

Alf P. Steinbach

unread,

Nov 18, 2015, 8:19:52 PM11/18/15

to

Not pointless, but as mentioned, it's important to MEASURE.

This is a very short function that's called a zillion times, so inlining
is likely to improve things. This is precisely the kind of function that
machine code inlining is /for/. Still nothing's guaranteed: only
measuring can tell for sure, or reduce doubt about it.

In addition to being aware of the rôle of caching, one should be aware
that some compilers (like g++) take the "inline" hint very seriously,
while others may ignore it, so measuring should ideally be done for all
relevant compilers and target systems, etc.

Cheers,

- Alf

Ian Collins

unread,

Nov 18, 2015, 9:54:40 PM11/18/15

to

Alf P. Steinbach wrote:
> On 11/19/2015 2:02 AM, Ian Collins wrote:
>> Stefan Ram wrote:
>>> "Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
>>>
>>>> Here you should
>>>> * hint to the compiler to inline calls to this function, by using the
>>>> keyword "inline",
>>>
>>> Which might also actually slow down the code, when it
>>> will not fit into a cache anymore after the inlining.
>>
>> Which is why the inline keyword is pointless in this context.
>
> Not pointless, but as mentioned, it's important to MEASURE.

It is pointless because the compiler generally knows better than the
programmer which functions to inline. It will be able to include
platform specifics in its choices.

> This is a very short function that's called a zillion times, so inlining
> is likely to improve things. This is precisely the kind of function that
> machine code inlining is /for/. Still nothing's guaranteed: only
> measuring can tell for sure, or reduce doubt about it.

Which will result in in it being inlined whether the hint is there or not.

> In addition to being aware of the rôle of caching, one should be aware
> that some compilers (like g++) take the "inline" hint very seriously,
> while others may ignore it, so measuring should ideally be done for all
> relevant compilers and target systems, etc.

All the more reason to avoid the hint - let the compiler choose.

Totally agree with measuring :)

--
Ian Collins

Paavo Helde

unread,

Nov 19, 2015, 1:38:16 AM11/19/15

to

Noob <dont...@me.com> wrote in news:n2iujk$jp5$1...@dont-email.me:

You need to understand there is nothing special with STL algorithms, they
are mostly template code places in the header file so they get compiled
with exactly the same compiler and with exactly the same optimizer
options than your code. So your loop is most probably as good as theirs.
Just take care to use optimized builds in production and ensure that in
these builds the NDEBUG macro is defined (this switches the assert() line
off - but this probably does not matter anyway).

hth
Paavo

Marcel Mueller

unread,

Nov 20, 2015, 6:54:36 PM11/20/15

to

On 19.11.15 01.45, Alf P. Steinbach wrote:
> For numerical operations using Fortran used to be a good way to speed up
> things. Most numerical libraries started out as Fortran libraries. I
> don't know how that is today, but I think going from Fortran to C++ is
> not likely to /improve/ the speed significantly.

This is not my experience. The Fortran restriction to pass everything by
reference can be quite expensive at small functions. Of course, you
should not expect a factor two.

Marcel

Jorgen Grahn

unread,

Nov 21, 2015, 8:08:54 AM11/21/15

to

On Wed, 2015-11-18, Noob wrote:
> On 18/11/2015 20:17, Ian Collins wrote:

...

>> Maybe now would be a good to for you to restate your requirements and
>> present fresh code?
>>
>
> Yes.
>
> What I have now is
>
> int eval_lgk(std::vector<int>& coords, std::vector<int>& factsgk) {
>
> assert( factsgk.size() == coords.size() && coords.size() >= 1 );
>
> size_t nvars = coords.size();
> int partial = coords.back();
>
> for (size_t i = 0; i < nvars-1; i++)
> {
> partial+= factsgk[i] * ( coords[i] - 1 );
> }
>
> return partial;
>
> }

I haven't been paying attention, but the 'i < nvars-1' triggers a
small alarm. You're ignoring the last element (and fun stuff would
happen if 'nvars' is zero, but your assertion says users aren't
supposed to feed in empty vectors).

The normal pattern in C and (when there's no better alternative)
C++ is

for(unsigned i=0; i < size; i++)

Whenever a piece of code deviates from that, I spend a few seconds to
check it for correctness.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Noob

unread,

Nov 24, 2015, 5:18:24 PM11/24/15

to

Yes, I think this is a good point. Also valid in Fortran. I had that
construct in that form for historical reasons. I've already changed
it to the "standard" form. Thank you.

BTW: I know this is not the place for such off-topic and subjective
stuff, but I'm really enjoying C++. I always thought that the real
advantage of C++ over Fortran was the ease for OOP (which Fortran kind
of supports nowadays), but now I see that template metaprogramming is
one of C++'s strongest points.

I'm already using a profiler and I think it's telling me that my
program is spending a considerable amount of time in allocations. Maybe
I should not use std::vector so indiscriminately. I'll let it be for the
moment because I don't think this will be reason for concern after I
include the really intensive computations.

Cheers.

Ian Collins

unread,

Nov 24, 2015, 6:48:51 PM11/24/15

to

Noob wrote:
>
> I'm already using a profiler and I think it's telling me that my
> program is spending a considerable amount of time in allocations. Maybe
> I should not use std::vector so indiscriminately. I'll let it be for the
> moment because I don't think this will be reason for concern after I
> include the really intensive computations.

One thing to remember is that std::vector has an allocator template
parameter. This allows you to provide your own allocator, so you can
experiment with alternatives to the standard allocator.

--
Ian Collins

Jorgen Grahn

unread,

Nov 25, 2015, 1:25:45 PM11/25/15

to

Good to remember, but he should probably check first if the number of
vector allocations make sense. If he's new to C++ he may accidentally
be make unnecessary copies, or something ...

I've personally never seen a need to mess with the allocators
(which admittedly doesn't mean noone should ever do it!)

Jorgen Grahn

unread,

Nov 25, 2015, 1:32:31 PM11/25/15

to

On Tue, 2015-11-24, Noob wrote:
...

> BTW: I know this is not the place for such off-topic and subjective
> stuff, but I'm really enjoying C++.

With all the negativity around here, I am happy to read that.
I don't think it's offtopic at all.

> I always thought that the real advantage of C++ over Fortran was the
> ease for OOP (which Fortran kind of supports nowadays), but now I
> see that template metaprogramming is one of C++'s strongest points.

It's one of many strengths, but IMO not one of the most important
ones. Maybe it's more important if you come from Fortran ... Anyway,
you'll notice many other strengths over time!