On 11/18/2015 10:30 PM, Noob wrote:
>
> I'm new to C++, coming from Fortran. I'd like some help writing the
> following function in a way to improve speed, if it turns out to be to
> badly written.
For numerical operations using Fortran used to be a good way to speed up
things. Most numerical libraries started out as Fortran libraries. I
don't know how that is today, but I think going from Fortran to C++ is
not likely to /improve/ the speed significantly.
Anyway, the first and also second rule of optimization is to MEASURE.
Be sure to measure the right thing, though, i.e. be sure to ask your
compiler to optimize for speed. Usually that's an option like "-O2". But
consult your compiler's or IDE's documentation or GUI. Also define
"NDEBUG" to remove assertion checking. It's generally a good idea to
leave assertion checking on, but not in time-critical code.
> int eval_lgk(const std::vector<int> coords,
> const std::vector<double>factsgk)
Here you should
* hint to the compiler to inline calls to this function, by using the
keyword "inline",
* pass the arguments by reference to const, not by value.
Passing by value the arguments are (at least logically) copied, and such
copying involves expensive dynamic allocations unless it's optimized
away by the implementation.
> {
>
> unsigned int nvars = coords.size();
For clarity it's best to declare "nvars" as "const". Sprinkle "const"
liberally throughout your code, wherever possible. That way you can see
at a glance, up front, that that value will not be changing, and this
helps to understand the code faster, and avoid silly mistakes.
Also, at this point add
assert( nvars >= 2 );
Include the "<assert.h>" header for that.
If nvars is 0 then the declaration below will declare a really huge
"temp" vector, and if nvars <= 1 then the loop will likewise loop a
really large number of times.
> std::vector<double> temp(coords.size()-1);
>
> for (unsigned int i = 0; i < nvars-2; i++)
> {
> temp[i] = factsgk[i] * ( coords[i] - 1.0 );
> }
Either this loop is incorrect, or else the declaration of "temp" is
incorrect (apart from the item type which you have explained in a
follow-up should be "int"). "temp" has nvars-1 items, but only nvars-2
items are assigned to.
By the way, although the compiler is almost sure to optimize this,
introducing a name for "coords.size()" and then calling "coord.size()"
again is a lost nano-optimization opportunity.
> return coords[nvars-1] +
> std::accumulate(temp.begin(),temp.end(),0.0);
Are you sure that the "factsgk[nvars-1]" value doesn't exist or should
be ignored?
Additionally, instead of using a temporary vector just accumulate the
sum in the loop.
The temporary vector is expensive because unlike `std::string`, a vector
cannot use the short buffer optimization, and so the above code incurs a
dynamic allocation -- really expensive in this context.
>
> }
>
> In real cases, factsgk and coords will both have size of about 3 to
> 5, say, but this function will be called many, many times inside loops
> and I'd like to write it in a way I can get good speed.
A general speedup might possibly be achieved by deferring this work
until you have zillion or two postponed calls. Then you could do them
all in parallel in the graphics card. A little dirty yes but still.
But first, make the code clearly correct.
The "assert" macro is a good tool to employ for that.
Cheers & hth.,
- Alf