for (int i=1; i<N; i++)
{
    x[i] = f(x[i], x[i-1]);
}
where f(_, _) is the function applied at each step of the iteration.
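For experimenting, here is the loop as a self-contained function. It is only a sketch: the body of f is a hypothetical placeholder (0.5 * cur + prev), chosen so the results are exact in floating point, and the name plain_iter is made up for illustration. Substitute the real recurrence.

```cpp
#include <cassert>

// Hypothetical stand-in for f(_, _): any function of the current
// element and the already-updated previous element would do.
inline double f(double cur, double prev) { return 0.5 * cur + prev; }

// The plain loop from the question: each x[i] depends on the freshly
// updated x[i-1], so the iterations must run strictly in order.
void plain_iter(double *x, int N)
{
    assert(N >= 1);
    for (int i = 1; i < N; i++)
    {
        x[i] = f(x[i], x[i - 1]);
    }
}
```

With x = {1, 2, 4, 6} and N = 4, this yields x[1] = 2, x[2] = 4 and x[3] = 7.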
I tried to write a template-metaprogramming version like this:
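// Usage: meta_iter<N-1>(x+1), with N a compile-time constant.
template <int size>
inline void meta_iter(double *x)
{
    *x = f(*x, *(x-1));        // update this element from the previous one
    meta_iter<size-1>(x+1);    // recurse on the next element
}
template<> inline void meta_iter<0>(double *) {}   // ends the recursion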
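A compilable sketch of the template version, again with a hypothetical f. The recursion has to call meta_iter itself, and the terminating case needs an explicit <0> specialization. The wrapper name unrolled_iter is made up for illustration. Because every element costs one template instantiation, N must be a compile-time constant, it is bounded by the compiler's instantiation-depth limit (adjustable with -ftemplate-depth in GCC and Clang), and the generated code grows linearly with N.

```cpp
#include <cassert>

// Hypothetical stand-in for f(_, _), as in the question.
inline double f(double cur, double prev) { return 0.5 * cur + prev; }

// Each instantiation updates one element and recurses on the next,
// so meta_iter<K>(p) updates p[0] .. p[K-1] with straight-line code.
template <int size>
inline void meta_iter(double *x)
{
    *x = f(*x, *(x - 1));
    meta_iter<size - 1>(x + 1);
}

// Full specialization that stops the recursion.
template <>
inline void meta_iter<0>(double *) {}

// Equivalent of the loop for i = 1 .. N-1; N is fixed at compile time.
template <int N>
void unrolled_iter(double *x)
{
    static_assert(N >= 1, "need at least one element");
    assert(x != nullptr);
    meta_iter<N - 1>(x + 1);
}
```

For example, unrolled_iter<4>(x) performs the same three updates as the plain loop with N = 4.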
However, the metaprogramming version performs no better than the
ordinary loop. In fact, it is slower!
Just a wild guess: for large values of N, you might unroll more of the loop
than is good for you. The code for the loop needs to be fetched from
memory, and a large body of code is going to trigger instruction-cache
misses. Your compiler probably knows better than the template how to
avoid these.
Best
Kai-Uwe Bux
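One way to act on this guess, assuming code size is the culprit, is to unroll only a small fixed block of K elements with the template and let an ordinary runtime loop walk over the blocks, so the code no longer grows with N. This is a sketch with hypothetical names (unroll_block, blocked_iter, K) and a placeholder f; whether it beats the compiler's own unrolling has to be measured.

```cpp
#include <cassert>

// Hypothetical stand-in for f(_, _).
inline double f(double cur, double prev) { return 0.5 * cur + prev; }

// Unroll exactly K steps at compile time; K stays small and fixed,
// so the generated code size does not depend on N.
template <int K>
inline void unroll_block(double *x)
{
    *x = f(*x, *(x - 1));
    unroll_block<K - 1>(x + 1);
}

template <>
inline void unroll_block<0>(double *) {}

// Runtime loop over blocks of K elements plus a scalar tail.
// Updates x[1] .. x[N-1] in the same order as the plain loop.
template <int K = 4>
void blocked_iter(double *x, int N)
{
    assert(N >= 1);
    int i = 1;
    for (; i + K <= N; i += K)
        unroll_block<K>(x + i);
    for (; i < N; i++)
        x[i] = f(x[i], x[i - 1]);
}
```

With N = 10 and K = 4, the blocks cover x[1..4] and x[5..8], and the tail loop handles x[9].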
Let's examine the current routine:
for (int i=1; i<N; i++)
{
    x[i] = f(x[i], x[i-1]);
}
There are a few changes that could be made to this routine that would
probably speed it up, and none of them necessarily requires templates.
First, I would note that the address of x[i] is computed twice in one
statement. Second, x[i-1] is read back from memory even though the previous
iteration just stored that value there. Eliminating these redundant memory
accesses, and unrolling the loop, should improve performance:
double *index = &x[1];  // next element to update
double value = x[0];    // most recently computed element
const int n = N - 1;    // the original loop updates x[1] .. x[N-1]
for (int i = 0; i < n/4; i++)
{
    value = f(*index, value);
    *index++ = value;
    value = f(*index, value);
    *index++ = value;
    value = f(*index, value);
    *index++ = value;
    value = f(*index, value);
    *index++ = value;
}
for (int i = 0; i < n%4; i++)
{
    value = f(*index, value);
    *index++ = value;
}
Greg
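A sketch that packages the hand-unrolled loop as a function and checks it against the plain loop, with a hypothetical f and hypothetical function names (greg_iter, plain_iter). Two details matter for correctness: the store must update the current element before advancing (*index++ = value), and the loop must perform N-1 updates, matching i = 1 .. N-1 in the original. Whether the rewrite is actually faster depends on the compiler and on f, so it is worth timing both.

```cpp
#include <cassert>

// Hypothetical stand-in for f(_, _).
inline double f(double cur, double prev) { return 0.5 * cur + prev; }

// Hand-unrolled version: the running value stays in a local variable,
// so x[i-1] is never reloaded from memory.
void greg_iter(double *x, int N)
{
    assert(N >= 1);
    double *index = x + 1;   // next element to update
    double value = x[0];     // most recently computed element
    const int n = N - 1;     // number of updates: x[1] .. x[N-1]
    for (int i = 0; i < n / 4; i++)
    {
        value = f(*index, value);
        *index++ = value;
        value = f(*index, value);
        *index++ = value;
        value = f(*index, value);
        *index++ = value;
        value = f(*index, value);
        *index++ = value;
    }
    for (int i = 0; i < n % 4; i++)
    {
        value = f(*index, value);
        *index++ = value;
    }
}

// Reference: the original loop, for comparison.
void plain_iter(double *x, int N)
{
    for (int i = 1; i < N; i++)
        x[i] = f(x[i], x[i - 1]);
}
```

Running both on copies of the same array for every N from 1 upward, and comparing element by element, catches both an off-by-one store and an overrun past x[N-1].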