Hi Won,
Here's the diff. I saw from your comments you were thinking of this anyway :)
Here are a couple of other (untested) ideas that *might* really help with this:
1. I think moving the per-triangle active flag into a large bitset would make scanning for active triangles much faster, since a single comparison checks 32 of them at a time. It will add some verbosity, though, so I'm trying to figure out a way to encapsulate it. Something like this (not tested, bugs ahead..):
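#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<std::uint32_t> activeBits; // (numTris + 31) / 32 words, one bit per triangle
...
for (std::size_t i = 0; i < activeBits.size(); ++i)
{
    const std::uint32_t theseBits = activeBits[i];
    if (theseBits) // a zero word skips 32 inactive triangles with one test
    {
        std::uint32_t bit = 0x1;
        const std::size_t base = i * 32; // 32 bits per word (sizeof(int) is a byte count, not a bit count)
        for (unsigned j = 0; j < 32; ++j) // loop on j, not i
        {
            if (bit & theseBits)
            {
                const std::size_t tri = base + j;
                // tri is active.
            }
            bit <<= 1;
        }
    }
}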
2. A smaller win might be to cache the results of powf. For the cache-position contribution at least, the number of possible values is very limited (bounded by the cache size). I'm not sure whether there is a space-efficient way to store the valence contribution.
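One possible way to encapsulate the bitset from idea 1, as a minimal sketch: `ActiveTriangleSet` and its member functions are hypothetical names, not anything in the existing code. Each 32-bit word holds the active flags for 32 triangles, so a zero word skips all of them with one test; inside a non-zero word the loop shifts the word right and stops as soon as no set bits remain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: one active flag per triangle, packed 32 to a word.
class ActiveTriangleSet
{
public:
    explicit ActiveTriangleSet(std::size_t numTris)
        : m_numTris(numTris), m_words((numTris + 31) / 32, 0u) {}

    void setActive(std::size_t tri)
    {
        assert(tri < m_numTris);
        m_words[tri / 32] |= (std::uint32_t(1) << (tri % 32));
    }

    void setInactive(std::size_t tri)
    {
        assert(tri < m_numTris);
        m_words[tri / 32] &= ~(std::uint32_t(1) << (tri % 32));
    }

    bool isActive(std::size_t tri) const
    {
        assert(tri < m_numTris);
        return ((m_words[tri / 32] >> (tri % 32)) & 1u) != 0;
    }

    // Calls fn(tri) for every active triangle, in ascending order.
    template <typename Fn>
    void forEachActive(Fn fn) const
    {
        for (std::size_t i = 0; i < m_words.size(); ++i)
        {
            std::uint32_t w = m_words[i];
            // A zero word never enters the loop, skipping 32 triangles at once.
            // Otherwise shift right until no set bits are left in the word.
            for (unsigned j = 0; w != 0; ++j, w >>= 1)
            {
                if (w & 1u)
                    fn(i * 32 + j);
            }
        }
    }

private:
    std::size_t m_numTris;
    std::vector<std::uint32_t> m_words;
};
```

Usage would be along the lines of `set.forEachActive([&](std::size_t tri) { /* score tri */ });`. If C++20 is available, `std::countr_zero` from `<bit>` could jump straight to each set bit instead of shifting one position at a time.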
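For idea 2, a minimal sketch of the lookup-table approach. It assumes the scoring follows Tom Forsyth's linear-speed vertex cache optimisation with its commonly quoted default constants (cache size 32, cache decay power 1.5, last-triangle score 0.75, valence boost scale 2.0, valence boost power 0.5); the real code may use other values. `ScoreTables`, `vertexScore` and `kMaxCachedValence` are hypothetical names. The cache-position term is fully tabulated since it has only one value per cache slot; the valence term is tabulated only up to an assumed cap, falling back to `std::pow` for the rare higher-valence vertex, which is one way to keep that table small.

```cpp
#include <cassert>
#include <cmath>

// Assumed constants (Forsyth's published defaults); adjust to match the real code.
const int   kCacheSize         = 32;
const float kCacheDecayPower   = 1.5f;
const float kLastTriScore      = 0.75f;
const float kValenceBoostScale = 2.0f;
const float kValenceBoostPower = 0.5f;
const int   kMaxCachedValence  = 32; // hypothetical cap on the valence table

// Hypothetical precomputed tables: filled once, then indexed instead of calling powf.
struct ScoreTables
{
    float cachePos[kCacheSize];
    float valence[kMaxCachedValence + 1];

    ScoreTables()
    {
        const float scaler = 1.0f / (kCacheSize - 3);
        for (int pos = 0; pos < kCacheSize; ++pos)
        {
            if (pos < 3)
                cachePos[pos] = kLastTriScore; // vertices of the triangle just added
            else
                cachePos[pos] = std::pow(1.0f - (pos - 3) * scaler, kCacheDecayPower);
        }
        valence[0] = 0.0f; // never read: vertexScore returns early for 0 triangles left
        for (int n = 1; n <= kMaxCachedValence; ++n)
            valence[n] = kValenceBoostScale * std::pow(float(n), -kValenceBoostPower);
    }
};

// Score for a vertex at cache position cachePos (-1 if not in the cache)
// that still has numTrisLeft unadded triangles.
float vertexScore(const ScoreTables& t, int cachePos, int numTrisLeft)
{
    if (numTrisLeft == 0)
        return -1.0f; // nothing left to draw for this vertex
    float score = 0.0f;
    if (cachePos >= 0)
    {
        assert(cachePos < kCacheSize);
        score += t.cachePos[cachePos];
    }
    if (numTrisLeft <= kMaxCachedValence)
        score += t.valence[numTrisLeft];
    else // high-valence vertex: compute directly
        score += kValenceBoostScale * std::pow(float(numTrisLeft), -kValenceBoostPower);
    return score;
}
```

Building `ScoreTables` once up front (for example as a static) means the inner scoring loop only does array lookups in the common case.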
Kevin