How can I avoid redundant computations across cost functions?

Stephane
Dec 20, 2016, 3:51:01 PM
to Ceres Solver
Hi, 

I'd like to know whether there is a way to avoid redundant work happening inside cost functions.

I'll try to describe an example illustrating what I'm after:

Say I have a 2D grid mesh and I want to solve for 2D displacements of the grid's vertices such that some function of the displaced vertex positions is minimized.
And let's also say the cost function is evaluated not at the vertices of the grid but at sample points distributed within it, where the cost for each sample depends only on the 4 corners of the face that sample lands in. The grid would have maybe a few thousand cells, with maybe a few hundred sample points per cell.

My first naive implementation instantiates a cost function per sample point and displaces the 4 cell vertices each time; since we have a few hundred sample points per cell, the same new positions get computed a few hundred times over.
The cost function code would look something like this (incomplete code, just to get the idea across):

class CostFunctor {
 public:
  template <typename T>
  bool operator()(T const* const* parameters, T* residuals) const {
    // disp is an array of 4 2D displacements, one for each quad
    // vertex. It is the quantity we are solving for.
    T const* disp = parameters[0];

    // p0..p3 store the quad's new 2D vertex positions after applying
    // the displacements using some internal function.
    T p0[2], p1[2], p2[2], p3[2];
    displaceQuadVertices(disp, p0, p1, p2, p3);

    // After displacing the quad vertices, evaluate the function at
    // the sample point. (u, v) are the coordinates of that sample,
    // passed in the constructor for instance.
    T res = evaluateFunctionAtSample(u, v, p0, p1, p2, p3);

    // target is the desired value at that sample, also passed in the
    // constructor.
    residuals[0] = res - target;
    return true;
  }
};


Obviously this is wasteful: we keep recomputing the same displacements over and over. I could turn things around and instantiate a (different) cost function per grid cell, evaluating the residuals for all sample points inside that cell, but there would still be redundant computation: each internal vertex is shared by 4 cells, so its displacement would still be computed 4 times.
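For concreteness, here is a rough sketch of that per-cell variant, reusing the same hypothetical displaceQuadVertices and evaluateFunctionAtSample helpers as above; the cell's sample coordinates and target values would be passed in the constructor:

#include <array>
#include <utility>
#include <vector>

class CellCostFunctor {
 public:
  // One functor per grid cell; samples holds the (u, v) coordinates
  // of all sample points inside the cell, targets the desired values.
  CellCostFunctor(std::vector<std::array<double, 2>> samples,
                  std::vector<double> targets)
      : samples_(std::move(samples)), targets_(std::move(targets)) {}

  template <typename T>
  bool operator()(T const* const* parameters, T* residuals) const {
    T const* disp = parameters[0];
    // The 4 vertices are displaced once per cell rather than once
    // per sample point.
    T p0[2], p1[2], p2[2], p3[2];
    displaceQuadVertices(disp, p0, p1, p2, p3);
    // One residual per sample point in this cell.
    for (size_t s = 0; s < samples_.size(); ++s) {
      residuals[s] = evaluateFunctionAtSample(samples_[s][0], samples_[s][1],
                                              p0, p1, p2, p3) -
                     targets_[s];
    }
    return true;
  }

 private:
  std::vector<std::array<double, 2>> samples_;
  std::vector<double> targets_;
};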

Ideally, I would like to perform the displacement of vertices (along with their Jacobian computation) before evaluating the cost function at sample points (where we would just fetch these precomputed values and Jacobians). But I don't know how to do that in the context of a Ceres solve.

Does this make sense? If so, does anyone know of a general way to avoid redundant computations in cost functions like the one above?

Thanks!

Stephane

Sameer Agarwal
Dec 20, 2016, 9:27:54 PM
to Ceres Solver
Stephane,
This is a feature request that has come up in the past, but unfortunately right now there is no way to do this. 

We have thought about exposing an API for this purpose involving a callback, but haven't gotten around to it.  

I have filed https://github.com/ceres-solver/ceres-solver/issues/244 to track it. Of course if you have API/Code ideas for this, they are welcome.

Sameer


Stephane
Dec 21, 2016, 7:39:50 PM
to Ceres Solver
Hi Sameer,

Good to know, thanks for the fast answer.
I'm pretty new to Ceres, so I have only limited insight into what API would best address this issue.
I'm guessing the callback mentioned in the ticket would be called before the cost function is run? We would still need another piece of API to access the precomputed values and Jacobians inside the cost function, wouldn't we?
Most other API scenarios I can think of would involve caching data on the fly, which would hurt multi-threaded performance, so not ideal.

Stephane

Sameer Agarwal
Dec 22, 2016, 5:37:26 AM
to ceres-...@googlegroups.com
Comments inline.

On Wed, Dec 21, 2016 at 4:39 PM Stephane <stephan...@gmail.com> wrote:
I'm guessing the callback mentioned in the ticket would be called before the cost function is run? We would still need another piece of API to access the precomputed values and Jacobians inside the cost function, wouldn't we?

I think it would be a callback that gets called before any call to a cost function. 

A couple of things:

1. The parameter blocks that are visible to the user need to be updated before any such call, so that the user can do the precomputation. This would require Solver::Options::update_state_every_iteration to be true, but with the state actually updated more often than once per iteration.

2. I do not think the cost functions need a separate API for receiving this data: since CostFunctions are objects, the user can keep track of the data themselves (see the sketch below).

3. Inner iterations are a complication: if the user enables them, Ceres internally partitions the problem and solves the pieces in parallel, which would conflict with such a callback. So I need to think through how to work around this.
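Purely to make points 1 and 2 concrete, here is a very rough sketch of what such a callback could look like. None of this API exists in Ceres today, and every name is made up:

// Hypothetical API -- nothing like this exists in Ceres yet. The
// solver would invoke the callback after updating the user-visible
// parameter blocks to the new evaluation point, and before calling
// any cost function.
class PreEvaluationCallback {
 public:
  virtual ~PreEvaluationCallback() {}
  virtual void PrepareForEvaluation() = 0;
};

// Per point 2, the user owns the precomputed state; cost functions
// hold a pointer to it and just read the precomputed values.
struct DisplacedGridState : public PreEvaluationCallback {
  void PrepareForEvaluation() override {
    // Displace every grid vertex (and compute whatever derivative
    // information the cost functions need) exactly once per
    // evaluation point.
  }
  // Displaced positions and their Jacobians would live here.
};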

Sameer

 

Milan Vukov
Feb 8, 2017, 9:14:32 AM
to Ceres Solver
Hi Sameer,

I am facing the same issue. Is there any news regarding caching of computations? If you could give a few more hints, I would be glad to try them out. I have already thought about using callbacks, but I cannot form a clear picture of how they would help, and I am not familiar enough with Ceres internals to judge whether this would work. Just to be clear, my use case boils down to:

for i = 1:n
  f_i = f(x_i)
  for j = 1:m
    residual_{i,j} = g(f_i, y_j)

In my current implementation I always recompute f_i in the inner loop, instead of caching it.

Some time ago I experimented with optimizing this away by replacing the residual_{i,j} with one larger residual_i, but this made the conditioning of the problem much worse. On the other hand, for my use cases, Jacobian evaluation time was cut in half.

Cheers,
Milan

William Rucklidge
Feb 8, 2017, 11:47:05 AM
to ceres-...@googlegroups.com
If you really must do this, here's one way that it could be done. Major caveat: I haven't tried this out, but I believe that it could work. Also note that I'm writing code in a gmail window... compilation as-is is not a goal; consider this a sketch of what it would look like (and I've probably got the syntax wrong in a bunch of places). It's possible that there are modes of operation of Ceres where it would actually be worse than the naïve approach (does Ceres ever interleave evaluations at different points in parameter space?)

Note that this compromises several abstraction barriers somewhat; such are the demands of performance. I'm assuming that the dimensionality of X is known at compile time. What it does is to hash the input values of x, and if that hash is the same as the last time through, it re-uses pre-calculated values of f. (Hash collisions are assumed to never happen.)

The trick is to do this in an autodifferentiation context, so that x might be an array of doubles or of Jets. I define a little templated helper that gets the scalar value of either type. I'm only hashing the scalar component of the Jets. That works fine when Ceres is invoking the cost function directly (at that point, the partials stored in the Jet aren't interesting, as they're always the same: the i'th partial of x[i] is one; everything else is zero), but if you ever invoke this from a context where the Jets contain the result of some previous computation (so their partials have structure) you might want to hash the partials as well. There's some locking because Ceres can call multiple cost functions in parallel.

#include <array>
#include <cstdint>
#include <mutex>

#include "ceres/jet.h"

// Gets the scalar part of either a plain double or a ceres::Jet.
// (An overload is used for Jet; function templates cannot be
// partially specialized.)
double getValue(double v) { return v; }
template <typename T, int N>
double getValue(const ceres::Jet<T, N>& v) { return v.a; }

struct MyCostFunctor {
  template <typename T>
  bool operator()(const T* x, const T* y, T* residuals) const {
    // Cache shared by every evaluation of this functor; each scalar
    // type T (double, Jet) gets its own instantiation of the statics.
    static std::array<T, DIMENSIONALITY_OF_X> cache_of_f;
    static int64_t hash_of_x = 0;
    static std::mutex m;

    // Hash the scalar parts of x to detect whether we are at the
    // same point in parameter space as the previous call.
    ThingThatCanHashDoubles hasher;
    for (int i = 0; i < DIMENSIONALITY_OF_X; ++i) {
      hasher.AddThingToHash(getValue(x[i]));
    }
    const int64_t hash = hasher.HashAllTheThings();

    std::array<T, DIMENSIONALITY_OF_X> f_of_x;
    {
      std::lock_guard<std::mutex> lock(m);
      if (hash != hash_of_x) {
        // x has changed: recompute f and refresh the cache.
        for (int i = 0; i < DIMENSIONALITY_OF_X; ++i) {
          cache_of_f[i] = f_of_x[i] = f(x[i]);
        }
        hash_of_x = hash;
      } else {
        // Same x as last time: reuse the cached values.
        for (int i = 0; i < DIMENSIONALITY_OF_X; ++i) {
          f_of_x[i] = cache_of_f[i];
        }
      }
    }
    // g consumes the (possibly cached) value of f and the j'th
    // element of y.
    for (int j = 0; j < DIMENSIONALITY_OF_Y; ++j) {
      residuals[j] = g(f_of_x, y[j]);
    }
    return true;
  }
};


Sameer Agarwal
Feb 9, 2017, 2:36:31 AM
to ceres-...@googlegroups.com
Milan,
So the callback mechanism that is needed for this is not implemented yet. My principal concern is that it would break inner iterations, and I have not been able to come up with a way to deal with that, hence not much progress on this front :/
Sameer



Milan Vukov
Feb 21, 2017, 11:28:13 AM
to Ceres Solver
Hi William and Sameer,

Many thanks for the replies. I have been chewing on William's suggestion, and here are some findings...

Indeed, hashing the function arguments is one way to go. The implementation for my use case needs to hash the Jets as well, which meant hashing ~100 doubles per residual function call. To summarize briefly: I saw almost no speedup. Given that the computation of each residual_{i,j} is (relatively) cheap, it may well be that hashing eats a lot of the time in my case. Note again that my use case is:

for i = 1:n
  f_i = f(x_i)
  for j = 1:m
    residual_{i,j} = g(f_i, y_j)

I would like to reuse each f_i on the order of 10^2 to 10^3 times.

Yet another problem is the mutex. If a single mutex guard protects the whole cache operation, including the recomputation, it is a performance killer: this ends up slower than having no cache at all. If I instead place two guards, one around the cache lookup and one around the cache write, performance gets close to the original (no-cache) implementation. (A value can then occasionally be computed a few more times than necessary, but that's OK.)
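For concreteness, a minimal sketch of the two-guard pattern I mean (all names are mine; Key and Value stand in for whatever the application hashes and caches):

#include <mutex>
#include <unordered_map>

// Two-guard cache: the mutex is held for the lookup and for the
// insertion, but never while computing the value. Two threads that
// miss on the same key may both compute it; that duplicated work is
// the price for not serializing every computation behind the lock.
template <typename Key, typename Value>
class TwoGuardCache {
 public:
  template <typename ComputeFn>
  Value GetOrCompute(const Key& key, ComputeFn compute) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = cache_.find(key);
      if (it != cache_.end()) return it->second;
    }
    Value value = compute();  // Runs outside the lock.
    {
      std::lock_guard<std::mutex> lock(mutex_);
      cache_.emplace(key, value);
    }
    return value;
  }

 private:
  std::mutex mutex_;
  std::unordered_map<Key, Value> cache_;
};

With the argument hash as the key, lookups and insertions are short critical sections, and only the (cheap, in my case) recomputation can be duplicated across threads.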

FWIW, using a static cache (as William suggested) didn't require any callbacks into the solver.

I also tried an alternative: hashing the arguments based on other application-relevant info, say a small tuple computed before the solver starts. However, this does not work on its own, because I use DynamicAutoDiffCostFunction, so Ceres evaluates the cost function with different argument values when computing the Jacobian. For this variant I did need a solver callback to clear the cache after each iteration.
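The callback itself is straightforward with ceres::IterationCallback; here is a minimal sketch (MyCache stands in for whatever cache structure the application uses):

#include "ceres/ceres.h"

// Stand-in for whatever cache structure the application uses.
struct MyCache {
  void Clear() { /* drop all cached f_i values */ }
};

// Clears the cache at the end of every iteration, so that stale
// precomputed values are never reused at a new evaluation point.
class CacheClearingCallback : public ceres::IterationCallback {
 public:
  explicit CacheClearingCallback(MyCache* cache) : cache_(cache) {}

  ceres::CallbackReturnType operator()(
      const ceres::IterationSummary& summary) override {
    cache_->Clear();
    return ceres::SOLVER_CONTINUE;
  }

 private:
  MyCache* cache_;
};

// Registration with the solver:
//   MyCache cache;
//   CacheClearingCallback callback(&cache);
//   ceres::Solver::Options options;
//   options.callbacks.push_back(&callback);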

All in all, I have the feeling (no big surprise) that caching only pays off when the computation of each residual is expensive.

I can see that my second approach would become possible if I provided the Jacobians of f_i and residual_{i,j} myself, but that's a different story.

Any other suggestions are highly appreciated :)

Cheers,
Milan