Autodifferentiation Extremely Slow


Giuseppe Trapani

Oct 15, 2017, 12:30:19 PM
to Ceres Solver
Hello everyone and thanks in advance for the support!

I'm working on the solution of a system of non-linear equations. The overall dimension of the system is 12940 residuals x 12940 parameters. Unfortunately, although the equations are all identical in form (a sum of fractions), each one depends on almost all of the parameters (in both the numerator and the denominator). If you are interested, I can provide a snapshot of the LaTeX.

Anyway, the problem is that evaluating the gradient of this cost function (an AutoDiffCostFunction) takes ages. I stopped it after 12 minutes on an Amazon server (a c4.xlarge running Ubuntu 16.04) created solely for this purpose.

Since it's an Amazon instance, I have a clean, dedicated build of Ceres. I installed the latest versions of all the dependencies, including Eigen, and built everything with:

-DCMAKE_BUILD_TYPE=Release
-DCXX11=ON

I also forced full optimization in the CMakeLists.txt of my project:

set(CMAKE_CXX_FLAGS_RELEASE "-O3")

Below you can find the code of my cost function. My question is: I understand that dual numbers (and Jets) are special types, and that this requires some care when writing the cost function; am I choosing the wrong flow of operations / data structures? Or is my problem simply not suited to automatic differentiation?

I tried to keep the struct very simple (I'm not too good at C++; I'm mostly a Python guy). Evaluating the problem with only the cost takes a few milliseconds, while asking for the gradient never finishes.

const int NROWS = 1645;
const int NCOLS = 943;
const int NBETAS = 5;
const int DIMS = NROWS + NCOLS;
const int TOT_DIMS = NBETAS * DIMS;

std::array<double, TOT_DIMS> k;
std::array<double, TOT_DIMS> x;

struct Equations {
    template <typename T>
    bool operator()(const T* const xx, T* eq) const {
        // Start each residual from its constant term.
        for (int i = 0; i < TOT_DIMS; i++) {
            eq[i] = T(-k[i]);
        }
        // Accumulate the sum-of-fractions terms.
        for (int beta_ix = 0; beta_ix < NBETAS; beta_ix++) {
            for (int i = 0; i < NROWS; i++) {
                for (int alpha = NROWS; alpha < DIMS; alpha++) {
                    T zum = T(0.0);
                    for (int bb = 0; bb < NBETAS; bb++) {
                        zum += xx[i + bb * DIMS] * xx[alpha + bb * DIMS];
                    }
                    T tt = xx[i + beta_ix * DIMS] * xx[alpha + beta_ix * DIMS] / (T(1.0) + zum);
                    eq[i + beta_ix * DIMS] += tt;
                    eq[alpha + beta_ix * DIMS] += tt;
                }
            }
        }
        return true;
    }
};

Thanks in advance for any suggestions!

Sameer Agarwal

Oct 15, 2017, 2:00:02 PM
to ceres-...@googlegroups.com

Two suggestions.

Break each equation into its own residual block. With multiple threads you will then get parallel evaluation; alternatively, you can thread your cost functor yourself.

Also, given the size of the parameter block, autodiff is going to be very slow because of heap allocations.

You will get much better performance from analytic derivatives.

Sameer


--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/868c2c62-3db9-426e-b4c4-3ce024d1be95%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Giuseppe Trapani

Oct 18, 2017, 4:50:38 PM
to Ceres Solver
Thanks for the suggestion Sameer!

I modified the function to use analytic derivatives and implemented a couple more tricks (such as exploiting some extra knowledge of the parameters, the Jacobians, etc.).

Unfortunately the trust region solvers appear to be extremely slow (I have seen just one iteration in a long stretch of time). The line search algorithms, on the other hand, run quite fast, but they never reach a satisfactory error (the cost function never drops below 1e8). I'll keep trying different options and maybe update the post.

Just one final question: is the GradientChecker still available? I want to check whether the analytic derivatives are computed correctly.

Thank you again for the support.

Sameer Agarwal

Oct 18, 2017, 5:01:17 PM
to ceres-...@googlegroups.com
What linear solver are you using? I bet all of your time in the trust region solver is going into computing the trust region step, which requires factoring a non-trivially sized (13k x 13k) matrix.
Do you have access to a threaded BLAS + LAPACK implementation, something like Intel MKL?

Which line search solver did you try? Are you doing LBFGS? You may want to play with the rank parameter there.
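[Editor's note] The rank parameter referred to here is `max_lbfgs_rank` in `Solver::Options`; it controls how much curvature history LBFGS keeps. A minimal options sketch (the value 40 is illustrative, not a recommendation):

```cpp
// Illustrative Ceres options sketch for an LBFGS line search run.
ceres::Solver::Options options;
options.minimizer_type = ceres::LINE_SEARCH;
options.line_search_direction_type = ceres::LBFGS;
options.max_lbfgs_rank = 40;  // default is 20; larger = more curvature history
```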

Giuseppe Trapani

Oct 18, 2017, 6:27:08 PM
to Ceres Solver
I tried all of the line search solvers, but I haven't tweaked any parameter in particular. I'll see what I can do with the rank parameter you mention; thanks for the suggestion!

As for the trust region, I tried all of the linear solvers again, but I couldn't get any of them past the first iteration. You are probably right that the factorization of the Jacobian is killing the performance, even though I managed to install SuiteSparse and set everything up on a bigger Amazon EC2 instance, with num_threads = 64 as well as num_linear_solver_threads = 64.

I'll keep playing with the parameters anyway! Thanks again for the answer; if you have any other suggestions (especially on parameters to tweak), I'll give them a try!

Sameer Agarwal

Oct 18, 2017, 6:45:13 PM
to ceres-...@googlegroups.com
Giuseppe,

Use either DENSE_QR or DENSE_NORMAL_CHOLESKY with LAPACK as your dense_linear_algebra_library, and make sure you are using a high-quality threaded BLAS underneath, such as GotoBLAS, OpenBLAS, ATLAS, or Intel MKL.
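[Editor's note] In Ceres options terms, that advice looks roughly like this (a sketch; for it to take effect, LAPACK must also have been found when Ceres itself was built):

```cpp
// Illustrative sketch of the suggested dense-solver configuration.
ceres::Solver::Options options;
options.linear_solver_type = ceres::DENSE_NORMAL_CHOLESKY;
options.dense_linear_algebra_library_type = ceres::LAPACK;
```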

SuiteSparse is of no use in this case
Sameer



Keir Mierle

Oct 18, 2017, 7:12:37 PM
to ceres-...@googlegroups.com
The gradient checker is available; it's just an options flag.


Giuseppe Trapani

Oct 19, 2017, 3:14:57 AM
to Ceres Solver
Hi Keir, thanks for the answer!

Yes, I had already used it and it worked nicely. I was asking about the GradientChecker because I thought it was a class with specific methods for accessing one block at a time, or something like that!

Thanks again for the suggestion!

Giuseppe Trapani

Oct 19, 2017, 5:41:41 AM
to Ceres Solver
Thanks for the suggestion, Sameer!

I set LAPACK and DENSE_NORMAL_CHOLESKY. I finally see iterations (each takes 340 seconds), but the cost function is actually increasing, so something is not going right. My Jacobian seems fine, but I'll run some more gradient checks (so far the largest error between the analytic and numeric derivatives was ~1e-7).

As for the BLAS: I followed the Linux installation instructions, installing both libatlas and libopenblas before Eigen; then I installed Eigen 3.3.4 directly from source (not from the apt-get repository), cmake-ing it in Release mode, etc. Finally I installed libsuitesparse.

Then I built Ceres Solver in Release mode and installed it. When I run cmake on my code (in Release, of course), the output tells me that Ceres is present and installed with LAPACK (and some other components, such as SuiteSparse, OpenMP, C++11, etc.).

One thing I noticed in top is that the CPU runs steadily at 100%, except for about one second when it jumps to 3200%, right before an iteration's output is printed. Is this a sign that the program is not really multithreading during the optimization, but only at the end, when it computes the residuals, gradient, etc. before printing?

As always, thank you for your time, guys!

Giuseppe Trapani

Oct 19, 2017, 2:40:29 PM
to Ceres Solver
I managed to build OpenBLAS directly on my AWS instance and then rebuilt all of Ceres, to be sure I wasn't overlooking anything. Unfortunately the situation is not improving: no solver (trust region or line search) and no linear solver manages to land successful iterations... maybe one or two in a row, but then the cost function rises again.

One thing I noticed is that a similar, albeit smaller, problem can be solved (correctly) in just 7 iterations, but only when starting from 0.5 for all the parameters; from any other initial point the solver does not converge.

I think I'll spend some more time tweaking the function and maybe coding some more tricks into it!

Thanks as always for the support!


Sameer Agarwal

Oct 19, 2017, 4:35:17 PM
to ceres-...@googlegroups.com
Giuseppe,

I think I may have misunderstood your problem. I thought you were saying that the linear algebra was slow, but are you saying that the solver is unable to make a successful step and terminates without making progress? If it is the latter, then the problem is likely the conditioning of your problem, and you should look into how your parameters are scaled relative to each other.

Sameer


Giuseppe Trapani

Oct 19, 2017, 4:52:29 PM
to Ceres Solver
You understood correctly in both cases :) 

At first the trust region solvers were just stuck at the first iteration. Then I followed your advice, moved to LAPACK + DENSE_XX, and finally the solvers started "solving".

Unfortunately most of the steps go in the wrong direction: in many iterations the cost function grows. This happens regardless of whether I use the LEVENBERG_MARQUARDT or the DOGLEG strategy.

You are right in any case; I have decided to dig a bit deeper into the scaling of the parameters. The thing is, there is not really much to try in terms of reparametrization, so I'll see how the solver behaves on simulated data.

Thank you very much for all your advice, Sameer! I'll post again if I find something new!