Ceres - Memory and Integer Overflow Limitations


tobias...@nframes.com

Aug 7, 2018, 9:40:56 AM
to Ceres Solver

Hi,

We're fairly new to Ceres. So far we've had some very good results, but we've also run into a few limitations that we'd like to get your impression of and your help with.

The first issue we encountered is an integer overflow in block_sparse_matrix.cc. Ceres counts the number of non-zero entries and stores the result in a signed 32-bit integer, which overflows once the count exceeds 2147483647. Internally we estimate two parameters with a dimension of 4 each, and each residual block in our cost function also has a dimension of 4.

From there the math tells us that we will overflow as soon as we add more than roughly 67 million (67,108,864) residual blocks to our system. The formula I used is: numParameters * parameterDimension * numResidualBlocks * residualDimension = number of non-zeros (a quick arithmetic sketch is below).
In our testing this value correlates with the assert in block_sparse_matrix.cc being triggered. Although this is quite a large number of residual blocks, we do hit it; we sometimes estimate up to 40k parameters (with a dimension of 4 each).
We are already removing outliers and reducing redundancy as much as possible before feeding the data into Ceres, but since there are many dependencies between the different residuals, we have trouble doing this without sacrificing quality.
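
A minimal stand-alone sketch of that arithmetic, using the block sizes described above (the values are assumptions taken from our setup; nothing here is read from Ceres itself):

    #include <cstdint>
    #include <iostream>

    int main() {
      // Block sizes from the problem described above.
      const std::int64_t num_parameter_blocks = 2;  // parameter blocks per residual block
      const std::int64_t parameter_dimension  = 4;
      const std::int64_t residual_dimension   = 4;

      // Non-zeros contributed by a single residual block's Jacobian.
      const std::int64_t nnz_per_residual_block =
          num_parameter_blocks * parameter_dimension * residual_dimension;  // 32

      // Largest value a signed 32-bit int can hold.
      const std::int64_t kInt32Max = 2147483647;

      // Prints 67108863, i.e. the count overflows at roughly the 67 millionth block.
      std::cout << "Residual blocks before the non-zero count overflows: "
                << kInt32Max / nnz_per_residual_block << std::endl;
      return 0;
    }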

Another issue we have is that the estimation is taking up quite a lot of memory. Attached is a graph showing memory consumption. I put the part where residuals are added to the problem into a green box and the solving part into an orange box:

[attached image: memory consumption graph]

As you can see, there is quite a significant peak when solving starts, and the memory usage grows with the number of residual blocks we add.
We were hoping that SPARSE_NORMAL_CHOLESKY would be enough to let us handle systems of this size, but we run out of memory on a 32 GB machine when adding roughly 40 million residual blocks.

Do you have any suggestions for configuring Ceres further to reduce the amount of memory used and to avoid running into the integer overflow issue described above?


What we're currently using for building:
    CUSTOM_BLAS=OFF
    CXSPARSE=OFF
    CXX11_THREADS=OFF
    CXX11=ON
    EIGENSPARSE=ON
    GFLAGS=OFF
    LAPACK=OFF
    MINIGLOG=ON
    MINIGLOG_MAX_LOG_LEVEL=-4
    OPENMP=OFF
    SUITESPARSE=OFF
    TBB=ON

What we use for solving:
SPARSE_NORMAL_CHOLESKY with automatic differentiation.
On top of that we enabled enable_fast_removal, since we iteratively remove outliers from the Problem; a sketch of this setup follows below.
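
For reference, a minimal sketch of roughly what this setup looks like. The functor, the dummy residual, and the array values are placeholders that only mirror the block dimensions described above, not our actual code:

    #include <ceres/ceres.h>

    // Placeholder functor: residual dimension 4, two parameter blocks of dimension 4.
    struct ExampleResidual {
      template <typename T>
      bool operator()(const T* const a, const T* const b, T* residual) const {
        for (int i = 0; i < 4; ++i) residual[i] = a[i] - b[i];  // dummy residual
        return true;
      }
    };

    int main() {
      double a[4] = {1.0, 2.0, 3.0, 4.0};
      double b[4] = {0.0, 0.0, 0.0, 0.0};

      // enable_fast_removal speeds up RemoveResidualBlock at the cost of extra memory.
      ceres::Problem::Options problem_options;
      problem_options.enable_fast_removal = true;
      ceres::Problem problem(problem_options);

      // In the real problem this is done for tens of millions of residual blocks.
      problem.AddResidualBlock(
          new ceres::AutoDiffCostFunction<ExampleResidual, 4, 4, 4>(new ExampleResidual),
          nullptr, a, b);

      ceres::Solver::Options options;
      options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
      options.sparse_linear_algebra_library_type = ceres::EIGEN_SPARSE;  // EIGENSPARSE build

      ceres::Solver::Summary summary;
      ceres::Solve(options, &problem, &summary);
      return 0;
    }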


Thanks a lot in advance,
Tobias

Sameer Agarwal

Aug 8, 2018, 1:55:30 AM
to ceres-...@googlegroups.com
Tobias,
My answers are inline.

On Tue, Aug 7, 2018 at 6:40 AM <tobias...@nframes.com> wrote:

Hi,

We're fairly new to Ceres. So far we've had some very good results, but we've also run into a few limitations that we'd like to get your impression of and your help with.

The first issue we encountered is an integer overflow in block_sparse_matrix.cc. Ceres counts the number of non-zero entries and stores the result in a signed 32-bit integer, which overflows once the count exceeds 2147483647. Internally we estimate two parameters with a dimension of 4 each, and each residual block in our cost function also has a dimension of 4.

Yes, a signed integer is a bug here. Can you file a bug on GitHub? I will take a look.
 
From there the math tells us that we will overflow as soon as we add more than roughly 67 million (67,108,864) residual blocks to our system. The formula I used is: numParameters * parameterDimension * numResidualBlocks * residualDimension = number of non-zeros.

That's roughly correct.
In our testing this value correlates with the assert in block_sparse_matrix.cc being triggered. Although this is quite a large number of residual blocks, we do hit it; we sometimes estimate up to 40k parameters (with a dimension of 4 each).

That's a fairly large optimization problem.
 
We are already removing outliers and reducing redundancy as much as possible before feeding the data into Ceres, but since there are many dependencies between the different residuals, we have trouble doing this without sacrificing quality.

Another issue we have is that the estimation is taking up quite a lot of memory. Attached is a graph showing memory consumption. I put the part where residuals are added to the problem into a green box and the solving part into an orange box:

[quoted image: memory consumption graph]

This increase in memory has to do with the memory needed to store the Cholesky factorization of the normal equations. This is a fundamental part of using SPARSE_NORMAL_CHOLESKY.

As you can see, there is quite a significant peak when solving starts, and the memory usage grows with the number of residual blocks we add.
We were hoping that SPARSE_NORMAL_CHOLESKY would be enough to let us handle systems of this size, but we run out of memory on a 32 GB machine when adding roughly 40 million residual blocks.

You are certainly pushing the limits of the system. What kind of optimization problem are you solving? Is it a photogrammetry problem? Bundle adjustment?

Do you have any suggestions for configuring Ceres further to reduce the amount of memory used and to avoid running into the integer overflow issue described above?

What we're currently using for building:
    CUSTOM_BLAS=OFF
    CXSPARSE=OFF
    CXX11_THREADS=OFF
    CXX11=ON
    EIGENSPARSE=ON
    GFLAGS=OFF
    LAPACK=OFF
    MINIGLOG=ON
    MINIGLOG_MAX_LOG_LEVEL=-4
    OPENMP=OFF
    SUITESPARSE=OFF
    TBB=ON

Eigen is not a good choice for solving a non-linear least squares problem of this size. Use SuiteSparse. It will make a large difference to the performance of your system, both in memory and in time.

I also recommend using threads, since that will allow you to evaluate the Jacobian much faster. A sketch of both changes follows.
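
Here is a minimal sketch of those two changes, assuming a build with SUITESPARSE=ON; the helper name and the thread count are only illustrative:

    #include <ceres/ceres.h>

    ceres::Solver::Options MakeSolverOptions() {
      ceres::Solver::Options options;
      options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;

      // Use SuiteSparse (CHOLMOD) rather than Eigen for the sparse Cholesky factor.
      options.sparse_linear_algebra_library_type = ceres::SUITE_SPARSE;

      // Evaluate the Jacobian with multiple threads; 8 is just an illustrative value.
      options.num_threads = 8;
      return options;
    }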
 

What we use for solving:
SPARSE_NORMAL_CHOLESKY with automatic differentiation.
On top of that we enabled enable_fast_removal, since we iteratively remove outliers from the Problem.

enable_fast_removal will also cost you memory.

Sameer
 


Thanks a lot in advance,
Tobias


tobias...@nframes.com

Aug 9, 2018, 12:21:22 PM
to Ceres Solver
Hi Sameer,

Thanks a lot for your prompt answer. I will file a bug for the signed vs. unsigned integer issue.

Regarding our build settings: we will try the threads option. Unfortunately, using SuiteSparse is not a realistic option for us at this point, as it would mean quite a bit of overhead on our side.
I did try it on a small example, though, and didn't see a real difference in memory usage. Since we can't use SuiteSparse in the short term, I didn't invest much time into testing.
If you could give a rough impression of the memory savings you have seen before, a "ballpark figure", that would help us decide how to prioritize looking into this in the future.

It's a photogrammetry related topic that we're working on. Sorry for being a bit vague here!

Thanks,
Tobias

Sameer Agarwal

Aug 9, 2018, 12:23:43 PM
to ceres-...@googlegroups.com
On Thu, Aug 9, 2018 at 9:21 AM <tobias...@nframes.com> wrote:
Hi Sameer,

Thanks a lot for your prompt answer. I will file a bug for the signed vs. unsigned integer issue.

Regarding our build settings: we will try the threads option. Unfortunately, using SuiteSparse is not a realistic option for us at this point, as it would mean quite a bit of overhead on our side.
I did try it on a small example, though, and didn't see a real difference in memory usage. Since we can't use SuiteSparse in the short term, I didn't invest much time into testing.
If you could give a rough impression of the memory savings you have seen before, a "ballpark figure", that would help us decide how to prioritize looking into this in the future.

You can try the bundle_adjuster example with some of the larger problems from the BAL dataset to get an idea.
 

It's a photogrammetry related topic that we're working on. Sorry for being a bit vague here!

If it's bundle adjustment, you should use SPARSE_SCHUR instead of SPARSE_NORMAL_CHOLESKY; a sketch is below.
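
A minimal sketch of that change, with the same caveats as before (the helper name and values are illustrative; Ceres computes a Schur elimination ordering automatically if you don't supply one):

    #include <ceres/ceres.h>

    ceres::Solver::Options MakeBundleAdjustmentOptions() {
      ceres::Solver::Options options;

      // SPARSE_SCHUR eliminates the point blocks first and only factorizes the
      // much smaller reduced camera system.
      options.linear_solver_type = ceres::SPARSE_SCHUR;
      options.sparse_linear_algebra_library_type = ceres::SUITE_SPARSE;
      options.num_threads = 8;  // illustrative value
      return options;
    }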

Sameer
 