Ceres 2.0 - Crash on Ubuntu


Pierre Moulon
May 8, 2020, 9:26:34 PM
to ceres-...@googlegroups.com
Hello,

I'm running into an issue when moving from Ceres 1.13/1.14 to the work-in-progress Ceres 2.0 (master).

[Context] I'm using Ceres in OpenMVG, and I get a crash when I switch to Ceres 2.0 (master) on Ubuntu.

Here is the stack trace. Do you have any idea what could be wrong? (The same code runs fine on macOS with clang.)

==4130== Invalid free() / delete / delete[] / realloc()
==4130==    at 0x4C30C9B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4130==    by 0x4C31D97: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4130==    by 0x230D0C5: Eigen::internal::aligned_realloc(void*, unsigned long, unsigned long) (Memory.h:194)
==4130==    by 0x231911D: void* Eigen::internal::conditional_aligned_realloc<true>(void*, unsigned long, unsigned long) (Memory.h:240)
==4130==    by 0x2318883: int* Eigen::internal::conditional_aligned_realloc_new_auto<int, true>(int*, unsigned long, unsigned long) (Memory.h:396)
==4130==    by 0x2316D96: Eigen::DenseStorage<int, -1, -1, 1, 0>::conservativeResize(long, long, long) (DenseStorage.h:548)
==4130==    by 0x23159F1: Eigen::internal::conservative_resize_like_impl<Eigen::Matrix<int, -1, 1, 0, -1, 1>, Eigen::Matrix<int, -1, 1, 0, -1, 1>, true>::run(Eigen::DenseBase<Eigen::Matrix<int, -1, 1, 0, -1, 1> >&, long) (PlainObjectBase.h:989)
==4130==    by 0x2313644: Eigen::PlainObjectBase<Eigen::Matrix<int, -1, 1, 0, -1, 1> >::conservativeResize(long) (PlainObjectBase.h:434)
==4130==    by 0x2310D76: void Eigen::internal::minimum_degree_ordering<int, int>(Eigen::SparseMatrix<int, 0, int>&, Eigen::PermutationMatrix<-1, -1, int>&) (Amd.h:438)
==4130==    by 0x230D8C7: void Eigen::AMDOrdering<int>::operator()<Eigen::SparseMatrix<int, 0, int> >(Eigen::SparseMatrix<int, 0, int> const&, Eigen::PermutationMatrix<-1, -1, int>&) (Ordering.h:69)
==4130==    by 0x230A5F0: ceres::internal::MaybeReorderSchurComplementColumnsUsingEigen(int, std::map<double*, ceres::internal::ParameterBlock*, std::less<double*>, std::allocator<std::pair<double* const, ceres::internal::ParameterBlock*> > > const&, ceres::internal::Program*) (reorder_program.cc:415)
==4130==    by 0x230AD31: ceres::internal::ReorderProgramForSchurTypeLinearSolver(ceres::LinearSolverType, ceres::SparseLinearAlgebraLibraryType, std::map<double*, ceres::internal::ParameterBlock*, std::less<double*>, std::allocator<std::pair<double* const, ceres::internal::ParameterBlock*> > > const&, ceres::OrderedGroups<double*>*, ceres::internal::Program*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) (reorder_program.cc:516)
==4130==  Address 0xaeb3220 is 32 bytes inside a block of size 36 alloc'd
==4130==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4130==    by 0x1CF4E43: Eigen::internal::handmade_aligned_malloc(unsigned long) (Memory.h:88)
==4130==    by 0x1CF4EC7: Eigen::internal::aligned_malloc(unsigned long) (Memory.h:164)
==4130==    by 0x1CFB852: void* Eigen::internal::conditional_aligned_malloc<true>(unsigned long) (Memory.h:214)
==4130==    by 0x1F465FD: int* Eigen::internal::conditional_aligned_new_auto<int, true>(unsigned long) (Memory.h:374)
==4130==    by 0x1F46A35: Eigen::DenseStorage<int, -1, -1, 1, 0>::resize(long, long, long) (DenseStorage.h:557)
==4130==    by 0x1F45A9C: Eigen::PlainObjectBase<Eigen::Matrix<int, -1, 1, 0, -1, 1> >::resize(long) (PlainObjectBase.h:319)
==4130==    by 0x1F48FA7: Eigen::PermutationBase<Eigen::PermutationMatrix<-1, -1, int> >::resize(long) (PermutationMatrix.h:138)
==4130==    by 0x230EC20: void Eigen::internal::minimum_degree_ordering<int, int>(Eigen::SparseMatrix<int, 0, int>&, Eigen::PermutationMatrix<-1, -1, int>&) (Amd.h:107)
==4130==    by 0x230D8C7: void Eigen::AMDOrdering<int>::operator()<Eigen::SparseMatrix<int, 0, int> >(Eigen::SparseMatrix<int, 0, int> const&, Eigen::PermutationMatrix<-1, -1, int>&) (Ordering.h:69)
==4130==    by 0x230A5F0: ceres::internal::MaybeReorderSchurComplementColumnsUsingEigen(int, std::map<double*, ceres::internal::ParameterBlock*, std::less<double*>, std::allocator<std::pair<double* const, ceres::internal::ParameterBlock*> > > const&, ceres::internal::Program*) (reorder_program.cc:415)
==4130==    by 0x230AD31: ceres::internal::ReorderProgramForSchurTypeLinearSolver(ceres::LinearSolverType, ceres::SparseLinearAlgebraLibraryType, std::map<double*, ceres::internal::ParameterBlock*, std::less<double*>, std::allocator<std::pair<double* const, ceres::internal::ParameterBlock*> > > const&, ceres::OrderedGroups<double*>*, ceres::internal::Program*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) (reorder_program.cc:516)
==4130== 

block_sparse_matrix.cc:80 Allocating values array with 3072 bytes.
detect_structure.cc:113 Schur complement static structure <2,6,0>.
block_random_access_sparse_matrix.cc:78 Matrix Size [0,0] 0
detect_structure.cc:113 Schur complement static structure <2,6,0>


Does anyone have an idea of what is happening?

Has anything changed in the code of ReorderProgramForSchurTypeLinearSolver or MaybeReorderSchurComplementColumnsUsingEigen?

Regards,
Pierre

Sameer Agarwal
May 8, 2020, 9:28:24 PM
to ceres-...@googlegroups.com
Pierre, any chance you can bisect?

--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/CADjmpz8%2BOcsxA1wtJWjp%3DgdZbpwkRam0kWoG-8WfqdpoMMGHHw%40mail.gmail.com.

Pierre Moulon
May 9, 2020, 4:51:12 PM
to ceres-...@googlegroups.com
Hello Sameer,

I will try to do that; you are right, it will be the fastest way to find the issue.
I will try to find some time next week and let you know.
--
Pierre M


Sameer Agarwal
May 9, 2020, 4:51:58 PM
to ceres-...@googlegroups.com

Pierre Moulon
May 13, 2020, 12:48:00 AM
to ceres-...@googlegroups.com, vi...@google.com
Hello Sameer, Mike,

I ran git bisect and was able to quickly identify the first commit that causes the failure when Ceres is used from OpenMVG (with an external Ceres build using OpenMP threading).

I ran the following (1.14 -> 2.0):

git bisect start; git bisect good facb199; git bisect bad 646959ef118a1f10bf93741d97cf64265d42f8c6; 


This commit was identified as the first one causing the error with ceres-solver used within OpenMVG:

5d8b494557e17992b64d8df548f49ef1f9ef4e05 is the first bad commit
commit 5d8b494557e17992b64d8df548f49ef1f9ef4e05
Author: Mike Vitus <vi...@google.com>
Date:   Mon Apr 9 10:10:03 2018 -0700

    Adds a ParallelFor wrapper for no threads and OpenMP.

    With the addition of C++11 support we can simplify the parallel for code by
    removing the ifdef branching.  Converts coordinate_descent_minimizer.cc to use
    the thread_id ParallelFor API.

    Tested by building with OpenMP, C++11 threads, TBB, and no threads.  All tests
    pass.

    Also compared timing via the bundle adjuster.

    ./bin/bundle_adjuster --input=../problem-744-543562-pre.txt

    With OpenMP num_threads=8

    Head:
    Time (in seconds):
      Residual only evaluation           0.807753 (5)
      Jacobian & residual evaluation     4.489404 (6)
      Linear solver                     41.826481 (5)
    Minimizer                           50.745857
    Total                               73.294424

    CL:
    Time (in seconds):
      Residual only evaluation           0.970483 (5)
      Jacobian & residual evaluation     4.647438 (6)
      Linear solver                     41.781892 (5)
    Minimizer                           50.848904
    Total                               73.089983

    With OpenMP num_threads=1

    HEAD:
    Time (in seconds):
      Residual only evaluation           2.990246 (5)
      Jacobian & residual evaluation    14.132090 (6)
      Linear solver                     79.631951 (5)
    Minimizer                          100.281847
    Total                              122.946267

    CL:
    Time (in seconds):
      Residual only evaluation           3.075178 (5)
      Jacobian & residual evaluation    13.966451 (6)
      Linear solver                     77.005441 (5)
    Minimizer                           97.568712
    Total                              120.410454

    Change-Id: I1857d7943073be7465b6c6476bf46ab11c5475a3

:040000 040000 c23ccdf325b604c19b9315490452998929748312 5675d7eedd9d019aeb76b6b325fa3cc7afec9d7a M bazel
:040000 040000 94d4f8a32369e9ded11626abac404567521704c4 3e95ef40a631a6dc257ecb1a58364b3e98c53f91 M internal
:040000 040000 8758ce43617fe1e750e2242455d4b9f750fb3b2c b9f90eb1d3a7984006435331bbc042c7e205e747 M jni
bisect run success


I hope we can find the relationship between this commit and the error I encountered within OpenMVG when using the Ceres API:

==4130==    by 0x230A5F0: ceres::internal::MaybeReorderSchurComplementColumnsUsingEigen(int, std::map<double*, ceres::internal::ParameterBlock*, std::less<double*>, std::allocator<std::pair<double* const, ceres::internal::ParameterBlock*> > > const&, ceres::internal::Program*) (reorder_program.cc:415)
==4130==    by 0x230AD31: ceres::internal::ReorderProgramForSchurTypeLinearSolver(ceres::LinearSolverType, ceres::SparseLinearAlgebraLibraryType, std::map<double*, ceres::internal::ParameterBlock*, std::less<double*>, std::allocator<std::pair<double* const, ceres::internal::ParameterBlock*> > > const&, ceres::OrderedGroups<double*>*, ceres::internal::Program*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) (reorder_program.cc:516)


Sameer, Mike, let me know how we can work together to debug/fix this.

I can also share the Dockerfile and script I used to identify this with git bisect if you want.

Regards/Cordialement,
Pierre M


Sameer Agarwal
May 13, 2020, 9:33:48 AM
to ceres-...@googlegroups.com, Mike Vitus
Pierre,
Thanks for bisecting, but this makes no sense :/
ReorderProgramForSchurTypeLinearSolver gets called before anything threading-related happens. I would love to be able to replicate this.
Sameer



Mike Vitus
May 13, 2020, 1:04:05 PM
to Ceres Solver
I am also at a loss. I thought there might be a bad interaction between the OpenMP settings in Ceres and Eigen, but I'm not sure what that would be exactly, and Eigen::internal::minimum_degree_ordering doesn't use OpenMP anyway.

A couple of questions:

1. What threading option are you building with?
2. Does this happen for all datasets or just one in particular?
3. What is your Eigen environment like on Ubuntu? There was another issue with Eigen and OpenMVG (https://github.com/ceres-solver/ceres-solver/issues/428) that seemed to be tied to their environment. Maybe that is why it works okay on Mac but not on Ubuntu.

Mike


Pierre Moulon
May 13, 2020, 1:37:01 PM
to ceres-...@googlegroups.com
Mike, please find some answers below:

The code works fine with the tags corresponding to Ceres 1.13 and 1.14, so I don't think it is really linked to Eigen, since I'm using the same configuration for all of these experiments.

1. What threading option are you building with?
-> OpenMP
-> I also tried setting Ceres to 1 thread from OpenMVG, and the crash still happened.
2. Does this happen for all datasets or just one in particular?
-> It happens with all the datasets; I can even reproduce it in a unit test.
3. What is your Eigen environment like on Ubuntu? There was another issue with Eigen and OpenMVG (https://github.com/ceres-solver/ceres-solver/issues/428) that seemed to be tied to their environment. Maybe that is why it works okay on Mac but not on Ubuntu.
-> I tried with Eigen 3.3.4 and 3.3.7 (same behavior whatever the Eigen version).

Thank you for your help and your questions in trying to understand what is happening.
I also agree that the bisection points to something that doesn't seem to make sense, but this is where the crash starts to happen when I'm using Ceres.

Pierre Moulon
May 14, 2020, 12:01:23 AM
to ceres-...@googlegroups.com
Sameer, Mike,

Here is how I reproduce it.
- Attached are the Dockerfile and script I used for the bisection test.

1. Build the Docker image:
docker build . -t temp
2. Start the container:
docker run -it --rm temp
3. Copy the test_valid.sh script into /opt and make it executable (chmod +x test_valid.sh).
4. Run the bisection:
cd /opt/ceres-solver
git bisect start; git bisect good facb199; git bisect bad 646959ef118a1f10bf93741d97cf64265d42f8c6;
git bisect run /opt/test_valid.sh

Valgrind is already installed in the Docker image.
Ceres and OpenMVG can be compiled in debug mode by adding -DCMAKE_BUILD_TYPE=DEBUG
-> compilation and running will then be slow.

By default the container has Eigen 3.3.4 (used by both Ceres and OpenMVG); the version can be checked in:

less /usr/include/eigen3/Eigen/src/Core/util/Macros.h


Regards/Cordialement,
Pierre M

Dockerfile
test_valid.sh

Björn Piltz
May 19, 2020, 4:32:22 AM
to ceres-...@googlegroups.com
Hey, just a stab in the dark here: does anything change if you build OpenMVG with a different -DTARGET_ARCHITECTURE option, for example "none" or "core"?
There is a known problem if you link together libraries that use Eigen but were built with different defines: you break the One Definition Rule.

That has nothing to do with the "bad" commit, but the problem would manifest differently depending on which functions get inlined. It also depends on the hardware you build on, since OpenMVG's default "TARGET_ARCHITECTURE=auto" works like "-march=native".

Best
Björn
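One plausible way to connect this explanation to the Valgrind report at the top of the thread: the PermutationMatrix index vector was allocated through Eigen::internal::handmade_aligned_malloc (Memory.h:88), which over-allocates by the alignment and returns a pointer offset into the malloc'd block, while the code that later resized it passed that offset pointer straight to realloc(), as an instantiation compiled under different Eigen alignment settings would. The C++ sketch below is a simplified model written for illustration, not Eigen's code; the 32-byte (AVX) alignment and the names AlignedBlock, HandmadeAlignedMalloc and HandmadeAlignedFree are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumption: 32-byte alignment, as in an AVX (-march=native) build where
// malloc is not trusted to return 32-byte-aligned memory.
constexpr std::size_t kAlign = 32;

// Hypothetical bookkeeping struct, only here to expose the pointer arithmetic.
struct AlignedBlock {
  void* original = nullptr;    // what malloc returned; the only pointer free()/realloc() accept
  void* aligned = nullptr;     // what the caller (e.g. an Eigen vector) gets to use
  std::size_t block_size = 0;  // bytes actually requested from malloc
  std::size_t offset = 0;      // distance from original to aligned, in (0, kAlign]
};

// Simplified model of the "handmade" aligned allocation scheme.
AlignedBlock HandmadeAlignedMalloc(std::size_t size) {
  AlignedBlock b;
  b.block_size = size + kAlign;  // over-allocate so an aligned address always fits
  b.original = std::malloc(b.block_size);
  if (b.original == nullptr) return b;
  const std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(b.original);
  // Round down to a multiple of kAlign, then step forward by kAlign.
  const std::uintptr_t aligned =
      (raw & ~(static_cast<std::uintptr_t>(kAlign) - 1)) + kAlign;
  b.aligned = reinterpret_cast<void*>(aligned);
  b.offset = static_cast<std::size_t>(aligned - raw);
  // On common platforms malloc returns memory aligned to at least
  // sizeof(void*), so there is room for one pointer below 'aligned'.
  assert(b.offset >= sizeof(void*));
  // Stash the original pointer just below the aligned address so the
  // matching handmade free can recover it.
  *(reinterpret_cast<void**>(b.aligned) - 1) = b.original;
  return b;
}

// Correct release: recover the original pointer first. Calling std::free or
// std::realloc directly on 'aligned' is undefined behavior; that is the kind
// of mismatch Valgrind reports as "Invalid free() / delete / delete[] / realloc()".
void HandmadeAlignedFree(void* aligned) {
  if (aligned != nullptr) std::free(*(reinterpret_cast<void**>(aligned) - 1));
}
```

For a single int (4 bytes on the platform in the report) the model requests 36 bytes from malloc, and the aligned pointer lands 16 or 32 bytes into the block depending on the address malloc returns (on x86-64 glibc, malloc results are typically 16-byte aligned). The 32-byte case matches Valgrind's "Address ... is 32 bytes inside a block of size 36". The sketch deliberately never hands the aligned pointer to std::free or std::realloc.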

Pierre Moulon
May 19, 2020, 8:38:12 PM
to ceres-...@googlegroups.com
Thank you Björn,

I'm going to try your suggestion; as you said, perhaps it is linked to compiler optimization.
But the problem consistently appears at a given ceres-solver commit, and everything worked fine before it.
Note that the OpenMVG and Eigen versions are fixed here; the only thing that changes is the ceres-solver commit.

Regards/Cordialement,
Pierre M


Pierre Moulon
May 23, 2020, 1:25:18 AM
to ceres-...@googlegroups.com
Björn was right: it seems to be linked to compilation settings.
Using -DTARGET_ARCHITECTURE=core makes the latest commit of ceres-solver work with OpenMVG again.

Happy to discover that there is no issue on either the OpenMVG or the Ceres Solver side ;-)

Regards/Cordialement,
Pierre M
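To check whether two libraries disagree about Eigen's configuration, one option is to compile a tiny probe with each library's exact compiler flags and compare what it reports. The sketch below is hypothetical (EigenConfigSummary and CONFIG_PROBE_HAVE_EIGEN are made-up names). It reads only Eigen's own configuration macros (EIGEN_WORLD_VERSION, EIGEN_MAX_ALIGN_BYTES, EIGEN_MAX_STATIC_ALIGN_BYTES, EIGEN_MALLOC_ALREADY_ALIGNED, EIGEN_VECTORIZE_AVX), guards each one with #ifdef because the available macros vary between Eigen versions, and assumes a compiler that supports __has_include.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Pull in Eigen only if its headers are on the include path (for example
// when compiled with -I/usr/include/eigen3), so the probe builds either way.
#if defined(__has_include)
#if __has_include(<Eigen/Core>)
#include <Eigen/Core>
#define CONFIG_PROBE_HAVE_EIGEN 1
#endif
#endif

// Summarizes the Eigen settings this translation unit was compiled with.
std::string EigenConfigSummary() {
  std::ostringstream out;
#ifdef CONFIG_PROBE_HAVE_EIGEN
  out << "Eigen " << EIGEN_WORLD_VERSION << '.' << EIGEN_MAJOR_VERSION << '.'
      << EIGEN_MINOR_VERSION;
#ifdef EIGEN_MAX_ALIGN_BYTES
  out << " max_align_bytes=" << EIGEN_MAX_ALIGN_BYTES;
#endif
#ifdef EIGEN_MAX_STATIC_ALIGN_BYTES
  out << " max_static_align_bytes=" << EIGEN_MAX_STATIC_ALIGN_BYTES;
#endif
#ifdef EIGEN_MALLOC_ALREADY_ALIGNED
  // 0 means Eigen falls back to its handmade over-allocating scheme.
  out << " malloc_already_aligned=" << EIGEN_MALLOC_ALREADY_ALIGNED;
#endif
#ifdef EIGEN_VECTORIZE_AVX
  out << " avx=1";
#else
  out << " avx=0";
#endif
#else
  out << "Eigen headers not found";
#endif
  // Both branches above always write a non-empty summary.
  assert(!out.str().empty());
  return out.str();
}
```

Call it from a small main() and build it twice, for example once with OpenMVG's flags (including -march=native, which TARGET_ARCHITECTURE=auto behaves like) and once with the flags Ceres was built with. Differing alignment or malloc_already_aligned values are a strong hint that Eigen code inlined into the two libraries allocates and frees memory differently; identical output does not rule out other mismatched defines.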

Sameer Agarwal
May 23, 2020, 1:28:18 AM
to ceres-...@googlegroups.com
Thanks Björn, and I am glad it worked out, Pierre.

Mike Vitus
May 26, 2020, 11:31:09 AM
to Ceres Solver
Pierre, thank you for persisting on this issue, and Björn, thanks for the tip!
