Ceres Solver Version 2.2.0 Release Candidate 2


Sameer Agarwal

Sep 30, 2023, 7:14:37 PM
to ceres-...@googlegroups.com
Hi Folks,

A new release candidate is available for testing. It includes a number of Windows and CUDA related fixes. There are a few documentation and example code updates too.


If I do not receive any bug reports, I expect to release this as 2.2.0 on Oct 9, 2023. Please test your code against this version and let us know.

Happy Optimizing,
Sameer for the Ceres Solver Team


Sameer Agarwal

Oct 2, 2023, 12:32:35 PM
to ceres-...@googlegroups.com
Hi Folks,
Just a Monday morning reminder: please test this release candidate. Unless I hear otherwise, this is going to be 2.2.0 final next Monday.
Sameer

Roger Labbe

Oct 4, 2023, 3:53:59 PM
to Ceres Solver
Hi Sameer,

I just built this version under Windows using Visual Studio 17.x, toolset v143. Being a Visual Studio user I am not conversant with cmake, so perhaps I did something wrong. However, when building one of my projects against this release I get

1>M:\work\3rdParty\ceres-solver-2.2.0\include\ceres\internal\sphere_manifold_functions.h(125,63): error C3861: 'M_PI': identifier not found (compiling source file angle_manifold.cpp)

A quick grep shows that this is the only file (except for examples/tests/benchmarks) that uses M_PI. 

CMakeLists.txt includes this text:

  # On MSVC, math constants are not included in <cmath> or <math.h> unless
  # _USE_MATH_DEFINES is defined [1].  As we use M_PI in the examples, ensure
  # that _USE_MATH_DEFINES is defined before the first inclusion of <cmath>.
  add_compile_definitions(_USE_MATH_DEFINES)

However, the comment's statement that M_PI is used in the examples is no longer the whole story, since it is now also used in sphere_manifold_functions.h. I do see the add_compile_definitions call, but I don't know whether it is intended to fix this problem.

I do understand I can #define this macro before including ceres, but that means touching a lot of code that used to compile fine without it. 
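
For readers hitting the same error, one way a public header can avoid depending on M_PI is to carry its own typed constant. The following is only a minimal sketch under that assumption: the names `example::Pi` and `example::DegreesToRadians` are hypothetical and are not part of Ceres, and this is not necessarily how the issue was fixed upstream.

```cpp
#include <cmath>

namespace example {

// Hypothetical helper (not a Ceres API): pi as a typed constant, so a header
// that needs it does not depend on the non-standard M_PI macro. MSVC only
// defines M_PI when _USE_MATH_DEFINES is set before the first include of
// <cmath>.
template <typename T>
constexpr T Pi() {
  return static_cast<T>(3.141592653589793238462643383279502884L);
}

// Example use: convert degrees to radians without touching M_PI.
inline double DegreesToRadians(double degrees) {
  return degrees * Pi<double>() / 180.0;
}

}  // namespace example
```

With a constant like this, user code compiles the same way whether or not _USE_MATH_DEFINES is defined.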

Sameer Agarwal

Oct 4, 2023, 3:58:46 PM
to ceres-...@googlegroups.com, Sergiu Deitsch
Perhaps this has to do with the recent refactoring of Windows-related warnings/macros.

Sameer


--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/ca3a5195-3e59-41de-98e7-edffe3c6cc10n%40googlegroups.com.

Roger Labbe

Oct 4, 2023, 4:19:51 PM
to Ceres Solver
Hi Sameer,

solver.h contains this line:

    LinearSolverOrderingType linear_solver_ordering_type;


Note it is not initialized to any value, unlike the other scalar/enum types in this file. 
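
To illustrate why this matters, here is a small sketch using a stand-in struct. The type and enumerator names below are made up for illustration and do not mirror the real declarations in solver.h; the point is only the default member initializer.

```cpp
namespace example {

// Stand-in for an ordering enum; the enumerator names are illustrative only.
enum class OrderingType { kFirst, kSecond };

// Stand-in for an options struct.
struct Options {
  // Without the "= ..." initializer, this member would hold an indeterminate
  // value after "Options options;", and reading it would be undefined
  // behavior.
  OrderingType ordering_type = OrderingType::kFirst;
  int num_threads = 1;
};

}  // namespace example
```

A default-constructed `example::Options` then always starts from a known ordering value, regardless of how it is declared.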

Sergiu Deitsch

Oct 4, 2023, 4:20:57 PM
to Sameer Agarwal, ceres-...@googlegroups.com
This was my first thought. However, taking a closer look at those changes it is clear we never defined _USE_MATH_DEFINES in the public interface to begin with (which is a good thing). Yet, sphere_manifold_functions.h assumes the definition to be present. Looking further, the problem was introduced not long after the 2.1.0 release.

This is easily fixable though.

Sameer Agarwal

Oct 4, 2023, 4:22:31 PM
to ceres-...@googlegroups.com
Oh, that's a good catch. Fixing it now.
Sameer



Sameer Agarwal

Oct 4, 2023, 5:12:55 PM
to ceres-...@googlegroups.com
Roger, both of these issues should be fixed at head.
Sameer

Roger Labbe

Oct 4, 2023, 6:53:22 PM
to Ceres Solver
Sameer, 

I'm hesitant to report this as I don't have a lot of backing data, but given the planned release date of Monday I'll share. I am seeing significant performance degradation in 2.2 (clone of the rc2 branch) vs 2.1. We are solving for a moving camera's extrinsics and intrinsics in a scene with a few hundred to a thousand fiducials (so, a small problem) on Windows x64. Reproducibility is hard due to the use of random numbers in our feature selection (don't ask). But, in general, a solve that takes 713 usec in 2.1 (as reported by the Ceres summary) takes 1.28 msec in 2.2. Over hundreds of frames it averages out to a 50% to 60% decrease in performance. I've pasted example summaries below.

These are small numbers in absolute terms, but we are trying to run video at full frame rate with a few solves for different things and this is a big blow to our time budget.

I plan to run some of the included examples to see if they show similar degradations in performance, and I may have some bundle adjustment code that we could share (I can't share source for the project I'm talking about).

Anyway, I wonder if you have done extensive benchmarks on Windows, and/or have advice on how to investigate this myself or get you information on whether this is a real problem or just something wonky with my setup.

This is built from a clone of rc2.

Solver Summary (v 2.1.0-eigen-(3.4.90)-no_lapack-eigensparse-no_openmp)

                                     Original                  Reduced
Parameter blocks                            1                        1
Parameters                                 12                       12
Effective parameters                        7                        7
Residual blocks                           127                      127
Residuals                                 254                      254

Minimizer                        TRUST_REGION

Dense linear algebra library            EIGEN
Trust region strategy                  DOGLEG (TRADITIONAL)

                                        Given                     Used
Linear solver                        DENSE_QR                 DENSE_QR
Threads                                     7                        7
Linear solver ordering              AUTOMATIC                        1

Cost:
Initial                          7.766936e+01
Final                            7.739958e+01
Change                           2.697804e-01

Minimizer iterations                        7
Successful steps                            7
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.000082

  Residual only evaluation           0.000129 (7)
  Jacobian & residual evaluation     0.000386 (7)
  Linear solver                      0.000047 (7)
Minimizer                            0.000629

Postprocessor                        0.000002
Total                                0.000713

Termination:                      CONVERGENCE (Function tolerance reached. |cost_change|/cost: 6.553746e-07 <= 1.000000e-06)

Solver Summary (v 2.2.0-eigen-(3.4.90)-no_lapack-eigensparse-no_custom_blas)

                                     Original                  Reduced
Parameter blocks                            1                        1
Parameters                                 12                       12
Effective parameters                        7                        7
Residual blocks                           124                      124
Residuals                                 248                      248

Minimizer                        TRUST_REGION

Dense linear algebra library            EIGEN
Trust region strategy                  DOGLEG (TRADITIONAL)
                                        Given                     Used
Linear solver                        DENSE_QR                 DENSE_QR
Threads                                     7                        7
Linear solver ordering              AUTOMATIC                        1

Cost:
Initial                          7.448660e+01
Final                            7.446731e+01
Change                           1.928719e-02

Minimizer iterations                        4
Successful steps                            4
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.000112

  Residual only evaluation           0.000065 (4)
  Jacobian & residual evaluation     0.000798 (4)
  Linear solver                      0.000043 (4)
Minimizer                            0.001163

Postprocessor                        0.000004
Total                                0.001279

Termination:                      CONVERGENCE (Function tolerance reached. |cost_change|/cost: 3.360677e-07 <= 1.000000e-06)

settings:

check_gradients = 0
dense_linear_algebra_library_type = EIGEN
dogleg_type = TRADITIONAL_DOGLEG
dynamic_sparsity = 0
eta = 0.10000000000000000555
function_tolerance = 9.9999999999999995475e-07
gradient_check_numeric_derivative_relative_step_size = 9.9999999999999995475e-07
gradient_check_relative_precision = 1.0000000000000000209e-08
gradient_tolerance = 1.0000000000000000364e-10
initial_trust_region_radius = 10000
inner_iteration_tolerance = 0.0010000000000000000208
jacobi_scaling = 1
line_search_direction_type = LBFGS
line_search_interpolation_type = CUBIC
line_search_sufficient_curvature_decrease = 0.9000000000000000222
line_search_sufficient_function_decrease = 0.00010000000000000000479
line_search_type = WOLFE
linear_solver_type = DENSE_QR
logging_type = PER_MINIMIZER_ITERATION
max_consecutive_nonmonotonic_steps = 5
max_lbfgs_rank = 20
max_line_search_step_contraction = 0.0010000000000000000208
max_line_search_step_expansion = 10
max_linear_solver_iterations = 500
max_lm_diagonal = 1.0000000000000000537e+32
max_num_consecutive_invalid_steps = 5
max_num_iterations = 50
max_num_line_search_direction_restarts = 5
max_num_line_search_step_size_iterations = 20
max_solver_time_in_seconds = 1000000000
max_trust_region_radius = 10000000000000000
min_line_search_step_contraction = 0.5999999999999999778
min_line_search_step_size = 1.0000000000000000623e-09
min_linear_solver_iterations = 0
min_lm_diagonal = 9.9999999999999995475e-07
min_relative_decrease = 0.0010000000000000000208
min_trust_region_radius = 1.000000000000000056e-32
minimizer_progress_to_stdout = 0
minimizer_type = TRUST_REGION
nonlinear_conjugate_gradient_type = FLETCHER_REEVES
num_threads = 7
parameter_tolerance = 1.0000000000000000209e-08
preconditioner_type = JACOBI
sparse_linear_algebra_library_type = EIGEN_SPARSE
trust_region_problem_dump_directory = /tmp
trust_region_problem_dump_format_type = TEXTFILE
trust_region_strategy_type = DOGLEG
update_state_every_iteration = 0
use_approximate_eigenvalue_bfgs_scaling = 0
use_explicit_schur_complement = 0
use_inner_iterations = 0
use_nonmonotonic_steps = 1
visibility_clustering_type = CANONICAL_VIEWS



CMakeCache.txt

# This is the CMakeCache file.
# For build in directory: m:/work/v143
# It was generated by CMake: C:/Program Files/CMake/bin/cmake.exe
# You can edit this file to change values found and used by cmake.
# If you do not want to change any of the values, simply exit the editor.
# If you do want to change a value, simply edit, save, and exit the editor.
# The syntax for the file is as follows:
# KEY:TYPE=VALUE
# KEY is the name of a variable in the cache.
# TYPE is a hint to GUIs for the type of VALUE, DO NOT EDIT TYPE!.
# VALUE is the current value for the KEY.

########################
# EXTERNAL cache entries
########################

//Build Ceres benchmarking suite
BUILD_BENCHMARKS:BOOL=OFF

//Build User's Guide (html)
BUILD_DOCUMENTATION:BOOL=OFF

//Build examples
BUILD_EXAMPLES:BOOL=ON

//Build Ceres as a shared library.
BUILD_SHARED_LIBS:BOOL=OFF

//Enable tests
BUILD_TESTING:BOOL=ON

//Path to a program.
CMAKE_AR:FILEPATH=C:/Program Files/Microsoft Visual Studio/2022/Professional/VC/Tools/MSVC/14.37.32822/bin/Hostx64/x64/lib.exe

//Choose the type of build, options are: None Debug Release RelWithDebInfo
// MinSizeRel.
CMAKE_BUILD_TYPE:STRING=Release

//Semicolon separated list of supported configuration types, only
// supports Debug, Release, MinSizeRel, and RelWithDebInfo, anything
// else will be ignored.
CMAKE_CONFIGURATION_TYPES:STRING=Debug;Release

//Flags used by the CXX compiler during all build types.
CMAKE_CXX_FLAGS:STRING=/DWIN32 /D_WINDOWS /EHsc /std:c++20 /MP /DGOOGLE_GLOG_IS_A_DLL /DGLOG_NO_ABBREVIATED_SEVERITIES

//Flags used by the CXX compiler during DEBUG builds.
CMAKE_CXX_FLAGS_DEBUG:STRING=/Ob0 /Od /RTC1

//Flags used by the CXX compiler during MINSIZEREL builds.
CMAKE_CXX_FLAGS_MINSIZEREL:STRING=/O1 /Ob1 /DNDEBUG

//Flags used by the CXX compiler during RELEASE builds.
CMAKE_CXX_FLAGS_RELEASE:STRING=/O2 /Ob2 /DNDEBUG /Ot /Oi

//Flags used by the CXX compiler during RELWITHDEBINFO builds.
CMAKE_CXX_FLAGS_RELWITHDEBINFO:STRING=/O2 /Ob1 /DNDEBUG /Ot /Oi

//Libraries linked by default with all C++ applications.
CMAKE_CXX_STANDARD_LIBRARIES:STRING=kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib

//Flags used by the C compiler during all build types.
CMAKE_C_FLAGS:STRING=/DWIN32 /D_WINDOWS

//Flags used by the C compiler during DEBUG builds.
CMAKE_C_FLAGS_DEBUG:STRING=/Ob0 /Od /RTC1

//Flags used by the C compiler during MINSIZEREL builds.
CMAKE_C_FLAGS_MINSIZEREL:STRING=/O1 /Ob1 /DNDEBUG

//Flags used by the C compiler during RELEASE builds.
CMAKE_C_FLAGS_RELEASE:STRING=/O2 /Ob2 /DNDEBUG

//Flags used by the C compiler during RELWITHDEBINFO builds.
CMAKE_C_FLAGS_RELWITHDEBINFO:STRING=/O2 /Ob1 /DNDEBUG /Ot /Oi

//Libraries linked by default with all C applications.
CMAKE_C_STANDARD_LIBRARIES:STRING=kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib

//Flags used by the linker during all build types.
CMAKE_EXE_LINKER_FLAGS:STRING=/machine:x64

//Flags used by the linker during DEBUG builds.
CMAKE_EXE_LINKER_FLAGS_DEBUG:STRING=/debug /INCREMENTAL

//Flags used by the linker during MINSIZEREL builds.
CMAKE_EXE_LINKER_FLAGS_MINSIZEREL:STRING=/INCREMENTAL:NO

//Flags used by the linker during RELEASE builds.
CMAKE_EXE_LINKER_FLAGS_RELEASE:STRING=/INCREMENTAL:NO

//Flags used by the linker during RELWITHDEBINFO builds.
CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO:STRING=/debug /INCREMENTAL

//Value Computed by CMake.
CMAKE_FIND_PACKAGE_REDIRECTS_DIR:STATIC=M:/work/v143/CMakeFiles/pkgRedirects

//User executables (bin)
CMAKE_INSTALL_BINDIR:PATH=bin

//Read-only architecture-independent data (DATAROOTDIR)
CMAKE_INSTALL_DATADIR:PATH=

//Read-only architecture-independent data root (share)
CMAKE_INSTALL_DATAROOTDIR:PATH=share

//Documentation root (DATAROOTDIR/doc/PROJECT_NAME)
CMAKE_INSTALL_DOCDIR:PATH=

//C header files (include)
CMAKE_INSTALL_INCLUDEDIR:PATH=include

//Info documentation (DATAROOTDIR/info)
CMAKE_INSTALL_INFODIR:PATH=

//Object code libraries (lib)
CMAKE_INSTALL_LIBDIR:PATH=lib

//Program executables (libexec)
CMAKE_INSTALL_LIBEXECDIR:PATH=libexec

//Locale-dependent data (DATAROOTDIR/locale)
CMAKE_INSTALL_LOCALEDIR:PATH=

//Modifiable single-machine data (var)
CMAKE_INSTALL_LOCALSTATEDIR:PATH=var

//Man documentation (DATAROOTDIR/man)
CMAKE_INSTALL_MANDIR:PATH=

//C header files for non-gcc (/usr/include)
CMAKE_INSTALL_OLDINCLUDEDIR:PATH=/usr/include

//Install path prefix, prepended onto install directories.
CMAKE_INSTALL_PREFIX:PATH=C:/Program Files (x86)/Ceres

//Run-time variable data (LOCALSTATEDIR/run)
CMAKE_INSTALL_RUNSTATEDIR:PATH=

//System admin executables (sbin)
CMAKE_INSTALL_SBINDIR:PATH=sbin

//Modifiable architecture-independent data (com)
CMAKE_INSTALL_SHAREDSTATEDIR:PATH=com

//Read-only single-machine data (etc)
CMAKE_INSTALL_SYSCONFDIR:PATH=etc

//Path to a program.
CMAKE_LINKER:FILEPATH=C:/Program Files/Microsoft Visual Studio/2022/Professional/VC/Tools/MSVC/14.37.32822/bin/Hostx64/x64/link.exe

//Flags used by the linker during the creation of modules during
// all build types.
CMAKE_MODULE_LINKER_FLAGS:STRING=/machine:x64

//Flags used by the linker during the creation of modules during
// DEBUG builds.
CMAKE_MODULE_LINKER_FLAGS_DEBUG:STRING=/debug /INCREMENTAL

//Flags used by the linker during the creation of modules during
// MINSIZEREL builds.
CMAKE_MODULE_LINKER_FLAGS_MINSIZEREL:STRING=/INCREMENTAL:NO

//Flags used by the linker during the creation of modules during
// RELEASE builds.
CMAKE_MODULE_LINKER_FLAGS_RELEASE:STRING=/INCREMENTAL:NO

//Flags used by the linker during the creation of modules during
// RELWITHDEBINFO builds.
CMAKE_MODULE_LINKER_FLAGS_RELWITHDEBINFO:STRING=/debug /INCREMENTAL

//Path to a program.
CMAKE_MT:FILEPATH=CMAKE_MT-NOTFOUND

//Value Computed by CMake
CMAKE_PROJECT_DESCRIPTION:STATIC=

//Value Computed by CMake
CMAKE_PROJECT_HOMEPAGE_URL:STATIC=

//Value Computed by CMake
CMAKE_PROJECT_NAME:STATIC=Ceres

//RC compiler
CMAKE_RC_COMPILER:FILEPATH=rc

//Flags for Windows Resource Compiler during all build types.
CMAKE_RC_FLAGS:STRING=-DWIN32

//Flags for Windows Resource Compiler during DEBUG builds.
CMAKE_RC_FLAGS_DEBUG:STRING=-D_DEBUG

//Flags for Windows Resource Compiler during MINSIZEREL builds.
CMAKE_RC_FLAGS_MINSIZEREL:STRING=

//Flags for Windows Resource Compiler during RELEASE builds.
CMAKE_RC_FLAGS_RELEASE:STRING=

//Flags for Windows Resource Compiler during RELWITHDEBINFO builds.
CMAKE_RC_FLAGS_RELWITHDEBINFO:STRING=

//Flags used by the linker during the creation of shared libraries
// during all build types.
CMAKE_SHARED_LINKER_FLAGS:STRING=/machine:x64

//Flags used by the linker during the creation of shared libraries
// during DEBUG builds.
CMAKE_SHARED_LINKER_FLAGS_DEBUG:STRING=/debug /INCREMENTAL

//Flags used by the linker during the creation of shared libraries
// during MINSIZEREL builds.
CMAKE_SHARED_LINKER_FLAGS_MINSIZEREL:STRING=/INCREMENTAL:NO

//Flags used by the linker during the creation of shared libraries
// during RELEASE builds.
CMAKE_SHARED_LINKER_FLAGS_RELEASE:STRING=/INCREMENTAL:NO

//Flags used by the linker during the creation of shared libraries
// during RELWITHDEBINFO builds.
CMAKE_SHARED_LINKER_FLAGS_RELWITHDEBINFO:STRING=/debug /INCREMENTAL

//If set, runtime paths are not added when installing shared libraries,
// but are added when building.
CMAKE_SKIP_INSTALL_RPATH:BOOL=OFF

//If set, runtime paths are not added when using shared libraries.
CMAKE_SKIP_RPATH:BOOL=OFF

//Flags used by the linker during the creation of static libraries
// during all build types.
CMAKE_STATIC_LINKER_FLAGS:STRING=/machine:x64

//Flags used by the linker during the creation of static libraries
// during DEBUG builds.
CMAKE_STATIC_LINKER_FLAGS_DEBUG:STRING=

//Flags used by the linker during the creation of static libraries
// during MINSIZEREL builds.
CMAKE_STATIC_LINKER_FLAGS_MINSIZEREL:STRING=

//Flags used by the linker during the creation of static libraries
// during RELEASE builds.
CMAKE_STATIC_LINKER_FLAGS_RELEASE:STRING=

//Flags used by the linker during the creation of static libraries
// during RELWITHDEBINFO builds.
CMAKE_STATIC_LINKER_FLAGS_RELWITHDEBINFO:STRING=

//If this value is on, makefiles will be generated without the
// .SILENT directive, and all commands will be echoed to the console
// during the make.  This is useful for debugging only. With Visual
// Studio IDE projects all commands are done without /nologo.
CMAKE_VERBOSE_MAKEFILE:BOOL=OFF

//Common compile flags enabled for any sanitizer
COMMON_SANITIZER_COMPILE_OPTIONS:STRING=-g -fno-omit-frame-pointer -fno-optimize-sibling-calls

//Use handcoded BLAS routines (usually faster) instead of Eigen.
CUSTOM_BLAS:BOOL=OFF

//Value Computed by CMake
Ceres_BINARY_DIR:STATIC=M:/work/v143

//Value Computed by CMake
Ceres_IS_TOP_LEVEL:STATIC=ON

//Value Computed by CMake
Ceres_SOURCE_DIR:STATIC=M:/work/ceres-solver

//Enable Eigen METIS support.
EIGENMETIS:BOOL=OFF

//Enable Eigen as a sparse linear algebra library.
EIGENSPARSE:BOOL=ON

//Export build directory using CMake (enables external use without
// install).
EXPORT_BUILD_DIR:BOOL=ON

//The directory containing a CMake configuration file for Eigen3.
Eigen3_DIR:PATH=M:/work/eigen/v143

//Enable Google Flags.
GFLAGS:BOOL=OFF

//Path to a file.
GLOG_INCLUDE_DIR:PATH=m:/work/3rdParty/glog-0.6.0/v143/include

//Path to a library.
GLOG_LIBRARY:FILEPATH=m:/work/3rdParty/glog-0.6.0/v143

//Enable use of LAPACK directly within Ceres.
LAPACK:BOOL=OFF

//Use a stripped down version of glog.
MINIGLOG:BOOL=OFF

//Add a custom target to ease removal of installed targets
PROVIDE_UNINSTALL_TARGET:BOOL=OFF

//Semicolon-separated list of sanitizers to use (e.g address, memory,
// thread)
SANITIZERS:STRING=

//Enable fixed-size schur specializations.
SCHUR_SPECIALIZATIONS:BOOL=ON

//Enable SuiteSparse.
SUITESPARSE:BOOL=OFF

//Enable use of CUDA linear algebra solvers.
USE_CUDA:BOOL=OFF

//The directory containing a CMake configuration file for benchmark.
benchmark_DIR:PATH=benchmark_DIR-NOTFOUND

//The directory containing a CMake configuration file for glog.
glog_DIR:PATH=M:/work/glog/v143


########################
# INTERNAL cache entries
########################

//Test CHECK_CXX_FLAG_Wno_missing_declarations
CHECK_CXX_FLAG_Wno_missing_declarations:INTERNAL=
//ADVANCED property for variable: CMAKE_AR
CMAKE_AR-ADVANCED:INTERNAL=1
//This is the directory where this CMakeCache.txt was created
CMAKE_CACHEFILE_DIR:INTERNAL=m:/work/v143
//Major version of cmake used to create the current loaded cache
CMAKE_CACHE_MAJOR_VERSION:INTERNAL=3
//Minor version of cmake used to create the current loaded cache
CMAKE_CACHE_MINOR_VERSION:INTERNAL=27
//Patch version of cmake used to create the current loaded cache
CMAKE_CACHE_PATCH_VERSION:INTERNAL=6
//Path to CMake executable.
CMAKE_COMMAND:INTERNAL=C:/Program Files/CMake/bin/cmake.exe
//Path to cpack program executable.
CMAKE_CPACK_COMMAND:INTERNAL=C:/Program Files/CMake/bin/cpack.exe
//Path to ctest program executable.
CMAKE_CTEST_COMMAND:INTERNAL=C:/Program Files/CMake/bin/ctest.exe
//ADVANCED property for variable: CMAKE_CXX_FLAGS
CMAKE_CXX_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CXX_FLAGS_DEBUG
CMAKE_CXX_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CXX_FLAGS_MINSIZEREL
CMAKE_CXX_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CXX_FLAGS_RELWITHDEBINFO
CMAKE_CXX_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_CXX_STANDARD_LIBRARIES
CMAKE_CXX_STANDARD_LIBRARIES-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS
CMAKE_C_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS_DEBUG
CMAKE_C_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS_MINSIZEREL
CMAKE_C_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS_RELEASE
CMAKE_C_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS_RELWITHDEBINFO
CMAKE_C_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_STANDARD_LIBRARIES
CMAKE_C_STANDARD_LIBRARIES-ADVANCED:INTERNAL=1
//Executable file format
CMAKE_EXECUTABLE_FORMAT:INTERNAL=Unknown
//ADVANCED property for variable: CMAKE_EXE_LINKER_FLAGS
CMAKE_EXE_LINKER_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_EXE_LINKER_FLAGS_DEBUG
CMAKE_EXE_LINKER_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_EXE_LINKER_FLAGS_MINSIZEREL
CMAKE_EXE_LINKER_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_EXE_LINKER_FLAGS_RELEASE
CMAKE_EXE_LINKER_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO
CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
//Name of external makefile project generator.
CMAKE_EXTRA_GENERATOR:INTERNAL=
//Name of generator.
CMAKE_GENERATOR:INTERNAL=Visual Studio 17 2022
//Generator instance identifier.
CMAKE_GENERATOR_INSTANCE:INTERNAL=C:/Program Files/Microsoft Visual Studio/2022/Professional
//Name of generator platform.
CMAKE_GENERATOR_PLATFORM:INTERNAL=
//Name of generator toolset.
CMAKE_GENERATOR_TOOLSET:INTERNAL=v143
//Test CMAKE_HAVE_LIBC_PTHREAD
CMAKE_HAVE_LIBC_PTHREAD:INTERNAL=
//Have library pthreads
CMAKE_HAVE_PTHREADS_CREATE:INTERNAL=
//Have library pthread
CMAKE_HAVE_PTHREAD_CREATE:INTERNAL=
//Source directory with the top level CMakeLists.txt file for this
// project
CMAKE_HOME_DIRECTORY:INTERNAL=M:/work/ceres-solver
//ADVANCED property for variable: CMAKE_INSTALL_BINDIR
CMAKE_INSTALL_BINDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_DATADIR
CMAKE_INSTALL_DATADIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_DATAROOTDIR
CMAKE_INSTALL_DATAROOTDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_DOCDIR
CMAKE_INSTALL_DOCDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_INCLUDEDIR
CMAKE_INSTALL_INCLUDEDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_INFODIR
CMAKE_INSTALL_INFODIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_LIBDIR
CMAKE_INSTALL_LIBDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_LIBEXECDIR
CMAKE_INSTALL_LIBEXECDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_LOCALEDIR
CMAKE_INSTALL_LOCALEDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_LOCALSTATEDIR
CMAKE_INSTALL_LOCALSTATEDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_MANDIR
CMAKE_INSTALL_MANDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_OLDINCLUDEDIR
CMAKE_INSTALL_OLDINCLUDEDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_RUNSTATEDIR
CMAKE_INSTALL_RUNSTATEDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_SBINDIR
CMAKE_INSTALL_SBINDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_SHAREDSTATEDIR
CMAKE_INSTALL_SHAREDSTATEDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_SYSCONFDIR
CMAKE_INSTALL_SYSCONFDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_LINKER
CMAKE_LINKER-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_MODULE_LINKER_FLAGS
CMAKE_MODULE_LINKER_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_MODULE_LINKER_FLAGS_DEBUG
CMAKE_MODULE_LINKER_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_MODULE_LINKER_FLAGS_MINSIZEREL
CMAKE_MODULE_LINKER_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_MODULE_LINKER_FLAGS_RELEASE
CMAKE_MODULE_LINKER_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_MODULE_LINKER_FLAGS_RELWITHDEBINFO
CMAKE_MODULE_LINKER_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_MT
CMAKE_MT-ADVANCED:INTERNAL=1
//number of local generators
CMAKE_NUMBER_OF_MAKEFILES:INTERNAL=7
//Platform information initialized
CMAKE_PLATFORM_INFO_INITIALIZED:INTERNAL=1
//noop for ranlib
CMAKE_RANLIB:INTERNAL=:
//ADVANCED property for variable: CMAKE_RC_COMPILER
CMAKE_RC_COMPILER-ADVANCED:INTERNAL=1
CMAKE_RC_COMPILER_WORKS:INTERNAL=1
//ADVANCED property for variable: CMAKE_RC_FLAGS
CMAKE_RC_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_RC_FLAGS_DEBUG
CMAKE_RC_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_RC_FLAGS_MINSIZEREL
CMAKE_RC_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_RC_FLAGS_RELEASE
CMAKE_RC_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_RC_FLAGS_RELWITHDEBINFO
CMAKE_RC_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
//Path to CMake installation.
CMAKE_ROOT:INTERNAL=C:/Program Files/CMake/share/cmake-3.27
//ADVANCED property for variable: CMAKE_SHARED_LINKER_FLAGS
CMAKE_SHARED_LINKER_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_SHARED_LINKER_FLAGS_DEBUG
CMAKE_SHARED_LINKER_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_SHARED_LINKER_FLAGS_MINSIZEREL
CMAKE_SHARED_LINKER_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_SHARED_LINKER_FLAGS_RELEASE
CMAKE_SHARED_LINKER_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_SHARED_LINKER_FLAGS_RELWITHDEBINFO
CMAKE_SHARED_LINKER_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_SKIP_INSTALL_RPATH
CMAKE_SKIP_INSTALL_RPATH-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_SKIP_RPATH
CMAKE_SKIP_RPATH-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_STATIC_LINKER_FLAGS
CMAKE_STATIC_LINKER_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_STATIC_LINKER_FLAGS_DEBUG
CMAKE_STATIC_LINKER_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_STATIC_LINKER_FLAGS_MINSIZEREL
CMAKE_STATIC_LINKER_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_STATIC_LINKER_FLAGS_RELEASE
CMAKE_STATIC_LINKER_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_STATIC_LINKER_FLAGS_RELWITHDEBINFO
CMAKE_STATIC_LINKER_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_VERBOSE_MAKEFILE
CMAKE_VERBOSE_MAKEFILE-ADVANCED:INTERNAL=1
//Test COMPILER_HAS_DEPRECATED
COMPILER_HAS_DEPRECATED:INTERNAL=1
//Test COMPILER_HAS_DEPRECATED_ATTR
COMPILER_HAS_DEPRECATED_ATTR:INTERNAL=
//Details about finding Glog
FIND_PACKAGE_MESSAGE_DETAILS_Glog:INTERNAL=[glog::glog][v()]
//Details about finding Threads
FIND_PACKAGE_MESSAGE_DETAILS_Threads:INTERNAL=[TRUE][v()]
//ADVANCED property for variable: GLOG_INCLUDE_DIR
GLOG_INCLUDE_DIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: GLOG_LIBRARY
GLOG_LIBRARY-ADVANCED:INTERNAL=1
//Test HAVE_BIGOBJ
HAVE_BIGOBJ:INTERNAL=1
//Have library m
HAVE_LIBM:INTERNAL=
//CMAKE_INSTALL_PREFIX during last run
_GNUInstallDirs_LAST_CMAKE_INSTALL_PREFIX:INTERNAL=C:/Program Files (x86)/Ceres
//ADVANCED property for variable: benchmark_DIR
benchmark_DIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: glog_DIR
glog_DIR-ADVANCED:INTERNAL=1


Sameer Agarwal

Oct 4, 2023, 7:03:08 PM
to ceres-...@googlegroups.com
Roger,
Any chance you can try and bisect on the git history to see where the regression was introduced?
Sameer



Sameer Agarwal

Oct 4, 2023, 7:07:12 PM
to ceres-...@googlegroups.com
Also, don't worry about the release date; it is not set in stone. If there is a regression we will deal with it, test it, and release when we are ready.
Sameer


Roger Labbe

Oct 4, 2023, 7:45:53 PM
to Ceres Solver
Sameer,
I could do that, no problem.

However, I have a new observation that may help narrow this down. Our code is set up to set num_threads to the number of effective parameters, 7 in my case. This was decided empirically several years ago as giving good performance both for Ceres and for the other threads in our programs.

But in 2.2 I am seeing a huge performance decrease with more threads. 1 thread gives the best performance, competitive with 2.1 (I need more study to quantify that). 2 threads is slower, and so on. If I use as many threads as I have cores (24), solve times are around 50 msec, vs. 500 usec to 1 msec with one thread. In contrast, 2.1 performance degrades a bit if you give it too many threads, but nothing like this (~395 usec for 1 thread, ~600 usec for 24 threads).

Your release notes stated that there was an extensive rewrite of the threading code; I wonder if there is different guidance on setting num_threads now, or if this is a regression.

I'll look at the git log to see if there is an obvious bisect point re threading after I hear back.

Sameer Agarwal

Oct 4, 2023, 8:10:51 PM
to ceres-...@googlegroups.com
I am surprised that using 7 threads for 7 parameters was doing anything useful to begin with.
I think for such small problems, you should be using 1 thread. There is a certain overhead to using the threading which is simply not worth it for problems of the size you are solving. In fact if you are not using loss functions, then you should be using TinySolver, which will be even faster.
Sameer


Roger Labbe

Oct 5, 2023, 3:46:26 PM
to Ceres Solver
Git bisect shows that commit b158515089a85be8425db66ed43546605f86a00e ("Parallel operations on vectors") is the culprit.

My test: set num_threads=23, run the app, and collect the average runtime for each solve. Good = ~1 ms, bad = ~20 ms (there was no in-between).

So while 23 is an absurd # for such a small problem, the punishment for setting this too high is small prior to this revision, and huge after. 

Here are some run times to illustrate:

Threads  Avg. solve time
1        976.30 usec
2          1.05 msec
3          2.01 msec
...
23        20.97 msec

Sameer Agarwal

Oct 5, 2023, 5:08:35 PM
to ceres-...@googlegroups.com, Dmitriy Korchemkin
Roger,
Thanks for bisecting. +Dmitriy Korchemkin for visibility. I am wondering if we should guard these ops based on some combination of size and thread count.
That said, now that you know that reducing the number of threads to 1 works, is this still a problem for you? Or should we treat the multi-threaded performance of relatively small vector ops as something to improve in the 2.3 timeframe?
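
One way to picture the "guard" idea is sketched below. This is only an illustration under stated assumptions: `EffectiveThreads` and `min_elements_per_thread` are hypothetical names, not part of Ceres, and the thresholds used in the tests are made up. The helper caps the thread count so that each thread receives at least a minimum amount of work, and falls back to a single thread for tiny vectors.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper, NOT part of the Ceres API. It caps the number of
// threads used for a vector operation so that every thread gets at least
// `min_elements_per_thread` elements of work.
inline int EffectiveThreads(std::size_t num_elements,
                            int requested_threads,
                            std::size_t min_elements_per_thread) {
  // Nothing to cap: single-threaded request, or no minimum block size given.
  if (requested_threads <= 1 || min_elements_per_thread == 0) {
    return std::max(requested_threads, 1);
  }
  // Number of full-size blocks the vector can be split into.
  const std::size_t max_useful = num_elements / min_elements_per_thread;
  if (max_useful <= 1) {
    return 1;  // Too small to be worth dispatching to other threads.
  }
  return static_cast<int>(std::min<std::size_t>(
      max_useful, static_cast<std::size_t>(requested_threads)));
}
```

With a guard of this kind, a 6- or 7-element parameter vector would always be processed on the calling thread regardless of `num_threads`, while large vectors would still use every requested thread.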

Sameer


Frank Neuhaus

Oct 6, 2023, 7:41:58 AM
to ceres-...@googlegroups.com
Hi everyone,

I would just like to second Roger in that we are also seeing a performance degradation in 2.2.0 when solving small problems (hopefully the same as Roger's as we have not dug as deep into this as he did). This degradation was mysterious to us until we read this post, so thank you for the analysis, Roger!
Our problem has just 6 DoF (essentially optimizing a single pose), but it has quite a few residuals (1759 in this specific case). For this reason, we also set the number of threads quite high, because we want to distribute the computation of these residuals across threads. Whether the 24 threads configured here are _really_ required is a different question, but I do not think that sticking to 1 thread really solves the issue, because then all residuals are computed on a single thread.
I think it should be further analyzed/fixed/worked around before releasing 2.2.0 because apart from us, there may be more people affected, who simply have not tried 2.2.0 yet.

Thank you
   Frank

Here are 2 reports from 2.1.0 vs 2.2.0 if it helps:
Both compiled on vs2019, mkl on vcpkg.  


Report 1:

Solver Summary (v 2.1.0-eigen-(3.4.0)-lapack-suitesparse-()-no_openmp)


                                     Original                  Reduced
Parameter blocks                            1                        1
Parameters                                  6                        6
Residual blocks                          1759                     1759
Residuals                                1761                     1761

Minimizer                        TRUST_REGION

Dense linear algebra library           LAPACK
Trust region strategy     LEVENBERG_MARQUARDT


                                        Given                     Used
Linear solver                        DENSE_QR                 DENSE_QR
Threads                                    24                       24

Linear solver ordering              AUTOMATIC                        1

Cost:
Initial                          1.135003e+03
Final                            1.134176e+03
Change                           8.272316e-01

Minimizer iterations                       10
Successful steps                           10

Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.000370

  Residual only evaluation           0.001278 (10)
  Jacobian & residual evaluation     0.001578 (10)
  Linear solver                      0.000251 (10)
Minimizer                            0.003371

Postprocessor                        0.000010
Total                                0.003750

Termination:                      CONVERGENCE (Function tolerance reached. |cost_change|/cost: 6.716243e-07 <= 1.000000e-06)




Report 2:

Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-())


                                     Original                  Reduced
Parameter blocks                            1                        1
Parameters                                  6                        6
Residual blocks                          1759                     1759
Residuals                                1761                     1761

Minimizer                        TRUST_REGION

Dense linear algebra library           LAPACK
Trust region strategy     LEVENBERG_MARQUARDT

                                        Given                     Used
Linear solver                        DENSE_QR                 DENSE_QR
Threads                                    24                       24

Linear solver ordering              AUTOMATIC                        1

Cost:
Initial                          1.135003e+03
Final                            1.134176e+03
Change                           8.272316e-01

Minimizer iterations                       10
Successful steps                           10

Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.002062

  Residual only evaluation           0.000789 (10)
  Jacobian & residual evaluation     0.032251 (10)
  Linear solver                      0.000408 (10)
Minimizer                            0.037328

Postprocessor                        0.000012
Total                                0.039403

Termination:                      CONVERGENCE (Function tolerance reached. |cost_change|/cost: 6.716243e-07 <= 1.000000e-06)


Sameer Agarwal

Oct 6, 2023, 8:41:26 AM
to ceres-...@googlegroups.com
Thanks for the report, Frank.
What kind of performance do you see with 2.2 with a single thread?
Sameer


Frank Neuhaus

Oct 6, 2023, 11:19:21 AM
to ceres-...@googlegroups.com
Below is a report from the single-threaded version of the same solve operation in 2.2.0. The performance in 2.1.0 for a single thread appears to be similar to 2.2.0.
To summarize, what we are seeing for this specific solve:
2.1.0, 24 Threads: Total 0.003750
2.2.0, 24 Threads: Total 0.039403
2.1.0, 1 Thread: Total 0.005006
2.2.0, 1 Thread: Total 0.004828

So my interpretation is: 24 threads are obviously overkill for this specific problem, probably 4 or something would be reasonable for this case. However, > 1 thread consistently improves performance over 1 thread. When using more residuals or residuals that are slower to compute, the benefit of using more threads would obviously increase.
The slowdown in multi-threaded 2.2.0 for this case is unexpected though.
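
To put numbers on that, the short sketch below computes slowdown ratios from the "Total" and "Jacobian & residual evaluation" times in the posted summaries. The constant and function names are illustrative only. The ratios come out to roughly 10.5x for the 24-thread total and roughly 20x for the 24-thread Jacobian evaluation, while the single-threaded totals are within a few percent of each other.

```cpp
#include <cstdio>

// Times in seconds, copied from the solver summaries posted in this thread.
constexpr double kTotal210Threads24 = 0.003750;
constexpr double kTotal220Threads24 = 0.039403;
constexpr double kTotal210Threads1 = 0.005006;
constexpr double kTotal220Threads1 = 0.004828;
constexpr double kJacobian210Threads24 = 0.001578;
constexpr double kJacobian220Threads24 = 0.032251;

// A ratio above 1 means `candidate` is slower than `baseline`.
constexpr double Slowdown(double baseline_seconds, double candidate_seconds) {
  return candidate_seconds / baseline_seconds;
}

// Prints the 2.2.0 / 2.1.0 ratios for the three comparisons above.
inline void PrintComparison() {
  std::printf("24 threads, total:    %.2f\n",
              Slowdown(kTotal210Threads24, kTotal220Threads24));
  std::printf("24 threads, Jacobian: %.2f\n",
              Slowdown(kJacobian210Threads24, kJacobian220Threads24));
  std::printf(" 1 thread,  total:    %.2f\n",
              Slowdown(kTotal210Threads1, kTotal220Threads1));
}
```

The Jacobian evaluation accounts for most of the 24-thread 2.2.0 total (0.032251 of 0.039403 seconds), which is where the regression shows up in these reports.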

Hope that helps
   Frank


Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-())

                                     Original                  Reduced
Parameter blocks                            1                        1
Parameters                                  6                        6
Residual blocks                          1759                     1759
Residuals                                1761                     1761

Minimizer                        TRUST_REGION

Dense linear algebra library           LAPACK
Trust region strategy     LEVENBERG_MARQUARDT
                                        Given                     Used
Linear solver                        DENSE_QR                 DENSE_QR
Threads                                     1                        1

Linear solver ordering              AUTOMATIC                        1

Cost:
Initial                          1.135003e+03
Final                            1.134176e+03
Change                           8.272316e-01

Minimizer iterations                       10
Successful steps                           10
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.000050

  Residual only evaluation           0.001721 (10)
  Jacobian & residual evaluation     0.002620 (10)
  Linear solver                      0.000218 (10)
Minimizer                            0.004771

Postprocessor                        0.000007
Total                                0.004828


Termination:                      CONVERGENCE (Function tolerance reached. |cost_change|/cost: 6.716243e-07 <= 1.000000e-06)

Sameer Agarwal

Oct 6, 2023, 11:47:33 AM
to ceres-...@googlegroups.com
Thanks Frank, this is helpful.

Since I do not have your code to benchmark myself, can I ask you one more thing to try out?
Is it the case for you too that b158515089a85be8425db66ed43546605f86a00e is the commit that causes the regression, and that things are okay before that commit?

Sameer



Sameer Agarwal

Oct 6, 2023, 11:54:01 AM
to ceres-...@googlegroups.com
And while I have you, what does performance look like with 2 and 4 threads?
I am trying to figure out whether this is an issue of not setting the number of threads correctly (given the new threading implementation and its per-thread overhead), or whether any threading at all gives poor performance for small problems.
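
A thread-count sweep like the one requested here can be scripted with a small timing harness. The sketch below is a generic illustration: `SweepResult` and `SweepThreadCounts` are hypothetical names, and the solve itself is passed in as a callback so the harness does not depend on Ceres.

```cpp
#include <chrono>
#include <functional>
#include <vector>

// One row of the sweep: a thread count and the mean wall time per solve.
struct SweepResult {
  int num_threads;
  double mean_seconds;
};

// Calls solve(num_threads) `repeats` times for each entry in `thread_counts`
// and records the mean wall-clock time per call.
inline std::vector<SweepResult> SweepThreadCounts(
    const std::vector<int>& thread_counts, int repeats,
    const std::function<void(int)>& solve) {
  std::vector<SweepResult> results;
  results.reserve(thread_counts.size());
  for (const int n : thread_counts) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
      solve(n);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    results.push_back({n, repeats > 0 ? elapsed.count() / repeats : 0.0});
  }
  return results;
}
```

In a real test, the callback would set `num_threads` on `ceres::Solver::Options` to the given value and call `ceres::Solve` on a freshly constructed copy of the problem, for example with thread counts {1, 2, 4, 8, 24}.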

Sameer


Sameer Agarwal

Oct 6, 2023, 12:30:47 PM
to ceres-...@googlegroups.com
I have started https://github.com/ceres-solver/ceres-solver/issues/1016 to track this issue. Let's continue the discussion there.
