I am currently trying to get the best possible performance for a publication. To get nice real-time plotting (PyVista) of my robot models, I use pybind11 to bind my own C++ classes, which call GTSAM optimizers. Some details:
- ~150 6 DoF variables = ~1000 total DoF
- Using TBB; I see many threads spawned, each at ~30% load
- Using DoglegOptimizer with the MULTIFRONTAL_QR linear solver
- I do all timing in C++ to exclude pybind11 overhead
- My pybind wrappers are relatively simple, e.g.:
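    PYBIND11_MODULE(solver_bindings, m) {  // module name is a placeholder
        py::class_<Solver>(m, "Solver")
            .def(py::init<const SolverConfig&>(), py::arg("config"))
            .def("solve", &Solver::solve,
                 py::arg("x"),
                 py::arg("y"),
                 // Release the GIL while the C++ solver runs.
                 py::call_guard<py::gil_scoped_release>());
    }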
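For context, timing on the C++ side can be sketched roughly as below. This is a minimal illustration and not the actual project code: `median_time_ms` is a hypothetical helper, and the callable passed to it stands in for whatever is being measured (for example a lambda wrapping `solver.solve(x, y)`). It uses only the standard library and reports the median of several runs, which tends to be less sensitive to one-off scheduling hiccups than a single measurement.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: runs `fn` `repeats` times and returns the median
// wall-clock time in milliseconds (the upper median for even counts).
// `fn` is a placeholder for the call being measured; nothing here is GTSAM API.
template <typename Fn>
double median_time_ms(Fn&& fn, int repeats) {
    assert(repeats > 0);
    std::vector<double> samples;
    samples.reserve(repeats);
    for (int i = 0; i < repeats; ++i) {
        // steady_clock is monotonic, so it is unaffected by system clock changes.
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    // nth_element puts the middle sample at its sorted position.
    auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

Because the clock is started and stopped inside C++, the Python call overhead and GIL handling in the binding layer fall outside the measured interval.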
One curious thing: when I run with taskset -c 0 python ..., execution is 2x faster and only one core is in use. So, is multithreading overkill here?
And do you think my simple pybind wrappers might be hurting performance in any way?
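The `taskset -c 0` experiment above can also be reproduced from inside the process, which makes it easier to compare pinned and unpinned runs in the same benchmark harness. The sketch below is Linux-specific and makes some assumptions: `pin_to_first_allowed_cpu` and `allowed_cpu_count` are hypothetical helper names, and the function pins to the lowest-numbered CPU the process is allowed to use rather than hard-coding core 0. It should be called at startup, before any worker threads exist, so that threads created later inherit the mask.

```cpp
#include <cassert>
#include <sched.h>  // Linux-specific: sched_getaffinity, sched_setaffinity

// Hypothetical helper: restricts the calling thread to the lowest-numbered
// CPU in its current affinity mask. Returns that CPU index, or -1 on failure.
int pin_to_first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // A pid of 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
        }
    }
    return -1;
}

// Hypothetical helper: number of CPUs the calling thread may run on,
// or -1 if the affinity mask cannot be read.
int allowed_cpu_count() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
    return CPU_COUNT(&allowed);
}
```

After a successful call, `allowed_cpu_count()` should report 1, matching what `taskset -c 0` sets up before the interpreter starts.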
Thanks,
James