Problem. The question of using `std::int64_t`/`std::uint64_t` for indexing has been raised multiple times (https://github.com/ceres-solver/ceres-solver/issues/684), but changing `int` to `std::int64_t` is tedious work and requires a very careful investigation of all usages of `rows`, `cols`, `num_nonzeros`, etc. In practice the main pain point is `num_nonzeros`. It is very strange to see it overflow in sparse matrices once the problem gets large: a signed 32-bit `int` holds at most 2^31 - 1 = 2147483647, and the number of Jacobian nonzeros is bounded only by num_residuals x num_parameters. I have a workstation with 256 GB of RAM and about 24 cores, and it is strange that I cannot use it fully on comparatively small problems. Currently I solve SLAM-like problems with `g2o` or with another solver type, but it seems that we could use Ceres for them as well.
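To make the limit concrete: with hypothetical sizes of 100000 residuals and 30000 parameters, a fully dense Jacobian already has 3 * 10^9 entries, which is more than a signed 32-bit `int` can count, while storing them as `double` takes about 24 GB, well within 256 GB of RAM. The sketch below does this arithmetic in 64 bits; both helper names are illustrative and are not Ceres functions.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not a Ceres function): the dense upper bound on the
// number of Jacobian entries, computed in 64 bits so the product cannot overflow.
std::int64_t DenseJacobianEntries(std::int32_t num_residuals,
                                  std::int32_t num_parameters) {
  return static_cast<std::int64_t>(num_residuals) * num_parameters;
}

// Returns true if a count of this size can still be stored in a 32-bit int.
bool FitsInInt32(std::int64_t count) {
  return count <= std::numeric_limits<std::int32_t>::max();
}
```

For example, `DenseJacobianEntries(100000, 30000)` returns 3000000000, and `FitsInInt32` of that value returns `false`.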
Solution. It consists of multiple stages, but it seems that eventually most of the tedious work can be offloaded to the compiler (I used `clang-14`):
- Introducing a Bazel build with `clang` (to use its many diagnostic options). I prefer `bazel` because it makes compiler options easier to control. To tell the truth, I did not manage to keep these options from applying to system and third-party libraries with `CMake`. Moreover, the project I work on mainly uses `bazel`, so I tend to stick to it.
- Enabling all conversion checks listed in the Clang diagnostics reference (https://clang.llvm.org/docs/DiagnosticsReference.html). First of all we are interested in `-Wshorten-64-to-32`: if we change `num_nonzeros_` to `int64_t`, every place where it is implicitly converted back to a 32-bit `int` raises a compiler warning, so we will be able to identify all of these usages.
- Introducing `numeric_cast`. The idea comes from `boost::numeric_cast` and from this proposal to the C++ standard: https://github.com/qingfengxia/cpp_numeric_cast/blob/master/numeric_cast.h. Even once all conversions are found, an explicit `static_cast` does not protect against errors in code like `std::int32_t a = static_cast<std::int32_t>({expression of type std::int64_t})`: it only tells the compiler that we are aware of the conversion, with no guarantee that the value actually fits. I have written a simple `numeric_cast.h` that only checks conversions from `int64_t` to `int32_t`. A fully accurate `numeric_cast` is about 250 lines of code and seems to be a very good replacement for `static_cast<T>(s)` everywhere. Its use can also be controlled with a compile-time option: anyone who wants the ordinary behaviour can get it by defining `numeric_cast` in `numeric_cast.h`, under `#ifndef USE_NUMERIC_CAST`, as a function template that simply returns `static_cast<T>(value)`.
- At this point we will be almost ready to start changing int32 to int64. Just one question has to be answered: are we going to use `int64_t` or `std::size_t` (which is unsigned: 32 bits on 32-bit platforms and 64 bits on 64-bit platforms)? One option is to introduce an alias such as `index_t`, which avoids committing to one type and makes it easy to experiment. Another option is to use `int64_t` everywhere; this seems more convenient, since an allocation larger than the memory available on the system will fail anyway, whatever type is used for indices.
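To illustrate the `-Wshorten-64-to-32` stage from the list above: the sketch uses a hypothetical `SparseMatrix` stand-in (not the Ceres class) whose nonzero count has already been widened to `std::int64_t`. The commented-out function shows the kind of implicit narrowing clang reports, and the active code propagates the 64-bit type instead. One caveat, as far as I can tell: a loop such as `for (int i = 0; i < m.num_nonzeros(); ++i)` is not flagged, because the comparison widens `i` rather than narrowing the count, so 32-bit loop counters still need a manual review.

```cpp
#include <cstdint>

// Hypothetical stand-in for a sparse matrix whose nonzero count was widened.
class SparseMatrix {
 public:
  explicit SparseMatrix(std::int64_t num_nonzeros)
      : num_nonzeros_(num_nonzeros) {}
  std::int64_t num_nonzeros() const { return num_nonzeros_; }

 private:
  std::int64_t num_nonzeros_;
};

// Before the fix this compiled silently; with -Wshorten-64-to-32 clang warns
// about an implicit conversion that loses integer precision on the return:
//   int NumNonzeros(const SparseMatrix& m) { return m.num_nonzeros(); }

// After the fix the 64-bit type is propagated to the caller instead.
std::int64_t NumNonzeros(const SparseMatrix& m) { return m.num_nonzeros(); }
```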
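The `numeric_cast` idea from the list above can be sketched for integral types roughly as follows, assuming C++17. This is neither the author's `numeric_cast.h` nor Boost's implementation, and `fits_in` is a hypothetical helper name. With `USE_NUMERIC_CAST` defined, an out-of-range value throws `std::range_error` (chosen here for illustration; code inside Ceres would more likely abort through a glog `CHECK`), and without it `numeric_cast` is just a `static_cast`.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Enabled unconditionally for this demo; in a real build the macro would be
// set (or not) by the build system, e.g. -DUSE_NUMERIC_CAST.
#ifndef USE_NUMERIC_CAST
#define USE_NUMERIC_CAST
#endif

// Hypothetical helper: true if `value` is representable in type `To`.
// Each branch avoids comparing a negative value against an unsigned one.
template <typename To, typename From>
constexpr bool fits_in(From value) {
  static_assert(std::is_integral_v<To> && std::is_integral_v<From>,
                "this sketch handles integral types only");
  using ToLimits = std::numeric_limits<To>;
  if constexpr (std::is_signed_v<From> == std::is_signed_v<To>) {
    return value >= ToLimits::min() && value <= ToLimits::max();
  } else if constexpr (std::is_signed_v<From>) {
    // Signed source, unsigned target: negative values never fit.
    return value >= 0 &&
           static_cast<std::make_unsigned_t<From>>(value) <= ToLimits::max();
  } else {
    // Unsigned source, signed target: compare with the target maximum as unsigned.
    return value <= static_cast<std::make_unsigned_t<To>>(ToLimits::max());
  }
}

#ifdef USE_NUMERIC_CAST
// Checked variant: throws instead of silently truncating.
template <typename To, typename From>
To numeric_cast(From value) {
  if (!fits_in<To>(value)) {
    throw std::range_error("numeric_cast: value out of range for target type");
  }
  return static_cast<To>(value);
}
#else
// Ordinary behaviour: identical to static_cast.
template <typename To, typename From>
constexpr To numeric_cast(From value) {
  return static_cast<To>(value);
}
#endif
```

At a call site, `static_cast<std::int32_t>(num_nonzeros)` would become `numeric_cast<std::int32_t>(num_nonzeros)`; with the macro undefined, the behaviour is the same as the original cast.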
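If the alias route from the last stage is chosen, it could look like the sketch below. The name `index_t` comes from the proposal; mapping it to `std::int64_t` and the `RowOffsets` helper (which builds compressed-row offsets from per-row nonzero counts) are assumptions for illustration. A signed type means that `i - 1` at `i == 0` yields -1 instead of wrapping around as it would with `std::size_t`.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>
#include <vector>

// Hypothetical alias: the name comes from the proposal, the choice of
// std::int64_t is one of the two options under discussion.
using index_t = std::int64_t;

static_assert(std::is_signed_v<index_t>,
              "signed, so i - 1 at i == 0 gives -1 instead of wrapping");
static_assert(std::numeric_limits<index_t>::digits == 63,
              "same 64-bit width on 32-bit and 64-bit platforms");

// Hypothetical helper: builds row offsets (num_rows + 1 entries) from the
// number of nonzeros in each row. The running total is an index_t, so it may
// exceed 2^31 - 1 without overflowing.
std::vector<index_t> RowOffsets(const std::vector<index_t>& nonzeros_per_row) {
  std::vector<index_t> offsets(nonzeros_per_row.size() + 1, 0);
  for (std::size_t row = 0; row < nonzeros_per_row.size(); ++row) {
    offsets[row + 1] = offsets[row] + nonzeros_per_row[row];
  }
  return offsets;
}
```

For two rows of 2000000000 nonzeros each, the last offset is 4000000000, a value an `int` offset array could not hold.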
Locally I have reached stage 3, but I can already see that the amount of changes has grown significantly. Since this problem is painful for many users, I would like to help solve it not only in my project but also in the public repository. However, I would like to discuss this plan first, since it would not be good to start making these changes without some approval from the maintainers.