I am on a machine we call the beast, which has 8 Quadro P5000 GPUs, each with 16GB of RAM. The machine itself has 48 Intel Xeon Gold 6146 cores @ 3.20GHz (counting hyper-threading) and over 700GB of RAM. It runs Linux (Ubuntu 16.04).
I have performed an SfM sparse reconstruction of a scene with 155 6K images. It works beautifully, and I don't care too much about speed here.
Later I want to register a single 2K image with this scene, and this part I want to be fast. I have been doing this with a sequence of colmap commands: feature_extractor, exhaustive_matcher, and image_registrator. To my joy, I saw there was a multi-GPU option for the feature_extractor and the feature matcher, and I decided to greedily use all my GPUs, but alas, there was no speed-up. Below is my script. The file reg-image-list.txt references a single 2K image. This takes about 25 seconds whether I use the multi-GPU option or not. Why isn't there a speed-up?
time colmap feature_extractor \
--database_path $DB \
--image_path reg-images \
--image_list_path reg-image-list.txt \
--ImageReader.camera_model SIMPLE_PINHOLE \
--ImageReader.camera_params "$F, $CX, $CY" \
--SiftExtraction.use_gpu 1 \
--SiftExtraction.gpu_index=0,1,2,3,4,5,6,7 \
--SiftExtraction.domain_size_pooling 1 \
--SiftExtraction.estimate_affine_shape 1
time colmap exhaustive_matcher \
--database_path $DB \
--SiftMatching.use_gpu 1 \
--SiftMatching.gpu_index=0,1,2,3,4,5,6,7
time colmap image_registrator \
--database_path $DB \
--input_path $IN_MODEL \
--output_path $OUT_MODEL \
--Mapper.ba_local_max_num_iterations 40 \
--Mapper.ba_global_max_num_iterations 100 \
--Mapper.ba_local_max_refinements 3 \
--Mapper.ba_global_max_refinements 5
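
In case it helps anyone reproduce the comparison, here is a rough sketch of how the single-GPU and multi-GPU extraction runs could be timed side by side. It assumes bash, GNU date, and awk. The names time_stage and compare_extraction are made up for illustration, and each run works on its own copy of the database on the assumption that the extractor would otherwise skip an image it has already processed. The colmap flags are the same ones used in the script above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command, discard its output, and print
# "<label>: <seconds>s (exit <status>)" on stdout.
time_stage() {
    local label=$1
    shift
    local start end status
    start=$(date +%s.%N)          # GNU date: seconds with nanosecond fraction
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s.%N)
    awk -v l="$label" -v s="$start" -v e="$end" -v st="$status" \
        'BEGIN { printf "%s: %.2fs (exit %d)\n", l, e - s, st }'
}

# Hypothetical comparison: extract features for the single image once on
# GPU 0 and once on all eight GPUs, each time on a fresh database copy.
# Expects DB, F, CX and CY to be set as in the script above.
compare_extraction() {
    local gpus db_copy
    for gpus in 0 0,1,2,3,4,5,6,7; do
        db_copy="${DB%.db}-gpus-${gpus//,/}.db"
        cp "$DB" "$db_copy"
        time_stage "feature_extractor gpu_index=$gpus" \
            colmap feature_extractor \
            --database_path "$db_copy" \
            --image_path reg-images \
            --image_list_path reg-image-list.txt \
            --ImageReader.camera_model SIMPLE_PINHOLE \
            --ImageReader.camera_params "$F, $CX, $CY" \
            --SiftExtraction.use_gpu 1 \
            --SiftExtraction.gpu_index="$gpus" \
            --SiftExtraction.domain_size_pooling 1 \
            --SiftExtraction.estimate_affine_shape 1
    done
}
```

Calling compare_extraction after setting DB, F, CX and CY prints one timing line per GPU setting, which makes it easy to see whether the extraction stage changes at all between the two settings.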