Reproducibility for "schur_eliminator_impl.h" in multithread mode

51 views
Skip to first unread message

Pierre Moulon

unread,
Aug 3, 2017, 5:17:31 PM8/3/17
to ceres-...@googlegroups.com
Hello,

I noticed that a "random_shuffle" is used in "schur_eliminator_impl.h".
https://github.com/ceres-solver/ceres-solver/blob/0859fe8a57620b6dcc592741127963de1d463bbc/internal/ceres/schur_eliminator_impl.h#L156

For my point of view this shuffle is not required and it open a potential back door for no reproducibility of results.

Please, can you provide some explanation about why this shuffle is required?

Thank you.

PS: 
I noticed that the variables "thread_id" are not set to const.
It would be better to use const X in order to avoid the value to be changed by any new contributions.

Regards,
Pierre

Sameer Agarwal

unread,
Aug 3, 2017, 6:45:47 PM8/3/17
to ceres-...@googlegroups.com
Pierre,
The shuffle only happens when the solver is running in multithreaded mode, in which case there is no determinism anyways. I remember adding this because of some concerns around lock contention in the underlying blocksparsematrix object holding the schur complement and the way the matrix rows had been sorted earlier.

Sameer



--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/CADjmpz8_yJgG50mrqU5iS7pJqs6wb10_aF623A4C3pj_PqLhEg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Pierre Moulon

unread,
Aug 3, 2017, 7:15:05 PM8/3/17
to ceres-...@googlegroups.com
Sameer,

Thanks for your answer.

I know that it happens only in multithread mode (I put it in the email subject).

If you it some concerns around the lock contention does it means that the parallelism is not good at this place?

I tried some bundle_adjuster on a 40 cores machine and I never meet trouble even when I disabled the shuffle.

./bin/bundle_adjuster --input=../problem-245-198739-pre.txt -num_threads=40 --linear_solver=sparse_schur -explicit_schur_complement=true


I suggest the following:
 - Think again if this shuffle is really required or just remove it since the I don't see any clear proof that it will create any lock contention (Does the lock contention can depends on the problem configuration?)

-> Did you have any concern about the non const thread_it variables ?

Regards,
Pierre

2017-08-03 15:45 GMT-07:00 'Sameer Agarwal' via Ceres Solver <ceres-...@googlegroups.com>:
Pierre,
The shuffle only happens when the solver is running in multithreaded mode, in which case there is no determinism anyways. I remember adding this because of some concerns around lock contention in the underlying blocksparsematrix object holding the schur complement and the way the matrix rows had been sorted earlier.

Sameer



On Thu, Aug 3, 2017 at 2:17 PM Pierre Moulon <pmo...@gmail.com> wrote:
Hello,

I noticed that a "random_shuffle" is used in "schur_eliminator_impl.h".
https://github.com/ceres-solver/ceres-solver/blob/0859fe8a57620b6dcc592741127963de1d463bbc/internal/ceres/schur_eliminator_impl.h#L156

For my point of view this shuffle is not required and it open a potential back door for no reproducibility of results.

Please, can you provide some explanation about why this shuffle is required?

Thank you.

PS: 
I noticed that the variables "thread_id" are not set to const.
It would be better to use const X in order to avoid the value to be changed by any new contributions.

Regards,
Pierre

--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver+unsubscribe@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/CABqdRUAuH%3D_OsDL-F517MJuxfOri7off957BCBH17LH2H7Pu2Q%40mail.gmail.com.

Sameer Agarwal

unread,
Aug 5, 2017, 2:06:04 AM8/5/17
to ceres-...@googlegroups.com
On Thu, Aug 3, 2017 at 4:15 PM Pierre Moulon <pmo...@gmail.com> wrote:
Sameer,

Thanks for your answer.

I know that it happens only in multithread mode (I put it in the email subject).

My point is, that if you are looking for reproducibility, given that this is happening in multithreaded mode, that is already a lost cause.

That said, I have not looked at this fragment of code in a long time, I can go back and look to see what if anything it is doing.
 

If you it some concerns around the lock contention does it means that the parallelism is not good at this place?

I tried some bundle_adjuster on a 40 cores machine and I never meet trouble even when I disabled the shuffle.

./bin/bundle_adjuster --input=../problem-245-198739-pre.txt -num_threads=40 --linear_solver=sparse_schur -explicit_schur_complement=true


I suggest the following:
 - Think again if this shuffle is really required or just remove it since the I don't see any clear proof that it will create any lock contention (Does the lock contention can depends on the problem configuration?)

-> Did you have any concern about the non const thread_it variables ?

Yes they should be const.  Send me a CL :)

Sameer



Regards,
Pierre

2017-08-03 15:45 GMT-07:00 'Sameer Agarwal' via Ceres Solver <ceres-...@googlegroups.com>:
Pierre,
The shuffle only happens when the solver is running in multithreaded mode, in which case there is no determinism anyways. I remember adding this because of some concerns around lock contention in the underlying blocksparsematrix object holding the schur complement and the way the matrix rows had been sorted earlier.

Sameer



On Thu, Aug 3, 2017 at 2:17 PM Pierre Moulon <pmo...@gmail.com> wrote:
Hello,

I noticed that a "random_shuffle" is used in "schur_eliminator_impl.h".
https://github.com/ceres-solver/ceres-solver/blob/0859fe8a57620b6dcc592741127963de1d463bbc/internal/ceres/schur_eliminator_impl.h#L156

For my point of view this shuffle is not required and it open a potential back door for no reproducibility of results.

Please, can you provide some explanation about why this shuffle is required?

Thank you.

PS: 
I noticed that the variables "thread_id" are not set to const.
It would be better to use const X in order to avoid the value to be changed by any new contributions.

Regards,
Pierre

--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ceres-solver/CADjmpz_9-TBK1FwJoM6dXy08cD6zEZN4Mckar7SomX_qJ7521g%40mail.gmail.com.

Pierre Moulon

unread,
Aug 5, 2017, 7:00:28 PM8/5/17
to ceres-...@googlegroups.com
Sameer,

I made a CL for the const issue: https://ceres-solver-review.googlesource.com/#/c/ceres-solver/+/10860/

Regards,
Pierre M

Sameer Agarwal

unread,
Aug 5, 2017, 7:08:39 PM8/5/17
to ceres-...@googlegroups.com
Thanks Pierre, it is merged!

--
You received this message because you are subscribed to the Google Groups "Ceres Solver" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ceres-solver...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages