Exact reproducibility

40 views
Skip to first unread message

S.Park

unread,
May 17, 2023, 11:15:50 AM5/17/23
to mcx-users
Hi Dr. Fang,

I would like to exactly reproduce the MCX photon simulation, using the input data format of 'muamus_short'.
When I tested it, I observed the floating point summation error.

Then, I checked the relevant previous conversation (https://groups.google.com/g/mcx-users/c/E9VgPb4q62M/m/d6G_9Dr7AgAJ):
my previous comment on "exact reproducibility is not guaranteed" is for the output fluence - because accumulation of floating point numbers from multi-threads may produce different outputs due to the execution order (i.e. limited precision floating-point summation is not commutable). However, that comment is not true for RNG seeds. The host/CPU side random number seeds are expected to reproducible; as a result, the RNG seed used for each photon inside each thread are also supposed to be exactly reproducible.

My questions are:
1. Does this error exist in MMC as well (because of the same reason that you explained in the previous conversation above)?
2. Do you think the use of '__syncthreads()' in the kernel code of MCX can help?

Looking forward to hearing from you.

Best,
Seonyeong

Qianqian Fang

unread,
May 17, 2023, 12:28:25 PM5/17/23
to mcx-...@googlegroups.com, S.Park
please explain what did you mean by "floating point summation error"? I did not see this information in previous messages.
--
You received this message because you are subscribed to the Google Groups "mcx-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mcx-users+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mcx-users/9799a093-5ba4-455d-891e-9cb947c1583bn%40googlegroups.com.


Qianqian Fang

unread,
May 17, 2023, 11:56:14 PM5/17/23
to S P, mcx-...@googlegroups.com

it should not be called an "error" - it is the expected (and frequently observed) behavior of floating-point operations in a multi-threading environment.

the source of the problem is because MC output is achieved by accumulating small floating-point numbers at each voxel as every photon packet passes by. Therefore, the output value can be ultimately written as

v = (((vt1 + vt2) + vt3) + vt4) + ...

here t_i refers to the packet simulated by the i-th thread. However, in a multi-threading environment, the order of this summation is not guaranteed, when rerunning the exact computation on the exact the same hardware, the summation could be done as

v = (((vt2 + vt3) + vt4) + vt1) + ...

floating-point has limited precision, and (a+b)+c is different from a+ (b+c) in such a finite precision (i.e. floating-point addition is not associative), therefore, you are expected to produce different results.

the only case where such a result is exactly reproducible is when the thread scheduler executes threads in an deterministic logic/order, which can not be guaranteed by most multi-threading environments. In general, CUDA's runtime has exhibit a relatively stable thread scheduling scheme, and in some cases, you can see nearly reproducible results, however, to be exactly reproducible is not guaranteed.


there has been numerous discussions on this topic, and making results bit-to-bit reproducible even on the same hardware is quite difficult and requires manual handling of threads - __syncthreads() will not be enough because it only asks all threads to stop at the same point, but does not guarantee their orders.


http://www.shodor.org/media/content//petascale/materials/UPModules/dynamicProgrammingPartI/bestPractices.pdf   (see Section 7.2.2)

https://annals-csis.org/Volume_5/pliks/86.pdf

https://hal.science/hal-00949355/document

https://www.sciencedirect.com/science/article/abs/pii/S0167819115001155

https://www-pequan.lip6.fr/~jezequel/ARTICLES/article_CANA2015.pdf

https://www.nist.gov/system/files/documents/itl/ssd/is/NRE-2015-07-Nguyen_slides.pdf



On 5/17/23 14:31, S P wrote:
Hi Dr. Fang,

Thank you for the prompt response!
I used "floating point summation error" to mean "the accumulation of floating point numbers from multi-thread". I apologize. I should have described it better. 
because accumulation of floating point numbers from multi-threads may produce different outputs

Best,
Seonyeong

S P

unread,
May 18, 2023, 12:00:35 AM5/18/23
to mcx-...@googlegroups.com
Dear Dr. Fang,

I really appreciate your detailed explanation and relevant resources.

Best,
Seonyeong

DA V

unread,
May 19, 2023, 6:06:34 AM5/19/23
to mcx-...@googlegroups.com
Hi!
Finally, I found the error. I was considering an annular set of detectors in order to increase the SNR, but when running the simulation some of those detectors weren't located at an interface (what is very strange, because if I plot the  z, volume and all the detectors, they are clearly at the same z (z=0), so in the exact interface air-medium . By fixing that, everything begins to work fine. 

Best,
Demián.

Reply all
Reply to author
Forward
0 new messages