Issues with random number generator


Bastiaan

Jul 25, 2018, 6:07:40 AM
to mcx-users
Dear Fang,

I ran three identical simulations, each seeded with RNGseed=7777777. Attached is a figure in which I plot the seeds of the detected photons (these seeds are stored in the .mch file and are used to rerun the simulation). I expected the seed data for the 3 identical runs to lie on top of each other. However, this is not the case.

Can you help me understand what is going on?

Also attached are the 6 .json files that were used to run the simulations (3 identical shape files and 3 identical parameter files). I could also send you the .mch files.

I am using mcx2017.7 on a 64-bit Windows machine.


Thanks for your help. 


Kind regards,

Bastiaan
RNGSeeds_777777__20180725_110657__20180725_110648__20180725_111253.png
Parameters_g_0p9_usprime_10_df_1_Np_1000000_a_4_vx_0p01_seed_7777777_RUN1.json
Parameters_g_0p9_usprime_10_df_1_Np_1000000_a_4_vx_0p01_seed_7777777_RUN2.json
Parameters_g_0p9_usprime_10_df_1_Np_1000000_a_4_vx_0p01_seed_7777777_RUN3.json
Shape_g_0p9_usprime_10_df_1_Np_1000000_a_4_vx_0p01_seed_7777777_RUN1.json
Shape_g_0p9_usprime_10_df_1_Np_1000000_a_4_vx_0p01_seed_7777777_RUN2.json
Shape_g_0p9_usprime_10_df_1_Np_1000000_a_4_vx_0p01_seed_7777777_RUN3.json

Qianqian Fang

Jul 25, 2018, 12:06:54 PM
to mcx-...@googlegroups.com, Bastiaan

hi Bastiaan

please try the nightly build. I am about to announce MCX 2018 (version 1.0 final - yeah!); the nightly build could become the final package if users find no issues.

there have been many, many improvements/fixes. I don't see that they are related to the issue reported here, but please give it a try: http://mcx.space/nightly/

in addition, you can test the following:

1. remove the RNGSeed fields from the input file; you should then see reproduced results (the default RNG seed is fixed)

2. use -E 7777777 on the command line, and see if the results are reproduced.
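
For the first test, a small script can strip the field instead of editing each file by hand. This is only a sketch: the function name `strip_rng_seed` and the example layout (a "Session" block holding "ID" and "Photons") are assumptions made for illustration, so check them against your own parameter files. The function removes any key named "RNGSeed" wherever it occurs, so it does not depend on that layout.

```python
import json


def strip_rng_seed(node):
    """Return a copy of a parsed JSON structure without any "RNGSeed" key."""
    if isinstance(node, dict):
        return {k: strip_rng_seed(v) for k, v in node.items() if k != "RNGSeed"}
    if isinstance(node, list):
        return [strip_rng_seed(v) for v in node]
    return node


# Hypothetical input for illustration only; a real parameter file has more fields.
example = json.loads(
    '{"Session": {"ID": "run1", "RNGSeed": 7777777, "Photons": 1000000}}'
)
cleaned = strip_rng_seed(example)

# The original dict is left untouched; the cleaned copy has no seed field.
print(json.dumps(cleaned, indent=2))
```

To apply it to a file, load the file with `json.load`, pass the result through `strip_rng_seed`, and write it back with `json.dump` under a new name so the seeded original is preserved.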

one thing I want to clarify here is that, in theory, bit-for-bit reproducible results from parallel programs are generally not expected. This is because (a+b)+c != a+(b+c) in floating point, and in a multi-threaded environment the order in which values are accumulated can change from run to run. However, nvidia's hardware has been giving excellent reproducibility due to some fixed thread scheduling method, so mcx has been able to generate reproducible results. In comparison, MMC is not able to produce exactly the same results using OpenMP on CPUs for this very reason.
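
The non-associativity mentioned above is easy to check on any machine with IEEE double precision. The snippet below is a generic illustration in Python, not MCX code; the values are chosen only to make the rounding visible.

```python
# Same three numbers, two groupings: the last bit of the result differs.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)   # False
print(left, right)     # 0.6000000000000001 0.6

# With a large dynamic range the difference is no longer tiny:
# adding 1.0 to 1e16 is lost to rounding before the subtraction happens.
lost = (1e16 + 1.0) - 1e16
kept = (1e16 - 1e16) + 1.0
print(lost, kept)      # 0.0 1.0
```

In the first pair the two results agree to within machine precision but are not bit-identical, which is the situation a parallel accumulation typically ends up in.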

Qianqian




Bastiaan Florijn

Jul 26, 2018, 4:20:33 AM
to Qianqian Fang, mcx-...@googlegroups.com
Dear Qianqian,

Thanks for your quick reply. 

I have tested both your suggestions and the results are not reproducible; see the attachments. DefaultSeeds.png shows the results for your first suggestion. Seeds-E7777777flag.png shows the results for your second suggestion.

Furthermore, your comment

 'one thing I want to clarify here is that, in theory, creating reproduced results from parallel programs is generally not expected. This is because (a+b)+c != a+ (b+c) in floating point in multi-threading environment. However, nvidia's hardware has been giving excellent reproducibility due to some fixed thread scheduling method, so mcx has been able to generate reproducible results. In comparison, MMC is not able to produce exactly the same results using OpenMP on CPUs due to the very reason. '


concerns me. The reason we can do parallel programming at all is data parallelism, which is supposed to guarantee the same result. If that guarantee held, then computing (a+b) on the first core, holding c on the second core, and then using the second core to obtain the final result should give a floating-point number that agrees, to within machine precision, with the result of doing it the other way round.
Are you saying that, because these are parallel programs, this property no longer holds?

Kind regards,

Bastiaan


DefaultSeeds.png
Seeds-E777777flag.png

Qianqian Fang

Jul 26, 2018, 11:15:17 AM
to Bastiaan Florijn, mcx-...@googlegroups.com
On 07/26/2018 04:20 AM, Bastiaan Florijn wrote:
Dear Qianqian,

Thanks for your quick reply. 

I have tested both your suggestions and the results are not reproducible; see the attachments. DefaultSeeds.png shows the results for your first suggestion. Seeds-E7777777flag.png shows the results for your second suggestion.

oh, wait, I think I misunderstood your plots. you plotted the RNG's seeds?
well, I am not sure how you cast the seed data (each thread uses a 64bit
state space), but the misalignment of the plots does not necessarily mean
the seeds are not reproduced. they may come out in a different order every
time you run the simulation, but when you sort them, they should still be
the same set of seeds.

if your captured photon count is identical in all 3 cases, I am quite confident
that the seeds are the same set, just randomly shuffled.
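
One way to check this is to compare the seeds as unordered collections rather than plotting them in the order they were saved. The sketch below assumes each run's seeds have already been read into a Python list with one `bytes` object per detected photon; how they are read from the .mch file is not shown here, and the helper name `same_seed_set` and the toy data are hypothetical.

```python
from collections import Counter


def same_seed_set(run_a, run_b):
    """True if both runs hold the same seeds, ignoring the order they appear in."""
    # Counter compares multisets, so a seed that occurs twice must occur twice in both.
    return Counter(run_a) == Counter(run_b)


# Toy data standing in for per-photon seed bytes from three runs.
run1 = [b"\x01\x02", b"\x0a\x0b", b"\xff\x00"]
run2 = [b"\xff\x00", b"\x01\x02", b"\x0a\x0b"]  # same seeds, shuffled
run3 = [b"\x01\x02", b"\x0a\x0b", b"\xff\x01"]  # one seed differs

print(run1 == run2)               # False: the order differs
print(same_seed_set(run1, run2))  # True: same seeds after ignoring order
print(same_seed_set(run1, run3))  # False: genuinely different seeds
```

Comparing raw bytes avoids any question about how the seed data is cast to numbers before plotting.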


Furthermore, your comment

 'one thing I want to clarify here is that, in theory, creating reproduced results from parallel programs is generally not expected. This is because (a+b)+c != a+ (b+c) in floating point in multi-threading environment. However, nvidia's hardware has been giving excellent reproducibility due to some fixed thread scheduling method, so mcx has been able to generate reproducible results. In comparison, MMC is not able to produce exactly the same results using OpenMP on CPUs due to the very reason. '


concerns me. The reason we can do parallel programming at all is data parallelism, which is supposed to guarantee the same result. If that guarantee held, then computing (a+b) on the first core, holding c on the second core, and then using the second core to obtain the final result should give a floating-point number that agrees, to within machine precision, with the result of doing it the other way round.
Are you saying that, because these are parallel programs, this property no longer holds?

that floating-point addition is not associative is well known in the parallel
computing community, see

https://arxiv.org/pdf/1312.3300.pdf

however, the NVIDIA hardware has some deterministic scheduling scheme,
which makes the results reproducible in most cases.
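
The effect of accumulation order can be mimicked on a CPU. The sketch below is not how MCX or the GPU scheduler works; it only stands in for "threads contributing in a fixed order" versus "threads contributing in a different order" by summing the same values in two different sequences. The helper name `accumulate` is hypothetical.

```python
import math
import random


def accumulate(values, order):
    """Add values one by one in the given index order, like a serial accumulator."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [random.Random(0).uniform(0.0, 1.0) for _ in range(1)]  # placeholder, replaced below
rng = random.Random(0)
values = [rng.uniform(0.0, 1.0) for _ in range(100_000)]

fixed_order = list(range(len(values)))
shuffled_order = fixed_order[:]
random.Random(1).shuffle(shuffled_order)

run_a = accumulate(values, fixed_order)
run_b = accumulate(values, fixed_order)     # same order as run_a
run_c = accumulate(values, shuffled_order)  # same values, different order

print(run_a == run_b)                            # True: fixed order is bit-reproducible
print(abs(run_a - run_c))                        # typically tiny but not necessarily zero
print(math.isclose(run_a, run_c, rel_tol=1e-9))  # True: agreement up to rounding error
```

So both statements in the thread are compatible: results from different orderings agree to roughly machine precision, yet only a fixed ordering guarantees identical bits.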

Qianqian

Bastiaan

Jul 30, 2018, 10:06:07 AM
to mcx-users
Dear Qianqian,

Thanks for your quick reply. I downloaded the nightly build and everything works as expected! Across different initial seeds, I see a spread in one of the stochastic parameters that I'm after, and this spread decreases as the number of launched photons increases.
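
The same behaviour can be reproduced with a toy Monte Carlo estimate that has nothing to do with photon transport: the spread of an estimate across seeds shrinks as the number of samples grows. The functions `estimate` and `spread_across_seeds` below are hypothetical helpers for this illustration only.

```python
import random
import statistics


def estimate(n_samples, seed):
    """Monte Carlo estimate of the mean of a uniform(0, 1) variable."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n_samples)) / n_samples


def spread_across_seeds(n_samples, seeds):
    """Standard deviation of the estimate over several independent seeds."""
    return statistics.stdev([estimate(n_samples, s) for s in seeds])


seeds = range(20)
for n in (100, 10_000):
    # The spread is expected to shrink roughly like 1/sqrt(n).
    print(n, spread_across_seeds(n, seeds))
```

Each individual seed still gives an exactly repeatable estimate; only the comparison between seeds shows the statistical spread.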

Thanks also for your link to the article. 

Kind regards,

Bastiaan
