Accel-sim determinism

Rodrigo Huerta Gañan

<rodrigo.huerta.ganan@upc.edu>
Jul 30, 2021, 8:40:20 AM
to accel-sim
Hi, I have run the Rodinia 2.0 benchmark several times with the same configuration (the same traces and the same V100-SASS .config file from the GitHub example) and collected the IPC stats. I am surprised that the IPC differs from one execution to another.

I thought that, since the simulator replays a trace and I always use the same traces, the IPC would always be the same.

I have also collected a trace of the vecAdd example app with NVBit on a GTX 1080 Ti and simulated it with the model of that GPU, and I see the same behavior, although the differences are smaller because the test is smaller.

Could you please tell me what causes these IPC changes?

I attach some images of tables with the outputs of Rodinia 2.0 and vecAdd.


vecAddDet.png
rodinia2IpcDet.png

Mahmoud Khairy

<khairy2011@gmail.com>
Aug 7, 2021, 2:03:51 PM
to accel-sim
Sorry for our late reply. Are you using the dev or the release branch?

Rodrigo Huerta Gañan

<rodrigo.huerta.ganan@upc.edu>
Aug 7, 2021, 2:06:37 PM
to accel-sim
I downloaded release version 1.1.0 from here: https://github.com/accel-sim/accel-sim-framework/releases

Mahmoud Khairy

<khairy2011@gmail.com>
Aug 7, 2021, 2:07:38 PM
to accel-sim
OK, and I assume you did not make any changes?
Also, can you please check the number of instructions and cycles? Which one changes between runs?

Rodrigo Huerta Gañan

<rodrigo.huerta.ganan@upc.edu>
Aug 7, 2021, 2:11:18 PM
to accel-sim
In both tests (vecAdd and Rodinia 2.0), what changes is the number of cycles. The number of instructions is constant between runs.
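Since IPC is instructions divided by cycles, a constant instruction count combined with a varying cycle count is enough to make IPC differ between runs. A minimal Python sketch of that arithmetic (Python only because the Accel-Sim job-launching scripts are Python; the counts and helper names below are hypothetical, not taken from these runs or from Accel-Sim):

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle for one simulated run (hypothetical helper)."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def relative_diff_pct(a: float, b: float) -> float:
    """Absolute difference of b relative to a, in percent (hypothetical helper)."""
    return abs(b - a) / a * 100.0


# Hypothetical counts for two runs of the same trace:
# identical instruction counts, slightly different cycle counts.
run1 = ipc(1_000_000, 500_000)
run2 = ipc(1_000_000, 502_000)
print(f"run1 IPC={run1:.4f} run2 IPC={run2:.4f} "
      f"diff={relative_diff_pct(run1, run2):.3f}%")
```

With the instruction count fixed, any run-to-run IPC difference comes entirely from the cycle count, which is why checking both numbers separately narrows down where the nondeterminism lives.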

Mahmoud Khairy

<khairy2011@gmail.com>
Aug 7, 2021, 2:17:46 PM
to accel-sim
OK, this seems to be a bug. I opened an issue on GitHub to keep track of it. We will try to reproduce this issue on our end and see what is wrong.

Thanks!

Mahmoud Khairy

<khairy2011@gmail.com>
Aug 7, 2021, 2:18:29 PM
to accel-sim

Rodrigo Huerta Gañan

<rodrigo.huerta.ganan@upc.edu>
Aug 7, 2021, 2:30:41 PM
to accel-sim
OK, thank you. I will follow what people say about it here and on GitHub.

Mahmoud Khairy

<khairy2011@gmail.com>
Aug 9, 2021, 1:01:10 PM
to accel-sim
Hi, I reran rodinia_2.1-ft twice using the release branch with the V100 config file, and the IPC results are the same (0% error). So I am not able to reproduce your behavior. Could you please send us the details of your runs (i.e., the exact command lines)? Thanks!

Rodrigo Huerta Gañan

<rodrigo.huerta.ganan@upc.edu>
Aug 9, 2021, 3:03:04 PM
to accel-sim
The commands I ran are:
To launch the test: ./util/job_launching/run_simulations.py -B rodinia_2.0-ft -C QV100-SASS -T ./hw_run/rodinia_2.0-ft/11.0/ -N gForum2
To monitor: ./util/job_launching/monitor_func_test.py -v -N gForum2
To get stats: ./util/job_launching/get_stats.py -N gForum2 | tee gForum2.csv

P.S.: I used the same commands, changing 2 to 1, to obtain the other file.
P.P.S.: I attach both stats files; if you look at the IPC stat, you will see different numbers for the two runs.

gForum2.csv
gForum1.csv

Rodrigo Huerta Gañan

<rodrigo.huerta.ganan@upc.edu>
Aug 12, 2021, 5:01:42 AM
to accel-sim
I have been running the same Rodinia 2 tests in release mode (previously I was using debug) and I see the same differences in IPC and other stats.
Here are my software versions and some details about my machine, even though I don't think they matter, because I use your Rodinia 2 traces for V100.
CPU: i7 8700k
GPU: GTX1080TI
CUDA: 11.0
DRIVER: 450.142.00
OS: Ubuntu 20.04

Moreover, I have been looking into the code to find the cause, and I have noticed the following:
- The linear_to_raw_address_translation::addrdec_tlx method calls rand(), but I don't think this is the cause: I believe the seed is set with srand(1) in gpgpu_trace_sim_init_perf_model or gpgpu_ptx_sim_init_perf, so the seed is always the same and the same values are expected.
- The intersim2/booksim2 files use random numbers in many places, but I think they always use the same seed, so the same values are expected.
- The debug_tools subfolder has many references to random numbers, but I didn't look into them.
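The reasoning in the first two points depends on seeded pseudo-random generators being reproducible. A minimal sketch of that property, using Python's random.Random as a stand-in for C's srand()/rand() (an analogy only, not Accel-Sim code; draw() is a hypothetical helper):

```python
import random


def draw(seed: int, n: int) -> list[int]:
    """Return n pseudo-random ints from a generator seeded with `seed`.

    Analogous to calling srand(seed) and then rand() n times in C,
    but using Python's own generator rather than the C library's.
    """
    rng = random.Random(seed)
    return [rng.randrange(2**31) for _ in range(n)]


# The same seed yields the same sequence on every run, so a
# fixed-seed generator by itself does not cause run-to-run variation.
first = draw(1, 5)
second = draw(1, 5)
print(first == second)  # True
```

A fixed seed only guarantees identical values if the calls happen in the same number and order in every run. C's rand() shares one global state, so any code path that draws an extra value shifts every later value. A fixed seed therefore narrows the search but does not rule out nondeterminism elsewhere.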

This is what I found; I hope it is useful to you.

Mahmoud Khairy

<khairy2011@gmail.com>
Aug 12, 2021, 9:56:29 AM
to accel-sim
Yes, I can confirm you are right. We have found this behavior too. The first time, I checked only a few apps (backprop and hotspot), but when I looked at the other apps, they did not show consistent results across identical runs, with an error of <1%.
We are working on figuring it out. Thanks!