Checkpoints in Gem5-GPU and counting ticks


Daniel Gerzhoy

Oct 13, 2016, 12:36:45 PM
to gem5-gpu Developers List
Hi all,

Say I want to create a checkpoint.

The help options say:

--take-checkpoints=TAKE_CHECKPOINTS <M,N> take checkpoints at tick M and every N ticks thereafter

So, for example, would it be:

...gem5.opt whatever.py --take-checkpoints=<3000000,1000000> ...etc

That is, start at 3 million ticks and take a checkpoint every million ticks after that. Right? Or is it a different syntax?

Also, how do I count the ticks of gem5 vs. the ticks in GPGPU-Sim? Are they exactly the same?

Where is the particular tick variable that the checkpoint option interacts with?

Thanks,

Dan

Jason Lowe-Power

Oct 13, 2016, 3:13:12 PM
to Daniel Gerzhoy, gem5-gpu Developers List
Hi Dan,

First, let me say what I usually do for checkpoints. All of my applications are annotated with region of interest (ROI) begin/end magic instructions. I then run gem5 with the --work-begin-exit-count=1 option, which exits simulation at the beginning of the ROI. I also specify --checkpoint-at-end, which takes a checkpoint when the simulation exits. With these two options, I can generate a checkpoint at the beginning of the ROI.

Second, as a general statement, checkpointing is controlled by the Python config scripts. You just need to call m5.checkpoint(<path>) any time simulation is not running (e.g., after m5.simulate() returns).
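
To make the "checkpoint while simulation is not running" point concrete, here is a minimal sketch of what that could look like inside a config script. The helper name simulate_then_checkpoint and the directory name in the usage note are hypothetical; m5.simulate(), m5.checkpoint() and the exit event's getCause() are the calls this thread is about. The m5 module is passed in as an argument only so the snippet can be read and imported outside a gem5 binary.

```python
def simulate_then_checkpoint(m5, checkpoint_dir, ticks=None):
    """Run until simulation exits, then write a checkpoint.

    m5             -- the gem5 Python module (passed in; hypothetical design)
    checkpoint_dir -- directory the checkpoint is written to
    ticks          -- optional number of ticks to simulate from the current tick
    """
    # m5.simulate() blocks until an exit event occurs and returns that event.
    if ticks is None:
        exit_event = m5.simulate()
    else:
        exit_event = m5.simulate(ticks)

    # Simulation is stopped at this point, so a checkpoint can be taken.
    m5.checkpoint(checkpoint_dir)

    # Report why simulation stopped (for example, an ROI-begin exit).
    return exit_event.getCause()
```

In a real config script this would be called after the system has been instantiated, for example cause = simulate_then_checkpoint(m5, "m5out/cpt.roi"), where the directory name is only an illustration.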

Finally, to answer your specific question, I think the syntax for the take-checkpoints option is just --take-checkpoints=1000,10. See (around) line 230 in gem5/configs/common/Simulation.py. 
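
As an illustration of the "M,N" form, the sketch below splits such a string into a first tick and a period and lists the ticks at which checkpoints would be requested. This is a standalone approximation, not the code from Simulation.py; the function names and the max_tick cutoff are assumptions made for the example. It also shows why the angle brackets from the help text should not be typed literally: they are only placeholder notation, and most shells would treat < and > as redirection anyway.

```python
def parse_take_checkpoints(value):
    """Split an 'M,N' string into (first_tick, period) as integers."""
    when, period = value.split(",", 1)
    when, period = int(when), int(period)
    if when < 0 or period <= 0:
        raise ValueError("expected M >= 0 and N > 0, got %r" % value)
    return when, period


def checkpoint_ticks(value, max_tick):
    """List the ticks up to max_tick at which a checkpoint would be requested."""
    when, period = parse_take_checkpoints(value)
    return list(range(when, max_tick + 1, period))


# Values from the question: first checkpoint at 3,000,000 ticks,
# then one every 1,000,000 ticks.
first, period = parse_take_checkpoints("3000000,1000000")
schedule = checkpoint_ticks("3000000,1000000", 6000000)
print(first, period)
print(schedule)

# Literal angle brackets are not valid integers and are rejected.
try:
    parse_take_checkpoints("<3000000,1000000>")
except ValueError as err:
    print("rejected:", err)
```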

GPGPU-Sim does use the same ticks as gem5. It is a clocked object just like everything in gem5. See the GPGPUSimComponentWrapper object in gem5-gpu/src/gpu/gpgpu-sim/ for how exactly this works.
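
Because both simulators share one tick counter, what differs between components is the clock period, not the tick unit. The sketch below converts cycles to ticks under the assumption that one tick is one picosecond (10^12 ticks per second), which I believe is gem5's default; the clock frequencies are illustrative and not taken from any gem5-gpu configuration, and gem5's own rounding of clock periods may differ from the round() used here.

```python
TICKS_PER_SECOND = 10**12  # assumption: 1 tick = 1 ps


def clock_period_ticks(frequency_hz, ticks_per_second=TICKS_PER_SECOND):
    """Length of one clock cycle in ticks, rounded to a whole tick."""
    return round(ticks_per_second / frequency_hz)


def cycles_to_ticks(cycles, frequency_hz):
    """Number of ticks spanned by `cycles` cycles of a clock at frequency_hz."""
    return cycles * clock_period_ticks(frequency_hz)


# Illustrative clocks: a 2 GHz CPU and a 1 GHz GPU.
cpu_period = clock_period_ticks(2 * 10**9)
gpu_period = clock_period_ticks(10**9)
print(cpu_period, gpu_period)

# 3000 GPU cycles at 1 GHz correspond to 3,000,000 ticks.
print(cycles_to_ticks(3000, 10**9))
```

Under these assumptions, a checkpoint tick of 3,000,000 falls 3000 cycles into a 1 GHz clock domain and 6000 cycles into a 2 GHz one.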

Just FYI, gem5-gpu does not support checkpointing during kernel execution (in fact, it may or may not work after any kernel has been run). This is why I only use checkpointing at the start of the ROI, before any kernels have executed. I doubt you want to use periodic checkpointing (--take-checkpoints). I would either use an ROI, or specify an end tick (-m) and checkpoint at the end.

Let us know if you have more questions.

Cheers,
Jason