Checkpoints in Full System Mode

Robert Bixler

May 6, 2014, 9:44:53 AM
to gem5-g...@googlegroups.com
Hello,

I am having some issues figuring out how to do checkpointing in Full System Mode with x86. Basically, I want a checkpoint after booting up the system.

I'm a little confused on the particulars. This post seems to indicate that I need to use the MOESI_hammer protocol to do checkpointing properly:


Two things I'm unsure of: how to use the MOESI_hammer protocol when building gem5-gpu with scons, and how to then do checkpointing.

In my attempt to use the MOESI_hammer protocol, I used this command: scons build/MOESI_hammer/gem5.opt --default=X86 EXTRAS=../gem5-gpu/src:../gpgpu-sim/ PROTOCOL=MOESI_hammer GPGPU_SIM=True

Does this give me the MOESI_hammer protocol instead of VI_hammer, which is used in the example in the quickstart guide (https://gem5-gpu.cs.wisc.edu/wiki/start)?

I then ran full-system mode with: build/MOESI_hammer/gem5.opt ../gem5-gpu/configs/fs_fusion.py --kernel=/pathToKernel/x86_64-vmlinux-2.6.22.9 --disk-image=/pathToDiskImage/linux-x86.img --cpu-type=atomic

This seems to work: I am able to connect with m5term and create a checkpoint. However, when I exit the simulation and try to restore the checkpoint with: build/MOESI_hammer/gem5.opt ../gem5-gpu/configs/fs_fusion.py --kernel=/pathToKernel/x86_64-vmlinux-2.6.22.9 --disk-image=/pathToDiskImage/linux-x86.img --checkpoint-restore=1 --restore-with-cpu=atomic

I get this error:
fatal: Can't unserialize 'system.physmem.store0:range_size'
 @ cycle 198216027826000
[paramIn:build/MOESI_hammer/sim/serialize.cc, line 228]

I previously had this problem with gem5 when I tried to restore from a checkpoint that, I believe, was created with a timing CPU. I thought the fix was to create the checkpoint with an atomic CPU, but that doesn't seem to work here.

Could someone help address any misconceptions I have and help me figure out what I am doing wrong?





Jason Power

May 6, 2014, 10:09:59 AM
to Robert Bixler, gem5-g...@googlegroups.com
Hi Robert,

To answer your questions:

1) You described the correct way to compile gem5-gpu with MOESI_hammer. (BTW, you can restore the checkpoint using any protocol, including VI_hammer.)
2) You need to use the timing CPU to take checkpoints.
3) You should use the --restore-with-cpu=timing option when restoring.
4) I would use the hack_back rcS file (gem5/configs/boot/hack_back_ckpt.rcS). This script file boots Linux and then takes a checkpoint immediately after boot. Then you can load the checkpoint, change the script, and things "just work."
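Points 2 to 4 can be condensed into a small Python helper that assembles the checkpoint-collection and restore command lines. This is a minimal sketch under stated assumptions: it uses only the flags that appear in this thread, the binary, kernel, and disk-image paths are placeholders, and build_fs_cmds is a hypothetical helper, not part of gem5 or gem5-gpu.

```python
import shlex

# Placeholder paths (assumptions): substitute your own build, kernel and disk image.
GEM5 = "build/MOESI_hammer/gem5.opt"
CONFIG = "../gem5-gpu/configs/fs_fusion.py"
KERNEL = "/pathToKernel/x86_64-vmlinux-2.6.22.9"
DISK = "/pathToDiskImage/linux-x86.img"


def build_fs_cmds(run_script, cpt_num=1):
    """Return (collect, restore) argv lists for a take-then-restore workflow.

    Hypothetical helper; only flags that appear in this thread are used.
    """
    common = [GEM5, CONFIG, f"--kernel={KERNEL}", f"--disk-image={DISK}"]
    # Point 2 and 4: take the checkpoint with the timing CPU, using
    # hack_back_ckpt.rcS so the checkpoint is dropped right after boot.
    collect = common + ["--cpu-type=timing",
                        "--script=configs/boot/hack_back_ckpt.rcS"]
    # Point 3: restore with the timing CPU and pass the script to run next.
    restore = common + [f"--script={run_script}",
                        f"--checkpoint-restore={cpt_num}",
                        "--restore-with-cpu=timing"]
    return collect, restore


if __name__ == "__main__":
    collect, restore = build_fs_cmds("../runscript.rcS")
    # Print the commands; subprocess.run(cmd, check=True) would execute them.
    print(shlex.join(collect))
    print(shlex.join(restore))
```

Printing the commands first makes it easy to compare them with what was actually run before launching a long simulation.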

Let us know if you still have more problems.

Jason

Robert Bixler

May 6, 2014, 10:54:57 AM
to gem5-g...@googlegroups.com, Robert Bixler
Hello Jason,

Thanks for your reply.

I tried using the timing CPU to take and restore the checkpoints, and I still get the same error message. Here are the commands that I used:

build/MOESI_hammer/gem5.opt ../gem5-gpu/configs/fs_fusion.py --kernel=/PathToKernel/x86_64-vmlinux-2.6.22.9 --disk-image=/PathToImage/linux-x86.img --cpu-type=timing
build/MOESI_hammer/gem5.opt ../gem5-gpu/configs/fs_fusion.py --kernel=/PathToKernel/x86_64-vmlinux-2.6.22.9 --disk-image=/PathToImage/linux-x86.img --checkpoint-restore=1 --restore-with-cpu=timing

And I again received the following error:

warn: Reading current count from inactive timer.
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
fatal: Can't unserialize 'system.physmem.store0:range_size'
 @ cycle 91140219746000
[paramIn:build/MOESI_hammer/sim/serialize.cc, line 228]
Memory Usage: 207712 KBytes

I'll try running the script file and see if that works instead. Can you think of any reason why I would still be unable to unserialize the physical memory when using the timing CPU both to create and to restore the checkpoint?

Thanks, I really appreciate the help.

Robert Bixler

May 6, 2014, 11:39:25 AM
to gem5-g...@googlegroups.com, Robert Bixler
Hi Jason,

Running the script did not solve my problem either, unfortunately; I still get the same error as above when running this command:

build/MOESI_hammer/gem5.opt ../gem5-gpu/configs/fs_fusion.py --kernel=/home/rbix/proj/full_system_images/x86-system/binaries/x86_64-vmlinux-2.6.22.9 --disk-image=/home/rbix/proj/full_system_images/x86-system/disks/linux-x86.img --cpu-type=timing  --script=configs/boot/hack_back_ckpt.rcS




Robert Bixler

May 7, 2014, 9:46:15 AM
to gem5-g...@googlegroups.com, Robert Bixler
I'm at quite a loss here. I'm able to create and resume from checkpoints (both timing and atomic) using plain gem5. Is there some gem5-gpu setting that could keep checkpoints from working with gem5-gpu when they work with gem5?

If anyone has any suggestions on how to figure this out, it would certainly save me a lot of time.

Thanks for any help!

Joel Hestness

May 7, 2014, 11:16:05 PM
to Robert Bixler, gem5-gpu developers
Hi Robert,
  I ran a few tests today using the latest version of gem5-gpu and the same command lines that you've supplied here, and I am unable to reproduce the error you're experiencing.

  Are you using the latest version of gem5-gpu?  From what I recall, the "physmem" object name and "range_size" parameter have not been used by Ruby (and by extension, gem5-gpu coherence protocols) for some time now. So, if you're using an older version of gem5-gpu, we'd need a bunch more info about your configuration (versions of repos) to be able to help.  Can you let us know?

  Thanks,
  Joel

--
  Joel Hestness
  PhD Student, Computer Architecture
  Dept. of Computer Science, University of Wisconsin - Madison
  http://pages.cs.wisc.edu/~hestness/

Robert Bixler

May 8, 2014, 3:29:25 AM
to gem5-g...@googlegroups.com, Robert Bixler
Hello Joel,

Thanks for your help.

I tried rebuilding gem5-gpu and it still didn't work. I was using the guide here: https://gem5-gpu.cs.wisc.edu/wiki/start

I entered the commands from the quickstart pretty much line for line, using gem5 revision 9879 as indicated here: https://docs.google.com/spreadsheet/ccc?key=0AvwlHlT78qDYdG5pRENBUWNfQUw0dXctY1ZEZjFxMXc#gid=3

Is there something I'm doing wrong in setting up gem5-gpu? I previously set up gem5 in a different location; could there be conflicts between gem5-gpu and this somehow?

Joel Hestness

May 9, 2014, 10:44:53 AM
to Robert Bixler, gem5-gpu developers
Hi Robert,
  There shouldn't be any conflicts with gem5.

  I don't understand what the problem could be, given that there are no instances of "physmem" Python objects in our current codebase (these were removed after we updated past gem5 changeset 9826 back in Sept. 2013).  I'd recommend trying a couple of things:

  1) Try pulling and updating your gem5-patches, gpgpu-sim-patches, and gem5-gpu repos:

  % cd /path/to/gem5-gpu
  % cd gem5/.hg/patches
  % hg pull -u
  % cd ../../
  % hg qpop -a
  % hg qpush -a
  % cd ../gem5-gpu
  % hg pull -u
  % cd ../gpgpu-sim/.hg/patches
  % hg pull -u
  % cd ../../
  % hg qpop -a
  % hg qpush -a
  % cd ../gem5

  At this point you should be able to rebuild and be assured you're using the latest code.  If you don't see any files added/updated by the hg pull -u commands, something is really funky, since your error suggests that you're building with old code.  In this case, it might be a good idea to pull the current gem5-gpu repos to another location and rebuild from scratch.

  2) If the above still doesn't work, make sure that you're not using old or corrupted checkpoints: Delete your prior checkpoints and re-run the checkpoint collection sim.  This should ensure that gem5-gpu picks up the latest checkpoint that you collect.
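Step 2 can be scripted. A minimal sketch, assuming checkpoints are written to directories named cpt.<tick> inside the gem5 output directory (m5out by default, as seen later in this thread); remove_old_checkpoints is a hypothetical helper, not a gem5 tool.

```python
import re
import shutil
from pathlib import Path

# gem5 writes each checkpoint to a directory named cpt.<tick>,
# e.g. m5out/cpt.5573536067500.
CPT_DIR_RE = re.compile(r"cpt\.(\d+)$")


def remove_old_checkpoints(outdir="m5out", dry_run=True):
    """Find (and optionally delete) every cpt.<tick> directory in outdir.

    Hypothetical helper. With dry_run=True nothing is deleted; the matching
    paths are only returned so they can be inspected first.
    """
    root = Path(outdir)
    if not root.is_dir():
        return []
    found = sorted(p for p in root.iterdir()
                   if p.is_dir() and CPT_DIR_RE.match(p.name))
    if not dry_run:
        for p in found:
            shutil.rmtree(p)
    return found
```

Running it once with dry_run=True and then with dry_run=False, before re-running the collection command, leaves exactly one fresh checkpoint to restore from.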

  Joel


Jason Power

May 9, 2014, 11:53:21 AM
to Joel Hestness, Robert Bixler, gem5-gpu developers
Another possible problem is using the same build directory. Make sure that you have completely wiped out all prior builds and their metadata. I've been bitten by that one a few times.

Jason

Sohan Sharma

Feb 11, 2015, 8:29:51 AM
to gem5-g...@googlegroups.com
Hi Robert,

Were you able to fix the checkpointing issue? I am also getting a similar error message while trying to restore a checkpoint, but I'm not sure why. Here are the command lines that I am using to create and restore the checkpoint, and the error that I am getting.
Any pointers to fix this are appreciated.

build/X86_VI_hammer_GPU/gem5.opt ../gem5-gpu/configs/fs_fusion.py --kernel=/home/sohan/gem5-gpu-clean/full_system_images/disks/x86_64-vmlinux-2.6.28.4-smp --disk-image=/home/sohan/gem5-gpu-clean/gem5/full_system_images/disks/x86root.img --cpu-type=timing --script=configs/boot/hack_back_ckpt.rcS

build/X86_VI_hammer_GPU/gem5.opt ../gem5-gpu/configs/fs_fusion.py --kernel=/home/sohan/gem5-gpu-clean/full_system_images/disks/x86_64-vmlinux-2.6.28.4-smp --disk-image=/home/sohan/gem5-gpu-clean/gem5/full_system_images/disks/x86root.img --script=../runscript.rcs --checkpoint-restore=1 --restore-with-cpu=timing

runscript.rcs is the script I want to run after restoring the checkpoint.

fatal: Can't unserialize 'system:kernel_symtab.addr_32892'
 @ tick 5573535290000
[paramIn:build/X86_VI_hammer_GPU/sim/serialize.cc, line 230]
Memory Usage: 2460952 KBytes
Program aborted at tick 5573535290000
Aborted (core dumped)

Regards
Sohan

Joel Hestness

Feb 20, 2015, 3:07:14 PM
to Sohan Sharma, gem5-gpu developers
Hi Sohan,
  Are you using the latest revisions of gem5-gpu, or an older version? If it's an older version, can you let me know which repo revisions?

  Can you also attach your output files for your checkpoint collection run (i.e. the first command that you copied here)? Specifically, I'm hoping to take a look at your config.ini, system.pc.com_1.terminal, and the gem5.out files.

  Thanks!
  Joel
--
  Joel Hestness
  PhD Candidate, Computer Architecture

Joel Hestness

Feb 23, 2015, 5:42:39 PM
to Sohan Sharma, gem5-gpu developers
Hi Sohan,
  Ok. This looks pretty strange given that the checkpoint simulation dropped a checkpoint and exited correctly. It is likely that your checkpoint file has been corrupted somehow, and specifically, that the kernel symbol table variable is missing or incorrect in the checkpoint. I'd check whether the checkpoint output file (<outputdir>/cpt.5573536067500/m5.cpt) was truncated or contains garbage in or near the kernel symbol table. (Feel free to send me that file if you're still unsure.) I'd also recommend recollecting the checkpoint to see whether you run into the same result.
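One way to do that check is to look up the exact entry named in the fatal message. A diagnostic sketch, assuming m5.cpt is an INI-like text file ([section] headers followed by key=value lines) and that the quoted name in the fatal message has the form section:key; parse_fatal and check_checkpoint are hypothetical helpers.

```python
import configparser
import re

FATAL_RE = re.compile(r"Can't unserialize '([^']+)'")


def parse_fatal(message):
    """Split the 'section:key' named in a "Can't unserialize" fatal message."""
    m = FATAL_RE.search(message)
    if m is None:
        raise ValueError("no \"Can't unserialize\" entry in message")
    section, _, key = m.group(1).rpartition(":")
    return section, key


def check_checkpoint(cpt_path, message):
    """Report whether cpt_path (an m5.cpt file) has the entry the fatal names.

    Assumes m5.cpt is INI-like: [section] headers and key=value lines.
    """
    section, key = parse_fatal(message)
    cp = configparser.ConfigParser(interpolation=None, strict=False,
                                   allow_no_value=True, delimiters=("=",))
    cp.optionxform = str  # keep key names exactly as written
    try:
        # latin-1 decodes any byte, so a file with garbage still gets parsed.
        if not cp.read(cpt_path, encoding="latin-1"):
            return f"cannot read {cpt_path}"
    except configparser.Error as err:
        return f"unparsable checkpoint: {err.__class__.__name__}"
    if not cp.has_section(section):
        return f"section [{section}] missing"
    if not cp.has_option(section, key):
        return f"key {key!r} missing from [{section}]"
    return f"found {key}={cp.get(section, key)!r}"
```

A "missing" result for a key the restore expects points at a truncated or stale checkpoint; an "unparsable" result points at garbage in the file.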

  Joel


On Mon, Feb 23, 2015 at 3:42 PM, Sohan Sharma <sudan...@gmail.com> wrote:
Hi Joel,

I am not using the latest revisions. They are a little bit older.

Here are the revisions:

gem5 = 10237
gem5-patches = 113
gem5-gpu = 270
gpgpu-sim = 3
gpgpu-sim-patches = 52

I get a compilation error with gpgpu-sim-patches revision 48, even though revision 48 is in the list of working versions (column 4 of the Google Docs spreadsheet). The error is:

build/X86_VI_hammer_GPU/src/gpu/gpgpu-sim/cuda_core.cc: In member function 'bool CudaCore::executeMemOp(const warp_inst_t&)':
build/X86_VI_hammer_GPU/src/gpu/gpgpu-sim/cuda_core.cc:277: error: 'const class warp_inst_t' has no member named 'get_atomic'

There is no compilation error with gpgpu-sim-patches revision 52. However, restoring the checkpoint does not work.

I am attaching the output of the command that I used to create the checkpoint.

The gem5.out file contains the standard output printed to the terminal, which I redirected to this file.

Thanks
Sohan

Sohan Sharma

Feb 24, 2015, 1:15:35 PM
to Joel Hestness, gem5-gpu developers
Thanks for the reply, Joel.

I got checkpointing working. I am not sure how it happened, but I had two directories in m5out, cpt.5573536067500 and cpt.5573535290000, and the m5.cpt in the latter was corrupted.
I deleted cpt.5573535290000 and created the checkpoint again. Now there is only one directory, cpt.5573536067500, and everything works fine.
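The two-directory situation plausibly explains the failure. A sketch, assuming fs_fusion.py restores through gem5's common configs/common/Simulation.py, whose restore logic (as understood here) sorts the cpt.<tick> directories by tick and treats --checkpoint-restore=N (-r N) as a 1-based index into that order. Under that assumption, -r 1 selects the earlier tick, 5573535290000, which matches the tick in the fatal message above. pick_checkpoint is a hypothetical name; it mirrors the assumed selection rule rather than reproducing gem5's code.

```python
import re

CPT_DIR_RE = re.compile(r"cpt\.(\d+)$")


def pick_checkpoint(dir_names, cpt_num):
    """Return the cpt.<tick> directory that --checkpoint-restore=cpt_num picks.

    Assumed behaviour: directories are ordered by numeric tick and cpt_num
    is a 1-based index into that order; other entries are ignored.
    """
    ticks = sorted(int(m.group(1))
                   for m in map(CPT_DIR_RE.match, dir_names) if m)
    if not 1 <= cpt_num <= len(ticks):
        raise ValueError(f"checkpoint {cpt_num} not found "
                         f"({len(ticks)} available)")
    return f"cpt.{ticks[cpt_num - 1]}"


# The m5out directory described above contained both checkpoints:
dirs = ["cpt.5573536067500", "cpt.5573535290000", "stats.txt"]
print(pick_checkpoint(dirs, 1))  # cpt.5573535290000 (the corrupted one)
print(pick_checkpoint(dirs, 2))  # cpt.5573536067500
```

This is why deleting the stale directory (or passing -r 2) made the restore pick up the good checkpoint.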

This can be marked fixed!

Thank you very much for the help.

Regards

Indraneel Sarkar

Jun 14, 2017, 12:50:47 AM
to gem5-gpu Developers List
Hi Jason,

I am trying to execute the ASIMBENCH benchmarks on gem5. I have created a checkpoint using the hack_back rcS file (gem5/configs/boot/hack_back_ckpt.rcS) and would like to know how to "change the script" as you mentioned in point 4.

I am trying to run the adobe.rcS script included in asimbench, as provided at this link.

Thank you
Indraneel

Jason Lowe-Power

Jun 14, 2017, 9:38:56 AM
to Indraneel Sarkar, gem5-gpu Developers List
Just point gem5 to adobe.rcS on your command line (--script=adobe.rcS).

Jason

Indraneel Sarkar

Jun 15, 2017, 12:32:04 AM
to Jason Lowe-Power, gem5-gpu Developers List
Hi Jason,

Thank you for replying.

I have created a checkpoint using the "arm_ckpt_asim.rcS" script available at this link. For your reference, I have attached the arm_asim.sh script file, which contains the commands I used.

The script (arm_asim.sh) creates a checkpoint and terminates successfully.

Now, when I try to restore from the checkpoint using the "ttpod.sh" script attached below, the script terminates after 6 hours, but the frame capture shows that Android has only booted and has made no further progress.
I am also attaching the output file (nohup800.out) for your reference.

When I run without checkpoints, the frame capture shows that the music player (ttpod) opens and music starts to play.

Could you please help me restore the checkpoint successfully?

Thank you,
Indraneel
Attachments: arm_asim.sh, ttpod.sh, nohup800.out, arm_ckpt_asim.rcS, ttpod.rcS

Jason Lowe-Power

Jun 15, 2017, 10:02:53 AM
to Indraneel Sarkar, gem5-gpu Developers List
Hi Indraneel,

I am not familiar with the asim codebase. You may want to reach out to those developers to answer your question.

The hack_back script is special in how it handles loading a new script when restoring from a checkpoint. I'm not sure whether arm_ckpt_asim.rcS does the same thing.

Jason

Indraneel Sarkar

Jun 16, 2017, 12:03:03 AM
to Jason Lowe-Power, gem5-gpu Developers List
Hi Jason,

Thank you for your time.

I will ask the asimbench developers about this query.

On a side note, I do have a small query regarding the functionality of gem5.

When I restore a checkpoint using -r 1, does the --frame-capture option accurately capture the state of the Android system?
I ask because stats.txt contains three dumps with somewhat different values, yet my frame capture still shows Android booting.

Could you please point me to some articles about this?

Thanking you,
Indraneel

Jason Lowe-Power

Jun 16, 2017, 8:51:46 AM
to Indraneel Sarkar, gem5-gpu Developers List
Hi Indraneel,

I don't know how --frame-capture works. I doubt there's any documentation on it. I would read the source code if you want to know how it works.

Jason