Shared L3 and checkpointing


Kyu-Hyun Choi

Nov 3, 2013, 10:36:58 PM
to gem5-g...@googlegroups.com
Hello.

Thank you for answering my previous question (about checkpointing). I really appreciate it.

I can create and restore checkpoints now, and it works pretty well.


Anyway, according to your previous answers, gem5-gpu does not currently support an L3 cache shared between the CPU and GPU.

So I have some questions.


1. Does adding a shared L3 cache (by modifying the Ruby SLICC files) affect checkpointing?

2. According to the gem5 page, MOESI_hammer has private L1/L2 caches. Does that mean the MOESI_hammer protocol does not share data through the caches, only through main memory?
(e.g., the GPU reading data from the CPU's L2 cache, and vice versa.)

3. If a shared L3 cache causes problems with checkpointing, does it also matter in SE mode?

4. I would like to run some configurations in SE mode for faster simulation. Is it possible to run multiple benchmarks together in SE mode?
(e.g., 1 GPGPU app + 4 CPU apps, 1 GPGPU app + 2 CPU apps, and so on.)


Thank you again for your answers.

Joel Hestness

Nov 4, 2013, 11:46:29 AM
to Kyu-Hyun Choi, gem5-gpu developers
Hi,
  
1. Does adding a shared L3 cache (by modifying the Ruby SLICC files) affect checkpointing?

It should be possible to use checkpointing after adding a shared L3, as long as you either collect checkpoints with the MOESI_hammer protocol (from which you should be able to restore into any other protocol), or, if you'd like to collect checkpoints with the protocol that includes the shared L3, make sure that it implements line-flush semantics similar to the MOESI_hammer L1/L2 (see the Flush_line transitions).


2. According to the gem5 page, MOESI_hammer has private L1/L2 caches. Does that mean the MOESI_hammer protocol does not share data through the caches, only through main memory?
(e.g., the GPU reading data from the CPU's L2 cache, and vice versa.)

That is partially correct.  Note that data held in L2 caches can be requested by other L2 caches, so some sharing can occur.


3. If a shared L3 cache causes problems with checkpointing, does it also matter in SE mode?

This shouldn't be a problem.


4. I would like to run some configurations in SE mode for faster simulation. Is it possible to run multiple benchmarks together in SE mode?
(e.g., 1 GPGPU app + 4 CPU apps, 1 GPGPU app + 2 CPU apps, and so on.)

Yes, this is possible, though tricky.  You'd likely need to modify the se_fusion.py configuration script to specify the workload to be run on each CPU core.  See the get_processes() function in gem5/configs/example/se.py for insights on how to make that work.  Also note that if you want these applications to interact in a particular way, you'll need to make sure they are timed such that the parts of the applications whose concurrency you're interested in observing actually overlap.
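The workload-splitting step described above can be sketched in plain Python. This is a hypothetical sketch modeled loosely on get_processes() in gem5/configs/example/se.py; the function name and the "pad with the last command" policy are my own assumptions, and hooking each command list up to a Process object and a CPU core (cpu.workload = ...) is not shown.

```python
# Hypothetical helper for an se_fusion.py-style script: split a single
# semicolon-separated workload string into one command list per CPU core.
def get_per_core_commands(cmd_string, num_cpus):
    """Split "prog1 args;prog2 args;..." into one command list per core."""
    commands = [c.split() for c in cmd_string.split(';') if c.strip()]
    if not commands:
        raise ValueError("no commands given")
    if len(commands) > num_cpus:
        raise ValueError("more commands than CPU cores")
    # Convenience policy (an assumption, not se.py behavior): reuse the
    # last command so every core has a workload.
    while len(commands) < num_cpus:
        commands.append(commands[-1])
    return commands

# Example: four CPU benchmarks plus one core reserved for the GPGPU app's
# host-side process (benchmark names are illustrative).
cmds = get_per_core_commands("bzip2 in.txt;mcf;lbm;milc;backprop 65536", 5)
```

Each resulting command list would then be turned into a Process and assigned to one CPU core in the configuration script.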

  Hope this helps,
  Joel


--
  Joel Hestness
  PhD Student, Computer Architecture
  Dept. of Computer Science, University of Wisconsin - Madison
  http://pages.cs.wisc.edu/~hestness/

Arth

Mar 5, 2015, 4:42:52 PM
to gem5-g...@googlegroups.com
Hi,

I am trying to model a shared L3 cache in a fusion architecture, and I was unable to follow the discussion on how to do that.
It would be really helpful if someone could give me pointers on how to implement it.
Or, if someone has already implemented it, could you please post some reference code that I can use or follow to get a shared L3?

 
4. I would like to run some configurations in SE mode for faster simulation. Is it possible to run multiple benchmarks together in SE mode?
(e.g., 1 GPGPU app + 4 CPU apps, 1 GPGPU app + 2 CPU apps, and so on.)

Yes, this is possible, though tricky.  You'd likely need to modify the se_fusion.py configuration script to specify the workload to be run on each CPU core.  See the get_processes() function in gem5/configs/example/se.py for insights on how to make that work.  Also note that if you want these applications to interact in a particular way, you'll need to make sure they are timed such that the parts of the applications whose concurrency you're interested in observing actually overlap.

I am trying to run a similar workload (in SE mode) with 1 GPGPU application (from rodinia-nocopy) and 4 CPU applications.
From what I understand, I can only attach a single process to each CPU core in SE mode. Does this mean that I will have to use a 5 CPU + 1 GPU configuration to run this workload?

Thanks in advance. :)

-Arth (PioneerAxon)

Jason Power

Mar 6, 2015, 2:14:59 PM
to Arth, gem5-g...@googlegroups.com
Hi Arth,

For the shared L3 cache I believe there are 4 possible methods (in order from simplest to most complex):
1) Use the MESI_three_level protocol.
2) Modify the MOESI_hammer directory to have a CacheMemory object which acts as a memory-side LLC cache. Note: this was the method I used in the HSC paper.
3) Create a simple cache that is a MemObject in gem5. Using the latest gem5/gem5-gpu you can hook any MemObject between Ruby and the memory controller. Again, this would be a memory-side LLC.
4) Create a new Ruby protocol or modify an existing protocol to have three levels of caches.

Other than MESI_three_level, I'm not aware of anyone with a publicly available model for a three-level protocol.
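As a rough illustration of option 3, here is a hedged configuration sketch of a memory-side LLC interposed between Ruby and the memory controller. Everything here is an assumption to be checked against your gem5/gem5-gpu revision: the SimObject is named BaseCache in older trees, newer trees split hit_latency into tag/data/response latencies, and the port names and attachment points are illustrative only.

```python
# Hedged sketch (option 3): a classic-cache MemObject acting as a
# memory-side LLC. Parameter values are illustrative, not tuned.
from m5.objects import Cache

class SharedL3(Cache):
    size = '4MB'
    assoc = 16
    hit_latency = 20        # cycles; illustrative value only
    mshrs = 32
    tgts_per_mshr = 12

# Conceptual wiring in the memory configuration (e.g. GPUMemConfig.py),
# instead of connecting the Ruby directory straight to the memory
# controller:
#
#   system.l3 = SharedL3()
#   dir_cntrl.memory = system.l3.cpu_side   # Ruby directory -> L3
#   system.l3.mem_side = mem_ctrl.port      # L3 -> DRAM controller
```

Because the L3 sits past the directory, it stays out of the coherence protocol entirely, which is what makes this option protocol-agnostic.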

For running CPU and GPU applications simultaneously: yes, you would need 5 CPU cores in your case.

Jason

Arth Patel

Mar 9, 2015, 10:55:00 AM
to Jason Power, gem5-gpu developers
Hi Jason,

Thanks for the quick reply.

For the shared L3 cache I believe there are 4 possible methods (in order from simplest to most complex):
1) Use the MESI_three_level protocol.
 
So, from my understanding, to achieve this, I need to do the following. (Please correct me if I'm wrong or missing any step)

1. Copy MESI_three_level.py from /gem5/configs/ruby/ to /gem5-gpu/configs/gpu-protocol and modify it accordingly.
2. Create a new X86_MESI_three_level_GPU build configuration in /gem5-gpu/build_opts/.
3. Modify se_fusion.py (and GPUConfig.py, GPUMemConfig.py, etc.) in /gem5-gpu/configs/ as needed and use it for further simulation.

2) Modify the MOESI_hammer directory to have a CacheMemory object which acts as a memory-side LLC cache. Note: this was the method I used in the HSC paper.
3) Create a simple cache that is a MemObject in gem5. Using the latest gem5/gem5-gpu you can hook any MemObject between Ruby and the memory controller. Again, this would be a memory-side LLC.

So, am I correct in saying that the only difference between option 1 and options 2 and 3 is the level at which the cache-coherence protocol acts?

4) Create a new Ruby protocol or modify an existing protocol to have three levels of caches.

Other than MESI_three_level, I'm not aware of anyone with a publicly available model for a three-level protocol.

For running CPU and GPU applications simultaneously: yes, you would need 5 CPU cores in your case.

I'm interested in understanding the interference when the L3 cache is shared by the CPU and GPGPU. So, is there a way I can "bypass" the cache for requests from CPU-5?
Could something similar be achieved by attaching CPU-5 directly to main memory?

Thank you.. :)

--
Arth (PioneerAxon)

Jason Power

Mar 9, 2015, 1:26:21 PM
to Arth Patel, gem5-gpu developers
Hi Arth,

On Mon, Mar 9, 2015 at 9:55 AM Arth Patel <arth...@gmail.com> wrote:
Hi Jason,

Thanks for the quick reply.

For the shared L3 cache I believe there are 4 possible methods (in order from simplest to most complex):
1) Use the MESI_three_level protocol.
 
So, from my understanding, to achieve this, I need to do the following. (Please correct me if I'm wrong or missing any step)

1. Copy MESI_three_level.py from /gem5/configs/ruby/ to /gem5-gpu/configs/gpu-protocol and modify it accordingly.
2. Create a new X86_MESI_three_level_GPU build configuration in /gem5-gpu/build_opts/.
3. Modify se_fusion.py (and GPUConfig.py, GPUMemConfig.py, etc.) in /gem5-gpu/configs/ as needed and use it for further simulation.

Yes, that's correct.
 

2) Modify the MOESI_hammer directory to have a CacheMemory object which acts as a memory-side LLC cache. Note: this was the method I used in the HSC paper.
3) Create a simple cache that is a MemObject in gem5. Using the latest gem5/gem5-gpu you can hook any MemObject between Ruby and the memory controller. Again, this would be a memory-side LLC.

So, am I correct in saying that the only difference between option 1 and options 2 and 3 is the level at which the cache-coherence protocol acts?

Options 2 and 3 will allow you to use any cache coherence protocol, not just MESI_three_level. This includes using heterogeneous protocols like VI_hammer.
 

4) Create a new Ruby protocol or modify an existing protocol to have three levels of caches.

Other than MESI_three_level, I'm not aware of anyone with a publicly available model for a three-level protocol.

For running CPU and GPU applications simultaneously: yes, you would need 5 CPU cores in your case.

I'm interested in understanding the interference when the L3 cache is shared by the CPU and GPGPU. So, is there a way I can "bypass" the cache for requests from CPU-5?
Could something similar be achieved by attaching CPU-5 directly to main memory?

Not really. However, during GPU execution CPU-5 won't be making any memory accesses anyway, so I doubt it will affect your simulation.

15053...@mail.nwpu.edu.cn

Dec 19, 2016, 12:35:47 AM
to gem5-gpu Developers List, arth...@gmail.com
Hi Jason,
I read your HSC paper, but I don't understand the meaning of the shared memory-side L3 cache. Is it the L3 cache shared between the CPU and GPU?
Thanks!
Xiaofeng Li

On Saturday, March 7, 2015 at 3:14:59 AM UTC+8, Jason Lowe-Power wrote:

Jason Lowe-Power

Dec 19, 2016, 10:15:38 AM
to 15053...@mail.nwpu.edu.cn, gem5-gpu Developers List, arth...@gmail.com
Hi Xiaofeng,

The L3 cache sits between the coherence directory and memory. Thus, it is not involved in the coherence protocol at all. Whenever there is an on-chip cache miss, the L3 is checked first; if the line is not present there, the request goes to main memory.
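The lookup flow described above can be modeled in a few lines of plain Python. This is a toy sketch, not gem5 code: a memory-side L3 only ever sees requests that have already missed every on-chip cache and the directory, so it needs no coherence state of its own, just a lookup/fill/evict policy (LRU is assumed here purely for illustration).

```python
# Toy model of a memory-side L3: invoked only after an on-chip miss.
class MemorySideL3:
    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = {}              # block address -> data; dict order = LRU order
        self.hits = self.misses = 0

    def access(self, addr, memory):
        """Handle a request that missed all on-chip caches."""
        if addr in self.lines:
            self.hits += 1
            self.lines[addr] = self.lines.pop(addr)    # move to MRU position
            return self.lines[addr]
        self.misses += 1
        data = memory[addr]                            # fetch from DRAM
        if len(self.lines) >= self.capacity:
            self.lines.pop(next(iter(self.lines)))     # evict the LRU line
        self.lines[addr] = data
        return data

dram = {0x100: 'A', 0x140: 'B', 0x180: 'C'}
l3 = MemorySideL3(capacity_lines=2)
l3.access(0x100, dram)   # miss, filled from DRAM
l3.access(0x100, dram)   # hit
l3.access(0x140, dram)   # miss
l3.access(0x180, dram)   # miss, evicts 0x100
```

In a real run, the corresponding hit/miss counts would come from the cache statistics in gem5's stats.txt; this toy only illustrates the access flow.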

Jason
--

Jason

trib...@gmail.com

Jan 7, 2017, 12:54:57 PM
to gem5-gpu Developers List, arth...@gmail.com

Hi Jason,

I posted a question earlier and then found this thread, so I understand how the first method works. But as you say, it's not that flexible if we want other coherence protocols to work with it. So, regarding your second method, "Modify the MOESI_hammer directory to have a CacheMemory object which acts as a memory-side LLC cache": which are the relevant files one should look at modifying to add a shared L3 using this method? Do we modify the .sm files as well?

Jason Lowe-Power

Jan 9, 2017, 10:44:42 AM
to trib...@gmail.com, gem5-gpu Developers List, arth...@gmail.com
Hi,

Yes. You will need to modify VI_hammer-dir.sm.

Jason
--

Jason


lucky

Jul 24, 2020, 6:50:13 AM
to gem5-gpu Developers List

Hello Jason,
The MESI_Three_Level protocol is designed for an L0-L1-L2 cache hierarchy. How can it be used for an L1-L2-L3 hierarchy? That is, how do I use the MESI_three_level protocol to add an L3 cache?
On Monday, January 9, 2017 at 11:44:42 PM UTC+8, Jason Lowe-Power wrote: