Fully Coherent LLC (L3 Cache)


Chuanwei Sun

Dec 21, 2014, 3:52:18 AM
to gem5-g...@googlegroups.com
Hi

I read the paper "Heterogeneous System Coherence for Integrated CPU+GPU Systems" from MICRO 2013. I am wondering whether the current release of gem5-gpu supports this work, i.e., a fully coherent LLC (L3 cache) shared between the CPU and GPU. I have found some protocols, such as MESI_Three_Level.py (in gem5/configs/ruby/), but they do not seem to support an L3 cache between the CPU and GPU.

Is there a mature protocol that supports a fully coherent LLC (L3 cache) between the CPU and GPU? I'm afraid that if I add an L3 myself, I will introduce errors that are hard to track down.

Many thanks
Chuanwei

steve

Dec 21, 2014, 10:03:09 PM
to gem5-g...@googlegroups.com
Hi, I have the same question.

Can Ruby support the HSC design from MICRO '13?
If so, could you give me some pointers on how to add it to Ruby?
Thanks in advance

On Sunday, December 21, 2014 at 4:52:18 PM UTC+8, Chuanwei Sun wrote:

Joel Hestness

Dec 22, 2014, 4:58:04 PM
to steve, gem5-gpu developers
Hi Chuanwei and Steve (?),

  gem5-gpu doesn't support HSC as implemented for the MICRO '13 paper. It does, however, support the fully coherent cache hierarchies listed in gem5-gpu/configs/gpu_protocols/. Details on each of the gem5 protocols can be found on the gem5 site here: http://gem5.org/Ruby. The VI_hammer protocol has read-only L1 caches, which may contain stale data (akin to NVIDIA Fermi GPUs).

  While the MESI_Two_Level hierarchy has a shared L2 cache, unfortunately, we have not assembled any 3-level hierarchies for use with gem5-gpu. It should be straightforward to create a MESI_Three_Level gem5-gpu protocol by adding pieces analogous to those in the MESI_Two_Level gem5-gpu protocol (gem5-gpu/configs/gpu_protocols/MESI_Two_Level_fusion.py).

  Another option would be to add an L3 cache to one of the 2-level hierarchies (MESI_Two_Level, MOESI_hammer, VI_hammer). You could do so by defining the cache controller state machine in a .sm file (e.g. in gem5/src/mem/protocol/ or gem5-gpu/src/mem/protocol/), which handles requests from the L2 caches and sends requests to the directory. This route would give you the most flexibility to implement something like HSC (as long as you start with MOESI_hammer or VI_hammer, which have private L2 caches).

  Hope that helps,
  Joel


--
  Joel Hestness
  PhD Candidate, Computer Architecture
  Dept. of Computer Science, University of Wisconsin - Madison
  http://pages.cs.wisc.edu/~hestness/

Jason Power

Dec 23, 2014, 11:20:53 AM
to Joel Hestness, steve, gem5-gpu developers

Hello,

As Joel says, we don't have an HSC protocol in gem5-gpu. However, adding a memory-side L3 cache in Ruby is pretty easy (and is what we did in the HSC paper). Basically, you can modify the Ruby directory controller to check a CacheMemory object before it sends a request to memory. IIRC, only one or two additional states were needed in the directory machine.
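A minimal Python sketch of the memory-side L3 idea above, purely as an illustration: a directory that checks a small cache before going to memory. The class and method names (MemorySideL3, Directory, handle_get, handle_writeback) are hypothetical and are not Ruby or SLICC APIs; the fully associative LRU replacement and the write-through policy (which keeps the L3 clean, so evictions never need a writeback) are assumptions chosen to keep the model small. In Ruby itself, the lookup would be done by actions in the directory's .sm file operating on a CacheMemory object.

```python
from collections import OrderedDict


class MemorySideL3:
    """Fully associative LRU cache standing in for a Ruby CacheMemory (hypothetical)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # addr -> data, least recently used first

    def lookup(self, addr):
        """Return (hit, data); a hit refreshes the line's LRU position."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True, self.lines[addr]
        return False, None

    def fill(self, addr, data):
        """Insert or update a line, evicting the LRU line if over capacity."""
        self.lines[addr] = data
        self.lines.move_to_end(addr)
        if len(self.lines) > self.num_lines:
            self.lines.popitem(last=False)  # line is clean, so no writeback needed


class Directory:
    """Directory that consults the L3 before issuing a memory read."""

    def __init__(self, memory, l3_lines):
        self.memory = memory  # addr -> data, stands in for the memory controller
        self.l3 = MemorySideL3(l3_lines)
        self.stats = {"l3_hits": 0, "mem_reads": 0}

    def handle_get(self, addr):
        hit, data = self.l3.lookup(addr)
        if hit:
            self.stats["l3_hits"] += 1
            return data
        self.stats["mem_reads"] += 1
        data = self.memory[addr]
        self.l3.fill(addr, data)  # allocate on miss
        return data

    def handle_writeback(self, addr, data):
        # Write-through (assumption): memory and the L3 copy are updated together.
        self.memory[addr] = data
        self.l3.fill(addr, data)


if __name__ == "__main__":
    directory = Directory({0x40: "A", 0x80: "B", 0xC0: "C"}, l3_lines=2)
    for addr in [0x40, 0x40, 0x80, 0xC0, 0x40]:
        directory.handle_get(addr)
    print(directory.stats)  # {'l3_hits': 1, 'mem_reads': 4}
```

Because the L3 in this model holds no sharer or owner information, it does not change which caches the directory must probe; it only changes where the data comes from on a request.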

Hope this helps,
Jason

Victor Garcia

Jan 27, 2015, 11:29:52 AM
to gem5-g...@googlegroups.com, jthes...@gmail.com, k23...@gmail.com
Hi,

I am also interested in adding a coherent LLC between GPU and CPUs.

I am new to gem5 and still getting the hang of Ruby and SLICC, so I'm not quite sure I understand what you mean, Jason.

By modifying the Ruby directory controller, do you mean the directory's .sm file? I guess I could declare a cache memory object in that file and add some actions to a transition on receiving a GET to check whether it hits or misses, but I don't think that makes much sense. How would you configure the L3? I understand that adding another cache level would require adding the necessary code in the configuration file so the object gets instantiated and connected to the hierarchy like the rest of the caches, and that it would also require creating a new .sm file for that cache, right?

I see that in some coherence protocols (MESI_Two_Level and MESI_Three_Level) the directory is co-located with the LLC, and the directory's .sm file is mainly used as the memory controller that interacts with off-chip memory. This was my initial idea for implementing a shared LLC, but in your heterogeneous VI_hammer protocol, would that mean moving all the coherence actions currently done in the -dir.sm file to a new -L3.sm file, and leaving the -dir.sm file only as the memory controller?

Could you please elaborate a bit on your previous answer, and give me (us) some more tips on how to tackle this?

Thanks

Victor

Jason Power

Jan 27, 2015, 12:30:52 PM
to Victor Garcia, gem5-g...@googlegroups.com, jthes...@gmail.com, k23...@gmail.com
Hi Victor,

It's definitely possible to add an L3 cache without having to add a new Ruby machine. In fact, that is what we did in the HSC paper (http://dx.doi.org/10.1145/2540708.2540747).

You described the method correctly: you would add a CacheMemory object and check it before sending the request to the memory controller. To configure this CacheMemory object, you would use the same methods you use to configure the CacheMemory objects at the other cache levels, for example, as in gem5-gpu/configs/gpu_protocols/VI_hammer.py.

It may be illustrative to look at the files generated by SLICC. They can be found in gem5/build/<name of your build>/mem/protocol/. Specifically, you want to look at Directory_Controller.py. If you look at L1Cache_Controller.py, you'll find two RubyCache parameters to the object. Similarly, if you add a CacheMemory object to the directory, a new parameter will be added to Directory_Controller.py, which you will be able to set in the config file (gem5-gpu/configs/gpu_protocols/VI_hammer.py).

The second method you describe, adding a new Ruby machine (.sm file), is probably a more "correct" way to do it, but it is very time-consuming and error-prone. This method would allow you to model a co-located directory and L3 cache (i.e., the L3 cache tags act as the directory). This is a plausible design point, but not the only possible design. The memory-side L3 cache (as modeled by the first method) is also a plausible design.
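For contrast, a minimal Python sketch of the co-located alternative, in which the L3 tags also carry the directory's sharer list. All names here are hypothetical, and FIFO replacement is an assumption. The sketch also assumes one common consequence of this design: evicting an L3 line discards its sharer information, so the directory must invalidate every sharer of the victim line (recorded here as messages rather than actually sent). Ownership and writes are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class L3Entry:
    data: object
    sharers: set = field(default_factory=set)  # directory state lives in the tag


class ColocatedL3Directory:
    """L3 whose tags double as the directory (hypothetical model)."""

    def __init__(self, memory, num_lines):
        self.memory = memory  # addr -> data
        self.num_lines = num_lines
        self.entries = {}  # addr -> L3Entry; dict insertion order gives FIFO replacement
        self.invalidations = []  # (addr, cache_id) messages the directory would send

    def get_shared(self, addr, requestor):
        """One tag lookup yields both the data and the sharer list."""
        entry = self.entries.get(addr)
        if entry is None:
            if len(self.entries) >= self.num_lines:
                self._evict()
            entry = L3Entry(self.memory[addr])
            self.entries[addr] = entry
        entry.sharers.add(requestor)
        return entry.data

    def _evict(self):
        # Dropping the line also drops its sharer list, so every sharer
        # must be invalidated before the directory forgets about it.
        victim_addr = next(iter(self.entries))
        victim = self.entries.pop(victim_addr)
        for cache_id in sorted(victim.sharers):
            self.invalidations.append((victim_addr, cache_id))


if __name__ == "__main__":
    d = ColocatedL3Directory({0x40: "A", 0x80: "B"}, num_lines=1)
    d.get_shared(0x40, "cpu0")
    d.get_shared(0x40, "gpu0")
    d.get_shared(0x80, "cpu1")  # evicts 0x40
    print(d.invalidations)  # [(64, 'cpu0'), (64, 'gpu0')]
```

The memory-side sketch never generates invalidations on L3 eviction, which is one reason it can be bolted onto an existing directory with few changes, while this variant ties L3 replacement to coherence actions.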

Hopefully this makes things a little more clear. The steps from SLICC, to gem5 SimObjects, to config files, to the final simulated system are quite confusing. Let me know if you have any other questions. The documentation for gem5 (found at gem5.org) may also help you understand how things work.

Cheers,
Jason

Victor Garcia

Feb 9, 2015, 11:58:25 AM
to gem5-g...@googlegroups.com, vgar...@gmail.com, jthes...@gmail.com, k23...@gmail.com
Hi,

Thanks Jason, that did indeed help a lot.

I'm still trying to debug my changes, but I've found something strange and I'm not sure whether it is a bug or just something I don't understand correctly.

In VI_hammer-dir.sm when reading from the triggerQueue I find this:

```
if (in_msg.Type == TriggerType:ALL_ACKS) {
  trigger(Event:All_acks_and_owner_data, in_msg.Addr, pf_entry, tbe);
} else if (in_msg.Type == TriggerType:ALL_ACKS_OWNER_EXISTS) {
  trigger(Event:All_acks_and_shared_data, in_msg.Addr, pf_entry, tbe);
}
// ... (remaining trigger types omitted)
```

The trigger messages are enqueued in o_checkForCompletion. There, if tbe.Owned is true, it enqueues an ALL_ACKS_OWNER_EXISTS message; otherwise it enqueues ALL_ACKS (or ALL_ACKS_NO_SHARERS if tbe.Sharers is false).

So, if there is indeed an owner, shouldn't ALL_ACKS_OWNER_EXISTS trigger the event All_acks_and_owner_data, and vice versa?
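To make the apparent inversion concrete, here is a small Python model of the two steps described above: which trigger type o_checkForCompletion enqueues (following the description in this message, not the .sm source itself) and which event the in_port raises for it (from the quoted excerpt). The function and table names are hypothetical. ALL_ACKS_NO_SHARERS is left out of the event table because its event is not shown in the excerpt.

```python
from enum import Enum


class TriggerType(Enum):
    ALL_ACKS = "ALL_ACKS"
    ALL_ACKS_OWNER_EXISTS = "ALL_ACKS_OWNER_EXISTS"
    ALL_ACKS_NO_SHARERS = "ALL_ACKS_NO_SHARERS"


def check_for_completion(owned: bool, sharers: bool) -> TriggerType:
    """Trigger type enqueued once all acks have arrived (as described above)."""
    if owned:
        return TriggerType.ALL_ACKS_OWNER_EXISTS
    if not sharers:
        return TriggerType.ALL_ACKS_NO_SHARERS
    return TriggerType.ALL_ACKS


# Event raised when the trigger message is dequeued, per the quoted in_port code.
# ALL_ACKS_NO_SHARERS is omitted: its event is not part of the excerpt.
TRIGGER_TO_EVENT = {
    TriggerType.ALL_ACKS: "All_acks_and_owner_data",
    TriggerType.ALL_ACKS_OWNER_EXISTS: "All_acks_and_shared_data",
}

if __name__ == "__main__":
    for owned, sharers in [(True, True), (False, True)]:
        trigger = check_for_completion(owned, sharers)
        print(owned, sharers, trigger.name, TRIGGER_TO_EVENT[trigger])
```

With an owner present, the model ends in All_acks_and_shared_data, which is exactly the mismatch questioned here; Jason's reply attributes it to event naming rather than a protocol bug.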

In any case, this should not affect my work on adding a memory-side L3, but I wanted to confirm whether I'm understanding this correctly.

Thanks!


Jason Power

Feb 10, 2015, 10:23:00 AM
to Victor Garcia, gem5-g...@googlegroups.com, jthes...@gmail.com, k23...@gmail.com
Hi Victor,

I'm pretty sure the code is right, but it probably doesn't have the most descriptive names.

I believe that All_acks_and_owner_data means that we got all of the acks and the directory has the data. In this case, the owner sent the data. Looking at the code, this event is only used in the DMA transitions, so it's likely that the original author of the MOESI_hammer protocol was not as careful as they should have been when naming it. You'll find this pattern repeated often in Ruby. That said, don't take this to mean there aren't bugs!

Cheers,
Jason

Victor Garcia

Feb 10, 2015, 10:34:39 AM
to gem5-g...@googlegroups.com, vgar...@gmail.com, jthes...@gmail.com, k23...@gmail.com
Hi,

Yep, it seems to be just a badly chosen name...
Thanks anyway for taking the time to look at it.

Cheers!

Adarsh Patil

Jan 11, 2016, 10:47:54 PM
to gem5-gpu Developers List, vgar...@gmail.com, jthes...@gmail.com, k23...@gmail.com
Hi all,
Did anyone successfully implement a shared L3 cache for VI_hammer? Alternatively, are there any plans to open-source the memory-side L3 cache from the HSC MICRO '13 paper?

I have been trying to debug my implementation of an L3 cache lookup in VI_hammer-dir.sm for some time now. Is there anything apart from the Ruby/RubySlicc debug flags in gem5 to help me debug it? Also, how do I generate the HTML pages for the VI_hammer Ruby protocol?
I couldn't find any useful discussions on gem5-users for this.

Thanks in advance.

Regards,
Adarsh
HPC Lab
Indian Institute of Science
Bangalore

Adarsh Patil

Jan 12, 2016, 4:06:16 AM
to gem5-gpu Developers List, vgar...@gmail.com, jthes...@gmail.com, k23...@gmail.com
I figured out that the HTML State x Event tables can be generated by setting SLICC_HTML to True in src/mem/protocol/SConsopts.
However, my original question stands: is there a simple way to step through the state machine to make debugging easier?
Are there any implementations of a level-3 cache using the VI_hammer protocol?

Regards,
Adarsh

Joel Hestness

Jan 12, 2016, 12:37:22 PM
to Adarsh Patil, gem5-gpu Developers List, Victor Garcia, 陳立展
Hi Adarsh,
  I don't have an implementation of an L3 for any of the protocols. However, if you are trying to debug a protocol implementation, I'd recommend using a small number of cores, small caches, and the ProtocolTrace debug flag (pass --debug-flags=ProtocolTrace to gem5.opt or gem5.debug). This will print all of the controller transitions in the protocol, so you can track which operations are being applied to which cache lines by which controllers.

  Hope this helps,
  Joel
