1. Does adding a shared L3 cache (by modifying the Ruby SLICC files) affect checkpointing?
2. According to the gem5 documentation, MOESI_hammer has private L1/L2 caches. Does that mean the MOESI_hammer protocol does not share data through the caches, only through main memory (e.g., the GPU reading data from a CPU's L2 cache and vice versa)?
3. If a shared L3 cache causes problems with checkpointing, does that also matter in SE mode?
4. I would like to run some configurations in SE mode for fast simulation. Is it possible to run multiple benchmarks together in SE mode (e.g., 1 GPGPU app + 4 CPU apps, 1 GPGPU app + 2 CPU apps, and so on)?
> 4. I would like to run some configurations in SE mode for fast simulation. Is it possible to run multiple benchmarks together in SE mode (e.g., 1 GPGPU app + 4 CPU apps, 1 GPGPU app + 2 CPU apps, and so on)?

Yes, this is possible, though tricky. You'd likely need to modify the se_fusion.py configuration script to specify the workload to be run on each CPU core. See the get_processes() function in gem5/configs/example/se.py for insights on how to make that work. Also note that if you want these applications to interact in a particular way, you'll need to make sure they are timed such that the parts of the applications whose concurrent behavior you want to observe actually overlap.
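For illustration, a minimal sketch of that modification, modeled on get_processes() in se.py (this assumes se_fusion.py has already created system.cpu as an array of five cores; the binary paths are placeholders):

    from m5.objects import LiveProcess

    # 1 GPGPU app + 4 CPU apps: one binary per core. The GPGPU app is an
    # ordinary CPU binary that launches kernels to the simulated GPU.
    binaries = ['/path/to/gpgpu_app',
                '/path/to/cpu_app0', '/path/to/cpu_app1',
                '/path/to/cpu_app2', '/path/to/cpu_app3']

    for cpu, binary in zip(system.cpu, binaries):
        process = LiveProcess()
        process.executable = binary
        process.cmd = [binary]  # append per-app arguments here if needed
        cpu.workload = process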
For the shared L3 cache, I believe there are 4 possible methods (in order from simplest to most complex):

1) Use the MESI_three_level protocol.
2) Modify the MOESI_hammer directory to have a CacheMemory object which acts as a memory-side LLC. Note: this was the method I used in the HSC paper (see the sketch after this list).
3) Create a simple cache that is a MemObject in gem5. Using the latest gem5/gem5-gpu you can hook any MemObject between Ruby and the memory controller. Again, this would be a memory-side LLC.
4) Create a new Ruby protocol or modify an existing protocol to have three levels of caches.

Other than MESI_three_level, I'm not aware of anyone with a publicly available model for a three-level protocol.

For running CPU and GPU applications simultaneously: yes, you would need 5 CPU cores in your case.
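For illustration, a rough sketch of method 2 in a Ruby config script. This is hypothetical: the stock MOESI_hammer Directory_Controller has no l3_cache parameter, the SLICC state-machine changes that would actually use the cache are not shown, and the sizes are placeholders.

    # Give each MOESI_hammer directory a CacheMemory acting as a
    # memory-side LLC. 'l3_cache' is a parameter you would have to add to
    # the Directory_Controller's SLICC definition yourself; other required
    # directory parameters are omitted here.
    from m5.objects import RubyCache

    l3_cache = RubyCache(size = '4MB', assoc = 16,
                         start_index_bit = block_size_bits)  # from the surrounding config

    dir_cntrl = Directory_Controller(version = i,
                                     l3_cache = l3_cache,  # hypothetical new parameter
                                     ruby_system = ruby_system)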
Hi Jason,

Thanks for the quick reply.

> For the shared L3 cache, I believe there are 4 possible methods (in order from simplest to most complex):
> 1) Use the MESI_three_level protocol.

So, from my understanding, I need to do the following to achieve this (please correct me if I'm wrong or missing any step):

1. Copy MESI_three_level.py from /gem5/configs/ruby/ to /gem5-gpu/configs/gpu-protocol and modify it accordingly.
2. Create a new X86_MESI_three_level_GPU build configuration in /gem5-gpu/build_opts/ (see the sketch after this list).
3. Modify se_fusion.py (and GPUConfig.py, GPUMemConfig.py, etc.) in /gem5-gpu/configs/ as needed and use it for further simulation.
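For illustration, the step-2 build configuration might look like this; it is modeled on the existing files in gem5-gpu/build_opts/, and the exact variable set and protocol name should be checked against your tree (mainline gem5 spells it MESI_Three_Level):

    # Hypothetical gem5-gpu/build_opts/X86_MESI_three_level_GPU
    PROTOCOL = 'MESI_Three_Level'
    TARGET_ISA = 'x86'
    CPU_MODELS = 'AtomicSimpleCPU,O3CPU,TimingSimpleCPU'

You would then build with something like scons EXTRAS=../gem5-gpu/src:../gpgpu-sim build/X86_MESI_three_level_GPU/gem5.opt, per the usual gem5-gpu build instructions.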
> 2) Modify the MOESI_hammer directory to have a CacheMemory object which acts as a memory-side LLC. Note: this was the method I used in the HSC paper.
> 3) Create a simple cache that is a MemObject in gem5. Using the latest gem5/gem5-gpu you can hook any MemObject between Ruby and the memory controller. Again, this would be a memory-side LLC.

So, am I correct in saying that the only difference between option 1 and options 2 and 3 is the level at which the cache-coherence protocol acts?
> 4) Create a new Ruby protocol or modify an existing protocol to have three levels of caches. Other than MESI_three_level, I'm not aware of anyone with a publicly available model for a three-level protocol.
> For running CPU and GPU applications simultaneously: yes, you would need 5 CPU cores in your case.

I'm interested in understanding the interference when the L3 cache is shared by the CPU and GPGPU. So, is there a way I can "bypass" the cache for requests from CPU-5? Could something similar be achieved by attaching CPU-5 directly to main memory?
Jason