Finding memory interference in heterogeneous systems

Sridhar Gunnam

Apr 23, 2017, 12:28:09 AM

to gem5-gpu Developers List

Hi Experts,

I am trying to measure memory interference on CPU+GPU heterogeneous systems. I modified gem5-gpu/configs/se_fusion.py to assign multiple workloads to different CPUs, but I have a few questions.
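For reference, gem5's stock configs/example/se.py accepts several workloads as semicolon-separated lists in -c/--cmd and -o/--options and gives the i-th program the i-th argument string before attaching each program to its own CPU; I am assuming se_fusion.py handles them the same way. The standalone sketch below reproduces only that mapping, without importing m5. The Workload class and split_workloads function are hypothetical names, not gem5 APIs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Workload:
    """Hypothetical stand-in for the Process object gem5 attaches to a CPU."""
    cpu: int
    argv: List[str] = field(default_factory=list)


def split_workloads(cmd: str, options: str = "") -> List[Workload]:
    """Give the i-th ';'-separated command the i-th ';'-separated option string."""
    commands = cmd.split(";")
    arg_strings = options.split(";") if options else []
    workloads = []
    for cpu, exe in enumerate(commands):
        # A missing or blank option string (e.g. the " " after "16;") means no args.
        args = arg_strings[cpu].split() if cpu < len(arg_strings) else []
        workloads.append(Workload(cpu, [exe] + args))
    return workloads


# Shortened version of the command line used in this thread.
for w in split_workloads("backprop/gem5_fusion_backprop;hello/hello", "16; "):
    print(w.cpu, w.argv)
# 0 ['backprop/gem5_fusion_backprop', '16']
# 1 ['hello/hello']
```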

1) How does the simulator know which part of the code should run on the CPU and which part on the GPU? I am asking because even the "hello world" program shows GPU-related stats in stats.txt.

2) Can I pin one application to run on CPU+GPU (a Rodinia benchmark) and another to run only on the CPU (a SPEC benchmark)? I have attached my modified se_fusion.py for reference.

Here is a snippet of stats.txt from running a Rodinia benchmark and hello world together:
./build/X86_VI_hammer_GPU/gem5.opt -d m5out/temp_bkprop_hello/ ../gem5-gpu/configs/se_fusion_modified.py -c "/home/sgunnam1/cse591/apr7/gem5-gpu/gem5-gpu/benchmarks/rodinia/backprop/gem5_fusion_backprop;/home/sgunnam1/cse591/apr7/gem5-gpu/gem5/tests/test-progs/hello/bin/x86/linux/hello" -o "16; " --num-cpus=2

system.ruby.phys_mem.bytes_read::cpu0.inst      8362080                       # Number of bytes read from this memory
system.ruby.phys_mem.bytes_read::cpu0.data      1083778                       # Number of bytes read from this memory
system.ruby.phys_mem.bytes_read::cpu1.inst        62736                       # Number of bytes read from this memory
system.ruby.phys_mem.bytes_read::cpu1.data         7284                       # Number of bytes read from this memory
system.ruby.phys_mem.bytes_read::gpu.ce          4892                       # Number of bytes read from this memory
system.ruby.phys_mem.bytes_read::gpu.shader_cores01.data         1760                       # Number of bytes read from this memory
system.ruby.phys_mem.bytes_read::gpu.shader_cores01.inst          112                       # Number of bytes read from this memory
system.ruby.phys_mem.bytes_read::gpu.shader_cores02.data         7584                       # Number of bytes read from this memory
system.ruby.phys_mem.bytes_read::gpu.shader_cores02.inst          112                       # Number of bytes read from this memory
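To compare CPU and GPU memory traffic in runs like this, it can help to aggregate the per-requestor bytes_read counters shown above. A minimal sketch, assuming stats.txt holds a single stats dump and that requestor names start with "cpu" or "gpu" as in this snippet; the function name is hypothetical, and the aggregate "::total" line (if present) is skipped so nothing is counted twice.

```python
import re
from collections import defaultdict
from typing import Dict

# Matches e.g. "system.ruby.phys_mem.bytes_read::cpu0.inst   8362080   # ..."
STAT_RE = re.compile(r"^system\.ruby\.phys_mem\.bytes_read::(\S+)\s+(\d+)")


def bytes_read_by_source(stats_text: str) -> Dict[str, int]:
    """Sum per-requestor bytes_read counters into CPU / GPU / other totals."""
    totals: Dict[str, int] = defaultdict(int)
    for line in stats_text.splitlines():
        match = STAT_RE.match(line.strip())
        if not match:
            continue
        requestor, value = match.group(1), int(match.group(2))
        if requestor == "total":  # aggregate line; skipping it avoids double counting
            continue
        if requestor.startswith("cpu"):
            totals["CPU"] += value
        elif requestor.startswith("gpu"):
            totals["GPU"] += value
        else:
            totals["other"] += value
    return dict(totals)


sample = """\
system.ruby.phys_mem.bytes_read::cpu0.inst      8362080   # Number of bytes read
system.ruby.phys_mem.bytes_read::gpu.ce            4892   # Number of bytes read
system.ruby.phys_mem.bytes_read::total          8366972   # Number of bytes read
"""
print(bytes_read_by_source(sample))  # {'CPU': 8362080, 'GPU': 4892}
```

For a real run, pass in the contents of stats.txt from the -d output directory (here, m5out/temp_bkprop_hello/stats.txt).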

Thanks,
Sridhar Gunnam

se_fusion_modified.py

Sridhar Gunnam

Apr 23, 2017, 3:23:15 PM

to gem5-gpu Developers List

Hi All,

Please ignore the first question in my previous mail. I figured out that the gpu.shader_cores01 and gpu.shader_cores02 stats come from the GPU benchmark, not the "hello world" program.

So my questions are:

1) Is there a way to run one rodinia-omp and one Rodinia application simultaneously in syscall emulation (SE) mode?

When I run one Rodinia and one Rodinia (OpenMP) benchmark simultaneously, I get the following error:

warn: Sockets disabled, not accepting gdb connections
**** REAL SIMULATION ****
info: Entering event queue @ 0. Starting simulation...
fatal: syscall set_tid_address (#218) unimplemented.
@ tick 7001500
[unimplementedFunc:build/X86_VI_hammer_GPU/sim/syscall_emul.cc, line 91]
Memory Usage: 4622632 KBytes
Program aborted at cycle 7001500
Aborted (core dumped)

One of the archived mails suggests that gem5 doesn't implement the multithreading system calls in syscall emulation mode, and set_tid_address is one of those calls. It says that to run these benchmarks with CPU multithreading, we need to run gem5 or gem5-gpu in full-system (FS) mode.

2) Can you give me some pointers (files I need to modify) for designing a memory scheduler for heterogeneous systems? I would like to implement "Staged Memory Scheduling" (Ausavarungnirun et al., ISCA 2012). For this I need to get the source ID (CPU or GPU) of each memory request, create separate memory request buffers, and implement the scheduling policies. But I can't figure out which files I need to modify. (I am new to coding and to the simulator.)

Thank you for your help in advance.

Regards,
Sridhar Gunnam

Joel Hestness

Apr 23, 2017, 4:42:34 PM

to Sridhar Gunnam, gem5-gpu Developers List

Hi Sridhar,

Answers in-lined below:

1) Is there a way to run one rodinia-omp and one Rodinia application simultaneously in syscall emulation (SE) mode?

Currently, I believe it is only possible to run pthreads applications in syscall emulation (SE) mode, because other multithreading libraries use more unimplemented syscalls. You can run pthreads applications by linking the benchmark against gem5's pthreads library at https://github.com/gem5/m5threads. OpenMP probably requires more unimplemented system calls, so you would need to port the rodinia-omp benchmarks to use pthreads. I would expect that to be fairly straightforward, since OpenMP is generally used for the embarrassingly parallel portions of the benchmarks.

As you noted, the alternative option is to run in FS mode.

2) Can you give me some pointers (files I need to modify) for designing a memory scheduler for heterogeneous systems? I would like to implement "Staged Memory Scheduling" (Ausavarungnirun et al., ISCA 2012). For this I need to get the source ID (CPU or GPU) of each memory request, create separate memory request buffers, and implement the scheduling policies. But I can't figure out which files I need to modify. (I am new to coding and to the simulator.)

I previously implemented something like Staged Memory Scheduling (SMS), and it was fairly straightforward. You'll need to get source IDs by following the instructions in prior email threads (e.g., https://groups.google.com/forum/#!topic/gem5-gpu-dev/RubjuGNZ2fc and https://groups.google.com/forum/#!searchin/gem5-gpu-dev/gpu$20cpu%7Csort:relevance/gem5-gpu-dev/25dw-GTHd9E/XCuOmO7bWTcJ).

To implement the SMS memory controller buffering, you'll need to modify the memory controller source code that you're using. I'd recommend using the DRAMCtrl in src/mem/dram_ctrl.cc (http://www.gem5.org/docs/html/classDRAMCtrl.html). There, you will need to reorganize the read and write queues to prioritize CPU or GPU accesses as desired.
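To make the queue reorganization concrete, here is a toy Python model of the three SMS stages, assuming each request already carries a source ID. It is not gem5 code (DRAMCtrl is C++ and its queues and events look different); the class and method names are hypothetical, the batch timeout is replaced by an explicit flush(), and the shortest-job-first metric is simplified to "fewest requests still waiting in stage 1".

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass(frozen=True)
class Req:
    source: str  # source ID, e.g. "cpu0" or "gpu"
    bank: int
    row: int


class StagedScheduler:
    """Toy model of the three SMS stages (hypothetical class, not gem5 code)."""

    def __init__(self, sources: List[str], num_banks: int,
                 max_batch: int = 4, sjf_prob: float = 0.9, seed: int = 0):
        self.sources = sources
        self.max_batch = max_batch
        self.sjf_prob = sjf_prob
        self.rng = random.Random(seed)
        self.rr_next = 0
        self.forming: Dict[str, List[Req]] = {s: [] for s in sources}
        self.ready: Dict[str, Deque[List[Req]]] = {s: deque() for s in sources}
        self.bank_fifos: List[Deque[Req]] = [deque() for _ in range(num_banks)]

    # Stage 1: per-source batch formation (consecutive same-bank, same-row requests).
    def enqueue(self, req: Req) -> None:
        batch = self.forming[req.source]
        if batch and (batch[-1].bank, batch[-1].row) != (req.bank, req.row):
            self._close_batch(req.source)
        self.forming[req.source].append(req)
        if len(self.forming[req.source]) >= self.max_batch:
            self._close_batch(req.source)

    def _close_batch(self, source: str) -> None:
        if self.forming[source]:
            self.ready[source].append(self.forming[source])
            self.forming[source] = []

    def flush(self) -> None:
        """Close all partial batches (stands in for SMS's batch timeout)."""
        for source in self.sources:
            self._close_batch(source)

    def _pending(self, source: str) -> int:
        return len(self.forming[source]) + sum(len(b) for b in self.ready[source])

    def _round_robin(self) -> str:
        n = len(self.sources)
        for offset in range(n):
            idx = (self.rr_next + offset) % n
            if self.ready[self.sources[idx]]:
                self.rr_next = (idx + 1) % n
                return self.sources[idx]
        raise RuntimeError("no ready batch")

    # Stage 2: batch scheduler (SJF with probability sjf_prob, else round-robin).
    def schedule_batch(self) -> bool:
        candidates = [s for s in self.sources if self.ready[s]]
        if not candidates:
            return False
        if self.rng.random() < self.sjf_prob:
            pick = min(candidates, key=self._pending)  # fewest queued requests
        else:
            pick = self._round_robin()
        # Stage 3: move the whole batch into per-bank FIFOs, preserving order.
        for req in self.ready[pick].popleft():
            self.bank_fifos[req.bank].append(req)
        return True

    def issue(self) -> List[Req]:
        """Issue at most one request per bank, oldest first."""
        return [fifo.popleft() for fifo in self.bank_fifos if fifo]


sched = StagedScheduler(["cpu0", "gpu"], num_banks=2, sjf_prob=1.0)
for _ in range(6):  # a GPU stream hitting one row
    sched.enqueue(Req("gpu", bank=0, row=5))
sched.enqueue(Req("cpu0", bank=1, row=9))  # one latency-sensitive CPU miss
sched.flush()
sched.schedule_batch()  # SJF picks cpu0: 1 pending request vs. 6 for the GPU
print(sched.issue())  # [Req(source='cpu0', bank=1, row=9)]
```

The design point is that prioritization happens per batch rather than per request, so GPU row-buffer locality is preserved inside a batch while the probabilistic SJF/round-robin choice keeps CPU requests from starving behind long GPU streams.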

Hope this helps!

Joel

--

Joel Hestness
http://pages.cs.wisc.edu/~hestness/

http://3daystartup.org

Sridhar Gunnam

Apr 23, 2017, 4:50:46 PM

to gem5-gpu Developers List, sridha...@gmail.com

Hi Joel,

Thank you for the response. I will work based on the pointers you suggested.

Regards,
Sridhar Gunnam

Feifei Qiao

Dec 6, 2017, 8:44:10 AM

to gem5-gpu Developers List

Hi Sridhar,

I am working on the same thing as you, and I also want to get the source ID. Could you tell me how to get the source ID and which files I should modify? I have read some of the methods in prior email threads, but I am very new to gem5-gpu and don't know how to do it.

Could you help me?

Thank you very much!

Feifei Qiao

lucky

Jun 30, 2020, 9:40:24 PM

to gem5-gpu Developers List

Hello, I also want to implement "Staged Memory Scheduling" (Ausavarungnirun et al., ISCA 2012). Would you mind providing the detailed steps to do that?

On Sunday, April 23, 2017 at 12:28:09 PM UTC+8, Sridhar Gunnam wrote:

lucky

Jul 1, 2020, 6:23:44 AM

to gem5-gpu Developers List

Hello:

Do you remember how to set a coreID on each Packet, and how to pass the Packet's coreID on to the RubyRequest's coreID?

So far I have only added req(pkt->req->coreType) and req(pkt->req->coreID), like this, but I think there is more to do. Looking forward to your reply.
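One way to see why tagging the Packet's request is not enough on its own: the source tag has to be copied at every point where a component builds a new request or message instead of forwarding the original one. The toy model below is not gem5 code. Request, Packet, RubyRequest, MemoryRequest and the two conversion functions are simplified, hypothetical stand-ins with invented fields, and the 64-byte line size is an assumption. In real gem5-gpu I would expect the equivalent field to also be needed in the coherence protocol's messages on the way to the directory; treat that as something to verify in your checkout.

```python
from dataclasses import dataclass
from enum import Enum


class CoreType(Enum):
    CPU = 0
    GPU = 1


@dataclass
class Request:        # hypothetical stand-in for gem5's Request
    addr: int
    core_type: CoreType
    core_id: int


@dataclass
class Packet:         # hypothetical stand-in for gem5's Packet
    req: Request


@dataclass
class RubyRequest:    # hypothetical stand-in for Ruby's RubyRequest
    addr: int
    core_type: CoreType
    core_id: int


@dataclass
class MemoryRequest:  # hypothetical request the directory sends to DRAM
    addr: int
    core_type: CoreType
    core_id: int


def to_ruby_request(pkt: Packet) -> RubyRequest:
    # Hop 1: copy the tag from the packet's request.
    return RubyRequest(pkt.req.addr, pkt.req.core_type, pkt.req.core_id)


def to_memory_request(rreq: RubyRequest, line_bytes: int = 64) -> MemoryRequest:
    # Hop 2: the miss is re-issued as a new, line-aligned request; the tag
    # must be copied again or the memory controller never sees it.
    line_addr = rreq.addr - rreq.addr % line_bytes
    return MemoryRequest(line_addr, rreq.core_type, rreq.core_id)


pkt = Packet(Request(addr=0x1234, core_type=CoreType.GPU, core_id=3))
mem = to_memory_request(to_ruby_request(pkt))
print(hex(mem.addr), mem.core_type.name, mem.core_id)  # 0x1200 GPU 3
```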