- When we run the command given on the webpage (build/VI_hammer/gem5.opt ../gem5-gpu/configs/se_fusion.py -c ../benchmarks/rodinia/backprop/gem5_fusion_backprop -o "16"), which memory model are we using? I assume that in this case the CPU and GPU share the same DRAM.
- The benchmark contains cudaMemcpy calls. I assume that, with the above command, the CPU and GPU have separate dedicated areas within the shared DRAM, and that cudaMemcpy moves data from the CPU's area to the GPU's area and vice versa. Is this assumption correct? Are these intra-DRAM copies done using DMA?
- How do we simulate a traditional discrete-GPU configuration, in which cudaMemcpy involves a PCIe DMA transfer? Which command should we use?
- In addition to the DMA transfer, I assume we would also need to configure the dedicated GPU memory as GDDR5 to simulate the traditional scenario accurately. How do we do that?
- I see that the benchmark suite includes no-copy versions of the benchmarks. Is this conceptually equivalent to the pinned host memory model, in which the GPU ignores its dedicated memory and accesses data directly from host DRAM?
- Where do we get the timing details needed to compare the two models? We could not see any timing details in the console output.