I am currently studying hardware simulation and learning MacSim to see whether I can use it in my research.
My target architecture is the NVIDIA GTX480, so I have to adjust some configuration files to match the GTX480.
Below are the settings I tried.
- kernel_info (one of the steps to generate a GPU trace)
  - There are two kernels in convolutionSeparable, which I used as a benchmark. To confirm the register usage and shared memory usage of each kernel, I ran nvcc with the options '-cubin -gencode arch=compute_20,code=compute_20 -arch=sm_20 -Xptxas=-v'.
  - The values nvcc reports for each kernel are:
    - _Z21convolutionRowsKernelPfS_iii(kernel_name) 29(register_usage) 5184(shared_memory_usage)
    - _Z24convolutionColumnsKernelPfS_iii 19 2569
- params.in (configuration file that specifies the architecture)
  - I modified it based on the GTX465 params file, which is included in the git repository as a sample.
  - The following are the modified parts:
    - num_sim_cores 15
    - num_sim_small_cores 15
    - clock_gpu 1.4
    - clock_mc 1.8
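
To make the changes above reproducible, here is a minimal sketch of how the four overrides could be applied to a copy of the sample file. It assumes params.in is plain text with one `key value` pair per line and `#` for comments; the helper `apply_overrides` and the base excerpt are hypothetical, and the base values are placeholders rather than the real GTX465 settings.

```python
def apply_overrides(text, overrides):
    """Return params text with the given keys replaced, or appended if absent."""
    remaining = dict(overrides)
    out = []
    for line in text.splitlines():
        parts = line.split()
        is_comment = line.lstrip().startswith("#")
        # Only rewrite non-comment lines whose first token is an overridden key.
        if parts and not is_comment and parts[0] in remaining:
            out.append(f"{parts[0]} {remaining.pop(parts[0])}")
        else:
            out.append(line)
    # Keys that were not present in the base file are appended at the end.
    out.extend(f"{key} {value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# The four values listed in this post.
GTX480_OVERRIDES = {
    "num_sim_cores": 15,
    "num_sim_small_cores": 15,
    "clock_gpu": 1.4,
    "clock_mc": 1.8,
}

if __name__ == "__main__":
    # Placeholder excerpt, NOT the real GTX465 sample contents.
    base = "# placeholder excerpt\nnum_sim_cores 11\nclock_gpu 1.2\n"
    print(apply_overrides(base, GTX480_OVERRIDES), end="")
```

Keeping the overrides in one dictionary makes it easy to diff the generated file against the sample and see exactly which parameters differ.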
However, I am not familiar with MacSim, and I found that with this configuration the simulated cycle count differs from the real hardware cycle count measured with NVPROF.
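
A minimal sketch of how the two numbers could be put on the same scale, in case the hardware measurement is a kernel duration rather than a cycle count. It assumes that clock_gpu 1.4 means 1.4 GHz and that the simulator's cycle count is expressed in that clock domain; both function names are hypothetical and the example inputs are placeholders, not measured results.

```python
def duration_to_cycles(duration_us, clock_ghz):
    """Convert a kernel duration in microseconds to cycles at clock_ghz."""
    # 1 us at 1 GHz is 1000 cycles.
    return duration_us * clock_ghz * 1e3


def relative_error(simulated_cycles, measured_cycles):
    """Signed relative difference of the simulation against the hardware."""
    return (simulated_cycles - measured_cycles) / measured_cycles


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    measured = duration_to_cycles(duration_us=100.0, clock_ghz=1.4)
    simulated = 150000
    print(f"measured cycles: {measured:.0f}")
    print(f"relative error: {relative_error(simulated, measured):+.1%}")
```

A positive relative error means the simulation reports more cycles than the hardware; a negative one means fewer.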
So, I have some questions.