I am trying to conduct research on various configuration cases by varying the capacity of the L2 cache, based on the QV100 Volta architecture, and measuring the performance gain for certain workloads.
In the provided QV100 gpgpusim.config file, the L2-related settings are as below.
# high level architecture configuration
-gpgpu_n_clusters 80
-gpgpu_n_cores_per_cluster 1
-gpgpu_n_mem 32
-gpgpu_n_sub_partition_per_mchannel 2
-gpgpu_clock_gated_lanes 1
# 32 sets, each 128 bytes 24-way for each memory sub partition (96 KB per memory sub partition). This gives us 6MB L2 cache
-gpgpu_cache:dl2 S:32:128:24,L:B:m:L:P,A:192:4,32:0,32
-gpgpu_cache:dl2_texture_only 0
-gpgpu_dram_partition_queues 64:64:64:64
-gpgpu_perf_sim_memcpy 1
-gpgpu_memory_partition_indexing 2
I tried several cases by doubling the number of sets (32), the line size (128 bytes), and the associativity (24) in S:32:128:24, as well as n_mem, to figure out which parameter is directly correlated with the total L2 cache capacity.
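For reference, here is a small sketch (my own arithmetic, not GPGPU-Sim code) of how I understand the total L2 capacity to be derived from the config above, assuming the -gpgpu_cache:dl2 cache is instantiated once per memory sub partition:

```python
# Capacity implied by -gpgpu_cache:dl2 S:<sets>:<line_bytes>:<assoc>,...
# with one L2 bank per memory sub partition.
sets, line_bytes, assoc = 32, 128, 24
n_mem = 32                      # -gpgpu_n_mem
sub_parts_per_channel = 2       # -gpgpu_n_sub_partition_per_mchannel

per_sub_partition = sets * line_bytes * assoc                # bytes per bank
total = per_sub_partition * n_mem * sub_parts_per_channel    # total L2 bytes

print(per_sub_partition // 1024, "KiB per sub partition")    # 96
print(total // (1024 * 1024), "MiB total L2")                # 6
```

By this arithmetic, doubling any one of sets, line size, associativity, or n_mem should double the total capacity (though the line size and n_mem also change other behavior, e.g. fetch granularity and channel interleaving).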
The evaluations were conducted in SASS mode, with the provided Rodinia traces.
However, the evaluation results show almost no difference in the collected stats (IPC) when the cache capacity is doubled, quadrupled, and so on.
My questions are as follows.
(1) Is there a fundamental error in my approach to increasing the L2 cache capacity? Could you provide some additional information?
(2) Are the Rodinia benchmark traces simply insensitive to cache capacity? Is there a way to measure how sensitive a given benchmark is to cache capacity? If so, which workloads are capable of benefiting from an abundant amount of cache?
Thank you for your help,
Regards,
Jiwon.