Modification of V100 Configuration - Increase in L2 Cache Capacity

yjiwon

<yjiwon95@gmail.com>

Nov 29, 2022, 1:41:44 PM

to accel-sim

Hi all,

I am trying to study several configurations that increase the capacity of the L2 cache, based on the QV100 (Volta) architecture, and to measure the resulting performance gain for certain workloads.

In the provided QV100 gpgpusim.config file, the L2 settings are as follows:

# high level architecture configuration
-gpgpu_n_clusters 80
-gpgpu_n_cores_per_cluster 1
-gpgpu_n_mem 32
-gpgpu_n_sub_partition_per_mchannel 2
-gpgpu_clock_gated_lanes 1

# 32 sets, each 128 bytes 24-way for each memory sub partition (96 KB per memory sub partition). This gives us 6MB L2 cache

-gpgpu_cache:dl2 S:32:128:24,L:B:m:L:P,A:192:4,32:0,32
-gpgpu_cache:dl2_texture_only 0
-gpgpu_dram_partition_queues 64:64:64:64
-gpgpu_perf_sim_memcpy 1
-gpgpu_memory_partition_indexing 2

I tried several cases, doubling the number of sets (32), the line size in bytes (128), and the associativity (24) in S:32:128:24, as well as -gpgpu_n_mem, to figure out which parameter directly determines the total L2 cache capacity.

The evaluations were conducted in SASS mode with the provided Rodinia traces.

However, the results show almost no difference in the collected stats (IPC) when the cache capacity is doubled, quadrupled, and so on.

My questions are as follows.

(1) Is there a fundamental error in my approach to increasing the L2 cache capacity? Could you provide some additional information?

(2) Are the Rodinia benchmark traces simply not sensitive to cache capacity? Is there a way to measure how sensitive a given benchmark is to cache capacity? If so, which workloads can benefit from a large cache?

Thank you for your help,

Regards,

Jiwon.

Junrui Pan

<panjunrui100@gmail.com>

Dec 2, 2022, 12:54:58 PM

to accel-sim

Hi,

(1) Your approach is generally correct. The capacity is calculated as follows:

In Volta, each memory controller is divided into two sub-partitions, and each sub-partition has one slice of L2. Each L2 slice is 32-set and 24-way, with 128-byte cache lines, so each slice is 32*24*128 B = 96 KB. 96 KB/slice * 2 sub-partitions/memory controller * 32 memory controllers = 6 MB.
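The arithmetic above can be checked with a small Python helper. This is a minimal sketch, assuming the first comma-separated field of the `-gpgpu_cache:dl2` string has the form `<type>:<sets>:<line bytes>:<ways>` (as in `S:32:128:24` from the QV100 config quoted earlier); the function names are hypothetical and are not part of Accel-Sim.

```python
def parse_cache_geometry(cache_spec: str) -> tuple[int, int, int]:
    """Return (n_sets, line_bytes, assoc) from a GPGPU-Sim cache string.

    Assumption: the first field looks like "S:32:128:24",
    i.e. <type>:<sets>:<line size in bytes>:<ways>.
    """
    first_field = cache_spec.split(",")[0]
    _cache_type, sets, line_bytes, assoc = first_field.split(":")
    return int(sets), int(line_bytes), int(assoc)


def l2_capacity_bytes(cache_spec: str, n_mem: int, sub_partitions_per_channel: int) -> int:
    """Total L2 bytes, assuming one L2 slice per memory sub-partition."""
    sets, line_bytes, assoc = parse_cache_geometry(cache_spec)
    slice_bytes = sets * line_bytes * assoc  # 32 * 128 * 24 = 96 KB for QV100
    return slice_bytes * sub_partitions_per_channel * n_mem


DL2 = "S:32:128:24,L:B:m:L:P,A:192:4,32:0,32"
# -gpgpu_n_mem 32, -gpgpu_n_sub_partition_per_mchannel 2
print(l2_capacity_bytes(DL2, n_mem=32, sub_partitions_per_channel=2) // 1024, "KB")  # prints: 6144 KB
```

6144 KB is the 6 MB total; doubling any one of sets, line size, ways, or `n_mem` doubles the result, which is why all of those experiments changed the nominal capacity.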

So you can increase any of these numbers to double the L2 capacity. However, you need to be careful, because each change also has other effects. For example, if you double the number of memory controllers, you would also double the DRAM (not 100% sure about this, actually, but you get the idea).

So if you want to change L2 without changing anything else, just change the set associativity (the number of ways). Even doubling the cache line size could have effects you did not intend.
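Following that advice, a sweep can be generated by rewriting only the ways field of the `dl2` string. This is a sketch under the same assumed `<type>:<sets>:<line bytes>:<ways>` layout; `scale_assoc` is a hypothetical helper, and whether the simulator accepts every associativity produced here (48, 96, 192 ways) has not been checked.

```python
def scale_assoc(cache_spec: str, factor: int) -> str:
    """Return a copy of a GPGPU-Sim cache string with only the ways multiplied.

    Assumption: the first field is <type>:<sets>:<line bytes>:<ways>.
    All remaining comma-separated fields (policy and queue settings)
    are copied through unchanged.
    """
    first_field, *rest = cache_spec.split(",")
    cache_type, sets, line_bytes, assoc = first_field.split(":")
    new_first = ":".join([cache_type, sets, line_bytes, str(int(assoc) * factor)])
    return ",".join([new_first, *rest])


base = "S:32:128:24,L:B:m:L:P,A:192:4,32:0,32"
for factor in (2, 4, 8):
    # Emit one config line per variant: 12 MB, 24 MB, 48 MB total L2.
    print(f"-gpgpu_cache:dl2 {scale_assoc(base, factor)}")
```

Because sets, line size, and `-gpgpu_n_mem` stay fixed, the number of L2 slices, the address-to-slice mapping, and the DRAM configuration are unchanged; only the capacity per slice grows.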

(2) Rodinia benchmarks are very small by the standards of modern GPUs. If you are using Rodinia-2.1, a workload may not even fill the entire L2. Also, if your workload has very good L1 locality, it will not benefit much from L2, because only L1 misses send loads to L2, and you would have very few misses.

Look for apps that have big working sets and poor L1 hit rates. These would probably benefit more from a bigger L2.
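One simple way to put a number on capacity sensitivity is to run the same workload at several L2 sizes (for example, associativity-only variants) and compute the average relative IPC gain per doubling of capacity. The sketch below is one possible metric, not something prescribed by Accel-Sim; the IPC values would come from your own runs (such as the `gpu_tot_ipc` line in each simulation log), and the pre-filter thresholds are arbitrary assumptions that only encode the rule of thumb above.

```python
import math


def l2_sensitivity(capacity_ipc: dict[int, float]) -> float:
    """Mean relative IPC gain per doubling of L2 capacity.

    capacity_ipc maps an L2 capacity in bytes to the IPC measured with it.
    A result near 0.0 means the workload is insensitive to L2 capacity
    over the measured range.
    """
    points = sorted(capacity_ipc.items())
    if len(points) < 2:
        raise ValueError("need IPC at two or more L2 capacities")
    gains = []
    for (cap_lo, ipc_lo), (cap_hi, ipc_hi) in zip(points, points[1:]):
        doublings = math.log2(cap_hi / cap_lo)
        gains.append((ipc_hi / ipc_lo - 1.0) / doublings)
    return sum(gains) / len(gains)


def likely_l2_sensitive(footprint_bytes: int, l1_hit_rate: float, l2_bytes: int,
                        max_l1_hit_rate: float = 0.5) -> bool:
    """Rough pre-filter: big working set AND poor L1 hit rate.

    The 0.5 default threshold is an arbitrary, hypothetical choice.
    """
    return footprint_bytes > l2_bytes and l1_hit_rate < max_l1_hit_rate
```

A workload whose footprint fits in the baseline 6 MB, or whose L1 hit rate is high, would be expected to show a sensitivity close to zero, which matches the flat IPC reported in the question.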

Thanks,

Junrui
