Hi Freeman,
1) Yes, the cache line size is currently fixed at 128 bytes. I don't think a real system could feasibly use different cache line sizes for the CPU and GPU if they share a memory controller. With a discrete GPU this is possible, but that is not the focus of gem5-gpu.
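To make the granularity issue concrete: with one shared line size, the CPU and GPU both map an address to the same block. The plain-Python sketch below (not gem5 code; the 64-byte CPU line size is purely hypothetical) shows that a single 128-byte line would span two 64-byte lines, so a shared controller would presumably have to track blocks at two granularities.

```python
CPU_LINE_SIZE = 64    # hypothetical CPU line size, for contrast only
GPU_LINE_SIZE = 128   # the fixed line size mentioned above


def offset_bits(line_size: int) -> int:
    """Number of address bits that select a byte within one cache line."""
    if line_size <= 0 or line_size & (line_size - 1):
        raise ValueError("line size must be a power of two")
    return line_size.bit_length() - 1


def line_address(addr: int, line_size: int) -> int:
    """Align addr down to the start of the cache line that contains it."""
    return addr & ~(line_size - 1)


def sub_lines(addr: int, big: int, small: int) -> list:
    """Start addresses of the small lines covered by the big line holding addr."""
    base = line_address(addr, big)
    return list(range(base, base + big, small))


# 0x1040 starts its own 64-byte line, but falls inside the 128-byte line at 0x1000.
print(hex(line_address(0x1040, CPU_LINE_SIZE)), hex(line_address(0x1040, GPU_LINE_SIZE)))
print([hex(a) for a in sub_lines(0x1040, GPU_LINE_SIZE, CPU_LINE_SIZE)])
```

With a single fixed line size, `line_address` gives the same answer for every agent, which keeps coherence bookkeeping at one granularity.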
2) Our current model only supports a unified L1 I/D cache.
3) We fully support scratchpad memory. However, I believe you'll have to modify the GPGPU-Sim config file to change its size; I haven't investigated this, though.
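If you do end up editing the GPGPU-Sim config file, a small helper like the one below can override a single option instead of hand-editing. This is only a sketch: it assumes the file uses GPGPU-Sim's usual `-option value` lines with `#` comments, and that the scratchpad (shared memory) size option is `-gpgpu_shmem_size` in bytes, as in stock GPGPU-Sim configs. Please check the config file your gem5-gpu build actually loads before relying on that name.

```python
def set_config_option(config_text: str, option: str, value) -> str:
    """Return config_text with '-<option> <value>' set.

    Replaces an existing '-<option> ...' line in place, or appends one if the
    option is absent. Comment lines (starting with '#') are left untouched.
    """
    flag = f"-{option}"
    out, found = [], False
    for line in config_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == flag:
            out.append(f"{flag} {value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{flag} {value}")
    return "\n".join(out) + "\n"


# Assumed option name; verify it against your own GPGPU-Sim config file.
example = "# shader core options\n-gpgpu_shmem_size 16384\n"
print(set_config_option(example, "gpgpu_shmem_size", 49152), end="")
```

Replacing the line in place, rather than appending a second copy, avoids depending on which duplicate the config parser would honor.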
4) When using the VI_hammer coherence protocol, there are no L3 cache parameters. Additionally, the L3_size parameter (and the other cache size parameters) in gem5 usually refers to the classic cache system, not Ruby. Be careful that the parameters you set actually affect the system you are simulating.
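One generic way to catch options that are silently ignored is to record which parsed options the config code actually reads. The sketch below does that with `argparse`; the option names and `build_caches` are hypothetical stand-ins chosen for illustration, not gem5 or gem5-gpu code, and "explicitly set" is approximated as "differs from the default".

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    # Hypothetical options for illustration; not taken from gem5-gpu's scripts.
    parser = argparse.ArgumentParser()
    parser.add_argument("--l1d_size", default="64kB")
    parser.add_argument("--l2_size", default="2MB")
    parser.add_argument("--l3_size", default="16MB")
    return parser


class OptionTracker:
    """Wraps parsed options and records which ones the config code reads."""

    def __init__(self, namespace: argparse.Namespace):
        self._values = vars(namespace)
        self._read = set()

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for option names.
        if name in self._values:
            self._read.add(name)
            return self._values[name]
        raise AttributeError(name)

    def unused(self, names):
        return sorted(n for n in names if n not in self._read)


def explicitly_set(parser, argv):
    """Approximation: options whose parsed value differs from the default."""
    defaults = vars(parser.parse_args([]))
    given = vars(parser.parse_args(argv))
    return [k for k, v in given.items() if v != defaults[k]]


def build_caches(options):
    # Hypothetical protocol config with no L3: it never reads options.l3_size.
    return {"l1d": options.l1d_size, "l2": options.l2_size}


def report_ignored(argv):
    """Names of options the user set that the config code never looked at."""
    parser = make_parser()
    options = OptionTracker(parser.parse_args(argv))
    build_caches(options)
    return options.unused(explicitly_set(parser, argv))


print(report_ignored(["--l2_size", "4MB", "--l3_size", "32MB"]))
```

Here the L2 override is read and takes effect, while the L3 override is reported because nothing in the (hypothetical) protocol config consumes it.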
As far as the topology goes, you can read the VI_hammer config files (gem5-gpu/configs/gpu_protocol/VI_hammer*.py), which show how the topology is created. You can modify these files to create whatever topology you want. We use the cluster topology from gem5 (gem5/configs/topologies/Cluster.py), which has some documentation.
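The idea behind a cluster topology is that a cluster is a group of nodes one hop from each other, and clusters can contain other clusters. Below is a plain-Python model of that idea, runnable without gem5. It assumes one router per cluster, leaf controllers attached to their cluster's router, and nested clusters linked router to router. `Topology`, `make_topology`, and the controller names are hypothetical; this is not gem5's `Cluster` class.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Cluster:
    """A group of nodes that are all one hop apart; may contain other clusters."""
    name: str
    nodes: List[Union[str, "Cluster"]] = field(default_factory=list)

    def add(self, node: Union[str, "Cluster"]) -> None:
        self.nodes.append(node)


@dataclass
class Topology:
    routers: List[str] = field(default_factory=list)
    int_links: List[Tuple[str, str]] = field(default_factory=list)  # router <-> router
    ext_links: List[Tuple[str, str]] = field(default_factory=list)  # controller -> router


def _build(cluster: Cluster, topo: Topology) -> str:
    router = f"router_{cluster.name}"
    topo.routers.append(router)
    for node in cluster.nodes:
        if isinstance(node, Cluster):
            # A nested cluster hangs off this cluster's router via an internal link.
            topo.int_links.append((router, _build(node, topo)))
        else:
            # A leaf controller attaches to its cluster's router via an external link.
            topo.ext_links.append((node, router))
    return router


def make_topology(root: Cluster) -> Topology:
    topo = Topology()
    _build(root, topo)
    return topo


# Hypothetical layout: a GPU cluster and a CPU cluster under one root with a directory.
gpu = Cluster("gpu", [f"gpu_l1_{i}" for i in range(4)])
cpu = Cluster("cpu", ["cpu_l1_0", "cpu_l1_1"])
root = Cluster("root")
root.add(gpu)
root.add(cpu)
root.add("directory")
topo = make_topology(root)
print(len(topo.routers), len(topo.int_links), len(topo.ext_links))
```

Changing the topology then amounts to changing how controllers are grouped into clusters, which is roughly what editing the VI_hammer config files lets you do.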