Hi guys,
Sure. I've used the following command line parameters to get in the ballpark of an NVIDIA GTX580:
--total-mem-size=3GB --num-l2caches=4 --num-dirs=4 --mem_ctl_latency=61 --mem_freq=3006MHz --membus_busy_cycles=8 --membank_busy_time=32ns --sc_l1_size=24kB --sc_l2_size=192kB
This configures the total memory size as 3GB striped across 4 memory controllers, with latencies comparable to what you'd find with GDDR5 (i.e. including the deep 8n-prefetch buffering and bank latency for a close-page policy). It also sets up the caches to be close to the GTX580 partitioning, with small L1 data caches: the GTX580 has 16kB data caches, but VI_hammer doesn't split the L1s into separate instruction and data caches, so I added 8kB for instructions (hence --sc_l1_size=24kB). The memory bandwidth is 3006MHz * 8B/channel * dual data rate (2) * 4 memory controllers = 192.4GB/s, or 179.17GiB/s (i.e. exactly the same bandwidth as the GTX580, but with 2 fewer memory controllers than the real chip's 6, so the frequency needs to be 1.5x higher).
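If you want to double-check that arithmetic, here's a tiny host-only snippet (compiles with nvcc or any C++ compiler); the constants are just copied from the command line above, not read out of gem5-gpu:

#include <cstdio>

int main() {
    // Constants copied from the command line above.
    double freq_hz  = 3006e6;  // --mem_freq=3006MHz (command clock)
    double ch_bytes = 8.0;     // 8B-wide channel per memory controller
    double ddr      = 2.0;     // dual data rate: 2 transfers per cycle
    double num_mcs  = 4.0;     // 4 memory controllers (--num-dirs=4)
    double bw = freq_hz * ch_bytes * ddr * num_mcs;
    printf("%.2f GB/s = %.2f GiB/s\n",
           bw / 1e9, bw / (1024.0 * 1024.0 * 1024.0));
    // Prints: 192.38 GB/s = 179.17 GiB/s
    return 0;
}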
In practice, the GTX580's memory scheduling policy is better than FR-FCFS, so the simulated effective bandwidth for memory-intensive applications can be about 5% lower than actual NVIDIA hardware, but it tends to be very close.
Disclaimer: I always strongly recommend running short validation tests for L2 and memory bandwidth when choosing a new system configuration. It's pretty easy to miss a command-line parameter, which can throw the memory performance way off. Though it's a little tricky to set up initially, the microbenchmark in benchmarks/unittests/global_reads is pretty handy for this testing; a rough sketch of that style of test is below.
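For reference, here's a minimal sketch of the kind of streaming global-read test that global_reads performs. This is NOT the actual benchmark source; the array size, launch shape, and timing setup are my assumptions (and error checking is omitted for brevity):

#include <cstdio>
#include <cuda_runtime.h>

__global__ void global_reads(const float *in, float *out, int n) {
    // Coalesced streaming reads: each thread strides across the array
    // and accumulates, so the kernel is bound by global read bandwidth.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float sum = 0.0f;
    for (int i = idx; i < n; i += gridDim.x * blockDim.x)
        sum += in[i];
    out[idx] = sum;  // write once so the reads aren't optimized away
}

int main() {
    const int n = 1 << 24;                  // 64MB of floats: bigger than L2
    const int threads = 256, blocks = 1024;
    float *in, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, threads * blocks * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    global_reads<<<blocks, threads>>>(in, out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbps = (double)n * sizeof(float) / (ms * 1e-3) / 1e9;
    // Compare against the 192.4GB/s theoretical number above; a big gap
    // usually means a missed configuration parameter.
    printf("Effective read bandwidth: %.2f GB/s\n", gbps);
    return 0;
}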
Hope this helps,
Joel