Sandeep Raman
Oct 23, 2025, 6:20:48 AM
to stressapptest-discuss
Hi,
I’ve been running stressapptest on two systems and observed a large performance difference. Could you please help me understand what factors might contribute to this?
System details:
Sys1: 288 cores, 1 TB memory, 16 memory channels, 2 NUMA nodes
Sys2: 128 cores, 512 GB memory, 16 memory channels, 2 NUMA nodes
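(For reference, I collected the topology details above with standard Linux tools, roughly as below; happy to attach the full dumps if useful. The dmidecode output should also show whether the two systems run their DIMMs at different configured speeds, which relates to question (c) further down.)

  lscpu                      # core count, sockets, NUMA nodes
  numactl --hardware         # NUMA node sizes and inter-node distances
  sudo dmidecode -t memory   # DIMM population and configured memory speed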
Test setup:
stressapptest command: stressapptest -m 288 -C 10 -s 200 --local_numa
Both systems run the same OS, kernel, stressapptest build and BIOS config
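(To check the binding myself, I was planning to snapshot per-thread CPU placement mid-run with something like the following, assuming a standard Linux /proc environment; <pid> here stands for the stressapptest process ID. But I'd like to confirm what the intended behaviour of --local_numa actually is first.)

  ps -eLo pid,tid,psr,comm | grep stressapptest   # which CPU each thread last ran on
  taskset -cp <pid>                               # affinity mask of the main process
  numastat -p <pid>                               # per-NUMA-node memory allocation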
Results:
Sys1: ~212,970 MB/s
Sys2: ~86,230 MB/s
Given that both systems have the same number of memory channels and the same NUMA layout, I expected the results to be much closer, if not equal. Notably, the observed bandwidth ratio (~212,970 / ~86,230 ≈ 2.47x) tracks the core-count ratio (288 / 128 = 2.25x) far more closely than the identical memory topology would suggest.
Could you please clarify:
a. How does stressapptest determine or bind thread affinity (especially with --local_numa)?
b. Does the measured bandwidth scale primarily with total memory size, number of threads, or CPU/memory topology?
c. Could differences in memory frequency or CPU architecture (e.g., core generation, SMT, cache design) explain this delta?
d. Any recommended options to make the comparison between systems more topology-independent?
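Regarding (d), would something like the run below be a reasonable normalization? This is just a guess on my part: pin both systems to the same thread count and the same tested footprint, e.g. 128 threads to match Sys2's core count and 262144 MB (256 GB) so both systems exercise an identical amount of memory.

  stressapptest -M 262144 -m 128 -C 10 -s 200 --local_numa

Or is there a better-established way to make such cross-system comparisons?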
Thanks.