Understanding performance difference between two systems running stressapptest

Sandeep Raman

Oct 23, 2025, 6:20:48 AM
to stressapptest-discuss
Hi,

I’ve been running stressapptest on two systems and observed a large performance difference. Could you please help me understand what factors might contribute to this?

System details:
Sys1: 288 cores, 1 TB memory, 16 memory channels, 2 NUMA nodes
Sys2: 128 cores, 512 GB memory, 16 memory channels, 2 NUMA nodes

Test setup:

stressapptest command: stressapptest -m 288 -C 10 -s 200 --local_numa
Both systems run the same OS, kernel, stressapptest build and BIOS config

Results:

Sys1: ~212,970 MB/s
Sys2: ~86,230 MB/s

Given that both systems have the same number of memory channels and the same NUMA configuration, I expected the results to be much closer, if not equal.
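To put the gap in perspective, the reported figures can be normalized by core count, memory channel count, and thread count. The short Python sketch below does only that arithmetic. The numbers are copied from this post; the dictionary layout and the `normalize` helper are illustrative assumptions, not part of stressapptest.

```python
# Figures copied from the post above; the normalization itself is only
# an illustrative way of comparing the two systems.
systems = {
    "Sys1": {"cores": 288, "channels": 16, "mb_per_s": 212_970},
    "Sys2": {"cores": 128, "channels": 16, "mb_per_s": 86_230},
}

# The same command (-m 288) was used on both systems.
THREADS = 288


def normalize(spec, threads=THREADS):
    """Return bandwidth per core, per memory channel, and per copy thread."""
    return {
        "per_core": spec["mb_per_s"] / spec["cores"],
        "per_channel": spec["mb_per_s"] / spec["channels"],
        "per_thread": spec["mb_per_s"] / threads,
    }


normalized = {name: normalize(spec) for name, spec in systems.items()}

# Ratio of the measured bandwidths versus the ratio of the core counts.
bandwidth_ratio = systems["Sys1"]["mb_per_s"] / systems["Sys2"]["mb_per_s"]
core_ratio = systems["Sys1"]["cores"] / systems["Sys2"]["cores"]

for name, values in normalized.items():
    print(
        f"{name}: {values['per_core']:.1f} MB/s per core, "
        f"{values['per_channel']:.1f} MB/s per channel, "
        f"{values['per_thread']:.1f} MB/s per thread"
    )
print(f"bandwidth ratio: {bandwidth_ratio:.2f}, core ratio: {core_ratio:.2f}")
```

The bandwidth ratio (about 2.47) is close to the core ratio (2.25), while the per-channel figures differ by the same factor of about 2.47. Note also that with -m 288 on both systems, Sys2 runs more copy threads than it has cores, assuming "cores" here means schedulable CPUs.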

Could you please clarify:

a. How does stressapptest determine or bind thread affinity (especially with --local_numa)?
b. Does the measured bandwidth scale primarily with total memory size, number of threads, or CPU/memory topology?
c. Could differences in memory frequency or CPU architecture (e.g., core generation, SMT, cache design) explain this delta?
d. Any recommended options to make the comparison between systems more topology-independent?

Thanks.