superlinear speedup?


Xinyan Yang

Nov 13, 2023, 7:55:04 PM
to hoomd-users
Hi,

I did a strong scaling test for the MPI version of HOOMD 4.3.0. The strong scaling efficiency is defined as T1/(N*TN)*100%, where T1 is the elapsed time when I ran the HPMC simulation on 1 CPU core and TN is the elapsed time on N CPU cores. The system is the same in every run; I only changed the number of cores. I expected the efficiency to decrease as I increased the number of cores used in the simulation. However, I got an increasing trend, and the efficiency is >100% (see the screenshot below). I don't think system noise could make such a difference. Does HOOMD 4.3.0 do anything that would give a superlinear speedup?
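For reference, the efficiency defined above can be written out as a minimal Python sketch. The function name and the timing values are hypothetical placeholders for illustration, not measurements from this test.

```python
def strong_scaling_efficiency(t1, tn, n):
    """Return the strong scaling efficiency T1 / (N * TN) as a percentage.

    t1: elapsed time on 1 core, tn: elapsed time on n cores (same units).
    """
    if n < 1 or t1 <= 0 or tn <= 0:
        raise ValueError("n must be >= 1 and times must be positive")
    return 100.0 * t1 / (n * tn)


# Hypothetical wall-clock times in seconds, keyed by core count.
# These are placeholders, not the numbers from the screenshot.
timings = {1: 1000.0, 2: 520.0, 4: 240.0, 8: 110.0}

t1 = timings[1]
for n, tn in sorted(timings.items()):
    eff = strong_scaling_efficiency(t1, tn, n)
    print(f"N={n:2d}  efficiency={eff:6.1f}%")
```

Any value above 100% means the N-core run finished in less than T1/N, which is what "superlinear" refers to here.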

Screen Shot 2023-11-13 at 6.51.49 PM.png

Best,
Xinyan

Lourens Veen

Nov 14, 2023, 3:04:04 AM
to hoomd...@googlegroups.com
Hi Xinyan,

If this is a small simulation, then you may be seeing a cache effect. As you add more cores, you're also adding more L1 cache, which is fast memory close to each core. If your system is, say, 10x the size of a single core's worth of L1 cache, then with 9 cores almost all of it will fit in L1, while with a single core most of it will be in slower memory, L2 or even L3 probably. The Wikipedia article is pretty good: https://en.wikipedia.org/wiki/Cache_hierarchy
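The arithmetic behind that example can be sketched as follows. This is a deliberately simplified model: it assumes the data is split evenly across cores and ignores ghost particles, other data competing for cache, and the L2/L3 levels. The 32 KiB cache size is an assumed, typical value, not one taken from this thread.

```python
def cached_fraction(total_bytes, n_cores, cache_bytes_per_core):
    """Fraction of each core's share of the data that fits in its own cache.

    Assumes the working set is divided evenly among the cores.
    """
    per_core_bytes = total_bytes / n_cores
    return min(1.0, cache_bytes_per_core / per_core_bytes)


# Assumed values for illustration only.
l1_bytes = 32 * 1024          # assumed L1 data cache per core
system_bytes = 10 * l1_bytes  # a system 10x the size of one core's L1

for n in (1, 2, 5, 9, 10):
    frac = cached_fraction(system_bytes, n, l1_bytes)
    print(f"{n:2d} cores: {100 * frac:5.1f}% of each core's data fits in L1")
```

With these assumptions, one core can hold only a tenth of the system in L1, while nine cores can together hold about nine tenths of it.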

Best,

Lourens


| Lourens Veen | Senior eScience Research Engineer Email: l.v...@esciencecenter.nl |
| Netherlands eScience Center | Science Park 402 | 1098 XH Amsterdam The Netherlands |


--
You received this message because you are subscribed to the Google Groups "hoomd-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hoomd-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hoomd-users/c09bbab0-9561-4658-b82e-adae8290d82dn%40googlegroups.com.

Joshua Anderson

Nov 14, 2023, 7:25:32 AM
to hoomd...@googlegroups.com
Xinyan,

Lourens is correct: the size of the system relative to the cache size can lead to superlinear speedups, as smaller working sets lead to higher cache hit ratios. I have seen this behavior myself in HOOMD.

There is another possible explanation specific to HPMC simulations. Are you measuring the simulation timesteps per second? Or reading the trial moves per second? 

In MPI domain decomposition, HPMC performs fewer trial moves per timestep as you increase the number of domains. Trial moves per second is a better performance metric in HPMC simulations, and it can be compared across MPI CPU and GPU jobs. I mention this briefly in a note at the end of this how-to: https://hoomd-blue.readthedocs.io/en/v4.3.0/howto/determine-the-most-efficient-device.html
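A minimal sketch of the difference between the two metrics, in plain Python. The function names and all numbers are hypothetical placeholders. In a real run the accepted and rejected counts would come from the HPMC integrator's move counters (for example its `translate_moves` and `rotate_moves` properties); how those are totalled across ranks is left as an assumption here.

```python
def trial_moves_per_second(accepted, rejected, wall_seconds):
    """Trial move rate: every attempted move counts, accepted or not."""
    return (accepted + rejected) / wall_seconds


def efficiency_from_rates(rate_1, rate_n, n):
    """Strong scaling efficiency (%) based on trial moves per second."""
    return 100.0 * rate_n / (n * rate_1)


# Hypothetical totals for the same number of timesteps (placeholders).
# The 8-rank run attempts fewer trial moves in total than the 1-rank run.
rate_1 = trial_moves_per_second(accepted=400_000, rejected=600_000, wall_seconds=100.0)
rate_8 = trial_moves_per_second(accepted=360_000, rejected=540_000, wall_seconds=10.0)

# Efficiency from elapsed time per timestep: T1 / (N * TN).
time_based = 100.0 * 100.0 / (8 * 10.0)
# Efficiency from trial moves per second.
move_based = efficiency_from_rates(rate_1, rate_8, 8)

print(f"time-based efficiency: {time_based:.1f}%")
print(f"move-based efficiency: {move_based:.1f}%")
```

In this made-up case the time-based figure is higher than the move-based one, because the 8-rank run did less work per timestep.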
------
Joshua A. Anderson, Ph.D.
Research Area Specialist, Chemical Engineering, University of Michigan


Xinyan Yang

Nov 14, 2023, 11:41:50 AM
to hoomd-users
I see. I did not measure the number of trial moves per second. I will have a look at it. Thank you!

Best,
Xinyan 
