Low performance on my workstation


edwi...@gmail.com

Feb 14, 2017, 7:39:16 AM
to LibBi Users
Hi,

At work I have a workstation with 24 cores, but weirdly enough sampling performs much worse on it than on my MacBook Pro with 8 cores (both run Linux). I know from previous experience that the MacBook Pro performs slightly better than the workstation on a per-core basis, but the workstation should still outperform it due to the higher number of cores. Instead it seems to run about 5 times slower. CPU usage is 100% in both cases.

Anyone experienced similar problems? Any tips on possible ways to improve this? Can I tweak the compile time flags?

Workstation CPU:
model name      : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz

Cheers, Edwin

Lawrence Murray

Feb 14, 2017, 7:48:05 AM
to libbi...@googlegroups.com

Hi Edwin,

A few things to consider. Firstly, more cores is not always better if there is insufficient work to distribute between them, given that there is always some overhead with increasing the number of cores. You may find that, even if you have 24 cores, using only 12 of them, say, is still faster. Or, use the opportunity of having more cores to run with more particles, if that's something that will help you.

Secondly, do you have 24 physical cores or has this been doubled by hyperthreading? In my experience, it's worth setting the number of threads to the number of physical cores, which is half the number of cores reported by Linux if you have hyperthreading enabled.
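To check the physical core count on Linux, one option is to count the unique (physical id, core id) pairs in /proc/cpuinfo. The following is only a rough sketch: the function names are made up for illustration, and it assumes an x86 Linux system whose /proc/cpuinfo contains those two fields. (The E5-2620 v2 is a 6-core, 12-thread part, so 24 reported cores would be consistent with two sockets with hyperthreading enabled, though that is a guess about this particular machine.)

```python
from pathlib import Path


def count_physical_cores(cpuinfo_text):
    """Count unique (physical id, core id) pairs in /proc/cpuinfo-style text."""
    cores = set()
    physical_id = core_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "physical id":
            physical_id = value
        elif key == "core id":
            core_id = value
        elif not line.strip():
            # A blank line ends the record for one logical CPU.
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
    # Handle a final record that is not followed by a blank line.
    if physical_id is not None and core_id is not None:
        cores.add((physical_id, core_id))
    return len(cores)


def physical_cores_on_this_machine(path="/proc/cpuinfo"):
    """Hypothetical helper: read the cpuinfo file and count physical cores."""
    return count_physical_cores(Path(path).read_text())
```

Hyperthreaded siblings share the same (physical id, core id) pair, so they collapse to one entry in the set. The result of `physical_cores_on_this_machine()` is then a reasonable starting value for `--nthreads`.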

Thirdly, make sure you have the right command-line options set if you're at the point where performance matters. Good rule of thumb:

--disable-assert --enable-avx --nthreads 12

Adjust the number of threads accordingly, and for an older CPU, use --enable-sse instead of --enable-avx.

It can take a little tuning to get the optimal configuration. Cache sizes and memory speed can matter a lot too, depending on the model.
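The tuning can be done by timing the same run at a few thread counts. Below is a minimal sketch of that, not a definitive recipe: the helper names, the `Model.bi` file name and the thread counts are all placeholders, and a real `libbi sample` call will usually need further options (observation file, number of samples and so on), which can be passed through `extra_args`.

```python
import subprocess
import time


def build_command(model_file, nthreads, simd_flag="--enable-avx", extra_args=()):
    """Assemble a `libbi sample` command line using the flags from this thread."""
    return [
        "libbi", "sample",
        "--model-file", model_file,
        "--disable-assert",
        simd_flag,                      # use "--enable-sse" on an older CPU
        "--nthreads", str(nthreads),
        *extra_args,                    # any other options your run needs
    ]


def time_command(command):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


def sweep(model_file, thread_counts=(6, 12, 24), **kwargs):
    """Time one run per thread count; returns {nthreads: seconds}."""
    return {
        n: time_command(build_command(model_file, n, **kwargs))
        for n in thread_counts
    }
```

Calling `sweep("Model.bi")` would run the sampler three times and return the timings. The first run may include model compilation, so it is probably worth doing one warm-up run before comparing numbers.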

Finally, if you have the Intel C++ compiler available, use it. Its threading overhead seems significantly less than that of gcc.

Cheers,

Lawrence


Sebastian Funk

Feb 14, 2017, 8:46:42 AM
to libbi...@googlegroups.com, Lawrence Murray
On that note and just out of interest, if `nthreads>1` but `nparticles==1` does any parallelisation occur?

edwi...@gmail.com

Feb 14, 2017, 9:00:46 AM
to LibBi Users, lawrenc...@it.uu.se
Reducing the number of threads to 12 seems to have done the trick.


On Tuesday, 14 February 2017 13:46:42 UTC, Sebastian Funk wrote:
On that note and just out of interest, if `nthreads>1` but `nparticles==1` does any parallelisation occur?

Yes it does.
 
BW, Edwin

Lawrence Murray

Feb 14, 2017, 11:15:13 AM
to libbi...@googlegroups.com

On that note and just out of interest, if `nthreads>1` but `nparticles==1` does any parallelisation occur?

Yes it does.
 

LibBi does still attempt to parallelise by running multiple actions together when they will not interfere with each other.