PandasBiogeme only using half of cores/threads


Mikkel Thorhauge

Nov 24, 2020, 9:47:37 AM
to Biogeme

Dear Michel (and forum-members)

We have recently upgraded to PandasBiogeme from PythonBiogeme. We have a calculation machine with a total of 72 threads, but only half of those are used during estimation (see first attached screen dump below).

The machine is a 2-CPU machine with 18 cores each and hyperthreading, thus a total of 72 threads. It runs Windows Server 2019, and Biogeme (v. 3.2.6) is installed through Anaconda. Using the former version (PythonBiogeme) on the same machine, we could utilize all 72 threads.

We have tried:

- Defining numberOfThreads=None as well as setting it to values ranging between 1 and 144, but for all values above 36 only 36 threads are used.

- Checking the documentation for settings that could cause this (if I recall correctly, PythonBiogeme had a setting that defined the share of cores to use, but that no longer seems to be the case).

- Estimating different models on different datasets, still only half of the threads are used.

- Executing models through both Anaconda Prompt and Jupyter Notebook.

- Checking that Python recognizes all 72 threads (os.cpu_count() detects 72 “cores”), and running a stress-test in Python (not using the biogeme-package), which indeed utilizes all 72 threads.

- Running Biogeme on other calculation machines. We have another 2-CPU machine with 72 threads, which also only utilizes half of these for estimation. We thought maybe only 1 CPU was used or only physical cores were used, but on other calculation machines with 32 threads (with 2 CPUs and hyperthreading) all of them are used during estimation (see second attached screen-dump below).
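
The CPU-count check from the list above can be sketched as follows. This is only a diagnostic sketch, not part of Biogeme: `cpu_report` and `windows_processor_groups` are hypothetical helper names. The Windows branch rests on an assumption that may or may not be relevant here: Windows organises logical processors into processor groups of at most 64, so the sketch reports how many logical processors each group holds, using the Win32 calls `GetActiveProcessorGroupCount` and `GetActiveProcessorCount`. On other systems it reports the scheduler affinity instead, where available.

```python
import ctypes
import multiprocessing
import os
import sys


def windows_processor_groups():
    """Return logical-processor counts per Windows processor group.

    Returns None on non-Windows platforms.
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32")
    kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort
    kernel32.GetActiveProcessorCount.argtypes = [ctypes.c_ushort]
    kernel32.GetActiveProcessorCount.restype = ctypes.c_uint32
    n_groups = kernel32.GetActiveProcessorGroupCount()
    return [kernel32.GetActiveProcessorCount(g) for g in range(n_groups)]


def cpu_report():
    """Collect the CPU counts that Python itself can see."""
    report = {
        "os_cpu_count": os.cpu_count(),
        "multiprocessing_cpu_count": multiprocessing.cpu_count(),
        "affinity": None,  # only filled in where the OS exposes it
        "processor_groups": windows_processor_groups(),
    }
    if hasattr(os, "sched_getaffinity"):
        # Number of CPUs the current process is allowed to run on.
        report["affinity"] = len(os.sched_getaffinity(0))
    return report


if __name__ == "__main__":
    for key, value in cpu_report().items():
        print(f"{key}: {value}")
```

If a 72-thread Windows machine reports two groups of 36, that would be worth comparing against the 36 threads seen during estimation. The sketch itself does not establish the cause.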


The model runs fine, but only utilizes half of the threads available.

We have been in dialogue with our IT team throughout the process, and they cannot identify the cause either.

Hope you can help.

Thanks,

Mikkel

 

PS. It would be nice if multithreading could also be enabled during simulation :)


B07.PNG

B05.PNG

Bierlaire Michel

Nov 25, 2020, 1:29:40 PM
to mthor...@gmail.com, Bierlaire Michel, Biogeme
Dear Mikkel,


On 24 Nov 2020, at 15:46, Mikkel Thorhauge <mthor...@gmail.com> wrote:

> Dear Michel (and forum-members)
>
> We have recently upgraded to PandasBiogeme from PythonBiogeme. We have a calculation machine with a total of 72 threads, but only half of those are used during estimation (see first attached screen dump below).
>
> The machine is a 2-CPU machine with 18 cores each and hyperthreading, thus a total of 72 threads. It runs Windows Server 2019, and Biogeme (v. 3.2.6) is installed through Anaconda. Using the former version (PythonBiogeme) on the same machine, we could utilize all 72 threads.


The implementation of multi-threading is identical in the new biogeme, as it is implemented in C++. 

 

> We have tried:
>
> - Defining numberOfThreads=None as well as setting it to values ranging between 1 and 144, but for all values above 36 only 36 threads are used.


This is strange. There is no upper bound set on the number of threads. At least, no explicit one that I am aware of.
Note that I have never used more than 24 threads myself. But I’ll try.

> - Checking the documentation for potential settings that could cause this (if I recall correctly there was a setting in PythonBiogeme which defined the share of cores that was used, but it does not seem like the case anymore).

Here, if you put “None”, it uses the number returned by multiprocessing.cpu_count(). If you provide a value, it uses the number provided. 

> - Estimating different models on different datasets, still only half of the threads are used.
>
> - Executing models through both Anaconda Prompt and Jupyter Notebook.
>
> - Checking that Python recognizes all 72 threads (os.cpu_count() detects 72 “cores”), and running a stress-test in Python (not using the biogeme-package), which indeed utilizes all 72 threads.
>
> - Running Biogeme on other calculation machines. We have another 2-CPU machine with 72 threads, which also only utilizes half of these for estimation. We thought maybe only 1 CPU was used or only physical cores were used, but on other calculation machines with 32 threads (with 2 CPUs and hyperthreading) all of them are used during estimation (see second attached screen-dump below).


You define “numberOfThreads” at the creation of the Biogeme object, right? 
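
The rule described above for `numberOfThreads` can be written as a short sketch. `resolve_thread_count` is a hypothetical name used for illustration; this is not Biogeme's actual code, only the behaviour as stated in this reply.

```python
import multiprocessing


def resolve_thread_count(number_of_threads=None):
    """Hypothetical helper: pick the thread count as described in the reply.

    None  -> use the number returned by multiprocessing.cpu_count()
    value -> use the value exactly as provided
    """
    if number_of_threads is None:
        return multiprocessing.cpu_count()
    return number_of_threads


if __name__ == "__main__":
    print("default:", resolve_thread_count())
    print("explicit:", resolve_thread_count(72))
```

Under this rule, a request for 72 threads is passed through unchanged, so any limit at 36 would have to come from somewhere after this step.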



> The model runs fine, but only utilizes half of the threads available.
>
> We have been in dialog with our IT team along the process, and they cannot identify what is causing this either.
>
> Hope you can help.
>
> Thanks,
>
> Mikkel
>
> PS. It would be nice if multithreading could also be enabled during simulation :)


It is on my to do list. But, I have negative time available for the moment ;-)




Mikkel Thorhauge

Nov 26, 2020, 7:25:03 AM
to Biogeme
Yes, we define the number of threads in the biogeme-object:
biogeme = bio.BIOGEME(database, logprob, numberOfThreads=72, numberOfDraws=1000, seed=140382)

It would be great to know whether you (or someone else) experience the same behaviour, so that we can figure out whether this is a local issue with our machines and, if so, what is causing it.

Thanks,
Mikkel

Bierlaire Michel

Nov 30, 2020, 12:18:53 PM
to mthor...@gmail.com, Bierlaire Michel, Biogeme

On 26 Nov 2020, at 13:22, Mikkel Thorhauge <mthor...@gmail.com> wrote:

> Yes, we define the number of threads in the biogeme-object:
> biogeme = bio.BIOGEME(database, logprob, numberOfThreads=72, numberOfDraws=1000, seed=140382)

That’s indeed the correct way to do it.


> It would be great to know if you (or someone else) experience the same to figure out if it is a local issue with our machines, and if so, identify what is causing this.

Well, I am afraid I cannot help you directly. 
What is strange is that it was working with PythonBiogeme. Are you actually sure that it was working before? 
I am using the exact same multithreading code in C++. 

Best,

Michel


Mikkel Thorhauge

Dec 1, 2020, 3:23:33 AM
to Biogeme
Yes, I am pretty sure it worked with all 72 threads in PythonBiogeme. At least we could see full load on all 72 threads using htop during estimation.

Do you have any suggestions for things to try in order to identify the cause of this issue? My best idea at this point is to do a fresh install of Linux and re-install PandasBiogeme to mimic our former setup as closely as possible. At least that would determine whether the switch to Windows is causing something somewhere in the process.

Best,
Mikkel

Bierlaire Michel

Dec 1, 2020, 3:56:12 AM
to mthor...@gmail.com, Bierlaire Michel, Biogeme

On 30 Nov 2020, at 20:18, Mikkel Thorhauge <mthor...@gmail.com> wrote:

> Yes, I am pretty sure it worked with all 72 threads in PythonBiogeme. At least we could see full load on all 72 threads using htop during estimation.
>
> Do you have any suggestion of things to try out to identify the cause of this issue? My best idea at this point, is to do a fresh install of Linux and re-install PandasBiogeme to mimic our former setup as much as possible. At least that would determine if the switch to Windows is causing something somewhere in the process.

If I understand correctly, you are running Windows. This means that you have installed the binaries that are distributed with Biogeme. These binaries were generated on a virtual machine running on my Mac, which may be the cause of the problem.
I would suggest that you reinstall directly from the source code:

python -m pip install biogeme-3.2.6.tar.gz

Note that it is easy to do on Linux, but difficult to do on Windows.
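
After reinstalling, it may help to confirm which Biogeme version Python actually picks up. Below is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+). `installed_version` is a hypothetical helper name, and the version string alone does not tell you whether the package was built from source or installed from a binary.

```python
from importlib import metadata


def installed_version(package="biogeme"):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("biogeme is not installed in this environment")
    else:
        print(f"biogeme version: {version}")
```

Running this from the same Anaconda Prompt or notebook kernel used for estimation checks the environment that matters.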

Michel


Mikkel Thorhauge

Dec 2, 2020, 1:22:27 AM
to Biogeme
Ok, will try to re-install from the source code once the machine has an idle period.

Will give an update on how it goes :)

Thanks,
Mikkel

Mikkel Thorhauge

Feb 26, 2021, 4:26:19 AM
to Biogeme

Hi Michel and all

I just wanted to give everybody a quick update on this in case you are facing similar issues.

 

We were unsuccessful in compiling the source code on Windows. Instead, we used Hyper-V to virtualize a Linux environment, and when running models there we can indeed utilize all cores on the machine (instead of only half of them), so the issue seems to be related to Biogeme not being compiled for Windows (as you mentioned, Michel).

Using Hyper-V, we lose some performance to virtualization; nevertheless, estimation is dramatically faster on Linux than on Windows. In our still limited testing, a model that took approx. 14 days to estimate on Windows now takes only about a day in our Linux virtual machine. Of course, we can now utilize all cores, but the reduction in estimation time is far greater than can be attributed to the increase in the number of usable cores alone.
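
To make the comparison concrete, here is the back-of-the-envelope arithmetic. It assumes 36 usable threads on Windows and 72 in the Linux virtual machine, and ideal (linear) scaling with thread count, which is optimistic. The 14-day and 1-day figures are the approximate ones quoted above, and `speedup_breakdown` is a hypothetical helper name.

```python
def speedup_breakdown(days_before, days_after, threads_before, threads_after):
    """Split an observed speedup into a thread-count part and a residual part.

    Assumes ideal linear scaling with the number of threads.
    """
    observed = days_before / days_after
    from_threads = threads_after / threads_before
    residual = observed / from_threads
    return observed, from_threads, residual


observed, from_threads, residual = speedup_breakdown(14, 1, 36, 72)
print(f"observed speedup:         {observed:.1f}x")      # 14.0x
print(f"explained by threads:     {from_threads:.1f}x")  # 2.0x
print(f"not explained by threads: {residual:.1f}x")      # 7.0x
```

Under these assumptions, doubling the threads explains at most a factor of 2, leaving roughly a factor of 7 that must come from something else.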

 

Cheers,

Mikkel

Bierlaire Michel

Feb 27, 2021, 3:34:01 AM
to mthor...@gmail.com, Bierlaire Michel, Biogeme
Dear Mikkel,

Thank you for the update. 

I reached the conclusion a while ago that Windows should be avoided by all means if you want to do serious stuff.
I am running small models on my Mac, and large models on a Linux server, without any problem.

Cheers,

Michel


