Dear Michel (and forum members),
We have recently upgraded from PythonBiogeme to PandasBiogeme. We have a calculation machine with a total of 72 threads, but only half of those are used during estimation (see the first attached screen dump below).
The machine has 2 CPUs with 18 cores each and hyperthreading, thus a total of 72 threads. It runs Windows Server 2019, and Biogeme (v. 3.2.6) is installed through Anaconda. Using the former version (PythonBiogeme) on the same machine, we could utilize all 72 threads.
We have tried:
- Defining numberOfThreads = None, as well as setting it to values ranging between 1 and 144; for all values above 36, only 36 threads are used.
- Checking the documentation for potential settings that could cause this (if I recall correctly, PythonBiogeme had a setting which defined the share of cores to be used, but that no longer seems to be the case).
- Estimating different models on different datasets; still, only half of the threads are used.
- Executing models through both the Anaconda Prompt and Jupyter Notebook.
- Checking that Python recognizes all 72 threads (os.cpu_count() detects 72 "cores"), and running a stress test in Python (not using the biogeme package), which indeed utilizes all 72 threads.
- Running Biogeme on other calculation machines. We have another 2-CPU machine with 72 threads, which also utilizes only half of them for estimation. We thought maybe only one CPU, or only the physical cores, was being used, but on other calculation machines with 32 threads (also with 2 CPUs and hyperthreading), all of them are used during estimation (see the second attached screen dump below).
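For reference, the core-count check and stress test mentioned above can be sketched roughly as follows (this is illustrative, not our exact script; the worker function and iteration count are arbitrary):

```python
import os
from multiprocessing import Pool


def burn(_):
    """Busy-loop so that each worker pins one logical core."""
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


if __name__ == "__main__":
    n = os.cpu_count()  # reported 72 on the machine described above
    print(f"os.cpu_count() = {n}")
    # One worker per logical core; Task Manager (or htop) should show
    # every thread under load while the pool is working.
    with Pool(processes=n) as pool:
        pool.map(burn, range(n))
```

Running this, we see all 72 threads loaded, so the limitation appears specific to Biogeme's estimation, not to Python's multiprocessing on this machine.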
The model runs fine but utilizes only half of the available threads.
We have been in dialogue with our IT team throughout the process, and they cannot identify the cause either.
Hope you can help.
Thanks,
Mikkel
PS. It would be nice if multithreading could also be enabled during simulation :)
On 24 Nov 2020, at 15:46, Mikkel Thorhauge <mthor...@gmail.com> wrote:
<B07.PNG>
<B05.PNG>
To view this discussion on the web visit https://groups.google.com/d/msgid/biogeme/9ff5f06b-6263-45f5-8951-1f539e192e5fn%40googlegroups.com.
On 26 Nov 2020, at 13:22, Mikkel Thorhauge <mthor...@gmail.com> wrote:
Yes, we define the number of threads in the biogeme object:
biogeme = bio.BIOGEME(database, logprob, numberOfThreads=72, numberOfDraws=1000, seed=140382)
It would be great to know if you (or someone else) experience the same, to figure out whether it is a local issue with our machines and, if so, what is causing it.
On 30 Nov 2020, at 20:18, Mikkel Thorhauge <mthor...@gmail.com> wrote:
Yes, I am pretty sure it worked with all 72 threads in PythonBiogeme. At least we could see full load on all 72 threads using htop during estimation.
Do you have any suggestions for things to try to identify the cause of this issue? My best idea at this point is to do a fresh install of Linux and reinstall PandasBiogeme to mimic our former setup as closely as possible. At least that would determine whether the switch to Windows is causing something somewhere in the process.
python -m pip install biogeme-3.2.6.tar.gz
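Before a full reinstall, one cheap intermediate step (a standard-library sketch, nothing Biogeme-specific; the helper name is ours) would be to dump the same environment summary on each machine and diff the output, to see exactly what differs between the machines that use all threads and those that use only half:

```python
import os
import platform
import sys


def env_summary():
    """Collect environment details worth comparing across machines."""
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        "logical_cpus": os.cpu_count(),
        # os.sched_getaffinity only exists on Linux; on Windows it is
        # absent, so we record None there.
        "usable_cpus": (len(os.sched_getaffinity(0))
                        if hasattr(os, "sched_getaffinity") else None),
    }


if __name__ == "__main__":
    for key, value in env_summary().items():
        print(f"{key}: {value}")
```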
Hi Michel and all
I just wanted to give everybody a quick update on this in case you are facing similar issues.
We were unsuccessful in compiling the source code on Windows. Instead, we used Hyper-V to virtualize a Linux environment, and when running models there we can indeed utilize all cores (instead of only half of them), so the issue seems to be related to Biogeme not being compiled for Windows (as you mentioned, Michel).
Although we lose some performance to virtualization under Hyper-V, estimation speed nevertheless seems to increase tremendously in Linux compared to Windows. In our (still limited) testing, a model which took approx. 14 days to estimate in Windows now takes only about a day in our Linux virtual machine. Of course, we can now utilize all cores, but the reduction in estimation time seems significantly greater than what can be attributed to the increase in the number of usable cores alone.
Cheers,
Mikkel