estimation time

Hisham Haydar

Jun 9, 2024, 9:56:23 AM
to Biogeme
Dear Prof. Bierlaire,

My name is Hisham Haydar. I recently started my PhD and am working on estimating preferences in a labour supply model, under the assumption that labour supply is a discrete choice (similar to van Soest (1995)).

I am facing a problem of very long parameter estimation times in Biogeme: it takes 5-6 hours to estimate 60 parameters (I am including interactions for observed heterogeneity). I assume only 4 alternatives, and even without unobserved heterogeneity and SMLE, every estimation takes 5-6 hours.

I profiled the code and found that almost 100% of that time is spent in the function calculateLikelihoodAndDerivatives, specifically in the call to self.theC.calculateLikelihoodAndDerivatives, as you can see below. Do you have a suggestion for how I can shorten the time? I tried my best, but nothing worked. In Stata, the same specification takes less than 15 minutes. I am still new to the domain, so please excuse me if I am asking nonsense questions.

    1692/2    0.001    0.000    0.002    0.001 base_expressions.py:1216(change_init_values)
    846/1    0.001    0.000    0.001    0.001 base_expressions.py:1230(set_estimated_values)
     2538    0.001    0.000    0.001    0.000 base_expressions.py:1367(get_children)
      203 19718.979   97.138 19719.190   97.139 biogeme.py:1025(calculateLikelihoodAndDerivatives)

Specifically, at self.theC.calculateLikelihoodAndDerivatives(:

  1076                                                       raise ValueError(error_msg)
  1077         1         54.0     54.0      0.0          self._prepareDatabaseForFormula()
  1078                                          
  1079         1         52.0     52.0      0.0          g = np.empty(n)
  1080         1         55.0     55.0      0.0          h = np.empty([n, n])
  1081         1         11.0     11.0      0.0          bh = np.empty([n, n])
  1082         2  510175675.0    3e+08    100.0          f, g, h, bh = self.theC.calculateLikelihoodAndDerivatives(

Michel Bierlaire

Jun 9, 2024, 10:05:41 AM
to mr.hish...@gmail.com, Michel Bierlaire, Biogeme
It is normal that the software spends most of its time calculating the likelihood. But it is not normal that it takes so long.
One possibility is that the pre-compiled version of the package is not compatible with your hardware, which is likely to happen on Windows. If so, you may want to re-install it from source (see the instructions on the webpage).

One way to speed things up is to disable the calculation of the second derivatives.
In the biogeme.toml file, set this parameter to 0.0:

[SimpleBounds]
second_derivatives = 0.0

Now, in order to investigate more, I'd need the specification file and the data. You can send them in a zip file in a private email.

Michel Bierlaire
Transport and Mobility Laboratory
School of Architecture, Civil and Environmental Engineering
EPFL - Ecole Polytechnique Fédérale de Lausanne
http://transp-or.epfl.ch
http://people.epfl.ch/michel.bierlaire

Michel Bierlaire

Jun 10, 2024, 5:03:07 AM
to mr.hish...@gmail.com, Michel Bierlaire, Biogeme
Hisham,

Your code took 30 minutes to run on my laptop.
I then made the following modifications and was able to estimate the model in less than 3 minutes on the same laptop.

from biogeme.expressions import Variable, bioMultSum

# Before: Python's native sum over the interaction terms.
# alpha_sum = alpha_common + sum(alpha_dict[level] * Variable(level) for level in alpha_dict)
# After: a single bioMultSum over the same terms.
alpha_sum = bioMultSum([alpha_common] + [alpha_dict[level] * Variable(level) for level in alpha_dict])
# beta_sum = beta_common + sum(beta_dict[level] * Variable(level) for level in beta_dict)
beta_sum = bioMultSum([beta_common] + [beta_dict[level] * Variable(level) for level in beta_dict])
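
To illustrate why replacing sum with bioMultSum can matter, here is a minimal sketch using toy expression classes. These classes (Expr, Const, Plus, MultSum) are hypothetical and are not Biogeme's own. The thread does not describe Biogeme's internals, so the link to the observed speed-up is an assumption. The sketch only shows a general property of operator overloading: Python's sum adds terms one at a time, so it builds a chain of nested binary nodes, whereas a single n-ary node keeps the tree flat.

```python
# Toy expression classes (NOT Biogeme's real ones), used only to compare
# the tree built by Python's sum() with a single n-ary sum node.


def wrap(x):
    """Turn a plain number into a Const node; leave expressions untouched."""
    return x if isinstance(x, Expr) else Const(x)


class Expr:
    def __add__(self, other):
        return Plus(self, wrap(other))

    def __radd__(self, other):
        # sum() starts from the integer 0, so "0 + expr" ends up here.
        return Plus(wrap(other), self)


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def depth(self):
        return 1

    def evaluate(self):
        return self.value


class Plus(Expr):
    """Binary sum: what the overloaded + operator produces."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def depth(self):
        return 1 + max(self.left.depth(), self.right.depth())

    def evaluate(self):
        return self.left.evaluate() + self.right.evaluate()


class MultSum(Expr):
    """N-ary sum: one node holding all terms."""

    def __init__(self, terms):
        self.terms = [wrap(t) for t in terms]

    def depth(self):
        return 1 + max(t.depth() for t in self.terms)

    def evaluate(self):
        return sum(t.evaluate() for t in self.terms)


terms = [Const(float(i)) for i in range(1, 31)]  # 30 toy terms

nested = sum(terms)     # chain of 30 nested Plus nodes
flat = MultSum(terms)   # one node with 30 children

print("nested depth:", nested.depth())
print("flat depth:  ", flat.depth())
print("same value:  ", nested.evaluate() == flat.evaluate())
```

Both trees evaluate to the same number, but the nested one grows one level deeper with every added term. With many interaction terms, every evaluation has to walk that whole chain.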

Some advice:

- Avoid using Python's native "sum" operator. Use Biogeme's bioMultSum instead, which is optimized for fast calculation.
- Do not run your model in a notebook. I ran the model directly from Python: estimation time 2min56. Then I ran it from the notebook: 3min45.
- Rescale your data. The order of magnitude of the final parameters ranges from 10^-4 to 10^2. This generates numerical issues that slow down the algorithm and make the estimation less robust.
- Do not use Windows for complex models. Prefer Linux or macOS.
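
As a minimal sketch of the rescaling advice above: the variable names (income, hours), the values, and the scale factor of 1000 are hypothetical and do not come from the model in this thread. The idea is that dividing a variable by a constant multiplies its coefficient by the same constant, so the utility is unchanged while the parameter moves closer to order 1.

```python
import math

# Hypothetical observations: annual income in currency units, weekly hours.
rows = [
    {"income": 24_000.0, "hours": 20.0},
    {"income": 41_500.0, "hours": 38.0},
    {"income": 67_250.0, "hours": 45.0},
]

INCOME_SCALE = 1_000.0  # assumed scale: express income in thousands

# Rescaled copy of the data, with income in thousands.
scaled_rows = [
    {"income_k": r["income"] / INCOME_SCALE, "hours": r["hours"]}
    for r in rows
]

# A coefficient of order 10^-4 on raw income corresponds to a coefficient
# of order 10^-1 on income in thousands.
beta_raw = 0.0002
beta_scaled = beta_raw * INCOME_SCALE


def utility_unchanged(raw, scaled):
    """Check that beta * x is the same before and after rescaling."""
    return all(
        math.isclose(beta_raw * r["income"], beta_scaled * s["income_k"])
        for r, s in zip(raw, scaled)
    )


print("utility unchanged:", utility_unchanged(rows, scaled_rows))
```

The same division could be applied to the relevant columns of the data set before it is passed to Biogeme, choosing the scale so that the estimated parameters have comparable magnitudes.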

I hope this helps.

Michel