estimation time

Hisham Haydar

Jun 9, 2024, 9:56:23 AM

to Biogeme

Dear Prof. Bierlaire,

My name is Hisham Haydar. I recently started my PhD and am working on estimating preferences in a labour supply model, where I assume that labour supply follows a discrete choice (similar to van Soest (1995)).

I am facing a problem of very long estimation times with Biogeme: it takes 5-6 hours to estimate 60 parameters (the number is large because I include interactions for observed heterogeneity). I assume only 4 alternatives, and even without unobserved heterogeneity and SMLE, every estimation takes 5-6 hours.

I profiled the code and found that almost 100% of that time is spent in the function calculateLikelihoodAndDerivatives, specifically in the call to self.theC.calculateLikelihoodAndDerivatives, as you can see below. Do you have a suggestion for how I can shorten the time? I tried my best, but nothing worked. In Stata, the same specification takes less than 15 minutes. I am still new to the domain, so please excuse me if I am asking nonsense questions.
1692/2   0.001      0.000   0.002      0.001   base_expressions.py:1216(change_init_values)
846/1    0.001      0.000   0.001      0.001   base_expressions.py:1230(set_estimated_values)
2538     0.001      0.000   0.001      0.000   base_expressions.py:1367(get_children)
203      19718.979  97.138  19719.190  97.139  biogeme.py:1025(calculateLikelihoodAndDerivatives)

Specifically, at the call to self.theC.calculateLikelihoodAndDerivatives:

1076                                   raise ValueError(error_msg)
1077  1  54.0         54.0   0.0       self._prepareDatabaseForFormula()
1078
1079  1  52.0         52.0   0.0       g = np.empty(n)
1080  1  55.0         55.0   0.0       h = np.empty([n, n])
1081  1  11.0         11.0   0.0       bh = np.empty([n, n])
1082  2  510175675.0  3e+08  100.0     f, g, h, bh = self.theC.calculateLikelihoodAndDerivatives(
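
For readers who want to produce a similar function-level profile, here is a minimal sketch using only Python's standard cProfile and pstats modules. The functions slow_likelihood and estimate are hypothetical stand-ins for an expensive likelihood evaluation and an estimation run; they are not Biogeme code, and in practice the real estimation call would be passed to profile_call instead.

```python
import cProfile
import io
import pstats


def slow_likelihood(n):
    """Hypothetical stand-in for an expensive likelihood evaluation."""
    return sum(i * i for i in range(n))


def estimate():
    """Hypothetical stand-in for an estimation run that evaluates the likelihood repeatedly."""
    return [slow_likelihood(20_000) for _ in range(5)]


def profile_call(func):
    """Run func under cProfile; return its result and a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats()
    return result, buffer.getvalue()


result, report = profile_call(estimate)
print(report)
```

The columns of the report (ncalls, tottime, percall, cumtime, percall, filename:lineno(function)) follow the same layout as the first four profile lines above.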

Michel Bierlaire

Jun 9, 2024, 10:05:41 AM

to mr.hish...@gmail.com, Michel Bierlaire, Biogeme

It is normal that the software spends most of its time calculating the likelihood. But it is not normal that it takes so long.
One possibility is that the pre-compiled version of the package is not consistent with your hardware. This is likely to happen on Windows. If so, you may want to re-install it from source (see the instructions on the webpage).

One way to speed up the estimation is to disable the calculation of the second derivatives.
In the biogeme.toml file, set this parameter to 0.0:

[SimpleBounds]
second_derivatives = 0.0

Now, in order to investigate more, I'd need the specification file and the data. You can send them in a zip file in a private email.

Michel Bierlaire
Transport and Mobility Laboratory
School of Architecture, Civil and Environmental Engineering
EPFL - Ecole Polytechnique Fédérale de Lausanne
http://transp-or.epfl.ch
http://people.epfl.ch/michel.bierlaire

Michel Bierlaire

Jun 10, 2024, 5:03:07 AM

to mr.hish...@gmail.com, Michel Bierlaire, Biogeme

Hisham,

Your code took 30 minutes to run on my laptop.
I then made the following modifications and was able to estimate the model in less than 3 minutes on the same laptop.

#alpha_sum = alpha_common + sum(alpha_dict[level] * Variable(level) for level in alpha_dict)
alpha_sum = bioMultSum([alpha_common] + [alpha_dict[level] * Variable(level) for level in alpha_dict])
#beta_sum = beta_common + sum(beta_dict[level] * Variable(level) for level in beta_dict)
beta_sum = bioMultSum([beta_common] + [beta_dict[level] * Variable(level) for level in beta_dict])
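
One way to picture the difference between the two formulations is the shape of the expression that gets built. The sketch below is a toy model written for this illustration only: Node, wrap and flat_sum are hypothetical names and do not reproduce Biogeme's actual classes or internals. It shows that Python's built-in sum chains binary additions into a deeply nested expression, whereas a single n-ary sum node keeps the expression flat.

```python
class Node:
    """Toy expression node (hypothetical; not Biogeme's actual expression class)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = tuple(children)

    def __add__(self, other):
        # Binary addition: a new node with exactly two children.
        return Node("+", (self, wrap(other)))

    def __radd__(self, other):
        # Called for `0 + node`, which is how the built-in sum() starts.
        return Node("+", (wrap(other), self))

    def depth(self):
        """Number of levels in the expression tree rooted at this node."""
        return 1 + max((child.depth() for child in self.children), default=0)


def wrap(value):
    """Turn plain numbers into leaf nodes; leave nodes unchanged."""
    return value if isinstance(value, Node) else Node(repr(value))


def flat_sum(terms):
    """Toy analogue of an n-ary sum: one node holding all the terms."""
    return Node("multsum", [wrap(term) for term in terms])


terms = [Node(f"term_{i}") for i in range(60)]
nested = sum(terms)      # 0 + term_0 + term_1 + ... as chained binary nodes
flat = flat_sum(terms)   # a single node with 60 children

print("depth with built-in sum:", nested.depth())
print("depth with flat n-ary sum:", flat.depth())
```

In this toy model the nesting depth grows with the number of terms when the built-in sum is used, while the flat version stays at two levels regardless of how many terms are added.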

Some advice:

- Avoid using the native "sum" operator of Python. Use the bioMultSum of Biogeme instead, which is optimized for fast calculation.
- Do not run your model in a notebook. I ran the model directly from Python: estimation time 2min56. Then I ran it from the notebook: 3min45.
- Rescale your data. The order of magnitude of the final parameters ranges from 10^-4 to 10^2. This generates numerical issues that slow down the algorithm and make the estimation less robust.
- Do not use Windows for complex models. Prefer Linux or macOS.
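
To illustrate the rescaling advice, here is a small sketch in plain Python. All names and numbers (income, beta_raw, scale) are hypothetical and are not taken from the model discussed in this thread. The idea is that dividing a variable by a constant multiplies its coefficient by the same constant, so the utility contribution is unchanged while the parameter moves closer to order 1.

```python
import math


def rescale(values, factor):
    """Divide every observation by a constant factor."""
    return [value / factor for value in values]


# Hypothetical raw variable and a coefficient of order 10^-4 (assumed values).
income = [18_000.0, 42_500.0, 73_000.0]
beta_raw = 2.0e-4

# Express the variable in units of 10,000 instead of single units.
scale = 10_000.0
income_scaled = rescale(income, scale)

# The coefficient that yields the same utility is `scale` times larger.
beta_scaled = beta_raw * scale

for raw, scaled in zip(income, income_scaled):
    # Both products give the same contribution to the utility.
    print(beta_raw * raw, beta_scaled * scaled)
```

The same reasoning applies in the other direction: a variable whose coefficient comes out around 10^2 can be multiplied by a constant so that its coefficient shrinks accordingly.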

I hope this helps.

Michel
