Dear Prof. Bierlaire,
I appear to have a numerical issue with the estimated standard error of sigma in models built with ll.loglikelihoodregression.
My questions are as follows:
1. Do you have any idea what the root cause of the issue described below might be?
2. Do you have any tips for preventing such numerical issues?
3. Would you recommend working around the issue, e.g. by disregarding the implausible (non-robust) standard error, reporting only robust standard errors, and/or using only the scipy optimization algorithm?
The issue:
I have built a model of monetary expenditures which combines an estimation of the total household budget and of the fraction of this budget for each expenditure category.
In addition, I have built separate (simple) models of the energy, car, and housing expenditures, each using a single ll.loglikelihoodregression function.
However, in all cases, the (non-robust) standard error of the sigma parameter(s) is reported as 1.8E+308.
Also, the correlations reported in the Biogeme .html file are all 1.8E+308, without exception. The covariances seem more reasonable.
Non-convergence warnings occur with the standard optimization algorithms, with the notable exception of scipy.
Interestingly, the issue appears purely numerical:
· The estimated values of sigma do make sense
· All robust standard errors and t-scores seem logical
· The estimated coefficients make sense
· For the simple models (of one expenditure at a time) these coefficients are exactly equal to the coefficients estimated with an sklearn Ordinary Least Squares model
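The agreement with OLS in the last point is what theory predicts for a Gaussian regression likelihood: the coefficients that maximize it are the least-squares coefficients, and the maximizing sigma is sqrt(RSS/n). A minimal sketch with NumPy on synthetic data (the data, variable names, and parameter values are assumptions for illustration, not taken from the actual models) checks that the log-likelihood gradient vanishes at the OLS solution:

```python
import numpy as np

# Synthetic stand-in for one expenditure model (all values are hypothetical).
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant (ASC) + 2 regressors
y = X @ np.array([30.0, 2.0, -1.5]) + rng.normal(scale=4.0, size=n)

# Ordinary least squares estimate of the coefficients.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols

# Maximum-likelihood estimate of sigma: sqrt(RSS / n), not RSS / (n - k).
sigma_mle = np.sqrt(resid @ resid / n)

# Gradient of the Gaussian log-likelihood evaluated at (beta_ols, sigma_mle).
grad_beta = X.T @ resid / sigma_mle**2
grad_sigma = -n / sigma_mle + (resid @ resid) / sigma_mle**3

print("gradient w.r.t. coefficients:", grad_beta)
print("gradient w.r.t. sigma:", grad_sigma)
```

Both gradients are zero up to floating-point error, so the point estimates themselves do not depend on how the standard errors are computed afterwards.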
The issue persists when:
· Estimating models with ASCs only
· Using log(ll.likelihoodregression())
· Removing the entries with the lowest simulated (log)likelihoods
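On the second variant in this list: taking the log of a Gaussian density and writing the log-density directly agree for moderate residuals, and differ only when the density underflows to zero. A small sketch, assuming the two Biogeme variants correspond to these two textbook formulas (the residuals and sigma below are hypothetical values chosen to provoke underflow):

```python
import numpy as np

sigma = 5.0
residuals = np.array([0.0, 5.0, 50.0, 200.0])  # hypothetical; the last one is extreme
z = residuals / sigma

# Variant 1: compute the density first, then take its log.
density = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
with np.errstate(divide="ignore"):  # log(0) yields -inf for the underflowed entry
    log_of_density = np.log(density)

# Variant 2: write the log-density directly, which never forms exp(-z^2 / 2).
direct_log_density = -0.5 * z**2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)

print(log_of_density)
print(direct_log_density)
```

Since the issue persists in both forms, and also after removing the observations with the lowest likelihoods, underflow in individual observations seems an unlikely explanation on its own.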
The expenditures have previously been Box-Cox transformed and range from roughly 10 to 60.
These expenditures have been derived from different datasets (e.g. the EnergyCosts are from utility companies, whereas CarCosts are computed using odometer counts from obligatory checkups).
All independent variables have been scaled and centered.
All sigmas are freely estimated, with lower bounds specified (but not reached).
The variance-covariance values range in absolute value from about 10**-10 to 10**-2, and some elements are negative. I understand that this can lead to standard errors of 1.8E+308 for sigma? Even so, I do not understand what the root cause of this issue might be, or whether there is any cause for concern.
The Biogeme version is 3.2.14, but apparently there were some issues when the IT staff installed this latest version in the microdata environment.
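For context, 1.8E+308 is, to two significant digits, the largest finite double-precision number, which suggests a placeholder for a value that could not be computed rather than an estimate (whether Biogeme uses it this way is an assumption here). Negative off-diagonal covariances are unremarkable by themselves, since they only indicate negative correlation; a zero or negative diagonal entry, however, has no real square root. The sketch below, using a hypothetical 3x3 matrix and a hypothetical helper name, shows one way to check an exported variance-covariance matrix for this:

```python
import numpy as np

def std_errors_and_correlations(cov):
    """Return standard errors, correlations, and indices of unusable diagonal entries."""
    cov = np.asarray(cov, dtype=float)
    diag = np.diag(cov)
    ok = diag > 0                      # False for zero, negative, or NaN variances
    std = np.full(diag.shape, np.nan)
    std[ok] = np.sqrt(diag[ok])
    corr = cov / np.outer(std, std)    # NaN wherever a standard error is undefined
    return std, corr, np.flatnonzero(~ok)

# Hypothetical matrix on the scale described above; the last variance is negative.
cov = np.array([[1e-2,  1e-4,  2e-6],
                [1e-4,  4e-4, -1e-6],
                [2e-6, -1e-6, -1e-10]])

std, corr, bad = std_errors_and_correlations(cov)
eigvals = np.linalg.eigvalsh(cov)      # all positive for a valid covariance matrix
largest_double = np.finfo(float).max

print("standard errors:", std)
print("parameters with non-positive variance:", bad)
print("smallest eigenvalue:", eigvals.min())
print("largest finite double:", format(largest_double, ".1E"))
```

If the diagonal entry for sigma turned out to be positive, the reported standard error would more likely stem from how the matrix is post-processed than from the matrix itself.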
I cannot share the exact data or code due to the remote microdata environment, but the issue already occurs with the following simple setup:
Costs_sigma = Beta('Costs_sigma', 1, None, 0)
Costs_modeled = ASC + coef1*var1 + coef2*var2 + coef3*var3 + coef4*var4
loglike = ll.loglikelihoodregression(CostData, Costs_modeled, Costs_sigma)
Biogeme = bio.BIOGEME(database, loglike)
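As a reference point that does not depend on Biogeme, the non-robust standard errors of such a model can be computed independently. The sketch below is an assumption-laden stand-in, not the actual setup: it uses synthetic data, a hypothetical finite-difference Hessian helper, and the textbook results se(sigma) = sigma/sqrt(2n) and se(beta) = sigma*sqrt(diag((X'X)^-1)) for a Gaussian regression likelihood.

```python
import numpy as np

# Synthetic data standing in for CostData and var1..var4 (all values hypothetical).
rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
y = X @ np.array([35.0, 3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=5.0, size=n)

def neg_loglik(theta):
    """Negative Gaussian regression log-likelihood; theta = (coefficients..., sigma)."""
    beta, sigma = theta[:-1], theta[-1]
    r = y - X @ beta
    return n * np.log(sigma) + 0.5 * n * np.log(2.0 * np.pi) + (r @ r) / (2.0 * sigma**2)

# Closed-form maximum-likelihood estimates.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta_hat
sigma_hat = np.sqrt(r @ r / n)
theta_hat = np.append(beta_hat, sigma_hat)

def numerical_hessian(f, x, h=1e-3):
    """Central finite-difference Hessian of f at x (hypothetical helper)."""
    k = x.size
    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = h
            ej = np.zeros(k); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

# Non-robust covariance: inverse Hessian of the negative log-likelihood.
H = numerical_hessian(neg_loglik, theta_hat)
se_numeric = np.sqrt(np.diag(np.linalg.inv(H)))

# Textbook values for comparison.
se_beta_analytic = sigma_hat * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))
se_sigma_analytic = sigma_hat / np.sqrt(2.0 * n)

print("numeric  se:", se_numeric)
print("analytic se:", np.append(se_beta_analytic, se_sigma_analytic))
```

Running the same calculation on the real data inside the microdata environment would give a finite value to hold against the 1.8E+308 reported for sigma.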