Log-Likelihood of the choice submodel in an ICLV model


Atefeh Fakourrad

Mar 17, 2022, 12:50:14 PM
to Biogeme
Dear Prof. Bierlaire, 

I urgently need to compute the log-likelihood of the choice submodel in an ICLV model. Searching the forum, I found that you suggested using simulation with the estimated parameters, but this is still not fully clear to me. Should I follow the general instructions of this simulation example (adapted, of course), or another one?  
In the simulation script, should I just specify the utility functions of the choice submodel using the estimated parameters, without the LVs? As far as I know, the simulation output in Biogeme is a Pandas data frame, right? How can I obtain the LL from it? 

I would greatly appreciate your guidance.

Best regards,
Ati. 


Bierlaire Michel

Mar 17, 2022, 12:54:25 PM
to a.fak...@gmail.com, Bierlaire Michel, Biogeme
I am not sure I understand your question. 
--
You received this message because you are subscribed to the Google Groups "Biogeme" group.
To unsubscribe from this group and stop receiving emails from it, send an email to biogeme+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/biogeme/8277a744-2d77-4611-b436-58606899a713n%40googlegroups.com.

Atefeh Fakourrad

Mar 18, 2022, 3:17:13 AM
to Biogeme
Thanks for your reply. I have fully read the document and I think my question goes beyond it. 

I just want to compare the model fit of a Mixed Logit model and an ICLV model to check which one performs statistically better. Since the ICLV model includes both the choice model and the structural model, we cannot simply compare the reported log-likelihood of the ICLV model with that of the Mixed Logit model. That is why I need to extract the LL of the choice model estimated in the ICLV model; then I can use it for the comparison, right? 

Now, my question is about the possible way(s) that I can use to obtain the LL of the choice model inside the ICLV model. To do so, you suggested using simulation with the estimated parameters. I was wondering what elements are needed in the simulation syntax to ensure I can get the LL of the choice (sub)model? Is there any example that I can follow in this regard? 


Best,
Ati. 

Bierlaire Michel

Mar 18, 2022, 3:20:53 AM
to a.fak...@gmail.com, Bierlaire Michel, Biogeme

On 17 Mar 2022, at 21:45, Atefeh Fakourrad <a.fak...@gmail.com> wrote:

Thanks for your reply. I have fully read the document and I think my question goes beyond it. 

I just want to compare the model fit of a Mixed Logit model and an ICLV model to check which one performs statistically better. Since the ICLV model includes both the choice model and the structural model, we cannot simply compare the reported log-likelihood of the ICLV model with that of the Mixed Logit model.

Indeed,

That is why I need to extract the LL of the choice model estimated in the ICLV, then I can use it to make a comparison, right? 

It will not help for a formal comparison, as it is *not* the maximum likelihood. Indeed, the ICLV tries to fit both the choice data and the indicators.


Now, my question is about the possible way(s) that I can use to obtain the LL of the choice model inside the ICLV model. To do so, you suggested using simulation with the estimated parameters. I was wondering what elements are needed in the simulation syntax to ensure I can get the LL of the choice (sub)model? Is there any example that I can follow in this regard? 

The model specification is exactly the same as for estimation. You just need to define a dictionary with the quantities that you are interested in simulating. Biogeme will evaluate them for each entry in the database. 

Bierlaire Michel

Mar 18, 2022, 10:37:02 AM
to Atefeh Fakourrad, Bierlaire Michel, Biogeme
The output of the simulation is a full Pandas table, where the quantities are calculated for each entry in the database. What you display here is just an aggregate description (count, mean, standard deviation and quantiles) generated by the “describe” method.
For the loglikelihood, you just need to calculate the log of the probability of the chosen alternative. After the simulation, just compute the sum of that column.
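A minimal sketch of that last step in plain Python, with a list of rows standing in for the Pandas table. It assumes each row holds the simulated probabilities of both alternatives (named after the "Prob. acc" / "Prob. dec" columns in the output below) plus a hypothetical `choice` field; the values are made up for illustration.

```python
import math

# Hypothetical per-row simulation output: probability of each alternative
# and the alternative actually chosen in that row.
rows = [
    {"prob_acc": 0.70, "prob_dec": 0.30, "choice": "acc"},
    {"prob_acc": 0.38, "prob_dec": 0.62, "choice": "dec"},
    {"prob_acc": 0.84, "prob_dec": 0.16, "choice": "acc"},
]

def log_likelihood(rows):
    """Sum over rows of the log of the probability of the chosen alternative."""
    total = 0.0
    for row in rows:
        p_chosen = row["prob_" + row["choice"]]
        total += math.log(p_chosen)
    return total

ll = log_likelihood(rows)
print(f"Log-likelihood: {ll:.4f}")
```

On the actual Pandas output, if the simulation dictionary contains an entry (say, a hypothetical key `logProb`) holding the log of the chosen alternative's probability, the last step is simply `simresults['logProb'].sum()`.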

On 18 Mar 2022, at 12:33, Atefeh Fakourrad <a.fak...@gmail.com> wrote:

Thanks for your guidance. I have run the simulation and here is the output:

Simulation results
         Prob. acc    Prob. dec
count  3456.000000  3456.000000
mean      0.701389     0.298611
std       0.096380     0.096380
min       0.380567     0.158056
25%       0.647859     0.214164
50%       0.705857     0.294143
75%       0.785836     0.352141
max       0.841944     0.619433

Is there a piece of code that I can use to calculate the LL? I found this: biogeme.calculateLikelihood(x, scaled, batch=None). If so, what exactly is 'x'? In the Swissmetro example, which variable should be passed as 'x' to this function?  

Best,
Ati. 


Atefeh Fakourrad

Mar 25, 2022, 9:08:44 AM
to Biogeme
Dear Prof. Bierlaire, 

Thanks to your guidance, I was able to calculate the log-likelihood of the choice model estimated within an ICLV model, and I compared it with the log-likelihood of an estimated Mixed Logit model. In my study, the log-likelihood of the Mixed Logit model turns out to be better than that of the choice submodel of the ICLV model. Does this mean that the Mixed Logit model fits the data better and that the ICLV model is not needed, despite its additional complexity? Are there criteria other than model fit that can justify the use of ICLV models? 

Best,
Ati.  

Bierlaire Michel

Mar 25, 2022, 9:18:28 AM
to a.fak...@gmail.com, Bierlaire Michel, Biogeme
On 25 Mar 2022, at 13:19, Atefeh Fakourrad <a.fak...@gmail.com> wrote:

Dear Prof. Bierlaire, 

Thanks to your guidance, I was able to calculate the log-likelihood of the choice model estimated within an ICLV model, and I compared it with the log-likelihood of an estimated Mixed Logit model. In my study, the log-likelihood of the Mixed Logit model turns out to be better than that of the choice submodel of the ICLV model.

Of course. As I told you already, you are comparing oranges and apples, as the likelihood of the choice submodel is *not* the maximum likelihood. It is therefore not a measure of fit. 

Does this mean that the Mixed Logit model fits the data better and that the ICLV model is not needed, despite its additional complexity?

It does not mean anything. 

Are there criteria other than model fit that can justify the use of ICLV models? 

Prediction tests.
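One common form of prediction test, sketched here under assumptions: hold out part of the sample, estimate both models on the remaining observations, and compare how well each model predicts the held-out choices. The sketch covers only that last step. The probabilities are hypothetical numbers standing in for each model's simulated probability of the chosen alternative on the hold-out observations (for the ICLV model, for instance, the choice probability integrated over the latent variables, without the indicators).

```python
import math

def holdout_scores(p_chosen):
    """Log-likelihood and mean probability of the chosen alternative on
    hold-out observations, given each observation's predicted probability
    of the alternative that was actually chosen."""
    ll = sum(math.log(p) for p in p_chosen)
    return ll, sum(p_chosen) / len(p_chosen)

# Hypothetical hold-out predictions from two already-estimated models.
p_mixed_logit = [0.62, 0.55, 0.71, 0.40, 0.66]
p_iclv = [0.65, 0.58, 0.69, 0.45, 0.70]

for name, probs in [("Mixed logit", p_mixed_logit), ("ICLV", p_iclv)]:
    ll, avg = holdout_scores(probs)
    print(f"{name}: hold-out LL = {ll:.3f}, mean P(chosen) = {avg:.3f}")
```

Because both models are scored with the same choice-only measure on data not used for estimation, this comparison does not suffer from the problem of comparing a joint likelihood with a choice likelihood.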


Atefeh Fakourrad

Mar 27, 2022, 7:25:52 AM
to Biogeme
Dear Prof. Bierlaire,

Many thanks for your reply. I have already read the insightful paper that you referred to. The authors suggest two ways of calculating the choice likelihood of the ICLV model:
1- The first approach formulates the choice probability as a function solely of the observable variables x_n.
2- The second approach formulates the choice probability as a function of both the observable variables x_n and the measurement indicators i_n.

Based on the documentation, Biogeme uses the second approach (considering both observable variables and measurement indicators) to calculate the final log-likelihood. Is that right?


The paper advises calculating the choice likelihood of the ICLV model with the first approach and then comparing it with a reduced mixed logit model in which the latent variables are replaced with the observable variables. Is there a way to calculate this likelihood directly in Biogeme? The formula is as follows:
x_n = observable variables
x*_n = latent variables

$$f_y(y_n \mid x_n; B, \Gamma, A, \Phi) = \int f_y(y_n \mid x_n, x^*_n; B, \Gamma)\, f_{x^*}(x^*_n \mid x_n; A, \Phi)\, dx^*_n$$
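This integral generally has no closed form and is approximated by simulation. As a sketch, with R draws of the latent variable generated from the structural equation (the draw notation x^{*r}_n is introduced here for illustration):

```latex
f_y(y_n \mid x_n; B, \Gamma, A, \Phi) \approx \frac{1}{R} \sum_{r=1}^{R} f_y(y_n \mid x_n, x^{*r}_n; B, \Gamma),
\qquad x^{*r}_n \sim f_{x^*}(x^*_n \mid x_n; A, \Phi)
```

In Biogeme terms, this amounts to applying the `MonteCarlo` operator to the conditional choice probability alone, with the latent variable defined through its structural equation and a `bioDraws` error term, and leaving the indicator terms out.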

Best,
Ati. 

Atefeh Fakourrad

Mar 29, 2022, 5:01:34 AM
to Biogeme
Dear Prof. Bierlaire,

Sorry for the crossposting. This is the last step of the ICLV model estimation process that I need to complete. I am just a bit confused by one of the formulas in the paper that you shared. This formula is used to calculate the choice likelihood of the ICLV model (the likelihood of the choice submodel within the ICLV model). This method differs from what we discussed before, as it also takes the indicators into account when computing the choice likelihood. The formula (Eq. 11 in the paper) is:

$$f_y(y_n \mid x_n, i_n; B, \Gamma, D, \Psi, A, \Phi) = \frac{f_{y,i}(y_n, i_n \mid x_n; B, \Gamma, D, \Psi, A, \Phi)}{\int f_i(i_n \mid x_n, x^*_n; D, \Psi)\, f_{x^*}(x^*_n \mid x_n; A, \Phi)\, dx^*_n}$$

The numerator is the joint likelihood of the choices and the indicators; its logarithm, summed over individuals, is what Biogeme reports as the final log-likelihood. I just do not know how to calculate the denominator in Biogeme. I would be grateful if you could help me with this. 
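Taking logs of Eq. 11, and approximating both integrals with the same R draws x^{*r}_n from the structural equation, gives the sketch below. Parameter arguments are omitted for brevity, and conditional independence of the choice and the indicators given the latent variable is assumed, as in the usual ICLV specification.

```latex
\ln f_y(y_n \mid x_n, i_n) = \ln f_{y,i}(y_n, i_n \mid x_n) - \ln f_i(i_n \mid x_n)
\approx \ln \frac{1}{R} \sum_{r=1}^{R} f_y(y_n \mid x_n, x^{*r}_n)\, f_i(i_n \mid x_n, x^{*r}_n)
      - \ln \frac{1}{R} \sum_{r=1}^{R} f_i(i_n \mid x_n, x^{*r}_n)
```

Summing this difference over individuals n gives the choice log-likelihood conditional on the indicators.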

Thanks for your support.

Best,
Ati. 

Bierlaire Michel

Mar 29, 2022, 9:12:03 AM
to a.fak...@gmail.com, Bierlaire Michel, Biogeme
The denominator is the integral of the contribution of the indicators to the loglikelihood function. 
Like any integral, you can use the MonteCarlo operator to calculate it, using draws from the structural equations. 
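The ratio in Eq. 11 can be checked outside Biogeme with a small toy computation. The sketch below is plain Python, not Biogeme's API, and every modelling choice in it is a hypothetical assumption: a binary logit choice, a single latent variable with a linear structural equation and a normal error, and one continuous, normally distributed indicator (rather than Likert indicators). Both the joint term and the indicator-only term are averaged over the same draws, mimicking what the `MonteCarlo` operator does for each expression.

```python
import math
import random

def normal_pdf(z, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def choice_prob(y, x, lv, beta0=0.5, beta_x=-1.0, beta_lv=1.5):
    """Binary logit probability of y (1 or 0) given x and one value of the latent variable."""
    p1 = 1.0 / (1.0 + math.exp(-(beta0 + beta_x * x + beta_lv * lv)))
    return p1 if y == 1 else 1.0 - p1

def conditional_choice_likelihood(y, x, i, lam, n_draws=2000, seed=42,
                                  gamma=0.8, sigma=1.0, psi=0.5):
    """Simulated Eq. 11: likelihood of (y, i) divided by the likelihood of i alone."""
    rng = random.Random(seed)
    joint = 0.0      # sum over draws of f_y * f_i (numerator)
    indicator = 0.0  # sum over draws of f_i (denominator)
    for _ in range(n_draws):
        lv = gamma * x + sigma * rng.gauss(0.0, 1.0)  # draw from the structural equation
        f_i = normal_pdf(i, lam * lv, psi)            # measurement equation of the indicator
        joint += choice_prob(y, x, lv) * f_i
        indicator += f_i
    return joint / indicator  # the 1/R factors cancel in the ratio

p_with_indicator = conditional_choice_likelihood(y=1, x=0.3, i=1.2, lam=1.0)
p_uninformative = conditional_choice_likelihood(y=1, x=0.3, i=1.2, lam=0.0)
print(p_with_indicator, p_uninformative)
```

Setting `lam=0.0` makes the indicator uninformative about the latent variable; the ratio then reduces to the simulated probability of the first approach, i.e. the average of the choice probability over the draws.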


Atefeh Fakourrad

Mar 30, 2022, 2:12:58 AM
to Biogeme
Thanks for your reply. So I need to use simulation again to calculate the contribution of the indicators. For the denominator, I just need to calculate the probability of the indicators for each individual, right? Let's assume I have six Likert-scale indicators (i_1, i_2, ..., i_6), and the structural and measurement equations have been specified as instructed in the documentation. For the simulation, should I just calculate the following for the denominator, or is another component needed? 

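from biogeme.expressions import log, MonteCarlo

# P_i_1 ... P_i_6: measurement-equation probabilities of the six Likert
# indicators, defined as in the estimation script
condlike = P_i_1 * P_i_2 * P_i_3 * P_i_4 * P_i_5 * P_i_6
logprob = log(MonteCarlo(condlike))

simulate = {
    'logProb': logprob}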
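If two quantities are simulated for each individual, the log of the joint likelihood (the same expression as in the estimation) and the log of the indicator-only term (the denominator), the choice log-likelihood conditional on the indicators is the sum of their differences over individuals. A minimal sketch with hypothetical keys `logJoint` and `logIndicators` and made-up values, using a plain dictionary of lists in place of the Pandas output; on the real output the same quantity would be `(simresults['logJoint'] - simresults['logIndicators']).sum()`.

```python
# Hypothetical per-individual simulation output (one entry per individual).
sim = {
    "logJoint":      [-4.10, -3.75, -5.02, -4.48],
    "logIndicators": [-3.20, -3.05, -4.61, -3.90],
}

def conditional_choice_ll(log_joint, log_indicators):
    """Sum over individuals of log f(y, i | x) - log f(i | x) = log f(y | x, i).
    Each difference is the log of a probability, so it should be <= 0."""
    return sum(lj - li for lj, li in zip(log_joint, log_indicators))

ll_choice = conditional_choice_ll(sim["logJoint"], sim["logIndicators"])
print(f"Conditional choice log-likelihood: {ll_choice:.2f}")
```

Since the sum of the `logJoint` column is the final log-likelihood of the estimation, this is equivalently that final value minus the sum of the indicator-only column.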

Best,
Ati. 

Bierlaire Michel

Mar 30, 2022, 3:50:38 AM
to a.fak...@gmail.com, Bierlaire Michel, Biogeme

OSCAR SUN

Mar 31, 2022, 7:08:39 AM
to Biogeme
Dear Prof Bierlaire and Ati,
Could you help clarify how the simulation script should be arranged? Say I have two latent variables (LV1, LV2), measured by three items each (LV1: item_1_1, item_1_2, item_1_3; LV2: item_2_1, item_2_2, item_2_3). The choice submodel is:
___________________________________________
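from biogeme.expressions import log, MonteCarlo
import biogeme.biogeme as bio
import biogeme.models as models

# V, answer, database, the indicator terms and the estimation results are
# defined as in the estimation script; omega_1 and omega_2 are declared there
# with bioDraws('omega_1', 'NORMAL') and bioDraws('omega_2', 'NORMAL').
condprob = models.logit(V, None, answer)
# Product of the six indicator terms; each item_i_j should be that indicator's
# measurement-equation probability, not the raw data column
Structural = (item_1_1 * item_1_2 * item_1_3 * item_2_1 * item_2_2 * item_2_3)
condlike   = Structural * condprob
prob       = log(MonteCarlo(condlike))
Then, to simulate values for each individual, do we follow the arrangement below?
condprob_1 = models.logit(V, None, 1)
condlike_1 = condprob_1
prob_1     = log(MonteCarlo(condlike_1))

condprob_2 = models.logit(V, None, 2)
condlike_2 = condprob_2
prob_2     = log(MonteCarlo(condlike_2))

condprob_3 = models.logit(V, None, 3)
condlike_3 = condprob_3
prob_3     = log(MonteCarlo(condlike_3))

simulate = {'Prob_1': prob_1,
            'Prob_2': prob_2,
            'Prob_3': prob_3,
            'Prob':   prob}
biosim = bio.BIOGEME(database, simulate, numberOfDraws=100)
# In Biogeme 3, the type of draws is set by bioDraws in the expressions,
# so no DRAWS attribute is needed here.
biosim.modelName = "Simulation_file"
# Simulate with the estimated parameters (results: bioResults of the ICLV estimation)
simresults = biosim.simulate(results.getBetaValues())
print(simresults)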

___________________________________________
However, when I ran the script above, I received negative values (see the screenshot below):
[Screenshot: WechatIMG22.jpeg]

Please feel free to point out any errors. Appreciate your time.
Thanks and best regards,
Oscar

Bierlaire Michel

Mar 31, 2022, 7:31:08 AM
to oscar...@gmail.com, Bierlaire Michel, Biogeme
I don’t understand what you are trying to do. 
Either you want to calculate the contribution to the loglikelihood function for each individual, and you should use the exact same formula as for the estimation.
Or you want to simulate choice probabilities, and you just need to plug the structural equation into the choice model and integrate over its error term.
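A minimal sketch of the second option, in plain Python rather than Biogeme, under hypothetical assumptions (three alternatives, utilities linear in one latent variable, and a structural equation with a standard normal error): each alternative's probability is the logit probability averaged over draws of the structural error, with no indicator terms. The averaged probabilities lie strictly between 0 and 1 and sum to 1 across alternatives, so their logarithms are always negative; negative values are therefore expected whenever the log of a simulated probability is reported.

```python
import math
import random

def logit_probs(utilities):
    """Multinomial logit probabilities for a list of utilities."""
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def simulated_choice_probs(x, n_draws=1000, seed=1):
    """Average the logit probabilities over draws of the latent variable."""
    rng = random.Random(seed)
    avg = [0.0, 0.0, 0.0]
    for _ in range(n_draws):
        lv = 0.5 + 0.8 * x + rng.gauss(0.0, 1.0)    # hypothetical structural equation
        v = [0.0, 0.3 + 0.7 * lv, -0.2 - 0.4 * lv]  # hypothetical utilities V1, V2, V3
        for j, p in enumerate(logit_probs(v)):
            avg[j] += p / n_draws
    return avg

probs = simulated_choice_probs(x=1.0)
print([round(p, 3) for p in probs])
print([round(math.log(p), 3) for p in probs])  # logs of probabilities are negative
```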

