Inconsistent results while using Biogeme


Newsha Bagheri

May 5, 2023, 1:24:41 AM
to Biogeme
Dear Professor Bierlaire,

I am writing to seek your expertise regarding a problem while using Biogeme. I created a synthetic dataset for which I know the true values of parameters. I have created a model with two latent variables p1 and p2, and I am using Biogeme for estimation.

Here is the model specification and the code I am using (to keep the formulation as simple as possible, I fix the sigmas to one):

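# Imports are not shown in the original post; they are assumed from the
# aliases used below (Biogeme 3.2-style API).
import pandas as pd
import biogeme.database as db
import biogeme.biogeme as bio
import biogeme.optimization as opt
from biogeme import models
from biogeme.expressions import Beta, bioDraws, MonteCarlo

Dataset = pd.read_excel('Dataset.xlsx')
database = db.Database('Data', Dataset)
globals().update(database.variables)

# Beta(name, initial value, lower bound, upper bound, status); status 1 = fixed
B_a = Beta('B_a', 0, None, None, 0)
B_b = Beta('B_b', 0, None, None, 0)
B_p = Beta('B_p', 0, None, None, 0)
B_z = Beta('B_z', 1, None, None, 1)
B_wz = Beta('B_wz', 0, None, None, 0)

# Latent variables, each with its own named set of standard normal draws
p1 = B_z * z1 + B_wz * wz1 + bioDraws('p1', 'NORMAL')
p2 = B_z * z2 + B_wz * wz2 + bioDraws('p2', 'NORMAL')

# Utility functions
V1 = B_a * a1 + B_b * b1 + B_p * p1
V2 = B_a * a2 + B_b * b2 + B_p * p2
V = {0: V1, 1: V2}

condprob = models.loglogit(V, None, choice)

biogeme = bio.BIOGEME(database, MonteCarlo(condprob), numberOfDraws=200)
biogeme.modelName = '01choice'

results = biogeme.estimate(algorithm=opt.bioNewton)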


My problem is that when I use different names for bioDraws of p1 and p2, the estimated values are not close to the true values. However, when I use the same name for bioDraws of p1 and p2 (e.g. bioDraws('p1','NORMAL') for both), the estimated values are close to the true values.

Could you please explain to me why this is happening? Any insight you can provide would be greatly appreciated.

Thank you very much for your time and help.

Best regards,
Niousha

Bierlaire Michel

May 5, 2023, 2:16:45 AM
to newsha.b...@gmail.com, Bierlaire Michel, Biogeme
How different are the results? In any case, 200 draws are definitely not sufficient.
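
As a rough illustration of why the number of draws matters, the sketch below approximates a single choice probability by Monte Carlo and measures how much that approximation varies from one repetition to the next. It is plain Python, not Biogeme; the utility difference 0.5, the coefficient 1.0, and the function names are invented for the example. The spread should shrink roughly in proportion to 1/sqrt(R), where R is the number of draws.

```python
import math
import random
import statistics


def simulated_probability(dv, b_p, n_draws, rng):
    """Monte Carlo approximation of E[logit(dv + b_p * (xi1 - xi2))]."""
    total = 0.0
    for _ in range(n_draws):
        # Two independent standard normal draws, one per alternative.
        noise = rng.gauss(0.0, 1.0) - rng.gauss(0.0, 1.0)
        total += 1.0 / (1.0 + math.exp(-(dv + b_p * noise)))
    return total / n_draws


def spread(n_draws, n_repetitions=30, seed=0):
    """Standard deviation of the approximation across repetitions."""
    rng = random.Random(seed)
    values = [
        simulated_probability(0.5, 1.0, n_draws, rng)
        for _ in range(n_repetitions)
    ]
    return statistics.stdev(values)


if __name__ == "__main__":
    for n in (200, 1000, 10000):
        print(n, spread(n))
```

The noise in each probability carries over to the simulated log likelihood, so estimates obtained with few draws can move noticeably when the draws change.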

Michel Bierlaire
Transport and Mobility Laboratory
School of Architecture, Civil and Environmental Engineering
EPFL - Ecole Polytechnique Fédérale de Lausanne
http://transp-or.epfl.ch
http://people.epfl.ch/michel.bierlaire

Bierlaire Michel

May 5, 2023, 5:24:26 AM
to newsha.b...@gmail.com, Bierlaire Michel, Biogeme
If you give the same name, it means that you are using the exact same draws, which is generally not what you want to do.
According to the results that you show, it definitely seems to be an issue with the number of draws.

You can also try with numerical integration, although it may take a lot of estimation time.
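
To see what sharing a draw name does in this particular specification, note that a binary logit depends only on V1 - V2, so the random part enters as B_p * (xi1 - xi2). If both utilities use the same draw, that difference is exactly zero and the mixture collapses to a plain logit. With independent draws it is normally distributed with variance 2 * B_p^2. The sketch below checks this in plain Python (not Biogeme); the numeric values and function names are invented for the illustration.

```python
import math
import random


def logit_first(v1, v2):
    """Binary logit probability of the first alternative."""
    return 1.0 / (1.0 + math.exp(v2 - v1))


def mixed_probability(det1, det2, b_p, n_draws, shared, seed=1):
    """Average the logit over draws; `shared` reuses one draw in both utilities."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        xi1 = rng.gauss(0.0, 1.0)
        xi2 = xi1 if shared else rng.gauss(0.0, 1.0)
        total += logit_first(det1 + b_p * xi1, det2 + b_p * xi2)
    return total / n_draws


if __name__ == "__main__":
    print("plain logit:      ", logit_first(1.0, 0.0))
    print("shared draws:     ", mixed_probability(1.0, 0.0, 1.5, 20000, shared=True))
    print("independent draws:", mixed_probability(1.0, 0.0, 1.5, 20000, shared=False))
```

Under these assumptions the two naming choices define two different models, so their estimates are not expected to agree even with a very large number of draws.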


> On 5 May 2023, at 09:06, newsha.b...@gmail.com wrote:
>
> Thank you for your quick response. I appreciate your feedback on the number of draws needed for our experiment. I agree that 200 draws are not sufficient, and I increased the number of draws to 1000 for my latest experiment. However, the number of draws does not make any difference in the value of estimated parameters. But what is making a difference here is the name of bioDraws.
>
> I would like to share with you a table that shows the estimated values of the parameters. As you can see in the table, the model with the same bioDraws name has estimated values that are much closer to the true values than the model with different names. Both models were run with 1000 draws.
>
> I am unsure why this is happening, as each bioDraws expression should have its own name.
>
> Thank you for your time and guidance.
>
> Best regards,
> Niousha

Bierlaire Michel

May 10, 2023, 3:05:39 AM
to newsha.b...@gmail.com, Bierlaire Michel, Biogeme
A latent variable model without any indicator is just a mixture model. And the parameters are not necessarily identified.
Try to calculate the log likelihood at the true values of the parameters. If it is close to the maximum log likelihood that you obtain, it means that you have an identification issue.
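
This check can be sketched outside Biogeme as follows. It is a minimal illustration, assuming a simulated log likelihood of the form "sum over observations of the log of the average logit probability over draws". The toy data, the parameter values, and the function names are invented for the example, and the same draws are reused for every observation to keep it short.

```python
import math
import random


def simulated_loglik(params, data, draws):
    """Sum over observations of log(mean over draws of the logit probability).

    params: dict with keys "B_a", "B_b", "B_p", "B_wz" (B_z is fixed to 1).
    data:   rows (a1, b1, z1, wz1, a2, b2, z2, wz2, choice), choice in {0, 1}.
    draws:  (xi1, xi2) pairs, reused for every observation (a simplification).
    """
    loglik = 0.0
    for a1, b1, z1, wz1, a2, b2, z2, wz2, choice in data:
        # Deterministic part of V1 - V2.
        det = (
            params["B_a"] * (a1 - a2)
            + params["B_b"] * (b1 - b2)
            + params["B_p"] * ((z1 - z2) + params["B_wz"] * (wz1 - wz2))
        )
        total = 0.0
        for xi1, xi2 in draws:
            dv = det + params["B_p"] * (xi1 - xi2)
            p_first = 1.0 / (1.0 + math.exp(-dv))
            total += p_first if choice == 0 else 1.0 - p_first
        loglik += math.log(total / len(draws))
    return loglik


def make_toy_data(params, n_obs, seed=0):
    """Generate invented choices from the same model (illustration only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_obs):
        row = tuple(rng.gauss(0.0, 1.0) for _ in range(8))
        one_draw = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))]
        # Probability of the first alternative given this single draw.
        p_first = math.exp(simulated_loglik(params, [row + (0,)], one_draw))
        choice = 0 if rng.random() < p_first else 1
        data.append(row + (choice,))
    return data


if __name__ == "__main__":
    true_params = {"B_a": 1.0, "B_b": -1.0, "B_p": 0.8, "B_wz": 0.5}
    other_params = {"B_a": 0.7, "B_b": -0.7, "B_p": 0.2, "B_wz": 1.5}
    data = make_toy_data(true_params, n_obs=300)
    rng = random.Random(42)
    draws = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
    print("LL at true values: ", simulated_loglik(true_params, data, draws))
    print("LL at other values:", simulated_loglik(other_params, data, draws))
```

If the two values are nearly equal although the parameter vectors differ, the likelihood is almost flat between them, which is the symptom of weak identification described above.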


> On 10 May 2023, at 09:01, newsha.b...@gmail.com wrote:
>
> Thank you for your prompt response and valuable insights.
> I followed your advice and repeated the experiment with larger numbers of draws—specifically, 10,000 and 50,000. Unfortunately, despite the increased number of draws, the results did not show any significant improvement. The estimated values obtained are still far from the true values, as illustrated in the table below:
>
>
>
> It appears that the issue might extend beyond the number of draws. I would greatly appreciate any further guidance or suggestions you may have regarding the next steps I should take to address this issue.
> Thanks for suggesting numerical integration, but it seems the Monte Carlo integration is causing the problem.
> Do you think this problem might be with the specification of the model?
> Because I am trying to develop a model in which the latent variable does not have any indicators.
> Thank you once again for your expertise and support.
>
> Best regards,
> Niousha