Mixed logit model with aggregate panel data


Mariana Ss

Nov 30, 2022, 7:49:07 AM
to Biogeme
Dear Professor Bierlaire,

I have been running mixed logit models with aggregate panel data with the following specification:

logprobability = n_alt0 * log(MonteCarlo(PanelLikelihoodTrajectory(models.logit(V, None, 0)))) + n_alt1 * log(MonteCarlo(PanelLikelihoodTrajectory(models.logit(V, None, 1))))

biogeme = bio.BIOGEME(database, logprobability, numberOfDraws = 1000)

However, after a recent update this no longer works: Biogeme reports that variables cannot appear outside PanelLikelihoodTrajectory and advises me to use generateFlatPanelDataframe. I can't use that function because my data is extremely unbalanced, that is, the number of observations per individual varies a lot (between 4 and 30). As a result, I get an error saying that the flattened database contains missing values:

biogemeError: The database contains NaN value(s). Detect where they are using the function isnan()
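
A minimal sketch of why an unbalanced panel produces gaps once it is put in a flat (one row per individual) layout. The `flatten` helper and the store data are hypothetical and only mimic the general idea; this is not Biogeme's implementation of generateFlatPanelDataframe.

```python
from collections import defaultdict

# Hypothetical long-format rows: (store_id, price).
# Store "B" is observed in fewer periods than store "A".
long_rows = [("A", 1.0), ("A", 1.2), ("A", 0.9), ("B", 2.0)]


def flatten(rows, fill=None):
    """Turn long-format rows into one list per individual, padded to equal length."""
    by_id = defaultdict(list)
    for store_id, price in rows:
        by_id[store_id].append(price)
    # The widest individual determines the number of columns.
    width = max(len(values) for values in by_id.values())
    return {
        store_id: values + [fill] * (width - len(values))
        for store_id, values in by_id.items()
    }


# Gaps are left as None here; in a DataFrame they would show up as NaN.
wide = flatten(long_rows)

# The same layout with the gaps replaced by a sentinel value.
wide_filled = flatten(long_rows, fill=-1)
```

Every individual with fewer observations than the longest trajectory ends up with padded entries, which is where the missing values come from.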

Therefore, is there any way to solve the problem without switching the dataset to a flat structure?

Thank you in advance for your help!

Kind regards,
Mariana Sousa


Bierlaire Michel

Nov 30, 2022, 9:33:45 AM
to mar99...@gmail.com, Bierlaire Michel, Biogeme


> On 30 Nov 2022, at 11:54, Mariana Ss <mar99...@gmail.com> wrote:
>
> Dear Professor Bierlaire,
>
> I have been running mixed logit models with aggregate panel data with the following specification:
>
> logprobability = n_alt0 * log(MonteCarlo(PanelLikelihoodTrajectory(models.logit(V, None, 0)))) + n_alt1 * log(MonteCarlo(PanelLikelihoodTrajectory(models.logit(V, None, 1))))


I don’t understand your model. Panel data are used when the same individual is observed over time, and serial correlation must be captured. But if you have an aggregate model, what does it represent?

>
> biogeme = bio.BIOGEME(database, logprobability, numberOfDraws = 1000)
>
> However, after a recent update this no longer works: Biogeme reports that variables cannot appear outside PanelLikelihoodTrajectory and advises me to use generateFlatPanelDataframe. I can't use that function because my data is extremely unbalanced, that is, the number of observations per individual varies a lot (between 4 and 30). As a result, I get an error saying that the flattened database contains missing values:
>
> biogemeError: The database contains NaN value(s). Detect where they are using the function isnan()

Replace each NaN with a value of your choice (such as -1).
Then use the Elem expression to set the contribution to the likelihood to 1 whenever -1 is detected.

This is exactly what is done when we combine choice data and psychometric indicators in hybrid choice models.
For instance, look at rows 182-193 of this example:
https://github.com/michelbierlaire/biogeme/blob/master/examples/latent/05latentChoiceFull.py
If the data is 6, -1 or -2, the probability is set to 1.
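
A minimal, library-independent sketch of this idea in plain Python, assuming a binary logit and a sentinel of -1 for padded periods. The function names and the utilities are hypothetical; in Biogeme itself the same effect would be expressed with Elem rather than an `if` statement.

```python
import math

MISSING = -1  # sentinel that replaces NaN in the flattened data


def logit_prob(utilities, chosen):
    """Probability of the chosen alternative under a logit model."""
    largest = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - largest) for u in utilities]
    return exps[chosen] / sum(exps)


def trajectory_likelihood(choices, utilities_per_period):
    """Product of per-period choice probabilities for one individual.

    Periods whose choice equals MISSING contribute a factor of 1,
    so padded periods leave the likelihood unchanged.
    """
    likelihood = 1.0
    for choice, utilities in zip(choices, utilities_per_period):
        if choice == MISSING:
            continue  # same effect as multiplying by 1
        likelihood *= logit_prob(utilities, choice)
    return likelihood


# Hypothetical store observed in 2 periods ...
observed = trajectory_likelihood([0, 1], [[0.5, 0.0], [0.2, 0.4]])

# ... and the same store padded to 4 periods with the sentinel.
padded = trajectory_likelihood(
    [0, 1, MISSING, MISSING],
    [[0.5, 0.0], [0.2, 0.4], [MISSING, MISSING], [MISSING, MISSING]],
)
```

Both calls give the same value, because the padded periods are skipped instead of entering the product.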


>
> Therefore, is there any way to solve the problem without switching the dataset to a flat structure?
>
> Thank you in advance for your help!
>
> Kind regards,
> Mariana Sousa

Mariana Ss

Dec 1, 2022, 2:26:03 AM
to Bierlaire Michel, Biogeme
Dear Professor Bierlaire,

My dataset tracks sales of different products over a quarter. So I want to capture the serial correlation from observing the same stores over time.

Thank you. Now I am looking to get the willingness to pay. However, since I have 30 variables for price and 30 for remaining shelf life (due to the function generateFlatPanelDataframe), I am not sure what I should do to calculate it.

I tried this:

# Load the previously estimated results.
results = res.bioResults(pickleFile = 'panel_flat.pickle')

# Simulate to recalculate the log likelihood
simulated_loglike = logprob.getValue_c(
    database = dbase,
    betas = results.getBetaValues(),
    numberOfDraws = 1000,
    aggregation = True,
    prepareIds = True,
)

WTP_alt0 = [Derive(V0[t],Variable(f'{t+1}_remaining_shelf_life_alt0')) / Derive(V0[t],Variable(f'{t+1}_price_alt0')) for t in range(30)]

WTP_alt1 = [Derive(V0[t],Variable(f'{t+1}_remaining_shelf_life_alt1')) / Derive(V0[t],Variable(f'{t+1}_price_alt1')) for t in range(30)]

simulate = {'WTP_sem_etiqueta': MonteCarlo(WTP_alt0),
            'WTP_etiqueta': MonteCarlo(WTP_alt1)}

biosim = bio.BIOGEME(
    dbase, simulate, numberOfDraws=1000, suggestScales=False
)
sim = biosim.simulate()

However, it didn't work. What am I doing wrong?
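
For reference, a minimal plain-Python sketch of the quantity the code above is after: the ratio of the derivative of a utility function with respect to an attribute to its derivative with respect to price, averaged over draws of a random coefficient. The linear utility, the parameter values and all function names are assumptions made for illustration. It is not Biogeme code and not a diagnosis of the error. Note that many authors report the negative of this ratio as the willingness to pay.

```python
import random


def utility(price, shelf_life, beta_price, beta_shelf):
    """Hypothetical linear-in-parameters utility (not the thread's V)."""
    return beta_price * price + beta_shelf * shelf_life


def derivative(f, x, h=1e-6):
    """Central finite-difference approximation of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)


def simulated_ratio(price, shelf_life, beta_price, mu_shelf, sigma_shelf,
                    n_draws, seed=0):
    """Average over draws of (dV/d shelf_life) / (dV/d price)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        beta_shelf = rng.gauss(mu_shelf, sigma_shelf)  # random coefficient
        dv_dshelf = derivative(
            lambda s: utility(price, s, beta_price, beta_shelf), shelf_life
        )
        dv_dprice = derivative(
            lambda p: utility(p, shelf_life, beta_price, beta_shelf), price
        )
        # Both derivatives are taken on the same utility function.
        total += dv_dshelf / dv_dprice
    return total / n_draws


ratio = simulated_ratio(price=2.0, shelf_life=5.0, beta_price=-1.0,
                        mu_shelf=0.5, sigma_shelf=0.2, n_draws=2000)
```

With a fixed price coefficient, the average ratio approaches the mean of the shelf-life coefficient divided by the price coefficient as the number of draws grows.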

Thank you very much!

Kind regards,
Mariana Sousa

Miguel Costa

Feb 10, 2023, 2:10:35 AM
to Biogeme
Hi Mariana,

Did you manage to solve the issue with the unbalanced panel data?

Best regards,
Miguel