Panel data: Simulate prediction results at the observation level instead of individual level

Natchaphon Leungbootnak

Mar 24, 2025, 4:51:35 PM
to Biogeme
Dear all,

I have a panel dataset identified by 'randid' (the individual traveler). Each traveler can make many trips, and each trip is one observation. I estimated a mixed logit model with panel data on the training dataset, then used the estimated coefficients to predict y in the test dataset. However, the predictions come out at the individual level instead of the observation level. Could you please advise how to simulate y at the observation level (per trip rather than per individual) using the coefficients estimated on the training dataset?

My code is as follows:

# Split the dataset
The dataset was split by individual ID, with 80% of the individuals in the training set. No ID appears in both the training and the test data.
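
For readers who want to reproduce this kind of split, here is a minimal sketch of an ID-based 80/20 split. It is not the code used in the post: the function name `split_by_individual`, the toy records, and the seed are hypothetical, and it uses plain Python lists of dicts rather than a DataFrame so that it runs on its own.

```python
import random


def split_by_individual(rows, id_key="randid", train_share=0.8, seed=0):
    """Split trip records so that each individual lands in exactly one set."""
    # Collect the distinct individual IDs and shuffle them reproducibly.
    ids = sorted({row[id_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(ids)

    # Assign the first 80% of the shuffled IDs to the training set.
    n_train = round(train_share * len(ids))
    train_ids = set(ids[:n_train])

    # All trips of an individual follow that individual's assignment.
    train = [row for row in rows if row[id_key] in train_ids]
    test = [row for row in rows if row[id_key] not in train_ids]
    return train, test


# Toy data (made up): 10 travelers with 3 trips each.
trips = [{"randid": i, "trip": t} for i in range(10) for t in range(3)]
train_rows, test_rows = split_by_individual(trips)
```

Splitting on the ID rather than on the rows is what guarantees that no traveler contributes trips to both sets.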

# Declare the data as panel
from biogeme.database import Database

database_train = Database('RUMmix', df_train)
database_test = Database('RUMmix', df_test)
database_train.panel('randid')
database_test.panel('randid')

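# Training set
from biogeme.biogeme import BIOGEME
from biogeme.expressions import MonteCarlo, PanelLikelihoodTrajectory, log
from biogeme.models import logit

# utility_ML, utility_GPL and choice are defined earlier (not shown here)
utilities = {1: utility_ML, 2: utility_GPL}
obsprob = logit(utilities, None, choice)
condprobIndiv = PanelLikelihoodTrajectory(obsprob)
log_choice_probability = log(MonteCarlo(condprobIndiv))
biogeme_train = BIOGEME(database_train, log_choice_probability, number_of_draws=1000, seed=1223)
biogeme_train.modelName = 'RUMmix'
results_train = biogeme_train.estimate()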

# Short summary of the results
Results for model RUMmix
Nbr of parameters: 13
Sample size: 222
Observations: 1666
Excluded data: 0
Final log likelihood: -58.80992
Akaike Information Criterion: 143.6198
Bayesian Information Criterion: 187.8546

# Simulation on the test set
prob_1 = MonteCarlo(PanelLikelihoodTrajectory(logit(utilities, None, 1)))
prob_2 = MonteCarlo(PanelLikelihoodTrajectory(logit(utilities, None, 2)))

simulate = {'Prob. 1': prob_1, 'Prob. 2': prob_2}

biogeme_test = BIOGEME(database_test, simulate, number_of_draws=1000, seed=1223)
biogeme_test.modelName = "RUMmix_test"
betaValues = results_train.getBetaValues()
simulatedValues = biogeme_test.simulate(betaValues)

With the code above, I get the results at the individual level instead of the trip level.

Best regards,
Natchaphon Leungbootnak

Michel Bierlaire

Mar 25, 2025, 3:18:53 AM
to natcha...@gmail.com, Michel Bierlaire, Biogeme
The simplest thing to do is not to declare the data as panel in simulation mode.
The draws will then differ across the observations of the same individual. However, for simulation it does not matter.
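
To illustrate what an observation-level simulated probability looks like, here is a small standalone sketch. It does not use Biogeme: the binary model, the single normally distributed coefficient, the attribute names `x1` and `x2`, and all numeric values are hypothetical. Each trip gets its own draws, and the logit probability is averaged over those draws, so the output has one row per trip. In the Biogeme code above, this would presumably correspond to skipping `database_test.panel('randid')` and simulating `MonteCarlo(logit(utilities, None, 1))` without `PanelLikelihoodTrajectory`; that last point is an assumption, not something confirmed in this thread.

```python
import math
import random


def logit_probs(utilities):
    """Return logit choice probabilities for a dict {alternative: utility}."""
    # Subtract the maximum utility for numerical stability.
    m = max(utilities.values())
    expu = {alt: math.exp(v - m) for alt, v in utilities.items()}
    total = sum(expu.values())
    return {alt: e / total for alt, e in expu.items()}


def simulate_trip_probs(trips, beta_mean, beta_std, n_draws=1000, seed=1223):
    """Monte Carlo choice probabilities per trip (one output row per trip)."""
    rng = random.Random(seed)
    results = []
    for trip in trips:
        sums = {1: 0.0, 2: 0.0}
        for _ in range(n_draws):
            # Fresh draw for every trip: no panel structure is imposed.
            beta = rng.gauss(beta_mean, beta_std)
            probs = logit_probs({1: beta * trip["x1"], 2: beta * trip["x2"]})
            for alt in sums:
                sums[alt] += probs[alt]
        # Average over the draws to approximate the integral.
        results.append({alt: s / n_draws for alt, s in sums.items()})
    return results


# Toy test data (made up): traveler 1 has two trips, traveler 2 has one.
test_trips = [
    {"randid": 1, "x1": 1.0, "x2": 2.0},
    {"randid": 1, "x1": 0.5, "x2": 0.5},
    {"randid": 2, "x1": 3.0, "x2": 1.0},
]
trip_probs = simulate_trip_probs(test_trips, beta_mean=-1.0, beta_std=0.5)
```

With a panel declaration, the quantity simulated per individual is instead the average over draws of the product of the trip probabilities, which is why the original output had one row per 'randid'.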

Michel Bierlaire
Transport and Mobility Laboratory
School of Architecture, Civil and Environmental Engineering
EPFL - Ecole Polytechnique Fédérale de Lausanne
http://transp-or.epfl.ch
http://people.epfl.ch/michel.bierlaire

Natchaphon Leungbootnak

Mar 25, 2025, 11:58:36 AM
to Michel Bierlaire, Biogeme
If I understand correctly, I have to declare df_train as panel data for estimation, but I do not need to declare df_test as panel data for simulation.

Thank you for your help,
Natchaphon