Long running time and strange results of mixed logit model


Rongqiu Song

Feb 19, 2024, 8:22:05 AM
to Biogeme
Dear Prof. Bierlaire and members of the Biogeme Forum,

I hope this email finds you well.

I have data from a discrete choice experiment. There are three alternatives, and each respondent was shown 5 different choice cards.

First, I estimated a multinomial logit (MNL) model on these data in both Stata and Biogeme, and the results were the same. Then I estimated a mixed logit model in Biogeme, using the MNL estimates (identical in Biogeme and Stata) as starting values. However, the model took a long time to run (about 1 hour and 30 minutes), and the estimated mixed logit coefficients were very different from the MNL coefficients.

I also estimated a mixed logit model in Stata, and there the coefficients were quite close to those of the MNL model. So I do not know what might be wrong with my Biogeme code.

I tried the following code:

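import biogeme.biogeme as bio
from biogeme import models
from biogeme.expressions import Beta, bioDraws, log, MonteCarlo

# Assumed to be defined earlier from the survey data: `database`
# (a biogeme.database.Database) and the attribute variables used below
# (cprice1, cwait1, ..., STREET_AV, SURFACE_AV, MULTI_AV, choice).

# Parameters to be estimated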
# use street as the reference
ASC_SURFACE = Beta('ASC_SURFACE', 0.106, None, None, 0)
ASC_MULTI = Beta('ASC_MULTI', -0.498, None, None, 0)
B_PRICE = Beta('B_PRICE', -0.891, None, None, 0)
B_WAIT = Beta('B_WAIT', -0.018, None, None, 0)
B_NUMBER = Beta('B_NUMBER', 0.026, None, None, 0)
B_GREEN = Beta('B_GREEN', 0.300, None, None, 0)
B_FAST = Beta('B_FAST', 0.201, None, None, 0)
B_PAY2 = Beta('B_PAY2', 0.032, None, None, 0)
B_PAY3 = Beta('B_PAY3', -0.082, None, None, 0)
B_PAY4 = Beta('B_PAY4', 0.009, None, None, 0)
B_AUTHEN2 = Beta('B_AUTHEN2', 0.087, None, None, 0)
B_AUTHEN3 = Beta('B_AUTHEN3', 0.139, None, None, 0)
B_AMEN2 = Beta('B_AMEN2', 0.166, None, None, 0)
B_AMEN3 = Beta('B_AMEN3', 0.189, None, None, 0)
B_AMEN4 = Beta('B_AMEN4', 0.191, None, None, 0)
B_RATE2 = Beta('B_RATE2', 0.060, None, None, 0)
B_RATE3 = Beta('B_RATE3', 0.146, None, None, 0)

B_SURFACE_S = Beta('B_SURFACE_S',1,None,None,0)
B_MULTI_S = Beta('B_MULTI_S',1,None,None,0)
B_WAIT_S = Beta('B_WAIT_S', 1, None, None, 0)
B_NUMBER_S = Beta('B_NUMBER_S', 1, None, None, 0)
B_GREEN_S = Beta('B_GREEN_S', 1, None, None, 0)
B_FAST_S = Beta('B_FAST_S', 1, None, None, 0)
B_PAY2_S = Beta('B_PAY2_S', 1, None, None, 0)
B_PAY3_S = Beta('B_PAY3_S', 1, None, None, 0)
B_PAY4_S = Beta('B_PAY4_S', 1, None, None, 0)
B_AUTHEN2_S = Beta('B_AUTHEN2_S', 1, None, None, 0)
B_AUTHEN3_S = Beta('B_AUTHEN3_S', 1, None, None, 0)
B_AMEN2_S = Beta('B_AMEN2_S', 1, None, None, 0)
B_AMEN3_S = Beta('B_AMEN3_S', 1, None, None, 0)
B_AMEN4_S = Beta('B_AMEN4_S', 1, None, None, 0)
B_RATE2_S = Beta('B_RATE2_S', 1, None, None, 0)
B_RATE3_S = Beta('B_RATE3_S', 1, None, None, 0)

# Define the random parameters, normally distributed (mean + std. dev. * standard
# normal draw), designed to be used for Monte-Carlo simulation
B_SURFACE_RND = ASC_SURFACE + B_SURFACE_S * bioDraws('B_SURFACE_RND','NORMAL')
B_MULTI_RND = ASC_MULTI + B_MULTI_S * bioDraws('B_MULTI_RND','NORMAL')
B_WAIT_RND = B_WAIT + B_WAIT_S * bioDraws('B_WAIT_RND','NORMAL')
B_NUMBER_RND = B_NUMBER + B_NUMBER_S * bioDraws('B_NUMBER_RND','NORMAL')
B_GREEN_RND = B_GREEN + B_GREEN_S * bioDraws('B_GREEN_RND','NORMAL')
B_FAST_RND = B_FAST + B_FAST_S * bioDraws('B_FAST_RND','NORMAL')
B_PAY2_RND = B_PAY2 + B_PAY2_S * bioDraws('B_PAY2_RND','NORMAL')
B_PAY3_RND = B_PAY3 + B_PAY3_S * bioDraws('B_PAY3_RND','NORMAL')
B_PAY4_RND = B_PAY4 + B_PAY4_S * bioDraws('B_PAY4_RND','NORMAL')
B_AUTHEN2_RND = B_AUTHEN2 + B_AUTHEN2_S * bioDraws('B_AUTHEN2_RND','NORMAL')
B_AUTHEN3_RND = B_AUTHEN3 + B_AUTHEN3_S * bioDraws('B_AUTHEN3_RND','NORMAL')
B_AMEN2_RND = B_AMEN2 + B_AMEN2_S * bioDraws('B_AMEN2_RND','NORMAL')
B_AMEN3_RND = B_AMEN3 + B_AMEN3_S * bioDraws('B_AMEN3_RND','NORMAL')
B_AMEN4_RND = B_AMEN4 + B_AMEN4_S * bioDraws('B_AMEN4_RND','NORMAL')
B_RATE2_RND = B_RATE2 + B_RATE2_S * bioDraws('B_RATE2_RND','NORMAL')
B_RATE3_RND = B_RATE3 + B_RATE3_S * bioDraws('B_RATE3_RND','NORMAL')

# Definition of the utility functions
V1 = B_PRICE * cprice1 \
 + B_WAIT_RND * cwait1 \
 + B_NUMBER_RND * cnumber1 \
 + B_GREEN_RND * crenewable1 \
 + B_FAST_RND * FastStreet \
 + B_PAY2_RND * PayStreet2 \
 + B_PAY3_RND * PayStreet3 \
 + B_PAY4_RND * PayStreet4 \
 + B_AUTHEN2_RND * AuthenStreet2 \
 + B_AUTHEN3_RND * AuthenStreet3 \
 + B_AMEN2_RND * AmenStreet2 \
 + B_AMEN3_RND * AmenStreet3 \
 + B_AMEN4_RND * AmenStreet4 \
 + B_RATE2_RND * RateStreet2 \
 + B_RATE3_RND * RateStreet3

V2 = B_SURFACE_RND  \
 + B_PRICE * cprice2 \
 + B_WAIT_RND * cwait2 \
 + B_NUMBER_RND * cnumber2 \
 + B_GREEN_RND * crenewable2 \
 + B_FAST_RND * FastSurface \
 + B_PAY2_RND * PaySurface2 \
 + B_PAY3_RND * PaySurface3 \
 + B_PAY4_RND * PaySurface4 \
 + B_AUTHEN2_RND * AuthenSurface2 \
 + B_AUTHEN3_RND * AuthenSurface3 \
 + B_AMEN2_RND * AmenSurface2 \
 + B_AMEN3_RND * AmenSurface3 \
 + B_AMEN4_RND * AmenSurface4 \
 + B_RATE2_RND * RateSurface2 \
 + B_RATE3_RND * RateSurface3

V3 = B_MULTI_RND \
 + B_PRICE * cprice3 \
 + B_WAIT_RND * cwait3 \
 + B_NUMBER_RND * cnumber3 \
 + B_GREEN_RND * crenewable3 \
 + B_FAST_RND * FastMulti \
 + B_PAY2_RND * PayMulti2 \
 + B_PAY3_RND * PayMulti3 \
 + B_PAY4_RND * PayMulti4 \
 + B_AUTHEN2_RND * AuthenMulti2 \
 + B_AUTHEN3_RND * AuthenMulti3 \
 + B_AMEN2_RND * AmenMulti2 \
 + B_AMEN3_RND * AmenMulti3 \
 + B_AMEN4_RND * AmenMulti4 \
 + B_RATE2_RND * RateMulti2 \
 + B_RATE3_RND * RateMulti3

# Associate the utility functions with the numbering
# of alternatives
V = {1: V1,
     2: V2,
     3: V3,
    }

# Associate the availability conditions with the alternatives
av = {1: STREET_AV,
      2: SURFACE_AV,
      3: MULTI_AV,
     }

# Conditional on the draws, the choice model is a logit with availability conditions
prob = models.logit(V,av,choice)
logprob = log(MonteCarlo(prob))

USER_NOTES = (
    'Example of a mixture of logit models with three alternatives, '
    'approximated using Monte-Carlo integration.'
)

the_biogeme = bio.BIOGEME(
    database, logprob, userNotes=USER_NOTES, parameter_file='few_draws.toml'
)
the_biogeme.modelName = 'b05normal_mixture'

print(f'Number of draws: {the_biogeme.number_of_draws}')

results = the_biogeme.estimate()
print(results.short_summary())
pandas_results = results.getEstimatedParameters()
print(pandas_results)

As shown in the code, all variables except price, including the alternative-specific constants, are specified as random parameters. I set the number of draws to 100, but the model still ran for a long time (an hour and a half), and the results were dramatically different from those of the MNL model.
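
The idea behind `MonteCarlo(prob)` can be illustrated outside Biogeme. Each `_RND` coefficient is its mean plus its standard deviation times a standard normal draw; conditional on one set of draws the model is an ordinary logit, and the mixed logit probability is approximated by averaging that logit probability over R draws. The following is a minimal NumPy sketch with hypothetical data (three alternatives, two attributes), not Biogeme's implementation. It also shows why run time grows with the number of draws: every evaluation of the log-likelihood computes R logit probabilities per observation.

```python
import numpy as np


def logit_probs(utilities: np.ndarray) -> np.ndarray:
    """Logit probabilities over the last axis (softmax, shifted for numerical stability)."""
    shifted = utilities - utilities.max(axis=-1, keepdims=True)
    expu = np.exp(shifted)
    return expu / expu.sum(axis=-1, keepdims=True)


def simulated_choice_prob(x, mean, std, chosen, n_draws, seed=0):
    """Monte Carlo approximation of a mixed logit choice probability.

    P(chosen) ~= (1/R) * sum_r logit_prob(chosen | beta_r), beta_r = mean + std * xi_r

    x:      (n_alts, n_attrs) attributes of one choice situation (hypothetical)
    mean:   (n_attrs,) means of the coefficients
    std:    (n_attrs,) standard deviations; 0 gives a fixed coefficient (like price)
    chosen: index of the chosen alternative
    """
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((n_draws, len(mean)))  # standard normal draws, one row per replication
    betas = mean + std * xi                         # (n_draws, n_attrs)
    utilities = betas @ x.T                         # (n_draws, n_alts)
    probs = logit_probs(utilities)                  # logit probability for each draw
    return probs[:, chosen].mean()                  # average over the draws


# Hypothetical example: column 0 is price (fixed coefficient),
# column 1 is another attribute with a random coefficient.
x = np.array([[1.0, 0.0],
              [0.5, 1.0],
              [0.2, 1.0]])
mean = np.array([-0.9, 0.3])
std = np.array([0.0, 1.0])
p = simulated_choice_prob(x, mean, std, chosen=1, n_draws=1000)
print(p)
```

Setting every standard deviation to zero makes all draws identical, so the simulated probability reduces to the MNL probability; this is why MNL estimates are a natural starting point for the means.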

Could you please help me check what might be wrong with the code?

Many thanks,
Rongqiu




Michel Bierlaire

Feb 20, 2024, 3:01:10 AM
to songron...@gmail.com, Michel Bierlaire, Biogeme
100 draws are definitely not enough to obtain meaningful results. Use a low number of draws only to debug your code.
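
To see why 100 draws are too few, one can measure how much a simulated probability changes when only the random seed changes. The sketch below uses a hypothetical single random coefficient (beta ~ N(0.3, 1)) and three alternatives, with plain NumPy pseudo-random draws rather than Biogeme; it computes that spread for several numbers of draws, and the spread shrinks roughly like 1/sqrt(R). Because estimation maximizes the sum of the logs of the simulated probabilities, this noise also biases the simulated log-likelihood, and the effect adds up over observations.

```python
import numpy as np


def simulated_prob(n_draws: int, seed: int) -> float:
    """Simulated probability of choosing alternative 0 among three (hypothetical data).

    Utility V_j = beta * x_j with a single random coefficient beta ~ N(0.3, 1).
    """
    x = np.array([1.0, 0.0, -0.5])                   # hypothetical attribute levels
    rng = np.random.default_rng(seed)
    beta = 0.3 + 1.0 * rng.standard_normal(n_draws)  # one draw of beta per replication
    v = np.outer(beta, x)                            # (n_draws, 3) utilities
    v -= v.max(axis=1, keepdims=True)                # shift for numerical stability
    expv = np.exp(v)
    probs = expv / expv.sum(axis=1, keepdims=True)
    return float(probs[:, 0].mean())                 # Monte Carlo average over draws


def spread_across_seeds(n_draws: int, n_seeds: int = 50) -> float:
    """Standard deviation of the simulated probability across independent seeds."""
    return float(np.std([simulated_prob(n_draws, s) for s in range(n_seeds)]))


for r in (100, 1_000, 10_000):
    print(f"{r:>6} draws: spread = {spread_across_seeds(r):.5f}")
```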

If you want to speed up the calculation, use more processors. Also, turn off the calculation of the analytical second derivatives (Hessian).
In the parameter file, change

[SimpleBounds]
second_derivatives = 1.0 # float: proportion (between 0 and 1) of iterations when
# the analytical Hessian is calculated

into

[SimpleBounds]
second_derivatives = 0.0 # float: proportion (between 0 and 1) of iterations when
# the analytical Hessian is calculated
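
Using more processors helps because the simulated log-likelihood is a sum of independent terms, one per observation (each the log of the logit probability averaged over the draws), so the observations can be split into chunks, evaluated concurrently, and summed. The following is a minimal NumPy sketch of that decomposition with hypothetical data; it is not Biogeme's internal code, and the thread pool only illustrates the split rather than promising a particular speed-up. In Biogeme itself the number of processors is set in the parameter file; in recent 3.2.x versions this is assumed to be `number_of_threads` under `[MultiThreading]` (check the biogeme.toml generated by your installation).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def chunk_loglike(x, chosen, betas):
    """Simulated log-likelihood of a chunk of observations.

    x:      (n_obs, n_alts, n_attrs) attributes
    chosen: (n_obs,) index of the chosen alternative
    betas:  (n_obs, n_draws, n_attrs) coefficient draws for each observation
    """
    v = np.einsum("nja,nra->nrj", x, betas)  # utilities, (n_obs, n_draws, n_alts)
    v -= v.max(axis=2, keepdims=True)        # shift for numerical stability
    p = np.exp(v)
    p /= p.sum(axis=2, keepdims=True)        # logit probabilities for each draw
    n_obs, n_draws, _ = p.shape
    p_chosen = p[np.arange(n_obs)[:, None], np.arange(n_draws)[None, :], chosen[:, None]]
    # Sum over observations of log(average over draws)
    return float(np.log(p_chosen.mean(axis=1)).sum())


def loglike_parallel(x, chosen, betas, n_workers=4):
    """Same quantity, with the observations split into chunks evaluated in a thread pool."""
    chunks = np.array_split(np.arange(len(chosen)), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda idx: chunk_loglike(x[idx], chosen[idx], betas[idx]), chunks))
    return sum(parts)


# Hypothetical data: 1000 choice situations, 3 alternatives, 4 attributes, 200 draws.
rng = np.random.default_rng(42)
x = rng.normal(size=(1000, 3, 4))
chosen = rng.integers(0, 3, size=1000)
betas = rng.normal(0.2, 1.0, size=(1000, 200, 4))

serial = chunk_loglike(x, chosen, betas)
parallel = loglike_parallel(x, chosen, betas)
print(serial, parallel)
```

More processors divide the work of each likelihood evaluation; they do not reduce the number of draws needed for reliable estimates.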

Michel Bierlaire
Transport and Mobility Laboratory
School of Architecture, Civil and Environmental Engineering
EPFL - Ecole Polytechnique Fédérale de Lausanne
http://transp-or.epfl.ch
http://people.epfl.ch/michel.bierlaire
