Dear Professor Bierlaire and dear community,
I am currently using Biogeme to analyse a dataset collected through a stated preference survey. Each of the 450 respondents answered 6 choice games (out of the 36 in the full experimental design), each time choosing among 3 alternatives.
I have created a subgroup of respondents who picked the same alternative every time, i.e. in each of the six choice games they always chose the same alternative.
I have created a dummy variable named W_PREF to indicate whether a respondent belongs to this group: it is 1 for respondents who always chose the same alternative and 0 for those who have chosen different alternatives in different scenarios.
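For reference, this is how such a dummy can be built in pandas, assuming long-format data with one row per choice observation; the column names 'ID' and 'CHOICE_S' are assumptions for illustration, not taken from my actual dataset:

```python
import pandas as pd

# Hypothetical long-format data: one row per choice observation,
# with respondent identifier 'ID' and observed choice 'CHOICE_S'.
df = pd.DataFrame({
    'ID':       [1, 1, 1, 2, 2, 2],
    'CHOICE_S': [1, 1, 1, 1, 2, 3],
})

# A respondent always chose the same alternative if the number of
# distinct choices within their group of observations equals 1.
always_same = df.groupby('ID')['CHOICE_S'].transform('nunique') == 1
df['W_PREF'] = always_same.astype(int)
```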
I am now wondering whether it makes sense to include this dummy variable, with a coefficient, in the utility functions of the other two alternatives (with separate coefficients for each alternative, named B_W_PREF_ES and B_W_PREF_B), like this:
# Required imports (assuming Biogeme 3.2.x)
from biogeme.expressions import Beta, bioDraws, log, MonteCarlo, PanelLikelihoodTrajectory
from biogeme import models

# Parameters to be estimated
# General Parameters
B_TRANS = Beta('B_TRANS', 0, None, None, 0)
# Parameters for Alternative 1
B_TIME_ES = Beta('B_TIME_ES', 0, None, None, 0)
B_COST_ES = Beta('B_COST_ES', 0, None, None, 0)
B_F_ES = Beta('B_F_ES', 0, None, None, 0)
B_AGE_YT_ES = Beta('B_AGE_YT_ES', 0, None, None, 0)
B_W_PREF_ES = Beta('B_W_PREF_ES', 0, None, None, 0)
# Parameters for Alternative 2
B_TIME_B = Beta('B_TIME_B', 0, None, None, 0)
B_COST_B = Beta('B_COST_B', 0, None, None, 0)
B_F_B = Beta('B_F_B', 0, None, None, 0)
B_AGE_YT_B = Beta('B_AGE_YT_B', 0, None, None, 0)
B_W_PREF_B = Beta('B_W_PREF_B', 0, None, None, 0)
# Parameters for Alternative 3
B_TIME_W = Beta('B_TIME_W', 0, None, None, 0)
# Define random parameters for the constants, normally distributed, designed to be used
# for Monte-Carlo simulation, to address serial correlation.
# Alternative specific constant for Alternative 1
ASC_ES = Beta('ASC_ES', 0, None, None, 0)
ASC_ES_S = Beta('ASC_ES_S', 1, None, None, 0)
ASC_ES_RND = ASC_ES + ASC_ES_S * bioDraws('ASC_ES_RND', 'NORMAL_HALTON2')
# Alternative specific constant for Alternative 2
ASC_B = Beta('ASC_B', 0, None, None, 0)
ASC_B_S = Beta('ASC_B_S', 1, None, None, 0)
ASC_B_RND = ASC_B + ASC_B_S * bioDraws('ASC_B_RND', 'NORMAL_HALTON2')
# Alternative specific constant for Alternative 3 (set to 0)
ASC_W = Beta('ASC_W', 0, None, None, 1)
# Include random error components (EC) to account for alternative-specific variance
SIGMA_ES = Beta('SIGMA_ES', 0, None, None, 0)
SIGMA_B = Beta('SIGMA_B', 0, None, None, 0)
EC_ES = SIGMA_ES * bioDraws('EC_ES', 'NORMAL')
EC_B = SIGMA_B * bioDraws('EC_B', 'NORMAL')
# Definition of the utility functions
# Utility Function for Alternative 1
V1 = ASC_ES_RND + \
B_TIME_ES * S_ES_TT + \
B_TRANS * S_ES_TRANS + \
B_COST_ES * S_ES_C + \
B_F_ES * FEM + \
B_AGE_YT_ES * AGE_YT35 + \
B_W_PREF_ES * W_PREF + \
EC_ES
# Utility Function for Alternative 2
V2 = ASC_B_RND + \
B_TIME_B * S_B_TT + \
B_TRANS * S_B_TRANS + \
B_COST_B * S_B_C + \
B_F_B * FEM + \
B_AGE_YT_B * AGE_YT35 + \
B_W_PREF_B * W_PREF + \
EC_B
# Utility Function for Alternative 3
V3 = ASC_W + \
B_TIME_W * Walk_TT_S
# Associate utility functions with the numbering of alternatives
V = {1: V1, 2: V2, 3: V3}
# Associate the availability conditions with the alternatives
av = {1: 1, 2: 1, 3: 1}
# Conditional to the random parameters, the likelihood of one observation is
# given by the logit model (called the kernel)
obsprob = models.logit(V, av, CHOICE_S)
# Conditional to the random parameters, the likelihood of all observations for
# one individual (the trajectory) is the product of the likelihood of
# each observation.
condprobIndiv = PanelLikelihoodTrajectory(obsprob)
# We integrate over the random parameters using Monte-Carlo
logprob = log(MonteCarlo(condprobIndiv))
I am just wondering whether it makes sense. I was thinking of it somewhat in terms of the car-loving attitude explained in one of the videos on latent classes, with the difference that here the information about the preference is derived from the answers to the experiment itself rather than from other variables.
The results show large negative values for the two coefficients (as is to be expected for respondents with a very strong preference for another alternative), and the robust p-values are essentially zero (0.000000e+00). In addition, the goodness of fit is also much improved compared to the model in which the W_PREF variable has been left out.
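Since the two models are nested (the model without W_PREF simply restricts B_W_PREF_ES = B_W_PREF_B = 0), the improvement in fit can also be checked formally with a likelihood ratio test. A minimal sketch, where the two final log likelihood values are illustrative placeholders and not my actual estimation results:

```python
import math

# Hypothetical final log likelihoods (placeholders, not real results):
ll_restricted = -2510.4   # model without W_PREF
ll_full       = -2440.7   # model with B_W_PREF_ES and B_W_PREF_B

# Likelihood ratio statistic, chi-square distributed under the null,
# with degrees of freedom equal to the number of extra parameters (2).
lr_stat = -2.0 * (ll_restricted - ll_full)

# For 2 degrees of freedom the chi-square survival function has the
# closed form exp(-x / 2), so no external library is needed.
p_value = math.exp(-lr_stat / 2.0)
```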
Am I missing some important reason why it is not advisable to do this?
Thank you very much for your help with this.
Kind regards,
Giulia