Extracting statistics on assigned probability to alternatives in a choice set

Rohan Menezes

Aug 21, 2024, 3:07:59 AM
to Biogeme
Dear Michel,

I am trying to find a way to compute the following statistics regarding the performance of my estimated Destination choice model:

% Correct Predictions:
It is the ratio of correct predictions to the total number of observations, expressed as a percentage.

Here, the predicted choice is the alternative to which the model assigns the highest probability among all alternatives in the choice set.


How do I get the probabilities assigned to the alternatives after estimation?
Screenshot 2024-08-20 104445.png

Michel Bierlaire

Aug 21, 2024, 3:18:22 AM
to rohanme...@gmail.com, Michel Bierlaire, Biogeme


> On 20 Aug 2024, at 10:49, Rohan Menezes <rohanme...@gmail.com> wrote:
>
> Dear Michel,
>
> I am trying to find a way to compute the following statistics regarding the performance of my estimated Destination choice model:
>
> % Correct Predictions:
> It is the ratio of correct predictions
> to the total number of observations, expressed as a percentage.


There is no such thing as "correct prediction". A choice model is probabilistic by nature. So all predictions are correct.

If you want to perform out-of-sample validation, you'll find an example here:
https://biogeme.epfl.ch/sphinx/auto_examples/swissmetro/plot_b04validation.html#sphx-glr-auto-examples-swissmetro-plot-b04validation-py



>
> Here the alternative to which the model assigns the highest probability among all other alternatives in the choice set is the predicted choice

This is completely wrong. Don't do that!

Anyway, the general answer is: use simulation.
Here is an example:
https://biogeme.epfl.ch/sphinx/auto_examples/swissmetro/plot_b01logit_simul.html#sphx-glr-auto-examples-swissmetro-plot-b01logit-simul-py

This will calculate the choice probability for each alternative for each individual. Then, you can derive any statistic you like.
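
As an illustration of deriving a statistic from those probabilities, here is a minimal sketch that computes predicted market shares by averaging the probabilities over the sample. It assumes the simulated probabilities have already been collected as one row per individual; the numbers and the names `probabilities` and `predicted_shares` are made up for the example and are not part of Biogeme.

```python
# Hypothetical simulated probabilities: one row per individual, one value
# per alternative. In practice these would come from the simulation output.
probabilities = [
    {"alt_1": 0.70, "alt_2": 0.20, "alt_3": 0.10},
    {"alt_1": 0.25, "alt_2": 0.50, "alt_3": 0.25},
    {"alt_1": 0.10, "alt_2": 0.30, "alt_3": 0.60},
    {"alt_1": 0.35, "alt_2": 0.40, "alt_3": 0.25},
]


def predicted_shares(rows):
    """Average the choice probabilities of each alternative over individuals."""
    n = len(rows)
    return {alt: sum(row[alt] for row in rows) / n for alt in rows[0]}


shares = predicted_shares(probabilities)
for alt, share in shares.items():
    print(f"{alt}: {share:.1%}")
```

The shares use the full probability vector of every individual, so no information is discarded by picking a single "predicted" alternative.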
>
>
> How do I get the assigned probabilities to the alternatives after estimation?
>

Michel Bierlaire
Transport and Mobility Laboratory
School of Architecture, Civil and Environmental Engineering
EPFL - Ecole Polytechnique Fédérale de Lausanne
http://transp-or.epfl.ch
http://people.epfl.ch/michel.bierlaire

Rohan Menezes

Aug 22, 2024, 4:26:35 AM
to Biogeme
Dear Michel,

Thank you for your response.

> > Here the alternative to which the model assigns the highest probability among all other alternatives in the choice set is the predicted choice
> This is completely wrong. Don't do that!



Could you let me know the reasoning behind it? I think it relates to this statement of yours: "There is no such thing as 'correct prediction'. A choice model is probabilistic by nature. So all predictions are correct."

This paper, "The overreliance on statistical goodness-of-fit and under-reliance on model validation in discrete choice models: A review of validation practices in the transportation academic literature" (ScienceDirect), provides a heuristic for external validation of choice models (Fig. 2, p. 16; see the attached image as well).
One of the performance measures it recommends reporting is the 'Percentage of correct predictions', as per the equation I provided previously.


Based on this, I have done the following for now:
prob_1 = models.logit(V, av, 1)
prob_2 = models.logit(V, av, 2)
prob_3 = models.logit(V, av, 3)
prob_4 = models.logit(V, av, 4)
prob_5 = models.logit(V, av, 5)
prob_6 = models.logit(V, av, 6)
prob_7 = models.logit(V, av, 7)
prob_8 = models.logit(V, av, 8)
prob_9 = models.logit(V, av, 9)
prob_10 = models.logit(V, av, 10)

simulate_formulas = {
    'Utility 1': V1,
    'Utility 2': V2,
    'Utility 3': V3,
    'Utility 4': V4,
    'Utility 5': V5,
    'Utility 6': V6,
    'Utility 7': V7,
    'Utility 8': V8,
    'Utility 9': V9,
    'Utility 10': V10,
    'Prob. 1': prob_1,
    'Prob. 2': prob_2,
    'Prob. 3': prob_3,
    'Prob. 4': prob_4,
    'Prob. 5': prob_5,
    'Prob. 6': prob_6,
    'Prob. 7': prob_7,
    'Prob. 8': prob_8,
    'Prob. 9': prob_9,
    'Prob. 10': prob_10,
}

# Create a Biogeme object for simulation
biogeme_simulation = bio.BIOGEME(work_2018_database_combinedSCAE, simulate_formulas)

# Simulate using the estimated beta values
simulated_results = biogeme_simulation.simulate(results.getBetaValues())

# Display or further process the simulated results
print(simulated_results)

Please see the attached image for the output.


Then I determined which alternative is assigned the highest probability:
# Extract only the probability columns
prob_columns = ['Prob. 1', 'Prob. 2', 'Prob. 3', 'Prob. 4', 'Prob. 5',
                'Prob. 6', 'Prob. 7', 'Prob. 8', 'Prob. 9', 'Prob. 10']

# Find the maximum value and the corresponding column (alternative) for each row
simulated_results['Max Probability'] = simulated_results[prob_columns].max(axis=1)
simulated_results['Predicted Chosen Alternative'] = simulated_results[prob_columns].idxmax(axis=1)

# Display the results
print(simulated_results[['Max Probability', 'Predicted Chosen Alternative']])
Screenshot 2024-08-22 092946.png
Screenshot 2024-08-22 092700.png

Michel Bierlaire

Aug 22, 2024, 4:41:29 AM
to rohanme...@gmail.com, Michel Bierlaire, Biogeme


> On 22 Aug 2024, at 09:32, Rohan Menezes <rohanme...@gmail.com> wrote:
>
> Dear Michel,
>
> Thank you for your response.
>
> > Here the alternative to which the model assigns the highest probability among all other alternatives in the choice set is the predicted choice
> This is completely wrong. Don't do that!
>
>
> Could you let me know the reasoning behind it? I think it relates to this statement of yours ''There is no such thing as "correct prediction". A choice model is probabilistic by nature. So all predictions are correct"
>
> This paper, "The overreliance on statistical goodness-of-fit and under-reliance on model validation in discrete choice models: A review of validation practices in the transportation academic literature" (ScienceDirect), provides a heuristic for external validation of choice models (Fig. 2, p. 16; see the attached image as well).


They explain in Section 5 why this "has been criticized", referring to the paper by de Luca and Cantarella. Actually, if you go back to the textbook by Ben-Akiva and Lerman, they also explain why this is not appropriate on page 92.

In simple words, such a method would not distinguish between a model that predicts
0.9, 0.05, 0.05 and a model that predicts 0.36, 0.32, 0.32.
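
The point can be checked with a few lines of Python. This is only an illustrative sketch: the helper `most_likely` and the choice of 100 repeated applications are assumptions made for the example.

```python
def most_likely(probs):
    """Index of the alternative with the highest probability."""
    return max(range(len(probs)), key=lambda i: probs[i])


model_a = [0.90, 0.05, 0.05]
model_b = [0.36, 0.32, 0.32]

# Both models "predict" the same alternative under the highest-probability rule.
same_prediction = most_likely(model_a) == most_likely(model_b)
print("Same predicted alternative:", same_prediction)

# Yet they imply very different behavior when applied many times
# (hypothetical: 100 individuals facing the same situation).
n = 100
expected_a = n * model_a[0]
expected_b = n * model_b[0]
print("Expected number choosing the first alternative:", expected_a, "vs", expected_b)
```

The highest-probability rule throws away exactly the information that separates the two models.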

Such a method is valid only when you apply the model exactly once, such as in classification, where you have to decide if an image is a dog or a cat. In that case, it makes sense to use the highest probability. This is why it is used in machine learning. But it does not apply to choice models in general. Instead, you should calculate the likelihood on the validation sample, which is the probability that your model correctly predicts the observed choices.
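
A minimal sketch of that calculation, assuming the simulated probabilities for the validation sample and the observed choices are already available as plain Python lists. The data and the function name `log_likelihood` are hypothetical and not part of Biogeme; the log is used only for numerical convenience.

```python
import math

# Hypothetical validation sample: simulated probabilities of each alternative
# (one row per observation) and the index of the alternative actually chosen.
validation_probs = [
    [0.60, 0.30, 0.10],
    [0.20, 0.50, 0.30],
    [0.45, 0.35, 0.20],
    [0.10, 0.15, 0.75],
]
observed_choices = [0, 2, 1, 2]


def log_likelihood(prob_rows, choices):
    """Sum of the log of the probability assigned to each observed choice."""
    return sum(math.log(row[c]) for row, c in zip(prob_rows, choices))


ll = log_likelihood(validation_probs, observed_choices)
likelihood = math.exp(ll)  # product of the probabilities of the observed choices
print(f"Log likelihood: {ll:.4f}")
print(f"Likelihood:     {likelihood:.5f}")
```

When comparing two models on the same validation sample, the one with the higher (less negative) log likelihood assigns more probability to what was actually observed.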

Rohan Menezes

Aug 22, 2024, 8:43:03 AM
to Biogeme
Dear Michel,

Ahh yes, you are correct. That is indeed the limitation of this metric. It needs to be reported along with Clearness of predictions too. I will explore this further.

Thank you for highlighting this issue.
