Dear Sir,
I estimated a multinomial logit (MNL) model in both Biogeme and RStudio to predict the choice between ML and GPL. In each case I estimated the model on the training set, then used the estimated coefficients to predict the choice (Y) in the test set and computed the accuracy percentage and the confusion matrix.
Biogeme and RStudio give the same coefficient values, but very different accuracy percentages and confusion matrices. I wonder whether my Biogeme code that uses the coefficients to compute the predicted Y is correct. My code is below:
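import re
import numpy as np
import pandas as pd
from biogeme.biogeme import BIOGEME
from biogeme.models import loglogit, logit
utilities = {1: utility_ML, 2: utility_GPL}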
log_choice_probability = loglogit(utilities, None, choice)
biogeme_train = BIOGEME(database_train, log_choice_probability)
biogeme_train.modelName = 'RUMMNL'
results = biogeme_train.estimate()
outcome = results.getHtml(onlyRobust=False)
match = re.search(r'<strong>Optimization time</strong>: </td> <td>(.*?)</td>', outcome)
# Confusion matrix
prob_1 = logit(utilities, None, 1)
prob_2 = logit(utilities, None, 2)
simulate = {'Prob. 1': prob_1, 'Prob. 2': prob_2}
biogeme_test = BIOGEME(database_test, simulate)
biogeme_test.modelName = "RUMMNL_test"
betaValues = results.getBetaValues()
simulatedValues = biogeme_test.simulate(betaValues)
prob_max = simulatedValues.idxmax(axis=1)
prob_max = prob_max.replace({'Prob. 1': 1, 'Prob. 2': 2})
compared_data = {'y_Actual': df_test['choice12'], 'y_Predicted': prob_max}
df_y = pd.DataFrame(compared_data, columns=['y_Actual','y_Predicted'])
confusion_matrix = pd.crosstab(df_y['y_Actual'], df_y['y_Predicted'], rownames=['Actual'], colnames=['Predicted'])
accuracy = np.trace(confusion_matrix) / confusion_matrix.sum().sum() * 100
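To make the comparison step easier to check independently of Biogeme, here is a minimal sketch using only pandas and NumPy. The helper name confusion_and_accuracy and the toy data are hypothetical, not part of my actual script. It assumes the rows of the simulated probabilities are in the same order as the rows of the test set, so it compares by row position rather than by index label, and it forces the matrix to be square even when one alternative is never predicted. I am not certain that either point explains the discrepancy; it is only a way to rule them out.

```python
import numpy as np
import pandas as pd


def confusion_and_accuracy(y_actual, probs, label_map):
    """Return (confusion matrix, accuracy in percent).

    Hypothetical helper for illustration only.
    y_actual  : observed choices, one per row of the test set
    probs     : DataFrame of simulated probabilities, one column per alternative
    label_map : maps each column name of probs to its choice code
    """
    if len(y_actual) != len(probs):
        raise ValueError("y_actual and probs must have the same number of rows")

    # Assumption: both inputs are in the same row order, so compare by
    # position and ignore whatever index labels they carry.
    actual = np.asarray(y_actual)
    predicted = probs.idxmax(axis=1).map(label_map).to_numpy()

    # Reindex so that every alternative appears as a row and a column,
    # even if it was never observed or never predicted.
    labels = sorted(label_map.values())
    matrix = pd.crosstab(
        pd.Series(actual, name="Actual"),
        pd.Series(predicted, name="Predicted"),
    ).reindex(index=labels, columns=labels, fill_value=0)

    accuracy = float((actual == predicted).mean() * 100)
    return matrix, accuracy


# Toy data (made up): the observed choices carry a non-default index,
# while the probabilities use the default 0..n-1 index.
y = pd.Series([1, 2, 2, 1], index=[10, 11, 12, 13])
p = pd.DataFrame({"Prob. 1": [0.9, 0.6, 0.7, 0.8],
                  "Prob. 2": [0.1, 0.4, 0.3, 0.2]})
matrix, accuracy = confusion_and_accuracy(y, p, {"Prob. 1": 1, "Prob. 2": 2})
print(matrix)
print(accuracy)
```

With my real data this would be called as confusion_and_accuracy(df_test['choice12'], simulatedValues, {'Prob. 1': 1, 'Prob. 2': 2}).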
Best regards,
Natchaphon Leungbootnak