Re: Biogeme - Persistent Identification Issue


Michel Bierlaire

May 29, 2026, 11:22:35 AM
to Herner, K.A. (Kaden, Student M-CEM), bio...@googlegroups.com, Michel Bierlaire
The issue appears to come from the utility specification. In your model, the two alternatives are defined using exactly the same explanatory variables and coefficients. As a consequence, all travel-time terms cancel out in the logit probabilities, and the corresponding coefficients cannot be identified. Only the difference between the alternative-specific constants remains identifiable.
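To see the cancellation numerically, here is a minimal plain-Python sketch (not Biogeme code) of a binary logit with a single travel-time coefficient. The travel times, ASC_TWO and the trial beta values are hypothetical; ASC_One is fixed to 0 as in the posted model. When the same column enters both utilities, the choice probability does not move with beta and the curvature of the log-likelihood in the beta direction is exactly zero, which is consistent with the zero eigenvalues in the warning. With alternative-specific times, both change.

```python
import math

def p_alt1(v1: float, v2: float) -> float:
    """Binary logit probability of alternative 1: exp(V1) / (exp(V1) + exp(V2))."""
    return 1.0 / (1.0 + math.exp(v2 - v1))

def beta_curvature(times_1, times_2, asc_two: float, beta: float) -> float:
    """Second derivative of the binary logit log-likelihood with respect to a
    single travel-time coefficient: -sum_n P1_n * (1 - P1_n) * (t1_n - t2_n)**2.
    It does not depend on which alternative was actually chosen."""
    total = 0.0
    for t1, t2 in zip(times_1, times_2):
        p = p_alt1(beta * t1, asc_two + beta * t2)  # ASC_One fixed to 0
        total += p * (1.0 - p) * (t1 - t2) ** 2
    return -total

ASC_TWO = 0.5                  # hypothetical value
BETAS = (-0.2, 0.0, 0.3)       # hypothetical trial values for the coefficient

shared = [12.0, 15.5, 9.0]     # one column used in both utilities (as in the post)
time_1 = [12.0, 15.5, 9.0]     # hypothetical alternative-specific columns
time_2 = [20.0, 11.0, 14.0]

# Same attribute in both utilities: beta * t cancels, so P1 is the same for every beta.
probs_same = [p_alt1(b * shared[0], ASC_TWO + b * shared[0]) for b in BETAS]
# Alternative-specific attributes: P1 now responds to beta.
probs_specific = [p_alt1(b * time_1[0], ASC_TWO + b * time_2[0]) for b in BETAS]

print(probs_same)
print(probs_specific)
print(beta_curvature(shared, shared, ASC_TWO, -0.08))  # prints -0.0: flat in beta
print(beta_curvature(time_1, time_2, ASC_TWO, -0.08))  # strictly negative
```

Only ASC_TWO shifts the probabilities in the first case, which is why the difference between the constants is the only identifiable quantity.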

To estimate coefficients for access time, waiting time, in-vehicle time, transfers, etc., these attributes must vary across alternatives. In other words, the data should contain alternative-specific variables (for example, accessTime_1, accessTime_2, rideTime_1, rideTime_2, and so on), and each utility should use the attributes corresponding to its own alternative.
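As an illustration, if the data are stored as one row per traveller and per route with a binary Choice flag, they can be collapsed into one row per traveller, with one attribute column per alternative and Choice holding the ChoiceNum of the chosen route. The sketch below uses only the standard library; the function names and sample values are hypothetical, and it assumes every traveller faces the same set of routes. In pandas, DataFrame.pivot on TravellerID and ChoiceNum could do the same reshaping, followed by flattening the column names.

```python
from collections import defaultdict

# Attribute columns of the posted dataset (one row per traveller and route).
ATTRIBUTES = ["accessTime", "rideTimes", "waitTimes", "transferTime", "egressTime"]

def long_to_wide(rows):
    """Collapse one-row-per-alternative records into one row per traveller.

    Each input row is a dict with 'TravellerID', 'ChoiceNum', a binary 'Choice'
    flag and the ATTRIBUTES columns. Each output row holds 'TravellerID',
    'Choice' (the ChoiceNum of the chosen route) and one column per attribute
    and route, e.g. 'accessTime_1', 'accessTime_2'.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["TravellerID"]].append(row)

    wide = []
    for traveller, alts in grouped.items():
        chosen = [a["ChoiceNum"] for a in alts if a["Choice"] == 1]
        if len(chosen) != 1:
            raise ValueError(f"traveller {traveller}: expected exactly one chosen route")
        out = {"TravellerID": traveller, "Choice": chosen[0]}
        for alt in alts:
            for attr in ATTRIBUTES:
                out[f"{attr}_{alt['ChoiceNum']}"] = alt[attr]
        wide.append(out)
    return wide

def cancels_out(wide_rows, attr, alternatives=(1, 2)):
    """True if attr has the same value for all alternatives in every row,
    so a generic coefficient on it drops out of the logit probabilities."""
    return all(len({row[f"{attr}_{j}"] for j in alternatives}) == 1 for row in wide_rows)

def make_row(tid, num, choice, acc, ride, wait, transfer, egress):
    """Hypothetical helper building one long-format row."""
    return dict(zip(["TravellerID", "ChoiceNum", "Choice"] + ATTRIBUTES,
                    [tid, num, choice, acc, ride, wait, transfer, egress]))

# Hypothetical long-format sample: two travellers, two routes each.
sample = [
    make_row(1, 1, 1, 5.0, 22.0, 4.0, 0.0, 6.0),
    make_row(1, 2, 0, 9.0, 18.0, 7.0, 0.0, 3.0),
    make_row(2, 1, 0, 6.5, 23.5, 3.5, 0.0, 5.0),
    make_row(2, 2, 1, 8.0, 17.0, 6.0, 0.0, 4.5),
]

wide = long_to_wide(sample)
for row in wide:
    print(row)
for attr in ATTRIBUTES:
    print(attr, "cancels out" if cancels_out(wide, attr) else "varies across routes")
```

Each utility then refers to its own columns, for example Variable('accessTime_1') in V[1] and Variable('accessTime_2') in V[2], and the new Choice column is passed to loglogit. An attribute flagged by the last check (here transferTime, which is 0 on both routes for every sample traveller) cannot be identified from these data and should be dropped.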

I would also suggest removing the panel structure while debugging this simplified example, as it is not needed for a standard logit model and may introduce additional complications.

> On 28 May 2026, at 11:36, Herner, K.A. (Kaden, Student M-CEM) <k.a.h...@student.utwente.nl> wrote:
>
> Good day Biogeme group and Prof. Bierlaire,
>
> I am working on a model to obtain beta coefficients for different elements of travel time (access, wait, ride, etc.). However, try as I might, I usually cannot get Biogeme to give me any output other than "Warning: identification issue", where the smallest and largest eigenvalues are both -0, 0 iterations are attempted, and the "Variables involved" list always seems to show whatever the first element of my utility function is.
>
> While trying to solve some of the other errors that popped up in my full, more complex model, I whittled my code and input dataset down to a very simplified version, and I have attached both here. Below, I have also included a screenshot of a subset of my complete input dataset and some comments about my project and what my full model aims to accomplish. In short, though, any solution for my complete model should first work for this simple model.
>
> Mainly, I would like to know: am I missing something fundamental in my data format or code that would prevent Biogeme from working? Trying to get help from the AI Biogeme assistant has proved futile. Notably, I also asked a fresh assistant to write a simple Biogeme script from scratch, including a short example input DataFrame, and even that produced the same error (though it did estimate some coefficients) on two different computers and with versions 3.3.1, 3.3.2, and the version 3.3.3 linked in response to another question on this forum.
>
> My simplified input dataset is shown in the image below:
>
> My simplified code is as follows:
> import pandas as pd
> import biogeme.database as db
> import biogeme.biogeme as bio
> from biogeme.expressions import Beta, Variable
> import biogeme.models as models
>
> df = pd.read_csv("Tinytest.csv")
> database = db.Database('data', df)
> #database.panel("TravellerID") # Using just this line causes ID errors with other variables later
> database.panel = True
> database.individual = Variable('TravellerID')
>
> Choice = Variable('Choice')
> ChoiceNum = Variable('ChoiceNum')
> accessTime = Variable('accessTime')
> rideTimes = Variable('rideTimes')
> waitTimes = Variable('waitTimes')
> transferTime = Variable('transferTime')
> egressTime = Variable('egressTime')
>
> B_access = Beta('B_access', -0.081, None, None, 0)
> B_ride = Beta('B_ride', -0.0629, None, None, 0)
> B_wait = Beta('B_wait', -0.108, None, None, 0)
> B_transfer = Beta('B_transfer', -0.081, None, None, 0)
> B_egress = Beta('B_egress', -0.081, None, None, 0)
>
> ASC_One = Beta('ASC_One', 0, None, None, 1)
> ASC_Two = Beta('ASC_Two', 0, None, None, 0)
>
> V = {
>     1: ASC_One +
>        B_access * accessTime +
>        B_ride * rideTimes +
>        B_wait * waitTimes +
>        B_transfer * transferTime +
>        B_egress * egressTime,
>
>     2: ASC_Two +
>        B_access * accessTime +
>        B_ride * rideTimes +
>        B_wait * waitTimes +
>        B_transfer * transferTime +
>        B_egress * egressTime
> }
>
> logprob = models.loglogit(V, None, Choice) # Have tried using both 'Choice' and 'ChoiceNum' here
> # Have also tried cases where Choice is either Binary
> # or where Choice is the value of ChoiceNum for which the binary would be 1 - no difference in result
>
> biogeme = bio.BIOGEME(database, logprob)
> biogeme.model_name = 'TinyTest Model3'
>
> results = biogeme.estimate()
>
> # print(results.getEstimatedParameters())
> print(results.getBetaValues()) # These lines both throw attribute errors.
>
> ### End of Code ###
>
> Hopefully the above information should be enough to diagnose the problem. Below, I will also provide a snapshot of my full input dataset and an explanation of how it is set up, in case that helps.
>
> The full input dataset looks like this:
> There are about 3000 unique origin-destination (OD) pairs. Each pair has multiple unique routes, identified by 'ChoiceNum'. Each route consists of a feasible bus itinerary plus an access leg and an egress leg, using one of the four combinations of walking and biking, identified by 'AEID'. For each OD pair, I had an estimate of public transport travel demand. That total demand was split over all routes, first by access/egress mode combination and then, where several routes share the same access/egress combination, using the proportions of tap-in/tap-out card data for the first leg of the route.
>
> For example, for OD pair 339-1426 there was a demand of 179 travellers: 177 chose route 1 (with mode 2, walk-bike, as A/E) and 2 chose route 2 (with mode 4, bike-bike, as A/E). The number of rows for each traveller then equals the number of options available to them. All routes are unique to their OD pair, and every traveller can pick any of the routes specific to their OD pair. I have removed cases where only one route existed, or where the average number of travellers over all routes in the pair is below 50 (an arbitrary threshold for now). One potential issue is that each OD pair has a unique choice set, ranging from cases where only two routes with the same access/egress combination are available to cases where, say, five routes are available for each of the four A/E combinations. However, I think I have removed that complexity from my simplified model, which only looks at one pair and drops AEID completely.
>
> I have tried the model with and without alternative-specific constants, with no difference in the results.
> In the original route output file, each route was listed once with an estimated time value for each component. When I duplicated the rows according to travel demand, I generated the time values for each traveller from a lognormal distribution, using the time value from the 'parent' row as the median. I do not think there can be any collinearity between the time variables, and even when I run the model with just one of the variables, I get the same issue. I have also tried fixing each of the Beta values in turn so that Biogeme does not estimate it, but this did not change anything either.
>
> I split access and egress time into two columns based on their AEID, and thought that maybe the 0 values for the unused mode could be the issue, but this complexity was also removed from my simplified model.
>
> Finally, in this version of the model, Choice is binary where 1 represents the chosen alternative. I also have an exact copy of the file where Choice is instead the ID from ChoiceNum that the given traveller chose. The AI assistant got stuck in a loop of suggesting the opposite format each time, but trying both produced the same error.
>
>
>
> I really appreciate any insight into what might be going wrong, whether I am unable to see a fundamental issue with my dataset, or if I am missing an error in my code. I am happy to answer any questions or to provide the code I am using for the full dataset if that would help.
>
> Thank you for your time and assistance,
> Kaden Herner


Michel Bierlaire
Transport and Mobility Laboratory
School of Architecture, Civil and Environmental Engineering
EPFL - Ecole Polytechnique Fédérale de Lausanne
http://transp-or.epfl.ch
http://people.epfl.ch/michel.bierlaire
