biogeme.exceptions.biogemeError

張鯤翼

Jun 13, 2022, 8:31:57 AM

to Biogeme

Hello Professor, I have just started using the logit model in Biogeme, adapting the swissmetro exercise to my own data. I first encountered two warnings:

[19:25:12] < Warning > The chosen alternative [destination] is not available for the following observations (rownumber[choice]): 47786[0.0]-47806[0.0]-47836[0.0]

[19:25:12] < Warning > The choice variable [destination] does not correspond to a valid alternative for the following observations (rownumber[choice]): 47786[0.0]-47806[0.0]-47836[0.0]

Then Python reported the error:

Traceback (most recent call last):
  File "C:\Users\zhang\PycharmProjects\pythonProject\Biogeme.py", line 32, in <module>
    biogeme = bio.BIOGEME(database, logprob)
  File "C:\Users\zhang\anaconda3\lib\site-packages\biogeme\biogeme.py", line 281, in __init__
    self._audit()
  File "C:\Users\zhang\anaconda3\lib\site-packages\biogeme\biogeme.py", line 387, in _audit
    raise excep.biogemeError('\n'.join(listOfErrors))
biogeme.exceptions.biogemeError: The choice variable [destination] does not correspond to a valid alternative for the following observations (rownumber[choice]): 47786[0.0]-47806[0.0]-47836[0.0]

My data is a little different from the swissmetro data, and I associate the utility functions with the numbering of the alternatives in a loop:

V = {}
for i in range(476):
    a = i + 1  # alternatives are numbered 1 to 476
    V[a] = (
        B_POP * pop + B_OFFICE * office + B_EMPLOYEE * employee
        + B_SCHOOL * schoolNum + B_GOVERNMENT * govNum
        + B_HOSPITAL * hospitalNum + B_LANDUSE * landuse
        + B_DISTANCE * distance
    )

I think the rest of my script is the same as the exercise, except that "av" is None, so I am not sure what causes this error or how to fix it.

I would appreciate it if you could explain this problem to me. Thank you!

Zhang Kunyi

Bierlaire Michel

Jun 13, 2022, 8:36:02 AM

to zhangkunyi...@gmail.com, Bierlaire Michel, Biogeme

I think that the message provided by Biogeme is quite explicit: the chosen alternative is 0 for some observations, while the numbering of your alternatives starts at 1.
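
The check behind this message can be reproduced before the data is handed to Biogeme. What follows is a minimal sketch in plain Python, assuming the alternatives are numbered 1 to 476 as in the loop above. The `choices` list and the helper `invalid_choices` are hypothetical stand-ins for the `destination` column and are not part of Biogeme:

```python
# Hypothetical stand-in for the `destination` column of the data file.
choices = [12.0, 0.0, 476.0, 0.0, 3.0]

# Alternative ids used as keys of V in the loop above: 1, 2, ..., 476.
valid_ids = set(range(1, 477))


def invalid_choices(values, valid):
    """Return (row number, value) pairs whose choice is not a valid alternative id."""
    return [(row, v) for row, v in enumerate(values) if v not in valid]


bad_rows = invalid_choices(choices, valid_ids)
print(bad_rows)  # the rows whose choice is 0 are listed here
```

Rows listed by such a check either need to be recoded or excluded, or the alternatives need to be renumbered so that every observed choice is a key of V.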

--
You received this message because you are subscribed to the Google Groups "Biogeme" group.
To unsubscribe from this group and stop receiving emails from it, send an email to biogeme+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/biogeme/c06f191e-869a-4aea-8d86-7bd0acb4f70en%40googlegroups.com.

張鯤翼

Jun 14, 2022, 1:05:20 PM

to Biogeme

Thank you, Professor! I have solved that problem by changing the number of alternatives. But the results of the MNL model seem to be wrong:

Number of estimated parameters:8
Sample size:934
Excluded observations:0
Init log likelihood:1.962185
Final log likelihood:1.962185
Likelihood ratio test for the init. model:-0
Rho-square for the init. model:0
Rho-square-bar for the init. model:4.08
Akaike Information Criterion:12.07563
Bayesian Information Criterion:50.79144
Final gradient norm:1.3981E-15
Nbr of threads:12
Algorithm:Newton with trust region for simple bound constraints
Relative projected gradient:8.412697e-19
Number of iterations:0
Number of function evaluations:1
Number of gradient evaluations:1
Number of hessian evaluations:1
Cause of termination:Relative gradient = 8.4e-19 <= 6.1e-06

It also reported the error "The second derivatives matrix is close to singularity." I have already normalized the values of the parameters to [0, 1], so I am really confused about what is wrong with the data. I would appreciate it if you could explain this problem to me. Thank you!
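
One possible reading of this output, offered only as a sketch: if the utilities are still built by the loop from the first message, every alternative receives the same expression, so the logit probabilities are identical across alternatives whatever the values of the coefficients. The log likelihood then does not depend on the parameters, which would be consistent with a zero gradient, zero iterations and a singular second-derivatives matrix. The functions and numbers below are illustrative assumptions, not Biogeme code:

```python
import math


def logit_probabilities(utilities):
    """Multinomial logit probabilities for a list of utilities."""
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def same_utility_for_all(beta, x, n_alternatives):
    """Mimic the loop: every alternative gets the same utility beta * x."""
    return [beta * x for _ in range(n_alternatives)]


# Two very different coefficient values give the same probabilities.
p_a = logit_probabilities(same_utility_for_all(beta=0.0, x=0.7, n_alternatives=4))
p_b = logit_probabilities(same_utility_for_all(beta=5.0, x=0.7, n_alternatives=4))
print(p_a, p_b)
```

For the coefficients to be identified, the explanatory variables have to differ across alternatives, for instance one distance column per destination.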

Bierlaire Michel

Jun 14, 2022, 1:05:24 PM

to Fani Hatziioannidu, Bierlaire Michel, Biogeme

If you have aggregate data, you need to multiply the contribution to the loglikelihood for each alternative by the number of times it is chosen.

logprob = nbr_alt_1 * bioLogLogit(V,av,1) + nbr_alt_2 * bioLogLogit(V,av,2) + … + nbr_alt_J * bioLogLogit(V,av,J)
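
To make the weighting concrete, here is a minimal sketch in plain Python rather than Biogeme syntax. It computes the contribution of one aggregate row as the sum over alternatives of the count times the log of the logit probability. The function names and the utilities are made up for illustration; the counts 5.2, 3.1 and 2.3 are taken from the question quoted below:

```python
import math


def log_logit(utilities, chosen):
    """Log of the logit probability of alternative `chosen` (a key of `utilities`)."""
    m = max(utilities.values())
    log_denominator = m + math.log(sum(math.exp(v - m) for v in utilities.values()))
    return utilities[chosen] - log_denominator


def aggregate_loglikelihood(utilities, counts):
    """Sum over alternatives of count * log P(alternative)."""
    return sum(counts[j] * log_logit(utilities, j) for j in utilities)


V = {1: -0.5, 2: -1.0, 3: -1.5}   # illustrative utilities
nbr = {1: 5.2, 2: 3.1, 3: 2.3}    # counts per alternative for one OD pair
row_loglik = aggregate_loglikelihood(V, nbr)
print(row_loglik)
```

Summing this quantity over all rows gives the log likelihood of the aggregate data. Non-integer counts pose no difficulty, since they only act as weights.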

> On 14 Jun 2022, at 13:40, Fani Hatziioannidu <fhatzii...@gmail.com> wrote:
>
> Dear Modellers, dear Professor,
>
> I would like to ask for your advice on adapting the mode choice model in Biogeme
> to choice frequencies from RP data (more precisely, GPS average-day values per origin-destination (OD) pair), e.g.,
> car 5.2, public transport 3.1, bike 2.3 (3 columns per OD)
> instead of individual choices 1, 2, 3 (1 column).
>
> First, I would like to ask whether Biogeme is suitable for a case like this, and whether it has been used successfully in similar cases for mode choice forecasts.
> I have a large dataset with almost 1 million OD pairs and corresponding records.
> I am using PythonBiogeme (unfortunately I could not learn how to code in PandasBiogeme).
>
> My draft code is below. I understand that there is a mistake in logprob = bioLogLogit(V,av,CHOICE), as this expects individual choices 1, 2, 3 instead of frequencies.
>
> ASC1 = Beta('ASC1',0,-10,10,0)
> B_CAR_TT0 = Beta('B_CAR_TT0',0,-10,10,0)
> B_PUT_PJT = Beta('B_PUT_PJT',0,-10,10,0)
> B_BIKE_TT0 = Beta('B_BIKE_TT0',0,-10,10,0)
> B_CAR_DIS = Beta('B_CAR_DIS',0,-10,10,0)
> B_PUT_SFQ = Beta('B_PUT_SFQ',0,-10,10,0)
> B_BIKE_DIS = Beta('B_BIKE_DIS',0,-10,10,0)
>
> # Definition of the utility functions
> CAR_TRIPS = B_CAR_TT0 * CAR_TT0 + B_CAR_DIS * CAR_DIS
> PUT_TRIPS = B_PUT_PJT * PUT_PJT + B_PUT_SFQ * PUT_SFQ
> BIKE_TRIPS = ASC1 + B_BIKE_TT0 * BIKE_TT0 + B_BIKE_DIS * BIKE_DIS
>
> # Associate utility functions with the numbering of alternatives
> V = {1: CAR_TRIPS,
> 2: PUT_TRIPS,
> 3: BIKE_TRIPS}
>
> # Associate the availability conditions with the alternatives
> av = {1: CAR_TRIPS,
> 2: PUT_TRIPS,
> 3: BIKE_TRIPS}
>
> # The choice model is a logit, with availability conditions
> logprob = bioLogLogit(V,av,CHOICE)
>
> # Defines an iterator on the data
> rowIterator('obsIter')
>
> # Define the log likelihood function for the estimation
> BIOGEME_OBJECT.ESTIMATE = Sum(logprob,'obsIter')
>
> Looking forward to your response, and thank you in advance.
>
> kind regards,
> Fani Hatziioannidu
>
>

Fani Hatziioannidu

Jun 16, 2022, 2:17:07 AM

to Bierlaire Michel, Biogeme

Dear Professor,

Thank you for your response and for your valuable comment.

I have revised the code as shown below.

I still get the error message

Failed to load the py model file

(this run on PythonBiogeme cannot be completed)

Warning: [23:33:17]bioMain.cc:108 pythonbiogeme mode choice_v5.py biogeme 1 modal split_records.csv
[23:33:17]bioParameters.cc:399 Parameter documentation generated: pythonparam.html
[23:33:17]bioModelParser.cc:109 mode choice_v5.py exists
Warning: [23:33:22]bioModelParser.cc:142 Error: Failed to load mode choice_v5
Warning: [23:33:22]bioMain.cc:169 Error: Failed to load mode choice_v5

What is wrong with the code?

Also, what should be the headers of the frequencies (e.g. nbr_alt_1) in the data file for each of the alternatives?

Would numbers with two decimals be accepted in the frequency columns?

ASC1 = Beta('ASC1',0,-10,10,0)

B_CARTT0 = Beta('B_CARTT0',0,-10,10,0)
B_PUTPJT = Beta('B_PUTPJT',0,-10,10,0)
B_BIKETT0 = Beta('B_BIKETT0',0,-10,10,0)
B_CARDIS = Beta('B_CARDIS',0,-10,10,0)
B_PUTSFQ = Beta('B_PUTSFQ',0,-10,10,0)
B_BIKEDIS = Beta('B_BIKEDIS',0,-10,10,0)

# Definition of the utility functions

V1 = B_CARTT0 * CARTT0 + B_CARDIS * CARDIS
V2 = B_PUTPJT * PUTPJT + B_PUTSFQ * PUTSFQ
V3 = ASC1 + B_BIKETT0 * BIKETT0 + B_BIKEDIS * BIKEDIS

# Associate utility functions with the numbering of alternatives

V = {1: V1,
     2: V2,
     3: V3}

# Associate the availability conditions with the alternatives

av = {1: 1,
      2: 1,
      3: 1}

# The choice model is a logit, with availability conditions

logprob = nbr_alt_1 * bioLogLogit(V,av,1) + nbr_alt_2 * bioLogLogit(V,av,2) + nbr_alt_3 * bioLogLogit(V,av,3)

# Defines an iterator on the data
rowIterator('obsIter')

# Define the log likelihood function for the estimation
BIOGEME_OBJECT.ESTIMATE = Sum(logprob,'obsIter')

kind regards,

Fani Hatziioannidu

Jun 23, 2022, 1:36:06 AM

to Biogeme

Dear Biogeme group,

I kindly ask for your support with Python Biogeme as my model is not running and I cannot fix this without your help.

Would aggregated answers in the same row for the 3 alternatives (car, public transport and bike) work with code like the draft below?

What should be the headers of the frequencies (e.g. nbr_alt_1) in the data file for each of the alternatives?

Would numbers with two decimals be accepted in the frequency columns?

Please send me any tips you can, and thank you in advance!

ASC1 = Beta('ASC1',0,-10,10,0)
B_CARTT0 = Beta('B_CARTT0',0,-10,10,0)
B_PUTPJT = Beta('B_PUTPJT',0,-10,10,0)
B_BIKETT0 = Beta('B_BIKETT0',0,-10,10,0)
B_CARDIS = Beta('B_CARDIS',0,-10,10,0)
B_PUTSFQ = Beta('B_PUTSFQ',0,-10,10,0)
B_BIKEDIS = Beta('B_BIKEDIS',0,-10,10,0)

# Definition of the utility functions
V1 = B_CARTT0 * CARTT0 + B_CARDIS * CARDIS
V2 = B_PUTPJT * PUTPJT + B_PUTSFQ * PUTSFQ
V3 = ASC1 + B_BIKETT0 * BIKETT0 + B_BIKEDIS * BIKEDIS

# Associate utility functions with the numbering of alternatives
V = {1: V1, 2: V2, 3: V3}

# Associate the availability conditions with the alternatives
av = {1: 1, 2: 1, 3: 1}

# The choice model is a logit, with availability conditions
logprob = nbr_alt_1 * bioLogLogit(V,av,1) + nbr_alt_2 * bioLogLogit(V,av,2) + nbr_alt_3 * bioLogLogit(V,av,3)

# Defines an iterator on the data
rowIterator('obsIter')

# Define the log likelihood function for the estimation
BIOGEME_OBJECT.ESTIMATE = Sum(logprob,'obsIter')

kind regards,

Fani Hatziioannidu

Bierlaire Michel

Jun 23, 2022, 1:40:43 AM

to Fani Hatziioannidu, Bierlaire Michel, Biogeme

On 22 Jun 2022, at 00:09, Fani Hatziioannidu <fhatzii...@gmail.com> wrote:

> Dear Biogeme group,
>
> I kindly ask for your support with Python Biogeme

You should consider using the latest version of Biogeme, which can be run in the Python environment.

> as my model is not running and I cannot fix this without your help.
>
> Would aggregated answers in the same row for the 3 alternatives (car, public transport and bike) work with code like the draft one?

In principle, yes. And the code looks correct. What is the error message?

> What should be the headers of the frequencies (e.g. nbr_alt_1) in the data file for each of the alternatives?

nbr_alt_1, nbr_alt_2 and nbr_alt_3 must be the column headers in the data file.

> Would numbers with two decimals be accepted in the frequency columns?

Yes.
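
As a quick sanity check on the data file, the sketch below uses only the Python standard library to confirm that the three frequency columns are present under the expected headers and that decimal values parse as numbers. The sample rows, including the `CARTT0` values, are made up for illustration:

```python
import csv
import io

# Made-up sample of the data file; a real file would be opened with open(...).
sample = io.StringIO(
    "CARTT0,nbr_alt_1,nbr_alt_2,nbr_alt_3\n"
    "12.5,5.20,3.10,2.30\n"
    "30.0,0.75,1.25,0.00\n"
)

required = ["nbr_alt_1", "nbr_alt_2", "nbr_alt_3"]

reader = csv.DictReader(sample)
# Headers that the model expects but the file does not provide.
missing = [name for name in required if name not in reader.fieldnames]

# Convert the frequency columns to floats, row by row.
frequencies = [[float(row[name]) for name in required] for row in reader]

print(missing)
print(frequencies)
```

An empty `missing` list means that the names used in the model specification match the column headers of the file.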


Fani Hatziioannidu

Jun 25, 2022, 10:34:05 AM

to Bierlaire Michel, Biogeme

Dear Professor, dear Modellers,

After a lot of testing, I conclude that Biogeme is not reading my csv data file at all.

I have tried both the comma-delimited and the semicolon-delimited options.

Neither works.

The model runs only with a space-delimited dat file (saved first as prn, then renamed to dat),

but the space-delimited prn file causes problems with records that have many digits.

I am using Biogeme version 2.6a.

What should the csv (or dat, txt) format be so that it is readable (commas, semicolons, spaces, tabs)?

What is the latest Python Biogeme version, if not 2.6, and where can I find an exe file for it? (pip installation has not worked for me.)

Thank you again for your valuable help.

kind regards,

Fani Hatziioannidu

Bierlaire Michel

Jun 25, 2022, 10:35:41 AM

to Fani Hatziioannidu, Bierlaire Michel, Biogeme

On 24 Jun 2022, at 18:19, Fani Hatziioannidu <fhatzii...@gmail.com> wrote:

> Dear Professor, dear Modellers,
>
> After a lot of testing I conclude that Biogeme is not reading my csv data file at all.
>
> I have tried comma delimited and semicolon delimited options.
>
> None of it works.
>
> The model runs only with space delimited dat file (first prn then rename as dat),
>
> but space prn file causes problems with records that have many digits.
>
> I am using biogeme 2.6a version.

Use version 3. It relies on Pandas, which can read many different formats, including Excel.
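
For example, pandas reads comma-delimited and semicolon-delimited text directly, which avoids the space-delimited workaround. This is a minimal sketch, assuming pandas is installed. The sample content is made up, and in practice the path of the data file would replace the `io.StringIO` objects:

```python
import io

import pandas as pd

# Made-up samples standing in for the data file in two common formats.
comma_text = "CARTT0,nbr_alt_1,nbr_alt_2,nbr_alt_3\n12.5,5.2,3.1,2.3\n"
semicolon_text = "CARTT0;nbr_alt_1;nbr_alt_2;nbr_alt_3\n12.5;5.2;3.1;2.3\n"

df_comma = pd.read_csv(io.StringIO(comma_text))
df_semicolon = pd.read_csv(io.StringIO(semicolon_text), sep=";")

# Both frames hold the same columns and values.
print(df_comma.equals(df_semicolon))
```

The resulting DataFrame is what Biogeme 3 takes as its data; the exact call to wrap it is best checked in the Biogeme documentation for the installed version.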