MNL and Nested logit model estimation

Mayank P Jain

Sep 1, 2010, 10:33:48 AM

to pystatsmodels

I used statsmodels to estimate an MNL model and it was easy, kudos to the developers.
I felt that it would be even easier to understand the commands if one of the examples had its data coded in the main example code.

I did notice that there is no apparent way of estimating a nested logit model. Please advise if I am mistaken.

Regards
Mayank P Jain

V R TechNiche
Transportation Modeler/Planner
Phone: +91 901 356 0583

On Tue, Aug 31, 2010 at 6:32 PM, Mayank P Jain <maya...@gmail.com> wrote:

wow...... that was quick...........
thanks skipper for help .......... will surely post comment or questions here.....

Regards
Mayank

On Tue, Aug 31, 2010 at 6:16 PM, Skipper Seabold <jsse...@gmail.com> wrote:

On Tue, Aug 31, 2010 at 8:35 AM, Mayank <maya...@gmail.com> wrote:
> Hello,
> Today I installed statsmodels for the purpose of estimating a logit
> model, I first wanted to get familiar with it so I am trying to
> estimate a very simple linear regression model : y = 2*x + 1
>
> I ran the following piece of code:
> ---------------------------------------------------------------------------------------
> import numpy as np
> import scikits.statsmodels as sm
>
> # get data
> y = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23]) # cars
> X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) # nsdp
>
> # run the regression
> results = sm.OLS(y, X).fit()
>
> # look at the results
> print results.summary()
> ---------------------------------------------------------------------------------------
> I am trying to locate the constant value (which I expect as 1) and I
> do not understand why the coefficient is not 2 ....
> I would really appreciate any help on this ......

At this point, we don't add a constant by default. So you need to do

X = sm.add_constant(X, prepend=True)
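
To see why the constant matters, here is a small pure-Python sketch (no statsmodels required; the helper names are made up for illustration). Without a constant column the fit is forced through the origin, so the slope absorbs the missing intercept and comes out at about 2.13 for the data above. With a constant, the same data give intercept 1 and slope 2.

```python
# Data from the question: y = 2*x + 1 exactly.
y = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23]
x = list(range(12))

def fit_through_origin(x, y):
    """Least squares for y = b*x, i.e. a regression without a constant."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def fit_with_intercept(x, y):
    """Least squares for y = a + b*x, i.e. after a constant column is added."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

slope_no_const = fit_through_origin(x, y)
intercept, slope = fit_with_intercept(x, y)
print(slope_no_const, intercept, slope)
```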

Please let us know any comments on using the Logit model.

Skipper

josef...@gmail.com

Sep 1, 2010, 12:36:49 PM

to pystat...@googlegroups.com

On Wed, Sep 1, 2010 at 10:33 AM, Mayank P Jain <maya...@gmail.com> wrote:
> I used statsmodels to estimate MNL model it was easy, kudos to the
> developers...........

It's always good to hear some feedback, so we know which parts of
statsmodels are actually used.
For MNL kudos go to Skipper.
I'm also glad to hear that statsmodels is easy to use.

> I felt that it would be even easier to understand the commands if one of the
> examples has data coded in the main example code.

With main example code, do you mean the class docstring or the
examples in the example directory?

>
> I did notice that there is no apparent way of estimating a nested logit
> model ........ Please advise if I am mistaken...

Interesting problem. No, I don't see a way to use statsmodels for
nested logit without quite a bit of work.

some thoughts and notes:

Since I never worked with nested logit, I had to look it up in Greene.
He mentions two ways of estimating, two-step limited information
maximum likelihood estimation or (one-step) full information MLE. I'm
not sure how general the model is that Greene uses or whether there
might be a simpler version also.

For the two-step approach we would need Conditional Logit, which
statsmodels doesn't have yet. But it might not be too difficult to
write in analogy to MNL. If conditional logit is available, the
two-step procedure could be done using Conditional Logit.
Greene doesn't mention how the final result statistics would be
calculated in the two-step procedure.

Full Information MLE looks more straightforward to program using the
generic maximum likelihood approach, but I would expect that there are
some numerical convergence and precision problems for a larger number
of parameters without analytical gradient and good starting values.
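
As a rough illustration of what the full-information log-likelihood could look like, here is a minimal sketch for a two-level tree. It assumes the common parameterization with one dissimilarity parameter per nest (a value of 1.0 collapses to plain multinomial logit); the function names, nest layout and utilities are invented for the example and this is not statsmodels code.

```python
import math

def nested_logit_probs(utilities, nests, lambdas):
    """Choice probabilities for a two-level nested logit.

    utilities: dict alternative -> systematic utility V_j
    nests:     dict nest name -> list of alternatives in that nest
    lambdas:   dict nest name -> dissimilarity parameter (1.0 gives plain MNL)
    """
    # Inclusive value of each nest: log of the sum of scaled exp-utilities.
    inclusive = {
        m: math.log(sum(math.exp(utilities[j] / lambdas[m]) for j in alts))
        for m, alts in nests.items()
    }
    denom = sum(math.exp(lambdas[m] * inclusive[m]) for m in nests)
    probs = {}
    for m, alts in nests.items():
        p_nest = math.exp(lambdas[m] * inclusive[m]) / denom
        for j in alts:
            # Probability of alternative j given that nest m was chosen.
            p_within = math.exp(utilities[j] / lambdas[m] - inclusive[m])
            probs[j] = p_nest * p_within
    return probs

def nested_logit_loglike(chosen, utilities_per_obs, nests, lambdas):
    """Sum over observations of the log probability of the chosen alternative."""
    return sum(
        math.log(nested_logit_probs(v, nests, lambdas)[c])
        for c, v in zip(chosen, utilities_per_obs)
    )

nests = {"car": ["alone", "shared"], "transit": ["bus", "metro"]}
V = {"alone": 0.5, "shared": 0.1, "bus": -0.2, "metro": 0.3}
probs = nested_logit_probs(V, nests, {"car": 0.6, "transit": 0.8})
```

In an actual estimator the utilities would be functions of the parameters, and the log-likelihood would be maximized jointly over those parameters and the nest parameters.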

To make a nested model fit into the Model pattern of statsmodels, we
would need a way to specify the partition of choices, and possibly of
the exogenous variables, if they don't enter in the same way in all
branches.

(Before statsmodels, I wrote a model that combines MNL and conditional
logit, but also without nesting, and I never managed to get the
analytical gradient and hessian. The main work is keeping track which
parameters and which variables vary with individuals, choices, and
branches in the nested case.)

Unfortunately, I'm too busy to work on this (or even to clean up and
review much of what we already have).

We would like to get the "standard" statistical models covered in
statsmodels, but the priorities depend to some extent on user
feedback and contributions.
Maybe we should start a wish list and use blueprints again. The actual
progress and roadmap is a bit of a random walk.

If you have the specification of the nested logit or references for it
for the case you would like to use, it would be useful to get a
blueprint and design started.

two oldish random references (by google) for Stata:
http://www.stata.com/support/faqs/stat/nested.html # parameter restrictions
http://www.stata-journal.com/article.html?article=st0017 # discusses several versions of multinomial logit/choice models.

Thanks,

Josef

Mayank P Jain

Sep 2, 2010, 2:36:05 AM

to pystatsmodels

Hi Josef,
That was quite some information. The reason I asked about nested logit models is that I am a transportation modeler and want to estimate a mode choice model that calculates the choice of mode (car, transit, walk/bike) based on attributes such as travel time by each mode, cost, etc.

structure of a typical nested choice model may look like this:

                          Mode
        ____________________|____________________
        |                   |                   |
       Car               Transit          Non-Motorized
    ____|____           ____|____           ____|____
    |       |           |       |           |       |
  Alone  Shared        Bus    Metro       Bike    Walk

Sample of utility of any mode :
Utility of a mode = a + b (Travel Time by mode) + c (Travel Cost by mode) + d ( if male) + e (distance traveled)

etc.......

If you guys could suggest a way to do it, I would try to get there and report on progress. Since I have just learned about the tool, I may not be very good at it :)

By data within the example I meant the code that I have attached with this post.

Regards
Mayank P Jain


DiscreteChoiceModels.py

josef...@gmail.com

Sep 2, 2010, 11:52:29 PM

to pystat...@googlegroups.com

On Thu, Sep 2, 2010 at 2:36 AM, Mayank P Jain <maya...@gmail.com> wrote:
> Hi Josef,
> that was quite some information, the reason I asked about nested logit
> models is because I am a transportation modeler and would want to estimate
> mode choice model that would calculate the choice of mode (car, transit,
> walk/bike) based on various attributes like travel time by each mode, cost
> etc.
>
> structure of a typical nested choice model may look like this:
>
> [tree diagram: Mode -> Car (Alone, Shared), Transit (Bus, Metro), Non-Motorized (Bike, Walk); layout lost in quoting]
>
> Sample of utility of any mode :
> Utility of a mode = a + b (Travel Time by mode) + c (Travel Cost by mode) + d ( if male) + e (distance traveled)

(gmail line breaks destroy the graph)

It looks like you have both mode-specific variables (travel cost) and
individual-specific variables (gender). I'm not sure whether it is
easy to incorporate both types of explanatory variables. I never fully
figured out all the (identification) restrictions for the different
versions of multinomial and conditional logit models, and whether the
types of variables could be incorporated as dummy variables.

I did a bit of a search on this, and it's easy to get lost in the
different variations of the models. For example Stata recently
improved its coverage with Stata 10
http://www.stata.com/stata10/choice.html

http://www.indiana.edu/~statmath/stat/all/cdvm/cdvm6.html
"In the multinomial logit model, the independent variables contain
characteristics of individuals, while they are the attributes of the
choices in the conditional logit model. In other words, the
conditional logit estimates how alternative-specific, not
individual-specific, variables affect the likelihood of observing a
given outcome"

example from Greene, also in his book
http://www.indiana.edu/~statmath/stat/all/cdvm/cdvm7.html
"Imagine a choice of the travel modes among air flight, train, bus,
and car. The data set and model here are adopted from Greene (2003).
The model examines how the generalized cost measure (cost), terminal
waiting time (time), and household income (income) affect the choice.

These independent variables are not characteristics of subjects
(individuals), but attributes of the alternatives
"

another website:
http://data.princeton.edu/wws509/notes/c6s4.html

which also shows how to do "sequential logit", which is just stages of standard logit
http://data.princeton.edu/wws509/stata/c6s4.html

Another debate that I found in the literature was whether the model
should be fully consistent with an optimal choice theory, which I
think triggered the increase of model versions in Stata.

Before getting lost in different versions, I would just start with the
simpler versions. Stata doesn't have many detailed descriptions that
are freely available, but SAS has a very good accessible manual and
formula collection
http://support.sas.com/documentation/cdl/en/etsug/63348/HTML/default/viewer.htm#/documentation/cdl/en/etsug/63348/HTML/default/etsug_mdc_sect029.htm

* maybe the sequential logit (or MNL) as in
http://data.princeton.edu/wws509/stata/c6s4.html gives some useful
results, it might be cheap since we already have logit and MNL
* conditional logit one stage: only this would also be "relatively
easy" to implement, mostly in analogy to current MNL
* nested logit: two-stage conditional logit for estimation or just used
to build the likelihood function for the branches

The main work would be to build the likelihood function for the
branches and leaves in the nested version and handle the data
partitions. SAS and Greene might be useful for this.
As an alternative, if you have some references in the transportation
mode choice literature that describe the likelihood function for the
nested logit, then you could just implement it directly.

Once the likelihood function is available, the scipy optimizers should
be able to find the MLE. In the generic MLE framework we also have
numerical Hessian and some of the result statistics available.
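
A toy sketch of that workflow, using only finite differences of a log-likelihood to find the MLE and a standard error. The model is an intercept-only logit with made-up counts (30 successes out of 100), and the hand-written Newton loop merely stands in for the scipy optimizers mentioned above; none of the function names come from statsmodels.

```python
import math

def loglike(a, k=30, n=100):
    """Log-likelihood of an intercept-only logit: k successes out of n."""
    return k * a - n * math.log(1.0 + math.exp(a))

def num_grad(f, x, h=1e-4):
    """Central finite-difference first derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def num_hess(f, x, h=1e-4):
    """Central finite-difference second derivative."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

# Newton iterations that use only numerical derivatives of the log-likelihood.
a = 0.0
for _ in range(20):
    step = num_grad(loglike, a) / num_hess(loglike, a)
    a -= step
    if abs(step) < 1e-8:
        break

# Standard error from the inverse of the negative Hessian at the optimum.
std_err = math.sqrt(-1.0 / num_hess(loglike, a))
```

For this model the answer is known in closed form (the estimate is log(30/70)), which makes it a convenient check before moving to a likelihood with many parameters.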

Once the generic MLE works (or if it has problems), we can look at
refinements like numerical jacobians and hessians, better starting
estimates and work with different optimizers. And then, we would also
need to settle on the user interface for how to specify the
nesting/hierarchy tree.

At the moment, I would be mostly interested in how it fits into the
generic MLE framework, since that is one basis for quick
implementation of estimators. I tried out several examples, but
haven't finished yet with all the pieces and options.
In the longer term, I would like to extend the discrete choice models
also to the panel case, especially because I still have some old GAUSS
code of mine lying around waiting to be reimplemented in python.
I will try to read up a bit during the weekend.

>
> etc.......
>
> if you guys could suggest a way to do it, then I would try to get there and
> update on progress................ since i have just learned about the tool
> I may not be very good at it :)
>
> By data within the example I meant the code that I have attached with this
> post.

Ok, I see.

We can add it in a few places, or add a section or examples on how to
define and load data. We might not want to do this too much since it
requires multiple storage of the data and the main point of the
datasets in statsmodels is to have them easily available for examples
and testing.

But, similar to the work that Vincent did, it is important to improve
the documentation, examples and tutorials at the entry level.

Mayank, if you are working on these kinds of models, I would be very
interested and very willing to help out if things are not clear. And I
hope the generic framework in statsmodels is going to be good enough
to make this "relatively" easy. It would also make a step towards
models that should be useful in marketing research.

Cheers,

Mayank P Jain

Sep 3, 2010, 7:56:03 AM

to pystatsmodels

On Fri, Sep 3, 2010 at 9:22 AM, <josef...@gmail.com> wrote:

> (gmail line breaks destroy the graph)

I have attached image that shows the model structure.

> It looks like you have both mode-specific variables (travel cost) and
> individual-specific variables (gender). I'm not sure whether it is
> easy to incorporate both types of explanatory variables. I never fully
> figured out all the (identification) restrictions for the different
> versions of multinomial and conditional logit models, and whether the
> types of variables could be incorporated as dummy variables.

I checked with others in my group; there is no problem in including the mode-specific constants and individual-specific constants in the model at the same time.

> I did a bit of a search on this, and it's easy to get lost in the
> different variations of the models. For example Stata recently
> improved its coverage with Stata 10
> http://www.stata.com/stata10/choice.html

The nested logit model is what I would want to use for transportation modes.

> http://www.indiana.edu/~statmath/stat/all/cdvm/cdvm6.html
> "In the multinomial logit model, the independent variables contain
> characteristics of individuals, while they are the attributes of the
> choices in the conditional logit model. In other words, the
> conditional logit estimates how alternative-specific, not
> individual-specific, variables affect the likelihood of observing a
> given outcome"
>
> example from Greene, also in his book
> http://www.indiana.edu/~statmath/stat/all/cdvm/cdvm7.html
> "Imagine a choice of the travel modes among air flight, train, bus,
> and car. The data set and model here are adopted from Greene (2003).
> The model examines how the generalized cost measure (cost), terminal
> waiting time (time), and household income (income) affect the choice.
>
> These independent variables are not characteristics of subjects
> (individuals), but attributes of the alternatives
> "

Please note that the income field is an individual-specific characteristic. As far as transportation is concerned, I have never read about any issues when individual-specific constants are introduced in the utility calculation for any mode.

ok, josef, I did understand part of what you suggested, here is a sample of log likelihood function and models in transportation (please check page 64 of pdf and page 56 as printed):

http://www.ce.utexas.edu/prof/bhat/COURSES/LM_Draft_060131Final-060630.pdf

I will look at the code in the statsmodels.............

I have a question, I would want to estimate the utility coefficients of various choices in my MNL model:
three choices: car, bus, metro
Utility of car = a + b*(Travel time by car) + c*(Income of individual)
Utility of bus = d + b*(Travel time by bus) + e*(Income of individual)
Utility of metro = 0 + b*(Travel time by metro) + f*(Income of individual)

NOTE: I have kept the travel time coefficient the same across modes.

How can I estimate such an MNL model without nests using statsmodels?

Regards
Mayank

josef...@gmail.com

Sep 3, 2010, 9:10:33 AM

to pystat...@googlegroups.com

The current MNL in statsmodels can handle only the case with variables
like "Income of individual", where the value of the variable stays constant
across modes/choices but the coefficient varies.
http://statsmodels.sourceforge.net/generated/scikits.statsmodels.discretemod.MNLogit.loglike.html#scikits.statsmodels.discretemod.MNLogit.loglike
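
To make that case concrete, here is a small sketch of the multinomial logit probabilities for one individual: the regressors are the same for every choice, and each non-reference choice gets its own coefficient vector. The function name and the coefficient values are invented for illustration, and treating metro as the reference choice is an assumption.

```python
import math

def mnl_probs(x, betas):
    """Multinomial logit probabilities for one individual.

    x:     list of individual-specific regressors (same for every choice)
    betas: one coefficient list per non-reference choice; the reference
           choice has all coefficients fixed at zero for identification.
    """
    scores = [0.0] + [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]
    top = max(scores)                    # subtract the max for numerical stability
    expo = [math.exp(s - top) for s in scores]
    total = sum(expo)
    return [e / total for e in expo]

# Choices: metro (reference), car, bus; regressors: constant and income.
x = [1.0, 2.5]
betas = [[0.4, 0.3], [-0.1, 0.1]]        # made-up coefficients for car and bus
p_metro, p_car, p_bus = mnl_probs(x, betas)
```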

If you add travel time, then I think you will get different
coefficients, instead of a single `b`. This could be handled by
restrictions of the coefficients, if the current MNLogit is extended
to handle restrictions.

The better approach is to add a ConditionalLogit model, which would
specify common coefficients across modes/choices. In your reference,
as you pointed out, equations 4.63 and following would be the relevant
likelihood. It also has the formulas for score and hessian so it
should be relatively straightforward to implement. However, in my
quick look I haven't seen a discussion of the normalization yet.
Usually one mode/choice is picked as reference case.
(In the nested version, I have seen additional normalization
requirements as we go to a higher tree/hierarchy level)
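
A minimal sketch of a conditional logit log-likelihood with coefficients shared across choices, written from the standard textbook form rather than copied from the reference. The data layout (one attribute list per choice per individual), the function name and the travel times are assumptions made for the example.

```python
import math

def clogit_loglike(beta, exog, chosen):
    """Conditional logit log-likelihood with coefficients shared by all choices.

    beta:   list of K common coefficients
    exog:   exog[i][j] is the list of K attributes of choice j for individual i
    chosen: chosen[i] is the index of the choice individual i picked
    """
    total = 0.0
    for attrs, c in zip(exog, chosen):
        # Systematic utility of every choice for this individual.
        v = [sum(b * a for b, a in zip(beta, row)) for row in attrs]
        top = max(v)
        log_denom = top + math.log(sum(math.exp(vj - top) for vj in v))
        total += v[c] - log_denom
    return total

# Two individuals, three modes (car, bus, metro), one attribute: travel time.
exog = [[[20.0], [35.0], [30.0]],
        [[45.0], [40.0], [25.0]]]
chosen = [0, 2]
ll = clogit_loglike([-0.05], exog, chosen)
```

Adding the same amount to an attribute for every mode leaves this log-likelihood unchanged, so only differences across choices carry information.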

The standard description of the conditional logit model doesn't say
much about how to incorporate individual specific components that
don't vary by mode. I think they drop out because they are constant
across modes, but I also "guess" that they can be incorporated using
dummy variables. (I never tried the latter).
(Addition: After browsing your pdf, see equation 4.12, p.33; they might
have more information on how to implement individual specific variables.)
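
One way the dummy-variable idea might look for the car/bus/metro example from the question: each mode gets a row with mode constants, its own travel time, and income interacted with the mode dummies. This is only a sketch; the function name is hypothetical, and fixing the metro constant and metro income coefficient at zero is an assumed normalization, since only differences relative to a reference mode can be identified.

```python
def design_rows(times, income, modes=("car", "bus", "metro"), reference="metro"):
    """Stack one row per mode: dummies, travel time, income x dummy.

    Columns: one constant per non-reference mode, the mode's travel time
    (common coefficient), then income interacted with each non-reference
    mode dummy (mode-specific coefficients).
    """
    free = [m for m in modes if m != reference]
    rows = []
    for m in modes:
        dummies = [1.0 if m == f else 0.0 for f in free]
        rows.append(dummies + [times[m]] + [income * d for d in dummies])
    return rows

# Made-up travel times (minutes) and income for one individual.
rows = design_rows({"car": 20.0, "bus": 35.0, "metro": 30.0}, income=4.0)
for mode, row in zip(("car", "bus", "metro"), rows):
    print(mode, row)
```

With rows built this way, a single coefficient vector (a, d, b, c, e) applied to each row gives the utilities of the three modes, so a conditional logit likelihood with common coefficients can handle both kinds of variables.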

Your reference book/notes look good. A lot more practice-oriented than
what I have read so far.
As a related aside: If there are repeated observations and unobserved
heterogeneity across individuals, then a simple version of
equation 12.1 p.218 is relatively easy to put on top of a Conditional
Logit. I did this in the past.

One more implementation thought: expressing the multinomial logit
structure in terms of utilities V as in equation 4.11, might make it
possible to share code between MNLogit and a ConditionalLogit, I think
most of it is already in the current structure as Xb which could be
redefined.

These are some relatively quick comments before I have time to look at
some more of the details.

Josef

josef...@gmail.com

Sep 6, 2010, 9:43:07 PM

to pystat...@googlegroups.com

I was working on it a bit over Labor Day weekend, and I might be able
to get a minimal working version next Friday.

Attached are 3 files, a version of clogit, that produces the same
estimates as an example in Greene, but no extra statistics yet. The
nested logit class is incomplete until I manage to finish the missing
tree structure. The second file contains the data from Greene, and the
third file a recursive function to try out a possible tree walker to
calculate the likelihood from branches. The main file still contains
several versions of the data structure in the script/example part and
is not yet cleaned.
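
For readers without the attachments, a recursive tree walker of this general kind could look like the sketch below. It is not the code from try_treewalker.py: the nested-tuple tree format, the function name, the scale parameters and the particular normalization (scaled log-sum passed up from each branch) are all assumptions for illustration.

```python
import math

def walk(node, utilities, taus):
    """Return (value, probs) for a tree node.

    node:      a leaf name (str) or a tuple (branch_name, [child nodes])
    utilities: dict leaf name -> systematic utility
    taus:      dict branch name -> scale parameter of that branch
    value is the leaf utility or the scaled log-sum of the branch; probs maps
    each leaf below the node to its probability conditional on reaching it.
    """
    if isinstance(node, str):
        return utilities[node], {node: 1.0}
    name, children = node
    tau = taus[name]
    results = [walk(child, utilities, taus) for child in children]
    scaled = [value / tau for value, _ in results]
    top = max(scaled)
    logsum = top + math.log(sum(math.exp(s - top) for s in scaled))
    probs = {}
    for s, (_, child_probs) in zip(scaled, results):
        p_child = math.exp(s - logsum)   # probability of this child given the branch
        for leaf, p in child_probs.items():
            probs[leaf] = p_child * p
    return tau * logsum, probs

tree = ("mode", [("car", ["alone", "shared"]),
                 ("transit", ["bus", "metro"]),
                 ("nonmotorized", ["bike", "walk"])])
V = {"alone": 0.5, "shared": 0.1, "bus": -0.2,
     "metro": 0.3, "bike": -1.0, "walk": -0.5}
taus = {"mode": 1.0, "car": 0.6, "transit": 0.8, "nonmotorized": 0.7}
_, leaf_probs = walk(tree, V, taus)
```

The recursion works for any number of branch levels, and a branch with a single child simply passes that child's value through with probability one.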

It's just a status update, mainly to figure out what data structure
would be useful. In this version, exog is a list of explanatory
variables, one array per choice. Currently it's standalone,
independent of statsmodels, but once the basic structure works, we can
use the optimizers and result statistics through statsmodels.
I'm not sure the current files are understandable, but I should be
able to produce a cleaner basic version also of the nested logit by
next weekend.

One thing I'd like to add soon is test examples for clogit and for
different nesting structures, since it's much easier to figure out if
something is correct if there are test cases. For example, degenerate
(single choice) branches and an arbitrary number of branch levels will
require some testing.

After reading around a bit, I think, the initial implementation should
just be RU2, which is the only parameterization that is fully
consistent with random utility theory. (Silberhorn, Boztug and
Hildebrandt).

runmnl.py

TableF23-2.txt

try_treewalker.py

Mayank P Jain

Sep 6, 2010, 11:23:58 PM

to pystatsmodels

wow, thanks Josef for dedicated effort in doing this !!
Thanks a ton for your efforts and time!

I am sure you would include the nesting coefficients that define the nest in a nested logit model.

I am trying to understand how you write code for a standard application in Python. Is there a programming guide that you guys follow while writing the code for statsmodels?
Just so that I can understand what you would write and have written. Maybe one day I will be able to contribute something too.

Regards
Mayank P Jain


Skipper Seabold

Sep 11, 2010, 12:52:32 PM

to pystat...@googlegroups.com

On Mon, Sep 6, 2010 at 11:23 PM, Mayank P Jain <maya...@gmail.com> wrote:
> wow, thanks Josef for dedicated effort in doing this !!
> Thanks a ton for your efforts and time!
>
> I am sure you would include the nesting coefficients that define the nest in
> a nested logit model.
>
> I am trying to understand how you write code for a standard application in
> python, if there is a programming guide that you guys follow while writing
> the code for statsmodels?

Do you mean a coding guide for Python or an overview on the
organization of our codebase?

Mayank P Jain

Sep 11, 2010, 2:01:42 PM

to pystat...@googlegroups.com

I mean an outline of how the code is structured: what I can expect to find in the .py files and the various folders.

Regards
Mayank P Jain


Skipper Seabold

Sep 11, 2010, 2:15:17 PM

to pystat...@googlegroups.com

On Sat, Sep 11, 2010 at 2:01 PM, Mayank P Jain <maya...@gmail.com> wrote:
> I mean an outline of how the code is structured, what can I expect to
> find in all the .py files, and various folders.
>

Ah, ok. Yeah. I will look around and see what we have for this, as
I'm doing some housecleaning this weekend.

Mayank P Jain

Sep 11, 2010, 2:20:54 PM

to pystat...@googlegroups.com

thanks skipper........

Regards
Mayank P Jain


josef...@gmail.com

Sep 11, 2010, 3:09:04 PM

to pystat...@googlegroups.com

On Sat, Sep 11, 2010 at 2:20 PM, Mayank P Jain <maya...@gmail.com> wrote:
> thanks skipper........
>
> On Sat, Sep 11, 2010 at 11:45 PM, Skipper Seabold <jsse...@gmail.com> wrote:
>> On Sat, Sep 11, 2010 at 2:01 PM, Mayank P Jain <maya...@gmail.com> wrote:
>> > I mean an outline of how the code is structured, what can I expect to
>> > find in all the .py files, and various folders.

I think the easiest way to see the basic structure is by browsing through
discretemod.py . It's simple because the models are self contained
(compared to glm) and don't have a long inheritance chain (compared to
regression.py)

The work/calculations are essentially split between the model class,
e.g. Logit, that contains the model specific loglike, derivatives,
Hessian, ..., the superclasses which handle the actual fitting, e.g.
calls to non-linear optimizers are in model.LikelihoodModel, and the
result class, which contains result statistics that are (mostly) the
same for all model classes in the same group.
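
A stripped-down sketch of that three-way split, with invented class names (the "Sketch" suffix marks them as illustrations, not the actual statsmodels classes). Plain gradient ascent stands in for the calls to non-linear optimizers, and the data are made up.

```python
import math

class LikelihoodModelSketch:
    """Stand-in for a generic superclass that owns the fitting loop."""

    def __init__(self, endog, exog):
        self.endog, self.exog = endog, exog

    def loglike(self, params):
        raise NotImplementedError  # supplied by each concrete model

    def score(self, params):
        raise NotImplementedError  # gradient, supplied by each concrete model

    def fit(self, start, rate=0.05, maxiter=5000):
        # Plain gradient ascent; a real implementation would call an optimizer.
        params = list(start)
        for _ in range(maxiter):
            grad = self.score(params)
            params = [p + rate * g for p, g in zip(params, grad)]
        return ResultsSketch(self, params)


class LogitSketch(LikelihoodModelSketch):
    """Model class: only the model-specific loglike and derivatives."""

    def _prob(self, params, row):
        return 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(params, row))))

    def loglike(self, params):
        return sum(
            math.log(self._prob(params, r) if y else 1.0 - self._prob(params, r))
            for y, r in zip(self.endog, self.exog)
        )

    def score(self, params):
        grad = [0.0] * len(params)
        for y, r in zip(self.endog, self.exog):
            resid = y - self._prob(params, r)
            grad = [g + resid * x for g, x in zip(grad, r)]
        return grad


class ResultsSketch:
    """Result class: statistics shared by all models of the same group."""

    def __init__(self, model, params):
        self.model, self.params = model, params
        self.llf = model.loglike(params)


endog = [0, 0, 1, 0, 1, 1]
exog = [[1.0, x] for x in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0)]
res = LogitSketch(endog, exog).fit([0.0, 0.0])
```

The point of the split is that a new model only has to supply its own loglike (and, if available, derivatives); fitting and result statistics are inherited.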

regression.py is messier because the calculations are split across
several levels of class hierarchies.
glm requires links and families which are defined in a subdirectory.

All these models are very similar because they have, currently, one
endog and one exog array. MNLogit does the conversion for the dummy
representation of endog internally.

Many of the models that are in the sandbox, time series analysis tsa,
system of equations and sur in sysreg, Nested Logit, GMM, ...
don't fit quite so nicely in this pattern and we still have to settle
on some parts of the design.

Code other than models, e.g. statistical tests, input-output
functions, have (at least until now) a simpler structure and are
pretty isolated from the rest.