Sorry, it's still very crude. We are only testing against the R
mlogit package. In DCM_clogit.py, first comes the CLogit class and
then three examples: the first replicates Greene's example (two
alternative-specific variables with generic coefficients and one
alternative-specific variable with an alternative-specific
coefficient), and the next two are variants of it. Each example is
followed by the R results used for testing. R's mlogit package treats
conditional logit as part of the multinomial logit model, not as a
separate model. I think there are two schools of thought on this:
those that treat the two as separate models and those that treat them
together. If all the independent variables are case specific, the end
result is the same: the two models are identical. For now our models
handle only alternative-specific variables, but they should be able
to work with both types of variable (alternative specific and/or
individual/case specific). We have only started to discuss data entry
in issue #941.
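To make the two kinds of regressors concrete, here is a tiny
long-format sketch (the column names `case`, `alt`, `cost`, `income`,
`choice` are hypothetical, not the actual DCM_clogit.py layout): one
row per (case, alternative) pair, with `cost` varying across
alternatives and `income` constant within a case.

```python
# Hypothetical long-format data for a conditional logit:
# one row per (case, alternative) pair.
# 'cost' is alternative specific (varies within a case),
# 'income' is individual/case specific (constant within a case).
rows = [
    {"case": 1, "alt": "car", "cost": 4.5, "income": 30, "choice": 1},
    {"case": 1, "alt": "bus", "cost": 1.2, "income": 30, "choice": 0},
    {"case": 2, "alt": "car", "cost": 5.0, "income": 55, "choice": 0},
    {"case": 2, "alt": "bus", "cost": 1.1, "income": 55, "choice": 1},
]

# A case-specific variable takes a single value within each case:
for c in {r["case"] for r in rows}:
    incomes = {r["income"] for r in rows if r["case"] == c}
    assert len(incomes) == 1
```

This is also why case-specific-only models collapse to multinomial
logit: within a case such a variable carries no information unless it
is interacted with the alternatives.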
>
> A quick look at your code:
>
> what is "import stata"? I didn't know we could import Stata into Python.
>
>
> I wrote a very bare bones wrapper to call Stata, estimate a model, and
> bring the results into Python. Feel free to tinker if this looks useful:
>
>
> https://github.com/amarder/StataPy
>
>
> pandas might be slow in the log-likelihood because it is called very
> often.
> Just a guess: can you define `self.data.groupby('group')` outside of
> the loglikelihood, in __init__ for example and reuse it.
>
>
> This change sped up the code by about 2.5 seconds (2%), a nice quick win.
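The caching idea above can be sketched in plain Python (the class and
names here are a hypothetical stand-in for the pandas groupby in
CLogit, not the actual code):

```python
# Sketch: build the grouping once in __init__ instead of regrouping
# on every log-likelihood evaluation, which the optimizer calls
# many times.
from collections import defaultdict

class CachedGroups:
    def __init__(self, groups, values):
        # Precompute the row indices of each group a single time.
        self._index = defaultdict(list)
        for i, g in enumerate(groups):
            self._index[g].append(i)
        self._values = values

    def group_sums(self):
        # Reuses the cached index on every call; no regrouping.
        return {g: sum(self._values[i] for i in idx)
                for g, idx in self._index.items()}

cg = CachedGroups(["a", "a", "b"], [1.0, 2.0, 3.0])
print(cg.group_sums())  # {'a': 3.0, 'b': 3.0}
```

With pandas the same idea is `self.grouped = data.groupby('group')`
in `__init__`, reused inside `loglike`.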
>
>
> I don't remember whether fit(method='nm') is still the default in
> GenericLikelihoodModel, method='bfgs' should be faster and more
> precise at the optimum but might not converge in some cases.
>
>
> Here are the runtimes in seconds for various maximization methods:
> nm (default): 173
> bfgs: manually stopped after 240
> newton: 114
>
> Ana, great tip on newton, this is a 34% speed up!
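One reason 'newton' wins here: Newton's method uses curvature and
converges quadratically near the optimum, while Nelder-Mead ('nm')
uses no derivatives at all. A minimal one-dimensional sketch (not the
CLogit objective; just maximizing f(x) = -(x - 3)^2, whose maximum is
at x = 3):

```python
# Minimal Newton iteration for maximizing a smooth 1-D function.
# step = f'(x) / f''(x); for a quadratic this lands on the optimum
# in a single step.
def newton_max(grad, hess, x0, tol=1e-10, maxiter=50):
    x = x0
    for i in range(maxiter):
        step = grad(x) / hess(x)
        x = x - step
        if abs(step) < tol:
            break
    return x, i + 1

grad = lambda x: -2.0 * (x - 3.0)   # f'(x)
hess = lambda x: -2.0               # f''(x), constant for a quadratic
x, iters = newton_max(grad, hess, x0=0.0)
print(x, iters)  # reaches 3.0 in two iterations (second just confirms)
```

The trade-off is that 'newton' needs the Hessian (analytic or
numerical), 'bfgs' only gradients, and 'nm' neither, which is why
'nm' is the robust but slow default.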
>
>
> Another issue that Skipper reported for NegativeBinomial is that the
> numerical derivatives are much slower than analytical derivatives.
> When analytical derivatives are available, they improve performance
> significantly.
>
>
> I think this is where we should get a big speed up. I took a quick stab
> at taking the derivative, but it looks pretty tough, and I gave up.
I know the derivative (I'll put it on github). I'll try to see how to
include it.
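The cost difference is easy to see: a central-difference gradient
needs 2*k log-likelihood evaluations per gradient call (k = number of
parameters), while an analytic score costs roughly one pass. A small
illustration (the quadratic `loglike` here is a stand-in, not the
CLogit log-likelihood):

```python
# Counting objective evaluations for a central-difference gradient.
def numerical_grad(f, x, eps=1e-6):
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        grad.append((f(xp) - f(xm)) / (2 * eps))  # 2 calls per param
    return grad

calls = 0
def loglike(params):
    global calls
    calls += 1
    return -sum(p * p for p in params)  # stand-in objective

g = numerical_grad(loglike, [1.0, 2.0, 3.0])
print(calls)  # 6 evaluations for 3 parameters
```

With the analytic derivative supplied instead, the optimizer avoids
all of those extra log-likelihood passes at every step.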
>
>
> Another possibility: If we can have a fast approximate initial
> estimate, then numerical optimization can start from a better position
> and will converge faster.
>
>
> Another good idea, I think sometimes people use parameter estimates from
> the standard logit (without fixed effects) as a good initial position.
Good! I'll try it!
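The warm-start idea can be sketched with a toy iterative optimizer
(gradient ascent on f(x) = -(x - 3)^2; not the real estimator, just
counting iterations from a cold start versus a nearby start, the way
plain-logit estimates would sit near the conditional logit optimum):

```python
# Iteration count from a cold start (0.0) vs a warm start (2.9)
# when maximizing f(x) = -(x - 3)**2 by gradient ascent.
def ascend(x, lr=0.1, tol=1e-8, maxiter=10_000):
    for i in range(maxiter):
        g = -2.0 * (x - 3.0)      # gradient of the objective
        x += lr * g
        if abs(g) < tol:
            return x, i + 1
    return x, maxiter

_, cold = ascend(0.0)
_, warm = ascend(2.9)
print(cold, warm)  # the warm start converges in fewer iterations
```

In statsmodels terms this would mean passing the plain-logit
estimates as `start_params` to `fit()`.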
>
>
> I guess we will have a reasonably fast version by the end of summer or
> early fall.
>
>
> Sounds good to me. Happy to beta test,
>
> Andrew
>
Thank you! That would be great!
Ana