Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

likelihood estimates of glmfit results?


thomas

Oct 21, 2008, 3:12:02 PM
Apologies in advance for my inexpertise on statistical matters.

I'm doing logistic regression using glmfit. I want to compute pseudo-R^2 and AIC (Akaike information criterion), which for a logistic regression seem to require knowing the likelihood estimate for the fitted model. But I can't figure out how to compute this. If I'm not mistaken, glmfit is doing this already in order to iteratively discover the optimized betas, but I can't figure out how to reconstruct that actual likelihood estimate from the outputs it gives me.

Can anyone point me in the right direction? Am I right in thinking that the information I need is already being computed by glmfit, but is not output?

thomas

Oct 21, 2008, 5:23:02 PM
OK, I've discovered that what Matlab means by "deviance" in the output from glmfit is definitely "residual deviance", and thus I think AIC should be deviance + 2*k, where k is the number of free parameters in the model, which I believe is just equivalent to the number of coefficients generated. Can anyone confirm this?

Peter Perkins

Oct 22, 2008, 10:21:50 AM
thomas wrote:
> OK, I've discovered that what Matlab means by "deviance" in the output from glmfit is definitely "residual deviance", and thus I think AIC should be deviance + 2*k, where k is the number of free parameters in the model, which I believe is just equivalent to the number of coefficients generated. Can anyone confirm this?

Thomas, "deviance" is the standard term in the GLM literature for twice the difference between the maximum possible log-likelihood (that of the saturated model, where the fitted values are exactly the data) and the log-likelihood of the fitted model. It isn't something that is specific to GLMFIT.

In any case, log-likelihood and AIC/BIC are only defined up to an additive constant anyway (since they are useful only in a relative sense), so you can certainly use the deviance in place of -2 times the log-likelihood.

Hope this helps.

C F

Apr 29, 2009, 12:24:01 AM
Peter Perkins <Peter.Perki...@mathworks.com> wrote in message <gdnctu$lu3$1...@fred.mathworks.com>...

I'm trying to find the AIC as well, and it seems I need the fitted log likelihood value. I understand deviance is the difference between the LL of the fitted model and the maximum possible LL. But I'm looking for the actual LL, since I'm comparing this model to a non-GLM model for which I have access only to the fitted LL. So if there is a way to either recover the maximum possible LL or directly recover the fitted LL, that would be very helpful. (Also, I'm sure this is wrong, but isn't the LL of the fitted model in fact the "maximum" possible LL? I know it's not, but why?)

thanks

Peter Perkins

Apr 29, 2009, 11:11:58 AM
C F wrote:

> I'm trying to find the AIC as well, and it seems I need the fitted log likelihood function value. I understand deviance is the difference between LL of the fitted model and max possible log-likelihood. But I'm looking for the actual LL as I'm comparing this to another non-GLM model and have access only to the fitted LL in that model. Thus, if there is a way to either recover the max possible LL or directly recoverd the fitted LL that would be very helpful.

You don't say what kind of GLM you're fitting, but generally speaking: Use GLMVAL to compute fitted values for each observation. Then use the appropriate PDF function to compute individual likelihoods, then log and sum. So for a binomial GLM (e.g. logistic regression),

betaHat = glmfit(X,[y,n],'binomial');
pFitted = glmval(betaHat,X,'logistic','size',n);
LL = sum(log(binopdf(y,n,pFitted)));

For a GLM with an estimated dispersion parameter, you'll have to use that in the PDF function call.

> (also, i'm sure this is wrong, but isn't the LL of the fitted model in fact the "maximum" possible LL? i know its not, but why?)

The maximum possible likelihood is the likelihood for a model where each fitted value is the observed value -- in effect one parameter for each observation. Not a very useful model.

Hope this helps.
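Peter's recipe (fitted values, then the appropriate PDF, then log and sum) and his saturated-model point can be sketched numerically. The following is a Python/numpy/scipy version (an assumption on my part; the thread itself is MATLAB), using the toy data and the fitted coefficients that appear later in this thread:

```python
import numpy as np
from scipy.stats import binom

# Toy data posted later in this thread: y successes out of n trials at each x
x = np.array([53, 57, 58, 63, 66, 67, 68, 69, 70, 72, 73, 75, 76, 78, 79, 81])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0])
n = np.array([1, 1, 1, 1, 1, 3, 1, 1, 4, 1, 1, 2, 2, 1, 1, 1])

# Saturated model: one free probability per observation, p_i = y_i / n_i
p_sat = y / n
ll_sat = binom.logpmf(y, n, p_sat).sum()     # about -1.674

# Fitted logistic model, coefficients as reported later in the thread
b0, b1 = 15.0429, -0.2322
p_fit = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
ll_fit = binom.logpmf(y, n, p_fit).sum()     # about -7.6727

deviance = 2.0 * (ll_sat - ll_fit)           # residual deviance
```

Since deviance = 2*(ll_sat - ll_fit), it equals -2 times the fitted log likelihood plus a data-dependent constant, which is why it can stand in for the log likelihood in relative comparisons.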

Gabriel Diaz

Jun 8, 2009, 6:15:02 PM
This doesn't seem to work for me. Here's a simple toy problem for which I know the LL and AIC....

x = predictor values
y = responses
n = number of stimulus presentations

I'm trying to calculate AIC for a binomial logistic regression, but my LL is incorrect. I've verified the correct log likelihood using two other statistics / computation packages. To be clear, I'm not suggesting that matlab is bugged. More reasonably, my own calculations are bugged!

Any advice with that log likelihood calculation?


****************

clear all

x = [ 53 57 58 63 66 67 68 69 70 72 73 75 76 78 79 81 ]';
y = [ 1 1 1 1 0 0 0 0 2 0 0 1 0 0 0 0 ]';
n = [ 1 1 1 1 1 3 1 1 4 1 1 2 2 1 1 1 ]';

[betas, dev, stats] = glmfit(x, [y n], 'binomial', 'link', 'logit');

fitted = glmval(betas,x,'logit');

%% LL should be -10.1576
logLikelihood = sum(log( binopdf( y, n,fitted)))

%% AIC should be 24.3152
AIC = -2*logLikelihood + 2*numel(betas)

- gD

Peter Perkins <Peter....@MathRemoveThisWorks.com> wrote in message <gt9qnu$7qm$1...@fred.mathworks.com>...

Tom Lane

Jun 9, 2009, 5:18:57 PM
> I'm trying to calculate AIC for a binomial logistic regression, but my LL
> is incorrect. I've verified the correct log likelihood using two other
> statistics / computation packages. To be clear, I'm not suggesting that
> matlab is bugged. More reasonably, my own calculations are bugged!

Gabriel, can you give any details about what you did to conclude that the
correct log likelihood is -10.1576?

It appears that glmfit is maximizing the likelihood correctly. Here I
demonstrate that the coefficient estimates from glmfit are the same as you
would get by using fminsearch to minimize the negative of your log
likelihood function:

>> betas
betas =
15.0429
-0.2322
>> invlogit = @(xb) 1./(1+exp(-xb));
>> xb = @(b,x) b(1)+b(2)*x;
>> f = @(b) -sum(log(binopdf(y,n,invlogit(xb(b,x)))));
>> fminsearch(f,[15 -.2])
ans =
15.0429 -0.2322
>> f(ans)
ans =
7.6727

-- Tom
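Tom's fminsearch check translates directly to scipy, for anyone cross-checking outside MATLAB (a sketch; scipy's Nelder-Mead plays the role of fminsearch here):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import binom

x = np.array([53, 57, 58, 63, 66, 67, 68, 69, 70, 72, 73, 75, 76, 78, 79, 81])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0])
n = np.array([1, 1, 1, 1, 1, 3, 1, 1, 4, 1, 1, 2, 2, 1, 1, 1])

def negloglik(b):
    # inverse-logit link, then grouped binomial log likelihood
    p = 1.0 / (1.0 + np.exp(-(b[0] + b[1] * x)))
    return -binom.logpmf(y, n, p).sum()

# Nelder-Mead from the same starting point Tom used with fminsearch
res = minimize(negloglik, x0=[15.0, -0.2], method="Nelder-Mead")
print(res.x, res.fun)   # roughly [15.04, -0.232] and 7.6727
```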


Gabriel Diaz

Jun 10, 2009, 11:18:01 AM

> Gabriel, can you give any details about what you did to conclude that the
> correct log likelihood is -10.1576?

I ran the toy-problem through two other packages: SPSS and Mathematica. Both packages provide LL as part of their logistic regression output and both suggest that the LL is -10.1576.

Tom Lane

Jun 10, 2009, 2:10:07 PM

Okay Gabriel, I did some playing around and here's what I found out. You
have data like this:

x = [ 63 66 70 . . .]
y = [ 1 0 2 . . .]
n = [ 1 1 4 . . .]

You could also choose to enter the data in this form:

x = [ 63 66 70 70 70 70 . . .]
y = [ 1 0 0 0 1 1 . . .]
n = [ 1 1 1 1 1 1 . . .]

In the first form we report that in a total of 4 trials with x=70, we
observe 2 "heads." In the second form we report four individual trials all
with the value x=70, and the first 2 have tails while the last 2 have heads.

If you use this second form with glmfit, you'll get the same coefficients.
If you do your binopdf thing with this form, you'll get the same likelihood
that you found from the other software.

I don't know if you used the second form with the other packages, or if they
convert to this form, or if they simply omit certain constant values from
the log likelihood. The value -10.1576 represents the log likelihood of these
binary trials in a specific order, such as [0 0 1 1] for the x=70 data,
while the value you originally got in MATLAB represents the log likelihood of
binomial trials with the specific binary sequence unspecified, so for
example the 4 values at x=70 may have come in the order [0 1 1 0]. The
difference in these values is a constant that depends on the data but not on
the parameter value.

Any conclusions you draw based on looking at differences in log likelihoods
are going to be the same either way. Just make sure to use the same form
(probably the same software) in fitting both models. Similarly, if you sort
a collection of models by their AIC values, you'll come to the same
conclusions either way.

You can pick either one for your research. I prefer the MATLAB one because
it's based on the real binomial likelihood, but again that's not going to
matter for most purposes.

-- Tom
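Tom's point can be verified directly: the grouped (binomial) and ungrouped (Bernoulli) log likelihoods differ by sum(log C(n,y)), which does not depend on the coefficients. A Python/scipy sketch (my translation, not part of the original post) using the thread's toy data and fitted coefficients:

```python
import numpy as np
from scipy.stats import binom
from scipy.special import gammaln

x = np.array([53, 57, 58, 63, 66, 67, 68, 69, 70, 72, 73, 75, 76, 78, 79, 81])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0])
n = np.array([1, 1, 1, 1, 1, 3, 1, 1, 4, 1, 1, 2, 2, 1, 1, 1])

b0, b1 = 15.0429, -0.2322                 # coefficients from earlier in the thread
p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

# Grouped (binomial) log likelihood, as MATLAB's binopdf computes it
ll_binom = binom.logpmf(y, n, p).sum()    # about -7.6727

# The log C(n,y) terms are the constant separating the two conventions
log_nchoosek = gammaln(n + 1) - gammaln(y + 1) - gammaln(n - y + 1)

# Ungrouped (Bernoulli) log likelihood, the convention the other packages used
ll_bern = ll_binom - log_nchoosek.sum()   # about -10.1576
```

For this data the constant is log C(4,2) + log C(2,1) = log 12, exactly the gap between -7.6727 and -10.1576.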


Gabriel Diaz

Jun 10, 2009, 2:24:02 PM
Thanks a lot Tom, you nailed it. This clears up a lot of confusion!

- gD

"Tom Lane" <tl...@mathworks.com> wrote in message <h0osu0$sfg$1...@fred.mathworks.com>...

Gabriel Diaz

Jun 19, 2009, 11:30:04 AM
Now, if I want to calculate any variety of pseudo-R^2, or chi^2, I need the LL of the null model (intercept only, no predictor coefficients). Using the example provided previously, other programs suggest that this should be -14.1336. I've tried a few methods to calculate this in Matlab, with little luck. Strangely enough, of the 4 books I have in front of me, not one provides equations.

Suggestions?

My thanks in advance.

- gD

"Gabriel Diaz" <eyeba...@gmail.com> wrote in message <h0oto2$o1p$1...@fred.mathworks.com>...

Gabriel Diaz

Jun 19, 2009, 1:18:01 PM
Never mind, the answer is simple: take the average of the predictor variable and run glmfit (using the original DV).

Thx,
- gD

"Gabriel Diaz" <eyeba...@gmail.com> wrote in message <h1gats$1vg$1...@fred.mathworks.com>...
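For the record, the intercept-only (null) log likelihood Gabriel was after also has a closed form for a binomial model: the MLE of the constant probability is just the overall success rate. A Python sketch (my addition, using the ungrouped/Bernoulli convention that matches the -14.1336 he quotes):

```python
import numpy as np

y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0])
n = np.array([1, 1, 1, 1, 1, 3, 1, 1, 4, 1, 1, 2, 2, 1, 1, 1])

# Intercept-only MLE: the overall success rate (here 7/23)
p0 = y.sum() / n.sum()

# Null-model log likelihood in the ungrouped (Bernoulli) convention
ll_null = y.sum() * np.log(p0) + (n.sum() - y.sum()) * np.log(1 - p0)
print(ll_null)   # about -14.1336
```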

Saari Saari

Jan 6, 2010, 4:59:04 AM
Hello,
I just came across this newsgroup and found that this topic helped me a lot in my project, since I did the same thing as Gabriel. However, I would like to know how to get the standard error for each parameter, because I tried to use stats.se and it did not seem to work for my data. I hope someone can help with this question, because I really need those values to test the significance of each parameter. I'm sorry, I am not really good at statistics. Thank you.


"thomas " <tjerde_remo...@stanford.edu> wrote in message <gdl9i2$f15$1...@fred.mathworks.com>...

Nick Mariette

Jan 30, 2011, 7:19:04 AM
Hello,

Sorry to post in reply to an old thread.
But can anybody help me find the:
unrestricted log likelihood
and
restricted log likelihood
of a logistic function?

Using the example from this thread would be fine for me.

I need these to perform the likelihood ratio test (lratiotest.m) on a logistic fit.

Hope you can help.
Regards,
Nick

Here's the example again for easy reference:

x = [ 53 57 58 63 66 67 68 69 70 72 73 75 76 78 79 81 ]';
y = [ 1 1 1 1 0 0 0 0 2 0 0 1 0 0 0 0 ]';
n = [ 1 1 1 1 1 3 1 1 4 1 1 2 2 1 1 1 ]';

[b,dev,stats] = glmfit(x,[y n],'binomial', 'link', 'logit');
[yfit,ylo,yhi] = glmval(b, x,'logit',stats);
logLikelihood = sum(log( binopdf( y, n, yfit)))
% Akaike information criterion
betas = b;
AIC = -2*logLikelihood + 2*numel(betas)

% alternative method of calculating the log likelihood
invlogit = @(xb) 1./(1+exp(-xb));
xb = @(b,x) b(1)+b(2)*x;
f = @(b) -sum(log(binopdf(y,n,invlogit(xb(b,x)))));   % negative log likelihood

min_ans = fminsearch(f,b);
logLikelihood2 = -f(min_ans)   % negate: f returns the negative log likelihood


"Tom Lane" <tl...@mathworks.com> wrote in message <h0mjk1$ec3$1...@fred.mathworks.com>...
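Using the two likelihood conventions consistently, the numbers already in this thread are enough to run the likelihood ratio test Nick asks about. A Python/scipy sketch (my addition; the unrestricted LL uses the fitted coefficients reported earlier, and both LLs are in the ungrouped/Bernoulli form):

```python
import numpy as np
from scipy.stats import chi2

x = np.array([53, 57, 58, 63, 66, 67, 68, 69, 70, 72, 73, 75, 76, 78, 79, 81])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0])
n = np.array([1, 1, 1, 1, 1, 3, 1, 1, 4, 1, 1, 2, 2, 1, 1, 1])

# Unrestricted model: the fitted logistic (coefficients from earlier posts)
b0, b1 = 15.0429, -0.2322
p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
ll_u = (y * np.log(p) + (n - y) * np.log(1 - p)).sum()   # about -10.1576

# Restricted model: intercept only; the MLE is the overall success rate
p0 = y.sum() / n.sum()
ll_r = y.sum() * np.log(p0) + (n.sum() - y.sum()) * np.log(1 - p0)  # about -14.1336

lr_stat = 2.0 * (ll_u - ll_r)    # likelihood ratio statistic, about 7.95
p_value = chi2.sf(lr_stat, 1)    # 1 df: one coefficient restricted to zero
```

The same statistic could be fed to lratiotest; the key is that both log likelihoods must come from the same convention (grouped or ungrouped), since the constant cancels in the difference.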


Valerie

Sep 22, 2011, 3:43:27 PM
I'm trying to figure out how to extend your solution to finding the likelihood from a glmfit model to one using a gamma distribution. As you suggested, I've tried to incorporate the dispersion.

Taking into account that, for the specification of the gamma used in matlab:
variance = a * b^2
mean = a * b
dispersion = 1/a

I calculate
a = 1 / dispersion
b = mean values (found using glmval) / a
And use this for calculation of the likelihood. Does this seem correct to you?

[b,dev,stats] = glmfit(X,y,'gamma','link','log');
dispersion = stats.s;
mu = glmval(b,X,'log');
ahat = 1/dispersion;
bhat = mu / ahat;
loglike = nansum(log(gampdf(y,ahat,bhat)));

Peter Perkins <Peter....@MathRemoveThisWorks.com> wrote in message <gt9qnu$7qm$1...@fred.mathworks.com>...

Peter Perkins

Sep 23, 2011, 9:49:02 AM
On 9/22/2011 3:43 PM, Valerie wrote:
> I'm trying to figure out how to extend your solution to finding the
> likelihood from a glmfit model to one using a gamma distribution. As you
> suggested, I've tried to incorporate the dispersion.
>
> Taking into account that, for the specification of the gamma used in
> matlab:
> variance = a * b^2
> mean = a * b
> dispersion = 1/a
>
> I calculate
> a = 1 / dispersion
> b = mean values (found using glmval) / a
> And use this for calculation of the likelihood. Does this seem correct
> to you?
>
> [b,dev,stats] = glmfit(X,y,'gamma','link','log'); dispersion = stats.s;
> mu = glmval(b,X,'log'); ahat = 1/dispersion; bhat = mu / ahat; loglike =
> nansum(log(gampdf(y,ahat,bhat)));

Valerie, you're close. GLMFIT returns the dispersion parameter on the
scale of std dev, not variance, so you'll have to square it. You can
verify that by running this code, which fits a gamma distribution to
univariate data, and then fits a gamma/log GLM to the same data with no
predictors:

>> rng default
>> a = 1.23; b = 4.56;
>> [a*b a*b^2] % theoretical mean/variance
ans =
5.6088 25.576
>>
>> n = 1000000;
>> y = gamrnd(a,b,n,1);
>> [mean(y) var(y)] % sample mean/variance
ans =
5.6107 25.508
>>
>> paramEsts = gamfit(y); aHat = paramEsts(1), bHat = paramEsts(2)
aHat =
1.2296
bHat =
4.563
>> [aHat*bHat aHat*bHat^2] % maximum likelihood estimated mean/variance
ans =
5.6107 25.602
>>
>> x = zeros(n,0);
>> [beta,dev,stats] = glmfit(x,y,'gamma','link','log');
>> mu = unique(glmval(beta,x,'log'));
>> phi = stats.s^2 % estimated dispersion
phi =
0.81029
>> [mu phi*mu^2] % GLM estimated mean/variance
ans =
5.6107 25.508
>> bHat = phi*mu, aHat = 1/phi % equivalent scale/shape params
bHat =
4.5463
aHat =
1.2341

You will notice that the parameter estimates from GLMFIT/GLMVAL are
slightly different than those from GAMFIT. That's because GLMFIT
estimates the dispersion parameter in a slightly different way than the
pure maximum likelihood that GAMFIT uses: GLMFIT uses a Pearson sum of
squares kind of thing (see the GLMFIT code for the exact details). What
GLMFIT does is entirely standard for a generalized linear model, and
appropriate for a regression, but it is slightly different than fitting
a univariate distribution by maximum likelihood.

Hope this helps.
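The shape/scale bookkeeping above can be sketched outside MATLAB as well. Here is a Python/numpy/scipy version of the same check (my addition), using simple moment-based estimates in place of GLMFIT's Pearson-type dispersion for this intercept-only case:

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)
a_true, b_true = 1.23, 4.56                # shape a, scale b, as in the post
yy = rng.gamma(a_true, b_true, 200_000)

# Intercept-only estimates: mean mu, and dispersion phi defined through
# var = phi * mu^2 (note phi lives on the variance scale, i.e. s^2)
mu = yy.mean()
phi = yy.var(ddof=1) / mu**2

# Equivalent shape/scale parameters: shape = 1/phi, scale = phi*mu
a_hat, b_hat = 1.0 / phi, phi * mu

# Plug-in log likelihood with the dispersion folded in
ll = gamma.logpdf(yy, a_hat, scale=b_hat).sum()
```

By construction a_hat * b_hat recovers the sample mean exactly, and a_hat * b_hat^2 the sample variance, mirroring the mean/variance check in Peter's transcript.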

Valerie

Sep 23, 2011, 2:11:30 PM
This helps a lot!

Thanks.

Peter Perkins <Peter.Remove...@mathworks.com> wrote in message <j5i2oe$qve$1...@newscl01ah.mathworks.com>...

Marris Atwood

Jun 9, 2015, 8:16:40 AM
Peter Perkins <Peter....@MathRemoveThisWorks.com> wrote in message <gt9qnu$7qm$1...@fred.mathworks.com>...
Shouldn't it be

betaHat = glmfit(X,[y,n],'binomial');
yFitted = glmval(betaHat,X,'logistic','size',n);
LL = sum(log(binopdf(y,n, yFitted./n)));

instead?

Peter Perkins

Jun 12, 2015, 3:10:06 PM
Yes, I think you are correct. GLMVAL would return expected counts for
size n in this case. I guess you could also just leave off the 'size'
param in my code.

John

Dec 6, 2016, 7:08:09 PM
Annoyingly, the Matlab documentation doesn't give a clear explanation of the dev output from glmfit. I checked this manually by calculating the log-likelihood as explained above for my data, and found that the dev output from glmfit is -2*log-likelihood for that given fit. Therefore, to compute an omnibus p-value for your glmfit vs. a constant model, do something like this:
omni_p=1-chi2cdf(dev0-dev,stats0.dfe-stats.dfe); % the 0 implies stats and dev from the output of glmfit with a constant model (just a column of ones for the predictors)

It should be noted that, according to the Wikipedia entry on deviance, dev as used by whoever wrote glmfit is not being used correctly ("the quantity -2*log(p(y|thetahat_0)) is sometimes referred to as a deviance. This is [...] inappropriate, since unlike the deviance used in the context of generalized linear modelling, -2*log(p(y|thetahat_0)) does not measure deviation from a model that is a perfect fit to the data.").

"thomas" wrote in message <gdlh7m$k9n$1...@fred.mathworks.com>...