I'm doing logistic regression using glmfit. I want to compute pseudo-R^2 and AIC (Akaike information criterion), which for a logistic regression seem to require knowing the likelihood estimate for the fitted model. But I can't figure out how to compute this. If I'm not mistaken, glmfit is doing this already in order to iteratively discover the optimized betas, but I can't figure out how to reconstruct that actual likelihood estimate from the outputs it gives me.
Can anyone point me in the right direction? Am I right in thinking that the information I need is already being computed by glmfit, but is not output?
Thomas, "devaince" is the standard term in the GLM literature for the difference between the log-likelihood of the fitted model, and the maximum possible log-likelihood, where the fitted values are exactly the data. It isn't something that is specific to GLMFIT.
In any case, log-likelihood and AIC/BIC are only defined up to an additive constant anyway (since they are useful only in a relative sense), so you can certainly use the deviance in place of the log-likelihood.
Hope this helps.
I'm trying to find the AIC as well, and it seems I need the fitted log likelihood function value. I understand deviance is the difference between LL of the fitted model and max possible log-likelihood. But I'm looking for the actual LL as I'm comparing this to another non-GLM model and have access only to the fitted LL in that model. Thus, if there is a way to either recover the max possible LL or directly recoverd the fitted LL that would be very helpful. (also, i'm sure this is wrong, but isn't the LL of the fitted model in fact the "maximum" possible LL? i know its not, but why?)
thanks
> I'm trying to find the AIC as well, and it seems I need the fitted log likelihood function value. I understand deviance is the difference between LL of the fitted model and max possible log-likelihood. But I'm looking for the actual LL as I'm comparing this to another non-GLM model and have access only to the fitted LL in that model. Thus, if there is a way to either recover the max possible LL or directly recoverd the fitted LL that would be very helpful.
You don't say what kind of GLM you're fitting, but generally speaking: Use GLMVAL to compute fitted values for each observation. Then use the appropriate PDF function to compute individual likelihoods, then log and sum. So for a binomial GLM (e.g. logistic regression),
betaHat = glmfit(X,[y,n],'binomial');
pFitted = glmval(betaHat,X,'logistic','size',n);
LL = sum(log(binopdf(y,n,pFitted)));
For a GLM with an estimated dispersion parameter, you'll have to use that in the PDF function call.
(also, i'm sure this is wrong, but isn't the LL of the fitted model in fact the "maximum" possible LL? i know its not, but why?)
The maximum possible likelihood is the likelihood for a model where each fitted value is the observed value -- in effect one parameter for each observation. Not a very useful model.
Hope this helps.
x = predictor values
y = responses
n = number of stimulus presentations
I'm trying to calculate AIC for a binomial logistic regression, but my LL is incorrect. I've verified the correct log likelihood using two other statistics / computation packages. To be clear, I'm not suggesting that matlab is bugged. More reasonably, my own calculations are bugged!
Any advice with that log likelihood calculation?
****************
clear all
x = [ 53 57 58 63 66 67 68 69 70 72 73 75 76 78 79 81 ]';
y = [ 1 1 1 1 0 0 0 0 2 0 0 1 0 0 0 0 ]';
n = [ 1 1 1 1 1 3 1 1 4 1 1 2 2 1 1 1 ]';
[betas dev stats]= glmfit(x,[y n],'binomial', 'link', 'logit');
fitted = glmval(betas,x,'logit');
%% LL should be -10.1576
logLikelihood = sum(log( binopdf( y, n,fitted)))
%% AIC should be 24.3152
AIC = -2*logLikelihood + 2*numel(betas)
- gD
Peter Perkins <Peter....@MathRemoveThisWorks.com> wrote in message <gt9qnu$7qm$1...@fred.mathworks.com>...
Gabriel, can you give any details about what you did to conclude that the
correct log likelihood is -10.1576?
It appears that glmfit is maximizing the likelihood correctly. Here I
demonstrate that the coefficient estimates from glmfit are the same as you
would get by using fminsearch to maximize the negative of your log
likelihood function:
>> betas
betas =
15.0429
-0.2322
>> invlogit = @(xb) 1./(1+exp(-xb));
>> xb = @(b,x) b(1)+b(2)*x;
>> f = @(b) -sum(log(binopdf(y,n,invlogit(xb(b,x)))));
>> fminsearch(f,[15 -.2])
ans =
15.0429 -0.2322
>> f(ans)
ans =
7.6727
-- Tom
I ran the toy-problem through two other packages: SPSS and Mathematica. Both packages provide LL as part of their logistic regression output and both suggest that the LL is -10.1576.
Okay Gabriel, I did some playing around and here's what I found out. You
have data like this:
x = [ 63 66 70 . . .]
y = [ 1 0 2 . . .]
n = [ 1 1 4 . . .]
You could also choose to enter the data in this form:
x = [ 63 66 70 70 70 70. . .]
y = [ 1 0 0 0 1 1 . . .]
n = [ 1 1 1 1 1 1 . . .]
In the first form we report that in a total of 4 trials with x=70, we
observe 2 "heads." In the second form we report four individual trials all
with the value x=70, and the first 2 have tails while the last 2 have heads.
If you use this second form with glmfit, you'll get the same coefficients.
If you do your binopdf thing with this form, you'll get the same likelihood
that you found from the other software.
I don't know if you used the second form with the other packages, or if they
convert to this form, or if they simply omit certain constant values from
the log likelihood. The value -10.1576 represents the likelihood of these
binary trials in a specific order, such as [0 0 1 1] for the x=70 data,
while the value you originally got in MATLAB represents the likelihood of
binomial trials with the specific binary sequence unspecified, so for
example the 4 values at x=70 may have come in the order [0 1 1 0]. The
difference in these values is a constant that depends on the data but not on
the parameter value.
Any conclusions you draw based on looking at differences in log likelihoods
are going to be the same either way. Just make sure to use the same form
(probably the same software) in fitting both models. Similarly, if you sort
a collection of models by their AIC values, you'll come to the same
conclusions either way.
You can pick either one for your research. I prefer the MATLAB one because
it's based on the real binomial likelihood, but again that's not going to
matter for most purposes.
-- Tom
- gD
"Tom Lane" <tl...@mathworks.com> wrote in message <h0osu0$sfg$1...@fred.mathworks.com>...
Suggestions?
My thanks in advance.
- gD
"Gabriel Diaz" <eyeba...@gmail.com> wrote in message <h0oto2$o1p$1...@fred.mathworks.com>...
Thx,
- gD
"Gabriel Diaz" <eyeba...@gmail.com> wrote in message <h1gats$1vg$1...@fred.mathworks.com>...
"thomas " <tjerde_remo...@stanford.edu> wrote in message <gdl9i2$f15$1...@fred.mathworks.com>...
Sorry to post in reply to an old thread.
But can anybody help me find the:
unrestricted log likelihood
and
restricted log likelihood
of a logistic function?
Using the example from this thread would be fine for me.
I need these to perform the likelihood ratio test (lratiotest.m) on a logistic fit.
Hope you can help.
Regards,
Nick
Here's the example again for easy reference:
x = [ 53 57 58 63 66 67 68 69 70 72 73 75 76 78 79 81 ]';
y = [ 1 1 1 1 0 0 0 0 2 0 0 1 0 0 0 0 ]';
n = [ 1 1 1 1 1 3 1 1 4 1 1 2 2 1 1 1 ]';
[b,dev,stats] = glmfit(x,[y n],'binomial', 'link', 'logit');
[yfit,ylo,yhi] = glmval(b, x,'logit',stats);
logLikelihood = sum(log( binopdf( y, n,yfit)))
% Akaike information criterion
betas = b;
AIC = -2*logLikelihood + 2*numel(betas)
% alternative method of calculating loglikelihood
invlogit = @(xb) 1./(1+exp(-xb));
xb = @(b,x) b(1)+b(2)*x;
f = @(b) -sum(log(binopdf(y,n,invlogit(xb(b,x)))));
min_ans = fminsearch(f,b);
logLikelihood2 = f(min_ans)
"Tom Lane" <tl...@mathworks.com> wrote in message <h0mjk1$ec3$1...@fred.mathworks.com>...
Sorry to post in reply to an old thread.
But can anybody help me find the:
unrestricted log likelihood
and
restricted log likelihood
of a logistic function?
Using the example from this thread would be fine for me.
I need these to perform the likelihood ratio test (lratiotest.m) on a logistic fit.
Hope you can help.
Regards,
Nick
Here's the example again for easy reference:
x = [ 53 57 58 63 66 67 68 69 70 72 73 75 76 78 79 81 ]';
y = [ 1 1 1 1 0 0 0 0 2 0 0 1 0 0 0 0 ]';
n = [ 1 1 1 1 1 3 1 1 4 1 1 2 2 1 1 1 ]';
[b,dev,stats] = glmfit(x,[y n],'binomial', 'link', 'logit');
[yfit,ylo,yhi] = glmval(b, x,'logit',stats);
logLikelihood = sum(log( binopdf( y, n,yfit)))
% Akaike information criterion
betas = b;
AIC = -2*logLikelihood + 2*numel(betas)
% alternative method of calculating loglikelihood
invlogit = @(xb) 1./(1+exp(-xb));
xb = @(b,x) b(1)+b(2)*x;
f = @(b) -sum(log(binopdf(y,n,invlogit(xb(b,x)))));
min_ans = fminsearch(f,b);
logLikelihood2 = f(min_ans)
"Tom Lane" <tl...@mathworks.com> wrote in message <h0mjk1$ec3$1...@fred.mathworks.com>...
Taking into account that, for the specification of the gamma used in matlab:
variance = a * b^2
mean = a * b
dispersion = 1/a
I calculate
a = 1 / dispersion
b = mean values (found using glmval) / a
And use this for calculation of the likelihood. Does this seem correct to you?
[b,dev,stats] = glmfit(X,y,'gamma','link','log');
dispersion = stats.s;
mu = glmval(b,X,'log');
ahat = 1/dispersion;
bhat = mu / ahat;
loglike = nansum(log(gampdf(y,ahat,bhat)));
Peter Perkins <Peter....@MathRemoveThisWorks.com> wrote in message <gt9qnu$7qm$1...@fred.mathworks.com>...