How to get the p-value in GLM?


xiong...@gmail.com

Oct 14, 2014, 5:21:17 AM
to julia...@googlegroups.com
In the GLM example below, I want to get the Pr(>|t|) value 3.4e-7. How can I get it?
Also, how is this p-value calculated: by an F test or by a chi-squared test? I can choose the test type in R, but I cannot choose it in Julia.

```julia
julia> using GLM, RDatasets

julia> form = dataset("datasets","Formaldehyde")
6x2 DataFrame
|-------|------|--------|
| Row # | Carb | OptDen |
| 1     | 0.1  | 0.086  |
| 2     | 0.3  | 0.269  |
| 3     | 0.5  | 0.446  |
| 4     | 0.6  | 0.538  |
| 5     | 0.7  | 0.626  |
| 6     | 0.9  | 0.782  |

julia> lm1 = fit(LinearModel, OptDen ~ Carb, form)
Formula: OptDen ~ Carb

Coefficients:
               Estimate  Std.Error  t value Pr(>|t|)
(Intercept)  0.00508571 0.00783368 0.649211   0.5516
Carb           0.876286  0.0135345  64.7444   3.4e-7
```

David Gonzales

Oct 14, 2014, 7:57:57 AM
to julia...@googlegroups.com
The nice thing about Julia is that most of the language (and its packages) is written in Julia, so getting answers often just means reading more Julia code. In GLM/src/lm.jl there is a function coeftable(...) that generates the table above. Taking the calculation from there, you get:

[ccdf(FDist(1,df_residual(lm1.model)),abs2(fval)) for fval in coef(lm1)./stderr(lm1)]

which gives the Pr(>|t|) column:
 0.551595  
 3.40919e-7

As for how it is calculated, the formula shows that it uses the F distribution.
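This thread dates from 2014, and some of the names used above have since changed. A minimal sketch for a recent GLM/StatsBase release, assuming the current names `stderror` (in current Julia, `stderr` is the standard-error output stream), `dof_residual` (formerly `df_residual`) and the `@formula` macro. The data are typed in from the table above so RDatasets is not needed, and the p-values are computed directly from Student's t rather than through `FDist`:

```julia
using DataFrames, GLM, Distributions

# Formaldehyde data typed in from the printed table (avoids loading RDatasets).
form = DataFrame(Carb   = [0.1, 0.3, 0.5, 0.6, 0.7, 0.9],
                 OptDen = [0.086, 0.269, 0.446, 0.538, 0.626, 0.782])

lm1 = lm(@formula(OptDen ~ Carb), form)

# t statistics: each estimate divided by its standard error.
tvals = coef(lm1) ./ stderror(lm1)

# Two-sided p-values from Student's t on the residual degrees of freedom (n - p = 6 - 2 here).
tdist = TDist(dof_residual(lm1))
pvals = [2 * ccdf(tdist, abs(t)) for t in tvals]

# The coefficient table stores the same numbers; column 4 is assumed to be Pr(>|t|).
ct = coeftable(lm1)
pvals_table = ct.cols[4]

println(pvals)
```

Reading `coeftable(lm1).cols[4]` relies on the table layout of current `LinearModel` output (estimate, standard error, t, Pr(>|t|), ...); computing the p-values yourself from `TDist` does not depend on that layout.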

Douglas Bates

Oct 14, 2014, 12:12:28 PM
to julia...@googlegroups.com
Replying to David Gonzales (Tuesday, October 14, 2014 6:57:57 AM UTC-5):

That piece of code may be a little too terse. The column name "Pr(>|t|)" is intended to be read as "the probability of exceeding the absolute value of the observed t-statistic", i.e. the probability that a t-distributed random variable is larger in absolute value than the observed t. This is the p-value for a test of the coefficient in question being zero versus not equal to zero. Implicit in the test is that, under the null hypothesis, the ratio of the estimate to its standard error has a Student's t distribution with n-p degrees of freedom, where n is the total number of observations and p is the number of coefficients.
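Written out, with the notation below introduced only for illustration ($\hat\beta_j$ is the estimate in row $j$ and $\widehat{\operatorname{se}}(\hat\beta_j)$ its standard error), the "t value" and "Pr(>|t|)" columns are:

```latex
t_j = \frac{\hat\beta_j}{\widehat{\operatorname{se}}(\hat\beta_j)},
\qquad
\Pr(>|t|) = \Pr\bigl(|T| > |t_j|\bigr) = 2\,\Pr\bigl(T > |t_j|\bigr),
\qquad T \sim t_{n-p}
```

For the Carb row above, n = 6 and p = 2, so T has 4 degrees of freedom and $t_j = 0.876286 / 0.0135345 \approx 64.74$.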

It happens that the square of a random variable with a Student's t distribution on n-p degrees of freedom has an F distribution with 1 and n-p degrees of freedom, so P(|T| > |t|) = P(T^2 > t^2). It is easier to evaluate the probability of that F distribution exceeding the square of the t-statistic, which is what is done here.
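A small numerical check of that equivalence, assuming the Distributions.jl package (which provides `TDist`, `FDist` and `ccdf`). It uses the residual degrees of freedom of the Formaldehyde fit (6 observations, 2 coefficients) and the two t values printed in the table:

```julia
using Distributions

# Residual degrees of freedom for the Formaldehyde fit: n - p = 6 - 2.
dfr = 6 - 2

# t statistics as printed in the coefficient table.
tvals = [0.649211, 64.7444]

# Two-sided p-value from Student's t: P(|T| > |t|) = 2 * P(T > |t|).
p_t = [2 * ccdf(TDist(dfr), abs(t)) for t in tvals]

# Same probability via F(1, n - p): P(T^2 > t^2), because T^2 ~ F(1, n - p).
p_f = [ccdf(FDist(1, dfr), abs2(t)) for t in tvals]

println(p_t)
println(p_f)
```

Both vectors should agree to floating-point precision and reproduce the Pr(>|t|) column (about 0.5516 and 3.41e-7).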