How to get the p-value in GLM?

xiong...@gmail.com

Oct 14, 2014, 5:21:17 AM

to julia...@googlegroups.com

In the GLM example below, I want to get the Pr(>|t|) value 3.4e-7. How can I get it?
Also, how is this p-value calculated: by an F test or by a chi-squared (Chisq) test? In R I can choose the test type, but in Julia I cannot.

julia> using GLM, RDatasets

julia> form = dataset("datasets","Formaldehyde")
6x2 DataFrame
|-------|------|--------|
| Row # | Carb | OptDen |
| 1     | 0.1  | 0.086  |
| 2     | 0.3  | 0.269  |
| 3     | 0.5  | 0.446  |
| 4     | 0.6  | 0.538  |
| 5     | 0.7  | 0.626  |
| 6     | 0.9  | 0.782  |

julia> lm1 = fit(LinearModel, OptDen ~ Carb, form)
Formula: OptDen ~ Carb

Coefficients:
               Estimate  Std.Error  t value Pr(>|t|)
(Intercept)  0.00508571 0.00783368 0.649211   0.5516
Carb           0.876286  0.0135345  64.7444   3.4e-7

David Gonzales

Oct 14, 2014, 7:57:57 AM

to julia...@googlegroups.com

The thing with Julia is that most of the language is written in Julia, so getting answers often just means reading more Julia code. In GLM/src/lm.jl there is a function, coeftable, that generates the table above. Taking the calculation from there, you get:

[ccdf(FDist(1,df_residual(lm1.model)),abs2(fval)) for fval in coef(lm1)./stderr(lm1)]

which gives the Pr(>|t|) column:

0.551595
3.40919e-7

As for how it is calculated, the formula shows that it uses the F distribution.
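The function names above are from the 2014 release of GLM. A minimal sketch of the same two approaches for current packages, assuming GLM.jl 1.x together with DataFrames.jl and Distributions.jl, where formulas are written with `@formula`, `stderr` is now `stderror`, and `df_residual` is now `dof_residual`. The data are typed in from the table above instead of being loaded through RDatasets.

```julia
using DataFrames, GLM, Distributions

# Formaldehyde data, typed in from the table printed in the question
form = DataFrame(Carb   = [0.1, 0.3, 0.5, 0.6, 0.7, 0.9],
                 OptDen = [0.086, 0.269, 0.446, 0.538, 0.626, 0.782])

lm1 = lm(@formula(OptDen ~ Carb), form)

# Option 1: read the Pr(>|t|) column directly from the coefficient table;
# pvalcol records which column of the CoefTable holds the p-values
ct = coeftable(lm1)
pvals_table = ct.cols[ct.pvalcol]

# Option 2: recompute it as in the reply above, with the current function names
tvals = coef(lm1) ./ stderror(lm1)
pvals = [ccdf(FDist(1, dof_residual(lm1)), abs2(t)) for t in tvals]

println(pvals)
```

Both options should give the same two numbers, roughly 0.5516 for the intercept and 3.4e-7 for Carb.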

Douglas Bates

Oct 14, 2014, 12:12:28 PM

to julia...@googlegroups.com

On Tuesday, October 14, 2014 6:57:57 AM UTC-5, David Gonzales wrote:

> [ccdf(FDist(1,df_residual(lm1.model)),abs2(fval)) for fval in coef(lm1)./stderr(lm1)]
>
> And as for how it is calculated - the formula shows it uses the F distribution.

That piece of code may be a little too terse. The column name "Pr(>|t|)" is intended to be read as "the probability of exceeding the absolute value of the observed t-statistic", which is the p-value for a test of the coefficient in question being zero versus not equal to zero. Implicit in the test is that, under that null hypothesis, the ratio of the estimate to its standard error has a Student's t distribution with n-p degrees of freedom, where n is the total number of observations and p is the number of coefficients.
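To make the n-p degrees of freedom concrete, here is a minimal sketch that fits the same regression by ordinary least squares using only the LinearAlgebra standard library, then evaluates the two-sided p-value directly from Student's t distribution (`TDist` from Distributions.jl) rather than through GLM. The variable names are illustrative, not part of any package API.

```julia
using LinearAlgebra, Distributions

carb   = [0.1, 0.3, 0.5, 0.6, 0.7, 0.9]
optden = [0.086, 0.269, 0.446, 0.538, 0.626, 0.782]

X = hcat(ones(length(carb)), carb)   # model matrix: intercept column plus Carb
n, p = size(X)                       # n observations, p coefficients
β = X \ optden                       # least-squares coefficient estimates
resid = optden - X * β
s² = sum(abs2, resid) / (n - p)      # residual variance estimate on n - p d.f.
se = sqrt.(s² .* diag(inv(X' * X)))  # standard errors of the coefficients
t = β ./ se                          # t-statistics, estimate / standard error

# Pr(>|t|): probability that a t(n - p) variable exceeds |t| in absolute value
pt = [2 * ccdf(TDist(n - p), abs(ti)) for ti in t]
println(pt)
```

Here n - p = 6 - 2 = 4, and the result should match the 0.5516 and 3.4e-7 in the GLM table, even though it never touches the F distribution.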

It happens that the square of a random variable with a Student's t distribution on n-p degrees of freedom has an F distribution with 1 and n-p degrees of freedom, and it is easier to evaluate the probability that this F distribution exceeds the square of the t-statistic, which is what is done here.
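Written out, with T following a t distribution on n-p degrees of freedom and t the observed statistic, the step the code relies on is:

```latex
\Pr\bigl(|T| > |t|\bigr) = \Pr\bigl(T^2 > t^2\bigr) = \Pr\bigl(F_{1,\,n-p} > t^2\bigr),
\qquad \text{since } T \sim t_{n-p} \implies T^2 \sim F_{1,\,n-p}.
```

So the reported value is the p-value of a two-sided t test; the F distribution is only a convenient way to evaluate it. For the Carb row, n - p = 6 - 2 = 4 and t = 64.7444, so the p-value is the upper tail of F on 1 and 4 degrees of freedom beyond 64.7444^2.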
