Question aboit SAHM and glm - R Code

71 views
Skip to first unread message

r...@ualberta.ca

unread,
Jun 13, 2016, 6:54:28 PM6/13/16
to VisTrails SAHM
Hello,
 I'm pretty new to using Vistrails, but am really liking the interface and the ability to keep track of the modelling process! Thanks for creating this product :-)

 I have a quick question about the R code used in the glm portion of SAHM though. I ran a very quick exploratory analysis using the glm with the "Squared terms" option checked and the "SelectBestPredSubset" = AIC. However,when I looked at the glm_output.txt file of the final model I'm a bit confused as to what's going on (this could also be due to my limited knowledge of R and certain commands). The output lists the following for the best model:

Results:
     number covariates in final model   : 4

Call:
glm(formula = response ~ I(c^2) + I(cl^2) + I(built600^2) + cti,
    family = model.family, data = dat, weights = weight, na.action = "na.exclude")

Coefficients:
                Estimate Std. Error z value Pr(>|z|)   
(Intercept)   -4.626e+00  1.053e+00  -4.392 1.12e-05 ***
I(c^2)        -7.938e-04  8.491e-04  -0.935   0.3498   
I(cl^2)        6.983e-05  3.580e-05   1.950   0.0511 . 
I(built600^2)  6.771e+00  3.193e+00   2.121   0.0339 * 
cti            1.645e-01  8.798e-02   1.870   0.0615 . 

 It was my understanding that for any quadratic terms the "linear" term needs to also be included in the model. So in this case if I'm interested in the quadratic effects of c and cl and built600 shouldn't there be 3 more parameters in the model, ie.:
glm(formula = response ~ c+I(c^2) + cl+I(cl^2) + built600+I(built600^2) + cti,
    family = model.family, data = dat, weights = weight, na.action = "na.exclude")

and then in the coefficient table, shouldn't there be 3 more parameters: c, cl and built600?

Apologies if there is a simple answer to this and I'm totally missing something.

Thanks again,

Ryan

marian.k...@gmail.com

unread,
Jun 15, 2016, 2:34:20 PM6/15/16
to VisTrails SAHM
I think the problem here is that it's quite difficult to allow the user complete control over the model structure from a gui interface.  Unfortunately the stepwise model selection in R that I use is ignorant of which first order terms need to be included along with the second order terms and I only identified the scope: intercept only as the lower limit to all terms (first and second order) as the upper limit.  The algorithm's decision rules are based only on the AIC or BIC depending on what was selected. The way I've seen second order terms included in stepwise model selection procedure is to specify the lower or starting model as that including all the first order terms.  I suppose I could implement a two tiered stepwise procedure at the first select the first order terms and at the second use the results from the first stepwise procedure and test interactions from the first order terms included.  Given that I don't believe that there is a precedent for this I'd be a little reluctant to implement it but I don't know that I have another solution.   One option if you really like the evaluation metrics in SAHM or the provenance tracking would be to take the .mds file, fit the model in R and once you've determined the best set of predictors use these in SAHM specifying the  "SelectBestPredSubset" option which will include the exact set of input predictors from the correlation filtering module.  
Reply all
Reply to author
Forward
0 new messages