Calling glm with a formula when variable names are only known at run-time

colint...@gmail.com

May 29, 2016, 2:07:42 AM
to julia-stats
Hi all,

I'm trying to call glm to perform OLS on some data in a DataFrame. However, the variable names (i.e. column names) in the DataFrame are only known at run-time, so I'm not sure how to construct the formula input to the glm function. For example, at run-time my function determines that it wants to regress column 1 of a DataFrame on columns 3 and 4, without knowing the names of those columns in advance. Obviously my function could construct an ASCIIString representation of the appropriate formula from the column headers, e.g. "column1Name ~ column3Name + column4Name", but the glm function will not accept an ASCIIString as the formula argument.

I'm sure there is a simple way around this, but I couldn't work it out from the GLM docs. I can't even work out what the type of the formula input to glm is. If someone could point me to the relevant constructor for that type, I imagine it shouldn't be too hard to come up with a routine that converts an ASCIIString representation of the formula to the appropriate type.

Any help would be greatly appreciated.

Cheers,

Colin

Douglas Bates

May 29, 2016, 11:12:24 AM
to julia-stats
Constructing a formula on the fly requires you to learn a bit about the structure of the formula itself.

julia> ff = foo ~ bar + baz
Formula: foo ~ bar + baz 

julia> fieldnames(ff)
2-element Array{Symbol,1}:
 :lhs
 :rhs

julia> typeof(ff.lhs)
Symbol

julia> ff.lhs
:foo

Suppose instead that you want to have "fab" on the left-hand side. You can simply reassign the .lhs field to the corresponding symbol:

julia> ff.lhs = symbol("fab")
:fab

julia> ff
Formula: fab ~ bar + baz

The right-hand side is a bit more complicated in that it is an expression.

julia> ff.rhs
:(bar + baz)

julia> typeof(ff.rhs)
Expr

julia> fieldnames(ff.rhs)
3-element Array{Symbol,1}:
 :head
 :args
 :typ

julia> ff.rhs.args
3-element Array{Any,1}:
 :+
 :bar
 :baz

Now it happens that the + function can take an arbitrary number of arguments, so the args array can hold more than two terms. I can change the formula to "fab ~ 1 + baz + box" with

julia> ff.rhs.args = Any[:+, 1, :baz, :box]
4-element Array{Any,1}:
  :+
 1
  :baz
  :box

julia> ff
Formula: fab ~ 1 + baz + box 

Does this help?
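
Applied to the original question, the same idea can be sketched with base Julia alone. The column names and the indices 1, 3 and 4 below are hypothetical stand-ins for whatever names(df) returns at run-time. The final step, assigning into ff.lhs and ff.rhs.args, is left as a comment because it depends on the Formula type from the session above and has not been checked here.

```julia
# Hypothetical column names; in practice these would come from names(df).
colnames = [:column1Name, :column2Name, :column3Name, :column4Name]

# Response: column 1. Predictors: columns 3 and 4.
lhs = colnames[1]
predictors = colnames[[3, 4]]

# Build the right-hand side as a call to + with one argument per predictor,
# mirroring the structure of ff.rhs shown above.
rhs = Expr(:call, :+, predictors...)

println(lhs)   # column1Name
println(rhs)   # column3Name + column4Name

# With a Formula object ff as in the session above, the pieces could then be
# assigned in place (an assumption that follows the pattern in this post):
#   ff.lhs = lhs
#   ff.rhs.args = rhs.args
```

With only one predictor this sketch produces the unary call +(x), so that case would probably need separate handling.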

colint...@gmail.com

May 29, 2016, 8:51:19 PM
to julia-stats
That is extremely helpful, thank you! And I can see from this that I don't need to bother messing around with an ASCIIString formula - I can just create the Formula type directly myself.

Out of curiosity, is anyone working on a general routine to convert an ASCIIString formula to a Formula type? Your example makes it clear that it is fairly trivial for a linear additive formula, but I can see how it could get complicated very quickly for more general formula types...

Cheers, and thanks again.

Colin
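
On the string question, one possible route is to let Julia's own parser do the work and then pull the two sides out of the resulting expression. This is only a sketch under assumptions: it assumes Julia 1.x, where Meta.parse is available and "a ~ b" parses as an ordinary call to ~ (older versions, including the one used in this thread, handled ~ differently). The helper name split_formula is hypothetical.

```julia
# Hypothetical helper: split a formula string into its left- and right-hand sides.
# Assumes Julia 1.x, where "a ~ b" parses to Expr(:call, :~, a, b).
function split_formula(s::AbstractString)
    ex = Meta.parse(s)
    if !(ex isa Expr && ex.head == :call && length(ex.args) == 3 && ex.args[1] == :~)
        throw(ArgumentError("expected a formula of the form \"lhs ~ rhs\", got: $s"))
    end
    # args[1] is the ~ operator itself; args[2] and args[3] are the two sides.
    return ex.args[2], ex.args[3]
end

lhs, rhs = split_formula("column1Name ~ column3Name + column4Name")
println(lhs)   # column1Name
println(rhs)   # column3Name + column4Name
```

The two pieces have the same shape as the .lhs and .rhs fields shown earlier in the thread. More general right-hand sides parse into nested expressions, and whether the Formula type accepts every such expression is not something this sketch checks.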