Brad
a / b
is shorthand for a + a:b
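For anyone who wants to check that expansion, here is a minimal sketch using patsy's formula parser. The names y, a and b are placeholders; nothing is evaluated against data, the formula is only parsed into terms.

```python
from patsy import ModelDesc

# Parse both spellings into patsy's term lists (no data needed for parsing).
nested = ModelDesc.from_formula("y ~ a / b")
expanded = ModelDesc.from_formula("y ~ a + a:b")

# Compare the right-hand-side terms; order is ignored by using sets.
same_terms = set(nested.rhs_termlist) == set(expanded.rhs_termlist)

print(nested.describe())
print(same_terms)
```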
"In R, if the subject id's are unique then you can just use subject on its own. If the subject ids are recycled in each region then nesting treats them as if they were unique. Otherwise R would think you had a crossed model.
Hi Nathaniel,
This was an extremely informative post. I was told that I need to use
the / syntax because I have subjects from multiple regions; however,
each subject has a unique ID regardless of region.
Given this, I can just do:
smf.MixedLM.from_formula('value ~ timepoint*dose', data, groups='region')
I updated to the master version of statsmodels and compared the
results of the MixedLM with R's lme4. The results are very similar.
That said, it's good to know about passing in details regarding the
structure of random effects via the vc_formula argument. I had not been
aware of this and did not fully understand what it was for when I came
across it in the API docs.
The number you are expecting ;-). (number of regions + number of subjects - 1)
I believe that in lme4's custom formula syntax, they don't actually compute a design matrix for the stuff on the right-hand side of their | operator. I think the logic is that they first expand formula operators like /, and then treat the result as a list of categorical variables separated by + signs (with : taking on its regular meaning in R, which is the row-by-row concatenation thing I mentioned, rather than its related but slightly different formula meaning). I haven't checked the code, though. They definitely don't do any redundancy elimination there (it makes no sense in that context), and I'm pretty sure they treat anything they find as categorical regardless of whether it's marked with C().
-n
Brad
{'cell:target': '0+C(cell):C(target)', 'cell': '0+C(cell)'}