repeated measures design


Ted Swiecki

Apr 18, 2012, 6:28:52 PM
to Eureqa Group
I'm trying to figure out if there is a way to structure the data and
model to account for a repeated measures model. The data structure is
this: I have annual data (11 years) collected on a set of 324
geographically fixed points. I'm looking to model changes in various
outcomes as they fluctuate over the 11 year span. Some of the
possible predictors (e.g., weather variables) change from year to
year, others are fixed. This is essentially a repeated measures
design, in which the same experimental units are observed over time.
Essentially, I want to look at changes over time in these 324 units
and how they are affected by the predictors. I haven't quite figured
out how or if it can be done with Eureqa. Any ideas would be
appreciated.

L

Apr 19, 2012, 8:59:53 AM
to eureqa...@googlegroups.com
Depends on what you want to do.
 
If you separate each of the 324 sets of 11 rows with a blank row, it will treat each of them as a separate grouping. 
 
If they are also sorted in chronological order within each set of 11, the delay and moving-average building blocks will operate as expected (if selected).
 
Does that help any, or do I misunderstand your goal?
 
 
 

--
Eureqa Formulize ( http://www.nutonian.com )
-------------------------------------------------
Unsubscribe: eureqa-group+unsub...@googlegroups.com
View Group: http://groups.google.com/group/eureqa-group


Ted Swiecki

Apr 22, 2012, 1:02:57 AM
to Eureqa Group
I had wondered about that tactic. Will give it a go and see if it
works out. A bit of a pain to format in all the blank rows, though.
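
For anyone facing the same formatting chore, here is a minimal Python sketch that sorts the rows by plot and year and writes an empty line between plots. The column names (plot_id, year, outcome) and the sample values are made up for illustration, and whether Eureqa treats an empty line in a CSV file the same way as a blank spreadsheet row is an assumption worth checking on a small file first.

```python
import csv
import io
from itertools import groupby


def add_group_breaks(rows, plot_col="plot_id", year_col="year"):
    """Sort rows by plot, then year, and put None between plots as a break marker."""
    # Plot IDs are compared as strings here; years are compared as integers.
    ordered = sorted(rows, key=lambda r: (r[plot_col], int(r[year_col])))
    out = []
    for i, (_, group) in enumerate(groupby(ordered, key=lambda r: r[plot_col])):
        if i > 0:
            out.append(None)  # break marker between consecutive plots
        out.extend(group)
    return out


def write_with_breaks(rows, fieldnames, handle):
    """Write rows as CSV, emitting an empty line wherever a None marker appears."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row is None:
            handle.write("\n")
        else:
            writer.writerow(row)


# Hypothetical sample: two plots, two years each, deliberately out of order.
sample = [
    {"plot_id": "B", "year": "2002", "outcome": "4.1"},
    {"plot_id": "A", "year": "2002", "outcome": "3.0"},
    {"plot_id": "A", "year": "2001", "outcome": "2.5"},
    {"plot_id": "B", "year": "2001", "outcome": "3.9"},
]
buffer = io.StringIO()
write_with_breaks(add_group_breaks(sample), ["plot_id", "year", "outcome"], buffer)
print(buffer.getvalue())
```

With real data, the same two functions could be pointed at a file opened with newline="" instead of the in-memory buffer.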

Michael Schmidt

Apr 25, 2012, 10:42:16 AM
to eureqa...@googlegroups.com
Yeah, this depends. Do you want to find a single model that captures all locations simultaneously? Or a possibly different model for each? Or relationships between locations? Perhaps you're looking for a dynamical relationship (e.g., the difference or derivative between years) for each? Maybe the average across locations would work here?



Ted Swiecki

Apr 26, 2012, 1:06:21 AM
to eureqa...@googlegroups.com
What I'm interested in is determining whether there are relationships between the outcomes and various predictors. Some of these predictors (e.g., weather-related ones) fluctuate from year to year across the entire data set (which is essentially a network of plots spread out across a single location). Other predictors (e.g., soil type) are fixed across years but vary between the plots. Still others differ between the plots in a given year and change from year to year for individual plots (e.g., fire history - only some plots have burned in specific years). There are spatial relationships between the plots involved here as well, but I'm not even attempting to deal with that in this model. I have a number of outcome variables, which for simplicity are considered one at a time. They are essentially independent outcomes, though many of them are likely to be correlated with each other.

I have run a few models and they seem to be working out, but they tend to be complex and difficult to interpret. [This is a topic for a separate thread, but a better way to look at the effects of individual variables would be a real boon. Sliders that let you vary an input variable and see the effect on the curve are very handy, and are used by various analytical packages, e.g., SAS JMP.]

The initial models I ran were based on year to year changes in outcome variables (outcome_year n - outcome_year n-1).  For those, I treated each year as a separate set of 324 points.  I wasn't sure whether that was the best way to do that analysis, but I did end up with some reasonable models. 
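
As an illustration of the differencing step described above, here is a minimal Python sketch that computes outcome(year n) - outcome(year n-1) separately for each plot. The record layout (plot_id, year, outcome) and the numbers are hypothetical. It skips any year whose predecessor is missing rather than differencing across a gap, which is one possible choice and not necessarily how the original analysis handled it.

```python
from collections import defaultdict


def yearly_differences(records):
    """Return (plot_id, year, delta) for each plot-year that has a previous year."""
    by_plot = defaultdict(dict)
    for plot_id, year, outcome in records:
        by_plot[plot_id][year] = outcome

    deltas = []
    for plot_id in sorted(by_plot):
        series = by_plot[plot_id]
        for year in sorted(series):
            if year - 1 in series:  # only difference consecutive years
                deltas.append((plot_id, year, series[year] - series[year - 1]))
    return deltas


# Hypothetical data: plot B has no 2002 observation, so it yields no differences.
records = [
    ("A", 2001, 10.0),
    ("A", 2002, 12.5),
    ("A", 2003, 11.0),
    ("B", 2001, 4.0),
    ("B", 2003, 6.0),
]
print(yearly_differences(records))
```

Keeping the plot ID alongside each difference makes it possible to regroup the differences by plot later instead of treating each year as an unrelated set of points.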

L

Apr 26, 2012, 10:02:18 AM
to eureqa...@googlegroups.com
> I have run a few models and they seem to be working out, but they tend to be complex and difficult to interpret.
 
The models are often complicated and hard to interpret because they mask the combination of a very good, fairly simple fit plus a big, messy noise term driven by the tiny vagaries of the sample. (Sadly, it's not usually as simple as "plus.")
 
The thing to do is look not for the best fit every time, but for the simplest "very good and good enough" fit, often a few rows down the sort from the "best" result.  That is likely to be far more comprehensible. 
 
Often I find this, in my mathematically naive way, a little to the right of the "elbow" of the Pareto front. That suggests to me that the more mathematically sophisticated (most of you, he says with simple factuality) might find a way to compute that "happy medium" between complexity and fit.
 
It might not mean very much until a good level of stability and maturity are reached.  But again, I leave this to the more mathematically sophistomacated.
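
One common heuristic for locating such an elbow, sketched here in Python as an illustration only (it is not something Eureqa is known to do, and the complexity/error values are invented): rescale both axes to the 0-1 range, draw a straight line from the simplest model to the most complex one, and pick the point that lies farthest below that line.

```python
def knee_index(front):
    """Index of the elbow in a list of (complexity, error) pairs.

    Assumes the list is sorted by increasing complexity and that error
    generally falls as complexity rises.
    """
    xs = [c for c, _ in front]
    ys = [e for _, e in front]
    x_span = (max(xs) - min(xs)) or 1.0  # avoid dividing by zero
    y_span = (max(ys) - min(ys)) or 1.0

    best_index, best_gap = 0, float("-inf")
    for i, (x, y) in enumerate(zip(xs, ys)):
        nx = (x - min(xs)) / x_span
        ny = (y - min(ys)) / y_span
        # After rescaling, the endpoints sit on the line nx + ny = 1;
        # the gap below that line is proportional to the distance from it.
        gap = 1.0 - nx - ny
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index


# Hypothetical front: (complexity, error) for six candidate models.
front = [(1, 0.90), (3, 0.50), (5, 0.20), (9, 0.15), (15, 0.12), (25, 0.10)]
elbow = knee_index(front)
print(front[elbow])
```

Stepping one entry past the returned index would correspond to looking "a little to the right" of the elbow.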
 
> [This is a topic for a separate thread,
 
Oops.
 

Ted Swiecki

Apr 27, 2012, 12:38:19 AM
to eureqa...@googlegroups.com, L
Well, actually, I'm mainly using these models to identify which variables seem to have effects on the outcome and the direction of each effect. For instance, when a couple of variables show up in almost all of the better models, they are probably important. The difficulty in interpretation arises in this data set because there are a lot of threshold-type relationships - var1 has some effect only at some levels of var2, etc. That's where faster, easier ways to tease out these interactions would be useful.
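
The "shows up in almost all of the better models" check can be roughed out by counting variable names across the model expressions copied out as text. The Python sketch below is only that: the expression strings, the variable names, and the list of function names to ignore are all hypothetical and would need adjusting to whatever the exported formulas actually look like.

```python
import re
from collections import Counter

# Names to ignore because they are functions, not variables (hypothetical list).
FUNCTION_NAMES = {"sin", "cos", "exp", "log", "sqrt", "abs", "min", "max",
                  "if", "less", "greater", "logistic", "step"}

# An identifier that is not glued to the end of a number such as 2.5e3.
IDENTIFIER = re.compile(r"(?<![\w.])[A-Za-z_]\w*")


def variable_frequency(expressions):
    """Count in how many expressions each variable appears on the right-hand side."""
    counts = Counter()
    for expr in expressions:
        rhs = expr.split("=", 1)[-1]  # drop the outcome on the left, if present
        names = set(IDENTIFIER.findall(rhs)) - FUNCTION_NAMES
        counts.update(names)  # each variable counted once per model
    return counts


# Hypothetical expressions standing in for the better models from a run.
models = [
    "outcome = 0.4*rain + 1.2*burned",
    "outcome = 0.3*rain + if(less(soil, 2), burned, 0)",
    "outcome = exp(0.1*rain) - 0.5*temp",
]
counts = variable_frequency(models)
for name, n in counts.most_common():
    print(name, n)
```

This only says how often a variable appears, not the direction of its effect, so it is a first screen rather than a substitute for reading the models.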

L

Apr 27, 2012, 10:11:31 AM
to eureqa...@googlegroups.com
 
As a very crude approach, try including subsets of the variables - not all of them at once. Chances are the "influential" ones will rapidly produce (pretty) good fits, while the "irrelevant" ones will search for a long time to little effect.
 
Ideally if there are few enough of them that it won't try your patience, try them one at a time, x = f(y).
 
To be sensitive to thresholds, include the logical building blocks and/or the squashing building blocks.
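
To make the threshold idea concrete, here is a small Python sketch of how a logical (step) term or a squashing (logistic) term can express "var1 matters only when var2 is above some level". The function names, the threshold, and the steepness value are illustrative assumptions and are not Eureqa's own building-block names or syntax.

```python
import math


def step_gate(var2, threshold):
    """Hard switch: 1.0 when var2 exceeds the threshold, otherwise 0.0."""
    return 1.0 if var2 > threshold else 0.0


def logistic_gate(var2, threshold, steepness=10.0):
    """Soft switch: rises smoothly from 0 to 1 around the threshold."""
    return 1.0 / (1.0 + math.exp(-steepness * (var2 - threshold)))


def gated_effect(var1, var2, threshold, gate=step_gate):
    """Effect of var1 on the outcome, switched on or off by the level of var2."""
    return var1 * gate(var2, threshold)


# Hypothetical values: var1 contributes only when var2 is above 0.5.
print(gated_effect(3.0, 0.2, 0.5))                      # var2 below threshold
print(gated_effect(3.0, 0.8, 0.5))                      # var2 above threshold
print(gated_effect(3.0, 0.8, 0.5, gate=logistic_gate))  # smooth version
```

A search that has access to both kinds of term can represent an abrupt cutoff or a gradual transition.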

Ted Swiecki

Apr 27, 2012, 5:09:31 PM
to eureqa...@googlegroups.com, L
When there is a lot of interaction between predictors, single-variable models are often not very useful. The Formulize software seems to be well suited to plowing through a wide array of variables at one crack and picking out those that are meaningful. There wouldn't be much point in using it if I had to grunt out dozens of one- and two-variable models on my own. And yes, I have made extensive use of the logical building blocks in particular.