Time variable (years) in regression analysis


cbechet90

Feb 6, 2017, 7:32:16 PM
to StatForLing with R
Hello everyone,

I would like to fit a GLM with fixed and random effects on a data set in which time is an independent variable. Let's say the data set starts from 1530 and ends in 1730, would you rescale the origin to 0 and change all other data values to x-1530? If yes, would you recommend a package which can deal with time variables in this way? I considered grouping my observations into decades and then converting the decades into a categorical variable. However, in doing so I would certainly miss interesting information about between-years variation.

I look forward to reading your suggestions.

Thank you very much for the help!

Matías Guzmán Naranjo

Feb 6, 2017, 7:45:06 PM
to statforli...@googlegroups.com
Why do you want to rescale to have the origin at 0?

--
You received this message because you are subscribed to the Google Groups "StatForLing with R" group.
To unsubscribe from this group and stop receiving emails from it, send an email to statforling-with-r+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

cbechet90

Feb 6, 2017, 8:17:32 PM
to StatForLing with R
Because I'm dealing with a time variable and it seems to me that keeping the years as such would be misleading for the interpretation of the coefficients. The values for "year", when rescaled, would range between 0 and 200. Or does it matter at all? I've never used time variables as interval scaled variables until now due to lack of representativeness.

Matías Guzmán Naranjo

Feb 6, 2017, 8:21:17 PM
to statforli...@googlegroups.com
It depends. You can interpret the time variable just fine without rescaling, since it'll be fitted as a regular linear predictor. Unless you're worried about specifying some sort of special interaction, or you need to compare coefficients, you shouldn't need to rescale.
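A quick way to check this is to fit the same straight line twice, once on the raw years and once on years minus 1530. The sketch below uses plain Python and ordinary least squares on made-up data (the `response` values and the `ols` helper are hypothetical, not the poster's data or model), so it only illustrates the main-effect case: the slope is unchanged and the intercept simply moves to refer to 1530 instead of year 0.

```python
# Illustration only: simple least squares on made-up data.

def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical observations every ten years between 1530 and 1730.
years = list(range(1530, 1731, 10))
# Made-up response: a downward trend plus a small deterministic wiggle.
response = [0.9 - 0.004 * (yr - 1530) + 0.01 * ((i % 3) - 1)
            for i, yr in enumerate(years)]

raw_intercept, raw_slope = ols(years, response)
shifted_intercept, shifted_slope = ols([yr - 1530 for yr in years], response)

# Same slope in both fits; only the intercept differs.
print(raw_slope, shifted_slope)
print(raw_intercept, shifted_intercept)
```

The shifted intercept equals the raw intercept plus 1530 times the slope, i.e. the fitted value at the first year, which is often the easier number to report.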

cbechet90

Feb 6, 2017, 8:33:58 PM
to StatForLing with R
Actually I expect some interaction between the time variable and the occurrence of a linguistic item (e.g. occurrence of the definite article "the"). For instance, there seems to be an interaction between the passage of time and the loss of the definite article in such expressions as "instead of", "in place of" and "in lieu of", leaving aside problems of multicollinearity and autocorrelation.

Matías Guzmán Naranjo

Feb 6, 2017, 8:38:25 PM
to statforli...@googlegroups.com
> I expect some interaction between the time variable and the occurrence of a linguistic item

Unless there is some mathematical reason why specifying that interaction requires rescaling, you do not need to rescale. It would be easier to say more if you could share a sample of your data set and your model specification.

> leaving aside problems of multicollinearity and autocorrelation.

Rescaling won't help you here. I don't have the reference right now, but rescaling only hides collinearity.
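One way to see what "hides" could mean here: with an interaction between year and a 0/1 predictor, the correlation between the predictor and the product term depends heavily on where the origin of year is placed, even though `(year - c) * d` is just `year * d - c * d` and so spans the same model once `d` itself is included. The sketch below is plain Python on made-up data; `has_article` is a hypothetical binary predictor and the balanced alternating design is an assumption chosen to make the contrast obvious.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

years = list(range(1530, 1731, 10))
has_article = [i % 2 for i in range(len(years))]  # hypothetical 0/1 predictor

mean_year = sum(years) / len(years)
centered = [yr - mean_year for yr in years]

# Product terms as they would enter an interaction model.
raw_product = [yr * d for yr, d in zip(years, has_article)]
centered_product = [c * d for c, d in zip(centered, has_article)]

r_raw = pearson(has_article, raw_product)
r_centered = pearson(has_article, centered_product)
print(round(r_raw, 3), round(r_centered, 3))
```

The correlation drops sharply after centering, but the fitted values of a model containing the intercept, both main effects and the interaction are the same under either coding; only the meaning of the lower-order coefficients changes.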


Stefan Th. Gries

Feb 6, 2017, 8:45:03 PM
to StatForLing with R
Some comments (one with some self-promotion, sorry, but the fit is too
close to ignore):

- I don't think you need rescaling: the significance tests and the
predicted values shouldn't be affected by that much;
- I do think that, if you enter any TIME predictor - numeric, ordinal,
categorical - it needs to also interact with probably all other
predictors;
- if you enter TIME as a numeric predictor, using a straight-line
regression is probably way too simplistic because doing that means
you're hypothesizing a linear unchanged increase over time (which I
cannot believe you would want to subscribe to) - thus, I'd recommend
using a structure for TIME that allows for curvature: a simple way is
using a polynomial (2nd or 3rd degree), a complex one is fitting a
GAMM;
- you may consider first grouping together time points into time
periods and now I have to refer to Gries & Hilpert (2010), where we
have very erratic temporal data and use a method I called VNC to find
structure in the temporal data so that, then, TIME could be entered
into a mixed-effects model as a categorical predictor.
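To make the polynomial suggestion concrete, here is a minimal sketch in plain Python that compares a straight line with a second-degree polynomial in TIME. It is ordinary least squares on made-up data, not a mixed-effects model or a GAMM, and the helpers `solve`, `fit_polynomial` and `rss` are hypothetical names written for this illustration. Years are centered on 1630 and expressed in centuries, an assumption made only to keep the arithmetic well behaved.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_polynomial(t, y, degree):
    """Least-squares coefficients [b0, ..., b_degree] for y ~ sum(b_k * t**k)."""
    powers = range(degree + 1)
    xtx = [[sum(ti ** (j + k) for ti in t) for k in powers] for j in powers]
    xty = [sum((ti ** j) * yi for ti, yi in zip(t, y)) for j in powers]
    return solve(xtx, xty)

def rss(t, y, coefs):
    """Residual sum of squares of the polynomial with the given coefficients."""
    return sum((yi - sum(b * ti ** k for k, b in enumerate(coefs))) ** 2
               for ti, yi in zip(t, y))

# Hypothetical data: years centered on 1630 and scaled to centuries.
t = [(yr - 1630) / 100 for yr in range(1530, 1731, 10)]
# Made-up curved trend, so the straight line is known to be misspecified.
y = [0.5 - 0.3 * ti + 0.2 * ti ** 2 for ti in t]

linear = fit_polynomial(t, y, 1)
quadratic = fit_polynomial(t, y, 2)
print(rss(t, y, linear), rss(t, y, quadratic))
```

On real data the comparison would be done inside whatever model is actually being fitted, and a drop in residual variation alone does not show that the extra curvature is warranted.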

Matías Guzmán Naranjo

Feb 6, 2017, 9:00:02 PM
to statforli...@googlegroups.com
If you want to go down the route of structuring TIME (instead of, say, using splines or GAMs), you could take a look at Rafał L. Górski & Maciej Eder, "Piotrowski’s Law and Four Cases in the History of Polish". They basically use overlapping 10-year (I think) intervals, so level 1 goes from year 0 to 10, level 2 from 5 to 15, etc.
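A rough sketch of that kind of overlapping binning, in plain Python. The function name, the half-open treatment of the interval ends, and the default width of 10 and step of 5 are assumptions taken from the description above, not from the paper itself.

```python
def overlapping_windows(year, start, width=10, step=5):
    """Return the 1-based indices of all windows that contain `year`.

    Window k covers [start + (k - 1) * step, start + (k - 1) * step + width),
    so with the defaults window 1 is start..start+10, window 2 is
    start+5..start+15, and so on (upper bound excluded, by assumption).
    """
    levels = []
    k = 1
    while start + (k - 1) * step <= year:
        lower = start + (k - 1) * step
        if year < lower + width:
            levels.append(k)
        k += 1
    return levels

# Example with a hypothetical corpus starting in 1530.
for yr in (1530, 1535, 1540, 1547):
    print(yr, overlapping_windows(yr, start=1530))
```

Under these assumptions every year from the fifth onward belongs to two levels, so each observation is counted twice; that needs to be kept in mind if the levels are then used as a predictor in a model.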

Christophe Bechet

Feb 6, 2017, 9:09:45 PM
to statforli...@googlegroups.com
Thank you very much for your suggestions. I will try different methods and see if the different models differ from each other. BTW, I already used VNC before running a MCU and it can indeed be helpful in the present case.
