# Time variable (years) in regression analysis


### cbechet90

Feb 6, 2017, 7:32:16 PM
to StatForLing with R
Hello everyone,

I would like to fit a GLM with fixed and random effects on a data set in which time is an independent variable. Let's say the data set starts in 1530 and ends in 1730. Would you move the origin to 0, i.e. replace every year x by x - 1530? If yes, would you recommend a package that can deal with time variables in this way? I considered grouping my observations into decades and then treating the decades as a categorical variable. However, in doing so I would certainly lose interesting information about year-to-year variation.

Thank you very much for the help!

### Matías Guzmán Naranjo

Feb 6, 2017, 7:45:06 PM
Why do you want to rescale to have the origin at 0?

--
You received this message because you are subscribed to the Google Groups "StatForLing with R" group.
To unsubscribe from this group and stop receiving emails from it, send an email to statforling-with-r+unsub...@googlegroups.com.

### cbechet90

Feb 6, 2017, 8:17:32 PM
to StatForLing with R
Because I'm dealing with a time variable, and it seems to me that keeping the years as they are would be misleading when interpreting the coefficients. Rescaled, the values for "year" would range from 0 to 200. Or does it not matter at all? Until now I have never used time as an interval-scaled variable, because of a lack of representativeness.

### Matías Guzmán Naranjo

Feb 6, 2017, 8:21:17 PM
It depends. You can interpret the time variable just fine without rescaling, since it will be fitted as a regular linear predictor whose coefficient is the estimated change per year. Unless you're worried about specifying some special kind of interaction, or you need to compare coefficients, you shouldn't need to rescale.
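A minimal Python sketch of what shifting the origin does, on made-up data (the simulated trend, the noise level and the helper `fit` are hypothetical). It uses ordinary least squares rather than a mixed-effects GLM for brevity, but the same algebra should carry over to the fixed-effects part of any model with an intercept: subtracting 1530 from every year moves the point at which the intercept is evaluated and leaves the slope and the fitted values unchanged.

```python
import numpy as np

# Hypothetical data: one observation per year, 1530-1730, with a small upward trend.
rng = np.random.default_rng(1)
year = np.arange(1530, 1731).astype(float)
y = 2.0 + 0.01 * (year - 1530) + rng.normal(0, 0.5, size=year.size)


def fit(x, y):
    """OLS with an intercept; returns ([intercept, slope], fitted values)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef


coef_raw, fit_raw = fit(year, y)             # origin at AD 0
coef_shift, fit_shift = fit(year - 1530, y)  # origin at 1530, values 0..200

print("raw:     intercept=%.3f slope=%.5f" % (coef_raw[0], coef_raw[1]))
print("shifted: intercept=%.3f slope=%.5f" % (coef_shift[0], coef_shift[1]))
# Same slope and fitted values; only the year at which the intercept applies differs.
print(np.isclose(coef_raw[1], coef_shift[1]), np.allclose(fit_raw, fit_shift))
```

The two intercepts are related by intercept_raw = intercept_shifted - 1530 * slope, which is why the raw intercept (a prediction for the year 0) is hard to interpret but harmless.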

### cbechet90

Feb 6, 2017, 8:33:58 PM
to StatForLing with R
Actually, I expect some interaction between the time variable and the occurrence of a linguistic item (e.g. the occurrence of the definite article "the"). For instance, there seems to be an interaction between the passage of time and the loss of the definite article in expressions such as "instead of", "in place of" and "in lieu of", leaving aside problems of multicollinearity and autocorrelation.

### Matías Guzmán Naranjo

Feb 6, 2017, 8:38:25 PM
> I expect some interaction between the time variable and the occurrence of a linguistic item

Unless there is some mathematical reason why specifying that interaction requires rescaling, you do not need to rescale. It would be easier to say if you could share a sample of your data set and your model specification.
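With an interaction, the origin affects one thing: the main effect of the other predictor is its effect at time = 0. The sketch below is hypothetical throughout (simulated data, a made-up binary predictor `group` standing in for, say, two expressions, and OLS instead of a GLMM). With raw years, the `group` coefficient describes the year AD 0, far outside the data; centering on 1630 makes it the group difference in 1630. The interaction coefficient and the fitted values are identical in both parameterizations.

```python
import numpy as np

# Hypothetical data: two expressions (group 0 and 1) observed every year 1530-1730.
rng = np.random.default_rng(2)
year = np.tile(np.arange(1530, 1731), 2).astype(float)
group = np.repeat([0.0, 1.0], 201)
y = (1.0 + 0.5 * group + 0.004 * (year - 1630)
     - 0.006 * group * (year - 1630) + rng.normal(0, 0.3, size=year.size))


def fit_interaction(t):
    """OLS for y ~ t * group; returns ([b0, b_t, b_group, b_t:group], fitted values)."""
    X = np.column_stack([np.ones_like(t), t, group, t * group])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef


coef_raw, fit_raw = fit_interaction(year)           # t = 0 means AD 0
coef_ctr, fit_ctr = fit_interaction(year - 1630.0)  # t = 0 means 1630

# b_group is the group difference at t = 0, so it answers a different question in each fit.
print("group effect at AD 0:", round(float(coef_raw[2]), 3))
print("group effect at 1630:", round(float(coef_ctr[2]), 3))
# The interaction and the fitted values do not depend on the origin.
print(np.isclose(coef_raw[3], coef_ctr[3]), np.allclose(fit_raw, fit_ctr))
```

So rescaling is a matter of making the main effects readable, not of fitting the interaction correctly.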

> leaving aside problems of multicollinearity and autocorrelation.

Rescaling won't help you here. I don't have the reference at hand right now, but rescaling only hides collinearity; it doesn't remove it.
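A small numerical illustration of this point, assuming a hypothetical quadratic (rise-and-fall) trend on simulated data. Moving the origin to 1530 leaves the linear and quadratic time terms strongly correlated; centering on the midpoint removes that correlation. The fitted curve is the same either way, so centering changes how the collinearity shows up in the individual coefficients, not the information the data contain.

```python
import numpy as np

# Hypothetical data: a rise-and-fall (quadratic) trend over 1530-1730.
rng = np.random.default_rng(3)
year = np.arange(1530, 1731).astype(float)
y = 5.0 - 0.0004 * (year - 1630) ** 2 + rng.normal(0, 0.3, size=year.size)

t_shift = year - 1530       # origin moved to 1530: values 0..200
t_ctr = year - year.mean()  # centered on the midpoint: values -100..100


def corr_linear_quadratic(t):
    """Correlation between the linear and the quadratic time term."""
    return np.corrcoef(t, t ** 2)[0, 1]


def quad_fitted(t):
    """Fitted values of the OLS model y ~ t + t^2."""
    X = np.column_stack([np.ones_like(t), t, t ** 2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef


r_raw = corr_linear_quadratic(year)       # very close to 1
r_shift = corr_linear_quadratic(t_shift)  # still high, roughly 0.97
r_ctr = corr_linear_quadratic(t_ctr)      # 0 for a symmetric range of years
print(round(float(r_raw), 5), round(float(r_shift), 3), round(float(r_ctr), 3))

# Both parameterizations describe exactly the same curve.
print(np.allclose(quad_fitted(t_shift), quad_fitted(t_ctr)))
```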

### Stefan Th. Gries

Feb 6, 2017, 8:45:03 PM
to StatForLing with R
Some comments (one with some self-promotion, sorry, but the fit is too
close to ignore):

- I don't think you need rescaling: the significance tests and the
predicted values shouldn't be affected by that much;
- I do think that, if you enter any TIME predictor - numeric, ordinal,
categorical - it needs to also interact with probably all other
predictors;
- if you enter TIME as a numeric predictor, using a straight-line
regression is probably way too simplistic because doing that means
you're hypothesizing a constant rate of change over time (which I
cannot believe you would want to subscribe to) - thus, I'd recommend
using a structure for TIME that allows for curvature: a simple way is
using a polynomial (2nd or 3rd degree), a complex one is fitting a
GAMM;
- you may consider first grouping together time points into time
periods and now I have to refer to Gries & Hilpert (2010k), where we
have very erratic temporal data and use a method I called VNC to find
structure in the temporal data so that, then, TIME could be entered
into a mixed-effects model as a categorical predictor.
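The idea of grouping time points into periods can be approximated by agglomerative clustering in which only temporally adjacent periods may merge. The following is a simplified sketch in that spirit, not Gries & Hilpert's implementation: the distance measure (population standard deviation of the pooled values), the per-decade proportions and the fixed number of periods are all assumptions made for illustration.

```python
import statistics


def adjacent_clustering(periods, values, n_groups):
    """Merge temporally adjacent periods until only n_groups remain.

    Each cluster keeps the list of periods and the list of values it covers.
    At every step, the pair of neighbouring clusters whose pooled values have
    the smallest population standard deviation is merged (an assumed distance
    measure). Because only neighbours can merge, every result is a contiguous
    span of time. Returns (first_period, last_period) for each cluster.
    """
    clusters = [([p], [v]) for p, v in zip(periods, values)]
    while len(clusters) > n_groups:
        costs = [statistics.pstdev(clusters[i][1] + clusters[i + 1][1])
                 for i in range(len(clusters) - 1)]
        i = costs.index(min(costs))
        (p1, v1), (p2, v2) = clusters[i], clusters[i + 1]
        clusters[i:i + 2] = [(p1 + p2, v1 + v2)]
    return [(ps[0], ps[-1]) for ps, _ in clusters]


# Hypothetical per-decade proportions of article-bearing variants.
decades = list(range(1530, 1730, 10))  # 1530, 1540, ..., 1720
freqs = [0.80, 0.82, 0.79, 0.81, 0.80, 0.78, 0.81,
         0.52, 0.50, 0.49, 0.51, 0.50, 0.53, 0.48,
         0.21, 0.19, 0.20, 0.22, 0.18, 0.20]

periods = adjacent_clustering(decades, freqs, n_groups=3)
print(periods)  # [(1530, 1590), (1600, 1660), (1670, 1720)]
```

The resulting period labels could then serve as the levels of a categorical TIME predictor; here the number of periods is simply fixed in advance.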

### Matías Guzmán Naranjo

Feb 6, 2017, 9:00:02 PM
If you want to go down the route of structuring TIME (instead of, say, using splines or GAMs), you could take a look at Rafał L. Górski & Maciej Eder, "Piotrowski’s Law and Four Cases in the History of Polish". They basically use overlapping 10-year (I think) intervals, so level 1 goes from year 0 to 10, level 2 from 5 to 15, etc.
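Overlapping intervals of this kind can be built with a few lines of Python. This is a hypothetical helper, assuming half-open windows of 10 years that start every 5 years and that all years are at or after the origin; the window width and step used by Górski & Eder may differ. Each window could then be used, for example, to compute a per-window frequency.

```python
def overlapping_windows(years, origin, width=10, step=5):
    """Assign years to overlapping windows.

    Window k (k = 0, 1, 2, ...) covers [origin + k*step, origin + k*step + width),
    so with width=10 and step=5 neighbouring windows share 5 years and most
    years belong to two windows. Returns {k: [years in window k]}.
    """
    windows = {}
    for y in years:
        offset = y - origin
        k_max = offset // step                        # last window starting at or before y
        k_min = max(0, (offset - width) // step + 1)  # first window that has not yet ended
        for k in range(k_min, k_max + 1):
            windows.setdefault(k, []).append(y)
    return windows


for k, ys in sorted(overlapping_windows(range(1530, 1551), origin=1530).items()):
    print(f"window {k}: {ys[0]}-{ys[-1]}")
```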