
when is a time series not a time series?

Mountain Bikn' Guy

Jan 27, 2003, 6:35:17 PM

My dependent variable fits at least one definition of a time series: "If you
take a sequence of equally spaced readings, this is called a time series."
Furthermore, there is very strong autocorrelation (near 1) in the dependent
variable -- when tested in the order the data is collected. However, I can
randomly resort all the data (dependent plus independent variables) so that
there is no longer any autocorrelation and this does not affect the
predictive ability of the independent variables. So I'm thinking that I am
not dealing with a time series. Any thoughts?

Any arguments in favor of using time series analyses?

Knowing that I _can_ remove the autocorrelation, can I proceed to perform
parametric regression analysis without actually randomly sorting the data
and treat this as a non-time-series analysis?
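
To make the experiment concrete, here is a minimal Python sketch of what I
mean (simulated data only -- not my actual dataset -- and the trending-x
model is just an assumption for illustration):

import numpy as np

rng = np.random.default_rng(0)

def lag1_autocorr(v):
    # correlation between consecutive readings v[t-1], v[t]
    return np.corrcoef(v[:-1], v[1:])[0, 1]

x = np.linspace(0, 10, 500)             # slowly varying independent variable
y = 3 * x + rng.normal(0, 1, 500)       # dependent variable tracks x

print(lag1_autocorr(y))                 # near 1 in collection order

idx = rng.permutation(500)              # randomly re-sort x and y together
print(lag1_autocorr(y[idx]))            # near 0 after the shuffle

The fit of y on x is of course unchanged by the re-sort, which is what
prompts my question.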

TIA

Steve

Mountain Bikn' Guy

Jan 27, 2003, 9:23:52 PM

But order of observations has everything to do with a time series, doesn't
it?
And autoregression models use prior values of the time series to generate
current estimates, so randomly sorting the data would invalidate everything
for a real time series. By these standards, I'm not dealing with a time
series, and any correlation in the dependent variable could just be an
artifact due to lack of random sampling -- I'm sampling at regular time
intervals. Of course, lack of random sampling has its own issues, but my
question is related to whether I should treat my data as a time series.

I assume your "Nope!" refers to my last question, which was:

> >Knowing that I _can_ remove the autocorrelation, can I proceed to perform
> >parametric regression analysis without actually randomly sorting the data
> >and treat this as a non-time-series analysis?

Regards,
Steve


"Dick Startz" <richard...@attbi.com> wrote in message
news:piob3vs0m2rl3tb60...@4ax.com...
> Nope!
>
> The order of observations has no effect on coefficient estimates or
> prediction from a regression. But if the errors in the regression (not
> just the values of the dependent variable) are correlated with one
> another, then a regression may not be the best thing to do for a
> variety of reasons.
>
> Sorting the data doesn't make the correlation go away, it just hides
> it. It used to be that observation 1 was correlated with observation 2
> and observation 2 with observation 3, etc. Now 1 is correlated with
> 393 (or wherever 2 got randomly sorted to), etc.
>
> -Dick Startz

> ----------------------
> Richard Startz Richard...@attbi.com
> Lundberg Startz Associates

Glen

Jan 27, 2003, 10:23:14 PM

"Mountain Bikn' Guy" <v...@attbi.com> wrote in message news:<VmjZ9.63329$Ve4.6714@sccrnsc03>...

> My dependent variable fits at least one definition of a time series: "If you
> take a sequence of equally spaced readings, this is called a time series."
> Furthermore, there is very strong autocorrelation (near 1) in the dependent
> variable -- when tested in the order the data is collected. However, I can
> randomly resort all the data (dependent plus independent variables) so that
> there is no longer any autocorrelation and this does not affect the
> predictive ability of the independent variables. So I'm thinking that I am
> not dealing with a time series. Any thoughts?

Time series really just means collected over time. What matters is
the extent to which the time order results in a correlation structure
in the data.

If you randomly shuffle any time series, *of course* you lose
the correlation structure between adjacent or near-adjacent elements,
because the correlated elements are no longer adjacent. Why would you
think that would somehow mean the original data was uncorrelated?

> Any arguments in favor of using time series analyses?

Because it can take into account the effect of serial correlation?

Note that the "predictive ability" *is* affected in the sense that
a prediction interval for the next observation will not have the
correct coverage probability, because you have ignored the explanatory
effect of the immediately previous observations.

There is a kind of bias in the following sense: look at your fitted
values for the observations that follow "high" previous values, and
look at your fitted values for the observations that follow "low"
previous values. You should see that the fitted values in the first
group tend to be a bit low, and the fitted values in the second
group tend to be a bit high.
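
A small simulation makes this bias visible (a Python sketch; the
y = 3x + AR(1)-noise model is purely an assumption for the demonstration):

import numpy as np

rng = np.random.default_rng(1)
n, rho = 2000, 0.9

x = rng.normal(0, 1, n)
u = np.zeros(n)
for t in range(1, n):                   # AR(1) errors: u[t] depends on u[t-1]
    u[t] = rho * u[t - 1] + rng.normal(0, 1)
y = 3 * x + u

slope, intercept = np.polyfit(x, y, 1)  # plain OLS, ignoring the time order
resid = y - (slope * x + intercept)

after_high = resid[1:][resid[:-1] > 0]  # observations following a "high" one
after_low = resid[1:][resid[:-1] <= 0]
print(after_high.mean())                # positive: fitted values ran low
print(after_low.mean())                 # negative: fitted values ran high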

> Knowing that I _can_ remove the autocorrelation,

You haven't removed the relationship to the observations that
*were* immediately preceding, you've just made them all different
distances away now, so that if you use a measure only designed to
pick up the structured correlation you used to have, it naturally
can't see the correlations that are still there, because you
changed the structure. You haven't removed the structure, you've
simply hidden it.
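
In code (a Python sketch under the same illustrative assumptions): if you
keep track of where each original neighbour went, the correlation is
exactly where you left it:

import numpy as np

rng = np.random.default_rng(2)
t = np.arange(500)
y = np.sin(t / 25) + rng.normal(0, 0.1, 500)   # some autocorrelated series

idx = rng.permutation(500)              # the random re-sort
ys = y[idx]
pos = np.argsort(idx)                   # pos[k] = new location of original obs k

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print(corr(ys[:-1], ys[1:]))            # near 0: neighbours in the new order
print(corr(ys[pos[:-1]], ys[pos[1:]]))  # near 1: the original neighbours,
                                        # now scattered, still correlated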

Note that your future observations still arrive /in order/, not
shuffled. It doesn't help you to pretend otherwise.

> ... can I proceed to perform
> parametric regression analysis without actually randomly sorting the data
> and treat this as a non-time-series analysis?

You can do what you like.

Whether it is a sensible thing to do is another matter.

Glen

David Reilly

Jan 28, 2003, 7:14:21 AM

"Mountain Bikn' Guy" <v...@attbi.com> wrote in message news:<VmjZ9.63329$Ve4.6714@sccrnsc03>...
> My dependent variable fits at least one definition of a time series: "If you
> take a sequence of equally spaced readings, this is called a time series."
> Furthermore, there is very strong autocorrelation (near 1) in the dependent
> variable -- when tested in the order the data is collected. However, I can
> randomly resort all the data (dependent plus independent variables) so that
> there is no longer any autocorrelation and this does not affect the
> predictive ability of the independent variables.

What do you mean? Are you saying that the predictive ability
(statistical importance?) of the independent variables was zero in
both cases, or that the form of the resultant transfer function (ARMAX)
model was the same in both cases? In both of these cases the
predictions would be the same.

For a discussion of the treatment of causal variables, please see
http://www.autobox.com/teach.html

Regards

Dave Reilly
Automatic Forecasting Systems

P.S.

If you wish you can call me and I will try to sort out your dilemma...

215-675-0652

Mountain Bikn' Guy

Jan 28, 2003, 10:51:21 AM

As I read these replies and think about them, it is starting to seem more
and more clear that I am not dealing with a time series, even though
everyone keeps arguing that it is a true time series. I look forward to more
comments.

In my case, the order of the data does not matter in terms of making
predictions about new cases:

> There is a kind of bias in the following sense: look at your fitted
> values for the observations that follow "high" previous values, and
> look at your fitted values for the observations that follow "low"
> previous values. You should see that the fitted values in the first
> group tend to be a bit low, and the fitted values in the second
> group tend to be a bit high.

This bias is not present in my fitted values.

> If you randomly shuffle any time series, *of course* you lose
> the correlation structure between adjacent or near-adjacent elements,
> because the correlated elements are no longer adjacent. Why would you
> think that would somewhow mean the original data was uncorrelated?

This statement can be turned around. If the order of the data is not
relevant in terms of the accuracy of the fitted values, then I can argue
that the correlation was only due to non-random (time-based) sampling. So,
here's the logic turned around: generate a series of random data values,
sort these values numerically, and then perform a test for autocorrelation.
Of course, there will be high correlation. But it is meaningless. It seems
obvious that the correlation is an artifact. If I again resort the data,
that autocorrelation goes away. At the moment, that's how I'm viewing the
correlation present in my data.
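
As a sketch (Python, made-up numbers):

import numpy as np

rng = np.random.default_rng(3)
z = rng.normal(0, 1, 1000)              # pure noise: no time structure at all

def lag1_autocorr(v):
    return np.corrcoef(v[:-1], v[1:])[0, 1]

print(lag1_autocorr(z))                 # near 0, as it should be
print(lag1_autocorr(np.sort(z)))        # near 1 -- an artifact of the sort

The sorted version passes any autocorrelation test with flying colors, yet
the "structure" means nothing.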

Steve


Mountain Bikn' Guy

Jan 28, 2003, 11:04:42 AM

>
> Note that the "predictive ability" *is* affected in the sense that
> a prediction interval for the next observation will not have the
> correct coverage probability, because you have ignored the explanatory
> effect of the immediately previous observations.

My working assumption is that the explanatory effect of any (originally)
previous observation is irrelevant. The model does not need to take into
account the value of previous observations. Given this situation, I would
think that using time series techniques and methods would be inappropriate
(even though I do collect the data sequentially over time).

>
> Note that your future observations still arrive /in order/, not
> shuffled. It doesn't help you to pretend otherwise.

The fact that future values may arrive in order is, again, irrelevant. I
will use the model to predict a future value out of order (i.e., in the same
way one would use any linear regression equation), so again, I do not think I
should/could use time series methods.

The dilemma is that time series methods seem inappropriate, but there are
also violations of most of the assumptions for using standard regression
methods as well unless I take actions to offset these (for example,
resorting the data). Maybe I need to go completely to nonparametric
methods...?

Steve


Gus Gassmann

Jan 28, 2003, 1:41:59 PM

Mountain Bikn' Guy wrote:

> >
> > Note that the "predictive ability" *is* affected in the sense that
> > a prediction interval for the next observation will not have the
> > correct coverage probability, because you have ignored the explanatory
> > effect of the immediately previous observations.
>
> My working assumption is that the explanatory effect of any (originally)
> previous observation is irrelevant. The model does not need to take into
> account the value of previous observations. Given this situation, I would
> think that using time series techniques and methods would be inappropriate
> (even though I do collect the data sequentially over time).

Not inappropriate. Perhaps "overkill", perhaps not statistically significant,
but certainly appropriate.

> > Note that your future observations still arrive /in order/, not
> > shuffled. It doesn't help you to pretend otherwise.
>
> The fact that future values may arrive in order is, again, irrelevant. I
> will use the model to predict a future value out of order (i.e., in the same
> way one would use any linear regression equation), so again, I do not think I
> should/could use time series methods.

The reason for using time series models is simple: to make autocorrelation
work _for_ you, by improving the forecasts. Just ask yourself this question:
in predicting the next value, will you be able to make a better forecast
(with a tighter forecast interval) knowing the current value? If yes, time
series analysis should be used. If not, or if you don't care about forecast
intervals, use any method you like.
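
To put the question in numbers (a Python sketch; the AR(1) series and the
known coefficient are assumptions for illustration only):

import numpy as np

rng = np.random.default_rng(4)
n, rho = 5000, 0.8
y = np.zeros(n)
for t in range(1, n):                   # an AR(1) series, for the sake of argument
    y[t] = rho * y[t - 1] + rng.normal(0, 1)

err_naive = y[1:] - y.mean()            # forecast ignoring the current value
err_ar = y[1:] - rho * y[:-1]           # forecast using the current value
print(err_naive.std())                  # ~1.67: wide forecast interval
print(err_ar.std())                     # ~1.00: much tighter interval

If the second number is not noticeably smaller than the first for your
data, the autocorrelation is buying you nothing and plain regression loses
little.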

> The dilemma is that time series methods seem inappropriate, but there are
> also violations of most of the assumptions for using standard regression
> methods as well unless I take actions to offset these (for example,
> resorting the data). Maybe I need to go completely to nonparametric
> methods...?

Why do I get the impression that you are grasping at straws?
What assumptions are you talking about, how are they violated,
and why does re-sorting the data fix whatever problems you got?

Mountain Bikn' Guy

Jan 28, 2003, 2:14:05 PM

"Gus Gassmann" <hgas...@mgmt.dal.ca> wrote in message
news:3E36CEF7...@mgmt.dal.ca...

> Mountain Bikn' Guy wrote:
>
> > >
> > > Note that the "predictive ability" *is* affected in the sense that
> > > a prediction interval for the next observation will not have the
> > > correct coverage probability, because you have ignored the explanatory
> > > effect of the immediately previous observations.
> >
> > My working assumption is that the explanatory effect of any (originally)
> > previous observation is irrelevant. The model does not need to take into
> > account the value of previous observations. Given this situation, I would
> > think that using time series techniques and methods would be inappropriate
> > (even though I do collect the data sequentially over time).
>
> Not inappropriate. Perhaps "overkill", perhaps not statistically significant,
> but certainly appropriate.

Understood. Dave at Autobox made the point that time series analysis can be
considered a superset of standard multiple regression methods.

>
> > > Note that your future observations still arrive /in order/, not
> > > shuffled. It doesn't help you to pretend otherwise.
> >
> > The fact that future values may arrive in order is, again, irrelevant. I
> > will use the model to predict a future value out of order (i.e., in the same
> > way one would use any linear regression equation), so again, I do not think I
> > should/could use time series methods.
>
> The reason for using time series models is simple: to make autocorrelation
> work _for_ you, by improving the forecasts. Just ask yourself this question:
> in predicting the next value, will you be able to make a better forecast
> (with a tighter forecast interval) knowing the current value? If yes, time
> series analysis should be used. If not, or if you don't care about forecast
> intervals, use any method you like.

This makes a lot of sense to me. I do not think we can/will get a better
forecast using prior values of Y. Therefore, I am not going to use lagged Y
in the regression equation as I would in autoregression.


>
> > The dilemma is that time series methods seem inappropriate, but there are
> > also violations of most of the assumptions for using standard regression
> > methods as well unless I take actions to offset these (for example,
> > resorting the data). Maybe I need to go completely to nonparametric
> > methods...?
>
> Why do I get the impression that you are grasping at straws?
> What assumptions are you talking about, how are they violated,
> and why does re-sorting the data fix whatever problems you got?

Resorting doesn't change anything -- this was just an effort to explain my
situation. What is happening is that I am/was reading too many conflicting
things and was therefore trying to achieve a situation where I would not
violate *any* assumptions of any of my statistical techniques -- an
impossibility, to be sure. But this discussion has cleared some things up and
I now feel comfortable that I do not need to use autoregression techniques
even though I have an autocorrelated time series.

Maybe Dave at Autobox will post some of his thoughts because they were
helpful to me. (Thanks Dave!)


Rich Ulrich

Jan 28, 2003, 4:16:12 PM

On Tue, 28 Jan 2003 19:14:05 GMT, "Mountain Bikn' Guy" <v...@attbi.com>
wrote:

[ snip, much... ]

> I now feel comfortable that I do not need to use autoregression techniques
> even though I have an autocorrelated time series.

> [ ... ]

X is autocorrelated, and Y is autocorrelated?

One characterization that I like is to say that
this will act like a test situation for a t-test or correlation
that has a much smaller number of degrees of freedom.
Consider: You do not really have more *information*
if you measure air temperature (say) every minute....

Is somebody's climate question (if that were the topic) really
worth 24 time points per day? 24x60? 24x60x60 ?
In the important sense of 'information', you can't manufacture
'information' by measuring the same thing over and over again,
when there is nothing independent.

One way to 'fix' some autocorrelation is to skip all
the points that are close enough to be correlated.
That's true even if you have to drop 9 out of 10 points, or
999 out of 1000.
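
In numbers (a Python sketch; the AR(1) temperature-like series is assumed
for illustration):

import numpy as np

rng = np.random.default_rng(5)
n, rho = 10000, 0.95
y = np.zeros(n)
for t in range(1, n):                   # heavily autocorrelated "every-minute" readings
    y[t] = rho * y[t - 1] + rng.normal(0, 1)

# Effective sample size for estimating the mean of an AR(1) series:
print(n * (1 - rho) / (1 + rho))        # ~256 independent-equivalent points

def lag1_autocorr(v):
    return np.corrcoef(v[:-1], v[1:])[0, 1]

print(lag1_autocorr(y))                 # ~0.95 at full resolution
print(lag1_autocorr(y[::60]))           # keep every 60th reading: near 0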

Check for cross-correlation, and if there is not any, then
that changes the next questions.
I would not worry about applying diagnostics
until I spent some time figuring out whether the series
*ought* to be correlated, for some logical reason.
That should give me a hint of which critical
components need to be disentangled, which *changes*
ought to provide a sensitive test of co-variation.

--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html

Mountain Bikn' Guy

Jan 28, 2003, 6:42:49 PM

"Rich Ulrich" <wpi...@pitt.edu> wrote in message
news:2krd3v8upoqvnh7iu...@4ax.com...

> On Tue, 28 Jan 2003 19:14:05 GMT, "Mountain Bikn' Guy" <v...@attbi.com>
> wrote:
>
> [ snip, much... ]


> X is autocorrelated, and Y is autocorrelated?

Y is autocorrelated. The X variables are of many different data types,
including nominal, ordinal, interval and continuous. Some of the continuous
X's will show autocorrelation. Actually, our data set has thousands of
independent variables, and I have not actually tested every one for
autocorrelation.

[snip]

> Check for cross-correlation, and if there is not any, then
> that changes the next questions.
> I would not worry about applying diagnostics
> until I spent some time in figuring whether the series
> *ought* to be correlated, for some logical reason.
> That should give me a hint of which critical
> components need to be disentangled, which *changes*
> ought to provide a sensitive test of co-variation.

Theoretically, the autocorrelation of Y is appropriate but not meaningful in
our models. Dave Reilly put it nicely: if the correct model is Y = 3X, then
Y will be upward trending and will show autocorrelation. However, with the
correct model, the causal variable X explains all the variability, and other
observed values of Y are not needed in the model. In the absence of the
correct X, autoregression would be helpful.
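
Dave's example is easy to check in a sketch (Python, simulated data; the
specific numbers are illustrative only):

import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0, 10, 500)             # upward-trending causal variable
y = 3 * x + rng.normal(0, 1, 500)       # the "correct model" is Y = 3X

def lag1_autocorr(v):
    return np.corrcoef(v[:-1], v[1:])[0, 1]

print(lag1_autocorr(y))                 # near 1: Y trends, so Y is autocorrelated

slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)
print(lag1_autocorr(resid))             # near 0: once X is in the model,
                                        # prior values of Y add nothing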

Rich Ulrich

Jan 30, 2003, 4:17:37 PM

On Tue, 28 Jan 2003 23:42:49 GMT, "Mountain Bikn' Guy" <v...@attbi.com>
wrote:
<responding to me, where I asked:>

> > X is autocorrelated, and Y is autocorrelated?

> Y is autocorrelated. The X variables are of many different data types,
> including nominal, ordinal, interval and continuous. Some of the continuous
> X's will show autocorrelation. Actually, our data set has thousands of
> independent variables, and I have not actually tested every one for
> autocorrelation.

You might get some advice that is usefully tuned to your
concerns, if you reveal more about the data.

How might those nominal variables relate to the time series?
I think of my nominal variables as each keeping one value
for a long period of time. That would always be the case for
the 'time series' that can be subdivided into smaller intervals,
like those series on temperatures, etc.

If the nominal has each value for a while, then the time series
can be (and ought to be) aggregated into equivalent units
of time -- if there is a *big*, overall main effect. On the
other hand, if the nominal variable has an influence that is
'temporary' as the times are scaled, then the proper analysis
only uses data from the time when the nominal *changes*.

There are several things that can be called 'event analyses'.
It might be that for any given change from B to C, there
could be a systematic change in the time series, over
some fairly short sequence. But it would be stupid to test all
those possibilities, with thousands of variables and millions
of time-frames, if there wasn't a logical basis for some question.
