Trajectory modelling, useful with only two timepoints?

Thomas Fröjd

Nov 4, 2009, 10:23:58 AM

to MedStats

Hi.

I am working with a researcher analysing data that follow a number of
individuals at two timepoints. The measurement we are interested in
is the number of PTSD symptoms they have at each timepoint.

The researcher I work with suspects that there are a number of
naturally occurring subpopulations among the individuals with different
patterns of recovery. Basically she hypothesizes that one group will
have a low number of symptoms on each occasion, another group
will have a high number of symptoms on both occasions, and the last will
have a high number of symptoms on the first occasion but few on the last
occasion.

She has asked me if I could estimate the proportions in the sample of
the subpopulations.

My idea is to use trajectory modelling with PROC TRAJ in SAS. I
completely lack real world experience with this type of analysis so I
would like to ask if you think it is the proper way to go. Also, is it
still useful when there are only two points in time? It seems
to be most useful when more than two time points are
observed.

Maybe there is a simpler alternative I am overlooking? Some other type
of clustering?

Best regards
Thomas

Ted Harding

Nov 4, 2009, 11:02:50 AM

to meds...@googlegroups.com

My initial approach to a question of this kind would be to start
with a plot, (X,Y) for each subject, where

X = Number of symptoms at first time point
Y = Number of symptoms at second time point.

If your colleague's suspicions are correct, then the plotted points
should exhibit a clustering corresponding to her descriptions
(Low+Low, High+High, High+Low) of the groups, on the lines of

  H +              *
    |            *****
  T |             ***
  i |              **
  m |
  e |  **            **
    | ****         *****
  2 |****           ****
    | **             **
  L +----------------+
    L     Time 1     H

Another approach -- if you have some idea of where to draw dividing
lines between High & Low (and inspection of a plot such as described
may provide you with a good idea), then you can simply draw up a Table:

                Time 2
   T          Low    High
   i
   m   Low    nLL    nLH
   e
   1   High   nHL    nHH

which will give you the proportions directly.
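
A minimal Python sketch of that tabulation follows. The (X,Y) pairs are
made up and the cutoff of 5 symptoms is an assumption chosen purely for
illustration; neither comes from the study, and the names `pairs`,
`CUTOFF`, `level` and `pattern_proportions` are hypothetical.

```python
from collections import Counter

# Hypothetical (time 1, time 2) symptom counts; illustrative only.
pairs = [(1, 0), (2, 1), (0, 2), (9, 8), (8, 10), (7, 1), (10, 2), (1, 7)]
CUTOFF = 5  # assumed High/Low dividing line, not a clinical threshold


def level(count, cutoff=CUTOFF):
    """Label a symptom count as 'High' (>= cutoff) or 'Low'."""
    return "High" if count >= cutoff else "Low"


def pattern_proportions(pairs, cutoff=CUTOFF):
    """Return the share of subjects in each (time 1, time 2) cell."""
    cells = Counter((level(x, cutoff), level(y, cutoff)) for x, y in pairs)
    n = len(pairs)
    return {cell: count / n for cell, count in cells.items()}


proportions = pattern_proportions(pairs)
for (t1, t2), share in sorted(proportions.items()):
    print(f"Time 1 {t1:<4} -> Time 2 {t2:<4}: {share:.3f}")
```

The result depends entirely on where the dividing line is drawn, so it
is worth rerunning with a few plausible cutoffs to see how stable the
proportions are.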

I don't think getting into "trajectory analysis" can add anything
to simple comparisons of Time 1 with Time 2 as above.

However, once you have created the (X,Y) pairs for each subject,
it could well be worth looking into methods of Cluster Analysis.
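
As one possible illustration of clustering the (X,Y) pairs, here is a
small sketch of k-means (Lloyd's algorithm) in plain Python. It is only
one of many cluster-analysis methods, and it treats the counts as
points in the plane, which may not suit skewed count data. The data,
the choice of three clusters, the starting centres and the function
name `kmeans` are all assumptions made for the example.

```python
def kmeans(points, centres, max_iter=100):
    """Plain Lloyd's algorithm on 2-D points, starting from given centres."""
    centres = [tuple(c) for c in centres]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        labels = [
            min(range(len(centres)),
                key=lambda k: (x - centres[k][0]) ** 2 + (y - centres[k][1]) ** 2)
            for x, y in points
        ]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for k, old in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centres.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centres.append(old)  # keep an empty cluster's centre
        if new_centres == centres:
            break  # converged: no centre moved
        centres = new_centres
    return labels, centres


# Hypothetical (time 1, time 2) symptom counts; illustrative only.
points = [(1, 0), (2, 1), (0, 2), (1, 1), (9, 8), (8, 10), (10, 9),
          (8, 1), (10, 2), (9, 0)]
# Assumed starting centres: Low+Low, High+High, High+Low.
labels, centres = kmeans(points, [(0, 0), (10, 10), (10, 0)])

# Proportion of subjects assigned to each cluster.
shares = [labels.count(k) / len(labels) for k in range(len(centres))]
print(centres, shares)
```

With real data the clusters found will depend on the starting centres
and on the number of clusters requested, so the proportions should be
read as descriptive rather than as evidence of distinct subpopulations.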

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Nov-09 Time: 16:02:47
------------------------------ XFMail ------------------------------

Jeremy Miles

Nov 4, 2009, 12:17:20 PM

to meds...@googlegroups.com

Hi Thomas,

Three thoughts.

1) I have never seen trajectory models on two time points, but that's
not to say that it can't be done. However, each individual's
trajectory will be the difference between their time 1 score and time
2 score. If you compute that difference and plot it, you can see whether
it looks bi/multimodal. Then do a scatterplot of time 1 score against
difference - does it have obvious clusters?

2) There is debate in the literature about the Nagin approach (as with
PROC TRAJ) and the approach of Bengt Muthén, implemented in Mplus.
Very, very briefly, Mplus allows random variation (if you ask for it)
of slopes within groups. PROC TRAJ doesn't.

3) PTSD scores tend to be non-normally distributed. Trajectory
approaches make the assumption that non-normality is because of
mixtures of distributions. If your data are just plain non-normal,
you'll find groups which try to account for the non-normality by
creating mixtures of normal distributions. There's nothing wrong with
that - that means that you are modeling your distribution better, but
don't go interpreting them as qualitatively different groups without
more evidence. (If you've got more than three time periods, you can
also use mixtures to handle non-linearity).
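
The check described in point 1 can be sketched in a few lines of
Python. The (time 1, time 2) pairs are invented for illustration, and
the text histogram is just a stand-in for a proper plot.

```python
from collections import Counter

# Hypothetical (time 1, time 2) symptom counts; illustrative only.
pairs = [(1, 0), (2, 1), (0, 2), (1, 1), (9, 8), (8, 10), (10, 9),
         (8, 1), (10, 2), (9, 0)]

# With two time points, each subject's "trajectory" is the change score.
diffs = [t2 - t1 for t1, t2 in pairs]
histogram = Counter(diffs)

# Text histogram of the change scores: look for more than one hump.
for value in range(min(diffs), max(diffs) + 1):
    print(f"{value:+3d} | {'*' * histogram[value]}")

# Pairs for the second plot: time 1 score against change.
scatter = [(t1, t2 - t1) for t1, t2 in pairs]
```

In this toy data the change scores pile up near zero with a separate
group of large drops; real data may well show nothing so clear.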

Here are a couple of potentially useful papers:

BAUER, D. J. and CURRAN, P. J. Distributional assumptions of growth
mixture models: Implications for overextraction of latent trajectory
classes. Psychological Methods 8: 338-363, 2003.

BAUER, D. J. A semiparametric approach to modeling nonlinear relations
among latent variables. Structural Equation Modeling 12: 513-535,
2005.

Jeremy

2009/11/4 Thomas Fröjd <tfr...@gmail.com>:

--
Jeremy Miles
Psychology Research Methods Wiki: www.researchmethodsinpsychology.com

Thomas Fröjd

Nov 5, 2009, 5:52:36 AM

to MedStats

Hi, Thank you all for your timely replies.

Let me clarify the reason behind the analysis first. As Peter and Jeff
pointed out, two measures do not really make a repeated measures study and
are not enough to model growth properly. This data is from the first
follow-up in a cohort study and will in the future be extended with
more measurements. For the moment, however, we would like to describe
the material so far as well as possible.

The reason for trying to estimate patterns of change is not to test
any hypothesis about whether subgroups exist, nor to determine which
factors affect recovery. The questions of interest are more like:

How many of the subjects were affected at the first timepoint but
seem to have recovered?
How many had a medium or high score in the beginning but show little
or no sign of recovery?
How many had a low score, indicating less impact, but seem to have a
delayed reaction?
How many had a low score at both timepoints, indicating resilience
against PTSD?

Plotting the data T1 vs T2 as Ted Harding suggested (I am really
impressed by your ASCII graphing skills, by the way) shows no clear
clusters, and I am therefore reluctant to set any cutoffs. I would
prefer to let the data determine this.

I am now looking into clustering as suggested. I am pretty new to this
area so I would appreciate it if anyone could give me some help on the
following questions.

As Jeremy pointed out, the data has a count distribution with most
observations at small values. One of the reasons I ended up with PROC
TRAJ was that it conveniently supported the zero-inflated Poisson as a
distributional model. Does this matter when using cluster analysis?

What clustering techniques do you think are useful? From the little I
know K-means clustering should be the proper one?

What variables should I cluster on? Should I use the measurements at T1
and T2 as analysis variables, or is the measurement at T1 and the
difference (T2-T1) better? Or will it not matter?

Any tips on how to decide the number of clusters? I am thinking that
since the variables are correlated this might induce some problems
with some techniques.

Again, thanks for all the input. Have a nice day everyone.

/Thomas
