Re: {MEDSTATS} Trajectory modelling, useful with only two timepoints?

Peter Flom

Nov 4, 2009, 10:57:22 AM

to MedStats

Thomas Fröjd <tfr...@gmail.com> wrote
>
>I am working with a researcher analysing data following a number of
>individuals on two timepoints. The measurements we are interested in
>are the number of PTSD symptoms they have on each timepoint.
>
>The researcher I work with suspects that there are a number of
>naturally occurring subpopulations among the individuals with different
>patterns in recovery. Basically she hypothesizes that one group will
>have a low number of symptoms on each occasion, another group
>will have a high number of symptoms on both occasions and the last will
>have a high number of symptoms on the first occasion but few on the last
>occasion.
>
>She has asked me if I could estimate the proportions in the sample of
>the subpopulations.
>
>My idea is to use trajectory modelling with PROC TRAJ in SAS. I
>completely lack real-world experience with this type of analysis so I
>would like to ask if you think it is the proper way to go. Also, is it
>still useful when there are only two points in time? It seems
>as if it is most useful when there are more than two time points
>observed.
>
>Maybe there is a simpler alternative I am overlooking? Some other type
>of clustering?
>

I've never used PROC TRAJ (never even heard of it) but I'd hesitate to call something with only two timepoints a trajectory. Patterns in recovery are too complex to be captured with only two time points.

I am also leery of classifying number of symptoms into "low" and "high" unless there is a clear separation.

That said, does she have hypotheses about which people will fall into which categories, or is she searching for patterns? If the former, a multinomial logistic model might be good. If the latter, then some sort of cluster analysis.

HTH

Peter

Peter L. Flom, PhD
Statistical Consultant
Website: www DOT peterflomconsulting DOT com
Writing; http://www.associatedcontent.com/user/582880/peter_flom.html
Twitter: @peterflom

Jeff Allard

Nov 4, 2009, 11:00:52 AM

to meds...@googlegroups.com

For growth modeling, use of Proc Mixed is straightforward. Singer and Willett have a great book on this topic to check out. Many resources here: http://gseacademic.harvard.edu/alda/

Two points don't really constitute repeated measures, though....

Jeff

2009/11/4 Peter Flom <peterflom...@mindspring.com>

Peter Flom

Nov 5, 2009, 6:36:41 AM

to MedStats

Thomas Fröjd <tfr...@gmail.com> wrote
>Hi, Thank you all for your timely replies.
>

You're certainly welcome! Interesting discussion, and I'm learning, too.

>Let me clarify the reason behind the analysis first. As Peter and Jeff
>pointed out, two measures do not really make a repeated measures study and
>are not enough to model growth properly. This data is from the first
>follow-up in a cohort study and will in the future be extended with
>more measurements. For the moment, however, we would like to describe
>the material so far as well as possible.
>

That makes sense.

>The reason for trying to estimate patterns of change is not to test
>any hypothesis about whether subgroups exist, or to determine what factors
>affect recovery. The questions of interest are more like:
>
>How many of the subjects were affected at the first timepoint but
>seem to have recovered?
>How many had a medium or high score in the beginning but show no or
>small signs of recovery?
>How many had a low score, indicating less impact, but seem to have a
>delayed reaction?
>How many had a low score at both time points, indicating resilience
>against PTSD?
>

Now we get into "what the data show" vs. "what theory says". I know a bit about PTSD, and I don't think "low" and "high" adequately describe the possibilities. For one thing, there are subscales to PTSD. Do you have subscale scores?

>
>Plotting the data T1 vs T2 as Ted Harding suggested (I am really
>impressed by your ascii graphing skills by the way) shows no clear
>clusters and I am therefore reluctant to set any cutoffs. I would
>prefer the data to determine this.
>

That ascii graph *was* remarkable, wasn't it?

But if that sort of graph doesn't show clear clusters, I am reluctant to
recommend cluster approaches. I am (as I've mentioned repeatedly) not a big fan of categorization, and I *am* a fan of using substantive knowledge. If you *must* categorize, can you use substantively derived cutoffs?

The question, then, is how to answer these questions without categorization.

I'd start with more graphs ... a density plot of the differences in scores might help.
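A density plot of the change scores needs no cutoffs at all. Below is a minimal sketch, assuming the T1 and T2 totals sit in two equal-length Python lists; the scores in the demo are made up, and the bandwidth uses a generic normal-reference rule of thumb (1.06 × SD × n^(-1/5)), not anything tuned to PTSD data.

```python
import math
import statistics


def score_differences(t1, t2):
    """Change score T2 - T1 for each person (negative = fewer symptoms at T2)."""
    if len(t1) != len(t2):
        raise ValueError("t1 and t2 must have the same length")
    return [b - a for a, b in zip(t1, t2)]


def gaussian_kde(values, bandwidth=None):
    """Return a function giving a Gaussian kernel density estimate of `values`.

    Without an explicit bandwidth, the normal-reference rule
    1.06 * SD * n ** (-1/5) is used.
    """
    n = len(values)
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(values) * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive (are all values identical?)")
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


if __name__ == "__main__":
    # Made-up T1/T2 totals, for illustration only.
    t1 = [5, 40, 12, 60, 3, 35, 50, 8, 22, 45]
    t2 = [4, 10, 30, 58, 2, 12, 48, 6, 20, 15]
    kde = gaussian_kde(score_differences(t1, t2))
    # Crude text rendering of the estimated density of the change scores.
    for x in range(-40, 25, 5):
        print(f"{x:4d} {'#' * round(400 * kde(x))}")
```

Several humps in such a curve might hint at distinct groups (for example, a "recovered" hump well below zero); a single hump would argue against forcing clusters.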

>I am now looking into clustering as suggested. I am pretty new to this
>area so I would appreciate it if anyone could give me some help on the
>following questions.
>
>As Jeremy pointed out, the data has a count distribution with most
>observations on small values. One of the reasons I ended up with PROC
>TRAJ was that it conveniently supported zero-inflated Poisson as a
>distributional model. Does this matter when using cluster analysis?
>
>What clustering techniques do you think are useful? From the little I
>know K-means clustering should be the proper one?
>
>What variables should I cluster on? Should I use the measurements at T1
>and T2 as analysis variables, or are the measurement at T1 and the
>difference (T2-T1) better? Or will it not matter?
>
>Any tips on how to decide the number of clusters? I am thinking that
>since the variables are correlated this might induce some problems
>with some techniques.
>

There is some literature on choosing the number of clusters in hierarchical methods, using statistics like the cubic clustering criterion (CCC) and so on. There's a decent summary in the SAS documentation. When I've done clustering, I've not found these to be very informative. Often (again, in my experience) the different measures suggest different numbers of clusters.

I would suggest trying k-means with several different values of k, and seeing which results make sense.
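Trying k-means over a range of k can be sketched in plain Python. This is ordinary Lloyd's algorithm with several random starts per k, assuming each person is a (T1, T2) tuple; the data in the demo are made up. Note that k-means only uses Euclidean distances, so it makes no use of a count or zero-inflated distribution. The within-cluster sum of squares (WSS) always falls as k grows, so looking for where it stops dropping sharply is a rough heuristic, not the CCC mentioned above.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    wss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wss


def best_of_starts(points, k, n_starts=10):
    """Run k-means from several random starts and keep the lowest WSS."""
    return min((kmeans(points, k, seed=s) for s in range(n_starts)), key=lambda r: r[2])


if __name__ == "__main__":
    # Made-up (T1, T2) scores, for illustration only.
    data = [(2, 1), (3, 2), (1, 1), (30, 28), (32, 30), (29, 31), (35, 6), (31, 4), (33, 5)]
    for k in range(1, 5):
        _, labels, wss = best_of_starts(data, k)
        print(f"k={k}  WSS={wss:8.1f}  labels={labels}")
```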

My intuition is that it won't matter much whether you use T1 and T2, or T1 and T2-T1, but I am not sure of this at all.
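The (T1, T2) versus (T1, T2-T1) question can at least be probed on toy numbers. Moving from one coordinate system to the other is a shear, which does not preserve Euclidean distances, and k-means works on exactly those distances. In the made-up example below (hypothetical people: a low at both times, b "recovered", c "stable"), person a is nearer to b in one system and nearer to c in the other, so cluster assignments can change; how much that matters on real data is an empirical question.

```python
import math


def to_change_coords(point):
    """Map a (T1, T2) pair to (T1, T2 - T1)."""
    t1, t2 = point
    return (t1, t2 - t1)


# Three made-up people: low at both times, "recovered", "stable".
a, b, c = (4, 4), (10, 4), (10, 10)

for name, (p, q) in {"a-b": (a, b), "a-c": (a, c)}.items():
    raw = math.dist(p, q)
    change = math.dist(to_change_coords(p), to_change_coords(q))
    print(f"{name}: (T1, T2) distance = {raw:.2f}, (T1, T2-T1) distance = {change:.2f}")
# a-b: (T1, T2) distance = 6.00, (T1, T2-T1) distance = 8.49
# a-c: (T1, T2) distance = 8.49, (T1, T2-T1) distance = 6.00
```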

Adrian Sayers

Nov 5, 2009, 7:21:17 AM

to meds...@googlegroups.com

I wonder if you have thought about latent class modeling. I don't know
a great deal about it, but it sounds like it may be the thing you're
after, as it looks at patterns within data and how they change over
time.

From the guys I know that use this, they all do it in Mplus.

I am no expert, but might be worth a look

bw
Adrian

2009/11/5 Peter Flom <peterflom...@mindspring.com>:

Thomas Fröjd

Nov 5, 2009, 7:29:33 AM

to MedStats

Hi

There are subscales. Intrusion, avoidance and hyperarousal. Looking
for patterns in change between the individuals sounds like a great
idea. Any ideas about how this could be done?

On 5 Nov, 12:36, Peter Flom <peterflomconsult...@mindspring.com>
wrote:

SR Millis

Nov 5, 2009, 7:45:57 AM

to meds...@googlegroups.com

I'm coming late to this discussion, so this question may have already been answered. Does your PTSD scale have a response bias scale to rule out symptom over-endorsement or under-endorsement? PTSD is quite easy to "fake" on essentially every PTSD scale I've ever encountered.

SR Millis

--- On Thu, 11/5/09, Thomas Fröjd <tfr...@gmail.com> wrote:

Ted Harding

Nov 5, 2009, 8:17:12 AM

to meds...@googlegroups.com

On 05-Nov-09 11:36:41, Peter Flom wrote:
> Thomas Fröjd <tfr...@gmail.com> wrote

>>[...]

>>Plotting the data T1 vs T2 as Ted Harding suggested (I am really
>>impressed by your ascii graphing skills by the way) shows no clear
>>clusters and I am therefore reluctant to set any cutoffs. I would
>>prefer the data to determine this.
>
> That ascii graph *was* remarkable, wasn't it?

That's nothing! (And it's quite easily done). Once upon a time
it was the only way to do such things (in the absence of a pen
plotter, which was cutting-edge technology at the time).

People are very used to things like Tukey's box-plot which was
first developed back in the 1960's & 70's when all that most people
had was printout from a teletype or line printer (fixed-space in
most cases), and Tukey devised many of his representations with the
aim of presenting information as vividly as possible on such primitive
devices.

An ASCII Art instance of a box-plot can be seen on

http://en.wikipedia.org/wiki/Box_plot

under

Example
A plain-text version might look like this:

                      +-----+-+
  *     o     |-------|     | |---|
                      +-----+-+

+---+---+---+---+---+---+---+---+---+---+---+---+   number line
0   1   2   3   4   5   6   7   8   9   10  11  12

and computers were programmed to generate such things.
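The numbers behind such a box-plot are simple to compute. Below is a sketch following Tukey's EDA conventions as I understand them: hinges are medians of the lower and upper halves (the median counted in both halves when n is odd), a "step" is 1.5 hinge-spreads, and values beyond one step from a hinge are "outside", beyond two steps "far out". Many packages use quartiles instead of hinges, which can differ slightly. The demo reuses the 17 values from the Wikipedia stem-and-leaf example quoted in this thread.

```python
import statistics


def five_number_summary(values):
    """Tukey's five numbers: minimum, lower hinge, median, upper hinge, maximum.

    The hinges are medians of the lower and upper halves; for odd n the
    median itself is included in both halves.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2
    return (xs[0], statistics.median(xs[:half]), statistics.median(xs),
            statistics.median(xs[n - half:]), xs[-1])


def box_plot_parts(values):
    """Whisker ends and outlying values using Tukey's fences.

    step = 1.5 * (upper hinge - lower hinge); values beyond one step from a
    hinge are "outside", beyond two steps "far out".
    """
    _, lo_hinge, median, hi_hinge, _ = five_number_summary(values)
    step = 1.5 * (hi_hinge - lo_hinge)
    inner = (lo_hinge - step, hi_hinge + step)
    outer = (lo_hinge - 2 * step, hi_hinge + 2 * step)
    xs = sorted(values)
    inside = [v for v in xs if inner[0] <= v <= inner[1]]
    return {
        "hinges": (lo_hinge, hi_hinge),
        "median": median,
        "whiskers": (inside[0], inside[-1]),  # most extreme values within the inner fences
        "outside": [v for v in xs if outer[0] <= v < inner[0] or inner[1] < v <= outer[1]],
        "far_out": [v for v in xs if v < outer[0] or v > outer[1]],
    }


if __name__ == "__main__":
    data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
    print(five_number_summary(data))  # (44, 63, 68, 76, 106)
    print(box_plot_parts(data))
```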

Much less in use now, but very useful then, was the stem-and-leaf
plot, again an instance of ingenious ASCII art, which cleverly
combined the graphical impact of the histogram with the
numerical information embodied in printed numerals. Again, an
example can be seen in

http://en.wikipedia.org/wiki/Stem-and-leaf_plot

 4 | 4 6 7 9
 5 |
 6 | 3 4 6 8 8
 7 | 2 2 5 6
 8 | 1 4 8
 9 |
10 | 6
key: 5|4=54
leaf unit: 1.0
stem unit: 10.0

which encapsulates the distribution of the data (pre-sorted):

44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106

Here, the "stem" is at the 10's, and the "leaves" are the list
of units digits (in increasing order), stem on the left of the
vertical line and leaves on the right.

Thus you can directly see both the histogram (with bins of width 10)
and all the individual numerical values -- which is more information
than you could get from a standard histogram with bin-width 10;
however, you could indicate that too in a standard histogram by
over-plotting with the actual data points (though even then the
actual numerical values would be missing).
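As Ted says, it's quite easily done. A minimal sketch for non-negative integers with a stem unit of 10 (one digit per leaf); empty stems between the smallest and largest are kept so gaps stay visible, and stems are right-aligned.

```python
from collections import defaultdict


def stem_and_leaf(values, stem_unit=10):
    """Lines of a stem-and-leaf display for non-negative integers.

    stem = value // stem_unit, leaf = value % stem_unit. Empty stems between
    the smallest and largest are kept so that gaps stay visible.
    """
    if any(v < 0 for v in values):
        raise ValueError("this sketch only handles non-negative integers")
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // stem_unit].append(v % stem_unit)
    width = len(str(max(leaves)))  # right-align the stems
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>{width}} | {leaf_text}".rstrip())
    return lines


if __name__ == "__main__":
    data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
    print("\n".join(stem_and_leaf(data)))
    print("key: 5|4=54")
```

Run on the 17 values above, it reproduces the Wikipedia display, including the empty 5 and 9 stems.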

I have put an example of such over-plotting in the Files section
of the MedStats website under the name

histoplus.jpg

It still doesn't have the "punch" of the stem-and-leaf plot!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 05-Nov-09 Time: 13:17:09
------------------------------ XFMail ------------------------------

Thomas Fröjd

Nov 5, 2009, 8:31:18 AM

to MedStats

Hi SR Millis

We have been using the IES-R (R for revised) scale, the 22 questions
version.

I must admit I don't really know what a response bias scale is. The
things I have read about response bias mostly discuss it in the
context of survey design and how to design questions. As I understand
it from some googling, it is a tool for adjusting for a bias toward choosing
extreme values on Likert scales. Am I right?

It seems like a useful thing to do, but I can't find anything about the IES-R
and response bias scales when I search online now. I will try to find the manual
and have a look.

Best regards.

Peter Flom

Nov 5, 2009, 8:53:23 AM

to MedStats

These are known by other names, too, such as "lie scales"; the intent is to
detect people who are giving incorrect answers on purpose, to appear "good" or "bad".

The best known of these is, I think, the Marlowe-Crowne scale. They often ask respondents to respond to
statements that very few people could honestly endorse, e.g.:
I have never told a lie
Rude people don't annoy me

etc.

As I recall, there are subscales for different types of lying: external and internal.

HTH

Peter

Peter Flom

Nov 5, 2009, 9:19:00 AM

to MedStats

Thomas Fröjd <tfr...@gmail.com> wrote

>
>Hi
>
>There are subscales. Intrusion, avoidance and hyperarousal. Looking
>for patterns in change between the individuals sounds like a great
>idea. Any ideas about how this could be done?
>

OK, now you've got 6 scores (3 at each time point, instead of 1). With this, there
are a lot more possibilities.

Again, I'd start with graphs .... x axis would be time, y axis score, and a line for each person
on each pair of measures. Whether this can be done on one graph depends on N. If you have more than
about 30 people, the lines will be hard to distinguish; you can try randomly dividing the sample into smaller groups, or dividing by pre score.
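The splitting step is easy to script. A sketch, assuming people are identified by ids and, optionally, a dict of T1 ("pre") scores; `make_panels` is a hypothetical helper, and each returned panel would become one time-by-score line plot in whatever plotting tool you use (plotting itself not shown). The limit of 30 follows Peter's rule of thumb.

```python
import random


def make_panels(ids, max_per_panel=30, pre_scores=None, seed=0):
    """Split subject ids into panels of at most max_per_panel people.

    With pre_scores (a dict id -> T1 score), people are ordered by their
    pre score so each panel covers a band of baseline severity; otherwise
    the order is shuffled at random.
    """
    ids = list(ids)
    if pre_scores is not None:
        ids.sort(key=lambda i: pre_scores[i])
    else:
        random.Random(seed).shuffle(ids)
    return [ids[i:i + max_per_panel] for i in range(0, len(ids), max_per_panel)]


if __name__ == "__main__":
    panels = make_panels(range(65))
    print([len(p) for p in panels])  # [30, 30, 5]
```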

And now, you've got something to cluster, if you decide to go that route. You might want to have a PRE cluster and a POST cluster and see who moves from one to the other.
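Once there is a PRE clustering and a POST clustering, "who moves from one to the other" is a cross-tabulation of the two label lists. A sketch, assuming one PRE and one POST label per person in the same order; the labels in the demo are hypothetical.

```python
from collections import Counter


def transition_table(pre_labels, post_labels):
    """Count people by (PRE cluster, POST cluster) pair."""
    if len(pre_labels) != len(post_labels):
        raise ValueError("need one PRE and one POST label per person")
    return Counter(zip(pre_labels, post_labels))


def format_table(counts):
    """Render the counts as a small cross-tabulation (rows PRE, columns POST)."""
    pre = sorted({a for a, _ in counts})
    post = sorted({b for _, b in counts})
    lines = ["PRE\\POST " + " ".join(f"{b:>6}" for b in post)]
    for a in pre:
        # Counter returns 0 for pairs that never occur.
        lines.append(f"{a:>8} " + " ".join(f"{counts[(a, b)]:>6}" for b in post))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical cluster labels from separate PRE and POST clusterings.
    pre = ["high", "high", "low", "low", "high", "mid"]
    post = ["low", "high", "low", "low", "low", "low"]
    print(format_table(transition_table(pre, post)))
```

Cluster labels from two separate runs are arbitrary, so the PRE and POST clusters would need to be named by their content (e.g., by centroid) before the table is read as "movement".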

HTH

Peter

Jeremy Miles

Nov 5, 2009, 12:09:28 PM

to meds...@googlegroups.com

Adrian wrote:
>
> I wonder if you have thought about latent class modeling. I don't know
> a great deal about it, but it sounds like it may be the thing you're
> after, as it looks at patterns within data and how they change over
> time.
>
> From the guys I know that use this, they all do it in Mplus.
>

Hi Adrian

Proc Traj can be thought of as a special case of a latent class model.
(Specifically, if you do a latent class model with groups defined on
the slopes and intercepts, and you fix the variances of the latent
variables within groups to zero, you've got a proc traj model).
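Jeremy's description can be made concrete with a hand-written EM sketch of a group-based trajectory model, i.e. a latent class model with no within-group variance. This is not PROC TRAJ itself: it assumes plain Poisson counts (no zero inflation, no ceiling), and with only two time points each group simply gets its own mean at T1 and at T2. The estimated group proportions are the quantity Thomas was asked for. Starting means are picked by a farthest-point heuristic; real use would still want several starts and some way of choosing the number of groups (BIC is a common choice). The demo data are made up.

```python
import math
import random


def poisson_logpmf(y, mu):
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def farthest_point_start(data, n_groups, rng):
    """Pick spread-out people as starting group means (+0.5 avoids log(0))."""
    chosen = [rng.choice(data)]
    while len(chosen) < n_groups:
        chosen.append(max(data, key=lambda y: min(math.dist(y, c) for c in chosen)))
    return [[v + 0.5 for v in person] for person in chosen]


def fit_group_trajectories(data, n_groups, n_iter=1000, tol=1e-9, seed=0):
    """EM for a finite mixture of Poisson 'trajectories'.

    data: list of per-person count tuples, one entry per time point.
    Each group j has its own mean at every time point; within a group the
    counts are treated as independent Poisson (no within-group variance).
    Returns (group proportions, group means, log-likelihood at the last E-step).
    """
    rng = random.Random(seed)
    n, n_times = len(data), len(data[0])
    means = farthest_point_start(data, n_groups, rng)
    props = [1.0 / n_groups] * n_groups
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each group for each person.
        resp, ll = [], 0.0
        for y in data:
            logs = [
                math.log(props[j])
                + sum(poisson_logpmf(y[t], means[j][t]) for t in range(n_times))
                for j in range(n_groups)
            ]
            top = max(logs)
            log_total = top + math.log(sum(math.exp(v - top) for v in logs))
            ll += log_total
            resp.append([math.exp(v - log_total) for v in logs])
        # M-step: proportions and means are responsibility-weighted averages.
        for j in range(n_groups):
            weight = sum(r[j] for r in resp)
            props[j] = max(weight / n, 1e-12)
            if weight > 1e-12:  # an (almost) empty group keeps its old means
                for t in range(n_times):
                    mean = sum(r[j] * y[t] for r, y in zip(resp, data)) / weight
                    means[j][t] = max(mean, 1e-6)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return props, means, ll


if __name__ == "__main__":
    # Made-up counts: 50 low/low, 30 high/high, 20 high/low people.
    data = ([(0, 1), (1, 0), (1, 1), (2, 1), (0, 0)] * 10
            + [(18, 20), (22, 19), (20, 21)] * 10
            + [(21, 2), (19, 3)] * 10)
    props, means, ll = fit_group_trajectories(data, 3)
    for p, m in sorted(zip(props, means)):
        print(f"proportion {p:.3f}  means {[round(v, 2) for v in m]}")
```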

Thomas wrote:
>As Jeremy pointed out, the data has a count distribution with most
>observations on small values. One of the reasons I ended up with PROC
>TRAJ was that it conveniently supported zero-inflated Poisson as a
>distributional model. Does this matter when using cluster analysis?

It's sort of zero-inflated Poisson, but it's not really, because the
upper limit is censored. I don't know enough about it to know if that
would hurt you. In Mplus you can treat the variables as ordinal, use
logistic/probit regression, and do the equivalent of proc traj.
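The ceiling problem can be written down directly. Below is a minimal sketch, assuming the score is a count with a hard maximum `max_score` (for example, the number of items if you count endorsed symptoms): a zero-inflated Poisson whose probability mass above the ceiling is piled up at the ceiling, i.e. right-censored. The names `lam` (Poisson mean) and `pi` (probability of a structural zero) are this sketch's own.

```python
import math


def poisson_pmf(y, lam):
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson: structural zero with probability pi, else Poisson(lam)."""
    base = (1 - pi) * poisson_pmf(y, lam)
    return base + pi if y == 0 else base


def censored_zip_pmf(y, lam, pi, max_score):
    """ZIP for a score that cannot exceed max_score.

    All ZIP probability above the ceiling is assigned to max_score itself.
    """
    if y < max_score:
        return zip_pmf(y, lam, pi)
    if y == max_score:
        return 1.0 - sum(zip_pmf(k, lam, pi) for k in range(max_score))
    return 0.0


if __name__ == "__main__":
    # With a large mean, the plain ZIP puts little mass at 22 but the
    # censored version piles everything above 22 there.
    print(zip_pmf(22, 30.0, 0.1), censored_zip_pmf(22, 30.0, 0.1, 22))
```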

Here's another paper you might find useful (by some colleagues of
mine), in case you don't know about it:
All Symptoms Are Not Created Equal: The Prominent Role of Hyperarousal
in the Natural Course of Posttraumatic Psychological Distress.
By Schell, Terry L.; Marshall, Grant N.; Jaycox, Lisa H.
Journal of Abnormal Psychology. Vol 113(2), May 2004, 189-197.

They found different rates of decline for different symptoms, and also
that scoring higher on hyperarousal predicted other symptoms.

SR wrote:
> I'm coming late to this discussion, so this question may have already
> been answered. Does your PTSD scale have a response bias
> scale to rule out symptom over-endorsement or under-endorsement? PTSD
> is quite easy to "fake" on essentially every PTSD scale I've ever encountered.

Interesting point - I've never seen anyone worry about this with PTSD
(but then people don't worry about it with any kind of symptom
checklist - depression being probably the most commonly used).

Jeremy

--
Jeremy Miles
Psychology Research Methods Wiki: www.researchmethodsinpsychology.com

SR Millis

Nov 6, 2009, 5:59:36 PM

to meds...@googlegroups.com

Thomas,

What I mean by response bias is whether an individual is responding to your symptom questionnaire in a reliable and valid manner. Most symptom questionnaires do not have any internal scales or indices to determine this. By contrast, of the personality assessment scales, the MMPI-2 and the recent MMPI-2-RF have several scales assessing validity:

VRIN-r Variable Response Inconsistency
TRIN-r True Response Inconsistency
F-r Infrequent Responses
Fp-r Infrequent Psychopathology Responses
Fs Infrequent Somatic Responses
FBS-r Symptom Validity
L-r Uncommon Virtues
K-r Adjustment Validity

These scales allow the examiner to determine whether an examinee over- or under- reported psychiatric or somatic symptoms, randomly responded to test items, responded "yes" or "no" to items regardless of item content.
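To make the logic of two of these scale types concrete: roughly, an inconsistency index (in the spirit of VRIN) counts pairs of similar items answered in contradictory ways, and an infrequency index (in the spirit of F) counts endorsements of items that very few people endorse. The sketch below only illustrates that logic; the item names, pairs, "rare" items and cutoffs are all hypothetical, and it is not the MMPI-2-RF scoring.

```python
def inconsistency_count(responses, item_pairs, max_gap=2):
    """Number of item pairs answered more differently than max_gap allows.

    responses: dict item -> numeric answer for one respondent.
    item_pairs: pairs of items with (near-)identical content, chosen by the
    analyst; the pairs used here are hypothetical, not from any real scale.
    """
    return sum(1 for a, b in item_pairs if abs(responses[a] - responses[b]) > max_gap)


def infrequency_count(responses, rare_items, threshold=3):
    """Number of rarely endorsed items this respondent scored at or above threshold."""
    return sum(1 for item in rare_items if responses[item] >= threshold)


if __name__ == "__main__":
    answers = {"q1": 4, "q2": 0, "q3": 3, "q4": 3, "q5": 4}
    print(inconsistency_count(answers, [("q1", "q2"), ("q3", "q4")]))  # 1
    print(infrequency_count(answers, ["q5"]))  # 1
```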

If your questionnaire data are contaminated by response bias, your statistical findings will be garbage no matter how elegant the statistical analysis plan is.

Scott Millis
