>I am working with a researcher analysing data following a number of individuals at two timepoints. The measurements we are interested in are the number of PTSD symptoms they have at each timepoint.
>The researcher I work with suspects that there are a number of naturally occurring subpopulations among the individuals with different patterns in recovery. Basically she hypothesizes that one group will have a low number of symptoms on each occasion, another group will have a high number of symptoms on both occasions and the last will have a high number of symptoms on the first occasion but few on the last occasion.
>She has asked me if I could estimate the proportions in the sample of the subpopulations.
>My idea is to use trajectory modelling with PROC TRAJ in SAS. I completely lack real-world experience with this type of analysis so I would like to ask if you think it is the proper way to go. Also, is it still useful when there are only two points in time? It seems as if it is most useful when there are more than two time points observed.
>Maybe there is a simpler alternative I am overlooking? Some other type of clustering?
I've never used PROC TRAJ (never even heard of it) but I'd hesitate to call something with only two timepoints a trajectory. Patterns in recovery are too complex to be captured with only two time points.
I am also leery of classifying number of symptoms into "low" and "high" unless there is a clear separation.
That said, does she have hypotheses about which people will fall into which categories, or is she searching for patterns? If the former, a multinomial logistic model might be good. If the latter, then some sort of cluster analysis.
For growth modeling, use of Proc Mixed is straightforward. Singer and
Willett have a great book on this topic to check out. Many resources here:
http://gseacademic.harvard.edu/alda/
Two points don't really constitute repeated measures though....
Jeff
2009/11/4 Peter Flom <peterflomconsult...@mindspring.com>
> [...]
You're certainly welcome! Interesting discussion, and I'm learning, too.
>Let me clarify the reason behind the analysis first. As Peter and Jeff pointed out, two measures are not really a repeated measures study and not enough to model growth properly. This data is from the first follow up in a cohort study and will in the future be extended with more measurements. For the moment however we would like to describe the material so far as well as possible.
That makes sense.
>The reason for trying to estimate patterns of change is not to test any hypothesis about whether subgroups exist or to determine what factors affect recovery. The questions of interest are more like:
>How many of the subjects were affected at the first timepoint but seem to have recovered?
>How many had a medium or high score in the beginning but show no or small signs of recovery?
>How many had a low score, indicating less impact, but seem to have a delayed reaction?
>How many had a low score at both time points, indicating resilience against PTSD?
Now we get into "what the data show" vs. "what theory says". I know a bit about PTSD, and I don't think "low" and "high" adequately describe the possibilities. For one thing, there are subscales to PTSD. Do you have subscale scores?
>Plotting the data T1 vs T2 as Ted Harding suggested (I am really impressed by your ascii graphing skills by the way) shows no clear clusters and I am therefore reluctant to set any cutoffs. I would prefer the data to determine this.
That ascii graph *was* remarkable, wasn't it?
But if that sort of graph doesn't show clear clusters, I am reluctant to recommend cluster approaches. I am (as I've mentioned repeatedly) not a big fan of categorization, and I *am* a fan of using substantive knowledge. If you *must* categorize, can you use substantively derived cutoffs?
The question then would be how to answer these questions without categorization?
I'd start with more graphs ... a density plot of the differences in scores might help.
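A rough sketch of that first look at the differences, in plain Python. The scores are made up for illustration, and a text histogram stands in for a smoothed density plot; the bin width of 2 is an arbitrary assumption.

```python
from collections import Counter

# Hypothetical symptom counts for eight people at two time points (made-up data).
t1 = [12, 3, 15, 0, 9, 2, 14, 1]
t2 = [4, 2, 13, 0, 1, 6, 12, 1]

# Change score for each person: negative values mean fewer symptoms at T2.
diffs = [after - before for before, after in zip(t1, t2)]

def text_histogram(values, bin_width=2):
    """Return one text line per bin, with one '*' per observation."""
    # Floor division puts each integer into the bin starting at a multiple of bin_width.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{start:>4} to {start + bin_width - 1:>3} | "
        lines.append(label + "*" * counts[start])
    return lines

for line in text_histogram(diffs):
    print(line)
```

If the differences pile up in one lump around zero with a tail towards large decreases, that argues against sharply separated "recovered" and "not recovered" groups; two clearly separated lumps would argue for them.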
>I am now looking into clustering as suggested. I am pretty new to this area so I would appreciate it if anyone could give me some help on the following questions.
>As Jeremy pointed out, the data has a count distribution with most observations on small values. One of the reasons I ended up with PROC TRAJ was that it conveniently supported zero-inflated Poisson as a distributional model. Does this matter when using cluster analysis?
>What clustering techniques do you think are useful? From the little I know, K-means clustering should be the proper one?
>What variables should I cluster on? Should I use measurements at T1 and T2 as analysis variables or is the measurement at T1 and the difference (T2-T1) better? Or will it not matter?
>Any tips on how to decide the number of clusters? I am thinking that since the variables are correlated this might induce some problems with some techniques.
There is some literature on choosing the number of clusters in hierarchical methods, using statistics like CCC and so on. There's a decent summary in the SAS documentation. When I've done clustering, I've not found these to be very informative. Often (again, in my experience) the different measures suggest different numbers of clusters.
I would suggest trying k-means with a varying number of k, and seeing which results make sense.
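As a minimal sketch of what "trying k-means with a varying number of k" could look like, here is plain Lloyd's algorithm in standard-library Python. The (T1, T2) pairs are invented, and the function names, the number of restarts and the use of within-cluster sum of squares as the summary are all assumptions for illustration, not anything SAS-specific.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (centers, labels, within-cluster SS)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen observations
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, wss

def best_wss(points, k, restarts=10):
    """Smallest within-cluster SS over several random starts."""
    return min(kmeans(points, k, seed=s)[2] for s in range(restarts))

# Made-up (T1, T2) symptom counts: low-low, high-high and high-low patterns.
points = [(1, 0), (2, 1), (0, 2), (14, 13), (15, 12), (13, 15),
          (14, 2), (12, 1), (15, 3)]

for k in range(1, 6):
    print(k, round(best_wss(points, k), 1))
```

The within-cluster sum of squares always shrinks as k grows, so it cannot pick k on its own; it is only a starting point for judging which solution makes substantive sense.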
My intuition is that it won't matter much whether you use T1 and T2, or T1 and T2-T1, but I am not sure of this at all.
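One way to probe that intuition: going from (T1, T2) to (T1, T2-T1) is a shear rather than a rotation, so the Euclidean distances that k-means relies on are not preserved. The three people below are hypothetical, and whether the change is large enough to alter the clusters in real data is an empirical question.

```python
import math

# Three hypothetical people, scores given as (T1, T2).
a = (10, 10)  # high on both occasions
b = (10, 2)   # high, then low
c = (2, 2)    # low on both occasions

def to_change(p):
    """Re-express (T1, T2) as (T1, T2 - T1)."""
    return (p[0], p[1] - p[0])

for name, (p, q) in {"a-b": (a, b), "a-c": (a, c), "b-c": (b, c)}.items():
    raw = math.dist(p, q)
    changed = math.dist(to_change(p), to_change(q))
    print(f"{name}: raw {raw:.2f}, with change score {changed:.2f}")
```

On the raw scores, a is closer to b than to c; after the transformation a is equally far from both. So the two choices of variables can in principle lead to different clusters.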
I wonder if you have thought about latent class modeling. I don't know
a great deal about it, but it sounds like it may be the thing you're
after, as it looks at patterns within data and how they change over
time.
From the guys I know that use this, they all do it in Mplus.
I am no expert, but it might be worth a look.
bw
Adrian
2009/11/5 Peter Flom <peterflomconsult...@mindspring.com>:
> Thomas Fröjd <tfr...@gmail.com> wrote
>>Hi, Thank you all for your timely replies.
> [...]
There are subscales. Intrusion, avoidance and hyperarousal. Looking
for patterns in change between the individuals sounds like a great
idea. Any ideas about how this could be done?
On 5 Nov, 12:36, Peter Flom <peterflomconsult...@mindspring.com>
wrote:
> [...]
I'm coming late to this discussion, so this question may have already been answered. Does your PTSD scale have a response bias scale to rule out symptom over-endorsement or under-endorsement? PTSD is quite easy to "fake" on essentially every PTSD scale I've ever encountered.
SR Millis
--- On Thu, 11/5/09, Thomas Fröjd <tfr...@gmail.com> wrote:
> From: Thomas Fröjd <tfr...@gmail.com>
> Subject: {MEDSTATS} Re: Trajectory modelling, useful with only two timepoints?
> To: "MedStats" <medstats@googlegroups.com>
> Date: Thursday, November 5, 2009, 7:29 AM
> [...]
> Thomas Fröjd <tfr...@gmail.com> wrote
>>[...]
>>Plotting the data T1 vs T2 as Ted Harding suggested (I am really impressed by your ascii graphing skills by the way) shows no clear clusters and I am therefore reluctant to set any cutoffs. I would prefer the data to determine this.
> That ascii graph *was* remarkable, wasn't it?
That's nothing! (And it's quite easily done). Once upon a time it was the only way to do such things (in the absence of a pen plotter, which was cutting-edge technology at the time).
People are very used to things like Tukey's box-plot which was first developed back in the 1960's & 70's when all that most people had was printout from a teletype or line printer (fixed-space in most cases), and Tukey devised many of his representations with the aim of presenting information as vividly as possible on such primitive devices.
An ASCII Art instance of a box-plot can be seen on
Example: a plain-text version might look like this:

                             +-----+-+
   *           o     |-------|     | |---|
                             +-----+-+

 +---+---+---+---+---+---+---+---+---+---+---+---+   number line
 0   1   2   3   4   5   6   7   8   9  10  11  12

and computers were programmed to generate such things.
Much less in use now, but very useful then, was the stem-and-leaf plot, again an instance of ingenious ASCII art, which cleverly combined both the graphical impact of the histogram with the numerical information embodied in printed numerals. Again, an example can be seen in
Here, the "stem" is at the 10's, and the "leaves" are the list of units digits (in increasing order), stem on the left of the vertical line and leaves on the right.
Thus you can directly see both the histogram (with bins of width 10) and all the individual numerical values -- which is more information than you could get from a standard histogram with bin-width 10; however, you could indicate that too in a standard histogram by over-plotting with the actual data points (though even then the actual numerical values would be missing).
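For anyone who wants to try this on their own numbers, here is a minimal sketch of a stem-and-leaf display in Python. It assumes non-negative integer scores and stems at the 10's, as in the description above; the data are made up.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf lines for non-negative integers, stems at the 10's."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)  # stem = tens, leaf = units digit
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        # Empty stems are kept so the outline still reads as a histogram.
        digits = " ".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | " + digits)
    return lines

scores = [3, 7, 12, 15, 15, 18, 21, 24, 40, 41, 47]  # made-up data
for line in stem_and_leaf(scores):
    print(line)
```

The length of each row gives the histogram, and every original value can be read back from its stem and leaf.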
I have put an example of such over-plotting in the Files section of the MedStats website under the name
histoplus.jpg
It still doesn't have the "punch" of the stem-and-leaf plot!
Ted.
We have been using the IES-R (R for revised) scale, the 22-question
version.
I must admit I don't really know what a response bias scale is. The
things I have read about response bias mostly take it up in the
context of survey design and how to design questions. As I understand
it from some googling, it is a tool for adjusting for a bias towards choosing
extreme values on Likert scales. Am I right?
It seems like a useful thing to do, but I can't find anything about the IES-R
and response bias scales when I search online now. I will try to find the manual
and have a look.
Best regards.
On 5 Nov, 13:45, SR Millis <srmil...@yahoo.com> wrote:
> [...]
These are known by other names, too, such as "lie scales"; the intent is to
detect people who are giving incorrect answers on purpose, to appear "good" or "bad".
The best known of these is, I think, the Marlowe-Crowne Social Desirability Scale. These scales often ask respondents to respond to
statements that very few people could honestly endorse ....
e.g.
I have never told a lie
Rude people don't annoy me
etc.
As I recall, there are subscales for different types of lying: external and internal.
-----Original Message-----
>From: Thomas Fröjd <tfr...@gmail.com>
>Sent: Nov 5, 2009 8:31 AM
>To: MedStats <medstats@googlegroups.com>
>Subject: {MEDSTATS} Re: Trajectory modelling, useful with only two timepoints?
>[...]
>There are subscales. Intrusion, avoidance and hyperarousal. Looking for patterns in change between the individuals sounds like a great idea. Any ideas about how this could be done?
OK, now you've got 6 scores (3 at each time point, instead of 1). With this, there are a lot more possibilities.
Again, I'd start with graphs: the x axis would be time, the y axis score, with a line for each person on each pair of measures. Whether this can be done on one graph depends on N. If you have more than about 30 people, the lines will be hard to tell apart; in that case you can try dividing the sample randomly, or by pre score, and plotting each part separately.
And now, you've got something to cluster, if you decide to go that route. You might want to have a PRE cluster and a POST cluster and see who moves from one to the other.
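A small sketch of the "who moves from one to the other" step, assuming each person already has a PRE and a POST cluster label. The labels below are invented; in practice clusters fitted separately at the two occasions would first have to be matched up by inspecting their centers.

```python
from collections import Counter

# Hypothetical cluster labels for the same seven people at each occasion (made-up).
pre = ["high", "high", "low", "high", "low", "low", "high"]
post = ["low", "high", "low", "low", "low", "high", "high"]

# Count each (PRE, POST) combination: a simple transition table.
transitions = Counter(zip(pre, post))
n = len(pre)

for (before, after), count in sorted(transitions.items()):
    print(f"{before:>4} -> {after:<4} {count}  ({count / n:.0%})")
```

The off-diagonal cells (high to low, low to high) are the people who changed cluster, which is close to the "recovered" and "delayed reaction" questions raised earlier in the thread.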
> I wonder if you have thought about latent class modeling. I don't know
> a great deal about it, but it sounds like it may be the thing you're
> after, as it looks at patterns within data and how they change over
> time.
> From the guys I know that use this, they all do it in Mplus.
Hi Adrian
Proc Traj can be thought of as a special case of a latent class model. (Specifically, if you do a latent class model with groups defined on the slopes and intercepts, and you fix the variances of the latent variables within groups to zero, you've got a proc traj model).
Thomas wrote:
>As Jeremy pointed out, the data has a count distribution with most observations on small values. One of the reasons I ended up with PROC TRAJ was that it conveniently supported zero-inflated Poisson as a distributional model. Does this matter when using cluster analysis?
It's sort of zero-inflated Poisson, but not really, because the scores are censored at the upper limit. I don't know enough about it to know if that would hurt you. In Mplus you can treat the variables as ordinal, and so use logistic/probit regression, and do the equivalent of Proc Traj.
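To make the distinction concrete, here is a sketch of the zero-inflated Poisson probabilities next to a version with a capped scale. The function names, the parameter values and the cap of 10 are illustrative assumptions; the capped version simply piles all probability above the cap onto the top score, which is one way to read "censored at the upper limit".

```python
import math

def zip_pmf(y, lam, pi):
    """P(Y = y) under a zero-inflated Poisson: with probability pi the count is
    an extra zero, otherwise it is drawn from Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return (pi if y == 0 else 0.0) + (1 - pi) * poisson

def capped_zip_pmf(y, lam, pi, y_max):
    """Same model when counts above y_max cannot be recorded: all of that
    probability mass ends up at y_max (right-censoring)."""
    if y < y_max:
        return zip_pmf(y, lam, pi)
    if y == y_max:
        return 1.0 - sum(zip_pmf(j, lam, pi) for j in range(y_max))
    return 0.0

lam, pi, y_max = 6.0, 0.3, 10  # illustrative values only
for y in (0, 1, y_max):
    print(y, round(zip_pmf(y, lam, pi), 4), round(capped_zip_pmf(y, lam, pi, y_max), 4))
```

The two versions agree below the cap and differ only at the top score, so how much the censoring matters depends on how many people sit at or near the maximum.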
Here's another paper you might find useful (by some colleagues of mine), in case you don't know about it: All Symptoms Are Not Created Equal: The Prominent Role of Hyperarousal in the Natural Course of Posttraumatic Psychological Distress. By Schell, Terry L.; Marshall, Grant N.; Jaycox, Lisa H. Journal of Abnormal Psychology. Vol 113(2), May 2004, 189-197.
They found different rates of decline for different symptoms, and also that scoring higher on hyperarousal predicted other symptoms.
SR wrote:
> I'm coming late to this discussion, so this question may have already
> been answered. Does your PTSD scale have a response bias
> scale to rule out symptom over-endorsement or under-endorsement? PTSD
> is quite easy to "fake" on essentially every PTSD scale I've ever encountered.
Interesting point - I've never seen anyone worry about this with PTSD (but then people don't worry about it with any kind of symptom checklist - depression being probably the most commonly used).
What I mean by response bias is whether an individual is responding to your symptom questionnaire in a reliable and valid manner. Most symptom questionnaires do not have any internal scales or indices to determine this. By contrast, of the personality assessment scales, the MMPI-2 and the recent MMPI-2-RF have several scales assessing validity:
These scales allow the examiner to determine whether an examinee over- or under-reported psychiatric or somatic symptoms, responded randomly to test items, or responded "yes" or "no" to items regardless of item content.
If your questionnaire data are contaminated by response bias, your statistical findings will be garbage no matter how elegant the statistical analysis plan is.
Scott Millis
--- On Thu, 11/5/09, Thomas Fröjd <tfr...@gmail.com> wrote:
> From: Thomas Fröjd <tfr...@gmail.com>
> Subject: {MEDSTATS} Re: Trajectory modelling, useful with only two timepoints?
> To: "MedStats" <medstats@googlegroups.com>
> Date: Thursday, November 5, 2009, 8:31 AM
> [...]