I've never used PROC TRAJ (never even heard of it) but I'd hesitate to call something with only two timepoints a trajectory. Patterns in recovery are too complex to be captured with only two time points.
I am also leery of classifying number of symptoms into "low" and "high" unless there is a clear separation.
That said, does she have hypotheses about which people will fall into which categories, or is she searching for patterns? If the former, a multinomial logistic model might be good. If the latter, then some sort of cluster analysis.
HTH
Peter
Peter L. Flom, PhD
Statistical Consultant
Website: www DOT peterflomconsulting DOT com
Writing: http://www.associatedcontent.com/user/582880/peter_flom.html
Twitter: @peterflom
You're certainly welcome! Interesting discussion, and I'm learning, too.
>Let me clarify the reason behind the analysis first. As Peter and Jeff
>pointed out, two measures are not really a repeated measures study and
>not enough to model growth properly. This data is from the first
>follow up in a cohort study and will in the future be extended with
>more measurements. For the moment however we would like to describe
>the material so far as well as possible.
>
That makes sense.
>The reason for trying to estimate patterns of change is not to test
>any hypothesis about whether subgroups exist or to determine what
>factors affect recovery. The questions of interest are more like:
>
>How many of the subjects were affected at the first timepoint but
>seem to have recovered?
>How many had a medium or high score in the beginning but show no or
>small signs of recovery?
>How many had a low score, indicating less impact, but seem to have a
>delayed reaction?
>How many had a low score at both time points, indicating resilience
>against PTSD?
>
Now we get into "what the data show" vs. "what theory says". I know a bit about PTSD, and I don't think "low" and "high" adequately describe the possibilities. For one thing, there are subscales to PTSD. Do you have subscale scores?
>
>Plotting the data T1 vs T2 as Ted Harding suggested (I am really
>impressed by your ascii graphing skills by the way) shows no clear
>clusters and I am therefore reluctant to set any cutoffs. I would
>prefer the data to determine this.
>
That ascii graph *was* remarkable, wasn't it?
But if that sort of graph doesn't show clear clusters, I am reluctant to
recommend cluster approaches. I am (as I've mentioned repeatedly) not a big fan of categorization, and I *am* a fan of using substantive knowledge. If you *must* categorize, can you use substantively derived cutoffs?
The question then would be how to answer these questions without categorization?
I'd start with more graphs ... a density plot of the differences in scores might help.
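As a rough stand-in for that density plot, the sketch below tabulates the T2 - T1 differences and prints a crude text histogram. The scores are made up purely for illustration, and the function name difference_histogram is hypothetical; with real data a proper kernel density plot from a statistics package would be preferable.

```python
from collections import Counter

# Hypothetical (T1, T2) symptom counts, invented for illustration only.
scores = [(0, 0), (1, 0), (5, 1), (7, 6), (2, 6),
          (0, 1), (9, 9), (4, 0), (3, 3), (1, 0)]


def difference_histogram(pairs):
    """Return {difference: count} for T2 - T1, including empty bins in between."""
    counts = Counter(t2 - t1 for t1, t2 in pairs)
    lo, hi = min(counts), max(counts)
    return {d: counts.get(d, 0) for d in range(lo, hi + 1)}


hist = difference_histogram(scores)
for d, n in hist.items():
    # One asterisk per person; negative differences mean fewer symptoms at T2.
    print(f"{d:+3d} | {'*' * n}")
```

A pile-up near zero with a long negative tail would suggest "mostly stable, some recovered" without imposing any cutoff on the scores themselves.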
>I am now looking into clustering as suggested. I am pretty new to this
>area so I would appreciate it if anyone could give me some help on the
>following questions.
>
>As Jeremy pointed out, the data has a count distribution with most
>observations on small values. One of the reasons I ended up with PROC
>TRAJ was that it conveniently supported zero-inflated Poisson as a
>distributional model. Does this matter when using cluster analysis?
>
>What clustering techniques do you think are useful? From the little I
>know K-means clustering should be the proper one?
>
>What variables should I cluster on? Should I use measurements at T1
>and T2 as analysis variables or is the measurement at T1 and the
>difference better (T2-T1)? Or will it not matter?
>
>Any tips on how to decide the number of clusters? I am thinking that
>since the variables are correlated this might induce some problems
>with some techniques.
>
There is some literature on choosing the number of clusters in hierarchical methods, using statistics like CCC and so on. There's a decent summary in the SAS documentation. When I've done clustering, I've not found these to be very informative. Often (again, in my experience) the different measures suggest different numbers of clusters.
I would suggest trying k-means with a varying number of k, and seeing which results make sense.
My intuition is that it won't matter much whether you use T1 and T2, or T1 and T2-T1, but I am not sure of this at all.
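A minimal sketch of that "vary k and look" loop, written as a plain Lloyd's-algorithm k-means in standard-library Python so it is self-contained. The data points, the function names (sqdist, mean_point, nearest, kmeans) and the seed are all assumptions for illustration; this is not what SAS or any particular package does internally. Note that k-means works on squared Euclidean distance and knows nothing about counts or zero inflation.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(pts)
    return tuple(sum(coord) / n for coord in zip(*pts))


def nearest(points, centers):
    """Index of the closest center for each point."""
    return [min(range(len(centers)), key=lambda j: sqdist(p, centers[j]))
            for p in points]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels, within-cluster SS)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # start from k observed points
    for _ in range(iters):
        labels = nearest(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its old center.
            new_centers.append(mean_point(members) if members else centers[j])
        if new_centers == centers:           # converged
            break
        centers = new_centers
    labels = nearest(points, centers)
    wss = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, wss


# Hypothetical (T1, T2) scores, invented for illustration only.
t1_t2 = [(0, 0), (1, 0), (0, 1), (2, 1), (8, 1), (9, 2), (7, 0), (8, 8), (9, 7), (1, 6)]
# The same people expressed as (T1, T2 - T1).
t1_change = [(t1, t2 - t1) for t1, t2 in t1_t2]

for k in range(1, 5):
    _, _, wss_raw = kmeans(t1_t2, k)
    _, _, wss_chg = kmeans(t1_change, k)
    print(f"k={k}: WSS on (T1, T2) = {wss_raw:.1f}, on (T1, T2-T1) = {wss_chg:.1f}")
```

Moving from (T1, T2) to (T1, T2-T1) changes the Euclidean distances between people, so the two runs need not give identical clusters; running both is a cheap way to check whether the choice matters for a given data set.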
SR Millis
--- On Thu, 11/5/09, Thomas Fröjd <tfr...@gmail.com> wrote:
That's nothing! (And it's quite easily done). Once upon a time
it was the only way to do such things (in the absence of a pen
plotter, which was cutting-edge technology at the time).
People are very used to things like Tukey's box-plot which was
first developed back in the 1960's & 70's when all that most people
had was printout from a teletype or line printer (fixed-space in
most cases), and Tukey devised many of his representations with the
aim of presenting information as vividly as possible on such primitive
devices.
An ASCII Art instance of a box-plot can be seen on
http://en.wikipedia.org/wiki/Box_plot
under
Example
A plain-text version might look like this:
                            +-----+-+
  *           o     |-------|     | |---|
                            +-----+-+
+---+---+---+---+---+---+---+---+---+---+---+---+   number line
0   1   2   3   4   5   6   7   8   9  10  11  12
and computers were programmed to generate such things.
Much less in use now, but very useful then, was the stem-and-leaf
plot, again an instance of ingenious ASCII art, which cleverly
combined both the graphical impact of the histogram with the
numerical information embodied in printed numerals. Again, an
example can be seen in
http://en.wikipedia.org/wiki/Stem-and-leaf_plot
 4 | 4 6 7 9
 5 |
 6 | 3 4 6 8 8
 7 | 2 2 5 6
 8 | 1 4 8
 9 |
10 | 6
key: 5|4=54
leaf unit: 1.0
stem unit: 10.0
which encapsulates the distribution of the data (pre-sorted):
44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106
Here, the "stem" is at the 10's, and the "leaves" are the list
of units digits (in increasing order), stem on the left of the
vertical line and leaves on the right.
Thus you can directly see both the histogram (with bins of width 10)
and all the individual numerical values -- which is more information
than you could get from a standard histogram with bin-width 10;
however, you could indicate that too in a standard histogram by
over-plotting with the actual data points (though even then the
actual numerical values would be missing).
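The construction just described can be sketched in a few lines of Python. The function name stem_and_leaf and its stem_unit parameter are hypothetical names chosen for this illustration; the data are the 17 values listed above.

```python
from collections import defaultdict


def stem_and_leaf(values, stem_unit=10):
    """Return the display lines of a stem-and-leaf plot for non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        # Stem is the tens part, leaf is the units digit.
        leaves[v // stem_unit].append(v % stem_unit)
    lo, hi = min(leaves), max(leaves)
    width = len(str(hi))
    lines = []
    for stem in range(lo, hi + 1):
        # Empty stems are printed too, so gaps in the data stay visible.
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>{width}} | {row}".rstrip())
    return lines


data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
for line in stem_and_leaf(data):
    print(line)
```

Because every leaf is printed as a single character in a fixed-width font, the length of each row is proportional to the count in that bin, which is what gives the display its histogram shape.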
I have put an example of such over-plotting in the Files section
of the MedStats website under the name
histoplus.jpg
It still doesn't have the "punch" of the stem-and-leaf plot!
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 05-Nov-09 Time: 13:17:09
------------------------------ XFMail ------------------------------
OK, now you've got 6 scores (3 at each time point, instead of 1). With this, there
are a lot more possibilities.
Again, I'd start with graphs: x axis would be time, y axis score, and a line for each person
on each pair of measures. Whether this can be done on one graph depends on N. If you have more than
about 30 people, the lines will be hard to distinguish; you can try dividing the sample randomly, or dividing by pre score.
And now, you've got something to cluster, if you decide to go that route. You might want to have a PRE cluster and a POST cluster and see who moves from one to the other.
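One way to "see who moves" is a simple cross-tabulation of PRE cluster by POST cluster. The sketch below assumes each person already has a cluster label at each time point; the person IDs, labels and the function name transition_counts are all hypothetical.

```python
from collections import Counter

# Hypothetical cluster labels per person, e.g. from two separate cluster runs.
pre_cluster = {"p01": "A", "p02": "A", "p03": "B", "p04": "B", "p05": "A"}
post_cluster = {"p01": "A", "p02": "B", "p03": "B", "p04": "A", "p05": "A"}


def transition_counts(pre, post):
    """Count people by (PRE cluster, POST cluster), using only people present in both."""
    return Counter((pre[p], post[p]) for p in pre if p in post)


moves = transition_counts(pre_cluster, post_cluster)
for (a, b), n in sorted(moves.items()):
    print(f"{a} -> {b}: {n}")
```

One caution: labels from two separate cluster runs are arbitrary, so cluster "A" at PRE only corresponds to cluster "A" at POST if the clusters have been matched on their profiles first.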
HTH
Peter
Hi Adrian
Proc Traj can be thought of as a special case of a latent class model.
(Specifically, if you do a latent class model with groups defined on
the slopes and intercepts, and you fix the variances of the latent
variables within groups to zero, you've got a proc traj model).
Thomas wrote:
>As Jeremy pointed out, the data has a count distribution with most
>observations on small values. One of the reasons I ended up with PROC
>TRAJ was that it conveniently supported zero-inflated Poisson as a
>distributional model. Does this matter when using cluster analysis?
It's sort of zero-inflated Poisson, but it's not really because the
upper limit is censored. I don't know enough about it to know if that
would hurt you. In Mplus you can treat the variables as ordinal, and
so use logistic/probit regression, and do the equivalent of proc traj.
Here's another paper you might find useful (by some colleagues of
mine), in case you don't know about it:
All Symptoms Are Not Created Equal: The Prominent Role of Hyperarousal
in the Natural Course of Posttraumatic Psychological Distress.
By Schell, Terry L.; Marshall, Grant N.; Jaycox, Lisa H.
Journal of Abnormal Psychology. Vol 113(2), May 2004, 189-197.
They found different rates of decline for different symptoms, and also
that scoring higher on hyperarousal predicted other symptoms.
SR wrote:
> I'm coming late to this discussion, so this question may have already
> been answered. Does your PTSD scale have a response bias
> scale to rule out symptom over-endorsement or under-endorsement? PTSD
> is quite easy to "fake" on essentially every PTSD scale I've ever encountered.
Interesting point - I've never seen anyone worry about this with PTSD
(but then people don't worry about it with any kind of symptom
checklist - depression being probably the most commonly used).
Jeremy
--
Jeremy Miles
Psychology Research Methods Wiki: www.researchmethodsinpsychology.com