Documenting analyses and data edits

Allan Reese (Cefas)

Oct 5, 2010, 5:47:17 AM
to meds...@googlegroups.com

Bendix Carstensen <b...@steno.dk> Oct 03 linked a question about viewing output to this issue, arriving at:

“It's immaterial how you arrive at the script. The point is that you run it in batch mode at the end to get a coherent documentation of what you did from reading data to the results you present. Otherwise your results do not qualify as reproducible.”

That’s true but to my mind does not go far enough.  Increasingly, labs have exacting standards to determine that chemicals are pure and fit for use, instruments calibrated, and staff trained and certified for procedures (other research areas have equivalents).  It’s so depressing that the data then get put through software about which ISO standard 17025 (clause 5.4.7.2) says only: “Commercial off-the-shelf software (eg wordprocessing, database and statistical programmes [sic]) in general use within their designated application range may be considered to be sufficiently validated, … the laboratory shall ensure that computer software developed by the user is documented in sufficient detail and is suitably validated as being adequate for use, [and] procedures are established and implemented for protecting the data.”

 

What that means in practice is that the data often go into Excel, ad hoc “corrections” are then applied and the file overwritten, and formulas and data get mixed up on one sheet.  The data may be extracted either by saving a text file or by cut’n’paste to create an R dataset.  R is used “because it’s free”.  Without intending disparagement, many people are now using R with no grounding in programming (a computer does what you tell it, not what you want), numerical analysis (does 1.5 equal 3/2 in a floating-point test?), or data structures (eg the treatment of missing values).
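To make those two parentheticals concrete, a minimal R sketch (the numbers are invented):

x <- 3/2
x == 1.5                            # TRUE here, but exact equality tests on floats are fragile:
0.1 + 0.2 == 0.3                    # FALSE - binary floating point cannot hold these exactly
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE  - compare with a tolerance instead

y <- c(5.1, NA, 4.8)                # missing values propagate unless handled explicitly
mean(y)                             # NA
mean(y, na.rm = TRUE)               # 4.95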

 

Running analyses from a batch file would be a considerable step forward, but still means only that the final program is retained, with little record of how it was developed, what changes were made to the data en route, and what test data had been used to validate the user-written R program.  How can you link an output value to a lab reading when the data may have had units changed, observations aggregated or merged, outliers rejected or values imputed, and transformations applied?  Even if you change no *value* in Excel, it will still timestamp the file as “last modified” if you change a column width or font or add a graph.
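One way to keep that record is to make every edit a line of a script rather than a spreadsheet action; a minimal R sketch, with hypothetical file and column names and an arbitrary outlier rule:

raw <- read.csv("lab_readings_raw.csv")                  # file as supplied, never overwritten

dat <- raw
dat$conc_g_l <- dat$conc_mg_l / 1000                     # unit change documented here, not hidden in a cell
excluded <- !is.na(dat$conc_g_l) & dat$conc_g_l > 10     # explicit, revisable outlier rule
cat("Excluded", sum(excluded), "readings with conc > 10 g/L\n")
dat <- dat[!excluded, ]

write.csv(dat, "lab_readings_clean.csv", row.names = FALSE)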

 

My own solution is to use Stata or SPSS, each of which I’ve configured to keep a record of all commands run, not just those issued while I’ve switched on a log or saved an output file.  Stata is particularly helpful in that any data change made in the interactive editor generates a command in the log, a session will not exit without an explicit override if the data have been changed, and it will not by default overwrite a data file.  So I get the data into a system file as early as possible (from the supplied Excel sheet, E&OE), end the datafile name with yymmdd (which should agree with the system “date modified” and keeps files in name and date order), and subsequently always (almost always, I’m human) save as a new file with the date of change.  The command log files are text files whose names also end in yymmdd, with a new file created every day.  This allows simple text searches to find when I worked on any dataset or variable, and lets all changes to the dataset be extracted and analyses rerun.  Not often enough, I write comments while working, which also go into the log file.  Such logs, of course, document all the analyses, and also all the blunders and blind alleys that are written out of the sanitized official report.
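A rough R analogue of this dated-file-and-log habit (file names are hypothetical, and this is a sketch rather than the Stata setup described above):

stamp <- format(Sys.Date(), "%y%m%d")                       # yymmdd keeps name order = date order
raw   <- read.csv("lab_readings.csv")                       # data as supplied
saveRDS(raw, file = paste0("lab_readings_", stamp, ".rds"))

logfile <- paste0("cmdlog_", stamp, ".txt")                 # one plain-text log per day
cat(format(Sys.time()), "- ran clean_lab_readings.R on lab_readings.csv\n",
    file = logfile, append = TRUE)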

 

Allan

 


 

Neil Shephard

Oct 5, 2010, 8:06:37 AM
to meds...@googlegroups.com
At this juncture it seems pertinent to mention the Reproducible
Research resource...

http://www.reproducibleresearch.org/

By no means scripture, but it does have useful resources on software
and articles on reproducible research.

One that I read a number of years ago that made a huge change in my
approach to work was http://www.bepress.com/bioconductor/paper2/

I too am continually amazed at the lengths people go to to generate
their data in a documented and reproducible manner, only to then
completely disregard the same standards when managing that data.


Neil

--
"Our civilization would be pitifully immature without the intellectual
revolution led by Darwin" - Motoo Kimura, The Neutral Theory of
Molecular Evolution

Email - nshe...@gmail.com
Website - http://slack.ser.man.ac.uk/
Photos - http://www.flickr.com/photos/slackline/

Frank Harrell

Oct 5, 2010, 10:46:28 AM
to MedStats
These thoughts are very helpful. I'll just add that if we got to the
point of disallowing the use of Excel in science, and of scripting all
the steps used to arrive at the answers, results would be reproducible
and we'd all be better off. I would rather have code to review (from
almost any language except for the ugly SAS macro language) than the
authors' all-too-brief synopsis of the analyses.

Frank


John Hughes

Oct 5, 2010, 2:09:23 PM
to meds...@googlegroups.com
I would go further than Frank and disallow the use of proprietary
$oftware because you can't examine the source code. Who knows what's
going on under the hood?

Bearing in mind FDA 21 CFR Part 11 (Electronic Records; Electronic
Signatures), one solution is to use GNU make
(http://www.gnu.org/software/make/manual/make.html).

GNU make lets you put all your programs/macros/data into a hierarchy
of dependencies. When you run make, it automatically reruns only those
steps whose own files or whose prerequisites have changed. All output
can be saved to a log file.
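A minimal sketch of such a Makefile (file names are hypothetical; recipe lines must start with a tab):

clean.csv: raw.csv clean.R
	Rscript clean.R raw.csv clean.csv

results.rds: clean.csv analyse.R
	Rscript analyse.R clean.csv results.rds

report.pdf: results.rds report.Rnw
	Rscript -e "Sweave('report.Rnw')"
	pdflatex report.tex

Running 'make report.pdf' after editing only analyse.R reruns the analysis and report steps but leaves the data-cleaning step untouched.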

If you use emacs with version control as your text editor, you can
also get a record of all the changes that have taken place, which
provides a complete audit trail.

Hope this helps.

John

Bruce Weaver

Oct 5, 2010, 2:11:15 PM
to MedStats
On Oct 5, 10:46 am, Frank Harrell <f.harr...@vanderbilt.edu> wrote:
> These thoughts are very helpful.  I'll just add that if we got to the
> point of disallowing the use of Excel in science, and in scripting all
> steps used to arrive at the answers, results would be reproducible and
> we'd all be better off.  I would rather have code to review (from
> almost any language except for the ugly SAS macro language) than the
> authors' all-too-brief synopsis of the analyses.
>
> Frank


That seems very doable these days, too. I know that some journals
have "additional material" that is available through the website.
There's no reason why the code used to perform the analysis could not
be part of it.

--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."

Peter Flom

Oct 5, 2010, 2:19:58 PM
to meds...@googlegroups.com
John Hughes wrote

<<<
I would go further than Frank and disallow the use of proprietary
$oftware because you can't examine the source code. Who knows what's
going on under the hood?
>>>

Well, really though, even with open source code, such as R, how many
actually examine the code to a sufficient extent to know it is correct? I
have even seen R packages that explicitly include a warning that this is
beta code and not to be fully trusted. I wonder how many packages OUGHT to
have such warnings? Has a package been fully tested? How fully? OK, the
core R packages are surely fine. And to some extent, you can rely on some
people to write good code.

But how many of us would fully understand, say, plyr() ?
Or would take the time to fully understand any of the cutting edge packages?
(And here I mean, look at each line of code and figure out what it is
doing). If we did this, would we actually have time to do any analysis?

Would anyone at NIH or wherever our grant is submitted take the time to even
MINIMALLY understand the code?

We have to rely on the coders.

Further, we can modify the code in R, and, in modifying it, mess it up. Do
we then have to send all modifications of code with each article?

That doesn't mean proprietary code (SAS, SPSS or whatever) is necessarily
correct. But its vendors have a vested interest in not producing code that
is wrong.

Peter


Neil Shephard

Oct 6, 2010, 2:50:46 AM
to meds...@googlegroups.com
On Tue, Oct 5, 2010 at 6:19 PM, Peter Flom
<peterflom...@mindspring.com> wrote:
> John Hughes wrote

>
> But how many of us would fully understand, say, plyr() ?
> Or would take the time to fully understand any of the cutting edge packages?
> (And here I mean, look at each line of code and figure out what it is
> doing).  If we did this, would we actually have time to do any analysis?

It's not necessary for _everyone_ to be able to do this: peer
groups/users as a whole can check open-source code for errors.
Not everyone who uses plyr() will have the time to check it, but some
will, and one would hope they would feed back any errors found. No
matter how many people use proprietary software, none of them can
check the code, only whether the results are accurate (by
cross-validation against a second piece of software).
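As a concrete illustration (assuming the plyr package is installed), R makes this kind of inspection straightforward:

library(plyr)
ddply                  # typing a function name without () prints its R source
getAnywhere("ddply")   # locates the definition even when the package is not attached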

A useful paper in this area is..

@article{Kee07,
author = {Keeling},
title = {A comparative study of the reliability of nine
statistical software packages},
journal = {Computational Statistics \& Data Analysis},
year = {2007},
month = {May},
volume = {51},
pages = {3811--3831},
}

> That doesn't mean proprietary code (SAS, SPSS or whatever) is necessarily
> correct.  But they have a vested interest in not producing code that is
> wrong.
>

Those who write open-source code have a vested interest in writing
correct code too! Most of the time the packages/software will have
been developed because of the authors' own need to solve a problem
(and to solve it accurately).

Neil

--
"Our civilization would be pitifully immature without the intellectual
revolution led by Darwin" - Motoo Kimura, The Neutral Theory of
Molecular Evolution

Email - nshe...@gmail.com

Peter Flom

Oct 6, 2010, 6:38:32 AM
to meds...@googlegroups.com
Neil Shephard wrote (in part)

<<<
Its not necessarily the need for _everyone_ to be able to do this, but
peer groups/users as a whole can check open-source code for errors.
Not everyone who uses plyr() will have the time to check it, but some
will and one would hope they would feed back any errors found. No
matter how many people use proprietary software none of them are able
to check the code, only that the results are accurate (by
cross-validation in a second piece of software).
>>>

I will check out the paper in your message, thanks.

I wonder how often cross validation is actually done? Or even possible? One
of the often touted advantages of R is that it has "cutting edge"
statistics. This is correct; many statistical programmers and statisticians
write new methods in R. But these methods, by their nature, can't be
cross-validated. Of course, we can check that the answers make sense ....
if they are off by huge amounts or in obvious ways, we can detect it. But
what of more subtle errors?

I am certainly not against R or other open software. I use R myself (in
addition to SAS). But open source is not a panacea.

Peter

Chris Hunt

Oct 7, 2010, 8:19:50 AM
to meds...@googlegroups.com
Hello,

I have a set of data that include measurements made by two different
machines. One is a basic machine that measures six different variables
relating to eye shape and the other is a far more sophisticated machine that
measures 20 further variables (plus those that the first machine can do).
I would like to see how much better the variables collected on the
sophisticated machine are at predicting contact lens fit than those from the
basic machine.

I have used stepwise regression analysis to get two regression models, one
with the variables collected on the basic machine and one with all variables.
My two final models each have one factor; one has an adjusted R^2 of 0.07
and the other 0.14. Is it possible to say the latter is statistically better
than the former, and if so how do I do it?

Many thanks,

Chris

Peter Flom

Oct 7, 2010, 8:31:57 AM
to meds...@googlegroups.com
If you use stepwise, you can't really say anything, except that your results
are all wrong.
See Harrell, Regression Modeling Strategies (an excellent book), or Flom and
Cassell, "Stopping Stepwise: Why stepwise variable selection methods are
wrong, and what you should use" (a paper presented in various places), or
other books and online resources. Stepwise methods are just wrong.

Now, how to actually answer your question. What is your sample size? Are
the two machines used on the same people, or different people?

If it is large enough, you could run a regression with ALL the variables
from each machine, and then compare R^2. It is virtually certain that the
model with more variables will have a higher R^2; if the two samples are the
same, then delete "virtually". Is that what you want to know? Or do you
want to know HOW much better? Or (for some reason) do you want to know if
the difference is statistically significant?

(Of course, you'll want to check each regression model for the assumptions
and for collinearity).

One option is to compare various fit indices, like AIC, AICc, BIC/SBC .... I
don't have a strong preference among them, and in my experience they tend to
agree.
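In R, for instance (the variable names here are invented), that comparison is one call per index:

fit_basic <- lm(fit_score ~ curv_basic, data = eyes)   # 'eyes' is a hypothetical data frame
fit_new   <- lm(fit_score ~ elev_new,   data = eyes)
AIC(fit_basic, fit_new)   # smaller is better
BIC(fit_basic, fit_new)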

Another idea is to use graphs of the "degree of fit" of each model.

If the samples are NOT very large, I suggest using LASSO or LAR, which are
available in both SAS and R, and maybe other software as well.

HTH

Peter

Ted Harding

Oct 7, 2010, 8:45:00 AM
to meds...@googlegroups.com

In addition to Peter's comments above, I would supplement his question:


"Are the two machines used on the same people, or different people?"

If they were used on the same people, how do the variables which
were measured by both machines compare? (I.e. when the same variables
are measured on the same person by the two different machines,
how closely do they agree?).

The answer to this question could be very important in determining
a good approach to your original question!

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.h...@wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Oct-10 Time: 13:44:56
------------------------------ XFMail ------------------------------

Chris Hunt

Oct 7, 2010, 9:10:48 AM
to meds...@googlegroups.com
Thanks for your swift replies.

The machines were used on the same people and the variables that were
collected on both were the same. There were 50 people.

Obviously if I run a regression analysis with all predictors from each
machine, the new one will produce a better model, although many of these
variables are related, so the model would be flawed!  I think my real
question is: if I pick the best predictor variable from the basic machine
and the best one or two independent predictors from the new machine, which
will give a better model?

Regards
Chris


Bruce Weaver

Oct 7, 2010, 10:13:27 AM
to MedStats
On Oct 7, 9:10 am, Chris Hunt <cr...@hotmail.com> wrote:
> Thanks for your swift replies.
>
> The machines were used on the same people and the variables that were
> collected on both were the same.  There were 50 people.
>
> Obviously if I run a regression analysis with all predictors from each
> machine, the new one will produce a better model.  Although many of these
> variables are related so the model would be flawed!  I think my real
> question is, if I pick the best predictor variable from the basic machine
> and the best one or two independent predictors from the new machine, which
> will give a better model?
>
> Regards
> Chris

Hi Chris. Your sample size is not anywhere near large enough to run a
model with all 20 explanatory variables. Here's a short note you
might find useful.

www.angelfire.com/wv/bwhomedir/notes/linreg_rule_of_thumb.txt

Cheers,
Bruce

Chris Hunt

Oct 7, 2010, 10:30:21 AM
to meds...@googlegroups.com
That is why I have selected the best predictor for the model and not used
them all.
Thanks



Peter Flom

Oct 7, 2010, 10:39:14 AM
to meds...@googlegroups.com
The trouble would be in how you got the 'best predictor' model. STEPWISE
does NOT give you this. To really get the best predictor model, you'd need
to look at EVERY combination of variables. But both that method and
stepwise give wrong results - p values are too low, parameter estimates are
biased away from 0, and so on.

If you must use an automated method, I recommend (again) using a penalized
method such as LASSO or LAR. In SAS, these can be implemented in PROC
GLMSELECT. In R, you can use the lars package. But even more, I recommend
using a model based on substantive knowledge.

LAR and LASSO are MUCH better than STEPWISE, but they are still ways of
letting the computer do your thinking for you. Sometimes there's no
alternative. But it's not ideal.
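A rough sketch of those penalized fits in R via the lars package (object and column names are hypothetical):

library(lars)
x <- as.matrix(eyes[, predictor_cols])    # candidate predictors from both machines
y <- eyes$fit_score
fit <- lars(x, y, type = "lasso")         # whole lasso path
cvres <- cv.lars(x, y, type = "lasso")    # cross-validation to choose the penalty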

Peter



Chris Hunt

Oct 7, 2010, 11:12:10 AM
to meds...@googlegroups.com
I used correlation coefficients to pick out the few "significant" variables,
then looked at each combination of these to find the best model, taking into
account variables that are correlated with each other.

I know this is not the best method but I think it is better than the
automatic stepwise method. I don't think SPSS does LASSO or LAR.

I know you can compare two regression models if one is nested in the other.
But I cannot find anything to compare two models where the dependent
variable is the same but the independent variables are different. In fact,
the independent variables in each model may be related to each other but not
be the same.

Chris

Neil Shephard

Oct 7, 2010, 11:54:54 AM
to meds...@googlegroups.com
On Thu, Oct 7, 2010 at 3:12 PM, Chris Hunt <cr...@hotmail.com> wrote:
> I know you can compare two regression models if one is nested in the other.
> But I cannot find anything to compare two models where the dependant
> variable is the same but the independent variables are different.  In fact,
> the independent variables in each model may be related to each other but not
> be the same.

Seemingly unrelated regression may be of some utility.

A brief overview is on Wikipedia (but I'd recommend reading more in
formal textbooks):

http://en.wikipedia.org/wiki/Seemingly_unrelated_regressions

Greg Snow

Oct 7, 2010, 12:12:19 PM
to meds...@googlegroups.com
Chris,

It seems that you really have 2 questions here.

1. are the new and old machines equivalent (on the variables that they both measure)?

2. do the additional variables measured by the new machine give additional information?


Question 1 can be approached using Bland-Altman plots rather than regression; that whole process will probably be more informative and enlightening than canned regressions.
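A minimal Bland-Altman sketch in R for one variable measured by both machines (column names are hypothetical):

avg <- (eyes$curv_basic + eyes$curv_new) / 2
d   <- eyes$curv_basic - eyes$curv_new
plot(avg, d, xlab = "Mean of the two machines", ylab = "Basic minus new")
abline(h = mean(d))
abline(h = mean(d) + c(-1.96, 1.96) * sd(d), lty = 2)   # limits of agreement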

Question 2 can be approached by comparing nested regression models. The ideal (but you don't have enough data for meaningful results) is to have the full model include all variables and the reduced model include just the variables from the old machine. The comparison then tests whether the new machine's additional variables contribute significantly beyond the old machine. You can do something similar with subsets of the variables.
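In R that nested comparison is a partial F-test via anova() (variable names hypothetical; with n = 50 only a small subset of the new variables is realistic):

reduced <- lm(fit_score ~ curv_basic + diam_basic, data = eyes)
full    <- update(reduced, . ~ . + elev_new + astig_new)
anova(reduced, full)    # do the added variables explain anything extra?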

Note also that with R^2 values of 0.14 you are likely to find that the opinion of the doctor's assistant gives better predictions than either machine.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg...@imail.org
801.408.8111

Ray Koopman

Oct 7, 2010, 5:54:41 PM
to MedStats
On Oct 7, 8:12 am, Chris Hunt <cr...@hotmail.com> wrote:
> [...]
> I know you can compare two regression models if one is nested in
> the other. But I cannot find anything to compare two models where
> the dependant variable is the same but the independent variables
> are different. In fact, the independent variables in each model
> may be related to each other but not be the same.

1. If the two predictor sets share some variables then partial the
common predictors out of the other predictors and the d.v., drop
the common predictors from both sets, and decrement the sample size
by the number of common predictors.

2. Regress the d.v. on the two predictor sets separately, and get
the residuals from each regression.

3. If the two predictor sets have the same number of predictors
then test the hypothesis that the two residuals have equal variances
(i.e., test the hypothesis that Correlation(u1+u2,u1-u2) = 0, where
u1 & u2 are the residuals; a rough R sketch follows step 4).

4. If the two predictor sets have different numbers of predictors
then ???. The analysis should probably be similar to that in step 3,
but should include some adjustment for the numbers of predictors.
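A rough R sketch of steps 2-3 (a Pitman-Morgan-type test on the residuals; data frame and variable names are hypothetical, and the degrees of freedom use the n-p-2 adjustment Ray settles on below):

u1 <- resid(lm(fit_score ~ curv_basic + diam_basic, data = eyes))
u2 <- resid(lm(fit_score ~ elev_new   + astig_new,  data = eyes))

r  <- cor(u1 + u2, u1 - u2)
n  <- length(u1); p <- 2                 # p predictors in each set
df <- n - p - 2
tstat <- r * sqrt(df / (1 - r^2))
2 * pt(-abs(tstat), df)                  # two-sided p-value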

Ray Koopman

Oct 7, 2010, 6:24:19 PM
to MedStats
On Oct 7, 2:54 pm, Ray Koopman <koop...@sfu.ca> wrote:
> On Oct 7, 8:12 am, Chris Hunt <cr...@hotmail.com> wrote:
>
>> [...]
>> I know you can compare two regression models if one is nested in
>> the other. But I cannot find anything to compare two models where
>> the dependant variable is the same but the independent variables
>> are different. In fact, the independent variables in each model
>> may be related to each other but not be the same.
>
> 1. If the two predictor sets share some variables then partial the
> common predictors out of the other predictors and the d.v., drop
> the common predictors from both sets, and decrement the sample size
> by the number of common predictors.
>
> 2. Regress the d.v. on the two predictor sets separately, and get
> the residuals from each regression.
>
> 3. If the two predictor sets have the same number of of predictors
> then test the hypothesis that the two residuals have equal variances
> (i.e., test the hypothesis that Correlation(u1+u2,u1-u2) = 0, where
> u1 & u2 are the residuals.)

The df for the test is not the usual n-2, but n-p-1, where p = the
number of predictors in each set.

Ray Koopman

Oct 7, 2010, 6:30:38 PM
to MedStats
On Oct 7, 3:24 pm, Ray Koopman <koop...@sfu.ca> wrote:
> On Oct 7, 2:54 pm, Ray Koopman <koop...@sfu.ca> wrote:
>> On Oct 7, 8:12 am, Chris Hunt <cr...@hotmail.com> wrote:
>>> [...]
>>> I know you can compare two regression models if one is nested in
>>> the other. But I cannot find anything to compare two models where
>>> the dependant variable is the same but the independent variables
>>> are different. In fact, the independent variables in each model
>>> may be related to each other but not be the same.
>>
>> 1. If the two predictor sets share some variables then partial the
>> common predictors out of the other predictors and the d.v., drop
>> the common predictors from both sets, and decrement the sample size
>> by the number of common predictors.
>>
>> 2. Regress the d.v. on the two predictor sets separately, and get
>> the residuals from each regression.
>>
>> 3. If the two predictor sets have the same number of of predictors
>> then test the hypothesis that the two residuals have equal variances
>> (i.e., test the hypothesis that Correlation(u1+u2,u1-u2) = 0, where
>> u1 & u2 are the residuals.)
>
> The df for the test is not the usual n-2, but n-p-1, where p = the
> number of predictors in each set.

It's been a bad day -- make that n-p-2, not n-p-1.

Ray Koopman

Oct 8, 2010, 12:53:55 AM
to MedStats
A first guess at the unequal-numbers-of-predictors case:
Let p1 & p2 be the numbers of predictors in the two sets.
Then df1 = n-p1-2, and df2 = n-p2-2. Instead of just u1 & u2,
use u1/sqrt(df1) and u2/sqrt(df2), whose variances will be
equal under the null, and compute t with df = 2/(1/df1 + 1/df2),
the harmonic mean of df1 & df2.

And yes, I realize that step 1 is not necessary. That was
just me thinking out loud, trying to simplify the problem.

Chris Evans

unread,
Oct 8, 2010, 2:00:54 AM10/8/10
to meds...@googlegroups.com
Hm, this one is a runner, isn't it? I tried to send this yesterday but it
doesn't seem to have got through. Health warning: I'm not a statistician.

1) You seem to be confirming that the convergent validity of the six
measurements from the two different machines is perfect or near perfect,
is that right? If so, at least that gets rid of one issue, and it probably
suggests that there is also very low unreliability.

2) Since I'm not a statistician, my first question is whether there is
some a priori model. So:
2a) Is "contact lens fit" a single variable? It sounds like an issue
about the congruence of two two-dimensional surfaces to me. They are
approximately spheroidal but presumably have curvature, diameter (i.e.
arc subtended) and some "conical" distortion factor(s) (astigmatism).
My guess is that the reality is that the surfaces are actually more
complex than that, but any reason based on physics to predict a
particular relationship between your measurements and "fit" is going
to help you greatly.
2b) Is fit continuous or categorical? We seem to be assuming that it's
continuous and linear, but I think it could be ordinal or even a survival
time measure, couldn't it? What is it?
2c) We are assuming that there are NO physical aspects of the six
variables or the additional 20 that you would expect to predict
interactions between them and the fit. That's vital.
2d) IF you have reason to believe that linear regression is the sole and
best fit to any predictive relationships, and that there aren't reasons
to expect interactions, what about the correlations within the
predictors? Are there a priori reasons to expect correlations and/or
are there strong and significant observed correlations:
2di) between the six common predictor variables?
2dii) between the additional 20?
2diii) between any/all of the six and any/all of the 20?

3) If there are strong observed correlations within the six and within
the 20 then you will have multicollinearity and will have weakness in
the estimation of the regression of them onto the dependent: beware!

4) I don't know about lasso etc., but if you are confident that purely
linear regression between correlated predictors and a single linear
dependent is a good model for both the six and the 26 predictors, then
wouldn't this work (a rough R sketch follows the list):
4a) enter all six variables into a linear prediction against the
dependent and save the residuals,
4b) now run a canonical correlation analysis of the correlation of the
20 additional variables onto the residuals,
4c) if it's significant, you've got evidence there is some systematic
variance unexplained by the six and
4d) the loadings of the 20 onto the resulting canonical correlation
variable tell you which of the 20 variables contribute most to that
regression equation.
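A rough R sketch of 4a-4d (object names are hypothetical: 'eyes' is the data frame, common6 and extra20 are character vectors of column names; with a single residual vector the canonical correlation is just the multiple correlation):

res6  <- resid(lm(fit_score ~ ., data = eyes[, c("fit_score", common6)]))   # 4a
extra <- as.matrix(eyes[, extra20])

cc <- cancor(extra, res6)   # 4b
cc$cor                      # 4c: correlation of the 20 extras with the residuals
cc$xcoef[, 1]               # 4d: coefficients showing which extras carry it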

However, on n=50 you have very little power, as others are saying; any
a priori model clarity should be used to help preserve what power you
have, and any plausible non-linearity in the model will pretty much
ensure that you have no real test as yet.

Ultimately, as Greg Snow said, with R^2 values of .07 and .14 you know
you have very little predictive power, so I doubt this is going to give
you significant results on n=50.

Cheers all,

Chris



--
Chris Evans <ch...@psyctc.org> Skype: chris-psyctc
Consultant Psychiatrist in Psychotherapy, Notts. PDD network;
Clinical Director, Psychological Therapies, Nottinghamshire NHS Trust;
Professor, Psychotherapy, Nottingham University
*If I am writing from one of those roles, it will be clear. Otherwise*
*my views are my own and not representative of those institutions *
If you have difficulty Emailing me on this address or getting a reply,
send again but cc to: chris dot evans at nottshc dot nhs dot uk
and to: c dot evans at nottingham dot ac dot uk
