>Hello folks. I am a physician who works in Mexico. I would like to predict the weight of a fetus before birth from ultrasound measurements. Many studies have published an equation or formula to estimate fetal weight, and the equation has been obtained from independent variables (ultrasound parameters). Unfortunately, none of these studies has been done in a Mexican population. I have collected the birth weight (dependent variable) of almost 500 newborns (NB). I have also collected 13 ultrasound measurements (independent variables) per fetus in the 48 hours prior to birth (prenatal stage). My goal is to find an equation or formula to predict the weight of the baby using the ultrasound variables (independent variables). I have read about this, and I think I have to run a linear regression in which the dependent variable would be the birth weight and the ultrasound variables would be included as independent variables.
So far so good ....
>According to what I have read, I have to use a backward variable selection method to obtain a linear model.
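Not good at all. Backward elimination (and other automatic variable selection methods, such as forward and stepwise selection) is not good. These methods are commonly used, but they are wrong: because the same data are used both to choose the variables and to estimate their effects, the final model's coefficients are biased away from zero, its standard errors and p-values are too small, and its R-squared is too high.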
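To see why, here is a minimal simulation in plain Python (standard library only; every name in it is made up for illustration). It runs backward elimination on an outcome that is pure noise, with 13 noise "predictors" and n = 100. The stopping rule, dropping the predictor with the smallest |t| until every remaining |t| is at least 2 (roughly p < .05), is an assumption for this sketch; SPSS's backward method uses its own F-based removal criterion. On many seeds the procedure keeps one or more noise variables, each of which would then look "significant" in the final table.

```python
import math
import random


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [a_ - f * b_ for a_, b_ in zip(m[r], m[col])]
    return [row[n:] for row in m]


def slope_t_stats(cols, y):
    """Fit y on an intercept plus cols by OLS; return the t statistic of each slope."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[j][k] * xty[k] for k in range(p)) for j in range(p)]
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    s2 = rss / (n - p)  # residual variance
    return [beta[j] / math.sqrt(s2 * inv[j][j]) for j in range(1, p)]


def backward_eliminate(predictors, y, t_crit=2.0):
    """Drop the predictor with the smallest |t|, refit, and repeat until every |t| >= t_crit."""
    kept = list(range(len(predictors)))
    while kept:
        t = slope_t_stats([predictors[j] for j in kept], y)
        weakest = min(range(len(kept)), key=lambda i: abs(t[i]))
        if abs(t[weakest]) >= t_crit:
            break
        kept.pop(weakest)
    return kept


random.seed(1)
n = 100
# Pure noise: 13 hypothetical "predictors" and an outcome unrelated to all of them.
predictors = [[random.gauss(0, 1) for _ in range(n)] for _ in range(13)]
y = [random.gauss(0, 1) for _ in range(n)]
kept = backward_eliminate(predictors, y)
print("noise predictors retained:", kept)
```

Changing the seed and counting how often at least one variable survives gives a rough sense of how often this happens.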
>The problem is that I have no experience in how to do this. Even so, I have tried it in SPSS, and after running the regression the results window shows a series of tables (descriptive statistics, correlations, included/deleted variables, model summary, ANOVA, collinearity analysis, excluded variables) and graphics. What is the right way to run the multiple regression? How can I get the model from this output? Which values must be included in the equation? Thanks in advance for your help.
You might try asking on an SPSS list for details of how to do things in SPSS, but which variables you should use does not depend on the software. If you are trying to replicate previous results, you should use the same variables.
With 500 newborns, you could use all 13 variables - unless there are collinearity problems.
Or you might want to use something like principal component regression, or partial least squares; you might be concerned with possible nonlinear effects; there are other possibilities as well.
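For illustration, a plain-Python sketch (standard library only) of the two simplest options mentioned here: an ordinary least squares (OLS) fit that uses every predictor, and principal component regression (PCR), which regresses the outcome on only the k largest principal components of the standardized predictors and then maps the result back to the original scale. The data are synthetic, and x1, x2, x3 are hypothetical stand-ins for ultrasound measurements. The printed line is the prediction equation; in SPSS output the corresponding numbers are the unstandardized B column of the Coefficients table. Keeping all components (k equal to the number of predictors) reproduces OLS exactly, which is a useful check.

```python
import math
import random


def ols(cols, y):
    """OLS with an intercept via the normal equations (X'X) b = X'y; returns [b0, b1, ..., bp]."""
    n, p = len(y), len(cols) + 1
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    m = [[sum(row[j] * row[k] for row in X) for k in range(p)]  # augmented [X'X | X'y]
         + [sum(row[j] * yi for row, yi in zip(X, y))] for j in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(p):
            if r != col:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [row[p] for row in m]


def jacobi_eigen(a, tol=1e-18, max_sweeps=50):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix, cyclic Jacobi method."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pcr(cols, y, k):
    """Principal component regression on the k largest components; returns [b0, b1, ..., bp]."""
    n, p = len(y), len(cols)
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) for c, m in zip(cols, means)]
    z = [[(v - m) / s for v in c] for c, m, s in zip(cols, means, sds)]
    corr = [[sum(a * b for a, b in zip(z[j], z[l])) / (n - 1) for l in range(p)] for j in range(p)]
    vals, vecs = jacobi_eigen(corr)
    keep = sorted(range(p), key=lambda j: vals[j], reverse=True)[:k]
    ybar = sum(y) / n
    beta_z = [0.0] * p  # coefficients on the standardized predictors
    for j in keep:
        score = [sum(vecs[l][j] * z[l][i] for l in range(p)) for i in range(n)]
        # Component scores are uncorrelated, so each one is a simple regression on y.
        gamma = sum(s * (yi - ybar) for s, yi in zip(score, y)) / sum(s * s for s in score)
        for l in range(p):
            beta_z[l] += gamma * vecs[l][j]
    slopes = [b / s for b, s in zip(beta_z, sds)]
    return [ybar - sum(b * m for b, m in zip(slopes, means))] + slopes


random.seed(7)
n = 500
x1 = [random.gauss(10, 2) for _ in range(n)]
x2 = [v + random.gauss(0, 0.5) for v in x1]  # nearly a copy of x1
x3 = [random.gauss(5, 1) for _ in range(n)]
y = [100 + 20 * a + 10 * b + 15 * c + random.gauss(0, 10) for a, b, c in zip(x1, x2, x3)]

full = ols([x1, x2, x3], y)
print("OLS: predicted y = {:.1f} {:+.2f}*x1 {:+.2f}*x2 {:+.2f}*x3".format(*full))
reduced = pcr([x1, x2, x3], y, k=2)
print("PCR, k=2: predicted y = {:.1f} {:+.2f}*x1 {:+.2f}*x2 {:+.2f}*x3".format(*reduced))
```

Choosing k is the real modelling decision in PCR; it is usually made by cross-validation rather than by p-values.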
Peter
Peter L. Flom, PhD
Statistical Consultant
www DOT peterflomconsulting DOT com
snip-----------------
> With 500 newborns, you could use all 13 variables - unless there are collinearity problems.
snip-----------------
Collinearity is very likely.
Start with a correlation matrix of all 13 measurements [Statistics => Correlate => Bivariate...]. Correlation coefficients above, say, 0.80 usually show that the inclusion of both variables is not necessary or is even counterproductive.
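A plain-Python sketch of this screen (standard library only; the data and variable names are synthetic and hypothetical): compute every pairwise Pearson correlation and list the pairs whose absolute value exceeds the 0.80 cut-off suggested here. As the following replies point out, a screen like this can miss collinearity that involves three or more variables.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def high_correlation_pairs(data, threshold=0.80):
    """Return (name1, name2, r) for every pair of variables with |r| above the threshold."""
    names = list(data)
    pairs = []
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            r = pearson(data[u], data[v])
            if abs(r) > threshold:
                pairs.append((u, v, r))
    return pairs


random.seed(2)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
data = {
    "x1": x1,
    "x2": [v + random.gauss(0, 0.3) for v in x1],  # nearly a copy of x1
    "x3": [random.gauss(0, 1) for _ in range(n)],  # unrelated
}
for u, v, r in high_correlation_pairs(data):
    print(f"{u} vs {v}: r = {r:.3f}")
```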
> jabs <jabenavi...@gmail.com> wrote
> snip-----------------
On Jul 2, 4:49 pm, Christian Lerch <t....@gmx.net> wrote:
> snip-----------------
Using bivariate correlations to try to assess multicollinearity is not
a very good idea, IMO. First, you can have complete linear dependence
in the absence of any alarming-looking bivariate correlations. To
illustrate, try this example that Jerry Dallal posted in sci.stat.math
a couple of years ago:
Check out all of the simple correlations.
Regress Y on X1,X2,X3.
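Jerry Dallal's example itself does not survive in this copy of the thread. As a hypothetical stand-in (not his data), this plain-Python sketch builds three variables with x3 = x1 + x2 exactly, while every pairwise correlation is only about ±0.5, because x1 and x2 are negatively correlated. A determinant of (essentially) zero shows that the correlation matrix is singular, so a regression of Y on X1, X2 and X3 has no unique least-squares solution even though no bivariate correlation looks alarming.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


random.seed(4)
n = 500
e1 = [random.gauss(0, 1) for _ in range(n)]
e2 = [random.gauss(0, 1) for _ in range(n)]
x1 = e1
x2 = [-0.5 * a + math.sqrt(0.75) * b for a, b in zip(e1, e2)]  # population corr(x1, x2) = -0.5
x3 = [a + b for a, b in zip(x1, x2)]                            # exact linear dependence
cols = [x1, x2, x3]
corr = [[pearson(a, b) for b in cols] for a in cols]
for row in corr:
    print(" ".join(f"{r:+.2f}" for r in row))
print("determinant of correlation matrix:", det3(corr))
```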
Second, in models that include products or polynomial terms (e.g., a
model with both X and X-squared as predictors), there can be very high
correlations between variables, but no problematic
multicollinearity.
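A small plain-Python illustration of this point, assuming a hypothetical predictor whose values lie well away from zero (uniform between 20 and 40): X and X-squared are correlated at about 0.99, yet centering X (subtracting its mean) before squaring brings the correlation close to zero. The centered and uncentered quadratic models are equivalent (same fitted values, same estimate and test for the squared term), which is why the raw correlation on its own does not signal a problem.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


random.seed(3)
# Hypothetical predictor whose values sit well away from zero.
x = [random.uniform(20, 40) for _ in range(500)]
r_raw = pearson(x, [v * v for v in x])

# Centering x before squaring removes most of that correlation.
mean_x = sum(x) / len(x)
xc = [v - mean_x for v in x]
r_centered = pearson(xc, [v * v for v in xc])
print(f"corr(x, x^2) = {r_raw:.3f}; after centering = {r_centered:.3f}")
```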
Tolerance and Variance Inflation Factor (which are available in the
SPSS Regression procedure) are better measures of problematic
multicollinearity, I think.
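Tolerance for a predictor is 1 - R² from regressing that predictor on all the other predictors, and VIF = 1 / Tolerance. A plain-Python sketch on synthetic data (the names and the built-in near dependency x3 ≈ x1 + x2 are hypothetical): every variable involved in the dependency gets a large VIF, while the unrelated x4 stays near 1. A VIF above 10 (tolerance below 0.10) is a commonly quoted cut-off, though it is a rule of thumb rather than a test.

```python
import random


def r_squared(cols, y):
    """R-squared from an OLS fit of y on an intercept plus cols (normal equations, Gauss-Jordan)."""
    n, p = len(y), len(cols) + 1
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    m = [[sum(row[j] * row[k] for row in X) for k in range(p)]  # augmented [X'X | X'y]
         + [sum(row[j] * yi for row, yi in zip(X, y))] for j in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(p):
            if r != col:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    coef = [row[p] for row in m]
    ybar = sum(y) / n
    rss = sum((yi - sum(bj * xj for bj, xj in zip(coef, row))) ** 2 for row, yi in zip(X, y))
    tss = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - rss / tss


def tolerance_and_vif(data):
    """Map each predictor name to (tolerance, VIF), regressing it on all the other predictors."""
    out = {}
    for name, col in data.items():
        others = [c for k, c in data.items() if k != name]
        tol = 1.0 - r_squared(others, col)
        out[name] = (tol, 1.0 / tol)
    return out


random.seed(6)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
data = {
    "x1": x1,
    "x2": x2,
    "x3": [a + b + random.gauss(0, 0.1) for a, b in zip(x1, x2)],  # near dependency
    "x4": [random.gauss(0, 1) for _ in range(n)],                  # unrelated
}
diag = tolerance_and_vif(data)
for name, (tol, vif) in diag.items():
    print(f"{name}: tolerance = {tol:.4f}, VIF = {vif:.1f}")
```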
For more info, see the Multicollinearity link here:
Examining zero-order correlations will not necessarily help in detecting high collinearity. The absence of high correlations can't be viewed as evidence of no problem: it's possible for 3 or more variables to be collinear while no 2 of them, taken alone, are highly correlated.
You need to request collinearity diagnostics in linear regression. Then examine the condition indexes and identify any that are large, i.e., > 30 (or even > 20). Next, examine the variance-decomposition proportions (VDPs) associated with those large condition indexes. Large VDPs (> .50) identify the variables involved in the near dependency.
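The procedure described above (condition indexes, then variance-decomposition proportions) can be sketched in plain Python. The scaling, with each column of the design matrix including the constant scaled to unit length and no centering, follows Belsley's approach and appears to be what SPSS uses for its Collinearity Diagnostics table; treat that as an assumption to check against the SPSS algorithms documentation. The data are synthetic with a built-in near dependency x3 ≈ x1 + x2, and the eigen-decomposition uses the cyclic Jacobi method so that no external library is needed.

```python
import math
import random


def jacobi_eigen(a, tol=1e-18, max_sweeps=50):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix, cyclic Jacobi method."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def collinearity_diagnostics(cols):
    """Return (eigenvalues, condition indexes, VDPs) for the design [1, x1, ..., xp].
    vdp[d][k] is the share of Var(b_k) attributed to dimension d (column 0 is the constant).
    An exact dependency gives a zero eigenvalue and would break the division below."""
    n = len(cols[0])
    scaled = []
    for c in [[1.0] * n] + [list(c) for c in cols]:
        norm = math.sqrt(sum(v * v for v in c))
        scaled.append([v / norm for v in c])  # unit-length column, no centering
    p = len(scaled)
    xtx = [[sum(a * b for a, b in zip(scaled[j], scaled[k])) for k in range(p)] for j in range(p)]
    vals, vecs = jacobi_eigen(xtx)
    order = sorted(range(p), key=lambda j: vals[j], reverse=True)
    vals = [vals[j] for j in order]
    vecs = [[row[j] for j in order] for row in vecs]
    cond = [math.sqrt(vals[0] / ev) for ev in vals]
    phi = [[vecs[k][j] ** 2 / vals[j] for j in range(p)] for k in range(p)]
    vdp = [[phi[k][j] / sum(phi[k]) for k in range(p)] for j in range(p)]
    return vals, cond, vdp


random.seed(5)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [a + b + random.gauss(0, 0.01) for a, b in zip(x1, x2)]  # near dependency
x4 = [random.gauss(0, 1) for _ in range(n)]                   # unrelated
vals, cond, vdp = collinearity_diagnostics([x1, x2, x3, x4])
print("dim  eigenvalue  cond.index  VDP: const   x1    x2    x3    x4")
for d, (ev, ci, props) in enumerate(zip(vals, cond, vdp), start=1):
    print(f"{d:>3} {ev:11.5f} {ci:11.1f}      " + " ".join(f"{pr:5.2f}" for pr in props))
```

In the printed table, the last dimension should show a very large condition index with VDPs near 1 for x1, x2 and x3, while x4 and the constant load mainly on other dimensions.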
Scott R Millis, PhD, ABPP (CN,CL,RP), CStat, CSci
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Dept of Emergency Medicine
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email: smil...@med.wayne.edu
Tel: 313-993-8085
Fax: 313-966-7682
--- On Thu, 7/2/09, Christian Lerch <t....@gmx.net> wrote:
> From: Christian Lerch <t....@gmx.net>
> Subject: {MEDSTATS} Re: Help with multiple regression
> To: MedStats@googlegroups.com
> Date: Thursday, July 2, 2009, 4:49 PM
> snip-----------------
>Start with a correlation matrix of all 13 measurements [Statistics => Correlate => Bivariate...]. Correlation coefficients above, say, 0.80 usually show that the inclusion of both variables is not necessary or is even counterproductive.
Actually, high pairwise correlations are neither necessary nor sufficient for problematic collinearity.
It is much better to use condition indexes.
Peter