Help with multiple regression

jabs

Jul 2, 2009, 3:52:46 PM

to MedStats

Hello folks
I am a physician who works in Mexico. I would like to predict the
weight of a fetus before birth through ultrasound measurements. There
are many studies that have published an equation or formula to
estimate fetal weight, with the equation derived
from independent variables (ultrasound parameters). Unfortunately,
none of these studies was done in a Mexican population.
I have collected the birth weight (dependent variable) of almost 500
newborns (NB). I have also collected 13 ultrasound measurements
(independent variables) per fetus in the 48 hours prior to birth
(prenatal stage). My goal is to find an equation or formula to predict
the weight of the baby using ultrasound variables (independent
variables). I have read about this and I think I have to run a linear
regression in which the dependent variable would be the birth weight,
and ultrasound variables would be included as independent variables.
According to what I have read, I should use backward variable
selection to obtain a linear model. The
problem is that I have no experience in doing this. Even
so, I have tried it in SPSS, and after running
the regression the results window shows a series of outputs such as
tables (descriptive statistics, correlations, included/deleted
variables, a model summary, ANOVA, collinearity analysis, excluded
variables) and graphs. What is the right way to run the multiple
regression? How can I get the model from this output? Which values must
be included in the equation? Thanks in advance for your help.

Doug Altman

Jul 2, 2009, 6:23:27 PM

to MedS...@googlegroups.com, MedStats

I am no more in favour of saying that one should never use stepwise selection than of saying that one should always do so. I do not think that a model with 13 variables would make sense or be necessary, but with 500 individuals and a continuous outcome I doubt that there would be much bias from eliminating variables. However, it may be sensible to examine which variables have been included in earlier models and exclude those which never feature. There are bigger problems. As others have noted, there will inevitably be high correlations between different measurements of fetal size. However, other groups have used standard regression methods and derived equations that predict well.

You are trying to explain variability in (essentially) a measure of volume using (I assume) linear and circumferential dimensions. One or more of these variables may have a non-linear relation with birth weight. Investigating nonlinearity is not straightforward. Some form of model stability exercise may be worthwhile, e.g. using the bootstrap.
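One possible reading of a "model stability exercise" is to repeat the variable selection on bootstrap resamples and count how often each variable is retained. The sketch below is an illustration only: the data are simulated, and the backward-elimination rule (drop the weakest predictor while its |t| is below 2) and all function names are assumptions, not anything taken from this thread or from SPSS.

```python
import numpy as np

def ols(X, y):
    """Least-squares fit with an intercept; returns coefficients and t-statistics."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = len(y) - A.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(A.T @ A)
    return beta, beta / np.sqrt(np.diag(cov))

def backward_select(X, y, t_min=2.0):
    """Drop the predictor with the smallest |t| until all remaining have |t| >= t_min."""
    keep = list(range(X.shape[1]))
    while keep:
        _, t = ols(X[:, keep], y)
        abs_t = np.abs(t[1:])          # skip the intercept
        worst = int(np.argmin(abs_t))
        if abs_t[worst] >= t_min:
            break
        keep.pop(worst)
    return keep

def inclusion_frequency(X, y, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each predictor survives selection."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample rows with replacement
        for j in backward_select(X[idx], y[idx]):
            counts[j] += 1
    return counts / n_boot

# Simulated stand-in data: three correlated "size" measurements plus one pure-noise column.
rng = np.random.default_rng(1)
n = 500
size = rng.normal(size=n)
X = np.column_stack([size + 0.3 * rng.normal(size=n) for _ in range(3)]
                    + [rng.normal(size=n)])
y = 3000 + 400 * X[:, 0] + 200 * X[:, 1] + rng.normal(scale=150, size=n)

freq = inclusion_frequency(X, y)
print(np.round(freq, 2))
```

Variables retained in nearly every resample are stable choices; those retained only some of the time are the ones whose inclusion depends on the particular sample.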

The following paper - in which 29 formulae are reviewed - may be useful:

Scioscia M, Vimercati A, Ceci O, Vicino M, Selvaggi LE.
Estimation of birth weight by two-dimensional ultrasonography: a critical appraisal of its accuracy.
Obstet Gynecol. 2008 Jan;111(1):57-65.

Handling this type of problem requires good judgement as well as several technical considerations (as evidenced by some of the earlier replies). My advice is to find a statistician with relevant experience to work with you on these data.

Good luck
Doug

_____________________________________________________

Doug Altman
Professor of Statistics in Medicine
Centre for Statistics in Medicine
University of Oxford
Wolfson College Annexe
Linton Road
Oxford OX2 6UD

email:  doug....@csm.ox.ac.uk
Tel:    01865 284400 (direct line 01865 284401)
Fax:    01865 284424
www:     http://www.csm-oxford.org.uk/

EQUATOR Network - resources for reporting research
www: http://www.equator-network.org/

Barry McDonald

Jul 2, 2009, 7:32:40 PM

to MedS...@googlegroups.com

There are some interesting issues that arise in this problem.

1. Taking a step back from the actual query (how to devise a formula for use in a Mexican context), I wonder why one needs a specifically Mexican formula for what is essentially a physical relationship between physical measurements. True, the birthweight measurements themselves might all average out to be smaller (say) than US averages, but conditional upon country, why should the physical relationship {ultrasound --> birthweight | Mexico} have a different formula than {ultrasound --> birthweight | US} or {ultrasound --> birthweight | UK}, say, apart from the intercept? Does the type of ultrasound measurements taken differ between countries? Is there some sort of genetic hypothesis that Mexicans have (say) longer femurs for the same birthweight than people in the US or UK?

2. Following on from 1, if there are formulas that others have suggested, and perhaps they devised those by stepwise regression, then your data provide a great opportunity to test those formulas. Their formulas provide a genuine prior hypothesis for you, and the p-values you get from testing the variables included in their models should be (waving hands) a lot more valid than the p-values from stepwise regression. It is at least as much a contribution to science to test someone else's model for validity as it is to come up with yet another of many suggested regression formulae based on stepwise. (In fact there would be nothing to stop you publishing both an evaluation and your own "best stepwise" formula, in the sense that it is a data summary that lets your data speak for themselves in the same way as other datasets have.)

So a.) one can compare others' overall models on your data to see which does best. Use mean squared error of prediction (perhaps standardise all variables first to avoid needing an intercept).
b.) For variable selection, I would start by looking through the ultrasound literature for the most commonly used and most significant variable, test that in your data, then after including it (if significant) check whether the next most common significant variable is needed, etc., in sequence. (Wiser heads than mine might suggest a reference for a better methodology for assessing several competing models.)
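The comparison in a.) could look something like the sketch below. Everything in it is made up for illustration: the two "formulas" are hypothetical placeholders rather than published equations, the variable names `ac` (abdominal circumference) and `fl` (femur length) are assumed, and the data are simulated. With real data one would plug in the published equations and the measured birthweights.

```python
import numpy as np

def mse_of_prediction(predict, measurements, birth_weight):
    """Mean squared difference between observed and predicted birthweight."""
    predicted = predict(measurements)
    return float(np.mean((birth_weight - predicted) ** 2))

# Hypothetical placeholder formulas (NOT taken from any paper).
def formula_a(m):
    return -3000 + 150 * m["ac"] + 180 * m["fl"]

def formula_b(m):
    return -1200 + 130 * m["ac"]

# Simulated stand-in for the 500 newborns; units and coefficients are invented.
rng = np.random.default_rng(0)
n = 500
data = {
    "ac": rng.normal(34.0, 2.0, size=n),
    "fl": rng.normal(7.3, 0.4, size=n),
}
weight = -3000 + 150 * data["ac"] + 180 * data["fl"] + rng.normal(scale=200, size=n)

results = {
    "formula_a": mse_of_prediction(formula_a, data, weight),
    "formula_b": mse_of_prediction(formula_b, data, weight),
}
for name, mse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: MSE = {mse:.0f}, RMSE = {mse ** 0.5:.0f}")
```

Because the published formulas are applied as they stand, with no refitting, the MSE here is a genuine out-of-sample measure of how well each one transfers to the new data.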

3. To get a better understanding of your data I would suggest doing principal components first, saving the scores, then regressing birthweight on those scores. The first PC will probably be a measure of overall size of the fetus, and have all numbers in the first column of the component matrix (eigenvector coefficients of the transformation) with similar values, indicating that all measurements tend to be big together or all small together, and this will be very significantly related to the birthweight. If you get any other significant scores, then they will tell you whether particular measurements (or combinations or ratios of measurements) are related to the response in the sense that they tweak the overall size effect of the first PC. To understand any other significant scores, look at the numbers in the corresponding column of the component matrix. Numbers that are big (>0.3, say) and of the same sign indicate important variables in the direction of increasing or decreasing birthweight; numbers that are big and of opposite sign may represent ratios that are of interest (e.g. a bigger skull to femur ratio may be important; total guess here, since I know next to nothing about ultrasound measurements).
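A minimal sketch of the principal components step in point 3, using simulated measurements that share a common "overall size" factor (the data, the number of variables and the variable names are all assumptions for illustration). The PCA is done on standardised variables via a singular value decomposition, and birthweight is then regressed on the saved scores.

```python
import numpy as np

# Simulated stand-in data: 5 measurements driven by one overall size factor.
rng = np.random.default_rng(0)
n, p = 500, 5
size = rng.normal(size=n)
X = size[:, None] + 0.5 * rng.normal(size=(n, p))
y = 3300 + 400 * size + rng.normal(scale=200, size=n)

# Standardise each measurement, then take the SVD of the standardised matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

loadings = Vt.T                      # column k holds the eigenvector of component k
scores = Z @ loadings                # saved component scores, one column per PC
explained = s**2 / np.sum(s**2)      # proportion of variance per component

# Regress birthweight on the scores (intercept in position 0).
A = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print("variance explained:", np.round(explained, 3))
print("first-component loadings:", np.round(loadings[:, 0], 2))
print("regression coefficients on scores:", np.round(coef[1:], 1))
```

Note that these loadings are unit-length eigenvectors. Some packages report a component matrix rescaled by the square root of each eigenvalue, so a rule of thumb such as ">0.3" should be read against whichever scaling the software uses.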

Hope something here is useful,
regards,   Barry
