
Multiple Regression with Dependent Variable as a Percentage


Cpullig

Jun 29, 1998, 3:00:00 AM

Reviewers are telling me that I cannot use multiple regression with a single
dependent variable that is a percentage (0-100%). I am treating this variable
as a continuous variable, and they are suggesting logistic regression. I am sure
that I can split the dependent variable into groups based on the percentages,
but this seems to give up a great deal of information. Any suggestions or
good arguments for my position?

Chris Pullig

Richard F Ulrich

Jun 29, 1998, 3:00:00 AM

: Chris Pullig Cpullig (cpu...@aol.com) wrote:
: Reviewers are telling me that I cannot use multiple regression with a single
: dependent variable that is a percentage (0-100%). I am treating this variable
: as a continuous variable, and they are suggesting logistic regression.

I know that when I write reviews,
even if I am sure that I don't like what is there,
I try to keep in mind that I may be overlooking some
precedent or justification or clever explanation.
- so I hope that they are not *insisting* on any one solution.

If your scores are all between 20% and 80%, then maybe the reviewers are
being foolish. If your outcomes are percentages PER SUBJECT, then you
do not have a ready candidate for the usual maximum-likelihood logistic
regression, and that is a second way in which your reviewers could be
foolish.

On the other hand, if you do have extreme percentages, then it is
reasonable to transform. The arcsine of the square root of P is one option,
if you don't want to apply a pragmatic "probit" or logit transform
to each score.
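
As a rough illustration of the two transforms mentioned above, here is a
minimal Python sketch. The function names and the sample percentages are
made up for the example; they are not from any particular package.

```python
import math

def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))

def logit(p):
    """Log-odds of a proportion p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

# Illustrative percentages only; convert to proportions before transforming.
for pct in (2, 20, 50, 80, 98):
    p = pct / 100.0
    print(f"{pct:3d}%  arcsine-sqrt={arcsine_sqrt(p):.3f}  logit={logit(p):+.3f}")
```

Both transforms stretch the scale near 0% and 100% relative to the middle.
Note that the logit is undefined at exactly 0 or 1, while the arcsine
square-root is defined there.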

Hope this helps.
--
Rich Ulrich, biostatistician wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html Univ. of Pittsburgh

Van Kirk, Jeff

Jun 30, 1998, 3:00:00 AM

Chris- What is the distribution of your DVAR? If there is little or no
variance in values (%), then the reviewer may be correct. There is,
however, no inherent impediment to using % as a dep. var. as it is
certainly an interval variable. Jeff


Stuart Drucker

Jun 30, 1998, 3:00:00 AM


Chris,

The problem with using a % as the dependent variable in OLS regression (the
REGRESSION procedure) is that the predictions aren't bounded within the
range of 0 to 1. You can wind up with predictions that are nonsensical
(e.g. -24% or 112%). Proportions tend to be non-normal at the extremes
(outside roughly 20% to 80% for "large samples"), so this can happen with
OLS multiple regression. By proportion, I mean percentage/100 (e.g. 20% is
0.20).
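
To make the out-of-range problem concrete, here is a small Python sketch of
a one-predictor OLS fit. The data are invented for illustration, and
fit_line and predict are hypothetical helpers, not SPSS output.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Invented data: a predictor score and a proportion (percentage / 100).
xs = [1, 2, 3, 4, 5, 6]
props = [0.05, 0.10, 0.30, 0.60, 0.85, 0.95]

intercept, slope = fit_line(xs, props)

def predict(x):
    """Predicted proportion from the straight-line fit."""
    return intercept + slope * x

# Nothing constrains the fitted line to stay between 0 and 1.
for x in (0, 1, 3.5, 6, 7):
    print(f"x={x}: predicted proportion {predict(x):.3f}")
```

With these made-up numbers the fitted line drops below 0 at the low end and
rises above 1 just past the observed range of x.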

There are a couple of ways you can approach the problem. If your variable is
coded as 0 and 1 (i.e. yes/no at a respondent level), logistic regression is
technically the correct approach to take to handle the non-normality of the
distribution. I'd look at Hosmer and Lemeshow's Applied Logistic Regression
to learn more about LR.

On the other hand, if your DV is collected as 0 to 100 at a respondent
level, you can do a non-linear transformation on the DV of the form
ln(p/(1-p)), where p is the answer/100. Adjust the extreme values of p
(answers of 0 and 100) to, say, 0.005 and 0.995 so that the transformation
is defined. Then, you can run the regular OLS procedure.

However, keep in mind that the predictions now take a different form than
in the straight linear model: exp(bx)/(1+exp(bx)), where
"x" is the respondent answers and "b" is the vector of beta weights.
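
The steps above (clamp the extremes, take the logit, run OLS, back-transform
the predictions) might look roughly like this in Python. This is only a
sketch with one predictor and invented data; to_logit, inv_logit, fit_line
and predict_percent are hypothetical names, and the 0.005/0.995 limits are
just the "say" values suggested above.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def to_logit(percent, lo=0.005, hi=0.995):
    """Convert a 0-100 answer to ln(p/(1-p)), clamping p to [lo, hi] first."""
    p = min(max(percent / 100.0, lo), hi)
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Back-transform; algebraically the same as exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Invented data, including answers of exactly 0 and 100.
xs = [1, 2, 3, 4, 5, 6]
percents = [0, 10, 30, 60, 85, 100]

# OLS on the transformed dependent variable.
intercept, slope = fit_line(xs, [to_logit(v) for v in percents])

def predict_percent(x):
    """Prediction on the original 0-100 scale."""
    return 100.0 * inv_logit(intercept + slope * x)

for x in (0, 3, 7):
    print(f"x={x}: predicted {predict_percent(x):.1f}%")
```

Because the back-transform maps any real number into the interval between 0
and 1, the predictions stay inside 0% to 100% even when x lies outside the
observed range.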

Hope this helps--

Hector E. Maletta

Jun 30, 1998, 3:00:00 AM

Chris,
Logistic regression models the probability of an event as a logistic function
of some independent variables. This has nothing to do with splitting the
dependent variable into groups based on the percentages (percentages of
what?). The advice you received is right insofar as a logistic model
fits your data, i.e. the probability of the event is an S-shaped
function of the set of independent variables: it is relatively
unresponsive to the independent variables in the low-probability range,
has a middle range of high responsiveness, and turns relatively
unresponsive again in the high-probability range.
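
The S-shape described above can be checked numerically. In this Python
sketch, z stands for the linear combination of the independent variables (an
assumption made for the illustration), and the responsiveness of the
probability to z is p*(1-p), the derivative of the logistic function.

```python
import math

def logistic(z):
    """Probability implied by a linear predictor z."""
    return 1.0 / (1.0 + math.exp(-z))

def responsiveness(z):
    """Derivative of the logistic curve with respect to z: p * (1 - p)."""
    p = logistic(z)
    return p * (1.0 - p)

# Low-probability range, middle range, high-probability range.
for z in (-4, -2, 0, 2, 4):
    print(f"z={z:+d}  p={logistic(z):.3f}  dp/dz={responsiveness(z):.3f}")
```

The derivative peaks at p = 0.5 and shrinks toward zero in both tails, which
is the pattern of low, high, then low responsiveness.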

Of course, the event's probability may behave in other ways, such as
being a linear or exponential function of your set of independent
variables, but your life as a data analyst consists precisely of such
decisions: which model is most appropriate for your data?

Hector Maletta
Universidad del Salvador
Buenos Aires, Argentina

huntley george manhertz

Jul 1, 1998, 3:00:00 AM

If your independent variables are continuous, you can transform these
variables and the dependent variable to log values. The interpretation
of the relationship would then be an elasticity: the percentage
change in your d-var associated with a percentage change in your i-vars.
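
A minimal Python sketch of the log-log idea, assuming a single predictor and
strictly positive values (the log of 0 is undefined, so percentages of
exactly 0 would need separate handling). The data are generated from
y = 2 * x^0.5 purely for illustration, and fit_line is a hypothetical helper.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Invented data following y = 2 * x**0.5, so the true elasticity is 0.5.
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [2.0 * x ** 0.5 for x in xs]

# Regress log(y) on log(x); the slope is the elasticity.
log_x = [math.log(x) for x in xs]
log_y = [math.log(y) for y in ys]
intercept, elasticity = fit_line(log_x, log_y)

print(f"estimated elasticity: {elasticity:.3f}")
```

The slope of the log-log fit reads as the approximate percentage change in
the dependent variable for a 1% change in the predictor.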


HGM

Hector E. Maletta

Jul 1, 1998, 3:00:00 AM

I wonder whether Stuart's recommendation of an OLS after a nonlinear
transformation does not boil down to a logistic regression.

Hector Maletta
Universidad del Salvador
Buenos Aires, Argentina


