In particular, in a situation like this, the predictor variables are often
correlated, and a regression coefficient cannot be interpreted simply as
"the effect of this variable". In other words: the coefficient of one term
can (at best) be interpreted as "the effect of this term, keeping all the
others constant". But given the correlations that naturally occur,
"changing one term while keeping all the others constant" is not possible.
The result is that correlated terms may have misleading coefficients --
e.g., something correlated with percent of women may have an inflated
coefficient while the coefficient of percent of women is lowered -- or it
could work out vice versa.
Thus, statements such as "increasing the number of women decreases the
ranking of the department" are at best misleading.
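The coefficient instability described above is easy to see in a small simulation (a sketch of my own, not from the analysis under discussion): fit the same two-predictor regression on many repeated samples, once with nearly independent predictors and once with highly correlated ones, and compare how much the fitted coefficient of the first predictor bounces around.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_slope(x1, x2, y):
    # ordinary least squares; return the fitted coefficient of x1
    X = np.column_stack([np.ones_like(x1), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

def coef_spread(noise_sd, n=100, reps=200):
    # x2 = x1 + noise: a small noise_sd makes x1 and x2 highly correlated
    slopes = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = x1 + rng.normal(scale=noise_sd, size=n)
        y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)  # true coefficients: 1 and 1
        slopes.append(fit_slope(x1, x2, y))
    return np.std(slopes)

sd_correlated = coef_spread(0.1)    # x1, x2 correlation near 0.995
sd_independent = coef_spread(10.0)  # x1, x2 nearly uncorrelated
print(sd_correlated, sd_independent)
```

With strongly correlated predictors the estimated coefficient swings over a far wider range from sample to sample, even though the true coefficients are identical in both settings; any single fitted value is then a poor guide to "the effect of x1."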
I taught a graduate course in regression for about ten years and am very
aware that regression (perhaps even more than other statistical
techniques) is very often taken to say more than it can possibly say --
and often done sloppily as well.
In fact, as a mathematician teaching statistics, I became so aware of the
misuses and misinterpretations (rarely deliberate) of statistics that when
I retired I started a website on Common Mistakes in Using Statistics
(http://www.ma.utexas.edu/users/mks/statmistakes/StatisticsMistakes.html;
for mistakes involving regression coefficients, see
http://www.ma.utexas.edu/users/mks/statmistakes/regressioncoeffs.html)
and began giving yearly workshops on the topic.
Martha Smith
> --
> You received this message because you are subscribed to the Google Groups
> "WomeninMath" group.
I've briefly read the account of the analysis. One thing that stands out
as iffy to me is that they use a stepwise procedure, based on
t-statistics, to arrive at the weights/regression coefficients. The major
problem with this is that it involves multiple inference without adjusting
for the number of t-tests performed. My impression is that this is still
often done in the social sciences (despite the fact that it is not
justified by the logic of hypothesis testing, and simulations show that it
can lead to misleading results), although it seems to be used less and
less frequently in the sciences, at least in biology (I'm not sure about
engineering). However, this dubious practice may be partly or largely
mitigated by the fact that they use a nonparametric method of calculating
confidence intervals (which they call Ninety Percent Ranges) for the
rankings.
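The multiple-inference problem with an unadjusted stepwise screen can be seen in a toy simulation (my own sketch, not the procedure in the report): generate a response and 20 candidate predictors that are all pure noise, test each predictor's slope at the nominal 5% level, and count how often at least one looks "significant."

```python
import numpy as np

rng = np.random.default_rng(1)

def any_significant(n=50, p=20, crit=2.01):
    # y and all p candidate predictors are pure noise: no real effects
    y = rng.normal(size=n)
    X = rng.normal(size=(n, p))
    for j in range(p):
        x = X[:, j]
        # simple-regression slope and its t statistic
        b = np.cov(x, y, bias=True)[0, 1] / np.var(x)
        resid = y - y.mean() - b * (x - x.mean())
        se = np.sqrt(np.sum(resid**2) / (n - 2) / np.sum((x - x.mean())**2))
        if abs(b / se) > crit:  # crit ~ two-sided 5% cutoff for this n
            return True
    return False

reps = 500
familywise_error = np.mean([any_significant() for _ in range(reps)])
print(familywise_error)  # far above the nominal 0.05
```

Each individual test has roughly a 5% false-positive rate, but screening 20 candidates drives the chance of at least one spurious "significant" predictor toward 1 - 0.95**20, about 64% -- which is why stepwise selection without adjustment so readily picks up noise variables.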
The bottom line in any event: It's important to get away from thinking in
terms of single numbers for rankings, and instead to think in terms of the
ranking range, as reflecting the inherent uncertainty in giving rankings.
(For example, my department's ranking should be considered as "in the
range from roughly 9 to 30" for the regression ranking, and "in the range
from roughly 12 to 32" for the survey ranking.) A lot of people don't like
this, but it is much more honest/realistic than giving a single-number
ranking.
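The idea of a ranking range can be sketched with a small bootstrap (my own toy numbers; the report's Ninety Percent Ranges may well be computed differently): resample the raters, recompute one department's rank each time, and take the middle 90% of the bootstrap ranks.

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical "true" quality scores for 40 departments; each rater's
# score is the true score plus rating noise
true_scores = np.linspace(0, 2, 40)
n_raters = 15
ratings = true_scores + rng.normal(scale=0.8, size=(n_raters, 40))

def rank_of(dept, mean_scores):
    # rank 1 = highest mean score
    order = np.argsort(-mean_scores)
    return int(np.where(order == dept)[0][0]) + 1

# bootstrap over raters to get a 90% range for one department's rank
dept = 20
boot_ranks = []
for _ in range(2000):
    idx = rng.integers(0, n_raters, size=n_raters)
    boot_ranks.append(rank_of(dept, ratings[idx].mean(axis=0)))
lo, hi = np.percentile(boot_ranks, [5, 95])
print(f"department {dept}: rank range roughly {int(lo)} to {int(hi)}")
```

Even with modest rating noise, a department's bootstrap rank range spans many positions, which is exactly why a single-number ranking overstates what the data can support.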
A corollary: Interpreting a specific weight/regression coefficient doesn't
make a whole lot of sense in the broader context.
Martha
http://www.futurity.org/top-stories/do-recommendations-cost-women-jobs/
Happy New Year,
Beata Randrianantoanina