[R] ordered and unordered variables

meng

May 21, 2013, 1:35:59 AM
to R help
Hi all:
If the explanatory variables are ordinal, the result of the regression is different from
the result with unordered variables. But I can't understand the result of the regression
with an ordered variable.

The data is warpbreaks, which comes with R.

If I use the "unordered variable" (tension): Levels: L M H
The result is easy to understand:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    36.39       2.80  12.995  < 2e-16 ***
tensionM      -10.00       3.96  -2.525 0.014717 *
tensionH      -14.72       3.96  -3.718 0.000501 ***

If I use the "ordered variable" (tension): Levels: L < M < H
I don't know how to interpret the result:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   28.148      1.617  17.410  < 2e-16 ***
tension.L    -10.410      2.800  -3.718 0.000501 ***
tension.Q      2.155      2.800   0.769 0.445182

What do "tension.L" and "tension.Q" stand for? And how should I interpret the result?

Many thanks.

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius

May 21, 2013, 8:55:18 AM
to meng, R help

On May 20, 2013, at 10:35 PM, meng wrote:

> What do "tension.L" and "tension.Q" stand for? And how should I interpret the result?

Ordered factors are handled by the R regression mechanism with orthogonal polynomial contrasts: ".L" for linear and ".Q" for quadratic. If the term had 4 levels there would also have been a ".C" (cubic) term. Treatment contrasts are used for unordered factors. Generally one would explain the results through predictions rather than through individual coefficients. Trying to explain the individual coefficient values from polynomial contrasts is similar to, and just as unproductive as, trying to explain the individual coefficients involving interaction terms.
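
As a rough illustration of the point about contrasts and predictions, the sketch below prints the two default contrast matrices and fits warpbreaks both ways. The names wb, tension_ord, fit_unord, fit_ord and grid are made up for this example and are not from the original post; it assumes the default setting of options("contrasts").

```r
# Default contrast matrices for a 3-level factor
contr.treatment(3)  # unordered: each level compared with the first
contr.poly(3)       # ordered: columns .L (linear) and .Q (quadratic)

wb <- warpbreaks
# Ordered copy of tension (illustrative column name)
wb$tension_ord <- factor(wb$tension, levels = c("L", "M", "H"), ordered = TRUE)

fit_unord <- lm(breaks ~ tension, data = wb)      # treatment contrasts
fit_ord   <- lm(breaks ~ tension_ord, data = wb)  # polynomial contrasts

coef(fit_unord)
coef(fit_ord)

# One row per tension level, then predictions from both fits
grid <- unique(wb[, c("tension", "tension_ord")])
pred_unord <- predict(fit_unord, newdata = grid)
pred_ord   <- predict(fit_ord, newdata = grid)
cbind(grid, pred_unord, pred_ord)
```

The coefficients differ between the two fits, but the predicted mean number of breaks for L, M and H should agree, which is why predictions tend to be the easier thing to report.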

--

David Winsemius
Alameda, CA, USA

meng

May 22, 2013, 1:09:59 AM
to David Winsemius, R help
Thanks.


As to the data "warpbreaks", if I want to analyze the impact of tension (L, M, H) on breaks, should I order the tension or not?


Many thanks.

Uwe Ligges

May 22, 2013, 5:30:34 AM
to meng, R help


On 22.05.2013 07:09, meng wrote:
> Thanks.
>
>
> As to the data "warpbreaks", if I want to analyze the impact of tension (L, M, H) on breaks, should I order the tension or not?

No homework questions on this list; please ask your teacher.

Best,
Uwe Ligges

meng

May 22, 2013, 10:44:04 PM
to Uwe Ligges, R help
It's not homework.
I ran into this question during my practical work with R.
My boss is an expert in biology, but he doesn't know statistics. So I must find the right method for this work.

David Winsemius

May 23, 2013, 12:12:36 AM
to meng, R help, Uwe Ligges

On May 22, 2013, at 7:44 PM, meng wrote:

> It's not homework.
> I ran into this question during my practical work with R.
> My boss is an expert in biology, but he doesn't know statistics. So I must find the right method for this work.
>

Yes, you must. Unfortunately, the R-help mailing list is for problems with R coding and is _not_ designed to offer tutorials on the proper education of stats-challenged biologists. It is an unfortunate truth that many a physician or biologist may rise to a position of authority without a proper grounding in statistics. The rectification of those deficiencies is not the stated goal of R-help.

--
David Winsemius, MD, MPH

PIKAL Petr

May 23, 2013, 3:44:24 AM
to meng, R help
Hi

Try posting your question on stackexchange. Or maybe it is already answered there. I am not a statistical expert, but based on common sense (which can be counterintuitive sometimes) I would use an ordered factor if I expected the tension value to influence breaks. Anyway, I would probably consult more experienced people nearby or a textbook.

Regards
Petr

Greg Snow

May 23, 2013, 9:56:29 AM
to meng, R help
Meng,

This really comes down to what question you are trying to answer. Before
worrying about details such as default contrasts, you first need to work
out what the question of interest really is. The main difference between
declaring a variable ordered or not is the default contrasts. Defaults are
provided because in many cases the contrasts used internally do not matter,
so there is no reason to make someone think about them. In cases where the
choice of contrasts does matter, it is rare that any default coding is the
correct or best choice; you should think through which contrasts answer the
question of interest and use those custom contrasts.

For example, to answer the question of whether Tension has any overall
effect, it does not matter which contrast encoding you use (as long as it
is full rank): the test statistic and p-value for testing the whole effect
will be the same. The predicted group means will also be the same
regardless of which contrasts are used (and this is often a clearer way to
present/explain the results).
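
A minimal sketch of this point, assuming the default contrasts and using illustrative names (wb, tension_ord, fit_unord, fit_ord, a_unord, a_ord) that are not from the thread: the overall F test and the fitted group means can be compared across the two codings.

```r
wb <- warpbreaks
# Ordered copy of tension (illustrative column name)
wb$tension_ord <- factor(wb$tension, levels = c("L", "M", "H"), ordered = TRUE)

fit_unord <- lm(breaks ~ tension, data = wb)      # treatment contrasts
fit_ord   <- lm(breaks ~ tension_ord, data = wb)  # polynomial contrasts

a_unord <- anova(fit_unord)
a_ord   <- anova(fit_ord)

# Overall F statistic and p-value for the tension effect under each coding
c(F = a_unord[1, "F value"], p = a_unord[1, "Pr(>F)"])
c(F = a_ord[1, "F value"],   p = a_ord[1, "Pr(>F)"])

# Group means implied by each fit
tapply(fitted(fit_unord), wb$tension, mean)
tapply(fitted(fit_ord),   wb$tension, mean)
```

Both anova() tables should report the same F statistic and p-value, and both fits should reproduce the same three group means.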

A case where the specific contrasts would matter is if we want to see
whether we can reduce the number of groups by combining groups together, or
interpolate to certain groups. The treatment contrasts will test whether
low and medium can be combined (which makes sense) and whether low and high
can be combined (which does not make sense unless the first is true and in
fact the overall factor is not significant). What makes more sense would be
to compare low to medium and medium to high (it could be that low is
different from the other 2, but medium and high can be combined). The
polynomial contrasts give a different view. The quadratic term in this case
tests whether the medium group is the average of the low group and the high
group (so we could interpolate medium). This only makes sense if the medium
tension is centered (in some sense) between the other 2, i.e. the
difference from low to medium is exactly the same as the difference from
medium to high, but if that were the case then I would expect a numerical
term rather than an ordered factor.
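
One possible way to look at these comparisons in R is sketched below. It is only an illustration: making medium the reference level with relevel() is one of several ways to get the low-vs-medium and medium-vs-high comparisons, and the names wb, means, tension_m, fit_m, tension_ord, fit_ord, q_coef and q_by_hand are made up for the example.

```r
wb <- warpbreaks
means <- tapply(wb$breaks, wb$tension, mean)  # group means for L, M, H

# Medium as the reference level: coefficients are L - M and H - M
wb$tension_m <- relevel(wb$tension, ref = "M")
fit_m <- lm(breaks ~ tension_m, data = wb)
summary(fit_m)$coefficients

# Polynomial contrasts: the .Q coefficient is proportional to (L + H)/2 - M,
# with factor sqrt(2/3) for the 3-level contr.poly() coding
wb$tension_ord <- factor(wb$tension, levels = c("L", "M", "H"), ordered = TRUE)
fit_ord <- lm(breaks ~ tension_ord, data = wb)
q_coef    <- coef(fit_ord)[["tension_ord.Q"]]
q_by_hand <- sqrt(2 / 3) * ((means[["L"]] + means[["H"]]) / 2 - means[["M"]])
c(q_coef = q_coef, q_by_hand = q_by_hand)
```

The two numbers in the last line should match, which shows that the t-test on the .Q term is a test of whether the medium mean sits halfway between the low and high means.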

So, to summarize, it depends on the question of interest. For some
questions the choice of contrasts does not matter. In other cases the
correct contrasts are determined by the question, and you should use the
contrasts that answer that question (which are rarely a default).
--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

meng

May 23, 2013, 9:36:14 PM
to Greg Snow, R help
Many thanks for your detailed reply.


I'll read your mail thoroughly. Thanks!