pairwise_tukeyhsd - retrieve "reject" values


Arnaldo Russo

Apr 8, 2015, 6:43:19 PM
to pystatsmodels

I have posted this question as an issue on the repository, but I think more people will see it here.

I would like to know how to get the numeric values behind the "reject" column.

In [140]: print(tukey_result.summary())
Multiple Comparison of Means - Tukey HSD,FWER=0.05
=============================================
group1 group2 meandiff  lower   upper  reject
---------------------------------------------
ger    tbw   14.1319  -3.3327 31.5966 False 
ger    tww   30.2923  11.3796  49.205  True 
tbw    tww   16.1604  -0.9708 33.2915 False 
---------------------------------------------
  1. tukey_result.reject returns boolean results, the same as the attribute reject2. Where can I find the numeric (float) p-values?
  2. Why does a plain tukey_result.summary() not print the summary table?

Cheers,
Arnaldo.


---
Arnaldo D'Amaral Pereira Granja Russo
Lab. de Estudos dos Oceanos e Clima
Instituto de Oceanografia - FURG


josef...@gmail.com

Apr 8, 2015, 8:33:21 PM
to pystatsmodels
On Wed, Apr 8, 2015 at 6:42 PM, Arnaldo Russo <arnald...@gmail.com> wrote:
> I have posted this doubt as a issue on the repository, but I think people
> here have more access.

Sorry, I was distracted and had forgotten about this before replying

>
> I was thinking about how to get the numeric values of "reject" column.
>
> In [140]: print(tukey_result.summary())
> Multiple Comparison of Means - Tukey HSD,FWER=0.05
> =============================================
> group1 group2 meandiff lower upper reject
> ---------------------------------------------
> ger tbw 14.1319 -3.3327 31.5966 False
> ger tww 30.2923 11.3796 49.205 True
> tbw tww 16.1604 -0.9708 33.2915 False
> ---------------------------------------------
>
> tukey_result.reject returns boolean results the same for attribute reject2.
> Where I can find the numeric float values of p?

p-values are still not available. I mentioned them in a general issue
but didn't assign any priority, and it got lost.

As far as I can figure out (after trying for a while):

psturng evaluates the survival function (1 - cdf) of the studentized
range distribution, so it maps a statistic to a p-value (qsturng goes the
other way, from a probability to a quantile). For p-values outside the
supported range, the returned values are the boundary values of that range.

tt.res is the results instance from tukeyhsd (used in the test suite)

>>> import numpy as np
>>> from statsmodels.stats.libqsturng import psturng
>>> st_range = np.abs(tt.res.meandiffs) / tt.res.std_pairs
>>> psturng(st_range, len(tt.res.groupsunique), tt.res.df_total)
array([ 0.01054871, 0.10790278, 0.54980308])
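
For anyone who does not have the test-suite object tt.res at hand, here is a minimal sketch of the same calculation starting from pairwise_tukeyhsd. The data are synthetic and purely illustrative (only the group names are borrowed from the table in the question), so the printed numbers will not match the ones in this thread. More recent statsmodels releases appear to expose these values directly as a pvalues attribute on the result, so it is worth checking your installed version first.

```python
import numpy as np
from statsmodels.stats.libqsturng import psturng
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Synthetic, made-up data: three groups of 10 observations each.
rng = np.random.default_rng(0)
groups = np.repeat(["ger", "tbw", "tww"], 10)
values = np.concatenate(
    [rng.normal(loc, 10.0, size=10) for loc in (50.0, 64.0, 80.0)]
)

res = pairwise_tukeyhsd(values, groups, alpha=0.05)

# Studentized range statistic for each pair, as in the snippet above.
q_stat = np.abs(res.meandiffs) / res.std_pairs

# Upper-tail probability of the studentized range distribution.
# psturng clips results to its supported range, so very small or very
# large p-values come back as the boundary values.
pvals = psturng(q_stat, len(res.groupsunique), res.df_total)

print(res.summary())
print("p-values:", pvals)
print("reject:  ", res.reject)
```

The boolean reject column should agree with pvals < 0.05, apart from cases that sit right at the threshold or at the clipping boundary.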


There is a helper function in the sandbox that uses numerical
integration (which might not be numerically robust in more extreme
cases or with a large number of groups).

>>> import statsmodels.sandbox.stats.multicomp as multi
>>> [multi.tukey_pvalues(ii, len(tt.res.groupsunique), tt.res.df_total)[0] for ii in st_range]
[0.010573068621088755, 0.10790986711831496, 0.55139512033628768]
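
To make the numerical-integration idea concrete, here is a from-scratch sketch in plain Python. It is not the sandbox helper's code, just the textbook double integral: the range of k standard normals is integrated over the distribution of the pooled standard deviation estimate. The function names, grid sizes and integration limits are my own choices and are not tuned for extreme q, very small df, or many groups.

```python
import math


def _norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def range_cdf(w, k, n_steps=320, z_max=8.0):
    """P(range of k iid N(0, 1) draws <= w), by the trapezoid rule.

    Uses P(R <= w) = k * integral of pdf(z) * (cdf(z) - cdf(z - w))**(k - 1) dz.
    """
    if w <= 0.0:
        return 0.0
    h = 2.0 * z_max / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        z = -z_max + i * h
        weight = 0.5 if i in (0, n_steps) else 1.0
        diff = _norm_cdf(z) - _norm_cdf(z - w)
        total += weight * _norm_pdf(z) * diff ** (k - 1)
    return k * h * total


def studentized_range_sf(q, k, df, n_steps=200):
    """Upper-tail probability P(Q > q) for k groups and df degrees of freedom."""
    # s = sqrt(chi2_df / df) is concentrated around 1; this upper limit is a
    # heuristic that leaves a negligible tail for the cases tried here.
    s_max = 1.0 + 8.0 / math.sqrt(df)
    h = s_max / n_steps
    # Log of the normalizing constant of the density of s.
    log_const = (
        0.5 * df * math.log(df)
        - math.lgamma(0.5 * df)
        - (0.5 * df - 1.0) * math.log(2.0)
    )
    cdf = 0.0
    for i in range(n_steps):
        s = (i + 0.5) * h  # midpoint rule, so log(s) is never evaluated at 0
        log_density = log_const + (df - 1.0) * math.log(s) - 0.5 * df * s * s
        cdf += math.exp(log_density) * range_cdf(q * s, k)
    return 1.0 - h * cdf


# 3.877 is the tabulated 5% critical value for 3 groups and 10 df,
# so this should print a value close to 0.05.
print(studentized_range_sf(3.877, 3, 10))
```

With a statistic computed as st_range above, the call would be studentized_range_sf(st, number_of_groups, df_total) for each pairwise statistic st. For two groups it reduces to a two-sided t-test on q / sqrt(2), which is a handy sanity check.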


The statsmodels test suite has a case that checks against R. I don't have
R or other test cases available right now.

>>> # R results from the test suite
>>> tukeyhsd2s[:, 3]
array([ 0.01056279, 0.1079035 , 0.5513904 ])

This looks pretty close for a p-value from a non-standard distribution (a
good approximation for large p-values is usually not required and often
not a priority).

>>> psturng(st_range, len(tt.res.groupsunique), tt.res.df_total) / tukeyhsd2s[:, 3] - 1
array([ -1.33303869e-03, -6.70150226e-06, -2.87876374e-03])
>>> psturng(st_range, len(tt.res.groupsunique), tt.res.df_total) - tukeyhsd2s[:, 3]
array([ -1.40806078e-05, -7.23115549e-07, -1.58732269e-03])


> why the simple tukey_result.summary() does not print the summary table?

It just returns a SimpleTable without extras, which defines a useful
__str__ but not an informative __repr__. That's why we need
print(tukey_result.summary()). It looks like it supports HTML in
IPython notebooks.
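
The __str__ versus __repr__ difference is easy to reproduce with a toy class. TinyTable below is a hypothetical stand-in, not statsmodels code. It defines only __str__, so print() shows the rows while a bare expression at the prompt falls back to Python's default repr.

```python
class TinyTable:
    """Hypothetical stand-in for a table object; defines __str__ only."""

    def __init__(self, rows):
        self.rows = rows

    def __str__(self):
        return "\n".join(" ".join(str(cell) for cell in row) for row in self.rows)


table = TinyTable([("ger", "tbw", False), ("ger", "tww", True)])

shown_by_print = str(table)   # what print(table) writes
shown_by_repl = repr(table)   # what a bare `table` at the prompt echoes

print(shown_by_print)
print(shown_by_repl)
```

The second print shows something like an object address rather than the table contents, which is the same effect as calling summary() without print().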

Josef

Arnaldo Russo

Apr 9, 2015, 7:53:19 AM
to pystatsmodels
Hi Josef,

Thank you for your time and explanation!
Is there any reason why the `psturng` output is not exposed as an attribute of the `pairwise_tukeyhsd` result?

One more question (sorry for cross-posting): is there a function that performs the same kind of test as tukeyhsd, but for non-parametric analysis?

If you prefer, we can keep this conversation inside github.

Cheers,
Arnaldo.






josef...@gmail.com

Apr 9, 2015, 8:15:23 AM
to pystatsmodels
On Thu, Apr 9, 2015 at 7:52 AM, Arnaldo Russo <arnald...@gmail.com> wrote:
> Hi Josef,
>
> Thank you for your time and explanation!
> Is there any reason why `psturng` output is not activated as an attribute of
> `pairwise_tukeyhsd`?
>
> Just including another doubt here, (sorry for cross-posting). Is there any
> function to compute the same test as tukeyhsd does, but for non parametric
> analysis?
>
> If you prefer, we can keep this conversation inside github.

The main general issue is https://github.com/statsmodels/statsmodels/issues/852
which has a discussion of Kruskal and Nemenyi and similar.
That issue would be the location for general multiple comparison discussion.


Overall, the missing features are because nobody is currently working
on this. We had some contributed enhancements and some bug/maintenance
fixes since I wrote this, but no big push to get other functionality
in.

My related work will be targeting multiple comparisons after model estimation.

PRs and test cases are always welcome.