On Wed, Apr 8, 2015 at 6:42 PM, Arnaldo Russo <
arnald...@gmail.com> wrote:
> I have posted this question as an issue on the repository, but I think people
> here have more access.
Sorry, I was distracted and forgot to reply to this earlier.
>
> I was thinking about how to get the numeric values of "reject" column.
>
> In [140]: print(tukey_result.summary())
> Multiple Comparison of Means - Tukey HSD,FWER=0.05
> =============================================
> group1 group2 meandiff  lower   upper  reject
> ---------------------------------------------
>  ger    tbw    14.1319 -3.3327 31.5966 False
>  ger    tww    30.2923 11.3796  49.205 True
>  tbw    tww    16.1604 -0.9708 33.2915 False
> ---------------------------------------------
>
> tukey_result.reject returns boolean results, and so does the attribute reject2.
> Where can I find the numeric (float) p-values?
p-values are still not available. I mentioned them in a general issue
but didn't assign any priority, and it got lost.
As far as I can figure out (after trying for a while),
psturng is the survival function (1 - cdf) of the studentized range
distribution, i.e. it maps a statistic to a p-value; qsturng is the
corresponding inverse (quantile) function.
For p-values outside the supported range, the returned values will be
the boundary values of that range.
tt.res is the results instance from tukeyhsd (used in the test suite).
>>> import numpy as np
>>> from statsmodels.stats.libqsturng import psturng
>>> st_range = np.abs(tt.res.meandiffs) / tt.res.std_pairs
>>> psturng(st_range, len(tt.res.groupsunique), tt.res.df_total)
array([ 0.01054871, 0.10790278, 0.54980308])
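
For anyone without the test-suite objects at hand, here is a minimal end-to-end sketch of the same calculation. The data are made up for illustration (they are not the data from this thread), and the variable names are my own; only pairwise_tukeyhsd, psturng and the result attributes used above are taken from statsmodels.

```python
import numpy as np
from statsmodels.stats.libqsturng import psturng
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Made-up, deterministic data for three groups of four observations each.
values = np.array([10.1, 11.3, 9.8, 10.6,    # group "a"
                   10.9, 11.8, 10.2, 11.5,   # group "b"
                   13.0, 12.1, 13.7, 12.5])  # group "c"
groups = np.repeat(["a", "b", "c"], 4)

res = pairwise_tukeyhsd(values, groups, alpha=0.05)

# Studentized range statistic for each pair, as in the session above.
st_range = np.abs(res.meandiffs) / res.std_pairs
n_groups = len(res.groupsunique)

# psturng only covers a limited range of p-values; results sitting on a
# boundary should be read as "at most" or "at least" that value.
pvalues = np.atleast_1d(psturng(st_range, n_groups, res.df_total))

for p, rejected in zip(pvalues, res.reject):
    print(f"p = {p:.4f}  reject = {rejected}")
```

The p-values should agree with the reject column in the sense that a pair is rejected when its p-value is below alpha, up to the approximation error of psturng.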
There is a helper function in the sandbox that uses numerical
integration (which might not be very robust numerically in more extreme
cases or with a large number of groups):
>>> import statsmodels.sandbox.stats.multicomp as multi
>>> [multi.tukey_pvalues(ii, len(tt.res.groupsunique), tt.res.df_total)[0] for ii in st_range]
[0.010573068621088755, 0.10790986711831496, 0.55139512033628768]
The statsmodels test suite has a case to check against R; I don't have
R or other test cases available right now.
>>> # R results from the test suite
>>> tukeyhsd2s[:, 3]
array([ 0.01056279, 0.1079035 , 0.5513904 ])
That looks pretty close for a p-value from a non-standard distribution
(a good approximation for large p-values is usually not required and
often not a priority). Relative and absolute differences:
>>> psturng(st_range, len(tt.res.groupsunique), tt.res.df_total) / tukeyhsd2s[:, 3] - 1
array([ -1.33303869e-03, -6.70150226e-06, -2.87876374e-03])
>>> psturng(st_range, len(tt.res.groupsunique), tt.res.df_total) - tukeyhsd2s[:, 3]
array([ -1.40806078e-05, -7.23115549e-07, -1.58732269e-03])
> Why doesn't a plain tukey_result.summary() print the summary table?
It just returns a SimpleTable without extras, which defines a useful
__str__ but not an informative __repr__. That's why we need
print(tukey_result.summary()). It looks like it supports HTML in
IPython notebooks.
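
The difference between the two display paths can be seen with a toy class. MiniTable below is a hypothetical stand-in written for this illustration, not statsmodels' SimpleTable; it only mimics the pattern of a useful __str__, no custom __repr__, and a _repr_html_ hook of the kind IPython notebooks look for.

```python
class MiniTable:
    """Hypothetical table with __str__ and _repr_html_ but no custom __repr__."""

    def __init__(self, header, rows):
        self.header = header
        self.rows = rows

    def __str__(self):
        # Used by print() and str().
        lines = ["  ".join(self.header)]
        lines += ["  ".join(str(value) for value in row) for row in self.rows]
        return "\n".join(lines)

    def _repr_html_(self):
        # Rich-display hook that IPython notebooks call when it is present.
        head = "".join(f"<th>{name}</th>" for name in self.header)
        body = "".join(
            "<tr>" + "".join(f"<td>{value}</td>" for value in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"


table = MiniTable(["group1", "group2", "reject"], [["ger", "tbw", False]])

echoed = repr(table)   # what a plain interactive prompt shows for `table`
printed = str(table)   # what print(table) shows

print(echoed)
print(printed)
```

At a plain prompt only the default object repr appears, while print() goes through __str__ and shows the table text.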
Josef