Confidence interval in DescrStatsW

Peter Tittmann

Oct 10, 2017, 8:30:36 PM

to pystatsmodels

Hi,

I'm trying to reconcile two calculations of the upper and lower bounds of a 95% confidence interval for the mean: one done manually and one done with the DescrStatsW class.

Here are the basics:

import pandas as pd
from scipy import stats as scipyStats  # alias assumed from its use below
from statsmodels.stats.weightstats import DescrStatsW

# testData is the poster's DataFrame (not shown in the thread)
dfTest = testData[(testData['RX'] == 'X1') & (testData['Year'] == 2016)]['Onsite_CO2']
ds = DescrStatsW(dfTest, ddof=1)
stats = {'Mean' : ds.mean,
         'n' : ds.nobs,
         'StDev': ds.std,
         'SEM': ds.std_mean,
         't': scipyStats.t.ppf(0.95, ds.nobs),
         'Bound_Manual' : scipyStats.t.ppf(0.95, ds.nobs) * scipyStats.sem(dfTest),
         'Bound_Derived' : ds.tconfint_mean(alpha= 1-0.95)[1] - ds.mean,
        }
pd.DataFrame.from_dict(stats, orient = 'index')

Which results in:

	0
n	147.000000
t	1.655285
StDev	202.394872
SEM	16.693248
Bound_Derived	32.991628
Bound_Manual	27.632090
Mean	389.620282

I'd be grateful if someone could explain why 'Bound_Manual' and 'Bound_Derived' differ.

Thank you,

Peter

josef...@gmail.com

Oct 10, 2017, 8:43:14 PM

to pystatsmodels

One guess is the degrees-of-freedom correction: the df in scipyStats.t.ppf(0.95, ds.nobs) should probably be ds.nobs - 1.

Reverse engineering:

>>> stats.t.ppf(0.975, 147-1) * 16.693248
32.991628145671356

Your manual version puts 0.05 in each tail, which means alpha=0.1.

The statsmodels version puts 0.025 in each tail, i.e. alpha=0.05.

You can also check the statsmodels version with your data at alpha=0.1:

ds.tconfint_mean(alpha=0.1)[1] - ds.mean

Josef
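
For readers who want to reproduce the numbers, here is a minimal sketch of the two calculations using only the n and SEM values reported in the thread, since the original data is not available. The helper name t_half_width is hypothetical and not part of scipy or statsmodels. It assumes, as Josef's reverse engineering suggests, that the two-sided interval uses alpha/2 in each tail and n - 1 degrees of freedom.

```python
from scipy import stats

n = 147            # sample size reported in the thread
sem = 16.693248    # standard error of the mean reported in the thread


def t_half_width(sem, n, conf_level=0.95):
    """Half-width of a two-sided t confidence interval for the mean.

    Hypothetical helper for illustration only.
    """
    alpha = 1.0 - conf_level
    # Two-sided interval: alpha/2 in each tail, n - 1 degrees of freedom.
    return stats.t.ppf(1.0 - alpha / 2.0, n - 1) * sem


# Two-sided 95% interval, as in Josef's reverse-engineered expression.
two_sided = t_half_width(sem, n, conf_level=0.95)

# The original manual calculation: 0.05 in the upper tail and df = n.
manual = stats.t.ppf(0.95, n) * sem

print(f"two-sided 95% half-width: {two_sided:.6f}")
print(f"original manual bound:    {manual:.6f}")
```

The first value should match Bound_Derived (about 32.9916) and the second Bound_Manual (about 27.6321). Calling t_half_width(sem, n, conf_level=0.90) lands very close to the manual figure, which suggests that nearly all of the gap comes from the tail probability and only a tiny part from using n rather than n - 1 degrees of freedom.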

Thank you,

Peter
