Confidence interval in DescrStatsW


Peter Tittmann

Oct 10, 2017, 8:30:36 PM
to pystatsmodels
Hi,

I'm trying to reconcile the upper and lower bounds of a 95% confidence interval calculated manually with those calculated using the DescrStatsW class.

Here are the basics:

dfTest = testData[(testData['RX'] == 'X1') & (testData['Year'] == 2016)]['Onsite_CO2']
ds = DescrStatsW(dfTest, ddof=1)
stats = {'Mean': ds.mean,
         'n': ds.nobs,
         'StDev': ds.std,
         'SEM': ds.std_mean,
         't': scipyStats.t.ppf(0.95, ds.nobs),
         'Bound_Manual': scipyStats.t.ppf(0.95, ds.nobs) * scipyStats.sem(dfTest),
         'Bound_Derived': ds.tconfint_mean(alpha=1 - 0.95)[1] - ds.mean,
         }
pd.DataFrame.from_dict(stats, orient='index')

Which results in:

                        0
n              147.000000
t                1.655285
StDev          202.394872
SEM             16.693248
Bound_Derived   32.991628
Bound_Manual    27.632090
Mean           389.620282

I'd be grateful if someone could explain to me why the 'Bound' is different here between methods.

Thank you,

Peter

josef...@gmail.com

Oct 10, 2017, 8:43:14 PM
to pystatsmodels
one guess is the df correction: scipyStats.t.ppf(0.95, ds.nobs) should probably use ds.nobs - 1 degrees of freedom

reverse engineering:

>>> stats.t.ppf(0.975, 147-1) * 16.693248
32.991628145671356

your manual version uses 0.05 on each side which means alpha=0.1
the statsmodels version uses 0.025 in each tail, i.e. alpha=0.05
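
The two bounds can be reproduced from the numbers posted above. This is only a sketch: n and the SEM are copied from the output table, and the variable names are made up for illustration.

```python
from scipy import stats

n = 147            # sample size reported in the post
sem = 16.693248    # standard error of the mean reported in the post

# Manual version: 0.05 in each tail (alpha=0.1), n degrees of freedom
bound_manual = stats.t.ppf(0.95, n) * sem

# statsmodels version: 0.025 in each tail (alpha=0.05), n - 1 degrees of freedom
bound_derived = stats.t.ppf(0.975, n - 1) * sem

print(round(bound_manual, 4), round(bound_derived, 4))
```

Most of the gap comes from the tail probability (0.95 vs 0.975); the change in degrees of freedom only moves the result slightly at n = 147.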

you can also check the statsmodels version with your data at alpha=0.1

ds.tconfint_mean(alpha=0.1)[1] - ds.mean
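
For anyone without the original data, the same check can be run on synthetic values. This is a minimal sketch assuming normally distributed stand-in data; the array x and the seed are placeholders, not the poster's Onsite_CO2 values.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.weightstats import DescrStatsW

rng = np.random.default_rng(0)
x = rng.normal(loc=390.0, scale=200.0, size=147)  # synthetic stand-in data

ds = DescrStatsW(x, ddof=1)
alpha = 0.05  # two-sided: alpha / 2 in each tail

# Half-width of the interval as reported by statsmodels
half_width_sm = ds.tconfint_mean(alpha=alpha)[1] - ds.mean

# Manual half-width: quantile at 1 - alpha / 2 with nobs - 1 degrees of freedom
half_width_manual = stats.t.ppf(1 - alpha / 2, ds.nobs - 1) * stats.sem(x)

print(np.isclose(half_width_sm, half_width_manual))
```

Setting alpha = 0.1 in both places should keep the two half-widths in agreement, which is the comparison suggested above.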

Josef
 

