pscore and p-values (confidence interval)

Javier Burroni

Jun 20, 2013, 3:05:25 PM

to pystatsmodels

I have to replicate a published paper which uses pscore matching.

While implementing pscore, I realized that displaying the p-values and confidence interval is not useful. It is actually harmful, as it may give the reader more confidence than is warranted.

We should think about this for the implementation. I think it would be adequate to show how the confidence interval moves as the matching algorithm's parameters change.

Here is an example (attached as pscore.png).

You can see the lack of relation between the confidence interval and the range of possible values (treatment effect on the vertical axis) when changing the radius parameter.

saludos

jb

--
" To be is to do " ( Socrates )
" To be or not to be " ( Shakespeare )
" To do is to be " ( Sartre )
" Do be do be do " ( Sinatra )

pscore.png

josef...@gmail.com

Jun 20, 2013, 3:20:45 PM

to pystat...@googlegroups.com

On Thu, Jun 20, 2013 at 3:05 PM, Javier Burroni <javier....@gmail.com> wrote:

I have to replicate a published paper which uses pscore matching.
While implementing pscore, I realized that displaying the p-values and confidence interval is not useful. It is actually harmful, as it may give the reader more confidence than is warranted.

We should think about this for the implementation. I think it would be adequate to show how the confidence interval moves as the matching algorithm's parameters change.
Here is an example

You can see the lack of relation between the confidence interval and the range of possible values (treatment effect on the vertical axis) when changing the radius parameter.

Just to understand a bit:

Are these the confidence intervals for the matched t-test, without taking the uncertainty from the matching process into account?

Are you matching one-to-one here, or one-to-many using the mean of the reference group within the radius?

Josef

Javier Burroni

Jun 20, 2013, 3:36:50 PM

to pystatsmodels

I'm matching each treated unit to as many controls as possible within a radius.

For each radius I've computed the mean treatment effect and its variance, which gives me the confidence interval. This confidence interval is the value reported by, for instance, Stata.

Then, using np.logspace, I've computed this interval for different radii. A researcher may choose one particular radius and report results only for that radius.
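The sweep described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code used for the attached plot: the function name `radius_att`, the simulated data, and the naive normal-theory interval (which ignores the uncertainty from the matching step) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): propensity score, treatment flag, outcome.
n = 2000
pscore = rng.uniform(0.05, 0.95, size=n)
treated = rng.uniform(size=n) < pscore
outcome = 1.0 * treated + 2.0 * pscore + rng.normal(size=n)


def radius_att(pscore, treated, outcome, radius):
    """Radius-matching estimate of the effect on the treated.

    Each treated unit is compared with the mean outcome of all controls
    whose propensity score lies within `radius`. The 95% interval treats
    the matched differences as i.i.d. and ignores matching uncertainty.
    Returns (att, lower, upper, number of matched treated units).
    """
    ps_t, y_t = pscore[treated], outcome[treated]
    ps_c, y_c = pscore[~treated], outcome[~treated]
    # within[i, j] is True when control j lies inside the radius of treated i.
    within = np.abs(ps_t[:, None] - ps_c[None, :]) <= radius
    n_match = within.sum(axis=1)
    has_match = n_match > 0
    if has_match.sum() < 2:
        return np.nan, np.nan, np.nan, int(has_match.sum())
    control_mean = (within[has_match].astype(float) @ y_c) / n_match[has_match]
    diff = y_t[has_match] - control_mean
    att = diff.mean()
    se = diff.std(ddof=1) / np.sqrt(diff.size)
    return att, att - 1.96 * se, att + 1.96 * se, int(diff.size)


# Recompute the estimate and its interval over a log-spaced grid of radii.
radii = np.logspace(-3, -1, 9)
results = [radius_att(pscore, treated, outcome, r) for r in radii]
for r, (att, lo, hi, m) in zip(radii, results):
    print(f"radius={r:.4f}  att={att:.3f}  ci=({lo:.3f}, {hi:.3f})  matched={m}")
```

Plotting the estimate and the interval bounds against `radii` on a log axis gives the kind of figure discussed in this thread.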

josef...@gmail.com

Jun 20, 2013, 4:11:21 PM

to pystat...@googlegroups.com

On Thu, Jun 20, 2013 at 3:36 PM, Javier Burroni <javier....@gmail.com> wrote:

I'm matching each treated unit to as many controls as possible within a radius.

For each radius I've computed the mean treatment effect and its variance, which gives me the confidence interval. This confidence interval is the value reported by, for instance, Stata.

Ok, if Stata reports it the same way, then we are fine :)

Then, using np.logspace, I've computed this interval for different radii. A researcher may choose one particular radius and report results only for that radius.

Can you plot the effect size (diff in value of treated minus local mean of reference group) as a function of the propensity score of the treated?

My guess is that, similar to the example in the stratification part, there is a systematic change in effect size. The shape of your plot relative to the radius might mean that this is convex. A local constant approximation to a convex curve would have increasing effect size with increasing radius.

It wouldn't surprise me then that the radius has a large influence (which might just be bias from not using a local approximation.)
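The mechanism guessed at here can be illustrated numerically. The sketch below is an assumption-laden toy: the curve `x ** 2` is a hypothetical stand-in for a convex relationship, and `local_constant_bias` is a made-up helper name. It only shows that averaging a convex curve over a window of half-width `radius` overshoots the value at the center by an amount that grows with the radius (about `radius ** 2 / 3` for this particular curve).

```python
import numpy as np


def local_constant_bias(f, p, radius, n_grid=2001):
    """Mean of f over [p - radius, p + radius] minus f(p)."""
    grid = np.linspace(p - radius, p + radius, n_grid)
    return f(grid).mean() - f(p)


def convex(x):
    # Hypothetical convex curve, used for illustration only.
    return x ** 2


radii = np.logspace(-3, -1, 5)
bias = np.array([local_constant_bias(convex, 0.5, r) for r in radii])
for r, b in zip(radii, bias):
    print(f"radius={r:.4f}  bias={b:.2e}")
```

Whether this pushes the estimated effect up or down depends on which curve (treated or reference outcomes) is convex, which the thread does not settle.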

(Maybe this is obvious anyway, but I have never worked with matching data.)

Josef

Javier Burroni

Jun 21, 2013, 12:08:29 AM

to pystatsmodels

Here are the plots. As the matched sets vary with the radius, I made two plots, using radii 0.001 and 0.01.

radius001.png

radius010.png

Javier Burroni

Jun 21, 2013, 12:32:36 AM

to pystatsmodels

I will check the way I compute the variance; there must be a bug in the variance estimation code.

josef...@gmail.com

Jun 21, 2013, 12:39:50 AM

to pystat...@googlegroups.com

Is the effect size (vertical axis) on the same scale as before?

Some of these large negative observations might distort the mean estimate a bit. Maybe not much if the sample is large enough.

But I don't see what would cause the increase in the mean as a function of the radius, or why the variance is so small.

Another plot that might be interesting would show both levels instead of the difference, with one color for treated and another for reference (individual reference observations, or within-radius group means).
That might show whether some treated or reference observations are far away from the rest.

Josef

Javier Burroni

Jun 21, 2013, 12:43:01 AM

to pystatsmodels

There were... 1-alpha :S

Even so, the same conclusion can be reached: you can find a zone with a p-value below 5% (radius in (0.002, 0.003)).

josef...@gmail.com

Jun 22, 2013, 7:01:27 AM

to pystat...@googlegroups.com

On Fri, Jun 21, 2013 at 12:43 AM, Javier Burroni <javier....@gmail.com> wrote:

There were... 1-alpha :S

Even so, the same conclusion can be reached: you can find a zone with a p-value below 5% (radius in (0.002, 0.003)).

(forgot to reply)

I think choosing the radius to get a significant result is a bit cheating.

What I would do in this case is try out some robust statistics, to see if the result differs when we downweight or remove some of the extreme observations.

Even if the "outliers" don't have much influence on the mean, they can still have a large influence on the estimate of the variance (since it squares the deviations). Without the "outliers" the variance should drop, and the result might be more significant (smaller p-values).

I don't know how you are testing for the difference right now, but here are some suggestions.

Redo the analysis after dropping some extreme observations to see if the result changes. The p-values won't be completely correct, but it would give an idea about their influence. If the result changes, use a proper outlier robust procedure.

For example, if you can write the testing problem as linear regression, then you can use OLS where you get the outlier and influence diagnostics, and you can use RLM to get a robust estimate.

Or, use a robust or non-parametric replacement for the t-test.
(I just spent parts of the last two weeks on t-test and Anova for trimmed mean comparison.)
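As one possible reading of "robust or non-parametric replacement", the sketch below compares the ordinary one-sample t-test on matched differences with a trimmed mean and the Wilcoxon signed-rank test from `scipy.stats`. The matched differences are simulated, and treating them as a single i.i.d. sample is an assumption; this is not the trimmed-mean t-test and Anova code mentioned above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical matched differences (treated minus matched reference mean),
# with a few large negative observations mixed in.
diff = 0.3 + rng.normal(size=300)
diff[:6] -= 20.0

plain_mean = diff.mean()
# Drop the lowest and highest 10% of observations before averaging.
trimmed_mean = stats.trim_mean(diff, proportiontocut=0.1)

# Classical test of a zero mean versus a rank-based alternative.
t_res = stats.ttest_1samp(diff, popmean=0.0)
w_res = stats.wilcoxon(diff)  # tests symmetry of the differences about zero

print("mean:", plain_mean, "trimmed mean:", trimmed_mean)
print("t-test p-value:", t_res.pvalue)
print("Wilcoxon p-value:", w_res.pvalue)
```

The two tests answer slightly different questions (mean versus symmetry about zero), so a disagreement between them is a prompt to look at the data rather than a verdict.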

Josef

Javier Burroni

Jun 22, 2013, 11:54:50 AM

to pystatsmodels

Thanks for the reply.

Just to note:

1) This helped me fix some code (which, eventually, will be in statsmodels).

2) I want to *add* something to the framework that makes it easier not to cheat ("incentive-compatible" non-cheating), probably some kind of MetaPscore. As you already noticed, the radius depends on the pscore distribution, so its adequacy could somehow be tested.

/jb
