You can put whatever priors you want on the
variables; it doesn't affect the computational cost much.
It does affect the concentration of the posterior,
which is very sensitive to the choice of prior in
these models.
I also didn't provide a prior on the deviation
parameters, which I should probably do.
On 9/10/12 4:12 AM, Emmanuel Charpentier wrote:
> Dear Bob,
>
> Thank you for this effort. I have skimmed the stan files, and notice a couple of things :
>
> 1) You use one more parameter, the mean ability, which, from a "sampling efficiency" point of view, should play the same
> role as the mean of the alpha distribution in Gelman & Hill's textbook.
Exactly. You can parameterize it the other
way, but it's then adding a dependency to all the
parameters on the mean, which is less flexible than
just having the mean float.
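The two parameterizations can be sketched numerically (a toy Python illustration with made-up values of `mu` and `sigma`, not the actual Stan code):

```python
import numpy as np

# Two ways to express abilities with a population mean `mu`
# (hypothetical names; the actual Stan files may differ):
#
#  (a) "centered": draw each ability directly around the mean,
#        theta_i ~ Normal(mu, sigma)
#  (b) offsets plus a floating mean:
#        theta_i = mu + sigma * z_i,  z_i ~ Normal(0, 1)
#
# Both give the same marginal distribution for theta; the difference
# is only in which parameters the sampler sees as depending on mu.

rng = np.random.default_rng(0)
mu, sigma, n = 1.5, 0.7, 200_000

theta_centered = rng.normal(mu, sigma, size=n)             # (a)
theta_offset = mu + sigma * rng.normal(0.0, 1.0, size=n)   # (b)

print(theta_centered.mean(), theta_centered.std())
print(theta_offset.mean(), theta_offset.std())
```

Both samples should agree in location and spread up to Monte Carlo error.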
> 2) You give to this mean an informative prior. This would be hard to defend against a reviewer.
Presumably in the same sense that any informative prior might
be hard to defend against a reviewer!
I could've added the priors as data as in the S. McKay Curtis
JSS paper on BUGS item-response models, so that they'd be
easier for users to play with.
I found that when the priors were fatter than the values used
to simulate the data, the posterior mean estimates drifted
from their simulated values (in predictable, repeatable ways).
I'd recommend the hierarchical models, which don't have this
problem, but can also be difficult to defend against reviewers
in my experience.
> 3) Similarly, you give abilities, discriminations and difficulties somewhat informative priors. The same remark applies:
> whereas imposing a standard deviation on abilities is just a choice of scale, imposing a small standard deviation on
> discriminations and difficulties *is* informative. Hard to do when you are exploring a situation where no prior
> knowledge has been formalized.
Indeed. As I said above, feel free to change the priors!
> 4) However, in the IRT2 hierarchical model, you do not state a prior for the abilities and difficulties. This, as far as
> I know, is equivalent to giving them the (improper) uniform prior on (0, \infty), thus making the posterior improper,
> AFAIK. So much for model comparison...
Exactly -- the scale parameters have improper priors.
In practice the posterior doesn't seem to behave improperly,
but I haven't done a formal derivation of its propriety.
Given how the model behaves without a prior (recovering
the true simulated values), and given the quantity of
data for the priors (# of questions, # of students), a
very weak prior shouldn't have much effect.
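The intuition that a very weak prior gets swamped by this much data can be checked in a simple conjugate case (a normal mean with known variance -- a toy stand-in, not the IRT model itself):

```python
import numpy as np

# Normal-normal conjugate posterior for a mean with known data variance:
#   posterior precision = prior precision + n / sigma^2
#   posterior mean = precision-weighted average of prior mean and data.
def posterior_mean(prior_mu, prior_sd, y):
    sigma2 = 1.0  # known data variance (assumed for this toy example)
    n = len(y)
    prec = 1.0 / prior_sd**2 + n / sigma2
    return (prior_mu / prior_sd**2 + y.sum() / sigma2) / prec

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, size=1000)   # lots of data, true mean 2

tight = posterior_mean(0.0, 0.5, y)   # fairly informative prior at 0
weak = posterior_mean(0.0, 100.0, y)  # very weak prior at 0

print(weak, tight, y.mean())
```

With 1000 observations the weak-prior posterior mean is essentially the sample mean, while the tighter prior pulls it noticeably toward 0.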
It's easy enough to add a prior. Andrew likes using
weakly informative priors in these situations, but
I realize other researchers have different approaches.
> In short, where Gelman & Hill's solution introduced both an additive and a multiplicative redundancy (by means of
> centering and scaling "raw" parameters), you keep the additive redundancy and replace the multiplicative redundancy by
> imposing more informative distributions.
Gelman and Hill didn't propose a single solution. They discuss
several methods: setting one value based on prior knowledge,
setting the mean to 0, and, in their actual model, also setting
the scale to 1 by using a z-score-type normalization.
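That z-score-type normalization amounts to the following (a sketch with hypothetical names, applied here to raw draws; Gelman & Hill do the equivalent transformation inside the model):

```python
import numpy as np

# Remove the additive and multiplicative redundancy by transforming
# "raw" abilities to have mean 0 and standard deviation 1; the
# location and scale are absorbed into the other parameters.
rng = np.random.default_rng(2)
theta_raw = rng.normal(3.0, 2.5, size=500)  # arbitrary location/scale

theta = (theta_raw - theta_raw.mean()) / theta_raw.std()

print(theta.mean(), theta.std())
```

After the transform the abilities are on a fixed scale regardless of where the raw parameters happened to sit.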
I was just trying to use the simplest thing possible.
> I'll try to assess the impact of these choices by trying various values of the prior parameters.
Let us know what happens. I don't have that much time to play
around with IRT parameterizations.
- Bob