On 2/9/14, 11:38 PM, Andrew Gelman wrote:
> I agree. Although it could be useful to mention the term "identifiability" just to say that this is a related concept.
Andrew followed his own advice in BDA, but the term never gets defined there,
which I think is why I was so confused about what it meant: the usage
in BDA is not the usual notion from non-Bayesian stats.
In ARM, Gelman and Hill (p. 220) are more precise:

    "Identifiability refers to whether the data contain sufficient
    information for unique estimation of a given parameter or set of
    parameters in a particular model."
The cases mentioned in BDA and ARM are (a) collinearity in regressions,
(b) separability in logistic regressions, (c) additive and multiplicative
nonidentifiability in IRT-like models, and (d) label switching in mixture models.
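Case (c) is easy to see in a toy sketch (not from BDA or ARM; the function names here are just illustrative). In a Rasch-style model, the response probability depends only on the difference between ability and difficulty, so shifting both by the same constant leaves the likelihood unchanged:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Rasch-style item response probability: P(correct) = sigmoid(theta - b).
# Only the difference theta - b enters, so the model is invariant under
# the shift (theta, b) -> (theta + c, b + c): additive nonidentifiability.
def p_correct(theta, b):
    return sigmoid(theta - b)

theta, b, c = 1.2, 0.4, 100.0
assert abs(p_correct(theta, b) - p_correct(theta + c, b + c)) < 1e-12
```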
This is obviously not the usual definition of identifiability for MLE
(as defined in Greene and on Wikipedia), which is presumably why Andrew
didn't like my first section: it doesn't match his usage of the term.
So I propose I just drop this notion of "identifiability" altogether and
concentrate on "problematic posteriors".
More below on what I think that should entail.
> On Feb 9, 2014, at 11:35 PM, Ben Goodrich wrote:
...
>> There is not much to say about improper posteriors, except that you basically can't do Bayesian inference.
I disagree. The problem stems from advice Andrew and others have given
in the context of Gibbs sampling, advice our users are trying to apply
to Stan with no luck, leading them to conclude that Stan is slow.
Check out ARM's section 19.4, "Redundant parameters and intentionally nonidentifiable
models". There Andrew and Jennifer suggest, for computational efficiency, using
parameterizations that are not identified and then post-processing the draws to
quantities that are.
This works fine in Gibbs for reasons I tried to explain in the section Ben wants
me to cut, but it won't work in Stan, for reasons I try to explain in that same section.
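To make the redundant-parameterization idea concrete, here's a minimal sketch (my own illustration, not code from ARM; `recenter` is a hypothetical name). The raw draws (mu, eta_j) are not identified, since only the sums alpha_j = mu + eta_j are determined by the data; post-processing recenters each draw, pinning down the parameters without changing the identified quantities:

```python
# Redundant additive parameterization, post-processed to an identified one.
# Raw parameters: mu and group offsets eta_j; only alpha_j = mu + eta_j is
# identified. Recentering moves mean(eta) into mu, leaving alpha_j unchanged.
def recenter(mu, eta):
    m = sum(eta) / len(eta)
    return mu + m, [e - m for e in eta]

mu, eta = 2.0, [1.0, -3.0, 5.0]
mu2, eta2 = recenter(mu, eta)
alpha  = [mu + e for e in eta]    # identified quantities before
alpha2 = [mu2 + e for e in eta2]  # ... and after recentering
assert all(abs(a - b) < 1e-12 for a, b in zip(alpha, alpha2))
assert abs(sum(eta2)) < 1e-12     # offsets now sum to zero
```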
>> Although Stan can optimize a log-likelihood function, everybody doing so should
>> know that you can't do maximum
>> likelihood inference without a unique maximum.
"Should know" and "do know" can be miles apart.
But I think the bigger point is that people try to do inference all the
time when parameters aren't identified in the sense of Gelman and Hill.
Sometimes the models aren't identified in the sense of Greene and
Wikipedia, and sometimes they are.
For example, in the two-location-parameter example I was going to include,
you get perfectly reasonable predictions for new data from any of the
maximal (not maximum) likelihood estimates, just as good
as with the single-parameter model. You also get reasonable inferences in
Gibbs and even in Stan. It's just that your parameters aren't
identified in the Gelman/Hill sense.
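Here's a sketch of what I mean (assuming the example is something like y ~ normal(mu1 + mu2, 1); if the one I had in mind differs, the point is the same). The likelihood depends on the parameters only through mu1 + mu2, so every point on the ridge mu1 + mu2 = mean(y) is a maximal likelihood estimate, and all of them give the same predictions:

```python
import math

# Two-location-parameter model: y ~ normal(mu1 + mu2, 1). The likelihood
# sees only mu1 + mu2, so the "MLE" is a whole ridge of equally good points.
def log_lik(y, mu1, mu2, sigma=1.0):
    return sum(-0.5 * ((yi - (mu1 + mu2)) / sigma) ** 2
               - math.log(sigma * math.sqrt(2 * math.pi)) for yi in y)

y = [1.1, 0.9, 1.3, 0.7]
ybar = sum(y) / len(y)
# Two points on the ridge mu1 + mu2 = ybar: identical log likelihood,
# hence identical predictions for new data.
assert abs(log_lik(y, ybar, 0.0) - log_lik(y, ybar - 5.0, 5.0)) < 1e-9
```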
Same thing with collinear predictors under L1 priors: the
model plus data don't give a unique MLE, but the inferences are just fine in
most cases because most software takes finitely many finite steps.
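A quick numerical illustration of that non-uniqueness (my own toy example, not from any particular package): with two identical predictors and an L1 penalty, any nonnegative split of the total coefficient achieves exactly the same fit and the same penalty, so the minimizer is a whole segment rather than a point:

```python
# Two identical (collinear) predictors with an L1 penalty. For b1, b2 >= 0
# the penalized loss depends only on b1 + b2, so the minimizer is not unique.
def penalized_loss(x, y, b1, b2, lam=0.1):
    sse = sum((yi - (b1 * xi + b2 * xi)) ** 2 for xi, yi in zip(x, y))
    return sse + lam * (abs(b1) + abs(b2))

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # generated with total coefficient 2 on x
# Every nonnegative split of b1 + b2 = 2 gives exactly the same loss.
losses = [penalized_loss(x, y, b1, 2.0 - b1) for b1 in (0.0, 0.5, 1.0, 2.0)]
assert max(losses) - min(losses) < 1e-12
```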
- Bob