The issue of needing to throw away warmup iterations before
convergence is not at all unique to Stan.
Many MCMC algorithms perform a number of warmup (burn-in, adaptation,
or whatever you want to call it) iterations to tune parameters of
the algorithm, then fix those parameters before switching
to a properly Markovian regime to do sampling. The reason you
can't keep the warmup iterations is twofold:
1. they often don't form a Markov chain. This is true for Stan,
and it would also be true of a Metropolis algorithm where you're
estimating the covariance matrix for the jumping proposals and
tuning a step size to control the rejection rate.
2. they typically aren't reasonable draws from the posterior if
you start with random inits far out in the tails. If you keep
the warmup iterations then you'll bias the final estimates.
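To make both issues concrete, here's a minimal sketch (my own toy
example in Python, not Stan's actual algorithm) of a random-walk
Metropolis sampler that adapts its proposal scale during warmup,
then freezes the scale and discards the warmup draws:

```python
import math
import random

def target_logpdf(x):
    # Unnormalized log-density of a standard normal target.
    return -0.5 * x * x

def adaptive_metropolis(n_warmup=2000, n_sample=20000, seed=1):
    """Toy random-walk Metropolis with step-size adaptation.

    During warmup the proposal scale is nudged toward a ~44%
    acceptance rate (the 1-D optimum), so those iterations don't
    form a Markov chain -- issue (1). The chain also starts far
    out in the tail, so early draws are biased -- issue (2).
    Hence: freeze the scale and keep only post-warmup draws.
    """
    rng = random.Random(seed)
    x = 10.0          # deliberately bad init, far from the mode
    scale = 1.0
    draws = []
    for i in range(n_warmup + n_sample):
        prop = x + rng.gauss(0.0, scale)
        log_alpha = target_logpdf(prop) - target_logpdf(x)
        accept = log_alpha >= 0 or rng.random() < math.exp(log_alpha)
        if accept:
            x = prop
        if i < n_warmup:
            # Adaptation step: this is what breaks Markovianity.
            scale *= math.exp(0.05 * ((1.0 if accept else 0.0) - 0.44))
        else:
            draws.append(x)  # keep only post-warmup draws
    return draws, scale
```

Averaging the kept draws recovers the target's mean and variance;
averaging the warmup draws in as well would drag the estimates
toward the bad initialization.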
I'm not sure what BUGS does for its adaptive rejection
sampling during warmup --- that is, I don't know whether you can
properly use all the draws or whether you need to throw away the
warmup draws. Usually you want to throw away draws made before
the chain has converged to the high-mass volume of the
posterior anyway, because even though they wash out
asymptotically, they will bias a small-ish finite sample.
I'm also not sure what JAGS does with its slice sampling, which
can also be adapted.
Some conjugate models don't need to be adapted at all, but
those still suffer from issue (2).
A way to get around issue (1) is to gradually diminish the
adaptation according to something like a Robbins-Monro strategy ---
that can make the result acceptable, but it'll still suffer from
issue (2) if you don't start from a reasonable draw from the
posterior.
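For what diminishing adaptation might look like, here's a sketch
(again my own toy Python, not any package's actual scheme) of
Robbins-Monro step-size tuning for a random-walk Metropolis
sampler on a standard normal target:

```python
import math
import random

def robbins_monro_scale(n_iter=5000, target_accept=0.44, seed=2):
    # The gain sequence i^{-0.6} satisfies the Robbins-Monro
    # conditions: the gains sum to infinity (so the scale can move
    # as far as it needs to), but their squares sum to a finite
    # value (so the adaptation's remaining effect vanishes and the
    # chain's asymptotic behavior is acceptable).
    rng = random.Random(seed)
    x, log_scale = 0.0, 0.0
    for i in range(1, n_iter + 1):
        prop = x + rng.gauss(0.0, math.exp(log_scale))
        log_alpha = 0.5 * (x * x - prop * prop)  # standard normal target
        accept = log_alpha >= 0 or rng.random() < math.exp(log_alpha)
        if accept:
            x = prop
        gain = i ** -0.6  # diminishing gain
        log_scale += gain * ((1.0 if accept else 0.0) - target_accept)
    return math.exp(log_scale)
```

The scale settles near the value giving the target acceptance rate,
and the adaptation never fully stops --- it just becomes negligible.
None of this helps with issue (2), which is about where the chain
starts, not how it's tuned.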
- Bob