> I have a non-Empirical likelihood question related to my GSOC project. It
> is mostly theoretical.
>
> The model I am working with is an AFT; it is a model with random right
> censoring:
>
> http://en.wikipedia.org/wiki/Accelerated_failure_time_model
>
> The model is a simple single-variable least squares, y = a + Bx, and there is
> a vector, d, that indicates whether or not the response is right censored.
>
> Now here is my issue. The way to proceed with estimating such a model is
> using WLS where the weights are a function of whether or not the observation
> is censored and the Kaplan-Meier estimate. Anyway, I am dealing with 3 papers.
> The first 2 are EXACTLY the same. In fact it seems as if one is almost
> directly copied from the other. For reference, they are:
>
> Li and Wang. "Empirical Likelihood Regression Analysis for Right Censored
> Data." Statistica Sinica, 13, 51-68.
>
> Qin and Jing. "Empirical Likelihood for Censored Linear Regression".
> Scandinavian Journal of Statistics. Vol 28 661-663
they are referring to Koul, ... published in The Annals of Statistics,
Vol. 9, No. 6 (Nov., 1981), cited 363 times
>
> Zhou, Kim, Bathke, "Empirical Likelihood Analysis for the Heteroskedastic
> Accelerated Failure time Models" Statistica Sinica 22 (2012), 295-316
>
>
> The first two are exactly the same. The third is different (and the third
> was also written by the person who wrote the censored EL regression library
> in R).
>
> The difference between the first two and the third is that in the first two,
> the authors suggest a formula for weighting the Y's only, and then
> performing OLS on the new Y data, not performing WLS. To me this seems
> strange as I am unfamiliar with this approach. The third paper suggests
> weighting observations (not just the response variable) by the same formula
> that the other two papers suggested weighting the endogenous variables.
>
> Now, the only way I am able to replicate the parameter estimates for 'B' in
> R is if I follow the third paper. To add another layer of confusion, in the
> third paper the authors acknowledge that there are 2 ways of weighting
> observations to estimate the parameters a and B and they use the other way
> in the emplik package in R.
>
> And one last element of frustration, even though I am able to replicate the
> results in R, the results are not the same as the results reported in the
> first paper (the data was publicly available). In other words, using the
> data from the Li and Wang paper and running it through the R program
> designed to estimate AFT model parameters did not produce the same results
> as those reported by the authors of the original paper.
>
> Now, to finally ask a concrete question, is it acceptable to proceed in the
> way that is consistent with the R package and the Zhou paper even though
> there are discrepancies in the other papers? Also, the more I explore
> this, the more I realize that there doesn't seem to be a unified way to
> estimate a randomly censored regression model. Maybe all of the methods are
> correct in some way?
My guess is that these are different estimators that are all "correct".
The best would be to find a reference to which is "better" (or for
which cases one or the other is better), and implement both or the
better.
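
To make the comparison concrete, here is a minimal numpy sketch of the two
estimators as I understand them from the description above. It is not taken
from the papers or from the R package. Assumptions: the Y-only approach uses
the synthetic response d*y / G(y-), where G is the Kaplan-Meier estimate of
the censoring survival function; the observation-weighted approach uses
weights d / (n*G(y-)) in WLS; there are no ties in y. All function names are
made up for illustration.

```python
import numpy as np


def censoring_survival_left(y, d):
    """Kaplan-Meier estimate of P(C >= y_i) for the censoring time C.

    d[i] = 1 means y[i] is an observed response, d[i] = 0 means it is
    right censored. Assumes no ties in y.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    n = y.shape[0]
    order = np.argsort(y, kind="stable")
    at_risk = n - np.arange(n)                      # n, n-1, ..., 1
    # For the censoring distribution the "events" are the censored points.
    factors = 1.0 - (1.0 - d[order]) / at_risk
    surv = np.cumprod(factors)                      # just after each sorted y
    surv_left = np.concatenate(([1.0], surv[:-1]))  # just before each sorted y
    out = np.empty(n)
    out[order] = surv_left
    return out


def case_weights(y, d):
    """Weights d_i / (n * G(y_i-)); censored observations get weight 0."""
    d = np.asarray(d, dtype=float)
    return d / (len(d) * censoring_survival_left(y, d))


def fit_synthetic_ols(x, y, d):
    """Approach 1 (assumed): transform the response only, then plain OLS."""
    n = len(y)
    y_star = n * case_weights(y, d) * np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(n), x])
    return np.linalg.lstsq(X, y_star, rcond=None)[0]


def fit_case_weighted(x, y, d):
    """Approach 2 (assumed): WLS with whole observations weighted."""
    sw = np.sqrt(case_weights(y, d))
    X = np.column_stack([np.ones(len(y)), x])
    y = np.asarray(y, dtype=float)
    return np.linalg.lstsq(X * sw[:, None], sw * y, rcond=None)[0]


# Simulated example: true a = 1, B = 2, independent right censoring.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 2, n)
t = 1.0 + 2.0 * x + rng.normal(0, 0.5, n)   # latent response
c = rng.uniform(2, 12, n)                   # censoring times
y = np.minimum(t, c)
d = (t <= c).astype(int)

print("synthetic-response OLS:", fit_synthetic_ols(x, y, d))
print("case-weighted WLS:     ", fit_case_weighted(x, y, d))
```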

Matching an R package is fine; the quality and usefulness depend a
bit on which R package has it.

I think you could proceed with the R equivalent, and commit
whatever you have implemented to the example folder or the sandbox,
so whenever we want to expand to different methods for AFT or censored
regression we have a place to start.

Skipper has Tobit mostly finished, but from only a quick look at some
of your references, I cannot tell how much overlap there is. (Skipper
implemented maximum likelihood, but as far as I remember there are
also some methods that use linear models with transformed data.)

Josef