Censoring and init values

PierGianLuca

Nov 29, 2022, 10:52:43 AM
to nimble-users
Hi everyone!

I have a question and a problem about censoring.

- Question, or rather a confirmation:

From §5.2.7.2 of the manual, if I have a variate t[i] that is *left*-censored, that is, unknown except for being smaller than a given value c[i], then I should use

censored[i] ~ dinterval(t[i], c[i])
t[i] ~ [***some distribution]

with censored[i]=0 if t[i] is left-censored (that is, t[i]<c[i]), and censored[i]=1 if t[i] is known. Is this correct?


- Problem:

I was trying to confirm the above with a code example, and I stumbled into a problem. Nimble seems to use a RW-sampler for the censored values. Let's say that t[4] is such a left-censored value, set to NA in the data. In some cases, especially at the beginning, proposal values for t[4] may *not* satisfy the censoring condition, and I get the warning

warning: logProb of data node censored[4]: logProb is -Inf

In fact, in this case the first two samples of t[4] did *not* satisfy the left-censoring condition (they were larger than c[4]).

I tried to specify acceptable values of t[4] in the init function, but Nimble says that it will discard them because 't' is data.

I observed the same phenomenon for right-censored data.

How can I bypass this problem?

Cheers!
Luca

Chris Paciorek

Dec 2, 2022, 11:35:16 AM
to PierGianLuca, nimble-users
Hi Luca,

You need to set initial values for 't' for the elements that are censored. Suppose you have 5 observations and the fourth one is censored, then:

inits = list(t = c(NA,NA,NA, -3, NA))

(this assumes -3 is a valid value)
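
A minimal sketch of how this could fit into a complete model, assuming a normal distribution for t and made-up numbers throughout. The names t_obs, c_lim, mu and sigma, the priors, and the bound c[4] = 0 are illustrative assumptions, not taken from the original model. For the uncensored elements, c[i] is set below the observed t[i] so that censored[i] = 1 is consistent.

```r
library(nimble)

# Hypothetical toy data: 5 observations, the 4th is left-censored (t[4] < 0).
t_obs    <- c(1.2, 0.7, 2.1, NA, 1.5)              # NA marks the censored value
censored <- c(1, 1, 1, 0, 1)                       # 0 = left-censored, 1 = observed
# Bound for the censored element; any value below t[i] for the observed ones.
c_lim    <- ifelse(is.na(t_obs), 0, t_obs - 1)

code <- nimbleCode({
  for (i in 1:n) {
    censored[i] ~ dinterval(t[i], c[i])
    t[i] ~ dnorm(mu, sd = sigma)                   # assumed distribution
  }
  mu ~ dnorm(0, sd = 10)                           # assumed priors
  sigma ~ dunif(0, 10)
})

# Initial values only for the censored element of t; NA everywhere t is data.
inits <- list(mu = 0, sigma = 1, t = c(NA, NA, NA, -3, NA))

model <- nimbleModel(code,
                     constants = list(n = 5, c = c_lim),
                     data      = list(t = t_obs, censored = censored),
                     inits     = inits)

# A finite value here indicates the initial state satisfies the censoring.
init_logprob <- model$calculate()
```

If init_logprob is -Inf, the initial value of some censored t[i] is on the wrong side of its c[i].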

This might be something we should document better. I will check. And ideally (it's been on our to-do list for a while) our initialization procedure would be smart enough to always initialize with valid values when there is censoring.

-Chris

--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/c8ac1e8e-344b-cb74-e809-16c442c3781d%40magnaspesmeretrix.org.

PierGianLuca

Dec 2, 2022, 12:09:53 PM
to paci...@stat.berkeley.edu, nimble-users
Hi Chris,

Thank you for the confirmation.

I've been experimenting with censoring and the "dinterval" distribution, and I think it has great potential for further interesting uses. Here's one, and I'd value very much any thoughts or criticisms you may have about it – when & if you have time.

In density regression it is somewhat problematic to find a good mathematical representation of integer variates (or generally discrete-ordinal variates). If the possible values are few, say fewer than 8, one can simply use a categorical kernel, with a conjugate Dirichlet distribution on its "prob" parameter. Something like:

for(d in 1:ndata){
    datum[d] ~ dcat(prob=prob[...])
}
prob ~ ddirichlet(...) # conjugate


If the possible values are many, say 100 or more, one can treat the variate as continuous, with a typical log-Gaussian kernel and so on.

In the intermediate range 5–100, however, the categorical representation does not represent natural smoothness/ordinality requirements well; while the continuous representation gets confused by the discrete data and leads to probability densities difficult to relate to a discrete distribution.

An alternative is to use a binomial or negative-binomial kernel. But these lead to slow mixing because they lack conjugate relationships, and they also have problems with under- or over-dispersion.

I'm now trying an approach based on dinterval(), and it's working well so far. The idea (not new) is to introduce an auxiliary (latent/hidden) continuous variate underlying the integer one, and to connect the two with dinterval(). Something like:

for(d in 1:ndata){
    datum[d] ~ dinterval(t=auxvariate[d], c=intervals[...])
    auxvariate[d] ~ dnorm(mean=mean, var=variance)
}
mean ~ dnorm(...) # conjugate
variance ~ dinvgamma(...) # conjugate

The intervals[...] are chosen so as to bracket the possible integer values. For example, if the values are in the 0:8 range then

intervals <- seq(from=0.5, to=7.5, by=1)
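
As a quick check of how these cutpoints map a continuous value to an integer, here is a small base-R sketch. It assumes the dinterval convention from the NIMBLE manual (x = m when c[m] < t <= c[m+1]), which findInterval() reproduces with left.open = TRUE. The aux values are made up for illustration.

```r
intervals <- seq(from = 0.5, to = 7.5, by = 1)   # 8 cutpoints -> 9 bins, 0..8

# Hypothetical latent values, including some outside the cutpoint range.
aux <- c(-2.3, 0.4, 0.6, 3.49, 3.51, 7.5, 7.6, 12)

# Number of cutpoints strictly below each value, i.e. the dinterval outcome.
bins <- findInterval(aux, intervals, left.open = TRUE)
print(bins)   # 0 0 1 3 4 7 8 8

# Each integer k in 0:8 falls in its own bin, so the observed integer itself
# is a valid starting value for the latent variate.
roundtrip <- findInterval(0:8, intervals, left.open = TRUE)
print(all(roundtrip == 0:8))   # TRUE
```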

At first I thought this could be slow, but actually it's quite fast and it seems to be mixing well.
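
A minimal runnable sketch of the latent-variable model above, under stated assumptions: the data values, the prior parameters, and the names mu and ncuts are invented for illustration (mu stands in for the "mean" parameter). Since the latent auxvariate is unobserved, it needs initial values consistent with the data, for the same reason as in the censoring case earlier in the thread; here it is simply initialized at the observed integers.

```r
library(nimble)

intervals <- seq(from = 0.5, to = 7.5, by = 1)
datum <- c(3, 4, 4, 5, 2, 6, 4, 3, 5, 4)   # hypothetical integer data in 0:8

code <- nimbleCode({
  for (d in 1:ndata) {
    datum[d] ~ dinterval(auxvariate[d], intervals[1:ncuts])
    auxvariate[d] ~ dnorm(mean = mu, var = variance)
  }
  mu ~ dnorm(0, sd = 10)                        # assumed prior
  variance ~ dinvgamma(shape = 1, scale = 1)    # assumed prior
})

model <- nimbleModel(
  code,
  constants = list(ndata = length(datum),
                   ncuts = length(intervals),
                   intervals = intervals),
  data  = list(datum = datum),
  # Start each latent value at its observed integer, which lies inside
  # the matching interval.
  inits = list(mu = 4, variance = 1, auxvariate = as.numeric(datum))
)

# Finite if every auxvariate[d] starts inside the interval implied by datum[d].
init_logprob <- model$calculate()
```

From here the model can be passed to buildMCMC() and compileNimble(), or the same arguments given to nimbleMCMC(), to draw samples.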

I hope I haven't been too confusing. Nimble is just great!!

Cheers,
Luca




Chris Paciorek

Dec 10, 2022, 12:15:16 PM
to PierGianLuca, nimble-users
Hi Luca,

I'm glad this is working well for you. I don't have any particular insights here. As you say, tying a discrete distribution to an underlying continuous one is a common strategy that has been used successfully in lots of different contexts.

-Chris