--
You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
To post to this group, send email to stan-...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
<logistic_OD_multi_Gelman_5p.stan.txt><logistic_OD_multi_Gelman_5p_unkn.stan.txt><RunModel.R.txt><demoLog5p.csv><UnknData.csv>
//Use the current parameters to calculate the interpolated concentration of each unknown value.
//The OD (Y value) must be higher than the bottom asymptote (beta[1]) and lower than the top asymptote (beta[1] + beta[2]) or RStan throws an error at the end (for RStan > 2.6).
//So any value outside the asymptotes gets the "marker value" of -50. Additional R-side commands deal with this afterwards.
for (i in 1:N_unkn) {
  vector[5] beta;
  beta <- exp(logBeta[pID_unkn[i]]);
  if (Unknown[i] > beta[1] && Unknown[i] < (beta[1] + beta[2])) {
    indiv_theta[i] <- log10((beta[3] / pow(pow((beta[1] - (beta[1] + beta[2])) / (beta[1] - Unknown[i]), 1 / beta[5]) - 1, -1 / beta[4])) * Dilution[i]);
  } else {
    indiv_theta[i] <- -50;
  }
}

This should be (?) generating a warning about using a non-linear transform on the LHS of a sampling statement, ultimately associated with the line:
indiv_theta[i] ~ normal(theta[uID[i]], sigma_unkn);
So that's not quite the model you think it is. I don't quite see how this version matches up with the article, so I won't try to make more detailed comments on it, but it looks like you could simplify this a fair bit---I don't think the '-50' flag value should be necessary in this model.
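One way to avoid both the warning and the flag value is to make the unknown log-concentration a parameter and model the observed OD through the forward curve, so no inverse transform ever appears on the left-hand side of a sampling statement. A rough sketch only, not the poster's exact model: names like logBeta, pID_unkn, uID, Dilution, theta, sigma_unkn, and sigma_indiv are assumed from the thread, and the pre-2.8 `<-` assignment syntax is kept for consistency.

```stan
parameters {
  vector[N_unkn] indiv_theta;   // log10 sample concentration for each well
  // ... curve parameters, theta, sigma_unkn, sigma_indiv as before
}
model {
  for (i in 1:N_unkn) {
    vector[5] beta;
    real x;    // concentration actually in the well, after dilution
    real mu;   // expected OD from the forward 5PL curve
    beta <- exp(logBeta[pID_unkn[i]]);
    x <- pow(10, indiv_theta[i]) / Dilution[i];
    mu <- beta[1] + beta[2] / pow(1 + pow(x / beta[3], beta[4]), beta[5]);
    // multilevel pooling of wells toward their sample-level mean
    indiv_theta[i] ~ normal(theta[uID[i]], sigma_unkn);
    // likelihood: observed OD given the curve, on the OD scale
    Unknown[i] ~ normal(mu, sigma_indiv);
  }
}
```

Because indiv_theta is now a genuine parameter, an OD reading above the top asymptote is just an observation with low likelihood rather than an impossible inversion, so no marker value is needed.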
Krzysztof

Hi… it's been a few years since I worked on this. I actually fit the model in Bugs, believe it or not! One thing I remember for real data was that if you have some unknowns with zero concentrations, the model can have problems because the lognormal model on the unknown concentrations won't make sense. In that case we needed to go with a mixture model. Also, the model I fit did not have any correlation matrix. I think that your problems are coming from letting all 5 betas vary like this. I'd suggest just keeping the betas constant, or maybe just varying one of them, not all 5.

A
On Feb 10, 2016, at 5:45 PM, Jonathan Audet <jona...@hotmail.com> wrote:
I think someone inquired about the code for the serial dilution assay in the paper here. I have also been trying to implement it. I seem to have been able to code the model overall, but some issues remain. I did modify it a bit: I included 5 parameters instead of four (but have faced the same issues with a 4-parameter model) and I adjusted the priors for my assay. Also, the model does not take days into account (the paper had 3 levels: individual plate -> plates on same day -> days).

A few issues are beyond my math skills:

First, the model always runs into large numbers of divergent transitions even with adapt_delta = 0.95.

Second (probably related), some of the R_hat's are terrible, especially for the correlation matrix Omega.

Third, should I be worried about all the warnings regarding positive-definite matrices and other related issues?

I also don't know if I am missing opportunities to vectorize (or otherwise speed up) the code...

I included 2 models: an initial one (which has the above issues), which simply calculates the curves, and a second one (ends in "unkn"), which is meant to include multilevel interpolation of samples based on the curve. I was very interested in using the multilevel prediction across dilutions of unknown samples, but the paper is vague. As I understand Stan, making the prediction multilevel requires putting it in the "model" block, since it is a sampling statement (rather than just in "generated quantities" for normal interpolation, which has worked fine in other models without any multilevel aspects). However, including unknowns completely messes up the curve parameters for that plate; I have included unknown data for one of the plates. The inclusion of unknowns also slows sampling substantially: the 3000 samples in the script generally take far longer than the 6000 for the model without unknowns.

I tried to comment the models as much as I could.
I also included data and unknown data along with a script. I am currently using Microsoft R Open 3.2.3 on Windows 7 (and 10) x64 along with the MKL libraries it provides. The script calls on the Checkpoint package to ensure that all package versions are stable.
I will try out your suggestions today.

Regarding the unknown calculations, I have to admit I don't quite follow the reasoning of section 3.2 of the paper. In the context of a least-squares fit, I would just isolate x instead of y and work from there. This is what I did here. But that method is only valid if y is between the bottom and the top, each of which moves with the state of the sampler.
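For reference, isolating x in the five-parameter logistic used in the interpolation code above (with a = beta[1] the bottom, d = beta[2] the span, c = beta[3] the inflection, b = beta[4] the slope, and g = beta[5] the asymmetry) goes like this:

```latex
% Forward 5PL curve:
y = a + \frac{d}{\bigl(1 + (x/c)^{b}\bigr)^{g}}
% Rearranging and solving for x:
\frac{d}{y - a} = \bigl(1 + (x/c)^{b}\bigr)^{g}
\quad\Longrightarrow\quad
x = c\left[\left(\frac{d}{y - a}\right)^{1/g} - 1\right]^{1/b}
```

The inversion is only defined for $a < y < a + d$: outside that interval the bracketed base goes non-positive, which is exactly why the -50 marker value was needed.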
The model without the unknowns, while still having sampling issues, actually retrodicts the curves quite well already. I am including a corrected script; I noticed some missing functions in the original one. Also, a version of the curve-only model with the Cholesky decomposition. Regarding tau, I did make it half-normal; its values in the posterior are all 0.05-0.5, so right now I think I put a normal(0.25, 0.1) prior on it. Would it be better to model log_tau instead?
Sent from Mail for Windows 10
vector<lower = 0>[5] tau;
Stan (hoping I get this right) internally uses log(tau) as a parameter (on the real line).
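Making the log-scale parameterization explicit would look roughly like the sketch below (assumptions: the prior values are illustrative only, and the pre-2.8 `<-` syntax is kept). It is essentially what the <lower = 0> declaration already does internally, so it mainly matters if you want to state the prior on log_tau directly.

```stan
parameters {
  vector[5] log_tau;               // unconstrained, on the real line
}
transformed parameters {
  vector<lower = 0>[5] tau;
  tau <- exp(log_tau);
}
model {
  log_tau ~ normal(-1.5, 1);       // example prior, i.e. tau is lognormal
  // ... rest of the model uses tau as before
}
```

A prior stated on log_tau needs no Jacobian adjustment, since log_tau is the actual parameter; if you instead kept a prior on tau under this parameterization, you would need to add `increment_log_prob(sum(log_tau));` for the change of variables.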
If the posterior really looks a lot like the prior, I would wonder whether there's much information about it in the data, or whether it's well identified. Since you can do these models plate-by-plate, there ought to be. Making the prior tighter is not going to solve this issue, so I would avoid normal(.25,.1) and try to understand the model better. If you can currently fit the model (even without unknowns), I would run that version and look at the pairs plots of other parameters with tau to see if some are highly or non-linearly correlated with it.
Also, it looks like the OD line (OD ~ normal(cOD, sigma_idiv);) and the parent cOD calculation make a lot more sense now with the paper.
Krzysztof
I think I am starting to figure out what the main issue is with this model. Normally, the curve parameters are dependent on the standards, i.e. P(Bottom, Span, Inflec, Asym, Slope | x_std, OD_std), and the unknown concentrations are dependent on the curve parameters, i.e. P(theta | Bottom, Span, Inflec, Asym, Slope, OD_unkn). In the current model, since everything is calculated at once, we have P(Bottom, Span, Inflec, Asym, Slope, theta | x_std, OD_std, OD_unkn), which implies that the curve parameters are also optimized as a function of the unknowns. This makes no logical or physical sense.
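The feedback can be read off the joint posterior directly. Writing β for the curve parameters and θ for the unknown concentrations, it factors (up to the priors) as:

```latex
p(\beta, \theta \mid x_{\mathrm{std}}, OD_{\mathrm{std}}, OD_{\mathrm{unkn}})
\;\propto\;
p(OD_{\mathrm{std}} \mid \beta, x_{\mathrm{std}})\,
p(OD_{\mathrm{unkn}} \mid \beta, \theta)\,
p(\theta)\,p(\beta)
```

The second likelihood term involves β, so the unknown wells do inform the curve, and with 72 unknown wells against 24 standards they contribute three times as many such terms per plate.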
Also, since there are 72 wells of unknowns (18 samples, 2 dilutions each, in duplicate) and 24 wells of standards per plate, the unknowns overwhelm the standards.
As I said earlier, one solution is to use inverse calculation in the generated quantities block. However, that has two problems:
1) Artificial limits have to be placed on some parameters (so that Bottom < OD < Bottom + Span), even though OD readings above the top of the curve are possible due to measurement and enzymatic noise (this is an ELISA assay);
2) In biological assays, in my experience, interpolated values do not quite scale across dilutions, i.e. simply multiplying to reverse the dilution can yield surprisingly different values. That is the strength (in theory) of the approach used in the paper: it can weight different dilutions depending on the precision they yield.
Is there a way to make the curve parameters independent of the unknowns' OD while keeping the multilevel unknown estimation? That issue did not seem to be a problem in the paper.
Thanks for the tips!
I have used non-centered parameterization in other models, but did not realize it would help here.
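For the record, a non-centered version of the plate-level curve parameters with the Cholesky factor might look roughly like this. A sketch only: the names mu_logBeta, tau, L_Omega, z, and N_plate are assumed rather than taken from the attached files, and the pre-2.8 `<-` syntax is kept.

```stan
parameters {
  vector[5] mu_logBeta;             // population means of log curve params
  vector<lower = 0>[5] tau;         // between-plate scales
  cholesky_factor_corr[5] L_Omega;  // Cholesky factor of Omega
  matrix[5, N_plate] z;             // standard-normal innovations
}
transformed parameters {
  matrix[N_plate, 5] logBeta;
  logBeta <- rep_matrix(mu_logBeta', N_plate)
             + (diag_pre_multiply(tau, L_Omega) * z)';
}
model {
  to_vector(z) ~ normal(0, 1);      // non-centered: plates enter only via z
  L_Omega ~ lkj_corr_cholesky(2);
  // ... priors on mu_logBeta and tau, and the likelihood, as before
}
```

The sampler then works on z rather than logBeta directly, which typically removes the funnel-shaped dependence between tau and the plate-level effects that drives divergent transitions.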
I will try those changes out tomorrow.
Re 4.6 GHz: It does worry me whenever I look at the temperature, but that is from Asus' OC utility… and it cools down pretty quickly.
Sent from Outlook Mail for Windows 10 phone

