Hi Roy,
Thanks for your interest in SCALE! Yes, SCALE can be applied to datasets where only a subset of the cells have spike-ins. In fact, the real datasets in our paper (human fibroblast and mouse blastocyst) have 12 and 30 cells with spike-ins, respectively. We estimate the technical-noise-associated parameters assuming they are shared across cells that are sequenced and processed in the same batch.
You can just feed the cells that have spike-ins into the normalization step. That said, I'm not sure what leads to only ~60% of the cells having **detected** spike-in reads. It seems to me this can result from poor library prep (e.g., in balancing the concentrations of spike-ins and endogenous RNAs), and if that is the case, I would recommend stringent QC on the cells — not just the ones without spike-ins but the entire cohort.
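As a minimal sketch of that subsetting step (the matrix below is a made-up toy stand-in, not SCALE's actual input format), cells with detected spike-ins can be selected before the normalization / technical-noise step:

```r
# Toy spike-in matrix (spike-ins x cells); all object names are hypothetical.
spikein_counts <- matrix(c(5, 0, 12,
                           3, 0,  8),
                         nrow = 2, byrow = TRUE,
                         dimnames = list(c("ERCC-1", "ERCC-2"),
                                         c("cell1", "cell2", "cell3")))

# A cell "has spike-ins" if any spike-in reads were detected in it.
has.spikein <- colSums(spikein_counts) > 0

# Keep only these cells for the technical-noise estimation step.
spikein.subset <- spikein_counts[, has.spikein, drop = FALSE]
colnames(spikein.subset)  # "cell1" "cell3"
```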
Hope that this helps and feel free to let us know if you have further questions!
Yuchao
From: Roy Moh Lik Ang [mailto:ra...@stanford.edu]
Sent: Friday, April 28, 2017 4:20 PM
To: Jiang, Yuchao <yuc...@wharton.upenn.edu>
Subject: Enquiry about SCALE
Dear Yuchao,
I am Roy, a second-year genetics PhD student at Stanford University. I recently came across your paper published in Genome Biology, titled "SCALE: modeling allele-specific gene expression by single-cell RNA sequencing". I am interested in applying your package to study transcriptional bursting characteristics in another scRNA-seq dataset with allele-specific counts, but I have found that although spike-ins were added to every cell, only ~60% of the sequenced cells have detected spike-in molecules. I was wondering whether SCALE can still be applied to the entire scRNA-seq dataset, or whether cells without detected spike-in reads have to be filtered out before analysis? Is there some way to estimate technical noise for all single cells using only the spike-ins from a subset of them?
Your help is greatly appreciated. Congratulations on your publication!
Regards,
Roy
Just to clarify -- in our case, only 12 and 30 cells had spike-ins added, and we use them all. The other cells don't have spike-ins added.
Yuchao
On May 2, 2017, at 8:21 PM, Roy Moh Lik Ang <ra...@stanford.edu> wrote:
Hi Yuchao,
Sorry to trouble you again. I have been looking through the SCALE vignette (thank you for making it so straightforward and easy to read/use!), and I am trying to figure out how to prepare the cell size input. In the paper, you recommend using either the expression of GAPDH or, when spike-ins are available, the ratio of endogenous reads over the total number of spike-in reads. I have a couple of questions:
- Since not all of your cells have spike-ins, I am assuming you used the ratio for those with spike-ins, and for those without, you took the total GAPDH counts for each cell from the Deng et al. (2014) total-counts data? Is it okay to use two different methods to estimate cell size for the same cell population?
- Did you do any normalization for sequencing depth/library size before entering the cell size input, specifically when using GAPDH expression as the cell size estimator?
I do apologize if these questions have already been addressed. Your help is much appreciated!
Regards,
Roy
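For what it's worth, the two cell-size estimators Roy asks about can be sketched as follows (all objects here are made-up toy data, and the scaling to mean 1 is just one possible way to put the two estimators on a comparable scale — it is not prescribed by SCALE):

```r
# Toy totals per cell; all object names are hypothetical.
endo_counts    <- c(cell1 = 2e6, cell2 = 1e6)  # total endogenous reads
spikein_counts <- c(cell1 = 2e5, cell2 = 4e5)  # total spike-in reads
gapdh_counts   <- c(cell3 = 800, cell4 = 400)  # cells without spike-ins

# (1) With spike-ins: cell size ~ endogenous reads / spike-in reads.
size.spikein <- endo_counts / spikein_counts

# (2) Without spike-ins: cell size ~ GAPDH expression (Roy's second
# question asks whether this should first be depth-normalized).
size.gapdh <- gapdh_counts

# Scale each estimator to mean 1 so the two are on a comparable scale.
cell.size <- c(size.spikein / mean(size.spikein),
               size.gapdh / mean(size.gapdh))
```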
Hi Yuchao,
Ah, excellent. Thank you for addressing my query!
Have a good weekend,
Roy
From: Jiang, Yuchao
Sent: Friday, April 28, 2017 1:53 PM
To: Roy Moh Lik Ang
Cc: SCALE_s...@googlegroups.com; min...@mail.med.upenn.edu; Zhang, Nancy R
Subject: RE: Enquiry about SCALE
On May 10, 2017, at 4:36 PM, Roy Moh Lik Ang <ra...@stanford.edu> wrote:
Hi Yuchao,
Thanks for pointing that out. Somehow I had not been subsetting my spike-in data correctly; I have corrected my code. What I notice is that taking different subsets of the spike-in data as input for tech_bias still causes poor fitting of the kappa and tau curve.
Keep me posted on further developments of this issue. :)
Roy
Subject: Re: Estimation of abkt terms
Hi Roy,
Sorry for my delayed reply. I'm starting to look into this issue. I know what is going on with the kappa and tau terms and will reply to you once I make sure everything runs properly. As for the error in the email below, judging from the error message, it seems that the column names of
fib.spikein_input.subset
cannot be found in the column names of
fib.alleleA
These two need to match, with the former being a subset of the latter (since we don't require all cells to have spike-ins).
Let me know if this solves this error.
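A quick sanity check for this condition, using made-up toy matrices in place of the real objects (only the column names matter here, not the contents):

```r
# Toy stand-ins for the real allele-specific and spike-in inputs.
fib.alleleA <- matrix(0, nrow = 3, ncol = 4,
                      dimnames = list(NULL, paste0("cell", 1:4)))
fib.spikein_input.subset <- matrix(0, nrow = 2, ncol = 2,
                                   dimnames = list(NULL, c("cell2", "cell4")))

# Every cell in the spike-in subset must also appear in the allele matrix.
ok <- all(colnames(fib.spikein_input.subset) %in% colnames(fib.alleleA))
ok  # TRUE
```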
Thanks,
Yuchao
On May 8, 2017, at 3:02 PM, Roy Moh Lik Ang <ra...@stanford.edu> wrote:
Hi Yuchao,
I have been feeding subsets of my spike-in data into the tech_bias function to see whether a subset of the cells with spike-ins might be bad/outliers. Essentially, I selected cells with at least a certain number of total spike-in reads. However, I keep running into this error:
> fib.spikein_input.subset = fib.spikein_input[, (colSums(fib.spikein_input[, 3:length(fib.spikein_input[1,])]) > 70000)]
> dim(fib.spikein_input.subset)
[1] 72 46
> abkt.new = tech_bias(spikein_input = fib.spikein_input.subset, alleleA = fib.alleleA,
+   alleleB = fib.alleleB, pdf = TRUE)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  NA/NaN/Inf in 'y'

Here's another example:

> fib.spikein_input.subset = fib.spikein_input[, (colSums(fib.spikein_input[, 3:length(fib.spikein_input[1,])]) < 100000)]
> dim(fib.spikein_input.subset)
[1] 72 75
> abkt.new = tech_bias(spikein_input = fib.spikein_input.subset, alleleA = fib.alleleA,
+   alleleB = fib.alleleB, pdf = TRUE)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  NA/NaN/Inf in 'y'

Some other times, it says 'x' instead of 'y' as the matrix with NA/NaN/Inf:

> fib.spikein_input.subset = fib.spikein_input[, (colSums(fib.spikein_input[, 3:length(fib.spikein_input[1,])]) < 60000)]
> dim(fib.spikein_input.subset)
[1] 72 43
> abkt.new = tech_bias(spikein_input = fib.spikein_input.subset, alleleA = fib.alleleA,
+   alleleB = fib.alleleB, pdf = TRUE)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  NA/NaN/Inf in 'x'

I can't say I am too familiar with R, but I took a look at the source code and was wondering if this could be the cause of the problem?

Y = Y[Q != 0]
Q = Q[Q != 0]
lmfit = lm(log(Q) ~ log(Y))

Thanks for the help!
Roy
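One plausible cause of the misbehaving subsets above (an assumption inferred only from the snippets in this thread, not a confirmed diagnosis): the logical filter is computed over the cell columns 3:ncol but then applied to all columns, so R recycles the too-short logical index and the selection is shifted relative to the column names. A toy illustration with a made-up data frame standing in for fib.spikein_input:

```r
# Made-up stand-in: two annotation columns, then one column per cell.
df <- data.frame(id = c("ERCC-1", "ERCC-2"), length = c(100, 200),
                 cellA = c(50000, 40000), cellB = c(10, 20),
                 cellC = c(60000, 30000))

# Filter computed over the cell columns only (length 3 here).
keep <- colSums(df[, 3:ncol(df)]) > 70000

# Buggy: a length-3 logical index against a 5-column data frame is
# recycled, so the wrong columns are selected.
bad <- df[, keep]

# Correct: keep the two annotation columns, filter only the cell columns.
good <- df[, c(rep(TRUE, 2), keep)]
colnames(good)  # "id" "length" "cellA" "cellC"
```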
Subject: RE: Estimation of abkt terms
Hi Yuchao,
Yeah, I looked into the distribution of burst frequencies and sizes, but nothing out of the ordinary stands out. I would love to know more about what you find.
I have uploaded the rar file on Google Drive. You may access it here.
Thank you for looking into this!!
Roy
Subject: Re: Estimation of abkt terms
Hi Roy,
Hmmm, I looked at the supplement of the paper and saw 2x125 bp. However, that is for the human T-cell scRNA-seq. The read lengths for the mouse fibroblast data are indeed ~43 bp on average, as you said. I'm not sure what is going on, but looking at the correlation plot for the dataset you sent (which is based on alpha, beta, kappa, and tau), the results make a lot of sense and actually look very good. I will look into this.
Can you send the rar file via Dropbox or Google Drive? Our email server blocked it. My account for both is yj...@cornell.edu .
Thanks very much!
Yuchao
On May 5, 2017, at 9:14 PM, Roy Moh Lik Ang <ra...@stanford.edu> wrote:
[An attachment named allele_counts.rar was removed from this message by the mail server as it constituted a security hazard.]
Hi Yuchao,
Thank you for that very elaborate explanation. In that case, do you recommend I use the kappa and tau terms obtained with no Poisson sampling (i.e. kappa=29, tau=8.5) or the other set of values for downstream analysis of the kinetic parameters, or would it make little difference to the outcome?
Also, thank you for sharing your scripts with me. I will see if I can reproduce the graph you provided.
Hi Yuchao,
Understood. Thank you so much for your guidance! I am glad this matter has been resolved :)