EBSeq FDR threshold calculation

919 views
Skip to first unread message

Damien Wilburn

unread,
Nov 25, 2012, 12:25:06 PM11/25/12
to rsem-...@googlegroups.com
Hello. My name is Damien Wilburn, I'm a PhD student at the University of Louisville, and recently have been working on a number of RNAseq related projects (all including de novo transcriptome assembly, which I have used Trinity to do). One in particular involves examining the development process of a pheromone gland in salamanders which is only present during the mating season, and regresses once the season is over. One of the major goals is to identify differentially expressed transcripts beween time points, and I have been using RSEM/EBSeq for this. Two small questions though that I'm slightly confused about.

1) Similar to many other people, I experienced scaffolding problems and low read alignments when I would use RSEM in --paired mode. As recommended by Dr.Dewey in this thread (https://groups.google.com/forum/?fromgroups=#!topic/rsem-users/MWrky7PSTwk), I am currently aligning only the left reads for abundance estimation. However, its still not immediately clear why this is recommended over aligning both left and right reads in a non-paired manner. I imagine it could artificially bias expression levels between fully assembled transcripts where mate pairs align versus those that don't (scaffolded), but it seems like this difference would primarily affect extremely low expression transcripts, and for anything moderately expressed, the numbers would balance between the left and right reads to average to a similar TPM.

2) I'm a little confused as to the crit_fun in EBSeq. I've been doing some reading on FDR and some of the underlying math behind it, but this doesn't seem to immediately coalesce with the code in crit_fun -- which, looking at the code, sorts the p-values low to high, and then finds the max p-value where the average of all preceding p-values equals the FDR (which, when I set FDR to 0.05, in at least one of my designs, the crit_fun calculated PPEE cutoff is ~0.2). This seems very high, and somewhat counterintuitive. Obviously very rigid multiple comparisons corrections like Bon Ferroni require that p-values of individual test statistics all be less than the set alpha (which makes sense to control the overall Type I error), but it feels unusual to reject the null hypothesis when the probability of the two transcripts being equal is 20%. However, I will always be the first to admit that empirical Bayesian statistics tend to confuse me, and so I recognize I may be missing an assumption there and/or confusing terminology with respect the posterior probabilities.


Any and all advice that could be provided would be greatly appreciated, I really like understanding why things are computed the way they are and not just letting a black box spit out a p-value for me :) Thank you very much. Best regards, Damien.

Ning Leng

unread,
Nov 25, 2012, 2:58:08 PM11/25/12
to rsem-...@googlegroups.com
Hi Damien,

We don't need extra multiple test adjustment here because we are using
one (huge) model to test all genes(contigs) simultaneously (compare to
test one gene at a time).
If you want to get more conservative list of genes with target FDR
0.05, you could take the genes with PPDE >= .95. Then every gene in
your list would have FDR <= 0.05.
(More like idea behind bon ferroni adjustment)
crit_fun is giving the largest list we could achieve with average FDR
of the list less than 0.05.
For example, if we sort the genes by PPDE, and the top 6 genes are with PPDE:
.99, .99, .98, .98 ,.96 , .8
Then the average FDR of the list is
(.01 + .01 + .02 + .02 + .04 + .2)/6=.05
Although the 6th gene is with FDR .2, the average FDR of these 6 genes
is 0.05. So the soft cutoff would be PPDE = 0.8, and these 6 genes is
the largest list with average FDR <0.05.
(more like idea behind B-H adjustment)

Hope these are helpful.
Please let me know if you have further questions
Best
Ning
--
Ning Leng
University of Wisconsin Madison
Department of Statistics

Damien Wilburn

unread,
Nov 25, 2012, 3:07:18 PM11/25/12
to rsem-...@googlegroups.com, nl...@wisc.edu
Oooooooh, ok ok. I had already figured out the math behind crit_fun figuring out the max p-value to yield an average of 0.05, but I was still thinking of each PPEE/PPDE as independent tests (when in fact the estimates are part of one larger model that has already converged) -- analogous to not needing Bon ferroni corrections on t-tests in levels of GLM variables when likelihood ratio tests already control for multiple comparisons. Thanks, that helps a lot. Best, Damien.

b...@cs.wisc.edu

unread,
Nov 26, 2012, 12:53:29 AM11/26/12
to rsem-...@googlegroups.com
Hi Damien,

> 1) Similar to many other people, I experienced scaffolding problems and
> low
> read alignments when I would use RSEM in --paired mode. As recommended by
> Dr.Dewey in this thread
> (https://groups.google.com/forum/?fromgroups=#!topic/rsem-users/MWrky7PSTwk),
> I am currently aligning only the left reads for abundance estimation.
> However, its still not immediately clear why this is recommended over
> aligning both left and right reads in a non-paired manner.

RSEM assume that the reads are independent and identically distributed.
However, a paired-end read's two mates are not independent with each
other. That's the reason why Colin suggest only use one mate instead two.

Best,
Bo


Damien Wilburn

unread,
Nov 26, 2012, 3:51:09 AM11/26/12
to rsem-...@googlegroups.com, b...@cs.wisc.edu
Ah, makes perfect sense. Thanks for the quick reply Bo. 

Best, Damien.

archana bhardwaj

unread,
Aug 9, 2013, 3:02:05 AM8/9/13
to rsem-...@googlegroups.com
Hello everyone 

I am new to NGS . I have paired end sequence data for differential gene expression analysis. First i did the prepossing of my data such as adapter removal, ambiguous character and quality filtration . Now i want to assemble my data in form of contigs. I run trininty.pl to get the assembled contigs. There are so many output files . What is the exact contig files??? Inchworm contigs or bundled.fasta ??? How can i get the read count of each contig ??? 

waiting for reply 
Reply all
Reply to author
Forward
0 new messages