Conceptual majiq doubt


Miriam Martínez

9:57 AM
to Biociphers
Hi biociphers team,

I have a conceptual question about how MAJIQ works, and I think an example is the best way to explain it. Suppose I run a HET analysis between two groups, a case group and a control group. The BAM files I start with come from RNA-seq, so they contain transcriptome-wide information. From this analysis, I obtain a significant result for event X in gene A.

Now suppose that from those initial BAM files I extract the reads for only 5 genes (including gene A from the example) and run the same HET analysis, so that the only difference is the number of genes with read information in the BAM files. Would this new analysis report the same event X for gene A, or would running the analysis on this subset change the result?

Sorry if the question is confusing, or if the answer is evident from the nature of the model MAJIQ uses, but I would really appreciate it if you could clarify this for me.

Please don't hesitate to ask me if the example or question is unclear. Thank you very much in advance.

Best,

Miriam

bsl...@seas.upenn.edu

1:03 PM
to Biociphers
Dear Miriam,

Usually you would obtain the same event X, but a couple of caveats prevent this from always being true.
By design, MAJIQ-HET does not apply any multiple-testing correction. Yoseph described the reasons in this previous post. Thus, the HET calculation for an event is unchanged by the decision to include or exclude other genes, and you would usually get the same results regardless of which other genes are present.
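As a toy illustration of that point (not MAJIQ code), the sketch below computes a per-event p-value from that event's values alone, then applies a Benjamini-Hochberg adjustment for contrast. The PSI values, the event names, and the exact permutation test are all assumptions made for the example; the permutation test is only a stand-in for whichever statistic HET is run with.

```python
from itertools import combinations

def exact_permutation_pvalue(case, control):
    """Two-sided exact permutation test on the difference in group means.

    Stand-in for a per-event test; this is not one of MAJIQ's statistics.
    """
    pooled = case + control
    n_case, n_control = len(case), len(control)
    total = sum(pooled)
    observed = abs(sum(case) / n_case - sum(control) / n_control)
    hits = 0
    n_splits = 0
    # Enumerate every way of relabelling the pooled samples as "case".
    for idx in combinations(range(len(pooled)), n_case):
        s = sum(pooled[i] for i in idx)
        diff = abs(s / n_case - (total - s) / n_control)
        hits += diff >= observed - 1e-12  # tolerance for float rounding
        n_splits += 1
    return hits / n_splits

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical PSI values per event: (case samples, control samples).
psi = {
    "A:X": ([0.80, 0.85, 0.90, 0.82], [0.30, 0.35, 0.25, 0.40]),
    "B:Y": ([0.50, 0.55, 0.45, 0.60], [0.52, 0.48, 0.58, 0.47]),
    "C:Z": ([0.10, 0.20, 0.15, 0.12], [0.11, 0.18, 0.16, 0.14]),
}

def raw_pvalues(events):
    """Each p-value depends only on the data of its own event."""
    return {name: exact_permutation_pvalue(*psi[name]) for name in events}

def adjust(pvals):
    """BH-adjust a dict of p-values; the result depends on the whole set."""
    return dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))

full = raw_pvalues(["A:X", "B:Y", "C:Z"])   # "transcriptome-wide" run
subset = raw_pvalues(["A:X"])               # run restricted to gene A

print("raw:", full["A:X"], subset["A:X"])
print("BH :", adjust(full)["A:X"], adjust(subset)["A:X"])
```

In this sketch the raw p-value for A:X is identical in both runs, while the adjusted value changes with the number of events tested. That dependence on the other genes is what the absence of a correction avoids.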
However, additional genes that overlap gene A could have an impact. In MAJIQ V3 (are you using V3?), reads overlapping exons and annotated introns of one gene are not counted towards introns of another overlapping gene. If the annotation contains only gene A, then reads overlapping gene A will be assigned to A. If the annotation also contains a gene B that overlaps A, then the same reads would be assigned differently wherever the above rule applies. So if the BAM files contain reads from transcripts of both A and B but the annotation contains only A, reads from B would be assigned to A; this would change if the annotation also contained B.
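The overlap rule can be sketched with a deliberately simplified model. This is not MAJIQ's implementation: the coordinates, the gene layout, and the function names are hypothetical, and reads are reduced to single half-open intervals with no splicing.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]

def intron_read_count(target, annotation, reads):
    """Count reads credited to the introns of `target`.

    Simplified rule (an assumption for this sketch): a read that overlaps
    an exon or annotated intron of any *other* annotated gene is not
    counted towards the introns of `target`.
    """
    other_features = [
        feature
        for gene, feats in annotation.items()
        if gene != target
        for feature in feats["exons"] + feats["introns"]
    ]
    count = 0
    for read in reads:
        in_target_intron = any(
            overlaps(read, intron) for intron in annotation[target]["introns"]
        )
        claimed_elsewhere = any(overlaps(read, f) for f in other_features)
        if in_target_intron and not claimed_elsewhere:
            count += 1
    return count

# Hypothetical layout: gene B has one exon inside the single intron of gene A.
gene_a = {"exons": [(100, 200), (500, 600)], "introns": [(200, 500)]}
gene_b = {"exons": [(300, 350)], "introns": []}

reads = [
    (120, 180),  # inside an exon of A
    (220, 260),  # inside the intron of A, away from B
    (310, 340),  # inside the exon of B, which sits in the intron of A
]

only_a = intron_read_count("A", {"A": gene_a}, reads)
a_and_b = intron_read_count("A", {"A": gene_a, "B": gene_b}, reads)
print(only_a, a_and_b)
```

With only A annotated, the read that comes from B is credited to A's intron. Once B is annotated too, that read is excluded, so the intron coverage seen for A depends on which overlapping genes the annotation contains.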
The other caveat is that, by design, MAJIQ includes random sampling in its workflows. Thus, two runs of MAJIQ with identical inputs can yield slightly different results. Past experiments show that such variation changes few significance decisions, and those that do change are usually right at the significance threshold.
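A small simulation shows why sampling noise mostly matters near a threshold. Everything here is an assumption made for illustration: the bootstrap estimate stands in for whatever sampling MAJIQ performs internally, and the values, the 0.5 cutoff, and the seeds are made up.

```python
import random

def bootstrap_estimate(values, seed, n_boot=200):
    """Average of bootstrap means; the result depends on the random seed."""
    rng = random.Random(seed)
    n = len(values)
    boot_means = [
        sum(rng.choice(values) for _ in range(n)) / n
        for _ in range(n_boot)
    ]
    return sum(boot_means) / n_boot

THRESHOLD = 0.5  # hypothetical decision cutoff

clear_event = [0.80, 0.85, 0.90, 0.82]       # far above the cutoff
borderline_event = [0.40, 0.45, 0.55, 0.60]  # centred on the cutoff

seeds = range(10)  # ten "runs" with identical inputs
clear_estimates = [bootstrap_estimate(clear_event, s) for s in seeds]
borderline_estimates = [bootstrap_estimate(borderline_event, s) for s in seeds]

clear_calls = [e > THRESHOLD for e in clear_estimates]
borderline_calls = [e > THRESHOLD for e in borderline_estimates]

print("clear     :", clear_calls)
print("borderline:", borderline_calls)
```

Every run calls the clear event the same way, because each resampled mean stays between 0.80 and 0.90. The borderline estimates scatter around the cutoff, so only there can a decision differ from one run to the next.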

Please let me know if you have additional questions!
Barry