324748
3125090 3126847 2 1352 1872
3131135 3131188 2 1352 1872
exonInfo.tab has three columns:
324748
0 1757 0
6045 6098 1758
7977 8102 1812
0 1992 0
3064 3201 1993
What are these three columns?
The definition of the columns in sjdbInfo.txt can be found in your manual.
transcriptInfo.tab has four columns:
ENSG00000157870.14 2207 44 2774
ENSG00000142606.15 63 576 98
ENSG00000142611.16 161 2 159
ENSG00000177133.10 26 0 26
ENSG00000227372.10 2402 77 2491
ENSG00000272153.1 83 806 9
ENSG00000157870 is the Ensembl gene ID; what is the .14?
Hi Lior,
I think there are two major questions that were indeed somewhat confounded in my previous post.
1. Counting vs Maximum Likelihood (ML) methods.
Read counting (e.g. htseq-count, featureCounts or STAR --quantMode GeneCounts) simply counts the number of uniquely mapped reads that overlap the exons of each gene. The Maximum Likelihood methods (e.g. Cufflinks, RSEM, eXpress, Sailfish, kallisto, Salmon) calculate the relative abundance of isoforms (rather than genes) by maximizing the likelihood of the observed alignments (Cufflinks, RSEM, eXpress), k-mers (Sailfish) or pseudo-alignments (kallisto, Salmon). In Lior’s words, ML models “disambiguate reads between isoforms of genes”.
As Lior pointed out, “when these isoforms have different lengths, the naïve counting methods can be very inaccurate”. While I agree with this statement, my question is:
How often does this lead to an actual error in differential expression calls in real data?
I would like to point to the paper by Soneson, Love, and Robinson, which showed that this effect is practically undetectable in real data.
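To make the distinction in point 1 concrete, here is a minimal Python sketch of the two approaches on a toy gene with two isoforms. It is an illustration under stated assumptions, not the implementation used by any of the tools named above: the function names, the isoform names "long" and "short", the effective lengths, and the read counts are all hypothetical. The toy data assume a 2000 nt isoform and a 1000 nt isoform that is fully contained in it, so that 400 reads are compatible with both isoforms and 100 reads are compatible only with the long one (built from 200 reads of the long isoform and 300 of the short one).

```python
from collections import Counter


def naive_gene_count(reads):
    """Counting approach: every read overlapping the gene adds 1 to the gene total."""
    return len(reads)


def em_isoform_counts(reads, eff_lengths, n_iter=200):
    """Toy EM for the ML approach.

    reads       : list of frozensets, each holding the isoforms a read is compatible with
    eff_lengths : dict isoform -> effective length (hypothetical values)
    Returns a dict isoform -> expected number of reads assigned to it.
    """
    isoforms = list(eff_lengths)
    # alpha[t] is the current estimate of the fraction of reads coming from isoform t
    alpha = {t: 1.0 / len(isoforms) for t in isoforms}
    # Reads with the same compatibility set are interchangeable, so group them
    classes = Counter(reads)
    counts = {t: 0.0 for t in isoforms}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in isoforms}
        # E-step: split each read class between its compatible isoforms,
        # in proportion to alpha[t] / eff_lengths[t]
        for compatible, n in classes.items():
            weights = {t: alpha[t] / eff_lengths[t] for t in compatible}
            total = sum(weights.values())
            for t, w in weights.items():
                counts[t] += n * w / total
        # M-step: re-estimate the read fractions from the expected counts
        n_reads = sum(counts.values())
        alpha = {t: counts[t] / n_reads for t in isoforms}
    return counts


# Hypothetical toy data: 400 ambiguous reads and 100 reads unique to the long isoform
toy_lengths = {"long": 2000, "short": 1000}
toy_reads = [frozenset({"long", "short"})] * 400 + [frozenset({"long"})] * 100

gene_total = naive_gene_count(toy_reads)
estimates = em_isoform_counts(toy_reads, toy_lengths)
print("gene-level count:", gene_total)
print("isoform-level estimates:", estimates)
```

The counting function returns one number per gene and says nothing about which isoform the reads came from, whereas the EM loop uses the 100 unambiguous reads together with the isoform lengths to split the 400 ambiguous ones. Whether that extra resolution changes differential expression calls in real data is exactly the question above.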
2. Pseudo-alignment vs. full alignment quantification.
The ML methods can use full alignments (Cufflinks, RSEM, eXpress) or pseudo-alignments (kallisto, Salmon) as input. The main point of my previous post was that pseudo-alignments do not provide more accurate quantifications than full alignments. For instance, the comparison of kallisto and RSEM performance in the kallisto paper (Fig 2a) shows higher accuracy for RSEM: the mean relative difference in estimated transcript read counts is 0.03 for RSEM vs 0.05 for kallisto.
Would you agree that pseudo-alignment quantification is less accurate than full-alignment ML quantification, and the advantage of the pseudo-alignments is only in the speed?
Cheers
Alex