In response to Rob's comments, I have a follow-up question:
I used Salmon for quantification, tximport (lengthScaledTPM) for import, and limma for differential expression. The RNA-seq data are from a single cell line, with RNA isolated from confluent and thinly plated cells (the contrast of interest), in triplicate.
One of my differentially expressed hits is HIST2H4B, histone cluster 2 H4B. HIST2H4A has an identical sequence; it does not reach statistical significance after multiple-testing correction, but it shows the opposite trend. Here, every mapping would be ipso facto ambiguous, right?
Based on what you said above, I am guessing that my result reflects a difference in the overall data structure between the two groups, and that the algorithm has assigned more reads to HIST2H4B based on some "learned parameters". The choice of H4B over H4A would still have to be random, though, right?
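To make my guess concrete, here is a toy sketch of how I picture it. This is not Salmon's actual algorithm, just a plain EM over two transcripts that I assume have identical sequence and length, so every read is equally compatible with both. The function name `em_split` and its parameters are made up for illustration.

```python
def em_split(n_reads, init_a, n_iter=50):
    """Toy EM for two transcripts, A and B, with identical sequence and length.

    Assumption: every read is compatible with both transcripts with the same
    conditional probability, so only the current abundance estimates decide
    how a read is split. `init_a` is the starting abundance fraction for A.
    """
    theta_a, theta_b = init_a, 1.0 - init_a
    for _ in range(n_iter):
        # E-step: fraction of each ambiguous read attributed to A.
        resp_a = theta_a / (theta_a + theta_b)
        # M-step: re-estimate abundances from the expected read counts.
        expected_a = n_reads * resp_a
        theta_a = expected_a / n_reads
        theta_b = 1.0 - theta_a
    return theta_a, theta_b


if __name__ == "__main__":
    for start in (0.5, 0.7, 0.1):
        a, b = em_split(n_reads=1000, init_a=start)
        print(f"start A={start:.2f} -> A={a:.3f}, B={b:.3f}")
```

In this toy case the split between A and B never moves from its starting value, so the reads themselves carry no information about it. If that picture is right, whatever separates H4A from H4B in my data would have to come from something other than the reads.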
By extension, could a similar scenario arise for highly similar but non-identical sequences, where the "learned parameters" would systematically override the sequence identity of the reads?
Thanks
Nico