salmon VS sailfish


Runxuan zhang

Dec 17, 2014, 8:26:21 AM
to sailfis...@googlegroups.com
Dear Rob,

I summed the read counts over all transcripts to calculate the fraction of reads mapped by both sailfish and salmon.

Reads mapped ratio:

  sailfish (k-mer size 30):  0.9596

  salmon:  0.9521
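
For reference, the calculation described above can be sketched as follows. This is only an illustration, not the actual script used: the function name `mapped_ratio`, the example per-transcript counts, and the total read count are all hypothetical.

```python
from typing import Iterable


def mapped_ratio(reads_per_transcript: Iterable[float], total_reads: int) -> float:
    """Fraction of input reads assigned to any transcript.

    reads_per_transcript: estimated read count for each transcript
    (hypothetical input; taken from the quantification output).
    total_reads: number of reads in the input library.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return sum(reads_per_transcript) / total_reads


if __name__ == "__main__":
    # Made-up counts for three transcripts in a library of 100 reads.
    example_counts = [10.5, 20.0, 65.5]
    print(mapped_ratio(example_counts, total_reads=100))
```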



Does it make sense that sailfish actually has more reads mapped to the reference than salmon?


Thanks a lot,


Runxuan 

Rob

Dec 17, 2014, 10:51:14 AM
to sailfis...@googlegroups.com
Hi Runxuan,

  There are a number of variables to consider when making such a comparison.  The first is that Sailfish doesn't really deal in terms of reads, but rather k-mers.  Thus, the number of mapped reads in the Sailfish case is itself an estimate: the estimated number of mapping k-mers divided by the average number of k-mers in a read.  Also, particularly if this is a paired-end library, the two programs treat the input somewhat differently.  For example, apart from the direction constraints specified in the library flag, Sailfish essentially treats the paired-end reads as two single-end reads.  Salmon, as I mentioned in my response to your other post, treats the entire pair as a single fragment.

  The other difference this gives rise to, currently, is that Salmon requires both ends of a fragment to "map" in order to consider that fragment.  Therefore, like the STAR aligner, it essentially disallows orphaned paired-end reads, since these are usually not very helpful and there are typically few of them (however, I'm considering allowing them with a command-line switch).
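
  To make the k-mer-to-read conversion concrete, here is a minimal sketch of the arithmetic described above.  It is not Sailfish's actual code; the function names and example numbers are made up for illustration.  The only fact it relies on is that a read of length L contains L - k + 1 k-mers.

```python
from typing import Sequence


def kmers_per_read(read_length: int, k: int) -> int:
    """Number of k-mers in a read of this length (0 if the read is shorter than k)."""
    return max(read_length - k + 1, 0)


def estimate_mapped_reads(mapped_kmers: float, read_lengths: Sequence[int], k: int) -> float:
    """Estimated mapped reads = mapped k-mers / average k-mers per read."""
    if not read_lengths:
        raise ValueError("read_lengths must not be empty")
    avg_kmers = sum(kmers_per_read(n, k) for n in read_lengths) / len(read_lengths)
    if avg_kmers == 0:
        raise ValueError("no read is long enough to contain a k-mer")
    return mapped_kmers / avg_kmers


if __name__ == "__main__":
    # Hypothetical library: two 100 bp reads, k = 30, so 71 k-mers per read.
    print(kmers_per_read(100, 30))
    print(estimate_mapped_reads(7100, [100, 100], 30))
```

  Because the result is a ratio of two estimates, it need not match a count of whole reads or fragments.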

  So, long story short, I don't think these numbers are necessarily directly comparable (ideally, you'd want to compare some measure of accuracy on the library you've quantified).  That said, there is no theoretical reason that one method must always have a higher mapping ratio than the other under all parameter settings, and each method has parameters that allow greater sensitivity at the potential cost of reduced specificity.  For example, in Sailfish, there is obviously the -k parameter.  In Salmon, there is the -c (required coverage) parameter, the -k (minimum length) parameter, and even the --extraSensitive parameter, which "searches extra hard" (non-technical description) for MEMs to cover the read.  Either way, even if these numbers were comparable, they are so similar (and so high) that I imagine any discernible difference you'd see in quantification results would be due more to differences in the quantification algorithm than to the mapping ratio.

Best,
Rob