Different kmer size caused variable total_reads in reads.count_info

Runxuan zhang

Dec 4, 2014, 10:20:00 AM
to sailfis...@googlegroups.com
Dear Rob,

I was trying Sailfish (0.6.3) with various k-mer sizes to decide on the best parameter (or to get a feel for how it affects performance). I found that the total_reads value in the reads.count_info file varies a lot when I use different k-mer sizes: the longer the k-mer, the smaller the total_reads number.

My understanding is that total_reads should be the same whenever the same fasta file is used as input (it should not depend on the k-mer size, although the number of mapped reads should). Or am I missing something here?

Thank you very much,

Runxuan 

Rob

Dec 8, 2014, 5:46:11 PM
to sailfis...@googlegroups.com
Hi Runxuan,

  The reads.count_info file, though it is in a human-readable format, is not actually intended for end-user consumption.  Rather, it's a file that is used to communicate information between different stages of the Sailfish pipeline.  Interestingly, after searching through the Sailfish source code, I found no trace of the keyword "total_reads".  Actually, I believe that total_reads was probably a misnomer to begin with, which is probably why I replaced it (at some point) with another name.  Of course, you're correct that the total number of reads should not depend at all on the k-mer size you're using.  I'm actually looking at improving the way in which the estimated number of mapped reads is calculated, as the current method is too stringent and can considerably underestimate the actual number of mappable reads.
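
  As a rough illustration of why any count derived from exact k-mer matches shrinks as k grows, here is a minimal Python sketch.  It is not taken from the Sailfish source and says nothing about how the values in reads.count_info are actually computed; the function names and the toy sequences are made up for the example.  It only shows the general effect: a read of length L has L - k + 1 k-mers, and a single mismatch spoils up to k of them, so a larger k leaves fewer exactly matching k-mers.

```python
def kmers(seq, k):
    """Return all overlapping k-mers of seq (empty list if len(seq) < k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def matched_kmer_fraction(read, reference, k):
    """Fraction of the read's k-mers that occur exactly in the reference."""
    ref_kmers = set(kmers(reference, k))
    read_kmers = kmers(read, k)
    if not read_kmers:
        # The read is shorter than k, so it contributes no k-mers at all.
        return 0.0
    hits = sum(1 for km in read_kmers if km in ref_kmers)
    return hits / len(read_kmers)


if __name__ == "__main__":
    # Toy reference without any 'T', so a k-mer containing 'T' can never match.
    reference = "ACGGACCAGCAAGGCACGAGCCAGGACAGC"
    clean = reference[5:25]                  # 20-base read copied from the reference
    read = clean[:10] + "T" + clean[11:]     # same read with one substituted base

    for k in (4, 11):
        print(k, matched_kmer_fraction(read, reference, k))
```

  With this toy input, 13 of the 17 4-mers still match, while none of the 11-mers do, because every 11-mer of a 20-base read overlaps the substituted base in the middle.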

  However, as I've been saying a lot around here (and in response to e-mails), you should give our new tool, Salmon, a try.  While it has a number of benefits over Sailfish, one of them is that it is capable of considering arbitrary-length exact matches when trying to estimate potential read origins.  This means that, while it does have some tweakable parameters that are described via the --help flag, it is actually less sensitive than Sailfish to such key parameters.  Further, it builds a different type of index that doesn't require you to select such a k up-front, so that you can play around with the effect of the related parameters (there is no direct analog for k in Salmon) much more quickly.

Best,
Rob

Runxuan zhang

Dec 9, 2014, 7:46:09 AM
to sailfis...@googlegroups.com
Dear Rob,

Thank you very much for your quick response. I will give Salmon a try then.

Thanks a lot,

Runxuan