Variable length reads and effective length

Shawn Driscoll

Jan 27, 2017, 2:04:11 PM

to Sailfish Users Group

Hello,

I've been a long-time user of Sailfish, for as long as it has existed at least, and just recently started running Salmon. I also use Kallisto. I noticed the other day that all three of these programs tend to calculate different 'effective' lengths for transcripts. Is it generally not advised to run variable-length reads through Salmon and similar programs that rely on effective length normalization? I have variable-length reads because I trim the TruSeq adapters from reads before aligning or quantifying. We often run the NextSeq 500 with single-end 150 bp reads, and the core fragments the libraries to a median of about 150 bp, so roughly 40% of the reads end up being trimmed.

Thanks-
Shawn

Rob

Jan 28, 2017, 1:44:42 PM

to Sailfish Users Group

Hi Shawn,

Indeed, I remember some of your posts from the early days of Sailfish :). Welcome to the Salmon train! It is true that the methods of effective length correction vary slightly between the different tools, mainly because there are several reasonable definitions of effective length. However, I would suggest that, even in the presence of variable-length reads, it is important to use effective length correction. The main thing that this corrects for (in the absence of bias correction, where it is also used) is the empirical fragment length distribution. Specifically, the fragment length distribution affects the probability of observing a fragment of a given size from a given transcript (e.g., a 400 bp fragment cannot start 100 bp from the end of a transcript). The precise effective lengths will be affected by (1) the estimated empirical fragment length distribution, (2) the precise method of calculating effective lengths (though the different approaches taken by different tools should only have an effect on rather short transcripts), and (3) whether or not you are using bias correction (if you are, the change in the probability of generating fragments from different transcripts is incorporated into their effective lengths).
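
To make point (2) concrete, here is a minimal Python sketch of two textbook-style definitions of effective length. It is an illustration only, not the exact code used by Salmon, Sailfish, or kallisto; the function names, the toy fragment length distribution, and the fallback for very short transcripts are all assumptions made for the example. The first definition sums, over every fragment length that fits, the number of start positions weighted by that length's probability. The second subtracts the mean of the fragment length distribution, truncated at the transcript length, from the transcript length.

```python
def eff_len_weighted(transcript_len, frag_pmf):
    """Sum over fragment lengths f <= L of P(f) * (L - f + 1).

    frag_pmf maps fragment length (int) -> probability (hypothetical input).
    """
    return sum(p * (transcript_len - f + 1)
               for f, p in frag_pmf.items() if f <= transcript_len)


def eff_len_truncated_mean(transcript_len, frag_pmf):
    """L - mu_L + 1, where mu_L is the mean fragment length given f <= L."""
    mass = sum(p for f, p in frag_pmf.items() if f <= transcript_len)
    if mass == 0:
        # Assumed fallback when no fragment length fits; real tools
        # each handle this corner case in their own way.
        return float(transcript_len)
    mu = sum(p * f for f, p in frag_pmf.items() if f <= transcript_len) / mass
    return transcript_len - mu + 1


# Toy fragment length distribution (made up for illustration).
frag_pmf = {100: 0.25, 200: 0.5, 300: 0.25}

for length in (250, 1000):
    print(length,
          eff_len_weighted(length, frag_pmf),
          eff_len_truncated_mean(length, frag_pmf))
```

For a transcript longer than every fragment length in the distribution, both functions give the same value. For a transcript shorter than some fragment lengths, the first value equals the second scaled by the probability mass of the lengths that fit, which is why the definitions only diverge for short transcripts.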

So, long story short, it is almost certainly OK to use effective length correction with variable length reads. It would, of course, be reasonable (and easy given the speed of these tools) to see if and how the results vary if you don't do adapter trimming (since the strategies used in the mapping-based modes of these tools are very robust to the presence of adapters or low-quality bases). Let me know if you have any other questions.
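
One way to run that comparison is sketched below in Python, assuming you have quantified the same sample twice (trimmed and untrimmed reads) and have the two Salmon `quant.sf` files, which are tab-delimited with `Name` and `TPM` columns. The helper names, the pseudocount, and the directory names in the usage comment are hypothetical choices for the example, not part of any tool.

```python
import csv
import math


def read_tpm(handle):
    """Read a tab-delimited quant file and return {transcript name: TPM}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["TPM"]) for row in reader}


def log2_fold_changes(tpm_a, tpm_b, pseudocount=1.0):
    """Rank shared transcripts by |log2((b + pc) / (a + pc))|, largest first.

    The pseudocount (an assumed default of 1.0) keeps zero-TPM transcripts
    from producing infinite ratios.
    """
    shared = tpm_a.keys() & tpm_b.keys()
    changes = {
        name: math.log2((tpm_b[name] + pseudocount) /
                        (tpm_a[name] + pseudocount))
        for name in shared
    }
    return sorted(changes.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Usage with hypothetical output directories:
# with open("trimmed/quant.sf") as a, open("untrimmed/quant.sf") as b:
#     ranked = log2_fold_changes(read_tpm(a), read_tpm(b))
# for name, lfc in ranked[:20]:
#     print(name, round(lfc, 3))
```

Transcripts at the top of the ranking are the ones whose estimates change most between the two runs; if they are dominated by short transcripts, that would be consistent with the effective length effects described above.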

--Rob
