Hi,
I am trying to apply Salmon to a very small (artificial, in silico) genome to test an in-development pipeline. Is there a lower limit on the number of reads the bias-correction options (particularly --gcBias and --posBias, but also --seqBias) need in order to be expected to operate decently? For example, are 100,000 reads too few for them to work? One million? I notice that --seqBias is documented to use the first million reads: is that a minimum, and does the same hold for the other bias options?
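For concreteness, the invocation in question is roughly of the following form (the index name and read files are placeholders for our test data):

```shell
# Hypothetical test-pipeline invocation; index and FASTQ names are placeholders.
# -l ISR is one of Salmon's stranded, paired-end library-type strings
# (the actual orientation in our data may differ).
salmon quant -i test_index -l ISR \
  -1 reads_1.fastq.gz -2 reads_2.fastq.gz \
  --gcBias --posBias --seqBias \
  -o quant_out
```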
Similarly, are there lower bounds on the number of distinct genes and/or transcripts required for these corrections to be meaningful? Would you expect them to operate adequately if only a few dozen transcripts were expressed?
On a related note, does Salmon produce any output reporting the magnitude of the observed biases and the amount of correction applied? I am interested in values that could be compared across samples to indicate how strong a bias is present, or how much Salmon was able to compensate for it.
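To illustrate the kind of per-sample summary I have in mind (this is not an existing Salmon output; the function and the example GC-bin counts below are entirely hypothetical), one could reduce an observed-vs-expected distribution pair, such as binned fragment GC content, to a single divergence score that is comparable between samples:

```python
import math

def bias_divergence(observed, expected):
    """Jensen-Shannon divergence (base-2, symmetric, bounded by 1) between
    two count vectors, e.g. observed vs expected fragment GC-content bins.
    Returns 0.0 when the normalised distributions are identical."""
    # Normalise counts to probabilities.
    so, se = sum(observed), sum(expected)
    p = [x / so for x in observed]
    q = [x / se for x in expected]
    # Mixture distribution used by the JS divergence.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence in bits; zero-probability bins contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical GC-bin counts for one sample:
obs = [5, 20, 50, 20, 5]
exp = [10, 20, 40, 20, 10]
print(bias_divergence(obs, exp))  # larger value = stronger bias
print(bias_divergence(obs, obs))  # identical distributions -> 0.0
```

A single bounded number like this is the sort of value I would like to tabulate per sample and per bias type, if Salmon exposes (or could expose) the underlying observed/expected models.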
These are stranded, paired-end bulk RNA-seq data, if that is relevant. Thanks for your time and for the great software.
Best,
Tom Brooks
ITMAT Bioinformatics Lab
Institute for Translational Medicine and Therapeutics
University of Pennsylvania