Hi,
I am trying to apply Salmon to a very small (artificial, in silico) genome to test an in-development pipeline. Is there a lower limit on the number of reads the bias-correction options (particularly --gcBias and --posBias, but also --seqBias) need in order to be expected to operate decently? For example, are 100,000 reads too few for them to work? One million? I notice that --seqBias is documented to use the first million reads: is that a minimum, and does the same hold for the other bias options?
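For concreteness, the invocation in question is roughly of the following form (the index name and read files are placeholders for our test data):

```shell
# Hypothetical test-pipeline invocation; index and FASTQ names are placeholders.
# -l ISR is one of Salmon's stranded, paired-end library-type strings
# (the actual orientation in our data may differ).
salmon quant -i test_index -l ISR \
  -1 reads_1.fastq.gz -2 reads_2.fastq.gz \
  --gcBias --posBias --seqBias \
  -o quant_out
```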
Similarly, are there lower bounds on the number of distinct genes and/or transcripts required for these corrections to be meaningful? Would you expect them to operate adequately if only a few dozen transcripts were expressed?
On a related note, does Salmon produce any output reporting the magnitude of the observed biases and the amount of correction applied? I am interested in values that could be compared across samples to indicate how strong a bias is present, or how much Salmon was able to compensate for it.
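To illustrate the kind of per-sample summary I have in mind (this is not an existing Salmon output; the function and the example GC-bin counts below are entirely hypothetical), one could reduce an observed-vs-expected distribution pair, such as binned fragment GC content, to a single divergence score that is comparable between samples:

```python
import math

def bias_divergence(observed, expected):
    """Jensen-Shannon divergence (base-2, symmetric, bounded by 1) between
    two count vectors, e.g. observed vs expected fragment GC-content bins.
    Returns 0.0 when the normalised distributions are identical."""
    # Normalise counts to probabilities.
    so, se = sum(observed), sum(expected)
    p = [x / so for x in observed]
    q = [x / se for x in expected]
    # Mixture distribution used by the JS divergence.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence in bits; zero-probability bins contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical GC-bin counts for one sample:
obs = [5, 20, 50, 20, 5]
exp = [10, 20, 40, 20, 10]
print(bias_divergence(obs, exp))  # larger value = stronger bias
print(bias_divergence(obs, obs))  # identical distributions -> 0.0
```

A single bounded number like this is the sort of value I would like to tabulate per sample and per bias type, if Salmon exposes (or could expose) the underlying observed/expected models.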
These are stranded, paired-end bulk RNA-seq data, if that is relevant. Thanks for your time and for the great software.
Best,
Tom Brooks
ITMAT Bioinformatics Lab
Institute for Translational Medicine and Therapeutics
University of Pennsylvania