Warning about strand bias in unstranded protocol


Lucila Traverso

Oct 29, 2020, 1:01:54 PM
to Sailfish Users Group
Hi everyone,
I would appreciate your help with some questions about my analysis. I am working with Salmon 1.2.1 inside Trinity to quantify the expression of my assembled transcripts. I have several transcriptome assemblies of my species of interest and mapped my unstranded reads back to them. With one of the assemblies, I receive the warning: "Detected a *potential* strand bias > 1% in an unstranded protocol..." for all the libraries mapped against it. However, the mapping rate against that transcriptome is higher than that obtained against the others. The numbers in the lib_format_counts.json file are similar for all the libraries mapped in this case; I paste one of them here as an example:

{
    "read_files": "[ /D3.paired.1.fastq.gz, /D3.paired.2.fastq.gz]",
    "expected_format": "IU",
    "compatible_fragment_ratio": 1.0,
    "num_compatible_fragments": 16916771,
    "num_assigned_fragments": 16916771,
    "num_frags_with_concordant_consistent_mappings": 18994531,
    "num_frags_with_inconsistent_or_orphan_mappings": 1987115,
    "strand_mapping_bias": 0.5131041666677635,
    "MSF": 0,
    "OSF": 0,
    "ISF": 9746173,
    "MSR": 0,
    "OSR": 0,
    "ISR": 9248358,
    "SF": 1034884,
    "SR": 952231,
    "MU": 0,
    "OU": 0,
    "IU": 0,
    "U": 0
}

Should I be worried by these numbers? My understanding is that the mapping bias is small (its value is 0.51 in the worst case). Any idea why this particular assembly could be triggering the warning, given that mapping the same libraries to the other assemblies does not produce this bias?

Finally, I have a question regarding the mapping statistics. The mapping rate of this sample is 83.66% (shown in the salmon_quant.log file). Does this number represent unambiguous mappings, meaning that 83.66% of the reads were each assigned to a particular transcript for quantification?

Thank you so much in advance.
Best,
Lucila.

Rob

Oct 29, 2020, 4:05:03 PM
to Sailfish Users Group
Hi Lucila,

  First, I would not be worried by this "potential" mapping bias.  Since fragments are not allocated during mapping, but afterward during abundance estimation, these numbers only signify that slightly more fragments are compatible with alignment in the forward orientation than in the reverse complement orientation.  Salmon is generally very conservative in how it reports warnings, so a potential mapping bias of ~1% is not something to be concerned about.  I don't have a good sense of why this might occur with this assembly but not the others — but it's small enough that it could just be minor variation in mapping.
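
For readers who want to check the arithmetic, here is a minimal Python sketch using the counts from the lib_format_counts.json posted above. Dividing ISF by ISF + ISR reproduces the reported strand_mapping_bias for this sample; whether Salmon computes the field exactly this way internally is an assumption, and the helper name `forward_fraction` is made up for illustration.

```python
# Counts copied from the lib_format_counts.json shown earlier in the thread.
counts = {"ISF": 9746173, "ISR": 9248358, "SF": 1034884, "SR": 952231}


def forward_fraction(fwd: int, rev: int) -> float:
    """Fraction of fragments compatible with the forward orientation.

    Hypothetical helper for illustration; not part of Salmon.
    """
    total = fwd + rev
    if total == 0:
        raise ValueError("no fragments to compare")
    return fwd / total


# Properly paired fragments (ISF vs. ISR).
paired = forward_fraction(counts["ISF"], counts["ISR"])
# Orphan or inconsistent fragments (SF vs. SR), shown for comparison only.
orphans = forward_fraction(counts["SF"], counts["SR"])

print(f"paired forward fraction:  {paired:.4f}")
print(f"orphan forward fraction:  {orphans:.4f}")
print(f"deviation from 0.5:       {abs(paired - 0.5):.4f}")
```

The paired fraction comes out at about 0.5131, i.e. roughly 1.3 percentage points away from the 0.5 expected for a perfectly balanced unstranded library, which is just over the 1% threshold named in the warning.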

  Regarding your second question, 83.66% is the overall mapping rate.  This doesn't necessarily represent unambiguous mappings, but it does represent the total fraction of reads that will be assigned to transcripts at the end of quantification.  That is, out of all sequencing reads, 83.66% could be successfully aligned to >=1 transcript, and will be used for abundance quantification.
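
The difference between the overall mapping rate and an "unambiguous" rate can be illustrated with a toy Python sketch. The data below are entirely hypothetical (five made-up reads, not Salmon output), and the function names are invented for this example; it only shows how the two fractions are counted.

```python
# Hypothetical toy data: read ID -> transcripts the read maps to.
# These values are invented for illustration and are not Salmon output.
mappings = {
    "read1": ["tx1"],                # maps to exactly one transcript
    "read2": ["tx1", "tx2"],         # multi-mapping read
    "read3": [],                     # unmapped read
    "read4": ["tx3"],                # maps to exactly one transcript
    "read5": ["tx2", "tx3", "tx4"],  # multi-mapping read
}


def mapping_rate(read_to_txs: dict) -> float:
    """Fraction of reads that map to at least one transcript."""
    mapped = sum(1 for txs in read_to_txs.values() if len(txs) >= 1)
    return mapped / len(read_to_txs)


def unambiguous_rate(read_to_txs: dict) -> float:
    """Fraction of reads that map to exactly one transcript."""
    unique = sum(1 for txs in read_to_txs.values() if len(txs) == 1)
    return unique / len(read_to_txs)


print(f"overall mapping rate: {mapping_rate(mappings):.2f}")
print(f"unambiguous rate:     {unambiguous_rate(mappings):.2f}")
```

In this toy set the overall rate counts read2 and read5 even though they are ambiguous, which is the sense in which the 83.66% figure is described above: every read mapped to >=1 transcript contributes to quantification.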

Best,
Rob

Lucila Traverso

Oct 30, 2020, 8:57:45 AM
to Sailfish Users Group
Thank you so much, Rob, for your complete answer! Everything is much clearer now.
Best,
Lucila.