A question about lib_format_counts.json

Nancy Dong

Oct 16, 2017, 3:51:05 PM
to Sailfish Users Group
Hello,

First of all, great job on Salmon! :)

Just for my knowledge, I wonder why lib_format_counts.json shows (for the lack of a better way of saying it) that there are multiple library types?

For example, I used the "-l A" option to estimate the RNA-seq library type, because I inherited a set of data with minimal description. I get this lib_format_counts.json as output:

{
    "read_files": "( AC_corrected_BYD_ACOSRB_7_1_HJLK5BBXX.12BA013_noribo_clean.cor.fq, AC_corrected_BYD_ACOSRB_7_2_HJLK5BBXX.12BA013_noribo_clean.cor.fq ), ( AB_corrected_BYD_ABOSRB_7_1_HJLK5BBXX.12BA012_noribo_clean.cor.fq, AB_corrected_BYD_ABOSRB_7_2_HJLK5BBXX.12BA012_noribo_clean.cor.fq )",
    "expected_format": "ISR",
    "compatible_fragment_ratio": 0.8961446182393716,
    "num_compatible_fragments": 34933186,
    "num_assigned_fragments": 38981639,
    "num_consistent_mappings": 63280996,
    "num_inconsistent_mappings": 21390210,
    "MSF": 0,
    "OSF": 13469,
    "ISF": 19989114,
    "MSR": 0,
    "OSR": 37493,
    "ISR": 63280996,
    "SF": 461456,
    "SR": 885290,
    "MU": 0,
    "OU": 0,
    "IU": 0,
    "U": 0
}

I highlighted the part I want to ask about. Salmon estimates that my data is "ISR". What does it mean that other library types, such as "OSF" and "ISF" also have "mappings"?

Thank you very much for the clarification!

Rob

Oct 17, 2017, 6:22:52 PM
to Sailfish Users Group
Hi Nancy,

Thanks for the kind words! The lib_format_counts.json file lists the total number of possible mappings, broken down by library type, across all of the fragments. What this means is that salmon first maps each read without considering the library type constraints, and a fragment may map to different isoforms in different ways. It then catalogues all of these potential mappings and records them in the lib_format_counts.json file. So, even though mappings of an incompatible type will be discarded (or assigned a lower probability, depending on the value of --incompatPrior), those possible mappings will still show up in lib_format_counts.json. This is useful, e.g., as debugging / sanity-checking output.

For example, in your sample, I can see that salmon detected ISR as the most likely library type, and that about 90% of all mapped fragments (compatible_fragment_ratio is 0.896) had at least one mapping compatible with this library type. The library formats compatible with ISR are ISR and SR (the orphaned-read variant). This means the remaining ~10% of fragments mapped only in some other way (e.g. ISF, SF, OSR, OSF or MSR). The entries you highlight give the complete breakdown. Again, since these numbers record all possible mappings, their sum will generally be much higher than the total number of mapped reads, since a read might map to one isoform in a way compatible with the library type and to another isoform in an incompatible way. This file, however, records everything.
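
As a rough illustration of the sanity check described above, here is a minimal Python sketch that works through the numbers from the file quoted earlier in this thread. The helper names (`load_counts`, `summarize`) and the file path are made up for illustration and are not part of salmon; the sketch only assumes the JSON keys shown in the file above.

```python
import json

# The per-library-type keys that appear in the lib_format_counts.json quoted above.
LIBRARY_TYPES = ["MSF", "OSF", "ISF", "MSR", "OSR", "ISR",
                 "SF", "SR", "MU", "OU", "IU", "U"]


def load_counts(path):
    """Read a lib_format_counts.json file (hypothetical helper; path is up to you)."""
    with open(path) as fh:
        return json.load(fh)


def summarize(counts):
    """Recompute the compatible ratio and total up the per-type mapping counts."""
    per_type = {t: counts[t] for t in LIBRARY_TYPES}
    total_mappings = sum(per_type.values())
    ratio = counts["num_compatible_fragments"] / counts["num_assigned_fragments"]
    return {
        "expected_format": counts["expected_format"],
        "compatible_ratio": ratio,
        "total_mappings": total_mappings,
        # More than 1 means fragments were, on average, counted under several mappings.
        "mappings_per_fragment": total_mappings / counts["num_assigned_fragments"],
    }


# Values copied from the file posted in this thread (read_files omitted).
example = {
    "expected_format": "ISR",
    "compatible_fragment_ratio": 0.8961446182393716,
    "num_compatible_fragments": 34933186,
    "num_assigned_fragments": 38981639,
    "MSF": 0, "OSF": 13469, "ISF": 19989114,
    "MSR": 0, "OSR": 37493, "ISR": 63280996,
    "SF": 461456, "SR": 885290,
    "MU": 0, "OU": 0, "IU": 0, "U": 0,
}

summary = summarize(example)
print(f"expected format:       {summary['expected_format']}")
print(f"compatible ratio:      {summary['compatible_ratio']:.4f}")
print(f"total mappings:        {summary['total_mappings']}")
print(f"mappings per fragment: {summary['mappings_per_fragment']:.2f}")
```

For a real run, `summarize(load_counts("lib_format_counts.json"))` should give the same kind of summary, with the path adjusted to wherever your salmon output lives. The per-type counts add up to far more than `num_assigned_fragments`, which is the "sum is higher than the number of mapped reads" effect described above.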

Best,
Rob

Nancy Dong

Oct 19, 2017, 2:09:48 PM
to Sailfish Users Group
Ah, thank you very much for the detailed answer! :)

Nancy