Understanding lib_format_counts.json


Juanlu Trincado

Oct 4, 2017, 9:11:04 AM
to Sailfish Users Group
Dear Rob,

I'm quantifying with the latest version of Salmon (0.8.2). My reads are 75 nt long (I indexed with a k-mer length of 21). The reads are supposed to be ISR, but only around 40% of them are mapped. Looking at lib_format_counts.json, it seems that many more reads are mapped with the SR library type, but that type is for single-end reads, isn't it? How is this possible? Here is the output of one of the runs:

-bash-4.2$ cat lib_format_counts.json 
{
    "read_files": "( /homes/users/jtrincado/scratch/Snail1/data/fasta/KD1_R1.fastq, /homes/users/jtrincado/scratch/Snail1/data/fasta/KD1_R2.fastq )",
    "expected_format": "ISR",
    "compatible_fragment_ratio": 0.2282644263761963,
    "num_compatible_fragments": 8938002,
    "num_assigned_fragments": 39156351,
    "num_consistent_mappings": 38003759,
    "num_inconsistent_mappings": 157191619,
    "MSF": 0,
    "OSF": 1802,
    "ISF": 141734,
    "MSR": 0,
    "OSR": 99795,
    "ISR": 38003759,
    "SF": 3826333,
    "SR": 153119539,
    "MU": 0,
    "OU": 0,
    "IU": 0,
    "U": 0
}


Thanks a lot for all this huge work.
Best,

Juan L.

Rob

Oct 6, 2017, 10:20:24 AM
to Sailfish Users Group
Hi Juan,

  This certainly seems like a low mapping rate.  Often, something like this might be indicative of a highly-incomplete transcriptome, a poor quality dataset, or both. The format of the relevant parts of the `lib_format_counts.json` file is as follows:

 "compatible_fragment_ratio": 0.2282644263761963, ---- This is the total fraction of fragments that were compatible with the specified library type (here, mapping in a proper pair, facing inward, with the first read coming from the reverse-complement strand)

 "num_compatible_fragments": 8938002, ---- This is the number of actual fragments compatible with the library type.
 "num_assigned_fragments": 39156351, ---- This is the total number of fragments that could be assigned to at least one transcript (including orphan mappings)
 "num_consistent_mappings": 38003759, ---- This is the total number of consistent mappings. A compatible fragment might map in a library-compatible way to more than one transcript; this number counts every such mapping, so it accounts for multi-mapping.
 "num_inconsistent_mappings": 157191619, ---- The converse of the above: the total number of mappings that are not consistent with the library type, again counting every mapping of a multi-mapping fragment.
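
As a quick sanity check on the first three fields, the reported ratio appears to be num_compatible_fragments divided by num_assigned_fragments. The minimal Python sketch below uses the values copied from the file above. The variable names are only illustrative, and the relationship is inferred from the numbers in this file, not taken from Salmon's source code.

```python
import math

# Values copied from the lib_format_counts.json shown in this thread.
num_compatible_fragments = 8938002
num_assigned_fragments = 39156351
reported_ratio = 0.2282644263761963

# Assumption: the ratio is compatible fragments over assigned fragments.
computed_ratio = num_compatible_fragments / num_assigned_fragments

print(f"computed: {computed_ratio:.6f}")
print(f"reported: {reported_ratio:.6f}")
print("match:", math.isclose(computed_ratio, reported_ratio, rel_tol=1e-6))
```

If the two values agree, only about 23% of the assigned fragments map in the expected ISR configuration, which is separate from (and lower than) the overall mapping rate.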

The rest of the file is a little bit confusing.  It specifies the number of mappings to the transcriptome of each type.  First, note that all of the `U` variants will always be 0 because, while a library may be unstranded, each individual fragment maps with some particular orientation.  So, what we see here is that the majority of your fragments seem to map with library type "SR".  This is indicative of a case where you see many orphaned mappings.  "SR" is, in a sense, the orphaned version of "ISR" (or "OSR"), and this suggests that of the fragments that do map to your transcriptome, many of them are not mapped in a proper pair; rather, only one end of the fragment was mapped successfully to your transcriptome.
You do have some other types of mappings here, but if we were to count all of the "SR" mappings as compatible with "ISR", then 191123298 of your total 195192962 mappings would be compatible with this library type (almost 98%).

    "MSF": 0,
    "OSF": 1802,
    "ISF": 141734,
    "MSR": 0,
    "OSR": 99795,
    "ISR": 38003759,
    "SF": 3826333,
    "SR": 153119539,
    "MU": 0,
    "OU": 0,
    "IU": 0,
    "U": 0
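
The arithmetic behind the "almost 98%" figure can be reproduced with a short Python sketch. The counts are copied from the listing above; the dictionary and variable names are only illustrative, and treating "SR" as compatible with "ISR" is the assumption described in this reply, not something Salmon reports itself.

```python
# Per-type mapping counts copied from the lib_format_counts.json above.
counts = {
    "MSF": 0, "OSF": 1802, "ISF": 141734,
    "MSR": 0, "OSR": 99795, "ISR": 38003759,
    "SF": 3826333, "SR": 153119539,
    "MU": 0, "OU": 0, "IU": 0, "U": 0,
}

# Total number of mappings across all orientation types.
total_mappings = sum(counts.values())

# Assumption: orphaned "SR" mappings are counted as compatible with "ISR".
compatible_with_orphans = counts["ISR"] + counts["SR"]
fraction = compatible_with_orphans / total_mappings

# Share of all mappings that are orphaned "SR" mappings.
orphan_share = counts["SR"] / total_mappings

print(f"total mappings:        {total_mappings}")
print(f"ISR + SR mappings:     {compatible_with_orphans}")
print(f"fraction compatible:   {fraction:.1%}")
print(f"orphaned (SR) share:   {orphan_share:.1%}")
```

The same sketch also shows how large the orphaned share is: most mappings here come from a single read end, which points at the reference rather than at the library type.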

So, my overall thought here is that the mapping rate, and the huge number of orphans, is a bigger problem than the orientation.  Does this organism have a genome assembly you can map a few samples against to see what the alignment rate is like?

Best,
Rob

Juanlu Trincado

Nov 17, 2017, 5:34:06 PM
to Sailfish Users Group
Hi Rob,

Sorry for the delay in answering. I really appreciate the detailed mail. I found it really useful for fully understanding Salmon's output.
I'm kind of embarrassed, because I finally found out it was my fault... I was mistaken about which organism the samples come from. I ran it with the proper annotation and it worked perfectly.

Thanks again for all your work.
Best,
Juan