unmapped reads in Salmon


Vasisht Tadigotla

May 31, 2016, 3:45:56 PM
to Sailfish Users Group
Hi, 

Is there a way to get a list of unmapped reads from the quasi-mapping mode in Salmon? I'm trying to identify the class of transcripts not present in my index. 

Thanks,
Vasisht

Rob

May 31, 2016, 10:00:40 PM
to Sailfish Users Group
Hi Vasisht,

  There is no such feature in v0.6.0, but I've added the ability to dump the names of unmapped reads in the current working branch.  You can build this version from the `nb` branch of the Salmon repository, but to make things easier I'm attaching a zip file with the relevant source (and a pre-compiled Linux binary can be grabbed from Google Drive, here).  If you pass the flag `--writeUnmappedNames` to salmon's quant command, it will create a file called `unmapped_names.txt` in the aux subdirectory of the quantification directory; this file contains the names of the reads that were not mapped during quantification.  I chose to write out the read names rather than the reads themselves to save space.  Also note that, if you're doing paired-end quantification, it only writes out the name of the first read (in which case you should consider the whole pair as unmapped).  Let me know if this helps.
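Since only names are written, the reads themselves have to be pulled back out of the original FASTQ before you can look at which transcripts are missing from the index. A minimal sketch of that step follows. It assumes `unmapped_names.txt` holds one read name per line (only the first whitespace-separated field is used) and that names match the FASTQ headers once the leading `@`, any trailing comment, and a `/1` or `/2` mate suffix are removed. Those normalisation rules and all helper names (`load_unmapped_names`, `filter_fastq`, etc.) are illustrative assumptions, not Salmon behaviour.

```python
import gzip
import sys
from pathlib import Path
from typing import Iterable, Iterator, List, Set, TextIO


def strip_mate_suffix(name: str) -> str:
    # Assumption: mates may carry an old-style "/1" or "/2" suffix.
    return name[:-2] if name.endswith(("/1", "/2")) else name


def load_unmapped_names(lines: Iterable[str]) -> Set[str]:
    """Collect read names from the lines of unmapped_names.txt.

    Only the first whitespace-separated field of each line is kept;
    blank lines are skipped.
    """
    names = set()
    for line in lines:
        fields = line.split()
        if fields:
            names.add(strip_mate_suffix(fields[0]))
    return names


def normalize_name(header: str) -> str:
    """Reduce a FASTQ header such as '@read7/1 extra' to 'read7' (assumed convention)."""
    body = header[1:] if header.startswith("@") else header
    fields = body.split()
    return strip_mate_suffix(fields[0]) if fields else ""


def filter_fastq(lines: Iterable[str], names: Set[str]) -> Iterator[List[str]]:
    """Yield each 4-line FASTQ record whose normalized read name is in `names`."""
    it = iter(lines)
    for header in it:
        # Pull the sequence, '+' separator and quality lines that follow the header.
        record = [header] + [next(it, "") for _ in range(3)]
        if normalize_name(header) in names:
            yield record


def open_text(path: str) -> TextIO:
    # Read gzip-compressed FASTQ transparently.
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python extract_unmapped.py QUANT_DIR reads.fastq.gz > unmapped.fastq
    quant_dir, fastq_path = sys.argv[1], sys.argv[2]
    with open(Path(quant_dir) / "aux" / "unmapped_names.txt") as fh:
        wanted = load_unmapped_names(fh)
    with open_text(fastq_path) as fq:
        for record in filter_fastq(fq, wanted):
            sys.stdout.writelines(record)
```

For paired-end data, only the first read's name is written. Running the script once on the read-1 file and once on the read-2 file with the same quantification directory should therefore recover both mates of each unmapped pair, provided the mate-suffix assumption above holds for your reads.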

Best,
Rob
salmon-nb.zip

Vasisht Tadigotla

Jun 1, 2016, 9:34:51 AM
to Rob, Sailfish Users Group
Hi Rob,

This is perfect, thanks for the quick fix. How does this handle orphaned reads? I'm not currently using them for quantification but can see cases where it might be useful. 

Thanks,
Vasisht

--
Sailfish is available at https://github.com/kingsfordgroup/sailfish
Citation:
Patro, Rob, Stephen M. Mount, and Carl Kingsford. "Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms." Nature biotechnology 32.5 (2014): 462-464.
---
You received this message because you are subscribed to the Google Groups "Sailfish Users Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sailfish-user...@googlegroups.com.
To post to this group, send email to sailfis...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sailfish-users/be6fc6a9-776b-46a2-b39a-6f8694454fe3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
To find the limits of the possible, one must attempt the impossible.

Rob

Jun 8, 2016, 9:35:38 PM
to Sailfish Users Group
Hi Vasisht,

  I thought about this a bit.  What I've implemented (currently only on the develop branch, but this strategy will be the default in the next release unless there are objections) is that I will output the read name, followed by a concise description of exactly how the read was unmapped.  For single-end reads, it's simple: the read is either mapped (and so doesn't appear in the file) or unmapped, in which case the read name in the unmapped file is followed by 'u'.  For paired-end reads, 'u' means that neither end maps.  The other possibilities are 'm1' (only read 1 mapped; read 1 is an orphan), 'm2' (only read 2 mapped; read 2 is an orphan), and 'm12' (both reads 1 and 2 mapped, but never to the same transcript).  I think this covers all of the relevant paired-end cases.  Any thoughts?
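A minimal sketch of consuming the proposed format follows: it tallies how many reads fall into each status and collects the orphaned pairs by which mate mapped. It assumes each line is the read name and the status code separated by whitespace (the exact delimiter isn't specified in this thread), and the helper names are hypothetical, not part of Salmon.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Status codes as proposed in the message above (assumed final format).
STATUS_MEANING: Dict[str, str] = {
    "u": "unmapped (single-end read, or neither end of a pair mapped)",
    "m1": "only read 1 mapped (read 1 is an orphan)",
    "m2": "only read 2 mapped (read 2 is an orphan)",
    "m12": "both ends mapped, but never to the same transcript",
}


def parse_unmapped(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse 'read_name status' lines into (name, status) pairs.

    Assumes whitespace separates name and status; blank lines are skipped,
    and malformed lines or unknown status codes raise ValueError.
    """
    records = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 2 or fields[1] not in STATUS_MEANING:
            raise ValueError(f"line {lineno}: unexpected entry {line.rstrip()!r}")
        records.append((fields[0], fields[1]))
    return records


def summarize(records: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many reads fall into each status category."""
    return Counter(status for _, status in records)


def orphans(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group names of pairs where exactly one end mapped, keyed by 'm1' or 'm2'."""
    groups: Dict[str, List[str]] = {"m1": [], "m2": []}
    for name, status in records:
        if status in groups:
            groups[status].append(name)
    return groups


if __name__ == "__main__":
    # Made-up example lines, only to show the call sequence.
    example = ["r1 u\n", "r2 m1\n", "r3 m12\n", "r4 m2\n", "r5 m1\n"]
    recs = parse_unmapped(example)
    for status, n in sorted(summarize(recs).items()):
        print(f"{status}\t{n}\t{STATUS_MEANING[status]}")
```

Rejecting unknown codes rather than silently counting them makes it obvious if the on-disk format differs from the one proposed here.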

--Rob


On Wednesday, June 1, 2016 at 9:34:51 AM UTC-4, Vasisht Tadigotla wrote:
Hi Rob,

This is perfect, thanks for the quick fix. How does this handle orphaned reads? I'm not currently using them for quantification but can see cases where it might be useful. 

Thanks,
Vasisht
Hi Vasisht,

  There is no such feature in v0.6.0, but I've added the ability to dump the names of unmapped reads in the current working branch.  Currently, you can build this version from the `nb` branch of the Salmon repository, but I'm attaching a zip file with the relevant source (and a pre-compiled linux binary can be grabbed from Google Drive, here) to make things easier.  If you pass the flag `--writeUnmappedNames` to salmon's quant command, then it will create a file, called `unmapped_names.txt` in the aux subdirectory of the quantification directory that contains the names of the reads that were not mapped during quantification.  I chose to write out the read names rather than the reads themselves to save space.  Also, note that, if you're doing paired-end quantification, it only writes out the name of the first read (in which case, you should consider the pair as unmapped).  Let me know if this helps.

Best,
Rob

On Tuesday, May 31, 2016 at 3:45:56 PM UTC-4, Vasisht Tadigotla wrote:
Hi, 

Is there a way to get a list of unmapped reads from the quasi-mapping mode in Salmon. I'm trying to identify the class of transcripts not present in my index. 

Thanks,
Vasisht

--
Sailfish is available at https://github.com/kingsfordgroup/sailfish
Citation:
Patro, Rob, Stephen M. Mount, and Carl Kingsford. "Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms." Nature biotechnology 32.5 (2014): 462-464.
---
You received this message because you are subscribed to the Google Groups "Sailfish Users Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sailfish-users+unsubscribe@googlegroups.com.
To post to this group, send email to sailfish-users@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages