Re: [Sailfish Users Group msg. 453] Re: unmapped reads in Salmon

Vasisht Tadigotla

Jun 22, 2016, 1:31:49 PM

to Rob Patro, Sailfish Users Group

Hi Rob,

Commit 766defb37341ac1d69675dd7d11bd54fd6621521 has removed this functionality from the develop branch. Do you plan to handle this in a different manner going forward?

Thanks,

Vasisht

On Thu, Jun 9, 2016 at 10:00 AM, Rob Patro <rob....@gmail.com> wrote:

Hi Vasisht,

I’d committed the change but hadn’t pushed it yet. I just did that. You should be able to check that out now if you want.

Best,
Rob

--
Rob Patro
Sent with Airmail

On June 9, 2016 at 9:39:56 AM, Vasisht Tadigotla (vasish...@gmail.com) wrote:

Hi Rob,

I think that covers all the cases and would provide all the information for further processing the unmapped reads. For m1/m2 - Do you mean read1/2 is mapped and read2/1 is an orphan? Is this checked into the develop branch? I couldn't find the relevant commit.

Vasisht

On Wed, Jun 8, 2016 at 9:35 PM, Rob <rob....@gmail.com> wrote:

Hi Vasisht,

I thought about this a bit. What I've implemented (currently only on the develop branch, but this strategy will be the default in the next release unless there are objections) is to output the read name, followed by a concise code describing how the read failed to map. For single-end reads, it's simple: the read is either mapped (and so doesn't appear in the file) or unmapped, in which case its name in the unmapped file is followed by 'u'. For paired-end reads, 'u' means that neither end maps. The other possibilities are 'm1' (only read 1 mapped, so read 1 is an orphan), 'm2' (only read 2 mapped, so read 2 is an orphan), and 'm12' (both reads 1 and 2 mapped, but never to the same transcript). I think this covers all of the relevant cases for paired-end reads. Any thoughts?
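
For anyone post-processing this file, here is a minimal Python sketch that tallies the status codes described above. It assumes each line holds the read name and the code separated by whitespace, with the code as the last field; that layout is an assumption rather than something stated in this thread, and the function names are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple

# Status codes as described in the message above.
STATUS_MEANING = {
    "u": "unmapped (for pairs: neither end mapped)",
    "m1": "only read 1 mapped",
    "m2": "only read 2 mapped",
    "m12": "both ends mapped, but never to the same transcript",
}


def parse_unmapped_line(line: str) -> Tuple[str, str]:
    """Split one line into (read_name, status).

    Assumption: fields are whitespace-separated and the status code
    is the last field on the line.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected a read name and a status code: {line!r}")
    status = fields[-1]
    if status not in STATUS_MEANING:
        raise ValueError(f"unknown status code {status!r} in line {line!r}")
    return fields[0], status


def count_statuses(lines: Iterable[str]) -> Counter:
    """Count how many reads fall into each status category."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():  # skip blank lines
            _, status = parse_unmapped_line(line)
            counts[status] += 1
    return counts
```

Passing an open file handle for `unmapped_names.txt` to `count_statuses` gives a quick breakdown of fully unmapped reads versus orphans versus discordant pairs.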

--Rob

On Wednesday, June 1, 2016 at 9:34:51 AM UTC-4, Vasisht Tadigotla wrote:

Hi Rob,

This is perfect, thanks for the quick fix. How does this handle orphaned reads? I'm not currently using them for quantification but can see cases where it might be useful.

Thanks,

Vasisht

Hi Vasisht,

There is no such feature in v0.6.0, but I've added the ability to dump the names of unmapped reads in the current working branch. Currently, you can build this version from the `nb` branch of the Salmon repository, but I'm attaching a zip file with the relevant source (and a pre-compiled linux binary can be grabbed from Google Drive, here) to make things easier. If you pass the flag `--writeUnmappedNames` to salmon's quant command, then it will create a file, called `unmapped_names.txt` in the aux subdirectory of the quantification directory that contains the names of the reads that were not mapped during quantification. I chose to write out the read names rather than the reads themselves to save space. Also, note that, if you're doing paired-end quantification, it only writes out the name of the first read (in which case, you should consider the pair as unmapped). Let me know if this helps.
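
Since only the names are written, recovering the reads themselves means filtering the original FASTQ. A minimal Python sketch of that step follows. It assumes an uncompressed, four-line-per-record FASTQ and that each name in `unmapped_names.txt` matches the first whitespace-delimited token of the FASTQ header (without the leading `@`); both are assumptions, and the helper names are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

FastqRecord = Tuple[str, str, str, str]


def load_unmapped_names(lines: Iterable[str]) -> Set[str]:
    """Collect read names, taking the first field of each non-blank line."""
    names: Set[str] = set()
    for line in lines:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def fastq_records(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, separator, quality) tuples.

    Assumption: every record occupies exactly four lines.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        try:
            seq, sep, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               sep.rstrip("\n"), qual.rstrip("\n"))


def select_unmapped(fastq_lines: Iterable[str],
                    names: Set[str]) -> Iterator[FastqRecord]:
    """Yield only the records whose read name is in `names`."""
    for record in fastq_records(fastq_lines):
        tokens = record[0][1:].split()  # drop '@', keep the first token
        if tokens and tokens[0] in names:
            yield record
```

For paired-end data, the same set of names would be applied to both the read 1 and read 2 files, since only the first read's name is written.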

Best,

Rob

On Tuesday, May 31, 2016 at 3:45:56 PM UTC-4, Vasisht Tadigotla wrote:

Hi,

Is there a way to get a list of unmapped reads from the quasi-mapping mode in Salmon? I'm trying to identify the class of transcripts not present in my index.

Thanks,

Vasisht

--
Sailfish is available at https://github.com/kingsfordgroup/sailfish
Citation:
Patro, Rob, Stephen M. Mount, and Carl Kingsford. "Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms." Nature biotechnology 32.5 (2014): 462-464.
---
You received this message because you are subscribed to the Google Groups "Sailfish Users Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sailfish-user...@googlegroups.com.
To post to this group, send email to sailfis...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sailfish-users/be6fc6a9-776b-46a2-b39a-6f8694454fe3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

To find the limits of the possible, one must attempt the impossible.

--
Sailfish is available at https://github.com/kingsfordgroup/sailfish
Citation:
Patro, Rob, Stephen M. Mount, and Carl Kingsford. "Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms." Nature biotechnology 32.5 (2014): 462-464.
---
You received this message because you are subscribed to the Google Groups "Sailfish Users Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sailfish-user...@googlegroups.com.
To post to this group, send email to sailfis...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/sailfish-users/6f960563-e0ac-4eb6-820a-1fca4710ca7f%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--

Pour trouver les limites du possible il faut tenter l'impossible.

Pour trouver les limites du possible il faut tenter l'impossible.

Rob

Jun 22, 2016, 2:00:42 PM

to Sailfish Users Group, rob....@gmail.com

Hi Vasisht,

No, this was a merge mistake! Thanks for bringing this to my attention. I'll add it back in when I make the next commit.

Best,

Rob
