NGLess counting of paired reads

39 views
Skip to first unread message

Vadim Puller

unread,
Jan 14, 2021, 3:39:43 AM1/14/21
to NGLess

Dear NGLess team,

I am looking for a clarification regarding NGLess counting of paired reads.

More specifically: in the context of a pipeline for analyzing shotgun microbiome data we have tested NGLess `count` tool on a simulated set of data samples. The resulting count seems approximately two times lower than the number of reads in the simulated dataset. This led us to think that NGLess counts a pair of reads mapping to the same feature as 1 (i.e., NGLess counts features rather than reads).
Counting with different values of argument `sense` (`{both, sense, antisense}`) produces identical results. (That the results are exactly identical is probably due to the way the dataset was simulated - the reads are perfectly paired and all present in the gene catalogue that we use).

Thus, my questions are as follows:

  • Could you confirm that NGLess counts a pair of reads mapping to the same feature as 1?
  • How would NGLess count a case where only one read in a pair maps to a feature? (perhaps 0.5?)

I appreciate your help!

Vadim.

Luis Pedro Coelho

unread,
Jan 14, 2021, 3:48:06 AM1/14/21
to NGLess List
Hi Vadim,

More specifically: in the context of a pipeline for analyzing shotgun microbiome data we have tested NGLess `count` tool on a simulated set of data samples. The resulting count seems approximately two times lower than the number of reads in the simulated dataset. This led us to think that NGLess counts a pair of reads mapping to the same feature as 1 (i.e., NGLess counts features rather than reads).


If these are paired-end reads, then, yes, NGLess will treat each pair as 1 unit.

Counting with different values of argument `sense` (`{both, sense, antisense}`) produces identical results. (That the results are exactly identical is probably due to the way the dataset was simulated - the reads are perfectly paired and all present in the gene catalogue that we use).


This is a bit strange, actually. The {antisense} should produce no hits then (because they are not mapping to the catalog on the right strand).

Thus, my questions are as follows:

  • Could you confirm that NGLess counts a pair of reads mapping to the same feature as 1?

As above, yes.

  • How would NGLess count a case where only one read in a pair maps to a feature? (perhaps 0.5?)

It would still treat it as 1. Conceptually, it treats the pair as a unit and if one of the mates maps and the other one does not, it is treated as the pair having mapped.

HTH
Luis

I appreciate your help!

Vadim.


--
You received this message because you are subscribed to the Google Groups "NGLess" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ngless+un...@googlegroups.com.

Vadim Puller

unread,
Jan 15, 2021, 5:19:19 AM1/15/21
to NGLess

Dear Luis,

Thank you for your answer, it helped us a lot!

Could you please also comment about filtering mapped reads: I am considering to filter already mapped reads before counting them, keeping only the reads that map the best - this is the opposite of [the example described in the documentation]((http://ngless.embl.de/preprocess.html#filtering-reads-matching-a-reference)), where these reads are discarded. So I thought to use the following:

###Option 1

```
filt = select(mapped) using |mr|:

mr = mr.filter(min_identity_pc=95, action={unmatch})

if mr.flag({unmapped}):

discard

```

It seems to me that the result should be identical with using the following:

###Option 2

```

filt = select(mapped) using |mr|:

mr = mr.filter(min_identity_pc=95, action={drop})

```

However, if I save `filt` as a SAM file, without further processing, the output turns out to be different. In particular, the result of the first option contains as many records as the original SAM file (and this number of records does not change, if I I use `reverse=True` option.)

This gives me a reason t think that I misunderstand something about what the two snippets of the code do, and I will appreciate your explanation about this.

Regards, Vadim.

Luis Pedro Coelho

unread,
Jan 15, 2021, 10:05:39 PM1/15/21
to NGLess List

Dear Vadim (et al),

The difference is the action you use:

mr = mr.filter(min_identity_pc=95, action={unmatch})

mr = mr.filter(min_identity_pc=95, action={drop})


The simplest to understand is `{drop}`: it means those matches will get removed. The read can still be preserved if it was a multiple-mapper, but otherwise, the read is completely removed from the underlying SAM file. With {unmatched}, the read is always preserved but the corresponding SAM lines will be rewritten to be as if the read had not matched

HTH
Luis 

Vadim Puller

unread,
Jan 16, 2021, 1:35:16 AM1/16/21
to NGLess
Dear Luis,

This is how I understood the difference between {drop} and {unmatch}. What confuses me here is that {unmatch} is followed by
```
if mr.flag({unmapped}): discard
```
isn't this to discard an already {unmatched} read?

I appreciate your clarification.
Regards, Vadim.

Luis Pedro Coelho

unread,
Jan 19, 2021, 5:04:16 AM1/19/21
to NGLess List
Ah, ok. I read your previous message too fast.

I tested it locally and, indeed, they behave differently.

It's perhaps easiest to think about it in terms of SAM files:

- `filter` acts on individual lines in the file
- `if mr.flag(...)` acts on the full insert.

I think the difference can show up in paired-end reads if one of the mates is mapped and the other is not.

Best,
Luis

Luis Pedro Coelho | Fudan University | http://luispedro.org
--
You received this message because you are subscribed to the Google Groups "NGLess" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ngless+un...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages