Dear NGLess team,
I am looking for a clarification regarding NGLess counting of paired reads.
More specifically: in the context of a pipeline for analyzing shotgun microbiome data we have tested NGLess `count` tool on a simulated set of data samples. The resulting count seems approximately two times lower than the number of reads in the simulated dataset. This led us to think that NGLess counts a pair of reads mapping to the same feature as 1 (i.e., NGLess counts features rather than reads).
Counting with different values of argument `sense` (`{both, sense, antisense}`) produces identical results. (That the results are exactly identical is probably due to the way the dataset was simulated - the reads are perfectly paired and all present in the gene catalogue that we use).
Thus, my questions are as follows:
I appreciate your help!
Vadim.
More specifically: in the context of a pipeline for analyzing shotgun microbiome data we have tested NGLess `count` tool on a simulated set of data samples. The resulting count seems approximately two times lower than the number of reads in the simulated dataset. This led us to think that NGLess counts a pair of reads mapping to the same feature as 1 (i.e., NGLess counts features rather than reads).
Counting with different values of argument `sense` (`{both, sense, antisense}`) produces identical results. (That the results are exactly identical is probably due to the way the dataset was simulated - the reads are perfectly paired and all present in the gene catalogue that we use).
Thus, my questions are as follows:
- Could you confirm that NGLess counts a pair of reads mapping to the same feature as 1?
- How would NGLess count a case where only one read in a pair maps to a feature? (perhaps 0.5?)
I appreciate your help!
Vadim.
--You received this message because you are subscribed to the Google Groups "NGLess" group.To unsubscribe from this group and stop receiving emails from it, send an email to ngless+un...@googlegroups.com.To view this discussion on the web visit https://groups.google.com/d/msgid/ngless/2ff1fefe-5616-4a7e-aa7b-01cee6358cb7n%40googlegroups.com.
Dear Luis,
Thank you for your answer, it helped us a lot!
Could you please also comment about filtering mapped reads: I am considering to filter already mapped reads before counting them, keeping only the reads that map the best - this is the opposite of [the example described in the documentation]((http://ngless.embl.de/preprocess.html#filtering-reads-matching-a-reference)), where these reads are discarded. So I thought to use the following:
###Option 1
mr = mr.filter(min_identity_pc=95, action={unmatch})
if mr.flag({unmapped}):
discard
```
It seems to me that the result should be identical with using the following:
###Option 2
```
filt = select(mapped) using |mr|:
mr = mr.filter(min_identity_pc=95, action={drop})
```
However, if I save `filt` as a SAM file, without further processing, the output turns out to be different. In particular, the result of the first option contains as many records as the original SAM file (and this number of records does not change, if I I use `reverse=True` option.)
This gives me a reason t think that I misunderstand something about what the two snippets of the code do, and I will appreciate your explanation about this.
Regards, Vadim.
mr = mr.filter(min_identity_pc=95, action={unmatch})
mr = mr.filter(min_identity_pc=95, action={drop})
--You received this message because you are subscribed to the Google Groups "NGLess" group.To unsubscribe from this group and stop receiving emails from it, send an email to ngless+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ngless/1d1bb3de-ee31-4d3a-b6dd-c25a48e31203n%40googlegroups.com.