counting in ngless dist1/all1

ullo...@googlemail.com

Apr 18, 2021, 6:40:32 AM

to NGLess

Dear user-group,

I was wondering whether the statement in the docs:

"Generally, for obtaining gene abundances, distribution of multiple mappers is the best (using multiple={dist1}), while for functional annotations, you want to count them all (using multiple={all1}). This implies that the functional annotations will sum to a higher value than the number of reads. This may seem strange at first, but it is the intended behaviour."

implies that mapping the same samples to the same reference should, in theory, result in more hits with all1 than with dist1? If so, I observed different behavior when mapping samples to the iMGMC mouse gene catalog:

```
# Gene abundances: multi-mapping reads are distributed across their hits (dist1)
imgmc_counts_new = count(imgmc_mapped,
                    features=['seqname'],
                    normalization={raw},
                    multiple={dist1})
collect(imgmc_counts_new,
        current=current,
        allneeded=samples,
        ofile=RESULTS</>'imgmc_geneabundance.dist1.raw.txt')

# Same mapping, but every hit of a multi-mapping read is counted in full (all1)
imgmc_counts_new = count(imgmc_mapped,
                    features=['seqname'],
                    normalization={raw},
                    multiple={all1})
collect(imgmc_counts_new,
        current=current,
        allneeded=samples,
        ofile=RESULTS</>'imgmc_geneabundance.all1.raw.txt')
```

```
777014121 Apr 17 23:16 preproc/imgmc_geneabundance.all1.raw.txt
1425878298 Apr 17 23:04 preproc/imgmc_geneabundance.dist1.raw.txt
```

Any idea/comment?

Best,

Ulrike

Luis Pedro Coelho

Apr 18, 2021, 11:22:13 PM

to Ulrike Löber, NGLess List

Yes, that is correct. You get more apparent hits with all1 than with dist1:

If your "sample" is a single read that maps to genes A & B (annotated to functions FA and FB), then all1 gives A=1/B=1 (or FA=1/FB=1), whilst dist1 gives A=.5/B=.5 (or FA=.5/FB=.5).
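
The arithmetic of this one-read example can be reproduced with a toy Python model. It is a sketch, not NGLess's implementation: `count_features` and `gene_to_function` are hypothetical names, and splitting each read equally across its hits is a simplifying assumption that matches the example above (NGLess's dist1 may weight hits differently when some genes also have uniquely mapped reads).

```python
from collections import defaultdict


def count_features(hits_per_read, mode):
    """Toy counter: `hits_per_read` maps each read ID to the features it hits.

    mode="all1":  every hit receives a full count of 1.
    mode="dist1": each read's single count is split equally across its hits
                  (simplifying assumption, matching the one-read example).
    """
    if mode not in ("all1", "dist1"):
        raise ValueError(f"unknown mode: {mode}")
    counts = defaultdict(float)
    for features in hits_per_read.values():
        weight = 1.0 if mode == "all1" else 1.0 / len(features)
        for feature in features:
            counts[feature] += weight
    return dict(counts)


# One read hitting genes A and B, annotated to functions FA and FB.
reads = {"read1": ["A", "B"]}
gene_to_function = {"A": "FA", "B": "FB"}  # hypothetical annotation
reads_by_function = {
    read: [gene_to_function[g] for g in genes] for read, genes in reads.items()
}

print(count_features(reads, "all1"))              # {'A': 1.0, 'B': 1.0}
print(count_features(reads, "dist1"))             # {'A': 0.5, 'B': 0.5}
print(count_features(reads_by_function, "all1"))  # {'FA': 1.0, 'FB': 1.0}
```

With all1 the counts for a single read sum to 2, whereas with dist1 they sum to 1, the number of reads; this is why all1 tables can total more than the number of reads.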

After much discussion (frankly, probably too much, as in practice all these measures are so heavily correlated across samples that it's unlikely to matter much), we concluded that if you are discussing gene counts, then A=.5/B=.5 is more meaningful, but for functional analyses you probably want to use FA=1/FB=1.

Best,

Luis

Luis Pedro Coelho | Fudan University | http://luispedro.org

https://orcid.org/0000-0002-9280-7885

Luis Pedro Coelho

Apr 19, 2021, 1:17:25 AM

to Ulrike Löber, NGLess List

No, that seems strange, indeed.

> 777014121 Apr 17 23:16 preproc/imgmc_geneabundance.all1.raw.txt
> 1425878298 Apr 17 23:04 preproc/imgmc_geneabundance.dist1.raw.txt

Sorry, I'm not 100% sure what those numbers are. They seem too large to be the number of lines or the total sum of the columns.
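
Row counts and per-sample totals compare the two tables more directly than file sizes do. A minimal Python sketch, assuming the `collect` output is a tab-separated table with one header line of sample names and the feature ID in the first column (an assumption about the format); `summarize_table` is a hypothetical helper:

```python
import csv


def summarize_table(path):
    """Return (number of feature rows, {sample: column sum}) for a count table.

    Assumes a tab-separated file whose first line is a header of sample names
    and whose first column holds the feature (gene) ID.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        samples = header[1:]
        sums = [0.0] * len(samples)
        n_rows = 0
        for row in reader:
            n_rows += 1
            # Accumulate each sample's count, skipping the feature ID column.
            for i, value in enumerate(row[1:]):
                sums[i] += float(value)
    return n_rows, dict(zip(samples, sums))
```

For example, call `summarize_table('preproc/imgmc_geneabundance.all1.raw.txt')` and the same on the dist1 file. Following the discussion above, one would expect the all1 totals to be at least as large as the dist1 totals.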

Best
Luis

Ulrike Löber

Apr 19, 2021, 1:29:15 AM

to Luis Pedro Coelho, NGLess List

No, it's the file size. But I would have expected to get the same number of lines, since I'm using the same samples on the same reference and only changing dist1 to all1.

Best,

Ulrike

Sent from Yahoo Mail on Android

On Mon., Apr. 19, 2021 at 7:17, Luis Pedro Coelho
<lu...@luispedro.org> wrote:

Luis Pedro Coelho

Apr 19, 2021, 1:33:07 AM

to Ulrike Löber, NGLess List

Oh, if it's the file size, then it's a bit harder to evaluate. I expect that all1 will have the same number of lines (or more), but dist1 may produce larger files because it takes more characters to write fractional numbers.
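
To illustrate the file-size point, here is a toy Python sketch showing that two tables with the same number of lines can differ several-fold in size when one holds integers and the other fractions. The `table_bytes` helper and the number formatting (Python's default `str` of an int or float) are assumptions; NGLess may write numbers differently.

```python
def table_bytes(rows):
    """Size in bytes of a tab-separated table written from (feature, value) rows.

    Numbers are formatted with Python's default str(); this is an assumption,
    not NGLess's actual writer.
    """
    return sum(len(f"{feature}\t{value}\n".encode()) for feature, value in rows)


genes = ("gene1", "gene2", "gene3")
all1_rows = [(g, 1) for g in genes]       # one read counted fully at each hit
dist1_rows = [(g, 1 / 3) for g in genes]  # the same read split three ways

print(len(all1_rows), len(dist1_rows))                 # 3 3
print(table_bytes(all1_rows), table_bytes(dist1_rows))  # 24 75
```

Both tables have three lines, but "1" takes one character while 1/3 is written as "0.3333333333333333", so the dist1 table is about three times larger.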

Best

Luis

Ulrike Löber

Apr 22, 2021, 12:00:31 AM

to Luis Pedro Coelho, NGLess List

But in my case, it's the other way around. The dist1 file is much bigger than all1. Any explanation for that?

Best,

Ulrike

Sent from Yahoo Mail on Android

On Mon., Apr. 19, 2021 at 5:22, Luis Pedro Coelho
<lu...@luispedro.org> wrote: