Question about unassigned reads (-1) using IGC


quinten...@gmail.com

Nov 23, 2020, 6:42:13 AM
to NGLess
Dear Luis/Renato,

I have a question about the final count matrix and the unassigned reads I get when I profile my reads against the IGC database (KEGG KO annotations).
When I profile the samples, ~95% of reads from most samples align to the IGC (or at least that is what I understand from the output below):
[Sat 21-11-2020 19:26] Line 10: Mapped readset stats (/exports/mm-run/bacteriologie/quinten/Databases/ngless/data/Modules/igc.ngm/1.0/./IGC.fna.gz):
[Sat 21-11-2020 19:26] Line 10: Total reads: 8541473
[Sat 21-11-2020 19:26] Line 10: Total reads aligned: 8309631 [97.29%]
[Sat 21-11-2020 19:26] Line 10: Total reads Unique map: 3131652 [36.66%]
[Sat 21-11-2020 19:26] Line 10: Total reads Non-Unique map: 5177979 [60.62%]

However, when I look at my output matrix (annotated for KEGG KO), about 30-50% of reads per sample end up in the unassigned (-1) category. Is this simply because many of the genes are not annotated with a KEGG KO (so reads mapping to them end up in the unassigned category), or is there another reason that I am missing?

Best,
Quinten

Renato Alves

Nov 23, 2020, 6:55:25 AM
to ngl...@googlegroups.com

Dear Quinten,

> However, when I look at my output matrix (annotated for KEGG KO), about 30-50% of reads per sample end up in the unassigned (-1) category. Is this simply because many of the genes are not annotated with a KEGG KO (so reads mapping to them end up in the unassigned category), or is there another reason that I am missing?

That would be my understanding as well.

Depending on how you are running this analysis, you may want to distinguish between reads that didn't map to the reference and reads that map to genes without an annotation.
To do this, you only need to discard the unmapped reads after map(reference='igc') and before calling count().


Cheers,
Renato

quinten...@gmail.com

Nov 23, 2020, 7:27:59 AM
to NGLess
Dear Renato,

Ok, good to know =). Thanks a lot for the quick answer.
I don't think I specifically need to make this distinction: I'll simply remove all unassigned reads, keep only the annotated counts, convert those into relative abundances, and then do some statistics. But thanks a lot for the tip!
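
A minimal sketch of that post-processing step, in case it is useful to anyone reading along. It assumes the count table is tab-separated, with one feature per row (the KO identifier in the first column, -1 for unassigned) and one sample per column. The function name `relative_abundances` and the toy table are made up for illustration and are not part of NGLess or of the real data.

```python
import csv
import io


def relative_abundances(tsv_text, unassigned="-1"):
    """Drop the unassigned row, then rescale each sample column to sum to 1.

    Assumes a tab-separated table: header row with sample names,
    first column holding the feature identifier.
    """
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    samples = header[1:]

    # Keep only annotated features (everything except the -1 row).
    kept = [(r[0], [float(x) for x in r[1:]]) for r in body if r[0] != unassigned]

    # Per-sample totals over the annotated features only.
    totals = [sum(vals[i] for _, vals in kept) for i in range(len(samples))]

    result = {}
    for feature, vals in kept:
        result[feature] = {
            s: (v / t if t > 0 else 0.0)
            for s, v, t in zip(samples, vals, totals)
        }
    return result


# Toy table (invented numbers, two samples, two KOs plus the -1 row).
example = (
    "\tsampleA\tsampleB\n"
    "-1\t40\t50\n"
    "K00001\t30\t10\n"
    "K00002\t30\t40\n"
)
rel = relative_abundances(example)
```

Note that after dropping the -1 row, the relative abundances describe only the annotated fraction of each sample, so samples with very different unassigned fractions are not directly comparable in absolute terms.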

Best,
Quinten
