Thresholds for gene catalog mapping

ullo...@googlemail.com

Jan 23, 2022, 2:37:37 PM
to NGLess
Dear all,
So far, I have used the defaults from human-gut-profiler.ngl whenever I performed gene catalog mapping. Now I wonder whether a minimum match size of 45 bp is rather short when the input is 2x150 bp reads.

igc_mapped_post = select(igc_mapped) using |mr|:
    mr = mr.filter(min_match_size=45, min_identity_pc=95, action={drop})
    if not mr.flag({mapped}):
        discard

Has anyone ever performed any benchmarking on this? It would be pretty cool if we could discuss that in this community.

Cheers,
Ulrike

P.S. I am using GMGC as the reference now; I just posted the original code snippet.

Luis Pedro Coelho

Jan 24, 2022, 5:59:41 AM
to NGLess List
I would guess that it is not so much about the length of the input reads as about whether you can identify hits with confidence. The GMGC is so much larger that there is an argument for requiring a longer hit before accepting it.

We didn't specifically benchmark this, though.

HTH,
Luis
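
For anyone who wants to try the benchmarking discussed above, one possible starting point is a simple threshold sweep. The Python sketch below is only an illustration and is not part of NGLess: the `Alignment` record, the `passes` and `sweep` helpers, and the toy values are all hypothetical, and the comparison is assumed to be inclusive (a hit exactly at the threshold is kept). It assumes simulated reads, so that it is known whether each hit points to the true gene of origin.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one read-to-gene hit (not an NGLess type)."""
    match_size: int     # aligned length in bp
    identity_pc: float  # percent identity over the aligned region
    correct: bool       # True if the hit is the gene the read was simulated from


def passes(aln, min_match_size, min_identity_pc):
    # Mirrors the idea of the filter in the thread; inclusive bounds are an assumption.
    return aln.match_size >= min_match_size and aln.identity_pc >= min_identity_pc


def sweep(alignments, sizes, min_identity_pc=95):
    """Return {min_match_size: (number of hits kept, fraction of kept hits that are correct)}."""
    results = {}
    for size in sizes:
        kept = [a for a in alignments if passes(a, size, min_identity_pc)]
        n_correct = sum(a.correct for a in kept)
        precision = n_correct / len(kept) if kept else float("nan")
        results[size] = (len(kept), precision)
    return results


# Made-up toy hits, only to show the mechanics; these are not benchmark results.
toy_alignments = [
    Alignment(150, 99.0, True),
    Alignment(120, 97.5, True),
    Alignment(60, 96.0, True),
    Alignment(50, 95.0, False),
    Alignment(45, 100.0, False),
    Alignment(140, 90.0, False),
]

for size, (n_kept, precision) in sweep(toy_alignments, [45, 60, 90]).items():
    print(f"min_match_size={size}: kept={n_kept}, precision={precision:.2f}")
```

On real data, the same sweep could be run once per reference catalog to see whether the larger catalog needs a higher minimum match size to reach comparable precision.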