to img-use...@lbl.gov
IMG/M-ER contig lineage assignment has been changed to use a more conservative assignment algorithm. With the previous algorithm, false positives were noticed on contigs where very few genes had phylogenetic distribution assignments. Here are the notes taken from the source code ...
############################################################################
# predLin3.py
#
# This is a more conservative algorithm that takes into account the
# total number of genes in a contig, not just the percentage
# of genes with phylogenetic distribution hits on the contig,
# for assigning the lineage.
# It needs to deal with the case that there are few genes in the contig
# that have phylogenetic distribution hits.
#
#  -- Start from most specific lineage specification and go
#     to least specific.
#  -- For each level of specificity, find the lineage
#     that occurs most commonly.  Record its number of phylogenetic
#     distribution hits.
#  -- For contigs having >= 10 or <= 3 genes
#     stop and return the lineage if the percentage of hits
#     for this lineage >= 51%.
#  -- Else if the contig has between 4 and 9 genes
#     stop and return the lineage if there are >= 3 hits
#     for that lineage.  (This is a heuristic hack based on
#     observations about the data.)
#
#    --es 01/09/15
#
############################################################################
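The notes above can be sketched roughly as follows. This is only an illustration of the described rules, not the actual predLin3.py code. The function name `assign_lineage`, its parameters, and the representation of a lineage as a tuple ordered from least to most specific rank are all assumptions. The notes also do not say what the 51% is measured against; this sketch assumes it is the total number of genes on the contig.

```python
from collections import Counter


def assign_lineage(hit_lineages, total_genes, min_fraction=0.51, min_hits=3):
    """Hypothetical sketch of the conservative contig lineage assignment.

    hit_lineages: one tuple per gene with a phylogenetic distribution hit,
                  ordered from least to most specific rank (assumed format),
                  e.g. ("Bacteria", "Proteobacteria", "Gammaproteobacteria").
    total_genes:  total number of genes on the contig, with or without hits.
    Returns the assigned lineage tuple, or None if no level qualifies.
    """
    if total_genes <= 0 or not hit_lineages:
        return None

    max_depth = max(len(lineage) for lineage in hit_lineages)

    # Walk from the most specific level to the least specific one.
    for depth in range(max_depth, 0, -1):
        counts = Counter(
            tuple(lineage[:depth])
            for lineage in hit_lineages
            if len(lineage) >= depth
        )
        if not counts:
            continue

        # Most common lineage at this level and its number of hits.
        lineage, hits = counts.most_common(1)[0]

        if 4 <= total_genes <= 9:
            # Mid-sized contigs: absolute hit count (the heuristic rule).
            if hits >= min_hits:
                return lineage
        elif hits / total_genes >= min_fraction:
            # Contigs with >= 10 or <= 3 genes: fraction of all genes
            # on the contig (assumed denominator).
            return lineage

    return None


gamma = ("Bacteria", "Proteobacteria", "Gammaproteobacteria")
alpha = ("Bacteria", "Proteobacteria", "Alphaproteobacteria")

# 6 genes on the contig, 3 with hits: no class reaches 3 hits,
# so the assignment falls back to the phylum level.
print(assign_lineage([gamma, gamma, alpha], total_genes=6))
```

Under these assumptions, a 20-gene contig with only two hits gets no assignment at any level, which matches the stated goal of avoiding calls based on very few genes.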