IMG/M-ER contig lineage assignment


Ernest Szeto

Jan 9, 2015, 4:56:18 PM
to img-use...@lbl.gov
IMG/M-ER contig lineage assignment has been changed to use a more conservative assignment algorithm.
The previous algorithm was observed to produce false positives when only a few genes on a contig had phylogenetic distribution assignments.
Here are the notes from the source code:

############################################################################
# predLin3.py
#
# This is a more conservative algorithm that takes into account the
# total number of genes in a contig, not just the percentage of genes
# with phylogenetic distribution hits on the contig, when assigning
# the lineage.
# It needs to handle the case where only a few genes in the contig
# have phylogenetic distribution hits.
#
#   -- Start from the most specific lineage specification and go
#      to the least specific.
#   -- For each level of specificity, find the lineage that
#      occurs most commonly. Record its number of phylogenetic
#      distribution hits.
#   -- For contigs having >= 10 or <= 3 genes,
#      stop and return the lineage if the percentage of hits
#      for this lineage is >= 51%.
#   -- Else, if the contig has between 4 and 9 genes,
#      stop and return the lineage if there are >= 3 hits
#      for that lineage.  (This is a heuristic hack based on
#      observations about the data.)
#
#   --es 01/09/15
#
############################################################################
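The rules in the notes can be sketched in Python roughly as follows. This is a minimal illustration, not the actual predLin3.py code. Several details are assumptions: the input format (one entry per gene, either None for "no hit" or a tuple of ranks ordered from least to most specific), the hypothetical function name predict_contig_lineage, the choice to return None when no level qualifies, and the reading that the 51% is computed over all genes on the contig, including genes without hits.

```python
from collections import Counter
from typing import List, Optional, Sequence

# Hypothetical input format: one entry per gene on the contig. Each entry is
# None (no phylogenetic distribution hit) or a lineage given as ranks ordered
# from least to most specific, e.g. ("Bacteria", "Proteobacteria").
Lineage = Sequence[str]


def predict_contig_lineage(gene_hits: List[Optional[Lineage]],
                           min_fraction: float = 0.51,
                           min_hits_mid_size: int = 3) -> Optional[tuple]:
    """Return the most specific accepted lineage prefix, or None."""
    n_genes = len(gene_hits)
    hits = [tuple(h) for h in gene_hits if h]
    if not hits:
        return None
    max_depth = max(len(h) for h in hits)

    # Walk from the most specific level down to the least specific one.
    for depth in range(max_depth, 0, -1):
        # Truncate each hit to this level; hits shorter than this level
        # do not vote here. Ties go to the lineage seen first.
        counts = Counter(h[:depth] for h in hits if len(h) >= depth)
        lineage, n_hits = counts.most_common(1)[0]

        if n_genes >= 10 or n_genes <= 3:
            # Assumption: the percentage is taken over ALL genes on the
            # contig, not only over the genes that have hits.
            if n_hits / n_genes >= min_fraction:
                return lineage
        elif n_hits >= min_hits_mid_size:  # contig has 4 to 9 genes
            return lineage
    return None  # assumption: no level qualifies, leave unassigned


gamma = ("Bacteria", "Proteobacteria", "Gammaproteobacteria")
alpha = ("Bacteria", "Proteobacteria", "Alphaproteobacteria")

# 12 genes: 4 gamma hits, 3 alpha hits, 5 genes without hits.
print(predict_contig_lineage([gamma] * 4 + [alpha] * 3 + [None] * 5))
# 20 genes, only one of which has a hit.
print(predict_contig_lineage([gamma] + [None] * 19))
```

In the first call, the class level fails (4/12 is about 33%), so the sketch falls back to the phylum level, where 7/12 (about 58%) clears the 51% threshold and ("Bacteria", "Proteobacteria") is returned. The second call shows the kind of case the change targets: a single hit on a 20-gene contig is 5% of its genes, so no lineage is assigned.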
