[maker-devel] Apply old maker gff to new assembly


Boyher, Adam

Nov 15, 2019, 4:26:32 PM
to maker...@yandell-lab.org
Hi,
I'm a bit unsure of how to apply a previous MAKER annotation to a new
assembly. I just want a quick annotation: take the genes that are
already in the annotation file, with their functional annotations and
naming scheme, and find those genes in the new assembly. Which
settings should I use in maker_opts.ctl?

Thanks
Adam

Carson Holt

Dec 2, 2019, 1:42:50 PM
to Boyher, Adam, maker...@yandell-lab.org
Here is an archived post that covers this topic: https://groups.google.com/forum/#!searchin/maker-devel/est_forward%7Csort:date/maker-devel/gDbvTknuep4/gDiGCMgjCQAJ

This will help map old structural models to new coordinates. Moving functional data may take some additional work. You might even want to rerun domain finders like InterProScan rather than just copying old domain coordinates.

—Carson

Boyher, Adam

Dec 2, 2019, 2:50:54 PM
to cars...@gmail.com, maker...@yandell-lab.org
Thank you Carson. That helps.

Can I ask another question in this thread?
I have a phased genome (i.e. haplotype-specific chromosomes of a
diploid organism). I'm debating the best way to annotate it. Does it
make sense to annotate both phases together, or separately? I've
previously annotated them separately, and found that some genes that
exist on both phases are only annotated on one. I thought that by
annotating them together, the second/third iterations using Augustus
and SNAP would help annotate those missing genes. Another question I
have is about creating a "global" gene naming scheme. Meaning, I want
genes that exist on both phases to be named the same in both
annotations, global_0001 for instance, and genes that only exist on
one phase to be named a different way, phase0_0001. Do you have ideas
on the best way to do that?

Thanks for your help and time!
Adam

Carson Holt

Dec 2, 2019, 3:03:47 PM
to Boyher, Adam, maker...@yandell-lab.org
Annotating them separately or together will probably give nearly identical results. In GFF3 format, ID= has to be unique but Name= does not, so you can give genes the same name by setting Name= in the GFF3 file. I think map_gff_ids can do that for you if you provide a two-column, tab-delimited text file (old-name new-name). Using the same name in the FASTA files will cause confusion, though (unless the transcripts have identical sequence), so while the genes can share a name, keep the mRNA names different with a suffix (e.g. something like -RA, -RB, etc.).
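
To make the ID= versus Name= distinction concrete, here is a minimal Python sketch that rewrites only the Name= attribute of gene features from a two-column, tab-delimited map. It is not map_gff_ids and makes no claim about how that script behaves; the function names, the example IDs, and the choice to key the map on the gene's ID= value are all assumptions made for illustration.

```python
def load_name_map(lines):
    """Read 'old-name<TAB>new-name' pairs into a dict (hypothetical helper)."""
    name_map = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        old, new = line.split("\t")
        name_map[old] = new
    return name_map


def set_gene_names(gff_lines, name_map):
    """Return GFF3 lines with Name= replaced on gene features found in name_map.

    ID= is left untouched, so it stays unique; only Name= is shared.
    Assumption: the map is keyed on the gene's ID= value.
    """
    out = []
    for line in gff_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        # Pass through comments, directives, and anything that is not a gene row.
        if line.startswith("#") or len(cols) != 9 or cols[2] != "gene":
            out.append(line)
            continue
        # Column 9 is 'key=value;key=value'; empty fields (e.g. from a
        # trailing ';') are dropped.
        attrs = {}
        for field in cols[8].split(";"):
            if field:
                key, _, value = field.partition("=")
                attrs[key] = value
        if attrs.get("ID") in name_map:
            attrs["Name"] = name_map[attrs["ID"]]
        cols[8] = ";".join(f"{k}={v}" for k, v in attrs.items())
        out.append("\t".join(cols))
    return out


if __name__ == "__main__":
    # Made-up example rows, not real MAKER output.
    gff = [
        "##gff-version 3",
        "chr1\tmaker\tgene\t1\t100\t.\t+\t.\tID=p0_g1;Name=p0_g1",
        "chr1\tmaker\tmRNA\t1\t100\t.\t+\t.\tID=p0_g1-RA;Parent=p0_g1;Name=p0_g1-RA",
    ]
    mapping = load_name_map(["p0_g1\tglobal_0001\n"])
    print("\n".join(set_gene_names(gff, mapping)))
```

Only the gene row changes in this sketch; the mRNA row keeps its own name and suffix, which matches the advice about keeping transcript names distinct.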

As for identifying which gene goes with which, reciprocal best BLAST hits might be the best starting point (i.e. each model is the other's best-scoring BLAST hit, in both directions). Just beware of historical genomic duplications, which might make this difficult.
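
The reciprocal-best-hit idea can be sketched in a few lines of Python. This assumes each phase has already been searched against the other with BLAST and the hits saved in the standard 12-column tabular format (query ID in column 1, subject ID in column 2, bit score in column 12); the function names are hypothetical, and ranking by bit score alone is a simplification.

```python
def best_hits(blast_rows):
    """Map each query to its single highest-bit-score subject.

    Assumes 12-column tabular BLAST output; ties keep the first hit seen.
    """
    best = {}
    for row in blast_rows:
        cols = row.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # skip blank or malformed lines
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b_rows, b_vs_a_rows):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b_rows)
    reverse = best_hits(b_vs_a_rows)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

Models that appear in a returned pair are candidates for a shared name (global_0001 style), and models left unpaired are candidates for a phase-specific name. Recently duplicated genes can have near-equal scores against several partners, so pairs in duplicated regions deserve a manual check.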

—Carson
