[maker-devel] mapping cDNA to updated genome


Prashant S Hosmani

Aug 31, 2016, 10:36:02 AM
to maker...@yandell-lab.org
Hi All,

I am working on updating a plant genome annotation and would like to map the genes from the previous annotation onto a new genome build. There is a protocol for this in Campbell et al. 2014, Current Protocols in Bioinformatics (Basic Protocol 4, "Mapping annotations to a new assembly"). I followed that protocol exactly, setting est_forward=1, but the output contains far more genes than the input: my input cDNA FASTA contains ~35K genes, and after mapping there are ~58K genes.

I’m using MAKER version 3.0. There are few changes in the genome, so I’m not expecting many changes in the mapping of the previous genes.

Please let me know if there are any other parameters that control the mapping of ESTs. I was hoping to get a similar number of genes mapped onto the new assembly, with very few changes.

Thank you for your help in advance.
Prashant


Prashant Hosmani
Sol Genomics Network
Boyce Thompson Institute, Ithaca, NY, USA



Michael Campbell

Aug 31, 2016, 12:10:41 PM
to Prashant S Hosmani, maker...@yandell-lab.org
Hi Prashant,

I’m almost positive that the additional genes are coming from cDNAs that align to more than one location. Did you repeat-mask your genome before mapping things forward?
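One way to check this diagnosis is to count how many mRNA features in the output GFF3 trace back to each input cDNA. The sketch below is only an illustration under stated assumptions: it assumes that every mapped copy of a cDNA carries the same `Name` attribute on its mRNA feature (MAKER may decorate names differently in practice, so the key may need adjusting), and the function names are hypothetical helpers, not part of MAKER.

```python
import collections


def parse_attributes(field):
    """Turn a GFF3 column-9 string such as 'ID=x;Name=y' into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key] = value
    return attrs


def count_mrna_copies(gff_lines, name_key="Name"):
    """Count mRNA features per cDNA name.

    Assumption: all copies of one cDNA share the same value for `name_key`.
    """
    counts = collections.Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the embedded sequence section follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        name = parse_attributes(cols[8]).get(name_key)
        if name is not None:
            counts[name] += 1
    return counts


def multi_mapped(counts):
    """Return only the cDNAs that produced more than one mRNA feature."""
    return {name: n for name, n in counts.items() if n > 1}
```

Calling `multi_mapped(count_mrna_copies(open("mapped.gff")))` on the merged output (the file name here is a placeholder) would list the cDNAs that account for the extra gene models.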

Another thought: what kind of whole-genome duplications has your plant been through? It may be that the multiple alignments are to pseudogenes in some stage of decay. If that is the case, it would probably be safe to keep the gene from the longest/best-aligned cDNA.

Thanks,
Mike

Carson Holt

Aug 31, 2016, 12:13:11 PM
to Michael Campbell, maker...@yandell-lab.org
Also, if you have multiple alignments of the same cDNA, you can use the score column of the mRNA feature to see which aligns best. If they have the same score, you will have to disambiguate manually or just remove all copies.
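The score-based selection described here could be scripted roughly as follows. This is a sketch, not a MAKER utility: it assumes that a higher value in GFF3 column 6 means a better alignment, that copies of the same cDNA share a `Name` attribute on their mRNA features, and that each mRNA has an `ID` attribute. The function names are hypothetical.

```python
def parse_attributes(field):
    """Turn a GFF3 column-9 string such as 'ID=x;Name=y' into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key] = value
    return attrs


def best_by_score(gff_lines, name_key="Name"):
    """Pick the top-scoring mRNA per cDNA and report ties separately.

    Returns (best, tied):
      best maps cDNA name -> ID of the single highest-scoring mRNA
      tied maps cDNA name -> list of mRNA IDs that share the highest score
    Assumption: a higher column-6 score means a better alignment.
    """
    hits = {}
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        try:
            score = float(cols[5])
        except ValueError:
            continue  # '.' or other non-numeric score: cannot be ranked
        attrs = parse_attributes(cols[8])
        name, mrna_id = attrs.get(name_key), attrs.get("ID")
        if name is None or mrna_id is None:
            continue
        hits.setdefault(name, []).append((score, mrna_id))

    best, tied = {}, {}
    for name, scored in hits.items():
        top = max(score for score, _ in scored)
        winners = [mrna_id for score, mrna_id in scored if score == top]
        if len(winners) == 1:
            best[name] = winners[0]
        else:
            tied[name] = winners  # resolve manually or drop all copies
    return best, tied
```

The `tied` dictionary holds the cases that need manual disambiguation or removal; the IDs in `best` can then be used to filter the gene models that are kept.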

—Carson

Prashant S Hosmani

Sep 20, 2016, 4:34:03 PM
to Carson Holt, maker...@yandell-lab.org
Hi Mike and Carson,

Thank you for your help. I used a masked genome for aligning the cDNAs. And yes, this was due to cDNAs aligning in multiple places. I guess you could also filter the genes based on the alignment score from the GFF. I used GMAP (http://research-pub.gene.com/gmap/) to align the cDNAs to the updated genome. GMAP has parameters to filter based on alignment scores and can also choose the best path per cDNA.

Regards,
Prashant


Prashant Hosmani
Sol Genomics Network
Boyce Thompson Institute, Ithaca, NY, USA


