Re: [maker-devel] maker2 scripts for functional annotation

Carson Holt

Sep 25, 2013, 10:35:46 AM
to Xia...@dupont.com, myan...@genetics.utah.edu, Corban-Gre...@dupont.com, maker...@yandell-lab.org
If it is launching predictors, then you have snaphmm or augustus_species
set. You need to blank out all other options in the control files
(including repeat masking options, proteins, ESTs, etc.) when trying to
convert match/match_part to gene/mRNA/exon/CDS, or else those other
programs will run.

--Carson


On 9/25/13 10:31 AM, "Xia...@dupont.com" <Xia...@dupont.com> wrote:

>Hi Carson,
>
>Thank you for the message and your kind help. We tested maker2 by setting
>keep_preds=1 and pred_gff=generated_gff_file_from_first_makerRun. But it
>seemed maker2 started to launch all predictors again, and it took a long
>time to finish. I wonder if there is any way that we can directly get a
>gene/mRNA/exon/CDS gff file without re-running maker2 to convert
>match/match_part features into gene/mRNA/exon/CDS.
>
>Thanks,
>Xia
>
>-----Original Message-----
>From: Carson Holt [mailto:cars...@gmail.com]
>Sent: Thursday, September 19, 2013 5:58 PM
>To: Mark Yandell; CAO, XIA; RIVERA, CORBAN GREGORY;
>maker...@yandell-lab.org
>Subject: Re: [maker-devel] maker2 scripts for functional annotation
>
>Hello Corban & Xia,
>
>Some scripts like gff3_preds2models are deprecated. To get the same
>result as was offered by gff3_preds2models, just give your
>match/match_part features to pred_gff= in the maker_opts.ctl file, set
>keep_preds=1, and run with all other options and predictors turned off.
>The final MAKER result will be your match/match_part features converted
>into gene/mRNA/exon/CDS.
>
>For functional annotation, you can use InterProScan, BLASTP against
>UniProt, or BLAST2GO. My preference is to use InterProScan to add GO
>terms and protein domains via the ipr_update_gff and iprscan2gff3
>scripts. Then add putative gene functions via BLASTP to UniProt and the
>maker_functional_fasta and maker_functional_gff scripts.
>
>Go ahead and take a look at those tools and let me know if you have
>any questions or need help configuring them.
>
>Thanks,
>Carson
>
>
>On 9/19/13 11:53 AM, "Mark Yandell" <myan...@genetics.utah.edu> wrote:
>
>>Hi Corban & Xia,
>>
>>
>>I've forwarded your question along to the MAKER_dev list, where you can
>>get speedy answers to your MAKER-related questions. Thanks for using
>>MAKER.
>>
>>--mark
>>
>>
>>Mark Yandell
>>Professor of Human Genetics
>>H.A. & Edna Benning Presidential Endowed Chair Eccles Institute of
>>Human Genetics University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>ph:801-587-7707
>>
>>________________________________________
>>From: Xia...@dupont.com [Xia...@dupont.com]
>>Sent: Thursday, September 19, 2013 11:49 AM
>>To: Mark Yandell; Corban-Gre...@dupont.com
>>Subject: maker2 scripts for functional annotation
>>
>>Dr. Yandell,
>>
>>We were recently evaluating maker2 for annotation and going through the
>>maker tutorial from 2012.
>>
>>http://gmod.org/wiki/MAKER_Tutorial_2012
>>
>>The tutorial makes references to some scripts that we couldn't find in
>>the current release. We were looking for scripts like
>>gff3_preds2models to convert match/match_part format into annotations
>>with gene/mRNA/exons/CDS and others. I was wondering if maybe we did
>>not have the most up to date version.
>>
>>In addition to getting accurate gene annotations, I was looking for a
>>solution to get functional assignments. I see that there are some
>>scripts like maker_functional_fasta that may help, but I was wondering
>>what you would recommend.
>>
>>Thanks,
>>
>>Corban & Xia
>>
>>This communication is for use by the intended recipient and contains
>>information that may be Privileged, confidential or copyrighted under
>>applicable law. If you are not the intended recipient, you are hereby
>>formally notified that any use, copying or distribution of this e-mail,
>>in whole or in part, is strictly prohibited. Please notify the sender
>>by return e-mail and delete this e-mail from your system. Unless
>>explicitly and conspicuously designated as "E-Contract Intended", this
>>e-mail does not constitute a contract offer, a contract amendment, or
>>an acceptance of a contract offer. This e-mail does not constitute a
>>consent to the use of sender's contact information for direct marketing
>>purposes or for transfers of data to third parties.
>>
>>The dupont.com web address will continue in use for a transitional
>>period for communications sent or received on behalf of DuPont
>>Performance Coatings., which is not affiliated in any way with the
>>DuPont Company.
>>
>>Francais Deutsch Italiano Espanol Portugues Japanese Chinese
>>Korean
>>
>> http://www.DuPont.com/corp/email_disclaimer.html
>>
>>_______________________________________________
>>maker-devel mailing list
>>maker...@box290.bluehost.com
>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>

Carson Holt

Sep 27, 2013, 6:48:29 AM
to mhin...@ebi.ac.uk, maker...@yandell-lab.org
From: Carson Holt <cars...@gmail.com>
Date: Friday, September 27, 2013 6:42 AM
To: <mhin...@ebi.ac.uk>
Subject: Re: [maker-devel] maker2 scripts for functional annotation

If you set keep_preds=1, then unsupported predictions become genes (you don't need ESTs or proteins). If you supply only a single pred_gff input and turn everything else off, then MAKER simply turns match/match_part into gene/mRNA/exon/CDS, and it runs rather quickly (the only processing time is spent verifying reading frames, etc.). If you leave other things on in the control files, then other processes will run, just as in a standard MAKER run.

Thanks,
Carson
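
For anyone following along, the blanking described above can be scripted. The following is only a sketch in Python, not part of MAKER: the helper name convert_only and the list of options it clears are assumptions pieced together from the options named in this thread (snaphmm, augustus_species, keep_preds, pred_gff, est2genome, protein2genome) plus the usual evidence and repeat-masking entries of a maker_opts.ctl. Check the list against the control file your own MAKER version generates before relying on it.

```python
import re

# Options to clear so that only the pred_gff conversion runs.
# ASSUMPTION: this list reflects a typical maker_opts.ctl; adjust it to
# match the control file generated by your MAKER version.
BLANK = ["est", "altest", "est_gff", "protein", "protein_gff",
         "model_org", "rmlib", "repeat_protein", "rm_gff",
         "snaphmm", "gmhmm", "augustus_species"]

# Matches "key=value #optional comment" lines; pure comment lines do not match.
LINE = re.compile(r"^(\w+)=([^#]*)(#.*)?$")


def convert_only(ctl_text, pred_gff):
    """Return control-file text set up for a conversion-only run.

    convert_only is a hypothetical helper name, not a MAKER script.
    """
    wanted = {name: "" for name in BLANK}
    wanted.update({"pred_gff": pred_gff, "keep_preds": "1",
                   "est2genome": "0", "protein2genome": "0"})
    out = []
    for line in ctl_text.splitlines():
        m = LINE.match(line)
        if m and m.group(1) in wanted:
            comment = m.group(3) or ""  # keep the trailing comment, if any
            line = f"{m.group(1)}={wanted[m.group(1)]} {comment}".rstrip()
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    demo = "est=ests.fa #ESTs\nsnaphmm=my.hmm\nkeep_preds=0\npred_gff=\n"
    print(convert_only(demo, "first_run.gff"), end="")
```

Writing the result to a copy of maker_opts.ctl, rather than editing the original in place, keeps the settings of the first run available.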


From: <mhin...@ebi.ac.uk>
Date: Friday, September 27, 2013 4:34 AM
To: Carson Holt <cars...@gmail.com>
Subject: Re: [maker-devel] maker2 scripts for functional annotation

Hi... Xia and Carson

I've been trying to do something similar to get maker gene models derived from CEGMA predictions, and thought it would be nice to use the CEGMA GFF rather than the protein fasta, as that includes exon structure. The CEGMA output is a GFFv2 variant, and I managed to get this into GFFv3 via a combination of Augustus/gff2gbSmallDNA.pl, EMBOSS/seqret and then sed to patch a few tags (the tags came out as EMBL/ databank_entry, mRNA and CDS; not sure whether this is valid for pred_gff or not).

If you run maker with pred_gff=my_file and keep_preds=1, with est2genome and protein2genome switched off, then you get a lot of est2genome and blast activity (I also had pred_stats=1 on one run). You can prevent most of this by removing the EST and protein files from the config :-). However, without EST and protein evidence you get no gene models, so (I guess; I'm new to maker too, Carson please correct me if I'm wrong) if you've already run est2genome and protein2genome then pred_gff could be used to convert your GFF to maker models, if you filter the maker gene models by source.

AFAICS, if you have EST and protein data configured and est2genome and protein2genome switched off, then maker will use these as evidence for your GFF, which means it will have to align them, and that could be mistaken for running those analyses.

Hope this helps and apologies if I'm wrong!
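
The "filter the maker gene models by source" step mentioned above might look something like the sketch below. It only relies on the GFF3 layout (nine tab-separated columns, with the source in column 2). The function name filter_by_source is made up for illustration, and the source string "maker" in the demo is an assumption; look at column 2 of your own output to see which sources are actually present.

```python
def filter_by_source(gff_lines, sources):
    """Yield GFF3 feature lines whose source (column 2) is in `sources`.

    filter_by_source is a hypothetical helper, not a MAKER script.
    """
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[1] in sources:
            yield line


if __name__ == "__main__":
    # Demo data; the source names here are assumptions for illustration.
    demo = [
        "##gff-version 3\n",
        "c1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1\n",
        "c1\test2genome\tmatch\t1\t100\t.\t+\t.\tID=m1\n",
    ]
    for kept in filter_by_source(demo, {"maker"}):
        print(kept, end="")
```

Header directives are dropped by this sketch, so a "##gff-version 3" line would need to be written back if the filtered file is passed to another tool.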

Carson Holt

Sep 27, 2013, 6:48:52 AM
to mhin...@ebi.ac.uk, maker...@yandell-lab.org
So to give a little background to this, the question was how to turn match/match_part into gene/mRNA/exon/CDS like the old gff3_preds2models script. The steps below will basically just turn MAKER into a feature type converter and ignore all its other capabilities. That being said, depending on what your final goal is, you might actually want to run things a different way, but if your only goal is to blindly convert feature types, then those steps will work.

Thanks,
Carson
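
One way to confirm that a conversion-only run did what was intended is to tally the feature types (GFF3 column 3) before and after: the input should be dominated by match/match_part and the output should contain gene/mRNA/exon/CDS. This is a rough sketch that assumes nothing beyond the tab-separated GFF3 layout; feature_type_counts is a hypothetical name, and the demo rows are invented rather than real MAKER output.

```python
from collections import Counter


def feature_type_counts(gff_lines):
    """Count GFF3 feature types (column 3), ignoring comments and FASTA.

    feature_type_counts is a hypothetical helper, not a MAKER script.
    """
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; stop counting
        if line.startswith("#") or not line.strip():
            continue
        cols = line.split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts


if __name__ == "__main__":
    # Invented rows for illustration only.
    demo = [
        "##gff-version 3\n",
        "c1\tsrc\tmatch\t1\t100\t.\t+\t.\tID=m1\n",
        "c1\tsrc\tmatch_part\t1\t50\t.\t+\t.\tID=p1;Parent=m1\n",
        "c1\tsrc\tmatch_part\t60\t100\t.\t+\t.\tID=p2;Parent=m1\n",
    ]
    for ftype, n in sorted(feature_type_counts(demo).items()):
        print(ftype, n)
```

Running it on both the pred_gff input and the final MAKER GFF gives a quick side-by-side check without opening either file.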

Xia...@dupont.com

Oct 1, 2013, 3:00:42 PM
to cars...@gmail.com, maker...@yandell-lab.org, Corban-Gre...@dupont.com
Hi Carson,

Thank you for your message and your kind help. The conversion from match/match_part to gene/mRNA/exon/CDS now works well after blanking out some options in the control files.

I have one more question about maker2. If we don't have EST evidence (a set of ESTs or assembled mRNA-seq in fasta format) for a genome, does maker2 still work? If it does, could you please let me know how maker2 performs without EST evidence compared with a run that includes it?

Thank you again and I look forward to hearing from you.

Best,

Carson Holt

Oct 2, 2013, 10:14:44 AM
to Xia...@dupont.com, maker...@yandell-lab.org, Corban-Gre...@dupont.com
It still works, but performance will be reduced. Without ESTs you won't
get any UTR prediction, and you will be limited by how well the protein
set matches the genes. If you use the protein set of a close relative you
will capture most things. You will probably capture 80-95% of what you
would by also including ESTs. It all depends on how different the
species in the protein evidence file are from the species being
annotated.

--Carson

Xia...@dupont.com

Oct 2, 2013, 10:40:12 AM
to cars...@gmail.com, maker...@yandell-lab.org, Corban-Gre...@dupont.com
Hi Carson,

Thank you for your quick and kind reply. One more question here. If we don't have EST evidence for the strain we sequenced/assembled, is it OK to provide EST evidence from other strains that are close to the one we are trying to annotate? Could you please let us know how this would affect the performance of maker2?

Thank you again.