[maker-devel] BUSCO


Misner, Ian (NIH/NIAID) [C]

May 12, 2016, 5:32:17 PM
to maker...@yandell-lab.org
Hello,

Are there any guidelines for using BUSCO to help train MAKER? CEGMA has been discontinued, but I used to run the cegma2zff.pl steps to use its proteins for training. BUSCO seems to train Augustus, but I'm not sure which file to pass from BUSCO to MAKER so that the training is properly utilized. I didn't see anything specific about this in the archives.
-----

Ian Misner, Ph.D.

Computational Genomics Specialist

Contractor, Medical Science and Computing, Inc.

Bioinformatics and Computational Biosciences Branch (BCBB)

NIH/NIAID/OD/OSMO/OCICB

5601 Fishers Lane, Room 4A59

Office: 301-761-6208

Mobile: 301-704-0151

Web: BCBB Home Page
Twitter: @NIAIDBioIT




Xabier Vázquez Campos

May 12, 2016, 8:32:12 PM
to Misner, Ian (NIH/NIAID) [C], maker...@yandell-lab.org

_______________________________________________
maker-devel mailing list
maker...@box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org




--
Xabier Vázquez-Campos, PhD
Research Associate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

Panos Ioannidis

May 13, 2016, 3:57:32 AM
to Misner, Ian (NIH/NIAID) [C], maker...@yandell-lab.org, Felipe Simao Neto, Robert Waterhouse
Hello Ian,

Xabier is right. You have to run BUSCO with the --long switch and then, in the maker_opts.ctl file, you should point the augustus_species variable to your trained species (i.e. the name you pass with the -o/-a parameter).

So, in Xabier's example your maker_opts.ctl file should contain the following line:

augustus_species=Genus_species
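
If you prefer to script that edit, here is a minimal Python sketch. It assumes the control file uses the usual `key=value #comment` layout; the helper name `set_ctl_option` and the sample lines are made up for illustration and are not part of MAKER.

```python
import re


def set_ctl_option(ctl_text: str, key: str, value: str) -> str:
    """Return ctl_text with `key=` set to `value`, keeping any trailing #comment.

    Hypothetical helper: it only rewrites text, it does not run MAKER.
    """
    # Match "key=<old value><optional #comment>" on a single line.
    pattern = re.compile(rf"^({re.escape(key)}=)([^#\n]*)(#.*)?$", re.MULTILINE)

    def repl(match: "re.Match[str]") -> str:
        comment = match.group(3) or ""
        sep = " " if comment else ""
        return f"{match.group(1)}{value}{sep}{comment}"

    new_text, count = pattern.subn(repl, ctl_text)
    if count == 0:
        raise KeyError(f"{key} not found in control file text")
    return new_text


if __name__ == "__main__":
    # Illustrative control-file lines, not copied from a real run.
    sample = (
        "genome=assembly.fa #genome sequence\n"
        "augustus_species= #Augustus gene prediction species model\n"
    )
    print(set_ctl_option(sample, "augustus_species", "Genus_species"))
```

In practice you would read maker_opts.ctl, pass its text through the function, and write it back. Use whatever species name your BUSCO run actually created in place of Genus_species.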

Felipe, Rob, is there something else that I'm missing? Truth is that I haven't run this recently and there might be differences in newer BUSCO versions.

Panos


Panos Ioannidis, PhD
Postdoctoral researcher
Computational Evolutionary Genomics Group
University of Geneva

Dolze, Florian

May 13, 2016, 6:20:15 AM
to Panos Ioannidis, Misner, Ian (NIH/NIAID) [C], maker...@yandell-lab.org, Felipe Simao Neto, Robert Waterhouse

On a somewhat related note, is there an advantage to using BUSCO to train Augustus instead of the provided Augustus webtraining service? Does anybody know how the two compare?

Robert Waterhouse

May 13, 2016, 11:10:11 AM
to Panos Ioannidis, maker...@yandell-lab.org, Felipe Simao Neto, Misner, Ian (NIH/NIAID) [C]
I think in the Augustus 'species' directory there should be a new folder named according to your BUSCO run, and in that folder should be the trained parameters for your new species, so from MAKER I guess you can point to these trained parameters.
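
A quick way to check for that folder is sketched below in Python. It assumes Augustus is located through the AUGUSTUS_CONFIG_PATH environment variable and that the new folder's name contains your BUSCO run name. The exact folder name may differ between BUSCO versions, which is why the sketch matches on a substring. Both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_species_dirs(run_name: str, config_path: Optional[str] = None) -> List[Path]:
    """List Augustus species folders whose name contains `run_name`."""
    # Assumption: the Augustus config directory is given explicitly or
    # through the AUGUSTUS_CONFIG_PATH environment variable.
    root = Path(config_path if config_path is not None else os.environ["AUGUSTUS_CONFIG_PATH"])
    species_root = root / "species"
    if not species_root.is_dir():
        return []
    return sorted(d for d in species_root.iterdir() if d.is_dir() and run_name in d.name)


def has_parameters(species_dir: Path) -> bool:
    """True if the folder holds an Augustus <name>_parameters.cfg file."""
    return (species_dir / f"{species_dir.name}_parameters.cfg").is_file()


if __name__ == "__main__":
    # "Genus_species" is a placeholder for your own run name.
    for folder in find_species_dirs("Genus_species"):
        print(folder.name, "parameters found" if has_parameters(folder) else "no parameters file")
```

The folder name it prints is what you would give to augustus_species in maker_opts.ctl.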

Rob


  \\      Dr Robert Waterhouse
O0o--    SIB maître assistant

A maturing understanding of the composition of the insect gene repertoire COIS 2015
BUSCO: assessing genome assembly and annotation completeness Bioinformatics 2015

Robert Waterhouse

May 13, 2016, 11:10:11 AM
to Dolze, Florian, maker...@yandell-lab.org, Felipe Simao Neto, Misner, Ian (NIH/NIAID) [C]
I would guess that the main 'advantage' of using BUSCO to train Augustus is that one will probably run BUSCO on one's genome anyway before starting MAKER, so there will already be a useful set of trained parameters ready to use. I guess the 'advantage' of using the Augustus webtraining service is that one could give it much more starting data (if indeed this is available, e.g. cDNAs). Indeed, if there were enough time and it made a substantial difference, one might even use the BUSCO gene model output as the 'Training gene structure file' for the Augustus webtraining service. I don't believe that anyone has compared how different the trained parameters end up being.

Rob



Fields, Christopher J

May 13, 2016, 12:26:26 PM
to Robert Waterhouse, maker...@yandell-lab.org, Felipe Simao Neto, Misner, Ian (NIH/NIAID) [C]
Our group has mainly used the BUSCO model in the ‘bootstrap’ run for MAKER, then retrained Augustus and SNAP using a filtered data set from that run for new rounds of MAKER.
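
The filtering criteria are not spelled out above, so the following Python sketch shows only one possible filter. It assumes the models come from a MAKER GFF3 file whose mRNA features carry an `_AED` attribute, and it keeps those at or below a cutoff. The 0.25 cutoff and the function name are arbitrary assumptions for illustration, not a recommendation from this thread.

```python
from typing import Iterable, List


def select_low_aed_mrnas(gff_lines: Iterable[str], max_aed: float = 0.25) -> List[str]:
    """Return IDs of mRNA features whose _AED attribute is <= max_aed."""
    keep = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(field.split("=", 1) for field in cols[8].split(";") if "=" in field)
        if "ID" in attrs and "_AED" in attrs and float(attrs["_AED"]) <= max_aed:
            keep.append(attrs["ID"])
    return keep


if __name__ == "__main__":
    # Made-up records for illustration only.
    example = [
        "##gff-version 3\n",
        "ctg1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1;_AED=0.10;_eAED=0.10\n",
        "ctg1\tmaker\tmRNA\t1000\t1900\t.\t-\t.\tID=g2-mRNA-1;Parent=g2;_AED=0.60;_eAED=0.60\n",
    ]
    print(select_low_aed_mrnas(example))
```

The selected IDs would then be used to pull the matching gene models out of the GFF3 file as the training set for the next round.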

Also, one personal observation: we have found some genome assemblies where BUSCO performs poorly compared to CEGMA (e.g. BUSCO reports a low overall percentage of single-copy orthologs (SCOs) present, while CEGMA reports much higher numbers). We’re still delving into this, but in those cases we avoid using the BUSCO model for obvious reasons.

chris