[maker-devel] Adapting MAKER pipeline for my HPC

Steven L. Miller

Oct 5, 2021, 9:17:34 PM

to maker...@yandell-lab.org

I'm sorry to say this is probably not a new issue. I saw that MAKER is described as an easy-to-use genome annotation pipeline designed to be usable by small research groups with little bioinformatics experience. That statement describes me to a T. I am new to whole genome sequencing and the associated analysis, and I have a limited skill set with computers. I have a 1.52 GB, 75k-unitig assembly of a non-photosynthetic vascular plant, for which I would like to predict functional genes. I have tried to follow the tutorial that you posted for Winter School 2018 to see whether I can duplicate your installation and implementation on my HPC. I doubt that you will be impressed, but I recently received a “certificate” for completing our HPC Linux course path at the University of Wyoming. I realize now how much I don’t know and how much work it will take to complete an annotation, but I need to generate at least some preliminary data to include in an upcoming grant submission (~2 months).

I read your MAKER wiki page, but much of it seems unfinished, with many important parts (important to me, in any case) still under construction.

I am writing to get any help I can to find a way forward in my project. Any suggestions and any help you can provide would be gratefully accepted!

Sincerely,
Steve Miller
Botany
University of Wyoming

Jason Stajich

Oct 6, 2021, 12:34:30 PM

to Steven L. Miller, maker...@yandell-lab.org

Steven -

You might start with the Galaxy install and tutorial, or the CyVerse one.

https://galaxyproject.github.io/training-material/topics/genome-annotation/tutorials/annotation-with-maker/tutorial.html

https://learning.cyverse.org/projects/sciapps_guide/en/latest/annotation.html

http://gmod.org/wiki/MAKER_Tutorial

I'm not sure whether the space needed will be too much for your large genome. To get better help, you might specify the problems you are having with running or installing.

I'll also point out a parallel genome annotation tool we have built called funannotate, which does the prediction training automatically from BUSCO or RNA-seq datasets.

https://funannotate.readthedocs.io/en/latest/

Installation with conda: https://funannotate.readthedocs.io/en/latest/install.html

There is also a Docker image:

https://hub.docker.com/r/nextgenusfs/funannotate

https://github.com/reslp/funannotate-docker

Jason Stajich

_______________________________________________
maker-devel mailing list
maker...@yandell-lab.org
http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

Mark Yandell

Oct 6, 2021, 1:12:40 PM

to Jason Stajich, Steven L. Miller, maker...@yandell-lab.org

Thanks, Jason!

Jason Stajich

Oct 6, 2021, 8:54:17 PM

to Steven L. Miller, maker...@yandell-lab.org

It’s just a genome assembly as input. Nothing more.

On Wed, Oct 6, 2021 at 1:28 PM Steven L. Miller <Fu...@uwyo.edu> wrote:

Yes, thank you, Jason, for responding! I have tried the Galaxy approach and, as you predicted, my dataset is too large to run an annotation locally on my laptop: about 157x too large by my calculations. I would be happy to try your funannotate tool, but my dataset is in FASTA. Would I have to run BUSCO to get my FASTA data into a different format? Because of the size of my WGS, would that likely have to be done on the HPC?

Although I do not have all the experience necessary to pull this off quickly, I am thinking about my experimental design and the steps that I will have to accomplish.

My experimental design is this:

I have a non-photosynthetic plant. I want to eventually find out how this plant functions differently from its fully photosynthetic cousin, and whether all related non-photosynthetic plants function in the same way. There are two annotated genomes of closely related plants in NCBI: (1) another fairly closely related non-photosynthetic plant and (2) a fairly closely related fully photosynthetic plant. I do not have a clue as to how good either of these annotations is.

For the annotation portion of my WGS, I can understand using the fairly closely related fully photosynthetic plant to train my annotation gene prediction, assuming that all functional genes required for photosynthesis and metabolism are present in this plant. Once my WGS is annotated, I can then move on to compare the functional genes in both non-photosynthetic plants.

So, these are the steps I must figure out to run this analysis on my HPC:

1. Install the necessary MAKER-related software. MAKER version 2.31.10 is installed on my HPC, but I don’t know whether all the associated tools (e.g. Augustus) are also installed.

2. Upload my assembled WGS in FASTA format (from my limited knowledge, I would need to use Globus FTP?).

3. Upload the annotated WGS of the fully photosynthetic plant from NCBI (again in FASTA, using Globus FTP?).

4. Upload the annotated WGS of the other non-photosynthetic plant (again in FASTA, using Globus FTP?).

5. Run MAKER, using one or the other fully annotated WGS to train MAKER (or Augustus) to predict the genes from my non-photosynthetic plant.

6. Use the ugly MAKER output data in GFF3, along with reports from InterProScan, a BLAST report of homology, and the maker_map_ids script, to make pretty graphics.
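For step 1, one quick way to see which of the associated tools are already available is to check whether their executables are on your PATH after loading the MAKER module. The following is only a minimal sketch: the executable names in TOOLS are assumptions about a typical MAKER setup and may be named or packaged differently on a given cluster.

```python
import shutil

# Assumed executable names for tools MAKER commonly calls; adjust the list
# to match how your cluster names or packages them.
TOOLS = ["maker", "augustus", "snap", "RepeatMasker", "blastn", "exonerate"]


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(TOOLS).items():
        print(f"{name:<14}{path or 'NOT FOUND'}")
```

Anything reported as NOT FOUND may simply live in a separate environment module, so it is worth asking the HPC support staff before trying to install it yourself.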

Thanks again for any help you can provide. I wish there was an upcoming workshop I could attend to get into this process in a hurry!

--

Jason Stajich
jason....@gmail.com

Steven L. Miller

Oct 11, 2021, 1:50:54 PM

to Mark Yandell, jason....@gmail.com, maker...@yandell-lab.org

Carson Holt

Oct 20, 2021, 1:41:24 AM

to Steven L. Miller, maker...@yandell-lab.org

Funannotate takes FASTA as input. But if you still choose the MAKER route, BUSCO can train Augustus for you (BUSCO also runs much faster than MAKER). Once you have trained Augustus, you could use MAKER to annotate just a few contigs rather than the whole genome. That might give you something to work with, since you only need some preliminary data.
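To pull out just a few contigs for a trial run, something like the sketch below could write a small FASTA subset that you then point MAKER at. Choosing the longest contigs is an assumption made here for illustration, not part of the suggestion above; the function names and the example file names are hypothetical.

```python
import heapq
import sys


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file, one record at a time."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_contigs(handle, n):
    """Return the n longest (header, sequence) records, longest first."""
    return heapq.nlargest(n, read_fasta(handle), key=lambda rec: len(rec[1]))


def write_fasta(records, handle, width=60):
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Hypothetical usage: python subset_contigs.py assembly.fasta 10 > subset.fasta
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fh:
        write_fasta(longest_contigs(fh, int(sys.argv[2])), sys.stdout)
```

Because records are read one at a time and only the current top n are kept, this should stay within modest memory even for a large assembly.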

—Carson
