Hi all,
I am a relatively new user of Maker2, and I’m looking for advice on running many iterations of the same dataset in Maker2.
I have a relatively small genome (~124 MB) from a wasp that is assembled into ~1,500 scaffolds. I have run several iterations of Maker2 by re-generating .hmms in SNAP and feeding them into the next round, and my gene predictions keep increasing (in number and in size). The only thing that changes at each round is the .hmm.
The evidence that I give is:
- de novo assembled ESTs from a different strain of the same species (70,000 contigs… I am currently working on improving this assembly with the hope that this will be helpful here)
- 610 proteins extracted from the genome scaffolds using CEGMA and HaMStR
For my 1st iteration, I used the Nasonia .hmm from SNAP, and the est2genome/protein2genome option.
For the 2nd, 3rd and 4th rounds I have used .hmms generated from the previous round, all without the est2genome/protein2genome option. All other files are the same as in the original run.
As I understand it, after the second round, nothing should change in Maker2. But the differences are obvious between runs. Some entirely new exons are annotated. For example, just counting “exon” in the .gff file gives me 73,000 after the third iteration and 96,000 after the fourth! Actually the biggest leap in this number is between the third and fourth round. I can also see that many features are longer when I look at the files in Geneious.
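For anyone wanting to reproduce the exon count above more strictly, here is a minimal Python sketch. It assumes the Maker2 output is standard tab-separated GFF3 with nine columns, and it counts only rows whose type column is exactly "exon", grouped by the source column. A plain text search for "exon" also matches any line where the string merely appears in another column, so the two totals can differ. The function name and the file name in the usage comment are hypothetical.

```python
from collections import Counter


def count_exons_by_source(gff_lines):
    """Count GFF3 rows whose type (column 3) is exactly 'exon', per source (column 2)."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section; no feature rows after this
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature row
        source, feature_type = cols[1], cols[2]
        if feature_type == "exon":
            counts[source] += 1
    return counts


# Hypothetical usage (file name is an assumption):
# with open("round3.all.gff") as handle:
#     for source, n in sorted(count_exons_by_source(handle).items()):
#         print(source, n, sep="\t")
```

Comparing the per-source totals between rounds shows whether the increase comes from the final gene models or from other feature sources in the same file.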
Is this sort of change possible after the second round of Maker2? Is there something I have done wrong in my runs, or am I understanding this output incorrectly?
Thank you,
Alice
_______________________________________________
maker-devel mailing list
maker...@box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
I posted a while ago about a genome I'm running through the Maker2
pipeline. I was concerned because my results were still changing with
3 and 4 iterations.
Following the very useful advice of Carson (below), I've made a few
modifications (adding a RepeatModeler run, using a big protein
database), but my gene predictions are still changing between the 3rd
and 4th iterations. Perhaps this is ok, but these increasing gene
lengths make me worry that I haven't built stable models.
Here is the short version of what I've done.
1. Run RepeatModeler, but this only produced 47 sequences in the
resulting .fasta... so that seemed a bit small.
2. Run Maker2 using:
- RepeatModeler output + "model_org=all" and "softmask=1" in the
Repeat Masking section.
- protein evidence from 2 distantly related species AND all of Uniprot
- ESTs from a different strain of my species (a parasitoid wasp)
- the .hmm from Nasonia, one of the 2 distantly related species whose
proteome I also provided as protein evidence
- my assembled genome of 1,509 scaffolds.
3. After this, I did three subsequent rounds of Maker2 (cleverly named
Rounds 2, 3 and 4). Each one used the same input, except the Nasonia
.hmm was replaced by a SNAP generated .hmm from the previous round.
Also, est2genome and protein2genome were changed from 1 to 0 in all
runs after the first.
Here are some results:
Round1: 14,647 genes, average length 2,491
Round2: 12,158 genes, average length 3,760
Round3: 13,515 genes, average length 3,090
Round4: 12,169 genes, average length 3,918
This is a bit confusing because the number of genes predicted goes up
and down, as do their lengths. I've double-checked the dates of my
files, and they are all labeled such that I don't think anything could
be swapped.
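As a cross-check on figures like these, here is a minimal Python sketch that recomputes a gene count and mean gene length directly from one round's GFF3 file. It rests on two assumptions that are not stated above: that the final gene models carry "maker" in the source column, and that "length" means the genomic span of each gene row (end - start + 1), not transcript or coding length. The function name is hypothetical.

```python
def gene_summary(gff_lines, source="maker"):
    """Return (gene_count, mean_gene_span_bp) for 'gene' rows from one source."""
    spans = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section; no feature rows after this
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature row
        if cols[1] == source and cols[2] == "gene":
            start, end = int(cols[3]), int(cols[4])
            spans.append(end - start + 1)  # GFF3 coordinates are 1-based, inclusive
    count = len(spans)
    mean_span = sum(spans) / count if count else 0.0
    return count, mean_span
```

Running the same function on each round's output gives numbers computed in an identical way, which rules out differences caused by how each summary was produced.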
So my questions are:
Is this an indication that my models are unstable and I shouldn't
trust these predictions?
Is a decreasing number of genes, with the genes also getting longer,
perhaps a good thing?
How do I know when to stop if genes keep getting longer?
Thanks very much,
Alice
--
Alice Dennis
aliceb...@gmail.com
Postdoctoral Researcher
Institute for Integrative Biology, ETH Zürich & EAWAG
Überlandstrasse 133
P.O. Box 611
8600 Dübendorf, Switzerland
https://adennis5.wordpress.com/