Hello Maker community,
I have, at last, finished annotating my genome with Maker (!) and have a few questions on the final output.
1. I used gff3_merge and fasta_merge to merge all the GFF3 and FASTA files produced during the runs (I split my assembly into smaller chunks that ran in parallel). Are these two scripts the only ones I have to run after Maker has finished? Am I leaving anything important behind?
2. I noticed that all my transcripts (in both the FASTA files and the GFF) are named "XXX-mRNA-1". Since I can't find any containing "mRNA-2", does that mean there are no splice variants of any gene?
3. In my *maker.proteins.fasta file, some proteins have names like snap_masked-XXX, whereas others (apparently also predicted by SNAP) have names like maker-XXX-snap-gene-XXX. What is the difference between these two kinds of genes, both predicted by SNAP? From reading other posts on this list, I had the impression that all genes predicted by SNAP/Augustus that lie in a masked region (as the first name implies) are put in a separate FASTA file, named *maker.snap_masked.proteins.fasta.
4. After looking at a few genes in the *maker.transcripts.fasta file, I concluded that only complete genes (i.e. with both a start and a stop codon) are reported in this file. Am I right?
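For question 2, one way to look for splice variants directly is to count how many mRNA features point to each gene in the merged GFF3, rather than relying on the "-mRNA-N" suffix alone. This is a minimal Python sketch, assuming standard tab-separated GFF3 columns, transcripts typed as `mRNA`, and a `Parent` attribute linking each mRNA to its gene. The function names and the toy records (`geneA`, `ctg1`, ...) are hypothetical.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=a;Parent=b' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def transcripts_per_gene(gff_lines):
    """Map each gene ID to the list of mRNA IDs whose Parent points to it."""
    genes = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                genes[parent].append(attrs.get("ID", ""))
    return dict(genes)


def genes_with_isoforms(gff_lines):
    """Return only the genes that have more than one mRNA."""
    return {g: ids for g, ids in transcripts_per_gene(gff_lines).items() if len(ids) > 1}


# Hypothetical toy records: geneA has two mRNAs, geneB has one.
example = [
    "##gff-version 3",
    "ctg1\tmaker\tgene\t1\t900\t.\t+\t.\tID=geneA",
    "ctg1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=geneA-mRNA-1;Parent=geneA",
    "ctg1\tmaker\tmRNA\t1\t700\t.\t+\t.\tID=geneA-mRNA-2;Parent=geneA",
    "ctg1\tmaker\tgene\t2000\t2900\t.\t-\t.\tID=geneB",
    "ctg1\tmaker\tmRNA\t2000\t2900\t.\t-\t.\tID=geneB-mRNA-1;Parent=geneB",
]
print(genes_with_isoforms(example))
```

On a real file, `genes_with_isoforms(open(path))`, where `path` stands for the gff3_merge output, returning an empty dict would be consistent with every gene having a single transcript.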
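For question 3, it may help to count how many proteins fall under each naming pattern before comparing the two protein files. This Python sketch classifies FASTA headers by the two name shapes quoted in the question (maker-XXX-snap-gene-XXX and snap_masked-XXX). The regular expressions are assumptions inferred from those names, not taken from MAKER documentation, and the example IDs are made up.

```python
import re
from collections import Counter

# Assumed patterns, inferred from the names seen in the file:
#   maker-<contig>-<predictor>-gene-...   (e.g. maker-XXX-snap-gene-XXX)
#   <predictor>_masked-...                (e.g. snap_masked-XXX)
MAKER_RE = re.compile(r"^maker-.*-([^-]+)-gene-")
MASKED_RE = re.compile(r"^([^-]+_masked)-")


def header_category(header):
    """Classify a FASTA header line by the naming pattern of its first word."""
    name = header.lstrip(">").split()[0]
    m = MAKER_RE.match(name)
    if m:
        return "maker/" + m.group(1)
    m = MASKED_RE.match(name)
    if m:
        return m.group(1)
    return "other"


def tally_headers(lines):
    """Count FASTA records per naming category (header lines only)."""
    return Counter(header_category(line) for line in lines if line.startswith(">"))


# Hypothetical records, one of each shape.
example = [
    ">maker-ctg1-snap-gene-0.1-mRNA-1",
    "MKT",
    ">snap_masked-ctg1-processed-gene-0.2-mRNA-1",
    "MAL",
    ">maker-ctg2-augustus-gene-1.0-mRNA-1",
    "MSS",
]
print(tally_headers(example))
```

Running `tally_headers(open(path))` on both *maker.proteins.fasta and *maker.snap_masked.proteins.fasta would show whether snap_masked-* names appear in one file, the other, or both.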
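For question 4, the conclusion could be tested on many sequences at once with a small completeness check: does each sequence start with ATG, end with a stop codon, and contain no in-frame stop before that? This is a sketch assuming the standard genetic code and coding sequences only. MAKER transcripts may carry UTRs, in which case the first and last codons of the transcript are not the start and stop codons, and the CDS would have to be extracted first. Function names and example records are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def read_fasta(lines):
    """Yield (name, sequence) pairs; name is the first word of the header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def is_complete_cds(seq):
    """True if seq is whole codons, starts with ATG, ends with a stop codon,
    and has no in-frame stop codon before the last one."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (
        codons[0] == "ATG"
        and codons[-1] in STOP_CODONS
        and not any(c in STOP_CODONS for c in codons[:-1])
    )


def incomplete_records(lines):
    """Names of records that fail the completeness check."""
    return [name for name, seq in read_fasta(lines) if not is_complete_cds(seq)]


# Hypothetical records: only cds1 is complete.
example = [">cds1", "ATGAAA", "TAA", ">cds2", "ATGAAA", ">cds3", "CCCATGTAA"]
print(incomplete_records(example))
```

An empty list from `incomplete_records(open(path))` on extracted coding sequences would support the observation; alternative start codons are deliberately not accepted here.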