We are annotating nematode genomes. Although some genes in Nematoda
will evidently have introns of 10 kb or even longer (as we know from
C. elegans), most won't. We are also using EST (RNA-seq) evidence for
our annotation. From looking at the MAKER output, I get the feeling (no
hard numbers yet) that MAKER is predicting a lot of genes with long
introns that might not be supported by ESTs. That is, it seems to say
there is an intron between two distant exons, so they belong to one
gene, while in truth they should be separate coding regions of
different genes. So, should one lower the max intron size? Or is there
another issue here?
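One way to get the hard numbers mentioned above is to measure intron lengths directly from the annotation. The sketch below assumes the gene models are available as GFF3 with `exon` features linked to their transcripts through the `Parent` attribute; the function names `intron_lengths` and `long_intron_models` are hypothetical helpers, not part of MAKER. It takes each intron to be the bases strictly between consecutive exons of one transcript, using GFF3's 1-based inclusive coordinates.

```python
from collections import defaultdict


def intron_lengths(gff3_lines):
    """Return {transcript_id: [intron lengths]} computed from GFF3 exon features.

    Hypothetical helper: assumes exons point to their transcript via Parent=.
    """
    exons = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # only exon rows contribute to intron lengths
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # An exon may be shared by several transcripts (comma-separated Parent).
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((start, end))

    introns = {}
    for tid, coords in exons.items():
        coords.sort()
        # Intron = bases strictly between one exon's end and the next exon's start.
        introns[tid] = [
            next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(coords, coords[1:])
        ]
    return introns


def long_intron_models(introns, threshold):
    """Return sorted transcript IDs having at least one intron longer than threshold."""
    return sorted(tid for tid, lens in introns.items() if any(n > threshold for n in lens))


# Usage with two toy transcripts: one 100 bp intron and one 20 kb intron.
gff = [
    "##gff-version 3",
    "chr1\tmaker\texon\t100\t200\t.\t+\t.\tID=e1;Parent=mRNA1",
    "chr1\tmaker\texon\t301\t400\t.\t+\t.\tID=e2;Parent=mRNA1",
    "chr1\tmaker\texon\t1000\t1100\t.\t+\t.\tID=e3;Parent=mRNA2",
    "chr1\tmaker\texon\t21101\t21200\t.\t+\t.\tID=e4;Parent=mRNA2",
]
print(intron_lengths(gff))                             # {'mRNA1': [100], 'mRNA2': [20000]}
print(long_intron_models(intron_lengths(gff), 10000))  # ['mRNA2']
```

Cross-checking the flagged models against the EST/RNA-seq alignments would then show whether the long introns have any supporting evidence.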
Main question behind this:
Does MAKER know that most introns are small, and only allow really huge
ones when it has hard evidence (protein, EST, or both)? Or will it
stick together bits and pieces of different things?
Regards,
Phil
_______________________________________________
maker-devel mailing list
maker...@box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
Yes, lower the max_intron size.
The max intron length affects the EST and protein alignments produced by MAKER: it will not allow gaps longer than the maximum between individual HSPs. Because these alignments directly influence the final gene models, lowering it should solve the problem in most cases. SNAP, Augustus, and GeneMark, however, do not necessarily respect the maximum and can still call long introns; in that case the fix has more to do with better training of those predictors.
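To illustrate what "not allowing gaps longer than the max between HSPs" means, here is a minimal sketch of the idea, not MAKER's actual implementation. It assumes the HSPs all come from one query aligned to one strand of one contig; the `HSP` class and `split_by_max_intron` function are hypothetical. HSPs are sorted by genomic start, and a new chain is started whenever the gap to the previous HSP exceeds the maximum intron length, so two distant blocks can no longer be joined into one alignment (and hence one gene model).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    """Hypothetical HSP record: 1-based, inclusive genomic coordinates."""
    start: int
    end: int


def split_by_max_intron(hsps, max_intron):
    """Group HSPs into chains, breaking wherever the genomic gap exceeds max_intron.

    Assumes all HSPs belong to the same query, contig, and strand.
    """
    chains = []
    for hsp in sorted(hsps, key=lambda h: h.start):
        # Gap = bases strictly between the previous HSP's end and this HSP's start.
        if chains and hsp.start - chains[-1][-1].end - 1 <= max_intron:
            chains[-1].append(hsp)
        else:
            chains.append([hsp])
    return chains


# Usage: the 20 kb gap is allowed with a 50 kb maximum but split with a 10 kb one.
hsps = [HSP(100, 200), HSP(301, 400), HSP(20401, 20500)]
print(len(split_by_max_intron(hsps, 50000)))  # 1
print(len(split_by_max_intron(hsps, 10000)))  # 2
```

Lowering `max_intron` in this sketch only changes how evidence alignments are chained; as noted above, ab initio predictors may still call long introns on their own.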
For mRNA-seq data, where you preprocess the reads using TopHat or Cufflinks and then provide the results to MAKER, you can set the maximum intron size independently in those programs.