Re: [maker-devel] How do I extract matrix from annotation

Carson Holt

Oct 20, 2021, 2:07:42 AM
to Emmanuel Nnadi, maker...@yandell-lab.org
This is an older email that looks like it never got an answer. Briefly, you need to use Linux command-line tools to evaluate the GFF3 file.

Examples:

#count protein coding transcripts
cat annotations.gff | grep -c -P "\tmRNA\t"

#count tRNA transcripts
cat annotations.gff | grep -c -P "\ttRNA\t"

#count snoRNA transcripts
cat annotations.gff | grep -c -P "\tsnoRNA\t"
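
The three counts above can also be produced in a single pass. This is only a minimal sketch, assuming a tab-delimited GFF3 file with the feature type in column 3; the function name count_feature_types is made up for illustration and is not part of MAKER. It uses awk rather than grep -P, since -P is a GNU grep option and may not be available everywhere.

```shell
# Tally every feature type (column 3) of a GFF3 file in one pass.
# count_feature_types is a hypothetical helper name, not a MAKER tool.
# Usage: count_feature_types annotations.gff
count_feature_types() {
    # Skip comment/directive lines and any line without the 9 GFF3 columns
    # (this also skips sequence lines in a trailing FASTA section).
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
                END { for (t in n) printf "%s\t%d\n", t, n[t] }' "$1" | sort
}
```

Each output line is a feature type and its count, separated by a tab, so the mRNA, tRNA, and snoRNA rows correspond to the three grep commands.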


You can also pull out the Parent attribute of each transcript and count the unique entries, so that each gene is counted once regardless of how many transcripts it has. Example:

#count protein coding genes
cat annotations.gff | grep -P "\tmRNA\t" | perl -ne 'print "$1\n" if /Parent=([^;\n]+)/' | sort -u | wc -l
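
The gene count above can be generalized to other transcript types. A rough sketch under a few assumptions: the attributes in column 9 are separated by semicolons, each transcript lists a single Parent, and the helper name count_parent_genes is hypothetical rather than anything shipped with MAKER.

```shell
# Count distinct Parent IDs among features of one type (default: mRNA).
# count_parent_genes is a hypothetical helper name, not a MAKER tool.
# Usage: count_parent_genes annotations.gff [feature_type]
count_parent_genes() {
    awk -F'\t' -v type="${2:-mRNA}" '
        !/^#/ && NF >= 9 && $3 == type {
            # Column 9 holds semicolon-separated key=value attributes.
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^Parent=/) seen[substr(attrs[i], 8)] = 1
        }
        END { c = 0; for (p in seen) c++; print c }' "$1"
}
```

For example, count_parent_genes annotations.gff tRNA would count the genes that have at least one tRNA transcript, assuming tRNA features carry a Parent attribute in your file.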


You can also try tools like SOBA from the Sequence Ontology, which gives statistics on GFF3 features: http://www.sequenceontology.org/cgi-bin/soba.cgi

—Carson



On Jul 27, 2021, at 9:15 AM, Emmanuel Nnadi <een...@gmail.com> wrote:

Hello Carson, Greetings from Nigeria.
Please, how can I extract these metrics from my annotations?

- Number of protein-coding genes in the assembled tea plant genome
- Those with known proteins and/or domains
- Annotation of noncoding RNA genes
- Ribosomal RNA genes
- Number of transfer RNA genes
- Number of transcription factor genes
- simple sequence

Thanks

Nnaemeka Emmanuel Nnadi,Ph.D
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.


