Gene prediction from a genome

sfu....@gmail.com

May 4, 2017, 3:32:29 PM

to sfu-omics

Hi Everyone,

This is a discussion question and I welcome all feedback. Does anyone have a preferred method to predict a gene from a genome section?

A genome was just released for one of the species I'm working on, and I'm trying to predict what one of my genes of interest will look like. The gene I'm interested in is relatively long (~10,000) and repetitive.

I have the genome (1,700 scaffolds) and RNA-seq libraries for this species. My gene of interest seems to be contained entirely within one of the scaffolds (lucky me).

I just want to try different approaches to see if I can improve my current annotation of the gene. In addition to assembling the gene with a de novo approach, I have also assembled it with a genome-guided approach. Both approaches had a hard time assembling the section with the multiple tandem repeats, so I was thinking about trying to predict the gene from the genome based on other characteristics, such as splice sites.
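To see where the tandem repeats sit on the scaffold before comparing assemblies, a rough scan like the one below can help. It is only a sketch: it finds perfect, back-to-back copies of a unit using a regular-expression backreference, so diverged or interrupted repeats will be missed. The function name and the default thresholds (`min_unit`, `max_unit`, `min_copies`) are made up for illustration.

```python
import re

def find_tandem_repeats(seq, min_unit=3, max_unit=30, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    start is the 0-based position of the first copy. Only exact,
    uninterrupted copies are reported.
    """
    seq = seq.upper()
    # ([ACGT]{a,b}?) captures the shortest candidate unit;
    # \1{n,} requires at least n further identical copies right after it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), unit, copies))
    return hits

# Toy sequence: four copies of ACGTTA flanked by unrelated bases.
toy = "GGTC" + "ACGTTA" * 4 + "CCAT"
print(find_tandem_repeats(toy))
```

Running this on the scaffold region and on each assembled transcript would show whether the assemblies disagree on the number of repeat copies.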

Any suggestions for gene prediction from a genome would be appreciated.

Best,
Vanessa

michell...@gmail.com

May 5, 2017, 7:47:40 PM

to sfu-omics, sfu....@gmail.com

Hi Vanessa,

In my lab, I've been told to BLAST the sequences against the most closely related species.
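As a rough illustration of working with such a search, the sketch below assumes the search was run with BLAST+ and tabular output (`-outfmt 6`, default 12 columns). It collapses the hits on each subject sequence into a single span so the matching region is easy to spot. The function name, the e-value cutoff, and the example rows are all hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def hit_span_per_subject(handle, max_evalue=1e-5):
    """Return {sseqid: (lowest, highest)} subject coordinates over all
    hits whose e-value is at or below max_evalue."""
    spans = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        # sstart > send on minus-strand hits, so sort the pair.
        lo, hi = sorted((int(hit["sstart"]), int(hit["send"])))
        old = spans.get(hit["sseqid"])
        spans[hit["sseqid"]] = (lo, hi) if old is None else (min(old[0], lo), max(old[1], hi))
    return spans

# Made-up rows, for illustration only.
example = io.StringIO(
    "geneX\tscaffold_12\t91.2\t300\t20\t2\t1\t300\t5000\t5299\t1e-80\t400\n"
    "geneX\tscaffold_12\t88.0\t200\t24\t0\t301\t500\t9000\t8801\t2e-40\t250\n"
    "geneX\tscaffold_7\t75.0\t51\t12\t1\t10\t60\t100\t150\t0.5\t30\n"
)
print(hit_span_per_subject(example))
```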

In addition, I supplement those results with output from Blast2GO. It breaks certain regions of your gene into conserved areas and provides you with a GO term/InterPro term/KEGG term. That will help narrow down your gene's function.

This may not be the best approach; it is just what I've done in the past.

Best,
Michelle

Vanessa Guerra

May 8, 2017, 2:36:09 PM

to sfu-omics, sfu....@gmail.com, michell...@gmail.com

Great feedback, thank you Michelle. I was also thinking I have to take a combined approach: 1) a sequence similarity search and 2) a gene structure search. I know what the gene looks like in general, but we lose confidence in some areas due to the presence of tandem repeats that vary depending on the allele present. I'm taking a similar approach to the one you suggested, but with different programs. I have already identified the region of interest in the genome with BLAST (1). Now, for (2), the ab initio approach, I'm looking for splice sites near the areas of the gene that are covered by mapped RNA-seq reads, and then I am planning to visualize the areas of interest in SMART to look at the domain architecture.
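The splice-site step could be sketched roughly as follows. This is a simplification and not the method of any particular gene finder: it only looks for the canonical GT (donor) and AG (acceptor) dinucleotides on the forward strand, within a fixed window around the edges of each RNA-seq-covered block, and does no scoring. The function name, the `window` parameter, and the block coordinates (0-based, end-exclusive) are assumptions made for the example.

```python
def splice_site_candidates(seq, blocks, window=20):
    """List canonical splice-site dinucleotides near each mapped block.

    blocks: (start, end) pairs, 0-based and end-exclusive, forward strand.
    Reported positions are the index of the first base of the dinucleotide.
    """
    seq = seq.upper()
    results = []
    for start, end in blocks:
        # Donor: an intron begins with GT, so search around the block's 3' edge.
        d_lo, d_hi = max(0, end - window), min(len(seq) - 1, end + window)
        donors = [i for i in range(d_lo, d_hi) if seq[i:i + 2] == "GT"]
        # Acceptor: an intron ends with AG, so search around the block's 5' edge.
        a_lo, a_hi = max(0, start - window), min(len(seq) - 1, start + window)
        acceptors = [i for i in range(a_lo, a_hi) if seq[i:i + 2] == "AG"]
        results.append({"block": (start, end),
                        "donors": donors,
                        "acceptors": acceptors})
    return results

# Toy example: intron end (..AG), a 6-base covered block, intron start (GT..).
toy = "CCCCAG" + "ATGCCC" + "GTAAAA"
for entry in splice_site_candidates(toy, [(6, 12)], window=3):
    print(entry)
```

GT and AG occur by chance very often, so on a real scaffold this will return many false candidates; the list is mainly useful for checking whether a plausible boundary exists right next to where the read coverage drops off.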

I had not thought of using Blast2GO for the second step! I use that program for general annotations (GO#, KEGG#, etc.), but it hadn't occurred to me to apply it here.