Easiest way to annotate protein coding genes in a genome


OBBARD Darren

May 16, 2020, 7:49:48 AM
to ashworth-c...@googlegroups.com
Hi all!

Imagine I had a small (20Mbp) eukaryotic genome (it's a trypanosomatid) and I wished to annotate predicted protein-coding genes (CDSs).

I do not have any RNAseq data, but I do have protein sequences from related species.

What is the easiest way to do this? What is the best way to do this? (given the data I have)

Thanks!

D


--
Darren Obbard
darren...@ed.ac.uk

Institute of Evolutionary Biology
University of Edinburgh
Room 2.09, Ashworth 2, Charlotte Auerbach Road
Edinburgh EH9 3FL

Office 0131 651 7781
Mobile: 07968 838 635

http://obbard.bio.ed.ac.uk/
-------------------------------------------------------------------

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

Dominik R. Laetsch

May 16, 2020, 7:56:47 AM
to ashworth-c...@googlegroups.com, Dominik Laetsch
Hi Darren,

I would suggest either:


Or 

https://github.com/Gaius-Augustus/BRAKER (can be used with orthology data)

... but there is no "easy" way, I think.

Cheers, 

Dom


--
The wiki is at:
   https://www.wiki.ed.ac.uk/display/AshCodes/Ashworth+Codemonkeys
The mailing list archive is at:
https://groups.google.com/forum/?fromgroups#!forum/ashworth-code-monkeys
If you have trouble editing the wiki or emailing the group, let me know: sujai...@ed.ac.uk
---
You received this message because you are subscribed to the Google Groups "Ashworth Codemonkeys" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ashworth-code-mo...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/ashworth-code-monkeys/AM6PR05MB50151DB1F93A118FFF1D9EFADFBA0%40AM6PR05MB5015.eurprd05.prod.outlook.com.

Georgios D. Koutsovoulos

May 16, 2020, 7:59:23 AM
to ashworth-c...@googlegroups.com
Hi Darren,

When you say related, what is the expected divergence?

Cheers,
Georgios

BARKER Daniel

May 16, 2020, 10:56:56 AM
to ashworth-c...@googlegroups.com
Hi,

For annotating a specific gene family in genomes of various Drosophila spp. on the basis of protein sequences from D. melanogaster, in the past we used GeneWise (Wise2) plus manual curation.

For very close relatives, there may be a simpler way to map the proteins from one to the genome of the other.
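
For the close-relative case, one simple option is a tblastn search of the related proteins against the assembly, keeping the best hit per protein. The following is a minimal sketch of that idea, not the GeneWise workflow described above. It assumes the default 12-column BLAST+ tabular output (-outfmt 6); the function name and the example rows are hypothetical. A single hit only approximates a gene where introns are absent or rare.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text):
    """Return the highest-bitscore hit for each query protein.

    Coordinates are 1-based and inclusive, as reported by BLAST.
    In tblastn output, sstart > send marks a hit on the minus strand.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, line.split("\t")))
        score = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or score > best[query]["bitscore"]:
            sstart, send = int(hit["sstart"]), int(hit["send"])
            best[query] = {
                "contig": hit["sseqid"],
                "start": min(sstart, send),
                "end": max(sstart, send),
                "strand": "+" if sstart <= send else "-",
                "bitscore": score,
            }
    return best


# Invented example rows, for illustration only.
EXAMPLE = (
    "protA\tcontig1\t92.5\t200\t15\t0\t1\t200\t1001\t1600\t1e-100\t350\n"
    "protA\tcontig2\t60.0\t180\t72\t0\t5\t184\t500\t1039\t1e-40\t150\n"
    "protB\tcontig1\t88.0\t100\t12\t0\t1\t100\t5300\t5001\t1e-50\t180\n"
)

if __name__ == "__main__":
    for protein, hit in best_hits(EXAMPLE).items():
        print(protein, hit)
```

The candidate loci found this way still need checking by eye, as with the manual curation step mentioned above.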

Good luck,

Daniel

Dr Daniel Barker
Institute of Evolutionary Biology
School of Biological Sciences
University of Edinburgh
Charlotte Auerbach Road
The Kings Buildings
Edinburgh
EH9 3FL
United Kingdom



Lewis Stevens

May 16, 2020, 1:02:27 PM
to ashworth-c...@googlegroups.com
Hi Darren, 

I'd typically recommend BRAKER for eukaryotic gene prediction (it works wonderfully on Caenorhabditis genomes), but it relies on RNA-seq and/or homologous proteins to identify exon/intron junctions, which it then uses to train AUGUSTUS. Given that trypanosomatids typically don't have many introns, my guess is that it wouldn't work quite so well in your case. The lack of introns might be a blessing, though: what about a simple ab initio method, like GeneMarkS (http://exon.gatech.edu/GeneMark/genemarks.cgi)? I also came across Companion from Matt Berriman's group at Sanger (https://companion.sanger.ac.uk/; paper: https://doi.org/10.1093/nar/gkw292). It's a web service and appears to focus on gene prediction in protozoan parasites. I've never used it, though.
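
To illustrate why the lack of introns helps: in an intron-poor genome, a crude first pass at candidate CDSs is simply a scan for long open reading frames in all six frames. The sketch below is that naive scan only. It is not what GeneMarkS or Companion do (those use trained statistical models), and it assumes the standard genetic code, ATG start codons, and a hypothetical length cut-off of 100 codons.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (an assumption)


def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string; other letters pass through."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Yield (start, end, strand) for each ATG...stop ORF in all six frames.

    Coordinates are 0-based, end-exclusive, and always given on the
    forward strand. The stop codon is included in the interval but not
    counted towards min_codons. Only the first ATG after a stop is used.
    """
    seq = seq.upper()
    n = len(seq)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if (end - start) // 3 - 1 >= min_codons:
                        if strand == "+":
                            yield (start, end, strand)
                        else:
                            # map back to forward-strand coordinates
                            yield (n - end, n - start, strand)
                    start = None


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 5 + "TAA" + "GGGG"  # invented toy sequence
    print(list(find_orfs(toy, min_codons=3)))
```

A scan like this will over-predict (spurious ORFs) and miss genes with unusual starts, which is why a trained predictor or a dedicated pipeline is still the better choice for a real annotation.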

Whatever route you go down, I'd recommend comparing BUSCO scores for the genome and the predicted protein sequences to get a quick and easy assessment of how well the prediction went. BUSCO has a dataset for Euglenozoa (specified with -l euglenozoa_odb10 in latest versions of BUSCO).
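
The genome-versus-proteins comparison can be scripted. The sketch below builds the two BUSCO command lines (using BUSCO's -i, -l, -m and -o options) and parses the one-line score string that BUSCO writes to its short summary file. The helper names, file names and example score strings are hypothetical; the scores are invented for illustration, not real results.

```python
import re

# Matches the one-line score string, e.g. C:95.0%[S:94.0%,D:1.0%],F:2.0%,M:3.0%,n:100
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def busco_command(fasta, mode, out_name, lineage="euglenozoa_odb10"):
    """Build an argument list for BUSCO; mode is 'genome' or 'proteins'."""
    return ["busco", "-i", fasta, "-l", lineage, "-m", mode, "-o", out_name]


def parse_summary(text):
    """Extract the C/S/D/F/M percentages and n from a BUSCO summary string."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no BUSCO score line found")
    scores = {k: float(v) for k, v in match.groupdict().items() if k != "n"}
    scores["n"] = int(match.group("n"))
    return scores


def completeness_gap(genome_summary, protein_summary):
    """Complete-BUSCO percentage in the genome minus that in the proteins."""
    genome = parse_summary(genome_summary)
    proteins = parse_summary(protein_summary)
    return round(genome["C"] - proteins["C"], 1)


if __name__ == "__main__":
    print(" ".join(busco_command("assembly.fa", "genome", "busco_genome")))
    print(" ".join(busco_command("predicted.faa", "proteins", "busco_proteins")))
    # Invented score strings, for illustration only.
    print(completeness_gap("C:95.0%[S:94.0%,D:1.0%],F:2.0%,M:3.0%,n:100",
                           "C:90.0%[S:89.0%,D:1.0%],F:4.0%,M:6.0%,n:100"))
```

A large positive gap suggests that genes detectable in the assembly are missing from the predicted protein set.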

Thanks, 

Lewis 

OBBARD Darren

May 17, 2020, 6:04:11 AM
to ashworth-c...@googlegroups.com
Hi all!

Thank you for all the suggestions! It turns out that a Tryp was probably the easiest thing I could have asked for! (1) virtually no cis-splicing, and (2) a dedicated online platform for their annotation!

Thanks to Kathryn and Lewis, who pointed me to https://companion.sanger.ac.uk/ and http://companion.gla.ac.uk/ !

I have an annotation that will do for me (I have not checked BUSCO yet, though!)

Thanks!

D

--

Darren Obbard
darren...@ed.ac.uk

Institute of Evolutionary Biology
University of Edinburgh
Ashworth Laboratories, Charlotte Auerbach Road
Edinburgh EH9 3FL

Office 0131 651 7781
Mobile: 07968 838 635

http://obbard.bio.ed.ac.uk/
