Repeat annotation


Javier Herrero

Jan 18, 2012, 6:52:25 AM
to align...@googlegroups.com
Dear all

I have a quick question about the repeat annotation. There are two
repeat files per sequence (I am looking at the mammals set), one called
repmask and the other called repeats. It seems the repeats files contain
all the annotation in the repmask file plus some more.

Could someone tell me what the difference between the two is? The
repmaskNotes.txt file in the sequence directory suggests that trf has
also been run on the sequence. Are tandem repeats the only difference
between the two sets?

Thanks

Javier

--
Javier Herrero, PhD
Ensembl Coordinator and Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK

Dent Earl

Jan 18, 2012, 11:12:02 AM
to align...@googlegroups.com
Hi Javier,

The *.repMask.bed files were generated using RepeatMasker. The *.repeats.bed files were generated by using BEDTools to merge each *.repMask.bed file with a trf-generated .bed file.
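
To illustrate what that kind of merge does, here is a minimal Python sketch of the interval union. It is not the command that was actually run; the function names (parse_bed, merge_intervals) are hypothetical, and it assumes the default behaviour of a BEDTools-style merge, where overlapping and directly adjacent intervals on the same sequence are collapsed into one. The exact options used for the Alignathon files are not stated in this thread.

```python
from typing import Iterable, Iterator, List, Tuple

# (sequence name, start, end) using BED's 0-based, half-open coordinates
Interval = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> Iterator[Interval]:
    """Yield (chrom, start, end) from BED lines, skipping headers and blanks."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield fields[0], int(fields[1]), int(fields[2])


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Collapse overlapping or adjacent intervals per sequence (assumed behaviour)."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    # Made-up example records, not taken from the Alignathon data.
    repmask = ["seq1\t10\t20\tL1", "seq1\t50\t60\tAlu"]
    tandem = ["seq1\t15\t30\ttrf", "seq2\t5\t8\ttrf"]
    combined = list(parse_bed(repmask)) + list(parse_bed(tandem))
    for chrom, start, end in merge_intervals(combined):
        print(f"{chrom}\t{start}\t{end}")
```

Under these assumptions, every interval in a repMask file is covered by the merged output, and anything extra comes from the tandem repeat annotation.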

I'll add more documentation to this effect on the website, sorry it wasn't clearer.

d

Javier Herrero

Jan 18, 2012, 12:05:59 PM
to align...@googlegroups.com
Hi Dent

Thanks for the info. Could you also add more info on the other files,
please?

I am especially interested in understanding the structure of the
'coding' parts of the genomes. Is there always one and only one
transcript per gene? We need that information to be able to translate
the exons.

Thanks

Javier

Dent Earl

Jan 18, 2012, 12:34:51 PM
to align...@googlegroups.com
Hey Javier,

Good question. I had to check back with the Evolver manual (http://www.drive5.com/evolver/EvolverUserGuide.pdf) on this one.
"""
Genes may not overlap, and in particular there is no provision for specifying alternative splicing structures.
""" (page 17)
So it appears there is always one, and only one, transcript per gene.

d

Javier Herrero

Jan 18, 2012, 1:02:56 PM
to align...@googlegroups.com
OK, thanks.

Last question of the day: what are the tiny gene features that
are only a few bp long?

And a last request. I see in the manual that Evolver outputs GFF files.
Would it be possible to get them? It seems easier to have the annotation
in one file.

Thanks

Javier

Dent Earl

Jan 18, 2012, 1:13:41 PM
to align...@googlegroups.com
Those are probably the NGEs, non-genic conserved elements, unless you're talking about the NXEs, which are non-exonic conserved elements.

I'll try to get the GFF files packaged and up within the next day.

d

Dent Earl

Jan 19, 2012, 7:09:58 PM
to Alignathon
Hey Javier,

I added the GFF annotations that Evolver produces to each
package's page (http://compbio.soe.ucsc.edu/alignathon/). Note that
these GFFs omit the RepeatMasker information. Best,

d
