I have a quick question about the repeat annotation. There are two
repeat files per sequence (I am looking at the mammals set), one called
repmask and the other called repeats. It seems the repeats files contain
all the annotation in the repmask files, plus some more.
Could someone tell me what the difference between the two is? The
repmaskNotes.txt file in the sequence directory suggests that TRF has
also been run on the sequence. Are tandem repeats the only difference
between the two sets?
Thanks
Javier
--
Javier Herrero, PhD
Ensembl Coordinator and Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK
The *.repMask.bed files were generated using RepeatMasker. The *.repeats.bed files were generated by using BEDTools to merge each *.repMask.bed file with a TRF-generated .bed file.
I'll add more documentation to this effect on the website; sorry it wasn't clearer.
d
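In other words, each *.repeats.bed should cover every base covered by either the RepeatMasker intervals or the TRF intervals, which would explain why it looks like a superset of *.repMask.bed. Below is a minimal Python sketch of that union. It assumes the merge behaved like `bedtools merge` with its defaults, where overlapping and book-ended intervals are joined and only chrom/start/end are kept. The exact command used isn't given in this thread, and `read_bed` and `merge_intervals` are hypothetical helper names.

```python
from collections import defaultdict

def read_bed(lines):
    """Yield (chrom, start, end) from BED lines, skipping blank, comment and header lines."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[1]), int(fields[2])

def merge_intervals(*interval_sets):
    """Union of (chrom, start, end) intervals from several sources.

    Assumption: mimics `bedtools merge` defaults, so overlapping or
    book-ended intervals (end == next start) collapse into one.
    """
    by_chrom = defaultdict(list)
    for intervals in interval_sets:
        for chrom, start, end in intervals:
            by_chrom[chrom].append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_end is not None and start <= cur_end:
                cur_end = max(cur_end, end)  # extend the current run
            else:
                if cur_end is not None:
                    merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))  # flush the last run
    return merged
```

For example, `merge_intervals(read_bed(open("chr1.repMask.bed")), read_bed(open("chr1.trf.bed")))` would return the merged intervals; both file names are placeholders. Note that the repeat names and classes from the RepeatMasker file do not survive this kind of merge.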
Thanks for the info. Could you also add more info on the other files,
please?
I am especially interested in understanding the structure of the
'coding' parts of the genomes. Is there always one and only one
transcript per gene? We need that information to be able to translate
the exons.
Thanks
Javier
Good question; I had to check the Evolver manual (http://www.drive5.com/evolver/EvolverUserGuide.pdf) on this one.
"""
Genes may not overlap, and in particular there is no provision for specifying alternative splicing structures.
""" -page 17
So it appears there is always one, and only one, transcript per gene.
d
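With exactly one transcript per gene, translating a gene reduces to joining its exons and reading codons. Below is a minimal sketch under several assumptions. The exons are taken to be coding-only, so the first exon begins at the start codon. Coordinates are assumed to be 0-based half-open as in BED; Evolver's own convention should be checked against the manual. The standard genetic code is assumed. `translate_gene` is a hypothetical name.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def translate_gene(genome, exons, strand):
    """Translate the single transcript of one gene.

    genome: chromosome sequence; exons: list of (start, end), assumed
    0-based half-open and coding-only; strand: "+" or "-".
    """
    # Join exons in genomic order; for "-" the reverse complement of the
    # whole string is then already in transcript order.
    cds = "".join(genome[start:end] for start, end in sorted(exons))
    if strand == "-":
        cds = reverse_complement(cds)
    # Read complete codons only; an incomplete trailing codon is dropped.
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3].upper() for i in range(0, usable, 3))
    # Codons containing non-ACGT characters become "X".
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)
```

For example, `translate_gene("CCATGAAAGGTTTTAACC", [(10, 16), (2, 8)], "+")` joins `ATGAAA` and `TTTTAA` and returns `"MKF*"`.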
Last question of the day: what are the tiny gene features that
are only a few bp long?
And one last request: I see in the manual that Evolver outputs GFF files.
Would it be possible to get them? It seems easier to have the annotation
in one file.
Thanks
Javier
I'll try to get the GFF files packaged and uploaded in the next day.
d
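Once the GFF files are available, having all the annotation in one file makes questions like the one about tiny gene features easy to check. Below is a minimal sketch that assumes standard nine-column, tab-separated GFF with 1-based inclusive coordinates. The feature type string Evolver uses for genes (`"gene"` below) and the 10 bp cutoff are assumptions. `parse_gff`, `short_features` and `GffFeature` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class GffFeature:
    seqid: str
    feature_type: str
    start: int   # 1-based, inclusive (GFF convention)
    end: int     # 1-based, inclusive
    strand: str
    attributes: str

    @property
    def length(self):
        # Inclusive coordinates, so add one.
        return self.end - self.start + 1

def parse_gff(lines):
    """Yield features from GFF lines, skipping blank lines and '#' comments/directives."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a nine-column feature line
        yield GffFeature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6], cols[8])

def short_features(features, feature_type, max_length):
    """Return features of the given type that are at most max_length bp long."""
    return [f for f in features if f.feature_type == feature_type and f.length <= max_length]
```

For example, `short_features(parse_gff(open("genome.gff")), "gene", 10)` would list every gene feature of 10 bp or less; the file name is a placeholder.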