Hi Paola,
Thank you for your questions about using the UCSC Genome Browser to
find Alu repeats.
If you are new to the UCSC Genome Browser, I highly recommend the
training material that we provide:
http://genome-euro.ucsc.edu/training/index.html. In particular, I
would start with the OpenHelix videos: http://www.openhelix.com/ucsc.
For those genomes and assemblies that we host, you can obtain Alu
repeat positions using the Table Browser,
http://genome-euro.ucsc.edu/cgi-bin/hgTables. You can see all of the
organisms that we host in the species tree on our "Gateway" page:
http://genome-euro.ucsc.edu/cgi-bin/hgGateway. Using the Table
Browser, you can filter the "RepeatMasker" table and extract only
the Alu repeats. Getting both positions and sequence for these
repeats requires two separate Table Browser queries. You can
obtain the Alu repeat positions using the following steps:
1. Navigate to the Table Browser,
http://genome-euro.ucsc.edu/cgi-bin/hgTables.
2. Select your genome and assembly. In this example, I will be using
the hg38 assembly of the human genome:
clade: Mammal
genome: Human
assembly: Dec. 2013 (GRCh38/hg38)
3. Make the following table selections:
group: Repeats
track: RepeatMasker
table: rmsk
output: BED - browser extensible data
output file: enter a file name to save your results to a file,
or leave blank to display results in your browser
4. Next to "filter", click "create".
5. Enter "Alu" in the "repFamily" field of the "Filter on Fields
from hg38.rmsk" section.
The "repFamily" line should read: repFamily does match Alu
6. Click "Submit".
7. Click "get output".
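If you would rather script this, the filter in steps 4 through 6 can be approximated offline. The following is only a minimal sketch: it assumes you have a tab-separated dump of the rmsk table in the standard column order (bin, swScore, milliDiv, milliDel, milliIns, genoName, genoStart, genoEnd, genoLeft, strand, repName, repClass, repFamily, ...). The function name is hypothetical, and you should confirm the column order against the table schema for your assembly.

```python
# Column positions (0-based) in a tab-separated rmsk dump.
# Assumption: the standard rmsk column order; verify against your table schema.
SW_SCORE = 1
GENO_NAME, GENO_START, GENO_END = 5, 6, 7
STRAND, REP_NAME, REP_FAMILY = 9, 10, 12


def alu_rows_to_bed(lines):
    """Yield BED6 lines for rmsk rows whose repFamily is exactly 'Alu'."""
    for line in lines:
        row = line.rstrip("\n").split("\t")
        # Skip short or malformed rows, and rows from other repeat families.
        if len(row) <= REP_FAMILY or row[REP_FAMILY] != "Alu":
            continue
        yield "\t".join([
            row[GENO_NAME],
            row[GENO_START],   # rmsk starts are already 0-based, like BED
            row[GENO_END],
            row[REP_NAME],     # e.g. AluY, AluSx1
            row[SW_SCORE],     # Smith-Waterman score used as the BED score
            row[STRAND],
        ])
```

Typical use would be to open the dump as a text file and pass the file handle to `alu_rows_to_bed`, writing each yielded line to an output file.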
You should be able to sort the Alu repeats into the different
subfamilies (AluS, AluY, etc.) using the name in the fourth column
of the BED output.
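As a rough illustration of that sorting step, the sketch below groups BED lines by the name in the fourth column. It assumes whitespace-separated BED output saved from the Table Browser; the function name and the optional `prefix_len` argument (which collapses names such as AluYa5 to a shorter prefix such as AluY) are my own choices for illustration, not part of any UCSC tool.

```python
from collections import defaultdict


def group_by_name(bed_lines, prefix_len=None):
    """Group BED intervals by the name in column 4.

    If prefix_len is given, only the first prefix_len characters of the
    name are used as the key (e.g. 4 turns 'AluYa5' into 'AluY').
    """
    groups = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and any header or comment lines in the output.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # no name column to group on
        key = fields[3] if prefix_len is None else fields[3][:prefix_len]
        groups[key].append((fields[0], int(fields[1]), int(fields[2])))
    return dict(groups)
```

Keep in mind that a fixed-length prefix is a blunt instrument: any repeat name that does not follow the AluJ/AluS/AluY pattern will land in a group of its own.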
You can use the same steps described above to obtain the sequence.
The only difference is that in step 3 you will need to select
"sequence" as your output type instead of BED.
If, however, you want repeat positions and sequence for a genome
that we don't host, you will need to obtain the RepeatMasker
software, http://www.repeatmasker.org/, and run it on the genome
you're interested in. Questions about using the RepeatMasker utility
should be directed to the RepeatMasker group here:
http://www.repeatmasker.org/cgi-bin/form2mail?template=feedback.tmpl&title=Feedback%20Form.
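Once you have RepeatMasker results, pulling out the Alu positions is a matter of filtering its annotation table. The sketch below assumes the usual whitespace-separated ".out" layout (score, divergence, deletions, insertions, query sequence, begin, end, (left), strand, repeat name, class/family, ...) and that Alu elements are labelled "SINE/Alu" in the class/family column. The function name is hypothetical, and you should check the layout against your own output before trusting it.

```python
def repeatmasker_out_to_alu_bed(lines):
    """Yield BED6 lines for Alu annotations in RepeatMasker .out text."""
    for line in lines:
        fields = line.split()
        # Header and blank lines do not start with an integer score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        score, query = fields[0], fields[4]
        begin, end = int(fields[5]), int(fields[6])
        strand_code, rep_name, class_family = fields[8], fields[9], fields[10]
        if class_family != "SINE/Alu":
            continue
        # .out coordinates are 1-based inclusive; BED starts are 0-based.
        # RepeatMasker writes 'C' for the complement strand.
        strand = "-" if strand_code == "C" else "+"
        yield "\t".join([query, str(begin - 1), str(end), rep_name, score, strand])
```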
I hope this is helpful. If you have any further questions, please
reply to gen...@soe.ucsc.edu. All messages sent to that address are
archived on a publicly accessible Google Groups forum. If your
question includes sensitive data, you may send it instead to
genom...@soe.ucsc.edu.
Matthew Speir
UCSC Genome Bioinformatics Group