Hi Shruti,
Thank you for your question about getting the sequences of items in a
RepeatMasker track. The issue is likely that the download of these
repeat sequences is timing out before it can complete, leaving you with
an incomplete file. The RepeatMasker track is quite large at over 5
million items, so it is inefficient to try downloading the sequences for
these repeat items using the Table Browser.
I recommend getting a BED file of the positions of items in the
RepeatMasker track and then using our command-line tool "twoBitToFa" to
get the sequences for these items. First, get a BED file of the repeat
positions from the RepeatMasker track using the following steps (note:
I've used hg19 in this example, but you can substitute your assembly of
interest):
1. Navigate to the Table Browser,
http://genome.ucsc.edu/cgi-bin/hgTables.
2. Make the following selections:
clade: Mammal
genome: Human
assembly: Feb. 2009 (GRCh37/hg19)
group: Repeats
track: RepeatMasker
table: rmsk
region: genome
output: BED - browser extensible data
output file: myRepeatPositions.bed
3. Click "get output".
4. Under "Create one BED record per", check "Whole Gene".
5. Click "get BED".
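Since the original problem was a download that ended early, it may be worth checking the BED file before going further. Below is a minimal Python sketch, not an official UCSC tool: the function name "summarize_bed" is hypothetical, and it assumes the file is standard whitespace-separated BED with chrom, start, and end in the first three columns.

```python
def summarize_bed(lines):
    """Count usable BED records and collect lines that look malformed.

    Assumes standard BED: chrom, start, end in the first three columns.
    Returns (record_count, problems), where problems is a list of
    (line_number, description) tuples.
    """
    n_records = 0
    problems = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        # Skip blank lines and the optional header lines a BED file may carry.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            problems.append((line_no, "expected chrom, start, end"))
            continue
        if int(fields[1]) >= int(fields[2]):
            problems.append((line_no, "start is not less than end"))
            continue
        n_records += 1
    return n_records, problems
```

To use it, open "myRepeatPositions.bed" and pass the file handle to summarize_bed. A record count far below the roughly 5 million items expected for hg19, or a problem reported on the last line, would suggest the download was cut off.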
Next, download the 2bit file for your assembly of interest from our
downloads page: http://hgdownload.soe.ucsc.edu/downloads.html. You can
find 2bit files under the "Full data set" link for a particular
assembly. Then, download the "twoBitToFa" utility for your system here:
http://hgdownload.soe.ucsc.edu/admin/exe/. Lastly, you can run a command
like the following (again, using hg19 as an example):
twoBitToFa -bed=myRepeatPositions.bed hg19.2bit myRepeatSeqs.fa
This will output the sequences for all of the items in your
"myRepeatPositions.bed" file.
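One way to confirm that the run finished is to compare the number of FASTA records with the number of BED records. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes twoBitToFa writes one ">" header line per BED item, which you may want to verify on a small test file first.

```python
def count_fasta_records(lines):
    """Count sequences in FASTA text by counting '>' header lines."""
    return sum(1 for line in lines if line.startswith(">"))


def count_bed_records(lines):
    """Count BED data lines, ignoring blank, comment, track and browser lines."""
    skip = ("track", "browser", "#")
    return sum(
        1 for line in lines
        if line.strip() and not line.startswith(skip)
    )


def output_looks_complete(bed_lines, fasta_lines):
    """Return True when the FASTA holds as many records as the BED file.

    Assumption: one FASTA record is written per BED item.
    """
    return count_bed_records(bed_lines) == count_fasta_records(fasta_lines)
```

For example, open "myRepeatPositions.bed" and "myRepeatSeqs.fa" and pass both file handles to output_looks_complete. A False result means the counts differ, which is worth investigating before using the sequences.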
I hope this is helpful. If you have any further questions, please reply
to gen...@soe.ucsc.edu. All messages sent to that address are archived
on a publicly accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Matthew Speir
UCSC Genome Bioinformatics Group