Downloading repeat-masked genomic sequence

Rachael Thomas

Sep 8, 2016, 3:44:58 PM
to gen...@soe.ucsc.edu
Hi

I would like to download the genomic DNA sequence data for a series of subchromosomal intervals, with the repeat elements masked as Ns. I can do this easily for each individual interval using the 'GetDNA' option in the UCSC browser, by entering the genomic position and selecting 'Mask Repeats'.

I tried using the Table Browser to do a batch-download of all my regions at one time, but I don't see how I can have the downloaded sequences masked with Ns in the same way as is possible using the method I outlined above.

Please could you point me in the right direction on how I can achieve this?

Many thanks

Rachael

Luvina Guruvadoo

Sep 14, 2016, 12:05:12 PM
to Rachael Thomas, gen...@soe.ucsc.edu
Hello Rachael,

Thank you for your email. There are a couple of ways to do this: via the command line, or using the Table Browser. Using the command line:

twoBitToFa -seqList=intervalList.txt \
  http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit stdout \
    | gzip -c > intervalsMasked.fa.gz
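
One caveat on the command above: hg38.2bit stores repeats soft-masked, so twoBitToFa writes them in lowercase rather than as Ns, and as far as I know it has no option to hard-mask. Each line of intervalList.txt is expected to look like chr1:1000-2000, with zero-based, half-open coordinates. A minimal sketch of converting the lowercase bases to Ns follows; the function name hardmask and the output file name are hypothetical, not part of the UCSC tools.

```shell
# hardmask: hypothetical helper, not a UCSC utility.
# Reads FASTA on stdin. Header lines (starting with ">") pass through
# unchanged; on sequence lines every lowercase letter becomes N.
hardmask() {
  awk '/^>/ { print; next } { gsub(/[a-z]/, "N"); print }'
}

# Example usage, assuming the file written by the twoBitToFa command above:
# gzip -dc intervalsMasked.fa.gz | hardmask | gzip -c > intervalsHardMasked.fa.gz
```

Header lines are skipped on purpose, since names such as chr1 contain lowercase letters that must not be rewritten.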

To mask repeats using the Table Browser, first create a custom track of your regions and load it into the Table Browser. Select "sequence" as the output format and click "get output". On the following page, select "genomic" and click "submit". Finally, under "Sequence Formatting Options", select the "Mask repeats" option, choose "to N" so that repeats are written as Ns rather than in lowercase, and click "get sequence".
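
If your regions are already in the chr:start-end form used for twoBitToFa -seqList, they can be turned into BED lines for the custom track. This is only a sketch: the function name seqlist_to_bed and the file name regions.bed are hypothetical, and it assumes sequence names contain no ":" or "-" characters. Both formats use zero-based, half-open coordinates, so the numbers are copied unchanged.

```shell
# seqlist_to_bed: hypothetical helper, not a UCSC utility.
# Reads lines such as "chr1:1000-2000" on stdin and writes tab-separated
# BED lines "chr1<TAB>1000<TAB>2000". Lines that do not split into exactly
# three fields (for example whole-chromosome names) are skipped.
seqlist_to_bed() {
  awk -F'[:-]' 'NF == 3 { print $1 "\t" $2 "\t" $3 }'
}

# Example usage, assuming the intervalList.txt file from the command above:
# seqlist_to_bed < intervalList.txt > regions.bed
```

The resulting file can then be pasted or uploaded as a custom track.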

If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Regards,
Luvina

--
Luvina Guruvadoo
UCSC Genome Browser

http://genome.ucsc.edu