Genomic SuperDups Textfile


Christopher Kendall

Dec 27, 2021, 3:30:46 PM
to genome...@soe.ucsc.edu
Hello,

I am having difficulty working out how to use the genomicSuperDups text file and/or the Genome Browser tracks with my own files. I am using the 1000 Genomes hg19 build and want to mask these highly replicated regions in my VCF file. The format is not like anything I have come across before; it looks almost like a chain file, but slightly different. Is there a README or walkthrough that explains how to supply the SuperDups file as a .bed or .mask file when filtering a VCF to remove these variants? Alternatively, if that is not possible, is there a way to convert the file into a format readable by VCFtools, BCFtools, or PLINK so that these variants can be removed?

Thank you very much for your help!

Sincerely,

Chris Kendall

Gerardo Perez

Dec 29, 2021, 5:08:51 PM
to Christopher Kendall, genome...@soe.ucsc.edu

Hello, Chris.

Thank you for your interest in the Genome Browser and for your question about the genomicSuperDups text file.

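You can convert genomicSuperDups.txt to BED format by keeping only the BED fields and dropping the rest. In the downloaded table, column 1 is an indexing bin and columns 2-7 are chrom, chromStart, chromEnd, name, score, and strand.

The following command downloads genomicSuperDups.txt.gz and writes those six columns as a BED file:

wget -q -O - https://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/genomicSuperDups.txt.gz | gunzip -c | cut -f 2-7 > genomicSuperDups.bed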
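
As a rough sketch of what you might do next for masking, the snippet below reduces a table like this to three-column BED, strips the "chr" prefix, sorts, and merges overlapping intervals. It runs on a small made-up stand-in file rather than the real download, and the file names are placeholders. Stripping "chr" is an assumption: 1000 Genomes hg19 VCFs usually name chromosomes "1", "2", and so on, but please check the contig names in your own VCF header first.

```shell
tmp=$(mktemp -d)

# Toy stand-in for the first columns of genomicSuperDups.txt (made-up values):
# bin, chrom, chromStart, chromEnd, name, score, strand, otherChrom
printf '585\tchr1\t10000\t20000\tchr15:100\t0\t-\tchr15\n'  > "$tmp/toy.txt"
printf '585\tchr1\t15000\t30000\tchrX:500\t0\t+\tchrX\n'   >> "$tmp/toy.txt"
printf '73\tchr2\t500\t900\tchr2:5000\t0\t+\tchr2\n'       >> "$tmp/toy.txt"

# Keep chrom/start/end, drop the "chr" prefix (assumption: the VCF uses "1", not "chr1"),
# then sort by chromosome and start position.
cut -f 2-4 "$tmp/toy.txt" | sed 's/^chr//' | sort -k1,1 -k2,2n > "$tmp/sorted.bed"

# Merge overlapping or touching intervals so each masked region appears once.
awk 'BEGIN { OFS = "\t" }
     $1 != c || $2 + 0 > e { if (c != "") print c, s, e; c = $1; s = $2 + 0; e = $3 + 0; next }
     $3 + 0 > e { e = $3 + 0 }
     END { if (c != "") print c, s, e }' "$tmp/sorted.bed" > "$tmp/superdups.merged.bed"

cat "$tmp/superdups.merged.bed"
```

With a BED file like this, one option is BCFtools: bcftools view -T ^superdups.merged.bed input.vcf.gz -Oz -o masked.vcf.gz (the ^ prefix on -T excludes the listed regions, and the .bed suffix tells BCFtools to read the coordinates as 0-based). VCFtools has a similar --exclude-bed option, which as far as I know expects a header line in the BED file. The input and output names here are placeholders.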

Another way to get the genomicSuperDups table in BED format is the Table Browser. Navigate to the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) and make the following selections:
1. Under Select dataset:

clade: Mammal
genome: Human
assembly: Feb. 2009 (GRCh37/hg19)
group: Repeats
track: Segmental Dups
table: genomicSuperDups

2. Set the region to “genome”
3. Set the output format to “BED - browser extensible data”
4. Insert a name next to “output filename:”, such as genomicSuperDups.bed
5. Click get output
6. Then, on the “Output genomicSuperDups as BED” page, click get BED. The output is the genomicSuperDups table in BED format.
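
If you want to sanity-check a BED file exported this way before using it as a mask, something along these lines may help. It is only a sketch on a made-up stand-in file (the file name and contents are hypothetical): it skips any "track" or "browser" header lines and counts data lines that have fewer than three columns or a start that is not smaller than the end.

```shell
tmp=$(mktemp -d)

# Toy stand-in for a Table Browser BED export (made-up values).
# The second data line is deliberately malformed (start > end).
printf 'track name="genomicSuperDups"\n'         > "$tmp/export.bed"
printf 'chr1\t10000\t20000\tchr15:100\t0\t-\n'  >> "$tmp/export.bed"
printf 'chr2\t900\t500\tbad\t0\t+\n'            >> "$tmp/export.bed"

# Count data lines with fewer than 3 tab-separated columns or with start >= end.
bad=$(awk -F'\t' '!/^(track|browser)/ && (NF < 3 || $2 + 0 >= $3 + 0) { n++ }
                  END { print n + 0 }' "$tmp/export.bed")

echo "suspicious lines: $bad"
```

A count of zero on your real export would suggest the intervals are at least well-formed; it does not check that the chromosome names match those in your VCF.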

If you would like to learn more about the genomicSuperDups.txt format and fields, please see the following autoSql file:
https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/lib/genomicSuperDups.as

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Gerardo Perez
UCSC Genomics Institute

