CAT-liftoff table from T2T table browser

Peter Shepard

May 31, 2022, 1:20:33 PM

to gen...@soe.ucsc.edu

Hi UCSC Folks,

I am trying to download the "CAT-liftoff gene annotation table". According to "describe table schema" for this track in the Table Browser, the first 3 columns are "chrom", "chromStart" and "chromEnd". However, when I download this track, the first column is not a chromosome name but a GenBank ID, e.g. CP068255.2. Can you please let me know what I am doing wrong and how I can retrieve the chromosome name for this track?

Thank you

Daniel Schmelter

May 31, 2022, 7:39:52 PM

to Peter Shepard, UCSC Genome Browser Support

Hello Peter,

Thank you for contacting Genome Browser support about downloading data with the T2T chromosome names instead of GenBank IDs.

At this time, these GenBank IDs are the designated sequence names according to the GenBank release, and there is no simple way to download the data with UCSC-style names. Fortunately, custom tracks and track hubs all allow a variety of chromosome names to be used. You can see the full list of chromosome name alternatives in the following file:

https://hgdownload.soe.ucsc.edu/hubs/GCA/009/914/755/GCA_009914755.4/GCA_009914755.4.chromAlias.txt
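
If it helps, here is a minimal Python sketch of how an alias file like this could be turned into a lookup table. It assumes the file is tab-separated with a '#'-prefixed header line that names each column; the column names "genbank" and "ucsc", the function name load_alias_map, and the sample rows are all assumptions for illustration, so please check them against the real file before relying on this.

```python
import csv
import io


def load_alias_map(lines, source_col="genbank", target_col="ucsc"):
    """Build {source_name: target_name} from chromAlias-style lines.

    Assumes a tab-separated table whose first line is a '#'-prefixed header
    naming each column. The default column names are assumptions.
    """
    rows = csv.reader(lines, delimiter="\t")
    # Turn a header such as "# genbank<TAB>ucsc" into ["genbank", "ucsc"].
    header = [name.strip().lstrip("#").strip() for name in next(rows)]
    src, dst = header.index(source_col), header.index(target_col)
    mapping = {}
    for row in rows:
        # Skip blank lines and rows that lack either name.
        if len(row) <= max(src, dst) or not row[src] or not row[dst]:
            continue
        mapping[row[src]] = row[dst]
    return mapping


# Illustrative sample rows, not copied from the real alias file.
sample = io.StringIO(
    "# genbank\tucsc\n"
    "CP068255.2\tchrX\n"
    "CP068254.1\tchrM\n"
)
alias = load_alias_map(sample)
print(alias["CP068255.2"])
```

For a downloaded copy, the same function could be called as load_alias_map(open("GCA_009914755.4.chromAlias.txt")). Looking the columns up by header name rather than by position means the sketch does not depend on the column order in the file.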

For what purpose are you using these Table Browser output files? Your inquiry about chromosome names is understandable and reflects something we've been working on and hope to improve on our site. We hope to have the UCSC-style names listed in the next few weeks instead of the GenBank IDs. The T2T Genome Browser data is slightly different than hg19 or hg38 because it is made on an assembly hub and based directly on the GenBank genome with fewer UCSC UI modifications.

There is one quick workaround that can swap the GenBank IDs in your data file for the UCSC-style names. You will need to use the following command-line statement or equivalent, which runs a series of find-and-replace swaps.

sed -f sedContigNameReplaceT2T.txt T2Tgenes

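sed is a standard command on Unix-like systems such as Linux and macOS, and the -f option tells it to read its editing commands from the named file. "T2Tgenes" is an example filename standing in for your Table Browser output; the converted text is written to standard output, so redirect it to a new file if you want to keep it. The file "sedContigNameReplaceT2T.txt" is one that I made specifically for this purpose and can be downloaded below: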

https://hgwdev.gi.ucsc.edu/~dschmelt/sedContigNameReplaceT2T.txt
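
As an "or equivalent" for systems without sed, here is a rough Python sketch of the same idea. It is not a translation of the sed file above: it assumes the Table Browser output is tab-separated with the sequence name in the first column, and it renames only that column. The function name rename_first_column and the one-entry mapping are hypothetical and used only for illustration.

```python
import io


def rename_first_column(lines, mapping):
    """Yield each tab-separated line with its first field renamed via mapping.

    Lines starting with '#' and names missing from the mapping pass through
    unchanged, so unexpected input is never silently dropped.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        name, sep, rest = line.partition("\t")
        yield mapping.get(name, name) + sep + rest


# Hypothetical one-entry mapping and two made-up rows for illustration.
mapping = {"CP068255.2": "chrX"}
table = io.StringIO(
    "CP068255.2\t100\t200\tgeneA\n"
    "CP000000.0\t5\t9\tgeneB\n"
)
renamed = list(rename_first_column(table, mapping))
print("".join(renamed), end="")
```

Restricting the swap to the first column avoids accidentally rewriting an accession that happens to appear in another field, at the cost of missing names stored elsewhere in the row.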

I hope this was helpful. If you have any more questions, please reply-all to our public support email at gen...@soe.ucsc.edu. For private communication, please reply-all to genom...@soe.ucsc.edu.

All the best,

Daniel Schmelter

UCSC Genome Browser; UCSC Genomics Institute

Daniel Schmelter

Jun 1, 2022, 12:58:43 PM

to Peter Shepard, UCSC Genome Browser Support

Hello Peter,

It's come to my attention that pre-made files exist for the genes data with UCSC-style names. These may suit your needs and be easier to use.

The CAT/Liftoff genes with UCSC-style chromosome names in bigGenePred format are here:

  https://hgdownload.soe.ucsc.edu/hubs/GCA/009/914/755/GCA_009914755.4/bbi/GCA_009914755.4_T2T-CHM13v2.0.catLiftOffGenesV1/catLiftOffGenesV1.bb

Each row will have all the metadata. Here are those same data in GTF and GFF3 formats:

  https://hgdownload.soe.ucsc.edu/hubs/GCA/009/914/755/GCA_009914755.4/genes/catLiftOffGenesV1.gff3.gz
  https://hgdownload.soe.ucsc.edu/hubs/GCA/009/914/755/GCA_009914755.4/genes/catLiftOffGenesV1.gtf.gz
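
If you want to confirm which naming style a downloaded file uses, a small Python sketch like the one below can tally the sequence names in the first column. It relies only on the general GTF/GFF3 layout (tab-separated, sequence name first, '#' for comment lines); the function name count_seqnames and the in-memory sample records are made up for illustration and do not reflect the real file's contents.

```python
import gzip
import io
from collections import Counter


def count_seqnames(lines):
    """Count records per sequence name (column 1) in GTF/GFF3 text lines."""
    counts = Counter()
    for line in lines:
        # Ignore comment/directive lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        counts[line.split("\t", 1)[0]] += 1
    return counts


# Made-up records compressed in memory, standing in for a downloaded .gtf.gz.
raw = gzip.compress(
    b"##example header\n"
    b'chr1\texample\tgene\t1\t10\t.\t+\t.\tgene_id "g1";\n'
    b'chr1\texample\texon\t1\t10\t.\t+\t.\tgene_id "g1";\n'
    b'chrX\texample\tgene\t5\t50\t.\t-\t.\tgene_id "g2";\n'
)
with gzip.open(io.BytesIO(raw), "rt") as handle:
    counts = count_seqnames(handle)
print(dict(counts))
```

For a real download, replacing io.BytesIO(raw) with the local path, for example gzip.open("catLiftOffGenesV1.gtf.gz", "rt"), should work the same way.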