Seeking chain file between two NCBI assemblies

Jared Grummer

Sep 27, 2021, 1:45:04 PM
to gen...@soe.ucsc.edu
Hello,

I found this email address through a tunnel of other webpages and help forums. I’m trying to perform a liftover between two versions of the rainbow trout (Oncorhynchus mykiss) genome: 


and


Is it possible that this could be done, and if so, on what sort of timeline? I found this website as well, but I need the whole genome, not just annotations:


Thanks!
Jared 

Dan Schmelter

Sep 27, 2021, 7:28:07 PM
to Jared Grummer, UCSC Genome Browser Support
Hello Jared, 

Thanks for emailing Genome Browser support with your question about getting a LiftOver chain file. We have started this alignment and it should be done within a day or two. We will send you the resulting files once they're ready.

For further comments or questions, please reply-all to our team at gen...@soe.ucsc.edu. All messages sent to that address are publicly archived. If your question includes sensitive data, please reply-all to genom...@soe.ucsc.edu.

All the best,

Daniel Schmelter
UCSC Genome Browser


Jared Grummer

Sep 28, 2021, 11:07:58 AM
to Dan Schmelter, UCSC Genome Browser Support
Hi Dan,

Great, thanks so much! This is hopefully going to make my life a lot easier… Sorry if this is obviously written somewhere on the website, but is the chain file being generated with one or both of the genomes repeat-masked?

Thanks!
Jared

Jared Grummer

Oct 1, 2021, 12:03:09 PM
to Dan Schmelter, UCSC Genome Browser Support
Hello,

Just checking to see if these files are ready?

Thanks!
Jared

Dan Schmelter

Oct 1, 2021, 3:46:10 PM
to Jared Grummer, UCSC Genome Browser Support

Hello Jared,

Thanks for checking in about the chain alignment files. They are now available at the following download links:

https://hgdownload.soe.ucsc.edu/hubs/GCF/002/163/495/GCF_002163495.1/liftOver/GCF_002163495.1ToGCF_013265735.2.over.chain.gz
https://hgdownload.soe.ucsc.edu/hubs/GCF/013/265/735/GCF_013265735.2/liftOver/GCF_013265735.2ToGCF_002163495.1.over.chain.gz

These Chain/Net data can also be viewed in the following Genome Browsers under the Comparative Genomics section:

https://genome.ucsc.edu/h/GCF_002163495.1
https://genome.ucsc.edu/h/GCF_013265735.2

Both genomes are repeat-masked, which is a standard requirement of the lastz/chain/net alignment pipeline. The percentage of each genome that is repeat-masked is listed on the GenArk assembly track data page under the rainbow trout entries, column 9: 5.52% and 7.32%.

https://hgdownload.soe.ucsc.edu/hubs/fish/trackData.html

The repeat masker .out file can be obtained from the files:

https://hgdownload.soe.ucsc.edu/hubs/GCF/002/163/495/GCF_002163495.1/GCF_002163495.1.repeatMasker.out.gz
https://hgdownload.soe.ucsc.edu/hubs/GCF/013/265/735/GCF_013265735.2/GCF_013265735.2.repeatMasker.out.gz

I hope this was helpful! If you have any more questions, please reply-all to gen...@soe.ucsc.edu. All messages sent to that address are publicly archived. If your question includes sensitive data, please reply-all to genom...@soe.ucsc.edu.

All the best,

Daniel Schmelter
UCSC Genome Browser

Jared Grummer

Oct 4, 2021, 1:56:45 PM
to Dan Schmelter, UCSC Genome Browser Support
Hello Daniel,

Many thanks for the files! I’m trying to use the command line liftOver program, but I’m not getting any output. There is not much information online about how to use the program; this is about the extent of information I can find:


I understand that there are more options at the command line, but I can’t get anything besides error messages in the unmapped output file, something about either “Skipped in new” or “Deleted in new”, even when I put hundreds of bases in the input .bed file. Here’s the information I’m using:

liftOver oldFile map.chain newFile unMapped

where:
oldFile is the file you want to convert from
map.chain is the chain file used to convert from one build to another
newFile is the converted file you want to create
unMapped is a file that contains all the unmapped positions
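
When newFile comes back empty, tallying the reasons recorded in the unMapped file is a quick first check. liftOver normally writes a comment line starting with `#` (for example `#Deleted in new`) ahead of each record it could not convert. The Python sketch below counts those lines; the function name `tally_unmapped` and the demo records are illustrative assumptions, not part of liftOver.

```python
from collections import Counter
from typing import Iterable


def tally_unmapped(lines: Iterable[str]) -> Counter:
    """Count the '#'-prefixed reason lines found in a liftOver unMapped file."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#"):
            # Drop the leading '#' so the key is just the reason text.
            counts[line.lstrip("#").strip()] += 1
    return counts


if __name__ == "__main__":
    # Made-up records standing in for the contents of an unMapped file.
    demo = [
        "#Deleted in new\n",
        "chr1\t0\t84884017\n",
        "#Partially deleted in new\n",
        "chr2\t0\t85480851\n",
    ]
    for reason, n in tally_unmapped(demo).most_common():
        print(f"{n}\t{reason}")
```

With a real run, the same function can be fed an open file handle, e.g. `tally_unmapped(open("unMapped"))`.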

So, I’m trying to map the coordinates of the old genome (GCF_002163495.1) onto the new genome (GCF_013265735.2). I believe oldFile is a .bed file that lists the positions of the fragments I want to convert? Since I want to do the whole genome, my bed file has 29 rows corresponding to the 29 chromosomes from the old reference genome. Something like this, where the third column represents the entire length of that chromosome:

chr1 1 84884017
chr2 1 85480851
chr3 1 84937469
chr4 1 85056421
chr5 1 92202553
.
.
.

Then the new file should also be a .bed file (as per the input format) that lists the corresponding sequence to the new reference? And I think the map.chain file should be GCF_002163495.1ToGCF_013265735.2.over.chain. So, I’m just trying to confirm which files to use, and if I’m doing the input.bed correctly. Sorry to be sending these questions. I’ve looked a lot on the web and haven’t found much help. And the help I’ve found is generally many years old!

Thanks for your help,
Jared

Dan Schmelter

Oct 7, 2021, 9:28:09 PM
to Jared Grummer, UCSC Genome Browser Support

Hello Jared,

LiftOver might not be the proper tool for your investigation. Your command is generally correct, but the input oldFile of BED regions is probably invalid. LiftOver is not made to perform alignments of entire chromosomes. It exists to convert regions within chromosomes, such as genes and variants, between two genome assemblies. It applies chain alignments that have already been made rather than computing new ones.
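
Since liftOver converts regions within chromosomes, one way to probe an assembly with it is to tile each chromosome into short windows instead of passing one whole-chromosome interval. The sketch below is a minimal Python illustration under stated assumptions: the window size of 10,000 bases is arbitrary, the function names are hypothetical, and the `chr1`/`chr2` names and lengths are copied from the BED example earlier in the thread and must be replaced with whatever sequence names appear in the chain file. BED starts are 0-based and ends are exclusive, so the first window starts at 0 rather than 1.

```python
import sys
from typing import Dict, Iterator, TextIO, Tuple


def tile_chromosomes(
    chrom_sizes: Dict[str, int], window: int = 10_000
) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) windows in BED coordinates (0-based, end exclusive)."""
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, window):
            # The last window on a chromosome is clipped to the chromosome length.
            yield chrom, start, min(start + window, size)


def write_bed(out: TextIO, chrom_sizes: Dict[str, int], window: int = 10_000) -> int:
    """Write the windows as tab-separated BED lines and return how many were written."""
    n = 0
    for chrom, start, end in tile_chromosomes(chrom_sizes, window):
        out.write(f"{chrom}\t{start}\t{end}\n")
        n += 1
    return n


if __name__ == "__main__":
    # Assumption: placeholder names; use the sequence names found in the chain file.
    sizes = {"chr1": 84884017, "chr2": 85480851}
    # A very large window keeps the demo output short; a real run would use a small one.
    write_bed(sys.stdout, sizes, window=40_000_000)
```

Windows that span a rearrangement can still fail to convert, so this complements rather than replaces looking at the chains themselves.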

If you want to compare the assemblies using the underlying data, you could examine the chain alignment files directly rather than using LiftOver. It might help to read the chain track description and the chain file format pages:

https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hub_2658753_GCF_013265735.2&g=hub_2658753_chainNetGCF_002163495_1
https://genome.ucsc.edu/goldenPath/help/chain.html
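
The chain format described on the second page can be read with a short script. The following Python sketch, an illustration rather than a UCSC tool, walks a chain file and yields every ungapped block with its coordinates in both assemblies. It assumes the target strand is `+`, converts minus-strand query coordinates back to the forward strand, and uses hypothetical names (`Block`, `iter_blocks`, `open_chain`). In a liftOver chain such as GCF_002163495.1ToGCF_013265735.2.over.chain.gz, the target is the assembly being converted from and the query is the assembly being converted to.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple, TextIO


class Block(NamedTuple):
    t_name: str    # target (old assembly) sequence name
    t_start: int   # 0-based start on the target
    t_end: int     # exclusive end on the target
    q_name: str    # query (new assembly) sequence name
    q_start: int   # 0-based start on the query, forward strand
    q_end: int     # exclusive end on the query, forward strand
    q_strand: str  # '+' or '-' orientation of the query relative to the target
    chain_id: str


def open_chain(path: str) -> TextIO:
    """Open a plain or gzip-compressed chain file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def iter_blocks(lines: Iterable[str]) -> Iterator[Block]:
    """Yield one Block per ungapped alignment block. Assumes tStrand is '+'."""
    t_name = q_name = q_strand = chain_id = ""
    q_size = t_pos = q_pos = 0
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            t_name, t_pos = fields[2], int(fields[5])
            q_name, q_size = fields[7], int(fields[8])
            q_strand, q_pos = fields[9], int(fields[10])
            chain_id = fields[12]
            continue
        size = int(fields[0])
        if q_strand == "+":
            q_lo, q_hi = q_pos, q_pos + size
        else:
            # Minus-strand query coordinates count from the end of the sequence.
            q_lo, q_hi = q_size - (q_pos + size), q_size - q_pos
        yield Block(t_name, t_pos, t_pos + size, q_name, q_lo, q_hi, q_strand, chain_id)
        if len(fields) == 3:
            # "size dt dq": skip the gap on each side before the next block.
            t_pos += size + int(fields[1])
            q_pos += size + int(fields[2])


if __name__ == "__main__":
    demo = ["chain 1000 chrA 100 + 10 40 chrB 200 - 20 52 1", "10 5 7", "15", ""]
    for block in iter_blocks(demo):
        print("\t".join(str(value) for value in block))
```

For a downloaded file, `iter_blocks(open_chain(path))` streams the blocks without loading the whole file into memory.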

If you want to visually compare chromosome level differences between assemblies, you can use the Genome Browser to see where the different assembly sequences map by going to the "Rainbow Trout chain/net track". There are multiple subtracks with slightly different alignment methods. Here is an example session where you can see that regions on the 2020 Arlee assembly align to regions on three chromosomes on the 2017 Swanson assembly, indicated by label and color.

https://genome.ucsc.edu/s/dschmelt/GCF_013265735.2.ChainNet

I hope this was helpful. If you'd like more assistance, could you share more about what type of analysis you are trying to do? 

All the best,

Daniel Schmelter
UCSC Genome Browser

Jared Grummer

Oct 13, 2021, 2:00:22 PM
to Dan Schmelter, UCSC Genome Browser Support
Hi Daniel,

Thanks for the information. That’s unfortunate that liftover might not be the best tool. Even after reading the help/description page, it’s still a little unclear to me how to read the chain figures and interpret the output. I am literally just after how the two genomes align. I just need a table that says something like this:

Old (Swanson) Genome Chromosome    Old.Position    New (Arlee) Genome Chromosome    New.Position
2                                  5000            2                                8010
.
.
.

The new reference has three new chromosomes (3 of the old chromosomes were broken apart at the centromere), and there are big duplicated blocks of chromosome that map to other chromosomes within each genome. We know the positions of those blocks and which chromosomes they correspond to in the old genome, but I’d like to know where those blocks are in the new reference. I tried running BLAST to find corresponding regions, but you end up with multiple hits for the same genomic chunk that differ based on e-values. I just want the best alignment of large chromosomal chunks between the two genomes.
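
For finding the best alignment of a known block, one option is to rank the chains that overlap it by chain score, which avoids juggling several BLAST hits per region. The Python sketch below does this using only the header lines of a chain file. Treating the highest-scoring overlapping chain as "best" is an assumption made here for illustration, and the names `ChainSpan`, `read_chain_headers`, and `best_chains` are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class ChainSpan(NamedTuple):
    score: int
    t_name: str    # old-assembly sequence name
    t_start: int
    t_end: int
    q_name: str    # new-assembly sequence name
    q_strand: str
    q_start: int   # as written in the file; on '-' these count from the sequence end
    q_end: int


def read_chain_headers(lines: Iterable[str]) -> Iterator[ChainSpan]:
    """Yield one ChainSpan per 'chain' header line, ignoring the block lines."""
    for line in lines:
        f = line.split()
        if f and f[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            yield ChainSpan(int(f[1]), f[2], int(f[5]), int(f[6]),
                            f[7], f[9], int(f[10]), int(f[11]))


def best_chains(
    spans: Iterable[ChainSpan], chrom: str, start: int, end: int, top: int = 3
) -> List[Tuple[int, ChainSpan]]:
    """Return up to `top` (overlap_bases, chain) pairs, highest chain score first."""
    hits = []
    for span in spans:
        overlap = min(end, span.t_end) - max(start, span.t_start)
        if span.t_name == chrom and overlap > 0:
            hits.append((overlap, span))
    # Rank by chain score, breaking ties by the number of overlapping bases.
    hits.sort(key=lambda hit: (hit[1].score, hit[0]), reverse=True)
    return hits[:top]


if __name__ == "__main__":
    demo = [
        "chain 5000 chrA 1000 + 100 600 chrB 2000 + 300 800 1", "500", "",
        "chain 900 chrA 1000 + 400 500 chrC 1500 - 10 110 2", "100", "",
    ]
    for overlap, span in best_chains(read_chain_headers(demo), "chrA", 450, 550):
        print(span.q_name, span.q_strand, span.score, overlap)
```

The overlap is measured on the chain's overall span, so a chain with large internal gaps can rank highly while covering little of the block; checking its individual blocks would settle that.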

Does that make sense?

Thanks!
Jared

Gerardo Perez

Oct 27, 2021, 3:20:45 PM
to Jared Grummer, UCSC Genome Browser Support

Hello, Jared.

I apologize for the delay in our response.

One of our engineers made a session that shows the Arlee assembly (GCF_013265735.2) with a custom track giving the corresponding region on the Swanson assembly (GCF_002163495.1):
https://genome.ucsc.edu/s/Braney/ArleeToSwanson

In addition, our engineer made a table of the mapping in the format you requested:
https://hgwdev.gi.ucsc.edu/~braney/ArleeToSwanson.txt

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Gerardo Perez
UCSC Genomics Institute


Jared Grummer

Oct 29, 2021, 12:51:36 PM
to Gerardo Perez, UCSC Genome Browser Support
Hi Gerardo,

Thanks for the files! I’ll see how I can make use of them.

Jared