chainStitchId failure with alignments of large genomes

Oh, Dong Ha (NIH/NLM/NCBI) [C]

Jul 18, 2024, 3:12:45 PM
to gen...@soe.ucsc.edu

Dear UCSC genome browser team

Hi, I would like to report an error I encountered while creating a “liftOver” chain from an alignment of large genomes.

 

I generated a chain file aligning the corroboree frog and Xenopus tropicalis genomes using the "make_lastz_chains" pipeline (with the final "cleanChain" step skipped). The corroboree frog genome is large and includes two chromosomes larger than 1Gb: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_028390025.1/

 

Then, I tried to create a "liftOver" chain file, following "netChains.csh" mentioned in https://genomewiki.ucsc.edu/index.php/DoBlastzChainNet.pl

 

All steps ran quickly except for the last step "chainStitchId" which failed with an error.  

 

Here is the code I used:

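# Inputs: the chain file and the target/query chrom.sizes files
f1=GCF_028390025.1.GCF_000004195.4.test1.chain.gz
t=GCF_028390025.1.chrom.sizes
q=GCF_000004195.4.chrom.sizes

# Drop chains that have no chance of being netted
chainPreNet $f1 ${t} ${q} test1.preNet
# Build the alignment net from the remaining chains
chainNet test1.preNet -minSpace=1 ${t} ${q} test1.net /dev/null
# Add synteny information to the net
netSyntenic test1.net test1.noClass.net
# Keep only the chains (or parts of chains) that appear in the net
netChainSubset -verbose=0 test1.noClass.net $f1 test1.subNet
# Join chain fragments that share a chain ID into one chain per ID
chainStitchId test1.subNet test1.subNet.stitched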

 

Here is the chainStitchId error message:

t end mismatch 1073742015 vs 1250200427 line 715788 of test1.subNet

 

Attached is the compressed "test1.subNet" file (~18.1MB), in case you want to reproduce the error and investigate.

 

I saw the same errors (with different numbers for coordinates) at the chainStitchId step for other alignments that include a genome with a chromosome >1Gb (e.g., locusts).  So I wonder if this is something specific to alignments of genomes with very large chromosomes. 
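
One way to test the ">1Gb" hunch across several assemblies is to list which sequences in each chrom.sizes file exceed a size threshold. The sketch below is only illustrative: the 2^30 threshold and the names `LIMIT` and `large_sequences` are assumptions made here, not a documented limit of any UCSC tool. As a plain arithmetic observation (not a diagnosis), the first coordinate in the reported error, 1073742015, lies 191 bases above 2^30 = 1073741824.

```python
# Hypothetical threshold chosen for illustration; not a documented tool limit.
LIMIT = 2 ** 30  # 1,073,741,824


def large_sequences(lines, limit=LIMIT):
    """Return (name, size) pairs from chrom.sizes lines whose size exceeds limit.

    Each chrom.sizes line is expected to hold a sequence name and its length,
    separated by whitespace. Blank or malformed lines are skipped.
    """
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        name, size = fields[0], int(fields[1])
        if size > limit:
            hits.append((name, size))
    return hits


# Made-up sizes, for demonstration only.
example = ["chrA\t1500000000", "chrB\t900000000"]
print(large_sequences(example))
print(1073742015 - LIMIT)
```

With real data, the same function can be fed an open file, e.g. `large_sequences(open("GCF_028390025.1.chrom.sizes"))`.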

 

I would appreciate any help or advice.  Please let me know if you have any questions.  Thanks!

 

Cheers,

Dong-Ha

 

___
Dong-Ha Oh PhD (he/him)

NCBI contractor
Personal web: https://ohdongha.github.io/

 

test1.subNet.gz

Hiram Clawson

Jul 22, 2024, 3:49:59 PM
to Oh, Dong Ha (NIH/NLM/NCBI) [C], gen...@soe.ucsc.edu
Good Afternoon Dong-Ha:

I have reproduced this error here and am taking a look at it.

--Hiram

Hiram Clawson

Jul 24, 2024, 10:33:31 AM
to Oh, Dong Ha (NIH/NLM/NCBI) [C], gen...@soe.ucsc.edu
The chain file that chainStitchId is working on is broken. One of the steps
before it has made an error, so skipping chainStitchId isn't going to help.
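
To narrow down which earlier step first writes a broken chain, the same kind of consistency check can be run on each intermediate chain file (for example the input chain and test1.subNet). The sketch below is not chainStitchId's code. It assumes the standard chain layout: a 13-field header (`chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`), then block lines `size dt dq`, ending with a single `size` line. It also assumes that "t end mismatch A vs B" compares the end computed from the blocks (A) with the header's tEnd (B). The names `open_text` and `check_chain_ends` are made up for this example, and chains that are cut off before their final `size` line are not reported.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def check_chain_ends(lines):
    """Yield (line_no, chain_id, axis, computed_end, header_end) per mismatch.

    For every chain, walk the blocks from tStart/qStart, adding each block
    size and gap, and compare the result with tEnd/qEnd from the header.
    """
    header = None
    t = q = 0
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank separator or comment line
        if fields[0] == "chain":
            header = fields
            t, q = int(fields[5]), int(fields[10])  # tStart, qStart
            continue
        if header is None:
            continue  # block line without a preceding header
        size = int(fields[0])
        t, q = t + size, q + size
        if len(fields) >= 3:
            # Gap to the next block on target (dt) and query (dq)
            t, q = t + int(fields[1]), q + int(fields[2])
            continue
        # A single-number line closes the chain: compare with the header
        t_end, q_end = int(header[6]), int(header[11])
        chain_id = header[12] if len(header) > 12 else "?"
        if t != t_end:
            yield (line_no, chain_id, "t", t, t_end)
        if q != q_end:
            yield (line_no, chain_id, "q", q, q_end)
        header = None


# Tiny made-up example: chain 1 is consistent, chain 2 has a wrong tEnd.
SAMPLE = """chain 100 chrA 1000 + 10 60 chrB 500 + 0 45 1
20 10 5
20

chain 50 chrA 1000 + 100 200 chrB 500 + 50 90 2
30 10 0
10
"""

for hit in check_chain_ends(SAMPLE.splitlines()):
    print(hit)
```

On real files, something like `for hit in check_chain_ends(open_text("test1.subNet")): print(hit)` would list the inconsistent chains. Running it on each file in pipeline order shows where the first mismatch appears.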

On 7/24/24 5:08 AM, Oh, Dong Ha (NIH/NLM/NCBI) [C] wrote:
> Thanks a lot, Hiram. Please let me know if you need anything else, e.g., the starting material (GCF_028390025.1.GCF_000004195.4.test1.chain.gz, ~186MB), to check whether an earlier step has a problem that only becomes detectable at the last step.
>
> I also wonder if we can skip the chainStitchId step. Would that make the over.chain file too fragmented or otherwise unusable?
>
> Cheers,
> Dong-Ha

Oh, Dong Ha (NIH/NLM/NCBI) [C]

Jul 24, 2024, 12:08:21 PM
to Hiram Clawson, gen...@soe.ucsc.edu
Thanks a lot, Hiram. Please let me know if you need anything else, e.g., the starting material (GCF_028390025.1.GCF_000004195.4.test1.chain.gz, ~186MB), to check whether an earlier step has a problem that only becomes detectable at the last step.

I also wonder if we can skip the chainStitchId step. Would that make the over.chain file too fragmented or otherwise unusable?

Cheers,
Dong-Ha
