Dear UCSC Genome Browser team,
I would like to report an error I encountered while creating a "liftOver" chain file from an alignment of large genomes.
I generated a chain file aligning the corroboree frog and Xenopus tropicalis genomes using the "make_lastz_chains" pipeline (with the final "cleanChain" step skipped). The corroboree frog genome is large and includes two chromosomes larger than 1 Gb: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_028390025.1/
Then, I tried to create a "liftOver" chain file, following "netChains.csh" mentioned in https://genomewiki.ucsc.edu/index.php/DoBlastzChainNet.pl.
All steps ran quickly except the last one, "chainStitchId", which failed with an error.
Here is the code I used:
f1=GCF_028390025.1.GCF_000004195.4.test1.chain.gz
t=GCF_028390025.1.chrom.sizes
q=GCF_000004195.4.chrom.sizes
chainPreNet $f1 ${t} ${q} test1.preNet
chainNet test1.preNet -minSpace=1 ${t} ${q} test1.net /dev/null
netSyntenic test1.net test1.noClass.net
netChainSubset -verbose=0 test1.noClass.net $f1 test1.subNet
chainStitchId test1.subNet test1.subNet.stitched
Here is the chainStitchId error message:
t end mismatch 1073742015 vs 1250200427 line 715788 of test1.subNet
Attached is the compressed "test1.subNet" file (~18.1 MB), in case you want to reproduce the error and investigate.
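To narrow down which chains in "test1.subNet" are inconsistent, a rough check along the following lines may help. It is only a sketch: "check_chain_ends" is a hypothetical helper, not a UCSC tool, and it assumes the standard chain layout (a "chain" header line followed by "size dt dq" block lines and a final "size" line). It adds up the blocks of each chain and reports any chain whose running target or query position does not land on the tEnd or qEnd given in its header, which is presumably the kind of consistency check behind the "t end mismatch" message.

```shell
# Hypothetical helper (not part of the UCSC tools): report chains whose
# block lines do not add up to the tEnd/qEnd values in the chain header.
check_chain_ends() {
    awk '
    # Print one line for the chain just finished if its totals disagree
    # with the header (computed value first, header value second).
    function report() {
        if (active && (t != tEnd || q != qEnd))
            printf "chain id %s (header at line %d): t %d vs %d, q %d vs %d\n",
                   id, hdr, t, tEnd, q, qEnd
        active = 0
    }
    # Header: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
    $1 == "chain" {
        report()
        t = $6 + 0; tEnd = $7 + 0
        q = $11 + 0; qEnd = $12 + 0
        id = $13; hdr = NR; active = 1
        next
    }
    # Block line "size dt dq": aligned block plus the gap that follows it.
    active && NF == 3 { t += $1 + $2; q += $1 + $3; next }
    # Last block of a chain: size only.
    active && NF == 1 { t += $1; q += $1; next }
    END { report() }
    ' "$1"
}
```

Calling it as check_chain_ends test1.subNet should print one line per inconsistent chain and nothing if every chain is internally consistent.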
I saw the same error (with different coordinates) at the chainStitchId step for other alignments that include a genome with a chromosome larger than 1 Gb (e.g., locusts). So I wonder whether this is specific to alignments of genomes with very large chromosomes.
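To see which assemblies could be affected, the sequences above a size threshold can be listed straight from a chrom.sizes file. The snippet below is a minimal sketch under stated assumptions: "list_large_seqs" is a hypothetical helper name, the input is the usual two-column name/size layout, and the default threshold of 1,000,000,000 bases is only an illustrative cut-off, not a known limit of any tool.

```shell
# Hypothetical helper: list sequences in a chrom.sizes file that are larger
# than a threshold.
#   $1: chrom.sizes file (columns: name, size)
#   $2: optional threshold in bases (assumed default: 1000000000)
list_large_seqs() {
    awk -v min="${2:-1000000000}" '$2 + 0 > min + 0 { print $1 "\t" $2 }' "$1"
}
```

For example, list_large_seqs GCF_028390025.1.chrom.sizes would print the name and size of every sequence above the default threshold.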
I would appreciate any help or advice. Please let me know if you have any questions. Thanks!
Cheers,
Dong-Ha
___
Dong-Ha Oh PhD (he/him)
NCBI contractor
Personal web: https://ohdongha.github.io/