Hello Xue,
Thank you for using the UCSC Genome Browser and for your inquiry.
One of our engineers notes that the error you are seeing looks like an overflow condition in a counter, most likely caused by the very large input file. We suspect that all of your lastz output is in a single 1 TB file rather than being split into a list of parts. Although these two assemblies contain many scaffolds, we have created alignments for assemblies with similarly fragmented sequences. For an example of such an alignment, see the tarSyr2 vs. tupBel1 alignment.
You can download the scripts used to generate this alignment using the following link: http://hgwdev.cse.ucsc.edu/~jairo/MLQ/21158/tarSyr2VsTupBel1.tar.gz
Note the DEF file for the lastz run:
This means: package up to 1,500 sequences into one chunk for tarSyr2, as long as the total sequence length in the chunk is less than 20,000,000 bases, and up to 2,000 sequences into one chunk for tupBel1, as long as the total is less than 10,000,000 bases.
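To illustrate the packing rule described above, here is a minimal Python sketch. It is not the code from our scripts; the function name `pack_chunks` and its arguments are made up for this example, and it assumes a simple greedy pass over the sequences in input order.

```python
def pack_chunks(seq_lengths, max_bases, max_seqs):
    """Greedily group (name, length) pairs into chunks.

    A chunk is closed when adding the next sequence would exceed
    max_seqs sequences or bring the chunk total to max_bases or more.
    Hypothetical helper for illustration only.
    """
    chunks = []
    current, current_bases = [], 0
    for name, length in seq_lengths:
        too_many = len(current) >= max_seqs
        too_long = current_bases + length >= max_bases
        if current and (too_many or too_long):
            chunks.append(current)
            current, current_bases = [], 0
        # A single sequence at or above max_bases still gets its own chunk.
        current.append(name)
        current_bases += length
    if current:
        chunks.append(current)
    return chunks


# Toy input: three scaffolds of 5 bases each.
scaffolds = [("a", 5), ("b", 5), ("c", 5)]
by_count = pack_chunks(scaffolds, max_bases=11, max_seqs=2)
by_bases = pack_chunks(scaffolds, max_bases=10, max_seqs=10)
```

With the tarSyr2 limits from above, the call would look like `pack_chunks(lengths, max_bases=20_000_000, max_seqs=1500)`, where `lengths` is a list of scaffold names and sizes that you supply.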
All of the partitioning is built into the scripts we use here. These scripts created 367 individual files for tarSyr2, and 393 files for tupBel1 for a total number of 144,231 cluster jobs:
144,231 = 367 * 393
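The job count comes from pairing every target chunk with every query chunk. A short Python check of that arithmetic (the variable names are only for illustration):

```python
target_chunks = 367  # tarSyr2 files
query_chunks = 393   # tupBel1 files

# One lastz cluster job per (target chunk, query chunk) pair.
jobs = target_chunks * query_chunks
print(jobs)
```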
The chaining ran on the 367 resulting alignment files for the target, tarSyr2, and none of those files was larger than 120 MB. We also don't use the raw axt files from lastz for chaining. Instead, we convert the lastz results into PSL files and run axtChain on the PSL files. This probably makes a big difference in file size compared with the axt files.
Insufficient masking in either genome could also be a problem. If the repeats are not masked, lastz will produce far too much alignment output. You should mask both assemblies with RepeatMasker and WindowMasker to eliminate the extra repeats.
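As a rough way to check how much of an assembly is masked, the following Python sketch counts lowercase bases in FASTA input. It assumes the assembly is soft-masked (repeats written in lowercase); the function name `masked_fraction` is made up for this example, and hard-masked sequence (runs of N) would need a different count.

```python
def masked_fraction(fasta_lines):
    """Return the fraction of bases that are lowercase (soft-masked).

    fasta_lines is any iterable of FASTA text lines, for example an
    open file handle. Header lines starting with '>' are skipped.
    """
    lower = 0
    total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue
        seq = line.strip()
        total += len(seq)
        lower += sum(1 for base in seq if base.islower())
    return lower / total if total else 0.0


# Toy example: 8 of the 16 bases are lowercase.
example = [">s1", "ACGTacgt", ">s2", "NNNNacgt"]
fraction = masked_fraction(example)
```

For a real assembly you could pass an open file, for example `masked_fraction(open("genome.fa"))`, where `genome.fa` stands in for your own file. A very low fraction would suggest the repeats have not been masked.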
You may also find the following previously answered questions helpful:
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Jairo Navarro
UCSC Genomics Institute
Want to share the Browser with colleagues?
Host a workshop: http://bit.ly/ucscTraining