Hi,
I am trying to transfer annotations from an earlier version of an assembly to an updated assembly (with a mix of large and small changes).
The general discussion can be seen here:
https://www.biostars.org/p/472543/
The specifics of how I am creating a .chain file for liftOver (and then CrossMap) can be seen here:
https://www.biostars.org/p/391080/#465890
Here is the code from that posting:
cd $ID1
faToTwoBit $ID1.fa $ID1.2bit
twoBitInfo $ID1.2bit chrom.sizes
cd ..
cd $ID2
faToTwoBit $ID2.fa $ID2.2bit
twoBitInfo $ID2.2bit chrom.sizes
cd ..
# create .chain file
blat "${ID1}/${ID1}.2bit" "${ID2}/${ID2}.fa" "${ID1}to${ID2}.psl" -tileSize=12 -minScore=100 -minIdentity=98
axtChain -linearGap=medium -psl "${ID1}to${ID2}.psl" "${ID1}/${ID1}.2bit" "${ID2}/${ID2}.2bit" "${ID1}to${ID2}.chain"
Unlike Exonerate (and the other methods described in the general discussion), I have not tested liftOver against the positive control (the exact starting sequence). However, I did test liftOver earlier on a different set of small changes.
That earlier test, with only a few small differences (less than 10 bp each), is where I reported losing ~20% of the exon blocks (~50 total blocks in the unmapped file).
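For anyone wanting to put a number on that kind of loss, here is a minimal sketch that compares an input BED file with the unmapped file from liftOver. The function name and file layout are assumptions, not something from the original post; it relies only on liftOver writing a reason line starting with '#' before each record it could not map.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original thread): report how many input
# BED records ended up in the unmapped file written by liftOver.
# Assumption: reason lines in the unmapped file start with '#'.
unmapped_fraction() {
    local input_bed="$1" unmapped="$2"
    awk '
        /^#/ || NF == 0 { next }                 # skip reason lines and blank lines
        FILENAME == ARGV[1] { total++; next }    # records in the input BED
        { lost++ }                               # records in the unmapped file
        END {
            if (total == 0) { print "no records"; exit 1 }
            printf "%d/%d unmapped (%.1f%%)\n", lost, total, 100 * lost / total
        }
    ' "$input_bed" "$unmapped"
}

# Example call (file names are placeholders):
#   unmapped_fraction exons.bed exons.unmapped.bed
```

The two files must be different paths, since the awk script tells them apart by file name.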
I am currently leaning towards using Exonerate, but do you think there is anything that might help liftOver map more of the exons (now that I have revised sequences that I am ready to annotate)?
Thank You,
Charles
Charles Warden
Bioinformatics Specialist
Integrative Genomics Core, City of Hope National Medical Center
Shamrock Monrovia Building (655 Huntington Dr, Monrovia, CA, 91016), Room 1086
E-mail: cwa...@coh.org
Internal Ext: 80375 | Direct: 626-218-0375
Work-From-Home Cell: 404-316-0012
Hello Charles,
Thank you for using the Genome Browser and for your question about LiftOver optimization.
In general, that method does not produce high-quality chain files. Our chain file creation process is slightly different, and these differences may be important for cases like yours.
Our process generates alignment files with BLAT by first splitting each fasta file into 5kb regions, running BLAT, and doing a clean-up step. For complete genomes, we have partially automated this pipeline and recommend following the "doSameSpeciesLiftOver.pl" wiki guide to perform all the steps involved to make the chain file.
http://genomewiki.ucsc.edu/index.php/DoSameSpeciesLiftOver.pl
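To make the 5 kb splitting idea concrete, here is a rough sketch that lists fixed-size windows for each sequence in a chrom.sizes file, such as the one twoBitInfo writes. This is not the doSameSpeciesLiftOver.pl implementation, which handles partitioning itself. The function name and the BED-style output are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Illustrative only: print fixed-size windows (0-based, half-open, BED-style)
# for every sequence listed in a chrom.sizes file (name<TAB>length).
# The function name and the 5000 bp default are assumptions.
make_windows() {
    local sizes="$1" win="${2:-5000}"
    awk -v w="$win" 'BEGIN { OFS = "\t" }
        NF >= 2 {
            for (start = 0; start < $2; start += w) {
                # The last window of a sequence is truncated at its length.
                end = (start + w < $2) ? start + w : $2
                print $1, start, end
            }
        }' "$sizes"
}

# Example call (path is a placeholder):
#   make_windows "$ID2/chrom.sizes" 5000 > windows.bed
```

Alignments made on such chunks are in chunk coordinates, so they still need to be translated back to full-sequence coordinates before chaining. The linked wiki guide covers how the UCSC pipeline does that.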
Depending on your sequence, mismatches may still push alignments below the score thresholds. I see you set a few quality thresholds higher than their defaults. If you want more results, you could try relaxing them: specifically, set tileSize to 11, minScore to 30, minIdentity to 90, and maxGap to 3. This may be the easiest first step. You can see descriptions of these options by running the "blat" program without any arguments, which prints the usage message.
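As a sketch of how those thresholds would slot into the command from the first message, the helper below only prints the BLAT command line so it can be checked before running. The function name and the ".relaxed.psl" output name are hypothetical. The directory layout (ID1/ID1.2bit, ID2/ID2.fa) is taken from the earlier commands.

```shell
#!/usr/bin/env bash
# Sketch: assemble the BLAT command with the relaxed thresholds.
# The command is printed, not executed; function and output names are hypothetical.
relaxed_blat_cmd() {
    local id1="$1" id2="$2"
    printf 'blat %s %s %s -tileSize=11 -minScore=30 -minIdentity=90 -maxGap=3\n' \
        "${id1}/${id1}.2bit" "${id2}/${id2}.fa" "${id1}to${id2}.relaxed.psl"
}

# Example: inspect the command, then run it by piping to a shell if it looks right.
#   relaxed_blat_cmd "$ID1" "$ID2"
#   relaxed_blat_cmd "$ID1" "$ID2" | sh
```

Writing to a separate .psl name keeps the stricter alignment around for comparison.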
I hope this was helpful. If you have any more questions, please reply-all to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. All messages sent to that address are publicly archived. If your question includes sensitive data, please reply-all to genom...@soe.ucsc.edu.
All the best,
Daniel Schmelter
UCSC Genome Browser
Hi Daniel,
Great – thank you very much!
I will take a look at that.
Sincerely,
Charles
Hi Daniel,
As an update, I tried changing those parameters but I got worse results (everything is now in the unmapped file from CrossMap).
However, for this project, I think using a .pileup file to keep track of SNPs and indels between the versions of the sequence (from a BWA-MEM alignment) is working OK, and I am using Exonerate to compare the gene annotations to those predictions.
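In case it is useful to others, here is a minimal sketch of how indel positions could be listed from a pileup file. It is an illustration and not the exact script used for this project. It assumes the standard samtools-style layout, where column 5 holds the read bases, insertions and deletions appear as +N or -N followed by the sequence, and a read start is marked by '^' plus a mapping-quality character. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print sequence name, position, and reference base for pileup lines
# whose read-base column (column 5) contains an insertion or deletion marker.
# Assumes a tab-separated samtools-style pileup; the function name is hypothetical.
pileup_indel_sites() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
        {
            bases = $5
            # Drop each read-start marker together with its mapping-quality
            # character, which may itself be "+" or "-".
            gsub(/\^./, "", bases)
            if (bases ~ /[+-][0-9]+/) print $1, $2, $3
        }' "$1"
}

# Example call (file name is a placeholder):
#   pileup_indel_sites old_vs_new.pileup > indel_sites.tsv
```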
My guess is that the large blocks of identical or closely related sequence are causing a problem, perhaps by preventing 1:1 mappings. However, I will add a link to this suggestion in the Biostars discussion, in case it helps others in slightly different situations.
Thank You,
Charles
From: Daniel Schmelter <dsch...@ucsc.edu>
Sent: Tuesday, November 24, 2020 4:17 PM
To: Charles Warden <cwa...@coh.org>
Cc: genome...@soe.ucsc.edu
Subject: Re: [genome-mirror] Optimization of liftOver for region with large duplications (exact and not exact)?
Hello Charles,