Hello,
Thank you for your interest in the Genome Browser and for reaching out with your inquiry. Error messages from liftOver are usually related to some complexity encountered when trying to map a region between two assemblies (here, you're mapping from hg18/build 36 to hg38). This sometimes means that a part of the assembly was either removed or significantly rearranged between the two.
For example, one of your errors (from the 10Oct24hglft_Build36.err file) indicates that the region "chr1 147775495 147775497" was "Deleted in new". This region can actually be lifted successfully to the hg19 genome assembly, and the coordinates don't change at all, but the liftOver to hg38 failed. We can investigate this a bit further by opening that region in the hg19 browser and turning on the "Hg18 Diff", "Hg38 Diff", and "GRC Incident" tracks from the "Mapping and Sequencing" group (here's a session link: https://genome.ucsc.edu/s/jcasper/hg19_contig_dropped_in_hg38). In those tracks, you can see that this part of the hg19 assembly was initially unchanged with respect to hg18 (there is nothing in the Hg18 Diff track), but was patched at some point when an issue was discovered. Later, for the hg38 assembly, this contig was dropped completely (as indicated by the red item in the Hg38 Diff track). The "GRC Incident" track also indicates multiple issues in the region. So your "Deleted in new" error means that we were unable to carry that region into hg38 because there was nothing to match it to, probably because the contig had enough problems that it needed to be re-examined.
After reviewing your 10Oct24hglft_Build36.err file, we found that most of the errors were categorized as either "Deleted in new" or "Partially deleted in new."
"Deleted in new": As in the example above, this indicates that nothing in hg38 (GRCh38) aligns with that region of the Illumina 450K methylation build 36 data. This error could be due to any of several reasons, such as the region being part of a problematic contig or later being identified as highly repetitive and masked before we built the alignment.
"Partially deleted in new": This means that only part of the region from the Illumina 450K methylation build 36 data aligns with hg38, and the aligned portion falls below the minimum fraction of bases required to map the region. By default, this threshold is 95% of the input region size, but it can be adjusted using the "Minimum ratio of bases that must remap" option.
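If it helps to see how the failures break down, the unmapped file can be grouped by reason. The following is a minimal Python sketch, not an official tool. It assumes the .err file has the usual liftOver layout, where each failed input line is preceded by a comment line such as "#Deleted in new". The function name `group_unmapped` and the second sample region are made up for illustration.

```python
from collections import defaultdict


def group_unmapped(lines):
    """Group failed regions by the '#reason' comment line that precedes them.

    Assumes each failed input line follows a comment line that starts
    with '#' and names the failure reason.
    """
    groups = defaultdict(list)
    reason = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            reason = line[1:].strip()  # e.g. "Deleted in new"
        elif reason is not None:
            groups[reason].append(line)
    return dict(groups)


# The first region is the one discussed above; the second is hypothetical.
sample = [
    "#Deleted in new",
    "chr1\t147775495\t147775497",
    "#Partially deleted in new",
    "chr1\t100\t200",
]

for reason, regions in group_unmapped(sample).items():
    print(reason, len(regions))
```

To run it on a real file, pass an open file handle instead of the sample list, for example `group_unmapped(open("10Oct24hglft_Build36.err"))`.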
We recommend exploring the GRC Incident track for hg19 and hg38, which highlights areas where assembly issues have been identified or resolved by the Genome Reference Consortium (GRC). It’s important to note that mappings between genome assemblies are not always symmetrical, so lifting in one direction (e.g., from hg19 to hg38) may not give the same result as lifting back in the other. These discrepancies are expected: regions may shift in a new assembly, and errors in earlier assemblies (such as bad data) are corrected.
If your experiment or project depends on data from segments that couldn't be mapped, we are unable to provide scientific advice on how to proceed. We hope this helps explain what the mapping failure messages mean, though, and how you can investigate specific cases further.
You may want to explore online resources for potential answers, such as Biostars (https://www.biostars.org/). A similar question has been discussed on Biostars at the following link: https://www.biostars.org/p/128708. However, please note that we are unable to verify or guarantee the accuracy of the information provided on external websites.
If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Gerardo Perez
UCSC Genomics Institute
--
---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/SI2PR06MB458412391F45E3F88421AD54C6782%40SI2PR06MB4584.apcprd06.prod.outlook.com.
Hello, Yaqi.
The sheep to cow liftOver is already available on our download server, e.g.
The reciprocal file is in the corresponding sheep directory. You may notice that the file is from GCF_016772045.2 instead of GCF_016772045.1; these assemblies are the same, except the .2 version also has the Y chromosome.
We started generating the domestic yak file today. We'll follow up with you when it is ready; these files typically take a few days to generate.
Hi, Yaqi.
The remaining liftOver files are now available:
https://hgdownload.soe.ucsc.edu/hubs/GCF/002/263/795/GCF_002263795.3/liftOver/GCF_002263795.3ToGCA_005887515.3.over.chain.gz
https://hgdownload.soe.ucsc.edu/hubs/GCA/005/887/515/GCA_005887515.3/liftOver/GCA_005887515.3ToGCF_002263795.3.over.chain.gz
You may also notice the yak assembly is GCA_005887515.3 instead of GCA_005887515.2. The sequence is all the same; only some contamination scaffolds were removed.
Dear UCSC Team,
I would like to extend my heartfelt thanks to your team for the support and assistance in providing the chain files. Your prompt and efficient help has been invaluable for my research, and I greatly appreciate the hard work and dedication of everyone involved.
Thank you once again for your continued support.
| From | Luis Nassar<lrna...@ucsc.edu> |
| Date | 10/25/2024 05:50 |
| To | 周亚琦<zyq...@nwafu.edu.cn> |
| Cc | gen...@soe.ucsc.edu<gen...@soe.ucsc.edu> |
| Subject | Re: [genome] USCS LiftOver Queries |
Hi, Chin.
The online LiftOver tool returns results in the same coordinate system as the input you submit.
For example, from hg38 to T2T:
Results in:
That input is in BED format, which is 0-based.
If instead, I input:
That results in:
That input is in position format, which is 1-based and matches the file you sent us. So you are correct: if you submit 1-based input, you will get 1-based output back.
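If you need to switch between the two formats yourself, the difference is only in the start coordinate. Here is a minimal Python sketch, assuming BED intervals are 0-based with an exclusive end and position-format intervals ("chrom:start-end") are 1-based with an inclusive end. The function names are made up for illustration and are not part of any UCSC tool.

```python
import re

# Matches position format such as "chr1:11-20" or "chr1:1,001-2,000".
_POSITION = re.compile(r"^(\S+):([\d,]+)-([\d,]+)$")


def bed_to_position(chrom, start, end):
    """Convert a 0-based, half-open BED interval to 1-based position format."""
    return f"{chrom}:{start + 1}-{end}"


def position_to_bed(position):
    """Convert a 1-based 'chrom:start-end' string to a BED (chrom, start, end)."""
    match = _POSITION.match(position.strip())
    if match is None:
        raise ValueError(f"not in chrom:start-end format: {position!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    # Only the start shifts; the end is the same number in both systems.
    return chrom, start - 1, end


print(bed_to_position("chr1", 10, 20))
print(position_to_bed("chr1:11-20"))
```

Both calls describe the same ten bases: the BED interval starts counting at 0 and excludes its end, while the position string starts counting at 1 and includes its end.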