To: UCSC Genome Informatics Group


이동재

Jun 12, 2020, 12:24:27 PM
to gen...@soe.ucsc.edu


To: UCSC Genome Informatics Group

 

Data info.

I ran Picard LiftoverVcf (on Linux) using the bosTau8ToBosTau9.over.chain file.

My data is cow WGS data. Because there are a lot of SNPs, I processed only chr1 first.

About 90 percent of the variants lifted over successfully, and about 10 percent failed with mismatched reference allele errors.

I then annotated the successfully lifted 90% with snpEff, and the result was strange.


Questions


1. Why do other chromosomes, in addition to chr1, appear in the lifted file? (See attached file)


2. Is there a paper that addresses question 1?


3. Can you explain the principle behind this, based on the chain file entries shown in the attached file?



I can't seem to find the answer to the above three questions.


Any quick answers would be appreciated. Have a nice day.

 

Best regards,

 

 

Dong Jae Lee

PhD student

Animal Genomic & Breeding lab

Chungnam National University, 99, Daehak-ro, Yuseong-gu, Daejeon, Republic of Korea.

e-mail: ldj...@naver.com

mobile: +82)1073307466

 

 

 

in chain file.png
after_liftovervcf_problem.png

Matthew Speir

Jun 16, 2020, 7:25:09 PM
to 이동재, gen...@soe.ucsc.edu
Hello, Dong Jae Lee. 

Thank you for your question about LiftOver.

The two assemblies you are looking at, bosTau8 (https://www.ncbi.nlm.nih.gov/assembly/GCF_000003055.5/) and bosTau9 (https://www.ncbi.nlm.nih.gov/assembly/GCF_002263795.1/), come from two different groups (bosTau8: University of Maryland; bosTau9: USDA ARS). The two groups started from different samples and used different techniques for sequencing and assembling reads into a final genome. Variability in each of these steps could cause differences in where small bits of sequence are placed in one assembly versus the other.

Even assemblies produced by the same group can change from one version to the next as scaffolds or contigs that were placed on one chromosome in one assembly might be moved to another chromosome in a future release as assembly methods improve and more is learned about the genome structure. Additionally, it's possible that your variants fall within repetitive regions or segmental duplications, which occur throughout the genome and are often found on multiple chromosomes.
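To see how many lifted records actually ended up on each chromosome, one option is to tally the CHROM column of the lifted VCF. The following is only a minimal sketch: the function name `count_records_per_contig` and the sample records are made up for illustration, and it assumes an uncompressed, tab-delimited VCF.

```python
from collections import Counter
import io


def count_records_per_contig(lines):
    """Count VCF data records per contig (first tab-delimited column)."""
    counts = Counter()
    for line in lines:
        # Skip header lines ("##..." and "#CHROM...") and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        contig = line.split("\t", 1)[0]
        counts[contig] += 1
    return counts


# Hypothetical records, not real data from this thread.
SAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]),
    "\t".join(["chr1", "1000", ".", "A", "G", ".", "PASS", "."]),
    "\t".join(["chr1", "2500", ".", "C", "T", ".", "PASS", "."]),
    "\t".join(["chr7", "300", ".", "G", "A", ".", "PASS", "."]),
]) + "\n"

counts = count_records_per_contig(io.StringIO(SAMPLE_VCF))
for contig, n in counts.most_common():
    print(contig, n)
```

For a real file, an open file handle can be passed in place of the `io.StringIO` sample, since the function only iterates over lines.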

LiftOver chain files are created by first aligning the entire genome sequence of two assemblies to one another. From the resulting alignments, we then use alignment scores and other heuristics in an attempt to find the best matches for a region on one assembly to a region (or regions) in the other.
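To check which destination chromosomes a chain file maps a given source chromosome onto, the chain header lines can be summarized directly. The sketch below assumes the documented UCSC chain layout (`chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`, followed by block lines whose first column is an ungapped block size) and assumes that in a bosTau8ToBosTau9 file the `tName` columns refer to bosTau8 and the `qName` columns to bosTau9. The function name and the sample chains are hypothetical, not taken from the real file.

```python
from collections import defaultdict
import io


def summarize_chain_targets(lines, source_chrom="chr1"):
    """Sum aligned bases per destination chromosome for chains on source_chrom."""
    totals = defaultdict(int)
    current_dest = None  # destination chromosome of the chain being read
    for line in lines:
        fields = line.split()
        if not fields:
            current_dest = None  # a blank line ends the current chain
            continue
        if fields[0] == "chain":
            t_name, q_name = fields[2], fields[7]
            # Only track chains that start on the requested source chromosome.
            current_dest = q_name if t_name == source_chrom else None
        elif current_dest is not None:
            # Block line: "size dt dq", or just "size" for the last block.
            totals[current_dest] += int(fields[0])
    return dict(totals)


# Hypothetical chains for illustration only.
SAMPLE_CHAIN = """\
chain 1000 chr1 500 + 0 150 chr1 480 + 10 160 1
100 5 5
45

chain 300 chr1 500 + 200 260 chr7 900 - 30 90 2
60

chain 200 chr2 400 + 0 50 chr2 400 + 0 50 3
50
"""

summary = summarize_chain_targets(io.StringIO(SAMPLE_CHAIN))
print(summary)
```

If chains starting on chr1 list other chromosomes as their destination, variants falling inside those chains would be expected to land on those chromosomes after liftover.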

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Training videos & resources: http://genome.ucsc.edu/training/index.html

Want to share the Browser with colleagues? Host a workshop: http://bit.ly/ucscTraining

---

Matthew Speir

UCSC Cell Browser, Quality Assurance and Data Wrangler

Human Cell Atlas, User Experience Researcher

UCSC Genome Browser, User Support

UC Santa Cruz Genomics Institute

Revealing life’s code.


