Including and excluding sex chromosomes


peter huang

Aug 26, 2020, 10:51:13 AM
to verifyBamID
Dear VerifyBamID,

We use verifyBamID a lot to check our samples for cross-contamination. It worked well until we noticed something interesting, and we have two questions about it.

Question 1:
Since version 1.1.2, verifyBamID has excluded all sex chromosomes from its contamination analysis. Would you mind telling us the reasoning behind this?

Question 2:
In verifyBamID2, there seem to be two different SVDPrefix resource sets for hg38: one is the phase3 set and the other is not. Both sets exclude the sex chromosomes. The phase3 set additionally excludes chromosome 22. Could you share why chromosome 22 is excluded?

We tested all of them (including v1.1.0), and the results differ slightly among them.
Your input is greatly appreciated!

Best,

Peter

Hyun Min Kang

Aug 28, 2020, 5:40:04 PM
to verifyBamID, Fan Zhang
Q1 - Sex chromosomes are not diploid, so they do not fit into our model for estimating freemix.
Q2 - I did not know chr22 was excluded in verifyBamID2 phase 3 set. Fan -- could you comment?
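
The Q1 point can be illustrated with a small sketch. It assumes, for illustration only, that genotype priors at a marker follow Hardy-Weinberg proportions for a diploid site; it is not verifyBamID's actual likelihood, and both function names are hypothetical. At a hemizygous locus (for example, most of chrX in a male sample) there is no heterozygous class, so a diploid prior puts probability on a genotype that cannot occur.

```python
def diploid_genotype_priors(alt_freq):
    """Hardy-Weinberg priors keyed by number of ALT copies (0, 1, 2).

    Assumption for illustration: a diploid, randomly mating population.
    """
    p = alt_freq
    q = 1.0 - p
    return {0: q * q, 1: 2.0 * p * q, 2: p * p}


def hemizygous_genotype_priors(alt_freq):
    """Priors for a single-copy locus: only 0 or 1 ALT copies are possible."""
    return {0: 1.0 - alt_freq, 1: alt_freq}


diploid = diploid_genotype_priors(0.3)
hemizygous = hemizygous_genotype_priors(0.3)

# The diploid prior reserves 2pq of its mass for heterozygotes.
print("diploid heterozygote prior:", diploid[1])
# The hemizygous prior has no two-copy class and no heterozygote at all.
print("hemizygous genotype classes:", sorted(hemizygous))
```

In a sample with no contamination, a hemizygous site would show only one allele, so reads that look "mixed" relative to a diploid expectation would be misread by a model built on the diploid priors.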

Hyun.
-----------------------------------------------------
Hyun Min Kang, Ph.D.
Associate Professor of Biostatistics
University of Michigan, Ann Arbor
Email : hmk...@umich.edu



Fan Zhang

Sep 12, 2020, 4:29:39 AM
to verifyBamID
The markers in the resource files are randomly selected, with a few rules applied (such as MAF and callability) but no exclusion based on chromosome. If you are interested in regions on chr22, you can prepare your own customized resource files by following the steps on the GitHub page (https://github.com/Griffan/VerifyBamID#generating-your-own-resource-files).

Fan 

Fan Zhang

Sep 12, 2020, 10:18:37 AM
to verifyBamID
Correction: I meant that, other than the sex chromosomes, we didn't exclude markers based on chromosome (among chromosomes 1-22). Sorry for the confusion.
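
The selection rule described above (random choice among markers that pass MAF and callability filters, with only the sex chromosomes excluded) could be sketched roughly as follows. This is a minimal illustration, not the script used to build the published resource files: the record layout, the `select_markers` name, and the `min_maf` threshold are all hypothetical.

```python
import random

# Chromosome labels treated as sex chromosomes (both naming styles).
SEX_CHROMS = {"X", "Y", "chrX", "chrY"}


def select_markers(candidates, n, min_maf=0.005, seed=0):
    """Randomly pick up to n markers that pass the filters.

    candidates: list of dicts with 'chrom', 'pos', 'maf', 'callable' keys
    (a hypothetical layout). min_maf is an illustrative threshold only.
    """
    eligible = [
        m for m in candidates
        if m["chrom"] not in SEX_CHROMS
        and m["maf"] >= min_maf
        and m["callable"]
    ]
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return rng.sample(eligible, min(n, len(eligible)))


candidates = [
    {"chrom": "chr1", "pos": 1000, "maf": 0.20, "callable": True},
    {"chrom": "chr2", "pos": 2000, "maf": 0.001, "callable": True},   # fails MAF
    {"chrom": "chr22", "pos": 3000, "maf": 0.10, "callable": True},
    {"chrom": "chrX", "pos": 4000, "maf": 0.30, "callable": True},    # sex chromosome
    {"chrom": "chr5", "pos": 5000, "maf": 0.15, "callable": False},   # not callable
]
chosen = select_markers(candidates, n=10)
print(sorted(m["chrom"] for m in chosen))
```

Nothing in this filter treats chr22 differently from any other autosome, which matches the correction above.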

peter huang

Sep 15, 2020, 10:04:19 AM
to verifyBamID
Thanks Fan for the clarification!

If I choose to use your datasets, which one do you recommend, phase3 or non-phase3? Thanks in advance!

Best,

Peter

Fan Zhang

Sep 16, 2020, 10:45:57 AM
to verifyBamID
Hi, Peter

If your sample has genetic ancestry close to the existing populations in one of the datasets, you should preferably try that dataset first.
I personally have more experience using VB2 with the phase3 resource files, which give reasonable results.

P.S. 
Following up on Q2, I think a more detailed explanation is necessary:
Specific to the resource files in the GitHub repo, I applied one more filtering rule when preparing them: excluding markers that failed liftover between b37 and b38. To leave room for potential liftover failures, I randomly selected a few hundred more markers than the target, e.g. 10k+500. Because very few chr22 markers were selected initially, the markers on chr22 were "pushed" out of the marker set when we finalized the top 10k marker set.

Users of VB2 don't need to follow this filtering rule when preparing customized resource files. 
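
The "pushed out" effect from the P.S. can be reproduced with a toy example. The thread does not say how the top 10k set was taken from the over-selected list; the sketch below assumes the list was sorted in genome order and truncated, which is one plausible reading and is only an assumption. The function names and marker coordinates are made up for illustration.

```python
from collections import Counter


def finalize_marker_set(selected, target):
    """Sort (chrom, pos) markers in genome order and keep the first `target`.

    Assumption: truncation happens on a genome-ordered list, so markers on
    the last chromosome are the first to be dropped.
    """
    return sorted(selected)[:target]


def count_by_chrom(markers):
    """Count markers per chromosome number."""
    return Counter(chrom for chrom, _pos in markers)


# Toy over-selection: a target of 10 markers plus 2 spares,
# with only 2 of the 12 landing on chromosome 22.
oversampled = [
    (1, 100), (1, 200), (2, 150), (3, 50), (5, 10), (7, 900),
    (10, 30), (15, 75), (19, 20), (21, 60), (22, 40), (22, 80),
]
final = finalize_marker_set(oversampled, target=10)

chr22_before = count_by_chrom(oversampled)[22]
chr22_after = count_by_chrom(final)[22]
print("chr22 markers before/after truncation:", chr22_before, chr22_after)
```

Under this assumption, any spare markers that survive liftover displace the tail of the list, so a chromosome with few selected markers at the end of the ordering can disappear entirely even though no rule excludes it.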

peter huang

Sep 16, 2020, 2:02:01 PM
to verifyBamID
Thanks, Fan, for the kind help! Yes, you answered the question I was interested in!

Best,

Peter
