ROH analysis in PLINK 2.0


Francisco Ceballos

Jun 5, 2024, 8:51:37 AM
to plink2-users
Dear Christopher,
I hope this message finds you well. My name is Francisco C. Ceballos, a geneticist from Spain, and I am writing to discuss PLINK 2.0. First and foremost, I would like to congratulate you and express my gratitude for your efforts in providing the scientific community with such an invaluable tool. I have been using PLINK for over 15 years, and it is an honor to reach out to you.
My research focuses on inbreeding in human populations, so I was quite surprised to learn that runs of homozygosity (ROH) analysis (--homozyg in PLINK 1.9) is not included in PLINK 2.0. Given that PLINK 1.9 is optimized for array data with fewer SNPs, I was hoping to use PLINK 2.0 for ROH analysis of deep-coverage whole-genome sequencing (WGS) data.
ROH analysis has become increasingly important and widespread for various reasons. I would therefore kindly ask you to consider reintroducing the --homozyg feature in PLINK 2.0. This addition would greatly benefit many researchers, including myself.
Thank you for your time and consideration. I look forward to your response.
Best regards,
Francisco C. Ceballos

Christopher Chang

Jun 7, 2024, 12:58:46 PM
to plink2-users
The primary way to address PLINK 1.9 --homozyg being "optimized for array data with fewer SNPs", while using the same underlying algorithm, is to use the documented --homozyg-... flags to change parameter settings to be more appropriate for your data.

In contrast, simply implementing --homozyg in PLINK 2.0 probably doesn't help you at all.  You would still need to figure out what parameters to change.
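To see why the --homozyg-... window parameters matter, here is a deliberately simplified Python sketch of the scanning-window idea behind PLINK 1.9 --homozyg. It is not PLINK's implementation. A fixed-size window slides along the SNPs, and a window counts as "homozygous" if it has few enough heterozygous and missing calls. A SNP is eligible for a run if a large enough share of the windows overlapping it are homozygous, and eligible stretches with enough SNPs are reported. The genotype encoding and function name are hypothetical. The parameter names only mirror --homozyg-window-snp, --homozyg-window-het, --homozyg-window-missing, --homozyg-window-threshold and --homozyg-snp, and the default values are illustrative; consult the PLINK 1.9 documentation for the real defaults. The physical-length, density, gap and per-segment heterozygosity filters are omitted.

```python
from typing import List, Optional, Sequence, Tuple

# Toy genotype encoding (an assumption for this sketch):
#   0 = homozygous call, 1 = heterozygous call, None = missing call.
HOM, HET = 0, 1


def scan_roh(
    genos: Sequence[Optional[int]],
    window_snp: int = 50,            # cf. --homozyg-window-snp
    window_het: int = 1,             # cf. --homozyg-window-het
    window_missing: int = 5,         # cf. --homozyg-window-missing
    window_threshold: float = 0.05,  # cf. --homozyg-window-threshold
    min_snp: int = 100,              # cf. --homozyg-snp
) -> List[Tuple[int, int]]:
    """Return inclusive (first_index, last_index) pairs of candidate runs."""
    n = len(genos)
    if n < window_snp:
        return []
    covered = [0] * n   # number of windows overlapping each SNP
    hom_hits = [0] * n  # number of "homozygous" windows overlapping each SNP
    for start in range(n - window_snp + 1):
        window = genos[start:start + window_snp]
        hets = sum(1 for g in window if g == HET)
        missing = sum(1 for g in window if g is None)
        window_is_hom = hets <= window_het and missing <= window_missing
        for i in range(start, start + window_snp):
            covered[i] += 1
            if window_is_hom:
                hom_hits[i] += 1
    # A SNP is eligible if a large enough share of its windows were homozygous.
    eligible = [hom_hits[i] / covered[i] >= window_threshold for i in range(n)]
    # Collect maximal stretches of eligible SNPs and keep the long ones.
    runs: List[Tuple[int, int]] = []
    i = 0
    while i < n:
        if not eligible[i]:
            i += 1
            continue
        j = i
        while j + 1 < n and eligible[j + 1]:
            j += 1
        if j - i + 1 >= min_snp:
            runs.append((i, j))
        i = j + 1
    return runs


if __name__ == "__main__":
    # 150 homozygous calls, 60 heterozygous calls, 150 homozygous calls.
    toy = [HOM] * 150 + [HET] * 60 + [HOM] * 150
    print(scan_roh(toy))  # prints [(0, 148), (211, 359)]
```

Because every threshold here is counted in SNPs, a window sized for array density covers a much shorter physical stretch on dense WGS data. This is one way to read the point that the parameters, not the program version, are what need adjusting.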

Francisco Ceballos

Jun 7, 2024, 4:09:27 PM
to plink2-users

Thank you very much for your prompt response.

I apologize if I was not clear in my previous message. The issue I am encountering with PLINK 1.9 is that it cannot handle very large files with many SNPs, such as those produced by deep-coverage WGS. When I try to process these large VCF files with PLINK 1.9 to obtain ROH, I run into errors caused by the file size. I can work around this by splitting the VCF files by chromosome and running each one through PLINK 1.9.
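The chromosome-splitting workaround can be scripted. Below is a minimal Python sketch that builds one PLINK 1.9 --homozyg command per per-chromosome VCF. The file-naming patterns, the function names and the choice of --double-id for sample IDs are assumptions for illustration. Any --homozyg-... overrides are passed through untouched, and the values in the tests are placeholders, not recommendations. Only --vcf, --double-id, --homozyg and --out are used as PLINK flags.

```python
import shlex
import subprocess
from typing import Dict, List, Optional


def per_chromosome_commands(
    vcf_pattern: str = "cohort.chr{chrom}.vcf.gz",  # hypothetical file naming
    out_pattern: str = "roh_chr{chrom}",            # hypothetical output prefix
    chromosomes: Optional[List[str]] = None,
    homozyg_overrides: Optional[Dict[str, str]] = None,
    plink: str = "plink",
) -> List[List[str]]:
    """Build one PLINK 1.9 --homozyg command per per-chromosome VCF."""
    if chromosomes is None:
        chromosomes = [str(c) for c in range(1, 23)]  # autosomes only
    overrides = homozyg_overrides or {}
    commands = []
    for chrom in chromosomes:
        cmd = [plink, "--vcf", vcf_pattern.format(chrom=chrom),
               "--double-id", "--homozyg"]
        for flag, value in overrides.items():  # e.g. --homozyg-... settings
            cmd += [flag, value]
        cmd += ["--out", out_pattern.format(chrom=chrom)]
        commands.append(cmd)
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

With PLINK installed, `run_all(per_chromosome_commands())` would produce one set of --homozyg outputs per chromosome. The per-run .hom segment files can be concatenated, while per-sample totals in the .hom.indiv files would need to be summed across runs.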

To my understanding, PLINK 2.0 is specifically designed to manage large files. This is why I anticipated that the new version would include the --homozyg algorithm.

Please correct me if I am mistaken.

Thank you again for your assistance.

Best regards!

Francisco.

Christopher Chang

Jun 7, 2024, 7:52:47 PM
to plink2-users
Thanks for the clarification.

I'm not intentionally excluding ROH-calling functionality from PLINK 2.0.  But as you note, chromosome-splitting is a simple and general workaround for the problem you had with PLINK 1.9 --homozyg, and for some purposes you can also split the samples into smaller groups.  In contrast, while the simpler unimplemented --pmerge[-list] use cases can be handled by PLINK 1.9 --merge[-list] and/or "bcftools merge", there are others that are currently much more painful to deal with.  And it's clear what the missing parts of --pmerge[-list] are supposed to do, whereas for ROH-calling... I am still hoping there will be something better than --homozyg to implement when the time comes.
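The sample-splitting alternative mentioned above can be sketched in the same way. This Python snippet divides sample IDs into fixed-size groups, renders each group as a PLINK --keep file (one "FID IID" line per sample), and builds a --homozyg command restricted to that group. The group size, file names and helper names are hypothetical. The --keep contents assume the VCF is loaded with --double-id, so that FID and IID both equal the VCF sample ID. As noted in the thread, this only suits "some purposes", for example analyses where each sample's runs are examined separately.

```python
from typing import List, Sequence


def chunk_samples(sample_ids: Sequence[str], group_size: int) -> List[List[str]]:
    """Split sample IDs into consecutive groups of at most group_size."""
    if group_size < 1:
        raise ValueError("group_size must be positive")
    return [list(sample_ids[i:i + group_size])
            for i in range(0, len(sample_ids), group_size)]


def keep_file_text(group: Sequence[str]) -> str:
    """Contents of a --keep file: one 'FID IID' line per sample.

    Assumes --double-id, so FID and IID are both the VCF sample ID.
    """
    return "".join(f"{sid} {sid}\n" for sid in group)


def group_command(vcf: str, keep_path: str, out_prefix: str) -> List[str]:
    """PLINK 1.9 --homozyg command restricted to one sample group."""
    return ["plink", "--vcf", vcf, "--double-id", "--keep", keep_path,
            "--homozyg", "--out", out_prefix]
```

Each group's keep text would be written to its own file (for example `group0.txt`) before running the matching command.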