plink2 --pmerge question


jie huang

Jun 6, 2023, 5:51:02 PM
to plink2-users

Hi, there:

I am so happy that --pmerge can now do all the work of --bmerge. It seems that I no longer need two different versions of plink to handle the merging of different datasets.

I do have some quick questions about --pmerge. I would deeply appreciate it if someone could clarify and help out.

1.  It seems that I can simply run "plink2 --pmerge-list" without first specifying "plink2 --bfile [--pfile]", correct? This is amazing, since I don't need to separate the first file from the rest of the files to be merged. Previously, I had to remove the first/base file from the merge-list file.

2.  The plink online documentation says that --pmerge does an "outer join" by default. So, will plink assign missing genotype data for SNPs that are present in one dataset but absent from the other datasets being merged?

3. I found that I could not run --pmerge-list --update-name --export in a single plink command line. Right now, I have to run three plink commands consecutively. Is there a way to consolidate the three commands?

4. Previously, I used "--export A" to export HARD calls and "--vcf merged.vcf.gz dosage=DS" to export DOSE calls. What is the correct and efficient way to extract both HARD and DOSE calls through plink --pfile?

5. After I finish the --pmerge command, besides the regular XXX.pgen/psam/pvar files, I also see XXX-merge.pgen/psam/pvar files and XXX.pgen/psam/pvar.~ files. Is there a plink command option to not generate those files, or to delete them?

Thank you very much & best regards,
Jie 

Christopher Chang

Jun 7, 2023, 3:11:31 PM
to plink2-users
1. --pmerge isn't yet able to handle "all the work of --bmerge".  In fact, it is only able to do one of the simplest things --bmerge does, concatenate a bunch of files with no overlapping variants.  But I figure that this is enough to be useful for now, since it is common for datasets (e.g. UK Biobank) to be provided in a chromosome-split manner.

2. Yes, plink assigns missing data in this case.

3. I just tried this, and it worked.  Can you clarify exactly what problem you ran into, providing a full .log file, etc.?

4. "--export A" exports the most precise value that is available.  If the dataset contains dosages, you'll see those, otherwise you'll just see hardcalls.  Can you clarify what problem you're having with that command?

5. --delete-pmerge-result.
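
For readers following along, the points above can be sketched as one command line. This is only an illustration under stated assumptions, not a command taken from the thread: the per-chromosome prefixes chr1 and chr2, the output prefix merged, and the "pfile" mode are hypothetical, and the "1 2" column order assumes the mapping file has the new ID in column 1 and the old ID in column 2. The script only prints the command unless plink2 and the input files are actually present.

```shell
#!/bin/sh
# Hypothetical chromosome-split filesets: chr1.pgen/.pvar/.psam, chr2.pgen/.pvar/.psam.
# One fileset prefix per line; no separate "base" fileset has to be pulled out.
printf '%s\n' chr1 chr2 > merge-list.txt

# Assemble the command as positional parameters so it can be printed or run.
# Per the thread, --pmerge-list currently only concatenates filesets with no
# overlapping variants, and --delete-pmerge-result removes the intermediate
# "-merge" fileset. ukb.vip.snp is the ID file named in the thread; its column
# layout (new ID in column 1, old ID in column 2) is an assumption.
set -- plink2 \
  --pmerge-list merge-list.txt pfile \
  --update-name ukb.vip.snp 1 2 \
  --export A \
  --delete-pmerge-result \
  --out merged

if command -v plink2 >/dev/null 2>&1 && [ -f chr1.pgen ] && [ -f ukb.vip.snp ]; then
  "$@"                # real run
else
  echo "dry run: $*"  # no plink2 or no data: just show the command
fi
```

If the run succeeds, the exported genotypes would be expected in merged.raw, with the log in merged.log.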

jie huang

Jun 10, 2023, 5:25:48 AM
to plink2-users

Dear Chris:

Thank you so much for your response and detailed explanation. I did not receive an email notification, and thought that nobody had responded to my post.

1. It seems that --bmerge is still fit for some operations.  Can "--pmerge-list merge-list.txt" pretty much do all the things that "--bmerge" would do?  In the case of merging only two files using "--pmerge-list", I will only include 2 files in the merge-list.txt file.

2. This is amazing. Neither bcftools "merge" nor "concat" will work when the samples or SNPs are not exactly the same.

3. Yes, I tried again and this time it worked. In my previous script, I had used "--update-name MY.file 2 1" instead of "1 2", so my SNP names were not updated. Again, this is really amazing!

4. Yes, "--export A" can indeed output dosages. Previously, the first few genotypes of my extracted SNP happened to be all 0/1/2, which made me think that "--export A" only exports hard calls. I think exporting hard calls was the default in the previous version. You said that "--export A" exports the most precise value that is available. What if I just want to export the hard calls (instead of dosages)?

5. --delete-pmerge-result works great! thanks!
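
The column-order mix-up in point 3 is easy to check before running plink2. Below is a minimal sketch, assuming the optional --update-name arguments are given as new-ID column first and old-ID column second; preview_rename and the toy IDs are hypothetical and not part of plink2. It prints the "old -> new" mapping that a given pair of column numbers would imply.

```shell
#!/bin/sh
# Toy tab-separated mapping file (hypothetical IDs):
# new ID in column 1, old ID in column 2.
printf 'rs111\t1:1000:A:G\nrs222\t1:2000:C:T\n' > toy.snp

# preview_rename FILE NEW_COL OLD_COL
# Hypothetical helper: shows which ID would be treated as old and which as new,
# using the same argument order assumed for --update-name FILE NEW_COL OLD_COL.
preview_rename() {
  awk -F '\t' -v n="$2" -v o="$3" '{ print $o " -> " $n }' "$1"
}

# Column order matching this file's layout.
preview_rename toy.snp 1 2

# Swapped order: the "old" IDs are now rs numbers, which would not match
# variants that are still named by position in the .pvar file.
preview_rename toy.snp 2 1
```

With a file laid out like toy.snp, only the first call lists the positional IDs on the left-hand side, which is the direction a rename from positional IDs to rs IDs needs.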

BTW, below is my command log. Although the .raw file is generated successfully, the screenshot shows that the first line is corrupted: it contains "^A" and "<86>". My ukb.vip.snp file is tab-separated, without any special characters, as shown in the 3rd screenshot below.

Thanks!
Jie


[Attachments: 1.png, Screenshot 2023-06-10 170741.png, 3.png]

Christopher Chang

Jun 10, 2023, 11:06:05 AM
to plink2-users
Hi,

Can you post a dataset that allows me to replicate the .raw corruption you're seeing?  I was not able to do this from the command-line alone.

jie huang

Jun 10, 2023, 6:41:22 PM
to plink2-users
Thanks, Chris! I just sent the test data to you.

best regards,
Jie

Christopher Chang

Jun 10, 2023, 7:43:32 PM
to plink2-users
Thanks.  Turns out the --update-name bugfix I made a few days ago was more consequential than I initially thought, and covered this case; I will backport the bugfix to alpha 4 today.

jie huang

Jun 11, 2023, 2:09:32 AM
to plink2-users
Thanks, Chris! I will download the updated version of plink2 once you release it.

Best regards,
Jie

Chris Chang

Jun 11, 2023, 2:34:20 AM
to jie huang, plink2-users
Updated version is already posted.
