Dear PLINK Team,
Thank you for the amazing resource and for the very active support community. We very much appreciate it.
I have a question about merging files. I am trying to merge the Thousand Genomes Project (TGP) data with a second reference genome dataset. Since the TGP VCF files have been phased, their PLINK files have the correct Ref/Alt (A2/A1) allele orientation. However, the second reference dataset (unphased) was sequenced on a different continent (in a population that may have gone through some bottlenecks), and therefore has some variants whose Ref/Alt alleles are swapped relative to TGP.
My merge command and the relevant parts of the log are:
Log File Snippet - START
Options in effect:
--bfile file1
--bmerge file2.bed file2.bim file2.fam
--make-bed
--out merged.autosomes
2504 people loaded from file1.fam.
1200 people to be merged from file2.fam.
Of these, 1200 are new, while 0 are present in the base dataset.
15414514 markers loaded from file1.bim.
1641968 markers to be merged from file2.bim.
Of these, 406098 are new, while 1235870 are present in the base dataset.
Warning: Variants '1_837192_G_A' and '1_837192_A_G' have the same position.
… (Numerous Warnings)
Warning: Variants '22_51165664_G_A' and '22_51165664_A_G' have the same position.
…
15820612 variants and 3792 people pass filters and QC.
…
Log File Snippet - END
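The same-position warnings above pair variant IDs such as '1_837192_G_A' and '1_837192_A_G'. Because --bmerge matches variants by ID, each such pair is kept as two separate variants instead of being merged, which is consistent with the merged total in the log (15414514 + 406098 = 15820612). Below is a minimal sketch of how these pairs could be listed from the two .bim files. It assumes the standard six-column .bim layout (chromosome, variant ID, cM position, bp position, A1, A2) and biallelic sites. The function names are hypothetical and not part of PLINK. Because it matches on chromosome, position and the unordered allele pair, it also reports pairs whose IDs differ for reasons other than allele order.

```python
from collections import defaultdict


def iter_bim(lines):
    """Yield ((chrom, bp, unordered allele pair), variant ID) for each .bim record."""
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        chrom, var_id, _cm, bp, a1, a2 = fields[:6]
        yield (chrom, bp, frozenset((a1, a2))), var_id


def renamed_pairs(base_lines, other_lines):
    """Return (other_id, base_id) pairs: same site and alleles, but different variant IDs."""
    # Index the smaller dataset (file2) in memory.
    other = defaultdict(set)
    for key, var_id in iter_bim(other_lines):
        other[key].add(var_id)

    # Stream the large base dataset (file1), keeping only sites that also occur in file2.
    base = defaultdict(set)
    for key, var_id in iter_bim(base_lines):
        if key in other:
            base[key].add(var_id)

    pairs = []
    for key, base_ids in base.items():
        # IDs present in both files are already matched by --bmerge; skip them.
        for other_id in sorted(other[key] - base_ids):
            for base_id in sorted(base_ids - other[key]):
                pairs.append((other_id, base_id))
    return pairs
```

For example, `with open("file1.bim") as base, open("file2.bim") as other: pairs = renamed_pairs(base, other)`. Only file2 is held in memory in full, while the much larger TGP .bim is read line by line.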
Of the 406098 new variants, warnings were displayed for 311901, and I verified that 311690 of those 311901 variants simply had their Ref/Alt alleles swapped. I initially tried the --flip flag, but --flip switches the strand (it replaces each allele with its complement) rather than swapping the A1/A2 alleles. How can I correct the allele orientation for a subset of variants (i.e. swap A1/A2 for specific variants in one dataset)? I would only want to do this for the second file, since the first file (TGP) is the standard reference.
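The difference between a strand flip (what --flip corrects) and a Ref/Alt swap can be made concrete with a small helper that compares one site's alleles in the two datasets. This is a sketch that assumes biallelic SNPs with single-base A/C/G/T alleles, and `classify` is a hypothetical name. A/T and C/G SNPs are strand-ambiguous: when their alleles appear reversed, the function reports a swap, even though a strand flip would look exactly the same.

```python
# Base-pairing partners used to express an allele on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(allele):
    """Return the opposite-strand allele, or None if it is not a single A/C/G/T base."""
    if len(allele) != 1 or allele not in COMPLEMENT:
        return None
    return COMPLEMENT[allele]


def classify(ref1, alt1, ref2, alt2):
    """Describe how dataset 2's (ref, alt) at one site relates to dataset 1's."""
    if (ref2, alt2) == (ref1, alt1):
        return "match"
    if (ref2, alt2) == (alt1, ref1):
        # Ref/Alt reversed; strand-flipped A/T and C/G SNPs also land here.
        return "swap"
    flipped = (complement(ref2), complement(alt2))
    if None in flipped:
        return "mismatch"  # not a plain SNP, so no strand comparison is attempted
    if flipped == (ref1, alt1):
        return "flip"  # opposite strand, same Ref/Alt orientation
    if flipped == (alt1, ref1):
        return "flip+swap"  # opposite strand and Ref/Alt reversed
    return "mismatch"
```

For the pair from the log, `classify("G", "A", "A", "G")` returns `"swap"`, which --flip does not address. By contrast, `classify("G", "A", "C", "T")` returns `"flip"`, the case that --flip is designed to correct.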
Thank you for the help,
David