--vcf question about converting vcf into plink

1,258 views
Skip to first unread message

Boo Guy

unread,
Jul 18, 2017, 4:33:51 PM7/18/17
to plink2-users
Hi 

I would like to know if the expectation that for a given sample in the <orginal.vcf> should be the same as the <check.vcf>.  I am trying to use to check to see if the --vcf command produces the expected  results.  


./plink --vcf <orginal.vcf> --make-bed --out <outfile1plink>


./plink --bfile <outfile1plink> --recode vcf-iid --out <check.vcf>


I find that for a file with 1.3million SNPs about 200k are inconsistent, which is a problem I think.  When I say inconsistent I mean 0/0 in one file and 1/1 in the other.

Am I thinking about this wrong?

Thanks 

Guy
using PLINK v1.90b4.4 64-bit (21 May 2017)    

Christopher Chang

unread,
Jul 18, 2017, 6:11:17 PM7/18/17
to plink2-users
plink 1.9 does not normally keep track of REF/ALT allele order; unless you specify --keep-allele-order, it will automatically switch to major vs. minor alleles.  Thus, the raw alleles will still be correct under the workflow you describe (i.e. whenever 0/0 changes to 1/1, the REF/ALT alleles will be swapped), but the REF/ALT labeling will be wrong sometimes.

Possible workarounds:
1. Add --keep-allele-order to your command line whenever you don't want the alleles to be switched to major/minor.
2. Use --a2-allele when exporting the VCF to recover the correct REF/ALT assignments.  (This lets you recover from forgetting --keep-allele-order in an earlier step.)
3. Use plink 2.0, which does keep track of REF/ALT.

Boo Guy

unread,
Jul 19, 2017, 11:58:51 AM7/19/17
to plink2-users
Thanks so much for the reply.  


I followed your advice and used Plink2

./plink2 --vcf <orginal.vcf> --make-bed --out <outfile1plink>
./plink2 --bfile  <outfile1plink> --recode vcf-iid --out <check.vcf>

of the 1.3million sites the only difference between the  <orginal.vcf>. and the <check.vcf> is on the haploid mitochondrial SNPs.  which is great so you have fixed my problem.
Thanks 



I originally was led to check for issues in the conversion I need to generate a number of allele frequency reports and after quite a lot of looking I figured that the F_A and F_U reports could not possibly be correct and went back to check to see if I had screwed up the conversion.

what I had done was using PLINK v1.90b4.4 64-bit (21 May 2017).  I had actually used the --keep-allele-order in my original work.

./plink --vcf <orginal.vcf> --keep-allele-order --make-bed --out <outfile1plink>

./plink --bfile <outfile1plink> --assoc --a2-allele <orginal.vcf> 4 3 '#' --allow-no-sex --out <output.as1>

I need to fix the alleles as I want to make comparisons with other subdivisions of the dataset.  However, the  allele frequency estimates from the above commands were wrong in a way I still cannot understand how or why- any suggestions as I would still like to use plink 1.9 due t the better documentation.

Christopher Chang

unread,
Jul 19, 2017, 4:02:22 PM7/19/17
to plink2-users
Can you clarify what the problem was with the old allele frequencies?  I can't help without enough specifics about what the discrepancy was.

Boo Guy

unread,
Jul 19, 2017, 5:10:08 PM7/19/17
to plink2-users

when I run 

./plink --bfile <outfile1plink> --assoc --snp chr3L_18261731 --a2-allele <orginal.vcf> 4 3 '#' --allow-no-sex --out <output.as1>


I get the frequency of the ALT alleles in the cases and control groups  0.3276   0.7905. respectively (file 2ID_105beagle_merge_chr3L_18261731.as1.assoc ).  However when I look at the VCF file the number should be 0.90 and 0.58 (chr3L_18261731.vcf).  I am using the attached .fam file to specify the case control (2ID_105beagle_merge.fam).

Does this show my problem?

Plink is really a great program I am fairly sure the problem is my lack of  understanding.

Thanks again for the help

Guy
2ID_105beagle_merge_chr3L_18261731.as1.assoc
chr3L_18261731.vcf
2ID_105beagle_merge.fam

Christopher Chang

unread,
Jul 19, 2017, 5:25:59 PM7/19/17
to plink2-users
Okay, I got F_A=0.9074 and F_U=0.58 and different CHISQ/P/OR values than you, so you must be importing the phenotypes in a different manner.

Try the following:
1. plink --vcf <original.vcf> --const-fid 1 --out <outfile1plink>
(The "--const-fid 1" brings the IDs inline with what's in your .fam file.)

2. plink --bfile <outfile1plink> --pheno <pheno.fam> --mpheno 4 --assoc --a2-allele <original.vcf> 4 3 '#' --allow-no-sex --out <output.as.1>
(--mpheno 4 specifies that you want to grab phenotypes from column 4+2=6 rather than the default column 3.)

Boo Guy

unread,
Jul 20, 2017, 4:59:14 PM7/20/17
to plink2-users
I did what you suggested.  The results now look much more consistent with what I expected.

Thanks so much for the rapid help.

Cheers 

Guy


On Tuesday, July 18, 2017 at 10:33:51 PM UTC+2, Boo Guy wrote:
Reply all
Reply to author
Forward
0 new messages