Hello,
I'm interested in converting a vcf file to bgen with plink2. When I run the command
plink2 \
--export bgen-1.2 ref-first bits=8 \
--dosage-erase-threshold 0.006 \
--out bgen/chr14 \
--threads 7 \
--vcf vcf/chr14.vcf.gz dosage=DS
I get the warning message:
Warning: Unphased heterozygous hardcalls in partially-phased variants are
poorly represented with bits=8.
It is necessary to use e.g. --dosage-erase-threshold 0.006 to re-import them
cleanly.
Does this mean that the --dosage-erase-threshold flag I provided was not recognized?
I'm converting to bgen files specifically to make them compatible with the BOLT-LMM program. However, BOLT-LMM seems to reject the bgen files made by plink2, with messages such as:
ERROR: 14:20004949 has Phased = 2 (not 0)
I have bgen files that were converted from vcf with the qctool program, and those do not produce this kind of error, so I am wondering whether the problem lies with plink2, and possibly with the --dosage-erase-threshold flag.
According to the bgen format website:
If Phased=1 the row stores one probability per allele (other than the last allele) per haplotype (e.g. to represent phased data).
If Phased=0 the row stores one probability per possible genotype (other than the 'last' genotype where all alleles are the last allele), to represent unphased data.
Any other value for Phased is an error.
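To make sure I am reading that passage correctly, here is a minimal Python sketch of how I understand the Phased flag to determine the number of probabilities stored per sample. The function name and its defaults (diploid, biallelic) are my own choices for illustration, not part of plink2, qctool, or any bgen library.

```python
from math import comb

def probabilities_stored(phased: int, ploidy: int = 2, n_alleles: int = 2) -> int:
    """Count the probabilities a bgen v1.2 row stores for one sample.

    Hypothetical helper that only illustrates the quoted rule.
    """
    if phased == 1:
        # One probability per allele (other than the last) per haplotype.
        return ploidy * (n_alleles - 1)
    if phased == 0:
        # One probability per possible genotype, other than the last genotype.
        n_genotypes = comb(ploidy + n_alleles - 1, n_alleles - 1)
        return n_genotypes - 1
    # Any other value for Phased is an error.
    raise ValueError(f"Phased = {phased} is not valid (expected 0 or 1)")
```

Under this reading, Phased = 2 should never appear in a valid file, which is why the BOLT-LMM message confuses me.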
Converting vcf to bgen is extremely slow with qctool, which does not implement multithreading for file conversions. Given that I only need the dosages, and not the GPs, I would much rather do the file conversion in plink2 if possible.
Thanks in advance for any help or advice you can offer.