Short form of the question:
When the PLINK 2 documentation discusses "phase" (e.g. "phased dosages"), should that be taken to mean:
a. something related to the linkage of specific alleles across multiple variants
b. the splitting of one value into two, e.g. two 0-1 dosages as opposed to one 0-2 dosage
c. both a and b?
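A minimal Python sketch of interpretation (b), assuming minimac4-style HDS values (one alternate-allele dosage per haplotype, each between 0 and 1). `combined_dosage` is a hypothetical helper, not a PLINK function. Summing the pair gives the single 0-2 dosage. The example also shows why going the other way depends on interpretation (a): swapping which haplotype carries the allele leaves the sum unchanged.

```python
# Sketch only: assumes per-haplotype alternate-allele dosages (as in
# minimac4's HDS field), each in the closed interval [0, 1].


def combined_dosage(hap1, hap2):
    """Collapse two 0-1 haplotype dosages into one 0-2 dosage."""
    for h in (hap1, hap2):
        if not 0.0 <= h <= 1.0:
            raise ValueError(f"haplotype dosage out of range: {h}")
    return hap1 + hap2


# Two phased pairs that differ only in which haplotype carries the allele.
pair_a = (0.75, 0.25)
pair_b = (0.25, 0.75)

# Both collapse to the same 0-2 value, so the pair cannot be recovered
# from the sum alone.
print(combined_dosage(*pair_a), combined_dosage(*pair_b))  # 1.0 1.0
```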
Longer form with more detailed questions:
My understanding (quite possibly wrong!) of the VCF spec is that the '|' versus '/' distinction is meant to be interpreted in conjunction with the PS ("phase set") values, which define which variants are phased together. According to the VCF spec, if no PS is given, all phased genotypes are assumed to belong to a single giant phase set. This latter situation is what I believe one generally expects in results from imputation servers, since my understanding is that they start by phasing the genotypes along entire chromosomes and then impute each haplotype separately. Does PLINK load/preserve detailed phase-set information, or just the one giant phase set?
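The PS rule described above can be sketched in Python for one sample on one chromosome. This is a minimal illustration of the VCF spec's grouping rule. It assumes the records have already been parsed into hypothetical `(pos, gt, ps)` tuples, with `ps` set to `None` when no PS value is present. It says nothing about what PLINK itself stores.

```python
# Sketch of the VCF PS grouping rule for ONE sample on ONE chromosome.
# Input: hypothetical (pos, gt, ps) tuples; ps is None when PS is absent.
from collections import defaultdict

IMPLICIT_SET = "no-PS"  # hypothetical label for the single implicit phase set


def group_phase_sets(records):
    """Map each phase set to the positions of the phased genotypes in it."""
    sets = defaultdict(list)
    for pos, gt, ps in records:
        if "|" not in gt:
            continue  # simplification: calls without '|' are treated as unphased
        # Per the spec, phased genotypes with no PS share one phase set.
        sets[IMPLICIT_SET if ps is None else ps].append(pos)
    return dict(sets)


example = [
    (100, "0|1", None),
    (200, "1|0", None),
    (300, "0/1", None),  # unphased: not in any phase set
    (400, "1|1", 400),
    (500, "0|1", 400),
]
print(group_phase_sets(example))
# {'no-PS': [100, 200], 400: [400, 500]}
```

Under this rule, a VCF with '|' everywhere and no PS field would put every phased genotype of a sample on a chromosome into the single implicit set.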
If I load a minimac4 imputation output using the "dosage=HDS" option, then sure enough, a subsequent --pgen-info run on the resulting fileset reports:
Explicitly phased hardcalls present
Explicitly phased dosages present
which sounds good.
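For comparison, the per-haplotype values can be read straight from the original minimac4 VCF outside PLINK. Below is a minimal sketch. It assumes a tab-separated VCF data line whose FORMAT column includes HDS with comma-separated haplotype dosages. The function `hds_pairs` and the example line (including the ID `rs123`) are hypothetical. This only reads the source VCF and does not show what PLINK can export from the .pgen fileset.

```python
# Sketch: pull per-haplotype dosages (HDS) out of one VCF data line.
# Assumes standard VCF columns:
# CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2 ...


def hds_pairs(vcf_line):
    """Return (variant ID, per-sample HDS tuples, or None where missing)."""
    fields = vcf_line.rstrip("\n").split("\t")
    fmt_keys = fields[8].split(":")
    if "HDS" not in fmt_keys:
        raise ValueError("FORMAT column has no HDS field")
    hds_idx = fmt_keys.index("HDS")
    pairs = []
    for sample in fields[9:]:
        values = sample.split(":")
        if hds_idx >= len(values) or values[hds_idx] == ".":
            pairs.append(None)  # missing, or trailing fields dropped
        else:
            # Diploid calls give two values; haploid calls would give one.
            pairs.append(tuple(float(x) for x in values[hds_idx].split(",")))
    return fields[2], pairs


line = "chr1\t12345\trs123\tA\tG\t.\tPASS\t.\tGT:HDS\t0|1:0.002,0.998\t1|1:0.97,1\n"
print(hds_pairs(line))
# ('rs123', [(0.002, 0.998), (0.97, 1.0)])
```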
Now, if I want to get the two 0-1 dosage values for a particular variant back out into a text file, how would I do that?
--export A seems to output only a combined 0-2 dosage value.
--export AD adds a second column, which the documentation calls the 'dominant' value, but I'm unclear whether/how those two values can be converted into a pair of 0-1 dosages.
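For hardcalls, the additive/dominance coding can be sketched as follows. The sketch assumes the PLINK 1.9-style definition: additive is the count of the counted allele, and dominance is 1 for heterozygous and 0 otherwise. It counts the ALT allele for simplicity. `additive_dominance` is a hypothetical helper. It shows that 0|1 and 1|0 map to the same pair, so at least for hardcalls the two columns carry no phase information. How PLINK 2 fills the dominance column for fractional dosages is not assumed here.

```python
# Sketch of hardcall additive/dominance coding (PLINK 1.9-style definition:
# dominance = 1 for heterozygous, 0 otherwise). Counts the ALT allele here.


def additive_dominance(gt):
    """Map a non-missing biallelic diploid GT like '0|1' or '1/1' to (A, D)."""
    sep = "|" if "|" in gt else "/"
    allele1, allele2 = (int(x) for x in gt.split(sep))
    additive = allele1 + allele2         # 0, 1 or 2 copies of ALT
    dominance = int(allele1 != allele2)  # 1 only for heterozygous calls
    return additive, dominance


for gt in ("0|0", "0|1", "1|0", "1|1"):
    print(gt, additive_dominance(gt))
# 0|0 (0, 0)
# 0|1 (1, 1)
# 1|0 (1, 1)
# 1|1 (2, 0)
```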
I also noticed that --export haps (the name sounded promising) fails with:
Error: '--export haps' must be used with a fully phased dataset.
But since --pgen-info said that phased data was present, I assume the key word in the error is 'fully', which makes me wonder:
Is there some way to list which variants contain phased versus unphased data?
FWIW, I scanned the minimac4 VCF that I had loaded and there are no '/'s anywhere; it's all '|'s, so I'd expect my fileset to be 'fully' phased. But I'm probably misunderstanding something...
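The scan described above can be scripted along these lines. This is a minimal sketch that assumes a plain or gzip-compressed VCF. `unphased_variant_ids`, `scan_vcf`, and the file name in the usage comment are hypothetical. It checks only the VCF text for '/' genotypes and does not reproduce whatever test PLINK applies before '--export haps'.

```python
# Sketch: list variants whose GT field is unphased ('/') in at least one sample.
import gzip


def unphased_variant_ids(lines):
    """Yield CHROM:POS for each VCF data line containing a '/' genotype."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        fmt_keys = fields[8].split(":")
        if "GT" not in fmt_keys:
            continue
        gt_idx = fmt_keys.index("GT")
        for sample in fields[9:]:
            values = sample.split(":")
            # Note: a missing call written './.' also counts as unphased here.
            if gt_idx < len(values) and "/" in values[gt_idx]:
                yield f"{fields[0]}:{fields[1]}"
                break  # one unphased sample is enough to flag the variant


def scan_vcf(path):
    """Return the flagged variants from a plain or gzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return list(unphased_variant_ids(fh))


# Usage (hypothetical file name): print(scan_vcf("chr1.dose.vcf.gz"))
```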
I'm using PLINK v2.00a3.4LM 64-bit Intel (1 Aug 2022).
Thanks for any enlightenment and thanks for PLINK(2)!