I am a very new PLINK user, so please forgive the superficial question.
I am using the --recode A flag together with --bfile (i.e. .bed/.bim/.fam input files), e.g.
plink2 --bfile myfile --recode A --out myrawfile
My understanding is that myrawfile.raw should contain allele dosage information for each variant (from column 7 onwards, one column per variant). I have not provided a reference genome or used any additional flags, so I would expect the coding to be: 0 = homozygous for the major allele; 1 = heterozygous; 2 = homozygous for the minor allele. Is this correct?

I ask because I am getting some unusual results when analysing my .raw file. For example, I see more than 4x as many homozygous-minor genotypes (2s) as heterozygotes (1s), which seems unrealistic for my population. Am I misunderstanding the output of this file? I am using the 850k-variant UK Biobank genotyping array, if that helps at all.

When using the toy dataset from the old PLINK tutorial (PLINK: Whole genome data analysis toolset (harvard.edu)), I noticed the suffixes *_0 and *_1 appended to the variant names (e.g. below). Are these relevant to interpreting the values?

Any help would be appreciated, thank you.
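For anyone checking the same thing, here is a rough Python sketch (a hypothetical helper of my own, not part of PLINK) that tallies the 0/1/2 values in each variant column of a .raw file and reports the frequency of the allele named in each column's suffix. It assumes the default header layout as I understand it: six fixed columns (FID IID PAT MAT SEX PHENOTYPE) followed by one column per variant named <variant ID>_<counted allele>, with NA for missing calls. If the counted allele turns out to be the common allele for most variants, an excess of 2s over 1s would be expected rather than a sign of a problem.

```python
from collections import Counter

# Assumed header layout: FID IID PAT MAT SEX PHENOTYPE, then one column per variant.
N_FIXED_COLUMNS = 6


def summarize_raw(text):
    """Tally dosage values per variant column (hypothetical helper, not a PLINK API)."""
    rows = [line.split() for line in text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    summary = {}
    for offset, column in enumerate(header[N_FIXED_COLUMNS:]):
        # Split on the LAST underscore only, since variant IDs may contain underscores.
        variant_id, counted_allele = column.rsplit("_", 1)
        counts = Counter(row[N_FIXED_COLUMNS + offset] for row in body)
        n0, n1, n2 = counts["0"], counts["1"], counts["2"]
        called = n0 + n1 + n2  # NA (missing) calls are excluded
        # Each value counts copies of the suffix allele, so its frequency is (n1 + 2*n2) / (2*called).
        freq = (n1 + 2 * n2) / (2 * called) if called else float("nan")
        summary[variant_id] = {
            "counted_allele": counted_allele,
            "counts": dict(counts),
            "counted_allele_freq": freq,
        }
    return summary


# Tiny made-up example in the assumed format (not real data).
EXAMPLE_RAW = """FID IID PAT MAT SEX PHENOTYPE rs1_A rs2_G
F1 I1 0 0 1 -9 0 2
F2 I2 0 0 2 -9 1 2
F3 I3 0 0 1 -9 2 NA"""

if __name__ == "__main__":
    for variant, info in summarize_raw(EXAMPLE_RAW).items():
        print(variant, info)
```

On a real file you would pass in the contents of myrawfile.raw instead; for a UK Biobank-sized file, reading it in chunks or with a dataframe library would be more practical than loading it as one string.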
