In plink 1.9, --score gives different output for a sample with missing alleles depending on whether that sample is passed individually or in a batch with other samples.
Here's my toy dataset:
Toy .ped file:
1 ALL_MATCH-XXX 0 0 1 0 A A G G T T
2 HALF_MISMATCH 0 0 1 0 A T G C T A
3 SOME_MISSING_ 0 0 1 0 A A 0 0 0 0
4 ALL_MISSING-X 0 0 1 0 0 0 0 0 0 0
5 2MISMATCH_MIS 0 0 1 0 0 0 C C 0 0
Toy .map file:
4 ak1 120.0974 102736987
1 ak2 35.77473 16365725
3 ak3_not_in_score 131.7472 124757702
Toy .score file:
ak1 A 1
ak2 G 1
ak4_not_in_map G 1
Toy .frq file (all MAFs are 0.1):
CHR SNP A1 A2 MAF NCHROBS
4 ak1 A T 0.1 2
1 ak2 G C 0.1 4
3 ak3_not_in_score T A 0.1 3
5 ak4_not_in_map G C 0.1 4
PLINK output (first with the sum modifier, then without):
FID IID PHENO CNT CNT2 SCORESUM
1 ALL_MATCH-XXX -9 4 4 4
2 HALF_MISMATCH -9 4 2 2
3 SOME_MISSING_ -9 2 2 2.2
4 ALL_MISSING-X -9 0 0 0.4
5 2MISMATCH_MIS -9 2 0 0.2
FID IID PHENO CNT CNT2 SCORE
1 ALL_MATCH-XXX -9 4 4 1
2 HALF_MISMATCH -9 4 2 0.5
3 SOME_MISSING_ -9 2 2 0.55
4 ALL_MISSING-X -9 0 0 0.1
5 2MISMATCH_MIS -9 2 0 0.05
Here's the command I ran:
plink2 --ped {} --map {} --read-freq {} --score [sum]
Consider sample 3 (SOME_MISSING_).
When I pass this sample in a batch (any 2 or more peds), the score is as expected: (2*1 + 2*0.1*1)/4 = 2.2/4 = 0.55.
The missing ak2 genotype is filled in from my .frq file, as expected.
However, when I pass it as an individual sample to plink2, I get a different score: it sums over the non-missing alleles only and ignores my .frq file.
Here's the output for that:
FID IID PHENO CNT CNT2 SCORESUM
3 SOME_MISSING_ -9 2 2 2
FID IID PHENO CNT CNT2 SCORE
3 SOME_MISSING_ -9 2 2 1
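To make the two behaviors concrete, here is a minimal Python sketch of the arithmetic as I understand it. It is not plink's actual implementation. The names (SCORE, FREQ, GENOTYPES, toy_score) are made up for illustration, and the rule that missing genotypes drop out of the denominator when imputation is off is my assumption, inferred from the single-sample output above.

```python
# Toy data copied from the post. Only SNPs present in both the .map and
# the .score file (ak1, ak2) contribute to the score.
SCORE = {"ak1": ("A", 1.0), "ak2": ("G", 1.0)}  # SNP -> (scored allele, weight)
FREQ = {"ak1": 0.1, "ak2": 0.1}                 # scored-allele frequency from the .frq file
GENOTYPES = {
    "ALL_MATCH-XXX": {"ak1": "AA", "ak2": "GG"},
    "HALF_MISMATCH": {"ak1": "AT", "ak2": "GC"},
    "SOME_MISSING_": {"ak1": "AA", "ak2": "00"},
    "ALL_MISSING-X": {"ak1": "00", "ak2": "00"},
    "2MISMATCH_MIS": {"ak1": "00", "ak2": "CC"},
}


def toy_score(genos, impute):
    """Return (score sum, per-allele average) for one sample.

    impute=True : a missing genotype contributes 2 * frequency * weight
                  and still counts 2 alleles in the denominator.
    impute=False: a missing genotype is skipped entirely (assumption).
    """
    total = 0.0
    denom = 0
    for snp, (allele, weight) in SCORE.items():
        genotype = genos[snp]
        if genotype == "00":  # missing call
            if impute:
                total += 2 * FREQ[snp] * weight
                denom += 2
            continue
        total += genotype.count(allele) * weight
        denom += 2
    average = total / denom if denom else float("nan")
    return total, average


if __name__ == "__main__":
    for iid, genos in GENOTYPES.items():
        print(iid, toy_score(genos, impute=True), toy_score(genos, impute=False))
```

With impute=True this gives the batch values (2.2 and 0.55 for SOME_MISSING_, up to floating-point rounding). With impute=False it gives the single-sample values (2 and 1).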
Why does it not calculate the score for missing values when I pass a sample individually?
I also tried this with other samples that have missing values, and the single-sample run again skipped the missing-value imputation from my .frq file.
Is this a bug?
I also found another issue:
When I pass a single ped with all missing values, I get this error: Error: No valid entries in --score file.
This is an odd error message, since my .score file is the same as above. plink just does not seem to like a .ped with all missing values.