--score: single-sample vs. multi-sample output difference in plink 1.9


AR

Oct 18, 2019, 7:56:51 PM
to plink2-dev

In plink 1.9, I observed that the --score output for a sample with missing alleles differs depending on whether the sample is passed individually or in a batch with other samples.

Here's my toy dataset:


Toy .ped file:

1 ALL_MATCH-XXX 0 0 1 0   A A   G G   T T
2 HALF_MISMATCH 0 0 1 0   A T   G C   T A
3 SOME_MISSING_ 0 0 1 0   A A   0 0   0 0
4 ALL_MISSING-X 0 0 1 0   0 0   0 0   0 0
5 2MISMATCH_MIS 0 0 1 0   0 0   C C   0 0

Toy .map file:

4 ak1              120.0974 102736987
1 ak2              35.77473 16365725
3 ak3_not_in_score 131.7472 124757702

Toy .score file:

ak1            A 1
ak2            G 1
ak4_not_in_map G 1

Toy .frq file (all frequencies are 0.1):

CHR SNP              A1 A2 MAF NCHROBS
4   ak1              A  T  0.1 2
1   ak2              G  C  0.1 4
3   ak3_not_in_score T  A  0.1 3
5   ak4_not_in_map   G  C  0.1 4

PLINK OUTPUT:

 FID             IID  PHENO    CNT   CNT2 SCORESUM
   1   ALL_MATCH-XXX     -9      4      4        4
   2   HALF_MISMATCH     -9      4      2        2
   3   SOME_MISSING_     -9      2      2      2.2
   4   ALL_MISSING-X     -9      0      0      0.4
   5   2MISMATCH_MIS     -9      2      0      0.2

 FID             IID  PHENO    CNT   CNT2    SCORE
   1   ALL_MATCH-XXX     -9      4      4        1
   2   HALF_MISMATCH     -9      4      2      0.5
   3   SOME_MISSING_     -9      2      2     0.55
   4   ALL_MISSING-X     -9      0      0      0.1
   5   2MISMATCH_MIS     -9      2      0     0.05


Here's the command I ran:
plink2 --ped {} --map {} --read-freq {} --score [sum]

Consider sample 3, SOME_MISSING_.
When I pass this sample in a batch (of two or more samples), the score is as expected: (2*1 + 2*0.1*1)/4 = 2.2/4 = 0.55.
It's using my .frq file as expected.
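
For readers following the arithmetic, here is a minimal Python sketch that reproduces the SCORESUM and SCORE columns of the batch output above. It is not plink's implementation. It assumes that a missing genotype contributes 2 * (frequency of the scored allele) and that the average is taken over 2 * (number of usable score entries); both assumptions are inferred from the table, and the names `score_sum` and `score_avg` are made up for illustration.

```python
# Score entries usable in this dataset: variant -> (scored allele, weight).
# ak4_not_in_map is left out because it has no entry in the .map file.
score_file = {"ak1": ("A", 1.0), "ak2": ("G", 1.0)}

# Frequency of the scored allele, taken from the toy .frq file.
freqs = {"ak1": 0.1, "ak2": 0.1}

# Genotypes from the toy .ped file; "0" marks a missing allele.
samples = {
    "ALL_MATCH-XXX": {"ak1": ("A", "A"), "ak2": ("G", "G")},
    "HALF_MISMATCH": {"ak1": ("A", "T"), "ak2": ("G", "C")},
    "SOME_MISSING_": {"ak1": ("A", "A"), "ak2": ("0", "0")},
    "ALL_MISSING-X": {"ak1": ("0", "0"), "ak2": ("0", "0")},
    "2MISMATCH_MIS": {"ak1": ("0", "0"), "ak2": ("C", "C")},
}


def score_sum(calls):
    """Weighted allele count, with missing genotypes imputed from freqs (assumed rule)."""
    total = 0.0
    for variant, (allele, weight) in score_file.items():
        a1, a2 = calls[variant]
        if a1 == "0" or a2 == "0":
            dosage = 2 * freqs[variant]  # expected count of the scored allele
        else:
            dosage = (a1 == allele) + (a2 == allele)
        total += dosage * weight
    return total


def score_avg(calls):
    """Score sum divided by the total number of allele slots (assumed denominator)."""
    return score_sum(calls) / (2 * len(score_file))


for iid, calls in samples.items():
    print(iid, round(score_sum(calls), 4), round(score_avg(calls), 4))
```

Under these assumptions, SOME_MISSING_ comes out at 2.2 and 0.55, matching the batch run.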

However, when I pass it to plink2 as an individual sample, I get a different score: it sums only the non-missing alleles and ignores my .frq file.

Here's the output for that:
 FID             IID  PHENO    CNT   CNT2 SCORESUM
   3   SOME_MISSING_     -9      2      2        2

 FID             IID  PHENO    CNT   CNT2    SCORE
   3   SOME_MISSING_     -9      2      2        1

Why does it not calculate the score for missing values when I pass a sample individually?
I also tried this with different samples that have missing values, but it still ignores the missing-value imputation from my .frq file.

Is this a bug?

Also I identified another issue:
When I pass a single ped with all missing values, I get this error: Error: No valid entries in --score file.
This is an odd error message as my .score file is the same as above. It just does not like a .ped with all missing values.

AR

Oct 18, 2019, 7:57:33 PM
to plink2-dev
Sorry about the long post!

Christopher Chang

Oct 18, 2019, 8:48:49 PM
to plink2-dev
Not a bug.  plink 1.9 matches plink 1.07's behavior in your primary example; I cannot change this without breaking plink 1.9's backward compatibility promise.  The problems go away when you use a .bed + .bim + .fam fileset which keeps track of what each variant's alleles are, instead of the obsolete .ped + .map format.

There could be more context around the error, but plink 1.9 has been in maintenance mode since 2016, so non-bugfix changes are only going into plink 2.0.  And plink 2.0 --score already prints "Warning: 1 --score file entry was skipped due to a missing variant ID, and 2 were skipped due to mismatching allele codes." before the final "Error: No valid variants in --score file." here.


AR

Oct 21, 2019, 6:59:42 PM
to plink2-dev
Thanks for your response. I converted to a .bed + .bim + .fam fileset as you suggested, but I'm still not getting the expected output with plink 1.9. Could you tell me what I'm doing wrong?

toy.ped:
6 X_MISSIN_MALE 0 0 1 0   A A   0 0   0 0

toy.map:
4 ak1 120.0974 102736987
X ak2 35.77473 16365725
3 ak3_not_in_score 131.7472 124757702

I ran this command:
plink2 --file toy --make-bed --out toy

This gave me the following .fam .bed and .bim files:
toy.fam
6 X_MISSIN_MALE 0 0 1 0

toy.bim:
3 ak3_not_in_score 131.7472 124757702 0 0
4 ak1 120.0974 102736987 0 A
23 ak2 35.77473 16365725 0 0


Then I ran the score command: plink2 --bfile toy --read-freq toy.frq --score toy.score --out toy

toy.frq:
CHR SNP A1 A2 MAF NCHROBS
4 ak1 A T 0.1 2
X ak2 G C 0.1 4
3 ak3_not_in_score T A 0.1 3
5 ak4_not_in_map G C 0.1 4

toy.score:
ak1             A               1
ak2             G               1
ak4_not_in_map  G               1


I get this .profile:
 FID             IID  PHENO    CNT   CNT2    SCORE
   6   X_MISSIN_MALE     -9      2      2        1

Clearly it's still not using my .frq file.

Christopher Chang

Oct 21, 2019, 7:16:43 PM
to plink2-dev
That's because you created a .bim file lacking most of the alleles.  If you start with the .ped file at the beginning of this thread, convert to .bed immediately, AND NEVER USE .PED AGAIN, you will never lose the allele information and everything will work.
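
To illustrate why the conversion order matters, the Python sketch below collects the allele codes that can be recovered from a .ped file, which is all a converter has to go on when writing the .bim. It is a simplification and not plink's code: the helper names are hypothetical, and it assumes a score entry is only usable when its allele is among those recorded for the variant.

```python
# Scored allele per variant, from toy.score (ak4_not_in_map omitted: not in the .map).
SCORE_ALLELES = {"ak1": "A", "ak2": "G"}

# Variant order as listed in the .map file.
VARIANTS = ["ak1", "ak2", "ak3_not_in_score"]


def observed_alleles(ped_rows):
    """Collect the non-missing allele codes seen at each variant across all rows."""
    seen = {variant: set() for variant in VARIANTS}
    for row in ped_rows:
        calls = row.split()[6:]  # first six columns: FID IID PAT MAT SEX PHENO
        for i, variant in enumerate(VARIANTS):
            for allele in calls[2 * i: 2 * i + 2]:
                if allele != "0":
                    seen[variant].add(allele)
    return seen


def usable_score_variants(ped_rows):
    """Score entries whose allele is recoverable from the data (assumed matching rule)."""
    seen = observed_alleles(ped_rows)
    return [v for v, allele in SCORE_ALLELES.items() if allele in seen[v]]


# The five-sample .ped from the start of the thread.
batch = [
    "1 ALL_MATCH-XXX 0 0 1 0 A A G G T T",
    "2 HALF_MISMATCH 0 0 1 0 A T G C T A",
    "3 SOME_MISSING_ 0 0 1 0 A A 0 0 0 0",
    "4 ALL_MISSING-X 0 0 1 0 0 0 0 0 0 0",
    "5 2MISMATCH_MIS 0 0 1 0 0 0 C C 0 0",
]

# The single-sample .ped from the follow-up message.
single = ["6 X_MISSIN_MALE 0 0 1 0 A A 0 0 0 0"]

print(usable_score_variants(batch))
print(usable_score_variants(single))
```

With the full batch, both ak1 and ak2 keep their alleles. With the single sample, ak2 has no recorded alleles, so under the stated assumption only ak1 can be scored, which is consistent with the SCORE of 1 (2/2) in the .profile above.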