Comparison between plink --make-king-table and king --kinship

Wei-Min Chen

Jun 17, 2017, 3:08:07 AM

to plink2-users

Overall, the KING-robust (between-family) algorithm is very well implemented in PLINK. Great job! However, I did observe a couple of subtle differences:

1. KING uses SNPs on both the autosomes and the XY chromosome, while PLINK uses autosomes only. This sometimes results in small differences between KING and PLINK, especially in NSNP. When no XY-chromosome SNPs are available, PLINK and KING give identical estimates.

2. The PLINK implementation is really fast, almost twice as fast as my latest unreleased KING version when the sample size is ~7000 (1 minute vs. nearly 2 minutes). However, when the sample size is substantially larger (N=72,000), PLINK is unexpectedly much slower than KING: it took PLINK 6 hours, while KING took < 2 hours on the same dataset. This is not really a bug, but the developer may want to keep the scalability issue in mind for future versions. Here is the computational time comparison between PLINK and KING:

Start time: Fri Jun 16 16:45:01 2017
258299 MB RAM detected; reserving 129149 MB for main workspace.
Using up to 64 threads (change this with --threads).
71965 samples (37581 females, 34384 males; 71965 founders) loaded from
../data/meganofam.fam.
179594 variants loaded from ../data/meganofam.bim.
1 binary phenotype loaded (25861 cases, 44955 controls).
71965 samples (37581 females, 34384 males; 71965 founders) remaining after main
filters.
25861 cases and 44955 controls remaining after main filters.
179594 variants remaining after main filters.
Excluding 2333 variants on non-autosomes from KING-robust calculation.
--make-king-table: 177261 variants processed.
--make-king-table: Results written to meganofam.kin0 .
End time: Fri Jun 16 22:41:33 2017

Relationship inference across families starts at Fri Jun 9 13:44:16 2017
32 CPU cores are used.
ends at Fri Jun 9 15:36:00 2017
Between-family kinship data saved in file meganofam.kin0
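
For reference, the elapsed times implied by the two logs above can be worked out with a short script. This is only a sketch: the timestamps are copied from the log lines, and the helper name `elapsed` is made up for illustration.

```python
from datetime import datetime

# Timestamp layout used by both logs, e.g. "Fri Jun 16 16:45:01 2017".
FMT = "%a %b %d %H:%M:%S %Y"

def elapsed(start: str, end: str):
    """Return the wall-clock time between two log timestamps."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)

# Start/end lines copied from the PLINK and KING logs.
plink = elapsed("Fri Jun 16 16:45:01 2017", "Fri Jun 16 22:41:33 2017")
king = elapsed("Fri Jun 9 13:44:16 2017", "Fri Jun 9 15:36:00 2017")

print("PLINK:", plink)                   # 5:56:32
print("KING: ", king)                    # 1:51:44
print("ratio:", round(plink / king, 2))  # 3.19
```

Note that the logs report different thread counts (64 threads for PLINK, 32 CPU cores for KING), so the ratio is wall-clock time only, not a like-for-like comparison.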

Christopher Chang

Jun 20, 2017, 1:03:40 PM

to plink2-users

Hi,

1. Can you clarify how you believe the X and Y chromosomes should be handled?

2. Yes, the current algorithm is not cache-oblivious, so it's not too surprising that it gives up a factor of 3 on very large datasets. I'll look into some improvements for this case (transposing fewer variants at a time when 1920 can't fit in L3 cache, and maybe falling back on Intel MKL for sufficiently large n).
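
To give a rough sense of why a block of 1920 variants might stop fitting in cache at this sample size, here is a back-of-the-envelope sketch. It assumes genotypes are packed at 2 bits each (as in the .bed format); the actual in-memory layout and working set of --make-king-table may well differ, and the function name `packed_block_bytes` is hypothetical.

```python
def packed_block_bytes(n_samples: int, n_variants: int = 1920,
                       bits_per_genotype: int = 2) -> int:
    """Bytes needed for an n_variants x n_samples block of packed genotypes.

    Assumption: 2 bits per genotype, no padding. This is an illustration,
    not a description of PLINK's real internal buffers.
    """
    bits = n_samples * n_variants * bits_per_genotype
    return (bits + 7) // 8  # round up to whole bytes

# Sample sizes from this thread: ~7000 and 71965.
for n in (7000, 71965):
    size = packed_block_bytes(n)
    print(f"{n:>6} samples: {size:>10} bytes = {size / 2**20:.1f} MiB")
```

Under these assumptions the block grows from about 3.2 MiB at 7000 samples to about 32.9 MiB at 71965 samples, which can be compared against the L3 cache size of the machine in question.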

Wei-Min Chen

Jun 20, 2017, 1:30:37 PM

to plink2-users

I meant that chromosome XY (chromosome 25), the pseudo-autosomal region of X, can be treated as an autosome. I did not mean the X and Y chromosomes, which require a different algorithm. On the other hand, it is also OK to simply ignore chromosome XY, since the number of SNPs there is small.

Christopher Chang

Jun 20, 2017, 1:40:25 PM

to plink2-users

Ah, ok. XY/PAR1/PAR2 should already be included in the --make-king/--make-king-table computation.

Wei-Min Chen

Jun 20, 2017, 1:49:31 PM

to plink2-users

Well, I'll check again for occasional discrepancies (likely my latest unreleased version has a bug). I'll give an update here in a few days.
