How does PCA in PLINK handle missing data?


Ollie White

Jun 20, 2020, 8:11:20 AM
to plink2-users
Hello, 

Does anyone know how PLINK handles missing data in a PCA? I think it replaces missing genotypes with an average value, but I can't find this referenced anywhere.

Best wishes
Ollie

Christopher Chang

Jun 20, 2020, 8:21:27 AM
to plink2-users
From the --pca documentation: "The randomized algorithm always mean-imputes missing genotype calls. For comparison purposes, you can use the 'meanimpute' modifier to request this behavior for the standard computation."
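To make mean imputation concrete, here is a minimal Python sketch. It is not PLINK's code. It assumes genotypes are 0/1/2 allele dosages with `None` for a missing call, and that each variant is standardized as (x - 2p) / sqrt(2p(1 - p)) with p estimated from the observed calls. PLINK's exact scaling and allele-frequency source may differ, and the function name is hypothetical.

```python
import math


def mean_imputed_grm(genotypes):
    """Build a GRM after replacing each missing call with its variant's mean dosage.

    genotypes: one row per sample, each a list of 0/1/2 dosages or None (missing).
    Hypothetical helper for illustration; not PLINK's implementation.
    """
    n_samples = len(genotypes)
    n_variants = len(genotypes[0])
    # Standardized genotypes, one list per sample; only usable variants are kept.
    z = [[] for _ in range(n_samples)]
    for j in range(n_variants):
        observed = [row[j] for row in genotypes if row[j] is not None]
        if not observed:
            continue  # variant missing in every sample: no information
        mean = sum(observed) / len(observed)
        p = mean / 2.0                 # allele frequency from observed calls
        var = 2.0 * p * (1.0 - p)      # HWE-based variance (assumed scaling)
        if var == 0.0:
            continue  # monomorphic variant: cannot be standardized
        sd = math.sqrt(var)
        for i in range(n_samples):
            x = genotypes[i][j]
            if x is None:
                x = mean  # mean imputation: standardized value becomes 0
            z[i].append((x - mean) / sd)
    m = len(z[0])
    if m == 0:
        raise ValueError("no usable variants")
    # Every entry averages over the same m variants, so G = Z Z^T / m is PSD.
    return [[sum(za * zb for za, zb in zip(z[a], z[b])) / m
             for b in range(n_samples)]
            for a in range(n_samples)]
```

For example, with `toy = [[0, 2], [None, 1], [2, 1]]`, sample 1's imputed call standardizes to 0 but is still counted in the denominator, so its diagonal entry comes out at about 0.125 (half of what its single observed variant contributes). That shrinkage toward zero is the usual side effect of mean imputation.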

The standard computation is based on a GRM where the (sample A, sample B) entry is based on just the variants where neither sample A nor sample B have a missing genotype.  In theory, this matrix is not guaranteed to remain positive semidefinite, but in practice that isn't a problem unless your dataset is borked for other reasons.
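The pairwise construction described here, and why it can lose positive semidefiniteness, can be illustrated with a small Python sketch. Like any such sketch it is not PLINK's implementation: it assumes 0/1/2 dosages with `None` for missing calls, per-variant standardization (x - 2p) / sqrt(2p(1 - p)) with p from observed calls, and hypothetical function names. The check uses the fact that every 2x2 principal minor of a PSD matrix is nonnegative, so a pair with G[a][a] * G[b][b] < G[a][b]^2 proves the matrix is not PSD.

```python
import math


def pairwise_complete_grm(genotypes):
    """GRM where entry (a, b) averages only over variants observed in both a and b.

    genotypes: one row per sample, each a list of 0/1/2 dosages or None (missing).
    Hypothetical helper for illustration; not PLINK's implementation.
    """
    n_samples = len(genotypes)
    n_variants = len(genotypes[0])
    # Per-variant (mean, SD) from observed calls; None marks unusable variants.
    stats = []
    for j in range(n_variants):
        observed = [row[j] for row in genotypes if row[j] is not None]
        if not observed:
            stats.append(None)
            continue
        mean = sum(observed) / len(observed)
        p = mean / 2.0
        var = 2.0 * p * (1.0 - p)  # HWE-based variance (assumed scaling)
        stats.append((mean, math.sqrt(var)) if var > 0.0 else None)

    grm = [[float("nan")] * n_samples for _ in range(n_samples)]
    for a in range(n_samples):
        for b in range(a, n_samples):
            total, count = 0.0, 0
            for j, st in enumerate(stats):
                xa, xb = genotypes[a][j], genotypes[b][j]
                if st is None or xa is None or xb is None:
                    continue  # skip variants missing in either sample
                mean, sd = st
                total += ((xa - mean) / sd) * ((xb - mean) / sd)
                count += 1
            if count:
                # Each entry has its own denominator, which is what breaks PSD.
                grm[a][b] = grm[b][a] = total / count
    return grm


def negative_2x2_minors(grm):
    """Return pairs (a, b) with G[a][a] * G[b][b] < G[a][b] ** 2.

    Any such pair proves the matrix is not positive semidefinite
    (passing this check is necessary but not sufficient for PSD).
    """
    n = len(grm)
    return [(a, b) for a in range(n) for b in range(a + 1, n)
            if grm[a][a] * grm[b][b] < grm[a][b] ** 2]
```

In the contrived input `[[2, 1], [2, None], [0, 1]]`, sample 1's diagonal and its entry with sample 0 are each computed from variant 0 alone, while sample 0's diagonal is diluted by a second variant, so the pair (0, 1) is flagged. Realistic missingness patterns rarely produce this, which matches the point above that it is only a problem for otherwise broken data.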

Ollie White

Jun 20, 2020, 8:47:11 AM
to plink2-users
Hi Christopher, 

Many thanks for the link and reply; that is just the detail I needed.

Best wishes
Ollie