Order of commands in plink2


Maryiam Shöâeè

Dec 11, 2015, 2:26:41 PM
to plink2-users


I have just run simple QC measures on some data that has already been cleaned of heterozygous haploids etc.

Running 

--geno 0.1 
--maf 0.01
--mind 0.1 

on the data filters out 97% of my samples (383 out of 396 removed).

I don't think the data is so poor that it fails basic QC so badly. 

So I tried running each command by itself rather than all in one run, starting with --maf, then --geno, and finally --mind. I only lost one individual by doing this.

Another version, i.e. running --maf first and then running --geno and --mind together on the new file, gives


Error: All people removed due to missing genotype data (--mind).


I don't quite understand why this happens. Can someone explain what plink is actually doing that gives such dramatically different results when the order of the commands changes?


Thanks 


Maryam 

Christopher Chang

Dec 11, 2015, 3:05:43 PM
to plink2-users
Hi,

Can you post the .log files for your runs?  Thanks.

Maryiam Shöâeè

Dec 12, 2015, 4:49:45 PM
to plink2-users
Hi 


Sorry for the delay.

Here are the log files. 


Thanks for all your help


Maryam 
maf.log
mind.log
combined QC.log
geno.log

Christopher Chang

Dec 13, 2015, 1:37:55 PM
to plink2-users
It looks like your dataset can be divided into two components: a small core of ~137k variants with nearly complete genotype information, and a collection of ~384k variants with a ~17% missing call rate.

By default, plink applies sample filters before variant filters.  (See https://www.cog-genomics.org/plink2/order for details.)  The average missing call rate in your dataset is around 13%, so "--mind 0.1" causes almost all your samples to be thrown out.
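As a rough check on the ~13% figure, the weighted average can be recomputed from the approximate numbers quoted above. This is only a sketch: the variant counts and the 17% rate are the rounded values from this reply, and treating the "nearly complete" core as exactly 0% missing is an assumption.

```python
# Approximate figures quoted in the reply; not exact counts from the logs.
core_variants = 137_000    # variants with nearly complete genotype information
noisy_variants = 384_000   # variants with a ~17% missing call rate
core_missing = 0.0         # assumption: treat "nearly complete" as 0% missing
noisy_missing = 0.17

total_variants = core_variants + noisy_variants

# Dataset-wide missing call rate is the variant-count-weighted average.
avg_missing = (
    core_variants * core_missing + noisy_variants * noisy_missing
) / total_variants

print(f"average missing call rate: {avg_missing:.3f}")
print("above the --mind 0.1 threshold:", avg_missing > 0.1)
```

Since a typical sample then sits above the 0.1 cutoff before any variants have been removed, a sample filter that runs first discards most of the cohort.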

Manually running --geno first makes sense here.
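The effect of filter order can be reproduced on a toy dataset. The sketch below does not call plink; it simulates a scaled-down missingness matrix (200 hypothetical samples, 137 complete variants, 384 variants with 17% missing calls drawn independently) and applies a sample filter and a variant filter in both orders. The functions are illustrative stand-ins for --mind and --geno, and removing anything whose missing rate exceeds the threshold is how those flags are assumed to behave here.

```python
import random

random.seed(0)  # fixed seed so the toy run is repeatable

N_SAMPLES = 200
N_CORE = 137          # scaled-down stand-in for the ~137k complete variants
N_NOISY = 384         # scaled-down stand-in for the ~384k noisy variants
NOISY_MISSING = 0.17  # per-call missing probability in the noisy variants
THRESHOLD = 0.1       # used for both the sample and the variant filter

# missing[s][v] is True when sample s has no call at variant v.
missing = [
    [False] * N_CORE + [random.random() < NOISY_MISSING for _ in range(N_NOISY)]
    for _ in range(N_SAMPLES)
]

def filter_samples(samples, variants):
    """Keep samples whose missing rate over `variants` is <= THRESHOLD."""
    if not variants:
        return []
    return [s for s in samples
            if sum(missing[s][v] for v in variants) / len(variants) <= THRESHOLD]

def filter_variants(samples, variants):
    """Keep variants whose missing rate over `samples` is <= THRESHOLD."""
    if not samples:
        return []
    return [v for v in variants
            if sum(missing[s][v] for s in samples) / len(samples) <= THRESHOLD]

all_samples = list(range(N_SAMPLES))
all_variants = list(range(N_CORE + N_NOISY))

# Sample filter first, then variant filter (the default order described above).
s_first = filter_samples(all_samples, all_variants)
v_after = filter_variants(s_first, all_variants)

# Variant filter first, then sample filter (the manual two-step approach).
v_first = filter_variants(all_samples, all_variants)
s_after = filter_samples(all_samples, v_first)

print("sample filter first :", len(s_first), "samples kept")
print("variant filter first:", len(s_after), "samples kept")
```

In this toy setup the variant-first order removes most of the noisy variants before any sample is judged, so far more samples survive than in the sample-first order.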

Maryiam Shöâeè

Dec 16, 2015, 5:17:27 AM
to plink2-users
Thanks for that! 


How did you figure out the missing rate of the 384k variants from the log files?

Christopher Chang

Dec 16, 2015, 11:36:06 AM
to plink2-users
In addition to sample and variant counts, the log includes "Total genotyping rate" (i.e. fraction of non-missing calls).
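One plausible way to back out a per-component missing rate from those logged numbers is sketched below. It is not necessarily the exact calculation used in this thread. The inputs are the rounded values mentioned earlier (an overall missing rate of about 13%, and roughly 137k and 384k variants), and the helper name `component_missing_rate` is hypothetical.

```python
def component_missing_rate(total_genotyping_rate, n_core, n_noisy, core_missing=0.0):
    """Solve the weighted average for the missing rate of the noisy variants.

    total_genotyping_rate: fraction of non-missing calls over all variants.
    core_missing: assumed missing rate of the well-genotyped variants.
    """
    total_missing = 1.0 - total_genotyping_rate
    n_total = n_core + n_noisy
    # total_missing * n_total = core_missing * n_core + noisy_missing * n_noisy
    return (total_missing * n_total - core_missing * n_core) / n_noisy

# Rounded figures from the thread; a ~13% missing rate is a ~0.87 genotyping rate.
noisy_missing = component_missing_rate(0.87, n_core=137_000, n_noisy=384_000)
print(f"estimated missing rate of the noisy variants: {noisy_missing:.3f}")
```

With these rounded inputs the estimate comes out close to the ~17% quoted earlier in the thread.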