"nan" under Z0, Z1, Z2, PI_HAT in outputs of pairwise IBD estimation

Jack

Mar 7, 2016, 3:46:28 PM
to plink2-users

When pairwise IBD estimation is run with the "--genome" command (plink 1.90), the output file, "*.genome", contains "nan" values in the Z0 ... PI_HAT columns for some rows. Is that normal? What would cause those values, and how do we get rid of them?

Thanks,
Jack

Christopher Chang

Mar 8, 2016, 1:57:39 AM
Hi,

(i) Have you filtered low-MAF variants out (with e.g. --maf)?
(ii) Do you get the same results with plink 1.07?
(iii) Can you post the .log file for your run?

Jack

Mar 8, 2016, 9:36:53 AM
No MAF filter. Could it be caused by two samples having no shared non-missing loci?

The log file from 1.90:

Random number seed: 1457446306
48256 MB RAM detected; reserving 24128 MB for main workspace.
49244 variants loaded from .bim file.
10913 people (0 males, 0 females, 10913 ambiguous) loaded from .fam.
Ambiguous sex IDs written to dsp.qc.wes.10913.concordant_indel_IBD.nosex .
Using up to 15 threads (change this with --threads).
Calculating allele frequencies... done.
Total genotyping rate is 0.971105.
49244 variants and 10913 people pass filters and QC.
Note: No phenotypes present.
IBD calculations complete.
Finished writing dsp.qc.wes.10913.concordant_indel_IBD.genome .

The log file from 1.07:
Options in effect:
        --bfile ../indel/combine/adsp.qc.wes.10913.concordant
        --genome                                            
        --min 0.2                                           
        --out plink107_adsp.qc.wes.10913.concordant_indel_IBD
** For gPLINK compatibility, do not use '.' in --out **
Reading map (extended format) from [ ../indel/combine/adsp.qc.wes.10913.concordant.bim ]
49244 markers to be included from [ ../indel/combine/adsp.qc.wes.10913.concordant.bim ] 
Reading pedigree information from [ ../indel/combine/adsp.qc.wes.10913.concordant.fam ] 
10913 individuals read from [ ../indel/combine/adsp.qc.wes.10913.concordant.fam ]       
0 individuals with nonmissing phenotypes                                                
Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)                                   
Missing phenotype value is also -9                                                      
0 cases, 0 controls and 10913 missing
0 males, 0 females, and 10913 of unspecified sex
Warning, found 10913 individuals with ambiguous sex codes
These individuals will be set to missing ( or use --allow-no-sex )
Writing list of these individuals to [ plink107_adsp.qc.wes.10913.concordant_indel_IBD.nosex ]
Reading genotype bitfile from [ ../indel/combine/adsp.qc.wes.10913.concordant.bed ]
Detected that binary PED file is v1.00 SNP-major mode
Before frequency and genotyping pruning, there are 49244 SNPs
10913 founders and 0 non-founders found
Total genotyping rate in remaining individuals is 0.971105
0 SNPs failed missingness test ( GENO > 1 )
0 SNPs failed frequency test ( MAF < 0 )
After frequency and genotyping pruning, there are 49244 SNPs
After filtering, 0 cases, 0 controls and 10913 missing
After filtering, 0 males, 0 females, and 10913 of unspecified sex
Converting data to Individual-major format
Writing whole genome IBS/IBD information to [ plink107_adsp.qc.wes.10913.concordant_indel_IBD.genome ]
Filtering output to include pairs with ( 0.2 <= PI-HAT <= 1 )
IBD(g) calculation: 0 of 59541328
ERROR: No nonmissing markers for individuals A-ACT-AC000004-BL-UPN-15872 A-ACT-AC000004-BL-UPN-15872 - A-ACT-AC000189-BL-UWA-11300 A-ACT-AC000189-BL-UWA-11300
 

Christopher Chang

Mar 8, 2016, 10:40:31 AM
Ah, yes, that would explain it; a bunch of statistics would have zero denominators in that case.
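
To make the zero-denominator point concrete, here is a minimal Python sketch of an IBS-distance style statistic (the DST column is documented as (IBS2 + 0.5*IBS1) / number of marker pairs). This is only an illustration, not plink's actual code; the function name and the genotype encoding (allele counts 0/1/2, None for a missing call) are assumptions made for the example.

```python
import math

def ibs_distance(geno_a, geno_b):
    """IBS distance over markers called in both samples; NaN when there are none.

    Genotypes are allele counts (0, 1, 2), with None for a missing call.
    """
    ibs1 = ibs2 = n_shared = 0
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue  # marker is missing in at least one of the two samples
        n_shared += 1
        diff = abs(a - b)
        if diff == 0:
            ibs2 += 1  # both alleles shared
        elif diff == 1:
            ibs1 += 1  # one allele shared
    if n_shared == 0:
        # Zero denominator: floating-point 0/0 is what shows up as "nan"
        return math.nan
    return (ibs2 + 0.5 * ibs1) / n_shared

print(ibs_distance([0, 1, None, 2], [0, 2, 1, None]))  # 0.75
print(ibs_distance([0, None], [None, 1]))              # nan
```

The second pair has no marker genotyped in both samples, which is the situation the plink 1.07 error message above describes.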

Jack

Mar 8, 2016, 11:28:33 AM
Just saw a new case in plink 1.90:
After modifying the *.fam file by inserting pedigree information, the calculated Z0 ... PI_HAT results become "nan" in the *.genome output. There were no "nan" values before the *.fam change, and no error message is printed.
Any suggestions?
Thanks,

Christopher Chang

Mar 8, 2016, 11:59:22 AM
plink only takes "founders" (i.e. those without any parents in the dataset) into account when estimating minor allele frequencies, since counting both parents and their children introduces substantial bias (they obviously aren't independent draws from the population...).  The z0/z1/z2/pi_hat estimates are dependent on minor allele frequencies, and can break if any variants have an estimated minor allele frequency of zero.
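
As a rough illustration of the founder-only counting described above, the following Python sketch computes an allele frequency with and without nonfounders. It is a simplified toy, not plink's implementation; the function name, the 0/1/2 genotype encoding, and the four-sample example are all assumptions.

```python
def allele_freq(genotypes, is_founder, founders_only=True):
    """Frequency of the counted allele.

    genotypes: allele counts (0, 1, 2), or None for a missing call.
    is_founder: one boolean per sample.
    """
    allele_count = 0
    chrom_count = 0
    for g, founder in zip(genotypes, is_founder):
        if g is None or (founders_only and not founder):
            continue  # skip missing calls and, optionally, nonfounders
        allele_count += g
        chrom_count += 2
    if chrom_count == 0:
        return float("nan")  # no usable samples at all
    return allele_count / chrom_count

# Toy data: the allele is only observed in the two nonfounders.
genotypes = [0, 0, 1, 1]
is_founder = [True, True, False, False]

print(allele_freq(genotypes, is_founder))                       # 0.0
print(allele_freq(genotypes, is_founder, founders_only=False))  # 0.25
```

In this toy case the founder-only estimate is exactly zero even though the allele is present in the data, which is the kind of variant that can break the Z0/Z1/Z2/PI_HAT estimates.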

If none of the samples are actually closely related, the --nonfounders flag is a simple solution.  If there are some relevant relations, you'll probably want to use --make-founders.  Alternatively, if you have a preexisting set of good minor allele frequency estimates, you can use --read-freq.

Regardless of which approach you use to improve the MAF estimates, you'll probably also want to use --maf to filter out the variants with the lowest frequencies, at least during your --genome run.
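
The options above can be combined on one command line. Below is a small Python sketch that assembles such a command. The flags (--bfile, --genome, --nonfounders, --make-founders, --read-freq, --maf, --out) are real plink 1.9 flags, but the helper function, the "mydata" file prefixes, the assumption that the binary is called "plink", and the 0.01 MAF threshold are hypothetical choices for illustration, not recommendations.

```python
import shlex

def build_genome_command(bfile, out, maf=0.01, founder_fix="nonfounders", freq_file=None):
    """Assemble a plink --genome command line as an argument list.

    founder_fix selects one of the three approaches from this thread:
    "nonfounders", "make-founders", or "read-freq" (which needs freq_file).
    """
    cmd = ["plink", "--bfile", bfile, "--genome", "--out", out]
    if founder_fix == "nonfounders":
        cmd.append("--nonfounders")
    elif founder_fix == "make-founders":
        cmd.append("--make-founders")
    elif founder_fix == "read-freq":
        if freq_file is None:
            raise ValueError("read-freq requires freq_file")
        cmd += ["--read-freq", freq_file]
    else:
        raise ValueError(f"unknown founder_fix: {founder_fix!r}")
    if maf is not None:
        cmd += ["--maf", str(maf)]  # drop the lowest-frequency variants
    return cmd

# "mydata" and "mydata_ibd" are placeholder file prefixes.
print(shlex.join(build_genome_command("mydata", "mydata_ibd")))
# plink --bfile mydata --genome --out mydata_ibd --nonfounders --maf 0.01
```

The returned list can be passed to subprocess.run if you want to launch plink from a script; the appropriate --maf cutoff depends on your data.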

Jack

Mar 8, 2016, 2:05:03 PM
You are right, the --nonfounders option has a big effect. Thanks.

Jack

Mar 9, 2016, 9:31:42 AM
Is there a general trend in how adding the "--nonfounders" option increases or decreases PI_HAT values? Thanks.