LD computations of r^2 give "nan" values

1,489 views
Skip to first unread message

Adrian Pelin

unread,
Jun 28, 2014, 8:30:36 PM6/28/14
to plink2...@googlegroups.com
Hello,

I am trying to use unphased .vcf files to compute r^2 estimations of LD using unphased data.

For some reason, all calls are "nan", and there is no other information in the output .ld file.

What could be causing this? There seems to be nothing wrong with the vcf file, here are a few lines:

gdr18_k75-85_NHC_conc   5845442 .       T       C       13275.2 .       AB=0.561181;ABP=26.1269;AC=1;AF=0.5;AN=2;AO=399;CIGAR=1X;DP=711;DPB=711;DPRA=0;EPP=14.0309;EPPR=26.4232;GTI=0;LEN=1;MEANALT=1;MQM=29.7218;MQMR=39.0897;NS=1;NUMALT=1;ODDS=2420.38;PAIRED=0.809524;PAIREDR=0.573718;PAO=0;PQA=0;PQR=0;PRO=0;QA=15108;QR=11856;RO=312;RPP=30.4449;RPPR=4.79202;RUN=1;SAF=118;SAP=147.606;SAR=281;SRF=153;SRP=3.26085;SRR=159;TYPE=snp;technology.illumina=1    GT:DP:RO:QR:AO:QA:GL 0/1:711:312:11856:399:15108:-10,0,-10
gdr18_k75-85_NHC_conc   5846296 .       A       T       1937.79 .       AB=0.362573;ABP=31.0617;AC=1;AF=0.5;AN=2;AO=62;CIGAR=1X;DP=171;DPB=171;DPRA=0;EPP=8.05372;EPPR=17.5333;GTI=0;LEN=1;MEANALT=1;MQM=30.6613;MQMR=48.3945;NS=1;NUMALT=1;ODDS=446.193;PAIRED=0.516129;PAIREDR=0.577982;PAO=0;PQA=0;PQR=0;PRO=0;QA=2366;QR=4140;RO=109;RPP=59.0483;RPPR=3.50834;RUN=1;SAF=32;SAP=3.15039;SAR=30;SRF=44;SRP=11.7958;SRR=65;TYPE=snp;technology.illumina=1   GT:DP:RO:QR:AO:QA:GL 0/1:171:109:4140:62:2366:-10,0,-10
gdr18_k75-85_NHC_conc   5846323 .       GCA     ACG     2474.03 .       AB=0.471338;ABP=4.13061;AC=1;AF=0.5;AN=2;AO=74;CIGAR=1X1M1X;DP=157;DPB=158.333;DPRA=0;EPP=3.47981;EPPR=5.12945;GTI=0;LEN=3;MEANALT=1;MQM=31.4189;MQMR=44.9639;NS=1;NUMALT=1;ODDS=569.666;PAIRED=0.445946;PAIREDR=0.554217;PAO=1;PQA=29;PQR=31;PRO=2;QA=2853;QR=3249;RO=83;RPP=19.9126;RPPR=12.4549;RUN=1;SAF=42;SAP=5.94472;SAR=32;SRF=36;SRP=6.17594;SRR=47;TYPE=complex;technology.illumina=1  GT:DP:RO:QR:AO:QA:GL     0/1:157:83:3249:74:2853:-10,0,-10
gdr18_k75-85_NHC_conc   5847100 .       T       C       7668.4  .       AB=0.489224;ABP=3.47829;AC=1;AF=0.5;AN=2;AO=227;CIGAR=1X;DP=464;DPB=464;DPRA=0;EPP=14.7286;EPPR=5.65821;GTI=0;LEN=1;MEANALT=1;MQM=37.8106;MQMR=48.8059;NS=1;NUMALT=1;ODDS=1765.72;PAIRED=0.54185;PAIREDR=0.578059;PAO=0;PQA=0;PQR=0;PRO=0;QA=8675;QR=8997;RO=237;RPP=40.9776;RPPR=23.2499;RUN=1;SAF=125;SAP=8.07069;SAR=102;SRF=116;SRP=3.23936;SRR=121;TYPE=snp;technology.illumina=1       GT:DP:RO:QR:AO:QA:GL 0/1:464:237:8997:227:8675:-10,0,-10
gdr18_k75-85_NHC_conc   5847141 .       G       A       4712.35 .       AB=0.411594;ABP=26.4307;AC=1;AF=0.5;AN=2;AO=142;CIGAR=1X;DP=345;DPB=345;DPRA=0;EPP=6.00754;EPPR=8.66896;GTI=0;LEN=1;MEANALT=1;MQM=36.3451;MQMR=47.4236;NS=1;NUMALT=1;ODDS=1085.06;PAIRED=0.697183;PAIREDR=0.561576;PAO=0;PQA=0;PQR=0;PRO=0;QA=5436;QR=7722;RO=203;RPP=38.2432;RPPR=9.69587;RUN=1;SAF=55;SAP=18.6694;SAR=87;SRF=94;SRP=5.4171;SRR=109;TYPE=snp;technology.illumina=1  GT:DP:RO:QR:AO:QA:GL 0/1:345:203:7722:142:5436:-10,0,-10
gdr18_k75-85_NHC_conc   5847545 .       A       G       956.275 .       AB=0.432099;ABP=6.2541;AC=1;AF=0.5;AN=2;AO=35;CIGAR=1X;DP=81;DPB=81;DPRA=0;EPP=20.9405;EPPR=15.095;GTI=0;LEN=1;MEANALT=1;MQM=28.5143;MQMR=37.8913;NS=1;NUMALT=1;ODDS=220.19;PAIRED=1;PAIREDR=0.673913;PAO=0;PQA=0;PQR=0;PRO=0;QA=1288;QR=1726;RO=46;RPP=20.9405;RPPR=102.898;RUN=1;SAF=35;SAP=79.0118;SAR=0;SRF=31;SRP=15.095;SRR=15;TYPE=snp;technology.illumina=1  GT:DP:RO:QR:AO:QA:GL    0/1:81:46:1726:35:1288:-10,0,-10
gdr18_k75-85_NHC_conc   5847841 .       A       C       2414.7  .       AB=0.431034;ABP=10.1986;AC=1;AF=0.5;AN=2;AO=75;CIGAR=1X;DP=174;DPB=174;DPRA=0;EPP=13.4623;EPPR=5.66432;GTI=0;LEN=1;MEANALT=1;MQM=47.56;MQMR=42.2626;NS=1;NUMALT=1;ODDS=556.005;PAIRED=0.8;PAIREDR=0.707071;PAO=0;PQA=0;PQR=0;PRO=0;QA=2812;QR=3777;RO=99;RPP=21.1059;RPPR=4.08507;RUN=1;SAF=34;SAP=4.429;SAR=41;SRF=43;SRP=6.71716;SRR=56;TYPE=snp;technology.illumina=1     GT:DP:RO:QR:AO:QA:GL0/1:174:99:3777:75:2812:-10,0,-10
gdr18_k75-85_NHC_conc   5848570 .       T       C       11319.3 .       AB=0.382908;ABP=110.309;AC=1;AF=0.5;AN=2;AO=345;CIGAR=1X;DP=901;DPB=901;DPRA=0;EPP=3.01659;EPPR=3.07279;GTI=0;LEN=1;MEANALT=1;MQM=37.5507;MQMR=46.0378;NS=1;NUMALT=1;ODDS=2606.37;PAIRED=0.704348;PAIREDR=0.717626;PAO=0;PQA=0;PQR=0;PRO=0;QA=12980;QR=21247;RO=556;RPP=3.16765;RPPR=6.07223;RUN=1;SAF=125;SAP=59.8148;SAR=220;SRF=244;SRP=21.0695;SRR=312;TYPE=snp;technology.illumina=1    GT:DP:RO:QR:AO:QA:GL 0/1:901:556:21247:345:12980:-10,0,-10
gdr18_k75-85_NHC_conc   5849151 .       T       G       3958.17 .       AB=0.424403;ABP=21.7241;AC=1;AF=0.5;AN=2;AO=160;CIGAR=1X;DP=377;DPB=377;DPRA=0;EPP=261.47;EPPR=209.57;GTI=0;LEN=1;MEANALT=2;MQM=37.2375;MQMR=43.2823;NS=1;NUMALT=1;ODDS=911.402;PAIRED=0.925;PAIREDR=0.822967;PAO=0;PQA=0;PQR=0;PRO=0;QA=5721;QR=7489;RO=209;RPP=350.446;RPPR=456.848;RUN=1;SAF=149;SAP=261.47;SAR=11;SRF=175;SRP=209.57;SRR=34;TYPE=snp;technology.illumina=1       GT:DP:RO:QR:AO:QA:GL 0/1:377:209:7489:160:5721:-10,0,-10
gdr18_k75-85_NHC_conc   5850932 .       C       T       15273.3 .       AB=0.435798;ABP=39.8155;AC=1;AF=0.5;AN=2;AO=448;CIGAR=1X;DP=1028;DPB=1028;DPRA=0;EPP=6.81038;EPPR=9.3147;GTI=0;LEN=1;MEANALT=2;MQM=47.3259;MQMR=45.9206;NS=1;NUMALT=1;ODDS=3516.8;PAIRED=0.513393;PAIREDR=0.478411;PAO=0;PQA=0;PQR=0;PRO=0;QA=17196;QR=22192;RO=579;RPP=3.18479;RPPR=3.4641;RUN=1;SAF=183;SAP=35.6018;SAR=265;SRF=229;SRP=57.9197;SRR=350;TYPE=snp;technology.illumina=1     GT:DP:RO:QR:AO:QA:GL 0/1:1028:579:22192:448:17196:-10,0,-10


The command I issue is: "plink --vcf GDR-18_diploid.vcf --r2 square --allow-extra-chr"

Thanks for the help,
Adrian

Christopher Chang

unread,
Jun 28, 2014, 11:10:35 PM6/28/14
to plink2...@googlegroups.com
You can't compute r^2 with only one sample.  I'll make PLINK error out in this situation.

Adrian Pelin

unread,
Jul 3, 2014, 9:56:59 AM7/3/14
to plink2...@googlegroups.com
It seems there are still issues, not sure how to deal with this error:

Logging to plink.log.
72501 MB RAM detected; reserving 36250 MB for main workspace.
Error: Multiple instances of '_' in sample ID.

Here is a sample, I put - infront of new lines:

-#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  whatever        GDR-19_diploid_whatever GDR-20_diploid_whatever GDR-21_diploid_whatever  GDR-22_diploid_whatever GDR-23_diploid_whatever GDR-42_diploid_whatever GDR-44_diploid_whatever GDR-45_diploid_whatever
-gdr18_k75-85_NHC_conc   5847996 .       A       G       694.14  .       AB=0.396552;ABP=8.40154;AC=7;AF=0.5;AN=14;AO=23;CIGAR=1X;DP=403;DPB=58;DPRA=0;EPP=14.4341;EPPR=35.8306;GTI=0;LEN=1;MEANALT=1;MQM=38.6522;MQMR=37.2571;NS=1;NUMALT=1;ODDS=152.71;PAIRED=0.73913;PAIREDR=0.885714;PAO=0;PQA=0;PQR=0;PRO=0;QA=908;QR=1317;RO=35;RPP=52.9542;RPPR=62.6327;RUN=1;SAF=6;SAP=14.4341;SAR=17;SF=1,3,4,5,6,7,8;SRF=4;SRP=48.239;SRR=31;TYPE=snp;technology.illumina=1       GT:GL:AO:DP:QA:QR:RO    .       0/1:-10,0,-10:23:58:908:1317:35 .       0/1:-10,0,-10:27:54:1056:1031:27 0/1:-10,0,-10:28:68:1053:1449:39        0/1:-10,0,-10:29:75:1115:1780:46        0/1:-10,0,-10:5:14:186:351:9    0/1:-10,0,-10:11:25:432:543:14   0/1:-10,0,-10:45:109:1693:2363:64
-gdr18_k75-85_NHC_conc   5848570 .       T       C       6717.27 .       AB=0.382908;ABP=110.309;AC=2;AF=0.5;AN=4;AO=345;CIGAR=1X;DP=1086;DPB=901;DPRA=0;EPP=3.01659;EPPR=3.07279;GTI=0;LEN=1;MEANALT=1;MQM=37.5507;MQMR=46.0378;NS=1;NUMALT=1;ODDS=2606.37;PAIRED=0.704348;PAIREDR=0.717626;PAO=0;PQA=0;PQR=0;PRO=0;QA=12980;QR=21247;RO=556;RPP=3.16765;RPPR=6.07223;RUN=1;SAF=125;SAP=59.8148;SAR=220;SF=0,7;SRF=244;SRP=21.0695;SRR=312;TYPE=snp;technology.illumina=1  GT:GL:AO:DP:QA:QR:RO    0/1:-10,0,-10:345:901:12980:21247:556   .       .       .       ..       .       0/1:-10,0,-10:68:185:2502:4401:117      .
-gdr18_k75-85_NHC_conc   5848580 .       T       G       5834.25 .       AB=0.429257;ABP=21.1371;AC=5;AF=0.5;AN=10;AO=179;CIGAR=1X;DP=2132;DPB=417;DPRA=0;EPP=3.11948;EPPR=3.93042;GTI=0;LEN=1;MEANALT=2;MQM=37.4469;MQMR=50.1229;NS=1;NUMALT=1;ODDS=1357.34;PAIRED=0.877095;PAIREDR=0.855932;PAO=0;PQA=0;PQR=0;PRO=0;QA=6730;QR=9021;RO=236;RPP=6.5162;RPPR=8.31016;RUN=1;SAF=61;SAP=42.4243;SAR=118;SF=2,3,4,5,8;SRF=103;SRP=11.2913;SRR=133;TYPE=snp;technology.illumina=1       GT:GL:AO:DP:QA:QR:RO    .       .       0/1:-10,0,-10:179:417:6730:9021:236     0/1:-10,0,-10:133:301:5022:6362:167      0/1:-10,0,-10:207:461:7824:9621:253     0/1:-10,0,-10:156:389:5905:8576:225     .       .       0/1:-10,0,-10:210:564:7913:13019:348
-gdr18_k75-85_NHC_conc   5849151 .       T       G       1722.76 .       AB=0.424403;ABP=21.7241;AC=6;AF=0.5;AN=12;AO=160;CIGAR=1X;DP=1008;DPB=377;DPRA=0;EPP=261.47;EPPR=209.57;GTI=0;LEN=1;MEANALT=2;MQM=37.2375;MQMR=43.2823;NS=1;NUMALT=1;ODDS=911.402;PAIRED=0.925;PAIREDR=0.822967;PAO=0;PQA=0;PQR=0;PRO=0;QA=5721;QR=7489;RO=209;RPP=350.446;RPPR=456.848;RUN=1;SAF=149;SAP=261.47;SAR=11;SF=0,1,3,4,6,7;SRF=175;SRP=209.57;SRR=34;TYPE=snp;technology.illumina=1    GT:GL:AO:DP:QA:QR:RO    0/1:-10,0,-10:160:377:5721:7489:209     0/1:-10,0,-10:49:123:1767:2508:70        .       0/1:-10,0,-10:58:149:2141:3133:89       0/1:-10,0,-10:95:247:3423:5017:144      .       0/1:-10,0,-10:13:30:494:536:15   0/1:-10,0,-10:29:82:1033:1728:50        .
-gdr18_k75-85_NHC_conc   5850773 .       A       G       327.75  .       AB=0.40625;ABP=5.45321;AC=1;AF=0.5;AN=2;AO=13;CIGAR=1X;DP=32;DPB=32;DPRA=0;EPP=4.51363;EPPR=8.61041;GTI=0;LEN=1;MEANALT=1;MQM=38.6923;MQMR=41;NS=1;NUMALT=1;ODDS=75.4664;PAIRED=0.384615;PAIREDR=0.315789;PAO=0;PQA=0;PQR=0;PRO=0;QA=468;QR=705;RO=19;RPP=31.2394;RPPR=44.2683;RUN=1;SAF=5;SAP=4.51363;SAR=8;SF=6;SRF=6;SRP=8.61041;SRR=13;TYPE=snp;technology.illumina=1  GT:GL:AO:DP:QA:QR:RO    .       .       .       .       .       .       0/1:-10,0,-10:13:32:468:705:19  .       .
-gdr18_k75-85_NHC_conc   5850932 .       C       T       6591.89 .       AB=0.435798;ABP=39.8155;AC=9;AF=0.5;AN=18;AO=448;CIGAR=1X;DP=4296;DPB=1028;DPRA=0;EPP=6.81038;EPPR=9.3147;GTI=0;LEN=1;MEANALT=2;MQM=47.3259;MQMR=45.9206;NS=1;NUMALT=1;ODDS=3516.8;PAIRED=0.513393;PAIREDR=0.478411;PAO=0;PQA=0;PQR=0;PRO=0;QA=17196;QR=22192;RO=579;RPP=3.18479;RPPR=3.4641;RUN=1;SAF=183;SAP=35.6018;SAR=265;SF=0,1,2,3,4,5,6,7,8;SRF=229;SRP=57.9197;SRR=350;TYPE=snp;technology.illumina=1     GT:GL:AO:DP:QA:QR:RO    0/1:-10,0,-10:448:1028:17196:22192:579  0/1:-10,0,-10:129:317:5013:7169:188      0/1:-10,0,-10:163:438:6321:10566:275    0/1:-10,0,-10:162:408:6184:9410:246     0/1:-10,0,-10:218:542:8364:12473:324     0/1:-10,0,-10:260:630:10051:14184:370   0/1:-10,0,-10:44:104:1701:2206:60       0/1:-10,0,-10:104:243:3918:5229:139     0/1:-10,0,-10:235:586:8886:13286:351

Christopher Chang

unread,
Jul 3, 2014, 10:00:03 AM7/3/14
to plink2...@googlegroups.com
This is due to the difference between how VCF sample IDs and plink sample IDs work (plink keeps track of separate family and individual IDs).

The simplest workaround is to use --const-fid.  You can see the other options at https://www.cog-genomics.org/plink2/input#double_id .

Adrian Pelin

unread,
Jul 3, 2014, 1:01:02 PM7/3/14
to plink2...@googlegroups.com
Thank you for your help, but given the above VCF file it still gives nan values all over the place:

@NCS ~/nosema/heterozygosity/TrimReads/bwa_Mapping_9iso_REF_GDR18/vcf/diploid $ plink --vcf GDR18-45.vcf --r2 square --allow-extra-chr --threads 4 --const-fid
PLINK v1.90b1i 64-bit (27 Jun 2014)        https://www.cog-genomics.org/plink2
(C) 2005-2014 Shaun Purcell, Christopher Chang   GNU General Public License v3

Logging to plink.log.
72501 MB RAM detected; reserving 36250 MB for main workspace.
--vcf: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam written.
53130 variants loaded from .bim file.
9 people (0 males, 0 females, 9 ambiguous) loaded from .fam.
Ambiguous sex IDs written to plink.nosex .
Using up to 4 threads (change this with --threads).
Calculating allele frequencies... done.
Total genotyping rate is 0.54392.
53130 variants and 9 people pass filters and QC.
Note: No phenotypes present.
--r2 square to plink.ld ... done.  

And then I do:
@NCS /shared/AdrianP/home112013/3/heterozygosity/TrimReads/bwa_Mapping_9iso_REF_GDR18/vcf/diploid $ head plink.ld
nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan   

I wonder what is causing this. Something must be wrong with the VCF file. I called variants with FreeBayes.

Christopher Chang

unread,
Jul 3, 2014, 1:06:29 PM7/3/14
to plink2...@googlegroups.com
If the genotype calls are all identical for a marker (in the few sample lines you gave, every single call was heterozygous), all r^2 values involving that marker will be nan.

Adrian Pelin

unread,
Jul 3, 2014, 2:55:09 PM7/3/14
to plink2...@googlegroups.com
I thought you needed sites with variation to be able to call r^2. The data here is unphased, that's why it looks like all genotypes are the same.

Basically my sample consists of pool sequencing of spore populations, and they look highly clonal, and we want to show that there is strong LD. How should we call variants to be able to calculate r^2?

Christopher Chang

unread,
Jul 3, 2014, 3:04:17 PM7/3/14
to plink2...@googlegroups.com
You're best off filtering out the sites which have nothing but identical unphased calls.  "--maf 1e-8" deals with the homozygous case.  The heterozygous case is a bit trickier; you might want to use a combination of --freqx and a short custom script.

Adrian Pelin

unread,
Jul 3, 2014, 3:25:25 PM7/3/14
to plink2...@googlegroups.com
I have been trying to find a little bit more literature on how this r^2 is calculated, but was not lucky.

I am having difficulty understanding what would constitute an informative site. If all genotypes are identical along all samples, should that not be an elevated r^2 value?

Christopher Chang

unread,
Jul 3, 2014, 3:34:06 PM7/3/14
to plink2...@googlegroups.com
The unphased r^2 computed by PLINK is based on allele count correlation between a pair of markers.  This correlation is of the form Cov(m1, m2) / sqrt(Var(m1)Var(m2)).

When the allele counts are all identical, variance (and covariance) is zero.  0/0 gives you "nan".

Adrian Pelin

unread,
Jul 3, 2014, 3:45:16 PM7/3/14
to plink2...@googlegroups.com
Okay, I think I see what the problem is. FreeBayes estimates Allele Frequency in the AF tag assuming a certain degree of ploidy, so 0.49 or 0.45 will get converted to 0.5 all the time.

I want to try to call variants with another tool. Which do you suggest that works well with PLINK?

Adrian

Christopher Chang

unread,
Jul 3, 2014, 3:53:33 PM7/3/14
to plink2...@googlegroups.com
Choice of variant caller shouldn't depend on PLINK.  I think the real problem is that you just can't get very good r^2 estimates with only 9 samples.

Adrian Pelin

unread,
Jul 3, 2014, 11:21:13 PM7/3/14
to plink2...@googlegroups.com
Yes there a limit to set on the amount of nearby SNPs evaluated for r^2? The files produced are extremely large dozens of GB in size. I tried a few options but I was not allowed to with r^2

Christopher Chang

unread,
Jul 4, 2014, 12:32:12 AM7/4/14
to plink2...@googlegroups.com
You need to choose between 'square', which causes the entire matrix to be reported, and the usual windowed/thresholded mode.

Adrian Pelin

unread,
Jul 4, 2014, 10:40:12 AM7/4/14
to plink2...@googlegroups.com
Amazing, I tried "plink --vcf GDR-all.vcf --r2 --ld-window 10000 --allow-extra-chr --threads 4 --const-fid" and it worked like a charm!

Now these .vcf files were obtained assuming the individuals are diploid, and genotypes are called as X/x. We actually speculate that this organism is tetraploid. And when we call the variants using a prior for ploidy of 4, we get genotypes in the X/X/X/x format. For some reason, when I run plink on those VCF files, the table is completly empty, is it because plink does not support tetraploids?

Thank you,
Adrian

Christopher Chang

unread,
Jul 4, 2014, 11:40:57 AM7/4/14
to plink2...@googlegroups.com
Correct, plink does not support tetraploids.

Adrian Pelin

unread,
Jul 6, 2014, 1:45:51 PM7/6/14
to plink2...@googlegroups.com
Hello again,

When I issue
plink --vcf GDR18-45.vcf --r2 inter-chr --ld-window 10000 --allow-extra-chr --threads 4 --const-fid

I get sort of what I want, r2 values for SNPs between different contigs. However, I only get each SNP from one contig compared to 1 SNP from other contigs, as opposed to each SNP from one contig compared to all SNPs from other contigs.

How can I reshape my parameters to have all SNPs compared to all other SNPs?

Thank you,
Adrian

Christopher Chang

unread,
Jul 6, 2014, 2:00:49 PM7/6/14
to plink2...@googlegroups.com
That command line shouldn't work at all.  When I try to use "--r2 inter-chr" with "--ld-window 10000", I get

"Error: --ld-window flag cannot be used with the --r/--r2 'inter-chr' or matrix output modifiers."

("--r2 inter-chr", without --ld-window, should do what you want.)

Can you post your .log file?

Adrian Pelin

unread,
Jul 6, 2014, 2:06:52 PM7/6/14
to plink2...@googlegroups.com
You are right. Sorry about that, that was the wrong command I issued previously.

The command I issued is:
plink --vcf GDR18-45.vcf --r2 inter-chr --allow-extra-chr --threads 4 --const-fid

Christopher Chang

unread,
Jul 6, 2014, 2:15:11 PM7/6/14
to plink2...@googlegroups.com
Are you sure this isn't just because the other inter-contig r^2 values fall below the --ld-window-r2 threshold?  (Try --ld-window-r2 0.01 and see if you get more inter-contig entries).

Adrian Pelin

unread,
Jul 6, 2014, 2:30:19 PM7/6/14
to plink2...@googlegroups.com
Indeed I get more values. What is the default --ld-window-r2 threshold? And what do people normally use in papers when they are reporting r^2?

Christopher Chang

unread,
Jul 6, 2014, 9:44:39 PM7/6/14
to plink2...@googlegroups.com
The default --ld-window-r2 threshold is 0.2.

I'm not aware of any sort of standard reporting threshold; an appropriate value depends on why it is that you're reporting r^2 estimates.

Adrian Pelin

unread,
Jul 7, 2014, 12:51:53 AM7/7/14
to plink2...@googlegroups.com
Thank you so much for your help! I really appreciate it.

The reason why I am calculating r^2 from multiple samples is because I am trying to find out if a parasite propagates clonally or is also able to propagate sexually. Sexual propagation involves recombination, so low LD. Clonality lacks recombination, so distinct geographical isolates should have the same haplotypes, so high LD.

Since I only have 100bp PE HiSeq data, I can't really look for recombination in the conventional way. I also do not have phased data, so I am stuck with unphased SNPs.
Reply all
Reply to author
Forward
0 new messages