--blocks doesn't seem to be making logical haplotype blocks

383 views
Skip to first unread message

Dan Sapozhnikov

unread,
Jun 25, 2018, 10:34:16 PM6/25/18
to plink2-users

Hi all,

I am having some trouble using the --blocks function. I am new to plink so I may be misunderstanding something here and I hope someone can help.
I want to end up with a list of blocks in which every recombination point is cataloged. 

This is the command I am using:
plink --vcf SNP_variantsONLY.recode.vcf --blocks no-pheno-req --allow-extra-chr --blocks-max-kb 200 --blocks-min-maf .01 --blocks-strong-lowci 0.7005

The .vcf input file contains 24 samples. 
I have re-structured it and re-organized it in a way in which the recombination points are clearly visible:




However, when I head the plink output it looks like this:





As you can see, it ends a block at scaffold1:59501 (about 8 rows from the top), where the .vcf clearly has no different haplotype change and the next block ignores haplotype switch in red (scaffold1:156870). 
There seems to be no logic to the blocks.

Does anyone know what is going on?
I have tried different max block sizes, different lower confidence intervals.
One thing I have not tried is converting .vcf to plink binary, but I don't think that does anything as it can handle vcf.

Here is a direct link to the documentation:


Any help would be greatly appreciated. 

Thanks!

Christopher Chang

unread,
Jun 25, 2018, 11:18:39 PM6/25/18
to plink2-users
Hi Dan,

Can you post the full .log file for your run? Thanks.

Dan Sapozhnikov

unread,
Jun 25, 2018, 11:47:06 PM6/25/18
to plink2-users
Yes sure, here it is!

PLINK v1.90b4.6 64-bit (15 Aug 2017)
Options in effect:
  --allow-extra-chr
  --blocks no-pheno-req
  --blocks-max-kb 200
  --blocks-min-maf .01
  --blocks-strong-lowci 0.7005
  --vcf temp.recode.vcf

Hostname: sw-4r04-n06
Working directory: /gs/project/fkr-592-aa/sb_fkr-592-aa/Danzqianqi/Cflo/WGS/24AntsSNPSandIndels
Start time: Mon Jun 25 23:31:28 2018

Random number seed: 1529983888
516849 MB RAM detected; reserving 258424 MB for main workspace.
--vcf: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam written.
135544 variants loaded from .bim file.
24 people (0 males, 0 females, 24 ambiguous) loaded from .fam.
Ambiguous sex IDs written to plink.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 24 founders and 0 nonfounders present.
Calculating allele frequencies... done.
135544 variants and 24 people pass filters and QC.
Note: No phenotypes present.
--blocks: 2300 haploblocks written to plink.blocks .
Extra block details written to plink.blocks.det .
Longest span: 200.001kb.

End time: Mon Jun 25 23:37:12 2018

Christopher Chang

unread,
Jun 26, 2018, 1:50:50 AM6/26/18
to plink2-users
Hmm, this looks like a combination of two factors:
1. A single sample switching out of 24 may not reduce r^2 enough to cause a block boundary to be declared there, under your current thresholds.
2. The artificial 200 kb block size limit is causing the initial 217 kb block to be split 17-200, etc.

There was also a —blocks corner case bugfix in Oct 2017, so you should update your copy of plink 1.9, but I’m guessing it doesn’t affect this run. Main things to do are increase the block size limit and/or tweak some other constants.

Dan Sapozhnikov

unread,
Jun 26, 2018, 10:50:49 AM6/26/18
to plink2-users
Thank you for your advice ! It did help, but I'd like to ask if maybe you know how to make it any better.

In order to try to be able to recognize that single sample switching, I increased --blocks-strong-lowci from 0.7005 to 0.8005. I also increased the maximum block size to 1500. Both numbers I changed arbitrarily.
Now I see the following output:


As you can see, a new block starts at 156870! Just like I wanted! I also checked  the next one manually and there is indeed a switch at 492968, the next block. So this is good news.
Still, I am wondering if I can make it better...

First, as you see, it is still perceiving some block switch in the very beginning, even though there is very clearly no block switch, and --blocks-max-kb is no longer a limiting factor. Is this the edge/corner problem you were talking about?

Second, do you have any suggests to the value of --blocks-strong-lowci? I really chose it arbitrarily...At the same time, is it worth playing with the other options? Like the top end of the confidence interval?

Third, and this is slightly off-topic, but is there any plink option (or are you aware of any other program) that can now genotype each block such that I get a kind of VCF but with blocks instead of SNPs? 

Thank you so much for your help.

As for the second part, I am not sure how to resolve that. I constantly have one sample switching and would like to detect that.

Maybe it is because my samples are non-traditional? They are essentially all siblings, and are also diploids in a system of haplodiploidy...meaning one allele will always be the same in all samples and only one allele can vary.

Dan Sapozhnikov

unread,
Jun 26, 2018, 2:36:51 PM6/26/18
to plink2-users
Oops , strike out everything after "Thank you", that was from before when it didn't work.
Reply all
Reply to author
Forward
0 new messages