sort physical positions using plink/plink2

811 views
Skip to first unread message

zillur rahman

unread,
Oct 18, 2021, 4:54:23 AM10/18/21
to plink2-users
Hi, I was trying to QC of my genotype data with plink1.9 and plink2. The objective is to do QC with plink, then phasing and imputation. So, far it works great. For a few batches, during phasing with shaepit2 give error message: "Wrong ordering in physical positions curr_pos=77809318 prev_pos=190915650"

So, I tried to sort variants as follows: "plink2 --bfile chr4.step11 --sort-vars n --make-pgen --out chr4.step12"
and then "plink2 --pfile chr4.step12 --make-bed --out chr4.step13"
But same error persists during phasing. So, still I am unable to phase the plink files.
Attached is the log and error message. Any help?
chr4.step13.log
chr4.step12.log
chr4.step11.log
slurm-11660300.txt

Christopher Chang

unread,
Oct 18, 2021, 11:31:00 AM10/18/21
to plink2-users
If --sort-vars didn't help, the problem is probably with one of the non-plink inputs to shapeit2.  Feel free to post chr4.step13.bim here if you aren't able to verify for yourself that it's sorted.

zillur rahman

unread,
Oct 18, 2021, 2:38:29 PM10/18/21
to plink2-users
Thanks. The non-plink inputs are HRC reference and the same reference was used for phasing other batches without this type of issue. Not sure what is happening here.
Here is the chr4.step13.bim 

chr4.step13.bim

Christopher Chang

unread,
Oct 18, 2021, 2:42:44 PM10/18/21
to plink2-users
Ok, the actual issue is that the last three lines of your .bim file are as follows:

4       rs6838769       0       190915650       G       A
8       rs10957824      0       77809318        A       C
12      rs2731620       0       19927800        C       T

I.e. this is correctly sorted, but it doesn't look like it if you ignore the chromosome field.

I'm guessing this file went through a liftOver operation; these move a few variants between chromosomes.  A quick fix is to use "--chr 4" to filter these moved variants out; a better one is to perform liftOver on a merged dataset and re-split by chromosome afterward.

zillur rahman

unread,
Oct 19, 2021, 9:18:22 AM10/19/21
to plink2-users
Ohh! Thanks a lot. To start, I would go with the quick fix. Is there any plink flag to filter out those miss match chromosomes or I have to do it manually using awk/sed etc?
I would like do the same procedure for all chromosomes with a for loop/slurm job array/nextflow. So, for me plink options would be the best.

zillur rahman

unread,
Nov 9, 2021, 7:23:29 AM11/9/21
to plink2-users
In case anyone faces the same issue. Sorting by chromosome and BP order resolved the issue.


for i in $(seq 22); do awk '{if($1 != '$i') print $2}' chr${i}.step12.bim;done > disordered_variants1.txt


for i in $(seq 22); do awk '$4 < prev { print $2 } { prev = $4 }' chr${i}.step12.bim;done > disordered_variants2.txt


for i in $(seq 22); do ${PLINK2} --bfile chr${i}.step12 --exclude disordered_variants1.txt --make-bed --out chr${i}.step13; done


for i in $(seq 22); do ${PLINK2} --bfile chr${i}.step13 --exclude disordered_variants2.txt --make-bed --out chr${i}.step14; done


Christopher Chang

unread,
Nov 9, 2021, 10:01:44 PM11/9/21
to plink2-users
Please reread what my quick fix was.  "--chr" is much simpler than what you are doing.
Reply all
Reply to author
Forward
0 new messages