Warning: Variants have the same position

Norman Peters

Jan 13, 2023, 7:15:24 AM
to plink2-users
I am trying to lift over 23andMe files from hg19 to hg38. First I used the `--23file` flag in plink 1.9 to convert from 23andMe format to plink format. Around 100 SNPs had heterozygous haploid genotypes, which I decided to remove from these files. Afterwards I merged all the .bed files, which worked but produced a long list of warnings:

`Warning: Variants 'rs3748816' and 'i6059967' have the same position.
Warning: Variants 'rs1281013' and 'i6052145' have the same position.
Warning: Variants 'rs1805054' and 'i6012699' have the same position.
2565 more same-position warnings: see log file.`

I have attached the full log file for this operation to this post.

My code to achieve all the operations above is the following:

`library(stringr)  # provides str_c, str_extract, str_glue and the %>% pipe
# trio_wd is assumed to be a directory path ending in '/', set earlier (not shown)

# create file list for raw data
file_list <- str_c(trio_wd,dir(trio_wd)) %>% str_extract('.+\\d.txt') %>% str_extract('^(?:(?!admix).)+$') %>%
  {.[!is.na(.)]}

# start with conversion to plink
for (x in file_list){
  name = str_extract(x,'(?<=genome_)(.*?)(?=(_v5|_Full))')
  call = (str_glue("plink --23file {x} {name} --snps-only just-acgt --make-bed --out {trio_wd}{name}"))
  system(call)
  call = (str_glue("plink2 --bfile {trio_wd}{name} --set-all-var-ids --make-bed --out {trio_wd}{name}_idfix"))
  system(call)
}

# Emil, Ole and Viktor files contain het. haploid genotypes (listed in .hh files); remove them
file_list <- str_c(trio_wd,dir(trio_wd)) %>% str_extract('.+hh') %>% str_extract('^(?:(?!admix).)+$') %>%
  {.[!is.na(.)]}

for (x in file_list){
  name = str_extract(x,'(?<=genome_)(.*?)(?=(_v5|_Full))')
  call = (str_glue("plink --bfile {trio_wd}{name} --exclude {trio_wd}{name}.hh --make-bed --out {trio_wd}{name}_subset"))
  system(call)
}
`
K.log

Norman Peters

Jan 13, 2023, 7:51:41 AM
to plink2-users
I forgot to mention that the total genotyping rate dropped to 0.48649 after the merge, even though the individual files had an average genotyping rate of 0.95+. Is this problem related to the warning above?

How can I fix the problematic SNPs in the warning? In another thread, plink 2.0's --set-all-var-ids flag was suggested as a solution, but I'm unsure how to use it in this case. I also checked some of the SNPs in question, and they seem to share the same allele.

E.g.
Warning: Variants 'i706404' and 'i4000691' have the same position.

Searching the *.bim files for these IDs:

26    i4000691    0    16488    0    C
26    i706404    0    16488    0    C
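
A quick way to list every such pair before merging is to scan the .bim file for positions that occur more than once. The following is a minimal awk sketch, not something taken from plink; the `same_pos` helper and the third sample row (`rs_example`) are made up for illustration, and it assumes the usual six-column .bim layout (chromosome, ID, cM, bp, allele 1, allele 2).

```shell
# Sample .bim: the first two rows are the ones quoted above,
# the third (rs_example) is a hypothetical row at a unique position.
tmp=$(mktemp -d)
printf '26\ti4000691\t0\t16488\t0\tC\n'   >  "$tmp/sample.bim"
printf '26\ti706404\t0\t16488\t0\tC\n'    >> "$tmp/sample.bim"
printf '26\trs_example\t0\t16500\t0\tA\n' >> "$tmp/sample.bim"

# same_pos FILE: print chrom, bp and ID for every variant whose
# chrom/bp pair appears more than once. The file is read twice:
# the first pass counts positions, the second pass prints repeats.
same_pos() {
  awk 'NR == FNR { n[$1, $4]++; next }
       n[$1, $4] > 1 { print $1, $4, $2 }' "$1" "$1"
}

same_pos "$tmp/sample.bim"
```

On the sample data this lists `i4000691` and `i706404` and skips the row at the unique position.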


Christopher Chang

Jan 14, 2023, 12:57:13 PM
to plink2-users
If you believe all same-position variants refer to the same SNPs, something like "--set-all-var-ids @_#" on both your datasets will be enough to bring them into sync.  If that assumption is incorrect, you will find out during the merge; in that case, see https://www.cog-genomics.org/plink/1.9/data#merge3 .
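
To make the `@_#` template concrete: `@` stands for the chromosome code and `#` for the base-pair position, so every ID becomes `chrom_pos`. The awk sketch below only mimics that renaming on a .bim file for illustration. It is not how plink2 does it, the `rename_ids` helper is hypothetical, and the real flag should still be used on actual filesets. The two sample rows are the same-position variants quoted earlier in the thread.

```shell
tmp=$(mktemp -d)
printf '26\ti4000691\t0\t16488\t0\tC\n' >  "$tmp/before.bim"
printf '26\ti706404\t0\t16488\t0\tC\n'  >> "$tmp/before.bim"

# rename_ids FILE: rewrite column 2 as <chrom>_<bp>,
# imitating what the '@_#' template produces.
rename_ids() {
  awk 'BEGIN { OFS = "\t" } { $2 = $1 "_" $4; print }' "$1"
}

rename_ids "$tmp/before.bim"
```

Both rows come out with the ID `26_16488`, which is presumably why a duplicate-removal step such as --rm-dup comes up later in the thread.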

Norman Peters

Jan 16, 2023, 10:19:43 AM
to plink2-users
After setting the variant IDs as suggested and removing some duplicates with plink2 --rm-dup, I ended up with a new, strange merge error.

New data cleaning loop:
for (x in file_list){
  name = str_extract(x,'[A-Za-z]+(?=_((k|K)|P)).+')
  # rename every variant to chrom_pos
  call = str_glue("plink2 --bfile {trio_wd}{name} --set-all-var-ids '@_#' --make-bed --out {trio_wd}{name}_fixvar")
  system(call)
  # remove duplicated IDs
  call = str_glue("plink2 --bfile {trio_wd}{name}_fixvar --rm-dup --make-bed --out {trio_wd}{name}_dup")
  system(call)
  # a few (<10) SNPs are reported as 'duplicate IDs with inconsistent genotype data or variant information'; remove them if present
  if(file.exists(str_glue("{trio_wd}{name}_dup.rmdup.mismatch"))){
    call = str_glue("plink --bfile {trio_wd}{name}_fixvar --exclude {trio_wd}{name}_dup.rmdup.mismatch --make-bed --out {trio_wd}{name}_dup_mis")
    system(call)
    call = str_glue("plink2 --bfile {trio_wd}{name}_dup_mis --set-all-var-ids '@_#' --rm-dup --make-bed --out {trio_wd}{name}_dup")
    system(call)
  }
  # ensure that there are no variants with identical allele codes in columns 5 and 6
  call = str_c("awk '$5==$6 {print $2}' " ,trio_wd, name, "_dup.bim >", trio_wd, name, "_dup_noalle.txt")
  system(call)
  call = str_glue("plink --bfile {trio_wd}{name}_dup --exclude {trio_wd}{name}_dup_noalle.txt  --make-bed --out {trio_wd}{name}_dup_fixal")
  system(call)
}

After merging the files, plink reports that over 200k variants have 3+ alleles.
Of course one could just remove those with --exclude, but that seems like too many variants, doesn't it? Did I make a coding mistake in the data cleaning process?

Christopher Chang

Jan 16, 2023, 1:32:53 PM
to plink2-users
Have you looked at a few of the .bim entries corresponding to the 3+ allele positions?  I can't answer all your questions if you don't post even a little bit of representative data.

Norman Peters

Jan 16, 2023, 2:24:00 PM
to plink2-users
From *.merge.missnp:
10_100004799
10_100016339
10_100017453
10_100025924

From BettyMarie.bim
10    10_100004799    0    100004799    .    A
10    10_100011084    0    100011084    .    A
10    10_100012219    0    100012219    .    G
10    10_100012527    0    100012527    .    G
10    10_100016339    0    100016339    .    C


Does this help?

Christopher Chang

Jan 16, 2023, 2:47:15 PM
to plink2-users
For these two positions:
- What are the .bim entries in the other samples you're trying to merge?
- What did they look like *before* --set-all-var-ids?  I mentioned in my first response that, if it was not actually appropriate to try to merge these variants, you'd find out during the merge; we may be finding that out now.

Norman Peters

Jan 16, 2023, 3:45:57 PM
to plink2-users


What did they look like *before* --set-all-var-ids?  I mentioned in my first response that, if it was not actually appropriate to try to merge these variants, you'd find out during the merge; we may be finding that out now.

From BettyMarie.bim before --set-all-var-ids
10    rs77264786    0    100004799    0    A
10    rs140156290    0    100011084    0    A
10    rs140271302    0    100012219    0    G
10    i713489    0    100012527    0    G
10    rs1983865    0    100016339    0    C


- What are the .bim entries in the other samples you're trying to merge?
Viktor.bim
10    10_100004799    0    100004799    .    A
10    10_100011084    0    100011084    .    A
10    10_100012219    0    100012219    .    G
10    10_100012527    0    100012527    .    G
10    10_100016339    0    100016339    .    C

Ole.bim
10    10_100004799    0    100004799    .    A
10    10_100011084    0    100011084    .    A
10    10_100012219    0    100012219    .    G
10    10_100012527    0    100012527    .    G
10    10_100016339    0    100016339    T    C

hanne.bim
10    10_100004799    0    100004799    C    A
10    10_100011084    0    100011084    .    A
10    10_100012219    0    100012219    .    G
10    10_100012527    0    100012527    .    G
10    10_100016339    0    100016339    .    C

Emil.bim
None of these entries exist.

Christopher Chang

Jan 16, 2023, 5:07:56 PM
to plink2-users
Okay, the issue is that plink 1.9 does not recognize the "." allele code as "missing".  Try using plink 2.0 "--make-bed --output-missing-genotype 0" with the last three filesets before retrying the merge.
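
The effect can be reproduced on rows posted earlier in this thread. The awk sketch below is an illustration only, not part of plink, and `count_alleles` is a hypothetical helper. It counts the distinct allele codes seen for each variant ID across several .bim files, once treating only `0` as missing and once treating `.` as missing too. The two sample rows are the `10_100016339` entries from BettyMarie.bim and Ole.bim.

```shell
tmp=$(mktemp -d)
printf '10\t10_100016339\t0\t100016339\t.\tC\n' > "$tmp/BettyMarie.bim"
printf '10\t10_100016339\t0\t100016339\tT\tC\n' > "$tmp/Ole.bim"

# count_alleles MISSING_CODES FILE...: for each variant ID, count the
# distinct allele codes in columns 5 and 6 across all files, ignoring
# the comma-separated codes listed in MISSING_CODES.
count_alleles() {
  missing="$1"; shift
  awk -v missing="$missing" '
    BEGIN { n = split(missing, m, ","); for (i = 1; i <= n; i++) skip[m[i]] = 1 }
    {
      for (c = 5; c <= 6; c++) {
        a = $c
        if (!(a in skip) && !(($2, a) in seen)) { seen[$2, a] = 1; count[$2]++ }
      }
    }
    END { for (id in count) print id, count[id] }' "$@"
}

count_alleles "0"   "$tmp"/*.bim   # "." counted as an allele  -> 10_100016339 3
count_alleles "0,." "$tmp"/*.bim   # "." treated as missing    -> 10_100016339 2
```

With `.` counted as a real allele the variant appears to carry three codes (`.`, `C`, `T`), which matches the 3+ allele report; with `.` treated as missing it is an ordinary biallelic site.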

Christopher Chang

Jan 17, 2023, 12:42:38 AM
to plink2-users
Alternatively, today's plink 1.9 build adds recognition of the '.' allele code as missing; you can try merging with that.

Norman Peters

Jan 17, 2023, 2:49:10 AM
to plink2-users
Thanks for the update! The command now runs without an error. I guess the rather low genotyping rate after the merge, as mentioned in the original post, is due to bad data and has nothing to do with the data cleaning process.

Christopher Chang

Jan 17, 2023, 11:57:48 AM
to plink2-users
The low post-merge genotyping rate is probably due to poor overlap between some of your files.  PLINK 1.x merge performs an "outer join": when a variant is present in some samples but not others, it makes it into the merged dataset, with missing genotype calls for the additional samples.  --geno with an appropriate threshold can be used to pare the dataset down to the variants that are present in most samples.
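
A rough way to see how much missingness the outer join alone introduces is to compare the variant ID lists of the input files (for example, column 2 of each .bim). The awk sketch below is an illustration under assumptions: the three ID lists are invented toy data, `overlap_rate` is a hypothetical helper, and it assumes one ID per line with no duplicates within a file. It reports the highest genotyping rate a merged dataset could reach if every genotype present in the inputs were called.

```shell
tmp=$(mktemp -d)
printf 'v1\nv2\nv3\n' > "$tmp/a.txt"   # toy sample A: 3 variants
printf 'v1\nv2\n'     > "$tmp/b.txt"   # toy sample B: 2 variants
printf 'v1\nv4\n'     > "$tmp/c.txt"   # toy sample C: 2 variants

# overlap_rate FILE...: (IDs present across all files) divided by
# (size of the union of IDs * number of files).
overlap_rate() {
  awk '{ present++; seen[$1] = 1 }
       END {
         for (id in seen) union++
         printf "%.4f\n", present / (union * (ARGC - 1))
       }' "$@"
}

overlap_rate "$tmp/a.txt" "$tmp/b.txt" "$tmp/c.txt"   # 7 / (4 * 3) -> 0.5833
```

A low value here would point at poor overlap between files rather than at the cleaning steps. After the merge, something like `plink --bfile merged --geno 0.1 --make-bed --out merged_geno` (the file names and the 0.1 threshold are placeholders) would keep only variants that are missing in at most 10% of samples.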

Mulusew Kassa

Feb 3, 2023, 11:15:10 AM
to plink2-users
Dear plink team, how can I overcome these warnings? I have read similar earlier posts, but their suggestions did not work.
warnings of merged file.JPG