I am trying to lift over 23andMe files from hg19 to hg38. First I used the `--23file` parameter in PLINK 1.9 to convert the 23andMe files to PLINK format. Around 100 SNPs had heterozygous haploid genotypes, which I decided to remove from these files. Afterwards I merged all the .bed files, which worked but produced a long list of warnings:
`Warning: Variants 'rs3748816' and 'i6059967' have the same position.
Warning: Variants 'rs1281013' and 'i6052145' have the same position.
Warning: Variants 'rs1805054' and 'i6012699' have the same position.
2565 more same-position warnings: see log file.`
I have attached the full log file for this operation to this post.
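To see which variants the same-position warnings refer to, the merged .bim file can be scanned for rows that share a chromosome and base-pair position. The following is only a minimal base R sketch: the function name `find_same_position` is hypothetical, and the toy table uses made-up positions (and one made-up ID, `rs0000001`) in place of a real .bim file.

```r
# Sketch: list variants in a PLINK .bim table that share chromosome and position.
# `find_same_position` is a hypothetical helper, not part of PLINK or any package.
find_same_position <- function(bim) {
  # .bim columns: chromosome, variant ID, cM position, bp position, allele 1, allele 2
  names(bim) <- c("chr", "id", "cm", "bp", "a1", "a2")
  key <- paste(bim$chr, bim$bp, sep = ":")
  dup_keys <- unique(key[duplicated(key)])
  hits <- bim[key %in% dup_keys, c("chr", "bp", "id")]
  hits[order(hits$chr, hits$bp), ]
}

# Toy data with made-up positions; a real file could be loaded with
# read.table("merged.bim", header = FALSE, stringsAsFactors = FALSE)
toy <- data.frame(
  V1 = c(1, 1, 1, 6),
  V2 = c("rs3748816", "i6059967", "rs0000001", "rs1805054"),
  V3 = 0,
  V4 = c(1000, 1000, 2000, 3000),
  V5 = c("A", "A", "C", "T"),
  V6 = c("G", "G", "T", "C"),
  stringsAsFactors = FALSE
)
dups <- find_same_position(toy)
print(dups)
```

In the toy data only `rs3748816` and `i6059967` share a position, so only those two rows are returned.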
My code to achieve all the operations above is the following:
`library(stringr)   # str_c, str_extract, str_glue
library(magrittr)  # %>%

# trio_wd (directory path with trailing slash) is assumed to be defined earlier.
# Create the file list for the raw data: names ending in a digit plus ".txt", skipping "admix" files
file_list <- str_c(trio_wd, dir(trio_wd)) %>%
  str_extract('.+\\d\\.txt') %>%
  str_extract('^(?:(?!admix).)+$') %>%
  {.[!is.na(.)]}

# Convert each 23andMe file to PLINK binary format
for (x in file_list) {
  # sample name: the text between "genome_" and "_v5" or "_Full"
  name <- str_extract(x, '(?<=genome_)(.*?)(?=(_v5|_Full))')
  call <- str_glue("plink --23file {x} {name} --snps-only just-acgt --make-bed --out {trio_wd}{name}")
  system(call)
  call <- str_glue("plink2 --bfile {trio_wd}{name} --set-all-var-ids --make-bed --out {trio_wd}{name}_idfix")
  system(call)
  call <- str_glue('plink2 ')  # unfinished command, never executed
}

# The Emil, Ole and Viktor files contain het. haploid genotypes; remove them
# using the .hh files written by plink
file_list <- str_c(trio_wd, dir(trio_wd)) %>%
  str_extract('.+hh') %>%
  str_extract('^(?:(?!admix).)+$') %>%
  {.[!is.na(.)]}

for (x in file_list) {
  name <- str_extract(x, '(?<=genome_)(.*?)(?=(_v5|_Full))')
  call <- str_glue("plink --bfile {trio_wd}{name} --exclude {trio_wd}{name}.hh --make-bed --out {trio_wd}{name}_subset")
  system(call)
}
`
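To illustrate what the sample-name pattern in the loops extracts, here is a small base R sketch that uses `regmatches` in place of `str_extract`. The file names are hypothetical and only mimic the `genome_<name>_v5` / `genome_<name>_Full` layout that the pattern assumes.

```r
# Hypothetical file names, for illustration only
files <- c("/data/trio/genome_Emil_v5_Full_20200101.txt",
           "/data/trio/genome_Ole_Full_20200101.txt")

# Same pattern as in the loops: lazily match the text that follows "genome_"
# up to the first "_v5" or "_Full"
pattern <- "(?<=genome_)(.*?)(?=(_v5|_Full))"

# perl = TRUE is needed for the lookbehind and lookahead
sample_names <- regmatches(files, regexpr(pattern, files, perl = TRUE))
print(sample_names)  # "Emil" "Ole"
```

Note that `regmatches` silently drops inputs that do not match, whereas `str_extract` returns `NA` for them, so the two are only interchangeable when every file name matches the pattern.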