

plink2 --bgen ukb_imp_chr21_v3.bgen ref-first 'snpid-chr' --allow-extra-chr --extract mylist.txt --sample ukb34697_imp_chr21_v3_s487317.sample --threads 16 --memory 32000 --export vcf 'vcf-dosage=HDS-force' --out myOutputFile
In the command above, mylist.txt is a single column of alternate_id values, as shown in the Excel screenshot in my first post.
Would the solution be to build mylist.txt from the rsID column (column 2 in the Excel screenshot in my second post) instead of alternate_id (column 1), and to drop 'snpid-chr' from the command? Like this:
plink2 --bgen ukb_imp_chr21_v3.bgen ref-first --allow-extra-chr --extract mylist_RsID.txt --sample ukb34697_imp_chr21_v3_s487317.sample --threads 16 --memory 32000 --export vcf 'vcf-dosage=HDS-force' --out myOutputFile
I am keeping "--allow-extra-chr" because not every SNP has an rsID in column 2. Of the 334,016 SNPs on my list, nearly 20,000 have no rsID in the original txt file provided by UKB; for these SNPs, column 2 simply repeats the alternate_id from column 1. So I am adding "--allow-extra-chr" to tolerate these nonstandard IDs. Does this sound like a reasonable solution?
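
For reference, here is a rough sketch of how the rsID list could be built and how the rows without an rsID could be counted. It assumes a tab-separated table with alternate_id in column 1 and rsID in column 2; the file name ukb_variants_chr21.txt and the three toy rows are made up for illustration and are not real UKB data.

```shell
# Toy stand-in for the UKB variant table (tab-separated):
# column 1 = alternate_id, column 2 = rsID. File name and rows are hypothetical.
printf '21:1000_A_G\trs111\n21:2000_C_T\trs222\n21:3000_G_T\t21:3000_G_T\n' > ukb_variants_chr21.txt

# Column 2 becomes the --extract list, one ID per line.
cut -f2 ukb_variants_chr21.txt > mylist_RsID.txt

# Count rows where column 2 just repeats column 1 (no rsID assigned).
n_no_rsid=$(awk -F'\t' '$1 == $2 { n++ } END { print n + 0 }' ukb_variants_chr21.txt)
echo "variants without rsID: $n_no_rsid"
```

On the real table, the count should come out close to the roughly 20,000 mentioned above if the column layout is as I described.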
Thanks,
Mey
