Dear Christopher, dear All,
I searched previous posts but found no exact answer to my question.
What is the smoothest way to merge several files (FILE1, FILE2, FILE3, in either plink map/ped or bed/bim/fam format) so that only the intersection of their variants is kept? The --variant-inner-join option in plink2 is not fully functional yet, correct? The files come from the same genotyping array but from different versions of it, so I still anticipate some errors and warnings. Is there a way to handle them automatically upfront? An example command would be very helpful.
I would prefer to stick with plink, but other approaches I have seen described include recoding to VCF, possibly together with a FASTA reference or fixref (?). With some of these methods I ended up with a low genotyping rate and enormous exclusions under --geno 0.02, although the genotyping rate of each individual file is about 0.97. This makes me question the correctness of my attempts, since the data quality is good.
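To make clear what I mean by "intersection", here is a minimal Python sketch. It assumes each dataset already has a plain-text list with one variant ID per line (the kind of file --write-snplist produces). The function names are purely illustrative and not part of plink. If a merge instead keeps the union of variants, every variant absent from one file is missing for all samples of that file, which might explain the low genotyping rate I observed.

```python
def read_ids(path):
    """Return the set of variant IDs from a file with one ID per line."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def common_ids(paths):
    """Return the variant IDs present in every listed file."""
    id_sets = [read_ids(p) for p in paths]
    return set.intersection(*id_sets) if id_sets else set()


def write_ids(ids, out_path):
    """Write the IDs in sorted order, one per line."""
    with open(out_path, "w") as handle:
        for variant_id in sorted(ids):
            handle.write(variant_id + "\n")
```

For example, write_ids(common_ids(["FILE1.snplist", "FILE2.snplist", "FILE3.snplist"]), "common.snplist") would write the shared IDs to a file that could then be passed to --extract for each dataset. The file names here are placeholders.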
I would greatly appreciate your advice.
Regards,
Lukas
Dear Christopher,
Your answer is much appreciated.
I wanted to run it first, hence the delayed feedback. After converting to binary files with --make-bed (map/ped was the initial format), I exported the SNP list for each file (--write-snplist) and generated outputs restricted to the intersection (--extract-intersect, --make-bed), as suggested. I think using the list of common SNPs should guarantee a high genotyping rate in the final file, which was my problem before. However, merging (plink1.9 --bfile FILE1 --merge-list FILE2_TO_8.txt --make-bed --out MERGED_8_FILES) produced warnings (Multiple chromosomes seen for variant …/ Multiple positions seen for variant) and an error affecting almost all variants (Error: 689938 variants with 3+ alleles present). I know that this problem has been described before, but how can I find out what is causing it? Is there perhaps a way to set the reference or use other flags to avoid it?
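As a rough illustration of the kind of diagnosis I have in mind, the sketch below compares two .bim files variant by variant. It relies only on the standard six-column .bim layout (chromosome, variant ID, cM position, bp coordinate, allele 1, allele 2). The category names and functions are my own invention, not plink output. It sorts shared variants by whether they differ in chromosome, in position, in strand only (alleles match after complementing), or in alleles in a way a strand flip cannot explain, for instance a different allele coding scheme.

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_bim(path):
    """Map variant ID -> (chrom, bp, alleles); the missing-allele code '0' is dropped."""
    variants = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            chrom, vid, _cm, bp, a1, a2 = fields[:6]
            alleles = frozenset(a.upper() for a in (a1, a2) if a != "0")
            variants[vid] = (chrom, bp, alleles)
    return variants


def flip(alleles):
    """Complement single-base alleles; other codes are left unchanged."""
    return frozenset(COMPLEMENT.get(a, a) for a in alleles)


def classify(ref, other):
    """Group variants shared by two read_bim() results by type of conflict."""
    report = defaultdict(list)
    for vid in sorted(ref.keys() & other.keys()):
        chrom_r, bp_r, alleles_r = ref[vid]
        chrom_o, bp_o, alleles_o = other[vid]
        if chrom_r != chrom_o:
            report["chrom_mismatch"].append(vid)
        elif bp_r != bp_o:
            report["pos_mismatch"].append(vid)
        if len(alleles_r | alleles_o) <= 2:
            continue  # alleles are compatible as given
        if len(alleles_r | flip(alleles_o)) <= 2:
            report["strand_flip"].append(vid)
        else:
            report["allele_mismatch"].append(vid)
    return dict(report)
```

Running classify(read_bim("FILE1.bim"), read_bim("FILE2.bim")) for each pair of files would at least show whether the 3+ allele error is dominated by one category. Strand-ambiguous A/T and C/G variants cannot be detected this way, because their allele set is unchanged by complementing.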
Kind regards,
Lukas