Hi Chris,
I have been trying to check whether this command runs to clump the SNPs (the HPC has my BCF conversion in a queue at the moment).
________________________________________________
for i in $(seq 1 22)
do
plink \
--bfile /path/to/file/ukb_imp_chr$i \
--clump-p1 1 \
--clump-r2 0.1 \
--clump-kb 250 \
--clump /path/to/file/nalls_park.QC.nodup.nonamb.rearranged \
--clump-snp-field SNP \
--clump-field P \
--out /path/to/file/ukb_imp_chr$i
done
________________________________________________
Even for the first chromosome, an error was generated.
________________________________________________
Random number seed: 1638214677
386941 MB RAM detected; reserving 193470 MB for main workspace.
7402791 variants loaded from .bim file.
487409 people (223038 males, 264368 females, 3 ambiguous) loaded from .fam.
Ambiguous sex IDs written to
/path/to/file/ukb_imp_chr1.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 487409 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.996633.
7402791 variants and 487409 people pass filters and QC.
Note: No phenotypes present. (1st question)
Error: Duplicate ID 'rs151120166'. (2nd Question)
End time: Mon Nov 29 21:46:23 2021
________________________________________________
I would like clarification/help with two things:
1st. I did not specify a phenotype. Is a phenotype required for clumping, or for any other plink1.9/2 process, or would the "--no-pheno" flag be appropriate in all use cases? I want to analyse all samples and do not want to create a case/control scenario.
2nd. Once I have normalised my BCF files and converted them back to BED, will this duplication error be fixed? Alternatively, should I use "--rm-dup force-first"? That option is only available in plink2, but clumping has not yet been implemented in plink2.
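For reference, a possible plink1.9-only workaround for the duplicate ID error could be sketched as below. This is a minimal sketch, not a tested pipeline: the helper names list_dup_ids and dedup_chr and the ".dup_ids.txt" and ".nodup" suffixes are made up for illustration, and it assumes the variant ID sits in column 2 of the .bim file. Note that it drops every copy of a duplicated ID, which is not the same as keeping the first copy as "--rm-dup force-first" would.
________________________________________________
```shell
# Hypothetical helper: print each variant ID that appears more than once
# in column 2 of a PLINK .bim file (one ID per line, sorted).
list_dup_ids() {
    awk '{ n[$2]++ } END { for (id in n) if (n[id] > 1) print id }' "$1" | sort
}

# Hypothetical helper: write the duplicated IDs for one fileset prefix, then
# build a filtered copy with --exclude. All copies of a duplicated ID are
# removed. The ".dup_ids.txt" and ".nodup" names are illustrative only.
dedup_chr() {
    prefix="$1"   # e.g. /path/to/file/ukb_imp_chr1
    list_dup_ids "${prefix}.bim" > "${prefix}.dup_ids.txt"
    plink \
        --bfile "$prefix" \
        --exclude "${prefix}.dup_ids.txt" \
        --make-bed \
        --out "${prefix}.nodup"
}
```
________________________________________________
It would be called as "dedup_chr /path/to/file/ukb_imp_chr1", and the clumping loop would then point --bfile at the ".nodup" prefix instead of the original fileset.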