Thank you for these pointers, I found --set-all-var-ids very useful! And of course plink2 overall is super useful, I appreciate all the work you're putting into it.
In my case I was generating a list of millions of SNPs to keep (goal is to ascertain for biallelic loci polymorphic in YRI), so "--extract range" wasn't helpful in that setting.
For everybody else, in case this helps, I had to do things in several steps.
1) Replace missing SNP IDs with "chr:pos" so that each one is treated as a distinct ID:
> plink2 --pfile all_phase3 vzs --set-missing-var-ids '@:#' --make-just-pvar zs --out all_phase3_uniq
> # replace the original pvar with the new one, keeping the original as a backup
> mv all_phase3.pvar.zst all_phase3_orig.pvar.zst
> mv all_phase3_uniq.pvar.zst all_phase3.pvar.zst
2) Remove all remaining duplicate IDs. (I gave up on these, though originally I wanted to differentiate them too. There are very few of them, and most are multiallelic variants that were not marked or merged properly due to a 1000 Genomes bug documented on their website.)
> plink2 --pfile all_phase3 vzs --rm-dup exclude-all --write-snplist zs --out nodups
3) Apply my other filters (here all_phase3_YRI.psam contains only the YRI samples, and I only want biallelic SNPs that are polymorphic in YRI):
> plink2 --pfile all_phase3 vzs --keep all_phase3_YRI.psam --extract nodups.snplist.zst --autosome --snps-only just-acgt --max-alleles 2 --mac 1 --write-snplist zs --out YRI
I found that I couldn't use '--rm-dup exclude-all' together with the filters of step 3, because the duplicate-ID filter is applied after SNPs have already been removed for not being polymorphic biallelic SNPs. As a result, it kept some IDs that are duplicated in the full 1000 Genomes data (I wanted to remove all of those).
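To illustrate why the order matters, here is a toy sketch in plain shell. It does not call plink2 at all; the variant IDs and allele counts are made up, and the awk filters only mimic the behaviour described above. The ID rs1 appears twice in the "full" list, once as a biallelic record and once as a multiallelic one.

```shell
#!/bin/sh
# Hypothetical toy data: column 1 = variant ID, column 2 = number of alleles.
variants='rs1 2
rs1 3
rs2 2
rs3 2'

# Order A (steps 2 then 3): drop every ID that is duplicated in the full
# list, then keep only biallelic variants.
order_a=$(printf '%s\n' "$variants" \
  | awk '{ count[$1]++; alleles[$1] = $2 }
         END { for (id in count) if (count[id] == 1 && alleles[id] == 2) print id }' \
  | sort | paste -sd ' ' -)

# Order B (everything in one pass): keep biallelic variants first, then
# drop IDs that are still duplicated among the survivors.
order_b=$(printf '%s\n' "$variants" \
  | awk '$2 == 2' \
  | awk '{ count[$1]++ }
         END { for (id in count) if (count[id] == 1) print id }' \
  | sort | paste -sd ' ' -)

echo "dedup first:  $order_a"   # dedup first:  rs2 rs3
echo "filter first: $order_b"   # filter first: rs1 rs2 rs3
```

In order B the multiallelic copy of rs1 is gone before the duplicate check runs, so the remaining copy looks unique and survives. That is the situation I wanted to avoid, hence the separate step 2.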
-Alex