Here is my Python script for your reference.
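# hashlib replaces the md5 module, which only exists in Python 2
import hashlib
...
# Build a key from the variant fields and hash it into a fixed-length ID
new_id = chr + ':' + pos + ':' + ref + ':' + alt
md5_id = hashlib.md5(new_id.encode()).hexdigest()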
...
You might be tempted to truncate REF and ALT to short strings to avoid the error above. However, truncated IDs are not guaranteed to be unique: variants split from a multi-allelic variant share the same CHR and POS, so their IDs can collide once REF and ALT are shortened. If the truncated IDs are not unique, plink will throw this error again: Error: Duplicate ID ...
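To make step 2 more concrete, here is a minimal sketch of how the ID column could be rewritten while keeping a record of the original IDs. The function name hash_vcf_ids and the sample records are made up for illustration, and I am assuming a tab-separated VCF where the original ID is whatever sits in the third column.

```python
import hashlib


def hash_vcf_ids(lines):
    """Replace the ID column of VCF records with an MD5 of CHROM:POS:REF:ALT.

    Returns the rewritten lines and a dict mapping each MD5 back to the
    original ID, so the IDs can be restored after conversion.
    (Hypothetical helper, not part of plink or any library.)
    """
    out, md5_to_original = [], {}
    for line in lines:
        if line.startswith('#'):  # header lines pass through unchanged
            out.append(line)
            continue
        fields = line.rstrip('\n').split('\t')
        chrom, pos, original_id, ref, alt = fields[:5]
        key = ':'.join((chrom, pos, ref, alt))
        md5_id = hashlib.md5(key.encode()).hexdigest()
        md5_to_original[md5_id] = original_id
        fields[2] = md5_id
        out.append('\t'.join(fields) + '\n')
    return out, md5_to_original


# Made-up sample: two records split from the same multi-allelic variant
records = [
    '#CHROM\tPOS\tID\tREF\tALT\n',
    '1\t100\trs1\tA\tAT\n',
    '1\t100\trs1\tA\tATT\n',
]
new_records, mapping = hash_vcf_ids(records)
```

Both sample records share CHR, POS and the original ID, but the full REF and ALT go into the hash, so they get different 32-character IDs.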
3. Add three parameters to plink and run it (input.vcf is the VCF file you want to convert):
--vcf input.vcf
--keep-allele-order
--a1-allele input.vcf 4 3 '#'
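If you drive plink from a script, the three parameters above could be assembled roughly like this. This is only a sketch: build_plink_command and run_plink are hypothetical helper names, and I am assuming you want a binary fileset, so I added --make-bed and --out, which are not among the three parameters listed above.

```python
import subprocess


def build_plink_command(vcf_path, out_prefix):
    """Assemble the plink argument list (hypothetical helper)."""
    return [
        'plink',
        '--vcf', vcf_path,
        '--keep-allele-order',
        # Same VCF again: allele column 4 (REF), ID column 3, skip '#' lines.
        # No shell is involved, so '#' needs no extra quoting here.
        '--a1-allele', vcf_path, '4', '3', '#',
        # Assumption: write a binary fileset under out_prefix
        '--make-bed',
        '--out', out_prefix,
    ]


def run_plink(vcf_path, out_prefix):
    """Run plink and raise an error if it exits with a non-zero status."""
    subprocess.run(build_plink_command(vcf_path, out_prefix), check=True)


cmd = build_plink_command('input.vcf', 'output')
```

Calling run_plink('input.vcf', 'output') would then start the conversion, provided plink is on your PATH.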
4. Restore the original IDs after conversion.
In step 2, you need to record the mapping between each MD5 and its original ID, then restore the original IDs after conversion. I will not give my script here because it is not hard to implement.
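For anyone who wants a starting point, restoring the IDs could look something like the sketch below. It assumes the conversion produced a .bim file (whitespace-separated, variant ID in the second column) and that the MD5-to-original mapping from step 2 is available as a dict. The name restore_bim_ids, the hash strings and the sample lines are all made up for illustration.

```python
def restore_bim_ids(bim_lines, md5_to_original):
    """Swap MD5 IDs in .bim lines back to the original IDs.

    Hypothetical helper: IDs missing from the mapping are left unchanged.
    """
    restored = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 2:  # keep blank or malformed lines as they are
            restored.append(line)
            continue
        fields[1] = md5_to_original.get(fields[1], fields[1])
        restored.append('\t'.join(fields) + '\n')
    return restored


# Made-up mapping and .bim content, for illustration only
md5_to_original = {
    '0123456789abcdef0123456789abcdef': 'rs1',
    'fedcba9876543210fedcba9876543210': 'rs2',
}
bim_lines = [
    '1\t0123456789abcdef0123456789abcdef\t0\t100\tA\tAT\n',
    '1\tfedcba9876543210fedcba9876543210\t0\t200\tG\tC\n',
]
restored = restore_bim_ids(bim_lines, md5_to_original)
```

Only the ID column is touched, so positions and alleles stay exactly as plink wrote them.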
By the way, I suggest that the author of plink modify plink's code and add an appropriate parameter to avoid this problem, because my solution is only an emergency workaround after all.
Frankly speaking, reversing REF and ALT seems kind of ridiculous to me. I admit there may be strong reasons for doing so, but there are also a lot of users like me who simply want to convert VCF files with REF and ALT unchanged.
On Wednesday, March 2, 2016 at 5:02:23 PM UTC+8, Abel Chang wrote: