Hi all,
Ben helped me debug this, and we found that the potential cause of the issue is tri-allelic positions.
Here's a brief summary.
In the VCF file, I found that tri-allelic loci are usually coded in one of two formats:
1. Some tri-allelic positions are coded in a single row, with a comma separating the alternate alleles.
REF=<ref type> ALT=<alt type1>,<alt type2>
For example
3 12010 . A G,C 100 PASS
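This type of tri-allelic position can be excluded with a Perl one-liner. It keeps header lines and records whose REF and ALT are each a single character, so it also drops indels: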
perl -lane 'if(/^#/ or length("$F[3]$F[4]")==2){print}' <vcf_file>
2. Some tri-allelic positions are coded in separate rows, one per alternate allele (see the example below).
3 82988805 rs544932678 T A 100 PASS .
3 82988805 rs544932678 T G 100 PASS .
This type of tri-allelic position can be handled with the following script, which keeps header lines and only the first row seen at each chromosome/position pair:
awk '/^#/ || !a[$1,$2]++' <vcf_file>
Because there are not many of them, I personally just exclude them before the simulation.
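If it is easier to apply both filters in one pass, here is a minimal Python sketch. It is not what we actually ran. The function name `drop_multiallelic` is made up for illustration, and it assumes tab-delimited VCF lines with CHROM, POS, and ALT in columns 1, 2, and 5. Unlike the awk line, which keeps the first row at a repeated position, this sketch drops every row at such a position.

```python
from collections import Counter


def drop_multiallelic(lines):
    """Return VCF lines with both kinds of tri-allelic records removed.

    Hypothetical helper: assumes tab-delimited VCF text lines.
    """
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]

    # First pass: count how many data rows share each (CHROM, POS) pair.
    counts = Counter(
        tuple(ln.split("\t")[:2]) for ln in lines if not ln.startswith("#")
    )

    kept = []
    for ln in lines:
        if ln.startswith("#"):
            kept.append(ln)  # header lines pass through unchanged
            continue
        fields = ln.split("\t")
        if "," in fields[4]:
            continue  # format 1: several ALT alleles in one row
        if counts[(fields[0], fields[1])] > 1:
            continue  # format 2: same position spread over several rows
        kept.append(ln)
    return kept
```

To use it, read the VCF into a list of lines, pass the list to `drop_multiallelic`, and write the returned lines back out joined by newlines.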
Here's our debugging process.
1. Because the error message said there were multiple mutations, I first tried setting the mutation rate to 0, but that did not help.
So our first guess was that the input file might be problematic.
2. I took a quick look at the input file, noticed the first type of tri-allelic loci, and excluded them. This time the simulation ran smoothly
for all chromosomes except chr3 and chr8, so I realized there might be other causes of the issue.
3. Although SLiM reported an error message and stopped for chr3 and chr8, it still output a partial VCF. My guess was that it prints
the VCF records for all loci up to just before the problematic position. So Ben helped me check the input file again, and we found the second type
of position, which caused the issue.
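The lookup in step 3 can be sketched in Python as below. This is only an illustration of the idea, not what we ran. The function names are hypothetical, and it assumes a single-chromosome, position-sorted, tab-delimited input, and that positions in the partial output are directly comparable to positions in the input. If my guess about the partial output holds, the first input records past the last position written should include the problematic one.

```python
def data_positions(lines):
    """Return the integer POS of every non-header, non-empty VCF line."""
    return [
        int(ln.split("\t")[1])
        for ln in lines
        if ln.strip() and not ln.startswith("#")
    ]


def records_after_last_output(input_lines, output_lines, n=3):
    """Return the first n input records located past the last output position.

    Hypothetical helper: assumes one chromosome and sorted positions.
    """
    out_pos = data_positions(output_lines)
    last = max(out_pos) if out_pos else 0  # nothing written: start from the top

    hits = []
    for ln in input_lines:
        if not ln.strip() or ln.startswith("#"):
            continue  # skip headers and blank lines
        if int(ln.split("\t")[1]) > last:
            hits.append(ln.rstrip("\n"))
            if len(hits) == n:
                break
    return hits
```

Inspecting the returned rows by eye is usually enough to spot a position that appears twice.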
Thanks for reading and let us know if you have any comments on this.
Best,
Terry