Meren,
I am wondering if you have any insight into tuning the min-substantive-abundance parameter. I am processing a run of stool samples with about 10.7 million reads, and the MED algorithm sets min-substantive-abundance to 2149 by default. That is much higher than for most of the other data I have processed, since my other inputs were only around 1 million reads. As a result, the number of reads removed by the min-substantive-abundance filter is very high: the default of 2149 removes 7.4 million reads, or ~70% of my total reads.
I stepped min-substantive-abundance from 100 to 2700 in increments of 50 and compiled the results into a TSV, which I have attached in case you want to look at it.
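For reference, the sweep and the removed-read fraction quoted above can be reproduced with a minimal Python sketch. The function names are hypothetical and used only for illustration; the sketch assumes the sweep included both endpoints (100 and 2700), and it neither runs MED nor reads the attached TSV.

```python
def sweep_values(start=100, stop=2700, step=50):
    """Return the min-substantive-abundance values tried.

    Assumes both endpoints (start and stop) were part of the sweep.
    """
    return list(range(start, stop + 1, step))


def fraction_removed(removed_reads, total_reads):
    """Fraction of all reads discarded by the abundance filter."""
    return removed_reads / total_reads


values = sweep_values()
print(len(values), values[0], values[-1])        # 53 100 2700

# Figures quoted in this message: 7.4 million of 10.7 million reads removed.
print(f"{fraction_removed(7.4e6, 10.7e6):.1%}")  # 69.2%
```

Under that assumption the sweep covers 53 settings, and the default of 2149 falls between two of the tested values (2100 and 2150).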
Do you have a rule of thumb for setting min-substantive-abundance? Is there a value you would consider far too low or far too high?
When setting the value, does the environment the samples came from make a big difference, e.g. vaginal (low diversity) vs. stool (high diversity)?
Thanks,
Gene Blanchard