Hi there,
NSTI and sequence similarity are closely linked, though not perfectly: NSTI does correlate with % similarity overall. When you say that the cut-off for PICRUSt1 was 0.03, do you mean the recommended level for the weighted NSTI? That’s the abundance-weighted average NSTI across all OTUs in a sample, so it has to be interpreted a little differently (i.e., the most abundant OTUs could have low NSTI values, but there could still be many rare ones above 0.03).
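To make the difference concrete, here is a minimal sketch of how an abundance-weighted NSTI could be computed for one sample. The function name, OTU IDs, counts and NSTI values are all made up for illustration; this is not PICRUSt code, just the weighted-average arithmetic.

```python
def weighted_nsti(abundances, nsti):
    """Abundance-weighted mean NSTI for one sample.

    abundances: dict mapping OTU ID -> read count in the sample
    nsti:       dict mapping OTU ID -> per-OTU NSTI value
    (Hypothetical helper; both inputs are illustrative.)
    """
    total = sum(abundances.values())
    if total == 0:
        raise ValueError("sample has no reads")
    # Each OTU's NSTI contributes in proportion to its abundance.
    return sum(count * nsti[otu] for otu, count in abundances.items()) / total


# Made-up sample: one dominant OTU close to a reference genome,
# plus two rare OTUs that are much further away.
sample_counts = {"OTU_A": 900, "OTU_B": 60, "OTU_C": 40}
otu_nsti = {"OTU_A": 0.01, "OTU_B": 0.10, "OTU_C": 0.15}

sample_weighted_nsti = weighted_nsti(sample_counts, otu_nsti)
otus_above_cutoff = sorted(otu for otu, v in otu_nsti.items() if v > 0.03)

print(round(sample_weighted_nsti, 3))
print(otus_above_cutoff)
```

With these invented numbers the weighted value comes out at about 0.021, which is under 0.03 even though two of the three OTUs individually sit above it.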
The NSTI refers to the distance in the tree to the nearest reference genome. So an NSTI cut-off of 2 is quite high and is really just meant to filter out junk sequences.
You could lower it to something like 0.03 if you wanted to make sure the predictions were based only on 16S sequences that are very close to reference genomes.
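As a rough illustration of what the two cut-offs do, here is a small sketch that filters per-sequence NSTI values. The helper name and the values are hypothetical, and treating the cut-off as inclusive is my assumption for the example, not a statement about how PICRUSt applies it.

```python
def filter_by_nsti(nsti_by_otu, max_nsti=2.0):
    """Keep only OTUs whose NSTI is at or below max_nsti.

    Hypothetical helper for illustration; the inclusive comparison
    is an assumption.
    """
    return {otu: v for otu, v in nsti_by_otu.items() if v <= max_nsti}


# Made-up per-OTU NSTI values, including one obvious outlier.
nsti_by_otu = {"OTU_A": 0.01, "OTU_B": 0.10, "OTU_C": 0.15, "OTU_junk": 2.7}

kept_lenient = filter_by_nsti(nsti_by_otu)                # cut-off of 2
kept_strict = filter_by_nsti(nsti_by_otu, max_nsti=0.03)  # much stricter

print(sorted(kept_lenient))
print(sorted(kept_strict))
```

In this toy case the cut-off of 2 only drops the outlier, while 0.03 keeps just the one sequence that is very close to a reference genome.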
Does that help?
Thanks,
Gavin