I see that WER = (S + D + I)/N, where N is the total number of words in the reference. I also see that the compute-wer.cc program uses Levenshtein edit distance.
Do these three edits correspond to phonemes or to words? That is, does a substitution count a word substituted when computing the edit distance between the recognized utterance and the reference sentence? Or does it count a phoneme substituted when computing the Levenshtein distance between a recognized word and the corresponding reference word?
I hope it corresponds to phonemes. Otherwise, it doesn't make sense.
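
To make the two readings concrete, here is a minimal Python sketch. It is an illustration only and is not taken from compute-wer.cc; the function names `edit_distance` and `error_rate` and the example sentences are made up for this question, and letters stand in for phonemes because no pronunciation lexicon is assumed. The same Levenshtein routine is applied once to word sequences and once to the symbols inside a single word.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (lists of any tokens)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # hypothesis prefix is empty: i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """(S + D + I) / N, with N the number of reference tokens."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# Reading 1: tokens are words, so S, D and I count whole words.
ref_words = "the cat sat on the mat".split()
hyp_words = "the cat sit on mat".split()
word_level = error_rate(ref_words, hyp_words)

# Reading 2: tokens are the symbols inside one word
# (letters used here as a stand-in for phonemes).
symbol_level = edit_distance(list("sat"), list("sit"))

print(word_level, symbol_level)
```

Under the first reading the example has one substituted word ("sat" vs. "sit") and one deleted word ("the"), so 2 edits over 6 reference words. Under the second reading the same word pair contributes a single symbol substitution.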