It's not clear what you mean by 'normalize', or ultimately hope to achieve by this step. Are you sure you need it?
Cosine-similarities will already be in a range from -1.0 to 1.0. Further, when they come from the same model/process, they'll be comparable to each other. For example, for sentences a, b, c, d, e, and f, if cossim(a,b) > cossim(d,e), then it'd be typical/defensible to say that "a and b are more similar to each other than d and e".
However, if you also calculated cossim(a,c), and then *scaled* the cossim(a,b) and cossim(a,c) values based on just the min/max seen in those pairings, the scaled version wouldn't necessarily be meaningfully comparable to some values scaled based on a different set of pairings. (And if you didn't care about such longer-range comparability – just ranks – you probably wouldn't be doing scaling at all.)
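To make that concrete, here's a minimal sketch (the values and helper names are just for illustration) showing how per-grouping min/max scaling breaks comparability across groupings, even though the raw values were comparable:

```python
import numpy as np

def cossim(u, v):
    """Cosine similarity of two vectors: always in [-1.0, 1.0]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def minmax_scale(values):
    """Scale values to [0, 1] by the min/max seen in that list only."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Two hypothetical groupings of raw cosine-similarities:
group1 = [0.2, 0.5, 0.9]
group2 = [0.5, 0.6, 0.7]

# The raw value 0.5 means the same thing in both groups, but after
# per-group scaling it maps to ~0.43 in group1 and 0.0 in group2 --
# the scaled numbers are no longer comparable across groupings.
print(minmax_scale(group1))  # [0.0, 0.428..., 1.0]
print(minmax_scale(group2))  # [0.0, 0.5, 1.0]
```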
For WMDistance, the values are positive and vary more – indeed I'm not sure there is an obvious 'max' value to the distance, as longer and more-different texts could get much larger distances. And for some downstream tasks, there's no need to re-scale the values: the raw distances, or sorted rank of results, or relative differences between raw values, may be enough.
But if you do need some similarity-value that ranges from 0.0 to 1.0, rather than scaling by observed ranges, a common transformation that's used is:
similarity = 1 / (1 + distance)
Then the re-scaled values don't depend on whatever max happened to appear in the same grouping. (You could further shift-and-scale that value into the -1.0 to 1.0 range, by multiplying by 2 and subtracting 1, but even then, comparing the WMD-derived similarity against a cosine-similarity might be nonsensical, given their very different methods of calculation and typical distributions.)
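A minimal sketch of both transformations (function names are just for illustration):

```python
def distance_to_similarity(distance):
    """Map an unbounded non-negative distance (like a WMD) into (0.0, 1.0]."""
    return 1.0 / (1.0 + distance)

def to_signed_range(similarity):
    """Optionally shift-and-scale a [0.0, 1.0] similarity into [-1.0, 1.0]."""
    return 2.0 * similarity - 1.0

print(distance_to_similarity(0.0))  # 1.0 (identical texts: zero distance)
print(distance_to_similarity(3.0))  # 0.25
print(to_signed_range(0.25))        # -0.5
```

Note this mapping never reaches 0.0 exactly, since the similarity only approaches 0.0 as the distance grows without bound.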
- Gordon