Could someone please verify this fix?
One of the functions used to compute the METEOR score tries to "match" words together, but it failed to remove the newly matched words from the list of yet-to-match words. As a result, another matching method found the same matches again, producing duplicate matches.
This produced badly incorrect METEOR scores whenever the sentences included words that nearly match, such as "create" and "creates".
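
To make the failure mode easier to review, here is a minimal toy sketch of the pattern. It is not NLTK's actual code: `toy_stem`, `match_stage`, `align`, and the `drop_matched` flag are all hypothetical names invented for illustration, and the second stage is only a stand-in for whatever matcher runs after the first one.

```python
def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: strips one trailing "s".
    return word[:-1] if word.endswith("s") else word


def match_stage(hyp, ref, key):
    """Pair (index, word) entries of hyp and ref whose key(word) values agree.

    Returns the matched index pairs plus the entries that are still unmatched.
    """
    matches, hyp_left, ref_left = [], [], list(ref)
    for h_idx, h_word in hyp:
        for r_entry in ref_left:
            if key(h_word) == key(r_entry[1]):
                matches.append((h_idx, r_entry[0]))
                ref_left.remove(r_entry)  # each reference word is used once per stage
                break
        else:
            hyp_left.append((h_idx, h_word))
    return matches, hyp_left, ref_left


def align(hyp_words, ref_words, drop_matched=True):
    hyp = list(enumerate(hyp_words))
    ref = list(enumerate(ref_words))
    # Stage 1: stem-based matching.
    stage1, hyp_left, ref_left = match_stage(hyp, ref, key=toy_stem)
    if not drop_matched:
        # Bug pattern: stage-1 matches stay in the yet-to-match lists.
        hyp_left, ref_left = hyp, ref
    # Stage 2: a second, looser matcher (toy version: lowercased stems).
    stage2, _, _ = match_stage(
        hyp_left, ref_left, key=lambda w: toy_stem(w.lower())
    )
    return stage1 + stage2


hyp = ["she", "creates", "art"]
ref = ["she", "create", "art"]
print(len(align(hyp, ref, drop_matched=True)))   # 3
print(len(align(hyp, ref, drop_matched=False)))  # 6
```

In the toy version, leaving the matched words in the lists yields six matches for three-word sentences. Any score built on the match count is then inflated.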
https://github.com/nltk/nltk/pull/2763