Hey,
You're right that there is a little bit of sensitivity here.
Regarding all the UNK annotations, UNK stands for Unknown, which means annotators identified an error in the text but were unable to correct it. Consequently, the "correction" is just a repetition of the original string, to make it clear which word is erroneous but uncorrected. This is also why these errors disappear if you run parallel_to_m2.py on them: the original and "corrected" text are identical at that position, so no edit is extracted.
These errors are kept mainly for the purposes of error detection; a system should be rewarded if it identifies the token as an error, even if there is no known correction. All UNK errors will be excluded from the evaluation on error correction.
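To make the detection/correction distinction concrete, here is a minimal sketch of how you might separate UNK edits when reading an M2 file yourself. It assumes the usual M2 layout (`S` lines for tokens, `A start end|||type|||correction|||...|||annotator` lines for edits); the function names and the example sentence are made up for illustration, and this is not the official scorer.

```python
def parse_m2(text):
    """Yield (tokens, edits) for each sentence block in an M2 string."""
    for block in text.strip().split("\n\n"):
        lines = block.split("\n")
        tokens = lines[0].split()[1:]  # drop the leading "S"
        edits = []
        for line in lines[1:]:
            if not line.startswith("A "):
                continue
            fields = line[2:].split("|||")
            start, end = map(int, fields[0].split())
            edits.append({
                "start": start,
                "end": end,
                "type": fields[1],
                "cor": fields[2],
                "coder": int(fields[-1]),
            })
        yield tokens, edits


def split_edits(edits):
    """Return (detection_spans, correction_edits).

    Detection keeps every annotated span, including UNK.
    Correction drops UNK edits, since they carry no real correction.
    """
    detection = [(e["start"], e["end"]) for e in edits]
    correction = [(e["start"], e["end"], e["cor"])
                  for e in edits if e["type"] != "UNK"]
    return detection, correction


# Hypothetical example: "alot" is flagged as an error but left uncorrected.
EXAMPLE = """S I like teh cats alot .
A 2 3|||R:SPELL|||the|||REQUIRED|||-NONE-|||0
A 4 5|||UNK|||alot|||REQUIRED|||-NONE-|||0"""

tokens, edits = next(parse_m2(EXAMPLE))
detection, correction = split_edits(edits)
```

In this example both spans count for detection, but only the `teh` -> `the` edit would count for correction.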
As for cases where the edits are broken differently, this largely depends on which version/model of spacy you use. Since the automatic annotation relies on POS tags and other information automatically obtained from spacy, different models will produce slightly different results. We used spacy 1.9.0 with the en_core_web_sm-1.2.0 model to generate the official data, and will be using the same setup to evaluate all system output.
You can still use a different version to develop your system, however. You may then get slightly different numbers when you evaluate it yourself, but all official output will be ranked using the spacy version/model mentioned above.
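If you want to know whether your local setup matches the official one, a quick check along these lines may help. This is only a sketch: the helper names are hypothetical, and it assumes both the spacy package and the installed model package expose a `__version__` attribute, which may not hold for every install.

```python
# Expected versions are taken from the message above.
EXPECTED = {"spacy": "1.9.0", "en_core_web_sm": "1.2.0"}


def version_mismatches(installed, expected=EXPECTED):
    """Return {name: (installed, expected)} for every package that differs."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in expected.items()
        if installed.get(name) != wanted
    }


def installed_versions():
    """Collect versions from the current environment (None if not found)."""
    found = {}
    for name in EXPECTED:
        try:
            module = __import__(name)
            # Assumption: the package exposes __version__.
            found[name] = getattr(module, "__version__", None)
        except ImportError:
            found[name] = None
    return found
```

Calling `version_mismatches(installed_versions())` returns an empty dict when your environment matches; anything else lists what differs, which is a likely cause of small differences in edit boundaries or scores.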
Hopefully that all makes sense!
Chris