I think the reason is that your input is poor, so the model is confused and a change of just a few pixels is enough to make it see an extra letter.
Your input is "bad" in the sense that it looks different from the data used to train the neural network. The difference between your two images is small, but the difference between each of them and the training data is large.
If you improve your image by removing borders, reducing noise, and greatly increasing contrast, and maybe even straightening (deskewing) the text, this kind of problem should become much less common.
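To make the contrast point concrete, here is a minimal sketch of contrast stretching on a tiny hand-made grayscale "image" (a list of pixel rows, 0 = black, 255 = white). It is only an illustration of the idea of pushing ink toward black and background toward white; in practice you would use a library such as Pillow or OpenCV rather than plain lists.

```python
# Stretch a low-contrast grayscale patch to the full 0-255 range.
# img is a list of rows of integer pixel values.

def stretch_contrast(img):
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        # Flat image: nothing to stretch, return a copy.
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]

# A low-contrast patch: background ~180, ink ~90.
patch = [
    [180, 180, 180],
    [180,  90, 180],
    [180, 180, 180],
]
print(stretch_contrast(patch))
# The ink pixel becomes 0 (pure black), the background 255 (pure white).
```

After stretching, the letter strokes stand out sharply from the background, which is much closer to what the engine saw during training.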
If you want to understand a little more about why this is possible, read up on how an LSTM-based OCR engine works. The issue most likely arises in the step that decodes letters from the neural network output (CTC, beam search). It is not a bug, just how it works.
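As a rough illustration of that decoding step, here is a sketch of greedy CTC decoding: the network emits a score for each character (plus a special "blank") at every timestep, and decoding takes the best symbol per step, collapses repeats, and drops blanks. The scores below are invented for the example, but they show how a few noisy pixels can tip one timestep's best guess and insert an extra letter.

```python
BLANK = "-"  # the CTC "no character here" symbol

def greedy_ctc_decode(steps):
    """steps: list of {symbol: score} dicts, one per timestep."""
    best = [max(scores, key=scores.get) for scores in steps]
    out = []
    prev = None
    for ch in best:
        # Collapse repeated symbols and drop blanks.
        if ch != prev and ch != BLANK:
            out.append(ch)
        prev = ch
    return "".join(out)

# Clean image: every timestep has a clear winner -> "cat"
clean = [
    {"c": 0.90, "l": 0.05, BLANK: 0.05},
    {BLANK: 0.90, "l": 0.10},
    {"a": 0.90, BLANK: 0.10},
    {"t": 0.90, BLANK: 0.10},
]
print(greedy_ctc_decode(clean))   # cat

# A few changed pixels tip the second timestep toward "l" -> "clat"
noisy = list(clean)
noisy[1] = {"l": 0.55, BLANK: 0.45}
print(greedy_ctc_decode(noisy))   # clat
```

The model did not "invent" a letter out of nowhere: a marginal timestep simply flipped from blank to a character, which is exactly the kind of sensitivity that better input contrast reduces.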
I do not think there is much you can do with parameters and the like, other than improving your image or the Tesseract model itself. This sometimes happens even with fine-tuned models.
Lorenzo