I have exactly the same problem as you, and I am not a Tesseract specialist either. I have been experimenting with various setups.
Training from a layer seems to be the best option for introducing a missing character. But I am still struggling, because I am not getting the same accuracy as the default Best model.
- I have been training on 400,000 text lines. The model gives good accuracy on the synthetic data but terrible output on scanned documents.
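To put a number on the gap between synthetic and scanned data, one option is to compute a character error rate (CER) separately for each set. Below is a minimal sketch in plain Python. It assumes you already have pairs of ground-truth text and OCR output for each line. The function names and the sample strings are hypothetical and are not part of Tesseract.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(pairs):
    """CER over (ground_truth, ocr_output) pairs: total edits / total truth chars."""
    total_edits = sum(edit_distance(truth, output) for truth, output in pairs)
    total_chars = sum(len(truth) for truth, _ in pairs)
    return total_edits / total_chars if total_chars else 0.0


if __name__ == "__main__":
    # Placeholder pairs for illustration only; replace with your own lines.
    synthetic_pairs = [("hello world", "hello world")]
    scanned_pairs = [("hello world", "he1lo w0rld")]
    print("synthetic CER:", character_error_rate(synthetic_pairs))
    print("scanned CER:", character_error_rate(scanned_pairs))
```

Running this on a held-out set of real scans, not just on synthetic lines, shows whether a new setup actually helps on the documents you care about.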
Training Tesseract is a very daunting task. I spent many weeks on it and did not get satisfactory results. You need to experiment with various setups and see the outcomes.