Yes, old OCR solutions use binarized content, but I see this as a legacy limitation. It was probably done to speed up processing and also, I suppose, because the algorithms used would not benefit from the extra gray-level detail anyway. Old OCR tech was also print-oriented, so the text was already nearly binary.
With a neural network there is no extra time cost in processing grayscale rather than binary text: the pixels are just float values in both cases. Binarization, on the other hand, throws away a lot of data, especially with noisy images, complex backgrounds, etc. (ID documents, smartphone pictures, and so on).
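Just to illustrate the point (a minimal sketch of my own, not taken from any particular OCR engine; the crop size and values are made up):

```python
import numpy as np

# Both inputs end up as float arrays of the same shape, so the network does
# the same amount of work either way; only the information content differs.
gray = np.random.randint(0, 256, (32, 128), dtype=np.uint8)  # fake grayscale crop
binary = (gray > 127).astype(np.uint8) * 255                 # hard-thresholded version

x_gray = gray.astype(np.float32) / 255.0    # keeps all 256 gray levels
x_bin = binary.astype(np.float32) / 255.0   # only 0.0 and 1.0 survive
print(x_gray.shape == x_bin.shape, np.unique(x_bin))
```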
Binarization may improve OCR performance, but I doubt it: a CNN should easily be able to learn to binarize the image itself if that actually improves the results.
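As a toy illustration (my own sketch, assuming a PyTorch-style model; the threshold and steepness values are hypothetical), a single 1x1 convolution followed by a steep sigmoid already approximates fixed-threshold binarization, so a network trained end to end can recover this step on its own if it helps:

```python
import torch
import torch.nn as nn

threshold = 0.5   # hypothetical gray-level threshold
steepness = 50.0  # large slope makes the sigmoid nearly step-like

# 1x1 conv + sigmoid computes sigmoid(steepness * (x - threshold)) per pixel,
# i.e. a soft version of thresholding that the network could learn by itself.
binarize = nn.Sequential(
    nn.Conv2d(1, 1, kernel_size=1, bias=True),
    nn.Sigmoid(),
)
with torch.no_grad():
    binarize[0].weight.fill_(steepness)
    binarize[0].bias.fill_(-steepness * threshold)

gray = torch.rand(1, 1, 32, 128)   # fake grayscale text line, values in [0, 1]
almost_binary = binarize(gray)     # values pushed close to 0 or 1
print(almost_binary.min().item(), almost_binary.max().item())
```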
The only advantage I see for binarization is that synthetic training data is binary, so binarizing the real input matches it to the data the model was trained on. Of course, you could also corrupt the synthetic data to make it grayscale.
But I would expect fully grayscale training and prediction to give slightly better results, especially for complex cases.
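For what it's worth, this is roughly the kind of corruption I have in mind (a rough sketch, assuming NumPy and OpenCV; the function name and the blur/contrast/noise values are hypothetical):

```python
import numpy as np
import cv2

def degrade_to_grayscale(binary_img: np.ndarray) -> np.ndarray:
    """Turn a clean binary synthetic crop into something closer to a real
    grayscale photo: soft edges, reduced contrast, uneven lighting, noise."""
    img = binary_img.astype(np.float32) / 255.0
    # soften the hard edges of rendered glyphs
    img = cv2.GaussianBlur(img, (3, 3), 0.8)
    # compress contrast so ink and paper are no longer pure 0/1
    img = 0.15 + 0.7 * img
    # add a smooth left-to-right illumination gradient, like a phone photo
    h, w = img.shape
    img = img + np.linspace(-0.05, 0.05, w, dtype=np.float32)[None, :]
    # sensor-like Gaussian noise
    img = img + np.random.normal(0.0, 0.02, img.shape).astype(np.float32)
    return (np.clip(img, 0.0, 1.0) * 255.0).astype(np.uint8)
```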
I fine-tuned my models using grayscale data (real-world crops, not synthetic) and, if possible, I'd like to try disabling the binarization step to see if I get an improvement. Maybe there are some parameters controlling this step.
Thanks
Lorenzo