Yes, I have the same problem: some characters are split, and sometimes a single character even comes out as three ("O0O", for example).
I wrote some fairly complex code to try to limit the problem (with psm 13). The idea is this:
Process each symbol individually with the iterator:
- add symbol to current group
- check if you can close the group
- if you can close it, pick the best symbol or symbols and add them to the result, and leave the rest for the following check.
The criteria for "closing" a group are based on the distance between symbols, symbol size and confidence. You also need to take care not to drop the spaces, as these are not handled as symbols. Quite a mess.
You need to look at the next symbol to decide what to do. A symbol can be "cancelled" by the next one or by the one after that. My code does not fix the problem completely, but the results are reasonable (some false negatives and a few false positives remain).
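To give a rough idea of the grouping, here is a much simplified sketch in Python. It is not my actual code: it only uses the horizontal distance and the confidence (no symbol size, no partial picks), and it assumes you have already copied each symbol from the iterator into a small record. The names Symbol, merge_split_symbols, min_gap and space_gap are made up for this example, and the thresholds are guesses you would have to tune.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Symbol:
    """One recognized symbol (hypothetical record filled from the iterator)."""
    text: str
    left: int    # left edge of the box, in pixels
    right: int   # right edge of the box, in pixels
    conf: float  # confidence, higher is better


def merge_split_symbols(symbols: List[Symbol],
                        min_gap: int = 0,
                        space_gap: int = 15) -> str:
    """Collapse each run of overlapping boxes to its most confident symbol.

    A group is closed when the next symbol starts more than `min_gap`
    pixels to the right of everything in the group. If the gap is at
    least `space_gap` pixels, a space is emitted so words are not glued.
    """
    out: List[str] = []
    group: List[Symbol] = []
    for sym in symbols:  # symbols are assumed to be in reading order
        if group:
            gap = sym.left - max(s.right for s in group)
            if gap > min_gap:
                # The next symbol is clearly separate: close the group.
                out.append(max(group, key=lambda s: s.conf).text)
                if gap >= space_gap:
                    out.append(" ")
                group = []
        group.append(sym)
    if group:
        # Close whatever is left at the end of the line.
        out.append(max(group, key=lambda s: s.conf).text)
    return "".join(out)


line = [
    Symbol("O", 10, 30, 60.0),   # three overlapping readings
    Symbol("0", 12, 28, 90.0),   # of the same glyph
    Symbol("O", 11, 30, 55.0),
    Symbol("K", 35, 55, 95.0),
    Symbol("A", 80, 100, 93.0),  # wide gap before this one: a space
]
print(merge_split_symbols(line))
```

Note that the decision about a group is only taken when the next symbol arrives, which is the "look at the next symbol" part. A real version needs more criteria than this, otherwise touching characters get merged by mistake.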
If you want to try this, I suggest first writing some code to visualize the boxes, like this.
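In case it helps, a minimal way to draw the boxes without any imaging library is to write them out as an SVG file and open it in a browser. This is only a sketch under assumptions: the boxes are taken to be (text, left, top, right, bottom) tuples that you collected yourself from the iterator, and boxes_to_svg and image_href are names invented for this example.

```python
from typing import Iterable, Tuple
from xml.sax.saxutils import escape, quoteattr

# (text, left, top, right, bottom) in image pixel coordinates
Box = Tuple[str, int, int, int, int]


def boxes_to_svg(boxes: Iterable[Box], width: int, height: int,
                 image_href: str = "") -> str:
    """Return an SVG document with one red rectangle per symbol box.

    If `image_href` is given, the page image is drawn underneath so the
    boxes can be compared with the glyphs they are supposed to cover.
    """
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    ]
    if image_href:
        parts.append(
            f'<image href={quoteattr(image_href)} '
            f'width="{width}" height="{height}"/>'
        )
    for text, left, top, right, bottom in boxes:
        parts.append(
            f'<rect x="{left}" y="{top}" '
            f'width="{right - left}" height="{bottom - top}" '
            f'fill="none" stroke="red"/>'
        )
        # Label each box with the recognized text, just above the box.
        parts.append(
            f'<text x="{left}" y="{max(top - 2, 10)}" '
            f'font-size="10" fill="blue">{escape(text)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


svg = boxes_to_svg([("O", 10, 5, 30, 40), ("<", 12, 5, 28, 40)], 100, 50)
# To inspect it: write `svg` to a file such as boxes.svg and open it.
```

Overlapping rectangles show up immediately this way, which makes it much easier to see which symbols come from the same glyph.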
The very latest version of tesseract (check out and build from GitHub) handles boxes in a different (better) way, so if you want to try this approach you may want to use that version. I do not know whether it fixes this problem too.
Lorenzo