I'm trying to extract text from Kannada documents published by the state of Karnataka, and I consistently run into the same problem: Tesseract recognizes ಕರ್ನಾಟಕ as ಕನಾ೯ಟಕ. In case it's not clear, Tesseract outputs ನಾ೯ where it should output ರ್ನಾ, i.e. it reads the mark at the end of that cluster as the digit ೯. I guess that's because the printed glyph is a compound (conjunct) construction rather than a standalone character?
Anyway, as a Tesseract noob, how do I fix this? I've attached the source image file for the text.
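To make the mismatch concrete, here is a small Python sketch that prints the code points of the expected and recognized strings, plus a stopgap that rewrites the misread sequence after OCR. The helper name `fix_arkavattu` and the regex are my own hypothetical illustration, not anything Tesseract provides, and it assumes that a ೯ directly after a Kannada consonant cluster is always a misread ರ್, which may not hold for every document.

```python
import re
import unicodedata

# Built from escapes so the code points are unambiguous.
EXPECTED = "\u0c95\u0cb0\u0ccd\u0ca8\u0cbe\u0c9f\u0c95"  # ಕರ್ನಾಟಕ
OCR_OUTPUT = "\u0c95\u0ca8\u0cbe\u0cef\u0c9f\u0c95"      # ಕನಾ೯ಟಕ (what Tesseract gives me)


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# Hypothetical stopgap: a consonant cluster (consonant, optional virama+consonant
# pairs, optional vowel signs) followed by KANNADA DIGIT NINE is rewritten as
# RA + VIRAMA + that cluster. Assumption: such a ೯ is never a real digit.
_MISREAD = re.compile("([\u0c95-\u0cb9](?:\u0ccd[\u0c95-\u0cb9])*[\u0cbe-\u0ccc]*)\u0cef")


def fix_arkavattu(text):
    return _MISREAD.sub("\u0cb0\u0ccd\\1", text)


if __name__ == "__main__":
    for label, s in (("expected", EXPECTED), ("ocr", OCR_OUTPUT)):
        print(label)
        for line in describe(s):
            print("  ", line)
    print(fix_arkavattu(OCR_OUTPUT) == EXPECTED)
```

This only papers over the output; I'd still like to know how to get Tesseract itself to recognize the cluster correctly.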