Bad data for tesseract means an invalid image of some kind. Tesseract uses the leptonica library, which does a number of sanity checks on images. Beyond that, tesseract itself does not do anything special.
In its current form I would not consider it safe to allow a potential attacker to submit a chosen image to tesseract. I would assume that remote code execution vulnerabilities exist. I would use ImageMagick or Pillow to sanitize the image before tesseract gets to see it.
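As a rough illustration of the sanitizing step, here is a minimal sketch using Pillow. It decodes the untrusted bytes, rejects anything Pillow cannot parse, and re-encodes the pixels into a fresh PNG so that tesseract only ever sees a file written by Pillow. The function name `sanitize_image` and the `MAX_PIXELS` limit are my own assumptions for the example, not anything tesseract or leptonica provides. Note that this moves the parsing risk onto Pillow rather than removing it, so it reduces exposure but is not a guarantee.

```python
import io

from PIL import Image

# Hypothetical upper bound on image area; pick a value that suits your workload.
MAX_PIXELS = 50_000_000


def sanitize_image(raw: bytes, max_pixels: int = MAX_PIXELS) -> bytes:
    """Decode untrusted image bytes and return a freshly encoded PNG.

    Raises ValueError if the data cannot be decoded or is too large.
    """
    try:
        # verify() checks the file for obvious corruption; the image object
        # cannot be used afterwards, so the data is opened a second time.
        with Image.open(io.BytesIO(raw)) as probe:
            probe.verify()

        with Image.open(io.BytesIO(raw)) as img:
            if img.width * img.height > max_pixels:
                raise ValueError("image exceeds pixel limit")
            # Force a full decode into a plain RGB pixel buffer.
            rgb = img.convert("RGB")
    except Exception as exc:
        raise ValueError(f"rejected image: {exc}") from exc

    # Paste into a brand-new image so that no metadata from the
    # original file is carried over into the output.
    clean = Image.new("RGB", rgb.size)
    clean.paste(rgb)

    out = io.BytesIO()
    clean.save(out, format="PNG")
    return out.getvalue()
```

The returned bytes can be written to a temporary file and handed to tesseract in place of the original upload. Running this step (and tesseract itself) in a sandboxed, unprivileged process would still be prudent.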
To my knowledge, no, there's never been a formal security review. There's a lot of ugly legacy C++ and C and questionable practices in the code, honestly.
- Or just functional testing?
The CI scripts only check that Tesseract compiles on some supported platforms. There's a test suite that checks OCR quality in a statistical sense, but not correctness or valid output per se.