Remove certain characters while fine-tuning (training) Tesseract

Murtuza Dahodwala

Mar 9, 2021, 2:30:17 AM
to tesseract-ocr
Hello,
Currently, my OCR model outputs certain characters such as & and |.
Is it possible to remove these characters by correcting my LSTM bounding box dataset and then fine-tuning, so that the model no longer detects these symbols in my test images?


Greg Dunkel

Mar 10, 2021, 12:50:31 PM
to tesser...@googlegroups.com
Would it be easier to remove these characters from the output using editing tools?
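
As a rough illustration of this suggestion, the characters could be stripped from the recognized text in a small post-processing step rather than by hand. This is only a sketch: the function name strip_unwanted and the default character set "&|" (taken from the question) are assumptions, not part of Tesseract.

```python
# Characters to drop from OCR output; "&|" comes from the original question.
UNWANTED = "&|"


def strip_unwanted(text: str, unwanted: str = UNWANTED) -> str:
    """Return text with every character listed in `unwanted` deleted."""
    # str.maketrans with a third argument maps those characters to None,
    # so translate() removes them in a single pass.
    return text.translate(str.maketrans("", "", unwanted))


if __name__ == "__main__":
    print(strip_unwanted("Smith & Sons | Invoice"))
```

This runs automatically on every result, so no manual editing is involved, but the symbols are still recognized by the model and only removed afterwards.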

Murtuza Dahodwala

Mar 10, 2021, 12:52:29 PM
to tesser...@googlegroups.com
I guess that would be manual work. I want the model not to detect them at all during inference.
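
One option that may avoid both manual editing and retraining is Tesseract's tessedit_char_blacklist config variable, passed with -c on the command line. The sketch below only builds and runs such a command. The helper names, file paths, and the "eng" language are hypothetical, and whether the LSTM engine honors the blacklist depends on the Tesseract version (it is an assumption here that a 4.1 or newer build is installed).

```python
import subprocess


def build_tesseract_cmd(image_path, output_base, blacklist="&|", lang="eng"):
    """Build a tesseract CLI call that blacklists the given characters.

    The helper itself is hypothetical; tessedit_char_blacklist is an existing
    Tesseract config variable, but LSTM support for it is version-dependent.
    """
    return [
        "tesseract", image_path, output_base,
        "-l", lang,
        "-c", f"tessedit_char_blacklist={blacklist}",
    ]


def run_ocr(image_path, output_base, blacklist="&|", lang="eng"):
    """Run tesseract; requires the tesseract binary to be on PATH."""
    # Passing a list (no shell) means & and | need no quoting or escaping.
    subprocess.run(
        build_tesseract_cmd(image_path, output_base, blacklist, lang),
        check=True,
    )


if __name__ == "__main__":
    # Placeholder paths; the recognized text would be written to out.txt.
    print(build_tesseract_cmd("test.png", "out"))
```

If the blacklist is not respected by the installed version, removing the symbols from the training ground truth and fine-tuning, as proposed in the first message, remains the alternative to try.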
