English Data Only

Shailesh Kulkarni

Sep 3, 2021, 9:06:26 AM
to tesseract-ocr
Hi,

I am new to OCR and am using the library from C# and VB.NET.
I have converted a PDF to an image (the PDF is actually a printout and contains an image).

The image contains two languages: English and Marathi/Hindi.
I only want to process the English data.

Also, when I process the images, text from two columns that sit side by side comes out merged into single lines.

Can you please guide me on how to approach this?

Thank you. Regards,
Shailesh

Soul Green

Sep 8, 2021, 12:30:28 AM
to tesser...@googlegroups.com
I am also new to OCR.
What helped me with a similar issue was changing the page segmentation mode (PSM) that Tesseract was using:
https://tesseract.patagames.com/help/html/T_Patagames_Ocr_Enums_PageSegMode.htm
Perhaps try each of the different PSMs.

I'm not sure how to filter languages, as I use custom traineddata.
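
On the language question, one possible approach is to filter the OCR output afterwards. This is a rough sketch, not something tested on the original poster's documents. It assumes the page was recognized with both language models loaded (for example `-l eng+mar` or `-l eng+hin` on the Tesseract command line), so that Marathi/Hindi text comes out as real Devanagari characters that can be stripped. The function name `keep_english` is made up for this illustration, and the code is Python rather than C#, so it would need porting to the .NET wrapper in use.

```python
import re

# Devanagari Unicode block, used by both Marathi and Hindi: U+0900 to U+097F.
DEVANAGARI = re.compile(r"[\u0900-\u097F]+")


def keep_english(text):
    """Drop Devanagari characters from OCR output and tidy what is left.

    Hypothetical helper for illustration; it does not touch Latin text,
    digits, or punctuation.
    """
    cleaned_lines = []
    for line in text.splitlines():
        # Remove every run of Devanagari characters from the line.
        line = DEVANAGARI.sub("", line)
        # Collapse the gaps left behind and trim the ends.
        line = re.sub(r"[ \t]{2,}", " ", line).strip()
        # Skip lines that contained only Marathi/Hindi text.
        if line:
            cleaned_lines.append(line)
    return "\n".join(cleaned_lines)
```

Running with `-l eng` alone is the other option, but then Devanagari regions tend to be misread as Latin-looking noise, which a character filter like this cannot detect.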

Jaspreet Kaur

Sep 8, 2021, 12:32:52 AM
to tesser...@googlegroups.com
It depends.
PSM 6 is recommended.
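
Since the best mode depends on the page layout, a quick way to compare a few of them is to run the Tesseract command line once per mode and look at the results side by side. The sketch below assumes the `tesseract` executable is installed and on the PATH; the helper names `build_command` and `run_all` and the choice of modes to compare are only for illustration. The `-l` and `--psm` flags are the standard CLI options, and .NET wrappers usually expose an equivalent page segmentation setting.

```python
import subprocess

# Modes worth comparing on a two-column page
# (descriptions follow `tesseract --help-psm`).
CANDIDATE_PSMS = {
    1: "automatic page segmentation with OSD",
    3: "fully automatic page segmentation, no OSD (default)",
    4: "single column of text of variable sizes",
    6: "single uniform block of text",
}


def build_command(image_path, psm, lang="eng"):
    """Build the CLI call: recognize image_path with one language and one PSM."""
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def run_all(image_path, lang="eng"):
    """Run every candidate PSM and return {psm: recognized text}.

    check=False keeps the loop going if one mode fails, for example
    PSM 1 when the osd traineddata file is not installed.
    """
    results = {}
    for psm in CANDIDATE_PSMS:
        proc = subprocess.run(
            build_command(image_path, psm, lang),
            capture_output=True,
            text=True,
            check=False,
        )
        results[psm] = proc.stdout
    return results
```

Calling `run_all("page.png")` (the file name is a placeholder) returns the text for each mode, which makes it easy to see which one keeps the two columns apart.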
