--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/95138faa-307f-4417-b72c-648ab84993d9%40googlegroups.com.
Have you tried `osd` (orientation and script detection)?
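For reference, a minimal sketch of how an OSD result could drive the CJK check. It assumes the `tesseract` command line is run in OSD-only mode (for example `tesseract page.png stdout --psm 0`, with `osd.traineddata` installed) and that its output contains lines such as `Script: Han` and `Script confidence: 2.22`. The class name `OsdScriptCheck`, the list of script names treated as CJK, and the confidence threshold are assumptions for illustration and are not taken from this thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class OsdScriptCheck {
    // Script names assumed to indicate CJK text in the OSD output (illustrative list).
    private static final Set<String> CJK_SCRIPTS = new HashSet<>(Arrays.asList(
            "Han", "Hangul", "Hiragana", "Katakana", "Japanese", "Korean"));

    /** Returns the value that follows "key:" in the OSD text, or null if the key is absent. */
    public static String field(String osdOutput, String key) {
        for (String line : osdOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith(key + ":")) {
                return trimmed.substring(key.length() + 1).trim();
            }
        }
        return null;
    }

    /** True when the detected script is in the CJK list and its confidence reaches the threshold. */
    public static boolean isCjk(String osdOutput, double minConfidence) {
        String script = field(osdOutput, "Script");
        String confidence = field(osdOutput, "Script confidence");
        if (script == null || confidence == null) {
            return false; // no usable OSD result, so do not reject on this basis
        }
        return CJK_SCRIPTS.contains(script)
                && Double.parseDouble(confidence) >= minConfidence;
    }

    public static void main(String[] args) {
        // Hand-written sample in the assumed OSD output format.
        String sample = "Page number: 0\n"
                + "Orientation in degrees: 0\n"
                + "Rotate: 0\n"
                + "Orientation confidence: 15.33\n"
                + "Script: Han\n"
                + "Script confidence: 2.22\n";
        System.out.println(isCjk(sample, 1.0));
    }
}
```

The threshold of 1.0 is only a placeholder; a suitable value would have to be chosen by checking OSD results on known CJK and non-CJK pages.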
On Mon, Nov 25, 2019 at 8:13 PM Jeetendra Ahuja <jeetendr...@gmail.com> wrote:
Before processing a document, we want to reject the ones that are CJK, so I've used Tesseract for this. It does a pretty good job, but sometimes, when the document quality is low, most of the dots on a "Table of Contents" page are recognized as CJK characters. I am planning to create my own training data but wanted to get advice from experts first.
Config:
- Tesseract 4.0
- instance.setLanguage("chi_simB+chi_traB+korB+jpnB+engB");
- instance.setOcrEngineMode(1);
The image is zoomed to 600% in Adobe PDF reader. Please let me know.
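One possible way to make the rejection step less sensitive to a few misread dot leaders, sketched under assumptions: rather than rejecting a page as soon as any CJK character appears in the OCR text, measure the share of CJK characters among all letters and reject only above a threshold. The class name `CjkRatioFilter` and the threshold value are hypothetical, and the input is assumed to be the text returned by the OCR call. This does not fix the misrecognition itself, and a page dominated by dot leaders may still exceed the threshold.

```java
public class CjkRatioFilter {

    /** True for code points in the Han, Hiragana, Katakana, or Hangul scripts. */
    public static boolean isCjkCodePoint(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA
                || script == Character.UnicodeScript.KATAKANA
                || script == Character.UnicodeScript.HANGUL;
    }

    /** Fraction of letters in the text that are CJK; 0.0 when the text has no letters. */
    public static double cjkRatio(String text) {
        int letters = 0;
        int cjk = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            i += Character.charCount(codePoint);
            if (!Character.isLetter(codePoint)) {
                continue; // skip digits, punctuation, and whitespace
            }
            letters++;
            if (isCjkCodePoint(codePoint)) {
                cjk++;
            }
        }
        return letters == 0 ? 0.0 : (double) cjk / letters;
    }

    /** Rejects the page only when the CJK share reaches the given threshold. */
    public static boolean shouldReject(String text, double threshold) {
        return cjkRatio(text) >= threshold;
    }

    public static void main(String[] args) {
        // Stand-in for OCR output: an English line with two stray Han characters.
        String ocrText = "Introduction \u4e2d\u6587 3";
        System.out.println(cjkRatio(ocrText));
        System.out.println(shouldReject(ocrText, 0.5)); // 0.5 is a placeholder threshold
    }
}
```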
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/2656ebd0-6116-4f5b-9a8e-975730ba44c1%40googlegroups.com.
Also try with the image at 300 dpi.
On Mon, Nov 25, 2019 at 9:45 PM Jeetendra Ahuja <jeetendr...@gmail.com> wrote:
Nope, I will try it. Thanks.
On Monday, November 25, 2019 at 9:48:08 AM UTC-5, shree wrote:
Have you tried `osd` (orientation and script detection)?