Too few characters. Skipping this page

Chris Nevin

Apr 19, 2014, 1:13:25 PM
to tesser...@googlegroups.com
Hello,

 I am having some trouble getting Tesseract to recognize individual characters. Whenever I think I have overcome actual errors, I get the line "Too few characters. Skipping this page"

Because I am using Tess4J I have been struggling to find out exactly what all of the different options you can set for Tesseract actually are. Would anyone be able to tell me if there is a way to set it to not limit the minimum number of characters on a page?

Also, I am trying to get Tesseract to recognise characters from chemical elements (example attached). Will Tesseract be able to ignore the structure and just pick up on the characters?

Basically, any advice on a good way to go about this would be helpful, even if that means I should look at training Tesseract or creating a word list with the chemical elements.

Thanks a lot!

   Chris
test.png
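
For readers landing here with the same question, a minimal sketch of the "word list" idea, written as Python around the `tesseract` command line rather than Tess4J. It derives a character whitelist from a list of element symbols and builds a command that uses a page segmentation mode without orientation detection. The symbol list is truncated to the first 20 elements for illustration, the helper names are hypothetical, and the flags `--psm` and `-c tessedit_char_whitelist=...` are assumed to be supported by your build (older 3.x releases spell the first one `-psm`, and whitelist support varies between engine versions).

```python
# First 20 element symbols only (illustrative); extend the list as needed.
ELEMENT_SYMBOLS = [
    "H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne",
    "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar", "K", "Ca",
]


def whitelist_from_symbols(symbols, extra_chars=""):
    """Collect every distinct character used by the symbols, in sorted order."""
    chars = set("".join(symbols)) | set(extra_chars)
    return "".join(sorted(chars))


def build_command(image, psm=6, symbols=ELEMENT_SYMBOLS, extra_chars=""):
    """Build a tesseract command line (argument list) for the given image.

    psm=6 treats the image as a single block of text; it is an assumption
    that this suits the attached kind of image.
    """
    whitelist = whitelist_from_symbols(symbols, extra_chars)
    return [
        "tesseract", image, "stdout",
        "--psm", str(psm),
        "-c", "tessedit_char_whitelist=" + whitelist,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even where tesseract is not installed.
    print(" ".join(build_command("test.png", extra_chars="0123456789")))
```

In Tess4J the corresponding calls should be `setPageSegMode` and `setTessVariable` on the `Tesseract` instance, but check the API of the version you use.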

Hakan Usakli

Jan 11, 2018, 9:09:43 AM
to tesseract-ocr
In case it helps someone,
Yes, there is a way to change the 'minimum number of characters' behaviour. I struggled with the same problem for a while as well.

In this file,
change the value of this constant to something like 5. Recompile and you are done.

const int kMinCharactersToTry = 50;

I have asked the developers to expose that internal constant as a command-line setting. If or when they will do it, I don't know.
Enjoy
Hakan
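
A hedged note on the request above: to my knowledge, later Tesseract releases (4.x) expose this threshold as a config variable named `min_characters_to_try`, so recompiling may no longer be needed. Treat that name as an assumption and confirm it with `tesseract --print-parameters` on your build. A minimal Python sketch that builds the corresponding orientation-detection command (the helper name `osd_command` is hypothetical):

```python
def osd_command(image, min_chars=5):
    """Build a tesseract command for orientation/script detection only.

    --psm 0 requests orientation and script detection without OCR.
    min_characters_to_try is assumed to exist in the installed version.
    """
    if min_chars < 1:
        raise ValueError("min_chars must be at least 1")
    return [
        "tesseract", image, "stdout",
        "--psm", "0",
        "-c", "min_characters_to_try={}".format(min_chars),
    ]


if __name__ == "__main__":
    # Print the command rather than executing it.
    print(" ".join(osd_command("test.png", min_chars=5)))
```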

Zdenko Podobný

Jan 11, 2018, 9:16:15 AM
to tesser...@googlegroups.com
If you need to detect just orientation it should be faster to use only leptonica functions. See

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/b95edf04-1155-4a5f-9c5b-08d4cfb5271d%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Hakan Usakli

Jan 11, 2018, 11:21:43 AM
to tesseract-ocr
Hello Zdenko,

Thank you for that tip.
Yes, I am extremely interested in using Leptonica functions directly, especially if they are expected to run faster.
But I am almost illiterate in C. I do have the precompiled Leptonica DLLs.

They are called

liblept-5.dll (7969kb)
or
pvt.cppan.demo.danbloomberg.leptonica-1.74.4.dll (1681kb)

I tried finding the function entry points with InterOpSignatureToolkit, which can generate .NET wrapper signatures, but it fails to load those DLLs, saying there is no assembly manifest.

How can I use Leptonica from the command line or call the DLLs from .NET?

Any tips much appreciated
Hakan
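
A note for later readers: the "no assembly manifest" error is what you get when a tool expects a managed .NET assembly but is given a native DLL. Native exports have to be declared by hand (P/Invoke in .NET). The sketch below shows the same idea in Python with `ctypes`, as an illustration rather than a .NET answer. It assumes the library exports `pixRead`, `pixConvertTo1`, `pixOrientDetect` and `pixDestroy` with their usual C signatures. The library path, the binarization threshold of 128 and the helper name `detect_orientation` are placeholders.

```python
import ctypes


def detect_orientation(lib_path, image_path):
    """Return Leptonica's (upconf, leftconf) confidences for a text image."""
    # Raises OSError if the native library cannot be loaded.
    lept = ctypes.CDLL(lib_path)

    # Declare the C signatures so pointers are not truncated on 64-bit systems.
    float_p = ctypes.POINTER(ctypes.c_float)
    lept.pixRead.argtypes = [ctypes.c_char_p]
    lept.pixRead.restype = ctypes.c_void_p
    lept.pixConvertTo1.argtypes = [ctypes.c_void_p, ctypes.c_int]
    lept.pixConvertTo1.restype = ctypes.c_void_p
    lept.pixOrientDetect.argtypes = [
        ctypes.c_void_p, float_p, float_p, ctypes.c_int, ctypes.c_int,
    ]
    lept.pixOrientDetect.restype = ctypes.c_int
    lept.pixDestroy.argtypes = [ctypes.POINTER(ctypes.c_void_p)]
    lept.pixDestroy.restype = None

    def destroy(pix):
        # pixDestroy takes a PIX** and clears the caller's pointer.
        handle = ctypes.c_void_p(pix)
        lept.pixDestroy(ctypes.byref(handle))

    pix = lept.pixRead(image_path.encode())
    if not pix:
        raise RuntimeError("could not read image: " + image_path)
    try:
        # pixOrientDetect expects a 1 bpp image; 128 is an assumed threshold.
        pix1 = lept.pixConvertTo1(pix, 128)
        if not pix1:
            raise RuntimeError("could not binarize image")
        try:
            upconf = ctypes.c_float(0.0)
            leftconf = ctypes.c_float(0.0)
            # mincount=0 selects the library default; debug=0 disables output.
            status = lept.pixOrientDetect(
                pix1, ctypes.byref(upconf), ctypes.byref(leftconf), 0, 0
            )
            if status != 0:
                raise RuntimeError("pixOrientDetect failed")
            return upconf.value, leftconf.value
        finally:
            destroy(pix1)
    finally:
        destroy(pix)
```

A call would look like `detect_orientation("liblept-5.dll", "test.png")`, with the path adjusted to wherever the DLL lives. How to turn the two confidences into an orientation decision is described in the Leptonica source comments.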