--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com.
To unsubscribe from this group, send email to tesseract-oc...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.
api.Init(argv[0], lang, &(argv[arg]), argc-arg, false);
api.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ.0123456789 ");
This is what the JPG looks like (the white space between the words is quite "big"):
WORD1 WORD2
WORD1 WORD2
WORD1 WORD2
WORD1 WORD2
WORD1 WORD2
WORD1 WORD2
WORD1 WORD2
and it reads like:
WORD1
WORD1
WORD1
WORD1
WORD1
WORD1
WORD2
WORD2
WORD2
WORD2
WORD2
WORD2
WORD2
Any help would be really appreciated! I've been stuck with this for a month :(
PSM_SINGLE_COLUMN, ///< Assume a single column of text of variable sizes.
Is there no other workaround? If I reduce the size of the white space between WORD1 and WORD2, it all works fine! This space is making the OCR think it's another column! Is there no other way? Splitting the image into many rows does not look very efficient.
OK, I'll try that! I have to modify this in tesseractmain.cpp, right? (I'm using command-line execution.) I replace this line: api.SetPageSegMode(tesseract::PSM_AUTO); with api.SetPageSegMode(tesseract::PSM_SINGLE_COLUMN); and then recompile, right? Thanks for the help.
You know, Saurabh, that was EXACTLY what I was looking for! I couldn't be more thankful to you! That line of code changed my life :D Thank you again :)
A long time ago, I started my project relying on Tess's segmentation
and struggled a lot with it, until I moved to a word-by-word approach.
Finally, I even switched to character-wise recognition, which at
last produces decent results. This transition was mostly caused by
the specifics of the input images I'm working on (photos, usually of low
quality), but I think it is almost required for ideally scanned
images too.
There are some fruitful math ideas behind Tess's segmentation, but I
think the current implementation is not mature enough to be used
extensively in production.
Warm regards,
Dmitry Silaev
However, you can run as many tests as possible on your images,
"prove" that this probably is not the case, and hope that segmentation
errors won't be "destructive" and will only introduce this kind of
"disorder". Then you can certainly use your (x,y)-sort method and be
happy ))
Warm regards,
Dmitry Silaev
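The thread does not show the (x,y)-sort method itself, so what follows is only one possible reading of it: a minimal sketch that reorders recognized words by row and then left to right. The WordBox struct, the SortReadingOrder function and the row_tolerance parameter are hypothetical names made up for this illustration; they are not part of Tesseract.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one recognized word and its bounding box (pixels).
struct WordBox {
    std::string text;
    int left;    // x of the left edge
    int top;     // y of the top edge
    int height;  // box height
};

// Vertical centre of a box, used to decide which row a word belongs to.
static int CenterY(const WordBox& w) { return w.top + w.height / 2; }

// Returns the words in reading order: rows from top to bottom, and within
// each row from left to right. A new row starts when a word's centre lies
// more than row_tolerance pixels below the centre of the row's first word.
std::vector<WordBox> SortReadingOrder(std::vector<WordBox> words,
                                      int row_tolerance) {
    assert(row_tolerance >= 0);
    // Step 1: order everything by vertical centre.
    std::stable_sort(words.begin(), words.end(),
                     [](const WordBox& a, const WordBox& b) {
                         return CenterY(a) < CenterY(b);
                     });
    // Step 2: cut the list into rows and order each row by x.
    std::size_t row_start = 0;
    for (std::size_t i = 1; i <= words.size(); ++i) {
        const bool row_ends =
            i == words.size() ||
            CenterY(words[i]) - CenterY(words[row_start]) > row_tolerance;
        if (row_ends) {
            std::stable_sort(words.begin() + static_cast<std::ptrdiff_t>(row_start),
                             words.begin() + static_cast<std::ptrdiff_t>(i),
                             [](const WordBox& a, const WordBox& b) {
                                 return a.left < b.left;
                             });
            row_start = i;
        }
    }
    return words;
}
```

As noted above, this only helps when the segmentation errors merely shuffle the word order; it cannot repair words that were split or merged incorrectly. The row tolerance would have to be tuned to the line spacing of the actual images.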
I run Tesseract revision 549 from the command line under Windows with
no special config and get segmentation that is almost correct.
What language file do you use? I used the following command line:
tesseract 3.tiff test3 -l eng
with no pageseg_mode (-psm argument) as well as with it, and always
the result was satisfactory.
Let me know the details on your command line and OS.
Warm regards,
Dmitry Silaev
I also wanted to know how to make Tesseract read my image horizontally. I have an image consisting of 6 rows. After training, I found that my image is read from the right side (it should be from the left), and it also goes down by column rather than by row. How can I solve this issue?
"A lot of times I have seen fairly good number plate images being OCRed inaccurately. This could possibly be due to the word recognition stage. Has anyone found a way to disable the dictionary / word recognition?"
Saurabh, have you been able to accomplish this? Could you kindly share your insights? I have a similar need.
Thanks a lot in advance.
On Wednesday, February 16, 2011 10:48:56 PM UTC-6, Saurabh Gandhi wrote:
Hello everyone,
I am currently using tesseract 3.x for license plate recognition.
I have an algorithm which does a good job in pre-processing the input image to localize the plate.
However, when I use the Tesseract OCR engine to classify the plate number, the recognition is not that accurate. I have gone through the tesseract whitepapers as well as some of the threads discussing LPR using tesseract.
From all this, I have identified the following ways of improving the results:
- Customise the tesseract engine to recognize only the characters A-Z, 0-9, . (dot) and (space) by setting the character white-list. My understanding is that the white-list is the list of characters that are going to be recognized. I am curious to know what the blacklist is meant to do.
- A lot of times I have seen fairly good number plate images being OCRed inaccurately. This could possibly be due to the word recognition stage. Has anyone found a way to disable the dictionary / word recognition?
- Then there are some page segmentation modes (PSM_AUTO, PSM_SINGLE_BLOCK, PSM_SINGLE_CHAR etc.). Does PSM_SINGLE_CHAR imply that it will consider the input image as a single character and run the algorithm accordingly without attempting word recognition?
- Another important configuration setting that I have seen within the code is AVS_FASTEST = 0, AVS_MOST_ACCURATE = 100. However, I could not find it being used anywhere in the code. Does this have any impact on the character recognition accuracy?
- Finally, I also plan to use the confidence level data. Are there any indicators of confidence for characters as well? There is word confidence data, which can be found in TessBaseAPI::AllWordConfidences().
Awaiting your valuable insights.
Thank you.
Regards,
Saurabh Gandhi
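On the last point, one way the word confidence data might be used is to flag words that fall below a threshold so they can be re-checked. The sketch below assumes the array layout that, as far as I understand the API documentation, TessBaseAPI::AllWordConfidences() returns: values between 0 and 100, terminated by -1. The helper name LowConfidenceWords and the threshold are hypothetical, and the function works on a plain array so it does not depend on Tesseract itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices of all words whose confidence is below min_confidence.
// `confidences` is assumed to be an array of values in 0..100 terminated by
// -1 (the assumed layout of the AllWordConfidences() result).
std::vector<std::size_t> LowConfidenceWords(const int* confidences,
                                            int min_confidence) {
    assert(confidences != nullptr);
    std::vector<std::size_t> low;
    for (std::size_t i = 0; confidences[i] != -1; ++i) {
        if (confidences[i] < min_confidence) {
            low.push_back(i);  // remember which word looks unreliable
        }
    }
    return low;
}
```

A plate reading could then be rejected, or sent through a second pass with different pre-processing, whenever this list is not empty. A suitable threshold would have to be found by experiment on real plate images.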