Obtaining the pre-processed image


Mayu Shukla

Oct 18, 2015, 2:00:39 AM
to tesseract-ocr
Hi,
I have been trying out Tesseract recently. I understand that its accuracy can be improved many times over by improving the clarity of the image passed to it.
To that end, I was trying to understand how Tesseract pre-processes the image (using Leptonica, if I understand correctly).
If I could look at the processed image mentioned on the Tesseract wiki page (quoted below), I would be better able to see where things are lacking and what I need to work on.

"You can see how Tesseract has processed the image by using the configuration variable tessedit_write_images to true when running Tesseract"

But I cannot work out where to look, or which config file in the config folder to use, since there are many. Which one does Tesseract call, and how does it work?

Thanks

Tom Morris

Oct 18, 2015, 2:04:57 PM
to tesseract-ocr
On Sunday, October 18, 2015 at 2:00:39 AM UTC-4, Mayu Shukla wrote:

> But I cannot work out where to look, or which config file in the config folder to use, since there are many. Which one does Tesseract call, and how does it work?

The config file is the one you specify on the command line. Use tesseract -h to get help on the command-line structure:

$ tesseract -h

Usage:
  tesseract imagename|stdin outputbase|stdout [options...] [configfile...]

OCR options:
  --tessdata-dir /path           specify the location of tessdata path
  --user-words /path/to/file     specify the location of user words file
  --user-patterns /path/to/file  specify the location of user patterns file
  -l lang[+lang]                 specify language(s) used for OCR
  -c configvar=value             set value for control parameter.
                                 Multiple -c arguments are allowed.
  -psm pagesegmode               specify page segmentation mode.
These options must occur before any configfile.
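
For a one-off run, the -c option listed in the help above can set the variable directly, with no config file at all. The following is only a sketch: image.jpg and outputFile are placeholder names, and the output name tessinput.tif is an assumption based on how the 3.x releases appear to behave, so check your working directory for whatever your build actually writes.

```shell
img=image.jpg      # placeholder input image
out=outputFile     # placeholder output base name

# Only call tesseract when both the binary and the image are present.
if command -v tesseract >/dev/null 2>&1 && [ -f "$img" ]; then
    # Options such as -c come after the output base.
    tesseract "$img" "$out" -c tessedit_write_images=true
    # Assumed name of the dumped pre-processed image (may differ per version).
    ls -l tessinput.tif 2>/dev/null
else
    echo "skipped: tesseract or $img not available"
fi
```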

Mayu Shukla

Oct 20, 2015, 5:24:55 AM
to tesseract-ocr
Hello Tom,

I got your point.
My question was more general: which config files are loaded by default?
For example, the command I normally use is
"tesseract image.jpg outputFile"
So there must be some default config file that gets loaded. I wanted to know which one it is, and how I can tweak it to obtain the pre-processed image.

zdenko podobny

Oct 20, 2015, 5:35:33 AM
to tesser...@googlegroups.com
Why must there be a default config file?
Default values are defaults because they are already set. Config files just modify them.

Zdenko
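
One way to see those built-in defaults for yourself is to ask the binary to list its parameters. This is a hedged sketch: it assumes your build supports the --print-parameters option (it is not shown in the abbreviated help earlier in the thread), and show_default is a hypothetical helper name used only for illustration.

```shell
# Print the built-in default for one parameter, or a note if it cannot be listed.
show_default() {
    # $1: parameter name to look up
    if command -v tesseract >/dev/null 2>&1; then
        # Assumes --print-parameters writes one parameter per line to stdout.
        tesseract --print-parameters 2>/dev/null | grep "^$1" \
            || echo "$1: not listed by this build"
    else
        echo "$1: tesseract not found on PATH"
    fi
}

show_default tessedit_write_images
```

If the parameter is listed, its value is what Tesseract uses when no config file and no -c option overrides it.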


Mayu Shukla

Oct 20, 2015, 7:20:28 AM
to tesseract-ocr
Oh, I get it now.
I just created a new config file that sets tessedit_write_images to true, and I have what I want.
Thanks for the help. I didn't know that there is no default config file. I assumed there must be one that does not need to be named explicitly, as with -psm 3 (which is the default and needs no mention).
Anyway, I got it, and thanks a ton.
Regards
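
For anyone landing on this thread later, the approach described above can be sketched roughly as follows. The file name writeimages.cfg is hypothetical, the "name value" line format is the one used by the files shipped in the configs folder, and the sketch assumes your build accepts a config file given as a path in the current directory rather than only a name under tessdata/configs.

```shell
# Create a one-line config file; the name is arbitrary (hypothetical here).
printf 'tessedit_write_images true\n' > writeimages.cfg

# The config file goes last on the command line, after any options.
# image.jpg and outputFile are the names used earlier in the thread.
if command -v tesseract >/dev/null 2>&1 && [ -f image.jpg ]; then
    tesseract image.jpg outputFile writeimages.cfg
else
    echo "skipped: tesseract or image.jpg not available"
fi
```

Keeping the setting in a file is handy when several variables need to change together; for a single variable, the -c form does the same job.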