Tesseract Classification


Karin

Apr 12, 2012, 12:04:09 AM
to tesseract-ocr
Can we use a different classifier in Tesseract?
I am currently using Tesseract 2.00 and have gone through all the
variables that can be set before recognition. Is there a mechanism to
add KNN classification to Tesseract 2.00?

Mayur Mudigonda

Apr 12, 2012, 12:25:33 AM
to tesser...@googlegroups.com
I think if you were writing your own classification code, you'd have to edit the classification source (the .cpp files) and compile it yourself. You would also be branching from the default Tesseract build.
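
For readers unfamiliar with KNN, here is a minimal sketch of what a k-nearest-neighbour classifier does with feature vectors. It is independent of Tesseract's source and is written in Java only for illustration; a change inside Tesseract itself would be made in its C++ code. The class and method names (KnnSketch, Sample, classify) are hypothetical and are not part of any Tesseract API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Tesseract.
class KnnSketch {
    // One labelled training example: a feature vector and the character it represents.
    static final class Sample {
        final double[] features;
        final String label;

        Sample(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    // Squared Euclidean distance; the square root is not needed for ranking.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the majority label among the k training samples nearest to the query.
    // On a tie, the label that reached the top count first (nearest first) wins.
    static String classify(List<Sample> training, double[] query, int k) {
        List<Sample> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble((Sample s) -> squaredDistance(s.features, query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (Sample s : sorted.subList(0, Math.min(k, sorted.size()))) {
            int v = votes.merge(s.label, 1, Integer::sum);
            if (v > bestVotes) {
                bestVotes = v;
                best = s.label;
            }
        }
        return best;
    }
}
```

The features here would be whatever numeric description of a character shape you choose to extract; the sketch does not say how Tesseract computes its own features.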

I am interested in analyzing the use of more powerful classifiers such as LSTM NNs and Boltzmann machine-based NNs, although that is a far bigger task and I would need support from more people.

M





--

URL:
www.cse.msu.edu/~mudigon1
www.blindsight.com/team
Elegance is not a dispensable luxury but a factor that decides between success and failure.
Edsger Dijkstra

Ankur Rana

Apr 12, 2012, 2:24:16 AM
to tesser...@googlegroups.com
How can I pass my already binarized image data to Tesseract 2.01?
Regards
---------------------------------------------------------------------------------------
Ankur Rana
(ਅੰਕੁਰ ਰਾਣਾ)

Mayur Mudigonda

Apr 12, 2012, 3:15:03 AM
to tesser...@googlegroups.com
If this is from the command line, save it as a .png image file and call Tesseract on that file.

It should be no different from any other image.

Ankur Rana

Apr 12, 2012, 11:00:10 AM
to tesser...@googlegroups.com
I tried to pass only the image's binary data in a text file to Tesseract, but it is not working. Can anybody explain how Tesseract reads the image file?

Mayur Mudigonda

Apr 12, 2012, 2:35:29 PM
to tesser...@googlegroups.com
Ankur, you need to be more specific. Please attach the image if possible for other users on the thread to attempt to give you a solution. It is hard to help without more information.

M

Karin

Apr 12, 2012, 9:59:25 PM
to tesseract-ocr



Karin

Apr 12, 2012, 10:01:08 PM
to tesseract-ocr
I use Tesseract from Java.
Can you give me a sample script for calling Tesseract from Java?
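
One simple way to do this, sketched below, is to start the tesseract executable as an external process rather than linking against the library. It assumes the tesseract executable is installed and on the PATH, and it uses the plain "tesseract image_file output_file" form of the command. The class and method names (TesseractRunner, buildCommand, run) are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch: run the tesseract executable from Java as an external process.
// ASSUMPTION: "tesseract" is on the PATH. Class and method names are hypothetical.
class TesseractRunner {
    // Builds the argument list for: tesseract image_file output_file
    static List<String> buildCommand(String imageFile, String outputBase) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(imageFile);
        cmd.add(outputBase);
        return cmd;
    }

    // Starts the process, waits for it to finish, and returns its exit code.
    static int run(String imageFile, String outputBase)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(imageFile, outputBase))
                .inheritIO() // show Tesseract's own messages on our console
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        int exit = run(args[0], args[1]);
        System.out.println("tesseract exited with code " + exit);
    }
}
```

Tesseract writes the recognized text to the output name with ".txt" appended, so the Java program can read that file once the process has exited.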

Ankur Rana

Apr 13, 2012, 12:20:10 AM
to tesser...@googlegroups.com
Please see the attached files. The image file contains a mixture of Punjabi and English text. I have built a Punjabi recognition system; what I need now is to recognize the English text. When I run Tesseract on the BMP image, it recognizes the English text successfully. "224_1.dat" is a binary file that I created from the image file (224_1.bmp). I want to pass 224_1.dat as input to Tesseract to recognize the English text.
224_1.bmp
224_1.dat

Mayur Mudigonda

Apr 13, 2012, 3:15:30 AM
to tesser...@googlegroups.com
Do you want both recognized simultaneously? What happens when you run it with eng/punj language models sequentially?

M

Zdenko Podobný

Apr 13, 2012, 5:45:54 AM
to tesser...@googlegroups.com
2.01 is too old, so I would suggest either using only the tesseract executable or
upgrading to Tesseract 3.02.
Save your data as an image (in a format that Tesseract 2.01 recognizes)
and run:
tesseract image_file output_file

Zdenko
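
The "save your data as an image" step could look roughly like the sketch below, which writes raw binarized pixels to a 1-bit BMP using the standard Java imaging classes. The layout of a file such as 224_1.dat is not given in the thread, so the sketch assumes one byte per pixel, stored row by row, with 0 meaning black and any other value meaning white, and it assumes the width and height are known. The class and method names (BinaryToBmp, toImage, writeBmp) are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Sketch: turn raw binarized pixels into a BMP file.
// ASSUMPTION: one byte per pixel, row by row, 0 = black, anything else = white.
// The real layout of a file such as 224_1.dat is not known from the thread.
class BinaryToBmp {
    // Copies the raw pixels into a 1-bit black-and-white image.
    static BufferedImage toImage(byte[] pixels, int width, int height) {
        if (pixels.length != width * height) {
            throw new IllegalArgumentException("expected " + (width * height)
                    + " bytes but got " + pixels.length);
        }
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                boolean black = pixels[y * width + x] == 0;
                img.setRGB(x, y, black ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return img;
    }

    // Writes the image as BMP; returns false if no BMP writer is available.
    static boolean writeBmp(BufferedImage img, File out) {
        try {
            return ImageIO.write(img, "bmp", out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting file can then be given to the tesseract command above as image_file. Whether a particular Tesseract build accepts a 1-bit BMP is not confirmed here, so it is worth testing with one small image first.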



Ankur Rana

Apr 16, 2012, 7:43:57 AM
to tesser...@googlegroups.com
What are the names of the required files used in Tesseract 2.0?


Ankur Rana

Apr 17, 2012, 7:50:46 AM
to tesser...@googlegroups.com
Can I put all the header and source files in one folder and compile them?