Can this be used for handwriting recognition?

merve

Aug 24, 2011, 8:40:25 AM
to tesseract-ocr
Simple question, but I must be sure.
Thanks in advance.

Dmitri Silaev

Aug 24, 2011, 2:37:13 PM
to tesser...@googlegroups.com
Simple answer: in general, no.
However, in particular cases, it might.
Send sample images to get a more certain answer.

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Wed, Aug 24, 2011 at 4:40 PM, merve <merve...@gmail.com> wrote:
> Simple question, but i must be sure.
> Thanks in advance
>

Rob Townley

Aug 24, 2011, 3:17:01 PM
to tesser...@googlegroups.com
If you happen to be looking for handwriting recognition on Linux,
CellWriter supports international languages. It lets you input
characters from any language using a Tablet PC stylus. It does not,
however, convert handwriting from paper.

merve t

Aug 24, 2011, 2:58:55 PM
to tesser...@googlegroups.com
Then I cannot use it, because I want to recognize freely written handwriting.
Thanks for the reply.

2011/8/24 Dmitri Silaev <daemo...@gmail.com>

merve t

Aug 25, 2011, 6:26:24 AM
to tesser...@googlegroups.com
CellWriter needs to be trained, but I need handwriting recognition that can recognize many different writers. Do you know CellWriter well? Can I train it with a universal data set so that it recognizes different writers without training for every writer?
Maybe this is not the correct place, but I wrote because you mentioned CellWriter. Thanks.

2011/8/24 Rob Townley <rob.t...@gmail.com>

merve t

Sep 12, 2011, 8:34:31 AM
to tesser...@googlegroups.com
Hello,
There is a file attached.
I must confess I wrote it with a mouse, but the data that needs to be recognized looks like this,
because we are developing a whiteboard application.
I tried to recognize it with OCRopus, but it could not.
I cannot install Tesseract on my own; if you say it can recognize this picture, I will try again.
Thanks for your time.

2011/8/24 Dmitri Silaev <daemo...@gmail.com>
example.png

Sriranga(78yrsold)

Sep 12, 2011, 9:47:29 AM
to tesser...@googlegroups.com, Omshivaprakash H L
merve,
It is possible in Tesseract; see the attached files, which are self-explanatory.
Cheers,
-sriranga(78yrs)
testexample.txt
han.traineddata
example.tif

merve t

Sep 12, 2011, 10:03:13 AM
to tesser...@googlegroups.com
Hello, thanks for your hopeful answer.
I want to ask: what is han.traineddata?
Is it the training data used to train the system to recognize handwriting? It seems too small, doesn't it?
Also, can you share the method to convert PNG to TIF?
Thanks.

2011/9/12 Sriranga(78yrsold) <withbl...@gmail.com>

Sriranga(78yrsold)

Sep 12, 2011, 12:37:15 PM
to tesser...@googlegroups.com
An extract of the cmd session is reproduced below:
M:\>tesseract example.png example batch.nochop makebox
Tesseract Open Source OCR Engine with Leptonica

M:\>tesseract example.tif example  nobatch box.train logfile
Number of found pages: 1.

M:\>unicharset_extractor.exe example.box
Extracting unicharset from example.box
Wrote unicharset file ./unicharset.

M:\>mftraining.exe example.tr
Reading example.tr ...
example has no defined properties.

Writing Merged Microfeat ...Done!

M:\>cntraining.exe example.tr
Reading example.tr ...
Clustering ...

Writing normproto ...

M:\>tesseract example.tif example  nobatch box.train logfile
Number of found pages: 1.

M:\>unicharset_extractor.exe example.box
Extracting unicharset from example.box
Wrote unicharset file ./unicharset.

M:\>unicharset_extractor.exe example.box
Extracting unicharset from example.box
Wrote unicharset file ./unicharset.

M:\>mftraining.exe example.tr
Reading example.tr ...
example has no defined properties.

Writing Merged Microfeat ...Done!

M:\>cntraining.exe example.tr
Reading example.tr ...
Clustering ...

Writing normproto ...

M:\>combine_tessdata.exe ./han.
Combining tessdata files
TessdataManager combined tesseract data files.
Offset for type 0 is -1
Offset for type 1 is 108
Offset for type 2 is -1
Offset for type 3 is 368
Offset for type 4 is 127673
Offset for type 5 is 127715
Offset for type 6 is -1
Offset for type 7 is -1
Offset for type 8 is -1
Offset for type 9 is -1
Offset for type 10 is -1
Offset for type 11 is -1
Offset for type 12 is -1

M:\>tesseract example.png testexample -l han
Tesseract Open Source OCR Engine with Leptonica

For further details, please read http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3, which is self-explanatory.

merve t

Sep 16, 2011, 8:59:06 AM
to tesser...@googlegroups.com
Hi,

@sriranga, thanks for your encouraging reply.

I trained Tesseract with my handwriting and got very good results.

Now I wonder whether there is a way to recognize adjacent (touching) letters in handwriting, like in the attached example.

I understand this may be beyond the capability of OCR software, but maybe you can suggest something else.

Or do you have an idea for segmenting adjacent letters?

Thanks in advance.


2011/9/12 Sriranga(78yrsold) <withbl...@gmail.com>
example.png

Sriranga(78yrsold)

Sep 16, 2011, 10:32:17 AM
to tesser...@googlegroups.com, Omshivaprakash H L
Merve,
Please see the attached files. Your handwriting image has to be split into independent characters with the help of Paintbrush (see 2example.tif), and then the trained data generated. If you are a programmer or developer, a tool has to be designed to split the handwriting into independent characters.
Cheers,
Sriranga(78yrs)
2exampletest.txt
2ex.traineddata
2example.tif
2example.box
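To do that splitting programmatically instead of in Paintbrush, one simple and widely used starting point is a vertical projection profile: count the ink pixels in each column and cut where a column is empty or nearly empty. The Python sketch below is a swapped-in illustration, not a Tesseract feature. It assumes the image has already been binarized into a list of rows of 0/1 values, and `threshold` is a hypothetical tuning knob.

```python
from typing import List, Tuple


def column_profile(img: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each column of a binarized image."""
    width = len(img[0]) if img else 0
    return [sum(row[x] for row in img) for x in range(width)]


def split_columns(img: List[List[int]], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges of ink runs; end is exclusive.

    A column counts as a gap when its ink count is <= threshold. With
    threshold=0 only fully blank columns separate characters; a larger
    value also cuts through thin connecting strokes.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    profile = column_profile(img)
    for x, count in enumerate(profile):
        if count > threshold and start is None:
            start = x  # entering an ink run
        elif count <= threshold and start is not None:
            segments.append((start, x))  # leaving an ink run
            start = None
    if start is not None:
        segments.append((start, len(profile)))  # run touches the right edge
    return segments


if __name__ == "__main__":
    # Two 2-pixel-wide blobs joined by a 1-pixel horizontal stroke.
    img = [
        [1, 1, 0, 0, 1, 1],
        [1, 1, 1, 1, 1, 1],
        [1, 1, 0, 0, 1, 1],
    ]
    print(split_columns(img))               # [(0, 6)]
    print(split_columns(img, threshold=1))  # [(0, 2), (4, 6)]
```

With `threshold=0` the two blobs stay joined by their connecting stroke; with `threshold=1` they separate. On real cursive writing a fixed threshold will often cut in the wrong place, so treat this as a first pass to be refined.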

merve t

Sep 18, 2011, 5:18:17 AM
to Sriranga(78yrsold), tesser...@googlegroups.com
Hello,
I am a computer scientist and have programming experience, so I think I can cut off the letters automatically. I will probably have questions about how to get the images of words from Tesseract.

Anyway, now I have a different question; I copy it here:

-----------------------------------------------------------------------------------------------------------------------------------------------------

Hello,

I wrote down what I did:

bnv is my language code; the files I used are attached.

>>tesseract bnv.denemem.exp0.tif bnv.denemem.exp0 batch.nochop makebox

I get a box file.

I edited it because there were mistakes. OK, no problem.
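Before correcting a box file by hand, it can help to list the boxes and flag suspicious ones. The sketch below assumes the Tesseract 3.x box format, one character per line as `<char> <left> <bottom> <right> <top> <page>` in pixel coordinates; lines with only five fields (no page column) are treated as page 0. Flagging overlapping boxes is an illustrative heuristic, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0


def parse_box_file(text: str) -> List[Box]:
    """Parse box-file text: '<char> <left> <bottom> <right> <top> [<page>]'."""
    boxes: List[Box] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) not in (5, 6):
            raise ValueError(f"line {lineno}: expected 5 or 6 fields, got {len(fields)}")
        char, *nums = fields
        boxes.append(Box(char, *(int(n) for n in nums)))
    return boxes


def overlapping_pairs(boxes: List[Box]) -> List[Tuple[int, int]]:
    """Return index pairs of boxes on the same page whose rectangles overlap."""
    pairs = []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            a, b = boxes[i], boxes[j]
            if (a.page == b.page
                    and a.left < b.right and b.left < a.right
                    and a.bottom < b.top and b.bottom < a.top):
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    sample = "a 10 10 30 40 0\nb 28 10 50 40 0\nc 60 10 80 40 0\n"
    boxes = parse_box_file(sample)
    for i, j in overlapping_pairs(boxes):
        print(f"check boxes {i} ({boxes[i].char}) and {j} ({boxes[j].char})")
```

Overlaps are not always errors in handwriting, where letters can lean into each other, so the output is a list of places to look at rather than a list of mistakes.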

>>tesseract bnv.denemem.exp0.tif bnv.denemem.exp0 nobatch box.train

>>unicharset_extractor bnv.denemem.exp0.box

>>mftraining -F font_properties -U unicharset -O bnv.unicharset bnv.denemem.exp0.tr

>>cntraining bnv.denemem.exp0.tr

Change the names:

inttemp
Microfeat
normproto
pffmtable

to:

bnv.inttemp
bnv.Microfeat
bnv.normproto
bnv.pffmtable
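The renaming step can be scripted so it is not forgotten before `combine_tessdata bnv.` is run, since that command is given the `bnv.` prefix. A minimal sketch, assuming the four file names listed above are the only ones to rename (the unicharset was already written as `bnv.unicharset` via `-O`); `plan_renames` and `apply_renames` are hypothetical helper names.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Output names from mftraining/cntraining, as listed in the post (assumed complete).
TRAINING_OUTPUTS = ("inttemp", "Microfeat", "normproto", "pffmtable")


def plan_renames(lang: str, names: Iterable[str] = TRAINING_OUTPUTS) -> Dict[str, str]:
    """Map each bare training output name to its '<lang>.<name>' form."""
    return {name: f"{lang}.{name}" for name in names}


def apply_renames(lang: str, directory: Path) -> List[Path]:
    """Rename the files that exist in `directory`; return the new paths.

    Missing files are skipped, so the script can be rerun safely after a
    partial rename.
    """
    renamed = []
    for old, new in plan_renames(lang).items():
        src = directory / old
        if src.exists():
            dst = directory / new
            src.replace(dst)  # overwrites an existing dst, like os.replace
            renamed.append(dst)
    return renamed


if __name__ == "__main__":
    # Dry run: show what would be renamed, without touching any files.
    for old, new in plan_renames("bnv").items():
        print(f"{old} -> {new}")
```

Keeping the name mapping separate from the file operations makes it easy to check the plan before calling `apply_renames("bnv", Path("."))` in the training directory.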


>>combine_tessdata bnv.

My training procedure finishes at this point.

I moved bnv.traineddata into the /tessdata folder.


>>tesseract 3example.tif output -l bnv

I did not do any training with the file 3example.tif. Should I?

I trained Tesseract with a small dataset of my handwriting and got some good results, but when I try to "test" the attached image I get

"fcgbcd"

as output.

The last three chars, "bcd", are correct.

But for "a" it returns "fcg", three chars.

As a separate check, I generated a box file for the attached file using the box-file generation step of training, and there it recognizes "a" and its box correctly.

The main problem is getting 6 letters instead of 4 in "testing".

Not getting the right character is also a problem.
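To track whether retraining actually helps, the output can be scored against the expected text with the character error rate (CER): the Levenshtein edit distance divided by the length of the reference text. A minimal sketch, assuming the expected text for this image is "abcd" (the post says "bcd" is right and "a" was misread):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by reference length (can exceed 1.0)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    print(levenshtein("abcd", "fcgbcd"))      # 3
    print(char_error_rate("abcd", "fcgbcd"))  # 0.75
```

For "fcgbcd" the distance to "abcd" is 3 (one substitution plus two insertions), so the CER is 0.75; the two extra characters are penalized just like the wrong one.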

Thanks for your ideas and time.



2011/9/18 Sriranga(78yrsold) <withbl...@gmail.com>
Merve,
You can ask Alex (Center Rime) about their program for joined handwriting and evaluate whether YagpoOCR suits your purpose. If you find YagpoOCR is better than tesseract-OCR,
you can use it, but don't ask me for help, since I have zero hands-on experience with YagpoOCR.
With best of Luck,
-sriranga(78yrs)


On Sun, Sep 18, 2011 at 11:28 AM, Sriranga(78yrsold) <withbl...@gmail.com> wrote:
Merve,
Regarding "I have another question in this mail list, it would be appreciated if you share your idea about it, i have sent my cmd transcript to the mail list": I could not locate it in the forum.


On Sun, Sep 18, 2011 at 8:41 AM, Sriranga(78yrsold) <withbl...@gmail.com> wrote:
Merve,
Thanks for the frank email. You have not said what programming knowledge you have.
Yes, you are correct. Joined handwritten text will not work unless it is cut apart (splitting the joined portion of two chars). You have to train the handwriting (which generally has different shapes/styles) a number of times, just like regular, bold, etc. fonts. Please remember that the output will not have 100% accuracy the way regular fonts of any language can, because the relevant source code would have to be modified by its creator. That said, I feel the accuracy can be improved further by a post-processing program.
Wishing you success in your project.
-sriranga(78yrs)


On Sat, Sep 17, 2011 at 9:57 PM, merve t <merve...@gmail.com> wrote:
Sriranga,
Thanks very much for your attention. I have a solution in mind for joined handwritten text: I am going to try cutting off letters and checking whether the resulting words are in a dictionary or not. This is the best solution I have found so far. I have another question in this mailing list; I would appreciate it if you shared your thoughts on it. I have sent my cmd transcript to the mailing list.
Thanks very much.
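The cut-and-check-the-dictionary idea can be sketched as follows. This is an illustration under assumptions, not an existing Tesseract feature: each segmentation hypothesis is a list of segments, each segment carries a few candidate characters (for example a recognizer's top guesses for that piece of ink), and every combination is looked up in a word set. `dictionary_matches` is a hypothetical name.

```python
from itertools import product
from typing import Iterable, List, Sequence, Set


def dictionary_matches(
    segmentations: Iterable[Sequence[Sequence[str]]],
    dictionary: Set[str],
) -> List[str]:
    """Return dictionary words spelled by any segmentation hypothesis.

    Each hypothesis is a sequence of segments; each segment is a sequence of
    candidate characters for that piece of ink. Results keep first-seen order
    and contain no duplicates.
    """
    found: List[str] = []
    for segments in segmentations:
        # Try every combination of one candidate per segment.
        for chars in product(*segments):
            word = "".join(chars)
            if word in dictionary and word not in found:
                found.append(word)
    return found


if __name__ == "__main__":
    hypotheses = [
        [["c", "e"], ["a", "o"], ["t", "l"]],  # cut into three pieces
        [["d"], ["a"]],                        # cut into two pieces
    ]
    print(dictionary_matches(hypotheses, {"cat", "eat", "dog"}))  # ['cat', 'eat']
```

The search grows multiplicatively with the number of candidates per segment, and when several words match (here both "cat" and "eat") some score, such as recognizer confidence, is still needed to choose between them.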

2011/9/17 Sriranga(78yrsold) <withbl...@gmail.com>

Mervert,
I would like to know which programming language(s) you are specialized in or well versed in.
With best wishes,
-sriranga(78yrs)


On Sat, Sep 17, 2011 at 11:47 AM, Sriranga(78yrsold) <withbl...@gmail.com> wrote:
Mervet,

Regarding Kannada OCR: I have not been properly trained in generating Kannada datafiles for Yagpo OCR by Center Rime.
As such, I do not know how to generate datafiles or operate YagpoOCR for OCR purposes. I am also not in a position to offer any comments about joined handwritten text (as stated by Center Rime), which is new to me; I am just now hearing of it. Further, I am not using YagpoOCR for my projects (English, Kannada, etc.).
In the circumstances, I am not in a position to help or guide you with YagpoOCR, should you approach me.
Wishing you Good Luck,
-sriranga(78yrs)


On Sat, Sep 17, 2011 at 2:46 AM, Center Rime <go...@mail.ru> wrote:
Dear friends!
At present we have an engine for OCR of Sanskrit and joined handwritten text.
With the help of Shriranga we have a base model for Kannada OCR.
We have a framework agreement on Sanskrit Devanagari recognition. Next year we plan
recognition of the main Unicode Asian scripts.
We are sending you our current project status.


We invite you to cooperate in using the open source Tibetan text recognition software.
TBRC already uses this program for input of Tibetan text.
More than 200 volumes have been input already.
On printed text we can OCR with 1-2 errors per page. We have also started work on woodblock and handwritten text.

At present the OCR program can recognize printed text from 300 dpi grayscale scanned images.
With the support of the Trace Foundation we have started a server for the Tibetan OCR project, www.dharmabook.ru.
You can upload material for OCR to our server, or provide access to scanned material on your server.
All OCR work is free of charge; until the end of this year it is supported by the Trace Foundation.

We have also started work on woodblock prints. This needs a more advanced program, and we are working on it. Now we can OCR clearly printed woodblock and handwritten text at an OCR level of about 90%.
We can also OCR dictionaries and mixed Tibetan-English or Tibetan-Sanskrit text.
From our side, the main problem is proofreading. For that we provide a spellchecker; see the recognition example.
We can also develop Tibetan software for your projects.
སྒ་རབ་འབྱམས་པ་ཀུན་དགའ་ཡེ་ཤེས་ book OCR example
http://www.dharmabook.ru/work_file/W00EGS1016747-I01JW143/index.php?img_page=I01JW1430066.tif&photo_index=65

The whole book as a Zip archive:
http://www.dharmabook.ru/work_file/W00EGS1016747-I01JW143.zip

We will be happy to help you in your activity.

Also, an example of OCR of the printed Kanjur edition:
http://www.dharmabook.ru/work_file/W1PD95844-I1PD95855/index.php?img_page=I1PD958550071.tif&photo_index=70
TBRC has now scanned half of these volumes. We have OCR for that text and can propose to Trace and TBRC
a project for proofreading this edition.


Sarva Mangalam!
alex
www.code.google.com/p/ocrlib
www.dharmabook.ru

bnv.denemem.exp0.tif
3example.tif

Sriranga(78yrsold)

Sep 18, 2011, 12:34:39 PM
to merve t, tesser...@googlegroups.com, "alex" Center Rime
Mervet,
Yes, you can cut off the letters programmatically. The attached files answer your other question. If you forward all the datafiles you generated, I shall investigate where the mistake happens and give you feedback.
With Best Wishes,
-sriranga(78yrs)
bnv-test.txt
4example.exp0.tif
bnv.traineddata

Aayushee Gupta

Sep 20, 2016, 12:44:22 AM
to tesseract-ocr, merve...@gmail.com, go...@mail.ru, withbl...@gmail.com

Hi

I am trying to use Tesseract to recognize doctors' handwriting on Windows. It seems like an impossible task, but I want to see what accuracy can be obtained with Tesseract. I used a doctor's font image for training and created the box file, the training (.tr) file, the unicharset and font_properties. But the shapeclustering command gives the following error:

C:\Program Files (x86)\Tesseract-OCR>shapeclustering -F font_properties -U unicharset eng.a.exp0.box.tr
Reading eng.a.exp0.box.tr ...
Font id = -1/0, class id = 1/63 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file ..\..\classify\trainingsampleset.cpp, line 622

Can someone please tell me how to deal with this error? Any help would be appreciated.

Thanks!
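One commonly suggested cause of `Font id = -1` is that the font name Tesseract associates with the training file is missing from `font_properties`. The sketch below checks for that under two assumptions: training files follow the `[lang].[fontname].exp[num]` naming convention, so `eng.a.exp0.box.tr` maps to font `a`; and each `font_properties` line has the form `fontname italic bold fixed serif fraktur` with five 0/1 flags. The helper names are hypothetical.

```python
from pathlib import PureWindowsPath
from typing import Dict, Tuple


def font_from_training_name(filename: str) -> str:
    """Assumed convention: '<lang>.<font>.exp<N>.<ext...>' -> '<font>'."""
    # PureWindowsPath accepts both '\' and '/' as separators.
    parts = PureWindowsPath(filename).name.split(".")
    if len(parts) < 3 or not parts[2].startswith("exp"):
        raise ValueError(f"{filename!r} does not follow lang.font.expN naming")
    return parts[1]


def parse_font_properties(text: str) -> Dict[str, Tuple[int, ...]]:
    """Parse 'fontname italic bold fixed serif fraktur' lines into a dict."""
    fonts: Dict[str, Tuple[int, ...]] = {}
    # splitlines() also accepts Windows CRLF line endings.
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 6:
            raise ValueError(f"line {lineno}: expected 6 fields, got {len(fields)}")
        fonts[fields[0]] = tuple(int(flag) for flag in fields[1:])
    return fonts


def font_is_listed(training_file: str, font_properties_text: str) -> bool:
    """True if the font implied by the training file name appears in font_properties."""
    return font_from_training_name(training_file) in parse_font_properties(font_properties_text)


if __name__ == "__main__":
    print(font_from_training_name("eng.a.exp0.box.tr"))              # a
    print(font_is_listed("eng.a.exp0.box.tr", "doctor 0 0 0 0 0\n"))  # False
    print(font_is_listed("eng.a.exp0.box.tr", "a 0 0 0 0 0\n"))       # True
```

If the check fails, adding a line such as `a 0 0 0 0 0` to `font_properties` (or renaming the training files so the font field matches an existing entry) is a reasonable first thing to try.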