How to add spaces between strings in a document? Punjabi (Gurmukhi) text has a space issue: after OCR of the image, the output shows no spaces between the characters.


Mandeep Singh

24.05.2017, 01:31:16
to tesseract-ocr
Hello Guys,

I am training data for the Punjabi language and I am getting a space issue. How do I edit the config file, and how do I make my own personal config file for my custom language? Please assist me.


Output is: ੳਸਦਡਗ
I expect output like this: ੳ ਸ ਦ ਡ ਗ
pan.raavi.exp0.tif

ShreeDevi Kumar

24.05.2017, 02:14:42
to tesser...@googlegroups.com
Which O/S?
Which version of Tesseract?
How are you training?

Have you tried the packaged traineddata for Punjabi? What result do you get with that?

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/9e0aa40e-85e8-4659-87fb-9b586817e377%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mandeep Singh

31.05.2017, 06:17:11
to tesseract-ocr
I am using Windows 8.1 and Tesseract version 3.04.

I am training the data with jTessBoxEditor, and I also tried another method with the C# Serak Trainer, but I didn't find any good solution. The major issue is spaces.



ShreeDevi Kumar

31.05.2017, 06:24:54
to tesser...@googlegroups.com

Is the output you posted produced with the 3.04 traineddata from the repo?

What PSM did you use?
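PSM stands for page segmentation mode: it tells Tesseract how to split the image into blocks, lines, and words before recognition. A minimal sketch, assuming Tesseract 4.x, where the flag is spelled --psm (3.0x builds spelled it -psm). The helper name psm_args is hypothetical, and the table is only a subset of the modes, paraphrased from `tesseract --help-psm`.

```python
from typing import List

# Subset of Tesseract 4.x page segmentation modes (see `tesseract --help-psm`).
PSM_MODES = {
    3: "Fully automatic page segmentation, but no OSD (default)",
    6: "Assume a single uniform block of text",
    7: "Treat the image as a single text line",
    8: "Treat the image as a single word",
    10: "Treat the image as a single character",
}


def psm_args(mode: int) -> List[str]:
    """Return the arguments that select a page segmentation mode.

    Hypothetical helper: rejects modes outside the subset listed above.
    """
    if mode not in PSM_MODES:
        raise ValueError(f"unsupported PSM {mode}; choose from {sorted(PSM_MODES)}")
    return ["--psm", str(mode)]


if __name__ == "__main__":
    for mode, description in PSM_MODES.items():
        print(f"{mode:>2}  {description}  ->  {' '.join(psm_args(mode))}")
```

For a single line of Gurmukhi such as the ੳਸਦਡਗ sample, --psm 7 may be worth trying alongside the default.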

Try the experimental Tesseract 4 version for Windows; see the wiki for links.



Mandeep Singh

31.05.2017, 06:46:37
to tesseract-ocr
Kindly provide your email address; I want to discuss this issue. Yes, I used 3.04. What does PSM mean?

ShreeDevi Kumar

31.05.2017, 07:35:55
to tesser...@googlegroups.com

ShreeDevi

ShreeDevi Kumar

31.05.2017, 08:41:10
to tesser...@googlegroups.com
Use --oem 1 (LSTM engine) with Tesseract 4.0. You will get the correct output.

For the command-line interface, use the binaries from:

                        https://github.com/UB-Mannheim/tesseract/wiki

For a GUI, use gImageReader (look for the Tesseract 4.0 versions):

                      https://github.com/manisandro/gImageReader/releases
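A minimal sketch of the suggested command-line invocation, assuming the Tesseract 4.0 binary is on PATH and the 4.0 pan.traineddata is installed. The helper name build_tesseract_command is hypothetical; it only assembles the argument list, where --oem 1 selects the LSTM engine and -l pan selects Punjabi. Tesseract writes the recognized text to <output_base>.txt.

```python
import shlex
from typing import List, Optional


def build_tesseract_command(image: str, output_base: str, lang: str = "pan",
                            oem: int = 1, psm: Optional[int] = None) -> List[str]:
    """Assemble a Tesseract 4.x command line (does not run it).

    --oem 1 selects the LSTM engine only; -l picks the traineddata by code.
    """
    cmd = ["tesseract", image, output_base, "-l", lang, "--oem", str(oem)]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    return cmd


if __name__ == "__main__":
    cmd = build_tesseract_command("pan.raavi.exp0.tif", "out")
    # Prints: tesseract pan.raavi.exp0.tif out -l pan --oem 1
    print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) should then produce out.txt; the list form avoids shell-quoting problems with paths that contain spaces.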




ShreeDevi

Mandeep Singh

01.06.2017, 02:10:22
to tesseract-ocr

There is still a space issue. Kindly review this attachment.


Please help me out.

issue.PNG

Mandeep Singh

01.06.2017, 03:43:37
to tesseract-ocr
Kindly look at this issue, or please guide me on how to add a config file for the Punjabi language.

ShreeDevi Kumar

01.06.2017, 04:34:34
to tesser...@googlegroups.com
Are you using the 4.0 version of tesseract with --oem 1 (LSTM engine only)?

ShreeDevi

Mandeep Singh

01.06.2017, 04:46:04
to tesseract-ocr
I installed tesseract.exe 4.0 on my system, and after that I am using jTessBoxEditor 2.0 to train data for the Punjabi language. That's it. I don't know what LSTM means. Please guide me.

ShreeDevi Kumar

01.06.2017, 05:03:01
to tesser...@googlegroups.com
Please read the wiki links I sent.

If you have installed Tesseract 4.0, please test first with the provided traineddata for Punjabi before trying to train.

In most cases, the existing traineddata gives the best result.
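Before training, one quick way to confirm that the provided Punjabi model is installed is `tesseract --list-langs`, which prints the available language codes. A minimal sketch, assuming that output format (a header line followed by one code per line); the function names parse_list_langs and has_language are hypothetical.

```python
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Turn `tesseract --list-langs` output into a list of language codes.

    Assumes a header like 'List of available languages (3):' followed by
    one code per line; the header and blank lines are skipped.
    """
    langs = []
    for line in output.splitlines():
        code = line.strip()
        if not code or code.lower().startswith("list of available languages"):
            continue
        langs.append(code)
    return langs


def has_language(lang: str) -> bool:
    """Run `tesseract --list-langs` and report whether `lang` is installed."""
    result = subprocess.run(["tesseract", "--list-langs"],
                            capture_output=True, text=True, check=True)
    # Older builds may print the list on stderr, so both streams are parsed.
    return lang in parse_list_langs(result.stdout + "\n" + result.stderr)
```

If has_language("pan") returns False, the 4.0 pan.traineddata probably still needs to be copied into the tessdata folder before testing.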



ShreeDevi

ShreeDevi Kumar

01.06.2017, 05:04:14
to tesser...@googlegroups.com

has the traineddata for 4.0.

Mandeep Singh

01.06.2017, 05:18:33
to tesseract-ocr

Oh, thank you very much, it is working! Many, many thanks to you.


But I have more questions:

1. If I train new data, there is still a space problem.

2. How do I add more data to pan.traineddata, or can I edit the existing traineddata?

ShreeDevi Kumar

01.06.2017, 05:24:50
to tesser...@googlegroups.com
Are you training for 3.0 or 4.0?

Do you have spaces between the letters in your training text?
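A quick way to answer this question for a large training text is to flag lines that contain no space at all. A minimal sketch, assuming the training text is a plain UTF-8 file with one line of text per line; the function name lines_missing_spaces and the min_chars threshold are hypothetical. The idea is that lines which run characters together may teach the recognizer to do the same, which is one possible (not certain) cause of space-less output.

```python
from typing import List, Tuple


def lines_missing_spaces(training_text: str, min_chars: int = 2) -> List[Tuple[int, str]]:
    """Return (line_number, text) pairs for lines that contain no space.

    Blank lines and lines shorter than `min_chars` code points are ignored,
    since a single character cannot be missing a separator.
    """
    flagged = []
    for number, line in enumerate(training_text.splitlines(), start=1):
        text = line.strip()
        if len(text) >= min_chars and " " not in text:
            flagged.append((number, text))
    return flagged


if __name__ == "__main__":
    sample = "ੳਸਦਡਗ\nੳ ਸ ਦ ਡ ਗ\n"
    for number, text in lines_missing_spaces(sample):
        print(f"line {number}: no spaces in {text!r}")
```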


ShreeDevi

Mandeep Singh

01.06.2017, 06:37:55
to tesseract-ocr

Now I am using Tesseract 4.0 as per your guidance, and I want to train data for version 4.0. Yes, I am putting spaces between the text, but the output does not show spaces between the text.
Please tell me how to train the data again for the new version.



ShreeDevi Kumar

01.06.2017, 07:09:57
to tesser...@googlegroups.com

kmpre...@gmail.com

17.03.2019, 15:56:14
to tesseract-ocr
Is your (Mandeep Singh's) code working for Punjabi? I'm also facing the same problem (the space problem).

neet k

12.01.2020, 07:31:21
to tesseract-ocr
Hi Mandeep Singh,

I am facing the same problem with spaces when using Tesseract to recognize text from images: the spaces between words are ignored for Punjabi text.

Library: Tess-Two

Platform: Android

I would be grateful if you could help me fix the problem with spaces. I am attaching a screenshot and the input and output text.

Regards

Tess OCR.jpg

Suresh Anand

12.01.2020, 10:38:55
to tesser...@googlegroups.com
There's a parameter to preserve word spaces. Have a look.
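The parameter meant here is most likely Tesseract's preserve_interword_spaces. As far as I know, setting it to 1 stops Tesseract from collapsing runs of spaces into a single space; it does not by itself insert spaces the recognizer never detected, so results on Gurmukhi may vary. A minimal sketch, assuming the Tesseract 4.x command line, showing the two usual ways to set a parameter (which also answers the earlier config-file question): inline with -c name=value, or a plain-text config file with one "name value" pair per line, passed as the last argument. Tesseract looks config names up in tessdata/configs; passing a full path is assumed to work as well. The file name pan_spaces and the helper names are hypothetical. On Android, Tess-Two's TessBaseAPI.setVariable should accept the same name/value pair, but that is not shown here.

```python
import tempfile
from pathlib import Path
from typing import List

# Config file contents: one "parameter value" pair per line.
CONFIG_TEXT = "preserve_interword_spaces 1\n"


def write_config(directory: Path, name: str = "pan_spaces") -> Path:
    """Write the config file into `directory` and return its path."""
    path = directory / name
    path.write_text(CONFIG_TEXT, encoding="utf-8")
    return path


def command_with_c_flag(image: str, output_base: str) -> List[str]:
    """Set the parameter inline with -c name=value."""
    return ["tesseract", image, output_base, "-l", "pan", "--oem", "1",
            "-c", "preserve_interword_spaces=1"]


def command_with_config(image: str, output_base: str, config: Path) -> List[str]:
    """Pass the config file as the last argument, after all options."""
    return ["tesseract", image, output_base, "-l", "pan", "--oem", "1", str(config)]


if __name__ == "__main__":
    # Demo only: in practice keep the file (e.g. in tessdata/configs)
    # instead of a temporary directory that is deleted afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = write_config(Path(tmp))
        print(cfg.read_text(encoding="utf-8"), end="")
        print(" ".join(command_with_c_flag("input.png", "out")))
        print(" ".join(command_with_config("input.png", "out", cfg)))
```

The -c form is handy for quick experiments; the config file keeps several Punjabi-specific settings together once they are known to help.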


Ravneet Kaur

13.01.2020, 01:03:43
to tesser...@googlegroups.com
Please let me know about the parameter. Thanks.
