How to add spaces between strings in a document: Punjabi (Gurmukhi) text has a spacing issue. After OCR of the image, the output shows no spaces between the text.


Mandeep Singh

2017/05/24 1:31:16
To: tesseract-ocr
Hello guys,

I am training data for the Punjabi language and I am getting a spacing issue. How do I edit the config file, and how do I make my own personal config file for my own custom language? Please assist me.


The output is: ੳਸਦਡਗ
I want, and expected, output like this: ੳ ਸ ਦ ਡ ਗ
pan.raavi.exp0.tif

ShreeDevi Kumar

2017/05/24 2:14:42
To: tesser...@googlegroups.com
Which O/S?
Which version of Tesseract?
How are you training?

Have you tried the packaged traineddata for Punjabi? What result do you get with that?

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/9e0aa40e-85e8-4659-87fb-9b586817e377%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mandeep Singh

2017/05/31 6:17:11
To: tesseract-ocr
I am using Windows 8.1 and Tesseract version 3.04.

I am training the data with jTessBoxEditor, and I also tried another method with the C# Serak Trainer, but I didn't find any good solution. Spacing is the major issue.



ShreeDevi Kumar

2017/05/31 6:24:54
To: tesser...@googlegroups.com

Is the output you posted from the 3.04 traineddata in the repo?

What PSM did you use?

Try the experimental Tesseract 4 version for Windows; see the wiki for links.



Mandeep Singh

2017/05/31 6:46:37
To: tesseract-ocr
Kindly provide me your email address; I want to discuss this issue with you. Yes, I used 3.04. What does PSM mean?
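
For readers with the same question: PSM is Tesseract's page segmentation mode, which tells the engine what layout to expect (a full page, a single line, a single word, and so on). The sketch below lists a few commonly used modes and builds the matching command-line arguments. The helper name `psm_args` is hypothetical and not part of Tesseract, the table is deliberately abbreviated, and the descriptions paraphrase the `tesseract --help-psm` output. Tesseract 4.x spells the flag `--psm`, while 3.04 used `-psm`.

```python
# A few page segmentation modes (PSM); descriptions are paraphrased.
PSM_MODES = {
    3: "fully automatic page segmentation, no OSD (default)",
    6: "single uniform block of text",
    7: "single text line",
    8: "single word",
    10: "single character",
}


def psm_args(mode, major_version=4):
    """Return the command-line arguments that select a PSM.

    psm_args is a hypothetical helper for illustration only.
    """
    if mode not in PSM_MODES:
        raise ValueError(f"PSM {mode} is not in this abbreviated table")
    # Tesseract 4.x uses "--psm"; 3.04 used the single-dash "-psm".
    flag = "--psm" if major_version >= 4 else "-psm"
    return [flag, str(mode)]


print(psm_args(7))     # arguments for Tesseract 4.x
print(psm_args(7, 3))  # arguments for Tesseract 3.04
```

For an image that contains one line of letters, such as the sample in the first post, mode 7 (single text line) would be a plausible value to try.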

ShreeDevi Kumar

2017/05/31 7:35:55
To: tesser...@googlegroups.com


ShreeDevi Kumar

2017/05/31 8:41:10
To: tesser...@googlegroups.com
Use --oem 1 (LSTM engine) with Tesseract 4.0. You will get correct output.

For the command-line interface, use the binaries from https://github.com/UB-Mannheim/tesseract/wiki

For a GUI, use gImageReader (look for the Tesseract 4.0 versions): https://github.com/manisandro/gImageReader/releases
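
The advice above can be sketched as a command line. This is a minimal sketch, assuming Tesseract 4.x is installed and on PATH and that the Punjabi traineddata (`pan`) is available. The helper names `build_ocr_command` and `run_ocr` and the output base `out` are hypothetical; the image name is the attachment from the first post.

```python
import shutil
import subprocess


def build_ocr_command(image_path, output_base, lang="pan", oem=1):
    """Build a Tesseract 4.x command line (hypothetical helper).

    --oem 1 selects the LSTM engine; "pan" is the Punjabi language code.
    """
    return ["tesseract", image_path, output_base, "-l", lang, "--oem", str(oem)]


def run_ocr(command):
    """Run the command if the binary is on PATH; return True on success."""
    if shutil.which(command[0]) is None:
        return False
    return subprocess.run(command, check=False).returncode == 0


command = build_ocr_command("pan.raavi.exp0.tif", "out")
print(" ".join(command))
```

Calling `run_ocr(command)` would write the recognized text to `out.txt` when Tesseract and the image are both present.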





Mandeep Singh

2017/06/01 2:10:22
To: tesseract-ocr

There is still a spacing issue. Kindly review this attachment.

Please help me out.

issue.PNG

Mandeep Singh

2017/06/01 3:43:37
To: tesseract-ocr
Kindly look at this issue, or please guide me on how to add a config file for the Punjabi language.

ShreeDevi Kumar

2017/06/01 4:34:34
To: tesser...@googlegroups.com
Are you using the 4.0 version of tesseract with --oem 1 (LSTM engine only)?


Mandeep Singh

2017/06/01 4:46:04
To: tesseract-ocr
I installed tesseract.exe 4.0 on my system, and after that I have been using jTessBoxEditor 2.0 to train data for the Punjabi language. That's it. I don't know what LSTM means. Please guide me.

ShreeDevi Kumar

2017/06/01 5:03:01
To: tesser...@googlegroups.com
Please read the wiki links I sent.

If you have installed tesseract 4.0, please test first with the provided traineddata for Punjabi before trying to train.

Most times, existing traineddata provides the best result.




ShreeDevi Kumar

2017/06/01 5:04:14
To: tesser...@googlegroups.com

has the traineddata for 4.0.

Mandeep Singh

2017/06/01 5:18:33
To: tesseract-ocr

Oh, thank you very much, it is working. Many, many thanks to you.

But I have more questions.

1. If I train new data, there is still the spacing problem.

2. How do I add more data to pan.traineddata, or can I edit the existing traineddata?

ShreeDevi Kumar

2017/06/01 5:24:50
To: tesser...@googlegroups.com
Are you training for 3.0 or 4.0?

Do you have spaces between the letters in your training text?
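
One way to answer the second question is to scan the training text for lines that contain no space at all. The sketch below is a rough heuristic under stated assumptions: the function name `lines_without_spaces` is hypothetical, the sample strings are the two outputs quoted in the first post, and a line holding a single legitimate word would also be flagged.

```python
def lines_without_spaces(training_text):
    """Return (line_number, line) pairs for multi-character lines with no space."""
    flagged = []
    for number, line in enumerate(training_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip empty and single-character lines; flag the rest if unspaced.
        if len(stripped) > 1 and " " not in stripped:
            flagged.append((number, stripped))
    return flagged


# Sample built from the strings quoted earlier in the thread.
sample = "ੳ ਸ ਦ ਡ ਗ\nੳਸਦਡਗ\n"
for number, line in lines_without_spaces(sample):
    print(f"line {number} has no spaces: {line}")
```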



Mandeep Singh

2017/06/01 6:37:55
To: tesseract-ocr

I am now using Tesseract 4.0 as per your guidance, and I want to train data for version 4.0. Yes, I am putting spaces between the text, but the output does not show spaces between the text.
Please tell me how to train the data again for the new version.



ShreeDevi Kumar

2017/06/01 7:09:57
To: tesser...@googlegroups.com

kmpre...@gmail.com

2019/03/17 15:56:14
To: tesseract-ocr
Mandeep Singh, is your code working for Punjabi? I'm also facing the same problem (the spacing problem).

neet k

2020/01/12 7:31:21
To: tesseract-ocr
Hi Mandeep Singh,

I am facing the same problem with spaces while using Tesseract to recognize text from images. The spaces between words are ignored for Punjabi text.

Library: Tess-Two

Platform: Android

I would be grateful if you could help me fix the problem with spaces. I am attaching a screenshot with the input and output text.

Regards

Tess OCR.jpg

Suresh Anand

2020/01/12 10:38:55
To: tesser...@googlegroups.com
There's a parameter for preserving word spaces. Have a look.
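
The parameter meant here is most likely Tesseract's `preserve_interword_spaces` variable, which can be set on the command line with `-c`. The sketch below shows how such a variable could be passed. The helper name `config_args` and the file names are hypothetical, and whether this setting resolves the missing spaces in Punjabi output is not confirmed anywhere in this thread.

```python
def config_args(variables):
    """Turn {"name": value} pairs into repeated "-c name=value" arguments."""
    args = []
    for name, value in variables.items():
        args += ["-c", f"{name}={value}"]
    return args


# Hypothetical file names; "pan" is the Punjabi language code.
command = ["tesseract", "input.png", "output", "-l", "pan"]
command += config_args({"preserve_interword_spaces": 1})
print(" ".join(command))
# prints: tesseract input.png output -l pan -c preserve_interword_spaces=1
```

On Android with Tess-Two, the same variable would presumably be set through the library's own variable-setting call rather than a command line; check the Tess-Two documentation for the exact method.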


Ravneet Kaur

2020/01/13 1:03:43
To: tesser...@googlegroups.com
Please let me know about the parameter. Thanks.
