Tesseract with Thai language

pvsk....@gmail.com

Jan 28, 2019, 2:50:30 AM

to tesseract-ocr

Hi,

I am using Tesseract OCR v4 to extract text from a Thai-language image file. On Windows the Thai characters are extracted perfectly, but when I run the same extraction on Ubuntu, the extracted text has spaces between the characters.

Can anyone help me with this?

Thanks in advance.

KM

易鑫

Jan 29, 2019, 11:34:55 PM

to tesseract-ocr

Please upload your image file so I can try it in my environment.

On Mon, Jan 28, 2019 at 3:50 PM, <pvsk....@gmail.com> wrote:

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/4538ee12-d9ab-4851-a8f3-bdbb8a8f3ffd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Shree Devi Kumar

Jan 30, 2019, 3:06:42 AM

to tesser...@googlegroups.com

> I am able to extract the Thai characters perfectly on Windows environment whereas when I extract the same on Ubuntu I found spaces between the characters in the extracted text.

What are the exact versions of tesseract in both environments?

`tesseract -v`

Also, which traineddata file are you using on each (tessdata, tessdata_best, or tessdata_fast)?
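If it helps to collect both answers in one go, here is a minimal Python sketch that could be run on each machine. It is only an illustration: the helper names `parse_version` and `report`, the default language code `tha`, and the tessdata directory you pass in are all assumptions, and the file size is just a rough hint, since the three repositories ship different-sized files for the same language.

```python
import re
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def parse_version(output: str) -> Optional[str]:
    """Return the version token from the first line of `tesseract -v` output."""
    lines = output.strip().splitlines()
    if not lines:
        return None
    match = re.match(r"tesseract\s+(\S+)", lines[0])
    # Fall back to the raw first line if it has an unexpected shape.
    return match.group(1) if match else lines[0]


def report(tessdata_dir: str, lang: str = "tha") -> dict:
    """Collect the tesseract version and basic facts about <lang>.traineddata."""
    info = {"version": None, "traineddata": None, "size_bytes": None}

    exe = shutil.which("tesseract")  # None if tesseract is not on PATH
    if exe:
        proc = subprocess.run([exe, "-v"], capture_output=True, text=True)
        # Some builds print the version banner to stderr instead of stdout.
        info["version"] = parse_version(proc.stdout or proc.stderr)

    model = Path(tessdata_dir) / f"{lang}.traineddata"
    if model.is_file():
        info["traineddata"] = str(model)
        info["size_bytes"] = model.stat().st_size
    return info
```

Calling `print(report("path/to/tessdata"))` on Windows and on Ubuntu, with the placeholder replaced by the real tessdata directory on each, gives two dictionaries that can be compared side by side.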

--

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
