--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/edd64e28-9e52-4b44-80cc-0aaa442caa85%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Ray was looking for comparative feedback regarding the new traineddata for RTL languages, so this will be useful.

As far as I know, Google Docs does not use the Tesseract OCR engine for recognizing text. Its OCR accuracy is also better than Tesseract's for some Indian languages. However, it doesn't seem to handle TIFFs, and it processes only the first 10 pages of a PDF.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sun, Aug 16, 2015 at 7:14 PM, Hossein Razizadeh <sm.h...@gmail.com> wrote:
It seems 'fas' is for Persian, but there are no cube files for it, which results in poor output. The Arabic language files work much better for Persian images. There is another 'per' folder for Persian, but there isn't even a '.traineddata' file in it. Does anyone know whether Google Docs uses Tesseract as its OCR engine? Google Docs performs OCR on Persian images with good accuracy!

On Saturday, July 18, 2015 at 8:14:07 AM UTC+4:30, Jeff Breidenbach wrote:
I think 'fas' is the language code for Persian.
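
For anyone who wants to reproduce the 'fas' versus 'ara' comparison described above, here is a minimal sketch in Python. It assumes the `tesseract` executable is on the PATH, that both language packs are installed, and that the installed version accepts `stdout` as the output base. The helper names `build_tesseract_command` and `compare_languages` are made up for illustration and are not part of Tesseract.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang, output_base="stdout"):
    """Return the argument list for one Tesseract run with the given language code.

    Assumption: the installed Tesseract accepts "stdout" as the output base
    and writes the recognized text to standard output.
    """
    return ["tesseract", image_path, output_base, "-l", lang]


def compare_languages(image_path, langs=("fas", "ara")):
    """Run Tesseract once per language code and collect the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    results = {}
    for lang in langs:
        proc = subprocess.run(
            build_tesseract_command(image_path, lang),
            capture_output=True,
            encoding="utf-8",
            check=True,  # raise if Tesseract exits with an error
        )
        results[lang] = proc.stdout
    return results
```

Calling `compare_languages("page.png")` on a Persian scan (the file name is a placeholder) returns a dict keyed by language code, so the two outputs can be read side by side.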
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduX%2B9UqeXbWr-E7sADWK3SeyjiyUiJBH6wSJoMy_E2geuQ%40mail.gmail.com.
On Mon, Aug 17, 2015 at 6:07 AM, ShreeDevi Kumar <shree...@gmail.com> wrote:
Ray was looking for comparative feedback regarding the new traineddata for RTL languages, so this will be useful.
As far as I know, Google Docs does not use tesseract OCR engine for recognizing the text.

Interesting. Can you please clarify the source of your knowledge?

Its OCR accuracy is better than Tesseract for some Indian languages also. However, it doesn't seem to handle tifs, and processes only first 10 pages of a pdf.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8wxnq4BBwAZD%2BL-7rg80z2FmRpCQg4b8QMaXi-SLUoUcQ%40mail.gmail.com.