Please tell me what was improved in ver. 3.04

기옥주

Feb 25, 2016, 3:32:10 AM
to tesseract-ocr

I would like to know in more detail what was improved in ver. 3.04, especially the items in this list:

  • Improved font identification
  • Fixed problems with shifted baselines so recognition can recover from layout analysis errors.
  • Improved single column layout analysis
  • Many bug fixes.

Thanks!

Tom Morris

Feb 25, 2016, 3:41:00 PM
to tesseract-ocr
On Thursday, February 25, 2016 at 3:32:10 AM UTC-5, 기옥주 wrote:

I would like to know in more detail what was improved in ver. 3.04, especially the items in this list:


3.04 compared to what? You can see the changes since 3.03 on GitHub with a URL like this:


One of the biggest changes, though, is not in Tesseract itself but in the associated language data. Almost all languages were updated and many languages were added. You can see a complete list here: https://github.com/tesseract-ocr/langdata/

There are release notes with a summary of the changes here:

Tom

Tom Morris

Feb 25, 2016, 4:14:52 PM
to tesseract-ocr
The correct repo for the language data is:

3.04 added 39 new languages including: amh, asm, aze_cyrl, bod, bos, ceb, cym, dzo, fas, gle, guj, hat, iku, jav, kat, kat_old, kaz, khm, kir, kur, lao, lat, mar, mya, nep, ori, pan, pus, san, sin, srp_latn, syr, tgk, tir, uig, urd, uzb, uzb_cyrl, yid


There are a total of 107 languages supported now.
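
To check which of the 39 new languages are actually installed locally, something like the following may help. It is only a rough sketch: it assumes the `tesseract` binary is on the PATH and that `tesseract --list-langs` prints a header line ending in a colon followed by one language code per line. The function names are made up for this example.

```python
import subprocess

# The 39 language codes added in 3.04, copied from the list above.
NEW_IN_304 = (
    "amh asm aze_cyrl bod bos ceb cym dzo fas gle guj hat iku jav kat "
    "kat_old kaz khm kir kur lao lat mar mya nep ori pan pus san sin "
    "srp_latn syr tgk tir uig urd uzb uzb_cyrl yid"
).split()


def parse_list_langs(output):
    """Return the set of language codes found in `--list-langs` output.

    Assumption: the header line ends with ':' and every other non-blank
    line is a language code.
    """
    codes = set()
    for line in output.splitlines():
        line = line.strip()
        if not line or line.endswith(":"):
            continue  # skip blank lines and the header
        codes.add(line)
    return codes


def missing_new_languages(installed):
    """Return the 3.04 additions that are not in `installed`, in list order."""
    return [code for code in NEW_IN_304 if code not in installed]


def installed_languages(binary="tesseract"):
    """Run the binary and parse its language list.

    Both stdout and stderr are read, since the stream used for the list
    may differ between versions.
    """
    result = subprocess.run(
        [binary, "--list-langs"], capture_output=True, text=True
    )
    return parse_list_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    try:
        missing = missing_new_languages(installed_languages())
        print("New 3.04 languages not installed:", " ".join(missing) or "none")
    except FileNotFoundError:
        print("tesseract binary not found on PATH")
```

Any language reported as missing needs its traineddata file placed in the tessdata directory before it can be used.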



peiman F.

Feb 25, 2016, 4:42:42 PM
to tesseract-ocr
@tom

But a lot of the new language data is of poor quality.
For example, Arabic was trained on just one font, and I did not get any good results with it.
Is there any way to get better training?

