Doubt about using 5.0.0-beta-20210916 before release version is available


juan carlos hernández

Oct 19, 2021, 6:56:33 AM
to tesseract-ocr
Hello
I'm working on a project that needs OCR, and we have chosen to use Tesseract. We would like to use v5.0.0, but our IT infrastructure team doesn't want to install it because it is a beta version. I've read in another conversation (https://groups.google.com/g/tesseract-dev/c/Js2Hy22d1B0) that version 5.0.0 can be considered stable. We are going to use Tesseract from scratch, so we don't have any problems with backward compatibility.

Please can you confirm that version 5.0.0-beta-20210916 can be installed in production environments?
In any case, do you have a planned date for the 5.0.0 release?

Thanks in advance

Shree Devi Kumar

Oct 19, 2021, 7:35:09 AM
to tesseract-ocr

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/217930bf-7032-4c8d-9947-a8b348410726n%40googlegroups.com.

Merlijn B.W. Wajer

Oct 19, 2021, 10:05:19 AM
to tesser...@googlegroups.com
Hi,

On 19/10/2021 11:08, juan carlos hernández wrote:
> Hello
> I'm working on a project that needs OCR and we have chosen to use
> Tesseract. We would like to use v5.0.0, but our IT Infrastructure team
> doesn't want to install it because it is a beta version. I've read in
> another conversation
> (https://groups.google.com/g/tesseract-dev/c/Js2Hy22d1B0) that version
> 5.0.0 can be considered stable. We are going to use tesseract from
> scratch, so we don't have any problem with backward compatibility. 
>
> Please can you confirm that version 5.0.0-beta-20210916 can be
> installed in production environments?

I would say that you can. At archive.org we've been running millions
(literally) of documents through Tesseract 5.0 alpha and beta:

https://archive.org/search.php?query=ocr%3A%22tesseract%205.0.0%2A%22

Regards,
Merlijn

Lorenzo Bolzani

Oct 19, 2021, 10:47:36 AM
to tesser...@googlegroups.com
Hi Merlijn,
out of curiosity, did you notice an improvement over the previous version?

Thanks

Lorenzo


Merlijn B.W. Wajer

Oct 19, 2021, 12:37:09 PM
to tesser...@googlegroups.com
Hi,

On 19/10/2021 16:47, Lorenzo Bolzani wrote:
> Hi Merlijn,
> out of curiosity, did you notice an improvement over the previous version?

Yes. Speed and stability are better, and accuracy is also up (IMHO). See
(for example) this link:
https://github.com/tesseract-ocr/tesseract/pull/3141

Regards,
Merlijn

juan carlos hernández

Oct 20, 2021, 4:32:39 AM
to tesseract-ocr
Hi

Many thanks for your answers. 

I've checked the open issues in the v5.0.0 milestone (https://github.com/tesseract-ocr/tesseract/milestone/6). Of the 17 pending ones, some are carried over from v4, and others relate to training or to languages other than English or Spanish.
So the only issues that could really affect my project are these ones:
These two issues are only related to performance. As v5 runs 3 or 4 times faster in my tests, I don't see any reason not to use the current beta version in production environments. We also have the archive.org project's experience with v5, which is very positive.
I think I can try to convince my IT infrastructure team with these arguments. In any case, I can also wait for the 5.0.0 release if it doesn't take months to arrive.
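
For anyone who wants to repeat this kind of speed comparison, here is a minimal sketch of how it could be timed. It is not the script behind the numbers above; the helper names (time_command, speedup) and the binary and image paths in the usage note are hypothetical and only illustrate the idea.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=3):
    """Return the median wall-clock time (seconds) of running cmd `runs` times."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the program's output; only the elapsed time matters here.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(old_seconds, new_seconds):
    """How many times faster the new run is compared with the old one."""
    return old_seconds / new_seconds
```

Assuming two builds installed side by side (the paths are placeholders), it could be used as `speedup(time_command(["/opt/tesseract4/bin/tesseract", "page.png", "stdout"]), time_command(["/opt/tesseract5/bin/tesseract", "page.png", "stdout"]))`. The median is used so that a single slow run does not skew the result.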

Regards
Juan Carlos