--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/a8162fc0-edb2-4b7d-93b8-f2bb99612f0b%40googlegroups.com.
Thanks for the info. It looks like a helpful set of tools. Please confirm whether this is for training legacy Tesseract and which versions of Tesseract are compatible with it.
On Sun, Jan 5, 2020, 02:22 Wincent Balin <wincen...@gmail.com> wrote:
Hi all,

I would like to announce pytesstrain, a collection of Tesseract training tools, as well as the underlying library. The tools were created while training Tesseract to recognise the Akkadian language (stay tuned for more posts!), to solve the problems that emerged in the process.

You can install it with pip install pytesstrain.

The PyPI page for the package is https://pypi.org/project/pytesstrain/. The GitHub project page is https://github.com/wincentbalin/pytesstrain.

This package contains tools to create dictionary data (wordlist, bigram and unigram lists, etc.), to rewrap lines in text files to a specified length, to collect the most frequent recognition errors and dump them into a unicharambigs file, and to compute recognition metrics (WER and CER). It also contains the run_test() function, which renders the given string into an image file and then performs OCR on it, as well as its parallelised version, run_tests(), which can be used in future tools.

Feedback, suggestions, etc. would be most welcome.

Yours truly,
Wincent
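Dictionary data of this kind can be derived from plain training text by simple counting. Below is a minimal sketch, assuming whitespace-separated words and bigrams taken only within a line. The function name and the choice to ignore bigrams across line breaks are hypothetical, not pytesstrain's API, and the results would still have to be written to files and converted into dawgs (e.g. with wordlist2dawg) before Tesseract can use them.

```python
from collections import Counter


def dictionary_data(lines):
    """Build a sorted wordlist plus unigram and bigram frequency counts.

    Hypothetical helper: words are split on whitespace, and bigrams are
    only formed between neighbouring words on the same line.
    """
    unigrams = Counter()
    bigrams = Counter()
    for line in lines:
        words = line.split()
        unigrams.update(words)
        # Pair each word with its right-hand neighbour on this line.
        bigrams.update(zip(words, words[1:]))
    wordlist = sorted(unigrams)
    return wordlist, unigrams, bigrams
```

Frequency-sorted lists can then be produced with unigrams.most_common() and bigrams.most_common().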
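The error-collection step can be sketched as follows, assuming ground truth and OCR output are available as aligned line pairs. The sketch aligns each pair with difflib.SequenceMatcher, counts substituted substrings, and formats the most frequent ones as unicharambigs entries. The tab-separated layout with a "v1" header follows the format described in the Tesseract 3.x training documentation and is an assumption here; check it against the Tesseract version you are training. The function names are hypothetical and not part of pytesstrain.

```python
from collections import Counter
from difflib import SequenceMatcher


def collect_errors(pairs):
    """Count (OCR output, ground truth) substring substitutions over line pairs."""
    errors = Counter()
    for truth, ocr in pairs:
        matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "replace":
                # Key is (what OCR produced, what it should have been).
                errors[(ocr[j1:j2], truth[i1:i2])] += 1
    return errors


def unicharambigs_lines(errors, top_n=10):
    """Format the most frequent errors as unicharambigs lines (assumed v1 layout).

    Lengths count Python code points, which may differ from Tesseract's
    notion of a unichar for scripts with combining sequences.
    """
    lines = ["v1"]
    for (wrong, right), _count in errors.most_common(top_n):
        lines.append("\t".join([
            str(len(wrong)), " ".join(wrong),
            str(len(right)), " ".join(right),
            "0",  # assumed: 0 = optional ambiguity, 1 = mandatory replacement
        ]))
    return lines
```

For example, the pairs ("rn", "m") and ("morning", "moming") both yield the substitution "m" for "rn", which becomes the line 1, m, 2, r n, 0 (tab-separated).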
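WER and CER are conventionally computed as the edit (Levenshtein) distance between the ground truth and the OCR output, divided by the length of the ground truth, counted in words for WER and in characters for CER. The following is a minimal sketch of that standard definition, not pytesstrain's own code; the function names levenshtein, cer and wer are hypothetical.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences; each insertion, deletion or substitution costs 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution (0 if equal)
        prev = cur
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edit distance over reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)


def wer(reference, hypothesis):
    """Word error rate: edit distance over whitespace-separated words."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / max(len(ref_words), 1)
```

For example, cer("kitten", "sitting") returns 0.5 (three edits over six reference characters). Both rates can exceed 1.0 when the OCR output is much longer than the reference.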
--
By the way, I have added a create_ground_truth utility to the package. It creates .gt.txt files, along with the associated .tif files, for every specified font. I think it could be useful for anyone who does not have a ground truth collection yet.
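One way to produce such pairs, assuming Tesseract's text2image tool is installed, is to write each line to a .gt.txt file and render it once per font to a .tif with the same base name, which is the pairing convention tesstrain expects. This is a sketch under those assumptions, not the create_ground_truth implementation; the function name and file-naming scheme are hypothetical, and the text2image commands are only built, not run.

```python
from pathlib import Path


def prepare_ground_truth(lines, fonts, out_dir):
    """Write one <font>_<n>.gt.txt per (line, font) and return text2image argv lists.

    Hypothetical helper: text2image is expected to render each text file
    to <outputbase>.tif (plus a .box file) when the commands are run.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    commands = []
    for font in fonts:
        font_tag = font.replace(" ", "_")  # keep spaces out of file names
        for n, line in enumerate(lines):
            name = f"{font_tag}_{n:05d}"
            gt_file = out / f"{name}.gt.txt"
            gt_file.write_text(line + "\n", encoding="utf-8")
            commands.append([
                "text2image",
                f"--text={gt_file}",
                f"--outputbase={out / name}",
                f"--font={font}",
            ])
    return commands
```

Each command could then be run with subprocess.run(cmd, check=True); depending on the installation, text2image may also need --fonts_dir.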