Hi everyone,
Tesseract does not have much in the way of a testing infrastructure.
Tesseract does have the UNLV OCR accuracy test, whose setup is described in testing/README (although the information in that file is out of date), but there is nothing community contributors can use to easily check that their changes have not broken Tesseract in some way.
Continuous integration currently only checks whether Tesseract builds.
I propose a simple validation testing framework. Testing could focus on these areas:
- CLI validation
- input validation, such as rejection of malformed images
- output validation: are PDFs well-formed, etc.
- API validation, to avoid unintentional API breakage like the one that came up recently between 3.03 and 3.04
- ensuring that accelerated versions of Tesseract (OpenMP, OpenCL, AVX) produce results that match non-accelerated builds within a reasonable tolerance
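To make "within a reasonable tolerance" concrete, a test could compare per-word confidence values from a plain build and an accelerated build run on the same image. The sketch below assumes those values have already been collected, in reading order, into two vectors; `ConfidencesMatch` is a hypothetical helper, not part of Tesseract, and the tolerance itself would have to be chosen empirically.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true when both builds report the same number of
// words and each pair of word confidences differs by at most `tolerance`.
bool ConfidencesMatch(const std::vector<float>& reference,
                      const std::vector<float>& accelerated,
                      float tolerance) {
  assert(tolerance >= 0.0f);
  if (reference.size() != accelerated.size()) {
    return false;  // Different word count: segmentation changed.
  }
  for (std::size_t i = 0; i < reference.size(); ++i) {
    if (std::fabs(reference[i] - accelerated[i]) > tolerance) {
      return false;  // This word drifted further than allowed.
    }
  }
  return true;
}
```

Treating a different word count as a failure is a deliberate choice: if acceleration changes segmentation, that is worth flagging even when the recognized text looks similar.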
Ideally this would be a quick series of tests that a typical developer PC could grind through in 5-10 minutes, so that people will actually use it and so that it's suitable for CI. The tests shouldn't depend on OCR accuracy or repeatability, so that OCR can keep improving without breaking them.
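A test can stay independent of OCR accuracy by checking the structure of the output rather than the recognized text. As a minimal sketch of the PDF item, assuming the generated PDF has already been read into a `std::string`, a hypothetical `LooksLikePdf` helper could check only for the `%PDF-` header and an `%%EOF` marker near the end. This is far weaker than a real PDF validator, but it is fast and catches empty or truncated output.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: a cheap structural check on PDF bytes that ignores
// the recognized text entirely.
bool LooksLikePdf(const std::string& bytes) {
  const std::string header = "%PDF-";
  const std::string trailer = "%%EOF";
  // The file must start with the PDF header.
  if (bytes.compare(0, header.size(), header) != 0) {
    return false;
  }
  // Writers may add a newline or other bytes after %%EOF, so search the
  // last 1024 bytes instead of requiring an exact suffix.
  const std::size_t tail_start = bytes.size() > 1024 ? bytes.size() - 1024 : 0;
  return bytes.find(trailer, tail_start) != std::string::npos;
}
```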
To simplify the creation of tests, I would use Python 3, pytest and pybind11. Python 3's excellent Unicode support makes it a sound choice. pytest is a solid and widely used testing framework that eliminates a lot of the busywork of writing tests. pybind11 is a terrific Python to C++11 header-only template library that makes it easy to write tests against C++.
Are the core developers interested in seeing this added?
--
You received this message because you are subscribed to the Google Groups "tesseract-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-dev+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-dev.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-dev/76bf9d0a-c66a-4123-8f07-dea1305fba25%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
My vote is for a C++ test suite:
1. Python is an unnecessary dependency.
2. C++ has a lot of good test libraries.
3. std::string handles UTF-8 well enough.
4. C++ gives you every other possible opportunity, while with Python you can expect some weaknesses. (They begin with the pybind11 in your post, because of the Python-to-C++ bridge.)
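To illustrate point 3: `std::string` stores UTF-8 as raw bytes, so a C++ test can check that output text is well-formed UTF-8 without any extra dependency. `IsWellFormedUtf8` is a hypothetical, deliberately simplified validator; as the comment says, it catches truncated sequences and stray continuation bytes but not every invalid form.

```cpp
#include <cstddef>
#include <string>

// Minimal sketch: checks that `text` is a sequence of complete UTF-8
// characters. It rejects invalid lead bytes, stray continuation bytes and
// truncated sequences. For brevity it does NOT reject overlong encodings,
// UTF-16 surrogates, or code points above U+10FFFF.
bool IsWellFormedUtf8(const std::string& text) {
  std::size_t i = 0;
  while (i < text.size()) {
    const unsigned char lead = static_cast<unsigned char>(text[i]);
    std::size_t extra = 0;  // Number of continuation bytes expected.
    if (lead < 0x80) {
      extra = 0;  // ASCII.
    } else if ((lead & 0xE0) == 0xC0) {
      extra = 1;  // 110xxxxx
    } else if ((lead & 0xF0) == 0xE0) {
      extra = 2;  // 1110xxxx
    } else if ((lead & 0xF8) == 0xF0) {
      extra = 3;  // 11110xxx
    } else {
      return false;  // 10xxxxxx without a lead byte, or 11111xxx.
    }
    if (extra > text.size() - i - 1) {
      return false;  // Sequence runs past the end of the string.
    }
    for (std::size_t k = 1; k <= extra; ++k) {
      const unsigned char next = static_cast<unsigned char>(text[i + k]);
      if ((next & 0xC0) != 0x80) {
        return false;  // Expected a 10xxxxxx continuation byte.
      }
    }
    i += extra + 1;
  }
  return true;
}
```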
Yes, Google Test is the framework that I meant. I had a quick look at what it would take and decided the biggest difficulty would be the images used in the tests. There will be copyright concerns, as they are taken from a variety of sources. One simple solution might be to solicit submissions of test images and/or collect images from existing issues that have been submitted to the GitHub site. The tests could then be modified to expect appropriate results from these images instead of the ones that are currently used.
I'm taking a look at the Google Test framework. Inside Google, all the tests run in the cloud. As far as I can tell, that is not the case for a GitHub project; I think they expect you to run tests locally, with "make" or "cmake" or something like that.
On Fri, Jul 21, 2017 at 11:35 PM, Ray Smith
I already have the OK to "throw the tests over the wall", i.e. provide them in a non-working form. There are actually very few copyrighted images that would need to be replaced. Most of the tests run on synthetic data or existing test data, or don't require images. If someone would put together the build pieces necessary to build and run an empty test (using Google Test), then I will port at least one example and then push the rest out.
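Synthetic data sidesteps the copyright question entirely. As a minimal sketch, a test could generate a tiny plain PBM (`P1`) bitmap in memory. `MakeRectanglePbm` is a hypothetical helper, and whether a given test can feed such an image straight to Tesseract is an assumption to confirm against the image formats its Leptonica build reads.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns a plain-text PBM ("P1") image of the given
// size, with pixels in [x0, x1) x [y0, y1) set to black. No external image
// files are needed, so there is nothing to license.
std::string MakeRectanglePbm(int width, int height,
                             int x0, int y0, int x1, int y1) {
  assert(width > 0 && height > 0);
  std::ostringstream out;
  out << "P1\n" << width << ' ' << height << '\n';
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      // In PBM, 1 means black and 0 means white.
      const bool black = x >= x0 && x < x1 && y >= y0 && y < y1;
      out << (black ? '1' : '0') << (x + 1 < width ? ' ' : '\n');
    }
  }
  return out.str();
}
```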
I could build Google Test locally in the tesseract directory and run the sample test with the following commands:

    git submodule add https://github.com/google/googletest.git
    ln -s ./googletest/googletest ./test
    cd test/make
    make
    ./sample1_unittest
ShreeDevi
____________________________________________________________
Making GoogleTest's source code available to the main build can be done in a few different ways.
Ray, why don't you throw a tiny piece of one test "over the wall" right now and we'll see if we can get it to run.