Reading handwritten subscripts and superscripts in tesseract


Sampurn Rattan Jain

Jan 4, 2017, 2:10:58 AM
to tesseract-ocr
I want to read handwritten mathematical equations, which will contain not only the regular English alphabet but also Greek letters, along with numbers, subscripts and superscripts.

I have had some experience with reading handwritten notes using tesseract.

I have attached a sample image.

I am open to all advice, suggestions and guidance. 
image1.png

Jed Isom

Jan 6, 2017, 1:52:11 AM
to tesseract-ocr
When I recently installed my version of tesseract, one of the language options was "Math / equation detection module".  Have you tried that yet?

Sampurn Rattan Jain

Jan 15, 2017, 1:56:21 PM
to tesseract-ocr
I am facing the same issue as the poster in this thread: https://groups.google.com/d/msg/tesseract-ocr/_V7pOll2kPo/JKkJGJMNqUAJ
That is, tesseract is returning garbage values.

I downloaded the equ training file from tesseract's git repository and ran the following command on this image:
tesseract -l equ ~/255286.png ~/one

cat ~/one.txt
∍⊥↙⊹≳−−∐ ∘↕ ∍⊥↙⊹≳− ∐

⇄ ⋮

⇄↸∂⊺≍≁⇄⊢≇≼−∐≱ ⇄↸∍⊺≍≁≇⊢⊈↻↥⊐
∍≍∙⊦⇂≨∶−∅≵ ∍≍∙⊦⇂≨∶⇄≀
∍∷≔ ≵∊ ∍∷≔↕⊠

≳∊ ∷∶∊

My tesseract information:
tesseract 3.04.00
 leptonica-1.72
  libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8
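
One way to put a number on "garbage" is to look at the Unicode categories of the characters in the output. The following is only a minimal sketch in Python, not something tesseract provides: the function names category_histogram and symbol_ratio are made up for illustration, and SAMPLE is just the first line of the one.txt output above.

```python
import unicodedata
from collections import Counter

# First line of the one.txt output shown above, copied verbatim.
SAMPLE = "∍⊥↙⊹≳−−∐ ∘↕ ∍⊥↙⊹≳− ∐"


def category_histogram(text):
    """Count Unicode general categories of the non-whitespace characters."""
    return Counter(unicodedata.category(ch) for ch in text if not ch.isspace())


def symbol_ratio(text):
    """Return the fraction of non-whitespace characters that are symbols.

    Symbols are the general categories starting with "S" (Sm, So, Sc, Sk).
    Letters (L*) and digits (N*) do not count as symbols.
    """
    hist = category_histogram(text)
    total = sum(hist.values())
    if total == 0:
        return 0.0
    symbols = sum(count for cat, count in hist.items() if cat.startswith("S"))
    return symbols / total


if __name__ == "__main__":
    # Show which categories appear and how symbol-heavy the line is.
    print(category_histogram(SAMPLE))
    print(symbol_ratio(SAMPLE))
```

On the sample line every character is a math operator or an arrow, with no letters or digits at all, so the output is not a slightly wrong reading of the equation but consists of symbols only.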