--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f062f430-35ac-4010-8e80-e1864d3f1cb3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
If your process to identify musical objects gives coordinates, you might be able to use those to divide the image into smaller sections and then apply tesseract to each one. I tried removing the lines from the image with leptonica and then using olena to identify text sections on the page (if the lines are not removed first, olena treats the staves as text). The attachment shows how close olena got to identifying the text sections. I suspect the trick is an approach like this, where you extract the text regions and then run tesseract on them individually.
art
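For what it's worth, the "extract the regions, then OCR each one" idea could be sketched roughly as below. This is only an illustration under assumptions: the boxes are taken to be (left, top, right, bottom) pixel coordinates from whatever detector you use (olena or your own), the image is assumed to be a PIL image, and pytesseract is assumed as the Python wrapper around tesseract. The function names pad_box and ocr_regions are made up for this example.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def pad_box(box: Box, pad: int, width: int, height: int) -> Box:
    """Grow a box by `pad` pixels on each side, clamped to the page bounds."""
    left, top, right, bottom = box
    return (max(0, left - pad), max(0, top - pad),
            min(width, right + pad), min(height, bottom + pad))


def ocr_regions(image, boxes: List[Box], pad: int = 4,
                ocr: Optional[Callable] = None) -> List[Tuple[Box, str]]:
    """Crop each text region out of `image` and OCR it on its own.

    `image` is assumed to behave like a PIL.Image (has .size and .crop()).
    `ocr` defaults to pytesseract.image_to_string, imported lazily so the
    geometry helper above works without Tesseract installed.
    """
    if ocr is None:
        import pytesseract  # assumption: third-party wrapper is installed
        ocr = pytesseract.image_to_string
    width, height = image.size
    results = []
    # Visit regions top-to-bottom, then left-to-right.
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        region = image.crop(pad_box(box, pad, width, height))
        results.append((box, ocr(region).strip()))
    return results
```

With Pillow installed, something like `ocr_regions(Image.open("page.png"), boxes)` would be the intended call, where "page.png" is a placeholder for the page after the staff lines have been removed. The small padding is there because tight crops sometimes clip the edges of characters.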
You could try tesseract 4.0.0alpha (latest commit from the master branch), which lets you use the 'Latin' traineddata. It supports most languages written in Latin script. See if that gives you better recognition for the text.
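As a rough sketch of what that looks like in practice, the snippet below builds and runs the usual `tesseract <image> <outputbase> -l <lang>` command line. It assumes the tesseract binary is on your PATH and that Latin.traineddata has already been copied into your tessdata directory; the helper names and file names are placeholders, not anything from the tesseract project.

```python
import subprocess
from typing import List


def tesseract_command(image_path: str, output_base: str,
                      lang: str = "Latin") -> List[str]:
    """Build the argv for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_tesseract(image_path: str, output_base: str,
                  lang: str = "Latin") -> None:
    """Run the tesseract CLI; it writes the text to <output_base>.txt.

    Assumes `tesseract` is installed and the requested traineddata file
    is available in its tessdata directory.
    """
    subprocess.run(tesseract_command(image_path, output_base, lang),
                   check=True)
```

Calling `run_tesseract("page.png", "page_out")` would then leave the recognized text in page_out.txt, if the assumptions above hold.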
Hi Max,
Gosh, I am out of my depth on most of this. You might have an odd advantage with some of the unique symbols since they might lend themselves to something like template matching. Best of luck,
art
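To make the template matching remark a bit more concrete, here is a toy sketch in plain Python. It slides a small binary template over a binary page and reports every position where at most a given number of pixels differ. The bitmaps and the match_template name are invented for illustration; on real scans a library routine such as OpenCV's cv2.matchTemplate would be the more practical choice.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # rows of 0/1 pixels, 1 = ink


def match_template(page: Bitmap, template: Bitmap,
                   max_mismatches: int = 0) -> List[Tuple[int, int]]:
    """Return the (row, col) of every window in `page` that differs from
    `template` in at most `max_mismatches` pixels."""
    th, tw = len(template), len(template[0])
    ph, pw = len(page), len(page[0])
    hits = []
    for r in range(ph - th + 1):
        for c in range(pw - tw + 1):
            # Count pixels where the window and the template disagree.
            mismatches = sum(
                page[r + i][c + j] != template[i][j]
                for i in range(th) for j in range(tw)
            )
            if mismatches <= max_mismatches:
                hits.append((r, c))
    return hits


# Toy data: the same 2x2 "symbol" appears twice on a 5x5 page.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
]
symbol = [
    [1, 1],
    [1, 0],
]
print(match_template(page, symbol))  # [(1, 1), (3, 3)]
```

Raising max_mismatches gives some tolerance for scanning noise, at the cost of more false hits.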
From: 'Max Poliakovski' via tesseract-ocr [mailto:tesser...@googlegroups.com]
Sent: Tuesday, January 23, 2018 7:44 PM
To: tesseract-ocr <tesser...@googlegroups.com>