Improving text recognition in musical scores

Max Poliakovski

Jan 21, 2018, 8:19:57 PM
to tesseract-ocr
Hello,

the Audiveris music scanner utilizes Tesseract OCR v3.05.01 for recognition of textual items. The OCR is invoked after all basic musical objects (staves, notes, beams) have been recognized.

Text recognition is performed on the preprocessed image with staves removed. Tesseract is currently run in PSM_AUTO mode. The text language(s) are usually specified in advance by the user.

We're looking for ways to improve text recognition because the results we currently obtain with Tesseract are far from satisfactory.

Needless to say, musical scores usually represent a very difficult target for OCR systems. In order to understand why, let us analyze textual items in such a score (see attachment):
  1. the title of the piece, its composer and the arranger's name are set in a bold typeface
  2. there is a tempo indication ("With conviction") that contains a musical symbol (the crotchet), which Tesseract fails to recognize properly
  3. the lyrics are scattered between the staves in the form of syllables followed by whitespace and hyphens ("-"/"_")
  4. chord symbols are located above the staves and usually contain characters and character sequences that confuse the OCR
The above is just the tip of the iceberg, because items from categories 1-3 can be written in different languages or even mix several languages.

Improved recognition of lyrics (3) and chords (4) is crucial because of their importance for the musical context.

What can be done in order to tune Tesseract towards better recognition of scattered syllables (as in the case of lyrics) and unusual character sequences (as in the case of chords)?

We'd greatly appreciate any suggestions.

Thank you in advance!
Cheers
Max Poliakovski from Audiveris project
score_with_text.tiff

ShreeDevi Kumar

Jan 22, 2018, 3:22:19 AM
to tesser...@googlegroups.com
You could try Tesseract 4.0.0 alpha (latest commit from the master branch), which will allow you to use the 'Latin' traineddata that supports most languages written in Latin script. See if that gives you better recognition for the text.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f062f430-35ac-4010-8e80-e1864d3f1cb3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Art Rhyno.

Jan 23, 2018, 8:22:48 AM
to tesser...@googlegroups.com

If your process for identifying musical objects gives coordinates, you might be able to use them to divide the image into smaller sections and then apply Tesseract to each one. I tried removing the lines from the image with Leptonica and then using Olena to identify text sections on the page (unless the lines are removed, Olena will treat the staves as text). The attachment shows how close Olena could get to identifying text sections. I suspect the trick is an approach like this, where you extract the text regions and then run Tesseract on them individually.
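As a rough illustration of the region-splitting idea, here is a minimal Python sketch. It assumes the music recognizer can report each staff as a (top, bottom) range of pixel rows; the function name `text_bands` and the `min_height` parameter are hypothetical and not part of Audiveris, Olena or Tesseract. It returns the horizontal bands outside the staves, each of which could then be cropped and passed to Tesseract on its own.

```python
from typing import List, Tuple

Band = Tuple[int, int]  # (top, bottom) pixel rows, bottom exclusive


def text_bands(page_height: int, staves: List[Band], min_height: int = 10) -> List[Band]:
    """Return the horizontal bands of the page not covered by any staff.

    `staves` holds (top, bottom) row ranges as reported by a hypothetical
    staff detector. Bands shorter than `min_height` rows are dropped because
    they are unlikely to hold a full line of text.
    """
    bands: List[Band] = []
    cursor = 0  # first row not yet assigned to a staff or a band
    for top, bottom in sorted(staves):
        if top - cursor >= min_height:
            bands.append((cursor, top))
        cursor = max(cursor, bottom)
    if page_height - cursor >= min_height:
        bands.append((cursor, page_height))
    return bands


if __name__ == "__main__":
    # Two staves on a 400-row page: text may sit above, between and below them.
    print(text_bands(400, [(100, 140), (250, 290)]))
```

Real scores would need more than row bands (chord symbols and lyrics can share a band with slurs or dynamics), so this is only a starting point for cropping.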

 

art

score.jpg

Max Poliakovski

Jan 23, 2018, 6:22:59 PM
to tesseract-ocr

You could try Tesseract 4.0.0 alpha (latest commit from the master branch), which will allow you to use the 'Latin' traineddata that supports most languages written in Latin script. See if that gives you better recognition for the text.

Thanks, I'll definitely try out tesseract v4 as soon as its Java bindings become available.

Max Poliakovski

Jan 23, 2018, 7:43:42 PM
to tesseract-ocr
Hello Art,


If your process for identifying musical objects gives coordinates, you might be able to use them to divide the image into smaller sections and then apply Tesseract to each one. I tried removing the lines from the image with Leptonica and then using Olena to identify text sections on the page (unless the lines are removed, Olena will treat the staves as text). The attachment shows how close Olena could get to identifying text sections. I suspect the trick is an approach like this, where you extract the text regions and then run Tesseract on them individually.


The results you obtained with Scribo look promising! It sounds like Scribo could help to overcome shortcomings in Tesseract's layout analysis, which Audiveris currently relies on.

There are still several difficult cases we need to address, among those:
  • lyric syllables consisting of a single character (mostly a vowel). I doubt Scribo/Tesseract would ever be able to recognize those automatically
  • dynamics written in italics (p mf ff fff)
  • certain character sequences being misinterpreted as text (tuplet symbols involving brackets)
It looks like we need to adopt a more sophisticated approach instead of the current "single pass" one. Here is a sketch:

1) image preprocessing and binarization
2) labeling of staves and long-and-thin symbols (beams, slurs etc.) because those will likely confuse OCR layout analysis
3) temporary removal of symbols labeled in step 2
4) OCR layout analysis (without actual text recognition)
5) recognition of fixed-shape musical symbols
6) recognition of textual items
7) putting everything into a graph and trying to find a feasible interpretation of the data gathered during 1-6
8) interactive refinement involving a human operator
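Steps 2 and 3 could look roughly like the following Python sketch. It assumes a binarized image stored as nested lists and symbols labeled by bounding box; `mask_boxes` and both type aliases are hypothetical names, and a bounding box is only a crude stand-in for a per-pixel mask (it would also erase any text overlapping the box).

```python
from typing import List, Tuple

Image = List[List[int]]          # 1 = ink (foreground), 0 = background
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def mask_boxes(image: Image, boxes: List[Box]) -> Image:
    """Return a copy of `image` with every pixel inside `boxes` set to background.

    The input is left untouched, so the removed symbols can be restored later
    simply by going back to the original image.
    """
    masked = [row[:] for row in image]
    for left, top, right, bottom in boxes:
        # Clamp the box to the image so out-of-range labels do not raise.
        for y in range(max(top, 0), min(bottom, len(masked))):
            row = masked[y]
            for x in range(max(left, 0), min(right, len(row))):
                row[x] = 0
    return masked


if __name__ == "__main__":
    page = [[1, 1, 1, 1] for _ in range(3)]
    for row in mask_boxes(page, [(1, 0, 3, 2)]):
        print(row)
```

Because the original image is kept, the layout analysis of step 4 can run on the masked copy while steps 5 and 6 still see the full page.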

Because fully automatic text identification isn't possible (as opposed to handling only the most common cases), a simple UI letting the user verify/correct the result of the layout analysis could be incorporated after step 4.

Let's assume we've successfully identified all text items. Now we need to recognize them properly, which raises another challenge.

Chords, for example, use a very restricted symbol set and can also contain musical symbols like ♯ and ♭ as well as superscript characters. I'm afraid we will have to train Tesseract to recognize musical symbols first and then experiment with specifying external grammars, disabling dictionaries and using "whitelists". Otherwise, Tesseract will most likely spit out garbage instead of properly recognized chords.
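To make the whitelist and dictionary idea concrete, here is a hedged Python sketch that only builds a Tesseract command line for one cropped chord symbol. `tessedit_char_whitelist`, `load_system_dawg` and `load_freq_dawg` are Tesseract configuration variables; the helper `chord_ocr_command`, the `CHORD_CHARS` set and the file names are made up for illustration. Older releases spell the page segmentation flag `-psm` instead of `--psm`, and whether the whitelist is honored depends on the Tesseract version and engine, so this would need checking against the installed build.

```python
from typing import List

# Characters expected in plain-ASCII chord symbols; an illustrative set, not a standard.
CHORD_CHARS = "ABCDEFGabdgijmnsu0123456789#/+-()"


def chord_ocr_command(image_path: str, output_base: str,
                      whitelist: str = CHORD_CHARS) -> List[str]:
    """Build a Tesseract command line for one cropped chord symbol."""
    return [
        "tesseract", image_path, output_base,
        "--psm", "7",                                  # treat the crop as a single text line
        "-c", f"tessedit_char_whitelist={whitelist}",  # restrict the output alphabet
        "-c", "load_system_dawg=0",                    # skip the main word dictionary
        "-c", "load_freq_dawg=0",                      # skip the frequent-word dictionary
    ]


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run(cmd, check=True) to execute.
    print(" ".join(chord_ocr_command("chord_crop.png", "chord_out")))
```

The ♯ and ♭ glyphs are deliberately left out of the whitelist, since stock traineddata is unlikely to know them without additional training.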

Has anyone been able to successfully recognize unusual character sequences (math formulas, special codes etc.) with Tesseract? Which tricks were involved? Real-world examples would be great...

For lyrics, we'll need to tell Tesseract to treat standalone syllables as parts of longer words so that ambiguities can be resolved automatically. One possibility is to remove the whitespace between syllables based on some heuristics. I'm afraid we will end up with a fragile system...
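One variant of the whitespace heuristic works on the recognized text rather than on the image. The Python sketch below assumes the OCR output of a lyric line arrives as a single string in which "-" marks a continued word and runs of "_" are extender lines; `join_syllables` is a hypothetical helper, and the rules are as fragile as feared above.

```python
import re
from typing import List


def join_syllables(line: str) -> List[str]:
    """Merge hyphen-separated lyric syllables into whole words.

    A syllable followed by '-' (attached or standing alone) continues into
    the next syllable. Runs of '_' (extender lines) carry no text and are dropped.
    """
    words: List[str] = []
    current = ""
    joining = False  # True when the previous token asked for a continuation
    for token in line.split():
        if re.fullmatch(r"_+", token):
            continue                     # extender line: nothing to add
        if re.fullmatch(r"-+", token):
            joining = bool(current)      # standalone hyphen: keep the word open
            continue
        token = token.strip("_")
        if not joining and current:
            words.append(current)        # previous word is complete
            current = ""
        joining = token.endswith("-")
        current += token.rstrip("-")
    if current:
        words.append(current)
    return words


if __name__ == "__main__":
    print(join_syllables("Hal - le - lu - jah, praise __"))
```

The joined words could then be checked against a dictionary for the specified language to flag likely misreads of individual syllables.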

Further ideas?

Art Rhyno.

Jan 23, 2018, 9:07:33 PM
to tesser...@googlegroups.com

Hi Max,

 

Gosh, I am out of my depth on most of this. You might have an odd advantage with some of the unique symbols since they might lend themselves to something like template matching. Best of luck,

 

art

 

Max Poliakovski

Jan 23, 2018, 9:28:41 PM
to tesser...@googlegroups.com
Hi Art,

Gosh, I am out of my depth on most of this.


Oh, sorry for my long and overloaded post! (I should probably have split it up into several parts to avoid further confusion.)
You already helped me a lot by recommending Scribo. I'll definitely investigate it further.
 

You might have an odd advantage with some of the unique symbols since they might lend themselves to something like template matching.


Yes, sure. For this to work, we first need to identify these symbols in the image. I believe that training Tesseract to recognize some common music symbols and specifying the grammar of chords should do the trick. As usual, the devil's in the details. I simply lack any experience in this area (OCR training).
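A chord grammar could also be applied after recognition, as a filter on what the OCR returns. The following Python sketch uses a deliberately small regular expression (root, accidental, quality, extension, slash bass); `CHORD_RE` and `is_chord` are illustrative names, and the grammar is an assumption that rejects legitimate chords such as altered extensions (e.g. "Dm7b5").

```python
import re

# Illustrative chord grammar, far from complete.
CHORD_RE = re.compile(
    r"[A-G][#b♯♭]?"                    # root note with optional accidental
    r"(?:maj|min|dim|aug|sus|add|m)?"  # optional quality
    r"(?:[0-9]{1,2})?"                 # optional extension such as 7, 9, 11, 13
    r"(?:/[A-G][#b♯♭]?)?"              # optional bass note after a slash
)


def is_chord(text: str) -> bool:
    """Return True if the whole string matches the illustrative chord grammar."""
    return CHORD_RE.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    for candidate in ["Cmaj7", "F#m7/C#", "B♭", "Hello", "Dm7b5"]:
        print(candidate, is_chord(candidate))
```

Strings that fail the check could be sent back for a second OCR pass with different settings or queued for the interactive correction step.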

Thank you very much!
Cheers
Max