Hi,
OCR extraction doesn't care about lyrics, chords or spacing. It simply returns blocks of text, and the app then has to put those blocks back together. Currently the app joins the blocks with a single space between each one, so the chords often end up bunched together. Automatic detection of chord lines is then done by the app, not by the OCR. It does this by looking for lines that meet either of the following criteria:
- The line contains lots of short (1-3 character) blocks and not much else (the blocks should average 2.4 characters or fewer)
- After removing all of the whitespace (normal spaces), the remaining text should be 25% or less of the full line length including spaces
Passing either of these checks identifies the line as likely to be a chord line (the app will add . at the beginning of the line).
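To make the two checks concrete, here is a minimal Python sketch. It assumes the OCR blocks have already been joined with single spaces, so splitting the line on whitespace recovers the blocks. The names `is_chord_line` and `mark_chord_lines` are hypothetical and this is not the app's actual code; the thresholds are parameters so the current values (2.4 and 25%) or the planned ones (2.75 and 30%) can be tried.

```python
def is_chord_line(line: str, max_avg_len: float = 2.4, max_text_ratio: float = 0.25) -> bool:
    """Return True if either heuristic check suggests `line` holds chords."""
    blocks = line.split()
    if not blocks:
        return False  # blank lines are never treated as chord lines

    # Check 1: the blocks are short on average (e.g. "G", "Bb", "D6").
    avg_len = sum(len(b) for b in blocks) / len(blocks)
    passes_check_1 = avg_len <= max_avg_len

    # Check 2: after removing normal spaces, the remaining text is only a
    # small share of the full line, i.e. the line is mostly padding.
    text_len = len(line.replace(" ", ""))
    passes_check_2 = text_len / len(line) <= max_text_ratio

    # Passing either check is enough.
    return passes_check_1 or passes_check_2


def mark_chord_lines(lines: list[str]) -> list[str]:
    """Prefix likely chord lines with '.', leaving other lines unchanged."""
    return ["." + line if is_chord_line(line) else line for line in lines]
```

For example, `mark_chord_lines(["G Bb", "Em D6 Cmaj7 H7"])` marks only the first line, and calling `is_chord_line("Em D6 Cmaj7 H7", max_avg_len=2.75, max_text_ratio=0.30)` shows how the raised thresholds change the result for that line.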
Here are some examples from your song showing why each line was or wasn't treated as a chord line.
G Bb
The average character length is (1 + 2)/2 = 3/2 = 1.5 - passes check 1
The percentage of the line used (text / all text including spaces) is = (3/4) * 100 = 75% - fails check 2
Passing one of these suggests that it is likely to be a chord line
Em D6 Cmaj7 H7
The average character length is (2 + 2 + 5 + 2)/4 = 11/4 = 2.75 - fails check 1
The percentage of the line used (text / all text including spaces) is = (11/14) * 100 = 79% - fails check 2
Failing both of these suggests that it is not a chord line
It will never be perfect, though, as the next line shows:
La la la lah
The average character length is (2 + 2 + 2 + 3)/4 = 9/4 = 2.25 - passes check 1
The percentage of the line used (text / all text including spaces) is = (9/12) * 100 = 75% - fails check 2
This would wrongly be identified as a chord line.
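The arithmetic in the three examples can be reproduced exactly with Python's `fractions.Fraction`, which keeps the same ratios as the hand calculations instead of rounded decimals. This is only an illustrative sketch; `line_stats` is a hypothetical helper and assumes a non-empty line.

```python
from fractions import Fraction

def line_stats(line: str) -> tuple[Fraction, Fraction]:
    """Return (average block length, share of the line that is text) exactly.

    Assumes `line` contains at least one block (an empty line would divide by zero).
    """
    blocks = line.split()
    avg_len = Fraction(sum(len(b) for b in blocks), len(blocks))
    text_ratio = Fraction(len(line.replace(" ", "")), len(line))
    return avg_len, text_ratio

for example in ["G Bb", "Em D6 Cmaj7 H7", "La la la lah"]:
    avg_len, ratio = line_stats(example)
    check_1 = "passes" if avg_len <= Fraction(12, 5) else "fails"  # 2.4 characters
    check_2 = "passes" if ratio <= Fraction(1, 4) else "fails"     # 25% of the line
    print(f"{example}: average {avg_len} = {float(avg_len):.2f} ({check_1} check 1), "
          f"text {ratio} = {float(ratio):.0%} ({check_2} check 2)")
```

Note that `Fraction` reduces 9/12 to 3/4 automatically, so the "La la la lah" ratio prints in its reduced form.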
I've had a go at improving the OCR feature by estimating how many spaces might fit between the text blocks and adding them back in. It isn't perfect when the font isn't monospaced, but it should help. I will also raise the check 1 threshold to an average of 2.75 characters or fewer and raise the check 2 maximum to 30% to help with some other lines. I've got quite a lot of other code changes in progress, so I can't release this until I've finished those too.
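One way to guess the spaces between blocks is sketched below. This is a hypothetical illustration, not the app's code: it assumes each OCR block comes with left and right x coordinates (the `TextBlock` type is made up for the example), estimates a single character width from the blocks on the line, and converts each gap into a whole number of spaces. Because one average width is used for every character, it is only approximate for non-monospaced fonts.

```python
from dataclasses import dataclass

@dataclass
class TextBlock:
    """Hypothetical OCR block: its text plus left/right edges in page units."""
    text: str
    left: float
    right: float

def rebuild_line(blocks: list[TextBlock]) -> str:
    """Join OCR blocks, padding each gap with an estimated number of spaces."""
    if not blocks:
        return ""
    blocks = sorted(blocks, key=lambda b: b.left)  # read left to right

    # Estimate one character's width as total block width / total characters.
    total_chars = sum(len(b.text) for b in blocks)
    total_width = sum(b.right - b.left for b in blocks)
    char_width = total_width / total_chars if total_chars and total_width > 0 else 1.0

    parts = [blocks[0].text]
    for prev, cur in zip(blocks, blocks[1:]):
        gap = cur.left - prev.right
        # Keep at least one space so neighbouring blocks never merge.
        spaces = max(1, round(gap / char_width))
        parts.append(" " * spaces + cur.text)
    return "".join(parts)
```

For instance, a "G" block spanning 0 to 10 and a "Bb" block spanning 50 to 70 give an estimated character width of 10, so the gap of 40 becomes four spaces: "G    Bb". Restoring wider gaps also lowers the share of the line taken up by text, which is exactly what check 2 measures.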
Watch out for the next release (it will be the one that includes light/dark modes for the entire app, not just the song display).
Using your PDF with the new code would give this output (not perfect, but definitely better)
Best wishes,
Gareth