I applied a few preprocessing steps:
1) the DPI is as high as possible (letters are about 30-50 pixels high);
2) adaptive thresholding is used to remove most of the noise, and it works quite well;
3) the image is framed with a white rectangle.
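Steps 2 and 3 could look roughly like the following. This is a minimal NumPy sketch, not necessarily what was used here. It implements a mean-based adaptive threshold with an integral image, using the same thresholding rule as OpenCV's `cv2.adaptiveThreshold` with `ADAPTIVE_THRESH_MEAN_C` and `THRESH_BINARY` (a pixel turns white if it is brighter than its local mean minus a constant `c`), and then pads the result with a white border. The helper names and the `block_size`, `c` and `border` values are hypothetical, chosen only for illustration.

```python
import numpy as np


def adaptive_threshold_mean(gray, block_size=31, c=10):
    """Binarize a grayscale uint8 image with a local-mean threshold.

    A pixel becomes white (255) if it is brighter than the mean of its
    block_size x block_size neighbourhood minus c, otherwise black (0).
    """
    if block_size % 2 == 0:
        raise ValueError("block_size must be odd")
    g = gray.astype(np.float64)
    r = block_size // 2
    # Replicate edge pixels so windows near the border stay full-size.
    padded = np.pad(g, r, mode="edge")
    # Integral image with a leading row and column of zeros:
    # ii[i, j] == padded[:i, :j].sum()
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = g.shape
    k = block_size
    # Sum of each k x k window centred on an original pixel, O(1) per pixel.
    window_sum = (ii[k:k + h, k:k + w] - ii[:h, k:k + w]
                  - ii[k:k + h, :w] + ii[:h, :w])
    local_mean = window_sum / (k * k)
    return np.where(g > local_mean - c, 255, 0).astype(np.uint8)


def add_white_frame(binary, border=20):
    """Surround the image with a white (255) frame `border` pixels wide."""
    return np.pad(binary, border, mode="constant", constant_values=255)


# Example on a synthetic grey page with one dark horizontal stroke.
page = np.full((60, 80), 200, dtype=np.uint8)
page[20:30, 10:70] = 50
framed = add_white_frame(adaptive_threshold_mean(page, block_size=31, c=10), border=5)
```

The dark stroke comes out black, the grey background comes out white, and the 5-pixel frame is white on all sides.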
What I didn't do:
1) deskewing: the image is sometimes not perfectly horizontal (but only a couple of degrees off);
2) morphological filtering such as erosion or dilation: in most cases it made the results worse;
3) any other image processing (blurring, enhancing, smoothing, etc.).
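Deskewing was skipped. If it turns out to matter, a skew of a couple of degrees can be estimated without extra libraries by a projection-profile search: try candidate angles and keep the one whose row histogram of ink pixels is most sharply peaked. This is a minimal NumPy sketch under stated assumptions: black text on a white background, skew within ±5 degrees, and rotation approximated by a vertical shear (reasonable only for small angles). The function name and its parameters are hypothetical.

```python
import numpy as np


def estimate_skew_deg(binary, max_angle=5.0, step=0.1):
    """Estimate text skew in degrees with a projection-profile search.

    binary: uint8 array, text black (0) on white (255).
    A positive result means text lines rise from left to right.
    For small angles, rotation is approximated by a vertical shear:
    ink pixel (y, x) is projected onto row y + x * tan(angle).
    """
    ys, xs = np.nonzero(binary == 0)  # coordinates of ink pixels
    if ys.size == 0:
        return 0.0
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step / 2, step):
        rows = np.round(ys + xs * np.tan(np.radians(angle))).astype(np.int64)
        rows -= rows.min()
        profile = np.bincount(rows).astype(np.float64)
        # Well-aligned lines give tall, narrow peaks: a large sum of squares.
        score = float(np.sum(profile ** 2))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle


# Example: a 3-pixel-thick line rising 2 degrees from left to right.
demo = np.full((200, 400), 255, dtype=np.uint8)
for x in range(400):
    y = int(round(100 - x * np.tan(np.radians(2.0))))
    demo[y:y + 3, x] = 0
skew = estimate_skew_deg(demo)
```

To actually straighten the image, OpenCV's `cv2.getRotationMatrix2D` plus `cv2.warpAffine` (with `borderValue=255` so the new corners stay white) can rotate it. OpenCV treats positive angles as counter-clockwise, so passing the negative of the estimate should undo the skew, but it is worth confirming the sign on a sample image.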
I'm not sure whether any other ideas have been proposed. What puzzles me is why the boxes are sometimes placed well and other times placed just plain awfully. The biggest problem, as you can see, is that two lines are taken as one. I also tried a version without adaptive thresholding, but the problem stays the same.
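One way to check whether the merged lines come from segmentation rather than recognition is to look at the horizontal projection profile of the thresholded image: if no ink-free row separates two text lines, a line finder that relies on white gaps may treat them as one. This is only a hypothesis, since how the OCR engine in question segments lines internally is not known here. The sketch below, with a hypothetical `find_text_line_bands` helper and `min_gap` parameter, shows how even a small tilt over a wide line can close that gap.

```python
import numpy as np


def find_text_line_bands(binary, min_gap=2):
    """Return (start_row, end_row) bands of rows that contain ink.

    binary: uint8 array, text black (0) on white (255).
    A new band starts only after at least min_gap ink-free rows.
    """
    has_ink = (binary == 0).any(axis=1)  # True for rows with any black pixel
    bands = []
    start = end = None
    gap = 0
    for y, ink in enumerate(has_ink):
        if ink:
            if start is None:
                start = y
            end = y
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                bands.append((start, end))
                start = None
    if start is not None:
        bands.append((start, end))
    return bands


# Two 8-pixel-tall "text lines" 4 rows apart, once straight, once tilted.
straight = np.full((100, 400), 255, dtype=np.uint8)
straight[40:48, :] = 0
straight[52:60, :] = 0
tilted = np.full((100, 400), 255, dtype=np.uint8)
for x in range(400):
    off = int(x * 0.04)  # drift of ~16 rows over the width, about 2.3 degrees
    tilted[40 - off:48 - off, x] = 0
    tilted[52 - off:60 - off, x] = 0
```

For `straight`, two separate bands are found; for `tilted`, the row ranges of the two lines overlap and they come back as a single band, even though each line is only about 2.3 degrees off horizontal.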