I was referring to the image sample you posted where there are three columns.
Regarding the new diagrams, I do not know what information you need or whether all the diagrams have the same layout.
Anyway, I would first cut out the individual boxes from the bottom-right table, or at least its three columns. I would also isolate the top-left text.
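A minimal sketch of the cropping step, assuming a grayscale image held as a plain list of pixel rows (0 = black, 255 = white) so it runs without extra libraries. With Pillow you would call `Image.crop((left, upper, right, lower))` instead. The `REGIONS` names and coordinates are hypothetical; you would have to measure them on your own diagrams.

```python
# Grayscale image modelled as a list of pixel rows (0 = black, 255 = white).
Grid = list[list[int]]

def crop(image: Grid, left: int, top: int, right: int, bottom: int) -> Grid:
    """Return rows [top, bottom) and columns [left, right), like a Pillow crop box."""
    return [row[left:right] for row in image[top:bottom]]

# Hypothetical layout: one box for the top-left text, one per table column.
REGIONS = {
    "top_left_text": (0, 0, 3, 2),
    "col_1": (3, 2, 5, 6),
    "col_2": (5, 2, 7, 6),
    "col_3": (7, 2, 9, 6),
}

def cut_regions(image: Grid, regions: dict) -> dict:
    """Crop every named region so each can be processed on its own."""
    return {name: crop(image, *box) for name, box in regions.items()}
```

Each cropped region can then be handed to the text detection code separately.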
I could also isolate individual lines of text (something like this) and process each fragment; this should be easy in the first sample you posted.
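One common way to isolate lines of text is a horizontal projection profile: rows that contain no dark pixel are treated as gaps between lines. The sketch below assumes dark text on a light, fairly clean background and the same list-of-rows image as above; the `threshold` value of 128 is an assumption to tune for your scans.

```python
Grid = list[list[int]]

def split_lines(image: Grid, threshold: int = 128) -> list[tuple[int, int]]:
    """Return (top, bottom) row ranges that contain at least one dark pixel.

    Rows with no pixel darker than `threshold` act as separators between lines.
    """
    bands = []
    start = None
    for y, row in enumerate(image):
        has_ink = any(p < threshold for p in row)
        if has_ink and start is None:
            start = y                      # a new text line begins
        elif not has_ink and start is not None:
            bands.append((start, y))       # the current line ends at a blank row
            start = None
    if start is not None:                  # text runs to the bottom edge
        bands.append((start, len(image)))
    return bands
```

Each line can then be cut out with `image[top:bottom]` and processed as its own fragment.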
This could be quite easy or a nightmare; it really depends on the image layout: how many variations there are, what is constant and what is not. For example, first crop the top-left corner, then run the "text detection" code on it.
If you run into performance problems, just merge all the fragments into a single image and process that.
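A sketch of the merging idea, assuming the same list-of-rows images: fragments are stacked vertically on a white canvas with a blank gap between them, and the row range of each fragment is recorded. Those offsets let you map text found in the merged image back to the fragment it came from. The `gap` of 10 rows is an arbitrary assumption.

```python
Grid = list[list[int]]
WHITE = 255

def stack_fragments(fragments: dict, gap: int = 10):
    """Stack fragments vertically; return (merged image, name -> (top, bottom))."""
    width = max((len(f[0]) for f in fragments.values() if f), default=0)
    merged: Grid = []
    offsets = {}
    for name, frag in fragments.items():
        if merged:  # blank separator rows between fragments
            merged.extend([[WHITE] * width for _ in range(gap)])
        top = len(merged)
        for row in frag:
            merged.append(row + [WHITE] * (width - len(row)))  # pad to full width
        offsets[name] = (top, len(merged))
    return merged, offsets

def fragment_at(offsets: dict, y: int):
    """Return the name of the fragment covering row y, or None for a gap row."""
    for name, (top, bottom) in offsets.items():
        if top <= y < bottom:
            return name
    return None
```

For example, if the detector reports a word at row `y` of the merged image, `fragment_at(offsets, y)` tells you which box it belongs to.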
I do not know your project, but usually you also need to be able to tell what the text means (part number, description, etc.), which means you need to know where each fragment comes from.
Even if you could process the first sample in one pass, it would later be hard to tell what the text means, what was on the left, what was on the right, etc. But I do not know what you need the text for.
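A sketch of keeping the meaning attached to each fragment: the layout maps a field name to the box it lives in, and the text detector runs once per box, so every result is already labelled. The field names, boxes, and the `detect_text` callable are hypothetical placeholders; plug in whatever OCR or text detection code you actually use.

```python
from typing import Callable

Grid = list[list[int]]
Box = tuple[int, int, int, int]  # (left, top, right, bottom)

def crop(image: Grid, box: Box) -> Grid:
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]

def read_fields(image: Grid, layout: dict,
                detect_text: Callable[[Grid], str]) -> dict:
    """Run the (hypothetical) detector on each region and key results by field."""
    return {field: detect_text(crop(image, box)) for field, box in layout.items()}

# Hypothetical layout for one diagram variant.
LAYOUT = {
    "part_number": (0, 0, 2, 1),
    "description": (2, 0, 5, 3),
}
```

If there are several layout variations, each one would need its own `LAYOUT`, which is where the "easy or nightmare" question above really comes in.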
Lorenzo