Hi,
It depends on the PDF. We use the PDFBox library to extract the text from the PDF. It has a configuration option to "sort characters by position", which can be on or off.
INCEpTION currently uses the "off" option. This means that we read the text in the order in which it is stored in the PDF data stream. This is generally a good idea, in particular for computer-generated PDFs, because the text is usually stored in reading order. For example, if you have a two-column text, the tool that generates the PDF will usually first write the left column, then the right, then the footer, then the header of the next page, and so on.
The "on" option can help for PDFs in which the above does not work well. However, "sort characters by position" uses a pretty basic algorithm. For example, it does not detect multi-column text. So if you start selecting in the left column, the selection would carry over to the parallel row in the right column before it continues with the next row in the left column.
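To illustrate the difference, here is a small toy model in Java. It is not PDFBox's actual algorithm, only a sketch under simplifying assumptions: the class ReadingOrderDemo, its Fragment type, and the coordinates are made up for this example, and y is assumed to grow downwards on the page. (In PDFBox itself, the switch in question is PDFTextStripper.setSortByPosition(boolean).)

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class ReadingOrderDemo {
    // Hypothetical text fragment: a piece of text plus the position where it is drawn.
    static final class Fragment {
        final String text;
        final int x;
        final int y; // assumption: y grows downwards, so smaller y is higher on the page

        Fragment(String text, int x, int y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // A two-column page as a PDF generator would typically write it:
    // the whole left column first, then the whole right column.
    static List<Fragment> twoColumnPage() {
        return Arrays.asList(
            new Fragment("L1", 50, 100),
            new Fragment("L2", 50, 120),
            new Fragment("L3", 50, 140),
            new Fragment("R1", 300, 100),
            new Fragment("R2", 300, 120),
            new Fragment("R3", 300, 140));
    }

    // "off": keep the order in which the fragments appear in the stream.
    static String streamOrder(List<Fragment> fragments) {
        return fragments.stream().map(f -> f.text).collect(Collectors.joining(" "));
    }

    // "on" (naive model): sort top to bottom, then left to right,
    // without any notion of columns.
    static String sortedByPosition(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingInt((Fragment f) -> f.y).thenComparingInt(f -> f.x));
        return sorted.stream().map(f -> f.text).collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        List<Fragment> page = twoColumnPage();
        System.out.println("stream order:       " + streamOrder(page));
        System.out.println("sorted by position: " + sortedByPosition(page));
    }
}
```

In this model the stream order keeps each column together (L1 L2 L3 R1 R2 R3), while the positional sort interleaves the rows of the two columns (L1 R1 L2 R2 L3 R3), which is the kind of jump across columns described above.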
It is a little hard to see in your screenshot what exactly is going on. You might try starting your selection at the end of paragraph 4.3 and slowly move the mouse towards the beginning of the paragraph to see exactly when the undesired text starts getting selected. My guess is it might happen when you reach the "4.3" number. Try it out please.
-- Richard