I was thinking that I could perhaps require every fourth text entry to be in alphabetical order and use that to detect misalignment. But if even the first row is wrong, I'm not sure how much I want to hard-code corrections. Additionally, the multi-line entries sometimes come from the address column and other times from the name column (e.g. 258 Interstate Commercial Park Loop on the left-hand side of the page).
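A minimal sketch of the alphabetical-order check, assuming the OCR output has already been split into a flat list of text entries, that each record occupies exactly four entries when aligned, and that the name sits at a fixed position within each record. The function name `first_misalignment` and its `stride`/`offset` parameters are hypothetical and not part of Tesseract.

```python
from typing import List, Optional


def first_misalignment(lines: List[str], stride: int = 4, offset: int = 0) -> Optional[int]:
    """Return the index of the first sampled entry that breaks alphabetical order.

    Assumption (hypothetical layout): when the page is read correctly, every
    `stride`-th entry starting at `offset` is a name, and names are sorted.
    Returns None if the sampled entries are all in order.
    """
    sampled = range(offset, len(lines), stride)
    # Compare each sampled entry with the previous one, ignoring case.
    for prev_i, cur_i in zip(sampled, list(sampled)[1:]):
        if lines[cur_i].casefold() < lines[prev_i].casefold():
            return cur_i
    return None


# Example: "Park Loop" is a spill-over line from a multi-line address,
# which shifts every later record by one entry.
lines = [
    "Adams", "12 Main St", "Springfield", "555-0101",
    "Baker", "258 Interstate Commercial", "Park Loop", "Springfield", "555-0102",
    "Clark", "9 Oak Ave", "Springfield", "555-0103",
]
print(first_misalignment(lines))
```

Because one extra line shifts every following record, the first violation is usually the most useful signal: entries before it can be trusted, and the check can be rerun from a corrected offset after it. If the first row itself may be misaligned, trying each `offset` from 0 to `stride - 1` is one option, although with only four columns more than one offset may happen to look sorted.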
Below are some screenshots of mix-ups on the left- and right-hand sides of the page.
Any help would be greatly appreciated! Thank you!
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/fbdeeed7-87b6-4e8c-9cf9-d91e0d84f04an%40googlegroups.com.