You might consider using the table's ruling lines to identify its columns with OpenCV. There is an example here [1] of removing lines, but you can use the same approach to find the line coordinates instead. With the coordinates, you could then crop out each column of numbers and work through them from there. Your sample image is challenging, but my sense is that Tesseract could do a lot if you can segment the table into individual numbers and make use of Tesseract’s per-word confidence values. You would probably want a lot of very consistent layouts to justify the effort to do that.
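To make the "line coordinates, then columns" idea a bit more concrete, here is a rough sketch in plain Python. It assumes you already have a binarized image as rows of 0/1 values with 1 meaning ink (in practice that mask would come from OpenCV thresholding, which is not shown here). The function names and the `min_fraction` threshold are made up for illustration, and this uses a simple per-column ink count rather than the morphological approach in the linked example.

```python
def find_vertical_lines(mask, min_fraction=0.8):
    """Return (start_x, end_x) ranges of columns that are mostly ink.

    mask is a list of rows, each a list of 0/1 values (1 = ink).
    min_fraction is a hypothetical threshold: a column counts as part of
    a ruling line when at least that share of its pixels are ink.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    runs = []
    start = None
    for x in range(width):
        ink = sum(mask[y][x] for y in range(height))
        is_line = ink >= min_fraction * height
        if is_line and start is None:
            start = x  # a new run of line pixels begins
        elif not is_line and start is not None:
            runs.append((start, x - 1))  # the run ended at the previous x
            start = None
    if start is not None:
        runs.append((start, width - 1))
    return runs


def column_spans(lines, width):
    """Return (left_x, right_x) ranges of the gaps between ruling lines."""
    spans = []
    left = 0
    for start, end in lines:
        if start > left:
            spans.append((left, start - 1))
        left = end + 1
    if left < width:
        spans.append((left, width - 1))
    return spans


# Tiny synthetic mask: ruling lines at x=0, x=4 and x=8, plus a little
# "text" ink in between that should not be mistaken for a line.
mask = [
    [1, 0, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 1, 0, 1],
]
lines = find_vertical_lines(mask)
columns = column_spans(lines, len(mask[0]))
print(lines)    # [(0, 0), (4, 4), (8, 8)]
print(columns)  # [(1, 3), (5, 7)]
```

Each span in `columns` is an x-range you could crop out of the original image and hand to Tesseract on its own. Doing the same thing on rows would give you individual cells. On a real scan the lines will be skewed or broken, so the threshold would need tuning.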
Best,
art
---
From: tesser...@googlegroups.com <tesser...@googlegroups.com> On Behalf Of Sean Pham
Sent: Thursday, October 3, 2024 1:17 PM
To: tesseract-ocr <tesser...@googlegroups.com>
Subject: [tesseract-ocr] Table Extraction using Tesseract