Table without borders - getting each cell data as a block


irtmem intellect

Sep 28, 2019, 2:06:33 AM
to tesseract-ocr
Hi,

How can we get each cell of a table detected as a separate block, so that the text inside each cell can be read? This works in tesserocr with the RIL.BLOCK level when the table has inner and outer borders.

For a table without borders, however, we do not get similar output: a single block ends up containing a single row, multiple rows, or whole columns.

The images are attached.


Regards,

4-tab2.jpg

irtmem intellect

Oct 2, 2019, 3:15:27 PM
to tesseract-ocr



Please find the code samples attached:

1. test3.py - uses tesserocr to read the blocks within a table; tesser-ocr-output.txt is the output.

2. image2data.py - uses pytesseract image_to_data to read the data within a table; 3-res1.csv is the output.


From the output it is clear that the blocks do not follow any consistent order. Some blocks contain the data of one cell, while others contain the data of several cells.

It would be useful for us to get the data of exactly one cell in each block. Is there any configuration that produces this output?

Do let us know.
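
To make the desired output concrete, here is a minimal sketch of one possible post-processing workaround. It is not a Tesseract configuration and not what the attached scripts do. It assumes the word-level boxes are already available as dicts with the left, top, width, height and text fields that pytesseract image_to_data reports. The function name group_words_into_cells and the gap_factor threshold are hypothetical and would need tuning on real scans.

```python
from statistics import median


def group_words_into_cells(words, gap_factor=1.5):
    """Group word boxes into rows, then split each row into cells.

    words: list of dicts with keys left, top, width, height, text
           (the same field names pytesseract image_to_data reports).
    gap_factor: assumed tuning knob; a horizontal gap wider than
                gap_factor * typical word height starts a new cell.
    Returns a list of rows, each a list of cell strings, left to right.
    """
    # Drop empty entries (image_to_data also emits rows with blank text).
    words = [w for w in words if str(w["text"]).strip()]
    if not words:
        return []

    typical_height = median(w["height"] for w in words)

    # Step 1: cluster words into rows by their vertical centre.
    words.sort(key=lambda w: w["top"] + w["height"] / 2)
    rows = []
    for w in words:
        center = w["top"] + w["height"] / 2
        if rows and abs(center - rows[-1]["center"]) <= typical_height / 2:
            rows[-1]["words"].append(w)
        else:
            rows.append({"center": center, "words": [w]})

    # Step 2: inside each row, a wide horizontal gap marks a cell boundary.
    table = []
    for row in rows:
        ordered = sorted(row["words"], key=lambda w: w["left"])
        cells = [[ordered[0]["text"]]]
        for prev, cur in zip(ordered, ordered[1:]):
            gap = cur["left"] - (prev["left"] + prev["width"])
            if gap > gap_factor * typical_height:
                cells.append([cur["text"]])
            else:
                cells[-1].append(cur["text"])
        table.append([" ".join(cell) for cell in cells])
    return table


# Made-up boxes for illustration only; they do not come from 4-tab2.jpg.
sample_words = [
    {"left": 10, "top": 10, "width": 40, "height": 12, "text": "Item"},
    {"left": 55, "top": 11, "width": 40, "height": 12, "text": "name"},
    {"left": 200, "top": 10, "width": 30, "height": 12, "text": "Qty"},
    {"left": 10, "top": 40, "width": 50, "height": 12, "text": "Widget"},
    {"left": 200, "top": 41, "width": 10, "height": 12, "text": "3"},
]
print(group_words_into_cells(sample_words))
```

This relies only on whitespace between words, so it would likely fail on tables whose columns sit close together or whose cells wrap onto several lines.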

 
test3.py
image2data.py
3-res1.csv
tesser-ocr-output.txt