How to use Tesseract to detect tables?


Johnny ho

Jul 18, 2016, 6:05:54 PM
to tesseract-ocr
Are there any examples showing how to use Tesseract to detect tables in an image?

dinh van Chinh

Aug 26, 2016, 6:44:36 AM
to tesseract-ocr

Azka Gilani

Apr 17, 2017, 4:03:03 AM
to tesseract-ocr
@johnny, did you find anything on that? I am stuck on the same problem.
@dinh van Chinh, that method doesn't use the Tesseract API!

Ibr

Jun 5, 2017, 7:37:16 AM
to tesseract-ocr

Hi, I think that to process an image containing a table you should pass the --psm # argument on the command line. PSM stands for Page Segmentation Mode. The default is 3, I think; for a table use 6, so it would be --psm 6. Anyway, just type tesseract and the terminal will print the arguments it accepts, including a "Page segmentation modes:" list that shows every value of the --psm argument and what it means.
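A minimal sketch of the --psm suggestion, written in Python as an assumption (the thread itself only uses the command line). The helper name build_psm_command and the file names are hypothetical. The 0 to 13 range matches the mode list printed by Tesseract 4.x; older 3.x builds spell the flag -psm and list fewer modes.

```python
# Mode 6 is listed by the tesseract CLI as "Assume a single uniform block of text."
PSM_SINGLE_BLOCK = 6


def build_psm_command(image_path, output_base, psm=PSM_SINGLE_BLOCK):
    """Return the argv list for `tesseract IMAGE OUTPUTBASE --psm N`."""
    # Tesseract 4.x documents modes 0 through 13; reject anything else early.
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    return ["tesseract", image_path, output_base, "--psm", str(psm)]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run to execute it.
    print(" ".join(build_psm_command("table.png", "out")))
```

Note that --psm 6 only changes how the page is segmented before recognition; it does not by itself report where a table is.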

akhil katpally

Jun 7, 2017, 3:39:08 PM
to tesseract-ocr
You can use Tesseract parameters. Internally, Tesseract detects tables, and you can leverage that information and print it out. One of the parameters will also print out the detected table information (coordinates).
textord_dump_table_images --- Show table regions (this would dump intermediate images)
textord_tablefind_show_stats --- Show page stats used in table finding
There are some more; you can try them.
To use the parameters on the command line, use the -c option followed by the parameter.
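As an illustration of the -c option, here is a small Python sketch that turns a dictionary of parameters into repeated -c name=value arguments. The helper names config_flags and build_command are hypothetical, and whether a given parameter is accepted depends on the installed Tesseract version.

```python
def config_flags(params):
    """Turn {"name": value} into ["-c", "name=value", ...] for the tesseract CLI."""
    flags = []
    for name, value in params.items():
        # Boolean parameters are written as 1/0 on the command line.
        if isinstance(value, bool):
            value = int(value)
        flags += ["-c", f"{name}={value}"]
    return flags


def build_command(image_path, output_base, params):
    """Return the argv list: tesseract IMAGE OUTPUTBASE followed by the -c flags."""
    return ["tesseract", image_path, output_base] + config_flags(params)


if __name__ == "__main__":
    cmd = build_command("page.tif", "out", {"textord_tablefind_show_stats": True})
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run once Tesseract is installed.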

Hope Scheffert

Jul 11, 2017, 4:30:56 AM
to tesseract-ocr
Can you explain how you get just the bounding box coordinates for the recognized/detected table?
I have it popping up in a window using ViewerDebugger so I know it's doing it and it has the information, I just don't know how to get it. I need just the coordinates of the table it found so that I can then crop just the table.

My config file:
tessedit_create_txt 1
tessedit_create_tsv 1
tessedit_create_hocr 1
hocr_font_info 0
tessedit_pageseg_mode 1
textord_dump_table_images 1
textord_tablefind_show_stats 1
textord_tablefind_recognize_tables 1
textord_tabfind_find_tables 1
gapmap_debug 1
textord_show_tables 1

Any idea what I'm doing wrong/how to get the data?
Thanks,
Hope
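One hedged way to get coordinates out of the outputs this config already enables: tessedit_create_tsv 1 writes a tab-separated file whose rows carry left, top, width and height for each page, block, paragraph, line and word. The Python sketch below reads only the block-level rows (level 2). It is a workaround under stated assumptions, not a table detector: the TSV does not say which block is a table, so choosing the right block is left to the caller. The function name block_boxes and the sample data are made up for illustration, and block numbers are assumed to come from a single page.

```python
import csv
import io


def block_boxes(tsv_text):
    """Return {block_num: (left, top, right, bottom)} from Tesseract TSV text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    boxes = {}
    for row in reader:
        if row["level"] != "2":  # level 2 rows describe whole blocks
            continue
        left, top = int(row["left"]), int(row["top"])
        right = left + int(row["width"])
        bottom = top + int(row["height"])
        boxes[int(row["block_num"])] = (left, top, right, bottom)
    return boxes


# Made-up sample in the TSV column layout; a real file would be e.g. out.tsv.
SAMPLE = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t800\t600\t-1\t\n"
    "2\t1\t1\t0\t0\t0\t50\t40\t700\t100\t-1\t\n"
    "2\t1\t2\t0\t0\t0\t50\t200\t700\t300\t-1\t\n"
)

if __name__ == "__main__":
    for num, box in block_boxes(SAMPLE).items():
        print(num, box)
```

The (left, top, right, bottom) order is the one most image libraries expect for cropping, so a chosen box can be handed to a crop call directly.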

hux _0

Feb 25, 2020, 11:24:43 AM
to tesseract-ocr

Essam Zaky

Feb 26, 2020, 1:41:44 AM
to tesser...@googlegroups.com
Would you download the article you described and attach it here? The Medium site requires paid registration.

On Tuesday, Feb 25, 2020 at 6:24 PM, hux _0 <hucker...@gmail.com> wrote:
Check this out; it's a guide on how to do it: https://medium.com/@hucker.marius/a-table-detection-cell-recognition-and-text-extraction-algorithm-to-convert-tables-to-excel-files-902edcf289ec

On Tuesday, July 19, 2016 at 00:05:54 UTC+2, Johnny ho wrote:
Are there any examples showing how to use Tesseract to detect tables in an image?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/17afc6ab-6b03-43a7-8141-a153fcaaf100%40googlegroups.com.

Zdenko Podobny

Feb 26, 2020, 2:04:12 AM
to tesser...@googlegroups.com

On Wed, Feb 26, 2020 at 7:41, Essam Zaky <essa...@gmail.com> wrote:

KOLLOL CHOWDHURY

Feb 26, 2020, 5:02:14 AM
to tesseract-ocr
Does anyone have a solution to this? In newer Tesseract (4.x), the option textord_dump_table_images isn't available.
I tried Tesseract 3.04; it works but creates an empty tess-table.txt file. I expected it to contain the coordinates of the detected table. When I use the option
gapmap_debug=1, it shows "Table found", but I don't get to see the coordinates.

Any help??

Thanks
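To check which of these parameters a given build actually knows, the tesseract CLI offers --print-parameters. The Python sketch below parses that listing and tests for a name. It assumes each parameter line is tab-separated with the name first, which should be verified against the local output; the function names and the sample text (including its placeholder descriptions) are made up for illustration.

```python
import shutil
import subprocess


def parameter_names(print_parameters_output):
    """Collect names from lines shaped like NAME<TAB>VALUE<TAB>DESCRIPTION."""
    names = set()
    for line in print_parameters_output.splitlines():
        fields = line.split("\t")
        # Heading and blank lines contain no tab, so they are skipped.
        if len(fields) >= 2 and fields[0].strip():
            names.add(fields[0].strip())
    return names


def installed_parameter_names():
    """Run `tesseract --print-parameters`; return None if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(["tesseract", "--print-parameters"],
                            capture_output=True, text=True, check=False)
    return parameter_names(result.stdout)


# Made-up sample; the real listing is much longer and has real descriptions.
SAMPLE_LISTING = (
    "Tesseract parameters:\n"
    "textord_tabfind_find_tables\t1\tdescription\n"
    "gapmap_debug\t0\tdescription\n"
)

if __name__ == "__main__":
    names = installed_parameter_names() or parameter_names(SAMPLE_LISTING)
    print("textord_dump_table_images" in names)
```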

mit

Feb 26, 2020, 5:07:18 AM
to tesseract-ocr
I am using the below command:

tesseract test1.tif test -c textord_tabfind_find_tables=1 -c textord_tablefind_recognize_tables=1 -c textord_dump_table_images=1 -c gapmap_debug=1

Let me know if I am missing anything.

Thanks

Zdenko Podobny

Feb 26, 2020, 5:43:24 AM
to tesser...@googlegroups.com
maybe have a look at 

(I have no time to test it yet)
 
Zdenko


On Wed, Feb 26, 2020 at 11:02, KOLLOL CHOWDHURY <kollol...@gmail.com> wrote:

Zdenko Podobny

Feb 26, 2020, 5:53:13 AM
to tesser...@googlegroups.com

On Wed, Feb 26, 2020 at 8:03, Zdenko Podobny <zde...@gmail.com> wrote: