Need to understand the WordStr file format

Gaurav Shegokar

Mar 13, 2021, 1:20:29 AM
to tesseract-ocr
I am planning to generate training data in the WordStr format.

Consider the following example:

WordStr 114 4640 1907 4692 0 #Information Groups for public OPTIONAL, jaundice Proterozoic Have LOCATION 
1908 4640 1912 4692 0

In the data above, [114, 4640, 1907, 4692] is the bounding box for the text "Information Groups for public OPTIONAL, jaundice Proterozoic Have LOCATION".
But I am confused about the second line: "\t 1908 4640 1912 4692 0"

Why do we need it, and what does it represent? What information does the bounding box [1908, 4640, 1912, 4692] describe?
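
To make the reading of the first line concrete, here is a minimal Python sketch of how it breaks down. It assumes the five numbers are left, bottom, right, top, and page, as in the usual Tesseract box-file layout, and that everything after the "#" is the text. The function name parse_wordstr_line is hypothetical and not part of Tesseract.

```python
def parse_wordstr_line(line):
    """Split a 'WordStr l b r t page #text' line into box, page, and text.

    Assumption: the field order is left, bottom, right, top, page.
    """
    # Everything before the first '#' holds the keyword and the numbers;
    # everything after it is the transcription for the whole line.
    head, _, text = line.partition("#")
    fields = head.split()
    if len(fields) != 6 or fields[0] != "WordStr":
        raise ValueError("not a WordStr line: %r" % line)
    left, bottom, right, top, page = (int(v) for v in fields[1:])
    return {
        "box": [left, bottom, right, top],
        "page": page,
        "text": text.strip(),
    }


example = (
    "WordStr 114 4640 1907 4692 0 "
    "#Information Groups for public OPTIONAL, jaundice Proterozoic Have LOCATION "
)
parsed = parse_wordstr_line(example)
print(parsed["box"])
print(parsed["page"])
print(parsed["text"])
```

The sketch only covers the first line; it does not interpret the second, tab-prefixed line that the question is about.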

Best,
Gaurav.
