tessedit_create_boxfile condensed like boxaGetBox

Baris Unsal

Apr 21, 2021, 8:17:04 AM
to tesseract-ocr
Hi, when I pass the tessedit_create_boxfile 1 argument to tesseract, it outputs the location of each individual character. But when I use the API like this:

```
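// Assumes `api` is an initialized tesseract::TessBaseAPI* with SetImage() already called.
Boxa* boxes = api->GetComponentImages(tesseract::RIL_TEXTLINE, true, nullptr, nullptr);
if (boxes != nullptr) {
  for (int i = 0; i < boxaGetCount(boxes); i++) {
    BOX* box = boxaGetBox(boxes, i, L_CLONE);
    // Restrict recognition to this text line, then OCR it.
    api->SetRectangle(box->x, box->y, box->w, box->h);
    char* outText = api->GetUTF8Text();
    int conf = api->MeanTextConf();
    fprintf(stdout, "Box[%d]: x=%d, y=%d, w=%d, h=%d, confidence: %d, text: %s",
            i, box->x, box->y, box->w, box->h, conf, outText);
    boxDestroy(&box);   // release the clone
    delete[] outText;   // GetUTF8Text() allocates with new[]
  }
  boxaDestroy(&boxes);  // free the Boxa returned by GetComponentImages()
}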
```
it outputs the whole line, like this:
Box[1]: x=36, y=84, w=246, h=14, confidence: 44, text: #Spor #siyaset Fanket FIliskiler

Is there any way to combine the individual character boxes so that they print like the API's line-level output? Thanks in advance.
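Merging character boxes into a line box amounts to taking their bounding-box union. Below is a minimal sketch, assuming the usual Tesseract box-file layout `<symbol> <left> <bottom> <right> <top> <page>` with the origin at the bottom-left of the image, and assuming the caller has already decided which characters belong to one text line (the box file alone does not record that). `CharBox`, `LineBox`, `parseBoxLine` and `unionBoxes` are hypothetical names; the image height must be supplied so the result can be flipped to the top-left x/y/w/h convention used in the API output.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// One character box as written in a box file (bottom-left origin).
struct CharBox {
  std::string symbol;
  int left = 0, bottom = 0, right = 0, top = 0, page = 0;
};

// A merged line box in the top-left-origin convention of the API output.
struct LineBox {
  int x = 0, y = 0, w = 0, h = 0;
  std::string text;
};

// Hypothetical helper: parse "<symbol> <left> <bottom> <right> <top> <page>".
// Assumes the symbol contains no whitespace. Returns false on malformed input.
bool parseBoxLine(const std::string& line, CharBox& out) {
  std::istringstream in(line);
  return static_cast<bool>(in >> out.symbol >> out.left >> out.bottom >>
                           out.right >> out.top >> out.page);
}

// Hypothetical helper: merge the character boxes of ONE text line into a
// single bounding box. imageHeight is used to flip the vertical axis.
LineBox unionBoxes(const std::vector<CharBox>& chars, int imageHeight) {
  LineBox result;
  if (chars.empty()) return result;
  int left = chars[0].left, bottom = chars[0].bottom;
  int right = chars[0].right, top = chars[0].top;
  for (const CharBox& c : chars) {
    left = std::min(left, c.left);
    bottom = std::min(bottom, c.bottom);
    right = std::max(right, c.right);
    top = std::max(top, c.top);
    result.text += c.symbol;  // symbols are joined without separators
  }
  result.x = left;
  result.y = imageHeight - top;  // bottom-left origin -> top-left origin
  result.w = right - left;
  result.h = top - bottom;
  return result;
}
```

For example, the made-up lines `H 10 80 20 95 0` and `i 22 80 26 96 0` with an image height of 120 merge to x=10, y=24, w=16, h=16 and text `Hi`. Word spacing is not reconstructed by this sketch.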

############
### Environment

* **Tesseract Version**:
tesseract 4.1.1-rc2-25-g9707
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.2) : libpng 1.6.36 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 liblz4/1.8.3 libzstd/1.3.8

* **Platform**:
Linux pardus 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 GNU/Linux

Zdenko Podobny

Apr 21, 2021, 9:41:24 AM
to tesser...@googlegroups.com
Hello,

it is unclear what you are doing / want to do:
  • you wrote that you want individual characters, but you request lines from the API (RIL_TEXTLINE)
  • then you wrote "Is there any way to combine individual boxes to print like API", so what do you want to combine?
Maybe it would be better if you provided the input images and the desired output...

Zdenko


On Wed, 21 Apr 2021 at 14:17, Baris Unsal <yosoyl...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/afa7a425-7946-4bf1-b6f6-7f5d39ab2d6cn%40googlegroups.com.

Quan Nguyen

Apr 21, 2021, 10:36:35 AM
to tesseract-ocr
I think it would need to operate at the RIL_SYMBOL level, not RIL_TEXTLINE.

Baris Unsal

Apr 21, 2021, 10:38:52 AM
to tesseract-ocr
I want it the other way around: getting RIL_TEXTLINE-like output by passing an argument to tesseract.

Zdenko Podobny

Apr 21, 2021, 2:22:17 PM
to tesser...@googlegroups.com
Use the TSV output, but you will still need to parse it to get line information.

Zdenko
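A minimal sketch of that parsing step, assuming Tesseract's TSV column order `level page_num block_num par_num line_num word_num left top width height conf text` with a header row first, where level 5 rows are words and level 4 rows are lines. It groups word rows by `(page_num, block_num, par_num, line_num)`, unions their boxes, joins their text with spaces and averages their non-negative confidences. `TsvLine`, `splitTabs`, `groupTsvLines` and `meanConf` are hypothetical names, and the averaged confidence is not guaranteed to match what `MeanTextConf()` reports.

```cpp
#include <algorithm>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

// One text line rebuilt from the word rows (level 5) of the TSV output.
// Coordinates use a top-left origin, like box->x / box->y in the API loop.
struct TsvLine {
  int left = 0, top = 0, right = 0, bottom = 0;
  std::string text;
  double confSum = 0;  // sum of non-negative word confidences
  int confCount = 0;   // number of words contributing to confSum
};

// Split a row on tabs, keeping empty fields (the text column may be empty).
std::vector<std::string> splitTabs(const std::string& row) {
  std::vector<std::string> fields;
  std::size_t start = 0;
  while (true) {
    std::size_t tab = row.find('\t', start);
    if (tab == std::string::npos) {
      fields.push_back(row.substr(start));
      break;
    }
    fields.push_back(row.substr(start, tab - start));
    start = tab + 1;
  }
  return fields;
}

// Hypothetical helper: group word rows by (page, block, paragraph, line) and
// merge each group into one TsvLine, in order of first appearance.
std::vector<TsvLine> groupTsvLines(std::istream& in) {
  std::vector<TsvLine> lines;
  std::map<std::tuple<int, int, int, int>, std::size_t> index;
  std::string row;
  std::getline(in, row);  // skip the header row
  while (std::getline(in, row)) {
    std::vector<std::string> f = splitTabs(row);
    // Keep only non-empty word rows (level 5); each row has 12 columns.
    if (f.size() < 12 || std::stoi(f[0]) != 5 || f[11].empty()) continue;
    int left = std::stoi(f[6]);
    int top = std::stoi(f[7]);
    int right = left + std::stoi(f[8]);
    int bottom = top + std::stoi(f[9]);
    double conf = std::stod(f[10]);
    auto key = std::make_tuple(std::stoi(f[1]), std::stoi(f[2]),
                               std::stoi(f[3]), std::stoi(f[4]));
    auto res = index.emplace(key, lines.size());
    if (res.second) {  // first word of a new line: start from its box
      TsvLine fresh;
      fresh.left = left;
      fresh.top = top;
      fresh.right = right;
      fresh.bottom = bottom;
      lines.push_back(fresh);
    }
    TsvLine& line = lines[res.first->second];
    line.left = std::min(line.left, left);
    line.top = std::min(line.top, top);
    line.right = std::max(line.right, right);
    line.bottom = std::max(line.bottom, bottom);
    if (!line.text.empty()) line.text += ' ';
    line.text += f[11];
    if (conf >= 0) {  // rows without a confidence carry -1
      line.confSum += conf;
      ++line.confCount;
    }
  }
  return lines;
}

// Mean of the word confidences on a line, or -1 if there are none.
int meanConf(const TsvLine& line) {
  return line.confCount > 0 ? static_cast<int>(line.confSum / line.confCount) : -1;
}
```

One way to use it is to open the `.tsv` file (e.g. produced by `tesseract image.png out tsv`) with a `std::ifstream`, pass it to `groupTsvLines`, and print each entry in the same `Box[i]: x=..., y=..., w=..., h=...` style as the API loop, with w = right - left and h = bottom - top. The level 4 rows already carry a line bounding box, so reading those directly is an alternative to the union of word boxes.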


On Wed, 21 Apr 2021 at 16:38, Baris Unsal <yosoyl...@gmail.com> wrote: