What does ALIGNED TRUTH exactly mean?


Janghyuk Choi

Jan 6, 2020, 12:17:41 AM
to tesseract-ocr
Hi, I'm training Tesseract on my own dataset, which contains only digits, commas, and periods.
During the training phase, the printed output displays at most three lines for each text: GROUND TRUTH, ALIGNED TRUTH, and BEST OCR TEXT.
Below is an example:



I'm wondering what exactly ALIGNED TRUTH and BEST OCR TEXT mean.
The Wiki page says only: "GROUND TRUTH for the line is displayed in all cases. ALIGNED TRUTH and BEST OCR TEXT are displayed only when different from the GROUND TRUTH."

Does anyone know what they are?
Thank you in advance for your reply.

Ashwini Nande

Jan 6, 2020, 12:21:52 AM
to tesser...@googlegroups.com
Hi, 
As far as I know, the ground truth is the actual text that the OCR is supposed to read. BEST OCR TEXT is Tesseract's prediction for that ground truth.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/0db6e2d8-a4ae-4bf7-8f49-d4ae62d1426e%40googlegroups.com.

Janghyuk Choi

Jan 6, 2020, 12:23:21 AM
to tesseract-ocr
The image got broken. Here is the example:

Iteration 45: GROUND  TRUTH : 7,555,420 759,541.461
Iteration 45: ALIGNED TRUTH : 7,5555,4420 759,541.461
Iteration 45: BEST OCR TEXT : 6,0999,54.7069697.73,65.792
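
For anyone who wants to compare these three lines programmatically, here is a minimal Python sketch. It assumes the log lines have exactly the `Iteration N: LABEL : text` shape shown above; the names `parse_training_log` and `edit_distance` are hypothetical helpers written for this example, not part of Tesseract. It only measures how far each string is from the others and does not say how Tesseract itself derives ALIGNED TRUTH.

```python
import re

# Matches lines such as "Iteration 45: GROUND  TRUTH : 7,555,420 759,541.461".
# The pattern is an assumption based on the three lines quoted in this thread.
LINE_RE = re.compile(
    r"^Iteration (\d+): (GROUND\s+TRUTH|ALIGNED TRUTH|BEST OCR TEXT) : (.*)$"
)


def parse_training_log(lines):
    """Group the per-iteration text lines by iteration number."""
    records = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # ignore anything that is not one of the three text lines
        iteration, label, text = match.groups()
        key = " ".join(label.split())  # normalise the "GROUND  TRUTH" spacing
        records.setdefault(int(iteration), {})[key] = text
    return records


def edit_distance(a, b):
    """Levenshtein distance between two strings (plain dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


sample = [
    "Iteration 45: GROUND  TRUTH : 7,555,420 759,541.461",
    "Iteration 45: ALIGNED TRUTH : 7,5555,4420 759,541.461",
    "Iteration 45: BEST OCR TEXT : 6,0999,54.7069697.73,65.792",
]
records = parse_training_log(sample)
truth = records[45]["GROUND TRUTH"]
for label in ("ALIGNED TRUTH", "BEST OCR TEXT"):
    print(label, edit_distance(truth, records[45][label]))
```

On the sample above, ALIGNED TRUTH differs from GROUND TRUTH by two inserted characters (an extra `5` and an extra `4`), while BEST OCR TEXT is much further away.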

Janghyuk Choi

Jan 6, 2020, 1:53:30 AM
to tesseract-ocr
Thank you for your kind reply, Ashwini.
Yes, I agree with your explanation.

Could you also tell me what ALIGNED TRUTH is?



Ashwini Nande

Jan 6, 2020, 1:35:56 PM
to tesser...@googlegroups.com
Hi, 
I need to check on that.
