What does a word-level confidence of zero mean?

farhad khalafi

Aug 18, 2019, 9:10:41 AM
to tesseract-ocr
Hi,

Occasionally, when I get the confidence for a specific word from the page-level iterator, I receive a value of 0.

When I examine the word further, it appears that all characters have high confidence and often the word is marked as in the dictionary.

What I would like to know is how the word-level confidence should be interpreted in such cases.

If I crop the same image to the bottom 2 lines and run recognition on the smaller image, the confidence for the same word goes up to 96%. 
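
A minimal sketch of that full-page versus cropped comparison, assuming Python with pytesseract and Pillow instead of the .NET wrapper used for the dumps in this post. The helper names, the `keep_bottom` fraction, and the file path passed in are all placeholders for illustration, and this route goes through the Tesseract command-line tool, so it may not behave identically to the wrapper.

```python
from typing import Dict, List, Tuple


def word_confidences(data: Dict[str, list]) -> List[Tuple[str, float]]:
    """Return (text, confidence) pairs for the word-level rows of
    pytesseract.image_to_data(..., output_type=Output.DICT)."""
    pairs = []
    for level, text, conf in zip(data["level"], data["text"], data["conf"]):
        # Level 5 rows are words; other levels carry a confidence of -1.
        if int(level) == 5 and text.strip():
            pairs.append((text, float(conf)))
    return pairs


def lookup(pairs: List[Tuple[str, float]], target: str) -> List[float]:
    """Confidences of every occurrence of `target` in the word list."""
    return [conf for text, conf in pairs if text == target]


def compare_full_vs_cropped(path: str, target: str, keep_bottom: float = 0.1):
    """OCR the whole page and only its bottom strip, then return the
    confidences reported for `target` in each run."""
    # Imported lazily so the helpers above work without Tesseract installed.
    from PIL import Image
    import pytesseract

    page = Image.open(path)
    width, height = page.size
    # keep_bottom is a guess at "the bottom 2 lines"; adjust per image.
    strip = page.crop((0, int(height * (1 - keep_bottom)), width, height))

    full = word_confidences(
        pytesseract.image_to_data(page, output_type=pytesseract.Output.DICT))
    cropped = word_confidences(
        pytesseract.image_to_data(strip, output_type=pytesseract.Output.DICT))
    return lookup(full, target), lookup(cropped, target)
```

Calling `compare_full_vs_cropped("ccitt.tif", "England:")` would return two lists of confidences, one per run, which makes it easy to see whether the same word scores differently on the full page and on the strip.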


Thanks.


Here is an example for the word "England:", towards the bottom of the linked TIFF image:
Name              Value                                      Type
[176]             {FlowingText [0%] [0] England:}            Tesseract.TessWord
BaseLine          {X = 811 Y = 2253 Width = 91 Height = 0}   System.Drawing.Rectangle
BlockType         FlowingText                                Tesseract.PolyBlockType
Bounds            {X = 811 Y = 2238 Width = 91 Height = 18}  System.Drawing.Rectangle
Choices           Count = 0                                  System.Collections.Generic.List<Tesseract.TessChoice>
Confidence        0                                          float
DeskewAngle       0.0002258866                               float
Font              {Tesseract.TessFontAttributes}             Tesseract.TessFontAttributes
InDictionary      true                                       bool
IsNumeric         false                                      bool
Language          "eng"                                      string
Orientation       PageUp                                     Tesseract.Orientation
Text              "England:"                                 string
TextLineOrder     TopToBottom                                Tesseract.TextLineOrder
WritingDirection  LeftToRight                                Tesseract.WritingDirection



And for the individual symbols in the same word:
Name              Value                                      Type
[912]             {FlowingText [100%] [0] E}                 Tesseract.TessSymbol
BaseLine          {X = 811 Y = 2253 Width = 91 Height = 0}   System.Drawing.Rectangle
BlockType         FlowingText                                Tesseract.PolyBlockType
Bounds            {X = 811 Y = 2238 Width = 17 Height = 18}  System.Drawing.Rectangle
Choices           Count = 0                                  System.Collections.Generic.List<Tesseract.TessChoice>
Confidence        99.55842                                   float
DeskewAngle       0.0002258866                               float
IsDropcap         false                                      bool
IsSubscript       false                                      bool
IsSuperscript     false                                      bool
Orientation       PageUp                                     Tesseract.Orientation
Text              "E"                                        string
TextLineOrder     TopToBottom                                Tesseract.TextLineOrder
WritingDirection  LeftToRight                                Tesseract.WritingDirection

[913]             {FlowingText [100%] [0] n}                 Tesseract.TessSymbol
BaseLine          {X = 811 Y = 2253 Width = 91 Height = 0}   System.Drawing.Rectangle
BlockType         FlowingText                                Tesseract.PolyBlockType
Bounds            {X = 811 Y = 2238 Width = 27 Height = 14}  System.Drawing.Rectangle
Choices           Count = 0                                  System.Collections.Generic.List<Tesseract.TessChoice>
Confidence        99.55082                                   float
DeskewAngle       0.0002258866                               float
IsDropcap         false                                      bool
IsSubscript       false                                      bool
IsSuperscript     false                                      bool
Orientation       PageUp                                     Tesseract.Orientation
Text              "n"                                        string
TextLineOrder     TopToBottom                                Tesseract.TextLineOrder
WritingDirection  LeftToRight                                Tesseract.WritingDirection

[914]             {FlowingText [100%] [0] g}                 Tesseract.TessSymbol
BaseLine          {X = 811 Y = 2253 Width = 91 Height = 0}   System.Drawing.Rectangle
BlockType         FlowingText                                Tesseract.PolyBlockType
Bounds            {X = 839 Y = 2242 Width = 11 Height = 14}  System.Drawing.Rectangle
Choices           Count = 0                                  System.Collections.Generic.List<Tesseract.TessChoice>
Confidence        99.56353                                   float
DeskewAngle       0.0002258866                               float
IsDropcap         false                                      bool
IsSubscript       false                                      bool
IsSuperscript     false                                      bool
Orientation       PageUp                                     Tesseract.Orientation
Text              "g"                                        string
TextLineOrder     TopToBottom                                Tesseract.TextLineOrder
WritingDirection  LeftToRight                                Tesseract.WritingDirection

[915]             {FlowingText [84%] [0] l}                  Tesseract.TessSymbol
BaseLine          {X = 811 Y = 2253 Width = 91 Height = 0}   System.Drawing.Rectangle
BlockType         FlowingText                                Tesseract.PolyBlockType
Bounds            {X = 850 Y = 2239 Width = 7 Height = 13}   System.Drawing.Rectangle
Choices           Count = 0                                  System.Collections.Generic.List<Tesseract.TessChoice>
Confidence        84.39856                                   float
DeskewAngle       0.0002258866                               float
IsDropcap         false                                      bool
IsSubscript       false                                      bool
IsSuperscript     false                                      bool
Orientation       PageUp                                     Tesseract.Orientation
Text              "l"                                        string
TextLineOrder     TopToBottom                                Tesseract.TextLineOrder
WritingDirection  LeftToRight                                Tesseract.WritingDirection

[916]             {FlowingText [92%] [0] a}                  Tesseract.TessSymbol
BaseLine          {X = 811 Y = 2253 Width = 91 Height = 0}   System.Drawing.Rectangle
BlockType         FlowingText                                Tesseract.PolyBlockType
Bounds            {X = 852 Y = 2238 Width = 19 Height = 18}  System.Drawing.Rectangle
Choices           Count = 0                                  System.Collections.Generic.List<Tesseract.TessChoice>
Confidence        91.98873                                   float
DeskewAngle       0.0002258866                               float
IsDropcap         false                                      bool
IsSubscript       false                                      bool
IsSuperscript     false                                      bool
Orientation       PageUp                                     Tesseract.Orientation
Text              "a"                                        string
TextLineOrder     TopToBottom                                Tesseract.TextLineOrder
WritingDirection  LeftToRight                                Tesseract.WritingDirection

[917]             {FlowingText [100%] [0] n}                 Tesseract.TessSymbol
BaseLine          {X = 811 Y = 2253 Width = 91 Height = 0}   System.Drawing.Rectangle
BlockType         FlowingText                                Tesseract.PolyBlockType
Bounds            {X = 858 Y = 2242 Width = 23 Height = 10}  System.Drawing.Rectangle
Choices           Count = 0                                  System.Collections.Generic.List<Tesseract.TessChoice>
Confidence        99.55326                                   float
DeskewAngle       0.0002258866                               float
IsDropcap         false                                      bool
IsSubscript       false                                      bool
IsSuperscript     false                                      bool
Orientation       PageUp                                     Tesseract.Orientation
Text              "n"                                        string
TextLineOrder     TopToBottom                                Tesseract.TextLineOrder
WritingDirection  LeftToRight                                Tesseract.WritingDirection

[918]             {FlowingText [100%] [0] d}                 Tesseract.TessSymbol
BaseLine          {X = 811 Y = 2253 Width = 91 Height = 0}   System.Drawing.Rectangle
BlockType         FlowingText                                Tesseract.PolyBlockType
Bounds            {X = 870 Y = 2238 Width = 23 Height = 18}  System.Drawing.Rectangle
Choices           Count = 0                                  System.Collections.Generic.List<Tesseract.TessChoice>
Confidence        99.57375                                   float
DeskewAngle       0.0002258866                               float
IsDropcap         false                                      bool
IsSubscript       false                                      bool
IsSuperscript     false                                      bool
Orientation       PageUp                                     Tesseract.Orientation
Text              "d"                                        string
TextLineOrder     TopToBottom                                Tesseract.TextLineOrder
WritingDirection  LeftToRight                                Tesseract.WritingDirection

[919]             {FlowingText [100%] [0] :}                 Tesseract.TessSymbol
BaseLine          {X = 811 Y = 2253 Width = 91 Height = 0}   System.Drawing.Rectangle
BlockType         FlowingText                                Tesseract.PolyBlockType
Bounds            {X = 882 Y = 2239 Width = 20 Height = 13}  System.Drawing.Rectangle
Choices           Count = 0                                  System.Collections.Generic.List<Tesseract.TessChoice>
Confidence        99.5546341                                 float
DeskewAngle       0.0002258866                               float
IsDropcap         false                                      bool
IsSubscript       false                                      bool
IsSuperscript     false                                      bool
Orientation       PageUp                                     Tesseract.Orientation
Text              ":"                                        string
TextLineOrder     TopToBottom                                Tesseract.TextLineOrder
WritingDirection  LeftToRight                                Tesseract.WritingDirection
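
To make the mismatch concrete, here is a small sketch in Python that flags a word whose word-level confidence sits far below its weakest symbol confidence. The numbers are copied by hand from the dumps above; the `WordResult` class, the `is_suspicious` helper, and the 50-point `gap` threshold are illustrative assumptions, not part of Tesseract or of any wrapper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordResult:
    text: str
    confidence: float                 # word-level confidence, 0-100
    symbol_confidences: List[float]   # per-symbol confidences, 0-100


def is_suspicious(word: WordResult, gap: float = 50.0) -> bool:
    """True when the word score is more than `gap` points below the
    lowest symbol score in the same word."""
    if not word.symbol_confidences:
        return False
    return min(word.symbol_confidences) - word.confidence > gap


# Values copied by hand from the word and symbol dumps for "England:".
england = WordResult(
    text="England:",
    confidence=0.0,
    symbol_confidences=[99.55842, 99.55082, 99.56353, 84.39856,
                        91.98873, 99.55326, 99.57375, 99.5546341],
)

if is_suspicious(england):
    print(f"{england.text!r}: word={england.confidence}, "
          f"weakest symbol={min(england.symbol_confidences)}")
```

A check like this only detects the inconsistency; it does not explain why the word-level value comes back as 0.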
ccitt.tif

farhad khalafi

Aug 18, 2019, 10:03:13 AM
to tesseract-ocr
I tried the gImageReader utility with similar results. A screenshot is attached.
gImageReader.png