Why does my hOCR file look like this?

Ben Zhang

Jun 23, 2018, 2:53:22 AM
to tesseract-ocr
Hi, All,
I used tesseract 3.05 and typed 'tesseract test.png result horc' on the command line, which produced result.horc. The file contains:

Provider Networks Precertification 808.791.7505 direct 888.941.4622 x302toll-free 808.535.8398 fax 

Medical & Dental - Hawaii Medical - Mainland 888.941 .HMAA (4622) V Cigna PPO hmaa.com/providers HWMG cigna.com 4F Clgna Submit claims directly to HWMG: Submit claims directly to Cigna: PO Box 32580 PO Box 188061 Honolulu, HI 96803-2580 Chattanooga, TN 37422-8061 Payer ID 48330 Payer ID 62308 8 Drug - Hawaii & Mainland Vision - Hawaii & Mainland 855.785.6960 .._ Vision Choice {.3 Express—Scripts.com fl; amass SCRIPTSE 800.877.7195 VS V" VS p I CO m 9; care for Me Submit claims directly to Express Scripts Submit claims directly to VSP. 


\ or call 800.922.1557 for pharmacy help. 


Why is there no information like the following?

LibTesseract.simple_read(config_line_with_hocr, 'phrase.png')
  <div class='ocr_page' id='page_1' title='image ""; bbox 0 0 319 33; ppageno 0'>
   <div class='ocr_carea' id='block_1_1' title="bbox 0 0 319 33">
    <p class='ocr_par' dir='ltr' id='par_1_1' title="bbox 10 13 276 25">
     <span class='ocr_line' id='line_1_1' title="bbox 10 13 276 25; baseline 0 0"><span class='ocrx_word' id='word_1_1'     title='bbox 10 14 41 25; x_wconf 75' lang='eng' dir='ltr'><strong>the</strong></span> <span class='ocrx_word' id='word_1_2' title='bbox 53 13 97 25; x_wconf 84' lang='eng' dir='ltr'><strong>book</strong></span> <span class='ocrx_word' id='word_1_3' title='bbox 111 13 129 25; x_wconf 79' lang='eng' dir='ltr'><strong>is</strong></span> <span class='ocrx_word' id='word_1_4' title='bbox 143 17 164 25; x_wconf 83' lang='eng' dir='ltr'>on</span> <span class='ocrx_word' id='word_1_5' title='bbox 178 14 209 25; x_wconf 75' lang='eng' dir='ltr'><strong>the</strong></span> <span class='ocrx_word' id='word_1_6' title='bbox 223 14 276 25; x_wconf 76' lang='eng' dir='ltr'><strong>table</strong></span> 
     </span>
    </p>
   </div>
  </div>
I am new to tesseract. Thanks for your help
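
For readers unfamiliar with the markup quoted above, here is a minimal sketch of how the per-word information in hOCR can be read back out. It assumes the layout shown in the post (each word in a span with class ocrx_word, with bbox and x_wconf inside the title attribute). The names HocrWordParser and parse_hocr_words are hypothetical helpers written for this illustration, not part of tesseract.

```python
import re
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect text, bounding box and confidence for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word currently being read, or None
        self._depth = 0       # span nesting depth inside the current word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is not None:
            # Track nested spans so the right closing tag ends the word.
            if tag == "span":
                self._depth += 1
            return
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            title = attrs.get("title") or ""
            bbox = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", title)
            conf = re.search(r"x_wconf (\d+)", title)
            self._current = {
                "text": "",
                "bbox": tuple(int(v) for v in bbox.groups()) if bbox else None,
                "conf": int(conf.group(1)) if conf else None,
            }
            self._depth = 1

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if self._current is not None and tag == "span":
            self._depth -= 1
            if self._depth == 0:
                self._current["text"] = self._current["text"].strip()
                self.words.append(self._current)
                self._current = None


def parse_hocr_words(hocr_text):
    """Return a list of dicts with keys text, bbox and conf."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


# Two words taken from the sample in the post above.
sample = (
    "<span class='ocr_line' id='line_1_1' title=\"bbox 10 13 276 25; baseline 0 0\">"
    "<span class='ocrx_word' id='word_1_1' title='bbox 10 14 41 25; x_wconf 75' "
    "lang='eng' dir='ltr'><strong>the</strong></span> "
    "<span class='ocrx_word' id='word_1_4' title='bbox 143 17 164 25; x_wconf 83' "
    "lang='eng' dir='ltr'>on</span></span>"
)

for word in parse_hocr_words(sample):
    print(word["text"], word["bbox"], word["conf"])
```

If the output file contains only plain text, as in the result quoted at the top of this post, the parser returns an empty list, which is a quick way to tell the two kinds of output apart.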

Shree Devi Kumar

Jun 23, 2018, 3:14:38 AM
to tesser...@googlegroups.com
tesseract test.png result horc

You used the wrong config file name. It should be hocr, not horc.
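
As a rough sketch of the corrected invocation, the snippet below builds the command with the config name spelled hocr and includes a crude check for whether a file's contents look like hOCR. It assumes that this tesseract version writes the result to the output base plus a .hocr extension. The helpers run_tesseract_hocr and looks_like_hocr are hypothetical names for this illustration, and the runner parameter exists only so the command can be inspected without tesseract installed.

```python
import subprocess
from pathlib import Path


def run_tesseract_hocr(image, output_base, runner=subprocess.run):
    """Run `tesseract IMAGE OUTPUT_BASE hocr` and return the expected output path.

    Assumption: the hocr config makes tesseract write OUTPUT_BASE.hocr.
    """
    cmd = ["tesseract", str(image), str(output_base), "hocr"]
    runner(cmd, check=True)
    return Path(str(output_base) + ".hocr")


def looks_like_hocr(text):
    """Crude check: hOCR output contains ocr_page elements with bbox data."""
    return "ocr_page" in text and "bbox" in text
```

Typical use would be `path = run_tesseract_hocr("test.png", "result")` followed by `looks_like_hocr(path.read_text(encoding="utf-8"))`. A False result means the file holds plain text rather than hOCR markup.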

Ben Zhang

Jun 23, 2018, 3:22:09 PM
to tesseract-ocr
Sorry, I made a typo in the question. I did use "hocr" as the config file name.

On Saturday, June 23, 2018 at 3:14:38 AM UTC-4, shree wrote: