Black line deleted

lelive

Dec 6, 2017, 2:36:22 PM
to tesseract-ocr
Hi all,
I use Tesseract on technical documents to produce searchable PDFs. But if the picture contains lines, they are deleted in the resulting PDF file.

Is there a solution, or a parameter, to tell Tesseract not to "clean" the picture?

Many thanks for your help!

Olivier

Zdenko Podobný

Dec 7, 2017, 4:05:15 AM
to tesser...@googlegroups.com
I do not think that images like this are appropriate for OCR (at least not for Tesseract). IMO you should preprocess them and pass only the areas with text to Tesseract.

Tesseract is very sensitive to noise (at least the 3.x versions).

Zdenko

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f8d3df29-c900-4172-a9ce-9892463f0634%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

lelive

Dec 10, 2017, 2:55:18 AM
to tesseract-ocr
Hello,
Yes, I know that, but I have the same problem with ordinary tables on an A4 page. All the lines disappear!

Please help!

ShreeDevi Kumar

Dec 10, 2017, 4:02:30 AM
to tesser...@googlegroups.com, Jeff Breidenbach
I think the question is related to PDF generation and not the actual OCR.

The resulting PDF should include the original image with the text layer. It seems the lines are deleted in the generated PDF.

lelive

Dec 10, 2017, 12:33:53 PM
to tesseract-ocr
OK, thanks for your reply!

If I use
tesseract img.tif out -l fra pdf

which software does the conversion to PDF?

Olivier
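
For anyone scripting this step, here is a minimal Python sketch of the same invocation. The helper names `build_pdf_command` and `run_pdf_ocr` are made up for illustration; the only thing taken from the thread is the command line itself (`tesseract img.tif out -l fra pdf`). It assumes the `tesseract` executable is on the PATH and does nothing if it is not.

```python
import shutil
import subprocess


def build_pdf_command(image_path, output_base, lang="fra"):
    """Return the argv list for: tesseract <image> <output_base> -l <lang> pdf."""
    # "pdf" is the config name that asks tesseract for PDF output.
    return ["tesseract", image_path, output_base, "-l", lang, "pdf"]


def run_pdf_ocr(image_path, output_base, lang="fra"):
    """Run tesseract if it is installed; return the PDF path, else None."""
    if shutil.which("tesseract") is None:
        return None  # tesseract not found on PATH, nothing to run
    subprocess.run(build_pdf_command(image_path, output_base, lang), check=True)
    # tesseract appends the extension to the output base name
    return output_base + ".pdf"


if __name__ == "__main__":
    # Print the command from the post above without running it
    print(" ".join(build_pdf_command("img.tif", "out")))
```

Keeping the command as a list (instead of one shell string) avoids quoting problems when file names contain spaces.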

ShreeDevi Kumar

Dec 10, 2017, 11:27:48 PM
to tesser...@googlegroups.com
PDF generation is done by Tesseract itself. I have cc'ed Jeff, who is the main developer of the PDF-related code.

ShreeDevi Kumar

Dec 11, 2017, 3:42:04 AM
to tesser...@googlegroups.com
You have not mentioned which version of Tesseract you are using. I just tested with Tesseract 4.0 alpha, and the PDF contains the original image with the lines. See attached.
However, as Zdenko pointed out earlier, the OCR is NOT accurate.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
Capture-fra.pdf
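
Since the behaviour seems to depend on the version, it may help to report it when asking. A rough Python sketch for reading it from `tesseract --version` follows. The helper names `parse_version` and `installed_version` are hypothetical, and the sample strings in the comments are only illustrative of the usual "tesseract <version>" first line, not output captured from this thread. Both stdout and stderr are read, on the assumption that some builds print the version to stderr.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Extract the version token from text like 'tesseract 3.05.01'."""
    # Illustrative inputs: "tesseract 3.05.01", "tesseract 4.00.00alpha"
    match = re.search(r"tesseract\s+v?(\d\S*)", text)
    return match.group(1) if match else None


def installed_version():
    """Return the installed tesseract version string, or None if not found."""
    if shutil.which("tesseract") is None:
        return None  # tesseract is not on PATH
    result = subprocess.run(
        ["tesseract", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr in case the version goes there
        universal_newlines=True,
    )
    return parse_version(result.stdout)


if __name__ == "__main__":
    print(installed_version())
```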