tesseract 3 pdf error


Nikhil Soni

Dec 13, 2014, 2:58:21 AM
to tesser...@googlegroups.com
Hi,
I have installed the new version of Tesseract OCR on Linux by compiling it from source.
When I try to convert a TIFF file to PDF, it prints errors on the console.

Version -
tesseract 3.03
 leptonica-1.71
  libjpeg 6b : libtiff 3.9.4

Command - tesseract 1.tiff output pdf

Tesseract Open Source OCR Engine v3.03 with Leptonica
read_params_file: Can't open pdf
Page 1 of 3
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
Page 2 of 3
Page 3 of 3

Why does it give the params error?

ShreeDevi Kumar

Dec 13, 2014, 3:32:46 AM
to tesser...@googlegroups.com
Which version of source have you used?

Latest version is available from 

You need the pdf config files in tessdata directory. See

You also need to make sure that TESSDATA_PREFIX points to the correct directory.

I use a bash script with a command line such as the following to specify the different options:

tesseract --tessdata-dir ./tesseract-ocr/testing $f $f -l $LANG -psm $PSM $PDF

E.g., in this case the traineddata and pdf files are in the ./tesseract-ocr/testing/tessdata directory.
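
Since the follow-up in this thread moves to Java, here is a minimal sketch of a pre-flight check for the tessdata directory. It is only an illustration: the class name TessdataCheck and the method missingFiles are hypothetical, and the expected layout (a <lang>.traineddata file plus a config file named pdf under a configs subdirectory) is an assumption about a typical install, so adjust the paths to match your own build.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class TessdataCheck {

    /**
     * Returns the relative names of the files that are missing from the
     * given tessdata directory. The expected layout is an assumption:
     * "<lang>.traineddata" and "configs/pdf".
     */
    static List<String> missingFiles(Path tessdata, String lang) {
        List<String> missing = new ArrayList<>();
        // Language data, e.g. eng.traineddata
        if (!Files.isRegularFile(tessdata.resolve(lang + ".traineddata"))) {
            missing.add(lang + ".traineddata");
        }
        // Config file named "pdf" (assumed to live under configs/)
        if (!Files.isRegularFile(tessdata.resolve("configs").resolve("pdf"))) {
            missing.add("configs/pdf");
        }
        return missing;
    }

    public static void main(String[] args) {
        // Default path taken from the example command line in this post.
        Path tessdata = Paths.get(
                args.length > 0 ? args[0] : "./tesseract-ocr/testing/tessdata");
        List<String> missing = missingFiles(tessdata, "eng");
        if (missing.isEmpty()) {
            System.out.println("tessdata looks complete: " + tessdata);
        } else {
            System.out.println("Missing under " + tessdata + ": " + missing);
        }
    }
}
```

If the check reports configs/pdf as missing, that would be consistent with the "read_params_file: Can't open pdf" message, although other causes are possible.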



ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com


Nikhil Soni

Dec 13, 2014, 8:52:27 AM
to tesser...@googlegroups.com
Hi Shree,
Thanks. Now it works.

There is one more problem I want to overcome.
I want to generate the PDF from Java. I am using Tess4J, the Apache-licensed Java OCR wrapper, which is compatible with the Tesseract 3.03 PDF version.

When I invoke its PDF method, either directly or from a test class, it always gives the error below.

The int returned from the method is 15999424, and it changes depending on the input TIFF file.

Dec 13, 2014 7:09:19 PM net.sourceforge.tess4j.Tesseract createDocuments
SEVERE: Error during processing.

My test method is -
public void getPdf() throws Exception {
    Tesseract t = Tesseract.getInstance();
    // Request PDF output only
    List<RenderedFormat> list = new ArrayList<RenderedFormat>();
    list.add(RenderedFormat.PDF);
    t.setLanguage("eng");
    t.setDatapath("/usr/share/");
    //t.setOcrEngineMode(1);
    //t.setPageSegMode(2);
    //t.setTessVariable("read_params_file", "fh");
    t.createDocuments(
            "tess4j-master/target/test-classes/test-data/1.tiff",
            "tess4j-master/target/test-classes/test-data",
            list);
}


Internal method is -
private void createDocuments(String filename, TessResultRenderer renderer) throws TesseractException {
    int result = api.TessBaseAPIProcessPages(handle, filename, null, 0, renderer);
    // Debug output added to inspect the native return value
    System.out.println("result   " + result);
    if (result != TessAPI.TRUE) {
        throw new TesseractException("Error during processing.");
    }
}


I get the same error from every method in the Java examples.
Can you please help me with this issue?