tesseract 3 pdf error

Nikhil Soni

Dec 13, 2014, 2:58:21 AM

to tesser...@googlegroups.com

Hi,
I have installed a new version of Tesseract OCR on Linux, compiled from source.
I want to convert a TIFF file to PDF, but it gives an error in the console:

Version -
tesseract 3.03
leptonica-1.71
libjpeg 6b : libtiff 3.9.4

Command - "tesseract 1.tiff output pdf"

Tesseract Open Source OCR Engine v3.03 with Leptonica
read_params_file: Can't open pdf
Page 1 of 3
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
Page 2 of 3
Page 3 of 3

Why does it give a params error?

ShreeDevi Kumar

Dec 13, 2014, 3:32:46 AM

to tesser...@googlegroups.com

Which version of the source did you use?

The latest version is available from

https://code.google.com/p/tesseract-ocr/source/checkout

You need the pdf config files in the tessdata directory. See

https://code.google.com/p/tesseract-ocr/source/browse/tessdata

You also need to make sure that the TESSDATA_PREFIX environment variable points to the correct directory.

I use a bash script with a command line such as the following to specify the different options:

tesseract --tessdata-dir ./tesseract-ocr/testing $f $f -l $LANG -psm $PSM $PDF

E.g., in this case the traineddata and pdf files are in the ./tesseract-ocr/testing/tessdata directory.
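
The "Can't open pdf" message suggests that the trailing `pdf` argument is treated as the name of a config file that could not be found. A quick way to rule this out is to check that the expected files exist before running the command. Below is a minimal Java sketch of such a check. The class and method names are hypothetical, and the layout it tests (`<tessdata>/<lang>.traineddata` and `<tessdata>/configs/pdf`) is an assumption based on how the Tesseract source tree arranges its tessdata directory; adjust the paths if your installation differs.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tesseract or Tess4J.
class TessdataCheck {

    /**
     * Returns the files that are missing under the given tessdata directory.
     * Assumed layout: <tessdata>/<lang>.traineddata and <tessdata>/configs/pdf.
     */
    public static List<Path> missingFiles(Path tessdataDir, String lang) {
        List<Path> missing = new ArrayList<>();
        Path trainedData = tessdataDir.resolve(lang + ".traineddata");
        Path pdfConfig = tessdataDir.resolve("configs").resolve("pdf");
        if (!Files.isRegularFile(trainedData)) {
            missing.add(trainedData);
        }
        if (!Files.isRegularFile(pdfConfig)) {
            missing.add(pdfConfig);
        }
        return missing;
    }

    public static void main(String[] args) {
        // First argument: the directory that contains tessdata/,
        // i.e. the value you would pass to --tessdata-dir.
        Path parent = Paths.get(args.length > 0 ? args[0] : "./tesseract-ocr/testing");
        String lang = args.length > 1 ? args[1] : "eng";

        List<Path> missing = missingFiles(parent.resolve("tessdata"), lang);
        if (missing.isEmpty()) {
            System.out.println("All expected files are present.");
        } else {
            for (Path p : missing) {
                System.out.println("Missing: " + p);
            }
        }
    }
}
```

Run it with the same directory you give to `--tessdata-dir` (and the same language code as `-l`); any path it reports as missing is a likely cause of the read_params_file error.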

---

ShreeDevi


Nikhil Soni

Dec 13, 2014, 8:52:27 AM

to tesser...@googlegroups.com

Hi Shree,
Thanks. Now it works.

There is one more problem I want to overcome.
I want to generate the PDF from Java. I am using the Apache Java OCR wrapper (tess4j), which is compatible with the Tesseract 3.03 PDF version.

When I invoke its PDF method, either directly or from a test class, it always gives me the error below.

The int returned from the method is 15999424, and it changes depending on the input TIFF file.

Dec 13, 2014 7:09:19 PM net.sourceforge.tess4j.Tesseract createDocuments
SEVERE: Error during processing.

My test method is -
public void getPdf() throws Exception {
    Tesseract t = Tesseract.getInstance();
    List<RenderedFormat> list = new ArrayList<RenderedFormat>();
    list.add(RenderedFormat.PDF);
    t.setLanguage("eng");
    t.setDatapath("/usr/share/");
    //t.setOcrEngineMode(1);
    //t.setPageSegMode(2);
    //t.setTessVariable("read_params_file", "fh");
    t.createDocuments(
            "tess4j-master/target/test-classes/test-data/1.tiff",
            "tess4j-master/target/test-classes/test-data",
            list);
}

Internal method is -
private void createDocuments(String filename, TessResultRenderer renderer) throws TesseractException {
    int result = api.TessBaseAPIProcessPages(handle, filename, null, 0, renderer);
    System.out.println("result   " + result);
    if (result != TessAPI.TRUE) {
        throw new TesseractException("Error during processing.");
    }
}

I get the same error with every method in the Java examples.
Can you please help me with this issue?
