Java GUI frontend for Tesseract OCR engine


nguyenq

Jan 21, 2008, 11:25:40 PM
to tesseract-ocr
Java GUI prototype for Tesseract OCR engine

Supports JPEG, GIF, BMP, and PNG image formats, and recognition of a
selected area of the image.

System requirements:

- Java Runtime Environment 5.0 or later - http://java.sun.com/
- JAI Image I/O 1.1 - https://jai-imageio.dev.java.net/

If you encounter an out-of-memory exception, run ocr.bat instead of
launching the .jar directly.

http://sourceforge.net/project/showfiles.php?group_id=153105

gregg

Jan 23, 2008, 12:47:20 PM
to tesseract-ocr
What Tesseract directory should I specify on Linux? I have tried all
possible directories :(


gregg

Jan 23, 2008, 1:05:46 PM
to tesseract-ocr
Solved: I renamed the "tesseract" directory to tess_smth_else, then
created an empty directory with the same name, "tesseract", cd'd into
it, and created symlinks with ln -s /path/to/tessdata-dir . and
ln -s /path/to/tesseract-binary .
Now it works. I see some bugs; where should I post them?

nguyenq

Jan 23, 2008, 11:13:59 PM
to tesseract-ocr
Strange! I had no problems setting the path on Windows. It should
point to the directory that contains the Tesseract binary executable.
But I haven't tested on Linux yet, so I did not see the problem.
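
A quick way to check whether a chosen directory meets that requirement is sketched below. This is an illustration only, not code from the program: TessPathCheck and findExecutable are hypothetical names, and the file names "tesseract" and "tesseract.exe" are assumptions about how the binary is named on each platform.

```java
import java.io.File;

final class TessPathCheck {
    /** Returns the Tesseract executable inside dir, or null if none is found. */
    static File findExecutable(File dir) {
        // Assumed names: "tesseract" on Linux, "tesseract.exe" on Windows
        String[] names = { "tesseract", "tesseract.exe" };
        for (String name : names) {
            File candidate = new File(dir, name);
            // isFile() rejects a subdirectory that happens to share the name
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }
}
```

A null result means the directory does not directly contain the binary, which would explain a path that is accepted by the dialog but fails at OCR time.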

You can email me or post here if you like.

Thank you.

Quan

nguyenq

Feb 26, 2008, 10:51:58 PM
to tesseract-ocr
Version 0.8.1 has been released with bug fixes and feature
enhancements.

http://sourceforge.net/project/showfiles.php?group_id=153105

ttutuncu

Apr 1, 2008, 3:40:57 AM
to tesseract-ocr
I have a problem with OCR. When I select a region from a TIFF file and
press the OCR button, I get an error saying "errors occurred".

The command line outputs: Exit value = 1

And the Tesseract log file contains:
read_variables_file:Can't open C:/Documents and Settings/ttutuncu/Desktop/VietOCR-0.8.3-Beta/VietOCR/tesseract/tessdata/configs/"C:\Documents
read_variables_file:Can't open C:/Documents and Settings/ttutuncu/Desktop/VietOCR-0.8.3-Beta/VietOCR/tesseract/tessdata/configs/and
read_variables_file:Can't open C:/Documents and Settings/ttutuncu/Desktop/VietOCR-0.8.3-Beta/VietOCR/tesseract/tessdata/configs/Settings\ttutuncu\Desktop\bbT_exe/output"
read_variables_file:Can't open C:/Documents and Settings/ttutuncu/Desktop/VietOCR-0.8.3-Beta/VietOCR/tesseract/tessdata/configs/l
read_variables_file:Can't open C:/Documents and Settings/ttutuncu/Desktop/VietOCR-0.8.3-Beta/VietOCR/tesseract/tessdata/configs/teb
Could not open file, Settings\ttutuncu\Desktop\bbT_exe\tempImageFile00.tif"

What is happening?

nguyenq

Apr 1, 2008, 7:56:49 PM
to tesseract-ocr
Perhaps it is because your images are on the Windows desktop or in the
My Documents folder. Move them to another folder, e.g., C:\Temp,
and try again.

nguyenq

Apr 2, 2008, 1:03:04 AM
to tesseract-ocr
It seems that either the Tesseract command or Java's native command
call does not support file paths (file name and/or directory name) that
have spaces in them. So, for now, do not put images in folders whose
names contain spaces.
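
Until the cause is pinned down, a front end could at least warn the user before invoking Tesseract. The following is a minimal sketch of such a pre-flight check; PathCheck and hasWhitespace are hypothetical names and are not part of the program.

```java
final class PathCheck {
    /** Returns true if any character in the given path is whitespace. */
    static boolean hasWhitespace(String path) {
        for (int i = 0; i < path.length(); i++) {
            if (Character.isWhitespace(path.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```

Calling hasWhitespace(imageFile.getAbsolutePath()) before starting the process lets the GUI show a specific message instead of a generic failure.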

ttutuncu

Apr 2, 2008, 7:52:11 AM
to tesseract-ocr
Thank you very much, that worked!

How do you call the Tesseract executable from Java?


nguyenq

Apr 5, 2008, 12:56:51 AM
to tesseract-ocr
Here is the code. Hope someone will figure out how to support
filepaths containing spaces.


package net.sourceforge.jtocr;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class OCR {
    private static final String LANG_OPTION = "-l";
    private static final String EOL = System.getProperty("line.separator");

    // Directory that contains the Tesseract executable
    private final String tessPath;

    /** Creates a new instance of OCR */
    public OCR(String tessPath) {
        this.tessPath = tessPath;
    }

    String recognizeText(File imageFile, int index, boolean all,
            String imageFormat, String lang) throws Exception {
        // ImageIOHelper is another class of the project (not shown here)
        ArrayList<File> tempImages =
                ImageIOHelper.createImages(imageFile, index, all, imageFormat);

        // Tesseract appends ".txt", so the actual output file is "output.txt"
        File outputFile = new File(imageFile.getParentFile(), "output");
        StringBuilder strB = new StringBuilder();

        List<String> cmd = new ArrayList<String>();
        cmd.add(tessPath + "/tesseract");
        cmd.add(""); // placeholder for input file
        cmd.add(outputFile.getAbsolutePath());
        cmd.add(LANG_OPTION);
        cmd.add(lang);

        ProcessBuilder pb = new ProcessBuilder();

        for (File tempImage : tempImages) {
            // Also tried: new ProcessBuilder(tessPath + "/tesseract",
            //     tempImage.getAbsolutePath(), outputFile.getAbsolutePath(),
            //     LANG_OPTION, lang);
            cmd.set(1, tempImage.getAbsolutePath());
            pb.command(cmd);
            pb.redirectErrorStream(true);
            Process process = pb.start();
            // Also tried:
            //     Runtime.getRuntime().exec(cmd.toArray(new String[0]));

            int w = process.waitFor();
            System.out.println("Exit value = " + w);

            // delete temp working files
            tempImage.delete();

            if (w == 0) {
                BufferedReader in = new BufferedReader(new InputStreamReader(
                        new FileInputStream(outputFile.getAbsolutePath() + ".txt"),
                        "UTF-8"));
                try {
                    String str;
                    while ((str = in.readLine()) != null) {
                        strB.append(str).append(EOL);
                    }
                } finally {
                    // close the reader even if reading fails
                    in.close();
                }
            } else {
                String msg;
                switch (w) {
                    case 1:
                        msg = "Errors accessing files. There may be spaces in "
                                + "your image's filepath:\n"
                                + imageFile.getAbsolutePath();
                        break;
                    case 29:
                        msg = "Cannot recognize the image or its selected region.";
                        break;
                    case 31:
                        msg = "Unsupported image format.";
                        break;
                    default:
                        msg = "Errors occurred.";
                }
                throw new RuntimeException(msg);
            }
        }
        new File(outputFile.getAbsolutePath() + ".txt").delete();
        return strB.toString();
    }
}
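
One possible way to sidestep spaces in directory names is to run Tesseract with the image's folder as the working directory and pass only bare file names as arguments. The sketch below illustrates that idea under stated assumptions; it is not necessarily the fix that went into a later release. SpaceSafeCommand and build are hypothetical names, and the file names themselves must still be free of spaces.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class SpaceSafeCommand {
    /**
     * Builds a Tesseract invocation that runs inside the image's folder,
     * so the input and output arguments carry no directory part.
     */
    static ProcessBuilder build(String tessPath, File image, String lang) {
        List<String> cmd = new ArrayList<String>();
        cmd.add(new File(tessPath, "tesseract").getPath());
        cmd.add(image.getName()); // resolved against the working directory
        cmd.add("output");        // Tesseract appends ".txt"
        cmd.add("-l");
        cmd.add(lang);

        ProcessBuilder pb = new ProcessBuilder(cmd);
        // The folder name may contain spaces; it is no longer an argument
        pb.directory(image.getAbsoluteFile().getParentFile());
        pb.redirectErrorStream(true);
        return pb;
    }
}
```

With this arrangement "output.txt" is written next to the image, the same location the original code reads it from.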

nguyenq

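Apr 5, 2008, 4:06:44 PM
to tesseract-ocr
Problem solved. Please check out jtOCR v0.9 Beta.

http://sourceforge.net/project/showfiles.php?group_id=153105&package_...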
74yrs old

Apr 6, 2008, 2:52:15 AM
to tesser...@googlegroups.com
Does it work on the MS Windows platform? Which version of tesseract-ocr is used?

nguyenq

Apr 6, 2008, 11:33:29 AM
to tesseract-ocr
It is a Java program, so it should work on any platform and with any
version of Tesseract; v2.01 is also included in the distro.

On Apr 6, 1:52 am, "74yrs old" <withblessi...@gmail.com> wrote:
> Whether it works in MSwindows platform? Which version of tesseract-ocr used?
>
> On Sun, Apr 6, 2008 at 1:36 AM, nguyenq <nguyen...@gmail.com> wrote:
>
> > Problem solved. Please check out jtOCR v0.9 Beta.
>
> >http://sourceforge.net/project/showfiles.php?group_id=153105&package_...

Nick White

Sep 30, 2012, 4:40:17 AM
to tesser...@googlegroups.com
Hi Kimo,

On Sat, Sep 29, 2012 at 09:15:10PM -0700, kimo wrote:
> Could you tell me what is and how to use Tesseract OCR engine, JavaOCR
> engine, GOCR engine, AndroidARKit lib, jPCT-AE lib?
>
> If you have no time, Please send me some documents about them.

A question like this seems to be saying "my time is more important
than anybody else's," and as such comes off as rather disrespectful.
All the information you seek should be available if you use a
search engine.

If you have specific questions that you can't find answers to about
Tesseract, we'll be happy to answer them.

Nick

Khanh Nguyen Tuan

Sep 30, 2012, 11:49:21 AM
to tesser...@googlegroups.com

Nick

--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en



Thanks for your answer. I see that it's not important to answer this question, and I'm sorry for asking such a silly one.

At first, I couldn't find the right place to get information about what they are, how to use them, and how to install them. But now I have found the answers. It took three or four days to find all the information, which explains exactly what I wanted.

You can imagine how surprised I am. I'm a new Android developer.

Anyway, thanks for your answer. If I have a specific and very important problem, I'll ask for help, and I hope you will help me then.

Best regards.

Quan Nguyen

Oct 5, 2012, 7:45:39 PM
to tesser...@googlegroups.com
VietOCR 3.4 RC has been released. This incorporates the latest Tesseract 3.02 executable and library. Please help test. Any input or comment is welcome.

http://sourceforge.net/projects/vietocr/files/vietocr/

Gaara Sabaku

Oct 9, 2012, 6:38:37 PM
to tesser...@googlegroups.com
I have expertise in Tesseract OCR. For general information seek elsewhere. For technical and specific questions you may ask me.

On Sat, Sep 29, 2012 at 10:15 PM, kimo <tuankhan...@gmail.com> wrote:
Hi, there.

Could you tell me what is and how to use Tesseract OCR engine, JavaOCR engine, GOCR engine, AndroidARKit lib,  jPCT-AE lib?

If you have no time, Please send me some documents about them.

Please help me

Thank you. 

Serious Hacker

Jul 14, 2019, 11:18:23 PM
to tesseract-ocr
Hello Gaara Sabaku,

I am not sure if you are still active, however, I am facing an issue with Tesseract which says :

 ERROR [Tesseract] Need to install JAI Image I/O package.
https://java.net/projects/jai-imageio/
java.lang.RuntimeException: Need to install JAI Image I/O package.

I have the required jar files installed, and I am unsure how to resolve the error. Any help on this would be appreciated.

I am using tess4j 4.3.0 and jai-imageio-core 1.4.0.
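
One way to narrow this down is to ask the standard javax.imageio registry whether a TIFF reader and writer are actually visible at runtime, since a jar that is present on disk but missing from the runtime classpath will not register its plugins. The sketch below uses only standard Java APIs; ImageIoCheck is a hypothetical name, and the link between this check and the Tess4J message is an assumption that should be confirmed against the Tess4J source.

```java
import java.util.Arrays;
import javax.imageio.ImageIO;

final class ImageIoCheck {
    /** True if some registered ImageIO plugin can read the given format. */
    static boolean canRead(String formatName) {
        ImageIO.scanForPlugins(); // pick up plugin jars on the classpath
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    /** True if some registered ImageIO plugin can write the given format. */
    static boolean canWrite(String formatName) {
        ImageIO.scanForPlugins();
        return ImageIO.getImageWritersByFormatName(formatName).hasNext();
    }

    public static void main(String[] args) {
        System.out.println("TIFF reader available: " + canRead("tiff"));
        System.out.println("TIFF writer available: " + canWrite("tiff"));
        System.out.println("Reader formats: "
                + Arrays.toString(ImageIO.getReaderFormatNames()));
    }
}
```

If both TIFF lines print false when run with the same classpath as the application, the jai-imageio jar is probably not being loaded where the application runs.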
