Introduction of TessOCR


岸 和孝

Feb 5, 2012, 9:07:01 PM
to tesser...@googlegroups.com
Hello everybody.
Let me introduce myself.
My name is KISI, KAZUTAKA (岸 和孝).
I am Japanese.
I have been developing "TessOCR", which includes Tesseract as a framework.
I designed its features primarily for Japanese character recognition.
The following describes the specifications of TessOCR.
Please email me with any bug reports, questions or comments:
kisi-k...@nifty.com .
English writing is my weak point.
Please tell me if there is a mistake.
Thank you.

Title
-----
TessOCR

Abstract
--------
TessOCR is a free OCR tool.
It is built on the Tesseract Open Source OCR Engine, a multilingual optical character recognition (OCR) engine, which it includes as a framework.

Specifications of TessOCR
-------------------------
Recognizable languages : Japanese, English, French and so on.
Supports additional character recognition dictionaries.
Layout recognition : Detects horizontal and vertical writing automatically.
Recognizes only the contents of tables.
Recognizable image formats : JPEG, PNG, GIF, BMP and TIFF.
Recognizable image dimensions : No particular limitation.
Recognizable character size : (Under investigation)
Noise removal from the image : Manual control.
Correction of image skew : Manual control.
Cropping the image : Manual control.
Two-page spreads can be specified.
Conversion to a grayscale image by threshold : Manual control.
Training the character recognition dictionary : Semi-automatic control.
You can edit the boxes.
Text editing : You can enter, edit and save text.
You can search the text and replace matches with another string.
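
The threshold conversion in the list above can be illustrated with a minimal sketch. It assumes that "by threshold" means comparing each pixel against a fixed cutoff; how TessOCR actually does this is not described here, and the function name `binarize` and the default cutoff of 128 are hypothetical.

```python
def binarize(gray, threshold=128):
    """Return a black/white copy of a grayscale image.

    `gray` is a list of rows, each row a list of 0-255 integers.
    Pixels at or above `threshold` become white (255), the rest black (0).
    This is a generic illustration, not TessOCR's own code.
    """
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


# A tiny 2x3 "page": dark ink on a lighter background.
page = [
    [12, 200, 130],
    [127, 128, 255],
]
print(binarize(page, threshold=128))  # [[0, 255, 255], [0, 255, 255]]
```

Raising the cutoff turns more of the page black and lowering it turns more of it white, which is probably why this step is left to manual control.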


Download URL
------------
http://djvu.life.coocan.jp/TessOCR/TessOCR-1.02.zip

Version
-------
1.02

Version Information
-------------------
1.02 Published February 5, 2012
Internationalized the messages.
Added a function to move a box with the arrow keys.
Removed resizing of the text editing window.
1.01 Published January 26, 2012
Fixed a bug in external access.
Fixed a bug during scrolling.
Fixed a bug when selecting inside a box.
Fixed a bug when the Edit button is selected.
Added a threshold setting for images.
Revised the Japanese character recognition dictionary.
Removed the Help menu.
1.00 Published January 20, 2012
First release.
New built-in dictionary for Japanese character recognition.
Built with Tesseract 3.01.

Platform
--------
Mac OS X 10.6.8 (Intel)

Requirements
------------
TessOCR uses ImageMagick internally to process images.
However, ImageMagick is not included as a framework of TessOCR.
If ImageMagick is already installed in your environment, TessOCR will link to it.
If ImageMagick has not been installed yet, TessOCR will notify you.
In that case, you have to install ImageMagick using MacPorts.
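
Before starting TessOCR, you can check for ImageMagick yourself. The following is a minimal sketch, not part of TessOCR: it only looks for ImageMagick's command-line tools on your PATH, while TessOCR itself links to the library. The function name `find_imagemagick` is hypothetical, and the port name `ImageMagick` is assumed to be the one MacPorts provides.

```python
import shutil


def find_imagemagick(candidates=("magick", "convert")):
    """Return the path of the first ImageMagick tool found on PATH, or None.

    `convert` is the classic ImageMagick command; `magick` is used by newer
    releases. Both names are checked because the installed version is unknown.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = find_imagemagick()
    if found:
        print("ImageMagick tool found at:", found)
    else:
        # Assumed MacPorts command; check the port name before running it.
        print("ImageMagick not found. Try: sudo port install ImageMagick")
```

MacPorts normally installs under /opt/local, so a path beginning with /opt/local/bin suggests the MacPorts copy was found.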

---------------------------------------------------------------------
