Announcement: introducing TesseractStudio.Net, a free Windows GUI for Tesseract 4.0

farhad khalafi

Nov 8, 2018, 6:09:59 PM

to tesseract-ocr

Hello All,

I am pleased to introduce a new graphical interface for Tesseract 4.0 built specifically for the .Net platform.

Here are some attributes of TesseractStudio.Net:

Simple graphical interface.
Improved performance by parallel processing of multipage documents.
Simultaneous display of images and hidden OCR text.
Create, edit, split, merge, move, resize or delete recognized words.
Unlimited undo/redo stacks.
Works with image/PDF files with or without prior OCR data.
Preserves non-image PDF components, such as visible text.
Integrated spell checker.
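
For readers curious about the parallel multipage processing mentioned in the list above, here is a minimal Python sketch of the general idea. It is not TesseractStudio.Net's implementation (which targets .Net); `recognize_page` is a hypothetical stand-in for a real per-page OCR call, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_page(page):
    # Hypothetical stand-in for a per-page OCR call.
    # Here it only upper-cases the input so the sketch runs without Tesseract.
    return page.upper()


def recognize_document(pages, max_workers=4):
    # Pages are independent of each other, so they can be recognized concurrently.
    # Executor.map() returns results in input order, so page order is preserved.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(recognize_page, pages))
```

A real OCR call is CPU-bound, so a process pool or a native thread pool would likely be a better fit than Python threads; the structure stays the same.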

Download: https://github.com/OpaitSoftware/TesseractStudio.Net

Thank you,

Farhad Khalafi

Opait Software

farhad khalafi

Jan 9, 2019, 8:40:05 PM

to tesseract-ocr

We have released version 1.2 of Tesseract Studio with the following enhancements:

Install and use any language file supported by Tesseract 4.0.
Install any available spelling dictionary.
Use multiple languages for multilingual documents.
Create PDF/A documents.
Create encrypted PDF documents.
Experimental image processing functions:
  Correct light text on dark background for improved OCR precision.
  Remove grid lines from tables.
  Deskew images.
  Capture intermediary images for debugging.

Post-process OCR data with a regular-expression-based rules engine.
UI improvements and several bug fixes.
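
To illustrate what regular-expression-based post-processing of OCR text can look like, here is a small Python sketch. The rule format and the three rules are hypothetical examples chosen for illustration; they are not the rule syntax used by Tesseract Studio.

```python
import re

# Hypothetical rules: (compiled pattern, replacement) pairs, applied in order.
RULES = [
    (re.compile(r"(?<=\d)[Oo](?=\d)"), "0"),  # letter O between two digits -> zero
    (re.compile(r"(?<=\d)[lI](?=\d)"), "1"),  # l or I between two digits -> one
    (re.compile(r"[ \t]{2,}"), " "),          # collapse runs of spaces or tabs
]


def apply_rules(text, rules=RULES):
    # Each rule rewrites the output of the previous one.
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text
```

Because the rules run in sequence, their order matters: a later rule sees the text as already modified by the earlier ones.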

farhad khalafi

Jan 30, 2019, 4:16:13 PM

to tesseract-ocr

We have released version 1.3 of Tesseract Studio with the following enhancements:

Improved memory management to support large multi-page files.
  Streaming interface to Leptonica.
  Eliminate unnecessary cache of images.
  Unload processed pages early.
  Tested with a scanned file of 2,100 pages.

New OCR options:
  OCR only image objects in vector PDF files, or
  Fully rasterize and OCR each page.

New Save options:
  Save as vector PDF with existing objects (including visible text) preserved and merged with OCR.
  Save as searchable PDF where each page has a single image which overlays hidden OCR data.
    Maintain original color if applicable, or
    Convert to grayscale before saving, or
    Convert to monochrome using a dithering algorithm, or
    Convert to monochrome using dynamic or specified thresholding.
    Specify or automatically assign resolution to control PDF size.
  Save as text-only PDF.
    Use a visible font for OCR and other text objects.
    Pick standard type 1 fonts to reduce PDF size.
    Embed any available font into the PDF file (with some overhead).
    Format OCR and other text to approximate the original layout (without graphics).

Some bug fixes.
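
As a rough illustration of the monochrome conversion options above, the Python sketch below shows a specified threshold next to an automatically chosen one. It assumes that "dynamic" means a level computed from the image histogram and uses Otsu's method for that; the post does not say which algorithm Tesseract Studio uses, and the function names are hypothetical.

```python
def threshold(pixels, level):
    # pixels: rows of grayscale values in 0..255.
    # Returns rows of 0 (black) and 1 (white); values >= level become white.
    return [[1 if p >= level else 0 for p in row] for row in pixels]


def otsu_level(pixels):
    # Otsu's method: choose the level that maximizes between-class variance.
    hist = [0] * 256
    for row in pixels:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_level, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]           # pixels with value <= t (background)
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg       # pixels with value > t (foreground)
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_level = var, t + 1
    return best_level
```

Usage: `threshold(image, 128)` applies a specified level, while `threshold(image, otsu_level(image))` picks the level from the image itself.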

farhad khalafi

Mar 26, 2019, 5:20:08 PM

to tesseract-ocr

We have released version 1.4 of Tesseract Studio for .Net.

Improved UI for correcting OCR artifacts.
Bundles Tesseract 4.1 RC1.
Updated Pdfium engine.
Fixed a TLS bug that produced "Activation Failed" warnings.

farhad khalafi

May 1, 2019, 8:48:21 PM

to tesseract-ocr

We have released version 1.5 of Tesseract Studio for .Net.

Displays recognized words with average confidence scores for easier identification of suspect words.
Uses direct PDF generation from recognized words. Previously we used the built-in PDF generator and a temporary file; this version avoids the temporary file for better performance.
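
The idea of flagging suspect words by average confidence can be sketched in a few lines of Python. This assumes each word carries per-character confidences on a 0..100 scale and that the word score is their arithmetic mean; the post does not specify how Tesseract Studio computes its averages, and the cutoff of 80 is an arbitrary example value.

```python
def average_confidence(symbol_confidences):
    # Arithmetic mean of the per-character confidences of one word.
    return sum(symbol_confidences) / len(symbol_confidences)


def suspect_words(words, cutoff=80.0):
    # words: list of (text, [per-character confidences in 0..100]).
    # Returns the text of every word whose average falls below the cutoff.
    # Words with no confidences are skipped to avoid dividing by zero.
    return [
        text
        for text, confs in words
        if confs and average_confidence(confs) < cutoff
    ]
```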

Hyperion0

May 12, 2021, 1:57:13 PM

to tesseract-ocr

Is this still an available thing? The download link on GitHub isn't working.
