Announcement: introducing TesseractStudio.Net, a free Windows GUI for Tesseract 4.0

farhad khalafi

Nov 8, 2018, 6:09:59 PM
to tesseract-ocr
Hello All,

I am pleased to introduce a new graphical interface for Tesseract 4.0 built specifically for the .Net platform. 

Here are some attributes of TesseractStudio.Net:
  • Simple graphical interface.
  • Improved performance by parallel processing of multipage documents.
  • Simultaneous display of images and hidden OCR text.
  • Create, edit, split, merge, move, resize or delete recognized words.
  • Unlimited undo/redo stacks.
  • Works with image/PDF files with or without prior OCR data.
  • Preserves non-image PDF components, such as visible text.
  • Integrated spell checker.
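
As an illustration of the parallel page processing mentioned in the list above, here is a minimal sketch in Python. It is not TesseractStudio.Net's code (which targets .Net); `ocr_pages_in_parallel` and `fake_ocr` are hypothetical names, and the OCR call itself is a stand-in you would replace with a real one.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def ocr_pages_in_parallel(
    pages: Sequence[str],
    ocr_page: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Run ocr_page on every page concurrently; return results in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, regardless of
        # which page finishes first.
        return list(pool.map(ocr_page, pages))


def fake_ocr(page: str) -> str:
    # Hypothetical stand-in for a real OCR call on one page.
    return f"text of {page}"


results = ocr_pages_in_parallel(["page1", "page2", "page3"], fake_ocr)
```

Threads are a reasonable choice here only if the real OCR call runs outside the Python interpreter (for example, in a native library or a subprocess); that is an assumption of this sketch.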

Thank you,

Farhad Khalafi
Opait Software

farhad khalafi

Jan 9, 2019, 8:40:05 PM
to tesseract-ocr
We have released version 1.2 of Tesseract Studio with the following enhancements:
  • Install and use any language file supported by Tesseract 4.0.
  • Install any available spelling dictionary. 
  • Use multiple languages for multi-lingual documents.
  • Create PDF/A documents.
  • Create encrypted PDF documents.
  • Experimental image processing functions:
    • Correct light text on dark background for improved OCR precision.
    • Remove grid lines from tables.
    • Deskew images.
    • Capture intermediary images for debugging.
  • Post-process OCR data with a regular-expression-based rules engine.
  • UI improvements and several bug fixes. 
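
To give a rough idea of what regular-expression-based post-processing of OCR text can look like, here is a small Python sketch. The rule format and the rules themselves are hypothetical examples, not the ones shipped with Tesseract Studio.

```python
import re
from typing import List, Tuple

# Hypothetical rules: each is (pattern, replacement), applied in order.
RULES: List[Tuple[str, str]] = [
    (r"(?<=\d)[Oo](?=\d)", "0"),  # letter O between two digits -> zero
    (r"(?<=\d)[lI](?=\d)", "1"),  # l or I between two digits -> one
    (r"\s{2,}", " "),             # collapse runs of whitespace
]


def apply_rules(text: str, rules: List[Tuple[str, str]] = RULES) -> str:
    """Apply each substitution rule to the text, one after another."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


cleaned = apply_rules("Invoice 2O19  total 1l5")
```

Because the rules run in sequence, a later rule sees the output of the earlier ones, so their order can matter.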

farhad khalafi

Jan 30, 2019, 4:16:13 PM
to tesseract-ocr
We have released version 1.3 of Tesseract Studio with the following enhancements:
  • Improved memory management to support large multi-page files.
    • Streaming interface to Leptonica.
    • Eliminate unnecessary caching of images.
    • Unload processed pages early.
    • Tested with a scanned file of 2,100 pages.
  • New OCR options:
    • OCR only image objects in vector PDF files, or
    • Fully rasterize and OCR each page.
  • New Save options:
    • Save as vector PDF with existing objects (including visible text) preserved and merged with OCR.
    • Save as searchable PDF where each page has a single image which overlays hidden OCR data.
      • Maintain original color if applicable, or
      • Convert to grayscale before saving, or
      • Convert to monochrome using a dithering algorithm, or
      • Convert to monochrome using dynamic or specified thresholding.
      • Specify or automatically assign resolution to control PDF size.
    • Save as text-only PDF.
      • Use a visible font for OCR and other text objects.
      • Pick standard type 1 fonts to reduce PDF size.
      • Embed any available font into the PDF file (with some overhead).
      • Format OCR and other text to approximate the original layout (without graphics).
  • Some bug fixes. 
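
For readers unfamiliar with the difference between specified and dynamic thresholding in the save options above, here is a minimal Python sketch. It uses Otsu's method as one common way to pick a threshold automatically; whether Tesseract Studio uses Otsu or another algorithm is not stated, so treat that choice as an assumption. The function names are hypothetical, and the image is a plain list of rows of 0..255 grayscale values.

```python
from typing import List, Optional

Gray = List[List[int]]  # rows of grayscale intensities in 0..255


def otsu_threshold(pixels: Gray) -> int:
    """Pick the threshold that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        # Pixels with value <= t form the "dark" class.
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def to_monochrome(pixels: Gray, threshold: Optional[int] = None) -> Gray:
    """Return 1 for pixels above the threshold, 0 otherwise.

    With threshold=None the value is chosen dynamically via Otsu.
    """
    if threshold is None:
        threshold = otsu_threshold(pixels)
    return [[1 if value > threshold else 0 for value in row] for row in pixels]


image = [[10, 12, 200], [11, 210, 205]]
dynamic = to_monochrome(image)        # threshold chosen from the histogram
specified = to_monochrome(image, 128)  # threshold given by the caller
```

A specified threshold is predictable but can fail on unevenly exposed scans; a dynamic one adapts to each image's histogram.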

farhad khalafi

Mar 26, 2019, 5:20:08 PM
to tesseract-ocr
We have released version 1.4 of Tesseract Studio for .Net. 
  • Improved UI for correcting OCR artifacts.
  • Bundles Tesseract 4.1 RC1.
  • Updated Pdfium engine.
  • Fixed a TLS bug that produced "Activation Failed" warnings.

farhad khalafi

May 1, 2019, 8:48:21 PM
to tesseract-ocr
We have released version 1.5 of Tesseract Studio for .Net. 
  • Displays recognized words with average confidence scores for easier identification of suspect words.
  • Generates PDFs directly from recognized words. Previous versions used the built-in PDF generator and a temporary file; this version avoids temporary files for better performance.
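
As a rough illustration of how average confidence scores can help find suspect words, here is a short Python sketch. It assumes each word carries per-symbol confidences on a 0..100 scale and that the word score is their mean; the data layout, function names, and the cutoff of 80 are hypothetical, not taken from Tesseract Studio.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Word = Tuple[str, Sequence[float]]  # (text, per-symbol confidences, 0..100)


def average_confidence(symbol_confidences: Sequence[float]) -> float:
    """Word-level score as the mean of its symbol confidences."""
    return mean(symbol_confidences)


def suspect_words(words: List[Word], cutoff: float = 80.0) -> List[Tuple[str, float]]:
    """Return (text, average confidence) for words scoring below the cutoff."""
    scored = [(text, average_confidence(confs)) for text, confs in words]
    return [(text, score) for text, score in scored if score < cutoff]


words = [
    ("Tesseract", [96, 94, 95]),
    ("Stud1o", [90, 40, 55]),
]
flagged = suspect_words(words)
```

Here only "Stud1o" falls below the cutoff, so it is the one word a reviewer would be pointed to.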

Hyperion0

May 12, 2021, 1:57:13 PM
to tesseract-ocr
Is this still an available thing? The download link on GitHub isn't working.