OmniPage Professional 18 (World's No.1 OCR) allows business professionals to achieve new levels of productivity by eliminating the manual reproduction of documents. Precision OCR analysis, layout detection, Logical Form Recognition (LFR) technology and advanced security features quickly turn office documents and forms into files for a range of PC applications, ready for editing, searching and sharing.
OmniPage by Nuance really requires no introduction; the name is synonymous with Optical Character Recognition, more commonly known as OCR. Few software products have enjoyed such longevity in the marketplace, and the fact that DigitalReviews is taking OmniPage Professional 17 for a test drive is testament to its pedigree.
OCR, or Optical Character Recognition, is a means of translating images of handwritten, typewritten or printed text into machine-editable text. It is a field of research in pattern recognition, artificial intelligence and machine vision. Early versions of OCR software required a lot of training, specific fonts and other parameters to improve accuracy.
Accuracy is paramount in the field of OCR. There is little point in scanning a document and converting it to machine-editable text only to spend more time fixing errors created in the recognition process. Some nine years ago I was heavily involved in a project to electronically capture Public Records. The project had to capture exact images of the original paper documents while also performing OCR across the pages so that they were text searchable, amongst other requirements. Early attempts at OCR produced numerous translation errors, some of them absolutely hilarious but all of them completely useless as an accurate electronic record. From the depths of my memory, OmniPage was at around version 10 or 11 back then, although it was not one of the packages I used at the time.
Another new feature is support for documents captured with a mobile phone, which can be sent to OmniPage for conversion into Word documents or searchable PDF archives. The 3D-Capture technology automatically corrects distortions caused by a misaligned page or by the curve of pages photographed in a book.
Among the improvements, Nuance claims up to 40% more accuracy when converting scanned images into formatted Microsoft Word documents, and up to 50% more accuracy than alternatives. Scanning and document conversion are integrated directly into Microsoft Office through the new OmniPage Toolbar, which is available from within Microsoft Word, Excel and PowerPoint. The OmniPage Workflow Assistant also connects directly to Microsoft SharePoint 2007.
OmniPage Professional 17 now supports over 120 languages, including Chinese, Japanese and Korean. It can process documents written in a single language as well as mixed-language documents, such as English words within a predominantly Japanese document.
The product automates the creation of searchable PDF and ISO PDF/A archives, and includes built-in support for PDF-MRC, a highly compressed PDF-compliant archive format. Built-in automatic redaction and highlighting can mark up a document automatically based on a list of keywords.
Installation
As one would expect from a product of this maturity, the installation was painless. It took about 10 minutes on my (reasonably fresh) hard disk to install OmniPage 17, PDF Create 5 and RealSpeak Solo v4, which together took up 800 MB or so of disk space. A pittance, really, considering the size of the average piece of Windows software these days.
The bouncing ball is clear and concise; the only options I changed were the languages for Speech Mode. It was nice to see Australian localisation on the list alongside the American and British voices in the English options.
The front page proved more of a challenge. The half that was printed in the correct orientation was recognised without any OCR errors. Understandably, OmniPage was not particularly happy translating the half that was printed upside down. It made a valiant effort, but it was never going to succeed in translating upside-down letters into proper words. My solution was to mark that section of the page as an image, and all was well.
Workflow Creation
One of the key features of OmniPage is the automated workflow, which allows me to customise every facet of the OCR process at any step. There is no restriction on which function comes first, although loading a file by one of the many available means is the logical starting point.
For example, when loading files you can either be prompted manually, starting in a predetermined directory, or load automatically from specified files or folders. Preprocessing options include rotation (automatic, none, right 90, 180, left 90), de-speckling and de-skewing the image, amongst others. Next you can choose to enhance the images, or to zone them if they follow a particular layout; both of these steps are optional. Once the files have been preprocessed the way you want, you can tell the workflow to perform the recognition.
In the recognition step you can define the languages in the document, specify a user dictionary, add professional dictionaries (legal and medical, in various languages) and set font matching. You can also opt for speed over accuracy if so desired. Next come the editing and proofing options: marking text for redaction, highlighting and strikeout, including approximate rather than only exact matches. Finally, you can save the result in any of the many supported formats, either to a predetermined location or to one you are prompted for.
Swapping to an average digital camera in macro mode changed the results significantly. I did have to make some changes before performing the OCR, namely creating text zones where the text columns were. I deliberately left some curvature at the side of the page, which the software handled quite gracefully. It was not set-and-forget, but with a little effort it did a pretty adequate job from a difficult source image. The trick with mobile phones would be to make sure the image is legible on screen, which is rather challenging under less than optimal lighting conditions.
Other Languages
OCR for other languages proved to be more of a challenge. I picked a webpage in traditional Chinese characters with a liberal sprinkling of English words. The website belonged to a Chinese newspaper, so the page was peppered with advertising, non-standard layouts, variable font colours and the like. To give you an idea of how messy a printed Chinese newspaper can be, the front page is never about news. It always carries a full-page advertisement; the real news starts from page two onwards.
Other Features
The OmniPage Toolbar integration into Microsoft Office makes it handy to grab documents and convert them directly into the Microsoft Office suite. There are actually two toolbar buttons, one for OCR and one for PDF. The latter converts documents into PDF format with all the standard options, such as password protection, comments, bookmarks and the like. An unactivated copy of PDF Create! Assistant will stamp a large, impossible-to-ignore watermark on the output declaring that it was created by a trial copy of the software.
Speech Mode works pretty well considering my samples included place names and people's names. It did a fair job of not butchering my name, and was better at it than some people I have met over the years. The various flavours of English speech were distinctly different.
After throwing quite a few curve balls at OmniPage Professional 17, I really did not have any major gripes about the software. Yes, there were errors in the recognition process, and sometimes it took a bit of playing with the features to get an acceptable result. What one has to keep in mind is that Optical Character Recognition is more an art than an exact science. Plenty of rocket science goes on behind the scenes to simplify the process, but there is no magic bullet. It will always require some massaging and human intervention to ensure that what is recognised electronically matches what is actually on the page.
At US$499.99 (or AU$999), whether boxed or downloaded, OmniPage Professional 17 is certainly not cheap. However, its numerous features and its integration with many document management systems should put it high on the list for businesses looking to reduce paper. For those with a Kindle, the integration is a definite bonus. Overall it is a very solid product and deserves consideration from anyone serious about OCR work. Full product details are available