OCRed text becomes mangled when copying it in Preview.app on OS X

const...@doo.net

Oct 28, 2014, 9:52:54 AM
to tesser...@googlegroups.com
Hi, 

I have a problem with PDFs generated by tesseract. When I open such a document in Preview.app on OS X and copy the text, it becomes mangled: the spacing is incorrect and the lines are in the wrong order.
Copying the same text in Acrobat Reader works fine, though. Is there any way to tweak some values or parameters to fix this issue?


Here is what the copied text can look like:
String Objects
201407-15 | Copyright0 2014AppleInc.AllRightsReserved.
9
NSString Class Reference Overview
The NSSt ring class has two primitive methods— length (page 31) and characterAtIndex: (page 49)—that
provide the basis for all other methods in its interface. The length of Unicode characters in the string. cha racterAtIndex: (page 49)
(page 31) mEthOd
gives access to each character in the string
by index, with index values starting at 0.
NSStringobjectsrepresentcharacterstringsinframeworks.Representingstringsasobjectsallowsyouto usestringswhereveryouuseotherobjects.Italsoprovidesthebenefitsofencapsulation,sothatstringobjects canusewhateverencodingandstorageareneededforefficiencywhilesimplyappearingasarraysofcharacters.
Thecluster'stwopublicclasses,NSStringandNSMutableString,declaretheprogrammaticinterfacefor
returns the total number
NSStringdeclaresmethodsforfindingandcomparingstrings.Italsodeclaresmethodsforreadingnumeric values from strings, for combining strings in various ways, and for converting a string to different forms (such
as encoding and case changes).
TheApplicationKitalsousesNSParagraphStyleanditssubclassNSMutableParagraphStyletoencapsulate theparagraphorrulerattributesusedbytheNSA’ttributedStringclasses.Additionally,methodstosupport
string drawing are described
in NSString Additions, found
in the Application Kit.
NSSt ring is"toll—free bridged” with itsCore Foundation counterpart, CFStringRef. See "Toll-Free Bridging”
formore
information
on toll-free bridging.
non—editable and editable strings, respectively.
Note:Animmutablestringisatextstringthatisdefinedwhenitiscreatedandsubsequentlycannot bechanged.AnimmutablestringisimplementedasanarrayofUnicodecharacters(inotherwords,
atextstring).Tocreateandmanageanimmutablestring,usetheNSStringclass.Toconstructand manageastringthatcanbechangedafterithasbeencreated,useNSMutableString.
The objects you create using NSSt ring and NSMutableSt ring are referred to as string objects (or,when no
confusion will result, merely as strings). The term C string refers to the standard cha r * type. Because of the natureofclassclusters,stringobjectsaren’tactualinstancesoftheNSStringorNSMutableStringclasses
b u t o f o n e o f t h e i r p r i v a t e s u b c l a s s e s . A l t h o u g h a s t r i n g o b j e c t ’ s c l a s s is p r i v a t e , its i n t e r f a c e is p u b l i c , a s d e c l a r e d by these abstract superclasses, NSSt ring and NSMutableSt ring. The string classes adopt the NSCopying
andNSMutableCopyingprotocols,makingitconvenienttoconvertastringofonetypetotheother....



I'm using the latest version of tesseract, compiled from source code (Rev. 239f350a7288, Oct 14, 2014).
out.pdf

const...@doo.net

Oct 28, 2014, 10:06:19 AM
to tesser...@googlegroups.com