How to create html using Tesseract hocr in iOS?


Arnold Bailey

Nov 6, 2013, 11:38:06 AM
to tesser...@googlegroups.com

I have working iOS code that uses Tesseract. I would like to create HTML using the hocr config option, but I cannot find an example of how to generate the HTML programmatically. Is there any sample code or tutorial around? I'm using Xcode 5 and the latest Tesseract-ios.

    - (NSString *)OCRPage:(NSString *)imagePath {
        //NSLog(@"imagePath: %@", imagePath);
        Tesseract *tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"eng"];
        [tesseract setImage:[UIImage imageNamed:imagePath]];
        [tesseract recognize];

        //NSLog(@"%@", [tesseract recognizedText]);
        NSString *textData = [tesseract recognizedText];

        // Announce the recognized text through VoiceOver.
        UIAccessibilityPostNotification(UIAccessibilityAnnouncementNotification, textData);
        return textData;
    }

Also, it is unclear to me how the config file is used. I don't have a configs directory under the tessdata path.


Arnold Bailey

Nov 17, 2013, 6:05:19 PM
to tesser...@googlegroups.com
Ping... Does anyone know whether hocr output can be generated with the iOS version?

zdenko podobny

Nov 18, 2013, 3:54:18 AM
to tesser...@googlegroups.com
I am not an iOS user ;-) If you are using a tool, wrapper, or third-party code, you should ask its author(s) first :-).

You did not provide much relevant information (e.g. which version of tesseract you are using, whether you are just wrapping the tesseract executable, etc.). IMO there is nothing platform-specific about hocr output: if you are able to use the tesseract API, you can call the API function GetHOCRText[1]. If you are using the tesseract executable, you need to (create and) use a hocr config file[2].

Zdenko
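
For the executable route mentioned above, here is a minimal sketch, assuming a stock Tesseract install. The hocr config is a one-line plain-text file that normally sits at `configs/hocr` under the tessdata directory, and it contains only the variable assignment below.

```
tessedit_create_hocr 1
```

The config name is then passed as the last command-line argument, for example `tesseract page.png out hocr`, where `page.png` and `out` are placeholder names. As far as I know, the 3.02 release writes the result to `out.html`, while later releases use the `.hocr` extension.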


--
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
 

Arnold Bailey

Nov 19, 2013, 11:42:46 AM
to tesser...@googlegroups.com
Thanks for the reply. I believe I need to contact the wrapper's developer, Lois Di Qual. The wrapper uses the tesseract 3.0.2 release. As far as I can tell, it does not use a configs file. It does have a setVariableValue method, which I called as follows:

    [tesseract setVariableValue:@"TRUE" forKey:@"tessedit_create_hocr"];


It doesn't generate the html output. 
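
One likely explanation, offered as an assumption and not confirmed against the wrapper's source: in the Tesseract API the `tessedit_create_hocr` variable is only consulted by the page-processing path that the command-line tool uses, while `GetUTF8Text` always returns plain text. If the wrapper's `recognizedText` calls `GetUTF8Text`, setting the variable changes nothing, and the hOCR markup has to be requested through `GetHOCRText`, as suggested earlier in the thread. A minimal sketch of such a method follows. The method name `recognizedHOCR` and the ivar name `_tesseract` are hypothetical and need to be matched to the actual wrapper code.

```objc
// Hypothetical addition to the wrapper's Tesseract.mm (Objective-C++).
// Assumes the wrapper keeps its tesseract::TessBaseAPI * in an ivar
// named _tesseract; adjust the name to match the real source.
// Also declare the method in Tesseract.h:
//     - (NSString *)recognizedHOCR;
- (NSString *)recognizedHOCR {
    // GetHOCRText takes a zero-based page number and returns a
    // heap-allocated C string that the caller must delete[].
    char *hocr = _tesseract->GetHOCRText(0);
    if (hocr == NULL) {
        // Recognition failed or no image was set.
        return nil;
    }
    NSString *html = [NSString stringWithUTF8String:hocr];
    delete[] hocr;
    return html;
}
```

With that in place, `[tesseract recognizedHOCR]` could be called after `[tesseract recognize]`, and the returned string written to a file with `writeToFile:atomically:encoding:error:`.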



