to tesseract-ocr
I want to build an online application that takes a large PDF file and extracts the information that is valuable to the user. Speed is the key: I basically want to offer a minimal free service that builds up an e-mail address list. When I OCRed one of these files in Foxit, it took about 20 minutes. Here is my question: most of the information I need is in the bookmarks, but not all of it. One piece I need is an address, which I could get either by calling an API such as Google Maps or by doing a partial OCR. I can see OCRing 10-12 pages to get my info. Does anyone have ideas about which approach would be fastest?