tesseract + emscripten?


Nick Williams

Sep 13, 2013, 3:52:05 PM
to tesser...@googlegroups.com
Hi all,

I've had an idea for a while about utilising getUserMedia/file input in a browser to capture a picture of text, and then use OCR to get the text. It seems there are no readily available JS OCR libs, so I've started thinking about using emscripten to compile tesseract to JS. 

I'd just like to know: is this viable? Modern browsers can create binary blobs, so I assume it would be possible, provided the tesseract API supports it.

Thanks in advance,
Nick

Vanuan

Sep 16, 2013, 1:54:11 PM
to tesser...@googlegroups.com
There's no other way to know than to try.

Sven Pedersen

Sep 16, 2013, 3:04:51 PM
to tesser...@googlegroups.com
Here is a tesseract interface via NodeJS:

But I think you're a psycho! :-)
Running OCR in a browser would be excessively resource intensive at present. Perhaps send the blob to your server (making sure to limit its size) and send back an asynchronous JSON text reply? There are some OCR server systems.
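The server side of this suggestion could look roughly like the following minimal Python sketch (not from the thread). It rejects empty or oversized uploads before doing any OCR work and answers with a JSON body. The function name `handle_ocr_upload`, the 5 MiB cap, and the injected `ocr` callable are hypothetical placeholders; the asynchronous part would live in the browser, which simply waits for this JSON reply.

```python
import json
from typing import Callable, Tuple

# Hypothetical cap on upload size (5 MiB); choose whatever suits your server.
MAX_UPLOAD_BYTES = 5 * 1024 * 1024


def handle_ocr_upload(
    body: bytes,
    ocr: Callable[[bytes], str],
    max_bytes: int = MAX_UPLOAD_BYTES,
) -> Tuple[int, str]:
    """Return (HTTP status code, JSON body) for an uploaded image blob.

    `ocr` is a hypothetical stand-in for the real OCR backend (for example
    a wrapper around the tesseract command-line tool). It is passed in so
    the size check and JSON handling can be exercised on their own.
    """
    if not body:
        return 400, json.dumps({"error": "empty upload"})
    if len(body) > max_bytes:
        # Refuse oversized blobs before spending any CPU on OCR.
        return 413, json.dumps({"error": f"upload exceeds {max_bytes} bytes"})
    text = ocr(body)
    return 200, json.dumps({"text": text})


if __name__ == "__main__":
    # Usage with a fake OCR function standing in for tesseract.
    status, payload = handle_ocr_upload(b"fake image bytes", ocr=lambda data: "hello")
    print(status, payload)
```

Injecting the OCR step keeps the upload validation independent of how (or where) tesseract is actually run.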

Google Docs API is an option:

Just some thoughts. Many of us would like to hear what you can come up with.
--Sven


--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en



--
“All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”

Nick Williams

Sep 19, 2013, 4:26:54 AM
to tesser...@googlegroups.com
:D Psychotic I may be, but Atwood's law compels me to do it! I'm committed to getting it working in the client; I don't want these fancy-pants Node bindings, they're cheating if you ask me!

I'll report back if/when I have made some headway with this. I've been put in touch with one of the emscripten guys by Brendan Eich (!!) so maybe the ball will get rolling :)

Jochen Brüggemann

Nov 4, 2014, 3:21:38 AM
to tesser...@googlegroups.com
Any news here? I would be very interested in your result, because in our business application sending the file to the server is not an option due to security/privacy concerns. So client-side OCR in JS would be great.


Jean Millerat

Mar 14, 2016, 1:09:32 PM
to tesseract-ocr, jochen...@gmail.com
Hey there,


On Tuesday, November 4, 2014 09:21:38 UTC+1, Jochen Brüggemann wrote:
Any news here? I would be very interested in your result, because in our business application sending the file to the server is not an option due to security/privacy concerns. So client-side OCR in JS would be great.


We were in a similar situation and figured out how to compile tesseract with emscripten. Our how-to documentation is available in this post on my blog:

http://www.akasig.org/2016/03/14/how-to-run-tesseract-from-web-browsers-with-the-help-of-emscripten/

Enjoy!

--
Jean