[Clarification request] Is it possible to let Tesseract generate three output files, i) text, ii) hOCR, iii) PDF, in a *single* run?

Tom

Sep 16, 2014, 4:03:22 AM
to tesser...@googlegroups.com
I wish to generate the three meaningful output formats, at least hOCR and PDF, in one run (call).

Questions:

  1. Is it possible to generate hOCR and PDF in one go with the present (master branch) version?
  2. Would it become possible if I wrote a patch, or do you foresee special coding problems that would make the single-run approach too complicated?

Dovhani Foneworx

Sep 16, 2014, 7:48:57 AM
to tesser...@googlegroups.com
I think you can, if you write your own app that makes use of Tesseract.

Quan Nguyen

Sep 16, 2014, 9:33:47 PM
to tesser...@googlegroups.com
You can use the new ResultRenderer API in v3.03 to generate different output formats simultaneously.

Shree Devi Kumar

Sep 16, 2014, 10:09:00 PM
to tesser...@googlegroups.com
Quan,

Can it also be done with the command-line version?

Shree

Quan Nguyen

Sep 16, 2014, 11:00:55 PM
to tesser...@googlegroups.com
Sure, if the program is coded for it.

zdenko podobny

Sep 17, 2014, 3:25:45 AM
to tesser...@googlegroups.com
At the moment the tesseract executable allows only one output per run.
It would be a trivial change to allow multiple outputs.

Zdenko

shree

Sep 27, 2014, 6:06:39 AM
to tesser...@googlegroups.com
Thanks, Zdenko. It is good to be able to get txt, hocr and pdf from a single run.

Tom

Oct 9, 2014, 8:47:23 AM
to tesser...@googlegroups.com
I noticed that the recent HEAD version in git (October 2014) can be used with parameters

tesseract file.png file pdf hocr

to generate everything I want

  • file.txt
  • file.pdf
  • file.hocr

in one go

Many thanks for that!
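
For anyone scripting this, the invocation above can be wrapped in a small shell function. This is only a sketch: the function name `ocr_all_formats` and the `TESSERACT` override variable are made up for illustration, and it assumes a build where the `pdf` and `hocr` config names can be combined on one command line, as with the October 2014 HEAD described in this message. Whether the plain-text file is also written by default may differ between builds, so check the output of your own install.

```shell
# Hypothetical helper (not part of Tesseract): run one OCR pass over an image
# and request PDF and hOCR output together.
# TESSERACT is an assumed override variable so a different binary (or a
# dry-run command such as echo) can be substituted.
ocr_all_formats() {
    image="$1"
    # Output base = image path without its last extension (scan.png -> scan).
    # Assumes the file name actually has an extension.
    base="${image%.*}"
    # Same argument order as in the message above:
    #   tesseract <image> <outputbase> pdf hocr
    ${TESSERACT:-tesseract} "$image" "$base" pdf hocr
}

# Example use (commented out so that sourcing this file has no side effects):
#   for f in scans/*.png; do ocr_all_formats "$f"; done
```

Setting `TESSERACT=echo` prints the command that would be run instead of executing it, which is a cheap way to check the arguments before processing a large batch.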
