Bill
> On Aug 30, 11:04 pm, Bill Janssen <bill.jans...@gmail.com> wrote:
>> Properly formatted hOCR should "just work" with wkhtmltopdf.
>
> I gave it a try. Didn't come out as I expected. Was attempting to
> get PDFs of the scanned documents, keep the original look of the
> scanned images, but make them searchable.
If you just want to make the documents searchable, try
http://jwilk.net/software/ocrodjvu
In my opinion, for some applications DjVu is much better than PDF. At
least you will keep the original look.
Regards
JSB
--
Prof. dr hab. Janusz S. Bien - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/
You'll need to write a tool that puts the page image into the PDF,
then draws "invisible" text on top of that. UpLib, for instance, does
this, if you put a document into an UpLib repository and then pull it
out in PDF format.
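
As a rough illustration of the idea (not UpLib's actual code), here is a
minimal Python sketch that builds only the page content stream: it paints
the page image first, then places each hOCR word on top using PDF text
rendering mode 3 ("3 Tr"), which neither fills nor strokes the glyphs.
The function names, the image resource name "Im0", and the font resource
name "F1" are all made up for the example. It assumes hOCR "bbox x0 y0 x1 y1"
values in pixels with the origin at the top left, and it assumes the text
fits the font's encoding. Wrapping the stream in a complete PDF file
(image XObject, font, page objects) is left out.

```python
import re

# hOCR stores word boxes in the title attribute as "bbox x0 y0 x1 y1" (pixels).
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


def pdf_escape(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def page_content_stream(img_w, img_h, dpi, words,
                        image_name="Im0", font_name="F1"):
    """Return a content stream: page image first, invisible text on top.

    img_w, img_h: scanned image size in pixels.
    dpi:          scan resolution, used to convert pixels to PDF points.
    words:        list of (hocr_title, text) pairs, one per recognized word.
    image_name and font_name are hypothetical resource names; the page's
    resource dictionary would have to define them.
    """
    scale = 72.0 / dpi                      # pixels -> points (1/72 inch)
    page_w, page_h = img_w * scale, img_h * scale
    ops = [
        "q",                                             # save graphics state
        f"{page_w:.2f} 0 0 {page_h:.2f} 0 0 cm",         # stretch image to page
        f"/{image_name} Do",                             # paint the scan
        "Q",                                             # restore graphics state
        "BT",                                            # begin text object
        "3 Tr",                                          # render mode 3: invisible
    ]
    for title, text in words:
        match = BBOX_RE.search(title)
        if match is None:
            continue                        # skip words without a bounding box
        x0, y0, x1, y1 = map(int, match.groups())
        size = max((y1 - y0) * scale, 1.0)  # font size from box height
        x = x0 * scale
        y = (img_h - y1) * scale            # flip: PDF origin is bottom left
        ops.append(f"/{font_name} {size:.2f} Tf")
        ops.append(f"1 0 0 1 {x:.2f} {y:.2f} Tm")        # absolute position
        ops.append(f"({pdf_escape(text)}) Tj")
    ops.append("ET")                                     # end text object
    return "\n".join(ops)


if __name__ == "__main__":
    # One word on a US Letter page scanned at 300 dpi.
    print(page_content_stream(2550, 3300, 300,
                              [("bbox 300 600 900 660", "Hello")]))
```

Because the text is drawn after the image and is never painted, the page
looks exactly like the scan, while viewers can still search and select
the words at roughly the right positions.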
Bill