Bill
> On Aug 30, 11:04 pm, Bill Janssen <bill.jans...@gmail.com> wrote:
>> Properly formatted hOCR should "just work" with wkhtmltopdf.
>
> I gave it a try.  Didn't come out as I expected.  Was attempting to
> get PDFs of the scanned documents, keep the original look of the
> scanned images, but make them searchable.  
If you just want to make the documents searchable, try
http://jwilk.net/software/ocrodjvu
In my opinion, for some applications DjVu is much better than PDF. At
least you will keep the original look.
Regards
JSB
-- 
Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/
You'll need to write a tool that puts the page image into the PDF,
then draws "invisible" text on top of that.  UpLib, for instance, does
this, if you put a document into an UpLib repository and then pull it
out in PDF format.
Bill