Hi Enrico,
Thanks for the further feedback.
All the German documents I have tried have worked fine with
Tesseract/OCRFeeder.
The problem I'm having is that I am unable to complete the steps
outlined on the Tesseract3 training page because of errors I don't
understand.
This is why it would be helpful if I could speak to a developer
directly.
Thanks to a very generous donation by the Spielberg Foundation, the
National Yiddish Book Center has been able to digitise and make
available the 11,000+ volumes currently online at
archive.org.
The Center has now pledged to begin a project to translate titles
from this collection, but needs the ability to OCR these texts.
Although I cannot, of course, make any promises (I am only a
volunteer), it is reasonable to assume that with the amount of time
and effort that has gone into this project so far, the software with
the best chance of fulfilling this OCR need would also be in a good
position for possible funding considerations. I think that software
might be Tesseract.
Again, I cannot promise anything, but if someone were able to help
address the specific
problems I'm encountering, I could then pass this back to the Center
and give a much fuller endorsement of this product (if, in fact, it
can deliver the desired end result).
I would, therefore, be very grateful if someone from the development
team or someone who has done a good deal of new language
bootstrapping would contact me to help me understand where I've gone
wrong.
Please understand I am not asking anyone to do my work for me. I am
simply asking for advice. Although I am fairly confident working
from the command line, I am not a programmer and certainly do not
understand the specifics of this particular software package.
My thanks in advance.
Will