Extract Text From A Scanned PDF Using OCR In Java: slow processing on an Oracle Linux Server 8.6

Giuseppe Coniglio

Nov 28, 2022, 9:50:10 AM

to tesseract-ocr

Hi all,

I have implemented a Spring Boot microservice that uses tess4j 4.3.1 and pdfbox 2.0.22 on my Oracle Linux Server. The example code I followed is at https://colwil.com/how-to-extract-text-from-a-scanned-pdf-using-ocr-in-java/

When I run the code from my IDE on a Windows PC and invoke the local service, execution is fast: "Tesseract.doOCR" takes 8 seconds. But when I call the API to invoke the same method in the microservice on the server, "Tesseract.doOCR" is slow. The PDF file passed as the parameter is the same in both cases.
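To make the two measurements comparable, the call could be timed the same way on both machines. The following is only a minimal sketch: the class `OcrTiming`, the `Timed` holder, and the `time` helper are hypothetical names, and the workload in `main` is a placeholder that sleeps instead of running OCR. In the real service the lambda would wrap the actual tess4j call.

```java
import java.util.concurrent.Callable;

public class OcrTiming {

    /** Holds the value returned by the timed call and the elapsed wall-clock time. */
    public static final class Timed<T> {
        public final T value;
        public final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the call once and measures it with System.nanoTime(). */
    public static <T> Timed<T> time(Callable<T> call) throws Exception {
        long start = System.nanoTime();
        T value = call.call();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        return new Timed<>(value, elapsedMillis);
    }

    public static void main(String[] args) throws Exception {
        // Environment details worth comparing between the Windows PC and the server.
        System.out.println("os.name = " + System.getProperty("os.name"));
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("cpus = " + Runtime.getRuntime().availableProcessors());

        // Placeholder workload (assumption): in the service this would wrap the
        // real call, e.g. time(() -> tesseract.doOCR(image)) on the tess4j instance.
        Timed<String> result = time(() -> {
            Thread.sleep(50);
            return "recognized text";
        });
        System.out.println("doOCR took " + result.elapsedMillis + " ms");
    }
}
```

Running this on both the Windows PC and the Oracle Linux server, with the same PDF page behind the lambda, would give numbers measured the same way along with the OS, Java version, and CPU count of each machine.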

Any idea?

Thanks :-)

Giuseppe Coniglio

Nov 29, 2022, 9:47:39 AM

to tesseract-ocr

The code is from https://medium.com/gft-engineering/creating-an-ocr-microservice-using-tesseract-pdfbox-and-docker-155beb7f2623