On Wed, Aug 16, 2023 at 12:44:38PM -0700, DSpace Community wrote:
> DSpace does not have an OCR engine. It is only able to index PDFs (or
> other electronic files) if they have been previously OCR'ed by a different
> system.
Or if they contained machine-readable text to begin with.
So: a PDF that was rendered from a word-processing document (for
example) probably contains text that can be extracted and indexed. A
PDF that contains only images of paper documents will not, unless the
imaging software or some other tool has OCRed the images and added a
text layer to the PDF.
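If you want a rough way to tell the two kinds of PDF apart before
depositing, something like the following might do as a first pass. It
is only a heuristic sketch in Python, not how DSpace itself extracts
text: the function name pdf_probably_has_text is made up for this
example, and it assumes an unencrypted PDF whose content streams are
either uncompressed or Flate-compressed. It looks for the PDF
text-showing operators (Tj, TJ) outside of image streams.

```python
import re
import zlib

# Each stream object: its dictionary, then "stream" EOL <data> "endstream".
STREAM_RE = re.compile(rb"<<(.*?)>>\s*stream\r?\n(.*?)endstream", re.DOTALL)
# An image XObject's dictionary carries /Subtype /Image.
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")
# A string or array operand followed by a text-showing operator (Tj or TJ).
SHOW_TEXT_RE = re.compile(rb"[)>\]]\s*T[jJ]\b")


def _inflate(data: bytes) -> bytes:
    """Try to Flate-decompress stream data; fall back to the raw bytes."""
    try:
        return zlib.decompressobj().decompress(data)
    except zlib.error:
        return data


def pdf_probably_has_text(pdf_bytes: bytes) -> bool:
    """Heuristic (hypothetical helper): True if any non-image stream
    appears to draw text. Assumes no encryption and no filters other
    than FlateDecode; it can be wrong in either direction."""
    for match in STREAM_RE.finditer(pdf_bytes):
        dictionary, data = match.group(1), match.group(2)
        if IMAGE_RE.search(dictionary):
            continue  # pixel data, not page content
        if SHOW_TEXT_RE.search(_inflate(data)):
            return True
    return False
```

Read the file in binary mode and pass the bytes in, for example
pdf_probably_has_text(open("thesis.pdf", "rb").read()), where
thesis.pdf is a placeholder name. A False result suggests an
image-only scan that would need OCR before its contents could be
indexed; a True result only says that some text is present, not that
it is complete or accurate.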
--
Mark H. Wood
Lead Technology Analyst
University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu