On Thu, Jan 30, 2014 at 9:07 AM, Bodnar Robert <
bodnar...@bcucluj.ro> wrote:
> I have again a problem with an error, could you help me pls figure out
> what is the problem with the software?
Hi Robert,
the problem is not with the software, that is, not with DSpace itself.
DSpace uses a library called Apache PDFBox to extract the text from
PDFs. This library can't extract the text from this particular file.
Most commonly this happens because the PDF is either damaged or in a
format PDFBox can't work with (PDF is a container that can hold various
types of content). Perhaps this thread can shed light on this particular
error, even though it might not help you resolve it:
http://forum.openkm.com/viewtopic.php?f=3&t=8187
If you need to (e.g. because you have many files in this format), you
might try asking on the PDFBox mailing list why this happens and how to
work around it (that will almost surely involve changing how you
generate the PDFs).
Anyway, if PDFBox reports an error for a particular PDF, DSpace skips
indexing it and continues with the next file.
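To illustrate that skip-and-continue behaviour, here is a rough sketch
in Python. It is not DSpace's actual code (DSpace and PDFBox are Java),
and the names index_all and fake_extract are made up for the example;
fake_extract just stands in for whatever library does the text
extraction.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def index_all(
    paths: Iterable[str],
    extract_text: Callable[[str], str],
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Extract text for each file; skip any file whose extraction fails."""
    indexed: Dict[str, str] = {}
    skipped: List[Tuple[str, str]] = []
    for path in paths:
        try:
            indexed[path] = extract_text(path)
        except Exception as err:
            # The extractor failed on this one file: record it and move on
            # to the next file instead of aborting the whole run.
            skipped.append((path, str(err)))
            continue
    return indexed, skipped


def fake_extract(path: str) -> str:
    """Hypothetical extractor that fails for names containing 'damaged'."""
    if "damaged" in path:
        raise ValueError("cannot parse PDF")
    return "text of " + path


indexed, skipped = index_all(["a.pdf", "damaged.pdf", "b.pdf"], fake_extract)
print(sorted(indexed))  # files that were indexed
print(skipped)          # files that were skipped, with the error message
```

So one bad PDF shows up as an error in the log, but the remaining files
still get indexed.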
Regards,
~~helix84
Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette