Hi Kerry,
There is a Skip mode option (-s) for the filter-media command:
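A minimal sketch of building such an exclusion list. It assumes the DSpace 6.x-style `filter-media` CLI, where `-s` takes either a comma-separated list of handles or `@/path/to/file` naming a file with one handle per line. Check the `filter-media -h` help output for your version before relying on this. The handles, the skip-file location, the `[dspace]` install path, and the helper names `write_skip_file`/`filter_media_command` are all hypothetical.

```python
from pathlib import Path
import tempfile


def write_skip_file(handles, path):
    """Write one handle per line (the format assumed for `-s @file`)."""
    path = Path(path)
    path.write_text("\n".join(handles) + "\n", encoding="utf-8")
    return path


def filter_media_command(skip_file, dspace_bin="[dspace]/bin/dspace"):
    """Build (but do not run) a filter-media argv that skips the listed items.

    `dspace_bin` is a placeholder: replace [dspace] with your install directory.
    """
    return [dspace_bin, "filter-media", "-s", "@" + str(skip_file)]


if __name__ == "__main__":
    # Hypothetical handles of the items whose PDFs break Tika indexing.
    bad_items = ["123456789/101", "123456789/202"]
    skip = write_skip_file(bad_items, Path(tempfile.gettempdir()) / "filter-media-skip.txt")
    cmd = filter_media_command(skip)
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

For one or two items the comma-separated form (`-s 123456789/101,123456789/202`) is simpler. A file is easier to maintain as more problem items turn up.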
--
Sean
From: dspac...@googlegroups.com <dspac...@googlegroups.com> on behalf of Kerry Bouchard <k.bou...@tcu.edu>
Date: Wednesday, May 5, 2021 at 3:20 PM
To: DSpace Technical Support <dspac...@googlegroups.com>
Subject: [dspace-tech] How do I create an exclusion list for filter-media?
We are running into the problem described here: http://dspace.2283337.n4.nabble.com/Filter-media-on-PDFs-exported-from-Outlook-causes-a-TikaException-error-and-prevents-Items-from-inde-td4683489.html. The *.pdf.txt files that the PDF Text Extractor media filter generates for a couple of PDFs in our repository cause indexing to fail, not just for the PDF full text but for all of the associated metadata. (In our case the PDFs were not exported from Microsoft Outlook mail folders, but I'm seeing the same "org.apache.tika.exception.TikaException: Failed to parse an email message" error in the DSpace log file.)
The posting at the URL above mentions a workaround: creating an exclusion list for filter-media. But I can't find any documentation on how to create an exclusion list. Can someone point me to it?
Thanks, Kerry
Thank you!
-Kerry