[DuraSpace JIRA] (DS-3873) PDFBox runs on more than PDF bitstreams

Chris Herron (Atmire) (DuraSpace JIRA)

Mar 20, 2018, 4:42:01 PM
to dspace-...@googlegroups.com
Chris Herron (Atmire) created an issue
 
DSpace / Bug DS-3873
PDFBox runs on more than PDF bitstreams
Issue Type: Bug
Affects Versions: 6.2, 6.1, 6.0, 7.0, 6.3
Assignee: Unassigned
Components: filter-media
Created: 20/Mar/18 3:41 PM
Priority: Minor
Reporter: Chris Herron (Atmire)

The PDFBoxThumbnail media filter is currently configured to run on more than just PDF bitstreams. This consistently results in the following error in the logs:

ERROR filtering, skipping bitstream #7f142dd0-fca6-4533-b966-19b1354e9a9d java.io.IOException: Error: Header doesn't contain versioninfo
java.io.IOException: Error: Header doesn't contain versioninfo
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:244)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:966)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:868)
        at org.dspace.app.mediafilter.PDFBoxThumbnail.getDestinationStream(PDFBoxThumbnail.java:80)
        at org.dspace.app.mediafilter.MediaFilterServiceImpl.processBitstream(MediaFilterServiceImpl.java:358)
        at org.dspace.app.mediafilter.MediaFilterServiceImpl.filterBitstream(MediaFilterServiceImpl.java:286)
        at org.dspace.app.mediafilter.MediaFilterServiceImpl.filterItem(MediaFilterServiceImpl.java:180)
        at org.dspace.app.mediafilter.MediaFilterServiceImpl.applyFiltersItem(MediaFilterServiceImpl.java:158)
        at org.dspace.app.mediafilter.MediaFilterCLITool.main(MediaFilterCLITool.java:315)

This can be fixed by modifying PDFBoxThumbnail.getInputMIMETypes() to return only "Adobe PDF" instead of all ImageIO reader MIME types.
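
The difference between the two return values can be sketched in isolation. This is a minimal, standalone illustration, not the actual DSpace change: the class InputMimeTypesSketch and its helper methods are hypothetical, and it assumes the restricted list would carry the standard PDF MIME type "application/pdf" (the issue itself names the format as "Adobe PDF").

```java
import java.util.Arrays;
import javax.imageio.ImageIO;

// Hypothetical standalone sketch; none of these names come from DSpace.
public class InputMimeTypesSketch {

    // Mirrors the behaviour the issue describes: every MIME type that an
    // installed ImageIO reader understands (image formats, not PDF-specific).
    static String[] allImageIoReaderMimeTypes() {
        return ImageIO.getReaderMIMETypes();
    }

    // Assumed restricted variant: advertise PDF input only.
    static String[] pdfOnlyMimeTypes() {
        return new String[] {"application/pdf"};
    }

    // Hypothetical helper: would a filter advertising these input types
    // be applied to a bitstream with the given MIME type?
    static boolean accepts(String[] advertised, String bitstreamMimeType) {
        return Arrays.asList(advertised).contains(bitstreamMimeType);
    }

    public static void main(String[] args) {
        System.out.println("ImageIO reader types: "
                + Arrays.toString(allImageIoReaderMimeTypes()));
        System.out.println("PDF-only types: "
                + Arrays.toString(pdfOnlyMimeTypes()));
        System.out.println("PDF-only accepts image/png: "
                + accepts(pdfOnlyMimeTypes(), "image/png"));
    }
}
```

With the restricted list, an image bitstream would no longer match the filter's advertised input types, so it would not be handed to PDDocument.load() in the first place.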

This message was sent by Atlassian JIRA (v7.3.3#73014-sha1:d5be8da)

Anonymous (DuraSpace JIRA)

Mar 21, 2018, 10:19:01 AM
to dspace-...@googlegroups.com
Issue was automatically transitioned when Chris Herron created pull request #1995 in GitHub
Change By: Chris Herron
Status: Received -> Code Review Needed

Tim Donohue (LYRASIS JIRA)

May 4, 2021, 4:06:01 PM
to dspace-...@googlegroups.com
Tim Donohue closed an issue as Duplicate
Change By: Tim Donohue
Resolution: Duplicate
Status: Code Review Needed -> Closed
This message was sent by Atlassian Jira (v8.13.2#813002-sha1:c495a97)

Tim Donohue (LYRASIS JIRA)

May 4, 2021, 4:06:02 PM
to dspace-...@googlegroups.com