[Dspace-tech] Filter media error

Navalkishore H Sarda

Aug 24, 2015, 2:24:25 PM
to dspac...@lists.sourceforge.net

 

filter-media is throwing an error for this particular item, which was added on March 9th, 2005:

https://ritdml.rit.edu/dspace/handle/1850/431

 

It just looks like a normal PDF to me.

 

SKIPPED: bitstream 1113 because 'USHumanResources_2.pdf.txt' already exists
ERROR filtering, skipping bitstream #1123 java.io.IOException: You do not have permission to extract text
java.io.IOException: You do not have permission to extract text
        at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:140)
        at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:99)
        at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:108)
        at org.dspace.app.mediafilter.MediaFilter.processBitstream(MediaFilter.java:157)
        at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:244)
        at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:207)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:184)
        at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:155)
Creating search index:
java.lang.Throwable: Warning: You did not close the PDF Document
        at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:384)
        at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
        at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83)
        at java.lang.ref.Finalizer.access$100(Finalizer.java:14)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:160)

 

Any insight would be a great help. Because of this item, the indexing step that follows filtering is not working, as the log shows.

Thanks in advance.

 

- Naval

 

Dalal, Dhaval

Aug 24, 2015, 2:24:27 PM
to dspac...@lists.sourceforge.net, nhs...@rit.edu
Hi Naval,
Are you running ./filter-media -vf as the "dspace" user?
Dhaval

Navalkishore H Sarda

Aug 24, 2015, 2:24:28 PM
to George Kozak, da...@bnl.gov, dspac...@lists.sourceforge.net

 

Thanks for all your help!

The PDF that was submitted had been locked. It has now been unlocked, and filter-media is working fine.

 

- Naval


From: George Kozak [mailto:gs...@cornell.edu]
Sent: Thursday, March 10, 2005 3:30 PM
To: Navalkishore H Sarda
Subject: Re: [Dspace-tech] Filter media error

 

Naval:

I've seen this same error for PDFs that were uploaded to DSpace after the user had set them up with password protection so that no one could change them.

***************************
George Kozak
Digital Library Specialist
Library Systems
501 Olin Library
Cornell University
607-255-8924
***************************
gs...@cornell.edu
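
A note on why a "locked" PDF can trigger this error: in the PDF standard security handler, an encrypted document carries a set of user-access permission flags (the /P entry), and copying or extracting text is a separate flag from modifying the document. The sketch below decodes such a flag value. The class name and the two sample values are hypothetical and are not taken from the item in this thread; the bit positions follow the permissions table in the PDF specification, and the assumption is that the text extractor refuses to run when the extract bit is cleared.

```java
// Hypothetical helper for illustration; not part of DSpace or PDFBox.
class PdfPermissionBits {

    // The PDF specification numbers bits from 1, so "bit 3" is 1 << 2.
    static final int PRINT    = 1 << 2; // bit 3: print the document
    static final int MODIFY   = 1 << 3; // bit 4: modify the contents
    static final int EXTRACT  = 1 << 4; // bit 5: copy or extract text and graphics
    static final int ANNOTATE = 1 << 5; // bit 6: add or modify annotations

    // True when the given permission bit is set in the /P value.
    static boolean allows(int p, int flag) {
        return (p & flag) != 0;
    }

    // One-line summary of the four basic permissions.
    static String describe(int p) {
        return "print=" + allows(p, PRINT)
                + " modify=" + allows(p, MODIFY)
                + " extract=" + allows(p, EXTRACT)
                + " annotate=" + allows(p, ANNOTATE);
    }

    public static void main(String[] args) {
        // Sample /P values chosen for illustration only.
        System.out.println(describe(-44));   // modify cleared, extract still set
        System.out.println(describe(-3904)); // print, modify, extract and annotate all cleared
    }
}
```

A document locked only against changes would keep the extract bit set, so a failure like the one above suggests the restriction on the file was broader than "no one can change it".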

Gary Browne

Aug 24, 2015, 4:14:41 PM
to dspac...@lists.sourceforge.net

Hi all

 

Has anyone seen this one before when running filter-media?

 

java.io.IOException: Invalid header signature; read 7015536635646467195, expected -2226271756974174256
      at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:125)
      at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:120)
      at org.textmining.text.extraction.WordExtractor.extractText(WordExtractor.java:32)
      at org.dspace.app.mediafilter.WordFilter.getDestinationStream(WordFilter.java:97)
      at org.dspace.app.mediafilter.MediaFilter.processBitstream(MediaFilter.java:162)
      at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:287)
      at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:250)
      at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:224)
      at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:195)
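
The two numbers in that message are the first eight bytes of the file interpreted as a 64-bit integer, so they can be turned back into bytes to see what the file actually starts with. The sketch below does that, assuming the value was read in little-endian order; the class and method names are hypothetical and not part of POI or DSpace. Under that assumption the expected value decodes to the OLE2 compound-document signature, while the value that was read decodes to the ASCII text {\rtf1\a, which would suggest the bitstream is an RTF file stored under a Word format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of DSpace or POI.
class HeaderSignatureDecoder {

    // Rebuild the 8 bytes as they would appear at the start of the file
    // (assumption: the signature was read as a little-endian long).
    static byte[] toFileBytes(long signature) {
        return ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(signature)
                .array();
    }

    // Space-separated upper-case hex, e.g. "D0 CF ...".
    static String toHex(long signature) {
        StringBuilder sb = new StringBuilder();
        for (byte b : toFileBytes(signature)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Printable ASCII as-is, everything else shown as '.'.
    static String toAscii(long signature) {
        StringBuilder sb = new StringBuilder();
        for (byte b : toFileBytes(signature)) {
            sb.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long read = 7015536635646467195L;      // from the exception message
        long expected = -2226271756974174256L; // from the exception message
        System.out.println("expected: " + toHex(expected));
        System.out.println("read:     " + toHex(read) + "  " + toAscii(read));
    }
}
```

If the decoded bytes do point at RTF, checking the bitstream's assigned format in DSpace against the file's real contents would be a reasonable first step before taking the question to the POI list.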

 

 

Cheers

Gary

 

 

Gary Browne
Development Programmer
Library IT Services
University of Sydney
Australia
ph: 61-2-9351 5946

 

Mark Diggory

Aug 24, 2015, 4:14:42 PM
to Gary Browne, dspac...@lists.sourceforge.net
That's pretty deep inside POI. I'd recommend posting a similar question to the POI project and seeing what kind of response you get.


This might be a start:

Cheers,
Mark


Mark R. Diggory
~~~~~~~~~~~~~
DSpace Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology

