Removing Thumbnails in DSpace 5


Jeffrey Sheldon

Feb 29, 2016, 6:02:07 PM
to DSpace Community

DSpace folks,

Two questions for the community...

1. I'd like to remove a large number of thumbnail bitstreams that were generated for various bitstream types (JPEG, PDF, text, etc.). I'm doing this to simply fall back on a generic icon view for various bitstream types and be more selective about which types have generated thumbnails.

In comments dating back to 2013, helix84 suggests setting the boolean 'deleted' column of the 'bitstream' table to true and then running "dspace cleanup -v", saying that this will also remove the association in bundle2bitstream.

Nabble ref:
http://dspace.2283337.n4.nabble.com/How-to-purge-remove-old-thumbnails-tp4667289p4667326.html

Unfortunately, cleanup instead reports that the bitstream is still referenced in bundle2bitstream and does not remove it. Here's the exact output I receive:

./dspace cleanup -v
- Deleting bitstream information (ID: 90841)
- Deleting bitstream record from database (ID: 90841)
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "$2" on table "bundle2bitstream"
Detail: Key (bitstream_id)=(90841) is still referenced from table "bundle2bitstream".
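Why the cleanup stops: setting bitstream.deleted does not touch the bundle2bitstream row that still points at the bitstream, so when cleanup tries to delete the bitstream record, the database's foreign-key constraint rejects it. Below is a minimal sqlite3 model of that situation, assuming a simplified two-table schema (only the columns needed here) and treating cleanup as a plain DELETE of flagged rows. That is a simplification for illustration, not DSpace's actual cleanup code or its PostgreSQL schema.

```python
import sqlite3


def run_cleanup_model():
    """Return True if the flagged bitstream row was deleted, False if the FK blocked it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    # Simplified stand-ins for the DSpace tables (the real ones have more columns)
    conn.execute("CREATE TABLE bitstream ("
                 " bitstream_id INTEGER PRIMARY KEY,"
                 " deleted INTEGER NOT NULL DEFAULT 0)")
    conn.execute("CREATE TABLE bundle2bitstream ("
                 " bundle_id INTEGER,"
                 " bitstream_id INTEGER REFERENCES bitstream(bitstream_id))")
    conn.execute("INSERT INTO bitstream (bitstream_id) VALUES (90841)")
    conn.execute("INSERT INTO bundle2bitstream VALUES (1, 90841)")

    # Step 1 of the suggested recipe: flag the bitstream as deleted
    conn.execute("UPDATE bitstream SET deleted = 1 WHERE bitstream_id = 90841")
    # Step 2, modelled as a plain DELETE of flagged rows; the link row still exists
    try:
        conn.execute("DELETE FROM bitstream WHERE deleted = 1")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()  # discards the uncommitted transaction
```

In this model the delete only succeeds once nothing references the bitstream any more, which is why the discussion below turns to removing the bitstream from its bundle through the DSpace API rather than flagging the row.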

Since manually removing the references in bundle2bitstream is discouraged, are there any other suggestions? (Using "./dspace itemupdate" would require preparing a substantial archival structure, which I'm currently ruling out.)

2. A second question: Thumbnail generation for TIFF images in DSpace 5 via the media filter produced Java exceptions, so all TIFFs were skipped. I read in one document that ImageMagick and Ghostscript are the preferred conversion utilities in DSpace 5, and I have switched to them from our default thumbnail settings. Is this true, and have I made the right choice? I would rather not change approaches just because a vague error is being produced.

Ref: See under "PLEASE NOTE"
http://wiki.lib.sun.ac.za/index.php/SUNScholar/Media_Filters/5.X


Regards,
-Jeff

Andrea Schweer

Feb 29, 2016, 6:19:41 PM
to Jeffrey Sheldon, DSpace Community
Hi Jeffrey,

On 01/03/16 12:02, Jeffrey Sheldon wrote:
> 1. I'd like to remove a large number of thumbnail bitstreams that were generated for various bitstream types (JPEG, PDF, text, etc.). I'm doing this to simply fall back on a generic icon view for various bitstream types and be more selective about which types have generated thumbnails.

The "right" way to do this is probably via a curation task, assuming you
have Java skills available locally. The documentation is here:
https://wiki.duraspace.org/display/DSDOC5x/Curation+System#CurationSystem-Writingyourowntasks
-- there is also an option to use Jython (a Python flavour) if that's
more convenient:
https://wiki.duraspace.org/display/DSDOC5x/Curation+tasks+in+Jython

I have a script on github that removes PNG thumbnails for bitstreams
that also have a JPG thumbnail:
https://github.com/UoW-IRRs/DSpace-Scripts/blob/master/src/main/java/nz/ac/waikato/its/irr/scripts/RemovePNGThumbnailsForPDFs.java
We had custom thumbnails for a while and switched back to the DSpace default
when we upgraded to 5.x; the script was used as a one-off during the
upgrade. It's a command-line script, to be invoked via dsrun after
dropping the jar file into [dspace]/lib. I believe I went that route
(instead of writing a curation task) because it was faster to write this
way and we were confident that it would really only run once. It should
be easy enough to rip out the processItem method and put it into a
curation task.
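The heart of such a script is deciding which thumbnails to drop. Here is a hypothetical plain-Python sketch of that selection step only. It is not taken from Andrea's script, and it assumes thumbnails are named "<original name>.jpg" and "<original name>.png" (DSpace's JPEG thumbnail filter appends ".jpg" to the original name; the PNG naming is an assumption about the custom setup). The actual removal would still go through the DSpace API.

```python
def png_thumbnails_to_remove(thumbnail_names):
    """Return PNG thumbnail names whose original also has a JPG thumbnail.

    Hypothetical helper: assumes names of the form "<original>.png" and
    "<original>.jpg" for the same original bitstream.
    """
    names = set(thumbnail_names)
    to_remove = []
    for name in thumbnail_names:
        if name.lower().endswith(".png"):
            original = name[:-len(".png")]
            # Keep the PNG unless a JPG thumbnail exists for the same original
            if original + ".jpg" in names:
                to_remove.append(name)
    return to_remove


# Usage: only a.pdf has both thumbnails, so only its PNG is selected
print(png_thumbnails_to_remove(["a.pdf.png", "a.pdf.jpg", "b.pdf.png"]))
```

Keeping a PNG when no JPG exists avoids leaving an item with no thumbnail at all.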

cheers,
Andrea

--
Dr Andrea Schweer
Lead Software Developer, ITS Information Systems
The University of Waikato, Hamilton, New Zealand
+64-7-837 9120

Joan Caparros

Sep 15, 2016, 11:57:32 AM
to DSpace Community, jshe...@ksu.edu
I'm trying to figure out how to delete a bitstream. I now have this curation task, but I'm not able to delete the file: I can't find out how to call a delete or remove method here. Do you know how I can delete it?


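from org.dspace.core import Constants
from org.dspace.content.factory import ContentServiceFactory
from org.dspace.curate import ScriptedTask

class MyTask(ScriptedTask):
    def init(self, curator, taskName):
        print "initializing with Jython"

    def performDso(self, dso):
        print "perform on dso "
        if dso.getType() == Constants.ITEM:
            print "Item '" + dso.getName() + "' (" + dso.getHandle() + ")"
            itemService = ContentServiceFactory.getInstance().getItemService()
            # All bundles named THUMBNAIL on this item
            myBundles = itemService.getBundles(dso, "THUMBNAIL")
            for i in myBundles:
                for k in i.getBitstreams():
                    print "I WANT TO DELETE THIS FILE " + k.getName()
        return 0  # Curator.CURATE_SUCCESS

    def performId(self, context, id):
        print "perform on id %s" % (id)
        return 0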

On Tuesday, 1 March 2016 0:19:41 UTC+1, Andrea Schweer wrote:

Terry Brady

Sep 15, 2016, 12:21:21 PM
to Joan Caparros, DSpace Community, jshe...@ksu.edu

--
You received this message because you are subscribed to the Google Groups "DSpace Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-community+unsubscribe@googlegroups.com.
To post to this group, send email to dspace-community@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-community.
For more options, visit https://groups.google.com/d/optout.



--
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
425-298-5498 (Seattle, WA)

helix84

Sep 15, 2016, 12:40:19 PM
to Joan Caparros, Terry Brady, DSpace Community
You can also refer to the Javadocs here:

http://demo.dspace.org/javadocs/


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

Joan Caparros

Sep 15, 2016, 1:08:41 PM
to DSpace Community, joanca...@gmail.com, jshe...@ksu.edu
Thank you, Terry. I think that could work in earlier DSpace versions, but not in DSpace 6: Bitstream doesn't have this method there.

https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/content/Bitstream.java

On Thursday, 15 September 2016 18:21:21 UTC+2, Terry Brady wrote:

Terry Brady

Sep 15, 2016, 1:36:31 PM
to Joan Caparros, DSpace Community, jshe...@ksu.edu

Joan Caparros

Sep 16, 2016, 7:05:26 AM
to DSpace Community, joanca...@gmail.com, jshe...@ksu.edu
Thank you for all your help. I've tried almost everything; now I'm using the BitstreamService, which I think is the correct way, but I'm getting a ConcurrentModificationException. I'm completely lost, since I'm not modifying my ArrayList...

initializing with Jython
perform on dso 
Item '10- Façana posterior' (10687/35746)
DELETE aplan_436-A_1266_00010.tif
Exception: null
java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
at java.util.ArrayList$Itr.next(ArrayList.java:831)
at org.hibernate.collection.internal.AbstractPersistentCollection$IteratorProxy.next(AbstractPersistentCollection.java:810)
at org.python.core.JavaIterator.__iternext__(JavaIterator.java:18)
at org.python.pycode._pyx0.performDso$3(<script>:29)
at org.python.pycode._pyx0.call_function(<script>)
at org.python.core.PyTableCode.call(PyTableCode.java:167)
at org.python.core.PyBaseCode.call(PyBaseCode.java:307)
at org.python.core.PyBaseCode.call(PyBaseCode.java:198)
at org.python.core.PyFunction.__call__(PyFunction.java:482)
at org.python.core.PyMethod.instancemethod___call__(PyMethod.java:237)
at org.python.core.PyMethod.__call__(PyMethod.java:228)
at org.python.core.PyMethod.__call__(PyMethod.java:218)
at org.python.core.PyMethod.__call__(PyMethod.java:213)
at org.python.core.PyObject._jcallexc(PyObject.java:3626)
at org.python.proxies.__builtin__$MyTask$0.performDso(Unknown Source)
at org.dspace.curate.ResolvedTask.perform(ResolvedTask.java:88)
at org.dspace.curate.Curator$TaskRunner.run(Curator.java:537)
at org.dspace.curate.Curator.curate(Curator.java:252)
at org.dspace.curate.Curator.curate(Curator.java:199)
at org.dspace.curate.CurationCli.main(CurationCli.java:229)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)


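from org.dspace.core import Constants
from org.dspace.content.factory import ContentServiceFactory
from org.dspace.curate import Curator, ScriptedTask

class MyTask(ScriptedTask):
    def init(self, curator, taskName):
        print "initializing with Jython"

    def performDso(self, dso):
        print "perform on dso "
        if dso.getType() == Constants.ITEM:
            print "Item '" + dso.getName() + "' (" + dso.getHandle() + ")"
            itemService = ContentServiceFactory.getInstance().getItemService()
            bitstreamService = ContentServiceFactory.getInstance().getBitstreamService()
            myBundles = itemService.getBundles(dso, "ORIGINAL")
            for i in myBundles:
                myBitstreams = i.getBitstreams()
                for k in myBitstreams:
                    print "DELETE " + k.getName()
                    # DSpace 6: deletion goes through the service layer
                    bitstreamService.delete(Curator.curationContext(), k)
        return 0  # Curator.CURATE_SUCCESS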
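A likely cause of the exception: in DSpace 6, BitstreamService.delete appears to also detach the bitstream from its bundles, i.e. it removes it from the very list returned by i.getBitstreams() while the for loop is still walking that list, and Java's ArrayList iterator is fail-fast. The usual remedy is to iterate over a snapshot, e.g. "for k in list(i.getBitstreams()):". The plain-Python model below reproduces both behaviours without any DSpace classes; FailFastList, Bundle and delete_bitstream are hypothetical stand-ins, and the claim that delete() mutates the bundle's list is an assumption consistent with the stack trace above.

```python
class FailFastList(object):
    """Toy model of java.util.ArrayList's fail-fast iterator (hypothetical stand-in)."""

    def __init__(self, items):
        self._items = list(items)
        self._mod_count = 0  # bumped on every structural change, like ArrayList.modCount

    def remove(self, item):
        self._items.remove(item)
        self._mod_count += 1

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        expected = self._mod_count
        cursor = 0
        while cursor < len(self._items):  # mirrors Itr.hasNext()
            if self._mod_count != expected:  # mirrors checkForComodification()
                raise RuntimeError("ConcurrentModificationException")
            yield self._items[cursor]
            cursor += 1


class Bundle(object):
    def __init__(self, names):
        self.bitstreams = FailFastList(names)


def delete_bitstream(bundle, name):
    # Assumption: like BitstreamService.delete in DSpace 6, deleting a
    # bitstream also removes it from its bundle's bitstream list.
    bundle.bitstreams.remove(name)


def delete_all_unsafe(bundle):
    for name in bundle.bitstreams:  # walks the live list that delete mutates
        delete_bitstream(bundle, name)


def delete_all_safe(bundle):
    for name in list(bundle.bitstreams):  # snapshot taken before any deletion
        delete_bitstream(bundle, name)
```

With three bitstreams, delete_all_unsafe fails on the second iteration after removing only the first one, while delete_all_safe empties the bundle.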

Joan Caparros

Sep 20, 2016, 4:08:57 AM
to DSpace Community, joanca...@gmail.com, jshe...@ksu.edu
Any help? Can anyone try it?

I would appreciate it.

Thank you
Joan

On Friday, 16 September 2016 13:05:26 UTC+2, Joan Caparros wrote:

Joan Caparros

Sep 26, 2016, 5:58:12 AM
to DSpace Community, joanca...@gmail.com, jshe...@ksu.edu
For those who have been following this thread, I want to share my work.

Here you will find a working curation task for DSpace-6_x that deletes all files in a bundle (in this example, THUMBNAIL) and also deletes the empty (orphaned) bundles:


Have fun.
Best,
Joan Caparrós
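The link to the task itself is not preserved in this copy of the thread. The two steps described (empty the target bundle, then drop bundles left empty) can be sketched in plain Python over a hypothetical dict-based model of one item's bundles. This is not Joan's code; in a real curation task each removal would be a DSpace service call rather than a list operation.

```python
def purge_bundle(bundles, bundle_name):
    """Delete every bitstream in bundles called `bundle_name`, then drop empty bundles.

    `bundles` is a hypothetical model of one item: a list of dicts like
    {"name": "THUMBNAIL", "bitstreams": ["a.pdf.jpg"]}. Returns the deleted names.
    """
    deleted = []
    for bundle in [b for b in bundles if b["name"] == bundle_name]:
        # Iterate over a copy: removing from the list being iterated is the
        # same mistake that raised ConcurrentModificationException in Jython.
        for bitstream in list(bundle["bitstreams"]):
            bundle["bitstreams"].remove(bitstream)
            deleted.append(bitstream)
    # Drop bundles left without bitstreams ("orphans"), modifying the list in place
    bundles[:] = [b for b in bundles if b["bitstreams"]]
    return deleted
```

Running it twice on the same item is harmless: the second call finds no matching bundle and returns an empty list. Note that this sketch drops any bundle that is empty, not only the one it just emptied.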



On Tuesday, 20 September 2016 10:08:57 UTC+2, Joan Caparros wrote: