Authentication error when depositing via SWORDv2


MALMQUIST Hrafn

Apr 9, 2018, 11:27:12 AM
to dspac...@googlegroups.com

Hello all



I am getting an error response from DSpace when trying to deposit content via its SWORDv2 API.


I am using the Python SWORD client library (https://github.com/swordapp/python-client-sword2).


What I find particularly confusing is the fact that when I try to deposit using curl, everything goes smoothly:


curl -i --data-binary "@strategic_plan_2016.pdf" -H "Content-Disposition:attachment; filename=strategic_plan_2016.pdf" -H "Content-Type:application/pdf" -H "Packaging:http://purl.org/net/sword/package/Binary" -u hrafn.m...@ed.ac.uk:********** -X POST http://test.digitalpreservation.is.ed.ac.uk/swordv2/collection/123456789/2

The Python script that generates the error is copied below.

One caveat: the SSL certificate on the server is broken, which might be an issue (https://github.com/swordapp/python-client-sword2/issues/9).

However I get a 403 response when trying to deposit the files via the Python sword client. The edit-media file (http://test.digitalpreservation.is.ed.ac.uk/swordv2/edit-media/e9598d4d-ba1c-4710-95b9-000b8ce30772) gives this error:

<atom:title xmlns:atom="http://www.w3.org/2005/Atom">ERROR</atom:title>
<atom:updated xmlns:atom="http://www.w3.org/2005/Atom">2018-04-09T10:39:20Z</atom:updated>
<atom:generator xmlns:atom="http://www.w3.org/2005/Atom" uri="http://www.dspace.org/ns/sword/2.0/" version="2.0">dspac...@myu.edu</atom:generator>
<sword:treatment>Processing failed</sword:treatment>
<atom:summary xmlns:atom="http://www.w3.org/2005/Atom">No plugin can disseminate the requested formats</atom:summary>
<sword:verboseDescription>
org.swordapp.server.SwordError: No plugin can disseminate the requested formats
    at org.dspace.sword2.SwordDisseminatorFactory.getContentInstance(SwordDisseminatorFactory.java:112)
    at org.dspace.sword2.MediaResourceManagerDSpace.getItemResource(MediaResourceManagerDSpace.java:114)
    at org.dspace.sword2.MediaResourceManagerDSpace.getMediaResourceRepresentation(MediaResourceManagerDSpace.java:229)
    at org.swordapp.server.MediaResourceAPI.get(MediaResourceAPI.java:82)
    at org.swordapp.server.MediaResourceAPI.get(MediaResourceAPI.java:33)
    at org.swordapp.server.servlets.MediaResourceServletDefault.doGet(MediaResourceServletDefault.java:35)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:622)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:474)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:624)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:349)
    at org.apache.coyote.ajp.AjpProcessor.service(AjpProcessor.java:478)
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:789)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1437)
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:748)
</sword:verboseDescription>
<atom:link xmlns:atom="http://www.w3.org/2005/Atom" rel="alternate" type="text/html" href="http://test.digitalpreservation.is.ed.ac.uk/contact"/>
</sword:error>
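
For anyone triaging a response like the one above from a script rather than a browser, the following is a minimal sketch of pulling the title, treatment and summary out of a SWORD error document with Python's standard library. The function name `summarize_sword_error` is hypothetical, the sample document is abbreviated, and its root element (including the `href` value) is an assumption, since the opening tag is not shown in the response pasted above.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
SWORD = "http://purl.org/net/sword/terms/"  # SWORD 2.0 terms namespace


def summarize_sword_error(xml_text):
    """Return the title, treatment and summary of a SWORD error document."""
    root = ET.fromstring(xml_text)
    ns = {"atom": ATOM, "sword": SWORD}

    def text_of(path):
        # Look only at direct children of the root element.
        node = root.find(path, ns)
        return node.text.strip() if node is not None and node.text else None

    return {
        "title": text_of("atom:title"),
        "treatment": text_of("sword:treatment"),
        "summary": text_of("atom:summary"),
    }


# Abbreviated sample; the root element and its href are assumptions.
SAMPLE = """<sword:error xmlns:sword="http://purl.org/net/sword/terms/"
             xmlns:atom="http://www.w3.org/2005/Atom"
             href="http://example.org/hypothetical-error-uri">
  <atom:title>ERROR</atom:title>
  <sword:treatment>Processing failed</sword:treatment>
  <atom:summary>No plugin can disseminate the requested formats</atom:summary>
</sword:error>"""

info = summarize_sword_error(SAMPLE)
print(info["treatment"], "-", info["summary"])
```

The summary is usually the quickest field to read; the verbose description carries the server-side stack trace and is mostly useful to the repository administrator.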

I even tried adding the verify=False argument to the request post which should disable SSL certificate checking but I get the same error. Any ideas on what might be causing this issue?

Unless it is the SSL certificate, I can't understand why depositing with the same user name that works with curl gives an authentication error with Python requests.

Hrafn Malmquist
digital library development
Edinburgh University


Python Script
---------------------------------------

from sword2 import Connection, exceptions
import requests
import os
import urllib


# Collection IRI. This variable was not defined in the original snippet; it is
# assumed here to be the same collection used in the curl command above.
destination_path = "http://test.digitalpreservation.is.ed.ac.uk/swordv2/collection/123456789/2"

c = Connection("http://test.digitalpreservation.is.ed.ac.uk/swordv2/servicedocument", user_name="hrafn.m...@ed.ac.uk", user_pass="**********")

# Atom entry carrying the item metadata. It has to be defined before c.create() uses it.
entry = '<?xml version="1.0"?>' \
        '<entry xmlns="http://www.w3.org/2005/Atom" xmlns:dcterms="http://purl.org/dc/terms/">' \
        '    <generator uri="http://bitbucket.org/beno/python-sword2" version="0.1"/>' \
        '    <dcterms:rights.copyright xmlns:atom="http://www.w3.org/2005/Atom">This content may be under copyright. Researchers are responsible for determining the appropriate use or reuse of materials.</dcterms:rights.copyright>' \
        '    <dcterms:title xmlns:atom="http://www.w3.org/2005/Atom">Strategic Plan DB</dcterms:title>' \
        '       <atom:title xmlns:atom="http://www.w3.org/2005/Atom">Strategic Plan Atom</atom:title>' \
        '       <dcterms:date.issued xmlns:atom="http://www.w3.org/2005/Atom">2018</dcterms:date.issued>' \
        '       <atom:updated xmlns:atom="http://www.w3.org/2005/Atom">2018-04-06T04:08:41.425884</atom:updated>' \
        '       <dcterms:relation.ispartofseries xmlns:atom="http://www.w3.org/2005/Atom">Central Records Registry - ESTATES</dcterms:relation.ispartofseries>' \
        '       <dcterms:description.abstract xmlns:atom="http://www.w3.org/2005/Atom"/>' \
        '       <dcterms:contributor.author xmlns:atom="http://www.w3.org/2005/Atom">University of Edinburgh (Scottish University)</dcterms:contributor.author>' \
        '</entry>'

# Step 1: create a metadata-only item; the receipt carries the Edit-media IRI.
entry_receipt = c.create(
    col_iri=destination_path,
    in_progress=True,
    metadata_entry=entry,
)

# Step 2: POST the binary file to the Edit-media IRI (this is the call that returns 403).
headers = {
    'Content-Type': 'application/pdf',  # str(mimetypes.guess_type("strategic_plan_2016.pdf")),
    # 'Content-MD5': str(md5sum),
    # 'Packaging': 'http://purl.org/net/sword/package/Binary',
    'Content-Length': str(os.path.getsize("strategic_plan_2016.pdf")),
    # urllib.quote is Python 2; on Python 3 it is urllib.parse.quote.
    'Content-Disposition': "attachment; filename=%s" % urllib.quote(os.path.basename("strategic_plan_2016.pdf")),
}

with open("strategic_plan_2016.pdf", "rb") as data:
    content = data.read()

receipt = requests.post(entry_receipt.edit_media, headers=headers, data=content, auth=("hrafn.m...@ed.ac.uk", "**********"), verify=False)

---------------------------------------

Tim Donohue

Apr 10, 2018, 10:10:45 AM
to MALMQUIST Hrafn, dspac...@googlegroups.com
Hello Hrafn,

You may want to check the log files on the DSpace server ([dspace]/log/dspace.log.[date]) to see if further information is given on the 403 response you are seeing.

The error you are getting from the "edit-media" file may be *unrelated* to the initial 403 response, as that error seems to result from DSpace being unable to re-disseminate the deposited file. It specifically states: "No plugin can disseminate the requested formats". That error only occurs if DSpace has trouble disseminating an object, and it has nothing to do with the deposit process.

So, my guess is that the real error behind the 403 response likely is in the DSpace server logs (or maybe the Tomcat logs).  If you need more information on finding the error in the logs, take a look at https://wiki.duraspace.org/display/DSPACE/Troubleshoot+an+error

I'd recommend sending the error stack to this mailing list once you find it.  Hopefully it provides more information so that someone on this list can help.

One final note: I'd recommend double-checking that your "curl" command and your Python script are sending the same request. I'm not a Python coder myself, but it looks to me like you've commented out the "Packaging: http://purl.org/net/sword/package/Binary" header in your Python script, while it is included in your "curl" command.
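
One way to make that comparison mechanical is to diff the two header sets. The sketch below is illustrative only: `diff_headers` is a hypothetical helper, the header values are copied from the curl command and the Python script earlier in the thread, and the `Content-Length` value is a placeholder.

```python
def diff_headers(reference, candidate):
    """Compare two header dicts case-insensitively.

    Returns (missing, extra, changed): names present only in `reference`,
    names present only in `candidate`, and names whose values differ.
    """
    ref = {name.lower(): value.strip() for name, value in reference.items()}
    cand = {name.lower(): value.strip() for name, value in candidate.items()}
    missing = sorted(set(ref) - set(cand))
    extra = sorted(set(cand) - set(ref))
    changed = sorted(name for name in set(ref) & set(cand) if ref[name] != cand[name])
    return missing, extra, changed


# Headers set explicitly with -H in the curl command.
curl_headers = {
    "Content-Disposition": "attachment; filename=strategic_plan_2016.pdf",
    "Content-Type": "application/pdf",
    "Packaging": "http://purl.org/net/sword/package/Binary",
}

# Headers set in the Python script (Packaging is commented out there).
script_headers = {
    "Content-Type": "application/pdf",
    "Content-Length": "12345",  # placeholder for the real file size
    "Content-Disposition": "attachment; filename=strategic_plan_2016.pdf",
}

missing, extra, changed = diff_headers(curl_headers, script_headers)
print("missing from script:", missing)
print("only in script:", extra)
print("different values:", changed)
```

curl adds `Content-Length` on its own, so that entry is not a real difference. Headers are also only part of the request: the curl command posts to the collection IRI while the script posts to the edit-media IRI, so the target URLs are worth comparing too.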


Tim



--
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech...@googlegroups.com.
To post to this group, send email to dspac...@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

--
Tim Donohue
Technical Lead for DSpace & DSpaceDirect
DuraSpace.org | DSpace.org | DSpaceDirect.org

MALMQUIST Hrafn

Apr 11, 2018, 5:32:14 AM
to tdon...@duraspace.org, dspac...@googlegroups.com

Hello Tim


Thank you for taking the time to reply and for good suggestions.


The Edit-media error occurs before I hand over any data about whether or not I intend to deposit files (see attached simple-test.py). In this script I only hand over XML-formatted metadata (the entry variable), because this gives me the Edit-media IRI that I need in order to deposit the files. In simple-test.py I am in fact reproducing the process on lines 252-268 of https://github.com/artefactual/archivematica-storage-service/blob/stable/0.11.x/storage_service/locations/models/dspace.py, with a view to making line 297 work.


The DSpace 6.2 log shows no error at this point (see attached dspace.log).


Python SWORDv2 Client log for the same process doesn't show an error either (see attached sword-client.log)


So running simple-test.py results in an item being created in DSpace (under Submissions) with the declared metadata, and no error appears to be logged.


If the generated Edit-media IRI is opened, the "No plugin can disseminate the requested formats" error is displayed (which does not show up in logs). If I attempt to deposit files using this Edit-media IRI I get 403 errors which do show up in the SWORD client log as:


DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): test.digitalpreservation.is.ed.ac.uk
DEBUG:urllib3.connectionpool:http://test.digitalpreservation.is.ed.ac.uk:80 "POST /swordv2/edit-media/a454b207-dea5-4fcb-ba80-1618317d10f0 HTTP/1.1" 403 996

and in the DSpace log as:

2018-04-10 15:07:06,202 INFO  org.dspace.sword2.MediaResourceManagerDSpace @ hrafn.m...@ed.ac.uk:session_id=0:replace_failed_authorisation:user=hrafn.m...@ed.ac.uk,on_behalf_of=none
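
Entries like this are easy to miss because they are logged at INFO level rather than ERROR. The following is a rough sketch of scanning a log for them with Python's standard library. The line format in the regular expression is inferred from the single entry quoted above, so it is an assumption rather than a documented DSpace format; the function name, the second sample line and the example.org addresses are hypothetical.

```python
import re

# Pattern inferred from the one log entry quoted in this thread (an assumption).
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<logger>\S+)\s+@\s+"
    r"(?P<user>[^:]*):session_id=(?P<session>[^:]*):"
    r"(?P<action>[^:]+):(?P<details>.*)$"
)


def find_authorisation_failures(lines):
    """Return parsed entries whose action ends in 'failed_authorisation'."""
    failures = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("action").endswith("failed_authorisation"):
            failures.append(match.groupdict())
    return failures


sample = [
    "2018-04-10 15:07:06,202 INFO  org.dspace.sword2.MediaResourceManagerDSpace"
    " @ user@example.org:session_id=0:replace_failed_authorisation:"
    "user=user@example.org,on_behalf_of=none",
    # Hypothetical unrelated entry, included only to show it is skipped.
    "2018-04-10 15:07:05,100 INFO  org.example.SomeOtherClass"
    " @ user@example.org:session_id=0:some_other_action:detail=1",
]

failures = find_authorisation_failures(sample)
for hit in failures:
    print(hit["timestamp"], hit["logger"], hit["action"], hit["details"])
```

In practice the list of lines would come from reading [dspace]/log/dspace.log.[date] line by line.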


Now, the fact that the curl command works and also a simple equivalent submission using the python requests library (see attached request.py) tells me that the SWORD server is functioning correctly and that this can't be a proper permissions issue.

Best regards, Hrafn



From: Tim Donohue <tdon...@duraspace.org>
Sent: 10 April 2018 15:10:31
To: MALMQUIST Hrafn
Cc: dspac...@googlegroups.com
Subject: Re: [dspace-tech] Authentication error when depositing via SWORDv2
 
dspace.log
sword-client.log
simple-test.py
request.py

MALMQUIST Hrafn

Apr 11, 2018, 9:27:38 AM
to tdon...@duraspace.org, dspac...@googlegroups.com

Hello


I'm just following up to add that I can replicate the error on http://demo.dspace.org.


Sending in a metadata-only entry is mostly fine (see working.py). The first few runs went straight through. After that, the SWORD client generated an error, but the deposited item materialised nonetheless (for instance: http://demo.dspace.org/xmlui/handle/10673/80).


Example of an Edit-IRI http://demo.dspace.org/swordv2/edit/334112ee-077b-4192-9ba3-8606e399aa4d

and the Edit-Media-IRI gives the same error:

http://demo.dspace.org/swordv2/edit-media/334112ee-077b-4192-9ba3-8606e399aa4d


To authenticate

user: dspacede...@gmail.com

pass: dspace


Any thoughts?


Hrafn


From: dspac...@googlegroups.com <dspac...@googlegroups.com> on behalf of MALMQUIST Hrafn <Hrafn.M...@ed.ac.uk>
Sent: 11 April 2018 10:32:05
To: tdon...@duraspace.org
working.py

Tim Donohue

Apr 13, 2018, 11:18:22 AM
to MALMQUIST Hrafn, dspac...@googlegroups.com
Hello Hrafn,

I've had a bit of time to dig on this today.  That "No plugin can disseminate the requested formats" error seems to be the result of calling a SWORDv2 path *without passing along an expected header*.

So, for example, I can easily reproduce it via "curl", if I simply do a GET against an existing item in the demo site.  (For this example, I'm using this existing item: http://demo.dspace.org/xmlui/handle/10673/5 which has an internal ID of "1fd633f5-5df9-4b50-b23f-bb1f951e57a4")

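For example, a simple GET request against that item's "edit-media" path (the same command as the one shown next, but without the "Accept" header) will throw that "No plugin can disseminate the requested formats" error.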


However, if I do the same GET request, but tell it I want "atom+xml" format, then the request *succeeds* and I get back an ATOM response with metadata about that Item:

curl -i -H "Accept:application/atom+xml" -u dspacede...@gmail.com:[password] -X GET http://demo.dspace.org/swordv2/edit-media/1fd633f5-5df9-4b50-b23f-bb1f951e57a4

So, that error doesn't seem to be a bug, but instead is an improper call to the "edit-media" path, as that path expects that you will pass along an "Accept" header in your request. Based on the DSpace SWORDv2 disseminator configuration, there are a limited number of valid "Accept" headers that DSpace supports by default -- they are configured here: https://github.com/DSpace/DSpace/blob/dspace-6_x/dspace/config/modules/swordv2-server.cfg#L254 
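
For readers working from Python rather than curl, the same request can be sketched with the standard library. This is an illustration under assumptions, not a tested client: `build_edit_media_get` is a hypothetical helper, the credentials are placeholders, and the request is only built here, not sent.

```python
import base64
import urllib.request


def build_edit_media_get(edit_media_iri, user, password, accept="application/atom+xml"):
    """Build (but do not send) a GET request that carries an Accept header."""
    credentials = ("%s:%s" % (user, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        edit_media_iri,
        headers={
            "Accept": accept,  # the header the edit-media path expects
            "Authorization": "Basic " + token,  # HTTP Basic auth, like curl -u
        },
        method="GET",
    )


req = build_edit_media_get(
    "http://demo.dspace.org/swordv2/edit-media/1fd633f5-5df9-4b50-b23f-bb1f951e57a4",
    "user@example.org",  # placeholder credentials
    "secret",
)
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))

# To actually send it:
#     with urllib.request.urlopen(req) as response:
#         print(response.status, response.read()[:200])
```

Leaving out the `Accept` entry in the headers dict gives the Python equivalent of the failing request.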

My suspicion here is that some of your testing is passing this "Accept" header along properly...while other scripts/tests may not be.  Either that, or this "No plugin can disseminate the requested formats" error is not the core problem you are encountering.  In which case, we may need more examples of reproducing the core problem (ideally via "curl" commands, as with "curl" it is much easier to see the headers/etc that are being passed via the request).

Good luck,

- Tim


MALMQUIST Hrafn

Apr 17, 2018, 9:05:04 AM
to tdon...@duraspace.org, dspac...@googlegroups.com

Hello Tim


Thanks again for taking the time to look at this.


Yes, you are of course correct that attempting to GET the Edit-Media URI without a proper Accept header will throw this "No plugin can disseminate the requested formats" error. I should have realised beforehand that the Edit-Media URI isn't intended for human consumption. I am new to SWORDv2 and I was trying to trace the 403 error, so it threw me off.


I've identified a more likely source of my troubles. I'm using the SWORDv2 Python Client library (https://github.com/swordapp/python-client-sword2), and my 31-line Python script (attached dspace-demo.py), which creates a metadata item in DSpace and then attempts to upload a single binary file, gets the same 403 error (this is roughly following https://github.com/swordapp/python-client-sword2/wiki/BasicUsage).


In any case, I'm assuming the problem is with the Python Client library, and I'm not too keen on trying to trace it further, because my alternative solution is to just post the binary using the requests Python library and subsequently update metadata/permissions using the DSpace REST API.


Best regards, Hrafn





From: dspac...@googlegroups.com <dspac...@googlegroups.com> on behalf of Tim Donohue <tdon...@duraspace.org>
Sent: 13 April 2018 16:18:08
dspace-demo.py

MALMQUIST Hrafn

Apr 23, 2018, 11:33:00 AM
to tdon...@duraspace.org, dspac...@googlegroups.com

It just occurred to me that between DSpace versions 5 and 6 the authentication method for the REST API changed.


Any possibility that the authentication for SWORDv2 changed as well?


Hrafn


Sent: 17 April 2018 14:03:30

Tim Donohue

Apr 23, 2018, 11:59:57 AM
to MALMQUIST Hrafn, dspac...@googlegroups.com
Hello Hrafn,

No, the REST API and SWORDv2 interfaces are entirely separate modules/codebases. So, while the authentication method for REST API changed between 5.x and 6.x (to essentially better align it with other DSpace modules), there were no changes to authentication in SWORDv2.

And again, SWORDv2 seems to work perfectly fine via "curl".  My best guess is that the Python library you are using is somehow sending an unexpected request to DSpace. Unfortunately, I don't know Python, so I'm not sure what exactly could be the problem in that library.

Tim

MALMQUIST Hrafn

Apr 23, 2018, 12:14:51 PM
to tdon...@duraspace.org, dspac...@googlegroups.com

Hi Tim


Yeah, I realise this seems to be a client library issue. I've more or less decided to go with the REST API instead of SWORDv2.


Hrafn


From: Tim Donohue <tdon...@duraspace.org>
Sent: 23 April 2018 16:59:43