bitstreams


Anja Le Blanc

Sep 27, 2013, 6:36:31 AM
to dspac...@googlegroups.com
Hello All,

Now I am at the point of writing code :-)

Bitstream was not done yet, so it seems a good candidate.

Wijiti returns

<?xml version="1.0" encoding="utf-8"?>
<bitstream>
    <checkSum>350d55e27582159e9754f7b45be2fe88</checkSum>
    <checkSumAlgorithm>MD5</checkSumAlgorithm>
    <description />
    <formatDescription>image/png</formatDescription>
    <id>45</id>
    <mimeType>image/png</mimeType>
    <name>Wijiti Logo large.png</name>
    <sequenceId>1</sequenceId>
    <size>551207</size>
    <source />
    <storeNumber>0</storeNumber>
    <userFormatDescription />
</bitstream>

Hedtek got even more information.

What does a user of the API actually want?

I think the 'checkSumAlgorithm' should be an attribute of 'checkSum'.
There are <formatDescription>, <mimeType> and <userFormatDescription>. Would <mimeType> alone be sufficient? What is the use case for the others?
What does <source> contain, and does anyone want it? What is the use case for the API to expose <storeNumber> and <type>?

At the moment I would do:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<bitstream>
    <bitstreamID>3800</bitstreamID>
    <checkSum checkSumAlgorithm="MD5">4d6f6b6a1a7c5392a625120b5d8eeb22</checkSum>
    <description />
    <mimeType>application/octet-stream</mimeType>
    <name>imscp_v1p1.xsd</name>
    <sequenceID>4</sequenceID>
    <size>17639</size>
</bitstream>
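
A minimal sketch of how the layout above could be produced, with the algorithm carried as an attribute of <checkSum>. The function name `bitstream_to_xml` and the dictionary keys are hypothetical; only the element names and sample values come from the example above. It is Python rather than DSpace's Java, purely for illustration.

```python
import xml.etree.ElementTree as ET


def bitstream_to_xml(bitstream: dict) -> str:
    """Serialize a bitstream record (hypothetical dict layout) to the proposed XML."""
    root = ET.Element("bitstream")
    ET.SubElement(root, "bitstreamID").text = str(bitstream["id"])
    # The algorithm is an attribute of <checkSum>, not a sibling element.
    checksum = ET.SubElement(
        root, "checkSum", {"checkSumAlgorithm": bitstream["checksum_algorithm"]}
    )
    checksum.text = bitstream["checksum"]
    # An empty description is written as a self-closing element.
    ET.SubElement(root, "description").text = bitstream.get("description") or None
    ET.SubElement(root, "mimeType").text = bitstream["mime_type"]
    ET.SubElement(root, "name").text = bitstream["name"]
    ET.SubElement(root, "sequenceID").text = str(bitstream["sequence_id"])
    ET.SubElement(root, "size").text = str(bitstream["size"])
    return ET.tostring(root, encoding="unicode")


sample = {
    "id": 3800,
    "checksum": "4d6f6b6a1a7c5392a625120b5d8eeb22",
    "checksum_algorithm": "MD5",
    "description": "",
    "mime_type": "application/octet-stream",
    "name": "imscp_v1p1.xsd",
    "sequence_id": 4,
    "size": 17639,
}
xml_text = bitstream_to_xml(sample)
print(xml_text)
```

Note that `ET.tostring(..., encoding="unicode")` omits the XML declaration; a real endpoint would add it or let the framework do so.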

To get to the file:
Hedtek does:
  bitstream/:id:/receive

Wijiti does:
  bitstreams/:id:/download

Preferences or alternatives?

I would like to integrate stats updates at this point. A config file should indicate whether a user wants that. If switched on, a GET of a file should add a download to Solr.
(I would also like to know the source of the stats in Solr (viewed/downloaded via xmlui, rest, ...), but that is probably too great a revolution at the moment.)

Comments?

Best regards,
Anja


helix84

Sep 27, 2013, 6:48:05 AM
to Anja Le Blanc, dspac...@googlegroups.com
On Fri, Sep 27, 2013 at 12:36 PM, Anja Le Blanc
<anja.l...@googlemail.com> wrote:
> What does a user of the API actually want?

Why don't we let the user request a custom set of information like this:
bitstream/:id:/view?fields=bitstreamID,name,sequenceID

Of course, we would still provide a sensible set of defaults (fields
which are currently needed for display in the UIs) instead of the full
set. There could also be
bitstream/:id:/view?fields=all


> I think the 'checkSumAlgorithm' should be an attribute of 'checkSum'.

I would agree.


> I would like to integrate stats updates at this point.

I think the correct approach would be to make the rest module an event
dispatcher. That way the statistics modules which already are event
consumers would be notified of the events automatically.
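
A rough sketch of the dispatcher/consumer idea, assuming nothing about DSpace's actual event classes: `UsageEvent`, `EventDispatcher` and `DownloadCounter` are hypothetical names, and the counter merely stands in for a statistics consumer that would write to Solr.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class UsageEvent:
    action: str        # e.g. "download" or "view"
    bitstream_id: int
    source: str        # interface that triggered the event, e.g. "rest"


class EventDispatcher:
    """Forwards each fired event to every registered consumer."""

    def __init__(self) -> None:
        self._consumers: List[Callable[[UsageEvent], None]] = []

    def register(self, consumer: Callable[[UsageEvent], None]) -> None:
        self._consumers.append(consumer)

    def fire(self, event: UsageEvent) -> None:
        for consumer in self._consumers:
            consumer(event)


class DownloadCounter:
    """Stand-in for a statistics consumer; counts downloads per bitstream."""

    def __init__(self) -> None:
        self.downloads: Dict[int, int] = {}

    def __call__(self, event: UsageEvent) -> None:
        if event.action == "download":
            current = self.downloads.get(event.bitstream_id, 0)
            self.downloads[event.bitstream_id] = current + 1


dispatcher = EventDispatcher()
counter = DownloadCounter()
dispatcher.register(counter)

# The REST handler only fires the event; it knows nothing about statistics.
dispatcher.fire(UsageEvent("download", 45, "rest"))
dispatcher.fire(UsageEvent("download", 45, "rest"))
dispatcher.fire(UsageEvent("view", 45, "rest"))
print(counter.downloads)
```

The point of the design is that the REST module stays unaware of who consumes the events, so existing consumers pick them up without REST-specific code.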


Regards,
~~helix84

helix84

Sep 27, 2013, 8:52:00 AM
to Anja Le Blanc, Anja Le Blanc, dspac...@googlegroups.com
On Fri, Sep 27, 2013 at 2:30 PM, Anja Le Blanc <an...@bindrich.de> wrote:
> On 27/09/2013 11:48, helix84 wrote:
>>
>> On Fri, Sep 27, 2013 at 12:36 PM, Anja Le Blanc
>> <anja.l...@googlemail.com> wrote:
>>>
>>> What does a user of the API actually want?
>>
>>
>> Why don't we let the user request a custom set of information like this:
>> bitstream/:id:/view?fields=bitstreamID,name,sequenceID
>>
>> Of course, we would still provide a sensible set of defaults (fields
>> which are currently needed for display in the UIs) instead of the full
>> set. There could also be
>> bitstream/:id:/view?fields=all
>
>
> Good point. I am just not sure whether 'all' really should be all.

I'm not sure if I was clear. The client would have three choices based
on request parameters:
a) no parameters - a default set (not full) of most used attributes
would be returned
b) explicitly specified fields - return only the specified fields
c) a parameter for full set of fields: 1) so that you don't have to
specify the full list with every request and 2) to prepare for forward
compatibility (an older client might not know about a new
field added in a newer server version)
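
The three choices could be sketched roughly as follows. `DEFAULT_FIELDS`, `select_fields` and the decision to reject unknown field names are assumptions for illustration, not something settled in this thread.

```python
from typing import Optional

# Assumed default set; the real one would be whatever the UIs need for display.
DEFAULT_FIELDS = ("bitstreamID", "name", "mimeType", "size")


def select_fields(record: dict, fields_param: Optional[str] = None) -> dict:
    """Return the subset of `record` chosen by the `fields` request parameter."""
    if fields_param is None or not fields_param.strip():
        wanted = DEFAULT_FIELDS                     # a) no parameter: default set
    elif fields_param.strip() == "all":
        wanted = tuple(record)                      # c) full set, including newer fields
    else:                                           # b) explicit list
        wanted = tuple(n.strip() for n in fields_param.split(",") if n.strip())
    unknown = [name for name in wanted if name not in record]
    if unknown:
        # Assumption: unknown names are an error rather than silently ignored.
        raise ValueError("unknown fields: " + ", ".join(unknown))
    return {name: record[name] for name in wanted}


record = {
    "bitstreamID": 3800,
    "checkSum": "4d6f6b6a1a7c5392a625120b5d8eeb22",
    "description": "",
    "mimeType": "application/octet-stream",
    "name": "imscp_v1p1.xsd",
    "sequenceID": 4,
    "size": 17639,
}
print(select_fields(record))
print(select_fields(record, "bitstreamID,name,sequenceID"))
print(select_fields(record, "all"))
```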

That said, are there any concerns about fields that should be
access-controlled? I don't think bitstreams have any such fields.


>>> I would like to integrate stats updates at this point.
>>
>>
>> I think the correct approach would be to make the rest module an event
>> dispatcher. That way the statistics modules which already are event
>> consumers would be notified of the events automatically.
>
>
> Yes, I've done this previously using events (then I was mangling Wijiti).
> The complication on the stats part is that we really want to know the IP
> address of the user rather than the one of the application using the API,
> otherwise we become our only user.

Good catch. The API should accept something like X-Forwarded-For. It
doesn't have to be an HTTP header; it can be a request parameter.
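
One way this might look, as a sketch only: the parameter name `forwardedFor` and the `trusted_clients` allow-list are assumptions (without some notion of trust, any caller could spoof the address), and the header lookup assumes the exact key `X-Forwarded-For` in a plain dict.

```python
import ipaddress
from typing import Mapping, Set


def resolve_client_ip(
    remote_addr: str,
    headers: Mapping[str, str],
    params: Mapping[str, str],
    trusted_clients: Set[str],
) -> str:
    """Return the end user's address if a trusted API client forwarded one."""
    if remote_addr not in trusted_clients:
        # Untrusted callers cannot claim to act for someone else.
        return remote_addr
    forwarded = headers.get("X-Forwarded-For") or params.get("forwardedFor")
    if not forwarded:
        return remote_addr
    # By convention the first entry in X-Forwarded-For is the original client.
    candidate = forwarded.split(",")[0].strip()
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        return remote_addr
    return candidate


trusted = {"10.0.0.5"}
print(resolve_client_ip("10.0.0.5", {"X-Forwarded-For": "192.0.2.17, 10.0.0.5"}, {}, trusted))
print(resolve_client_ip("10.0.0.5", {}, {"forwardedFor": "192.0.2.18"}, trusted))
print(resolve_client_ip("203.0.113.9", {"X-Forwarded-For": "192.0.2.17"}, {}, trusted))
```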


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

Pottinger, Hardy J.

Sep 27, 2013, 10:06:51 AM
to hel...@centrum.sk, Anja Le Blanc, dspac...@googlegroups.com
So far I think this discussion is going along fine, and my only input
would be: follow your gut instincts on this, build the API *you* would
like to be using, and then we can all work from there. I've seen nothing
objectionable. One caution, which has already been mentioned, is we should
probably provide for access control on the fields in the bitstream table.
--
HARDY POTTINGER <potti...@umsystem.edu>
University of Missouri Library Systems
http://lso.umsystem.edu/~pottingerhj/
https://MOspace.umsystem.edu/
"To pay attention, this is our endless and proper work." --Mary Oliver

helix84

Sep 27, 2013, 10:17:11 AM
to Pottinger, Hardy J., Anja Le Blanc, dspac...@googlegroups.com
On Fri, Sep 27, 2013 at 4:06 PM, Pottinger, Hardy J.
<Potti...@missouri.edu> wrote:
> objectionable. One caution, which has already been mentioned, is we should
> probably provide for access control on the fields in the bitstream table.

True, this is something that pops up over and over again and should be
addressed.

But should it be addressed by the rest module or rather below the
dspace-api layer? Might this be combined with this long-overdue
DCValue overhaul?


Furthermore, we should finally agree upon a future (long-term) vision
for our metadata. Should they remain flat or be hierarchical? What
methods will be used to access them, will there be a complex API or
shall we just spit out XML/JSON/whatever and let the user deal with
it? How should data types and schema be enforced? How should access
control information be stored and enforced? What do other comparable
systems do and how successful are their approaches? And finally, why
am I not discussing this in dspace-devel instead? :) There are endless
questions and we can't answer them by designing by committee but it
would help to have an idea about the general direction we're going.

Anja Le Blanc

Sep 27, 2013, 10:56:04 AM
to dspac...@googlegroups.com, Pottinger, Hardy J., hel...@centrum.sk
That is rather philosophical for a Friday afternoon.

Authentication/Authorization certainly needs to be addressed sooner rather than later. And I am not sure what the best way forward would be. (Most of our users authenticate with Shibboleth.)

Your other questions overstrain my capabilities at the moment and any of my answers would break DSpace in a great way.

All, have a nice weekend!

Anja

Peter Dietz

Sep 27, 2013, 12:48:11 PM
to Anja Le Blanc, dspac...@googlegroups.com, Pottinger, Hardy J., Ivan Masár
My take on making the client request specific attributes versus giving them a good default: give a good but limited default, and require a parameter such as "?expand=collections" to get more, or more expensive, information.

Thus, I can render a /community/123 really cheaply if I just give you the basics (name, id, type, handle, description). If you want subCommunities, subCollections, recentlySubmittedItems, or items for that object, which are expensive to generate, you have to specifically add them to your request. The response now gives you a hint about what you can "expand" upon. I've lifted this "expand" concept from working with Atlassian/JIRA's API.


<collection>
    <metadata>
        <name>1570 Edition Selected Woodcuts (John Foxe's Actes and Monuments)</name>
        <handle>1811/24846</handle>
    </metadata>
    <collectionID>889</collectionID>
    <expand>parentCommunityIDList</expand>
    <expand>parentCommunityID</expand>
    <expand>itemIDList</expand>
    <expand>license</expand>
    <numberItems>155</numberItems>
</collection>
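
A small sketch of the "expand" idea: cheap fields are always returned, expensive ones are computed only when named in the request, and the rest are advertised as hints. `render_collection` and the `loaders` mapping are hypothetical, and the values the loaders return are made up for the example.

```python
from typing import Callable, Dict, Optional


def render_collection(
    collection: dict,
    loaders: Dict[str, Callable[[dict], object]],
    expand_param: Optional[str] = None,
) -> dict:
    """Build a response with cheap fields plus any explicitly expanded ones."""
    requested = [n.strip() for n in (expand_param or "").split(",") if n.strip()]
    unknown = [name for name in requested if name not in loaders]
    if unknown:
        raise ValueError("cannot expand: " + ", ".join(unknown))
    response = {
        "collectionID": collection["collectionID"],
        "name": collection["name"],
        "handle": collection["handle"],
    }
    for name in requested:
        # Expensive work happens only for fields the client asked for.
        response[name] = loaders[name](collection)
    # Advertise what was not expanded so the client knows what it may request.
    response["expand"] = [name for name in loaders if name not in requested]
    return response


collection = {
    "collectionID": 889,
    "name": "1570 Edition Selected Woodcuts (John Foxe's Actes and Monuments)",
    "handle": "1811/24846",
}
loaders = {
    "parentCommunityID": lambda c: 12,        # made-up value
    "itemIDList": lambda c: [101, 102, 103],  # made-up values
    "license": lambda c: "",
}
print(render_collection(collection, loaders))
print(render_collection(collection, loaders, "itemIDList"))
```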




Regarding security and authorization/authentication: it really is a tricky path to navigate. From what I read, you can get it wrong, wronger, or possibly good enough. My basic approach might be to have REST enforce that some endpoints are restricted and require an authenticated and authorized user.

We use local DSpace login. I'm wondering if we can get by with requiring SSL and having the DSpace username and password come through as parameters (either in the query string or in a header). From my reading, Jersey should be reasonably straightforward to wire up for that kind of setup.
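
As a sketch of the header variant, here is one possible reading using standard HTTP Basic authentication, which is a swapped-in technique and not something this thread has agreed on. The function name `parse_basic_auth` and the `is_secure` flag are hypothetical; it is plain Python, not Jersey.

```python
import base64
import binascii
from typing import Optional, Tuple


def parse_basic_auth(
    authorization: Optional[str], is_secure: bool
) -> Optional[Tuple[str, str]]:
    """Return (username, password) from a Basic header, or None if unusable."""
    if not is_secure or not authorization:
        # Credentials are only accepted over SSL.
        return None
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    username, separator, password = decoded.partition(":")
    if not separator:
        return None
    return username, password


header = "Basic " + base64.b64encode(b"someuser:s3cret").decode("ascii")
print(parse_basic_auth(header, is_secure=True))
print(parse_basic_auth(header, is_secure=False))
```

The returned pair would then be checked against the local DSpace login; that lookup is deliberately left out here.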

If you have a different security setup for DSpace, e.g. Shibboleth, then REST would need to accept whoever is set as the remote user and find the matching eperson; after that, I think it's the same as above.

If we have to establish a way to provide OAuth keys to trusted applications, that's a bit further than I'm planning to go at this point.

But, as some of you have said, it would be great if DSpace were smart enough to prevent sensitive information from going out just because you requested it, for example by throwing an error for: item(context = anonymous-user).getMetadata("dc.description.provenance")

Or better, item(context = anonymous-user).getMetadata("*"), would only return the appropriately authorized metadata (all except internal / provenance).
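
Both behaviours could be sketched like this. `RESTRICTED_FIELDS`, `get_metadata` and the `is_admin` flag are hypothetical stand-ins for whatever the context and authorization layer would actually provide, and the metadata values are made up.

```python
from typing import Dict, List

# Assumption: provenance is the only internal field in this example.
RESTRICTED_FIELDS = {"dc.description.provenance"}


def get_metadata(
    metadata: Dict[str, List[str]], field: str, is_admin: bool = False
) -> Dict[str, List[str]]:
    """Return metadata for `field`, hiding or refusing restricted fields."""
    if field == "*":
        # Wildcard: silently drop what the caller is not allowed to see.
        return {
            name: values
            for name, values in metadata.items()
            if is_admin or name not in RESTRICTED_FIELDS
        }
    if field in RESTRICTED_FIELDS and not is_admin:
        # Explicit request for a restricted field: fail loudly.
        raise PermissionError("not authorized to read " + field)
    return {field: metadata[field]} if field in metadata else {}


item_metadata = {
    "dc.title": ["Example item"],                            # made-up value
    "dc.description.provenance": ["Submitted by someone"],   # made-up value
}
print(get_metadata(item_metadata, "*"))
print(get_metadata(item_metadata, "*", is_admin=True))
```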



Peter Dietz

