JERSEY REST in core, looking at Search


Peter Dietz

Oct 24, 2013, 1:59:52 PM
to dspac...@googlegroups.com
Hi REST-a-farians,

The Jersey-based DSpace REST API was submitted as a pull request and merged into DSpace master, so it is in place for DSpace 4.0.

It's read-only and anonymous-user only, won't expose sensitive information, and has endpoints for Community, Collection, Item, and Bitstream, which is enough to get started. I would say the code is pretty simple and clean, and easy to understand, maintain, and extend. I've also had JMeter attack all of the endpoints (JSON), and they seem to be pretty stable, i.e. 100 requests/min won't harm the site. On my laptop, I was averaging 600 requests/second. Some endpoints are really fast, and some larger collections of objects are really slow. An Amazon EC2 micro instance could only handle about 20 requests/second.

So, I wanted to add search to this, and wrote a quick-and-dirty method for item.search that used the internal DSpace Lucene index. It was really quick: I think 15 ms was the average response time while JMeter was attacking the site. However, the Lucene search index in DSpace is being deprecated with this DSpace 4.0 release in favor of Discovery/Solr, so if you want search, I'd recommend checking out the code and trying to wire that in. Also, if you have any preference on how search should work, it would be better to get that into the API before the 4.0 release.

The way I had it was pretty simple: /rest/items/search?q=global information
But I'm thinking if you wanted to search by author, then it should become: /rest/items/search?author=Herrick, John H. or /rest/items/search?author=Herrick, John H.&q=aviation
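The example queries above contain spaces and commas, which a client would normally percent-encode before sending. Here is a minimal Python sketch of how a client might build such URLs. The host and port are hypothetical, the `build_search_url` helper is invented for illustration, and the `q`, `author`, `offset` and `limit` parameters simply follow the proposal in this post rather than any released API.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical base URL: only the /rest/items/search path comes from the post.
BASE_URL = "http://localhost:8080/rest/items/search"


def build_search_url(q=None, author=None, offset=None, limit=None):
    """Return a percent-encoded search URL, skipping parameters left as None."""
    params = {"author": author, "q": q, "offset": offset, "limit": limit}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}?{query}" if query else BASE_URL


url = build_search_url(q="aviation", author="Herrick, John H.")
print(url)

# Decoding the query string recovers the original values.
print(parse_qs(urlsplit(url).query))
```

Encoding on the client side keeps the author name intact even though it contains a comma and spaces.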

There is paging via offset and limit; I haven't touched anything like sorting/ordering yet.

I also need to work on the documentation for this project. I've started this at: https://wiki.duraspace.org/display/DSDOC4x/REST+API

Lastly, I was wondering what experiences people have had building clients for REST APIs. In my sample app (dspace-rest-play), everything works really well, but I was trying to piece together a jQuery-Tree the other day and kept struggling to get the JavaScript/browser to send the appropriate Accept header (JSON), and then had to deal with cross-domain issues and JSONP. I was just wondering if anyone had recommendations.
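For comparison, here is a minimal sketch of a non-browser client that asks for JSON explicitly through the Accept header. It uses only the Python standard library. The host is hypothetical, the helper names are invented for illustration, and the search path is the one proposed earlier in this post. The browser's cross-domain restrictions do not apply to a script like this, so it can help check whether the server honours the header before debugging the JavaScript side.

```python
import json
from urllib.request import Request, urlopen


def build_json_request(url):
    """Build a GET request that asks the server for a JSON representation."""
    return Request(url, headers={"Accept": "application/json"})


def fetch_json(url, timeout=10):
    """Send the request and decode the JSON body (needs a reachable server)."""
    with urlopen(build_json_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical host; no network call is made here, we only inspect the request.
request = build_json_request("http://localhost:8080/rest/items/search?q=aviation")
print(request.get_method(), request.get_header("Accept"))
```

Calling `fetch_json` with the same URL would perform the actual request once a server is running.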

Anja Le Blanc

Oct 25, 2013, 6:27:02 AM
to dspac...@googlegroups.com
Hi Peter,

It is really good that this is now merged into DSpace. Somehow I thought it would be impossible to get anything in by the deadline for DSpace 4. Well done Peter!

This morning I finally figured out why I could not fork your repository -- I already have a DSpace fork, and GitHub somehow does not allow me to have two forks of basically the same repository(?). Does anyone know a way around that? I could fork it to some other place of course.

As for the search functionality: I would hope we could be as user friendly as possible and not require the user to write any lucene/solr/elasticsearch queries directly (at least not as the only option).
Could we have something like an abstraction layer or interface so that we can implement the search for whichever search indexer is underneath DSpace? I don't know whether DSpace already has this kind of abstraction; so far we've done our searches directly on ElasticSearch (from the web application).

The API does not list items yet; I am looking at that now. To avoid running out of memory, I introduced another configuration parameter that sets the maximum number of items returned. The query parameter 'limit' can only reduce that number, so the repository administrator can decide on a safe setting.
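The capping rule can be sketched roughly as follows. This is an illustration in Python, not the actual DSpace code: the function name is made up, and falling back to the configured maximum for missing, non-numeric, or non-positive values is an assumption of this sketch.

```python
def effective_limit(requested, max_limit):
    """Cap a client-supplied 'limit' at the administrator's configured maximum."""
    if requested is None:
        # No 'limit' query parameter: use the configured maximum.
        return max_limit
    try:
        value = int(requested)
    except (TypeError, ValueError):
        # Assumption: unparsable input falls back to the configured maximum.
        return max_limit
    if value < 1:
        # Assumption: zero or negative values are treated as invalid.
        return max_limit
    # The client can only reduce the number, never raise it.
    return min(value, max_limit)


for raw in (None, "25", "500", "abc"):
    print(repr(raw), "->", effective_limit(raw, 100))
```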
Against which repository/branch should I PR any code I write?

Best regards,
Anja


helix84

Oct 25, 2013, 7:14:56 AM
to Anja Le Blanc, dspac...@googlegroups.com
Hi Anja,

You don't need to clone the whole forked repo; just add Peter's repo
as a remote to your local repo:

git remote add peterdietz g...@github.com:peterdietz/DSpace.git
git fetch peterdietz

Then you'll be able to add remote tracking branches to your repo:

git branch --track rest-jersey peterdietz/rest-jersey

Don't forget the occasional git fetch --all to download the latest
changes from remotes.

There should be useful stuff here, too:
https://wiki.duraspace.org/display/DSPACE/Development+with+Git


Regards,
~~helix84