Using Discovery in Rest search


Anja Le Blanc

Dec 6, 2013, 10:50:04 AM
to dspac...@googlegroups.com
Hello,

I've now extended Peter's Lucene search with the option to use DSpace Discovery for search queries.
Queries can now look like:
/items/search?author=burke&publisher=Cambridge&expand=metadata&order_desc=title&offset=3


DSpace administrators can configure (in rest.cfg) which fields users can search on and which term the query should use for each field. The sort fields can be configured in the same way.
(https://github.com/AnjaLeBlanc/DSpace/blob/search/dspace/config/modules/rest.cfg).
There is also an endpoint
/items/search/help
which explains how to use the search and lists the fields that can be searched.
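
As a rough client-side illustration of the query format above, the following Python sketch assembles such a search URL. The base URL http://localhost:8080/rest is an assumption borrowed from the test URLs used later in this thread, and build_search_url is a hypothetical helper, not part of DSpace. The parameter names are the ones from the example query.

```python
from urllib.parse import urlencode

# Assumed base URL of a local DSpace REST webapp; adjust for your installation.
BASE_URL = "http://localhost:8080/rest"


def build_search_url(fields, expand=None, order_desc=None, offset=None):
    """Build an /items/search URL from field/value pairs (hypothetical helper)."""
    params = list(fields.items())
    if expand is not None:
        params.append(("expand", expand))
    if order_desc is not None:
        params.append(("order_desc", order_desc))
    if offset is not None:
        params.append(("offset", offset))
    # urlencode escapes spaces and other reserved characters in the values.
    return BASE_URL + "/items/search?" + urlencode(params)


url = build_search_url(
    {"author": "burke", "publisher": "Cambridge"},
    expand="metadata",
    order_desc="title",
    offset=3,
)
print(url)
```

Which field names are accepted depends on what the administrator has configured in rest.cfg, so a client would normally check /items/search/help first.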

If someone would like to play with it, it is here: https://github.com/AnjaLeBlanc/DSpace/tree/search
I would appreciate it if you could think about what functionality is still missing. I am planning to implement the same search functionality for ElasticSearch next.

Peter, should I send a pull request for this to your repository? I am just afraid we will encounter the same problems as before.

Best regards,
Anja



Peter Dietz

Dec 6, 2013, 12:00:23 PM
to Anja Le Blanc, dspac...@googlegroups.com
Hi Anja

Thanks for this work. I would definitely like to see it get used. I will check it out today, give it a test, and send some feedback.

It's best to send PRs to DSpace/DSpace:master instead of my repository.

Peter Dietz


--
You received this message because you are subscribed to the Google Groups "DSpace REST" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-rest...@googlegroups.com.
Visit this group at http://groups.google.com/group/dspace-rest.
For more options, visit https://groups.google.com/groups/opt_out.

Peter Dietz

Dec 9, 2013, 12:53:10 AM
to Anja Le Blanc, dspac...@googlegroups.com
I've been able to take a look at this, and I hope I can find a way to share my commits back to you.

So, I started with a clean /dspace directory and a clean git clone of your repository. DSpace at this point has deprecated Lucene so thoroughly that I couldn't make a fresh index. I'm guessing that we should migrate Lucene to be just an implementing.search.class, as opposed to being hard-coded whenever the item search has a ?q query.

Not REST related, but this is what I got when trying to run /dspace/bin/dspace index-init (i.e. build a fresh Lucene index):
Exception: Caching is not supported by the ItemCountDAOSolr as it is not really needed, Solr is faster!
org.dspace.browse.ItemCountException: Caching is not supported by the ItemCountDAOSolr as it is not really needed, Solr is faster!
at org.dspace.browse.ItemCountDAOSolr.communityCount(ItemCountDAOSolr.java:82)
at org.dspace.browse.ItemCounter.count(ItemCounter.java:176)
at org.dspace.browse.ItemCounter.buildItemCounts(ItemCounter.java:90)
at org.dspace.browse.ItemCounter.main(ItemCounter.java:57)

Therefore I couldn't test the Lucene search: http://localhost:8080/rest/items/search?q=Einstein


However, I was able to index Discovery/Solr, so parameterizedSearch worked just fine. I'm tempted to say that this should also support a wildcard search, e.g. q=Einstein Zurich Polytechnic 1910 Theory, and let the search engine handle the relevancy. But advanced search, where you can fill in each field (author, date, title, publisher), is great to have.

In my testing, something like: http://localhost:8080/rest/items/search?author=black&title=open%20source worked just fine.


But, I really like the context block.

<itemList>
  <context>
    <limit>100</limit>
    <offset>0</offset>
    <query_date>2013-12-09T00:22:47</query_date>
    <total_count>6</total_count>
  </context>
  <item>...
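
For clients that want to page through results, a minimal Python sketch of reading the context block might look like this. The sample document is modelled on the excerpt above; the closing </itemList> tag and the omission of the <item> elements are assumptions made only so that the sample parses. read_context and next_offset are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Sample response modelled on the excerpt above. The closing </itemList> tag and
# the absence of <item> elements are assumptions made so the sample parses.
SAMPLE = """
<itemList>
  <context>
    <limit>100</limit>
    <offset>0</offset>
    <query_date>2013-12-09T00:22:47</query_date>
    <total_count>6</total_count>
  </context>
</itemList>
"""


def read_context(xml_text):
    """Return limit, offset, and total_count from the <context> block as ints."""
    context = ET.fromstring(xml_text.strip()).find("context")
    return {
        name: int(context.findtext(name).strip())
        for name in ("limit", "offset", "total_count")
    }


def next_offset(ctx):
    """Offset of the next page, or None when the current page is the last one."""
    following = ctx["offset"] + ctx["limit"]
    return following if following < ctx["total_count"] else None


ctx = read_context(SAMPLE)
print(ctx, next_offset(ctx))
```

With total_count in the response, a client can tell whether another request with a larger offset is needed without fetching an empty page first.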



Other code things.

The search help page looks very useful. I'm wondering whether it will be a pain to write that message in Java. Is there any way for /items/search/help to point to something like webapps/items/searchHelp.html, or maybe just read in a resource file and output it?


I moved Item.countAll into the Item class, as opposed to having the SQL in the middle of REST.

Handles are strings, not integers. So instead of int prefix + "/" + int suffix, both parts have to be strings. Example: 2374.OX/58458
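
To illustrate the point about handles, here is a small Python sketch (not the DSpace code itself, which is Java) that splits a handle into two string parts. split_handle is a hypothetical helper; the example handle is the one given above.

```python
def split_handle(handle):
    """Split a handle such as '2374.OX/58458' into (prefix, suffix) strings."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError("not a handle: %r" % handle)
    return prefix, suffix


prefix, suffix = split_handle("2374.OX/58458")
print(prefix, suffix)

# The prefix is not numeric, so an integer-based representation cannot hold it.
try:
    int(prefix)
    prefix_is_int = True
except ValueError:
    prefix_is_int = False
```

Keeping both parts as strings also means the handle can be reassembled without any conversion step.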



Ultimately, I've made a Pull Request of these changes, and sent it back to you: https://github.com/AnjaLeBlanc/DSpace/pull/1

Hopefully, others will get a chance to review this and give feedback. Once everyone is satisfied, it can be submitted as a PR to DSpace/DSpace. I'm not sure what the DSpace 4 release deadlines are, but I'm afraid that the code base might already be "frozen".



Peter Dietz

Anja Le Blanc

Dec 9, 2013, 3:22:44 AM
to dspac...@googlegroups.com, Anja Le Blanc
Hi Peter,


On Monday, December 9, 2013 5:53:10 AM UTC, Peter Dietz wrote:
> I've been able to take a look at this, and I hope I can find a way to share my commits back to you.

I've made you a collaborator on my Git repository. I merged your pull request, but in future just check in your changes directly. You know what you are doing.

> So, I started with a clean /dspace directory and a clean git clone of your repository. DSpace at this point has deprecated Lucene so thoroughly that I couldn't make a fresh index. I'm guessing that we should migrate Lucene to be just an implementing.search.class, as opposed to being hard-coded whenever the item search has a ?q query.

It does not at the moment. The reason is that I could not see how to implement it without changing the Solr index to include a sort of 'merge-all' field. I am not a Solr expert. I can only do specific searches. ?q is still your Lucene search.
 
> Not REST related, but this is what I got when trying to run /dspace/bin/dspace index-init (i.e. build a fresh Lucene index):
> Exception: Caching is not supported by the ItemCountDAOSolr as it is not really needed, Solr is faster!
> org.dspace.browse.ItemCountException: Caching is not supported by the ItemCountDAOSolr as it is not really needed, Solr is faster!
> at org.dspace.browse.ItemCountDAOSolr.communityCount(ItemCountDAOSolr.java:82)
> at org.dspace.browse.ItemCounter.count(ItemCounter.java:176)
> at org.dspace.browse.ItemCounter.buildItemCounts(ItemCounter.java:90)
> at org.dspace.browse.ItemCounter.main(ItemCounter.java:57)
>
> Therefore I couldn't test the Lucene search: http://localhost:8080/rest/items/search?q=Einstein

I don't think I started from a completely fresh index. I guess I am still running on an upgraded 1.8 index. Is this a problem for the code base we have in DSpace 4 so far?
 

> However, I was able to index Discovery/Solr, so parameterizedSearch worked just fine. I'm tempted to say that this should also support a wildcard search, e.g. q=Einstein Zurich Polytechnic 1910 Theory, and let the search engine handle the relevancy. But advanced search, where you can fill in each field (author, date, title, publisher), is great to have.

I agree, and I would appreciate some hints on how to do this with Solr.
 

> In my testing, something like: http://localhost:8080/rest/items/search?author=black&title=open%20source worked just fine.
>
> But, I really like the context block.
>
> <itemList>
>   <context>
>     <limit>100</limit>
>     <offset>0</offset>
>     <query_date>2013-12-09T00:22:47</query_date>
>     <total_count>6</total_count>
>   </context>
>   <item>...



Always nice to get some positive feedback :-)
 

> Other code things.
>
> The search help page looks very useful. I'm wondering whether it will be a pain to write that message in Java. Is there any way for /items/search/help to point to something like webapps/items/searchHelp.html, or maybe just read in a resource file and output it?

I agree. Part of what this help endpoint does is read in the REST config file. You are right, I really should have an HTML template file and just add the config values in Java.
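
The template idea can be sketched as follows. This is only an illustration in Python of the general approach (static help text kept outside the code, with the configured field names substituted in); the real endpoint is Java, and the template text, render_help, and the field lists here are all hypothetical.

```python
from string import Template

# Hypothetical help template; in a real deployment this text would live in a
# resource file such as searchHelp.html rather than in the source code.
HELP_TEMPLATE = Template(
    "<html><body>\n"
    "<h1>Item search help</h1>\n"
    "<p>Searchable fields: $search_fields</p>\n"
    "<p>Sortable fields: $sort_fields</p>\n"
    "</body></html>\n"
)


def render_help(search_fields, sort_fields):
    """Fill the static template with the field names read from configuration."""
    return HELP_TEMPLATE.substitute(
        search_fields=", ".join(search_fields),
        sort_fields=", ".join(sort_fields),
    )


# The field lists are placeholders for whatever rest.cfg defines.
page = render_help(["author", "title", "publisher"], ["title"])
print(page)
```

The benefit is that the explanatory text can be edited without recompiling, while the field lists still follow the configuration.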
 

> I moved Item.countAll into the Item class, as opposed to having the SQL in the middle of REST.
>
> Handles are strings, not integers. So instead of int prefix + "/" + int suffix, both parts have to be strings. Example: 2374.OX/58458
>
> Ultimately, I've made a Pull Request of these changes, and sent it back to you: https://github.com/AnjaLeBlanc/DSpace/pull/1

Thank you. All merged.
 

> Hopefully, others will get a chance to review this and give feedback. Once everyone is satisfied, it can be submitted as a PR to DSpace/DSpace. I'm not sure what the DSpace 4 release deadlines are, but I'm afraid that the code base might already be "frozen".

I would suspect so, and it is not a problem; it will be in time for 4.1, and nobody really expects the API to be a completely finished product. We will use the REST API as a back-port to 1.8 anyway.

Best regards,
Anja

helix84

Dec 9, 2013, 3:30:50 AM
to Peter Dietz, Anja Le Blanc, dspac...@googlegroups.com
On Mon, Dec 9, 2013 at 6:53 AM, Peter Dietz <pdie...@gmail.com> wrote:
> Not REST related, but, when trying to run /dspace/bin/dspace index-init
> (i.e. fresh lucene index).
>>
>> Exception: Caching is not supported by the ItemCountDAOSolr as it is not
>> really needed, Solr is faster!
>> org.dspace.browse.ItemCountException: Caching is not supported by the
>> ItemCountDAOSolr as it is not really needed, Solr is faster!
>> at
>> org.dspace.browse.ItemCountDAOSolr.communityCount(ItemCountDAOSolr.java:82)
>> at org.dspace.browse.ItemCounter.count(ItemCounter.java:176)
>> at org.dspace.browse.ItemCounter.buildItemCounts(ItemCounter.java:90)
>> at org.dspace.browse.ItemCounter.main(ItemCounter.java:57)

Hi Peter,

index-init fails because the default indexing implementation in DSpace
4 is now Discovery. You'd be right to object that it shouldn't
interfere with Lucene indexing and it doesn't - it's just a message
that should remind you that in default configuration, index-init
doesn't do what it did in previous versions (well, it does, but that
index is not used by the UIs). I agree wording could be improved.

https://jira.duraspace.org/browse/DS-1762

For testing your Lucene search in REST, try re-enabling Lucene:

https://wiki.duraspace.org/display/DSDOC4x/Legacy+methods+for+re-indexing+content#Legacymethodsforre-indexingcontent-Re-EnablingthelegacyLuceneSearchand/orDBMSBrowseproviders

Regards,
~~helix84

Anja Le Blanc

Dec 10, 2013, 11:36:42 AM
to dspac...@googlegroups.com, Anja Le Blanc
 

> However, I was able to index Discovery/Solr, so parameterizedSearch worked just fine. I'm tempted to say that this should also support a wildcard search, e.g. q=Einstein Zurich Polytechnic 1910 Theory, and let the search engine handle the relevancy. But advanced search, where you can fill in each field (author, date, title, publisher), is great to have.


I think I have found out how to do this now. The final query to Solr then looks like
q={!lucene+q.op%3DAND}skinning+megan&wt=javabin&fq=NOT(withdrawn:true)&fq=read:(g0)&version=2&rows=100
but of course at the REST interface you only see
rest/items/search?q=skinning+megan&expand=metadata

I also noticed that we don't have to check the authorization ourselves. This is done further down in a Discovery plugin (that is the fq=read:(g0) part of the query).
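
To make the shape of that Solr query easier to see, here is a minimal Python sketch that assembles the same parameters. It is not the DSpace implementation (which goes through Discovery in Java), and build_solr_params is a hypothetical helper. The parameter names and values are copied from the query shown above.

```python
from urllib.parse import parse_qsl, urlencode


def build_solr_params(user_query, rows=100):
    """Wrap a free-text query the way the final Solr query above is shaped."""
    return [
        # Local params select the lucene parser with AND as default operator.
        ("q", "{!lucene q.op=AND}" + user_query),
        ("wt", "javabin"),
        # Filter queries; per the thread, the read:(g0) filter is normally
        # added by a Discovery plugin rather than by the REST code.
        ("fq", "NOT(withdrawn:true)"),
        ("fq", "read:(g0)"),
        ("version", "2"),
        ("rows", str(rows)),
    ]


params = build_solr_params("skinning megan")
query_string = urlencode(params)
print(query_string)
```

urlencode percent-escapes more characters than the query string quoted above (for example the braces and colons), but both forms decode to the same parameters.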

Regards,
Anja

Peter Dietz

Dec 10, 2013, 11:53:58 AM
to Anja Le Blanc, dspac...@googlegroups.com
Anja,

I only see one org.dspace.content.Item.count(...) type of method, that spits out the Long, that I just added. I don't have a preference either way, but in the code base, there's only the one.

re: Elasticsearch,
Won't you have to implement discovery in elasticsearch to accomplish this? Currently, only elasticsearch-statistics is available.


re: facets,
Sounds good, and would allow for a richer UI experience. I have to admit that I currently don't have any sprint cycles at work to push forward with my play-client-UI, so I haven't been pushing on satisfying all the client needs.



Peter Dietz



Anja Le Blanc

Jan 8, 2014, 3:19:44 AM
to dspac...@googlegroups.com
Happy New Year!

Since DSpace 4 is now released, I thought I would do a Pull Request for my changes. I have only done this once before, and now I can't find out how to create a new issue in Jira. Can someone give me a hint please? I don't think I have a login.

Regards,
Anja

helix84

Jan 8, 2014, 3:52:52 AM
to Anja Le Blanc, dspac...@googlegroups.com
On Wed, Jan 8, 2014 at 9:19 AM, Anja Le Blanc
<anja.l...@googlemail.com> wrote:
> changes. I only done this once before but now I can't find out how to create
> a new issue in Jira. Can someone give me a hint please? I don't think I got
> a login.

Hi Anja,

you do need to log in to create an issue. It's the big blue "Create
issue" button on top. You already have a login with the following
email address: anja.l...@manchester.ac.uk


Regards,
~~helix84