Fedora 4.4.x and Solr Index

Thomas Desub

Oct 21, 2015, 8:00:34 AM
to Fedora Community
Hi

I'm running the Fedora 4 Vagrant box for development and have some questions about Solr and indexing.
I have the brand new 4.4.1-SNAPSHOT Fedora.

Environment
-----------------------------------------------------------------------
apache-jena-fuseki-2.3.0
apache-karaf-4.0.1
commons-logging-1.1.2
fcrepo-webapp-plus-webac-audit-4.4.1-SNAPSHOT
solr-4.10.3

Questions
-----------------------------------------------------------------------
- When I delete a container or binary in Fedora, it is not deleted from the Solr index.
Is it possible to make this work when deleting objects?

- Dublin Core elements are not ending up in Solr. I tested dc:description and dc:subject, but they do not show up in the Solr index. The properties are defined in the schema.xml file for the Solr collection, so on the Solr side it looks like they can be indexed.
How do I get those properties indexed in Solr?

Regards Thomas



Andrew Woods

Oct 24, 2015, 1:16:46 PM
to Thomas Desub, Fedora Community
Hello Thomas,
I am trying to reproduce your "not deleting from Solr" issue by using the latest "master" branch of fcrepo4-vagrant which contains the same components that you have detailed:
https://github.com/fcrepo4-exts/fcrepo4-vagrant

Note: I have also tested and verified everything below on the 4.4.0 release version of fcrepo4-vagrant:
https://github.com/fcrepo4-exts/fcrepo4-vagrant/releases/tag/fcrepo4-vagrant-4.4.0

After starting up vagrant:
> vagrant up

I monitor the Apache Karaf log:
> sudo tail -f /opt/karaf/data/log/karaf.log

followed by creating an object:
> curl -i -u fedoraAdmin:secret3 -X PUT localhost:8080/fcrepo/rest/object
response >> 201 Created

The karaf.log includes the following output:
org.fcrepo.camel.fcrepo-indexing-solr - 4.4.1.SNAPSHOT | Indexing Solr Object  /object

Searching the Solr index (http://localhost:8080/solr), I see the "object" resource.

Then I delete the newly created object:
> curl -i -ufedoraAdmin:secret3 -X DELETE localhost:8080/fcrepo/rest/object
response >> 204 No Content

The karaf.log includes the following output:
org.fcrepo.camel.fcrepo-indexing-solr - 4.4.1.SNAPSHOT | Deleting Solr Object /object

Searching the Solr index reveals that the "object" resource has indeed been removed.
Obviously I am doing something different than you. Can you describe your process for deleting a resource that is not subsequently being deleted in the Solr index?
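To check the index from the command line rather than through the Solr admin page, a query on the resource id is one option. This is only a sketch under assumptions not confirmed in this thread: the Solr core is taken to be "collection1" (the stock Solr 4 core name), the "id" field is taken to hold the full resource URI, and solr_query_url is a hypothetical helper name.

```shell
#!/bin/sh
# Assumed locations; adjust if your core name or ports differ.
SOLR_BASE="${SOLR_BASE:-http://localhost:8080/solr/collection1}"
FCREPO_BASE="${FCREPO_BASE:-http://localhost:8080/fcrepo/rest}"

# Build a Solr select URL that matches one resource by its id.
# Assumption: the id field contains the full resource URI.
solr_query_url() {
    # $1 = repository path, e.g. /object
    # %%22 prints as %22, a URL-encoded double quote around the id value.
    printf '%s/select?wt=json&q=id:%%22%s%s%%22\n' "$SOLR_BASE" "$FCREPO_BASE" "$1"
}

# Usage (needs the Vagrant box running):
#   curl -s "$(solr_query_url /object)"
```

If the assumptions hold, the "numFound" value in the JSON response should be 1 while the resource is indexed and 0 once the delete has been processed.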

Regarding your question about how to get those properties indexed in Solr: you must create a transformation that is aware of the new properties, in addition to updating the Solr/schema.xml (which apparently you have done or verified). The default transformation syntax expected by fcrepo-indexing-solr is LDPath:
http://marmotta.apache.org/ldpath/language.html

You will need to:
1) create a new LDPath transform that includes the fields you want and
2) configure fcrepo-indexing-solr to use that new transform

For 1) below is a simple LDPath transform that includes the default properties along with your dc:description and dc:subject properties:
==================named:"nt-base.txt"==================

@prefix fedora : <http://fedora.info/definitions/v4/repository#>
@prefix dc: <http://purl.org/dc/elements/1.1/>
id      = . :: xsd:string ;
title = dc:title :: xsd:string;
created = fedora:created :: xsd:dateTime;
last_modified = fedora:lastModified :: xsd:dateTime;
has_parent = fedora:hasParent :: xsd:string;
description = dc:description :: xsd:string;
subject = dc:subject :: xsd:string;

==================
Note you will need to update the transform above to rename the properties if they are named differently in your Solr/schema.xml.

You should then create a new binary resource (the content of which is the above LDPath transform file) at the following path, as follows:
> curl -i -ufedoraAdmin:secret3 -XPUT http://localhost:8080/fcrepo/rest/fedora:system/fedora:transform/fedora:ldpath/custom
> curl -i -ufedoraAdmin:secret3 -XPUT -H"Content-Type: text/plain" --data-binary @nt-base.txt http://localhost:8080/fcrepo/rest/fedora:system/fedora:transform/fedora:ldpath/custom/nt:base
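Before touching the indexer configuration, it may help to confirm that the stored transform produces the fields you expect by calling the fcr:transform service on an existing resource. A rough sketch, assuming a resource exists at /object and that the program name in the URL matches the "custom" container created above; transform_url is a hypothetical helper name.

```shell
#!/bin/sh
FCREPO_BASE="${FCREPO_BASE:-http://localhost:8080/fcrepo/rest}"

# Build the fcr:transform URL for a resource path and a named transform program.
transform_url() {
    # $1 = repository path (e.g. /object), $2 = program name (e.g. custom)
    printf '%s%s/fcr:transform/%s\n' "$FCREPO_BASE" "$1" "$2"
}

# Usage (needs the Vagrant box running and a resource at /object):
#   curl -s -ufedoraAdmin:secret3 "$(transform_url /object custom)"
```

The response should be a JSON document keyed by the field names on the left-hand side of the LDPath program; if description and subject are absent there, they will not reach Solr either.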

For 2) you need to update the fcrepo-indexing-solr configuration within your Vagrant box as follows:
> vagrant ssh

Use "vi" or whichever terminal-based text editor you prefer:
> sudo vi /opt/karaf/etc/org.fcrepo.camel.indexing.solr.cfg

Replace the following line:
>> fcrepo.defaultTransform=default
with:
>> fcrepo.defaultTransform=custom
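If you would rather script that change than edit the file by hand, something along these lines should work. It is a sketch only: set_default_transform is a hypothetical helper, and it assumes the property sits on a line of its own starting with fcrepo.defaultTransform=.

```shell
#!/bin/sh
# Rewrite the fcrepo.defaultTransform line in a .cfg file, keeping a .bak copy.
set_default_transform() {
    # $1 = path to the cfg file, $2 = transform name (e.g. custom)
    sed -i.bak "s/^fcrepo\.defaultTransform=.*/fcrepo.defaultTransform=$2/" "$1"
}

# Usage inside the Vagrant box (run as root, since the file lives under /opt/karaf/etc):
#   set_default_transform /opt/karaf/etc/org.fcrepo.camel.indexing.solr.cfg custom
```

The .bak copy left next to the file makes it easy to revert to the default transform.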

Neither a Karaf nor a Camel restart is required to load this updated configuration... the beauty of OSGi.

Now when you create or update a resource with dc:subject and dc:description properties, they should be visible in your Solr index. Let us know how it goes.
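For a quick end-to-end test, you could create a resource that carries both properties. The sketch below is illustrative: the /dc-test path, the literal values, and the dc_test_turtle helper name are all made up for this example.

```shell
#!/bin/sh
# Emit a small Turtle document with example Dublin Core values.
dc_test_turtle() {
    cat <<'EOF'
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<> dc:title "Solr indexing test" ;
   dc:description "A resource created to test the custom transform" ;
   dc:subject "indexing" .
EOF
}

# Usage (needs the Vagrant box running):
#   dc_test_turtle | curl -i -ufedoraAdmin:secret3 -XPUT \
#       -H"Content-Type: text/turtle" --data-binary @- \
#       http://localhost:8080/fcrepo/rest/dc-test
```

After the PUT returns 201 Created, karaf.log should show the resource being indexed, and the description and subject fields should appear in the Solr document, provided those field names exist in your schema.xml.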

As a side note, I see that user-facing documentation is missing for describing the full details of the fcr:transform service. I have created the following ticket to remedy that:
