[REINDEXING SERVICE] Question about how it works

stefano
Dec 14, 2018, 8:21:36 AM
to Fedora Community
Hello everyone,

I'm pretty new to Fedora and I'm having some trouble understanding how the Reindexing service works.
I read the documentation but couldn't get a particular scenario to work.

I'm using Fedora 4.7.4 with Solr indexing using an ldpath.custom transformation, and everything works fine.

My system, though, might require a reindex at some point, depending on which fields need to be indexed in Solr.

So the scenario is:
My documents in Solr don't index all the information (fields) of the related content in the Fedora repository, only the fields configured with ldpath.
I thought that adding a new field to the ldpath transformation and invoking the reindexing service would be enough to index the newly configured field in Solr.
But that's not the case, so I have probably misunderstood how the reindexing service works.
Still, if I trigger an update on an item in Fedora, I can see the new field in Solr.
Do I have to write a custom service that triggers an update of all the content in the Fedora repository to index the new field?

Can anyone please give me some hints on how to overcome this problem?

Thank you very much,
Stefano

Peter Matthew Eichman
Jan 17, 2019, 11:53:03 AM
to fedora-c...@googlegroups.com
Stefano,

Sorry for the delay in getting back to you. Can you provide more details on how you are invoking the reindexing service after you have updated your field definitions in the ldpath?

The reindexing service requires you to specify both a) the path you would like to start reindexing from, and b) which destination queues you want the reindexing messages sent to.

The reindexer will traverse the LDP hierarchy in your repository (i.e., it will recursively follow any ldp:contains predicates it finds on the resource at the initial path you send it) and send a reindexing message to the destination queue(s) for each of the resources it finds.
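To make the traversal concrete, here is a minimal offline sketch. It does not talk to Fedora: the hypothetical children_of function stands in for reading the ldp:contains objects of a resource, and the /my/resource tree is made up. The sketch walks the tree depth-first and synchronously, one "message" per resource; the real reindexing service may process resources in a different order.

```shell
#!/usr/bin/env bash
# Offline sketch of the reindexer's traversal: start at one path, emit one
# "reindex message" per resource, and recurse into every ldp:contains child.

# Hypothetical stand-in for fetching a resource and reading its ldp:contains
# objects; a real client would ask Fedora for the resource's RDF instead.
children_of() {
  case "$1" in
    /my/resource)   echo "/my/resource/a /my/resource/b" ;;
    /my/resource/a) echo "/my/resource/a/x" ;;
    *)              echo "" ;;   # leaf resource: no children
  esac
}

# Print one line per resource that would receive a reindexing message.
# Arguments: start path, then the destination queue(s).
traverse() {
  local path="$1"; shift
  local child
  echo "reindex $path -> $*"
  for child in $(children_of "$path"); do
    traverse "$child" "$@"
  done
}
```

Running traverse /my/resource activemq:queue:solr.reindex prints four lines, one each for /my/resource, /my/resource/a, /my/resource/a/x and /my/resource/b, which is why a single request on a top-level path fans out to the whole subtree.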

Example: If you want to reindex http://localhost:8080/fcrepo/rest/my/resource (and all of its LDP children), you would call the reindexing service like this:

curl -X POST http://localhost:9080/reindexing/my/resource -H 'Content-Type: application/json' -d '["activemq:queue:solr.reindex"]'
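Since the request body is a JSON array, a single call can target several destination queues. The sketch below builds that body from queue names and sends it with curl. It assumes the reindexing base URL from the example above (overridable through a hypothetical REINDEX_BASE variable) and queue names without quotes or backslashes; the second queue in the usage line, activemq:queue:triplestore.reindex, is only an illustrative assumption.

```shell
#!/usr/bin/env bash
# Build the JSON array body for the reindexing service from one or more
# destination queue names (assumes no quotes or backslashes in the names).
queues_json() {
  local body="" q
  for q in "$@"; do
    body+="${body:+,}\"$q\""   # add a comma before every entry but the first
  done
  printf '[%s]' "$body"
}

# POST a reindexing request for PATH, sending messages to every queue given
# after it. REINDEX_BASE is a hypothetical override for the base URL.
reindex() {
  local path="$1"; shift
  curl -X POST "${REINDEX_BASE:-http://localhost:9080/reindexing}${path}" \
    -H 'Content-Type: application/json' \
    -d "$(queues_json "$@")"
}
```

For example, reindex /my/resource activemq:queue:solr.reindex activemq:queue:triplestore.reindex sends the body ["activemq:queue:solr.reindex","activemq:queue:triplestore.reindex"].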

I hope this helps, and please let us know if you have any further questions,
-Peter


--
Peter Eichman
Senior Software Developer
University of Maryland Libraries

stefano
Jan 18, 2019, 6:19:02 AM
to Fedora Community
Hello Peter,

Thank you very much for your answer.

I'll try to give you more information about my scenario. I'm pretty sure the Solr reindexing is failing somehow in my environment, probably due to a misconfiguration.
I'm actually invoking the reindexing service as you said:

http://<hostname>:<port>/reindexing/path/to/the/resource

where the URL to access the resource in fcrepo is:

http://<hostname>:<port>/fcrepo/rest/path/to/the/resource
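The mapping between the two URLs is purely a matter of keeping the path below /fcrepo/rest. A small sketch of that mapping, with the hypothetical helper name to_reindex_url; the reindexing base URL is passed in because it usually differs from Fedora's (port 9080 vs 8080 in Peter's example, an assumption for other setups).

```shell
#!/usr/bin/env bash
# Turn a Fedora REST URL into the matching reindexing-service URL by keeping
# only the path below /fcrepo/rest and appending it to the reindexing base.
to_reindex_url() {
  local fcrepo_url="$1" reindex_base="$2"
  local rest_path="${fcrepo_url#*/fcrepo/rest}"   # strip everything up to /fcrepo/rest
  if [ "$rest_path" = "$fcrepo_url" ]; then
    echo "not an fcrepo REST URL: $fcrepo_url" >&2
    return 1
  fi
  printf '%s%s\n' "${reindex_base%/}" "$rest_path"  # avoid a double slash
}
```

For instance, to_reindex_url http://localhost:8080/fcrepo/rest/16/cd/40/7c/16cd407c-73cf-4f91-892f-52372beef482 http://localhost:9080/reindexing gives http://localhost:9080/reindexing/16/cd/40/7c/16cd407c-73cf-4f91-892f-52372beef482.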

I'm using Postman, setting the header Content-Type: application/json, with the following body:

["broker:queue:solr.reindex"]

This does trigger some actions in Solr (I'll post part of its log below), but the result is not what I expect.

Anyway, everything is running in a Docker container, and this is the configuration in $KARAF_HOME/etc/org.fcrepo.camel.indexing.solr.cfg:

error.maxRedeliveries=10
fcrepo.checkHasIndexingTransformation=true
#This resource is external
fcrepo.defaultTransform=http://<hostname>/ldpath_transform.txt
ldpath.service.baseUrl=http://localhost:9086/ldpath
indexing.predicate=true
input.stream=broker:queue:fedora
solr.reindex.stream=broker:queue:solr.reindex
solr.commitWithin=10
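Assuming the Solr indexer picks up reindexing messages from the queue named by solr.reindex.stream (as the property name suggests), the queue in the reindexing request body should match that value exactly. A small sketch that reads a key straight from the cfg file, assuming the plain key=value format shown above with # comment lines; cfg_value is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the value of one key from a Karaf .cfg file made of key=value lines,
# skipping comment lines that start with '#'.
cfg_value() {
  local file="$1" key="$2"
  awk -F= -v k="$key" '
    /^[ \t]*#/ { next }                      # ignore comments
    $1 == k    { sub(/^[^=]*=/, ""); print; exit }  # drop "key=" and print the rest
  ' "$file"
}
```

Usage: cfg_value "$KARAF_HOME/etc/org.fcrepo.camel.indexing.solr.cfg" solr.reindex.stream should print broker:queue:solr.reindex, the same string sent in the Postman body.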


Now let's suppose I have some resources in fcrepo that are already indexed in Solr. At some point I decide to modify my ldpath program by adding the following line:

example_metadata_name = metadata:example_metadata_name :: xsd:string;

and then I call the reindexing service. I would expect the related document in Solr to now show the new field example_metadata_name, but that doesn't happen.
Moreover, I manually deleted a specific document in Solr from its console and expected the reindexing to recreate it. That doesn't happen either.
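One way to check both expectations from the command line is to query Solr for the resource's document and ask only for the new field. This sketch assumes the Solr unique key field is id and holds the fcrepo URI (the delete requests in the Solr log below use the URI as the key), and that the core is collection1 as in that log; the default SOLR_CORE_URL with port 8983 is an assumption to adjust for the Docker setup.

```shell
#!/usr/bin/env bash
# Fetch the Solr document for one Fedora resource, returning only its id and
# the requested field. SOLR_CORE_URL and the "id" key field are assumptions.
solr_doc() {
  local fcrepo_uri="$1" field="$2"
  curl -sG "${SOLR_CORE_URL:-http://localhost:8983/solr/collection1}/select" \
    --data-urlencode "q=id:\"${fcrepo_uri}\"" \
    --data-urlencode "fl=id,${field}" \
    --data-urlencode "wt=json"
}
```

Usage: solr_doc http://localhost:8080/fcrepo/rest/path/to/the/resource example_metadata_name. An empty docs array in the response means the document is missing; a document without example_metadata_name means it was not rebuilt with the updated ldpath program.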

At the moment, Solr indexing only works when I add or update content in fcrepo.

By the way, these are the logs I get from Solr and Karaf when I invoke the reindexing service on a specific fcrepo resource:

KARAF 
2019-01-18 12:13:01,673 | INFO  | tp221853157-1142 | ReindexingRouter                 | 208 - org.fcrepo.camel.fcrepo-reindexing - 4.7.2 | Initial indexing path: http://localhost:8080/fcrepo/rest/16/cd/40/7c/16cd407c-73cf-4f91-892f-52372beef482
2019-01-18 12:13:01,846 | INFO  | er[solr.reindex] | SolrRouter                       | 205 - org.fcrepo.camel.fcrepo-indexing-solr - 4.7.2 | Deleting Solr Object http://localhost:8080/fcrepo/rest/16/cd/40/7c/16cd407c-73cf-4f91-892f-52372beef482
2019-01-18 12:13:01,952 | INFO  | er[solr.reindex] | SolrRouter                       | 205 - org.fcrepo.camel.fcrepo-indexing-solr - 4.7.2 | Deleting Solr Object http://localhost:8080/fcrepo/rest/16/cd/40/7c/16cd407c-73cf-4f91-892f-52372beef482/binary/attachments
2019-01-18 12:13:02,028 | INFO  | er[solr.reindex] | SolrRouter                       | 205 - org.fcrepo.camel.fcrepo-indexing-solr - 4.7.2 | Deleting Solr Object http://localhost:8080/fcrepo/rest/16/cd/40/7c/16cd407c-73cf-4f91-892f-52372beef482/VIDEO_0000000115/MEDIA/VIDEO-MEDIA-2018-10-10-11-24-38-465-1BGE6t

SOLR
INFO  - 2019-01-18 12:02:26.581; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr path=/update params={commitWithin=10} {delete=[http://localhost:8080/fcrepo/rest/16/cd/40/7c/16cd407c-73cf-4f91-892f-52372beef482 (-1622995733398421504)]} 0 7
INFO  - 2019-01-18 12:02:26.592; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
INFO  - 2019-01-18 12:02:26.594; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
INFO  - 2019-01-18 12:02:26.594; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO  - 2019-01-18 12:02:26.625; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr path=/update params={commitWithin=10} {delete=[http://localhost:8080/fcrepo/rest/16/cd/40/7c/16cd407c-73cf-4f91-892f-52372beef482/binary/attachments (-1622995733444558848)]} 0 2
INFO  - 2019-01-18 12:02:26.640; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
INFO  - 2019-01-18 12:02:26.641; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
INFO  - 2019-01-18 12:02:26.641; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO  - 2019-01-18 12:02:26.671; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr path=/update params={commitWithin=10} {delete=[http://localhost:8080/fcrepo/rest/16/cd/40/7c/16cd407c-73cf-4f91-892f-52372beef482/VIDEO_0000000115/MEDIA/VIDEO-MEDIA-2018-10-10-11-24-38-465-1BGE6t (-1622995733494890496)]} 0 0
INFO  - 2019-01-18 12:02:26.683; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
INFO  - 2019-01-18 12:02:26.684; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
INFO  - 2019-01-18 12:02:26.684; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO  - 2019-01-18 12:02:41.584; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO  - 2019-01-18 12:02:41.586; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.
INFO  - 2019-01-18 12:02:41.591; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
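To list which resources the Solr route deleted during a reindex run, the Karaf lines can be filtered on the "Deleting Solr Object <uri>" message. A sketch assuming exactly the log line format shown above; deleted_uris is a hypothetical helper, and the log location (commonly $KARAF_HOME/data/log/karaf.log) is an assumption about this installation.

```shell
#!/usr/bin/env bash
# Print the fcrepo URIs from "Deleting Solr Object <uri>" lines, one per line.
# Reads the given log file(s), or standard input when no file is given.
deleted_uris() {
  sed -n 's/.*| Deleting Solr Object \(.*\)$/\1/p' "$@"
}
```

Usage: deleted_uris "$KARAF_HOME/data/log/karaf.log" | wc -l counts the deletions; for the run above it would list the three URIs under 16cd407c-73cf-4f91-892f-52372beef482.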

Do you have any idea what I am doing wrong?

Thank you again for your help, Peter.

Stefano