Solr & GSearch: Objects Not Getting Indexed Upon Ingest

slanger

Jan 27, 2014, 11:55:14 AM
to isla...@googlegroups.com
Problem: When I successfully ingest an object into Fedora (via Islandora), it is not getting automatically indexed in Solr. However, if I index the site manually (http://localhost:8080/fedoragsearch/rest?operation=updateIndex > "updateIndex fromFoxmlFiles"), the object gets successfully indexed and it appears in the Islandora search results.
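
For anyone debugging the same symptom, one quick check is to ask Solr for a single PID directly and, if it is missing, to reindex just that object through GSearch. The sketch below only builds the two URLs. It assumes Solr is served at http://localhost:8080/solr and that GSearch's REST interface accepts `action=fromPid` with the PID in `value`; the base URLs, those parameter names, and both helper functions are assumptions for illustration, not something confirmed in this thread.

```python
from urllib.parse import urlencode

# Base URLs are assumptions; adjust host, port, and webapp names to your install.
SOLR_BASE = "http://localhost:8080/solr"
GSEARCH_REST = "http://localhost:8080/fedoragsearch/rest"


def solr_pid_query_url(pid, solr_base=SOLR_BASE):
    """Build a Solr /select URL that looks up one object by its PID field."""
    params = {"q": 'PID:"%s"' % pid, "fl": "PID", "wt": "json"}
    return "%s/select?%s" % (solr_base, urlencode(params))


def gsearch_reindex_pid_url(pid, gsearch_rest=GSEARCH_REST):
    """Build a GSearch REST URL that reindexes a single PID (assumed parameters)."""
    params = {"operation": "updateIndex", "action": "fromPid", "value": pid}
    return "%s?%s" % (gsearch_rest, urlencode(params))


if __name__ == "__main__":
    print(solr_pid_query_url("islandora:30"))
    print(gsearch_reindex_pid_url("islandora:30"))
```

If the first URL finds nothing right after an ingest but does find the object once the second URL has been opened, the object itself is indexable and the gap is somewhere in whatever should trigger indexing on ingest.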

Here is some info about my setup:
  • Fedora 3.5
  • Solr 3.6.1
  • Fedora Generic Search Service 2.4.2
  • Java 1.6.0_45
In the file /fedora/server/config/fedora.fcfg, the Java Messaging Service (JMS) module is enabled:

  <module role="org.fcrepo.server.messaging.Messaging" class="org.fcrepo.server.messaging.MessagingModule">
    <comment>Fedora's Java Messaging Service (JMS) Module</comment>
    <param name="enabled" value="true"/>
    <param name="java.naming.factory.initial" value="org.apache.activemq.jndi.ActiveMQInitialContextFactory"/>
    <param name="java.naming.provider.url" value="vm:(broker:(tcp://localhost:61616))"/>
    <param name="datastore1" value="apimUpdateMessages">
      <comment>A datastore representing a JMS Destination for APIM events which update the repository</comment>
    </param>
    <param name="datastore2" value="apimAccessMessages">
      <comment>A datastore representing a JMS Destination for APIM events which do not update the repository</comment>
    </param>
  </module>

I'm not seeing any errors in my Catalina log files when Fedora is starting up. When I manually reindex the site (via the GSearch Admin Client), I can see several commit messages being logged, which (as I understand it) signals that the object is getting indexed. However, when I ingest an object via Islandora and look at the logs, those commit messages are absent.

Here is an example of what I see in the logs after ingesting an object via Islandora:
  • Jan 27, 2014 12:09:59 PM org.apache.solr.core.SolrCore execute
    INFO: [] webapp=/solr path=/select params={fl=PID,+dc.title,dc.description,dc.date,dc.subject,dc.contributor&q=PID:"islandora:30"&json.nl=map&wt=json&version=1.2} hits=0 status=0 QTime=15
  • Jan 27, 2014 12:09:59 PM org.apache.solr.core.SolrCore execute
    INFO: [] webapp=/solr path=/select params={fl=PID,+dc.description&q=PID:"islandora:30"&json.nl=map&wt=json&version=1.2} hits=0 status=0 QTime=0
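
Both logged requests above are on `/select`, which means they are lookups rather than index updates. A rough way to scan a Catalina log for which Solr paths are being hit is sketched below. The regular expression is fitted to the lines pasted here; the assumption that update requests show up under a different path (such as `/update`) without a `hits=` field, and the `summarize_solr_log` helper itself, are mine.

```python
import re

# Matches the "path=... params={...} [hits=N] status=N" part of a Solr request log line.
LOG_RE = re.compile(
    r"path=(?P<path>\S+) params=\{(?P<params>.*)\} "
    r"(?:hits=(?P<hits>\d+) )?status=(?P<status>\d+)"
)


def summarize_solr_log(lines):
    """Return (path, hits) for every request line; hits is None when not logged."""
    summary = []
    for line in lines:
        match = LOG_RE.search(line)
        if match:
            hits = match.group("hits")
            summary.append((match.group("path"), int(hits) if hits is not None else None))
    return summary


if __name__ == "__main__":
    sample = [
        "Jan 27, 2014 12:09:59 PM org.apache.solr.core.SolrCore execute",
        'INFO: [] webapp=/solr path=/select params={fl=PID,+dc.description'
        '&q=PID:"islandora:30"&json.nl=map&wt=json&version=1.2} hits=0 status=0 QTime=0',
    ]
    for path, hits in summarize_solr_log(sample):
        print(path, hits)
```

If an ingest only ever produces `/select` entries, nothing asked Solr to add the new object during that window.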
Any idea where I should start looking to troubleshoot this problem?

Thanks!

slanger

Jan 30, 2014, 1:15:36 PM
to isla...@googlegroups.com
Here's another Solr/GSearch symptom that we're experiencing, in case it offers additional insight into our larger problem (hopefully, this doesn't end up muddying the waters):

When we search inside a book via the Internet Archive BookReader for a term that we know should be there, it always returns the message "No matches were found." The "OCR" and "HOCR" datastreams are being successfully created upon ingest and are populated with text. This problem continues to happen even after manually reindexing the site via the Admin Client for Fedora Generic Search Service interface.

I bring it up because I can see in the logs that the book reader is successfully requesting data from Solr, but no data is being returned. Perhaps it, too, is not getting indexed?

INFO: [] webapp=/solr path=/select params={hl.fragsize=18&hl.tag.post=}}}&hl.tag.pre={{{&hl.fl=text_nodes_HOCR_hlt&json.nl=map&wt=json&hl=true&rows=32&version=1.2&defType=dismax&fl=PID,RELS_EXT_isSequenceNumber_literal_ms&hl.snippets=8&hl.useFastVectorHighlighter=true&start=0&q=the&qt=standardfq=RELS_EXT_isMemberOf_uri_ms:("info:fedora/islandora:87"+OR+"islandora:87")} hits=0 status=0 QTime=16

Again, I'm not sure if this provides any insight into the larger Solr/GSearch problem that we're experiencing; but I thought I'd throw it out there. We'll happily accept solutions to either issue. :-)
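
One detail in the log line above, as pasted: `qt=standard` runs straight into `fq=` with no `&` between them. Whether that is a logging artifact or how the request was really sent is not something this thread settles. The sketch below only shows how a standard query-string parser reads that tail of the parameters; `split_params` is a hypothetical helper and the string is copied from the log line.

```python
from urllib.parse import parse_qs

# Tail of the logged params, copied from the log line above.
LOGGED_PARAMS = (
    'start=0&q=the&qt=standardfq=RELS_EXT_isMemberOf_uri_ms:'
    '("info:fedora/islandora:87"+OR+"islandora:87")'
)


def split_params(raw):
    """Parse a raw query string the way a standard form decoder would."""
    return parse_qs(raw, keep_blank_values=True)


if __name__ == "__main__":
    parsed = split_params(LOGGED_PARAMS)
    print(sorted(parsed))  # parameter names that were actually recognized
    print(parsed["qt"])    # the filter text ends up inside the qt value
```

Read this way, there is no separate `fq` parameter at all, so it may be worth checking how the request is assembled before concluding that the index is at fault.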

Jordan Dukart

Jan 30, 2014, 1:34:48 PM
to isla...@googlegroups.com
The searching within IAV relies upon the Solr field that's configured in yoursite/admin/islandora/internet_archive_bookreader. Is it mapped to the correct field?

Jordan
--
You received this message because you are subscribed to the Google Groups "islandora" group.
To unsubscribe from this group and stop receiving emails from it, send an email to islandora+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

slanger

Jan 30, 2014, 3:15:49 PM
to isla...@googlegroups.com
Thanks for your response, Jordan! We're pretty much using the defaults that came with the module.

Solr field relating pages to book PIDs = RELS_EXT_isMemberOf_uri_ms

I'll admit to being a bit of a Solr n00b. Is that just a placeholder? If so, where would I find the correct value?

Thanks!

Jordan Dukart

Jan 30, 2014, 3:43:18 PM
to isla...@googlegroups.com
Depends on how your indexing is set up. Are you using a specific GSearch/Solr config such as the one that lives here: https://github.com/discoverygarden/basic-solr-config?

Jordan

slanger

Jan 30, 2014, 4:04:20 PM
to isla...@googlegroups.com
I've manually configured our installation based on the Customizing GSearch and Solr documentation provided by Islandora. I haven't seen that Discovery Garden configuration before. I can certainly give it a try if you think I'll have better luck with that.

Jordan Dukart

Jan 30, 2014, 4:08:34 PM
to isla...@googlegroups.com
The one recommended there is definitely out of date. The one I linked is what all of our production boxes are built from, and I think some community members, like Nick Ruest, are using it too. If you do decide to go that route, you can more or less guarantee that the core functionality will work.

Jordan

Peter MacDonald

Jan 30, 2014, 5:15:53 PM
to isla...@googlegroups.com
slanger:

You might benefit from logging into http://sandbox.islandora.ca as admin/islandora and looking at the Solr Settings menu to see how they set it up there.

Administer > Islandora > Solr Index > Solr Settings

Peter
--
Library Information Systems Specialist
Hamilton College Library
Clinton, New York

slanger

Feb 5, 2014, 2:34:22 PM
to isla...@googlegroups.com

Thanks for your suggestions, Everyone!

Peter: I compared my Solr Settings with the Islandora sandbox site beforehand and they matched exactly. I double-checked and that's still the case.

slanger

Feb 5, 2014, 2:34:44 PM
to isla...@googlegroups.com

Fedora 3.5, Solr 3.6.1, GSearch 2.4.2

I downloaded the config files located at https://github.com/discoverygarden/basic-solr-config as Jordan suggested. Once they were in place, I received a few warnings about the solrconfig.xml file, saying that <indexDefaults> and <mainIndex> were deprecated and that it was missing <luceneMatchVersion>. Perhaps this basic-solr-config bundle is for an earlier version of Solr/GSearch than what I'm using? The README.md actually doesn't say what it's compatible with. That said, I was able to update the solrconfig.xml file with the recommended changes and make the warnings go away.
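
The warnings described above can be checked for before restarting Solr. A minimal sketch, assuming the three element names sit directly under the root `<config>` element of solrconfig.xml; `check_solrconfig` is a hypothetical helper and it only looks for the names mentioned in this post:

```python
import xml.etree.ElementTree as ET

# Element names taken from the warnings described above.
DEPRECATED = ("indexDefaults", "mainIndex")
REQUIRED = ("luceneMatchVersion",)


def check_solrconfig(xml_text):
    """Return a list of human-readable findings for a solrconfig.xml document."""
    root = ET.fromstring(xml_text)
    findings = []
    for name in DEPRECATED:
        if root.find(name) is not None:
            findings.append("deprecated element <%s> is present" % name)
    for name in REQUIRED:
        if root.find(name) is None:
            findings.append("<%s> is missing" % name)
    return findings


if __name__ == "__main__":
    sample = """<config>
      <indexDefaults><mergeFactor>10</mergeFactor></indexDefaults>
      <mainIndex><mergeFactor>10</mergeFactor></mainIndex>
    </config>"""
    for finding in check_solrconfig(sample):
        print(finding)
```

An empty result only means these particular warnings should not appear; it says nothing about whether the rest of the file suits a given Solr version.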

The islandora_transforms directory was completely new to my setup, as was the dgi_gsearch_extensions JAR. I updated the paths in "index.properties", "foxmlToSolr.xslt" and "slurp_all_MODS_to_solr.xslt" to match my directory structure.

The verdict? The Internet Archive BookReader search now works! -- but only after I manually reindex the site. Unfortunately, automatic indexing upon ingest still isn't working in Fedora 3.5.

Fedora 3.6.2, Solr 4.2.0, GSearch 2.6

Here's a twist: I found the York University Library's version of those config files, which this blog post (http://islandora.ca/content/more-long-tail-modules) says is for GSearch 2.6 & Solr 4.2.0 (perhaps that should be mentioned in the README.md file as well?). I upgraded Fedora, Solr and GSearch to the appropriate versions, then put those config files in place and updated paths in the appropriate files. Behold! Both automatic indexing upon ingest and the Internet Archive BookReader search work -- although I'm now encountering a bunch of problems with the ingest process itself and the Fedora Web Administrator is acting weird.

Questions

Can you think of anything unique that the York University Library's basic-solr-config bundle has that would help it achieve automatic indexing? Right now, my Fedora 3.5 installation is much faster and more stable, so if I can get automatic indexing to work there, I'll stick with that one.

Any other thoughts / recommendations?

Thanks!

Nick Ruest

Feb 5, 2014, 2:41:19 PM
to isla...@googlegroups.com
*puts hand up*

That is our config file :-)

What is not actually live in the solrconfig.xml file is the 'autoCommit'
section I added from this thread[1]. That will get you the autocommits
you're looking for.

-nruest

[1] https://groups.google.com/d/msg/islandora/cXKqZs2WBGo/lSl7VYinK0sJ

slanger

Feb 5, 2014, 3:16:59 PM
to isla...@googlegroups.com
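Actually, I think you're in the clear, @nruest. ;-) Both versions of the basic-solr-config bundle, from Discovery Garden (https://github.com/discoverygarden/basic-solr-config/blob/modular/conf/solrconfig.xml) and York University Library (https://github.com/yorkulibraries/basic-solr-config/blob/modular/conf/solrconfig.xml), include a visible <autoCommit> section in the solrconfig.xml file. They're slightly different from each other, and I've tried both in my Fedora 3.5 installation, but neither has made a difference.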
 
Right now, we're using:
 
    <autoCommit>
      <maxDocs>1</maxDocs>
      <maxTime>1</maxTime>
    </autoCommit>
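
For reference, `maxDocs` is a count of pending documents and `maxTime` is in milliseconds, and `<autoCommit>` only decides when documents that have already been sent to Solr get committed. A small sketch for reading the live values out of a solrconfig.xml follows; `read_autocommit` is a hypothetical helper. Because the parser drops XML comments, a commented-out `<autoCommit>` block is reported as absent.

```python
import xml.etree.ElementTree as ET


def read_autocommit(xml_text):
    """Return autoCommit thresholds found anywhere in a solrconfig.xml document.

    Returns None when there is no live <autoCommit> element at all.
    """
    root = ET.fromstring(xml_text)
    node = root.find(".//autoCommit")
    if node is None:
        return None
    settings = {}
    for child in node:
        text = (child.text or "").strip()
        # Numeric thresholds become ints; anything else is kept as text.
        settings[child.tag] = int(text) if text.isdigit() else text
    return settings


if __name__ == "__main__":
    sample = """<config>
      <updateHandler class="solr.DirectUpdateHandler2">
        <autoCommit>
          <maxDocs>1</maxDocs>
          <maxTime>1</maxTime>
        </autoCommit>
      </updateHandler>
    </config>"""
    print(read_autocommit(sample))
```

If the settings come back as expected and objects still are not indexed on ingest, the update requests may not be reaching Solr in the first place, which would also fit the absence of commit messages in the logs.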

Nick Ruest

Feb 5, 2014, 3:25:53 PM
to isla...@googlegroups.com
I stand corrected; I guess I did push it up there!

Which solrconfig.xml did you put it in?

Mine is in
$CATALINA_HOME/webapps/fedoragsearch/WEB-INF/classes/fgsconfigFinal/index/FgsIndex/conf/solrconfig.xml

-nruest

slanger

Feb 5, 2014, 4:06:47 PM
to isla...@googlegroups.com
Originally, I followed these instructions and grabbed a copy of the schema.xml file generated at the location you specified and moved it to $FEDORA_HOME/solr/conf/. Now, I'm using the schema.xml and solrconfig.xml files that come with https://github.com/discoverygarden/basic-solr-config. They reside in $FEDORA_HOME/solr/conf/ as well.
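
Since two different locations for solrconfig.xml have come up in this exchange, it may help to list every copy on the machine before deciding which one to edit. A minimal sketch, assuming the copies live somewhere under `$FEDORA_HOME` or `$CATALINA_HOME` (the two variables named in the paths above); `find_solrconfigs` is a hypothetical helper and it does not determine which copy Solr actually loads.

```python
import os


def find_solrconfigs(roots):
    """Walk each root directory and return sorted paths of every solrconfig.xml found."""
    found = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if "solrconfig.xml" in filenames:
                found.append(os.path.join(dirpath, "solrconfig.xml"))
    return sorted(found)


if __name__ == "__main__":
    # Environment variable names come from the paths quoted in this thread;
    # unset variables are skipped.
    candidates = [os.environ.get("FEDORA_HOME"), os.environ.get("CATALINA_HOME")]
    for path in find_solrconfigs([c for c in candidates if c]):
        print(path)
```

Which of the listed files is in effect depends on where the running Solr instance's home directory points, so that still has to be confirmed separately.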
Message has been deleted

slanger

Feb 5, 2014, 4:20:41 PM
to isla...@googlegroups.com
Actually, I just deleted my last post because it was not accurate. I can, in fact, see the fields defined on the Solr Settings screen in my search results. I'm not sure why I wasn't seeing them before.
 
Sorry about the misinformation . . . .

slanger

Feb 12, 2014, 12:34:28 PM
to isla...@googlegroups.com
Thanks again for your feedback and suggestions, Everyone! Although I wasn't able to get automatic indexing upon ingest to work with Fedora 3.5, I was able to get it to work in Fedora 3.6.2, using a tweaked version of the York University Library's config files. Despite the fact that I'm now encountering some weird memory issues (I'll probably start a new thread about that), I'm going to continue using 3.6.2 from here on out. Thanks again for your help!

slanger

Feb 12, 2014, 12:53:56 PM
to isla...@googlegroups.com
Concluding thoughts: A lot of great resources were brought to my attention in this thread -- things that I know other Islandora users would find really helpful. With this in mind, updating some of the documentation may make these tools easier to find and utilize:
Links to these resources are scattered throughout Google Groups and the Islandora blog, which makes them easy to miss. If not for this thread, I certainly wouldn't have known about them. Consolidating these links would be a big help to new Islandora users and may even reduce the number of similar Solr/GSearch installation questions that you have to answer.

Melissa Anez

Feb 12, 2014, 1:01:08 PM
to isla...@googlegroups.com
Excellent suggestions. I'll be updating the wiki documentation for the upcoming 7.x-1.3 release, and that section has needed attention for a while, so I'll get these pieces incorporated. 

Given your recent experiences installing Islandora, would you be willing to review that section of the docs and give us more suggestions? We're always looking for more user perspectives, and we're well aware that the installation docs need some love.

- Melissa

slanger

Feb 12, 2014, 6:34:07 PM
to isla...@googlegroups.com
Sure, Melissa. I'm up against some deadlines right now; but once you get going on the 7.x-1.3 version of the Installing Solr and GSearch page, I can take a look.

Alex Garnett

Feb 13, 2014, 1:13:28 PM
to isla...@googlegroups.com
Just a note to say I'd be very keen to see an updated version of that configuration page, as every time I've attempted it I've been unable to find some of the referenced files around the Apache Ant step :)