Rebuilding Solr Statistics


Hayden Young

Feb 18, 2016, 12:11:57 AM
to DSpace Technical Support
Rebuilding DSpace's search indexes simply requires issuing a reindex using the dspace command line tool's index-discovery command.

What is the process for rebuilding Solr's statistics index?

I've tried using the stats-log-importer without success:

./dspace stats-log-importer -m -i /path/to/dspace/log/dspace.log*

Any assistance much appreciated.

Thanks


Hayden

helix84

Feb 18, 2016, 6:03:03 AM
to Hayden Young, DSpace Technical Support
Hi Hayden,

you should keep backups of your statistics core and authority core (if
you use them), because unlike the search and oai cores, those are not just
caches you can rebuild. Admittedly, we haven't had tools to
export/import data from Solr to CSV until DSpace 5.3 or so.
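
A file-level backup of a core can be as simple as archiving its directory. The following is a minimal sketch, not an official DSpace tool: the function name `backup_core` and the example paths are assumptions, and it presumes Tomcat is stopped so the index files are not changing while they are archived.

```shell
#!/bin/sh
# Sketch: archive one Solr core directory (e.g. statistics or authority).
# Assumes Tomcat/Solr is stopped so the index is not being written to.
# Usage: backup_core SOLR_DIR CORE_NAME DEST_DIR   (names are illustrative)
backup_core() {
    solr_dir=$1; core=$2; dest=$3
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/$core-$stamp.tar.gz"
    mkdir -p "$dest"
    # -C keeps the paths inside the archive relative to the Solr directory.
    tar -czf "$archive" -C "$solr_dir" "$core"
    # Print the archive path so callers can record or verify it.
    echo "$archive"
}

# Example call (paths are assumptions, adjust to your install):
# backup_core /dspace/solr statistics /var/backups/dspace
```

Restoring is the reverse: with Tomcat stopped, unpack the archive back into the Solr directory.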

Importing events from dspace.log to Solr is possible, but it will
capture neither all types of events nor as many event properties
(e.g. geolocation) as Solr does. It's a two-step process:
stats-log-converter, then stats-log-importer.

https://wiki.duraspace.org/display/DSDOC5x/Command+Line+Operations#CommandLineOperations-SOLRStatistics
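
The two steps might be scripted roughly as follows. This is a sketch under assumptions: the `/dspace` install path and the file names are placeholders, and only the `-i` (input) and `-o` (output) options are used. Check the linked documentation for the other options (such as `-m` for multiple log files) before relying on it. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch of the two-step import: convert dspace.log, then load the result.
# DSPACE_BIN and the file paths are assumptions; DRY_RUN=1 only prints.
DSPACE_BIN=${DSPACE_BIN:-/dspace/bin/dspace}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# import_log LOG_FILE CONVERTED_FILE
import_log() {
    log=$1; converted=$2
    # Step 1: convert the DSpace log into the importer's intermediate format.
    run "$DSPACE_BIN" stats-log-converter -i "$log" -o "$converted"
    # Step 2: load the converted file into the statistics core.
    run "$DSPACE_BIN" stats-log-importer -i "$converted"
}

import_log /dspace/log/dspace.log /tmp/dspace.log.converted
```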


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

Hayden Young

Feb 18, 2016, 7:21:08 AM
to DSpace Technical Support, hay...@knowledgearc.com, hel...@centrum.sk
Hi Helix

Thanks for the feedback and suggestions.

The problem I have is that I'm trying to get statistics from DSpace 1.6 working in DSpace 5, but update_solr_indexes is throwing an IndexFormatTooOldException. I'm guessing the DSpace 1.6 Solr statistics index is simply too old for DSpace 5's Lucene library to upgrade into a form that newer versions of DSpace can use. The full stack trace is:

Exception in thread "main" java.io.IOException: Could not read Lucene segments files in /opt/dspace/solr/statistics/data/index
    at org.dspace.app.util.IndexVersion.getIndexVersion(IndexVersion.java:141)
    at org.dspace.app.util.IndexVersion.main(IndexVersion.java:59)
Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource: BufferedChecksumIndexInput(MMapIndexInput(path="/opt/dspace/solr/statistics/data/index/segments_3zxwg"))): -7 (needs to be between -9 and -11). This version of Lucene only supports indexes created with release 3.0 and later.
    at org.apache.lucene.codecs.lucene3x.Lucene3xSegmentInfoReader.readLegacySegmentInfo(Lucene3xSegmentInfoReader.java:131)
    at org.apache.lucene.codecs.lucene3x.Lucene3xSegmentInfoReader.readLegacyInfos(Lucene3xSegmentInfoReader.java:57)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:414)
    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:454)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:906)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:752)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:450)
    at org.dspace.app.util.IndexVersion.getIndexVersion(IndexVersion.java:136)
    ... 1 more

Therefore, what would be my migration path to get DSpace 5 to update DSpace 1.6 statistics successfully?

Thanks


Hayden

helix84

Feb 18, 2016, 7:31:32 AM
to Hayden Young, DSpace Technical Support
Yes, I know the problem. In fact, you shouldn't have run into it -
since DSpace 5, this should be automatically recognized during ant
fresh_install or ant update and the cores updated in two steps.

Did you copy the solr data only after running ant? If so, re-run ant
to trigger the automated upgrade. If that's not the case, let me know;
we would like to know about any bugs in the autoupgrade process.

The steps can be done manually, too. I described this here:
https://wiki.duraspace.org/display/DSPACE/DSpace+Release+5.0+Status#DSpaceRelease5.0Status-AutomaticSolrupgrade

Hayden Young

Feb 18, 2016, 10:04:09 AM
to DSpace Technical Support, hay...@knowledgearc.com, hel...@centrum.sk
Okay, again thanks for the recommendations. I'm definitely making progress.

So currently my steps are:

- Delete the statistics directory in DSpace 5 and copy the statistics directory from the DSpace 1.6 site into /path/to/dspace/solr/
- Run java -cp lucene-core-3.5.0.jar org.apache.lucene.index.CheckIndex /dspace/solr/statistics/data/index/; This reports Lucene version 2.4
- Run java -cp lucene-core-3.5.0.jar org.apache.lucene.index.IndexUpgrader /dspace/solr/statistics/data/index
- Run java -cp lucene-core-3.5.0.jar org.apache.lucene.index.CheckIndex /dspace/solr/statistics/data/index/; This reports Lucene version 3.1+ so looks like the upgrade was successful.
- Run update_solr_indexes using the ant build script. Everything starts successfully but then ends with the errors below:

update_solr_indexes:
     [echo] Checking if any Solr indexes (/opt/dspace/solr/*) need upgrading...
     [echo] Current version of Solr/Lucene: 4.10.2

check_solr_index:
     [echo] Checking if the Solr index at /opt/dspace/solr/statistics/data/index/ is >= Solr 3.5.0
     [echo] The Solr index in /opt/dspace/solr/statistics/data/index/ IS >= Solr 3.5.0. Looks good!

check_solr_index:
     [echo] Checking if the Solr index at /opt/dspace/solr/statistics/data/index/ is >= Solr 4.10.2
     [echo] The Solr index in /opt/dspace/solr/statistics/data/index/ needs an upgrade to Solr 4.10.2

upgrade_solr_index:
     [echo] Upgrading Solr/Lucene Index at /opt/dspace/solr/statistics/data/index/ to Solr/Lucene 4.10.2.
     [echo] Upgrading the Solr index in /opt/dspace/solr/statistics/data/index/. Depending on the index size, this may take a while (please be patient)...

BUILD FAILED
/tmp/dspace-hosted-master/dspace/target/dspace-installer/build.xml:978: The following error occurred while executing this line:
/tmp/dspace-hosted-master/dspace/target/dspace-installer/build.xml:1076: The following error occurred while executing this line:
/tmp/dspace-hosted-master/dspace/target/dspace-installer/build.xml:1172: Java returned: 143


Unfortunately, no real errors are being reported. I thought it might be crashing for lack of memory, but I've increased this to 2g using JAVA_OPTS and ANT_OPTS and still get the same failure.
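
The `Java returned: 143` line may itself be a clue. By shell convention, an exit status above 128 means the process was ended by a signal, and 143 = 128 + 15, which is SIGTERM. That would suggest the JVM was terminated from outside rather than failing with a Java exception, although it does not say what sent the signal. A small sketch for decoding such a status (the function name `describe_status` is made up for illustration):

```shell
#!/bin/sh
# Sketch: translate an exit status above 128 into the signal behind it.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l N prints the name of signal number N, without the SIG prefix.
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited with status $status"
    fi
}

describe_status 143   # prints: killed by signal 15 (SIGTERM)
```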

helix84

Feb 18, 2016, 12:59:22 PM
to Hayden Young, DSpace Technical Support
Interesting, I haven't seen the automatic upgrade fail yet. Perhaps
running the second step (3.5.0 -> 4.10.2 using the 4.10.2 jar)
manually, too, will reveal what the problem is.

1) Anyway, I didn't check how sophisticated the upgrade logic is. You
seem to have only upgraded one of the cores manually (1.3.0 -> 3.5.0).
Either upgrade them all manually or upgrade them all automatically.

2) Try to run Maven with debug flags

3) Download lucene-core-4.10.2.jar manually and run IndexUpgrader and
CheckIndex on your index version 3.5.0 to complete the second step.

4) If all else fails, assuming it's really a memory / index size issue
(but I don't think it is), you could shard the statistics core with
DSpace 3.x (Solr 3.5.0) and try to upgrade each shard individually
(auto-upgrade of all shards at once should work, too, but I'm not
certain).

Hayden Young

Feb 18, 2016, 2:38:52 PM
to DSpace Technical Support, hay...@knowledgearc.com, hel...@centrum.sk
Thanks Helix; you've solved the final piece of what has been a pretty painful upgrade of the Solr statistics (and has brought to an end 4 days of frustration).

So as you suggested I ended up manually upgrading to 3.5 and then to 4.

The process that worked for me:

- Stop tomcat
- Download the lucene core 3.5 jar
- Run java -Xmx2048m -Xms256m -cp lucene-core-3.5.0.jar org.apache.lucene.index.IndexUpgrader /dspace/solr/statistics/data/index/
- Start tomcat
- Check that the statistics are still working
- Stop tomcat
- cd into /dspace/lib
- Run java -Xmx2048m -Xms256m -cp lucene-core-4.10.2.jar org.apache.lucene.index.IndexUpgrader /dspace/solr/statistics/data/index/
- Start tomcat
- Check that the statistics are still working.
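
The Lucene part of the steps above can be sketched as a script. This is an illustration rather than a tested migration tool: the jar locations and `INDEX_DIR` are assumptions based on the paths in this thread, Tomcat must be stopped before running it for real, and the CheckIndex calls are added here only as a sanity check after each step. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch of the two-step Lucene index upgrade (old format -> 3.x -> 4.10.2).
# Jar paths and INDEX_DIR are assumptions; stop Tomcat before a real run.
set -e
INDEX_DIR=${INDEX_DIR:-/dspace/solr/statistics/data/index/}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# upgrade_index JAR: run Lucene's IndexUpgrader from the given lucene-core jar.
upgrade_index() {
    run java -Xmx2048m -Xms256m -cp "$1" \
        org.apache.lucene.index.IndexUpgrader "$INDEX_DIR"
}

# check_index JAR: inspect the index with Lucene's CheckIndex.
check_index() {
    run java -cp "$1" org.apache.lucene.index.CheckIndex "$INDEX_DIR"
}

# Step 1: upgrade the old index to the 3.x format with the 3.5.0 jar.
upgrade_index lucene-core-3.5.0.jar
check_index lucene-core-3.5.0.jar

# Step 2: upgrade 3.x to 4.10.2 with the jar from the DSpace lib directory.
upgrade_index /dspace/lib/lucene-core-4.10.2.jar
check_index /dspace/lib/lucene-core-4.10.2.jar
```

Because of `set -e`, a failure in the first step stops the script before the second upgrade is attempted.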

I don't know if restarting Tomcat halfway through the process was necessary. I did it mainly so I could check that Solr wasn't throwing any errors.

One other thing I noticed was that my solr/statistics/conf was not upgraded, so I manually copied over the conf from the ant build. The old config may have come back when I restored the old Solr statistics directory (I ended up doing a number of restores).
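
Replacing the conf directory by hand can be sketched like this. The function name `replace_conf` and the example paths are assumptions (in particular, where the built conf lives depends on your source checkout). The old conf is kept under a timestamped name instead of being deleted, so it can be put back if the new one causes problems.

```shell
#!/bin/sh
# Sketch: swap a core's conf directory for a freshly built one.
# Usage: replace_conf NEW_CONF_DIR CORE_DIR   (names are illustrative)
set -e
replace_conf() {
    new_conf=$1; core_dir=$2
    # Keep the existing conf as conf.bak-<timestamp> rather than deleting it.
    if [ -d "$core_dir/conf" ]; then
        mv "$core_dir/conf" "$core_dir/conf.bak-$(date +%Y%m%d-%H%M%S)"
    fi
    # Copy the new conf into place under the core directory.
    cp -R "$new_conf" "$core_dir/conf"
}

# Example call (both paths are assumptions, run with Tomcat stopped):
# replace_conf /path/to/dspace-installer/solr/statistics/conf /dspace/solr/statistics
```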

Thanks again


Hayden