Importing sharded solr statistics in DSpace 7


Karol

Oct 30, 2022, 7:52:19 AM
to DSpace Technical Support
Hi,

I am trying to migrate Solr statistics from DSpace 6.3 to 7.4. Importing the current (2022) statistics works without any problem; the problem occurs when I try to import the sharded statistics for 2019, 2020, and 2021:
/dspace/bin/dspace solr-import-statistics -i statistics-2021
  
Exception: Error from server at http://localhost:8983/solr/statistics-2021: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404 Not Found</h2>
<table>
<tr><th>URI:</th><td>/solr/statistics-2021/admin/luke</td></tr>
<tr><th>STATUS:</th><td>404</td></tr>
<tr><th>MESSAGE:</th><td>Not Found</td></tr>
<tr><th>SERVLET:</th><td>default</td></tr>
</table>

</body>
</html>

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/statistics-2021: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404 Not Found</h2>
<table>
<tr><th>URI:</th><td>/solr/statistics-2021/admin/luke</td></tr>
<tr><th>STATUS:</th><td>404</td></tr>
<tr><th>MESSAGE:</th><td>Not Found</td></tr>
<tr><th>SERVLET:</th><td>default</td></tr>
</table>

</body>
</html>

        at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:635)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231)
        at org.dspace.util.SolrImportExport.getMultiValuedFields(SolrImportExport.java:482)
        at org.dspace.util.SolrImportExport.importIndex(SolrImportExport.java:433)
        at org.dspace.util.SolrImportExport.main(SolrImportExport.java:148)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:277)
        at org.dspace.app.launcher.ScriptLauncher.handleScript(ScriptLauncher.java:133)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:98)

Tim Donohue

Nov 1, 2022, 10:08:57 AM
to DSpace Technical Support
Hi,

That appears to be saying that the Solr core at http://localhost:8983/solr/statistics-2021 is not responding or doesn't exist. It looks like you are getting a 404 from that URL, so you should verify that the URL is correct and that it is accessible to you.
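
One way to run that check from a script is sketched below. It is only an illustration, not part of DSpace: it requests the same /admin/luke path that appears in the 404 page above and reports the HTTP status. The helper names luke_url and http_status are made up for this example, and the base URL is simply the one from the error message.

```python
import urllib.error
import urllib.request
from typing import Optional


def luke_url(solr_base: str, core: str) -> str:
    """Build the /admin/luke URL for a core (the path shown in the 404 page)."""
    return f"{solr_base.rstrip('/')}/{core}/admin/luke"


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (404 if the core is missing).
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None
```

For example, http_status(luke_url("http://localhost:8983/solr", "statistics-2021")) should give 404 while the core is missing and 200 once Solr serves it; None would mean Solr itself could not be reached.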

Tim

Karol

Nov 1, 2022, 4:45:16 PM
to DSpace Technical Support
Hi Tim,

As usual, thank you for your reply. Indeed, there was no statistics-2021 folder in my "solr/configsets/". Can you tell me if I created statistics-2021 in the correct way?

1) I copied /solr/configsets/statistics to /solr/configsets/statistics-2021
2) I ran the import with the script: solr-import-statistics -i statistics-2021
Now the import error no longer appears.

Is this bad practice? Should I do it differently, e.g. through the Solr web panel?

Alternatively, is it possible to merge the split statistics by renaming all exported files from the split year, e.g.
statistics-2021_export_2021-05_31.csv to statistics_export_2021-05_31.csv, and then importing them with the main script?
/dspace/bin/dspace solr-import-statistics -i statistics

Thanks and best regards,

Karol

Tim Donohue

Nov 2, 2022, 1:24:24 PM
to DSpace Technical Support

Hi Karol,

Yes, you *should* be able to combine older sharded/split statistics by exporting those stats to CSV (using "solr-export-statistics") and then importing them into your new Solr (using "solr-import-statistics"). By default, the "solr-import-statistics" script just adds the new statistics to your Solr. Just make sure not to specify the "clear" flag, which would completely delete all current stats before importing the new ones. See the docs at https://wiki.lyrasis.org/display/DSDOC7x/SOLR+Statistics+Maintenance

As always, I'd recommend testing this process before doing it in Production. But it seems to me like it should work.
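
For testing, the rename step from the question could be scripted along these lines. This is a rough sketch, not an official DSpace tool: the file-name pattern is inferred only from the single example in this thread (statistics-2021_export_2021-05_31.csv), the function names are invented for illustration, and the script copies files into a separate directory so the original exports stay untouched.

```python
import re
import shutil
from pathlib import Path
from typing import List, Optional

# Assumed pattern for yearly-shard exports, e.g. "statistics-2021_export_2021-05_31.csv".
SHARD_EXPORT = re.compile(r"^statistics-\d{4}_export_(?P<rest>.+\.csv)$")


def merged_name(filename: str) -> Optional[str]:
    """Map a shard export name to the unsharded name, or None if it does not match."""
    match = SHARD_EXPORT.match(filename)
    return f"statistics_export_{match.group('rest')}" if match else None


def copy_for_merge(export_dir: Path, target_dir: Path) -> List[str]:
    """Copy shard export files into target_dir under their unsharded names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sorted(export_dir.iterdir()):
        new_name = merged_name(source.name) if source.is_file() else None
        if new_name is None:
            continue  # not a shard export file; leave it alone
        destination = target_dir / new_name
        if destination.exists():
            # Refuse to overwrite, e.g. if two shards map to the same name.
            raise FileExistsError(f"{destination} already exists")
        shutil.copy2(source, destination)
        copied.append(new_name)
    return copied
```

The copied files would then be the input for the import into the "statistics" core (without the "clear" flag), ideally on a test instance first.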

After responding to your earlier question about shards, I realized we have a known bug with loading statistics from shards in DSpace 7. See https://github.com/DSpace/DSpace/issues/8478 for details. That bug is scheduled to be analyzed/fixed, but I don't yet know when (it's still waiting on a volunteer). We also have a note in the official docs about ongoing sharding, since any sharding decisions likely need to be made based on your local Solr setup/needs (and any sharding should likely be done using Solr tools themselves). See the note at https://wiki.lyrasis.org/display/DSDOC7x/SOLR+Statistics+Maintenance#SOLRStatisticsMaintenance-SolrShardingByYear

If you have other questions, let us know on this list.

Tim