Migrating sharded Solr statistics to DSpace 7.4


amtuan...@gmail.com

Feb 14, 2023, 7:53:10 AM
to DSpace Technical Support
Hi all

We are in the process of migrating our DSpace 6.x to 7.4.

I am running into a problem importing the Solr statistics CSV files. Our statistics are sharded by year, and we dumped each shard to CSV files. However, the upgrade instructions do not mention how to recreate the statistics shards; they only cover importing the CSVs. When I try to import the shards, I get a 404 error because the cores/shards have not been created yet. Could you please give me some hints on how to do this?

Thanks for your help.
Tuan


An excerpt of the instructions I followed is below:

Load authority and statistics from the dumps that you made earlier (not the disaster-recovery backup).


[dspace]/bin/dspace solr-import-statistics -i authority
[dspace]/bin/dspace solr-import-statistics -i statistics

This could take quite some time.

If you had sharded your statistics, you will need to load the dump of each shard separately.  As when dumping, the index names will be ... statistics-2017 statistics-2018 statistics.
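
As a rough sketch of what "load each shard separately" could look like, the loop below runs the same `solr-import-statistics -i` command once per index name. The index names are the ones from the example above; `DSPACE_DIR` and `RUN` are hypothetical variables introduced here for illustration, and the function only prints the commands unless `RUN=1` is set. Each import presumably still needs a core of the same name to exist in Solr, otherwise it fails with the 404 described above.

```shell
#!/bin/sh
# Assumption: DSPACE_DIR stands in for the [dspace] install directory.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"

# Print one import command per index name given as an argument.
# With RUN=1 (hypothetical switch) the commands are executed instead.
import_shards() {
    for index in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            "$DSPACE_DIR/bin/dspace" solr-import-statistics -i "$index"
        else
            echo "$DSPACE_DIR/bin/dspace solr-import-statistics -i $index"
        fi
    done
}

# Index names taken from the example in the instructions.
import_shards statistics-2017 statistics-2018 statistics
```

Printing first makes it easy to check the list of index names against the dump files before anything is loaded.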

Sean Carte

Feb 16, 2023, 2:41:40 AM
to amtuan...@gmail.com, DSpace Technical Support
Hi Tuan

On the Slack tech-support channel, Nicholas Woodward reported that he had recreated the cores 'by copying the new statistics core into a new Solr configset, e.g. statistics-2022, for each previous year and changing the schema name in schema.xml. Then after restarting Solr I was able to import all of the old stats CSV files.'
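
A minimal sketch of the copy-and-rename step described in that report, assuming the new statistics configset lives in a directory named `statistics` and that its `conf/schema.xml` root element reads `<schema name="statistics" ...>`. The function name and the `configsets_dir` argument are hypothetical. The report does not say whether any other files need adjusting, so this covers only the two steps it mentions; Solr still has to be restarted afterwards.

```shell
#!/bin/sh
# copy_stats_configset CONFIGSETS_DIR YEAR
# Copies CONFIGSETS_DIR/statistics to CONFIGSETS_DIR/statistics-YEAR and
# rewrites the schema name in the copy's conf/schema.xml.
copy_stats_configset() {
    configsets_dir=$1
    year=$2
    src="$configsets_dir/statistics"
    dest="$configsets_dir/statistics-$year"

    # Refuse to run if the source is missing or the target already exists.
    [ -f "$src/conf/schema.xml" ] || return 1
    [ ! -e "$dest" ] || return 1

    cp -r "$src" "$dest" || return 1

    # Assumption: the schema root element is <schema name="statistics" ...>.
    # Read from the original and write to the copy, so no in-place sed is needed.
    sed "s/<schema name=\"statistics\"/<schema name=\"statistics-$year\"/" \
        "$src/conf/schema.xml" > "$dest/conf/schema.xml"
}
```

It could be called once per year of dumped statistics, for example `copy_stats_configset /path/to/configsets 2022`, where the path is whatever directory holds the statistics configset in your Solr installation.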

I was about to try this, when I realised I'd made a mistake and imported all my sharded statistics into the main core. I'm actually fine with that, unless someone here tells me that that was a terrible thing to do.

Sean


Mark H. Wood

Feb 16, 2023, 8:42:22 AM
to dspac...@googlegroups.com
On Thu, Feb 16, 2023 at 09:41:25AM +0200, Sean Carte wrote:
> On the Slack tech-support channel, Nicholas Woodward reported that he had
> recreated the cores 'by copying the new statistics core into a new Solr
> configset, e.g. statistics-2022, for each previous year and changing the
> schema name in schema.xml. Then after restarting Solr I was able to import
> all of the old stats CSV files.'
>
> I was about to try this, when I realised I'd made a mistake and imported
> all my sharded statistics into the main core. I'm actually fine with that,
> unless someone here tells me that that was a terrible thing to do.

That's actually what the upgrade procedure ought to do, because
sharding was not included when working out the separation of Solr from
the DSpace build (which was required by changes in the packaging of
currently supported Solr versions). There may be a way to make it
work, but the tools that currently come with DSpace will not help.

Sharding is awaiting some discussion:

o Does sharding actually help?

o Consider that we are abusing a feature which was meant for
evenly spreading load across multiple hosts.

o This could be done much more simply and organically, without
involving DSpace code at all, using Time Routed Aliases, but TRA
requires Solr Cloud, which runs a bit differently. Is the benefit
of sharding (whatever that actually is) worth the trouble to set up
a (possibly degenerate) Cloud?
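
For readers unfamiliar with Time Routed Aliases, here is a sketch of what creating one might look like through the SolrCloud Collections API (`CREATEALIAS` with `router.name=time`). It only prints a `curl` command and sends nothing. The alias name, the `statistics` configset name, the routing field `time`, the yearly interval, and the function name are all assumptions made for illustration, not something DSpace sets up or supports today.

```shell
#!/bin/sh
# print_tra_request SOLR_URL START
# Prints a curl command that would ask a SolrCloud node to create a time
# routed alias with one collection per year. Nothing is sent to Solr.
print_tra_request() {
    cat <<EOF
curl -G '$1/admin/collections' \\
  --data-urlencode 'action=CREATEALIAS' \\
  --data-urlencode 'name=statistics' \\
  --data-urlencode 'router.name=time' \\
  --data-urlencode 'router.field=time' \\
  --data-urlencode 'router.start=$2' \\
  --data-urlencode 'router.interval=+1YEAR' \\
  --data-urlencode 'create-collection.collection.configName=statistics' \\
  --data-urlencode 'create-collection.numShards=1'
EOF
}

# Hypothetical Solr URL and start date.
print_tra_request http://localhost:8983/solr 2017-01-01T00:00:00Z
```

With an alias like this, Solr itself would route each document to a per-year collection based on the routing field, which is the "without involving DSpace code" part of the idea.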

--
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu

Sean Carte

Feb 16, 2023, 8:59:16 AM
to dspac...@googlegroups.com
Thanks, Mark. That's good to know. And it certainly makes my life easier having everything in a single core.

Sean

amtuan...@gmail.com

Feb 16, 2023, 9:03:50 AM
to DSpace Technical Support
Thank you both for your insights. I was able to recreate the cores and import the CSVs.