How to curb massive resource re-indexing?

Kidd, Mary

Aug 20, 2025, 5:41:41 PM
to Archivesspac...@lyrasislists.org
Hi all,

We are running into some pretty significant indexing issues here at Yale and would appreciate your perspective on what might be happening and how best to tune certain indexer behaviors.

I am getting reports that staff performing routine accessioning steps (adding a new archival object to a series, and creating a new top container instance attached to that AO) on a large resource containing thousands of AOs/TCs are experiencing long delays. After creating a new TC, it is invisible in both Manage Top Containers and Instances > Add Container Instance > Browse. I confirmed the container and AO records are in the database, but they do not show up in the index for 15 to 30+ minutes. Updating system_mtime is not helping.

The ASpace logs show:

E, [2025-08-19T08:51:26.083272 #589] ERROR -- : Thread-3308: SolrIndexerError when committing:
Timeout error with  POST {"commit":{"softCommit":false}}.
Please check your :indexer_solr_timeout_seconds, :indexer_thread_count, and :indexer_records_per_thread settings in
your config.rb file.

We upped AppConfig[:indexer_solr_timeout_seconds] to 600, but this did not noticeably improve the situation.
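
For reference, the relevant line in our config.rb currently reads:

AppConfig[:indexer_solr_timeout_seconds] = 600  # no noticeable improvement at this value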

When I reviewed the backend and Solr logs, it appeared that small edits (like adding a TC) trigger a resource-wide reindex: I'm seeing multiple "deleteByQuery" calls against the entire resource, followed by the re-adding of large batches of tree node documents for the PUI. It all seems a bit excessive.

Has anyone seen massive re-indexing behavior triggered by small edits like this? Is there any way to adjust this behavior so the indexer "radius" is limited and doesn't traverse the entire collection tree?

Many thanks for any advice or guidance here.

Mary

Mary Kidd (she/her)
Technical Lead, Archival Systems
Yale Library IT – Client Services and IT Operations

Joshua D. Shaw

Aug 20, 2025, 6:23:20 PM
to Archivesspac...@lyrasislists.org, Kidd, Mary
Hi Mary

A lot of that indexing activity is done to store extra info for the PUI so that it doesn't hit the db as often (the tree node documents are PUI related). The scope of the reindex depends on how the relationships between different objects are defined in the models - i.e. it's pretty hardwired into the core code and not really changeable without a bunch of effort and knock-on effects. The size of a resource tree is also going to impact the index time (obviously), and larger resources' index time will not necessarily scale in a linear fashion.

There are a couple of things you can try to get around the indexing time and Solr timeout issue. None of these suggestions are really 'fixes' in the true sense.

  1. Try a really large timeout. We need to set that at about 16 hours on 4.x. On 3.3.1 we were at about 3-4 hours. We have about 5.3 million rows in our db for ~18k resources and ~700k AOs.
  2. Tweak the indexer thread and record per thread counts
    1. AppConfig[:indexer_records_per_thread]
    2. AppConfig[:indexer_thread_count]
    3. AppConfig[:pui_indexer_records_per_thread]
    4. AppConfig[:pui_indexer_thread_count]
    5. Upping these from the defaults will increase the resources used by the app and may also make the Solr timeout more likely (see the sketch after this list).
  3. Check whether any plugins you use look up ancestor information, since this will also add to the indexer time. I'm working on an optimization in core for that issue (https://github.com/archivesspace/archivesspace/pull/3667).
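
For reference, all of those knobs live in config.rb. A rough sketch - the timeout is the ~16 hours we needed on 4.x, but the thread/records values are purely illustrative, not recommendations:

AppConfig[:indexer_solr_timeout_seconds]   = 57600  # ~16 hours, what we needed on 4.x
AppConfig[:indexer_records_per_thread]     = 25     # illustrative only; tune for your hardware
AppConfig[:indexer_thread_count]           = 4      # illustrative only
AppConfig[:pui_indexer_records_per_thread] = 25     # illustrative only
AppConfig[:pui_indexer_thread_count]       = 2      # illustrative only

More threads and bigger batches both increase the load on Solr at commit time, which is part of why raising them can make the timeout more likely.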

Related to this, there's also what looks to me like a bug where the PUI indexer can skip records. I have a pull request in for that as well (https://github.com/archivesspace/archivesspace/pull/3622).

jds



Donald Mennerich

Aug 20, 2025, 6:25:14 PM
to Kidd, Mary, Archivesspac...@lyrasislists.org
Hi Mary,

What version of ArchivesSpace are you running? We did not have significant problems on 3.5.1, but we are starting our testing of 4.1.1 and have some, I believe, similarly large resources in our repos.

Thanks,

Donald

--
Donald R. Mennerich, Senior Digital Archivist
Digital Library Technology Services
New York University Libraries

Regine I. Heberlein

Aug 20, 2025, 6:31:33 PM
to Kidd, Mary, Archivesspac...@lyrasislists.org
Hi Mary,

I have more of a non-answer for you, just in case something about our experience may help you out.

We ran into the same issue a while back when updating some resource-level boilerplate notes, which triggered a re-index of all linked objects. (I learned the hard way never to do that again during the week!) Our case was quite bad—days went by with our archivists still not seeing newly-created objects in the SUI.

We did a couple different things to get us up and running again:

  • It turned out that we were indexing for the PUI, which we don’t use, so we turned that off (see the config line after this list). That may not help you.
  • With Blake’s help, we fine-tuned settings for the records per thread and thread count, like so:
AppConfig[:indexer_records_per_thread] = 15
AppConfig[:indexer_thread_count] = 2
  • At the time, we were on older infrastructure (we’re hosted by Lyrasis), and moving to a newer setup where Solr and the db are closer together helped. Again, that may be moot in your case. However, with that said,
  • We also considered paying for a dedicated single-stack setup, which is something you might investigate. It wasn’t necessary for us in the end, but we did think it would have made things even faster (so I’m keeping it in my back pocket for next time).
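
In case it helps, I believe turning off PUI indexing is this switch in config.rb (worth double-checking the flag name against config-defaults.rb for your version):

AppConfig[:pui_indexer_enabled] = false  # skips the PUI indexing pass entirely; verify flag name for your version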

Hope this is helpful!

-Regine


Regine Heberlein (she/her)

Archival Systems Technical Lead

hebe...@princeton.edu

 

**My working day may not be your working day. Please do not feel obliged to reply to this email outside of your regular working hours.**

Kidd, Mary

Aug 21, 2025, 4:28:17 PM
to Joshua D. Shaw, Archivesspac...@lyrasislists.org
Hi Josh,

Thanks for letting me know about your setup! I had thought 600 seconds was a long time, but now I'm rethinking that notion :-)

I'm going to experiment a bit with the indexer settings, starting by dialing down the indexer_thread_count/pui_indexer_thread_count (from 5 to 2) and doubling indexer_records_per_thread/pui_indexer_records_per_thread (from 15 up to 30 or 40). My understanding is that this should cut down on request overhead by sending fewer, larger batches per thread.
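
Concretely, the first combination I plan to try in config.rb:

AppConfig[:indexer_thread_count] = 2              # down from 5
AppConfig[:pui_indexer_thread_count] = 2          # down from 5
AppConfig[:indexer_records_per_thread] = 30       # up from 15; may also try 40
AppConfig[:pui_indexer_records_per_thread] = 30   # up from 15; may also try 40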

I'll test this out and give an update soon.

Thank you,
Mary

Mary Kidd (she/her)
Technical Lead, Archival Systems
Yale Library IT – Client Services and IT Operations


Kidd, Mary

Aug 21, 2025, 4:37:31 PM
to Donald Mennerich, Archivesspac...@lyrasislists.org
Hi Don,

We're currently on a highly customized 3.3.1, aiming to upgrade to 4.x in the next few months. That's helpful to know - I would be curious to see if we run into anything similar in our testing. What are you noticing so far in terms of indexer behavior in 4?

Thanks,
Mary

Mary Kidd (she/her)
Technical Lead, Archival Systems
Yale Library IT – Client Services and IT Operations


Joshua D. Shaw

Aug 21, 2025, 5:14:12 PM
to Archivesspac...@lyrasislists.org
For what it's worth, the need for a timeout increase is really noticeable when running in a Kubernetes environment - especially when the Solr pod is on a different node from the app pod. Locally, running on bare metal, I didn't need to change the timeout.

AS: 4.0.0 (heavily customized, including indexer customizations)
Solr: 9.8.1
DB: MariaDB 10.11.11

Our indexer thread and records per thread config settings are at the defaults at the moment.

I think we have about 6GB assigned to each deployment.

jds
