How to Fix Indexing Problems


Sherry Lake

Jul 29, 2022, 11:54:06 AM
to Dataverse Users Community
I think this is an indexing problem, but I need someone to help me debug it and then fix it.


is totally open/public, but is not found when searched nor does it appear in its sub-collection.

I assume it's an indexing problem? But how do I fix it?

Thanks.
Sherry

P.S. I sent this question/problem to support@dataverse (actually, I sent it via the "support" link on the Harvard Dataverse site). Are those two different? Do they go to the same folks?


José Carvalho

Jul 29, 2022, 12:47:15 PM
to dataverse...@googlegroups.com
Hi Sherry,

Have you tried the procedures described in https://guides.dataverse.org/en/latest/admin/solr-search-index.html?
I recently ran into a similar issue, and running the Reindex in Place procedure solved it in my case.
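
For anyone who wants to script it, here is a minimal Python sketch of the two requests that make up a reindex in place, as I understand the guide linked above: clear the index timestamps, then start or continue the asynchronous reindex. The endpoint paths and the localhost base URL are assumptions taken from my reading of that guide, and `reindex_in_place_requests` and `run` are hypothetical helper names, so please check them against the guide for your Dataverse version before running anything.

```python
from urllib.request import Request, urlopen

# Assumption: the admin API is only reachable from the server itself,
# on the port used in the guide's examples.
BASE_URL = "http://localhost:8080"


def reindex_in_place_requests(base_url=BASE_URL):
    """Return the (method, url) pairs for a reindex in place, in order.

    Assumption: these paths match the "Reindex in Place" section of the
    Solr Search Index admin guide; verify them for your version.
    """
    return [
        # Step 1: clear the index timestamps so every object counts as stale.
        ("DELETE", base_url + "/api/admin/index/timestamps"),
        # Step 2: start (or continue) the asynchronous reindex.
        ("GET", base_url + "/api/admin/index/continue"),
    ]


def run(requests, timeout=60):
    """Send each request in order and print the HTTP status it returns."""
    for method, url in requests:
        with urlopen(Request(url, method=method), timeout=timeout) as resp:
            print(method, url, resp.status)
```

Calling `run(reindex_in_place_requests())` on the server would send the two requests in order. Building the list separately from sending it makes it easy to print and review the URLs first.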

Regards,

José




Julian Gautier

Jul 29, 2022, 3:19:44 PM
to Dataverse Users Community
Emails sent to sup...@dataverse.edu and messages sent using the "support" link on the Harvard Dataverse site (and emails sent to sup...@dataverse.harvard.edu) do go to the same place and are seen by the same folks :)

Sherry Lake

Jul 29, 2022, 3:26:37 PM
to dataverse...@googlegroups.com
Hello,

Update on the indexing problems. It seems the indexing "errors" were encountered when our Dataverse software was upgraded (from v5.9 to v5.10, v5.10.1, v5.11; somewhere along the way). There were 5 datasets that did not get indexed. Here's the error message for one of them:

[2022-07-20T11:15:07.375-0400] [Payara 5.2021.10] [INFO] [] [edu.harvard.iq.dataverse.search.IndexBatchServiceBean] [tid: _ThreadID=224 _ThreadName=__ejb-thread-pool13] [timeMillis: 1658330107375] [levelValue: 800] [[
 indexing dataset 383 of 386 (id=3662)]]

[2022-07-20T11:15:07.376-0400] [Payara 5.2021.10] [INFO] [] [edu.harvard.iq.dataverse.search.IndexBatchServiceBean] [tid: _ThreadID=224 _ThreadName=__ejb-thread-pool13] [timeMillis: 1658330107376] [levelValue: 800] [[
 FAILURE indexing dataset 383 of 386 (id=3662) Exception info: Attempt to invoke when container is in Undeployed]]
The solution was to index each dataset individually. I also wanted to note that each of the non-indexed datasets had over 1,000 files. Could that be the problem?
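
In case it helps others find their own unindexed datasets, here is a rough Python sketch that pulls the database ids out of `FAILURE indexing dataset` messages like the one above. The regular expression is based only on the two log lines quoted here, so other versions may word the message differently. `failed_dataset_ids` and `index_urls` are hypothetical helper names, and the per-dataset reindex URL is an assumption based on my reading of the admin guide, so verify it before use.

```python
import re

# Matches the message part of a failure line as quoted above, e.g.
# "FAILURE indexing dataset 383 of 386 (id=3662) Exception info: ..."
FAILURE_RE = re.compile(r"FAILURE indexing dataset \d+ of \d+ \(id=(\d+)\)")


def failed_dataset_ids(log_text):
    """Return ids of datasets that failed to index, in log order, no repeats."""
    ids = []
    for match in FAILURE_RE.finditer(log_text):
        dataset_id = int(match.group(1))
        if dataset_id not in ids:
            ids.append(dataset_id)
    return ids


def index_urls(ids, base_url="http://localhost:8080"):
    """Build one reindex URL per dataset id.

    Assumption: the path follows the per-dataset reindex call in the
    admin guide; check it against your version.
    """
    return [f"{base_url}/api/admin/index/datasets/{i}" for i in ids]
```

Passing the contents of the server log to `failed_dataset_ids` gives the list of ids to reindex one by one, and `index_urls` turns that list into the requests to send.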
Julian, thanks for confirming where the support emails go.
--
Sherry


James Myers

Jul 30, 2022, 8:35:57 AM
to dataverse...@googlegroups.com

I haven’t been following this, but indexing is done asynchronously and I believe recent versions put the largest datasets at the end of the list. I could imagine that if you stop/start the server while indexing is running, there could be a case where some of the jobs try to run when the app isn’t ready. That would be a bug. It would also mean the only reason the large datasets are the ones having problems is that they are now at the end of the list (so more datasets appear more quickly during reindexing).

The workaround would be to let indexing finish (I think progress is visible in the log) before restarting/upgrading. The reindex in place should probably be a way to recover. I haven’t tested it, but if you ran a full reindex (which removes the lastIndexed times) and some datasets failed, the reindex in place would presumably find the ones that didn’t get done.
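
As a rough way to check that progress from the log before restarting, here is a small Python sketch that reads the last `indexing dataset N of M` message. It assumes the progress messages look like the ones Sherry quoted. `last_progress` and `looks_finished` are hypothetical helper names, and reaching `M of M` only shows that the last dataset was reported, not that its indexing completed, so treat it as a heuristic.

```python
import re

# Matches both plain progress lines and FAILURE lines, since both contain
# "indexing dataset N of M (id=...)" in the log excerpts quoted earlier.
PROGRESS_RE = re.compile(r"indexing dataset (\d+) of (\d+) \(id=\d+\)")


def last_progress(log_text):
    """Return (current, total) from the last progress message, or None."""
    matches = PROGRESS_RE.findall(log_text)
    if not matches:
        return None
    current, total = matches[-1]
    return int(current), int(total)


def looks_finished(log_text):
    """Heuristic: True once the last progress message reads 'M of M'."""
    progress = last_progress(log_text)
    return progress is not None and progress[0] == progress[1]
```

If `looks_finished` returns False, the batch is probably still running (or was interrupted), and waiting before a restart or upgrade would be the safer choice.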

 

- Jim
