DSpace 7.5 intermittent errors with patch operations

83 views
Skip to first unread message

Agustina Martinez

unread,
Mar 17, 2023, 12:46:47 PM3/17/23
to DSpace Technical Support
Hi all,

We are experiencing intermittent errors, mostly affecting PATCH operations (e.g. modifying items metadata, editing a user or group), when there is heavy load in the backend. 

We do not see any relevant errors in the backend logs at all, but after some timeout takes place I consistently see these two errors in the developer tools:

Access to XMLHttpRequest at 'OUR-BACKEND-URL/server/api/core/items/421578b1-a644-4016-b3a0-c0afe904b0e8' from origin 'https://dspace7-f-dev.lib.cam.ac.uk' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.
polyfills.1cded4d53dfa4781.js:1          PATCH OUR-BACKEND-URL/server/api/core/items/421578b1-a644-4016-b3a0-c0afe904b0e8 net::ERR_FAILED 500 (Internal Server Error)

It is definitely not CORS related and that error is mis-leading.

Is anybody experiencing similar issues? So far I have not been able to reproduce in the DEMO DSpace site or in my local environment.

Best,
Agustina
Screenshot 2023-03-17 at 16.44.47.png

Tim Donohue

unread,
Mar 20, 2023, 1:55:12 PM3/20/23
to DSpace Technical Support
Hi Agustina,

If you are sure it's not a CORS policy error, then I'd recommend checking the backend logs.  That "500 (Internal Server Error)" might also be logged in dspace.log or tomcat logs. If so, there may be more information there to help out.   If it's reproducible semi-easily, you could also turn on "Debug mode for the REST API" per our troubleshooting guide: https://wiki.lyrasis.org/display/DSPACE/Troubleshoot+an+error#Troubleshootanerror-DSpace7.x(orabove)   (I wouldn't recommend leaving that on for a long time though, as allows full error stacktraces to be seen in every REST API response, which is really useful to debug issues...but also can be useful to hackers trying to find vulnerabilities in your system.)

Tim

Agustina Martinez

unread,
Mar 21, 2023, 11:55:55 AM3/21/23
to DSpace Technical Support
Hi Tim, 

Many thanks for the hints. Yes, I can confirm now that the issue was completely unrelated to CORS. Unfortunately there were no errors at all in any of the backend logs regarding this other than a request timeout.

I have managed to track down the issue and it was quite a worrying one, which I am happy to share in case others experience similar issues and to also gather some advice from you in terms of appropriate fixes:
  • The root of the issue was a locked DB transaction originating in a time consuming curation task that we were running over a large repository collection. This curation task iterates over collection items and modifies their metadata in certain conditions are matched. The curation task is run via the command-line "curate -t" command
  • We have observed that until the curation task completes successfully all of the potentially affected items cannot be edited at all via the user interface and the error I reported above is the one that occurs. I.e. patch request from UI but request timeouts as there is the DB transaction locked. There is no useful message being reported at the DSpace level.
  • The transaction lock took place when the event dispatcher queue kicks off after the curation task has finished iterating over the collection and before committing changes to the DB. 
  • The event consumer that was causing for the DB lock was the DOIConsumer, and I think that the root of the issue is the "ctx.commit" call in https://github.com/DSpace/DSpace/blob/b956bcd3891e372cc0fe5b1595e422e5d059bced/dspace-api/src/main/java/org/dspace/identifier/doi/DOIConsumer.java#L144.
Replacing that commit call with a "ctx.uncacheEntity(dso)" call seems to have resolved the DB transaction lock as our curation task now finishes successfully and all the events, including discovery indexing take place as expected, but wanted to check whether this is the appropriate way of doing it. It makes sense to not do a hard commit in the DOIConsumer in case the transaction does not finish successfully to be able to roll back but I would like advice here??

Tim, should I open a ticket and try to replicate this with a DSpace standard curation task running over a large collection? Your advice would be much appreciated.

Best,
Agustina

Tim Donohue

unread,
Mar 21, 2023, 2:22:49 PM3/21/23
to Agustina Martinez, DSpace Technical Support
Hi Agustina,

Yes, if you are able to narrow down the issue to code within DSpace, please do submit a bug ticket (and possibly also a PR if you've found a fix).  I'm not familiar enough with all the DOI code in DSpace to comment further, but generally, it's best to report bugs back if you can narrow down the issue to DSpace code itself.

Thanks & glad to hear you've started to figure out what is going on.

Tim

From: dspac...@googlegroups.com <dspac...@googlegroups.com> on behalf of Agustina Martinez <amgc...@gmail.com>
Sent: Tuesday, March 21, 2023 10:55 AM
To: DSpace Technical Support <dspac...@googlegroups.com>
Subject: [dspace-tech] Re: DSpace 7.5 intermittent errors with patch operations
 
--
All messages to this mailing list should adhere to the Code of Conduct: https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
---
You received this message because you are subscribed to a topic in the Google Groups "DSpace Technical Support" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/dspace-tech/0wI3czNRwP8/unsubscribe.
To unsubscribe from this group and all its topics, send an email to dspace-tech...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-tech/5879ae73-6286-48ce-9c9b-c5b09f3e238fn%40googlegroups.com.

Mark H. Wood

unread,
Mar 22, 2023, 11:41:29 AM3/22/23
to dspac...@googlegroups.com
Interesting. It sounds to me like there are two problems here, with a
common source:

o a (potentially) gigantic transaction is built up over a sequence of
operations which should be unrelated;

o a (potentially) gigantic queue of events which should be unrelated
is built up within the curation run's Context.

It sounds to me as though none of the tasks in the run is committing
its work. The default curation scope is OPEN, so Curator is not
committing. The event handler in question commits its own work, and
in so doing, the first time it is called by the dispatcher,
inadvertently commits all of the *other* curation work that should
have been committed already.

I can find no documentation for this, but it appears that event
handlers should not commit work. The design of Context seems to
depend on this. If true, we should document this requirement
thoroughly.

++++++++++++++++++

It seems to me that a task designed to be run in OPEN scope must
commit its own work. There is, however, a problem with this: Context
contains only one Session. A bulk operation needs to keep a
transaction open while it consumes the sequence of entities on which
it is to operate, and also to be able to commit work (closing a
transaction) whenever it modifies one of those entities. I suppose
that one could work around this by draining the sequence into a List
and then reloading them one at a time for processing, but how
inefficient that is!

--
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu
signature.asc

Agustina Martinez

unread,
Mar 23, 2023, 4:24:16 AM3/23/23
to DSpace Technical Support
Hi Mark,

Thanks a lot for chipping in. All that you have described is exactly what is happening. To give some more details:
  • There is the single big transaction being built up, and it does not save until all the items in the collection have been processed.
  • My task implementation is annotated with the "@Distributive" annotation as I need for it to only report its status one it has iterated over all of its items (the task implements a custom report method that generates a csv file with items results and emails that to administrators) - I couldn't come up with a better way of doing it. Not having that annotation causes for the individual tasks to report their status, and hence send an email each.
  • In testing when / how commits happened, I was trying to get the individual item task to commit its changes, but as you report, there is the single context and after the first task commit I was getting a SQLException concerning the resultset being closed.
For the time being, until I come up with a better approach to this bulk operation, the small change to the DOIConsumer (replace commit with uncacheEntity) has helped with the lock issue, and the long task completes successfully and all changes are committed to the DB as expected. Do you see any issue with this change?

Is there any way of solving the issue with the single context? Curation tasks are a good framework for allowing bulk operations, but they should allow for changes to be committed frequently in cases where there is no dependency between single tasks being run. Although I can see this being tricky.

I did try the route of creating a Script using the DSpaceRunnable classes which would use an iterator, but I was still facing the issue with the transaction lock, and this script was much more innefficient in terms of running times.

Best,
Agustina

Mark H. Wood

unread,
Mar 23, 2023, 10:52:07 AM3/23/23
to dspac...@googlegroups.com
On Thu, Mar 23, 2023 at 01:24:16AM -0700, Agustina Martinez wrote:
> Thanks a lot for chipping in. All that you have described is exactly what
> is happening. To give some more details:
>
> - There is the single big transaction being built up, and it does not
> save until all the items in the collection have been processed.
> - My task implementation is annotated with the "@Distributive"
> annotation as I need for it to only report its status one it has iterated
> over all of its items (the task implements a custom report method that
> generates a csv file with items results and emails that to administrators)
> - I couldn't come up with a better way of doing it. Not having that
> annotation causes for the individual tasks to report their status, and
> hence send an email each.
> - In testing when / how commits happened, I was trying to get the
> individual item task to commit its changes, but as you report, there is the
> single context and after the first task commit I was getting a SQLException
> concerning the resultset being closed.
>
> For the time being, until I come up with a better approach to this bulk
> operation, the small change to the DOIConsumer (replace commit with
> uncacheEntity) has helped with the lock issue, and the long task completes
> successfully and all changes are committed to the DB as expected. Do you
> see any issue with this change?
>
> Is there any way of solving the issue with the single context? Curation
> tasks are a good framework for allowing bulk operations, but they should
> allow for changes to be committed frequently in cases where there is no
> dependency between single tasks being run. Although I can see this being
> tricky.
>
> I did try the route of creating a Script using the DSpaceRunnable classes
> which would use an iterator, but I was still facing the issue with the
> transaction lock, and this script was much more innefficient in terms of
> running times.

I made a quick and dirty tool to fix a metadata problem that we had,
which manages to get two Sessions and use them as I've suggested. I
had to ignore the relevant DAO and build my own Query, so that I could
use the second Session. Curator would have to be similarly hacked to
do it this way. I would rather see a proper solution, but for
illustration, my tool looks like this:

diff --git a/dspace/modules/additions/src/main/java/edu/iupui/ulib/dspace/util/FixThumbnailPolicies.java b/dspace/modules/additions/src/main/java/edu/iupui/ulib/dspace/util/FixThumbnailPolicies.java
new file mode 100644
index 0000000000..6e7b475238
--- /dev/null
+++ b/dspace/modules/additions/src/main/java/edu/iupui/ulib/dspace/util/FixThumbnailPolicies.java
@@ -0,0 +1,130 @@
+/*
+ * Copyright 2023 Indiana University.
+ */
+package edu.iupui.ulib.dspace.util;
+
+import java.sql.SQLException;
+import java.util.List;
+import java.util.function.Consumer;
+import java.util.stream.Stream;
+import javax.persistence.criteria.CriteriaBuilder;
+import javax.persistence.criteria.CriteriaQuery;
+import javax.persistence.criteria.Root;
+
+import org.dspace.authorize.AuthorizeException;
+import org.dspace.authorize.ResourcePolicy;
+import org.dspace.authorize.factory.AuthorizeServiceFactory;
+import org.dspace.authorize.service.ResourcePolicyService;
+import org.dspace.content.Bitstream;
+import org.dspace.content.Bundle;
+import org.dspace.content.Item;
+import org.dspace.core.Context;
+import org.dspace.core.ContextHelper;
+import org.hibernate.Session;
+import org.hibernate.query.Query;
+
+/**
+ * Find and fix thumbnail bitstreams having no policies. Copy the Bundle's
+ * policies, or the Item's policies if none.
+ *
+ * @author mwood
+ */
+public class FixThumbnailPolicies {
+ static ResourcePolicyService resourcePolicyService;
+ static Context context;
+
+ private FixThumbnailPolicies() {}
+
+ /**
+ * Check all Items' thumbnail bitstreams for missing policies.
+ *
+ * @param argv unused.
+ */
+ static public void main(String[] argv) {
+ resourcePolicyService = AuthorizeServiceFactory.getInstance()
+ .getResourcePolicyService();
+ context = new Context();
+
+ try (Session driverSession = ContextHelper.getReadOnlySession(context)) {
+ CriteriaBuilder qBuilder = driverSession.getCriteriaBuilder();
+ CriteriaQuery criteria = qBuilder.createQuery(Item.class);
+ Root<Item> root = criteria.from(Item.class);
+ criteria.select(root);
+ Query<Item> itemQuery = driverSession.createQuery(criteria);
+ try (Stream<Item> items = itemQuery.getResultStream()) {
+ items.forEachOrdered(new ItemConsumer());
+ }
+ } catch (SQLException e) {
+ // TODO do something
+ }
+ }
+
+ /**
+ * Update the policies of an Item's THUMBNAIL Bitstreams as needed.
+ */
+ static private class ItemConsumer
+ implements Consumer<Item> {
+ /**
+ * Add parent policies to THUMBNAIL Bitstreams if missing.
+ *
+ * @param item Update bitstream policies for this Item.
+ */
+ @Override
+ public void accept(Item item) {
+ for (Bundle bundle : item.getBundles("THUMBNAIL")) {
+ for (Bitstream bitstream : bundle.getBitstreams()) {
+ List<ResourcePolicy> bitstreamPolicies
+ = bitstream.getResourcePolicies();
+ if (bitstreamPolicies.isEmpty()) {
+ List<ResourcePolicy> bundlePolicies
+ = bundle.getResourcePolicies();
+ if (bundlePolicies.isEmpty()) {
+ List<ResourcePolicy> itemPolicies
+ = item.getResourcePolicies();
+ if (itemPolicies.isEmpty()) {
+ // do nothing
+ } else {
+ copyPolicies(itemPolicies, bitstream);
+ }
+ } else {
+ copyPolicies(bundlePolicies, bitstream);
+ }
+ }
+ }
+ }
+ }
+
+ /**
+ * Copy a list of policies to a Bitstream. Only copy custom and
+ * inherited policies.
+ *
+ * @param parentPolicies the policies to be copied.
+ * @param bitstream the Bitstream to receive new policies.
+ */
+ private static void copyPolicies(List<ResourcePolicy> parentPolicies,
+ Bitstream bitstream) {
+ for (ResourcePolicy parentPolicy : parentPolicies) {
+ try {
+ if (ResourcePolicy.TYPE_CUSTOM.equals(parentPolicy.getRpType())
+ || ResourcePolicy.TYPE_INHERITED.equals(parentPolicy.getRpType())) {
+ ResourcePolicy newPolicy = resourcePolicyService.clone(
+ context, parentPolicy);
+ newPolicy.setdSpaceObject(bitstream);
+ newPolicy.setRpType(ResourcePolicy.TYPE_INHERITED);
+ resourcePolicyService.update(context, newPolicy);
+ }
+ } catch (SQLException | AuthorizeException e) {
+ System.err.format("Failed to copy policy %d to Bitstream %s: %s%n",
+ parentPolicy.getID(), bitstream.getID().toString(),
+ e.getMessage());
+ }
+ }
+ try {
+ context.commit();
+ } catch (SQLException e) {
+ System.err.format("Failed to commit updates to Bitstream %s: %s%n",
+ bitstream.getID().toString(), e.getMessage());
+ }
+ }
+ }
+}
diff --git a/dspace/modules/additions/src/main/java/org/dspace/core/ContextHelper.java b/dspace/modules/additions/src/main/java/org/dspace/core/ContextHelper.java
new file mode 100644
index 0000000000..fde3df16a0
--- /dev/null
+++ b/dspace/modules/additions/src/main/java/org/dspace/core/ContextHelper.java
@@ -0,0 +1,38 @@
+/*
+ * Copyright 2023 Indiana University.
+ */
+package org.dspace.core;
+
+import java.sql.SQLException;
+import javax.persistence.FlushModeType;
+
+import org.hibernate.Session;
+
+/**
+ * Cheat methods to expose package-private DSpace {@link Context} internals
+ * without hacking Context.
+ *
+ * @author mwood
+ */
+public class ContextHelper {
+ /**
+ * Get an additional JPA {@link Session} from a DSpace {@link Context},
+ * set default-read-only.
+ * Be sure to {@code close()} the Session when finished -- Context does not
+ * track it.
+ *
+ * @param context the DSpace session to provide database context.
+ * @return a new read-only JPA {@link Session}.
+ * @throws SQLException passed through.
+ */
+ static public Session getReadOnlySession(Context context)
+ throws SQLException {
+ DBConnection<Session> connection = context.getDBConnection();
+ Session newSession = connection.getSession()
+ .getSessionFactory()
+ .openSession();
+ newSession.setDefaultReadOnly(true);
+ newSession.setFlushMode(FlushModeType.COMMIT);
+ return newSession;
+ }
+}

In hope of seeing a proper fix for the transaction lifetime conflict,
I've filed an Issue for consideration relative to DSpace 8:

https://github.com/DSpace/DSpace/issues/8731
signature.asc
Reply all
Reply to author
Forward
0 new messages