Comparison method violates its general contract

Philipp Offensand

Jun 16, 2021, 6:00:49 AM
to Opencast Users
Hi All,

We are sometimes getting a seemingly random error like the one below, resulting in, e.g., broken ingests and generally failing workflow instances. Searching for the error message, I found that it is related to Java, with workarounds that only sometimes work. One possible workaround I found is to set -Djava.util.Arrays.useLegacyMergeSort=true as a start parameter for Java. Has anyone run into the same problem and found a solution? We are using Opencast 9.5 and Java openjdk version "1.8.0_292".

org.opencastproject.workflow.api.WorkflowOperationException: java.lang.IllegalArgumentException: Comparison method violates its general contract!
    at org.opencastproject.workflow.impl.WorkflowOperationWorker.start(WorkflowOperationWorker.java:206)
    at org.opencastproject.workflow.impl.WorkflowOperationWorker.execute(WorkflowOperationWorker.java:116)
    at org.opencastproject.workflow.impl.WorkflowServiceImpl.runWorkflowOperation(WorkflowServiceImpl.java:801)
    at org.opencastproject.workflow.impl.WorkflowServiceImpl.process(WorkflowServiceImpl.java:1833)
    at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:2283)
    at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:2249)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Comparison method violates its general contract!
    at java.util.TimSort.mergeLo(TimSort.java:777)
    at java.util.TimSort.mergeAt(TimSort.java:514)
    at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
    at java.util.TimSort.sort(TimSort.java:254)
    at java.util.Arrays.sort(Arrays.java:1512)
    at java.util.ArrayList.sort(ArrayList.java:1464)
    at java.util.Collections.sort(Collections.java:177)
    at org.opencastproject.serviceregistry.impl.ServiceRegistryJpaImpl.getServiceRegistrationsByLoad(ServiceRegistryJpaImpl.java:2805)
    at org.opencastproject.serviceregistry.impl.ServiceRegistryJpaImpl.getServiceRegistrationsByLoad(ServiceRegistryJpaImpl.java:2198)
    at org.opencastproject.serviceregistry.api.RemoteBase.getResponse(RemoteBase.java:176)
    at org.opencastproject.serviceregistry.api.RemoteBase.getResponse(RemoteBase.java:145)
    at org.opencastproject.ingestdownloadservice.remote.IngestDownloadServiceRemoteImpl.ingestDownload(IngestDownloadServiceRemoteImpl.java:92)
    at org.opencastproject.workflow.handler.ingest.IngestDownloadWorkflowOperationHandler.start(IngestDownloadWorkflowOperationHandler.java:86)
    at org.opencastproject.workflow.impl.WorkflowOperationWorker.start(WorkflowOperationWorker.java:193)
    ... 9 more

Kind Regards,
Philipp Offensand

Karen Dolan

Jun 16, 2021, 10:27:39 AM
to us...@opencast.org
Hi Philipp,

I don’t know if this is the issue, but there is a potential sort problem in the comparator associated with the error in the stack trace you posted.

This is the compare method used by the service registry JPA implementation at line 2805 of ServiceRegistryJpaImpl.java:

    @Override
    public int compare(ServiceRegistration serviceA, ServiceRegistration serviceB) {
      String hostA = serviceA.getHost();
      String hostB = serviceB.getHost();
      NodeLoad nodeA = loadByHost.get(hostA);
      NodeLoad nodeB = loadByHost.get(hostB);
      //If the load factors are about the same, sort based on maximum load
      if (Math.abs(nodeA.getLoadFactor() - nodeB.getLoadFactor()) <= 0.01) {
        //NOTE: The sort order below is *reversed* from what you'd expect
        //When we're comparing the load factors we want the node with the lowest factor to be first
        //When we're comparing the maximum load value, we want the node with the highest max to be first
        return Float.compare(nodeB.getMaxLoad(), nodeA.getMaxLoad());
      }
      return Float.compare(nodeA.getLoadFactor(), nodeB.getLoadFactor());

    }

Here is an example of how it would return a different sort order depending on the input order of the objects in the collection (x is the load factor, y is the maximum load):
===================
Input order:

A(x:0.01, y:10)
B(x:0.02, y:5)  
C(x:0.03, y:0)

Sort logic:

A > B because |A.x - B.x| <= 0.01 and A.y > B.y
C > A because |C.x  - A.x| > 0.01

Resulting sort order:

B(x:0.02, y:5)
A(x:0.01, y:10)
C(x:0.03, y:0)

===============================
Input order:

C(x:0.03, y:0)
B(x:0.02, y:5)
A(x:0.01, y:10)

Sort logic:

B > C because |C.x - B.x| <= 0.01 and B.y > C.y
A > B because |A.x - B.x| <= 0.01 and A.y > B.y

Resulting sort order:

C(x:0.03, y:0)
B(x:0.02, y:5)
A(x:0.01, y:10)
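
The underlying transitivity problem can also be checked in isolation. What follows is only a minimal sketch: the Node class is a hypothetical stand-in for ServiceRegistration plus NodeLoad, and the load values are made up and chosen to stay clear of the 0.01 boundary so float rounding does not matter. It applies the same compare logic as above and looks for a triple where A sorts after B, B sorts after C, and yet A sorts before C.

```java
import java.util.Comparator;

public class ComparatorContractDemo {

  // Hypothetical stand-in for a service registration with its node load.
  static final class Node {
    final String host;
    final float loadFactor;
    final float maxLoad;

    Node(String host, float loadFactor, float maxLoad) {
      this.host = host;
      this.loadFactor = loadFactor;
      this.maxLoad = maxLoad;
    }
  }

  // Same logic as the comparator quoted above, applied to Node directly.
  static final Comparator<Node> BY_LOAD = (a, b) -> {
    if (Math.abs(a.loadFactor - b.loadFactor) <= 0.01) {
      // "About the same" load factor: highest maximum load first.
      return Float.compare(b.maxLoad, a.maxLoad);
    }
    // Otherwise: lowest load factor first.
    return Float.compare(a.loadFactor, b.loadFactor);
  };

  // True if the three made-up nodes form a cycle, i.e. break transitivity.
  static boolean violatesTransitivity() {
    Node a = new Node("A", 0.100f, 1f);
    Node b = new Node("B", 0.108f, 2f);
    Node c = new Node("C", 0.116f, 3f);

    boolean aAfterB = BY_LOAD.compare(a, b) > 0;  // close loads, B has the higher max
    boolean bAfterC = BY_LOAD.compare(b, c) > 0;  // close loads, C has the higher max
    boolean aBeforeC = BY_LOAD.compare(a, c) < 0; // loads differ by more than 0.01

    // Transitivity would require A after C whenever A after B and B after C.
    return aAfterB && bAfterC && aBeforeC;
  }

  public static void main(String[] args) {
    System.out.println("transitivity violated: " + violatesTransitivity());
  }
}
```

With a comparator like this, TimSort may or may not notice the inconsistency on a given input, which would fit the seemingly random nature of the error.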

- Karen


Greg Logan

Jun 16, 2021, 12:08:32 PM
to Opencast Users
Hi all,

Karen is likely correct considering this *does* violate the transitive nature required by compare(). I'm surprised that you're the first we've seen complaining about this.  Has anyone *else* run into this?

The fix is trivial relative to the complexity inside the service registry (SR): we need a two-pass compare. Philipp, can you file this and assign it to me?
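
For illustration only, here is one way a contract-respecting ordering could look. This is not necessarily the two-pass compare meant above, nor the change that ended up in Opencast; it swaps in a different technique. Instead of testing whether two load factors are within 0.01 of each other, each load factor is first mapped to a fixed 0.01-wide band, and nodes are then ordered by band and, within a band, by highest maximum load. The Node class, BAND_WIDTH, and the sample values are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class BucketedLoadSort {

  // Hypothetical stand-in for a service registration with its node load.
  static final class Node {
    final String host;
    final float loadFactor;
    final float maxLoad;

    Node(String host, float loadFactor, float maxLoad) {
      this.host = host;
      this.loadFactor = loadFactor;
      this.maxLoad = maxLoad;
    }
  }

  // Assumed band width, mirroring the 0.01 threshold of the original compare.
  static final double BAND_WIDTH = 0.01;

  // Pass 1: map each node to a fixed band, independent of any other node.
  static int band(Node n) {
    return (int) Math.floor(n.loadFactor / BAND_WIDTH);
  }

  // Pass 2: order by band (lowest first), then by maximum load (highest first).
  static final Comparator<Node> BY_BAND_THEN_MAX_LOAD =
      Comparator.<Node>comparingInt(BucketedLoadSort::band)
          .thenComparing((a, b) -> Float.compare(b.maxLoad, a.maxLoad));

  // Sorts a copy of the input and returns the host names in sorted order.
  static List<String> sortedHosts(List<Node> nodes) {
    List<Node> copy = new ArrayList<>(nodes);
    copy.sort(BY_BAND_THEN_MAX_LOAD);
    List<String> hosts = new ArrayList<>();
    for (Node n : copy) {
      hosts.add(n.host);
    }
    return hosts;
  }

  public static void main(String[] args) {
    Node a = new Node("A", 0.103f, 1f);
    Node b = new Node("B", 0.108f, 2f);
    Node c = new Node("C", 0.116f, 3f);
    // Both input orders yield the same result.
    System.out.println(sortedHosts(Arrays.asList(a, b, c)));
    System.out.println(sortedHosts(Arrays.asList(c, b, a)));
  }
}
```

Because every node gets its band on its own, "same band" is a proper equivalence and the comparator is transitive. The trade-off is that two nodes just either side of a band edge (say 0.109 and 0.111) are no longer treated as "about the same".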

Thanks,
G

Philipp Offensand

Jun 17, 2021, 11:03:06 AM
to Opencast Users, Greg Logan
Hi,
I don't seem to be able to assign anyone, so I'll just link the issue here as well for
better visibility.
Maybe we were the only ones so far to experience this directly, as we had a
rather... busy week for Opencast. Our old Opencast setup started to die due to
varying circumstances, so we set up a new one and migrated all 30k+ events
to it from the old system within about a week (about 400 concurrent workflows
most of the time). I'd imagine this kind of load exposes issues that normally
wouldn't appear as often.

Kind Regards,
Philipp Offensand