Opencast 11.8 - Services do not load and database takes 100% CPU


Andreas Lindner

Oct 6, 2022, 5:37:21 AM
to Opencast Users

Hi everyone,

 

I'm having a problem and need some help.

After changing data on a lot of videos, our Opencast services no longer load.

When I try to load them, I get a wheel of death, and shortly after, the admin UI returns a 502 (proxy error).

 

Here is what I did:

I had to work with about 7000 videos that needed their ACLs changed.

As this would have taken far too long in the admin UI, I wrote a script to do the job for me.

The script first called the “GET admin/events” endpoint to fetch all video data stored on the server (14500 videos, 1000 at a time).

It then filtered that data down to the 7000 videos I needed to change.

With this list, I called /api/events/<video-ID>/acl to send the new ACL for each mediapackage.

Up to this point, everything worked fine. Every ACL was updated, and after a “republish metadata” workflow that additionally added a series to the videos, everything worked as intended (tested on one video).
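For illustration, the flow described above might look roughly like this — a minimal sketch assuming the External API's paginated GET /api/events (limit/offset parameters) together with the PUT /api/events/<video-ID>/acl endpoint mentioned above; the host, credentials, and helper names are placeholders, not the actual script:

```python
import json
import urllib.parse
import urllib.request

BASE = "https://cast-admin.example.org"  # placeholder admin node

def fetch_page(offset, limit=1000, opener=urllib.request.urlopen):
    """Fetch one page of events (endpoint and parameters assumed)."""
    url = f"{BASE}/api/events?" + urllib.parse.urlencode(
        {"limit": limit, "offset": offset})
    with opener(url) as resp:
        return json.load(resp)

def select_events(events, predicate):
    """Filter the fetched events down to the ones whose ACL must change."""
    return [e for e in events if predicate(e)]

def acl_request(event_id, acl):
    """Build the PUT request that replaces a single event's ACL."""
    body = urllib.parse.urlencode({"acl": json.dumps(acl)}).encode()
    return urllib.request.Request(
        f"{BASE}/api/events/{event_id}/acl", data=body, method="PUT")
```

Authentication is omitted here; a real script would also need digest or basic auth headers on each request.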

 

My next step was to run the “republish metadata” workflow on the 7000 videos, 100 at a time.
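Splitting 7000 event IDs into groups of 100 can be done with a simple batching helper; `start_workflow` in the usage sketch is a hypothetical wrapper around whatever endpoint starts the workflow (not a real Opencast call), and the pause between batches is just one crude way to avoid flooding the job queue:

```python
import time

def batches(items, size=100):
    """Yield successive fixed-size slices of a list of event IDs."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Usage sketch (start_workflow is hypothetical):
# for batch in batches(event_ids, 100):
#     for event_id in batch:
#         start_workflow("republish-metadata", event_id)
#     time.sleep(60)  # let the previous batch drain before the next one
```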

And here the problems started to emerge.

 

The videos went into “running” but did not get processed.

In addition, other newly uploaded videos did not get processed either.

A video that was just about to be published got this error:


2022-10-05T10:16:26,783 | ERROR | (WorkflowOperationWorker:143) - Workflow operation 'operation:'publish-engage', position:1, state:'FAILED'' failed

org.opencastproject.workflow.api.WorkflowOperationException: Mediapackage ef6c0b3a-0a34-43bb-a7f9-39b8a020af36 could not be published

        at org.opencastproject.workflow.handler.distribution.PublishEngageWorkflowOperationHandler.start(PublishEngageWorkflowOperationHandler.java:418) ~[?:?]

       

 

And a freshly uploaded video got this error:


2022-10-05T10:19:24,719 | ERROR | (WorkflowOperationWorker:143) - Workflow operation 'operation:'inspect', position:1, state:'FAILED'' failed

org.opencastproject.workflow.api.WorkflowOperationException: Track https://cast-admin.th-deg.de/files/mediapackage/25fed4dc-4d23-461c-a3cb-327856c86096/8e02054c-42d9-479c-96ee-9673655f4b72/c201_2022_10_05_10_06_42_mat.mp4 could not be inspected

       

A server restart did not solve the problem.

Videos could not be processed, and the “republish metadata” workflow did not start.

 

So after this I restarted the servers and all services, made sure Elasticsearch was working, and finally existing videos could be viewed again.

But new videos could still not be created (the workflow would not finish).

In the worker log I get warnings for all services:

(these are just two of them; there are more, of course)

2022-10-06T08:52:38,519 | WARN  | (ServiceRegistryJpaImpl$JobProducerHeartbeat:3364) - Marking org.opencastproject.ingest@https://cast-admin.th-deg.de as offline

2022-10-06T08:53:38,601 | WARN  | (ServiceRegistryJpaImpl$JobProducerHeartbeat:3348) - Service org.opencastproject.workflow@https://cast-admin.th-deg.de is not working as expected: HTTP/1.1 502 Proxy Error

 

But after restarting everything, I could run “republish metadata” on all 7000 videos, and they are done now.

Still, no services can be found:

2022-10-06T11:17:21,852 | WARN  | (RemoteBase:179) - No services of type 'org.opencastproject.inspection' found, waiting...

 

 

Additional information:
We run MariaDB, and yesterday our DB grew from about 51 GB to about 54 GB.

Our table sizes are (everything else is < 100 KB):

-rw-rw---- 1 mysql mysql 7.0M Oct  6 11:17 oc_assets_version_claim.ibd

-rw-rw---- 1 mysql mysql 8.0M Oct  5 16:47 oc_bundleinfo.ibd

-rw-rw---- 1 mysql mysql  11M Oct  6 11:22 oc_user_ref.ibd

-rw-rw---- 1 mysql mysql  12M Oct  6 11:22 oc_user_ref_role.ibd

-rw-rw---- 1 mysql mysql  12M Oct  6 11:12 oc_role.ibd

-rw-rw---- 1 mysql mysql  80M Sep  7 14:25 oc_user_session.ibd

-rw-rw---- 1 mysql mysql 120M Oct  6 11:07 oc_assets_properties.ibd

-rw-rw---- 1 mysql mysql 184M Oct  6 11:17 oc_assets_asset.ibd

-rw-rw---- 1 mysql mysql 388M Oct  6 11:11 oc_search.ibd

-rw-rw---- 1 mysql mysql 568M Oct  5 10:51 oc_oaipmh.ibd

-rw-rw---- 1 mysql mysql 1.3G Oct  5 10:51 oc_oaipmh_elements.ibd

-rw-rw---- 1 mysql mysql 1.8G Oct  6 11:17 oc_assets_snapshot.ibd

-rw-rw---- 1 mysql mysql 5.0G Oct  6 11:20 oc_job.ibd

-rw-rw---- 1 mysql mysql  13G Sep  7 14:25 oc_user_action.ibd

 

Calling the Opencast services endpoint takes about 30 minutes and then returns a 502 (proxy error).

 

 

Does anyone have an idea how to solve this?

 

Best regards,

Andreas | TH Deggendorf

Greg Logan

Oct 6, 2022, 10:57:05 PM
to us...@opencast.org
Hi Andreas,

My immediate thoughts are that this is very likely database related.  As the folks on Matrix said, there are some queries which have been improved since 11.8 that might speed things up.  The real telling point will be what happens if you restart all your nodes and ensure that no one talks to the admin UI - that's where a lot of the really expensive queries originate.  If things then work as expected, it's likely related to one or more of the admin views - several of which have gotten better in 12.x.  If, however, things continue as they are right now, then it's related to the service registry itself - in which case *please* pull the expensive queries out of MariaDB so we can see what's causing the bottleneck.

The other thing that immediately jumps out at me is that the oc_user_action table is quite large.  While this is unrelated to the issues you're facing, you can safely delete all of it, or even just part of it, as long as you're not using it for viewing metrics.
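Trimming oc_user_action in fixed-size slices (rather than one huge DELETE that holds locks for a long time) is one way to do the cleanup described above; the `created` column name is an assumption based on a typical Opencast schema, so verify it against your own database first:

```python
def purge_statement(cutoff, batch=10000):
    """Build a MariaDB DELETE that trims old oc_user_action rows in one
    slice; rerun it until 0 rows are affected.  The 'created' column is
    assumed from a typical Opencast schema -- check yours before running."""
    return ("DELETE FROM oc_user_action "
            f"WHERE created < '{cutoff}' LIMIT {batch};")
```

Running the generated statement in a loop from your MariaDB client keeps each transaction small, which matters on a table that has grown to 13 GB.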

G


Andreas Lindner

Oct 7, 2022, 3:08:42 AM
to Opencast Users, Greg Logan
We just checked the database; these are the queries still running after the database gets stuck, about 10 minutes in:


----
MariaDB [(none)]> show full processlist;
+----+----------+-----------------------------+----------+---------+------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| Id | User     | Host                        | db       | Command | Time | State        | Info                                                                                                                                                                                                                                                                                                | Progress |
+----+----------+-----------------------------+----------+---------+------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
|  4 | opencast | localhost:57614             | opencast | Query   |  737 | Sending data | SELECT t0.id, t1.status, COUNT(t1.status), AVG(t1.queue_time), AVG(t1.run_time) FROM oc_service_registration t0, oc_job t1 WHERE (((t1.date_created >= '2022-09-23 08:24:10.962') AND (t1.date_created <= '2022-10-08 08:24:10.962')) AND (t0.id = t1.processor_service)) GROUP BY t0.id, t1.status |    0.000 |
|  3 | opencast | localhost:57612             | opencast | Query   |  743 | Sending data | SELECT t0.id, t1.status, COUNT(t1.status), AVG(t1.queue_time), AVG(t1.run_time) FROM oc_service_registration t0, oc_job t1 WHERE (((t1.date_created >= '2022-09-23 08:24:05.017') AND (t1.date_created <= '2022-10-08 08:24:05.017')) AND (t0.id = t1.processor_service)) GROUP BY t0.id, t1.status |    0.000 |
|  6 | opencast | localhost:57622             | opencast | Query   |  815 | Sending data | SELECT t0.id, t1.status, COUNT(t1.status), AVG(t1.queue_time), AVG(t1.run_time) FROM oc_service_registration t0, oc_job t1 WHERE (((t1.date_created >= '2022-09-23 08:22:53.566') AND (t1.date_created <= '2022-10-08 08:22:53.566')) AND (t0.id = t1.processor_service)) GROUP BY t0.id, t1.status |    0.000 |
|  8 | opencast | localhost:57626             | opencast | Query   |  840 | Sending data | SELECT t0.id, t1.status, COUNT(t1.status), AVG(t1.queue_time), AVG(t1.run_time) FROM oc_service_registration t0, oc_job t1 WHERE (((t1.date_created >= '2022-09-23 08:22:28.413') AND (t1.date_created <= '2022-10-08 08:22:28.413')) AND (t0.id = t1.processor_service)) GROUP BY t0.id, t1.status |    0.000 |
|  7 | opencast | localhost:57624             | opencast | Query   |  695 | Sending data | SELECT t0.id, t1.status, COUNT(t1.status), AVG(t1.queue_time), AVG(t1.run_time) FROM oc_service_registration t0, oc_job t1 WHERE (((t1.date_created >= '2022-09-23 08:24:53.785') AND (t1.date_created <= '2022-10-08 08:24:53.785')) AND (t0.id = t1.processor_service)) GROUP BY t0.id, t1.status |    0.000 |
| 10 | opencast | localhost:57630             | opencast | Query   |  720 | Sending data | SELECT t0.id, t1.status, COUNT(t1.status), AVG(t1.queue_time), AVG(t1.run_time) FROM oc_service_registration t0, oc_job t1 WHERE (((t1.date_created >= '2022-09-23 08:24:28.68') AND (t1.date_created <= '2022-10-08 08:24:28.68')) AND (t0.id = t1.processor_service)) GROUP BY t0.id, t1.status   |    0.000 |
|  9 | opencast | localhost:57628             | opencast | Query   |  798 | Sending data | SELECT t0.id, t1.status, COUNT(t1.status), AVG(t1.queue_time), AVG(t1.run_time) FROM oc_service_registration t0, oc_job t1 WHERE (((t1.date_created >= '2022-09-23 08:23:10.756') AND (t1.date_created <= '2022-10-08 08:23:10.756')) AND (t0.id = t1.processor_service)) GROUP BY t0.id, t1.status |    0.000 |
| 11 | opencast | localhost:57632             | opencast | Query   |  803 | Sending data | SELECT t0.id, t1.status, COUNT(t1.status), AVG(t1.queue_time), AVG(t1.run_time) FROM oc_service_registration t0, oc_job t1 WHERE (((t1.date_created >= '2022-09-23 08:23:04.908') AND (t1.date_created <= '2022-10-08 08:23:04.908')) AND (t0.id = t1.processor_service)) GROUP BY t0.id, t1.status |    0.000 |
| 18 | opencast | localhost:58472             | opencast | Query   |  731 | Sending data | SELECT t0.id, t1.status, COUNT(t1.status), AVG(t1.queue_time), AVG(t1.run_time) FROM oc_service_registration t0, oc_job t1 WHERE (((t1.date_created >= '2022-09-23 08:24:17.01') AND (t1.date_created <= '2022-10-08 08:24:17.01')) AND (t0.id = t1.processor_service)) GROUP BY t0.id, t1.status   |    0.000 |
| 20 | opencast | localhost:58474             | opencast | Query   |  755 | Sending data | SELECT t0.id, t1.status, COUNT(t1.status), AVG(t1.queue_time), AVG(t1.run_time) FROM oc_service_registration t0, oc_job t1 WHERE (((t1.date_created >= '2022-09-23 08:23:53.673') AND (t1.date_created <= '2022-10-08 08:23:53.673')) AND (t0.id = t1.processor_service)) GROUP BY t0.id, t1.status |    0.000 |
| 19 | opencast | localhost:58475             | opencast | Query   |  809 | Sending data | SELECT t0.id, t1.status, COUNT(t1.status), AVG(t1.queue_time), AVG(t1.run_time) FROM oc_service_registration t0, oc_job t1 WHERE (((t1.date_created >= '2022-09-23 08:22:58.904') AND (t1.date_created <= '2022-10-08 08:22:58.904')) AND (t0.id = t1.processor_service)) GROUP BY t0.id, t1.status |    0.000 |
| 21 | opencast | localhost:58514             | opencast | Query   |  791 | Sending data | SELECT t0.id, t1.status, COUNT(t1.status), AVG(t1.queue_time), AVG(t1.run_time) FROM oc_service_registration t0, oc_job t1 WHERE (((t1.date_created >= '2022-09-23 08:23:16.904') AND (t1.date_created <= '2022-10-08 08:23:16.904')) AND (t0.id = t1.processor_service)) GROUP BY t0.id, t1.status |    0.000 |
| 22 | opencast | localhost:58516             | opencast | Query   |  780 | Sending data | SELECT t0.id, t1.status, COUNT(t1.status), AVG(t1.queue_time), AVG(t1.run_time) FROM oc_service_registration t0, oc_job t1 WHERE (((t1.date_created >= '2022-09-23 08:23:28.58') AND (t1.date_created <= '2022-10-08 08:23:28.58')) AND (t0.id = t1.processor_service)) GROUP BY t0.id, t1.status   |    0.000 |
| 23 | opencast | localhost:58518             | opencast | Query   |  749 | Sending data | SELECT t0.id, t1.status, COUNT(t1.status), AVG(t1.queue_time), AVG(t1.run_time) FROM oc_service_registration t0, oc_job t1 WHERE (((t1.date_created >= '2022-09-23 08:23:59.022') AND (t1.date_created <= '2022-10-08 08:23:59.022')) AND (t0.id = t1.processor_service)) GROUP BY t0.id, t1.status |    0.000 |
| 51 | opencast | <presentation-adress>:51114  | opencast | Sleep   |    0 |              | NULL                                                                                                                                                                                                                                                                                                |    0.000 |
| 68 | opencast | <worker-adress>:43730  | opencast | Sleep   |    0 |              | NULL                                                                                                                                                                                                                                                                                                |    0.000 |
| 87 | opencast | <worker-adress>:54730 | opencast | Sleep   |  653 |              | NULL                                                                                                                                                                                                                                                                                                |    0.000 |
| 88 | opencast | <worker-adress>:54734 | opencast | Sleep   |  653 |              | NULL                                                                                                                                                                                                                                                                                                |    0.000 |
| 89 | opencast | <worker-adress>:54732 | opencast | Sleep   |  653 |              | NULL                                                                                                                                                                                                                                                                                                |    0.000 |
| 90 | root     | localhost                   | NULL     | Query   |    0 | starting     | show full processlist                                                                                                                                                                                                                                                                               |    0.000 |
+----+----------+-----------------------------+----------+---------+------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
20 rows in set (0.002 sec)

MariaDB [(none)]>


I hope this helps. We reset our system to a working snapshot, and right now everything works.

Take care,

Andreas | TH Deggendorf

Greg Logan

Oct 11, 2022, 10:59:10 PM
to Opencast Users
Those look like the queries generated by one of the jobs or services views in the admin UI, which should, at least in theory, resolve once the UIs are all closed.  That's not the solution, but if this happens again, and it *continues* to happen once all the UIs are closed, then we know there's a deeper bug - otherwise this could be a combination of starting a ton of workflows and a poorly optimized view query.

Glad it's working now at least!

G