Processing fails for scheduled events (unable to load metadata from catalog)

88 views
Skip to first unread message

Katrin S

unread,
Sep 5, 2023, 12:17:40 PM9/5/23
to Opencast Users
Hi all,

I am in the process of debugging a strange race condition with the processing of scheduled events from capture agents.

In most cases, workflows will fail in the "inspect" step because the file "dublincore.xml" cannot be found, for example:

```
17767 2023-09-05T17:17:24,137 | ERROR | (DublinCoreUtil:79) - Unable to load metadata from catalog 'https://XXX/files/mediapackage/c9348f0b-ffa0-46e2-9dec-2453827e8907/e8e89c83-534c-4d8f-90ea-6d605d1f9f54/dublincore.xml'
```
The reason seems to be that Opencast is merging the catalog/mediapackage from the capture agent with the one from the asset manager (legacy mediapackage). However, dublincore.xml is not made available at the file repo location of the (legacy) mediapackage from Opencast.

When I check the file system, I can see dublincore.xml at 

files/mediapackage/{ID FROM CA}/e8e89c83-534c-4d8f-90ea-6d605d1f9f54/dublincore.xml'

but *not* 

files/mediapackage/{LEGACY ID FROM OC}/e8e89c83-534c-4d8f-90ea-6d605d1f9f54/dublincore.xml'

More detailed log from admin node:
```
2023-09-05T17:17:24,088 | WARN  | (WorkspaceImpl:418) - The working file repository URI and paths don't match. Looking up https://XXX/files/mediapackage/c9348f0b-ffa0-46e2-9dec-2453827e8907/e8e89c83-534c-4d8f-90ea-6d605d1f9f54/dublincore.xml at /srv/opencast-data/opencast/files/mediapackage/c9348f0b-ffa0-46e2-9dec-2453827e8907/e8e89c83-534c-4d8f-90ea-6d605d1f9f54/dublincore.xml failed
17766 2023-09-05T17:17:24,091 | WARN  | (WorkspaceImpl:386) - The working file repository and workspace paths don't match. Looking up https://XXX/files/mediapackage/c9348f0b-ffa0-46e2-9dec-2453827e8907/e8e89c83-534c-4d8f-90ea-6d605d1f9f54/dublincore.xml at /srv/opencast-data/opencast/files/mediapackage/c9348f0b-ffa0-46e2-9dec-2453827e8907/e8e89c83-534c-4d8f-90ea-6d605d1f9f54/dublincore.xml failed
17767 2023-09-05T17:17:24,137 | ERROR | (DublinCoreUtil:79) - Unable to load metadata from catalog 'https://XXX/files/mediapackage/c9348f0b-ffa0-46e2-9dec-2453827e8907/e8e89c83-534c-4d8f-90ea-6d605d1f9f54/dublincore.xml'
```

Log from remote worker:
```
2023-09-05T17:17:21,237 | INFO  | (IngestDownloadServiceImpl:291) - Downloaded the external element https://XXX/assets/assets/c9348f0b-ffa0-46e2-9dec-2453827e8907/8354f909-ce2f-4603-aa20-e12c62c123e7/0/dublincore.xml
2023-09-05T17:17:21,256 | INFO  | (IngestDownloadServiceImpl:291) - Downloaded the external element https://XXX/assets/assets/c9348f0b-ffa0-46e2-9dec-2453827e8907/e8e89c83-534c-4d8f-90ea-6d605d1f9f54/0/dublincore.xml
2023-09-05T17:17:21,275 | INFO  | (IngestDownloadServiceImpl:291) - Downloaded the external element https://XXX/assets/assets/c9348f0b-ffa0-46e2-9dec-2453827e8907/security-policy-series/0/xacml.xml
2023-09-05T17:17:21,309 | INFO  | (IngestDownloadServiceImpl:291) - Downloaded the external element https://XXX/assets/assets/c9348f0b-ffa0-46e2-9dec-2453827e8907/security-policy-episode/0/xacml.xml
```

I also checked the file repo location, on all nodes, I have:
prop.org.opencastproject.admin.ui.url=https://OURADMIN
prop.org.opencastproject.file.repo.url=${prop.org.opencastproject.admin.ui.url}


Does someone have any insight on how this merging process works and how the files are made available? 
I suspect some kind of timing issue, because it sometimes works , but I don't fully understand the underlying process.

Any help/hints are much appreciated.

Katrin

Seeland, Per Pascal

unread,
Sep 6, 2023, 1:09:56 AM9/6/23
to us...@opencast.org

Hi Katrin,

 

I don’t remember having issues with the dublincore.xml itsself, but we habe also some issues some time ago with files not being present. I did not have time to dig into the code but we do since then mount all nfsshares on the workers with the sync option, which did solve out issues.

 

Kind regards,

Pascal

--
To unsubscribe from this group and stop receiving emails from it, send an email to users+un...@opencast.org.

Katrin Ihler

unread,
Sep 15, 2023, 4:08:56 AM9/15/23
to us...@opencast.org

Hi Katrin, Pascal,

some of our clients have also been experiencing this issue. Out of curiosity - which operating system are you running on your servers? Also which Opencast version are you using and can you give me some information on your NFS? Trying to find a common denominator here.

Cheers,

Katrin

-- 
ELAN e.V.
Heger-Tor-Wall 12
D-49074 Osnabrück
https://elan-ev.de

Katrin S

unread,
Oct 5, 2023, 5:25:51 AM10/5/23
to Opencast Users, Katrin Ihler
Hi Katrin,

please excuse the delayed answer, I was out of office the past two weeks.

Currently we use:
- Debian 11/Bullseye
- Opencast 14.2 (deb packages)
- Separate nodes for admin, presentation, worker
- NFS Server: NetApp, mounted with options rw,hard,intr,rsize=32768,wsize=32768

The issue happens with any scheduled event (tested with pyCA and Extron SMP).

Perhaps NetApp's request throttling mechanism has something to do with it. Although it's still weird, that it happens so consistently at this specific step and we don't notice any issues with e.g. uploading.

Setting the kernel parameters:
sunrpc.tcp_max_slot_table_entries = 65535
sunrpc.tcp_slot_table_entries = 2

might be worth a try (will try that myself, too)

- Katrin


Katrin S

unread,
Oct 12, 2023, 4:01:29 AM10/12/23
to Opencast Users, Katrin S, Katrin Ihler
Another observation we now made is that longer events with larger video files do not seem to be affected at all, so it is probably some kind of timing issue. 
Luckily, this also means it's not a huge problem in praxis. Of course it would still be good to find a solution since we can never be 100% sure.

Huber, Daniel

unread,
Apr 8, 2024, 12:45:54 PMApr 8
to Opencast Users, Katrin S, Katrin Ihler
Hi,

has anyone found the cause and/or a solution to this issue yet?

We experienced the same error for short (~ 10min) test recordings first, but now longer recordings (~1h) are occasionally affected as well (Opencast 14.8.0 on Debian 11, multi-server installation with shared NFS storage).

I am tempted to give Pascal's sync mount option a try, but would be interested if somebody maybe has found out more about this issue by now...

Thanks and best regards,

Daniel

________________________________________
Von: us...@opencast.org <us...@opencast.org> im Auftrag von Katrin S <hfti...@gmail.com>
Gesendet: Donnerstag, 12. Oktober 2023 10:01:28
An: Opencast Users
Cc: Katrin S; Katrin Ihler
Betreff: Re: AW: [OC Users] Processing fails for scheduled events (unable to load metadata from catalog)
To unsubscribe from this group and stop receiving emails from it, send an email to users+un...@opencast.org<mailto:users+un...@opencast.org>.

Katrin Ihler

unread,
Apr 9, 2024, 3:02:12 AMApr 9
to Huber, Daniel, Opencast Users, Katrin S, Sascha Nösberger
Hey Daniel,

I haven't spent any more time on this issue. The last time I looked into
this, it appeared to be a problem with the NFS syncing between nodes,
where the file was available for one node, but not the other. This seems
to be a timing issue, which causes shorter video files, often used when
testing, to be more problematic, because the NFS has less time to sync
before the next operation on a different node tries to read the file.
This also means that only certain operations in certain orders are a
problem, namely those which run on different nodes and perform read and
write for the same file shortly after each other. This would explain why
the same operations fail consistently while others are unaffected.

We have also been looking into different options for the NFS, but I'm
fuzzy on the details right now. I've been wondering whether an update to
the OS or NFS caused this, because while this can always happen when
using async, this didn't used to be such a prevalent problem.

Gonna cc Sascha on this, maybe when he is back from vacation he can say
whether this problem is still showing up for them since we adjusted some
settings.

Cheers,

Katrin
--
ELAN e.V.
Karlstr. 23
D-26123 Oldeburg

elan-ev.de

Huber, Daniel

unread,
Apr 15, 2024, 9:21:35 AMApr 15
to us...@opencast.org
Hi Katrin,

thanks for your in-depth explanation. This sounds reasonable and would explain our observations.

I finally had some time to play around with mount options in out test environment. After adding the "sync" and "hard" options, the problem seems to disappear.
At least as long as there is some time between scheduling an event and the recording (processing fails for events scheduled less than ~10 mins ahead).
So it looks like something is still off, but the effects are acceptable. Hopefully I can say the same after testing this in our production environment in the next days.

Thanks so far,

Daniel
________________________________________
Von: us...@opencast.org <us...@opencast.org> im Auftrag von Katrin Ihler <ih...@elan-ev.de>
Gesendet: Dienstag, 9. April 2024 08:59:58
An: Huber, Daniel; Opencast Users
Cc: Katrin S; Sascha Nösberger
Betreff: Re: AW: AW: [OC Users] Processing fails for scheduled events (unable to load metadata from catalog)
Reply all
Reply to author
Forward
0 new messages