OC 3.x Text extraction randomly fails - No such file or directory


Katrin Scharnowski

Sep 26, 2017, 8:12:47 AM
to Opencast Users

Hello,

we are running OC 3.3 (though the described problem has been there throughout 3.x, maybe even before that) on a cluster of 6 servers (1 admin, 1 presentation, 4 workers). We have a problem with text extraction sometimes failing because necessary files are not found.
The error message on the worker (http://10.80.10.215:8080) performing the text extraction job is:

2017-09-26 13:36:35,463 | WARN  | (TextAnalyzerServiceImpl:226) - Error extracting text from http://10.80.10.214:8080/files/collection/composer/73835_0.png
java.lang.RuntimeException: Error aquiring lock for /srv/opencast/workspace/http_10.80.10.214_8080/files/collection/composer/73835_0.png
    at org.opencastproject.util.IoSupport.acquireLock(IoSupport.java:668)[39:matterhorn-common:3.3.0]
    at org.opencastproject.util.IoSupport.locked(IoSupport.java:644)[39:matterhorn-common:3.3.0]
    at org.opencastproject.workspace.impl.WorkspaceImpl.get(WorkspaceImpl.java:297)[100:matterhorn-workspace-impl:3.3.0]
    at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.extract(TextAnalyzerServiceImpl.java:179)[79:matterhorn-textanalyzer-impl:3.3.0]
    at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.process(TextAnalyzerServiceImpl.java:256)[79:matterhorn-textanalyzer-impl:3.3.0]
    at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:281)[39:matterhorn-common:3.3.0]
    at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:240)[39:matterhorn-common:3.3.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)[:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)[:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)[:1.8.0_144]
    at java.lang.Thread.run(Thread.java:748)[:1.8.0_144]
Caused by: java.io.FileNotFoundException: /srv/opencast/workspace/http_10.80.10.214_8080/files/collection/composer/73835_0.png (No such file or directory)
    at java.io.RandomAccessFile.open0(Native Method)[:1.8.0_144]
    at java.io.RandomAccessFile.open(RandomAccessFile.java:316)[:1.8.0_144]
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)[:1.8.0_144]
    at org.opencastproject.util.IoSupport.acquireLock(IoSupport.java:655)[39:matterhorn-common:3.3.0]
    ... 10 more
2017-09-26 13:36:35,564 | INFO  | (ServiceRegistryJpaImpl:2399) - State set to WARNING for current service org.opencastproject.textanalyzer on host http://10.80.10.215:8080
2017-09-26 13:36:35,715 | ERROR | (AbstractJobProducer$JobRunner:311) - Error handling operation 'Extract': org.opencastproject.serviceregistry.api.ServiceRegistryException: Error handling operation 'Extract'
    at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.process(TextAnalyzerServiceImpl.java:266)
    at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:281)
    at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:240)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.opencastproject.textanalyzer.api.TextAnalyzerException: java.lang.RuntimeException: Error aquiring lock for /srv/opencast/workspace/http_10.80.10.214_8080/files/collection/composer/73835_0.png
    at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.extract(TextAnalyzerServiceImpl.java:230)
    at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.process(TextAnalyzerServiceImpl.java:256)
    ... 6 more
Caused by: java.lang.RuntimeException: Error aquiring lock for /srv/opencast/workspace/http_10.80.10.214_8080/files/collection/composer/73835_0.png
    at org.opencastproject.util.IoSupport.acquireLock(IoSupport.java:668)
    at org.opencastproject.util.IoSupport.locked(IoSupport.java:644)
    at org.opencastproject.workspace.impl.WorkspaceImpl.get(WorkspaceImpl.java:297)
    at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.extract(TextAnalyzerServiceImpl.java:179)
    ... 7 more
Caused by: java.io.FileNotFoundException: /srv/opencast/workspace/http_10.80.10.214_8080/files/collection/composer/73835_0.png (No such file or directory)
    at java.io.RandomAccessFile.open0(Native Method)
    at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
    at org.opencastproject.util.IoSupport.acquireLock(IoSupport.java:655)
    ... 10 more

As a result, the text-extraction WOH fails. Restarting the workflow (sometimes multiple times) eventually solves the problem, so it is not tied to specific source material or a specific worker. Further investigation of the DEBUG log on the admin node shows that the respective image extraction job is started:
2017-09-26 13:34:16,983 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73835 started on a remote composer

and finished properly. And on the worker performing the image extraction (http://10.80.10.214:8080):
2017-09-26 13:35:23,924 | DEBUG | (ComposerServiceImpl:1261) - Copied image file to the workspace at http://10.80.10.214:8080/files/collection/composer/73835_0.png

We have been investigating this weird race condition for several days now. We can also say that:
- the server clocks are synchronized (clock skew is < 1 ms)
- NFS hard linking works (according to the log):
2017-09-26 10:43:06,316 | INFO  | (WorkspaceImpl:223) - Hard links between the working file repository and the workspace enabled

Further observations:
- the error appears independent of which worker does the text extraction and the image extraction job
- from each failing text analysis, only 1-2 images are not found; the rest are found and read properly
- the .../files/collection/composer folders on the different workers seem to get re-created on several occasions. Is that normal? I suspect that, for some reason, the folder is deleted and re-created prematurely, before all the images are read.
- the workflow also fails if it is the only workflow being run (so it's not a problem with concurrent workflows)

I don't know if that is relevant, but we are using a NetApp for our NFS share, and it is mounted using NFSv3 with the following options:
nfsvers=3,rw,hard,intr,rsize=32768,wsize=32768
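For reference, the equivalent mount command would look roughly like this (server, export, and mount point are placeholders, not our actual values):

mount -t nfs -o nfsvers=3,rw,hard,intr,rsize=32768,wsize=32768 \
    netapp.example.org:/vol_opencast /srv/opencast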

We are really out of ideas at this point. Any help is much appreciated!

Best regards,
Katrin





Katrin Scharnowski

Sep 26, 2017, 8:29:34 AM
to Opencast Users
The incident can also be found in the admin debug logs:

2017-09-26 13:34:15,676 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73744 started on a remote composer
2017-09-26 13:34:15,704 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73745 started on a remote composer
2017-09-26 13:34:15,732 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73746 started on a remote composer
2017-09-26 13:34:15,762 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73747 started on a remote composer
2017-09-26 13:34:15,793 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73748 started on a remote composer
2017-09-26 13:34:15,820 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73749 started on a remote composer
2017-09-26 13:34:15,852 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73750 started on a remote composer
2017-09-26 13:34:15,888 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73801 started on a remote composer
2017-09-26 13:34:15,916 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73802 started on a remote composer
2017-09-26 13:34:15,947 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73803 started on a remote composer
2017-09-26 13:34:15,981 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73804 started on a remote composer
2017-09-26 13:34:16,008 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73805 started on a remote composer
2017-09-26 13:34:16,040 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73806 started on a remote composer
2017-09-26 13:34:16,069 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73807 started on a remote composer
2017-09-26 13:34:16,098 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73808 started on a remote composer
2017-09-26 13:34:16,130 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73809 started on a remote composer
2017-09-26 13:34:16,160 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73810 started on a remote composer
2017-09-26 13:34:16,190 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73811 started on a remote composer
2017-09-26 13:34:16,243 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73812 started on a remote composer
2017-09-26 13:34:16,282 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73813 started on a remote composer
2017-09-26 13:34:16,315 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73814 started on a remote composer
2017-09-26 13:34:16,352 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73815 started on a remote composer
2017-09-26 13:34:16,391 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73816 started on a remote composer
2017-09-26 13:34:16,432 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73817 started on a remote composer
2017-09-26 13:34:16,466 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73818 started on a remote composer
2017-09-26 13:34:16,497 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73819 started on a remote composer
2017-09-26 13:34:16,532 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73820 started on a remote composer
2017-09-26 13:34:16,562 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73821 started on a remote composer
2017-09-26 13:34:16,594 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73822 started on a remote composer
2017-09-26 13:34:16,626 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73823 started on a remote composer
2017-09-26 13:34:16,656 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73824 started on a remote composer
2017-09-26 13:34:16,687 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73825 started on a remote composer
2017-09-26 13:34:16,714 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73826 started on a remote composer
2017-09-26 13:34:16,742 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73827 started on a remote composer
2017-09-26 13:34:16,769 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73828 started on a remote composer
2017-09-26 13:34:16,800 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73829 started on a remote composer
2017-09-26 13:34:16,830 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73830 started on a remote composer
2017-09-26 13:34:16,857 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73831 started on a remote composer
2017-09-26 13:34:16,887 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73832 started on a remote composer
2017-09-26 13:34:16,917 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73833 started on a remote composer
2017-09-26 13:34:16,951 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73834 started on a remote composer

2017-09-26 13:34:16,983 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73835 started on a remote composer
2017-09-26 13:34:17,017 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73836 started on a remote composer
2017-09-26 13:34:17,042 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73837 started on a remote composer
2017-09-26 13:34:17,073 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73838 started on a remote composer
2017-09-26 13:34:17,102 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73839 started on a remote composer
2017-09-26 13:34:17,134 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73840 started on a remote composer
2017-09-26 13:34:17,160 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73841 started on a remote composer
2017-09-26 13:34:17,189 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73842 started on a remote composer
2017-09-26 13:34:17,242 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73843 started on a remote composer
2017-09-26 13:34:17,274 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73844 started on a remote composer
2017-09-26 13:34:17,325 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73845 started on a remote composer
2017-09-26 13:34:17,354 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73846 started on a remote composer
2017-09-26 13:34:17,384 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73847 started on a remote composer
2017-09-26 13:34:17,412 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73848 started on a remote composer
2017-09-26 13:34:17,439 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73849 started on a remote composer
2017-09-26 13:34:17,465 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73850 started on a remote composer
2017-09-26 13:34:17,498 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73851 started on a remote composer
2017-09-26 13:34:17,552 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 72499 started on a remote composer
2017-09-26 13:34:17,586 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 72500 started on a remote composer
2017-09-26 13:34:17,620 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73901 started on a remote composer
2017-09-26 13:34:17,654 | INFO  | (ComposerServiceRemoteImpl:253) - Image extraction job 73902 started on a remote composer
2017-09-26 13:36:12,696 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73744_0.png on a remote analysis server
2017-09-26 13:36:12,723 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/73745_0.png on a remote analysis server
2017-09-26 13:36:12,749 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.213:8080/files/collection/composer/73746_0.png on a remote analysis server
2017-09-26 13:36:12,775 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.216:8080/files/collection/composer/73747_0.png on a remote analysis server
2017-09-26 13:36:12,802 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73748_0.png on a remote analysis server
2017-09-26 13:36:12,830 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/73749_0.png on a remote analysis server
2017-09-26 13:36:12,857 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.213:8080/files/collection/composer/73750_0.png on a remote analysis server
2017-09-26 13:36:12,886 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.216:8080/files/collection/composer/73801_0.png on a remote analysis server
2017-09-26 13:36:12,913 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73802_0.png on a remote analysis server
2017-09-26 13:36:12,942 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/73803_0.png on a remote analysis server
2017-09-26 13:36:12,971 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.213:8080/files/collection/composer/73804_0.png on a remote analysis server
2017-09-26 13:36:12,999 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.216:8080/files/collection/composer/73805_0.png on a remote analysis server
2017-09-26 13:36:13,030 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73806_0.png on a remote analysis server
2017-09-26 13:36:13,056 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/73807_0.png on a remote analysis server
2017-09-26 13:36:13,091 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.213:8080/files/collection/composer/73808_0.png on a remote analysis server
2017-09-26 13:36:13,116 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73809_0.png on a remote analysis server
2017-09-26 13:36:13,144 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.216:8080/files/collection/composer/73810_0.png on a remote analysis server
2017-09-26 13:36:13,171 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.216:8080/files/collection/composer/73811_0.png on a remote analysis server
2017-09-26 13:36:13,198 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73812_0.png on a remote analysis server
2017-09-26 13:36:13,225 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/73813_0.png on a remote analysis server
2017-09-26 13:36:13,250 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.213:8080/files/collection/composer/73814_0.png on a remote analysis server
2017-09-26 13:36:13,279 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.216:8080/files/collection/composer/73815_0.png on a remote analysis server
2017-09-26 13:36:13,307 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73816_0.png on a remote analysis server
2017-09-26 13:36:13,333 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/73817_0.png on a remote analysis server
2017-09-26 13:36:13,361 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.213:8080/files/collection/composer/73818_0.png on a remote analysis server
2017-09-26 13:36:13,391 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.216:8080/files/collection/composer/73819_0.png on a remote analysis server
2017-09-26 13:36:13,419 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73820_0.png on a remote analysis server
2017-09-26 13:36:13,446 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/73821_0.png on a remote analysis server
2017-09-26 13:36:13,473 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.213:8080/files/collection/composer/73822_0.png on a remote analysis server
2017-09-26 13:36:13,501 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.216:8080/files/collection/composer/73823_0.png on a remote analysis server
2017-09-26 13:36:13,526 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73824_0.png on a remote analysis server
2017-09-26 13:36:13,554 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/73825_0.png on a remote analysis server
2017-09-26 13:36:13,590 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.213:8080/files/collection/composer/73826_0.png on a remote analysis server
2017-09-26 13:36:13,618 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.216:8080/files/collection/composer/73827_0.png on a remote analysis server
2017-09-26 13:36:13,648 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73828_0.png on a remote analysis server
2017-09-26 13:36:13,675 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/73829_0.png on a remote analysis server
2017-09-26 13:36:13,705 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.213:8080/files/collection/composer/73830_0.png on a remote analysis server
2017-09-26 13:36:13,741 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73831_0.png on a remote analysis server
2017-09-26 13:36:13,766 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.216:8080/files/collection/composer/73832_0.png on a remote analysis server
2017-09-26 13:36:13,798 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/73833_0.png on a remote analysis server
2017-09-26 13:36:13,837 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.213:8080/files/collection/composer/73834_0.png on a remote analysis server
2017-09-26 13:36:13,862 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73835_0.png on a remote analysis server
2017-09-26 13:36:13,889 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.216:8080/files/collection/composer/73836_0.png on a remote analysis server
2017-09-26 13:36:13,914 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/73837_0.png on a remote analysis server
2017-09-26 13:36:13,939 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.213:8080/files/collection/composer/73838_0.png on a remote analysis server
2017-09-26 13:36:13,970 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.216:8080/files/collection/composer/73839_0.png on a remote analysis server
2017-09-26 13:36:14,002 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73840_0.png on a remote analysis server
2017-09-26 13:36:14,028 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/73841_0.png on a remote analysis server
2017-09-26 13:36:14,054 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.213:8080/files/collection/composer/73842_0.png on a remote analysis server
2017-09-26 13:36:14,081 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.216:8080/files/collection/composer/73843_0.png on a remote analysis server
2017-09-26 13:36:14,110 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73844_0.png on a remote analysis server
2017-09-26 13:36:14,151 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/73845_0.png on a remote analysis server
2017-09-26 13:36:14,177 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.213:8080/files/collection/composer/73846_0.png on a remote analysis server
2017-09-26 13:36:14,202 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.216:8080/files/collection/composer/73847_0.png on a remote analysis server
2017-09-26 13:36:14,242 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73848_0.png on a remote analysis server
2017-09-26 13:36:14,267 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/73849_0.png on a remote analysis server
2017-09-26 13:36:14,292 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.213:8080/files/collection/composer/73850_0.png on a remote analysis server
2017-09-26 13:36:14,316 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.216:8080/files/collection/composer/73851_0.png on a remote analysis server
2017-09-26 13:36:14,347 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/72499_0.png on a remote analysis server
2017-09-26 13:36:14,371 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.213:8080/files/collection/composer/72500_0.png on a remote analysis server
2017-09-26 13:36:14,401 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73901_0.png on a remote analysis server
2017-09-26 13:36:14,427 | INFO  | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/73902_0.png on a remote analysis server
2017-09-26 13:37:00,808 | DEBUG | (TextAnalysisWorkflowOperationHandler:398) - Removing temporary images
2017-09-26 13:37:01,718 | ERROR | (WorkflowOperationWorker:175) - Workflow operation 'operation:'extract-text', position:30, state:'FAILED'' failed
org.opencastproject.workflow.api.WorkflowOperationException: Text extraction failed on images from https://{ADMINURL}/archive/archive/mediapackage/a2ac3745-b111-4f8e-bc3f-bc65f46921f5/e1e43b00-09b6-410d-8b3f-1a28bf980714/32/track.m4v
    at org.opencastproject.workflow.handler.textanalyzer.TextAnalysisWorkflowOperationHandler.extractVideoText(TextAnalysisWorkflowOperationHandler.java:327)[127:matterhorn-textanalyzer-workflowoperation:3.3.0]
    at org.opencastproject.workflow.handler.textanalyzer.TextAnalysisWorkflowOperationHandler.start(TextAnalysisWorkflowOperationHandler.java:196)[127:matterhorn-textanalyzer-workflowoperation:3.3.0]
    at org.opencastproject.workflow.impl.WorkflowOperationWorker.start(WorkflowOperationWorker.java:229)[150:matterhorn-workflow-service-impl:3.3.0]
    at org.opencastproject.workflow.impl.WorkflowOperationWorker.execute(WorkflowOperationWorker.java:155)[150:matterhorn-workflow-service-impl:3.3.0]
    at org.opencastproject.workflow.impl.WorkflowServiceImpl.runWorkflowOperation(WorkflowServiceImpl.java:880)[150:matterhorn-workflow-service-impl:3.3.0]
    at org.opencastproject.workflow.impl.WorkflowServiceImpl.process(WorkflowServiceImpl.java:1945)[150:matterhorn-workflow-service-impl:3.3.0]
    at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:2365)[150:matterhorn-workflow-service-impl:3.3.0]
    at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:2331)[150:matterhorn-workflow-service-impl:3.3.0]

    at java.util.concurrent.FutureTask.run(FutureTask.java:266)[:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)[:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)[:1.8.0_144]
    at java.lang.Thread.run(Thread.java:748)[:1.8.0_144]

Is it possible that the temporary images are deleted too early?

Katrin

kdolan

Sep 26, 2017, 8:34:22 AM
to us...@opencast.org
Hi Katrin,

I’m not sure this helps, but something very easy to try is to add a file to the files/collection/composer directory.
For example "touch thisIsAholderFileToPreventParentDirFromBeingDeleted.txt".

- Karen


> On Sep 26, 2017, at 8:26 AM, Katrin Scharnowski <hfti...@gmail.com> wrote:
>
> The incident is also found in the Admin node logs
>
> ...
> 2017-09-26 13:36:13,862 | INFO | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.214:8080/files/collection/composer/73835_0.png on a remote analysis server
> ...
> 2017-09-26 13:36:14,427 | INFO | (TextAnalysisRemoteImpl:70) - Analyzing http://10.80.10.215:8080/files/collection/composer/73902_0.png on a remote analysis server
> 2017-09-26 13:37:00,808 | DEBUG | (TextAnalysisWorkflowOperationHandler:398) - Removing temporary images
> 2017-09-26 13:37:01,718 | ERROR | (WorkflowOperationWorker:175) - Workflow operation 'operation:'extract-text', position:30, state:'FAILED'' failed
> org.opencastproject.workflow.api.WorkflowOperationException: Text extraction failed on images from https://ocadmin.rz.hft-stuttgart.de/archive/archive/mediapackage/a2ac3745-b111-4f8e-bc3f-bc65f46921f5/e1e43b00-09b6-410d-8b3f-1a28bf980714/32/track.m4v
> at org.opencastproject.workflow.handler.textanalyzer.TextAnalysisWorkflowOperationHandler.extractVideoText(TextAnalysisWorkflowOperationHandler.java:327)[127:matterhorn-textanalyzer-workflowoperation:3.3.0]
> at org.opencastproject.workflow.handler.textanalyzer.TextAnalysisWorkflowOperationHandler.start(TextAnalysisWorkflowOperationHandler.java:196)[127:matterhorn-textanalyzer-workflowoperation:3.3.0]
> at org.opencastproject.workflow.impl.WorkflowOperationWorker.start(WorkflowOperationWorker.java:229)[150:matterhorn-workflow-service-impl:3.3.0]
> at org.opencastproject.workflow.impl.WorkflowOperationWorker.execute(WorkflowOperationWorker.java:155)[150:matterhorn-workflow-service-impl:3.3.0]
> at org.opencastproject.workflow.impl.WorkflowServiceImpl.runWorkflowOperation(WorkflowServiceImpl.java:880)[150:matterhorn-workflow-service-impl:3.3.0]
> at org.opencastproject.workflow.impl.WorkflowServiceImpl.process(WorkflowServiceImpl.java:1945)[150:matterhorn-workflow-service-impl:3.3.0]
> at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:2365)[150:matterhorn-workflow-service-impl:3.3.0]
> at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:2331)[150:matterhorn-workflow-service-impl:3.3.0]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)[:1.8.0_144]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)[:1.8.0_144]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)[:1.8.0_144]
> at java.lang.Thread.run(Thread.java:748)[:1.8.0_144]
>
> Is it possible that the temporary images are deleted prematurely?
>
> - Katrin
>

Stephen Marquard

Sep 26, 2017, 8:39:11 AM
to us...@opencast.org

Hi Katrin

 

If you think this is an issue:

“- the .../files/collection/composer folders on the different workers seem to get re-created on several occasions. Is that normal? I suspect that, for some reason, the folder is deleted and re-created prematurely, before all the images are read.”

you could try creating a file in that folder, e.g. /srv/opencast/workspace/http_10.80.10.214_8080/files/collection/composer/README.keepalive, and use the “touch” command in a script to update the modification time, so that there is always a file in that folder (so the folder will never get deleted) and it's always up to date, so it won't get removed by a cleanup process.

 

I’ve also found it helpful to log the thread name in the log pattern, by changing the log format in etc/org.ops4j.pax.logging.cfg:

log4j.appender.out.layout.ConversionPattern=%d{ISO8601} | %-5.5p | %t | (%C{1}:%L) - %m%n

so you can get a better idea if the problem occurs within one thread, or as the result of the interaction of two different threads.

 

We’re also using a Netapp for our NFS share, but aren’t yet running text extraction. We have these mount options in /etc/fstab:

uctnetapp01nas.uct.ac.za:/vol_opcworkshared /data/opencast/work/shared     nfs     rw,lookupcache=none,actimeo=1,nosuid,nodev,noexec,noatime,auto,nouser,_netdev,hard,intr,retry=30 0 0

In Opencast 1.x we had some concerns about race conditions across NFS (one node writing a file that wasn’t yet visible to another node), which is why we added actimeo=1 and lookupcache=none, though I’m not sure whether these are helpful for Opencast 3.x.

What is your config for

org.opencastproject.file.repo.path
org.opencastproject.workspace.rootdir

on the admin server and worker nodes?

 

Regards

Stephen

 

---
Stephen Marquard, Learning Technologies Co-ordinator,
Centre for Innovation in Learning and Teaching (CILT)
University of Cape Town
http://www.cilt.uct.ac.za
stephen....@uct.ac.za
Phone: +27-21-650-5037 Cell: +27-83-500-5290


kdolan

Sep 26, 2017, 8:47:54 AM
to us...@opencast.org

> you could try creating a file in that folder, e.g. /srv/opencast/workspace/http_10.80.10.214_8080/files/collection/composer/README.keepalive and use the “touch” command in a script to update the modification time, so that there is always a file in that folder (so the folder will never get deleted), and it’s always up to date so won’t get removed by a cleanup process.

Stephen & I had a race condition! "README.keepalive" is a nicer name for the file that keeps the parent directory from being deleted.

Katrin,
If you find that adding that dummy file prevents the errors, please create a Jira ticket about it, or post to the list, so we can make that directory-holder file part of the release.

Best of luck,
Karen

Katrin Scharnowski

Sep 26, 2017, 8:51:54 AM
to Opencast Users
Hi Stephen,

we have not explicitly set org.opencastproject.file.repo.path, so:
#org.opencastproject.file.repo.path=${org.opencastproject.storage.dir}/files

Furthermore we have
org.opencastproject.workspace.rootdir=${org.opencastproject.storage.dir}/workspace

and

org.opencastproject.file.repo.url=${org.opencastproject.admin.ui.url}

on all nodes.

Katrin

Katrin Scharnowski

Sep 27, 2017, 3:19:11 AM
to Opencast Users
Following the suggestions of Stephen and Karen, we are now running a script on the admin node that re-creates a keepalive file in the .../composer/ directories of all workers every 5 seconds:

#!/bin/bash

while (true)
do
    mkdir -p /srv/opencast/workspace/http_10.80.10.213_8080/files/collection/composer/ && touch /srv/opencast/workspace/http_10.80.10.213_8080/files/collection/composer/README.keepalive
    mkdir -p /srv/opencast/workspace/http_10.80.10.214_8080/files/collection/composer/ && touch /srv/opencast/workspace/http_10.80.10.214_8080/files/collection/composer/README.keepalive
    mkdir -p /srv/opencast/workspace/http_10.80.10.215_8080/files/collection/composer/ && touch /srv/opencast/workspace/http_10.80.10.215_8080/files/collection/composer/README.keepalive
    mkdir -p /srv/opencast/workspace/http_10.80.10.216_8080/files/collection/composer/ && touch /srv/opencast/workspace/http_10.80.10.216_8080/files/collection/composer/README.keepalive
    sleep 5
done

Overnight we re-ingested a whole series with 12 events, and they all finished without errors! So I can definitely say that creating the "keep-alive" files has kept the folders from being deleted prematurely.

It works for the moment; however, it is more of a workaround than a solution. I suspect there are more profound problems within our installation, which could potentially lead to other issues in the future.

Best regards,

Katrin


Stephen Marquard

Sep 29, 2017, 4:16:14 AM
to us...@opencast.org

Hi Katrin

 

First, there is definitely a code issue causing the underlying problem, for which I’ve created https://opencast.jira.com/browse/MH-12435 and submitted a possible fix.

Second, your script really only needs to run once a day or so, as the presence of the README.keepalive file should prevent the folder from being removed. The reason to touch the file periodically is to stop it being removed by the workspace cleanup script, which runs every 30 days by default (org.opencastproject.workspace.cleanup.max.age in custom.properties).
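For example, you could drop the while/sleep loop and call the same mkdir/touch commands from a daily cron entry, something like (file name, user and script path are just examples):

# /etc/cron.d/opencast-keepalive (illustrative; adjust user and script path)
0 4 * * *  root  /usr/local/bin/opencast-keepalive.sh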

 

However, the workspace paths you’ve posted (in your script) suggest that you have different workspaces for every worker node, or some other unusual configuration.

 

Our cluster configuration looks like this:

Admin node:

org.opencastproject.server.url=http://media.uct.ac.za
org.opencastproject.admin.ui.url=http://media.uct.ac.za
org.opencastproject.file.repo.path=/data/opencast/work/shared/files
org.opencastproject.file.repo.url=${org.opencastproject.admin.ui.url}
org.opencastproject.workspace.rootdir=/data/opencast/work/shared/workspace

Worker nodes:

org.opencastproject.server.url=http://srvubuopc401.uct.ac.za:8080 (similarly 402/403/404 for additional worker nodes)
org.opencastproject.admin.ui.url=http://media.uct.ac.za
org.opencastproject.file.repo.path=/data/opencast/work/shared/files
org.opencastproject.file.repo.url=${org.opencastproject.admin.ui.url}
org.opencastproject.workspace.rootdir=/data/opencast/work/shared/workspace

/data/opencast/work/shared is a Netapp NFS share that’s mounted at that path on the admin and all worker nodes.

 

Regards

Stephen

 

---
Stephen Marquard, Learning Technologies Co-ordinator,
Centre for Innovation in Learning and Teaching (CILT)
University of Cape Town
http://www.cilt.uct.ac.za
stephen....@uct.ac.za
Phone: +27-21-650-5037 Cell: +27-83-500-5290

 


Katrin Scharnowski

Sep 29, 2017, 4:43:56 AM
to Opencast Users
Hello Stephen,

thanks a lot for your reply; I feared that some more profound problems were going on in our OC cluster. To be honest, I always assumed that the workers having their "own" workspace was to be expected, and that the files in there are just hard links to the 'real' workspace on the admin node.

On our admin node we have the following in 'custom.properties' (the commented-out values are ones we have not set explicitly):

org.opencastproject.server.url=https://ocadmin.rz.hft-stuttgart.de
org.opencastproject.storage.dir=/srv/opencast
#org.opencastproject.file.repo.path=${org.opencastproject.storage.dir}/files
org.opencastproject.file.repo.url=${org.opencastproject.admin.ui.url}
org.opencastproject.workspace.rootdir=${org.opencastproject.storage.dir}/workspace

On the worker nodes, likewise, except that the org.opencastproject.server.url is different of course.

The main difference I see is that we have not set

   org.opencastproject.admin.ui.url

in custom.properties. We do, however, have

  prop.org.opencastproject.admin.ui.url=https://ocadmin.rz.hft-stuttgart.de

in org.opencastproject.organization-mh_default_org.cfg, which I assumed would set the admin.ui.url globally. Is that not correct?

Best regards,
Katrin

Katrin Scharnowski

Sep 29, 2017, 5:24:59 AM
to Opencast Users
A little update:

setting

org.opencastproject.admin.ui.url

to the proper value in custom.properties has led to the workers using the admin node's workspace folder rather than their own.
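In other words, roughly this line in custom.properties on each node (using our admin URL as it appears above):

org.opencastproject.admin.ui.url=https://ocadmin.rz.hft-stuttgart.de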

I feel kind of stupid now, especially since the documentation states to set
org.opencastproject.admin.ui.url
and
org.opencastproject.engage.ui.url

explicitly (rather than "prop.org.....").

Best regards,
Katrin

Stephen Marquard

Sep 29, 2017, 5:37:18 AM
to us...@opencast.org

Actually it is a bit confusing because the docs here:

https://docs.opencast.org/r/3.x/admin/installation/multiple-servers/

do not match the comments in the default cfg file here:

https://bitbucket.org/opencast-community/opencast/src/d061b0ca7f90c82c5000905725744157f21700c0/etc/org.opencastproject.organization-mh_default_org.cfg?at=r%2F3.x&fileviewer=file-view-default

I keep hoping that someone will write the definitive guide to Opencast distributed configuration. Then there’s also multi-tenant configuration…


Katrin Scharnowski

Sep 29, 2017, 5:58:48 AM
to Opencast Users
After some more investigation, I have to say, the documentation is very misleading here (if not wrong).

Setting

org.opencastproject.admin.ui.url=...
org.opencastproject.engage.ui.url=...

in 'org.opencastproject.organization-mh_default_org.cfg', as described in the docs, does not solve the problem.
They actually have to be set in custom.properties for the values to be picked up (e.g. for the file.repo.url).

Best regards,
Katrin

Greg Logan

Sep 29, 2017, 11:58:06 AM
to Opencast Users
Hi Katrin,

Apologies for the confusion: those keys *should* live in the org config; however, the keys specified in the docs are wrong. They should both be prefixed with "prop.", which is probably why they don't work as specified in the docs. I'll look into fixing the docs today.

The rule of thumb is that things in the org config need the prefix, whereas those in custom.properties do not.
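For example (the URL here is only an illustration):

# etc/custom.properties (no prefix)
org.opencastproject.admin.ui.url=https://ocadmin.example.org

# etc/org.opencastproject.organization-mh_default_org.cfg (with the prefix)
prop.org.opencastproject.admin.ui.url=https://ocadmin.example.org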

G

Katrin Scharnowski

Oct 2, 2017, 4:00:09 AM
to Opencast Users
Hi Greg,

since what you are describing in your post is exactly the config we had to begin with, I explicitly rechecked it today.
I put the prop.org.... values in the org cfg file (well, they were already there, since I never removed them) and removed

org.opencastproject.admin.ui.url=...
org.opencastproject.engage.ui.url=...

from custom.properties on all nodes.

Now I see the worker-specific workspace folders being used again, e.g. in the logs we see things like:

2017-10-02 09:52:03,259 | INFO  | (WorkspaceImpl:364) - Downloading http://10.80.10.215:8080/files/collection/composer/99664.mp4 to /srv/opencast/workspace/http_10.80.10.215_8080/files/collection/composer/99664.mp4

To me it seems like, when NOT putting the org-values in custom.properties, the following is happening:
- org.opencastproject.admin.ui.url is set to the default value of org.opencastproject.server.url
- org.opencastproject.file.repo.url is picking this default value up
- the http_10.80.10.XXX_8080 workspace is used

In contrast, when putting the org-values in custom.properties:
- 'https_ocadmin.rz.hft-stuttgart.de' is used, which makes more sense, since we have org.opencastproject.file.repo.url=${org.opencastproject.admin.ui.url}
- the issues described in the first post actually disappear
I think this might be a bug.

Best regards,
Katrin
