Hi,
I'm trying to add a WhisperC++ engine to the speech-to-text-impl. Unlike classic Whisper, WhisperC++ has to be fed a WAV file as input, so I run an encode operation before the speech-to-text operation:
```
id: common-partial-publish-enrich-stt
title: Operations for speech-to-text
operations:
  - id: encode
    # fail-on-error: true
    exception-handler-workflow: common-partial-error
    description: " └─ * Extract wav audio for speech to text"
    configurations:
      - source-flavor: "*/themed"
      - target-flavor: "*/audio+stt"
      - encoding-profile: audio-whisper
      - target-tags: archive
  - id: speechtotext
    # fail-on-error: true
    exception-handler-workflow: common-partial-error
    description: " └─ * Generates subtitles for video and audio files (whispercpp)"
    configurations:
      - source-flavor: "*/audio+stt"
      - target-flavor: captions/vtt+auto-#{lang}
      - target-element: attachment
      - language-code: de
      - target-tags: archive,subtitle,engage-download
```
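As far as I know, the whisper.cpp command-line tool expects 16 kHz, 16-bit PCM WAV input, and the ffmpeg call that the `audio-whisper` profile runs (`-vn -ar 16000 -ac 1 -c:a pcm_s16le`, see the first lines of the log) should produce exactly that, in mono. A rough bash sketch for checking an encoded file by hand, assuming `od`, `head` and `tail` are available and that the `fmt ` chunk directly follows `WAVE` (the canonical 44-byte layout ffmpeg writes for PCM); the function names are made up for illustration:

```shell
#!/usr/bin/env bash

# Read COUNT little-endian bytes at OFFSET from FILE and print them as an unsigned integer.
le_uint() {
  local file=$1 offset=$2 count=$3 value=0 shift_=0 byte
  for byte in $(od -An -tu1 -j "$offset" -N "$count" "$file"); do
    value=$(( value + (byte << shift_) ))
    shift_=$(( shift_ + 8 ))
  done
  echo "$value"
}

# Succeed only if FILE looks like the audio-whisper output:
# RIFF/WAVE, PCM (format 1), mono, 16000 Hz, 16 bits per sample.
# Assumes the "fmt " chunk starts at byte 12; files with other chunks first are not handled.
check_whisper_wav() {
  local file=$1 fmt channels rate bits
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
  [ "$(head -c 4 "$file")" = "RIFF" ] || { echo "not RIFF: $file" >&2; return 1; }
  [ "$(tail -c +9 "$file" | head -c 4)" = "WAVE" ] || { echo "not WAVE: $file" >&2; return 1; }
  fmt=$(le_uint "$file" 20 2)
  channels=$(le_uint "$file" 22 2)
  rate=$(le_uint "$file" 24 4)
  bits=$(le_uint "$file" 34 2)
  echo "format=$fmt channels=$channels rate=$rate bits=$bits"
  [ "$fmt" -eq 1 ] && [ "$channels" -eq 1 ] && [ "$rate" -eq 16000 ] && [ "$bits" -eq 16 ]
}
```

Running `check_whisper_wav` on the `-stt.wav` file before it is deleted (or on a copy fetched from the workspace) separates "wrong format" from "file not there".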
The WAV file is encoded, but the speechtotext workflow operation handler cannot pick it up:
```
2023-05-23T08:49:05,098 | INFO | (ComposerServiceImpl:572) - Starting parallel encode with profile audio-whisper with job load 1
2023-05-23T08:49:07,139 | INFO | (EncoderEngine:216) - Executing encoding command: [ffmpeg, -nostdin, -nostats, -i, /data/opencast/staging/workspace/mediapackage/47fb375d-5ad8-4a6f-9a6e-a2237aa73a42/e9494333-cf33-498d-af52-1a9932e9cf2a/e9494333-cf33-498d-af52-1a9932e9cf2a.mp4, -vn, -ar, 16000, -ac, 1, -c:a, pcm_s16le, /data/opencast/staging/workspace/mediapackage/47fb375d-5ad8-4a6f-9a6e-a2237aa73a42/e9494333-cf33-498d-af52-1a9932e9cf2a/e9494333-cf33-498d-af52-1a9932e9cf2a_359d202b-f87a-4619-87c3-5e2ebc2d08e7-stt.wav]
2023-05-23T08:49:07,163 | INFO | (EncoderEngine:491) - vendor_id : [0][0][0][0]
2023-05-23T08:49:07,163 | INFO | (EncoderEngine:491) - vendor_id : [0][0][0][0]
2023-05-23T08:49:07,168 | INFO | (EncoderEngine:460) - Identified output file /data/opencast/staging/workspace/mediapackage/47fb375d-5ad8-4a6f-9a6e-a2237aa73a42/e9494333-cf33-498d-af52-1a9932e9cf2a/e9494333-cf33-498d-af52-1a9932e9cf2a_359d202b-f87a-4619-87c3-5e2ebc2d08e7-stt.wav
2023-05-23T08:49:07,168 | INFO | (EncoderEngine:491) - ISFT : Lavf60.3.100
2023-05-23T08:49:07,168 | INFO | (EncoderEngine:491) - vendor_id : [0][0][0][0]
2023-05-23T08:49:07,356 | INFO | (EncoderEngine:240) - Tracks {video=/data/opencast/staging/workspace/mediapackage/47fb375d-5ad8-4a6f-9a6e-a2237aa73a42/e9494333-cf33-498d-af52-1a9932e9cf2a/e9494333-cf33-498d-af52-1a9932e9cf2a.mp4} successfully encoded using profile 'audio-whisper'
2023-05-23T08:49:07,512 | INFO | (ComposerServiceImpl:533) - Copied the encoded file to the workspace at https://admin.oc.univie.ac.at/files/collection/composer/522538_0.wav
2023-05-23T08:49:12,610 | INFO | (ComposerServiceImpl:553) - Deleted the local copy of the encoded file at /data/opencast/staging/workspace/mediapackage/47fb375d-5ad8-4a6f-9a6e-a2237aa73a42/e9494333-cf33-498d-af52-1a9932e9cf2a/e9494333-cf33-498d-af52-1a9932e9cf2a_359d202b-f87a-4619-87c3-5e2ebc2d08e7-stt.wav
2023-05-23T08:49:19,432 | INFO | (WhisperCppEngine:130) - Executing WhisperC++'s transcription command: [whispercpp, -ovtt, -bs 5, --model, /usr/share/ggml/ggml-base.bin, --output-file, /data/opencast/staging/workspace/collection/subtitles/tmp_522540_e9494333-cf33-498d-af52-1a9932e9cf2a.vtt, -l, de, -f, /data/opencast/staging/workspace/mediapackage/47fb375d-5ad8-4a6f-9a6e-a2237aa73a42/f43a7a1b-8d9e-4c69-a63d-0f652761c238/e9494333-cf33-498d-af52-1a9932e9cf2a.wav]
2023-05-23T08:49:19,472 | INFO | (ServiceRegistryJpaImpl:2266) - State set to WARNING for current service org.opencastproject.speechtotext on host https://oc-worker2.oc.univie.ac.at
2023-05-23T08:49:19,527 | ERROR | (AbstractJobProducer$JobRunner:343) - Error handling operation 'speechtotext':
org.opencastproject.speechtotext.api.SpeechToTextServiceException: Error while generating subtitle from https://admin.oc.univie.ac.at/files/mediapackage/47fb375d-5ad8-4a6f-9a6e-a2237aa73a42/f43a7a1b-8d9e-4c69-a63d-0f652761c238/e9494333-cf33-498d-af52-1a9932e9cf2a.wav
at org.opencastproject.speechtotext.impl.SpeechToTextServiceImpl.process(SpeechToTextServiceImpl.java:165) ~[?:?]
at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:313) [!/:?]
at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:272) [!/:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: org.opencastproject.speechtotext.api.SpeechToTextEngineException: org.opencastproject.speechtotext.api.SpeechToTextEngineException: WhisperC++ produced no output
at org.opencastproject.speechtotext.impl.engine.WhisperCppEngine.generateSubtitlesFile(WhisperCppEngine.java:155) ~[?:?]
at org.opencastproject.speechtotext.impl.SpeechToTextServiceImpl.process(SpeechToTextServiceImpl.java:155) ~[?:?]
... 6 more
Caused by: org.opencastproject.speechtotext.api.SpeechToTextEngineException: WhisperC++ produced no output
at org.opencastproject.speechtotext.impl.engine.WhisperCppEngine.generateSubtitlesFile(WhisperCppEngine.java:150) ~[?:?]
at org.opencastproject.speechtotext.impl.SpeechToTextServiceImpl.process(SpeechToTextServiceImpl.java:155) ~[?:?]
... 6 more
```
The input file passed to whispercpp does not exist on the worker:
```
[opencast@oc-worker2 /]$ ls /data/opencast/staging/workspace/mediapackage/47fb375d-5ad8-4a6f-9a6e-a2237aa73a42/f43a7a1b-8d9e-4c69-a63d-0f652761c238/e9494333-cf33-498d-af52-1a9932e9cf2a.wav
ls: cannot access '/data/opencast/staging/workspace/mediapackage/47fb375d-5ad8-4a6f-9a6e-a2237aa73a42/f43a7a1b-8d9e-4c69-a63d-0f652761c238/e9494333-cf33-498d-af52-1a9932e9cf2a.wav': No such file or directory
```
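The exception names the file by its URL (`https://admin.oc.univie.ac.at/files/mediapackage/...`), while whispercpp was given the local workspace path (`/data/opencast/staging/workspace/mediapackage/...`); in the log the two differ only in that prefix. A small bash sketch that applies this mapping and reports whether the local copy exists on the node where it runs. `WORKSPACE_ROOT` and both function names are assumptions for illustration, with the root taken from the paths above:

```shell
#!/usr/bin/env bash

# Assumed workspace root, copied from the log paths; can be overridden from the environment.
WORKSPACE_ROOT=${WORKSPACE_ROOT:-/data/opencast/staging/workspace}

# Map "scheme://host/files/<rest>" to "$WORKSPACE_ROOT/<rest>", mirroring the log.
url_to_workspace_path() {
  local url=$1
  # Strip the shortest prefix matching "scheme://host.../files/".
  local rel=${url#*://*/files/}
  if [ "$rel" = "$url" ]; then
    echo "no /files/ segment in: $url" >&2
    return 1
  fi
  printf '%s/%s\n' "${WORKSPACE_ROOT%/}" "$rel"
}

# Print whether the local workspace copy of URL exists on this node; fail if it does not.
report_workspace_copy() {
  local path
  path=$(url_to_workspace_path "$1") || return 1
  if [ -e "$path" ]; then
    echo "present: $path"
  else
    echo "absent: $path"
    return 1
  fi
}
```

For example, `report_workspace_copy https://admin.oc.univie.ac.at/files/mediapackage/47fb375d-5ad8-4a6f-9a6e-a2237aa73a42/f43a7a1b-8d9e-4c69-a63d-0f652761c238/e9494333-cf33-498d-af52-1a9932e9cf2a.wav` on oc-worker2 should print `absent: ...`, matching the `ls` output above.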
The same command runs successfully when given an existing WAV file:
```
[opencast@oc-worker2 ~]$ whispercpp -ovtt -bs 5 --model /usr/share/ggml/ggml-base.bin \
--output-file /data/opencast/staging/workspace/collection/subtitles/tmp_522540_e9494333-cf33-498d-af52-1a9932e9cf2a.vtt \
-l de \
-f /data/opencast/staging/archive/u_stream/9f24e806-6716-4606-b8ce-4e274e13ad16/3/3a834f68-28d7-4c3b-815a-aa65d9cf0603.wav
whisper_init_from_file_no_state: loading model from '/usr/share/ggml/ggml-base.bin'
whisper_model_load: loading model
[...]
output_vtt: saving output to '/data/opencast/staging/workspace/collection/subtitles/tmp_522540_e9494333-cf33-498d-af52-1a9932e9cf2a.vtt.vtt'
```
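Two details in the output above may also be worth checking, independent of the missing input. First, the run writes `...vtt.vtt`, so whispercpp apparently appends `.vtt` to `--output-file` itself. Second, in the command logged by WhisperCppEngine, `-bs 5` appears as one list element while every other flag and value is separate; if that list is passed to the process unchanged, whispercpp may receive `-bs 5` as a single argument and reject it (an assumption, not verified). A bash sketch of the intended invocation, with the model path and beam size copied from the log and the helper names hypothetical:

```shell
#!/usr/bin/env bash

# Hypothetical helper: fill the global array WHISPER_ARGS, one element per argument.
build_whispercpp_args() {
  local input=$1 target_vtt=$2 lang=$3
  # whispercpp appends ".vtt" when -ovtt is set, so pass the path without it.
  local out_base=${target_vtt%.vtt}
  # "-bs" and "5" are deliberately two separate elements.
  WHISPER_ARGS=(-ovtt -bs 5
    --model /usr/share/ggml/ggml-base.bin
    --output-file "$out_base"
    -l "$lang"
    -f "$input")
}

# Run whispercpp only if the input exists, then confirm the expected VTT file was written.
run_whispercpp() {
  local input=$1 target_vtt=$2 lang=${3:-de}
  if [ ! -f "$input" ]; then
    echo "input missing: $input" >&2
    return 1
  fi
  build_whispercpp_args "$input" "$target_vtt" "$lang"
  whispercpp "${WHISPER_ARGS[@]}" || return 1
  if [ ! -s "$target_vtt" ]; then
    echo "no output at: $target_vtt" >&2
    return 1
  fi
}
```

With an existing WAV file, `run_whispercpp <file.wav> /tmp/test.vtt de` should leave `/tmp/test.vtt` rather than `/tmp/test.vtt.vtt`.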
So I suspect the problem is not in the WhisperC++ engine itself but in my workflow configuration?
Any suggestions?
Best regards, Martin