CCExtractor version: Compiled myself from fa85a527 (I double-checked several times); I donʼt know why it thinks itʼs on d379d726
```
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.

CCExtractor detailed version info
    Version: 0.94
    Git commit: d379d72685959859db797621f270aeeb01a50021
    Compilation date: 2023-03-26
    CEA-708 decoder: Rust
    File SHA256: f7edb9796bf45c48bf3fe80db340293854e394f4ed0960f0f730d2ab5eec9028

Libraries used by CCExtractor
    libGPAC Version: 1.0.1
    zlib: 1.2.11
    utf8proc Version: 2.4.0
    protobuf-c Version: 1.3.1
    libpng Version: 1.6.37
    FreeType
    libhash
    nuklear
    libzvbi
```
[Same test input #1516; no need to re-upload] Current output after #1518: test.vtt.gz
Expected output: there should be a space between the `♪` and the `</i>`; see this line of the g608:
```
♪ ^@99999999999999000999999999999999RRRRRRRRRRRRRRIIIRRRRRRRRRRRRRRR
```
The SRT line has the space before the `</i>`; the WebVTT-full one has it after. This isnʼt a big deal in this sample, but the same thing would be clearly visible in most other cases. For example, if the line were supposed to be `<i> ♪♪ [epic music] ♪♪ </i>`, then it would instead be `<i> ♪♪ [epic music</i>] ♪♪`.
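One way to see why the tags drift, rather than just landing on a character boundary: a CEA-608 style such as italics belongs to a screen column, but in UTF-8 a `♪` (U+266A) takes three bytes. The hypothetical helper below (an illustration, not CCExtractor code) counts how many screen columns a byte prefix of a UTF-8 line covers. In ` ♪♪ [epic music]`, the `[` is at byte offset 8 but only at screen column 4, so a per-column style looked up with the byte offset would presumably be read four columns too far to the right, which would make the closing `</i>` appear too early in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of screen columns (characters) covered by
   the first n_bytes bytes of a UTF-8 string. Every UTF-8 character has
   exactly one byte that is not a continuation byte (10xxxxxx), so
   counting those bytes counts characters. */
static size_t columns_in_prefix(const char *utf8, size_t n_bytes)
{
    assert(n_bytes <= strlen(utf8)); /* prefix must lie inside the string */
    size_t columns = 0;
    for (size_t i = 0; i < n_bytes; i++) {
        if (((unsigned char)utf8[i] & 0xC0) != 0x80)
            columns++; /* lead byte or ASCII: starts a new column */
    }
    return columns;
}
```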
This is a follow-up to #1516. Its fix, #1518, does prevent splitting multibyte characters, but the styling will still get out of sync with the text whenever the line contains any multibyte characters. This is because the loop still uses `j` as both a byte index and a screen index; I think a more comprehensive fix would be to use `j` only as a screen index, like the SRT decoder does, and decode each symbol separately.
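A minimal sketch of that approach, assuming a simplified row model with one Unicode code point and one italics flag per column. `render_row` and `utf8_encode` are hypothetical names; CCExtractorʼs real screen buffer and helpers look different, and colors and underline are left out here. The point is that `j` only ever counts columns: tags are chosen from the per-column flag, and each symbol is encoded on its own, so a 3-byte `♪` advances the output by three bytes but `j` by one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write one code point as UTF-8, return bytes written (1-4).
   Assumes cp is a valid Unicode scalar value. */
static size_t utf8_encode(uint32_t cp, char *out)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical renderer: j is a screen column only, never a byte offset.
   out must hold at least 8 * n + 5 bytes. Returns bytes written, excluding NUL. */
static size_t render_row(const uint32_t *chars, const bool *italic, int n, char *out)
{
    size_t pos = 0;        /* byte position in the output */
    bool in_italic = false;
    for (int j = 0; j < n; j++) {
        if (italic[j] && !in_italic) {        /* style turns on at column j */
            memcpy(out + pos, "<i>", 3);
            pos += 3;
            in_italic = true;
        } else if (!italic[j] && in_italic) { /* style turns off at column j */
            memcpy(out + pos, "</i>", 4);
            pos += 4;
            in_italic = false;
        }
        pos += utf8_encode(chars[j], out + pos); /* decode/encode each symbol separately */
    }
    if (in_italic) { /* close a style still open at the end of the row */
        memcpy(out + pos, "</i>", 4);
        pos += 4;
    }
    out[pos] = '\0';
    return pos;
}
```

With this model, a row whose first two columns are an italic `♪` and an italic space, followed by a non-italic `a`, renders as `<i>♪ </i>a`, with the space inside the tag as in the SRT output.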
I have a fix Iʼm planning to upload shortly, although probably a better fix is possible and I wonʼt be disappointed if someone comes along and refactors the entire loop or function away.