I got excited about this and couldn't help myself. I put together a pretty-much-complete script for turning srt files into QLab text cues, which detects text encoding and decodes appropriately, and doesn't care about how lines are ended. I've tested with umlauts and Vietnamese, and it seems to work quite well with ASCII, UTF-8, and UTF-16 encodings. I've tested with all of Mic's variation test files posted previously, and they work as expected.
This script assumes QLab's front workspace has three template cues for subtitles, allowing you to set up separate placeholder formatting for cues with 1, 2, and 3 or more lines of text. I've attached an example workspace. The difference between templates is exaggerated to illustrate the concept.
The attached zip also has a test srt file that demonstrates things like "ä" working, dropping position information from time data, and discarding text markup. It's an ugly edit of Mic's CR.srt test file.
While there's no official srt specification that I can find, there's a fair bit of commentary in documentation from various software packages. From that, Wikipedia, and everywhere else I could find to look, I've learned it's possible to format text inline as bold, italic, or underlined, and to set a font color, as well as to set each line's position on screen. It might be nice someday to implement as much of that as possible, but for now the script just detects those things if they're present and discards them. This markup can be in <...> form or {...} form. Bold, italic, and underline are handled separately from font color, and line position is handled separately from both. That leaves room to make use of these things rather than just discarding them, if some enterprising soul has a need to.
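To make the discarding step concrete, here is a minimal sketch of the bold/italic/underline part on its own. The handler name stripStyleTags is made up for illustration and is not part of the script; it only covers the simple style tags, not font color or line position.

```applescript
-- Hypothetical helper (not in the script): remove bold, italic, and underline
-- tags in both <...> and {...} forms by splitting on them and rejoining.
on stripStyleTags(someText)
	set oldTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {"<b>", "</b>", "{b}", "{/b}", "<i>", "</i>", "{i}", "{/i}", "<u>", "</u>", "{u}", "{/u}"}
	set textParts to text items of someText -- everything between the tags
	set AppleScript's text item delimiters to ""
	set cleanedText to textParts as text -- rejoin with nothing in between
	set AppleScript's text item delimiters to oldTIDs
	return cleanedText
end stripStyleTags

stripStyleTags("<i>Hello</i> {b}world{/b}") -- "Hello world"
```

Because the tags are only used as split points, anyone wanting to keep the styling could record where the splits happened instead of throwing that information away.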
SRT also makes it possible to set the position of the bounding box for a complete frame of text on the time stamp line, but since that doesn't make sense in the QLab paradigm, it's being detected and discarded.
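As a rough sketch of that detect-and-discard step, the time stamp line can be split on " --> " and anything after the first space or tab in the end time dropped. The handler name splitTimeStampLine is hypothetical, and the X1/X2/Y1/Y2 coordinates in the sample are just an example of the kind of position data described above.

```applescript
-- Hypothetical helper (not in the script): return {startTime, endTime} as strings,
-- ignoring any position information that trails the end time.
on splitTimeStampLine(timeLine)
	set oldTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to " --> "
	set startTime to text item 1 of timeLine
	set endTime to text item 2 of timeLine
	set AppleScript's text item delimiters to {space, tab}
	set endTime to text item 1 of endTime -- keep only the time, drop whatever follows
	set AppleScript's text item delimiters to oldTIDs
	return {startTime, endTime}
end splitTimeStampLine

splitTimeStampLine("00:00:01,000 --> 00:00:04,250 X1:40 X2:600 Y1:20 Y2:50")
-- {"00:00:01,000", "00:00:04,250"}
```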
If a line of text is long enough to wrap based on the template's formatting, the next template up will be used instead, unless it's already using the 3+ line template. This may result in smaller text that does not wrap, but I figured that's preferable to large text that wraps messily. If it's wrapping badly, and is more than three lines of text, I'd suggest the solution is in the srt file, not the QLab cue.
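Stripped of the QLab calls, the template choice boils down to something like the sketch below. The handler name chooseTemplateLevel and its parameters are made up for illustration: in the real script the rendered width comes from QLab after the template's format has been applied, whereas here it is simply passed in as a number.

```applescript
-- Hypothetical helper (not in the script): pick template 1, 2, or 3 (meaning 3+).
-- templateWidths is {oneLineWidth, twoLineWidth, threePlusWidth}; 0 means no explicit width.
on chooseTemplateLevel(lineCount, renderedWidth, templateWidths)
	if lineCount is greater than 3 then set lineCount to 3
	set templateLevel to lineCount -- start from the template matching the line count
	set levelWidth to item templateLevel of templateWidths
	-- If the text is wider than the template allows, step up one template,
	-- unless we are already on the 3+ template.
	if levelWidth is greater than 0 and renderedWidth is greater than levelWidth and templateLevel is less than 3 then
		set templateLevel to templateLevel + 1
	end if
	return templateLevel
end chooseTemplateLevel

chooseTemplateLevel(1, 900, {800, 800, 800}) -- 2: a wide single line moves up to the 2-line template
chooseTemplateLevel(3, 900, {800, 800, 800}) -- 3: already on the 3+ template, so it stays there
```

Note that the step up happens at most once per cue, which matches the "next template up" behavior described above.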
Any blank lines are simply ignored, as we use frame numbers to detect frames. Similarly, frames without text are discarded.
I feel clever about how this detects frames of text. We attempt to coerce each line to an integer. If it's a frame number, that succeeds. If it's an empty line, it gives us a 0, so we can use that to make a list of empty lines to ignore later. If it's a line of time information or a line of text, the coercion fails and gives us an error number. If there's no error and the result isn't 0, it's a frame number. Then we can look for " --> ", and assume if that's there then we have time stamps. I can imagine this test failing if a subtitle includes " --> ", or if a line of text is nothing but a number. Those cases seem rare enough (and in the case of the former extremely unlikely) that I feel comfortable using this test.
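Boiled down to a single handler, the test looks roughly like this. The name classifyLine and the three labels are made up for illustration; the script itself builds lists of indexes rather than returning labels. The "blank" branch relies on the empty-line-gives-0 behavior described above.

```applescript
-- Hypothetical helper (not in the script): label a line by trying to coerce it to an integer.
on classifyLine(someLine)
	try
		set lineAsInteger to someLine as integer
	on error
		return "other" -- a time stamp line or a line of subtitle text
	end try
	if lineAsInteger is 0 then return "blank" -- an empty line (a literal "0" would land here too)
	return "frame" -- coerced cleanly to a non-zero integer, so treat it as a frame number
end classifyLine

classifyLine("12") -- "frame"
classifyLine("00:00:01,000 --> 00:00:04,000") -- "other"
classifyLine("Hello there") -- "other"
```

Returning from inside the error branch means a failed coercion can never be mistaken for a 0, which is the one thing this kind of test has to be careful about.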
Thanks to Rich for sending me to "file -I" in a "do shell script". The capital I option tells us the MIME type and character set, so as long as the "file" command doesn't make a bad guess (detecting character encoding can be haaard), that's all we need in order to decode correctly. I've only tested on macOS 10.14.6, so it's possible that an older or newer OS version handles "file -I" differently or doesn't have it available.
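For reference, the output of "file -I" is a single line ending in "charset=...", so pulling out the encoding is just a matter of taking whatever follows that marker. A minimal sketch, using a made-up handler name and a made-up sample path rather than a real file:

```applescript
-- Hypothetical helper (not in the script): extract the charset name from a line of "file -I" output.
on charsetFromFileOutput(fileOutput)
	set oldTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "charset="
	set charsetName to last text item of fileOutput -- everything after "charset="
	set AppleScript's text item delimiters to oldTIDs
	return charsetName
end charsetFromFileOutput

-- Sample string shaped like "file -I" output; the path is invented for the example.
charsetFromFileOutput("/Users/me/subs.srt: text/plain; charset=utf-16le") -- "utf-16le"
```

Since UTF-16 files are reported with a byte-order suffix such as "utf-16le", it's safer to test with "contains" than to compare for an exact match.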
Without further ado, the script:
(*
- works with template cues to configure 1, 2, and 3+ line subtitles
- detects UTF-8 and UTF-16, and decodes correctly
- if present, detects and discards position information on the time stamp line (this doesn't really make sense in the QLab paradigm)
- if present, detects and (currently) discards bold, italic, underline, font color, and line position markup in text
- if text will wrap, uses the next template up (unless it's already the 3+ template) (as such, the templates should specify their width (title safe is -20% of total width) explicitly)
- discards blank lines and empty frames
- numbers cues with prefix and frame number
- groups cues in timeline group, with pre-wait and durations
*)
-- set the following variables to fit your tastes:
set subtitlePrefix to "ST" -- prefix for subtitle cue numbers
set subtitleName to "Subtitles" -- Cue name for subtitle group cue
set oneLineTemplate to "ST.1L" -- Cue number for 1 line template cue
set twoLineTemplate to "ST.2L" -- Cue number for 2 line template cue
set threePlusTemplate to "ST.3L" -- Cue number for 3+ line template cue
set srtFile to choose file with prompt "Please select an SRT file:" of type "srt"
set srtFileInfo to do shell script "file -I " & quoted form of POSIX path of srtFile -- character encoding is found here, so we can read the file appropriately below
set oldTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to "charset="
set fileEncoding to last text item of srtFileInfo
set AppleScript's text item delimiters to oldTIDs
if fileEncoding contains "utf-8" then -- Unicode? Which flavor? ASCII?
set srtText to read srtFile as «class utf8»
else if fileEncoding contains "utf-16" then
set srtText to read srtFile as Unicode text
else
set srtText to read srtFile
end if
set srtLines to every paragraph of srtText -- list of all lines in the srt file, as strings
set frameIndexes to {} as list
set blankLines to {} as list
repeat with itemIndex from 1 to (count srtLines)
set errorNumber to 0 as integer
set eachLine to item itemIndex of srtLines
try
set isInteger to eachLine as integer -- If it's a blank line, we get 0. If it's a frame number, we get a successful coercion to integer. If it's a timestamp or line of text, it'll give an error number.
on error number errNum
set errorNumber to errNum
end try
if errorNumber is 0 and isInteger is not 0 then -- It's a frame number
set end of frameIndexes to itemIndex
end if
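if errorNumber is 0 and isInteger is 0 then -- only trust isInteger when the coercion succeeded; after an error it still holds the previous line's value (or is undefined on the first line)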
set end of blankLines to itemIndex -- It's an empty line, so we can ignore it later
end if
end repeat
set end of frameIndexes to (count srtLines) + 1 -- One past the last line, so the final frame still includes the file's last line even when there's no trailing line break
tell application id "com.figure53.qlab.4" to tell front workspace -- Here we're just making the group that will hold the text cues.
make type "Group"
set titlesGroup to last item of (selected as list)
set mode of titlesGroup to timeline
set q number of titlesGroup to subtitlePrefix
set q name of titlesGroup to subtitleName
end tell
set textLines to {} as list
repeat with eachIndex from 1 to ((count frameIndexes) - 1)
set thisFrameIndex to item eachIndex of frameIndexes
set nextFrameIndex to item (eachIndex + 1) of frameIndexes
set thisFrameNumber to item thisFrameIndex of srtLines as string
repeat with eachLine from (thisFrameIndex + 1) to (nextFrameIndex - 1)
if blankLines does not contain eachLine then
set eachString to item eachLine of srtLines
if eachString contains " --> " then -- then it's a time stamp line
set oldTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to " --> "
set startTime to text item 1 of eachString
set endTime to text item 2 of eachString
set AppleScript's text item delimiters to {space, tab}
if (count text items of endTime) is greater than 1 then -- Some srt files may include position information on the time stamp line. Here we discard it, because it doesn't really make sense in QLab.
set endTime to text item 1 of endTime
end if
set AppleScript's text item delimiters to oldTIDs
else -- it must be a line of text
set end of textLines to eachString
end if
end if
end repeat
-- Split out hours, minutes, seconds, milliseconds
set oldTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to ":"
set startHours to text item 1 of startTime as integer
set startMinutes to text item 2 of startTime as integer
set startSeconds to text item 3 of startTime
set endHours to text item 1 of endTime as integer
set endMinutes to text item 2 of endTime as integer
set endSeconds to text item 3 of endTime
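set AppleScript's text item delimiters to {",", "."} -- seconds and milliseconds are separated by a comma in srt (a period in some non-standard files)
set startMilliseconds to text item 2 of startSeconds as integer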
set startSeconds to text item 1 of startSeconds as integer
set endMilliseconds to text item 2 of endSeconds as integer
set endSeconds to text item 1 of endSeconds as integer
set AppleScript's text item delimiters to oldTIDs
-- Calculate duration
set preWait to ((((startHours * 60) + startMinutes) * 60) + startSeconds) + (1.0E-3 * startMilliseconds)
set outTime to ((((endHours * 60) + endMinutes) * 60) + endSeconds) + (1.0E-3 * endMilliseconds)
set frameDuration to (outTime - preWait)
if (count textLines) is greater than 0 then -- ignore frames with no text
repeat with eachItem from 1 to (count textLines) -- srt files are allowed to specify formatting for text inline, for bold, italic, underline, and font color, as well as line position.
set eachLine to item eachItem of textLines
if (eachLine contains "<" and eachLine contains ">") or (eachLine contains "{" and eachLine contains "}") then -- find markup
set oldTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"<b>", "</b>", "{b}", "{/b}", "<i>", "</i>", "{i}", "{/i}", "<u>", "</u>", "{u}", "{/u}"} -- bold, italic, underline
set eachLineItems to text items of eachLine
set AppleScript's text item delimiters to "" -- remove them
set eachLine to eachLineItems as string
set AppleScript's text item delimiters to {"<", ">"} -- detecting additional HTML (font color)
set eachLineFontItems to text items of eachLine
set eachLineNoFonts to {} as list
repeat with eachFontItem in eachLineFontItems -- eliminate font color markup
set eachFontItem to eachFontItem as string
if eachFontItem does not contain "font color" and eachFontItem does not contain "/font" then
set end of eachLineNoFonts to eachFontItem
end if
end repeat
set AppleScript's text item delimiters to "" -- remove font items
set eachLine to eachLineNoFonts as string
set AppleScript's text item delimiters to {"{", "}"} -- detecting line position markup e.g.: {\a3} or {\an3}
set eachLinePositionItems to text items of eachLine
set eachLineNoPosition to {} as list
repeat with eachPositionItem in eachLinePositionItems -- eliminate line position markup
set eachPositionItem to eachPositionItem as string
if length of eachPositionItem is greater than 0 then
if character 1 of eachPositionItem is not "\\" then
set end of eachLineNoPosition to eachPositionItem
end if
end if
end repeat
set AppleScript's text item delimiters to "" -- remove line position items
set eachLine to eachLineNoPosition as string
set AppleScript's text item delimiters to oldTIDs
end if
set item eachItem of textLines to eachLine -- replace text with cleaned up version
end repeat
tell application id "com.figure53.qlab.4" to tell front workspace
make type "Text" -- our subtitle cue
set newTitle to last item of (selected as list)
set newTitleId to uniqueID of newTitle
set q number of newTitle to subtitlePrefix & "." & thisFrameNumber
set pre wait of newTitle to preWait
set duration of newTitle to frameDuration
move cue id newTitleId of parent of newTitle to end of titlesGroup
if (count textLines) is greater than 1 then -- format multi-lines different from single lines
set oldTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to return
set newText to textLines as string
set AppleScript's text item delimiters to oldTIDs
if (count textLines) is greater than 2 then -- format 3+ lines even more differently
set templateFormat to text format of cue threePlusTemplate
set geometryTemplate to cue threePlusTemplate
set templateWidth to fixed width of cue threePlusTemplate
else -- here's 2-line frames
set templateFormat to text format of cue twoLineTemplate
set geometryTemplate to cue twoLineTemplate
set templateWidth to fixed width of cue twoLineTemplate
end if
else -- use the single line template
set newText to last item of textLines
set templateFormat to text format of cue oneLineTemplate
set geometryTemplate to cue oneLineTemplate
set templateWidth to fixed width of cue oneLineTemplate
end if
if templateWidth is missing value then set templateWidth to 0 -- handle if there's no explicit width for the template in use
set text of newTitle to newText
-- format to match the template
set rangeLength of range of item 1 of templateFormat to (count characters of newText) -- make the template format fit the new text
set text format of newTitle to templateFormat
set text alignment of newTitle to text alignment of geometryTemplate
-- This checks if lines will wrap with the new fixed width. If so, it steps to the next template up (unless it's already 3+ lines).
set outputSize to text output size of newTitle
if templateWidth is greater than 0 and item 1 of outputSize is greater than templateWidth then
if (count textLines) is greater than 1 then
set templateFormat to text format of cue threePlusTemplate
set rangeLength of range of item 1 of templateFormat to (count characters of newText)
set text format of newTitle to templateFormat
set geometryTemplate to cue threePlusTemplate
set templateWidth to fixed width of cue threePlusTemplate
else
set templateFormat to text format of cue twoLineTemplate
set rangeLength of range of item 1 of templateFormat to (count characters of newText)
set text format of newTitle to templateFormat
set geometryTemplate to cue twoLineTemplate
set templateWidth to fixed width of cue twoLineTemplate
end if
end if
set fixed width of newTitle to templateWidth
-- geometry from the template
set full surface of newTitle to full surface of geometryTemplate
set preserve aspect ratio of newTitle to preserve aspect ratio of geometryTemplate
set scale x of newTitle to scale x of geometryTemplate
set scale y of newTitle to scale y of geometryTemplate
set translation x of newTitle to translation x of geometryTemplate
set translation y of newTitle to translation y of geometryTemplate
set opacity of newTitle to opacity of geometryTemplate
set layer of newTitle to layer of geometryTemplate
-- clear the buffer and start again
set textLines to {} as list
end tell
end if
end repeat
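
For anyone adapting this, the time arithmetic in the middle of the script could also be pulled out into a handler along these lines. This is only a sketch: srtTimeToSeconds is a made-up name, it isn't part of the script above, and it assumes the usual HH:MM:SS,mmm form with a comma before the milliseconds.

```applescript
-- Hypothetical helper (not in the script): convert "HH:MM:SS,mmm" to seconds.
on srtTimeToSeconds(timeString)
	set oldTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {":", ","}
	set timeParts to text items of timeString -- {hours, minutes, seconds, milliseconds}
	set AppleScript's text item delimiters to oldTIDs
	set theHours to (item 1 of timeParts) as integer
	set theMinutes to (item 2 of timeParts) as integer
	set theSeconds to (item 3 of timeParts) as integer
	set theMilliseconds to (item 4 of timeParts) as integer
	return (((theHours * 60) + theMinutes) * 60) + theSeconds + (theMilliseconds / 1000)
end srtTimeToSeconds

set preWait to srtTimeToSeconds("00:01:02,500") -- 62.5
set outTime to srtTimeToSeconds("00:01:05,000") -- 65.0
set frameDuration to outTime - preWait -- 2.5
```

The variables are called theHours and theMinutes on purpose, since hours and minutes are already AppleScript constants.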