Divide subtitles more evenly


Otto Munters

Feb 16, 2025, 6:37:16 AM
to BBEdit Talk
Is there a regex to divide the last two lines of each subtitle more evenly, as in the following example, so that both lines are about the same length, with the longer of the two preferably on the 4th line?
Example:
351
00:18:23,120 --> 00:18:29,600
not that likes and dislikes are 
your enemies

352
00:18:29,600 --> 00:18:31,960
because they end up serving
the society.

Thanks for your kind help! 
Otto
Message has been deleted
Message has been deleted

GP

Feb 17, 2025, 5:06:15 PM
to BBEdit Talk
Regular expressions aren't well suited to handle things like checking line lengths and moving line contents based upon differences in those lengths.

A better method is to use something like a text filter using a scripting language that can check for things like text lengths and make text string changes based upon runtime evaluations.

Below is a Perl script text filter which takes as input a selection or a whole file of SRT-formatted text. It finds every SRT sequence entry with two lines of dialog text and re-wraps those lines to more equal lengths, leaving the second line the longer one if necessary for proper word wrapping.

I've named it reformat_subtitle_text.pl and saved it in BBEdit's Text Filters folder so it will be listed in BBEdit's Text Filters palette. If desired you can also set a keyboard shortcut for it.

You'll probably want to enhance the reformatting logic in the fixup_dialog subroutine to handle cases where simple two-line word wrapping produces awkward results. For example, what appears to be two-person dialog text like:

- Shall I get you something, Micke?
- No, I don't have time.

or

- Whose turn is it today?
- Malin's, isn't it?

with your simple word wrapping rule gets reformatted as:

- Shall I get you something,
Micke? - No, I don't have time.

- Whose turn is it
today? - Malin's, isn't it?

In the SRT formatting rules I found, "-" has no defined markup rule, so perhaps it is just an informal convention that people use to indicate multiple people speaking.
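One possible guard for that case is sketched here, under the assumption that a leading "-" on the second dialog line marks a second speaker. The names is_two_speaker_dialog and reformat_unless_dialog are hypothetical and not part of the script; the callback argument stands in for fixup_dialog.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the second dialog line starts with a hyphen,
# which this sketch ASSUMES marks a change of speaker.
sub is_two_speaker_dialog {
    my ($line1, $line2) = @_;
    return $line2 =~ /^\s*-/ ? 1 : 0;
}

# Hypothetical wrapper: pass two-speaker entries through unchanged, otherwise
# hand both lines to the reformatting callback (for example \&fixup_dialog).
sub reformat_unless_dialog {
    my ($line1, $line2, $reformat) = @_;
    return $line1 . $line2 if is_two_speaker_dialog($line1, $line2);
    return $reformat->($line1, $line2);
}
```

With this in place, the substitution could call reformat_unless_dialog($2, $3, \&fixup_dialog) instead of fixup_dialog($2, $3). A second line that starts with a hyphen for some other reason would also be skipped.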

SRT formatting rules also allow simple markup annotations (e.g., bold: <b> </b>), so the length of the text as displayed can differ from the length of a subtitle entry's raw dialog text. This script doesn't try to deal with that complicating issue.
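A minimal sketch of how displayed length might be measured, assuming the only markup present is simple tags such as <b>, <i>, <u> and <font ...>. The helper name display_length is hypothetical, and the tag list is an assumption rather than a complete reading of the SRT rules.

```perl
use strict;
use warnings;

# Hypothetical helper: length of a dialog line as displayed.
# ASSUMPTION: markup is limited to <b>, <i>, <u> and <font ...> tags.
sub display_length {
    my ($text) = @_;
    # strip opening and closing tags from a copy of the text
    (my $plain = $text) =~ s{</?(?:b|i|u|font)\b[^>]*>}{}gi;
    $plain =~ s/\s+$//;    # ignore trailing white space, as fixup_dialog does
    return length $plain;
}
```

In fixup_dialog, display_length could replace length when computing the ideal column width. Text::Wrap would still count the tag characters while wrapping, so this only addresses part of the problem.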

reformat_subtitle_text.pl:

#!/usr/bin/env perl

use strict;
use Text::Wrap;
use POSIX qw/ceil/;

my $subtitles = '';

# regex to dissect one subtitle entry 1) sequence number and time range, 2) first dialog text line,
# and 3) second dialog text line
my $seq_item_re = qr/(\d+\n\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\n)(.+\n)(.+\n)/;

# read in all the input subtitle text
$subtitles = do { local $/; <STDIN> };

# extract each and all subtitle entries with two lines of dialog text
# and replace them with reformatted version
$subtitles =~ s/$seq_item_re/$1 . fixup_dialog($2, $3)/mge;

# output the reformatted subtitles
print $subtitles;

# reformat two lines of dialog text to have more equal line lengths with line two the longer if
# necessary for proper word wrapping

sub fixup_dialog {
    my ($line1, $line2) = @_;
   
#   trim trailing white space
    $line1 =~ s/\s+$//;
    $line2 =~ s/\s+$//;
   
#   ideal column width for two lines of characters without word wrapping
#   and with word wrapping will leave second line the longer of the two lines
    my $ideal_col_width = ceil((length($line1) + length($line2))/2) + 1;
    my $total_text = $line1 . " " . $line2 . "\n";
   
#   locally set wrapping parameters to not expand tabs and column width constraint
    local($Text::Wrap::unexpand) = 0;
    local($Text::Wrap::columns) = $ideal_col_width;
    my $wrapped_text = wrap('', '', $total_text);
   
#   if word wrapping creates third line move it to end of second line
    if ( $wrapped_text =~ m/(.+\n.+)\n(.+\n)/){
        $wrapped_text = $1 . $2;
    }
    return $wrapped_text;
}


GP

Feb 17, 2025, 5:22:36 PM
to BBEdit Talk
Oops!
I forgot to concatenate a space character when moving a word-wrapped third line onto the end of the second line. In the fixup_dialog subroutine, change the line:
$wrapped_text = $1 . $2;
to:
$wrapped_text = $1 . " " . $2;

Otto Munters

Feb 19, 2025, 8:03:31 AM
to BBEdit Talk
Thank you for your help, Mark and GP.

I succeeded by making a Text Factory that repeats a series of greps.
search: (?=^.{82}$)(.{29,42}\b)(.*)
replace: \1\n\2
The same grep is then repeated with a shorter line length each time: (?=^.{81}$)(.{29,42}\b)(.*) and so on, down to (?=^.{43}$)(.{10,24}\b)(.*)
so that all lines of length 43 to 82 are converted into two lines of approximately equal length.
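For comparison, the same idea can be sketched as a single pass in Perl. This is not the Text Factory above but a different technique: it splits a line at the space closest to its midpoint. The names split_near_middle and split_long_line are hypothetical; the 43 to 82 length limits are taken from the greps above.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line at the space closest to its midpoint.
# On a tie the earlier space wins, so the second line is the longer one.
sub split_near_middle {
    my ($line) = @_;
    my $target = (length($line) - 1) / 2;    # split index giving equal halves
    my ($best, $best_dist);
    while ($line =~ / /g) {
        my $pos  = pos($line) - 1;           # index of this space
        my $dist = abs($pos - $target);
        ($best, $best_dist) = ($pos, $dist)
            if !defined $best_dist || $dist < $best_dist;
    }
    return $line unless defined $best;       # no space: leave line unchanged
    return substr($line, 0, $best) . "\n" . substr($line, $best + 1);
}

# Hypothetical wrapper: only touch lines of 43 to 82 characters.
sub split_long_line {
    my ($line) = @_;
    my $len = length $line;
    return $line if $len < 43 || $len > 82;
    return split_near_middle($line);
}

print split_long_line("not that likes and dislikes are your enemies"), "\n";
```

The input is a single line without its trailing newline. Shorter lines, including the time code lines, pass through unchanged.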

Best regards, Otto 