How to divide subtitles (srt) into two lines


Otto Munters

Jun 3, 2023, 10:46:44 AM
to BBEdit Talk
YouTube's automatic subtitle translation turns the two lines of each subtitle in the original language into one long line. How can I split it back into two lines, with the top line slightly shorter than the one below it? Is there a grep pattern for that? Or how would you do it with a Text Factory?

Example SRT output from YouTube:
1
00:00:33,360 --> 00:00:45,060
Welkom. Leuk om jullie allemaal te zien. We moeten de tijd nemen om

2
00:00:46,560 --> 00:00:52,380
de regels in te voeren. Net zoals jij ashram-regels hebt, hebben wij regels voor zelfonderzoek.

It should look like this:
1
00:00:33,360 --> 00:00:45,060
Welkom. Leuk om jullie allemaal 
te zien. We moeten de tijd nemen om

2
00:00:46,560 --> 00:00:52,380
de regels in te voeren. Net zoals jij ashram-
regels hebt, hebben wij regels voor zelfonderzoek.

Thanks a lot for your help!

Otto Munters

Jun 5, 2023, 9:32:01 AM
to BBEdit Talk
In Find and Replace I can do this search:

^\n?.{50}

But I can't figure out how to replace the match with the same text followed by a line break.
Can anybody help me?


Kaveh Bazargan

Jun 5, 2023, 9:49:48 AM
to bbe...@googlegroups.com
Put what you want to keep in parentheses, e.g.
^\n?(.{50})

Then \1 in the replacement refers to the captured text, so you can put it back.

This site might help. Try putting in:

start of line followed by at least 50 characters and a space

--
This is the BBEdit Talk public discussion group. If you have a feature request or need technical support, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbedit/adbe2498-9b89-4afd-b284-b30ee7debb71n%40googlegroups.com.


--
Kaveh Bazargan PhD
Director
Accelerating the Communication of Research
  https://rivervalley.io/gigabyte-wins-the-alpsp-scholarly-publishing-innovation-award-using-river-valleys-publishing-technology/

Bruce Van Allen

Jun 5, 2023, 9:50:27 AM
to bbe...@googlegroups.com
Surrounding a portion of your match expression with parentheses “captures” the part of the original text matched by that part of the match expression.

Captured bits of text may then be used in the replacement expression, denoted by \1, \2, etc, for each successive set of parens, counting from the left.

So if your whole match expression is as per your question, but enclosed by parens:

(^\n?.{50})

then \1 contains a possible newline and 50 characters following it. If you put the parens only around .{50}, it would only capture the fifty characters.

So if you put parens as I showed above, then you could use:

find: (^\n?.{50})
replace: \1\n

The replacement here is the captured text plus a newline.

(One wrinkle to watch for is that the dot ‘.’ in that expression matches any character EXCEPT newlines.)

The BBEdit manual, available under the Help menu as a quick PDF download, has a great chapter on this: Searching with Grep.

— Bruce

_bruce__van_allen__santa_cruz_ca_
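For readers who want to try the capture-and-replace idea outside BBEdit, here is a minimal sketch in Python's re module. It is an illustration only: the function name break_after is made up for this example, and Python's regex engine is assumed to behave like BBEdit's grep for this particular pattern.

```python
import re

def break_after(text: str, width: int) -> str:
    """Insert a newline after the first `width` characters of each long-enough line."""
    # Same shape as the expression in the post: (^\n?.{50}) with replacement \1\n.
    # re.MULTILINE lets ^ match at the start of every line, not only at the start of the text.
    pattern = rf"(^\n?.{{{width}}})"
    return re.sub(pattern, r"\1\n", text, flags=re.MULTILINE)

line = "Welkom. Leuk om jullie allemaal te zien. We moeten de tijd nemen om"
print(break_after(line, 50))
print(break_after(line, 30))  # a fixed count can land in the middle of a word
```

Because the dot does not match newlines, a line shorter than the requested width is left untouched rather than being merged with the line that follows it.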

Kevin Shay

Jun 5, 2023, 9:54:48 AM
to bbe...@googlegroups.com
Maybe I'm not understanding what you're trying to do, but it seems like you could just use the Hard Wrap feature (Text > Hard Wrap...)? Set it to 50 characters and turn "Paragraph fill" off.

Using a regex will break lines within words, which probably isn't what you want.
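As a rough analogue of wrapping at word boundaries, the sketch below uses Python's standard textwrap module. It is not BBEdit's Hard Wrap implementation, and the helper name hard_wrap is invented for this example; it only shows the difference from cutting at a fixed character count.

```python
import textwrap

def hard_wrap(line: str, width: int = 50) -> str:
    """Break `line` at whitespace so that no output line exceeds `width` characters."""
    return "\n".join(textwrap.wrap(line, width=width))

line = "Welkom. Leuk om jullie allemaal te zien. We moeten de tijd nemen om"
print(hard_wrap(line, 50))
print(hard_wrap(line, 30))
```

Note that textwrap also breaks after hyphens in compound words by default, which happens to suit a case like "ashram-regels".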

Kaveh Bazargan

Jun 5, 2023, 10:01:58 AM
to bbe...@googlegroups.com
Nice simple solution, Kevin!!

Otto Munters

Jun 5, 2023, 10:29:46 AM
to BBEdit Talk
I have tried the Hard Wrap feature. It works well if all the sentences are the same length, but unfortunately that is not the case: the sentences vary in length between about 30 and 200 words.
Therefore, I am looking for a way to insert a line break after a certain number of characters. I can then repeat this several times with different number of characters, from many to fewer.


Bruce Van Allen

Jun 5, 2023, 10:37:34 AM
to bbe...@googlegroups.com
Still not clear to me what you’re trying to do. Do you mean that you want to break the lines into 50 characters each, regardless of word length?

> I can then repeat this several times with different number of characters, from many to fewer.

Where are those different numbers coming from? I.e., manually choosing them as you progress through your text? Based on word boundaries?

Could you show a before and after sample or two?

— Bruce

_bruce__van_allen__santa_cruz_ca_

Otto Munters

Jun 5, 2023, 10:43:21 AM
to BBEdit Talk
Example SRT output from YouTube:
1
00:00:33,360 --> 00:00:45,060
Welkom. Leuk om jullie allemaal te zien. We moeten de tijd nemen om

2
00:00:46,560 --> 00:00:52,380
de regels in te voeren. Net zoals jij ashram-regels hebt, hebben wij regels voor zelfonderzoek.

It should look like this:
1
00:00:33,360 --> 00:00:45,060
Welkom. Leuk om jullie allemaal 
te zien. We moeten de tijd nemen om

2
00:00:46,560 --> 00:00:52,380
de regels in te voeren. Net zoals jij ashram-
regels hebt, hebben wij regels voor zelfonderzoek.


Otto Munters

Jun 5, 2023, 12:11:19 PM
to BBEdit Talk
What if I make it a conditional grep?
I would start with the longest lines, dividing each into two lines, and then work down in increments to the shorter ones.

I'm trying to build a Find and Replace right now. The find succeeds, but the replace (with \1\n) does not.
Could you take another look at this formula?
(?=^.{130})(^\n?.{63})

Thanks a lot!


Otto Munters

Jun 6, 2023, 3:14:57 AM
to BBEdit Talk
This formula does its job: 
Find: (?=(^.{120,130}))(^\n?.{61})
Replace: \2\n

I then repeat this formula with different values, from high to low. 
(?=(^.{120,130}))(^\n?.{61})
(?=(^.{110,120}))(^\n?.{55})
(?=(^.{100,110}))(^\n?.{50})
(?=(^.{90,100}))(^\n?.{45})
(?=(^.{80,90}))(^\n?.{40})
(?=(^.{70,80}))(^\n?.{35})

Since there are many different files involved, it would be nice if this search and replace command could be done for the whole series at once. Is that even possible?
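To see how the series of patterns behaves when run in order, here is a small Python sketch that applies the six find/replace pairs listed above, from high to low. The RULES table and the function name apply_rules are assumptions made for illustration; the numbers are copied from the post, and Python's re module is assumed to treat these patterns the same way BBEdit does.

```python
import re

# (minimum length, maximum length, break position), copied from the patterns in the post
RULES = [
    (120, 130, 61),
    (110, 120, 55),
    (100, 110, 50),
    (90, 100, 45),
    (80, 90, 40),
    (70, 80, 35),
]

def apply_rules(text: str) -> str:
    """Run each find/replace once over the whole text, longest lines first."""
    for low, high, cut in RULES:
        # The lookahead checks the line length; group 2 is the part to keep before the break.
        pattern = rf"(?=(^.{{{low},{high}}}))(^\n?.{{{cut}}})"
        text = re.sub(pattern, r"\2\n", text, flags=re.MULTILINE)
    return text

cue = (
    "2\n"
    "00:00:46,560 --> 00:00:52,380\n"
    "de regels in te voeren. Net zoals jij ashram-regels hebt, "
    "hebben wij regels voor zelfonderzoek.\n"
)
print(apply_rules(cue))
```

The subtitle text in this cue is 95 characters long, so it falls under the 90 to 100 rule and is broken after 45 characters, right behind "ashram-". Two things to keep in mind: the lookahead is not anchored at the end of the line, so a line longer than the stated maximum still matches, and a line shorter than 70 characters (such as the 67-character first cue) is not touched by these six rules.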



Rich Siegel

Jun 6, 2023, 9:46:00 AM
to BBEdit Talk
On 6 Jun 2023, at 3:14, Otto Munters wrote:

> This formula does its job:
> Find: (?=(^.{120,130}))(^\n?.{61})
> Replace: \2\n
>
> I then repeat this formula with different values, from high to low.
> (?=(^.{120,130}))(^\n?.{61})
> (?=(^.{110,120}))(^\n?.{55})
> (?=(^.{100,110}))(^\n?.{50})
> (?=(^.{90,100}))(^\n?.{45})
> (?=(^.{80,90}))(^\n?.{40})
> (?=(^.{70,80}))(^\n?.{35})
>
> Since there are many different files involved, it would be nice if this
> search and replace command could be done for the whole series at once. Is
> that even possible?

This is exactly the use case for a Text Factory. :-)

R.

Otto Munters

Jun 6, 2023, 11:24:41 AM
to BBEdit Talk
Thank you all for your kind help!
It was not difficult to make a Text Factory that does this job.

I did have to enter the formula above many times in the Text Factory, with different values, to get the desired result.
The first part of the formula stayed the same as in the series above, but the second part had to be repeated much more often: each line with 10 different values in the second part.
That way, lines of different lengths are neatly divided into two lines.

OM
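For comparison, a different technique from the regex series used in this thread is to split each long line at the last space at or before its midpoint, which avoids having to list many length ranges. The Python sketch below is only an illustration of that idea; the function name split_near_middle is invented, and it handles a single line of subtitle text, not a whole SRT file.

```python
def split_near_middle(line: str) -> str:
    """Split `line` in two at the last space at or before its midpoint."""
    # Searching only up to the midpoint keeps the top line no longer than the bottom line.
    cut = line.rfind(" ", 0, (len(line) - 1) // 2 + 1)
    if cut <= 0:
        return line  # no usable space in the first half, leave the line alone
    return line[:cut] + "\n" + line[cut + 1:]

line = "Welkom. Leuk om jullie allemaal te zien. We moeten de tijd nemen om"
print(split_near_middle(line))
```

Unlike the patterns above, this never breaks inside a word, but it also does not break after a hyphen, so "ashram-regels" would stay in one piece on the second line.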
