Grep pattern needed, part 2


Otto Munters

Apr 6, 2025, 10:41:57 AM
to BBEdit Talk
Below is an example of a subtitle where I want to move the single word under the second timestamp to the end of the subtitle text under the first timestamp.
Is it possible to do that with a regex in BBEdit? I can't work with Python or AppleScript.

50
00:04:35,000 --> 00:04:42,840
However, what can happen is this can come at the cost of disregarding the larger

51
00:04:42,840 --> 00:04:43,000
whole.


Thank you for helping me!
Otto

Kaveh Bazargan

Apr 6, 2025, 10:54:38 AM
to bbe...@googlegroups.com

  • Do these always come in pairs?
  • Is it only ever a single word that is left over?
  • How do we know that a piece of text is a left-over?
It's hard to give a general solution unless I am missing something. Please describe the problem in detail and, if possible, give a before-and-after example so the problem is clear.


--
Kaveh Bazargan PhD
Director
Accelerating the Communication of Research
  https://rivervalley.io/gigabyte-wins-the-alpsp-scholarly-publishing-innovation-award-using-river-valleys-publishing-technology/

GP

Apr 6, 2025, 5:36:06 PM
to BBEdit Talk
It depends on exactly which patterns you want to match and what the changed result should look like. I'm assuming that the first subtitle block's text is a single line, and that the following block's text is a single all-lower-case word (which is what I understood from your previous "Grep pattern needed" posting), optionally followed by a period. Under those assumptions, the following grep pattern will find that case and capture its parts in groups, so they can be reassembled into the desired result:

^(\d+\s+\d{2}:\d{2}:\d{2},\d{3} --> )(\d{2}:\d{2}:\d{2},\d{3}\s+)(.+)(\s+\d+\s+\d{2}:\d{2}:\d{2},\d{3} --> )(\d{2}:\d{2}:\d{2},\d{3}\s+)((?-i)^[a-z]+[.]?[\W]*)$
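
One way to check what each capture group in the pattern above holds is to run it outside BBEdit. The sketch below uses Python's re module on the sample from the original question. It is only an illustration under a few assumptions: re.MULTILINE stands in for BBEdit's line-based ^ and $, and the inline (?-i) is left out because Python's re is case-sensitive by default. BBEdit's own grep engine may differ in details.

```python
import re

# The search pattern from the post, minus the inline (?-i):
# Python's re is already case-sensitive unless re.IGNORECASE is passed.
PATTERN = re.compile(
    r"^(\d+\s+\d{2}:\d{2}:\d{2},\d{3} --> )(\d{2}:\d{2}:\d{2},\d{3}\s+)(.+)"
    r"(\s+\d+\s+\d{2}:\d{2}:\d{2},\d{3} --> )(\d{2}:\d{2}:\d{2},\d{3}\s+)"
    r"(^[a-z]+[.]?[\W]*)$",
    re.MULTILINE,  # assumption: mirrors BBEdit's line-based ^ and $
)

# The two subtitle blocks from the original question.
SAMPLE = (
    "50\n"
    "00:04:35,000 --> 00:04:42,840\n"
    "However, what can happen is this can come at the cost of disregarding the larger\n"
    "\n"
    "51\n"
    "00:04:42,840 --> 00:04:43,000\n"
    "whole.\n"
)

match = PATTERN.search(SAMPLE)
groups = match.groups() if match else ()

# Print each group the way it is referenced in a replace pattern (\1 ... \6).
for number, text in enumerate(groups, start=1):
    print(f"\\{number} = {text!r}")
```

On this sample, group 2 holds the first block's end time and group 5 the second block's end time. Group 6 also picks up the line break after the word, because [\W]* matches newlines too.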

The question is then: what do you want the result to look like?

1) Just move the word from subtitle block 51 to the end of subtitle block 50's subtitle text and leave everything else the same?

The replace pattern:
\1\2\3 \6\4\5

will produce as a result:

50
00:04:35,000 --> 00:04:42,840
However, what can happen is this can come at the cost of disregarding the larger whole.


51
00:04:42,840 --> 00:04:43,000


2) Change subtitle block 50's timecode end time to subtitle block 51's timecode end time along with moving subtitle block 51's single word subtitle text and completely remove what remains of the subtitle block 51 entry?

The replace pattern:
\1\2\3 \6

will produce as a result:

50
00:04:35,000 --> 00:04:42,840
However, what can happen is this can come at the cost of disregarding the larger whole.


3) If you chose 2), the subtitle sequence numbers for the remaining subtitle blocks will no longer be sequential. Fixing the remaining subtitle blocks' subtitle sequence numbers so they are sequential can't be done using just grep find and replace patterns. You're going to need something beyond grep like a script to accomplish that and you might as well put all the finding and replacing in that script solution also to keep all the manipulation mechanics in one tidy package.
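
As an illustration of the kind of script meant here, the following is a minimal Python sketch that renumbers the blocks after entries have been removed. The names renumber_srt and SEQUENCE_LINE and the start parameter are made up for this example. It assumes LF line endings and treats any digits-only line that is directly followed by a timecode line as a sequence number.

```python
import re
from itertools import count

# A digits-only line, immediately followed by an SRT timecode line.
# The lookahead checks the timecode without consuming it.
SEQUENCE_LINE = re.compile(
    r"^\d+\n(?=\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$)",
    re.MULTILINE,
)


def renumber_srt(text: str, start: int = 1) -> str:
    """Rewrite the sequence numbers as start, start + 1, ... in file order."""
    numbers = count(start)
    return SEQUENCE_LINE.sub(lambda m: f"{next(numbers)}\n", text)


# Hypothetical sample: block 51 has been merged away, so 50 is followed by 52.
SAMPLE = (
    "50\n"
    "00:04:35,000 --> 00:04:43,000\n"
    "However, what can happen is this can come at the cost of disregarding the larger whole.\n"
    "\n"
    "52\n"
    "00:04:43,500 --> 00:04:46,000\n"
    "Next subtitle.\n"
)

print(renumber_srt(SAMPLE, start=50))
```

For a complete file the default start=1 would normally be used; start=50 is only there because the sample is a fragment.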

GP

Apr 6, 2025, 5:48:26 PM
to BBEdit Talk
Whoops! For 2) I got the replace pattern and result wrong. The replace pattern should be:
\1\5\3 \6

which will produce as a result:
50
00:04:35,000 --> 00:04:43,000
However, what can happen is this can come at the cost of disregarding the larger whole.
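
For anyone who wants to check the corrected replacement outside BBEdit, here is a small sketch using Python's re module. It assumes re.MULTILINE as a stand-in for BBEdit's line-based ^ and $, and drops the inline (?-i) since Python is case-sensitive by default. The helper name merge_trailing_word is invented for this example.

```python
import re

# Search pattern from the earlier post, without the inline (?-i).
PATTERN = re.compile(
    r"^(\d+\s+\d{2}:\d{2}:\d{2},\d{3} --> )(\d{2}:\d{2}:\d{2},\d{3}\s+)(.+)"
    r"(\s+\d+\s+\d{2}:\d{2}:\d{2},\d{3} --> )(\d{2}:\d{2}:\d{2},\d{3}\s+)"
    r"(^[a-z]+[.]?[\W]*)$",
    re.MULTILINE,  # assumption: mirrors BBEdit's line-based ^ and $
)


def merge_trailing_word(text: str) -> str:
    """Apply the corrected replace pattern for option 2.

    \\1 = number and start time of the first block
    \\5 = end time of the second block (replaces the first block's end time)
    \\3 = subtitle text of the first block
    \\6 = the single word from the second block
    """
    return PATTERN.sub(r"\1\5\3 \6", text)


SAMPLE = (
    "50\n"
    "00:04:35,000 --> 00:04:42,840\n"
    "However, what can happen is this can come at the cost of disregarding the larger\n"
    "\n"
    "51\n"
    "00:04:42,840 --> 00:04:43,000\n"
    "whole.\n"
)

merged = merge_trailing_word(SAMPLE)
print(merged)
```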

Sorry about that flub. The things you catch right after you hit post ...

Roland Küffner

Apr 7, 2025, 7:12:23 PM
to bbe...@googlegroups.com
A similar but slightly shorter version:

search:
(^\d\d\n\d.+\n.+)\n\n\d\d\n\d.+\n([a-z]+[\W]*)\n

replace:
\1 \2\n

- turn "Case sensitive" on
- currently works only if the subtitle text is always on one line
- currently does not account for differing amounts of empty lines between the text blocks
- the end time from the second timestamp is thrown away, if you can live with that …

… if not, you could extend the pattern to replace that too
search
(^\d\d\n\d.+ )(.+)\n(.+)\n\n\d\d\n\d.+ (.+)\n([a-z]+[\W]*)\n

replace
\1\4\n\3 \5\n

Regards, Roland

GP

Apr 7, 2025, 10:35:50 PM
to BBEdit Talk
To work with subtitle sequence numbers of any length, you'll need to use \d+ instead of \d\d (which matches exactly two digits) in your grep patterns, so your two search patterns should be:

(^\d+\n\d.+\n.+)\n\n\d+\n\d.+\n([a-z]+[\W]*)\n

and

(^\d+\n\d.+ )(.+)\n(.+)\n\n\d+\n\d.+ (.+)\n([a-z]+[\W]*)\n
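
To illustrate the difference, this Python sketch runs the first search pattern in both forms against a made-up sample whose sequence numbers have three digits. The sample text and the variable names are hypothetical. Python's re is case-sensitive by default, which corresponds to turning "Case sensitive" on, and re.MULTILINE is assumed to mirror BBEdit's treatment of ^.

```python
import re

# Roland's first search pattern, as posted (\d\d) and with GP's change (\d+).
TWO_DIGITS = re.compile(
    r"(^\d\d\n\d.+\n.+)\n\n\d\d\n\d.+\n([a-z]+[\W]*)\n", re.MULTILINE
)
ANY_DIGITS = re.compile(
    r"(^\d+\n\d.+\n.+)\n\n\d+\n\d.+\n([a-z]+[\W]*)\n", re.MULTILINE
)
REPLACE = r"\1 \2\n"

# Hypothetical sample with three-digit sequence numbers.
SAMPLE = (
    "100\n"
    "00:09:10,000 --> 00:09:15,500\n"
    "this can come at the cost of the larger\n"
    "\n"
    "101\n"
    "00:09:15,500 --> 00:09:15,900\n"
    "whole.\n"
    "\n"
    "102\n"
    "00:09:16,000 --> 00:09:18,000\n"
    "Next subtitle.\n"
)

# subn returns the new text and the number of replacements made.
_, two_digit_hits = TWO_DIGITS.subn(REPLACE, SAMPLE)
fixed, any_digit_hits = ANY_DIGITS.subn(REPLACE, SAMPLE)

print("replacements with \\d\\d:", two_digit_hits)
print("replacements with \\d+:", any_digit_hits)
print(fixed)
```

As in the original pattern, the first block keeps its own end time here; only the text is merged.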

Otto Munters

Apr 8, 2025, 4:56:30 AM
to BBEdit Talk
Thanks a lot for all the help. This problem is solved; I've found a working pattern!
Otto

Otto Munters

May 23, 2025, 2:52:56 AM
to BBEdit Talk
I do this with a text factory.
First step:
Grep: ^(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\n([^\n]*[^.\n])\n([^\n]+(?:\n[^\n]+)*)\n+(\d+)\n(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\n((?-i)[^A-Z][^\n]{0,15})\n
Replace: \1 --> \6\n\3\n\n\4\n\5 --> \6\n\7\n
Description: Replace the end time of the top timestamp with the end time of the bottom timestamp, but only if the text below the bottom timestamp is a solitary word that does not start with a capital letter, or a short fragment (one character that is not a capital letter, plus at most 15 more).

Second step:
Grep: ^([^\n]+)(?<!\.)\n[^\n]*\n[^\n]*\n[^\n]*\n((?-i)[^A-Z][^\n]{0,15})\n\n
Replace: \1 \2\n\n
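Description: Move the single word that does not start with a capital letter, including any punctuation mark, to the end of the line four lines above it, if that line does not end with a full stop. The three lines in between are dropped. The same happens with several words, as long as together they stay within the same length limit.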

Works fine with subtitles in srt format.
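
The two-step idea can also be tried outside a text factory. The Python sketch below is only an approximation under stated assumptions. Step 1 is a simplified variant of the first pattern: it assumes single-line subtitle text and leaves out the extra multi-line group, so that its seven capture groups line up with the replace string as posted. Step 2 is the second pattern as posted. The inline (?-i) is omitted in both because Python's re is case-sensitive by default, and re.MULTILINE stands in for BBEdit's line-based ^. The function name and the third sample block are made up.

```python
import re

TS = r"\d{2}:\d{2}:\d{2},\d{3}"

# Step 1 (simplified variant, see assumptions above): copy the end time of
# the bottom timestamp into the top timestamp when a short leftover follows.
STEP_1 = re.compile(
    rf"^({TS}) --> ({TS})\n([^\n]*[^.\n])\n\n+(\d+)\n({TS}) --> ({TS})\n"
    r"([^A-Z][^\n]{0,15})\n",
    re.MULTILINE,
)
REPLACE_1 = r"\1 --> \6\n\3\n\n\4\n\5 --> \6\n\7\n"

# Step 2 (as posted): append the leftover to the line four lines above it,
# dropping the blank line, sequence number and timestamp in between.
STEP_2 = re.compile(
    r"^([^\n]+)(?<!\.)\n[^\n]*\n[^\n]*\n[^\n]*\n([^A-Z][^\n]{0,15})\n\n",
    re.MULTILINE,
)
REPLACE_2 = r"\1 \2\n\n"


def merge_short_leftovers(text: str) -> str:
    """Run both steps in order, like two entries in a text factory."""
    text = STEP_1.sub(REPLACE_1, text)
    return STEP_2.sub(REPLACE_2, text)


SAMPLE = (
    "50\n"
    "00:04:35,000 --> 00:04:42,840\n"
    "However, what can happen is this can come at the cost of disregarding the larger\n"
    "\n"
    "51\n"
    "00:04:42,840 --> 00:04:43,000\n"
    "whole.\n"
    "\n"
    "52\n"  # hypothetical following block
    "00:04:43,500 --> 00:04:46,000\n"
    "Next subtitle.\n"
)

result = merge_short_leftovers(SAMPLE)
print(result)
```

After both steps the sequence numbers jump from 50 to 52 in this sample, so a renumbering pass may still be needed.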
