RegEx pattern: find something but replace on the following line

archaeal

Apr 12, 2020, 11:48:59 AM

to BBEdit Talk

Hello,

I would like to detect the lines starting with > (pattern >.+) and replace every U with T in the line that follows, but not in the line starting with > itself.

Example:

>NeiUe166 1551 bp rna

AGAGAUUGAACAUAAGAGUUUGAUCCUGGCUCAGAUUGAACGCUGGCGGCAUGCUUU

>Unc31652 1491 bp rna

AGGGUUUGAUCAUGGCUCAGGACGAACGCUGGCGGUGCGCCUUAUGCAUGCAAGUCG

>Unc31653 1469 bp rna

AGGGUUUGAUCAUGGCUCAGAACGAACGCUGGCGGCAUGCUUCAGACAUGCAAGUCG

should look like:

>NeiUe166 1551 bp rna

AGAGATTGAACATAAGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCATGCTTT

>Unc31652 1491 bp rna

AGGGTTTGATCATGGCTCAGGACGAACGCTGGCGGTGCGCCTTATGCATGCAAGTCG

>Unc31653 1469 bp rna

AGGGTTTGATCATGGCTCAGAACGAACGCTGGCGGCATGCTTCAGACATGCAAGTCG

The search pattern should find the > line but make changes only in the line after it.

Another possibility would be to search for U only in the "second" line of each pair and replace it with T.
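
For readers who want to try the "second line" idea outside BBEdit, here is a minimal Python sketch. It assumes every header line starts with > and that every other line is sequence data; the function name `rna_to_dna` is made up for this illustration and is not part of any tool discussed in this thread.

```python
def rna_to_dna(fasta_text: str) -> str:
    """Replace U with T on sequence lines; leave '>' header lines untouched."""
    out = []
    # keepends=True preserves whatever line endings the input used.
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            out.append(line)  # header line: copy unchanged
        else:
            out.append(line.replace("U", "T"))  # sequence line
    return "".join(out)


sample = (
    ">NeiUe166 1551 bp rna\n"
    "AGAGAUUGAACAUAAGAGUUUGAUCCUGGCUCAGAUUGAACGCUGGCGGCAUGCUUU\n"
)
print(rna_to_dna(sample))
```

Because the decision is made per line, the U in a header such as >NeiUe166 is never touched, whatever character follows it.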

It would be great if someone has an idea.

Thanks a lot

archaeal

Bruce Van Allen

Apr 12, 2020, 11:58:21 AM

to bbe...@googlegroups.com

Hmm. Not seeing the '>.' anywhere.

--

- Bruce

_bruce__van_allen__santa_cruz__ca_

bruce linde

Apr 12, 2020, 12:08:15 PM

to 'John Love' via BBEdit Talk

this just came up somewhere else for me…. apple mail uses the ApplePlainTextBody class to show the ‘>’ as vertical lines… even in supposed plain text editing mode.

i found a couple of webmail clients that seem to respect that, as well, but using mac browsers… so maybe it’s an overall mac thing.

either way, i THINK he’s talking about the initial ‘>’s at the beginning of quoted lines in plain text.

--
This is the BBEdit Talk public discussion group. If you have a feature request or need technical support, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbedit/r480Ps-10146i-07F3200541164018A573F13298574B5A%40Forest.local.

bruce linde
5 happiness webmaster (four more than the competition!)
http://www.5happy.com/
http://clockhappy.com/
510.530.1331 office
510.206.9730 mobile

(shift key available upon request)

Fletcher Sandbeck

Apr 12, 2020, 12:26:37 PM

to bbe...@googlegroups.com

This seems to do the trick, but you have to run Replace All multiple times for it to work. The look-behind assertion selects only lines which don't start with >. Then we find any sequence of zero or more valid characters followed by a single U and replace that U with T. Each pass replaces the first remaining U in each line with T. The lines are 57 characters long, so at worst you have to run Replace All 57 times.

Find: ^(?<!>)([ACGT]*)U

Replace: \1T
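
To make the "run it several times" behaviour concrete, here is a rough Python sketch of the same idea. Python's `re` module is not BBEdit's grep engine, so this is only an approximation; the names `FIRST_U` and `replace_all_until_stable` are hypothetical. It assumes sequence lines contain only A, C, G, T and U.

```python
import re

# Same pattern as in the post; re.MULTILINE lets ^ match at every line start.
FIRST_U = re.compile(r"^(?<!>)([ACGT]*)U", re.MULTILINE)


def replace_all_until_stable(text: str) -> str:
    """Repeat the one-U-per-line replacement until a pass changes nothing."""
    while True:
        # Each pass turns the first remaining U on every sequence line into T.
        text, count = FIRST_U.subn(r"\1T", text)
        if count == 0:
            return text
```

In this translation a header line is left alone because a match has to begin at the start of a line and pass only through A, C, G or T before it reaches a U, and > is none of those.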

Hope this helps,

[fletcher]

Bruce Van Allen

Apr 12, 2020, 1:17:59 PM

to bbe...@googlegroups.com

Sorry for my previous misapprehension of your sample.

Try this:

Find:
U(?=[^a-z])|(U(?=[ACGT]+?))

Replace:
\1T

The first part U(?=[^a-z0-9]) is to eliminate the 'U' in the '>'
line. NOTE: this assumes that those 'U's are always followed by
a lower-case letter or a number.

The second part finds any 'U' followed by one or more of any of
'A', 'C', 'G', or 'T'.

Without that first part, the second part of the match will catch
all but a 'U' at the end of the line.
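
A rough Python translation of the Find/Replace pair above, for anyone who wants to test it on a sample outside BBEdit. This is a sketch only: Python's `re` is a different engine, the input here uses \n line endings, and the names `PATTERN` and `replace_u_one_pass` are invented for the example.

```python
import re

# Pattern copied from the post above.
PATTERN = re.compile(r"U(?=[^a-z])|(U(?=[ACGT]+?))")


def replace_u_one_pass(text: str) -> str:
    """Apply the replacement once over the whole text."""
    # In Python 3.5+, a group that did not take part in the match
    # expands to an empty string in the replacement template.
    return PATTERN.sub(r"\1T", text)


print(replace_u_one_pass(">NeiUe166 1551 bp rna\nAGAGAUUGAACAU\n"))
```

In this sketch the alternatives are tried left to right, so a U followed by A, C, G, T or a line break is already matched by the first alternative. A U that is the very last character of the text has nothing after it, so neither look-ahead can succeed and it stays as it is.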

HTH

On 4/12/20 at 8:01 AM, achim....@gmail.com (archaeal) wrote:

--

- Bruce

_bruce__van_allen__santa_cruz__ca_

Bruce Van Allen

Apr 12, 2020, 1:32:34 PM

to bbe...@googlegroups.com

On 4/12/20 at 10:17 AM, b...@cruzio.com (Bruce Van Allen) wrote:
>Try this:

ADDED: I meant to say you can do it all in one "Replace All" step.

>Find:
>U(?=[^a-z])|(U(?=[ACGT]+?))
>
>Replace:
>\1T
>
>The first part U(?=[^a-z0-9]) is to eliminate the 'U' in the
>'>' line. NOTE: this assumes that those 'U's are always
>followed by a lower-case letter or a number.
>
>The second part finds any 'U' followed by one or more of any of
>'A', 'C', 'G', or 'T'.
>
>Without that first part, the second part of the match will
>catch all but a 'U' at the end of the line.
>
>HTH

--

- Bruce

_bruce__van_allen__santa_cruz__ca_

archaeal

Apr 13, 2020, 8:54:10 AM

to BBEdit Talk

Hello,

I changed the pattern slightly:

Find: U(?=[^a-z0-9\r])|U(?=[ACGTU]+?)

In the first part I added \r so that a U at the end of a ">" line is not matched. In the second part I added U to the character class, giving [ACGTU], so that every U in a run of Us (e.g. UUU) is matched. I also removed the capturing parentheses (the back reference) from the second part, so that no extra Ts are added.

Now everything works fine.

So the final pattern is:

Find: U(?=[^a-z0-9\r])|U(?=[ACGTU]+?)

Replace: \1T
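
A hedged Python sketch of this final pattern, in case it helps someone check it on their own data. It assumes the text uses \r as the line break, as the pattern does; the names `FINAL` and `apply_final_pattern` are made up for the illustration. The replacement is written as a plain T here because the pattern no longer has a capture group, and Python's `re` raises an error when the replacement refers to a group that does not exist.

```python
import re

# Pattern copied from the post; \r is treated as the line break.
FINAL = re.compile(r"U(?=[^a-z0-9\r])|U(?=[ACGTU]+?)")


def apply_final_pattern(text: str) -> str:
    """Replace every matching U with T in a single pass."""
    # Plain "T": there is no group 1 in this pattern to refer back to.
    return FINAL.sub("T", text)


print(apply_final_pattern(">Unc31652 1491 bp rna\rAGGGUUUGAUCAUGG\r"))
```

In this sketch a U directly before \r is not matched by either alternative. That is what protects a header ending in U, but by the same logic a sequence line whose last character is U would keep that U, so it may be worth checking the line ends after the replacement.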

Thanks a lot

Achim

archaeal

Apr 13, 2020, 8:54:10 AM

to BBEdit Talk

Hello,

this works perfectly.

Thanks a lot

Achim

On Sunday, April 12, 2020 at 19:32:34 UTC+2, Bruce Van Allen wrote:
