RegEx pattern: find something but replace on the following line

archaeal

Apr 12, 2020, 11:48:59 AM
to BBEdit Talk
Hello,
I would like to find the lines starting with >.+ and replace every U with T in the following line, but not in the line starting with >.
Example:

>NeiUe166        1551 bp          rna
AGAGAUUGAACAUAAGAGUUUGAUCCUGGCUCAGAUUGAACGCUGGCGGCAUGCUUU
>Unc31652        1491 bp          rna
AGGGUUUGAUCAUGGCUCAGGACGAACGCUGGCGGUGCGCCUUAUGCAUGCAAGUCG
>Unc31653        1469 bp          rna
AGGGUUUGAUCAUGGCUCAGAACGAACGCUGGCGGCAUGCUUCAGACAUGCAAGUCG

should look like:
>NeiUe166        1551 bp          rna
AGAGATTGAACATAAGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCATGCTTT
>Unc31652        1491 bp          rna
AGGGTTTGATCATGGCTCAGGACGAACGCTGGCGGTGCGCCTTATGCATGCAAGTCG
>Unc31653        1469 bp          rna
AGGGTTTGATCATGGCTCAGAACGAACGCTGGCGGCATGCTTCAGACATGCAAGTCG

The search pattern should find the >.. line but make changes only in the line after it.
Another possibility would be to search only the "second" line for U and replace it with T.

It would be great if someone has an idea.

Thanks a lot
archaeal
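For checking any of the grep answers in this thread, it may help to have the transformation written out without a regex. This is a minimal sketch, and `rna_to_dna` is a hypothetical helper name, not anything from BBEdit: it walks the text line by line, copies lines that start with > unchanged, and replaces every U with T on all other lines. It assumes that every line not starting with > is sequence data.

```python
def rna_to_dna(text):
    """Replace U with T on sequence lines; leave '>' header lines untouched."""
    out = []
    # keepends=True keeps each line's own break (\n, \r\n or \r) intact.
    for line in text.splitlines(keepends=True):
        if line.startswith(">"):
            out.append(line)                    # header line: copy as-is
        else:
            out.append(line.replace("U", "T"))  # sequence line: every U -> T
    return "".join(out)


sample = (
    ">NeiUe166        1551 bp          rna\n"
    "AGAGAUUGAACAUAAGAGUUUGAUCCUGGCUCAGAUUGAACGCUGGCGGCAUGCUUU\n"
)
print(rna_to_dna(sample), end="")
```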


Bruce Van Allen

Apr 12, 2020, 11:58:21 AM
to bbe...@googlegroups.com
Hmm. Not seeing the '>.' anywhere.
--

- Bruce

_bruce__van_allen__santa_cruz__ca_

bruce linde

Apr 12, 2020, 12:08:15 PM
to 'John Love' via BBEdit Talk
this just came up somewhere else for me…. apple mail uses the ApplePlainTextBody class to show the ‘>’ as vertical lines… even in supposed plain text editing mode.

i found a couple of webmail clients that seem to respect that, as well, but using mac browsers… so maybe it’s an overall mac thing.

either way, i THINK he’s talking about the initial ‘>’s at the beginning of quoted lines in plain text.

--
This is the BBEdit Talk public discussion group. If you have a feature request or need technical support, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbedit/r480Ps-10146i-07F3200541164018A573F13298574B5A%40Forest.local.


bruce linde
5 happiness webmaster (four more than the competition!)
http://www.5happy.com/
http://clockhappy.com/
510.530.1331 office
510.206.9730 mobile

(shift key available upon request)

Fletcher Sandbeck

Apr 12, 2020, 12:26:37 PM
to bbe...@googlegroups.com
This seems to do the trick, but you have to run Replace All multiple times for it to work. The pattern is anchored at the start of a line, so it selects only lines which don't start with > (the [ACGT]* class cannot match >; the look-behind itself is always satisfied at a line start). Then it finds any run of zero or more valid characters followed by a single U and replaces that U with T. Each Replace All therefore replaces the first remaining U in each line with T. The lines are 57 characters long, so at worst you have to run Replace All 57 times.

Find: ^(?<!>)([ACGT]*)U

Replace: \1T

Hope this helps,

[fletcher]
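The repeated Replace All can be emulated in Python to see how many passes a given text needs. This is a sketch under our own assumptions: Python's `re` stands in for BBEdit's grep engine, `re.MULTILINE` stands in for BBEdit's per-line `^`, and `replace_all_until_done` is a hypothetical helper name. Each `subn` call plays the role of one Replace All, and the loop stops once a pass changes nothing.

```python
import re

# Fletcher's Find/Replace in Python syntax. re.MULTILINE lets ^ match at the
# start of every line. Header lines never match because [ACGT]* cannot consume
# the leading '>'; at a line start the (?<!>) look-behind is always satisfied.
FIND = re.compile(r"^(?<!>)([ACGT]*)U", re.MULTILINE)
REPLACE = r"\1T"


def replace_all_until_done(text):
    """Repeat one 'Replace All' until nothing matches; return (text, passes)."""
    passes = 0
    while True:
        # One subn call = one Replace All: it converts only the first
        # remaining U on each line, because ^ anchors every match.
        text, count = FIND.subn(REPLACE, text)
        if count == 0:
            return text, passes
        passes += 1


result, passes = replace_all_until_done(">NeiUe166 rna\nAGAUUGCU\n")
print(result, end="")
print(passes)  # 3
```

The number of passes is the largest count of U's on any one line rather than the line length; the first sample sequence contains 16 U's, so it needs 16 passes.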



Bruce Van Allen

Apr 12, 2020, 1:17:59 PM
to bbe...@googlegroups.com
Sorry for my previous misapprehension of your sample.

Try this:

Find:
U(?=[^a-z0-9])|(U(?=[ACGT]+?))

Replace:
\1T

The first part, U(?=[^a-z0-9]), is there so that the 'U' in the '>'
line is not matched. NOTE: this assumes that those 'U's are always
followed by a lower-case letter or a number.

The second part finds any 'U' followed by one or more of 'A',
'C', 'G', or 'T'.

Without that first part, the second part of the match will catch
all but a 'U' at the end of the line.

HTH
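One way to see what this alternation does is to run it through Python's `re`, which (from Python 3.5 on) substitutes an empty string for a group that did not take part in the match. The sketch tries the pattern both case-sensitively and case-insensitively. We assume the original search ran with BBEdit's "Case sensitive" option turned off; that is our inference from the symptoms archaeal reports later in the thread (U's in runs like UUU left alone, and a T added next to the U instead of replacing it), not something stated in this message.

```python
import re

# Bruce's Find/Replace in Python syntax. Since Python 3.5, a group that did not
# take part in the match is replaced by an empty string, so "\1T" gives "T"
# when the first alternative matches and "UT" when the second one does.
FIND = r"U(?=[^a-z])|(U(?=[ACGT]+?))"
REPLACE = r"\1T"

text = ">NeiUe166 rna\nAGAUUGCU\n"

# Case-sensitive: [^a-z] accepts upper-case letters and line breaks, so the
# first alternative converts every U in the sequence line.
print(re.sub(FIND, REPLACE, text), end="")

# Case-insensitive (our assumption about the BBEdit setting): [^a-z] now
# rejects every letter. "UU" matches neither alternative, and "UG" goes through
# the capturing alternative, so \1 keeps the U and the result is "UTG".
print(re.sub(FIND, REPLACE, text, flags=re.IGNORECASE), end="")
```

Case-sensitively, the first alternative alone does all the work on this sample; case-insensitively, only the capturing second alternative can convert a U inside a sequence, and the \1 in the replacement puts that U back.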

On 4/12/20 at 8:01 AM, achim....@gmail.com (archaeal) wrote:


Bruce Van Allen

Apr 12, 2020, 1:32:34 PM
to bbe...@googlegroups.com
On 4/12/20 at 10:17 AM, b...@cruzio.com (Bruce Van Allen) wrote:
>Try this:

ADDED: I meant to say you can do it all in one "Replace All" step.


archaeal

Apr 13, 2020, 8:54:10 AM
to BBEdit Talk
Hello,
I changed the pattern slightly:
Find: U(?=[^a-z0-9\r])|U(?=[ACGTU]+?)
In the first part I added \r so that a "u" at the end of a line starting with ">" is not matched. In the second part I added U to the class, [ACGTU], so that every U in a run of Us (e.g. UUU) is matched. In addition I removed the capturing parentheses from the second part, so that the replacement no longer puts the original U back in front of the new T.
Now everything works fine.

So the final pattern is:
Find: U(?=[^a-z0-9\r])|U(?=[ACGTU]+?)
Replace: \1T

Thanks a lot 
Achim
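The final pattern can be translated to Python the same way, to test it on sample text. Three choices here are our assumptions and are marked in the comments: the search runs case-insensitively (our guess at the BBEdit setting used), \n stands in for the \r line break in the BBEdit pattern, and the replacement is written as plain T because the pattern no longer has a capturing group.

```python
import re

# archaeal's final Find pattern in Python syntax, searched case-insensitively
# (our assumption about the BBEdit setting). Assumption: \n stands in for the
# \r line break used in the BBEdit pattern.
FIND = re.compile(r"U(?=[^a-z0-9\n])|U(?=[ACGTU]+?)", re.IGNORECASE)
# No capturing group is left, so the replacement is plain "T"
# (Python raises re.error for "\1T" when the pattern has no group 1).
REPLACE = "T"

text = ">NeiUe166 rna\nAGAUUGCU\nAGCUUU\n"
print(FIND.sub(REPLACE, text), end="")
```

In this translation a U immediately before a line break is left as U (both sequence lines above keep their last U), because the break is excluded from the first class and is not in [ACGTU]. We have not verified whether BBEdit's grep treats \r the same way; if it does, a line ending in U, such as the first sample sequence (...GCUUU), would still need its last U fixed.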

archaeal

Apr 13, 2020, 8:54:10 AM
to BBEdit Talk
Hello,
This works perfectly.
Thanks a lot

Achim

On Sunday, April 12, 2020 at 19:32:34 UTC+2, Bruce Van Allen wrote: