regex

George Bouras

Apr 7, 2021, 6:12:56 AM
spaces inside the <...> to _
e.g.
"add of <Number A> and <Number B> = " . ( <Number A> + <Number B>
to
"add of <Number_A> and <Number_B> = " . ( <Number_A> + <Number_B>

George Bouras

Apr 7, 2021, 6:57:23 AM
On 7/4/2021 1:12 PM, George Bouras wrote:

This looks OK. Any better ideas?



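use feature 'say';

my $var = '"add of <Number A> and <Number B> = " . ( <Number A> + <Number B> )';
my $tmp = '';

say $var;
# (?{ ... }) runs once the capture group has closed; $^N holds its text.
# The copy in $tmp gets every run of non-word characters turned into _.
$var =~s/<([^>]+)(?{ $tmp=$^N; $tmp=~s|\W+|_|g })>/<$tmp>/g;
say $var;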

Ben Bacarisse

Apr 7, 2021, 7:45:07 AM
s/(<[^<>]*) (?=[^<>]*>)/\1_/g

The (?= ... ) part is a "zero-width positive look ahead". It consumes
no characters (so to speak) but must match for the space to match.

Perl 5.30 has, experimentally, variable-length, zero-width positive look
behind patterns (it's the variable-length part that is experimental),
currently limited to 255 characters. That permits

s/(?<=<[^<>]{0,254}) (?=[^<>]*>)/_/g
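As a rough sketch of how the look-behind form could be tried out, assuming Perl 5.30 or later: the helper name underscore_lookbehind and the sample string are made up for illustration, and the experimental::vlb warning category is switched off because the feature was flagged experimental in 5.30.

```perl
use v5.30;                        # variable-length look-behind needs 5.30+
use warnings;
no warnings 'experimental::vlb';  # silence the "experimental" warning

# Hypothetical helper: replaces every space inside <...> in one pass.
sub underscore_lookbehind {
    my ($text) = @_;
    # Look-behind: a '<' followed by at most 254 non-bracket characters.
    # Look-ahead: only non-bracket characters up to the closing '>'.
    $text =~ s/(?<=<[^<>]{0,254}) (?=[^<>]*>)/_/g;
    return $text;
}

print underscore_lookbehind('<Number 100 000> and <Number B>'), "\n";
# prints: <Number_100_000> and <Number_B>
```

Because neither assertion consumes any text, every space inside a block is a separate match, so a single /g pass should cover blocks with several spaces. Blocks longer than 255 characters fall outside what the look-behind can see.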

--
Ben.

Rainer Weikusat

Apr 7, 2021, 11:04:24 AM
Assuming your original description is what you want

------
my $a = "add of <Number A> and <Number B>, <a sentence in angle brackets>";

$a =~ s/(<[^>]+>)/$1=~y| |_|r/eg;

print $a, "\n";
------

Closer to your code:

-------
my $a = "add of <Number A> and <Number B>, <a sentence in angle brackets>";

$a =~ s/(<[^>]+>)/$1=~s|\s+|_|gr/eg;

print $a, "\n";
-------

Eli the Bearded

Apr 7, 2021, 12:55:22 PM
In comp.lang.perl.misc, Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
> George Bouras <f...@example.com> writes:
> > spaces inside the <...> to _
> > e.g.
> > "add of <Number A> and <Number B> = " . ( <Number A> + <Number B>
> > to
> > "add of <Number_A> and <Number_B> = " . ( <Number_A> + <Number_B>
>
> s/(<[^<>]*) (?=[^<>]*>)/\1_/g

$ perl -wle '$_="<Number 100 000>"; s/(<[^<>]*) (?=[^<>]*>)/\1_/g; print'
\1 better written as $1 at -e line 1.
<Number 100_000>

> The (?= ... ) part is a "zero-width positive look ahead". It consumes
> no characters (so to speak) but must match for the space to match.

Since you force a match on '<', your code can only change one space
per <...> block in a single pass. I run into this all the time with
similar fixes with vi search and replace. The fix is to loop until it
stops matching.
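A minimal sketch of that loop, reusing the pattern quoted above with ${1} in place of \1 so the warning goes away. The helper name underscore_by_loop is only an illustration, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: repeats the substitution until no space is left
# inside any <...> block.
sub underscore_by_loop {
    my ($text) = @_;
    # Each /g pass changes at most one space per block (the last one,
    # because [^<>]* is greedy), so keep going while something matched.
    1 while $text =~ s/(<[^<>]*) (?=[^<>]*>)/${1}_/g;
    return $text;
}

print underscore_by_loop('<Number 100 000> + <Number B>'), "\n";
# prints: <Number_100_000> + <Number_B>
```

The number of passes equals the largest number of spaces in any one block, which is fine for short placeholders but is one reason to prefer the s///eg approach for long ones.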

Or use another method. I like the else-thread suggested one with a tr
in an s///eg framework, like:

s/( < [^>]+ > )/ local $_ = $1; tr| |_|; $_ /xeg

I also like to not perl-golf it.

> Perl 5.30 has, experimentally, variable-length, zero-width positive look
> behind patterns (it's the variable-length part that is experimental),
> currently limited to 255 characters. That permits
>
> s/(?<=<[^<>]{0,254}) (?=[^<>]*>)/_/g

Sounds computationally expensive.

Elijah
------
tries not to use the bleeding edge features

Otto J. Makela

May 4, 2021, 8:10:42 AM
It would be nice to have bal() which was a pattern matching primitive
which matched balanced quote-like separators (including ones where you
had different left and right quotes, eg () or []). This was a standard
primitive in languages like Snobol4 and Icon, where the pattern matching
wasn't regex-derived.

Of course making an ersatz version in regex isn't impossible:

https://www.andrewzammit.com/blog/regexp-matching-balanced-parenthesis-and-quotes-greedy-non-recursive/
--
/* * * Otto J. Makela <o...@iki.fi> * * * * * * * * * */
/* Phone: +358 40 765 5772, ICBM: N 60 10' E 24 55' */
/* Mail: Mechelininkatu 26 B 27, FI-00100 Helsinki */
/* * * Computers Rule 01001111 01001011 * * * * * * */

Jim Gibson

May 5, 2021, 1:37:08 AM
On May 4, 2021 at 5:10:36 AM PDT, "Otto J. Makela" <Otto J. Makela> wrote:

> It would be nice to have bal() which was a pattern matching primitive
> which matched balanced quote-like separators (including ones where you
> had different left and right quotes, eg () or []). This was a standard
> primitive in languages like Snobol4 and Icon, where the pattern matching
> wasn't regex-derived.
>
> Of course making an ersatz version in regex isn't impossible:
>
>
> https://www.andrewzammit.com/blog/regexp-matching-balanced-parenthesis-and-quotes-greedy-non-recursive/

There is a module for that:

https://metacpan.org/pod/Text::Balanced
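For the original question, one way the module might be applied is sketched below. It assumes extract_bracketed in list context, which returns the extracted block, the remainder and the skipped prefix; the helper name underscore_with_text_balanced and the prefix pattern qr/[^<]*/ are choices made for this illustration.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: walks the string one balanced <...> block at a time.
sub underscore_with_text_balanced {
    my ($text) = @_;
    my $out = '';
    while (length $text) {
        # The prefix pattern skips everything up to the next '<'.
        my ($block, $rest, $prefix) =
            extract_bracketed($text, '<>', qr/[^<]*/);
        if (!defined $block || $block eq '') {
            $out .= $text;      # no further balanced block: keep the tail
            last;
        }
        $block =~ tr/ /_/;      # spaces to underscores inside the block
        $out .= $prefix . $block;
        $text = $rest;
    }
    return $out;
}

print underscore_with_text_balanced('add of <Number A> and <Number B> = '), "\n";
```

Unlike the [^>]+ patterns earlier in the thread, this treats a nested block such as <a <b c> d> as a single unit, at the cost of being slower and more verbose.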

--
Jim Gibson

