Need regex in the middle wildcard help

ToddAndMargo via perl6-users

Jun 18, 2023, 8:45:04 AM

to perl6-users

Hi All,

I know how to do this with several regexes and words.
What I'd like to learn is how to remove something
from the middle of a string with a regex, using a wildcard.

And I can't figure it out.

#!/bin/raku

print "\n";
my Str $x = Q[wine-7.12-3.fc37.i686.rpm</a> 23-Jul-2022 19:11 11K
<a href="wine-7.12-3.fc37.x86_64.rpm];
print "1 [$x]\n\n";

$x~~s/ $( Q[</a>] ) * $( Q[a href="] ) / /;
print "2 [$x]\n\n";

1 [wine-7.12-3.fc37.i686.rpm</a> 23-Jul-2022 19:11 11K <a
href="wine-7.12-3.fc37.x86_64.rpm]

2 [wine-7.12-3.fc37.i686.rpm</a> 23-Jul-2022 19:11 11K <
wine-7.12-3.fc37.x86_64.rpm]

My goal is to have `2` print the following:

wine-7.12-3.fc37.i686.rpm wine-7.12-3.fc37.x86_64.rpm

Many thanks,
-T

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Computers are like air conditioners.
They malfunction when you open windows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ToddAndMargo via perl6-users

Jun 19, 2023, 6:15:04 AM

to perl6...@perl.org

> On 6/18/23 12:10, Joseph Brenner wrote:
>
> Try something like this, perhaps:
>
> $x ~~ s:i/ ^ (.*?) '</a>' .*? '<a href="' (.*?) $ /$0 $1/;
>
> Some explanations:
>
> s:i
>
> The :i modifier makes it case insensitive, so data with upper-case
> html won't break things.
>
> In general, you want to break it down into chunks, and just keep the
> chunks you want.
>
> ^ begin matching at the start of the string
>
> (.*?) match anything up to the next pattern, *and* capture it to a variable
>
> '...' I'm using single quotes on the literal strings
>
> $ match all the way to the end of the string.
>
> Pinning the match with ^ and $ means an s/// will replace the entire string.
>
> There are two captures, so they load $0 and $1, and here we're using
> them in the replace string: s/.../$0 $1/
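
A minimal sketch of the substitution described above, applied to the sample string from the original post (assumed here to sit on a single line, with one space before `<a`). For contrast, in the original attempt the `*` seems to have acted as a quantifier on the `</a>` atom in front of it rather than as a wildcard, which would explain why only `a href="` was replaced. In a Raku regex the "match anything" wildcard is `.*`, or the frugal `.*?`.

```raku
#!/usr/bin/env raku
# Sample data from the original post, assumed to be a single line.
my Str $x = Q[wine-7.12-3.fc37.i686.rpm</a> 23-Jul-2022 19:11 11K <a href="wine-7.12-3.fc37.x86_64.rpm];

# ^ and $ pin the match to the whole string, so s/// replaces all of it
# with just the two captured chunks, $0 and $1.
$x ~~ s:i/ ^ (.*?) '</a>' .*? '<a href="' (.*?) $ /$0 $1/;

say $x;   # wine-7.12-3.fc37.i686.rpm wine-7.12-3.fc37.x86_64.rpm
```

Everything between the two captures (the `</a>`, the date, the size and the `<a href="`) is matched but not captured, so it is dropped from the result.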

Hi Joseph,

Right under my nose! Thank you.

This is my test program:

<RegexTest.pl6>
#!/bin/raku

print "\n";
my Str $x = Q[<a
href="wike-2.0.1-1.fc38.noarch.rpm">wike-2.0.1-1.fc38.noarch.rpm</a>
27-Apr-2023 01:53 143K] ~
Q[<a
href="wine-8.6-1.fc38.i686.rpm">wine-8.6-1.fc38.i686.rpm</a>
19-Apr-2023 21:48 11K] ~
Q[<a
href="wine-8.6-1.fc38.x86_64.rpm">wine-8.6-1.fc38.x86_64.rpm</a>
19-Apr-2023 21:48 11K] ~
Q[<a
href="wine-alsa-8.6-1.fc38.i686.rpm">wine-alsa-8.6-1.fc38.i686.rpm</a>
19-Apr-2023 21:48 223K];

$x~~m:i/ .*? ("wine") (.*?) $(Q[">] ) .*? $( Q[a href="] ) (.*?)
( $(Q[">] ) ) /;

print "0 = <$0>\n1 = <$1>\n2 = <$2>\n\n";

my Str $y = $0 ~ $1 ~ " " ~ $2;
print "$y\n\n";
</RegexTest.pl6>

$ RegexTest.pl6

0 = <wine>
1 = <-8.6-1.fc38.i686.rpm>
2 = <wine-8.6-1.fc38.x86_64.rpm>

wine-8.6-1.fc38.i686.rpm wine-8.6-1.fc38.x86_64.rpm

An aside: the replacement /$0$1 $2/ did not work with s/// (so I switched to "m"):

$x~~s:i/ .*? ("wine") (.*?) $(Q[">] ) .*? $( Q[a href="] ) (.*?)
( $(Q[">] ) ) /$0$1 $2/;

The result is

wine-8.6-1.fc38.i686.rpm
wine-8.6-1.fc38.x86_64.rpmwine-8.6-1.fc38.x86_64.rpm</a>
19-Apr-2023 21:48 11K<a
href="wine-alsa-8.6-1.fc38.i686.rpm">wine-alsa-8.6-1.fc38.i686.rpm</a>
19-Apr-2023 21:48 223K

Is this a bug or did I write it wrong?

-T
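
As far as I can tell this is expected behaviour rather than a bug: `s///` replaces only the text the regex actually matched. The leading `.*?` swallows everything before the first `wine`, so that part is replaced, but the match stops at the second `">`, so everything after it is left in place. A minimal sketch of one way around that, assuming the page text is held in a hypothetical variable `$page` with one link per line: anchor with `^` and finish with `.* $` so the match covers the whole string. The two captures are a simplification of the three used above.

```raku
#!/usr/bin/env raku
# Hypothetical stand-in for the web page text, one link per line.
my Str $page = q:to/END/;
<a href="wike-2.0.1-1.fc38.noarch.rpm">wike-2.0.1-1.fc38.noarch.rpm</a> 27-Apr-2023 01:53 143K
<a href="wine-8.6-1.fc38.i686.rpm">wine-8.6-1.fc38.i686.rpm</a> 19-Apr-2023 21:48 11K
<a href="wine-8.6-1.fc38.x86_64.rpm">wine-8.6-1.fc38.x86_64.rpm</a> 19-Apr-2023 21:48 11K
<a href="wine-alsa-8.6-1.fc38.i686.rpm">wine-alsa-8.6-1.fc38.i686.rpm</a> 19-Apr-2023 21:48 223K
END

# ^ .*?   skips everything before the first "wine"
# .* $    consumes the rest of the string, so nothing is left over
$page ~~ s:i/ ^ .*? ('wine' .*?) '">' .*? 'a href="' (.*?) '">' .* $ /$0 $1/;

say $page;
```

With `m//` the unmatched tail never mattered because only the captures were used, which is why the `m` version appeared to work while the `s` version did not.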

ToddAndMargo via perl6-users

Jun 19, 2023, 8:00:09 AM

to perl6...@perl.org

From my actual program:

Before Joseph's help:
$SysRev = $WebPage;
$SysRev ~~ s/ .*? $( Q[a href="wine] ) /wine/;
$SysRev ~~ s/ $( Q[x86_64.rpm] ) .* /x86_64.rpm/;
$SysRev ~~ s/ $( Q[">wine] ) / /;
$SysRev = $SysRev.words[0] ~ " " ~ $SysRev.words[6];
$SysRev ~~ s/ $( Q[href="] ) //;
$SysRev ~~ s/ $( Q[</a>] ) //;

After Joseph's help:
$SysRev = $WebPage;
$SysRev~~m:i/ .*? ("wine") (.*?) $(Q[">] ) .*? $( Q[a href="] ) (.*?)
( $(Q[">] ) ) /;

$SysRev = $0 ~ $1 ~ " " ~ $2;

Awesome cleanup!

-T

ToddAndMargo via perl6-users

Jun 19, 2023, 11:00:12 AM

to perl6...@perl.org

On 6/19/23 07:39, Richard Hainsworth wrote:
> Hi Todd,
>
> Some more cleanup:
>
> On 19/06/2023 12:41, ToddAndMargo via perl6-users wrote:
> <snip>

> <snip>

>> After Joseph's help:
>>       $SysRev = $WebPage;
>>       $SysRev~~m:i/ .*? ("wine") (.*?) $(Q[">] ) .*? $( Q[a href="] ) (.*?)
>>       ( $(Q[">] ) ) /;
>>       $SysRev = $0 ~ $1 ~ " " ~ $2;
>>
>>

> maybe the following would be a bit more Raku-ish
>
> [in file called todd-test.raku]
>
> $=finish ~~ /:i ['href="' ~ \" $<ww> = ( 'wine-' \d .+? ) .*? ]+ $ /;
> say $/<ww>.join(' ');
>
> =finish
> <a href="wike-2.0.1-1.fc38.noarch.rpm">wike-2.0.1-1.fc38.noarch.rpm</a> 27-Apr-2023 01:53 143K
> <a href="wine-8.6-1.fc38.i686.rpm">wine-8.6-1.fc38.i686.rpm</a> 19-Apr-2023 21:48 11K
> <a href="wine-8.6-1.fc38.x86_64.rpm">wine-8.6-1.fc38.x86_64.rpm</a> 19-Apr-2023 21:48 11K
> <a href="wine-alsa-8.6-1.fc38.i686.rpm">wine-alsa-8.6-1.fc38.i686.rpm</a> 19-Apr-2023 21:48 223K
>
> [end of todd-test.raku]
> Test it in a terminal:
>
> $ raku todd-test.raku
> wine-8.6-1.fc38.i686.rpm wine-8.6-1.fc38.x86_64.rpm
>
> Some comments.
> 1) `=finish` is an undocumented part of the POD6 specification (I only discovered it recently). It will be documented soon.
> Anything after `=finish` is put in a string that can be pulled into a Raku program with `$=finish` (also undocumented).
> `=finish` was introduced instead of Perl's `__DATA__`.
> It is useful because, if you have a lot of text to experiment on, you can just attach the text to the bottom of the program after a `=finish`.
> 2) `~~` does not need an `m` (you only need `m` if you want to associate a regex with the topic, e.g. `$_`).
> 3) the / 'begin' ~ 'end' 'regex' / syntax means match the regex between 'begin' and 'end'.
> 4) The final output has a 'wine' in it, so why search for it separately? Just include it in the search.
> 5) You seem to be looking for a 'wine-' followed by a digit, so as to eliminate the 'wine-alsa-' line, so look for that
> 6) '$<ww>=' places the match into $/<ww> of the whole match. Multiple matches create an array.
> 7) `$/<ww>.join` takes an array and joins it with a separator.
> 8) In the original code, all the $() and Q[] add noise without any disambiguation.
>
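
A small sketch of points 3, 6 and 7 above on a cut-down string, so the moving parts are easier to see. The variable `$text` and the capture name `name` are made up for illustration, and the negated character class `<-["]>+` (any run of non-quote characters) is swapped in for `.+?` purely to keep the example simple.

```raku
#!/usr/bin/env raku
# Cut-down, hypothetical input: just two href attributes.
my Str $text = Q[href="wine-8.6-1.fc38.i686.rpm" href="wine-8.6-1.fc38.x86_64.rpm"];

# 'href="' ~ \" INNER   means: match 'href="', then INNER, then a closing ".
# $<name>=( ... )       stores what the parentheses matched under the key 'name'.
# The capture sits inside a repeated [ ... ]+ group, so $<name> ends up as a
# list with one entry per repetition.
$text ~~ / [ 'href="' ~ \" $<name>=( <-["]>+ ) \s* ]+ /;

# Join the list of captures with a single space.
say $<name>.join(' ');
```

Here `$<name>` is shorthand for `$/<name>`, the named capture on the most recent match.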
> But then we want to pull out interesting bits and we are not interested in the rest. So `comb` is better.
>
> [start of test-2.raku]
>
> $=finish.comb(/ <?after \">'wine-' \d .+? <?before \"> /).join(' ').say;
>
> =finish
> <a href="wike-2.0.1-1.fc38.noarch.rpm">wike-2.0.1-1.fc38.noarch.rpm</a> 27-Apr-2023 01:53 143K
> <a href="wine-8.6-1.fc38.i686.rpm">wine-8.6-1.fc38.i686.rpm</a> 19-Apr-2023 21:48 11K
> <a href="wine-8.6-1.fc38.x86_64.rpm">wine-8.6-1.fc38.x86_64.rpm</a> 19-Apr-2023 21:48 11K
> <a href="wine-alsa-8.6-1.fc38.i686.rpm">wine-alsa-8.6-1.fc38.i686.rpm</a> 19-Apr-2023 21:48 223K
>
> [end of test-2.raku]
>
>
> Same output.
>
> Notes:
> 1) comb looks for all matches in a string, so no need for the repeat and end of line in the regex
> 2) We are looking for something 'after' a ｢"｣ and 'before' a second ｢"｣, and so we can use the <?after regex> and <?before regex> zero-width matchers.
>
> Richard, aka finanalyst

Hi Richard,

Perhaps, but way over my head. I will have to read
over what you wrote several times before it gets
through the proverbial cement. Thank you!

-T
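
For anyone reading along, a minimal sketch of Richard's `comb` approach applied to an ordinary string variable instead of `$=finish`, which is closer to the `$WebPage` situation earlier in the thread. The names `$page` and `@names` are hypothetical, and the heredoc simply stands in for the downloaded page text.

```raku
#!/usr/bin/env raku
# Hypothetical stand-in for the web page text, one link per line.
my Str $page = q:to/END/;
<a href="wike-2.0.1-1.fc38.noarch.rpm">wike-2.0.1-1.fc38.noarch.rpm</a> 27-Apr-2023 01:53 143K
<a href="wine-8.6-1.fc38.i686.rpm">wine-8.6-1.fc38.i686.rpm</a> 19-Apr-2023 21:48 11K
<a href="wine-8.6-1.fc38.x86_64.rpm">wine-8.6-1.fc38.x86_64.rpm</a> 19-Apr-2023 21:48 11K
<a href="wine-alsa-8.6-1.fc38.i686.rpm">wine-alsa-8.6-1.fc38.i686.rpm</a> 19-Apr-2023 21:48 223K
END

# comb returns every non-overlapping match of the regex in the string.
# <?after \">   the match must be preceded by a double quote (not consumed)
# 'wine-' \d    "wine-" followed by a digit, which skips "wine-alsa-..."
# .+?           as few characters as possible ...
# <?before \">  ... up to, but not including, the next double quote
my @names = $page.comb(/ <?after \"> 'wine-' \d .+? <?before \"> /);

say @names.join(' ');
```

Because the lookarounds require a quote on both sides, the link text between `>` and `</a>` is not picked up, only the `href` values.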