awk -v re='a|b' '
function extract(str,regexp)
{ RMATCH = (match(str,regexp) ? substr(str,RSTART,RLENGTH) : "")
return RSTART
}
extract($0,re) { print RMATCH }
'
Ed.
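Here is a usage sketch for the extract() function above. It wraps the function in a shell function so it is easy to try on the thread's sample input; the name extract_match is hypothetical. extract() returns RSTART, which is 0 when there is no match, so the same call also works as the pattern that decides whether to print. Values passed with -v go through awk escape processing, so a backslash in the regex has to be doubled.

```shell
# extract_match RE: hypothetical wrapper around the extract() function above.
# For each input line, prints the leftmost substring matching the ERE RE;
# lines without a match print nothing, because extract() then returns 0.
extract_match() {
    awk -v re="$1" '
    function extract(str, regexp) {
        # RMATCH is the matched text, or "" when there is no match
        RMATCH = (match(str, regexp) ? substr(str, RSTART, RLENGTH) : "")
        return RSTART    # 0 (false) when there is no match
    }
    extract($0, re) { print RMATCH }
    '
}

echo 12abcat | extract_match 'a|b'    # prints: a
echo 12abcat | extract_match '2.*c'   # prints: 2abc
```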
Easier in gawk:
$ echo 12abcat | gawk -v re="2.*c" '{match($0,re,RMATCH); print RMATCH[0]}'
2abc
And just in case we're playing golf rather than looking for something
we can use in longer scripts:
$ echo 12abcat |
perl -e '$a=<STDIN>; $a =~ /(2.*c)/; print "$1\n";'
2abc
$ echo 12abcat |
awk 'match($0,/2.*c/){$0=substr($0,RSTART,RLENGTH)}1'
2abc
Regards,
Ed.
>> awk -v re='a|b' '
>> function extract(str,regexp)
>> { RMATCH = (match(str,regexp) ? substr(str,RSTART,RLENGTH) : "")
>> return RSTART}
>>
>> extract($0,re) { print RMATCH }
>> '
> Hey, that is cool!
> But what about using the matched part in the regex itself?
> Like, how would you do this perl regex in awk?
> $ echo ca12cat | perl -e '$a=<STDIN>; $a =~ /(ca)(12\1)/; print "$2\n";'
> 12ca
> the (12\1) means "12 followed by the stuff that was matched by the
> *first* parenthesized part of the regex".
> print $2 means "print out the stuff that was matched by the *second*
> parenthesized part of the regex".
AFAIK you can't in awk. Backreferences are not supported. GNU awk supports
backreferences for substitutions using the gensub() function, but you can't
pull them out like you do with perl, although you can put together hacks
like this one (not the same as your example):
$ echo ca12cat | gawk '{s=gensub(/(..)(..).*/,"\\2","g"); print s}'
12
But in any case, using backreferences within the match itself is not
supported, AFAICT.
That's not supported directly in awk, so you'd need something like
(untested):
awk -v re='ca' '
function extract(str,regexp)
{ RMATCH = (match(str,regexp) ? substr(str,RSTART,RLENGTH) : "")
return RSTART
}
extract($0,re) && extract($0,re"12"RMATCH) { print RMATCH }
'
Regards,
Ed.
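The snippet above prints the whole match of re"12"RMATCH (here "ca12ca"), not just the part perl puts in $2. The sketch below prints only the equivalent of $2 ("12ca") in POSIX awk. It assumes the first group's text should be matched again right after the leftmost match of re; the helper name backref_second is hypothetical. Unlike perl, it does not backtrack to later matches of re. The captured text is also spliced into the second regex unescaped, so metacharacters in it stay special.

```shell
# backref_second RE: hypothetical helper emulating perl /(RE)(12\1)/ and
# printing what perl would leave in $2.
backref_second() {
    awk -v re="$1" '
    function extract(str, regexp) {
        RMATCH = (match(str, regexp) ? substr(str, RSTART, RLENGTH) : "")
        return RSTART
    }
    extract($0, re) {
        first = RMATCH                        # what perl captures in $1
        rest = substr($0, RSTART + RLENGTH)   # text right after that match
        # (12\1): "12" then the same text again, anchored at that point.
        # first is NOT escaped, so regex metacharacters in it stay special.
        if (extract(rest, "^12" first))
            print RMATCH                      # what perl captures in $2
    }
    '
}

echo ca12cat | backref_second 'ca'   # prints: 12ca
```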
> That's not supported directly in awk, so you'd need something like
> (untested):
>
> awk -v re='ca' '
> function extract(str,regexp)
> { RMATCH = (match(str,regexp) ? substr(str,RSTART,RLENGTH) : "")
> return RSTART
> }
>
> extract($0,re) && extract($0,re"12"RMATCH) { print RMATCH }
> '
However, be careful with that one if RMATCH happens to contain regex
metacharacters!
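This minimal POSIX awk sketch shows the hazard; the helper name reuse_unescaped is hypothetical. It saves the first two characters of the line and then reuses them as a dynamic regex. With the input .*cat the saved text is .*, and as a regex that matches the whole line rather than the two literal characters.

```shell
# reuse_unescaped: hypothetical helper; saves the first two characters of the
# line (like perl m/^(..)/) and then reuses them as a dynamic regex.
reuse_unescaped() {
    awk '{
        if (match($0, /^../)) {
            m = substr($0, RSTART, RLENGTH)   # for ".*cat" this is ".*"
            # m is interpreted as a regex here, not as literal text
            if (match($0, m))
                print "first match: " m ", new match: " substr($0, RSTART, RLENGTH)
        }
    }'
}

echo '.*cat' | reuse_unescaped   # prints: first match: .*, new match: .*cat
```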
Probably OT but - any idea what perl does in that situation?
Ed.
Apparently it does the same:
$ echo '.*cat' | perl -ne 'm/^(..)/;
$m=$1; # save what matched in $1 (".*")
m/($m)/; # try a new match with that...
$nm=$1; # and save what matched
print "first match: $m, new match: $nm\n";'
first match: .*, new match: .*cat
So $m is interpolated as .*, which is then used as the regex and, of
course, matches the whole string.
But... perl also has the \Q...\E escapes, which quote everything in
between so that it matches literally (something that would have to be
done by hand in awk), so:
$ echo '.*cat' | perl -ne 'm/^(..)/;
$m=$1;
print "\Q$m\E\n";
m/(\Q$m\E)/;
$nm=$1;
print "first match: $m, new match: $nm\n";'
\.\*
first match: .*, new match: .*
Yeah, too easy that way :(
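For comparison, here is what the manual escaping in awk might look like: a quotemeta()-style function, loosely modelled on perl's quotemeta (which is what \Q...\E uses). The names quotemeta and awk_quotemeta_demo are hypothetical. The function backslash-escapes the characters that are special in a POSIX ERE outside bracket expressions. It leaves ] and } alone because they are literal there. The demo mirrors the perl \Q...\E run shown above.

```shell
# awk_quotemeta_demo: hypothetical helper; escapes the first two characters of
# the line, then reuses the escaped text as a literal dynamic regex.
awk_quotemeta_demo() {
    awk '
    function quotemeta(s,    out, i, c) {
        out = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            # ERE characters that are special outside a bracket expression
            if (index("\\^$.[()*+?{|", c))
                out = out "\\" c
            else
                out = out c
        }
        return out
    }
    {
        if (match($0, /^../)) {
            m = substr($0, RSTART, RLENGTH)
            q = quotemeta(m)
            print q
            if (match($0, q))
                print "first match: " m ", new match: " substr($0, RSTART, RLENGTH)
        }
    }'
}

echo '.*cat' | awk_quotemeta_demo
# prints:
# \.\*
# first match: .*, new match: .*
```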
This is correct; no awk version that I know of supports this (I don't
know what tawk does).
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL
Which is understandable, since such backreferences go beyond Chomsky
type-3 grammars (i.e. what can be handled by finite state machines and
expressed by regular expressions).
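As a concrete example, the anchored pattern ^(a*)b\1$ uses a backreference and matches exactly the strings a^n b a^n. The standard pumping-lemma argument, sketched here as an illustration rather than taken from the original post, shows that no finite automaton recognizes that set:

```latex
\begin{aligned}
&L = \{\, a^n b a^n : n \ge 0 \,\}. \text{ Suppose } L \text{ is regular, with pumping length } p.\\
&s = a^p b a^p \in L,\quad s = xyz,\ |xy| \le p,\ |y| \ge 1 \;\Rightarrow\; y = a^k,\ k \ge 1.\\
&xy^2z = a^{p+k} b a^p \notin L, \text{ contradicting the pumping lemma, so } L \text{ is not regular.}
\end{aligned}
```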
I've been told that perl even has severe performance issues with
backreferences in some applications.
Janis
echo 12abcat | awk 'match($0,/2.*c/){print substr($0,RSTART,RLENGTH)}'
i.e., why use the one extra step of printing after reassigning?
Because the above will not print anything if the pattern is not found,
while to me it looked like the perl statement the OP wanted an awk
equivalent for would print the input record whether it was modified
or not. Of course, the perl syntax is so cryptic it might actually
invoke nasal demons for all I know, but I'm pretty sure it will print
SOMETHING....
Ed.
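The difference Ed describes can be seen by running both forms on one matching and one non-matching line. In this minimal sketch the two wrapper names are hypothetical. The print form outputs only the matched text and drops lines without a match. The `$0=substr(...)}1` form outputs every record, modified or not.

```shell
# Two hypothetical wrappers around the one-liners discussed above.
print_match_only() {
    # prints the matched text; lines without a match produce no output
    awk 'match($0, /2.*c/) { print substr($0, RSTART, RLENGTH) }'
}

print_every_line() {
    # replaces $0 with the matched text when there is a match; the trailing
    # pattern 1 is always true, so every record is printed, changed or not
    awk 'match($0, /2.*c/) { $0 = substr($0, RSTART, RLENGTH) } 1'
}

printf '12abcat\nxyz\n' | print_match_only   # prints: 2abc
printf '12abcat\nxyz\n' | print_every_line   # prints: 2abc, then xyz
```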