> John Moreno <phe...@interpath.com> wrote:
> > Honza Pazdziora <ade...@fi.muni.cz> wrote:
> > > phe...@interpath.com (John Moreno) writes:
> > > > $count = s/(#! rnews )(\d+\n)(.+?)/$1.(length $3).("\n").$3/gse;
> > > > Which gives 1 char for the (length $3), I would have thought that it
> > > > would go until it hit the next #! rnews.
> > > Why should it? You said .+? which means at least one any character and
> > > the least possible count. Which gives us nice 1.
> > I understand what it was doing, I'm just not sure why the ? wasn't more
> > greedy - I would have thought that when using the /g that a ? at the end
>
> The DWIM operator has not yet become part of the standard perl regular
> expression functionality. In the meantime, we have normal greedy repetion
> operators such as + and *, and we have non-greedy versions, such as +?
> and *?. The greedy ones always try to grab as much as possible, the non-
> greedy ones always try to grab as little as possible.
Well, I didn't expect it use the DWIM operator, I guess what I didn't
understand clearly was that the ? is evaluated in relationship to the
whole expression not just it's little part of it. Which means that if
it did what I thought it would do (use the ? with the next value being
considered the beginning of the regex) it would have problems with
infinite recursions.
Instead it has the more generally useful lookahead functions - they just
didn't occur to me until after I'd already completed the project.
> It is my opinion that non-greedy repetition is very subtle and tends to
> expose a large number of programmer bugs. I almost always recommend doing
> without it.
Well it isn't normally my first thought either - not that I have enough
experience for that to mean anything.
--
John Moreno
The DWIM operator has not yet become part of the standard perl regular
expression functionality. In the meantime, we have normal greedy repetion
operators such as + and *, and we have non-greedy versions, such as +?
and *?. The greedy ones always try to grab as much as possible, the non-
greedy ones always try to grab as little as possible.
It is my opinion that non-greedy repetition is very subtle and tends to
expose a large number of programmer bugs. I almost always recommend doing
without it.
I'd do something like this myself:
$r='#! rnews ';
$count = s/$r\d+\n((?:(?!$r)#|[^#]+)+)/$r.(length $1)."\n$1"/gse;
Or maybe even this (for added security against false positives from
#! rnews appearing somewhere besides the start of a line):
$r='#! rnews ';
$count = s/^$r\d+\n((?:(?!\n$r)\n|.+)+)/$r.(length $1)."\n$1"/gem;
Elijah
------
#!/usr/bin/perl -- -*- my ny.pm sig -*-
$_=$^ ;s;s;sss;;s^.^ju^&&s&P&,\n&&&(s(_..)(ers)||s|^|^^|)&&s(T)(q(st%eg))eg;
s<.(o).><$& new 1$$>i+s+\dst.+$a--||reverse(q(rep k))+ge;s*%.+u* so+*i;s=\++
="me"=mex&&s%ege%l$"hke%;$a||s/^\S+ /\/\//;s;\d+;yor;;s[KE]<ac$&>i;print $_;
> It is my opinion that non-greedy repetition is very subtle and tends to
> expose a large number of programmer bugs. I almost always recommend doing
> without it.
>
> I'd do something like this myself:
>
> $r='#! rnews ';
>
> $count = s/$r\d+\n((?:(?!$r)#|[^#]+)+)/$r.(length $1)."\n$1"/gse;
How is it less error-prone than
/$r(\d+\n)(.+?)(?=$r|$)/
?
Ilya
This really made me laugh - it broke my news unbatching program... because
it was at the start of a line...
Ho ho ho.
- KSL. -
(Well, I did write it myself after all).
Well, other than your just adding a new set of parens that throw off
the replacement count, I'd use \Z instead of $ to ensure the right
anchoring if /m were ever added to the flags.
I know your point was that "Here is a case were non-greedy regexps
are not harmful, and in fact simplify things considerably." I know.
There is very strong deliberate anchoring around the part that needed
to be captured, which makes non-greedy matching safe. My point was
to show that it could be solved without needing non-greedy matching.
Elijah
------
will probably always be getting in Ilya's hair over this issue
There is no new set of parens.
> I'd use \Z instead of $ to ensure the right
> anchoring if /m were ever added to the flags.
\Z is not much better than $. I would use \z, but since 5.005 will
not have it ;-), one is forced to use these abomination \Z(?!\n).
> I know your point was that "Here is a case were non-greedy regexps
> are not harmful, and in fact simplify things considerably." I know.
> There is very strong deliberate anchoring around the part that needed
> to be captured, which makes non-greedy matching safe. My point was
> to show that it could be solved without needing non-greedy matching.
The problem is that few people will be able to read your re, so check
that "it could be solved without needing non-greedy matching". I
could not.
Re's are unreadable from the start. One should not deliberately add
to this, unless necessary.
Ilya
P.S. Interested people may look in ./t/op/pat.t of >= 5.004_55 to see
the beast which matches "matching parens" (let's look... it is near
matchit). The only reason I wrote it is to check new (?{...}) engine,
this is not what I would like to have in production code (though it is
_almost_ readable).
P.P.S. Well, to decrease quoted/contents ratio, I may just include it
here:
m'
(
\(
(?{ $c = 1 }) # Initialize
(?:
(?(?{ $c == 0 }) # PREVIOUS iteration was OK, stop the loop
(?!
) # Fail: will unwind one iteration back
)
(?:
[^()]+ # Match a big chunk
(?=
[()]
) # Do not try to match subchunks
|
\(
(?{ ++$c })
|
\)
(?{ --$c })
)
)+ # This may not match with different subblocks
)
(?(?{ $c != 0 })
(?!
) # Fail
) # Otherwise the chunk 1 may succeed with $c>0
'xg;