On 14.07.2015 15:14, John Doe wrote:
> On 14.07.2015 14:39, Janis Papanagnou wrote:
>> On 14.07.2015 14:15, John Doe wrote:
>>>> But this matches too:
>>>> $ echo apples | grep -G "apples\{0\}"
>>>> apples
>>
>> Yes, the part /apple/ in the input string "apples" is matched by the grep
>> regexp; the /s/ part of the regexp has no effect because it is repeated
>> zero times.
>
> Now I anchored the regexp like this:
>
> echo 'apples' | grep -G '^apples\{0\}$'
>
> Shouldn't it match only the 'apple' line now?
Exactly. And it does so.
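A quick way to check both cases from the shell. This is a minimal sketch that assumes a grep supporting -o (a GNU/BSD extension, not POSIX), which prints only the part of the line that matched:

```shell
#!/bin/sh
# Unanchored: s\{0\} contributes nothing, so the regexp effectively is "apple",
# which occurs inside "apples". -o prints only the matched text.
printf 'apples\n' | grep -o -G 'apples\{0\}'        # prints: apple

# Anchored: the whole line must be "apple" followed by zero "s" characters,
# so of the two input lines only "apple" is selected.
printf 'apple\napples\n' | grep -G '^apples\{0\}$'  # prints: apple
```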
>
>>> Also, I have some issues going through examples from the book "Beginning
>>> Portable Shell Scripting", for example:
>>>
>>> [...]
>>> the expression ba\(na\)* can match ba, bana, banana or bananana but it cannot
>>> match banan
>>> [...]
>>>
>>> But when I do echo 'banan' | grep -G 'ba\(na\)*' it does match :(
>>
>> As above; because the /ba/ (and the /bana/) would already match the
>> respective part of the (sub-)string, "ba" (or resp. "bana").
>
> So, is my misunderstanding about anchoring the pattern? For example:
>
> echo 'bananananan' | grep -G '^ba\(na\)*$' doesn't match
> echo 'bananananana' | grep -G '^ba\(na\)*$' matches
Now the anchoring requires that the whole line matches. The first
example does not, because it has a trailing "n" that is not covered by the
pattern, while the second sample fulfills the pattern. Note that without
anchoring /ba\(na\)*/ would already match "ba", whereas with anchoring the
pattern must match up to the end of the line, i.e. the repetitions must be
completely satisfied, and a spurious final "n" will spoil the match.
>
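To see the effect of the anchors side by side, the following sketch feeds a few sample strings through both forms of the pattern. It uses grep -q (POSIX; it suppresses output and only sets the exit status) plus -G as in the examples above, which assumes GNU or BSD grep:

```shell
#!/bin/sh
# For each sample string, report whether the anchored and the unanchored
# form of the pattern match it.
for s in ba bana banan bananananan bananananana; do
    if printf '%s\n' "$s" | grep -q -G '^ba\(na\)*$'; then a=yes; else a=no; fi
    if printf '%s\n' "$s" | grep -q -G 'ba\(na\)*'; then u=yes; else u=no; fi
    printf '%-14s anchored=%-3s unanchored=%s\n' "$s" "$a" "$u"
done
```

Only the anchored column separates "banan" and "bananananan" from the complete words "ba", "bana" and "bananananana"; the unanchored search succeeds for all of them.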
>>> What am I doing wrong?
>>
>> Nothing wrong with what you're doing. Just try to understand what the
>> pattern actually does in the input string sequence.
>
> I'm trying to follow you but have some uncertainty. Is this what you're saying
> with the banana example:
>
> echo 'banan' | grep -G 'ba\(na\)*'
>
> The possible pattern(s) are: ba, bana, banana, bananana, ...
The possible _matches_ of the pattern expression in the string are ba,
bana, banana, bananana, etc., yes.
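To make the candidate matches at the start of "banan" visible, this sketch tests every prefix of the string as a whole line (POSIX grep -x), i.e. it asks which prefixes are themselves words of the pattern's language. The variable names s, i and prefix are just illustrative:

```shell
#!/bin/sh
# Test each prefix of "banan" (b, ba, ban, bana, banan) as a whole-line match.
s=banan
i=1
while [ "$i" -le "${#s}" ]; do
    prefix=$(printf '%s\n' "$s" | cut -c1-"$i")
    if printf '%s\n' "$prefix" | grep -q -x -G 'ba\(na\)*'; then
        echo "$prefix: match"
    else
        echo "$prefix: no match"
    fi
    i=$((i + 1))
done
```

Only "ba" and "bana" are reported as matches.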
(As an aside: regexp matchers typically prefer the longest match, but
that is not yet important for the considerations here.)
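With the -o option of GNU or BSD grep (not POSIX) you can see which candidate is actually reported. POSIX specifies that, among the matches starting at the leftmost possible position, the longest one is chosen; a minimal sketch:

```shell
#!/bin/sh
# Both "ba" and "bana" match at the first character; the longest is printed.
printf 'banan\n' | grep -o -G 'ba\(na\)*'        # prints: bana

# The trailing "n" cannot be part of a match, so it is left out.
printf 'bananananan\n' | grep -o -G 'ba\(na\)*'  # prints: banananana
```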
> The string is banan
> Two of the possible patterns match banan
There are two possible matches of the pattern at the start of the string
"banan": "ba" and "bana".
Maybe it's easier to understand coming from the theory of formal
languages: a pattern expression such as 'ba\(na\)*' defines a language
consisting of the words "ba", "bana", "banana", etc., and the matcher
searches the string for a substring that is a word of that language.
IOW, in that context it does not matter whether the whole string belongs
to the language defined by the pattern expression.
But, as illustrated above, you can control some aspects of this with anchors.
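The language view can be tried out directly. This sketch generates the first few words "ba" followed by k copies of "na" (k = 0..3, an arbitrary illustrative bound), checks that each is accepted with anchors, and then embeds each word in other characters ("x" before, "n" after, chosen only for illustration) to show that only the unanchored substring search still succeeds:

```shell
#!/bin/sh
# w holds the current word of the language ba\(na\)*; k counts the "na" copies.
w=ba
k=0
while [ "$k" -le 3 ]; do
    # The word itself is accepted even when the whole line must match.
    printf '%s\n' "$w" | grep -q -G '^ba\(na\)*$' && echo "word: $w"
    # Surrounded by other characters, the line merely contains a word.
    line="x${w}n"
    printf '%s\n' "$line" | grep -q -G 'ba\(na\)*' && echo "contains a word: $line"
    printf '%s\n' "$line" | grep -q -G '^ba\(na\)*$' || echo "not itself a word: $line"
    w="${w}na"
    k=$((k + 1))
done
```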
> Result is that banan matches ba\(na\)*
>
> Is this correct?
Yes, with a slight adjustment of the semantics, i.e. of what matches what. (YMMV)
Confusingly, the terminology is not consistent in the literature.
(Not even within a single book; e.g., Robbins' "awk Programming" book
says, where the match operator '~' is defined, that in 'exp ~ /regexp/'
"exp (taken as a string) matches regexp". But the glossary entry "Regexp"
says that "the regexp ... matches any string ...". [3rd Edition])
Janis
>
> John
>