FAQ 6.18 Why don't word-boundary searches with "\b" work for me?

PerlFAQ Server

Sep 2, 2008, 3:03:02 AM
This is an excerpt from the latest version of perlfaq6.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

6.18: Why don't word-boundary searches with "\b" work for me?


(contributed by brian d foy)

Ensure that you know what \b really does: it's the boundary between a
word character, \w, and something that isn't a word character. That
thing that isn't a word character might be \W, but it can also be the
start or end of the string.

It's not (not!) the boundary between whitespace and non-whitespace, and
it's not the stuff between words we use to create sentences.

In regex speak, a word boundary (\b) is a "zero width assertion",
meaning that it doesn't represent a character in the string, but a
condition at a certain position.
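
A minimal Perl sketch of the points above. The helper name boundary_positions is hypothetical, made up for this illustration. Because \b is zero-width, each global match against it is an empty string, and pos() reports the offset where it matched. The output shows that the start and end of the string count as boundaries, and that an apostrophe creates boundaries even though it is not whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset in $string where \b matches.
# \b is zero-width, so each match is empty; pos() gives its location.
sub boundary_positions {
    my ($string) = @_;
    my @positions;
    while ( $string =~ /\b/g ) {
        push @positions, pos($string);
    }
    return @positions;
}

my @in_perl = boundary_positions("Perl");
my @in_dont = boundary_positions("don't go");

print "Perl: @in_perl\n";        # prints "Perl: 0 4"
print "don't go: @in_dont\n";    # prints "don't go: 0 3 4 5 6 8"
```

In "don't go" the boundaries at offsets 3 and 4 sit on either side of the apostrophe, not at any whitespace.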

For the regular expression /\bPerl\b/, there has to be a word boundary
before the "P" and after the "l". As long as something other than a word
character (or the start or end of the string) precedes the "P" and
follows the "l", the pattern will match. These strings match /\bPerl\b/.

"Perl" # no word char before P or after l
"Perl " # same as previous (space is not a word char)
"'Perl'" # the ' char is not a word char
"Perl's" # no word char before P, non-word char after "l"

These strings do not match /\bPerl\b/.

"Perl_" # _ is a word char!
"Perler" # no word char before P, but one after l

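The two lists above can be checked with a short Perl loop. This is only a sketch; the array names are illustrative, not part of the FAQ.

```perl
use strict;
use warnings;

# Strings listed above as matching /\bPerl\b/, and strings listed as not matching.
my @should_match     = ( "Perl", "Perl ", "'Perl'", "Perl's" );
my @should_not_match = ( "Perl_", "Perler" );

for my $string ( @should_match, @should_not_match ) {
    my $result = $string =~ /\bPerl\b/ ? "matches" : "does not match";
    print qq{"$string" $result /\\bPerl\\b/\n};
}
```
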
You don't have to use \b only around whole words, though. You can also
use it to find a non-word character surrounded by word characters. These
strings match the pattern /\b'\b/.

"don't" # the ' char is surrounded by "n" and "t"
"qep'a'" # the ' char is surrounded by "p" and "a"

This string does not match /\b'\b/.

"foo'" # there is no word char after non-word '

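A similar Perl sketch for /\b'\b/; the %examples hash is illustrative. Since the apostrophe is a non-word character, a boundary on each side of it can only exist if a word character sits next to it, so the pattern needs no explicit \w.

```perl
use strict;
use warnings;

# Expected result (1 = match, 0 = no match) for each string above.
my %examples = (
    "don't"  => 1,    # ' between "n" and "t"
    "qep'a'" => 1,    # first ' between "p" and "a"
    "foo'"   => 0,    # nothing follows the final '
);

for my $string ( sort keys %examples ) {
    # No \w in the pattern: each \b next to the non-word ' requires
    # a word character on that side.
    my $matched = $string =~ /\b'\b/ ? "matches" : "does not match";
    printf "%-8s %s\n", $string, $matched;
}
```
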
You can also use the complement of \b, \B, to specify that there should
not be a word boundary.

In the pattern /\Bam\B/, there must be a word character before the "a"
and after the "m". These patterns match /\Bam\B/:

"llama" # "am" surrounded by word chars
"Samuel" # same

These strings do not match /\Bam\B/

"Sam" # no word boundary before "a", but one after "m"
"I am Sam" # "am" surrounded by non-word chars

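The same kind of check for /\Bam\B/, as a Perl sketch with illustrative array names. It also prints $-[0], the offset where a successful match starts, to show that the "am" found is the one inside a longer word.

```perl
use strict;
use warnings;

# \B on both sides means "am" must have a word character before and after it.
my @should_match     = ( "llama", "Samuel" );
my @should_not_match = ( "Sam", "I am Sam" );

for my $string ( @should_match, @should_not_match ) {
    if ( $string =~ /\Bam\B/ ) {
        my $offset = $-[0];    # @- holds match start offsets; $-[0] is for the whole match
        print qq{"$string" matches at offset $offset\n};
    }
    else {
        print qq{"$string" does not match\n};
    }
}
```

For "llama" the match starts at offset 2 and for "Samuel" at offset 1.
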
--------------------------------------------------------------------

The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much relevant information as possible in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.

If you'd like to help maintain the perlfaq, see the details in
perlfaq.pod.

RedGrittyBrick

Sep 2, 2008, 6:49:18 AM

PerlFAQ Server wrote:

> You can also use the complement of \b, \B, to specify that there should
> not be a word boundary.
>
> In the pattern /\Bam\B/, there must be a word character before the "a"
> and after the "m". These patterns match /\Bam\B/:
>
> "llama" # "am" surrounded by word chars
> "Samuel" # same
>
> These strings do not match /\Bam\B/
>
> "Sam" # no word boundary before "a", but one after "m"
> "I am Sam" # "am" surrounded by non-word chars
>

If /\Bam\B/ differs from /\wam\w/ maybe an example could be added to
illustrate this. If not, perhaps there is a better example of the use of \B?

--
RGB

John W. Krahn

Sep 2, 2008, 9:36:05 AM

/\Bam\B/ matches two characters while /\wam\w/ matches four characters.

$ perl -le'$_ = "Samuel"; s/\Bam\B/ex/; print'
Sexuel
$ perl -le'$_ = "Samuel"; s/\wam\w/ex/; print'
exel
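
To see exactly what each pattern consumed, rather than inferring it from the substitution, a Perl sketch can capture the whole match. This is an illustration, not part of John's post; the variable names are made up.

```perl
use strict;
use warnings;

my $string = "Samuel";

# Capture the entire match to see how many characters each pattern consumes.
my ($with_B) = $string =~ /(\Bam\B)/;    # \B is zero-width: only "am" is captured
my ($with_w) = $string =~ /(\wam\w)/;    # each \w consumes one character

printf "%-9s matched '%s' (%d chars)\n", '/\Bam\B/', $with_B, length $with_B;
printf "%-9s matched '%s' (%d chars)\n", '/\wam\w/', $with_w, length $with_w;
```

With \B the match is just "am", so s/// replaces two characters; with \w it is "Samu", so four characters are replaced, leaving "exel".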


John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

RedGrittyBrick

Sep 2, 2008, 11:46:12 AM

John W. Krahn wrote:
> RedGrittyBrick wrote:
>>
>> PerlFAQ Server wrote:
>>
>>> You can also use the complement of \b, \B, to specify that there
>>> should not be a word boundary.
>>>
>>> In the pattern /\Bam\B/, there must be a word character before
>>> the "a" and after the "m". These patterns match /\Bam\B/:
>>>
>>> "llama" # "am" surrounded by word chars
>>> "Samuel" # same
>>>
>>> These strings do not match /\Bam\B/
>>>
>>> "Sam" # no word boundary before "a", but one after "m"
>>> "I am Sam" # "am" surrounded by non-word chars
>>
>> If /\Bam\B/ differs from /\wam\w/ maybe an example could be added to
>> illustrate this. If not, perhaps there is a better example of the use
>> of \B?
>
> /\Bam\B/ matches two characters while /\wam\w/ matches four characters.
>
> $ perl -le'$_ = "Samuel"; s/\Bam\B/ex/; print'
> Sexuel
> $ perl -le'$_ = "Samuel"; s/\wam\w/ex/; print'
> exel
>

Yes. I now realise my earlier suggestion is not relevant to this
particular FAQ. I guess some other FAQ or perldoc clarifies when one
might want to use \B and when \w.


However I do have one suggestion for FAQ 6.18: The current version has
these two assertions:

"These patterns match /\Bam\B/:"

"These strings do not match /\Bam\B/"

I suggest, for consistency, the word "patterns" in the first assertion
be replaced by "strings" (as in the second fragment).


--
RGB

Jürgen Exner

Sep 2, 2008, 12:36:45 PM
RedGrittyBrick <RedGrit...@spamweary.invalid> wrote:
>However I do have one suggestion for FAQ 6.18: The current version has
>these two assertions:
>
> "These patterns match /\Bam\B/:"
> "These strings do not match /\Bam\B/"
>
>I suggest, for consistency, the word "patterns" in the first assertion
>be replaced by "strings" (as in the second fragment).

And in addition it is the other way round (the pattern is the subject
and the string the object):

"/\Bam\B/ matches these strings:"
"/\Bam\B/ does not match these strings:"

or

"These strings are matched by /\Bam\B/:"
"These strings are not matched by /\Bam\B/"

jue

brian d foy

Sep 3, 2008, 10:43:34 AM
In article <48bd5fc8$0$2927$fa0f...@news.zen.co.uk>, RedGrittyBrick
<RedGrit...@spamweary.invalid> wrote:


> Yes. I now realise my earlier suggestion is not relevant to this
> particular FAQ. I guess some other FAQ or perldoc clarifies when one
> might want to use \B and when \w.

As with any feature, use the one that does what you need.

Hans Mulder

Sep 3, 2008, 5:32:31 PM
John W. Krahn wrote:
> RedGrittyBrick wrote:
>> PerlFAQ Server wrote:
>>> You can also use the complement of \b, \B, to specify that there
>>> should not be a word boundary.

>>> In the pattern /\Bam\B/, there must be a word character before
>>> the "a" and after the "m". These patterns match /\Bam\B/:

>>> "llama" # "am" surrounded by word chars
>>> "Samuel" # same

>>> These strings do not match /\Bam\B/

>>> "Sam" # no word boundary before "a", but one after "m"
>>> "I am Sam" # "am" surrounded by non-word chars

>> If /\Bam\B/ differs from /\wam\w/ maybe an example could be added to
>> illustrate this. If not, perhaps there is a better example of the use
>> of \B?

> /\Bam\B/ matches two characters while /\wam\w/ matches four characters.

> $ perl -le'$_ = "Samuel"; s/\Bam\B/ex/; print'
> Sexuel
> $ perl -le'$_ = "Samuel"; s/\wam\w/ex/; print'
> exel

Additionally, \B can match when there's no \w around:

$ perl -lw
$_ = "++"; # Look Ma: no \w!
s/\B/Here/g; # insert "Here" wherever \B matches
print;
__END__
Here+Here+Here
$
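
Hans's point can also be shown by listing the offsets where each assertion matches. A minimal Perl sketch; match_offsets is a hypothetical helper written for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset in $string where the
# zero-width pattern $re matches.
sub match_offsets {
    my ( $string, $re ) = @_;
    my @offsets;
    while ( $string =~ /$re/g ) {
        push @offsets, pos($string);
    }
    return @offsets;
}

my @B_offsets = match_offsets( "++", qr/\B/ );
my @b_offsets = match_offsets( "++", qr/\b/ );

print "\\B matches at: @B_offsets\n";    # prints "\B matches at: 0 1 2"
print "\\b matches at: ", ( @b_offsets ? "@b_offsets" : "nowhere" ), "\n";    # prints "\b matches at: nowhere"
```

Since \B is true exactly where \b is false, a string with no word characters at all has no \b positions, and \B matches at every offset, including both ends.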

Hope this helps,

-- HansM
