regex question on .findall and \b

Ethan Furman

Jul 2, 2009, 12:38:56 PM
to Python
Greetings!

My closest to successful attempt:

Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)]
Type "copyright", "credits" or "license" for more information.

IPython 0.9.1 -- An enhanced Interactive Python.

In [161]: re.findall('\d+','this is test a3 attempt 79')
Out[161]: ['3', '79']

What I really want is just the 79, as a3 is not a decimal number, but
when I add the \b word boundaries I get:

In [162]: re.findall('\b\d+\b','this is test a3 attempt 79')
Out[162]: []

What am I missing?

~Ethan~

Tim Chase

Jul 2, 2009, 1:17:22 PM
to Ethan Furman, Python

The sneaky detail is that the regexp should be in a raw string
(always a good practice), not a cooked string:

r'\b\d+\b'

The "\d" isn't a recognized string escape, so Python leaves it
alone. However, the "\b" is the backspace control character, so
your actual string ends up something like:

>>> print repr('\b\d+\b')
'\x08\\d+\x08'
>>> print repr(r'\b\d+\b')
'\\b\\d+\\b'

the first of which doesn't match your target string, as you might
imagine.

-tkc

Sjoerd Mullender

Jul 2, 2009, 1:18:07 PM
to Ethan Furman, Python

Try this:
>>> re.findall(r'\b\d+\b','this is test a3 attempt 79')
['79']

The \b is a backspace; by using a raw string you get an actual backslash
and b.

--
Sjoerd Mullender

Nobody

Jul 2, 2009, 2:41:54 PM
to

You need to use a raw string (r'...') to prevent \b from being interpreted
as a backspace:

re.findall(r'\b\d+\b','this is test a3 attempt 79')

\d isn't a recognised escape sequence, so it doesn't get interpreted:

> print '\b'
^H
> print '\d'
\d
> print r'\b'
\b

Try to get into the habit of using raw strings for regexps.
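
As a small illustration of that habit, here is a sketch (Python 3) that keeps the pattern in one compiled, raw-string constant. The names STANDALONE_INT and standalone_numbers are made up for this example and are not from the thread.

```python
import re

# Hypothetical names; the pattern is written once as a raw string and compiled.
STANDALONE_INT = re.compile(r'\b\d+\b')

def standalone_numbers(text):
    """Return digit runs that stand as whole words, skipping ones glued to letters."""
    return STANDALONE_INT.findall(text)

print(standalone_numbers('this is test a3 attempt 79'))   # ['79']
print(standalone_numbers('room 12, block b7, floor 3'))   # ['12', '3']
```

Note that \b treats "." as a boundary too, so an input like '3.14' would come back as two separate matches; whether that matters depends on what the data looks like.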

Ethan Furman

Jul 2, 2009, 1:12:32 PM
to Python

ARGH!!

Okay, I need two \\ so I'm not trying to match a backspace. I knew
(okay, hoped ;) I would figure it out once I posted the question and
moved on.
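
A quick check (Python 3 sketch, variable names are illustrative) that doubling the backslashes and using a raw string produce the very same pattern:

```python
import re

doubled = '\\b\\d+\\b'   # each \\ becomes one literal backslash
raw = r'\b\d+\b'         # raw string: backslashes are kept as typed

print(doubled == raw)    # True
print(re.findall(doubled, 'this is test a3 attempt 79'))  # ['79']
```

Either spelling works; the raw string is simply easier to read and harder to get wrong.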

*sheepish grin*

Ethan Furman

Jul 6, 2009, 1:53:31 PM
to Ethan Furman, Python
Many thanks to all who replied! And, yes, I will *definitely* use raw
strings from now on. :)

~Ethan~
