a list/re problem

Ed Keith

Dec 11, 2009, 3:49:42 PM

to pytho...@python.org

I have a problem and I am trying to find a solution to it that is both
efficient and elegant.

I have a list, call it 'l':

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

Notice that some of the items in the list start and end with an '*'. I wish to construct a new list, call it 'n', which contains all the members of l that start and end with '*', with the '*'s removed.

So in the case above n would be ['nbh', 'jkjsdfjasd']

the following works:

import re

r = re.compile(r'\*(.+)\*')

def f(s):
    m = r.match(s)
    if m:
        return m.group(1)
    else:
        return ''

n = [f(x) for x in l if r.match(x)]

But it is inefficient, because it is matching the regex twice for each item, and it is a bit ugly.

I could use:

n = []
for x in l:
    m = r.match(x)
    if m:
        n.append(m.group(1))

It is more efficient, but much uglier.

Does anyone have a better solution?

Thanks,

-EdK

Ed Keith
e_...@yahoo.com

Blog: edkeith.blogspot.com

Andre Engels

Dec 11, 2009, 4:01:06 PM

to Ed Keith, pytho...@python.org

Regexes seem like the proverbial sledgehammer to crack a nut here.
Note that '*', if it is present, is always 1 character, so we can
write:

n = [x[1:-1] for x in l if x.startswith("*") and x.endswith("*")]

--
André Engels, andre...@gmail.com

Vlastimil Brom

Dec 11, 2009, 4:02:23 PM

to pytho...@python.org

2009/12/11 Ed Keith <e_...@yahoo.com>:

> I have a problem and I am trying to find a solution to it that is both
> efficient and elegant.
>
> I have a list call it 'l':
>
> l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
>
> Notice that some of the items in the list start and end with an '*'. I wish to construct a new list, call it 'n' which is all the members of l that start and end with '*', with the '*'s removed.
>
> So in the case above n would be ['nbh', 'jkjsdfjasd']
>
> the following works:
>
> r = re.compile(r'\*(.+)\*')
>
> def f(s):
>     m = r.match(s)
>     if m:
>         return m.group(1)
>     else:
>         return ''
>
> n = [f(x) for x in l if r.match(x)]
>
>
>
> But it is inefficient, because it is matching the regex twice for each item, and it is a bit ugly.
>
> I could use:
>
>
> n = []
> for x in l:
>     m = r.match(x)
>     if m:
>         n.append(m.group(1))
>
>
> It is more efficient, but much uglier.
>
> Does anyone have a better solution?
>

Hi,
maybe you could use a list comprehension or the equivalent loop just
using the string methods and slicing?

>>> lst = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
>>> [item[1:-1] for item in lst if (item.startswith("*") and item.endswith("*"))]
['nbh', 'jkjsdfjasd']
>>>

hth,
vbr

Grant Edwards

Dec 11, 2009, 4:02:45 PM

On 2009-12-11, Ed Keith <e_...@yahoo.com> wrote:
> I have a problem and I am trying to find a solution to it that is both
> efficient and elegant.
>
> I have a list call it 'l':
>
> l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

> Notice that some of the items in the list start and end with
> an '*'. I wish to construct a new list, call it 'n' which is
> all the members of l that start and end with '*', with the
> '*'s removed.
>
> So in the case above n would be ['nbh', 'jkjsdfjasd']

[s[1:-1] for s in l if (s[0] == s[-1] == '*')]

--
Grant Edwards (grante at visi.com)
Yow! Used staples are good with SOY SAUCE!

Tim Chase

Dec 11, 2009, 4:05:36 PM

to Ed Keith, pytho...@python.org

> l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
>
> Notice that some of the items in the list start and end with an '*'. I wish to construct a new list, call it 'n' which is all the members of l that start and end with '*', with the '*'s removed.
>
> So in the case above n would be ['nbh', 'jkjsdfjasd']
>

> the following works:
>
> r = re.compile(r'\*(.+)\*')
>
> def f(s):
>     m = r.match(s)
>     if m:
>         return m.group(1)
>     else:
>         return ''
>
> n = [f(x) for x in l if r.match(x)]
>
> But it is inefficient, because it is matching the regex twice for each item, and it is a bit ugly.

You can skip the function by writing that as

n = [r.match(s).group(1) for s in l if r.match(s)]

but it doesn't solve your match-twice problem.

I'd skip regexps completely and do something like

n = [s[1:-1] for s in l
     if s.startswith('*')
     and s.endswith('*')
     ]

And this is coming from a guy that tends to overuse regexps :)

-tkc

Neil Cerutti

Dec 11, 2009, 4:16:25 PM

On 2009-12-11, Grant Edwards <inv...@invalid.invalid> wrote:
> [s[1:-1] for s in l if (s[0] == s[-1] == '*')]

That last bit doesn't work right, does it, since an == expression
evaluates to True or False, not the true or false value itself?

--
Neil Cerutti

Peter Otten

Dec 11, 2009, 4:24:07 PM

Ed Keith wrote:

> It is more efficient, but much uglier.

It's efficient and easy to understand; maybe you have to readjust your
taste.

> Does anyone have a better solution?

In this case an approach based on string slicing is probably best. When the
regular expression gets more complex you can use a nested generator
expression:

>>> items = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
>>> match = re.compile(r"\*(.+)\*").match
>>> [m.group(1) for m in (match(s) for s in items) if m is not None]
['nbh', 'jkjsdfjasd']

Peter

Grant Edwards

Dec 11, 2009, 4:30:57 PM

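On 2009-12-11, Neil Cerutti <ne...@norwich.edu> wrote:
> That last bit doesn't work right, does it, since an == expression
> evaluates to True or False, not the true or false value itself?

It works for me. Doesn't it work for you?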

From the fine manual (section 5.9. Comparisons):

Comparisons can be chained arbitrarily, e.g., x < y <= z is
equivalent to x < y and y <= z, except that y is evaluated
only once (but in both cases z is not evaluated at all when x
< y is found to be false).
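
A small sketch of what that paragraph means for the expression in question. The helper names and the sample strings are made up for illustration; the second helper shows the explicitly parenthesised reading, which is not what Python does with a chained comparison.

```python
# Chained comparison: a == b == c means (a == b) and (b == c).
def starred(s):
    return s[0] == s[-1] == '*'

# With explicit parentheses the meaning changes: the bool result of the
# first comparison is compared against '*', which is never equal.
def starred_wrong(s):
    return (s[0] == s[-1]) == '*'

samples = ['*nbh*', 'asc', '*x', 'aba']
print([starred(s) for s in samples])        # [True, False, False, False]
print([starred_wrong(s) for s in samples])  # [False, False, False, False]
```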

--
Grant Edwards (grante at visi.com)
Yow! Hand me a pair of leather pants and a CASIO keyboard -- I'm living for today!

Ed Keith

Dec 11, 2009, 5:31:46 PM

to pytho...@python.org, Peter Otten

--- On Fri, 12/11/09, Peter Otten <__pet...@web.de> wrote:

I am going to use string slicing; re is the wrong tool for the job. But this is what I was looking for when I posted: simple, elegant and efficient.

Thanks all,

Matt Nordhoff

Dec 11, 2009, 7:21:10 PM

to pytho...@python.org

Grant Edwards wrote:
> On 2009-12-11, Ed Keith <e_...@yahoo.com> wrote:
>> I have a problem and I am trying to find a solution to it that is both
>> efficient and elegant.
>>
>> I have a list call it 'l':
>>
>> l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
>
>> Notice that some of the items in the list start and end with
>> an '*'. I wish to construct a new list, call it 'n' which is
>> all the members of l that start and end with '*', with the
>> '*'s removed.
>>
>> So in the case above n would be ['nbh', 'jkjsdfjasd']
>
> [s[1:-1] for s in l if (s[0] == s[-1] == '*')]

s[0] and s[-1] raise an IndexError if l contains an empty string.

Better something like:

>>> [s[1:-1] for s in l if (s[:1] == s[-1:] == '*')]

Or just the slightly more verbose startswith/endswith version.
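
A quick sketch of the difference, using a made-up list that includes an empty string and a lone '*'. Note that a single '*' passes the test in both versions (it is its own first and last character) and yields an empty string; whether that is wanted is an assumption left open here.

```python
l = ['asc', '*nbh*', '', '*', '*jkjsdfjasd*']

# Indexing fails as soon as the empty string is reached.
try:
    indexed = [s[1:-1] for s in l if s[0] == s[-1] == '*']
except IndexError as exc:
    indexed = None
    print("IndexError:", exc)

# Slicing never raises: ''[:1] is simply ''.
sliced = [s[1:-1] for s in l if s[:1] == s[-1:] == '*']
print(sliced)  # ['nbh', '', 'jkjsdfjasd']
```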
--
Matt Nordhoff

Steven D'Aprano

Dec 11, 2009, 8:55:51 PM

On Fri, 11 Dec 2009 12:49:42 -0800, Ed Keith wrote:

> I have a problem and I am trying to find a solution to it that is both
> efficient and elegant.
>
> I have a list call it 'l':
>
> l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
>
> Notice that some of the items in the list start and end with an '*'. I
> wish to construct a new list, call it 'n' which is all the members of l
> that start and end with '*', with the '*'s removed.
>
> So in the case above n would be ['nbh', 'jkjsdfjasd']
>
> the following works:
>
> r = re.compile(r'\*(.+)\*')

[snip]

Others have suggested using a list comp. Just to be different, here's a
version using filter and map.

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

l = map(
    lambda s: s[1:-1] if s.startswith('*') and s.endswith('*') else '', l)
l = filter(None, l)
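
For anyone running this on Python 3 (an assumption; the snippet above reads like Python 2, where map and filter return lists), here is a sketch of the same idea with an explicit list() call. The extra '**' item and the names stripped and result are added for illustration: because filter(None, ...) drops every empty string, '**' disappears here, whereas the list comprehension versions would keep it as ''.

```python
l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr', '**']

stripped = map(
    lambda s: s[1:-1] if s.startswith('*') and s.endswith('*') else '', l)

# In Python 3, map() and filter() return lazy iterators, so list() is
# needed to get an actual list back.
result = list(filter(None, stripped))
print(result)  # ['nbh', 'jkjsdfjasd']
```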

--
Steven

Lie Ryan

Dec 12, 2009, 10:52:30 AM

import re

r = re.compile(r'\*(.+)\*')

def f(s):
    m = r.match(s)
    if m:
        return m.group(1)

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

n = [y for y in (f(x) for x in l) if y]

Lie Ryan

Dec 12, 2009, 11:22:19 AM

On 12/12/2009 8:24 AM, Peter Otten wrote:
>>
>> But it is inefficient, because it is matching the regex twice for each
>> item, and it is a bit ugly.
>>
>> I could use:
>>
>>
>> n = []
>> for x in l:
>>     m = r.match(x)
>>     if m:
>>         n.append(m.group(1))
>>
>>
>> It is more efficient, but much uglier.
>
> It's efficient and easy to understand; maybe you have to readjust your
> taste.

I agree, it's easy to understand, but it's also ugly because of the
level of indentation (which is too deep for such a simple problem).

>> Does anyone have a better solution?

(sorry to ramble around)

A few months ago, I suggested an improvement on the python-ideas list to
add a post-filter to list comprehensions, something along these lines:

a = [f(x) as F for x in l if c(F)]

where the evaluation of f(x) will be the value of F so F can be used in
the if-expression as a post-filter (complementing list-comps' pre-filter).

Many doubted its usefulness since they say it's easy to wrap in another
list-comp:
a = [y for y in (f(x) for x in l) if c(y)]
or with a map and filter
a = filter(None, map(f, l))

Up till now, I don't really like the alternatives.
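
A note for readers on later Python versions: assignment expressions (PEP 572, Python 3.8) did not exist when this thread was written and are not the syntax proposed above, but they cover the same "compute once, then filter on the result" case. A minimal sketch applied to the original problem, assuming Python 3.8 or later:

```python
import re

r = re.compile(r'\*(.+)\*')
l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

# (m := r.match(s)) binds the match object and also serves as the
# filter condition, so each string is matched exactly once.
n = [m.group(1) for s in l if (m := r.match(s))]
print(n)  # ['nbh', 'jkjsdfjasd']
```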

Nobody

Dec 12, 2009, 3:41:46 PM

On Fri, 11 Dec 2009 12:49:42 -0800, Ed Keith wrote:

> the following works:
>
> r = re.compile(r'\*(.+)\*')
>
> def f(s):
>     m = r.match(s)
>     if m:
>         return m.group(1)
>     else:
>         return ''
>
> n = [f(x) for x in l if r.match(x)]
>
>
>
> But it is inefficient, because it is matching the regex twice for each
> item, and it is a bit ugly.

> Does anyone have a better solution?

Use a language with *real* list comprehensions?

Flamebait aside, you can use another level of comprehension, i.e.:

n = [m.group(1) for m in (r.match(x) for x in l) if m]

Neil Cerutti

Dec 14, 2009, 8:38:29 AM

On 2009-12-11, Grant Edwards <inv...@invalid.invalid> wrote:
> On 2009-12-11, Neil Cerutti <ne...@norwich.edu> wrote:
>> On 2009-12-11, Grant Edwards <inv...@invalid.invalid> wrote:
>>> [s[1:-1] for s in l if (s[0] == s[-1] == '*')]
>>
>> That last bit doesn't work right, does it, since an == expression
>> evaluates to True or False, not the true or false value itself?
>
> It works for me. Doesn't it work for you?
>
> From the fine manual (section 5.9. Comparisons):
>
> Comparisons can be chained arbitrarily, e.g., x < y <= z is
> equivalent to x < y and y <= z, except that y is evaluated
> only once (but in both cases z is not evaluated at all when x
> < y is found to be false).

I did not know that. Thanks, Grant.

--
Neil Cerutti

Aahz

Dec 28, 2009, 8:35:22 PM

In article <mailman.1744.1260564...@python.org>,

Ed Keith <e_...@yahoo.com> wrote:
>
>I have a list call it 'l':
>
>l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
>
>Notice that some of the items in the list start and end with an '*'. I
>wish to construct a new list, call it 'n' which is all the members of l
>that start and end with '*', with the '*'s removed.

What kind of guarantee do you have that the asterisk will only exist on
the first and last character, if at all?
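
To make the question concrete, here is a sketch comparing the regex from the original post with the startswith/endswith approach on a few made-up strings that have asterisks in other positions. The names samples, results, by_regex and by_slice are illustrative only. The regex is not anchored at the end, so the two approaches can disagree.

```python
import re

r = re.compile(r'\*(.+)\*')
samples = ['*nbh*', '*a*b', '*a*b*', 'x*y*']

results = {}
for s in samples:
    m = r.match(s)
    by_regex = m.group(1) if m else None
    by_slice = s[1:-1] if s.startswith('*') and s.endswith('*') else None
    results[s] = (by_regex, by_slice)
    print(repr(s), by_regex, by_slice)

# '*a*b' is accepted by the regex (giving 'a') but rejected by the
# string-method test, because the string does not end with '*'.
```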
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

Looking back over the years, after I learned Python I realized that I
never really had enjoyed programming before.

Steven D'Aprano

Dec 28, 2009, 8:50:18 PM

On Mon, 28 Dec 2009 17:35:22 -0800, Aahz wrote:

> In article <mailman.1744.1260564...@python.org>, Ed
> Keith <e_...@yahoo.com> wrote:
>>
>>I have a list call it 'l':
>>
>>l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
>>
>>Notice that some of the items in the list start and end with an '*'. I
>>wish to construct a new list, call it 'n' which is all the members of l
>>that start and end with '*', with the '*'s removed.
>
> What kind of guarantee do you have that the asterisk will only exist on
> the first and last character, if at all?

Does it matter?

In any case, surely the simplest solution is to eschew regular
expressions and do it the easy way.

result = [s[1:-1] for s in l if s.startswith('*') and s.endswith('*')]

For a more general solution, I'd use a pair of helper functions:

def bracketed_by(s, prefix, suffix=None):
    if suffix is None:
        suffix = prefix
    return s.startswith(prefix) and s.endswith(suffix)

def strip_brackets(s, prefix, suffix=None):
    if suffix is None:
        suffix = prefix
    return s[len(prefix):-len(suffix)]

Note that I haven't tested these two helper functions. The second in
particular may not work correctly in some corner cases (e.g. passing the
empty string as suffix).
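
One possible way to handle that corner case, offered as a sketch rather than a tested replacement: compute the end index from len(s) instead of using a negative index, since s[a:-0] is the same as s[a:0] and is always empty. The '<b>text' example is made up for illustration.

```python
def bracketed_by(s, prefix, suffix=None):
    if suffix is None:
        suffix = prefix
    return s.startswith(prefix) and s.endswith(suffix)

def strip_brackets(s, prefix, suffix=None):
    if suffix is None:
        suffix = prefix
    # len(s) - len(suffix) stays correct when suffix is '', whereas
    # -len(suffix) would become -0 and produce an empty slice.
    return s[len(prefix):len(s) - len(suffix)]

l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
result = [strip_brackets(s, '*') for s in l if bracketed_by(s, '*')]
print(result)                                # ['nbh', 'jkjsdfjasd']
print(strip_brackets('<b>text', '<b>', ''))  # text
```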

--
Steven