Array of Chars to String

James Stroud

unread,

Apr 19, 2005, 4:33:17 PM4/19/05

to pytho...@python.org

Hello,

I am looking for a nice way to take only those charachters from a string that
are in another string and make a new string:

>>> astr = "Bob Carol Ted Alice"
>>> letters = "adB"
>>> some_func(astr,letters)
"Bad"

I can write this like this:

astr = "Bob Carol Ted Alice"
letters = "adB"

import sets
alist = [lttr for lttr in astr if lttr in Set(letters)]
newstr = ""
for lttr in alist:
newstr += lttr

But this seems ugly. I especially don't like "newstr += lttr" because it makes
a new string every time. I am thinking that something like this has to be a
function somewhere already or that I can make it more efficient using a
built-in tool.

Any ideas?

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/

Michael Spencer

unread,

Apr 19, 2005, 4:40:27 PM4/19/05

to pytho...@python.org

James Stroud wrote:
> Hello,
>
> I am looking for a nice way to take only those charachters from a string that
> are in another string and make a new string:
>
>
>>>>astr = "Bob Carol Ted Alice"
>>>>letters = "adB"
>>>>some_func(astr,letters)
>
> "Bad"
>
> I can write this like this:
>
> astr = "Bob Carol Ted Alice"
> letters = "adB"
>
> import sets
> alist = [lttr for lttr in astr if lttr in Set(letters)]
> newstr = ""
> for lttr in alist:
> newstr += lttr
>
> But this seems ugly. I especially don't like "newstr += lttr" because it makes
> a new string every time. I am thinking that something like this has to be a
> function somewhere already or that I can make it more efficient using a
> built-in tool.
>
> Any ideas?
>
> James
>

How about:

>>> astr = "Bob Carol Ted Alice"
>>> letters = "adB"

>>> "".join(letter for letter in astr if letter in set(letters))
'Bad'
>>>
Michael

Alexander Schmolck

unread,

Apr 19, 2005, 4:42:18 PM4/19/05

to

James Stroud <jst...@mbi.ucla.edu> writes:

> But this seems ugly. I especially don't like "newstr += lttr" because it makes
> a new string every time. I am thinking that something like this has to be a
> function somewhere already or that I can make it more efficient using a
> built-in tool.

"".join

'as

rbt

unread,

Apr 19, 2005, 4:50:24 PM4/19/05

to

James Stroud wrote:
> Hello,
>
> I am looking for a nice way to take only those charachters from a string that
> are in another string and make a new string:
>
>
>>>>astr = "Bob Carol Ted Alice"
>>>>letters = "adB"
>>>>some_func(astr,letters)
>
> "Bad"

astr = "Bob Carol Ted Alice"
letters = "adB"

both = [x for x in astr if x in letters]
print both

>>>
['B', 'a', 'd']
>>>

Facundo Batista

unread,

Apr 19, 2005, 4:54:04 PM4/19/05

to James Stroud, pytho...@python.org

On 4/19/05, James Stroud <jst...@mbi.ucla.edu> wrote:

> astr = "Bob Carol Ted Alice"
> letters = "adB"
>

> import sets
> alist = [lttr for lttr in astr if lttr in Set(letters)]
> newstr = ""
> for lttr in alist:
> newstr += lttr

>>> astr = "Bob Carol Ted Alice"
>>> letters = "adB"
>>> s1 = set(astr) # set is a core data type in Py2.4
>>> s2 = set(letters)
>>> ''.join(s1 & s2)
'aBd'

>>> . Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

Bengt Richter

unread,

Apr 19, 2005, 7:07:42 PM4/19/05

to

On Tue, 19 Apr 2005 13:33:17 -0700, James Stroud <jst...@mbi.ucla.edu> wrote:

>Hello,
>
>I am looking for a nice way to take only those charachters from a string that
>are in another string and make a new string:
>
>>>> astr = "Bob Carol Ted Alice"
>>>> letters = "adB"
>>>> some_func(astr,letters)
>"Bad"
>
>I can write this like this:
>
>astr = "Bob Carol Ted Alice"
>letters = "adB"
>
>import sets
>alist = [lttr for lttr in astr if lttr in Set(letters)]
>newstr = ""
>for lttr in alist:
> newstr += lttr
>
>But this seems ugly. I especially don't like "newstr += lttr" because it makes
>a new string every time. I am thinking that something like this has to be a
>function somewhere already or that I can make it more efficient using a
>built-in tool.
>
>Any ideas?
>
>James
>

I think this will be worth it if your string to modify is _very_ long:

>>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
... return s.translate(table,
... ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
...
>>> some_func("Bob Carol Ted Alice", 'adB')
'Bad'

see help(str.translate)

If you want to use it in a loop, with the same "letters" I'd want to eliminate the repeated
calculation of the deletions. You could make a factory function that returns a function
that uses deletions from a closure cell. But don't optimize prematurely ;-)

Regards,
Bengt Richter

Michael Spencer

unread,

Apr 19, 2005, 7:28:07 PM4/19/05

to pytho...@python.org

Bengt Richter wrote:
> I think this will be worth it if your string to modify is _very_ long:
>
> >>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
> ... return s.translate(table,
> ... ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
> ...
> >>> some_func("Bob Carol Ted Alice", 'adB')
> 'Bad'
>

According to my measurements the string doesn't have to be long at all before
your method is faster - cool use of str.translate:

>>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
... return s.translate(table,
... ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
...
>>> some_func("Bob Carol Ted Alice", 'adB')
'Bad'

>>> def func_join(s, letters):
... return "".join(letter for letter in s if letter in set(letters))
...
>>> def func_join1(s, letters):
... return "".join(letter for letter in s if letter in letters)

>>> for multiplier in (1, 10, 100, 1000, 10000):
... print "List multiplier: %s" % multiplier
... print shell.timefunc(func_join, "Bob Carol Ted Alice" * multiplier, 'adB')
... print shell.timefunc(func_join1, "Bob Carol Ted Alice" * multiplier,
'adB')
... print shell.timefunc(some_func, "Bob Carol Ted Alice" * multiplier, 'adB')
...
List multiplier: 1
func_join(...) 11267 iterations, 44.38usec per call
func_join1(...) 38371 iterations, 13.03usec per call
some_func(...) 1230 iterations, 406.69usec per call
List multiplier: 10
func_join(...) 1381 iterations, 362.40usec per call
func_join1(...) 7984 iterations, 62.63usec per call
some_func(...) 1226 iterations, 407.94usec per call
List multiplier: 100
func_join(...) 140 iterations, 3.59msec per call
func_join1(...) 873 iterations, 0.57msec per call
some_func(...) 1184 iterations, 422.42usec per call
List multiplier: 1000
func_join(...) 15 iterations, 35.50msec per call
func_join1(...) 90 iterations, 5.57msec per call
some_func(...) 949 iterations, 0.53msec per call
List multiplier: 10000
func_join(...) 2 iterations, 356.53msec per call
func_join1(...) 9 iterations, 55.59msec per call
some_func(...) 313 iterations, 1.60msec per call
>>>

Michael

Michael Spencer

unread,

Apr 19, 2005, 8:00:02 PM4/19/05

to pytho...@python.org

Michael Spencer wrote:

> Bengt Richter wrote:
> > I think this will be worth it if your string to modify is _very_ long:
>
>>
>> >>> def some_func(s, letters, table=''.join([chr(i) for i in
>> xrange(256)])):
>> ... return s.translate(table,
>> ... ''.join([chr(i) for i in xrange(256) if chr(i) not in
>> letters]))
>> ...
>> >>> some_func("Bob Carol Ted Alice", 'adB')
>> 'Bad'
>>

> According to my measurements the string doesn't have to be long at all
> before your method is faster - cool use of str.translate:
>

...and here's a version that appears faster than "".join across all lengths of
strings:
>>> import string
>>> def some_func1(s, letters, table=string.maketrans("","")):
... return s.translate(table, table.translate(table, letters))
...
>>> some_func1("Bob Carol Ted Alice", "adB")
'Bad'
>>>

Timings follow:

>>> def some_func(s, letters, table=''.join([chr(i) for i in xrange(256)])):
... return s.translate(table,
... ''.join([chr(i) for i in xrange(256) if chr(i) not in letters]))
...

>>> def some_func1(s, letters, table=string.maketrans("","")):
... return s.translate(table, table.translate(table, letters))
...

>>> for multiplier in (1, 10, 100, 1000, 10000):
... print "List multiplier: %s" % multiplier

... print shell.timefunc(some_func, "Bob Carol Ted Alice" * multiplier, 'adB')

... print shell.timefunc(some_func1, "Bob Carol Ted Alice" * multiplier,

'adB')
...
List multiplier: 1

some_func(...) 1224 iterations, 408.57usec per call
some_func1(...) 61035 iterations, 8.19usec per call
List multiplier: 10
some_func(...) 1223 iterations, 408.95usec per call
some_func1(...) 54420 iterations, 9.19usec per call
List multiplier: 100
some_func(...) 1190 iterations, 420.48usec per call
some_func1(...) 23436 iterations, 21.34usec per call
List multiplier: 1000
some_func(...) 951 iterations, 0.53msec per call
some_func1(...) 3870 iterations, 129.21usec per call
List multiplier: 10000
some_func(...) 309 iterations, 1.62msec per call
some_func1(...) 417 iterations, 1.20msec per call
>>>

Bengt Richter

unread,

Apr 19, 2005, 8:40:58 PM4/19/05

to

On Tue, 19 Apr 2005 17:00:02 -0700, Michael Spencer <ma...@telcopartners.com> wrote:

>Michael Spencer wrote:
>> Bengt Richter wrote:
>> > I think this will be worth it if your string to modify is _very_ long:
>>
>>>
>>> >>> def some_func(s, letters, table=''.join([chr(i) for i in
>>> xrange(256)])):
>>> ... return s.translate(table,
>>> ... ''.join([chr(i) for i in xrange(256) if chr(i) not in
>>> letters]))
>>> ...
>>> >>> some_func("Bob Carol Ted Alice", 'adB')
>>> 'Bad'
>>>
>> According to my measurements the string doesn't have to be long at all
>> before your method is faster - cool use of str.translate:
>>
>...and here's a version that appears faster than "".join across all lengths of
>strings:
> >>> import string
> >>> def some_func1(s, letters, table=string.maketrans("","")):
> ... return s.translate(table, table.translate(table, letters))
> ...
> >>> some_func1("Bob Carol Ted Alice", "adB")
> 'Bad'
> >>>
>

Good one! ;-)

BTW, since str has .translate, why not .maketrans?

Anyway, this will be something to keep in mind when doing character-based joinery ;-)

>Timings follow:
Let's just say improved ;-)
(or see parent post)

Regards,
Bengt Richter

Peter Otten

unread,

Apr 20, 2005, 2:58:55 AM4/20/05

to

Michael Spencer wrote:

> >>> def func_join(s, letters):
> ... return "".join(letter for letter in s if letter in set(letters))

Make that

def func_join(s, letters):
letter_set = set(letters)
return "".join(letter for letter in s if letter in letter_set)

for a fair timing of a set lookup as opposed to set creation.

Peter

Scott David Daniels

unread,

Apr 20, 2005, 11:26:13 AM4/20/05

to

Bengt Richter wrote:
> ... BTW, since str has .translate, why not .maketrans?
Probably because, while I can imagine u'whatever'.translate using a
256-wide table (and raising exceptions for other the rest), I have
more problems imagining the size of the table for a UCS-4 unicode
setup (32 bits per character). I suppose it could be done, but a
naïve program might be in for a big shock about memory consumption.

--Scott David Daniels
Scott....@Acm.Org

Michael Spencer

unread,

Apr 20, 2005, 2:19:16 PM4/20/05

to pytho...@python.org

Sorry - yes! I trip up over the early-binding of the outer loop, but the
late-binding of the condition

Anyway, here are the revised timings, which confirm the speed-advantage of the
translate approach. And, as before, with such a short list of white-listed
letters, it does not pay to create a set at all, even outside the loop. Note
the speed advantage of func_translate1 is 50:1 for long strings, so as Bengt
pointed out, it's worth keeping this in mind for character-based filtering/joining.

>>> def func_join1(s, letters):
... return "".join(letter for letter in s if letter in letters)
...
>>> def func_join2(s, letters):
... letter_set = set(letters)
... return "".join(letter for letter in s if letter in letter_set)
...
>>> def func_translate1(s, letters, table=string.maketrans("","")):

... return s.translate(table, table.translate(table, letters))
...

>>> for multiplier in (1, 10, 100, 1000, 10000):

... print "List multiplier: %s" % multiplier

... print shell.timefunc(func_translate1, "Bob Carol Ted Alice" *

multiplier, 'adB')
... print shell.timefunc(func_join1, "Bob Carol Ted Alice" * multiplier,
'adB')

... print shell.timefunc(func_join2, "Bob Carol Ted Alice" * multiplier,

'adB')
...
List multiplier: 1

func_translate1(...) 62295 iterations, 8.03usec per call
func_join1(...) 36510 iterations, 13.69usec per call
func_join2(...) 30139 iterations, 16.59usec per call
List multiplier: 10
func_translate1(...) 53145 iterations, 9.41usec per call
func_join1(...) 7821 iterations, 63.93usec per call
func_join2(...) 7031 iterations, 71.12usec per call
List multiplier: 100
func_translate1(...) 23170 iterations, 21.58usec per call
func_join1(...) 858 iterations, 0.58msec per call
func_join2(...) 777 iterations, 0.64msec per call
List multiplier: 1000
func_translate1(...) 3761 iterations, 132.96usec per call
func_join1(...) 87 iterations, 5.76msec per call
func_join2(...) 81 iterations, 6.18msec per call
List multiplier: 10000
func_translate1(...) 407 iterations, 1.23msec per call
func_join1(...) 9 iterations, 56.27msec per call
func_join2(...) 8 iterations, 64.76msec per call
>>>

Kent Johnson

unread,

Apr 20, 2005, 3:10:12 PM4/20/05

to

Michael Spencer wrote:
> Anyway, here are the revised timings...

> ... print shell.timefunc(func_translate1, "Bob Carol Ted Alice" *
> multiplier, 'adB')

What is shell.timefunc?

Thanks,
Kent

Michael Spencer

unread,

Apr 20, 2005, 3:33:59 PM4/20/05

to pytho...@python.org

This snippet, which I attach to my interactive shell, since I find timeit
awkward to use in that context:

def _get_timer():
if sys.platform == "win32":
return time.clock
else:
return time.time
return

def timefunc(func, *args, **kwds):
timer = _get_timer()
count, totaltime = 0, 0
while totaltime < 0.5:
t1 = timer()
res = func(*args, **kwds)
t2 = timer()
totaltime += (t2-t1)
count += 1
if count > 1000:
unit = "usec"
timeper = totaltime * 1000000 / count
else:
unit = "msec"
timeper = totaltime * 1000 / count
return "%s(...) %s iterations, %.2f%s per call" % \
(func.__name__, count, timeper, unit)

Michael

"Martin v. Löwis"

unread,

Apr 21, 2005, 1:34:09 AM4/21/05

to James Stroud

James Stroud wrote:
> astr = "Bob Carol Ted Alice"
> letters = "adB"

Apparently nobody has proposed this yet:

>>> filter(letters.__contains__, astr)
'Bad'
>>> filter(set(letters).__contains__, astr)
'Bad'

Regards,
Martin

Michael Spencer

unread,

Apr 21, 2005, 1:06:24 PM4/21/05

to pytho...@python.org

Martin v. Löwis wrote:
> Apparently nobody has proposed this yet:
> >>>filter(letters.__contains__, astr)
> 'Bad'
>
> >>>filter(set(letters).__contains__, astr)
> 'Bad'

Everyone is seeking early PEP 3000 compliance ;-)

filter wins on conciseness - it's short enought to use in-line, but for a fair
speed comparison, I've wrapped it in a function, below; str.translate is far
ahead on speed for all but the shortest strings:

def func_translate1(s, letters, table=string.maketrans("","")):

return s.translate(table, table.translate(table, letters))

def func_filter1(s, letters):
in_set = letters.__contains__
return filter(in_set, s)

def func_filter2(s, letters):
in_set = set(letters).__contains__
return filter(in_set, s)

>>> for m in (1, 10, 100, 1000, 10000):
... s = "Bob Carol Ted Alice" * m
... letters = "adB"
... print "List length: %s" % len(s)
... print shell.timefunc(func_translate1, s, letters)
... print shell.timefunc(func_filter1, s, letters)
... print shell.timefunc(func_filter2, s, letters)
...
List length: 19
func_translate1(...) 64179 iterations, 7.79usec per call
func_filter1(...) 63706 iterations, 7.85usec per call
func_filter2(...) 45336 iterations, 11.03usec per call
List length: 190
func_translate1(...) 54950 iterations, 9.10usec per call
func_filter1(...) 12224 iterations, 40.90usec per call
func_filter2(...) 10737 iterations, 46.57usec per call
List length: 1900
func_translate1(...) 22760 iterations, 21.97usec per call
func_filter1(...) 1293 iterations, 386.87usec per call
func_filter2(...) 1184 iterations, 422.52usec per call
List length: 19000
func_translate1(...) 3713 iterations, 134.67usec per call
func_filter1(...) 137 iterations, 3.67msec per call
func_filter2(...) 124 iterations, 4.05msec per call
List length: 190000
func_translate1(...) 426 iterations, 1.18msec per call
func_filter1(...) 14 iterations, 38.29msec per call
func_filter2(...) 13 iterations, 40.59msec per call
>>>

Michael

Bengt Richter

unread,

Apr 21, 2005, 1:32:27 PM4/21/05

to

Baaad ;-)
But since I'm playing the other side of the table for
the moment, isn't filter to be deprecated?

Regards,
Bengt Richter

"Martin v. Löwis"

unread,

Apr 21, 2005, 3:32:12 PM4/21/05

to

Bengt Richter wrote:
> But since I'm playing the other side of the table for
> the moment, isn't filter to be deprecated?

How could we know? It might be removed in P3k, but does that
mean it is deprecated, as in "being disapproved", i.e. "being
passed unfavorable judgement on"?

In Python, I consider something deprecated when the documentation
says it is deprecated, or when using it raises a DeprecationWarning.
Neither is the case for filter.

That it is going to be removed in P3k does not bother me much:
I wouldn't be suprised if Python 3000 is released 995 years from
now :-)

Regards,
Martin