[2,3,4,7] --> "2-4,7" ?

George Young

unread,

May 29, 2003, 3:11:21 PM5/29/03

to

[python 2.3a1]
I have a list of "names" like:
['6','7','mx8','mx09','mx10','8','9','10','foo']
which needs for concise and clear display to become a string like:

"6-7,mx8-10,8-10,foo"

I.e., a name is an integer or an alphabetic prefix possibly followed
by an integer. The display must compress each set of succesive names
having the same (possibly null) prefix and sequential integers. The
original order of names must be preserved.

I (hesitantly) post the following code that works, but makes me cringe
anytime I see it, for it's clumsy unreadableness. Can someone with
a fresh mind see a clear and concise way to make my clear and concise
name display?

names = ['6','7','mx8','mx09','mx10','8','9','10','foo']
groups = []
import re

def collapse(x,y):
if x and x[-1][1] and y[1] and x[-1][0] == y[0] and int(x[-1][2]) == (int(y[2])-1): x[-1][2] = y[2]
return x
else:
x.append(y)
return x

groups = []
for n in names:
r = re.compile('\d*$').search(n)
groups.append([n[0:r.start()], n[r.start():r.end()], n[r.start():r.end()]])

r = reduce(collapse, groups, [])
s=[]
for i in r:
if i[1] == i[2]:
n=i[1]
else:
n=i[1] + '-' + i[2]
s.append(i[0] + n)

print ','.join(s)

Mike C. Fletcher

unread,

May 29, 2003, 3:56:23 PM5/29/03

to

This might be a little easier to read (though actually less concise),
though I haven't tried to follow your code to figure out if it's exactly
equivalent. I also haven't bothered with re-formatting the groups into
range statements, I leave them as data-records from which you can easily
create the range statements:

import re
names = ['6','7','5','mx8','mx09','mx10','8','9','10','foo','5','this']

def collapse( names ):
"""Collapse ranges of names"""
result = []
current = []
currentPrefix = ''
for item in names:
a,b = split( item )
if a == currentPrefix and current and b == current[-1]+1:
current.append( b )
else:
if current:
# need to put previous result in result-set
result.append( (currentPrefix,current))
# now process the new item...
if b is None:
# no number, so just add record to list
result.append( (a,[]))
current = []
else:
current = [b]
currentPrefix = a
if current:
result.append( (currentPrefix,current))
return result

matcher = re.compile( '^(?P<prefix>\D*)(?P<number>\d*)$' )

def split( s ):
"""Get prefix and integer value (or None) for a string"""
a,b = matcher.match( s ).groups()
if b:
b = int( b, 10 )
return a, b or None

print collapse( names )

Enjoy,
Mike

George Young wrote:

>[python 2.3a1]
>I have a list of "names" like:
> ['6','7','mx8','mx09','mx10','8','9','10','foo']
>which needs for concise and clear display to become a string like:
>
> "6-7,mx8-10,8-10,foo"
>
>I.e., a name is an integer or an alphabetic prefix possibly followed
>by an integer. The display must compress each set of succesive names
>having the same (possibly null) prefix and sequential integers. The
>original order of names must be preserved.
>
>I (hesitantly) post the following code that works, but makes me cringe
>anytime I see it, for it's clumsy unreadableness. Can someone with
>a fresh mind see a clear and concise way to make my clear and concise
>name display?
>
>

...

_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/

Raymond Hettinger

unread,

May 29, 2003, 4:34:54 PM5/29/03

to

"George Young" <g...@ll.mit.edu> wrote in message
news:pan.2003.05.29....@ll.mit.edu...

> [python 2.3a1]
> I have a list of "names" like:
> ['6','7','mx8','mx09','mx10','8','9','10','foo']
> which needs for concise and clear display to become a string like:
>
> "6-7,mx8-10,8-10,foo"
>
> I.e., a name is an integer or an alphabetic prefix possibly followed
> by an integer. The display must compress each set of succesive names
> having the same (possibly null) prefix and sequential integers. The
> original order of names must be preserved.
>
> I (hesitantly) post the following code that works, but makes me cringe
> anytime I see it, for it's clumsy unreadableness. Can someone with
> a fresh mind see a clear and concise way to make my clear and concise
> name display?

It helps to separate the problem into parsing, grouping, and output formatting:

# input parsing
import re
numb = re.compile('(\D*)(\d*)')

names = ['6','7','mx8','mx09','mx10','8','9','10','foo']

pairs = [numb.match(n).groups() for n in names]

# grouping
result = []
currobj, currnum, cnt = None, "", 0
for obj, num in pairs:
if obj == currobj:
cnt += 1
else:
result.append((currobj, currnum, cnt))
currobj, currnum, cnt = obj, num, 1
result.append((currobj, currnum, cnt))

# output formatting
for obj, num, cnt in result[1:]:
if cnt == 1: print obj + num
else: print obj + num + '-' + str(int(num)+cnt)

Raymond Hettinger

Alexander Schmolck

unread,

May 29, 2003, 6:17:27 PM5/29/03

to

George Young <g...@ll.mit.edu> writes:

> [python 2.3a1]
> I have a list of "names" like:
> ['6','7','mx8','mx09','mx10','8','9','10','foo']
> which needs for concise and clear display to become a string like:
>
> "6-7,mx8-10,8-10,foo"
>
> I.e., a name is an integer or an alphabetic prefix possibly followed
> by an integer. The display must compress each set of succesive names
> having the same (possibly null) prefix and sequential integers. The
> original order of names must be preserved.
>
> I (hesitantly) post the following code that works, but makes me cringe
> anytime I see it, for it's clumsy unreadableness. Can someone with
> a fresh mind see a clear and concise way to make my clear and concise
> name display?

Not really:

import re
def compressRanges(l):
if not l: return ""
prefixNumStrTuples = [re.match(r'(\D*)(\d*)', s).groups() for s in l]
lastPrefix, lastNumS = prefixNumStrTuples.pop(0)
startNumS = lastNumS
startNum = lastNum = lastNumS and int(lastNumS)
prefixNumStrTuples.append(("",""))
res = []
for prefix, numS in prefixNumStrTuples:
num = numS and int(numS)
if prefix != lastPrefix or numS and lastNum != num - 1:
if startNum != lastNum:
res.append(lastPrefix + startNumS + '-' + lastNumS)
else:
res.append(lastPrefix + lastNumS)
startNum, startNumS = num, numS
lastNum, lastNumS = num, numS
lastPrefix = prefix
return ','.join(res)

'as

andrew cooke

unread,

May 29, 2003, 8:29:43 PM5/29/03

to

(borrowing r hettinger's neat list comprehension):

import re
numb = re.compile('(\D*)(\d*)')

def group ((list, (last, start, end)), (next, num)):
if last == next:
if int(num) == int(end)+1: return (list, (last, start, num))
if num == start: return (list, (next, num, num))
return (list+[(last, start, end)], (next, num, num))

def fmt ((text, start, end)):
if start == end: return text + start
else: return text + start + "-" + end

def compare ((t1, n1), (t2, n2)):
try: return cmp(t1, t2) or cmp(int(n1), int(n2))
except: return 1

def tidy (list):
if list:
pairs = [numb.match(n).groups() for n in list]
pairs.sort(compare)
(text, num) = pairs[0]
(groups, last) = reduce (group, pairs, ([], (text, num, num)))
return map (fmt, groups+[last])
else: return list

print tidy(['6','7','mx8','mx09','mx10','8','9','10','foo'])

andrew

--
http://www.acooke.org

BOCQUET Jean-Francois

unread,

May 30, 2003, 6:17:50 AM5/30/03

to

hello,
just to notice that your problem is solved in perl with clean code :
sub display {
my ($prefix, $first, $last) = @_;
return "$prefix$first" if $last eq $first;
return "$prefix$first-$last";
}

my @names = qw(6 7 mx8 mx09 mx10 8 9 10 foo);
my @result;

my ($prefix, $integer) = (shift @names) =~ m/(\D*)(\d*)/;
my $last = $integer;
while (my ($new_prefix, $new_integer) = (shift @names) =~ m/(\D*)(\d*)/) {
if ($new_prefix eq $prefix and $new_integer == $last + 1) {
$last += 1;
} else {
push @result, display ($prefix, $integer, $last);
$prefix = $new_prefix;
$integer = $last = $new_integer;
}
last unless @names;
}
push @result, display ($prefix, $integer, $last);

print join (',', @result) . "\n";

or can be solved in a concise way (perhaps someone can do better) :
my @o;
@result = grep {if ($_->[0] eq $o[0] and $_->[1] == $o[2]+1) {$o[2]++;0}
else {($_,@o)=(display(@o),@$_);1}} map {m/(\D*)(\d*)/;[$1,$2, $2]} @names;
push @result, display(@o);shift @result;

"George Young" <g...@ll.mit.edu> wrote in message
news:pan.2003.05.29....@ll.mit.edu...

Peter Hansen

unread,

May 30, 2003, 6:50:00 AM5/30/03

to

BOCQUET Jean-Francois wrote:
>
> hello,
> just to notice that your problem is solved in perl with clean code :

LOL!! Good one. :-)

> sub display {
> my ($prefix, $first, $last) = @_;
> return "$prefix$first" if $last eq $first;
> return "$prefix$first-$last";
> }

[snip rest of line noise]

-bigoted-ly y'rs,
Peter

Manuel Garcia

unread,

May 30, 2003, 1:42:08 PM5/30/03

to

On Thu, 29 May 2003 15:11:21 -0400, George Young <g...@ll.mit.edu>
wrote:

>I have a list of "names" like:
> ['6','7','mx8','mx09','mx10','8','9','10','foo']
>which needs for concise and clear display to become a string like:
>
> "6-7,mx8-10,8-10,foo"

I also see from your subject line that
[2,3,4,7] --> "2-4,7"
I am not sure all the code given before took this into account.

I was thinking about this problem since yesterday, because I have
coded similar stuff at least a dozen times in my life, but I had not
yet found a good way to do it.

I tried Python OO, and I think it turned out pretty clean.

###############################
## gyoung.py

import re

class GroupHelper:
def __init__(self, s='', **d):
m = re.match(r'(\D*)(\d+)$', s)
if m:
g = m.groups()
self.kind = 'pair'
self.prefix = g[0]
self.start = self.end = int(g[1])
else:
self.kind = 'string'
self.prefix = s
self.__dict__.update(d)
def __add__(self, other):
if (self.kind != 'pair') or (other.kind != 'pair'):
raise TypeError
if self.prefix != other.prefix:
raise TypeError
if other.start != (self.end + 1):
raise TypeError
return GroupHelper(
kind='pair',
prefix=self.prefix,
start=self.start,
end=other.end)
def __str__(self):
if self.kind == 'string':
return self.prefix
if self.start == self.end:
return '%s%i' % (self.prefix, self.start)
return '%s%i-%i' % (self.prefix, self.start, self.end)

def gyoung(list0):
if not list0: return ''
list0 = [GroupHelper(s) for s in list0]
result = [list0[0]]
for gh in list0[1:]:
try:
result[-1] += gh
except TypeError:
result.append(gh)
return ','.join( [str(y) for y in result] )

print gyoung(
['6','7','8','12','mx8','mx09','mx10','8','9','10','foo'])

###############################

The main loop is particularly clean, because the class does all the
work.

In the other solutions given, I was worried about what always happened
to me the other dozen times I coded something like this: if there is a
small change to the specification, my old algorithm cannot be simply
adapted.

For example, if you wanted 'foo' to be treated like 'foo00', or if in
the original list, you already had some items in the form
'mx8-12, mx13, mx14-17'
and you wanted the result to be 'mx8-17'
I think with this Python OO approach, you can make such changes
simply. Local changes to the parsing, 'adding', formatting, or the
main loop don't seem to necessitate a global change to the code.

The Perl solution was hilarious! I assume it was done tongue-in-cheek.

Manuel

CezaryB

unread,

May 30, 2003, 1:52:47 PM5/30/03

to

On 5/29/2003 9:11 PM, George Young wrote:
> [python 2.3a1]
> I have a list of "names" like:
> ['6','7','mx8','mx09','mx10','8','9','10','foo']
> which needs for concise and clear display to become a string like:
>
> "6-7,mx8-10,8-10,foo"
>
> I.e., a name is an integer or an alphabetic prefix possibly followed
> by an integer. The display must compress each set of succesive names
> having the same (possibly null) prefix and sequential integers. The
> original order of names must be preserved.
>
> I (hesitantly) post the following code that works, but makes me cringe
> anytime I see it, for it's clumsy unreadableness. Can someone with
> a fresh mind see a clear and concise way to make my clear and concise
> name display?
>

from __future__ import nested_scopes
import re

def collapse( input_sequence ):
search = re.compile( '\d+$' ).search
range_prefix = None
range_start = 0
range_last = 0
item_last = None
result_list = []

def append_last( ):
if range_prefix is not None:
if range_start == range_last:
result_list.append( item_last )
else:
result_list.append( range_prefix + str( range_start ) + '-' + str(
range_last ) )

for item in input_sequence:
found = search( item )
if found is None:
append_last( )
result_list.append( item )
range_prefix = None
continue
start = found.start( )
prefix = item[:start]
value = int( item[start:] )
if prefix == range_prefix and value == range_last + 1:
range_last = value
else:
append_last( )
range_prefix = prefix
range_start = value
range_last = value
item_last = item

append_last( )
return ','.join( result_list )

---
CB

Scott David Daniels

unread,

May 30, 2003, 3:55:47 PM5/30/03

to

Here's a whack at doing it on-the-fly. The tasks are split
as in Raymond Hettinger's solution. Generators allow a nice
local look at several elements in an incoming stream.

from __future__ import generators
import re

def inputparse(source):
splitter = re.compile('(\D*)(\d*)').match
for name in source:
yield splitter(name).groups()

def grouping(pairsource):
source = iter(pairsource)
lastname, lastnumber = source.next()
startnumber = lastnumber
for name, number in source:
if name == lastname and int(number) == int(lastnumber)+1:
lastnumber = number
else:
yield lastname, startnumber, lastnumber
lastname, startnumber, lastnumber = name, number, number
yield lastname, startnumber, lastnumber

def format(lastname, startnumber, lastnumber):
if startnumber is lastnumber:
return lastname + startnumber
else:
return lastname + startnumber + '-' + lastnumber

if __name__ == '__main__':

names = ['6','7','mx8','mx09','mx10','8','9','10','foo']

print ', '.join([format(name, start, stop) for name, start, stop
in grouping(inputparse(names))])

-Scott David Daniels
Scott....@Acm.Org

Alex

unread,

May 30, 2003, 6:51:06 PM5/30/03

to

Manuel Garcia wrote:

<snip>

> In the other solutions given, I was worried about what always happened
> to me the other dozen times I coded something like this: if there is a
> small change to the specification, my old algorithm cannot be simply
> adapted.

<snip>

> or if in
> the original list, you already had some items in the form
> 'mx8-12, mx13, mx14-17'
> and you wanted the result to be 'mx8-17'

<snip>

I agree with the sentiment completely. Having said that, here is how I
amused myself for the last hour:

import re
names=['6','7','mx8','mx09','mx10','8','9','10','foo']
r=re.compile(r'(\D*)(\d*)-?(\d*)')

for ii in xrange(len(names)-1, 0,-1):
(base1, low1, high1),(base2, low2, high2)=r.match(names[ii-1]).groups(), r.match(names[ii]).groups()
if base1==base2 and int(low1)+1==int(low2):
names[ii-1]='%s%d-%d' %(base1, int(low1), int(max(low2.rjust(len(high2)), high2)))
names.pop(ii)

print names

Alex

Bengt Richter

unread,

May 30, 2003, 8:27:48 PM5/30/03

to

On Thu, 29 May 2003 15:11:21 -0400, George Young <g...@ll.mit.edu> wrote:

>[python 2.3a1]
>I have a list of "names" like:
> ['6','7','mx8','mx09','mx10','8','9','10','foo']
>which needs for concise and clear display to become a string like:
>
> "6-7,mx8-10,8-10,foo"

Don't you need to be able to reconstitute the original sequence from the
abbreviated one? I.e., how would you know to go backwards to 'mx09' vs 'mx9' ?

[...]

Regards,
Bengt Richter

Manuel Garcia

unread,

May 30, 2003, 10:57:01 PM5/30/03

to

On Fri, 30 May 2003 22:51:06 GMT, Alex <delete.this.p...@attbi.com> wrote:

>I agree with the sentiment completely. Having said that, here is how I
>amused myself for the last hour:
>
>import re
>names=['6','7','mx8','mx09','mx10','8','9','10','foo']
>r=re.compile(r'(\D*)(\d*)-?(\d*)')
>
>for ii in xrange(len(names)-1, 0,-1):
> (base1, low1, high1),(base2, low2, high2)=r.match(names[ii-1]).groups(), r.match(names[ii]).groups()
> if base1==base2 and int(low1)+1==int(low2):
> names[ii-1]='%s%d-%d' %(base1, int(low1), int(max(low2.rjust(len(high2)), high2)))
> names.pop(ii)
>
>print names

Kick ass!

I recommend to everyone in the thread to play with this one, to fully
appreciate its majesty.

['6','7','12','mx8','mx09','mx10','mx11-12','8','9','10','foo']
-> ['6-7', '12', 'mx8-12', '8-10', 'foo']

Fantastic!

Two bugs, easily fixed:

It doesn't work for ['foo','foo1'].

Little mix-up with 'low' and 'high' in the 'if' condition,
so ['mx3-7','mx8'] doesn't work.

I am not sure if 'int(max(low2.rjust(len(high2)), high2))' always
works. I would replace it with
'int([high2, low2][len(high2)==0])'.

I think this fixes it:

########################
## gyoung2.py

import re
names = ['6','7','12','mx3-7','mx8','mx09','mx10','mx11-12','8','9','10','foo','foo1','foo2']

for ii in xrange(len(names)-1, 0,-1):

(base1, low1, high1),(base2, low2, high2)=(re.match(r'(\D*)(\d*)-?(\d*)', names[ii-1]).groups(), re.match(r'(\D*)(\d*)-?(\d*)', names[ii]).groups())
if base1 == base2 and low1 and low2 and int([high1, low1][len(high1)==0])+1 == int(low2):
names[ii-1] = '%s%d-%d' % (base1, int(low1), int([high2, low2][len(high2)==0]))
names.pop(ii)
print names

########################

My brain is not working to make it any shorter.

Pythonic, it is not!

Manuel

Alex

unread,

May 31, 2003, 11:22:32 PM5/31/03

to

Manuel Garcia wrote:

> Kick ass!

Thanks.

> Two bugs, easily fixed

Thanks for the fixes.

> Pythonic, it is not!

No, but it was fun!

Alex