Splitting a string into substrings of equal size

candide

Aug 14, 2009, 8:22:57 PM

Suppose you need to split a string into substrings of a given size (except
possibly the last substring). I assume the first slice is taken from the
end of the string.
A typical example is formatting a decimal string with a thousands
separator.

What is the Pythonic way to do this?

For my part, I arrived at this rather complicated code:

# ----------------------

def comaSep(z,k=3, sep=','):
    z=z[::-1]
    x=[z[k*i:k*(i+1)][::-1] for i in range(1+(len(z)-1)/k)][::-1]
    return sep.join(x)

# Test
for z in ["75096042068045", "509", "12024", "7", "2009"]:
    print z+" --> ", comaSep(z)

# ----------------------

which outputs:

75096042068045 --> 75,096,042,068,045
509 --> 509
12024 --> 12,024
7 --> 7
2009 --> 2,009
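For readers on Python 3, the same reverse / slice / reverse idea needs one change: `/` is true division there, so `range()` would receive a float and raise a TypeError; the group count must use `//`. A minimal sketch under that assumption (`coma_sep` is simply a renamed copy of the function above):

```python
def coma_sep(z, k=3, sep=","):
    """Group z into chunks of k characters counted from the right."""
    rev = z[::-1]                       # work on the reversed string
    ngroups = 1 + (len(rev) - 1) // k   # ceil(len / k); 0 for an empty string
    # Slice left to right on the reversed string and un-reverse each chunk...
    groups = [rev[k * i:k * (i + 1)][::-1] for i in range(ngroups)]
    # ...then un-reverse the order of the chunks.
    return sep.join(groups[::-1])

for z in ["75096042068045", "509", "12024", "7", "2009"]:
    print(z + " --> ", coma_sep(z))
```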

Thanks

Gabriel Genellina

Aug 14, 2009, 9:00:52 PM

to pytho...@python.org

On Fri, 14 Aug 2009 21:22:57 -0300, candide <can...@free.invalid>
wrote:

> Suppose you need to split a string into substrings of a given size
> (except possibly the last substring). I assume the first slice is taken
> from the end of the string.
> A typical example is formatting a decimal string with a thousands
> separator.
>
>
> What is the Pythonic way to do this?

py> import locale
py> locale.setlocale(locale.LC_ALL, '')
'Spanish_Argentina.1252'
py> locale.format("%d", 75096042068045, True)
'75.096.042.068.045'

:)
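Two notes for later Python versions: `locale.format()` was deprecated in Python 3.7 and removed in 3.12 in favour of `locale.format_string()`; and when a plain comma (rather than the locale's separator) is wanted, the format-specification mini-language has had a `,` option since Python 2.7/3.1 (PEP 378). A sketch of the latter, assuming Python 3.6+ and input strings that parse as integers (leading zeros are lost; `thousands` is a hypothetical helper name):

```python
def thousands(z):
    """Format an integer string with ',' every three digits (PEP 378)."""
    return format(int(z), ",")

for z in ["75096042068045", "509", "12024", "7", "2009"]:
    print(z + " --> ", thousands(z))

print(f"{-1234567:,}")   # the sign is handled too: -1,234,567
```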

> For my part, I arrived at this rather complicated code:

Mine isn't very simple either:

py> def genparts(z):
...     n = len(z)
...     i = n%3
...     if i: yield z[:i]
...     for i in xrange(i, n, 3):
...         yield z[i:i+3]
...
py> ','.join(genparts("75096042068045"))
'75,096,042,068,045'
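The same generator with the group size made a parameter (`k` is an addition, not part of the post), written for Python 3, where `xrange` became `range`:

```python
def genparts(z, k=3):
    """Yield pieces of z of length k, aligned to the end of the string."""
    n = len(z)
    i = n % k
    if i:
        yield z[:i]              # short leading piece, only when non-empty
    for i in range(i, n, k):
        yield z[i:i + k]         # full-size pieces

print(",".join(genparts("75096042068045")))   # 75,096,042,068,045
print(":".join(genparts("1234567890", 4)))     # 12:3456:7890
```

Because the short piece is emitted only when `n % k` is non-zero, a string whose length is an exact multiple of `k` gets no empty leading piece.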

--
Gabriel Genellina

Jan Kaliszewski

Aug 14, 2009, 10:17:35 PM

to candide, pytho...@python.org

15-08-2009 candide <can...@free.invalid> wrote:

> Suppose you need to split a string into substrings of a given size
> (except possibly the last substring). I assume the first slice is taken
> from the end of the string.
> A typical example is formatting a decimal string with a thousands
> separator.

I'd use iterators, especially for longer strings...

import itertools

def separate(text, grouplen=3, sep=','):
    "separate('12345678') -> '123,456,78'"
    repeated_iterator = [iter(text)] * grouplen
    groups = itertools.izip_longest(fillvalue='', *repeated_iterator)
    strings = (''.join(group) for group in groups)  # gen. expr.
    return sep.join(strings)

def back_separate(text, grouplen=3, sep=','):
    "back_separate('12345678') -> '12,345,678'"
    repeated_iterator = [reversed(text)] * grouplen
    groups = itertools.izip_longest(fillvalue='', *repeated_iterator)
    strings = [''.join(reversed(group)) for group in groups]  # list compr.
    return sep.join(reversed(strings))

print separate('12345678')
print back_separate('12345678')
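In Python 3, `itertools.izip_longest` is spelled `itertools.zip_longest`; otherwise the grouper trick carries over unchanged. A sketch of both functions under that assumption. The key point is that `[it] * n` holds n references to one and the same iterator, so each tuple produced by `zip_longest` consumes n consecutive characters:

```python
from itertools import zip_longest

def separate(text, grouplen=3, sep=","):
    """separate('12345678') -> '123,456,78'"""
    repeated_iterator = [iter(text)] * grouplen      # n refs to ONE iterator
    groups = zip_longest(*repeated_iterator, fillvalue="")
    return sep.join("".join(group) for group in groups)

def back_separate(text, grouplen=3, sep=","):
    """back_separate('12345678') -> '12,345,678'"""
    repeated_iterator = [reversed(text)] * grouplen  # group from the right
    groups = zip_longest(*repeated_iterator, fillvalue="")
    strings = ["".join(reversed(group)) for group in groups]
    return sep.join(reversed(strings))

print(separate("12345678"))        # 123,456,78
print(back_separate("12345678"))   # 12,345,678
```

Because `zip_longest` stops as soon as the shared iterator is exhausted at the start of a tuple, no empty group is produced when the length is an exact multiple of `grouplen`.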

# alternate implementation
# (without "materializing" 'strings' as a list in back_separate):
def separate(text, grouplen=3, sep=','):
    "separate('12345678') -> '123,456,78'"
    textlen = len(text)
    end = textlen - (textlen % grouplen)
    repeated_iterator = [iter(itertools.islice(text, 0, end))] * grouplen
    strings = itertools.imap(lambda *chars: ''.join(chars),
                             *repeated_iterator)
    return sep.join(itertools.chain(strings, (text[end:],)))

def back_separate(text, grouplen=3, sep=','):
    "back_separate('12345678') -> '12,345,678'"
    beg = len(text) % grouplen
    repeated_iterator = [iter(itertools.islice(text, beg, None))] * grouplen
    strings = itertools.imap(lambda *chars: ''.join(chars),
                             *repeated_iterator)
    return sep.join(itertools.chain((text[:beg],), strings))

print separate('12345678')
print back_separate('12345678')

http://docs.python.org/library/itertools.html#recipes
was the inspiration for me (especially grouper).

Cheers,
*j
--
Jan Kaliszewski (zuo) <z...@chopin.edu.pl>

Jan Kaliszewski

Aug 14, 2009, 10:40:05 PM

to candide, pytho...@python.org

15-08-2009 Jan Kaliszewski <z...@chopin.edu.pl> wrote:

> 15-08-2009 candide <can...@free.invalid> wrote:
>
>> Suppose you need to split a string into substrings of a given size
>> (except possibly the last substring). I assume the first slice is taken
>> from the end of the string.
>> A typical example is formatting a decimal string with a thousands
>> separator.
>
> I'd use iterators, especially for longer strings...
>
> import itertools

[snip]

Err... It's too late for coding... Now I see an obvious and simpler variant:

def separate(text, grouplen=3, sep=','):
    "separate('12345678') -> '123,456,78'"

    textlen = len(text)
    end = textlen - (textlen % grouplen)

    strings = (text[i:i+grouplen] for i in xrange(0, end, grouplen))

    return sep.join(itertools.chain(strings, (text[end:],)))

def back_separate(text, grouplen=3, sep=','):
    "back_separate('12345678') -> '12,345,678'"

    textlen = len(text)
    beg = textlen % grouplen
    strings = (text[i:i+grouplen] for i in xrange(beg, textlen, grouplen))

    return sep.join(itertools.chain((text[:beg],), strings))

print separate('12345678')
print back_separate('12345678')
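One caveat that is easy to check: when `len(text)` is an exact multiple of `grouplen`, both functions above join an empty piece, so `separate('123456')` gives `'123,456,'` and `back_separate('123456')` gives `',123,456'` (the islice-based versions behave the same way). A hedged sketch of a fix, assuming Python 3, that lets `range()` produce the short tail itself and prepends the head only when it is non-empty:

```python
import itertools

def separate(text, grouplen=3, sep=","):
    """separate('12345678') -> '123,456,78'"""
    # range() already yields the start of the last (possibly short) group,
    # so no separate tail piece is needed.
    strings = (text[i:i + grouplen] for i in range(0, len(text), grouplen))
    return sep.join(strings)

def back_separate(text, grouplen=3, sep=","):
    """back_separate('12345678') -> '12,345,678'"""
    textlen = len(text)
    beg = textlen % grouplen
    strings = (text[i:i + grouplen] for i in range(beg, textlen, grouplen))
    head = (text[:beg],) if beg else ()   # skip the empty head
    return sep.join(itertools.chain(head, strings))

print(separate("123456"), back_separate("123456"))   # 123,456 123,456
```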

--
Jan Kaliszewski (zuo) <z...@chopin.edu.pl>

Rascal

Aug 15, 2009, 2:08:14 AM

I'm bored for posting this, but here it is:

def add_commas(str):
    str_list = list(str)
    str_len = len(str)
    for i in range(3, str_len, 3):
        str_list.insert(str_len - i, ',')
    return ''.join(str_list)
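A Python 3 sketch of the same insertion idea with the group size and separator as parameters (`k` and `sep` are additions), with the argument renamed so it no longer shadows the built-in `str`. Each `list.insert` is O(n), which is fine for number-sized strings:

```python
def add_commas(digits, k=3, sep=","):
    """Insert sep every k characters, counting from the right."""
    chars = list(digits)
    n = len(digits)
    # Offsets n - k, n - 2k, ... are taken from the ORIGINAL length; inserting
    # from right to left means earlier insertions never shift later offsets.
    for i in range(k, n, k):
        chars.insert(n - i, sep)
    return "".join(chars)

print(add_commas("75096042068045"))       # 75,096,042,068,045
print(add_commas("1234567890", 4, "."))   # 12.3456.7890
```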

candide

Aug 15, 2009, 8:28:27 AM

Thanks to all for your responses. I particularly appreciate Rascal's solution.

Jan Kaliszewski

Aug 15, 2009, 9:38:53 AM

to Rascal, pytho...@python.org

For short strings (surely the most common case) it's OK: simple and clear.
But for huge ones it's better not to materialize an additional list from the
string; pure-iterator solutions (like Gabriel's or mine) would be better then.

Cheers,
*j

Emile van Sebille

Aug 15, 2009, 1:28:00 PM

to pytho...@python.org

On 8/14/2009 5:22 PM candide said...

> Suppose you need to split a string into substrings of a given size (except
> possibly the last substring). I assume the first slice is taken from the
> end of the string.
> A typical example is formatting a decimal string with a thousands
> separator.
>
>
> What is the Pythonic way to do this?

I like list comps...

>>> jj = '1234567890123456789'
>>> ",".join([jj[ii:ii+3] for ii in range(0,len(jj),3)])
'123,456,789,012,345,678,9'
>>>

Emile

Gregor Lingl

Aug 15, 2009, 2:49:45 PM

> What is the Pythonic way to do this?
>
>
> For my part, I arrived at this rather complicated code:
>
>
> # ----------------------
>
> def comaSep(z,k=3, sep=','):
>     z=z[::-1]
>     x=[z[k*i:k*(i+1)][::-1] for i in range(1+(len(z)-1)/k)][::-1]
>     return sep.join(x)
>
> # Test
> for z in ["75096042068045", "509", "12024", "7", "2009"]:
>     print z+" --> ", comaSep(z)
>

In case you are interested, here is a recursive solution:

>>> def comaSep(z,k=3,sep=","):
        return comaSep(z[:-k],k,sep)+sep+z[-k:] if len(z)>k else z

>>> comaSep("7")
'7'
>>> comaSep("2007")
'2,007'
>>> comaSep("12024")
'12,024'
>>> comaSep("509")
'509'
>>> comaSep("75096042068045")
'75,096,042,068,045'
>>>

Gregor

Gregor Lingl

Aug 15, 2009, 2:50:03 PM

Emile van Sebille wrote:

> On 8/14/2009 5:22 PM candide said...

...

>> What is the Pythonic way to do this?
>
> I like list comps...
>
> >>> jj = '1234567890123456789'
> >>> ",".join([jj[ii:ii+3] for ii in range(0,len(jj),3)])
> '123,456,789,012,345,678,9'
> >>>
>
> Emile
>

Less beautiful but more correct:

>>> ",".join([jj[max(ii-3,0):ii] for ii in range(len(jj)%3,len(jj)+3,3)])
'1,234,567,890,123,456,789'

Gregor

Mark Tolonen

Aug 15, 2009, 4:22:39 PM

to pytho...@python.org

"Gregor Lingl" <gregor...@aon.at> wrote in message
news:4a87036a$0$2292$91ce...@newsreader02.highway.telekom.at...

Is it?

>>> jj = '234567890123456789'

>>> ",".join([jj[max(ii-3,0):ii] for ii in range(len(jj)%3,len(jj)+3,3)])

',234,567,890,123,456,789'

At least one other solution in this thread had the same problem.
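The stray leading comma comes from the empty slice `jj[0:0]` produced when `len(jj) % 3 == 0`. A sketch of a list-comprehension variant that avoids it by making the first piece `len(jj) % k` characters long, or a full group when that is zero (`comma_join` and its `k`/`sep` parameters are hypothetical; Python 3):

```python
def comma_join(jj, k=3, sep=","):
    head = len(jj) % k or k          # length of the first (possibly short) piece
    pieces = [jj[:head]] + [jj[ii:ii + k] for ii in range(head, len(jj), k)]
    return sep.join(pieces)

print(comma_join('234567890123456789'))    # 234,567,890,123,456,789
print(comma_join('1234567890123456789'))   # 1,234,567,890,123,456,789
```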

-Mark

ryles

Aug 15, 2009, 5:54:27 PM

py> import re
py> s='1234567'
py> ','.join(_[::-1] for _ in re.findall('.{1,3}',s[::-1])[::-1])
'1,234,567'
py> # j/k ;)

MRAB

Aug 15, 2009, 6:06:22 PM

to pytho...@python.org

If you're going to use re, then:

>>> import re
>>> for z in ["75096042068045", "509", "12024", "7", "2009"]:
        print re.sub(r"(?<=.)(?=(?:...)+$)", ",", z)

75,096,042,068,045
509
12,024
7
2,009

MRAB

Aug 15, 2009, 6:28:09 PM

to pytho...@python.org

Brian wrote:
> On Sat, Aug 15, 2009 at 4:06 PM, MRAB <pyt...@mrabarnett.plus.com
> <mailto:pyt...@mrabarnett.plus.com>> wrote:
>
> If you're going to use re, then:
>
> >>> for z in ["75096042068045", "509", "12024", "7", "2009"]:
>         print re.sub(r"(?<=.)(?=(?:...)+$)", ",", z)
>
> 75,096,042,068,045
> 509
> 12,024
> 7
> 2,009
>
> Can you please break down this regex?
>
The call replaces a zero-width match with a comma, i.e. inserts a comma,
if certain conditions are met:

"(?<=.)"
Look behind for 1 character. There must be at least one previous
character. This ensures that a comma is never inserted at the start of
the string. I could also have used "(?<!^)". Actually, it doesn't check
whether the first character is a "-". That's left as an exercise for the
reader. :-)

"(?=(?:...)+$)"
Look ahead for one or more groups of 3 characters (a non-zero multiple
of 3), followed by the end of the string.
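Taking up the exercise: one possible approach (a sketch, not MRAB's code) is to require a digit rather than any character in the lookbehind, and to count only digits in the lookahead, so a leading "-" or "+" never gets a comma after it. This assumes the input is an optional sign followed by digits only; anything after the digits, such as a decimal part, stops the lookahead from ever matching, so no commas are inserted at all. `GROUPING` and `insert_commas` are hypothetical names:

```python
import re

# \d in the lookbehind (instead of '.') keeps a comma from following a sign;
# \d{3} in the lookahead counts only digits up to the end of the string.
GROUPING = re.compile(r"(?<=\d)(?=(?:\d{3})+$)")

def insert_commas(z):
    return GROUPING.sub(",", z)

for z in ["75096042068045", "509", "12024", "7", "2009", "-123456", "-1234567"]:
    print(insert_commas(z))
```

With the original `(?<=.)` lookbehind, `"-123456"` would come out as `"-,123,456"`.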

ryles

Aug 15, 2009, 6:41:30 PM

On Aug 15, 6:28 pm, MRAB <pyt...@mrabarnett.plus.com> wrote:

> > >>> for z in ["75096042068045", "509", "12024", "7", "2009"]:
> >         print re.sub(r"(?<=.)(?=(?:...)+$)", ",", z)
>
> > 75,096,042,068,045
> > 509
> > 12,024
> > 7
> > 2,009
>

> The call replaces a zero-width match with a comma, i.e. inserts a comma,
> if certain conditions are met:
>
> "(?<=.)"
> Look behind for 1 character. There must be at least one previous
> character. This ensures that a comma is never inserted at the start of
> the string. I could also have used "(?<!^)". Actually, it doesn't check
> whether the first character is a "-". That's left as an exercise for the
> reader. :-)
>
> "(?=(?:...)+$)"
> Look ahead for a multiple of 3 characters, followed by the end of
> the string.

Wow, well done. An exceptional recipe from Python's unofficial regex
guru. And thanks for sharing the explanation.

Gregor Lingl

Aug 16, 2009, 5:46:19 AM

Mark Tolonen wrote:

Gulp!

Even uglier:

",".join([jj[max(ii-3,0):ii] for ii in range(len(jj)%3,len(jj)+3,3)]).strip(",")
'234,567,890,123,456,789'

Gregor

Simon Forman

Aug 16, 2009, 3:36:20 PM

On Aug 14, 8:22 pm, candide <cand...@free.invalid> wrote:

FWIW:

def chunks(s, length=3):
    stop = len(s)
    start = stop - length
    while start > 0:
        yield s[start:stop]
        stop, start = start, start - length
    yield s[:stop]

s = '1234567890'
print ','.join(reversed(list(chunks(s))))
# prints '1,234,567,890'
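Several solutions in this thread produce a stray separator when the length is an exact multiple of the group size. A quick Python 3 check of this generator, wrapped in a hypothetical `group_digits` helper, suggests it handles that case: the `while start > 0` test leaves the final full group to the closing `yield s[:stop]`.

```python
def chunks(s, length=3):
    """Yield chunks of s from the END backwards; the last one yielded may be short."""
    stop = len(s)
    start = stop - length
    while start > 0:
        yield s[start:stop]
        stop, start = start, start - length
    yield s[:stop]  # leftover head (the whole string if len(s) <= length)

def group_digits(s, length=3, sep=","):
    # chunks() yields right to left, so restore left-to-right order.
    return sep.join(reversed(list(chunks(s, length))))

print(group_digits("1234567890"))   # 1,234,567,890
print(group_digits("123456"))       # 123,456
```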

Gregor Lingl

Aug 17, 2009, 7:14:57 PM

Simon Forman wrote:

> On Aug 14, 8:22 pm, candide <cand...@free.invalid> wrote:
>> Suppose you need to split a string into substrings of a given size (except
>> possibly the last substring). I assume the first slice is taken from the
>> end of the string.
>> A typical example is formatting a decimal string with a thousands
>> separator.
>>
>> What is the Pythonic way to do this?
>>

...

>> Thanks
>
> FWIW:
>
> def chunks(s, length=3):
>     stop = len(s)
>     start = stop - length
>     while start > 0:
>         yield s[start:stop]
>         stop, start = start, start - length
>     yield s[:stop]
>
>
> s = '1234567890'
> print ','.join(reversed(list(chunks(s))))
> # prints '1,234,567,890'

or:

>>> def chunks(s, length=3):
        i, j = 0, len(s) % length or length
        while i < len(s):
            yield s[i:j]
            i, j = j, j + length

>>> print(','.join(list(chunks(s))))
1,234,567,890
>>> print(','.join(list(chunks(s,2))))
12,34,56,78,90
>>> print(','.join(list(chunks(s,4))))
12,3456,7890

Regards,
Gregor