howto remove the thousand separator


pyth0n3r

Apr 14, 2013, 2:57:35 PM
to python-list
Hi,
I came across a problem: when I deal with int data that uses ',' as the thousands separator, such as 12,916, I cannot convert it with int() or float().
How can I remove the comma from the int data?
Any reply will be appreciated!
 
Best,
Chen
 

Mark Janssen

Apr 14, 2013, 3:06:12 PM
to pyth0n3r, python-list
cleaned=''
for c in myStringNumber:
    if c != ',':
        cleaned+=c
int(cleaned)

mark

Mitya Sirenef

Apr 14, 2013, 3:16:52 PM
to pytho...@python.org
On 04/14/2013 02:57 PM, pyth0n3r wrote:
> Hi,
> I came across a problem that when i deal with int data with ',' as
> thousand separator, such as 12,916, i can not change it into int() or
> float().
> How can i remove the comma in int data?
> Any reply will be appreciated!!
>
> Best,
> Chen
>
>
>

I would do int(num.replace(',', ''))

-m


--
Lark's Tongue Guide to Python: http://lightbird.net/larks/

When a friend succeeds, I die a little. Gore Vidal

D'Arcy J.M. Cain

Apr 14, 2013, 3:17:49 PM
to pyth0n3r, python-list
On Mon, 15 Apr 2013 02:57:35 +0800
"pyth0n3r" <pyth...@gmail.com> wrote:
> float(). How can i remove the comma in int data? Any reply will be

int(n.replace(',', ''))

--
D'Arcy J.M. Cain <da...@druid.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 788 2246 (DoD#0082) (eNTP) | what's for dinner.
IM: da...@Vex.Net, VOIP: sip:da...@Vex.Net

Mark Janssen

Apr 14, 2013, 3:22:27 PM
to Mitya Sirenef, pytho...@python.org
> I would do int(num.replace(',', ''))

That's much more pythonic than my C-ish version

Mark

Mark Lawrence

Apr 14, 2013, 3:29:13 PM
to pytho...@python.org
On 14/04/2013 19:57, pyth0n3r wrote:
> Hi,
> I came across a problem that when i deal with int data with ',' as
> thousand separator, such as 12,916, i can not change it into int() or
> float().
> How can i remove the comma in int data?
> Any reply will be appreciated!!
> Best,
> Chen
>
>

Use the string replace method thus.

>>> '12,916'.replace(',', '')
'12916'
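
A minimal sketch of wrapping this in a helper, assuming the input is a string in which ',' is used purely as a thousands separator and '.' (if present) is the decimal point. The names parse_int and parse_float are made up for illustration and do not come from this thread:

```python
def parse_int(text, sep=','):
    """Convert a string such as '12,916' to an int by dropping the separator."""
    return int(text.replace(sep, ''))


def parse_float(text, sep=','):
    """Same idea for floats; assumes '.' is the decimal point."""
    return float(text.replace(sep, ''))


print(parse_int('12,916'))
print(parse_float('1,234.5'))
```

Anything that is still not a valid number after the separators are removed will raise ValueError, exactly as a plain int() or float() call would.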

--
If you're using GoogleCrap™ please read this
http://wiki.python.org/moin/GoogleGroupsPython.

Mark Lawrence

Steven D'Aprano

Apr 14, 2013, 8:29:16 PM
On Sun, 14 Apr 2013 12:06:12 -0700, Mark Janssen wrote:

> cleaned=''
> for c in myStringNumber:
>     if c != ',':
>         cleaned+=c
> int(cleaned)

Please don't write code like that. Firstly, it's long and bloated, and
runs at the speed of Python, not C. Second, it runs at the speed of
SLLLLOOOOOOOOOOWWWW Python, not fast Python, due to being an O(N**2)
algorithm.

If you don't know what O(N**2) means, you should read this for an
introduction:

http://www.joelonsoftware.com/articles/fog0000000319.html


--
Steven

Mark Janssen

Apr 14, 2013, 8:44:28 PM
to Steven D'Aprano, pytho...@python.org
On Sun, Apr 14, 2013 at 5:29 PM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> On Sun, 14 Apr 2013 12:06:12 -0700, Mark Janssen wrote:
>
>> cleaned=''
>> for c in myStringNumber:
>>     if c != ',':
>>         cleaned+=c
>> int(cleaned)
>
> ....due to being an O(N**2) algorithm.

What on earth makes you think that is an O(n**2) algorithm and not O(n)?

Mark

Steven D'Aprano

Apr 14, 2013, 9:14:25 PM
Strings are immutable. Consider building up a single string from four
substrings:

s = ''
s += 'fe'
s += 'fi'
s += 'fo'
s += 'fum'

Python *might* optimize the first concatenation, '' + 'fe', to just reuse
'fe', (but it might not). Let's assume it does, so that no copying is
needed. Then it gets to the second concatenation, and now it has to copy
characters, because strings are immutable and cannot be modified in
place. Showing the *running* total of characters copied:

'fe' + 'fi' => 'fefi' # four characters copied
'fefi' + 'fo' => 'fefifo' # 4 + 6 = ten characters copied
'fefifo' + 'fum' => 'fefifofum' # 10 + 9 = nineteen characters copied

Notice how each intermediate substring gets copied repeatedly? In order
to build up a string of length 9, we've had to copy at least 19
characters. With only four substrings, it's not terribly obvious how
badly this performs. So let's add some more substrings, and see how the
running total increases:

'fefifofum' + 'foo' => 'fefifofumfoo' # 19 + 12 = 31
'fefifofumfoo' + 'bar' => 'fefifofumfoobar' # 31 + 15 = 46
'fefifofumfoobar' + 'baz' => 'fefifofumfoobarbaz' # 46 + 18 = 64
'fefifofumfoobarbaz' + 'spam' => 'fefifofumfoobarbazspam' # 64 + 22 = 86


To build up a string of length 22, we've had to copy, and re-copy, and
re-re-copy, 86 characters in total. And as the string gets bigger, the
inefficiency gets worse. Each substring (except the very last one) gets
copied multiple times; the number of times it gets copied is proportional
to the number of substrings.

If the substrings are individual characters, then each character is
copied a number of times proportional to the number of characters N;
since there are N characters, each being copied (proportional to) N
times, that makes N*N or N**2.
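
The running totals above can be reproduced with a small sketch. It models only the naive case described here: no in-place optimization, and the first concatenation onto '' assumed to copy nothing. The function name chars_copied is hypothetical:

```python
def chars_copied(parts):
    """Total characters copied when building a string with repeated +=,
    assuming every concatenation after the first copies the whole result."""
    total = 0
    length = 0
    for i, part in enumerate(parts):
        length += len(part)   # length of the string after this step
        if i > 0:             # first step onto '' is assumed to be free
            total += length   # the whole new string is written out again
    return total


print(chars_copied(['fe', 'fi', 'fo', 'fum']))  # 19
print(chars_copied(['fe', 'fi', 'fo', 'fum', 'foo', 'bar', 'baz', 'spam']))  # 86

# One character per step: the total grows roughly as N**2 / 2.
for n in (10, 100, 1000):
    print(n, chars_copied('x' * n))
```

With single-character pieces the total works out to N*(N+1)/2 - 1, which is the quadratic growth described above.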



--
Steven

Chris Angelico

Apr 14, 2013, 9:29:17 PM
to pytho...@python.org
On Mon, Apr 15, 2013 at 11:14 AM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> On Sun, 14 Apr 2013 17:44:28 -0700, Mark Janssen wrote:
>> What on earth makes you think that is an O(n**2) algorithm and not O(n)?
>
> Python *might* optimize the first concatenation, '' + 'fe', to just reuse
> 'fe', (but it might not). Let's assume it does, so that no copying is
> needed. Then it gets to the second concatenation, and now it has to copy
> characters, because strings are immutable and cannot be modified in
> place.

There are actually a lot of optimizations done, so it might turn out
to be O(n) in practice. But strictly in the Python code, yes, this is
definitely O(n*n).

ChrisA

Rotwang

Apr 14, 2013, 10:19:43 PM
On 15/04/2013 02:14, Steven D'Aprano wrote:
> On Sun, 14 Apr 2013 17:44:28 -0700, Mark Janssen wrote:
>
>> On Sun, Apr 14, 2013 at 5:29 PM, Steven D'Aprano
>> <steve+comp....@pearwood.info> wrote:
>>> On Sun, 14 Apr 2013 12:06:12 -0700, Mark Janssen wrote:
>>>
>>>> cleaned=''
>>>> for c in myStringNumber:
>>>>     if c != ',':
>>>>         cleaned+=c
>>>> int(cleaned)
>>>
>>> ....due to being an O(N**2) algorithm.
>>
>> What on earth makes you think that is an O(n**2) algorithm and not O(n)?
>
> Strings are immutable. Consider building up a single string from four
> substrings:
>
> s = ''
> s += 'fe'
> s += 'fi'
> s += 'fo'
> s += 'fum'
>
> Python *might* optimize the first concatenation, '' + 'fe', to just reuse
> 'fe', (but it might not). Let's assume it does, so that no copying is
> needed. Then it gets to the second concatenation, and now it has to copy
> characters, because strings are immutable and cannot be modified in
> place.

Actually, I believe that CPython is optimised to modify strings in place
where possible, so that the above would surprisingly turn out to be
O(n). See the following thread where I asked about this:

http://groups.google.com/group/comp.lang.python/browse_thread/thread/990a695fe2d85c52

(Sorry for linking to Google Groups. Does anyone know of a better c.l.p.
web archive?)

Walter Hurry

Apr 14, 2013, 10:25:03 PM
On Mon, 15 Apr 2013 11:29:17 +1000, Chris Angelico wrote:

> There are actually a lot of optimizations done, so it might turn out to
> be O(n) in practice. But strictly in the Python code, yes, this is
> definitely O(n*n).

In any event, Janssen should cease and desist offering advice here if he
can't do better than that.

Roy Smith

Apr 14, 2013, 10:35:42 PM
In article <kkfodv$f5m$1...@news.albasani.net>,
That's a little harsh. Sure, it was a "sub-optimal" way to write the
code (for all the reasons people mentioned), but it engendered a good
discussion.

Ned Deily

Apr 15, 2013, 1:15:22 AM
to pytho...@python.org
In article <kkfnun$kpj$1...@dont-email.me>, Rotwang <sg...@hotmail.co.uk>
wrote:
> (Sorry for linking to Google Groups. Does anyone know of a better c.l.p.
> web archive?)

http://dir.gmane.org/gmane.comp.python.general

--
Ned Deily,
n...@acm.org

Steven D'Aprano

Apr 15, 2013, 3:03:15 AM
I deliberately didn't open that can of worms, mostly because I was in a
hurry, but also because it's not an optimization you can rely on. It
depends on the version, implementation, operating system, and the exact
code running.

1) It only applies to code running under some, but not all, versions of
CPython. It does not apply to PyPy, Jython, IronPython, and probably not
other implementations.


2) Even under CPython, it can fail. It *will* fail if you have multiple
references to the same strings. And it *may* fail depending on the
vagaries of the memory management system in place, e.g. code that is
optimized on Linux may fail to optimize under Windows, leading to slow
code.
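
One hedged way to check this on your own machine is to time the same loop with and without a second reference to the string. This is only a sketch: the function names are invented, no particular timings are claimed, and the result depends on the interpreter, its version, and the platform, as described above.

```python
import timeit


def concat(n):
    s = ''
    for _ in range(n):
        s += 'x'      # only one reference to s: CPython may resize in place
    return s


def concat_with_extra_ref(n):
    s = ''
    for _ in range(n):
        keep = s      # a second reference to the same string object
        s += 'x'      # the old string must survive, so a new one is built
    return s


if __name__ == '__main__':
    # If the optimization applies, the first column should grow roughly
    # linearly with n and the second noticeably faster than that.
    for n in (5000, 10000, 20000):
        t1 = timeit.timeit(lambda: concat(n), number=3)
        t2 = timeit.timeit(lambda: concat_with_extra_ref(n), number=3)
        print(n, round(t1, 4), round(t2, 4))
```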


As far as I'm concerned, the best advice regarding this optimization is:

- always program as if it doesn't exist;

- but be glad it does when you're writing quick and dirty code in the
interactive interpreter, where the convenience of string concatenation
may be just too darn convenient to bother doing the right thing.
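
For comparison, "the right thing" when a string has to be built piece by piece is usually str.join, which collects the pieces and copies each character into the result once. A minimal sketch applied to the loop from earlier in the thread (the name remove_commas is made up here):

```python
def remove_commas(text):
    # Collect the wanted characters, then join them in a single pass
    # instead of growing a string with repeated +=.
    return ''.join(c for c in text if c != ',')


print(int(remove_commas('12,916')))
```

For this particular task text.replace(',', '') is simpler still; join is the general tool when the pieces come from a loop.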



> http://groups.google.com/group/comp.lang.python/browse_thread/thread/990a695fe2d85c52
>
> (Sorry for linking to Google Groups. Does anyone know of a better c.l.p.
> web archive?)

The canonical (although possibly not the best) archive for c.l.p. is the
python-list mailing list archive:

http://mail.python.org/mailman/listinfo/python-list


--
Steven

Steven D'Aprano

Apr 15, 2013, 3:04:22 AM
Agreed. I'd rather people come out with poor code, and LEARN from the
answers, than feel that they dare not reply until they're an expert.



--
Steven

Chris Angelico

Apr 15, 2013, 3:39:25 AM
to pytho...@python.org
On Mon, Apr 15, 2013 at 5:03 PM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> On Mon, 15 Apr 2013 03:19:43 +0100, Rotwang wrote:
>
>> On 15/04/2013 02:14, Steven D'Aprano wrote:
>>> Strings are immutable. Consider building up a single string from four
>>> substrings:
>>
>> Actually, I believe that CPython is optimised to modify strings in place
>> where possible, so that the above would surprisingly turn out to be
>> O(n). See the following thread where I asked about this:
>
> I deliberately didn't open that can of worms, mostly because I was in a
> hurry, but also because it's not an optimization you can rely on. It
> depends on the version, implementation, operating system, and the exact
> code running.
>
> As far as I'm concerned, the best advice regarding this optimization is:
>
> - always program as if it doesn't exist;
>
> - but be glad it does when you're writing quick and dirty code in the
> interactive interpreter, where the convenience of string concatenation
> may be just too darn convenient to bother doing the right thing.

Agreed; that's why, in my reply, I emphasized that the pure Python
code IS quadratic, even though the actual implementation might turn
out linear. (I love that word "might". Covers myriad possibilities on
both sides.)

Same goes for all sorts of other possibilities. I wouldn't test string
equality with 'is' without explicit interning, even if I'm testing a
constant against another constant in the same module - but I might get
a big performance boost if the system's interned all its constants for
me.
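
A small sketch of the distinction, assuming Python 3 (where the function is sys.intern; in Python 2 it was the builtin intern). Whether two equal strings happen to be the same object without interning is an implementation detail, so the sketch deliberately does not print that:

```python
import sys

a = ''.join(['12', '916'])   # built at run time
b = '12916'                  # a literal

# == compares contents and is always the safe test for string equality.
print(a == b)                # True

# After explicit interning, equal strings are guaranteed to be one object,
# so an identity test is then reliable.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)  # True
```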

ChrisA

Rotwang

Apr 15, 2013, 6:16:04 PM
On 15/04/2013 08:03, Steven D'Aprano wrote:
> On Mon, 15 Apr 2013 03:19:43 +0100, Rotwang wrote:
>> [...]
>>
>> (Sorry for linking to Google Groups. Does anyone know of a better c.l.p.
>> web archive?)
>
> The canonical (although possibly not the best) archive for c.l.p. is the
> python-list mailing list archive:
>
> http://mail.python.org/mailman/listinfo/python-list

Thanks to both you and Ned.

pyth0n3r

Apr 14, 2013, 3:33:43 PM
to D'Arcy J.M. Cain, python-list
Hi D'A,
Thanks a lot for your reply, it works for me perfectly.
Best,
Chen
 
 
On Mon, 15 Apr 2013 02:57:35 +0800
"pyth0n3r" <pyth...@gmail.com> wrote:
> float(). How can i remove the comma in int data? Any reply will be

Duncan Booth

Apr 19, 2013, 11:04:57 AM
Parse it using the locale module, just be sure to set the correct locale
first:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'English_United Kingdom.1252'
>>> locale.atoi('1,000')
1000
>>> locale.atof('1,000')
1000.0
>>> locale.setlocale(locale.LC_ALL, 'French_France')
'French_France.1252'
>>> locale.atof('1,000')
1.0
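
Because setlocale changes a process-wide setting, one cautious sketch is to switch only LC_NUMERIC for the duration of the parse and restore it afterwards. The helper name atoi_in_locale is hypothetical. Locale names are platform-specific (the names in the session above are Windows ones), so the demo uses only the 'C' locale, which should always be available:

```python
import locale


def atoi_in_locale(text, locale_name):
    """Parse an integer using the number conventions of locale_name,
    then put the previous LC_NUMERIC setting back."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        # Raises locale.Error if the name is not known on this system.
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        return locale.atoi(text)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


print(atoi_in_locale('12916', 'C'))
```

The 'C' locale has no thousands separator, so '12,916' raises ValueError there; passing a locale name that your system provides and that uses ',' for grouping is what makes the comma form parse. Note that setlocale is not thread-safe, so this is best kept to single-threaded code.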


--
Duncan Booth http://kupuguy.blogspot.com