Difference between str.isdigit() and str.isdecimal() in Python 3

Marco

May 16, 2012, 11:48:19 AM
Hi all, because

"There should be one-- and preferably only one --obvious way to do it",

there should be a difference between the two methods in the subject, but
I can't find it:

>>> '123'.isdecimal(), '123'.isdigit()
(True, True)
>>> print('\u0660123')
٠123
>>> '\u0660123'.isdigit(), '\u0660123'.isdecimal()
(True, True)
>>> print('\u216B')
Ⅻ
>>> '\u216B'.isdecimal(), '\u216B'.isdigit()
(False, False)

Can anyone give me some help?
Regards, Marco

Ulrich Eckhardt

May 16, 2012, 12:24:57 PM
Marco wrote:
> >>> '123'.isdecimal(), '123'.isdigit()
> (True, True)
> >>> print('\u0660123')
> ٠123
> >>> '\u0660123'.isdigit(), '\u0660123'.isdecimal()
> (True, True)
> >>> print('\u216B')
> Ⅻ
> >>> '\u216B'.isdecimal(), '\u216B'.isdigit()
> (False, False)

[chr(a) for a in range(0x20000) if chr(a).isdigit()]

Congratulations, you found a bug! Or maybe not: it all depends on whether
Roman numerals are considered digits. I could imagine there being a
difference.

:)

Uli

Marco

May 16, 2012, 2:45:49 PM
On 05/16/2012 06:24 PM, Ulrich Eckhardt wrote:

> Marco wrote:
>> > >>> '123'.isdecimal(), '123'.isdigit()
>> > (True, True)
>> > >>> print('\u0660123')
>> > ٠123
>> > >>> '\u0660123'.isdigit(), '\u0660123'.isdecimal()
>> > (True, True)
>> > >>> print('\u216B')
>> > Ⅻ
>> > >>> '\u216B'.isdecimal(), '\u216B'.isdigit()
>> > (False, False)

> [chr(a) for a in range(0x20000) if chr(a).isdigit()]

Thanks to your list comprehension I found they are not equal:

>>> set([chr(a) for a in range(0x110000) if chr(a).isdigit()]) - \
... set([chr(a) for a in range(0x110000) if chr(a).isdecimal()])
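
For readers following along, here is a minimal sketch of the same comparison that also names what it finds. The helper name `digits_not_decimal` is made up for illustration; it relies only on `str.isdigit()`, `str.isdecimal()`, `sys.maxunicode` and `unicodedata.name()`.

```python
import sys
import unicodedata


def digits_not_decimal():
    """Return every character that passes isdigit() but fails isdecimal()."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)  # covers U+0000 through U+10FFFF
        if chr(cp).isdigit() and not chr(cp).isdecimal()
    ]


if __name__ == "__main__":
    extra = digits_not_decimal()
    print(len(extra), "characters are digits but not decimals")
    for ch in extra[:10]:
        # Fall back to a placeholder in case a character has no name.
        print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
```

The exact count depends on the Unicode version bundled with the interpreter, so no output is shown here.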

Marco

jmfauth

May 16, 2012, 3:41:45 PM
It seems to me that the behaviour is correct, and the reason lies in the
Unicode general category of each character:

>>> import unicodedata as ud
>>> ud.category('\u216b')
'Nl'
>>> ud.category('1')
'Nd'
>>>
>>> # Note
>>> ud.numeric('\u216b')
12.0
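
The same lookups can be collected in one place. This is only a sketch: `describe` is a hypothetical helper, and it assumes that `unicodedata.decimal()`, `unicodedata.digit()` and `unicodedata.numeric()` are each called with a default that is returned when a character lacks that property. As far as I can tell, which of the three lookups succeeds lines up with `isdecimal()`, `isdigit()` and `isnumeric()`, but treat that as an observation rather than a guarantee.

```python
import unicodedata


def describe(ch):
    """Summarise the numeric properties Python reports for one character."""
    return {
        "category": unicodedata.category(ch),
        # Each lookup returns the given default when the property is absent.
        "decimal": unicodedata.decimal(ch, None),
        "digit": unicodedata.digit(ch, None),
        "numeric": unicodedata.numeric(ch, None),
    }


if __name__ == "__main__":
    for ch in ("1", "\u00b2", "\u216b"):
        print(f"U+{ord(ch):04X}", describe(ch))
```

For '\u216b' only the "numeric" entry is filled in, which matches the 'Nl' category shown above.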

jmf

Thomas 'PointedEars' Lahn

May 16, 2012, 5:07:28 PM
RTFM.

$ python3 -c 'print("42".isdecimal.__doc__ + "\n");
print("42".isdigit.__doc__)'
S.isdecimal() -> bool

Return True if there are only decimal characters in S,
False otherwise.

S.isdigit() -> bool

Return True if all characters in S are digits
and there is at least one character in S, False otherwise.

--
PointedEars

Please do not Cc: me. / Bitte keine Kopien per E-Mail.

Steven D'Aprano

May 16, 2012, 8:15:26 PM
On Wed, 16 May 2012 17:48:19 +0200, Marco wrote:

> Hi all, because
>
> "There should be one-- and preferably only one --obvious way to do it",
>
> there should be a difference between the two methods in the subject, but
> I can't find it:

The Fine Manual has more detail, although I admit it isn't *entirely*
clear what it is talking about if you're not a Unicode expert:


http://docs.python.org/py3k/library/stdtypes.html#str.isdecimal

str.isdecimal()
Return true if all characters in the string are decimal characters
and there is at least one character, false otherwise. Decimal characters
are those from general category “Nd”. This category includes digit
characters, and all characters that can be used to form decimal-radix
numbers, e.g. U+0660, ARABIC-INDIC DIGIT ZERO.

str.isdigit()
Return true if all characters in the string are digits and there is
at least one character, false otherwise. Digits include decimal
characters and digits that need special handling, such as the
compatibility superscript digits. Formally, a digit is a character that
has the property value Numeric_Type=Digit or Numeric_Type=Decimal.


And also:

str.isnumeric()
Return true if all characters in the string are numeric characters,
and there is at least one character, false otherwise. Numeric characters
include digit characters, and all characters that have the Unicode
numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH. Formally,
numeric characters are those with the property value Numeric_Type=Digit,
Numeric_Type=Decimal or Numeric_Type=Numeric.


Examples:

py> c = '\u2155'
py> print(c)
⅕
py> c.isdecimal(), c.isdigit(), c.isnumeric()
(False, False, True)
py> import unicodedata
py> unicodedata.numeric(c)
0.2

py> c = '\u00B2'
py> print(c)
²
py> c.isdecimal(), c.isdigit(), c.isnumeric()
(False, True, True)
py> unicodedata.numeric(c)
2.0
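
Put together, the examples suggest the three tests nest: decimal characters are also digits, and digits are also numeric. A minimal sketch of that idea follows; the function name `strictest_numeric_class` is invented here and is not part of the standard library.

```python
def strictest_numeric_class(s):
    """Name the narrowest of the three predicates that accepts the string s."""
    # Check from narrowest to widest: decimal, then digit, then numeric.
    if s.isdecimal():
        return "decimal"
    if s.isdigit():
        return "digit"
    if s.isnumeric():
        return "numeric"
    return "none"


if __name__ == "__main__":
    for s in ("123", "\u0660123", "\u00b2", "\u2155", "\u216b", "abc", ""):
        print(repr(s), strictest_numeric_class(s))
```

The empty string falls through to "none" because all three methods require at least one character.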


--
Steven

Marco

May 17, 2012, 3:58:48 AM
On 05/17/2012 02:15 AM, Steven D'Aprano wrote:

> The Fine Manual has more detail, although I admit it isn't *entirely*
> clear what it is talking about if you're not a Unicode expert:
>
>
> http://docs.python.org/py3k/library/stdtypes.html#str.isdecimal

You are right, that is clear, thanks :)

> Examples:
>
> py> c = '\u2155'
> py> print(c)
> ⅕
> py> c.isdecimal(), c.isdigit(), c.isnumeric()
> (False, False, True)
> py> import unicodedata
> py> unicodedata.numeric(c)
> 0.2
>
> py> c = '\u00B2'
> py> print(c)
> ²
> py> c.isdecimal(), c.isdigit(), c.isnumeric()
> (False, True, True)
> py> unicodedata.numeric(c)
> 2.0

Perfect explanation, thanks again, Marco

