/David
> How do I check if a string contains (can be converted to) an int? I
> want to do one thing if I am parsing an integer, and another if not.
try:
    x = int(aPossibleInt)
    ... do something with x ...
except ValueError:
    ... do something else ...
--
Erik Max Francis && m...@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
May it not be that, at least, the brighter stars are like our Sun,
the upholding and energizing centres of systems of worlds, adapted to
> How do I check if a string contains (can be converted to) an int? I
> want to do one thing if I am parsing an integer, and another if not.
Okay, this has got to be homework, surely. This is the third, maybe the
fourth, question on this topic in a week or so :-)
In Python, the best solution to most "how can I check if X is a something"
questions is usually the Nike motto: Just Do It.
# s is some arbitrary string object
try:
    n = int(s)
    print "Integer %d" % n
except ValueError:
    print "Not an integer %s" % s
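try...except blocks are cheap in Python: entering one costs very little,
and the significant expense comes only when an exception is actually raised.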
--
Steven.
>>> x = '15'
>>> if x.isdigit():
...     print int(x)*3
...
45
>>>
Another point is that the same try...except approach can also be used
for string-to-float conversion.
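A minimal sketch of that idea, assuming the caller wants an int when the string is an integer and a float otherwise. The name parse_number is hypothetical, not a standard function:

```python
def parse_number(s):
    """Return int(s) if possible, else float(s).

    Hypothetical helper: raises ValueError if neither conversion works.
    """
    try:
        return int(s)
    except ValueError:
        pass
    # int() failed, so fall back to float(), which also accepts forms
    # such as "2.5" and "1e3"; a ValueError from float() propagates.
    return float(s)

print(parse_number("15"))   # an int
print(parse_number("2.5"))  # a float
```

Trying int() first keeps "15" as an integer rather than 15.0.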
> Can't we just check if the string has digits?
Why would you want to?
> For example:
>
>>>> x = '15'
>>>> if x.isdigit():
> ...     print int(x)*3
15 is not a digit. 1 is a digit. 5 is a digit. Put them together to
make 15, and what you have is no longer a digit.
If you really wanted to waste CPU cycles, you could do this:
s = "1579"
for c in s:
    if not c.isdigit():
        print "Not an integer string"
        break
else:
    # if we get here, we didn't break
    print "Integer %d" % int(s)
but notice that this is wasteful: first you walk the string, checking each
character, and then the int() function has to walk the string again,
checking each character for the second time.
It is also buggy: try s = "-1579" and it will wrongly claim that s is not
an integer when it is. So now you have to waste more time, and more CPU
cycles, writing a more complicated function to check if the string can be
converted.
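To make the point concrete, here is a sketch of such a check that handles an optional sign. It is a hypothetical helper written only for illustration, and it is still stricter than int(), which also accepts surrounding whitespace such as " 42 ":

```python
import string

def looks_like_int(s):
    """Look Before You Leap: an optional sign followed by ASCII digits only."""
    if s[:1] in ("+", "-"):
        s = s[1:]  # drop a single leading sign
    # s[:1] is safe on the empty string; empty or sign-only input fails here.
    return s != "" and all(c in string.digits for c in s)

print(looks_like_int("-1579"))  # True
print(looks_like_int(" 42 "))   # False, even though int(" 42 ") == 42
```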
--
Steven.
Maybe so, but '15'.isdigit() == True:
isdigit(...)
    S.isdigit() -> bool

    Return True if all characters in S are digits
    and there is at least one character in S, False otherwise.
>>> '15'.isdigit()
True
Though your other points are valid, and I agree this is not the right solution for the OP.
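A related caveat, assuming Python 3 (where str is Unicode): isdigit() returning True does not even guarantee that int() will succeed. The superscript two, U+00B2, is classified as a digit but not as a decimal digit, so int() rejects it:

```python
s = "\u00b2"  # SUPERSCRIPT TWO: a digit, but not a decimal digit

# isdigit() accepts it...
print(s.isdigit())  # True

# ...but int() only accepts decimal digits, so the conversion fails.
try:
    int(s)
except ValueError:
    print("int() rejects %r" % s)
```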
Kent
Auggggh!!
So? The isdigit method tests whether all characters are digits.
>>> '15'.isdigit()
True
--
Antoon Pardon
> Maybe so, but '15'.isdigit() == True:
Well I'll be a monkey's uncle.
In that case, the name is misleadingly wrong. I suppose it is not likely
that it could be changed before Python 3?
--
Steven
The most straight-forward thing is to try converting it to an int and see
what happens.
try:
    int(s)
except ValueError:
    print "sorry, '%s' isn't a valid integer" % s
That was my first thought too, Steven, but then I considered whether I'd
think the same about the others: islower, isspace, istitle, isupper,
isalnum, isalpha.
Some of those suffer from the same confusion, probably inspired by
having written lots of C in the past, but certainly "istitle" wouldn't be
particularly useful on a single character. isalnum and isalpha don't
necessarily invoke the same mental awkwardness since, after all, what is
"an alpha"? It could just as well be read "is this string alphabetic"
as "is this character 'an alpha'".
Given that Python doesn't have a distinct concept of "character" (but
merely a string of length one), having those routines operate on the
entire string is probably pretty sensible, and I'm not sure that naming
them "isdigits()" would be helpful either since then it would feel
awkward to use them on length-one-strings.
-Peter
Well, let's find out, shall we?
from time import time

# create a list of known int strings
L_good = [str(n) for n in range(1000000)]
# and a list of known non-int strings
L_bad = [s + "x" for s in L_good]

# now let's time how long it takes, comparing
# Look Before You Leap vs. Just Do It
def timer_LBYL(L):
    t = time()
    for s in L:
        if s.isdigit():
            n = int(s)
    return time() - t

def timer_JDI(L):
    t = time()
    for s in L:
        try:
            n = int(s)
        except ValueError:
            pass
    return time() - t

# and now test the two strategies
def tester():
    print "Time for Look Before You Leap (all ints): %f" \
        % timer_LBYL(L_good)
    print "Time for Look Before You Leap (no ints): %f" \
        % timer_LBYL(L_bad)
    print "Time for Just Do It (all ints): %f" \
        % timer_JDI(L_good)
    print "Time for Just Do It (no ints): %f" \
        % timer_JDI(L_bad)
And here are the results from three tests:
>>> tester()
Time for Look Before You Leap (all ints): 2.871363
Time for Look Before You Leap (no ints): 3.167513
Time for Just Do It (all ints): 2.575050
Time for Just Do It (no ints): 2.579374
>>> tester()
Time for Look Before You Leap (all ints): 2.903631
Time for Look Before You Leap (no ints): 3.272497
Time for Just Do It (all ints): 2.571025
Time for Just Do It (no ints): 2.571188
>>> tester()
Time for Look Before You Leap (all ints): 2.894780
Time for Look Before You Leap (no ints): 3.167017
Time for Just Do It (all ints): 2.822160
Time for Just Do It (no ints): 2.569494
There is a consistent pattern: Look Before You Leap is measurably slower
than using try...except, but both are within the same order of magnitude
speed-wise.
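The same experiment can be sketched with the standard timeit module, which handles repetition and timer selection. The function names and the smaller list sizes here are illustrative assumptions, not Steven's code, and the absolute numbers will vary with the machine and Python version:

```python
import timeit

# Smaller stand-ins for the lists above so the sketch runs quickly.
L_good = [str(n) for n in range(10000)]
L_bad = [s + "x" for s in L_good]

def lbyl(strings):
    """Look Before You Leap: convert only strings that pass isdigit()."""
    converted = 0
    for s in strings:
        if s.isdigit():
            int(s)
            converted += 1
    return converted

def jdi(strings):
    """Just Do It: attempt every conversion and skip the failures."""
    converted = 0
    for s in strings:
        try:
            int(s)
        except ValueError:
            continue
        converted += 1
    return converted

for name, func in (("LBYL", lbyl), ("JDI", jdi)):
    for label, data in (("all ints", L_good), ("no ints", L_bad)):
        # timeit.repeat returns one total per repeat; the minimum is the
        # least noisy estimate. Each repeat calls func(data) 10 times.
        best = min(timeit.repeat(lambda: func(data), repeat=3, number=10))
        print("%s (%s): %f seconds" % (name, label, best))
```

Returning a count keeps the two functions honest: on a string like "-5", lbyl() converts nothing while jdi() converts one value.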
I wondered whether the speed difference would be different if the strings
themselves were very long. So I made some minor changes:
>>> L_good = ["1234567890"*200] * 2000
>>> L_bad = [s + "x" for s in L_good]
>>> tester()
Time for Look Before You Leap (all ints): 9.740390
Time for Look Before You Leap (no ints): 9.871122
Time for Just Do It (all ints): 9.865055
Time for Just Do It (no ints): 9.967314
Hmmm... why is converting now slower than checking+converting? That
doesn't make sense... except that the strings are so long that they
overflow ints, and get converted automatically to longs. Perhaps this test
exposes some accident of implementation.
So I changed the two timer functions to use long() instead of int(), and
got this:
>>> tester()
Time for Look Before You Leap (all ints): 9.591998
Time for Look Before You Leap (no ints): 9.866835
Time for Just Do It (all ints): 9.424702
Time for Just Do It (no ints): 9.416610
A small but consistent speed advantage to the try...except block.
Having said all that, the speed differences are absolutely trivial, less
than 0.1 microseconds per digit. Choosing one form or the other purely on
the basis of speed is premature optimization.
But the real advantage of the try...except form is that it generalises to
more complex kinds of data where there is no fast C code to check whether
the data can be converted. (Try re-running the above tests with
isdigit() re-written as a pure Python function.)
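For that experiment, a pure Python stand-in for isdigit() might look like this sketch. As an assumption, it only recognizes the ASCII digits 0 to 9, whereas the built-in method also accepts other Unicode digit characters:

```python
def py_isdigit(s):
    """Pure Python approximation of str.isdigit(), restricted to ASCII 0-9.

    Like the built-in, it returns False for the empty string.
    """
    if not s:
        return False
    for c in s:
        if not ("0" <= c <= "9"):
            return False
    return True
```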
In general, it is just as difficult to check whether something can be
converted as it is to actually try to convert it and see whether it fails,
especially in a language like Python where try...except blocks are so
cheap to use.
--
Steven.
others already answered, this is just an idea
>>> def isNumber(n):
...     import re
...     if re.match("^[-+]?[0-9]+$", n):
...         return True
...     return False
does not recognize 0x numbers, but this is easy to fix
if wanted
>>> def isNumber(n):
...     import re
...     if re.match("^[-+]?(0[xX][0-9A-Fa-f]+|[0-9]+)$", n):
...         return True
...     return False
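If the goal is to accept 0x-prefixed numbers, an alternative to growing the regex is to let int() do the work with base 0, which infers the base from the prefix (0x, 0o, 0b, or none for decimal). A sketch assuming Python 3 literal rules, under which a decimal with a leading zero such as "010" is rejected; parse_int_literal is a hypothetical name:

```python
def parse_int_literal(s):
    """Parse s using Python's integer-literal rules, or return None.

    With base 0, int() infers the base from the prefix: "0x1f" is hex,
    "0o17" octal, "0b101" binary, and plain digits are decimal.
    """
    try:
        return int(s, 0)
    except ValueError:
        return None

print(parse_int_literal("0x1f"))  # 31
print(parse_int_literal("ff"))    # None: no prefix, so it must be decimal
```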
hth
Daniel
> pinkflo...@gmail.com wrote:
>> How do I check if a string contains (can be converted to) an int? I
>> want to do one thing if I am parsing an integer, and another if not.
>>
>> /David
>>
>
> others already answered, this is just an idea
>
> >>> def isNumber(n):
> ...     import re
> ...     if re.match("^[-+]?[0-9]+$", n):
> ...         return True
> ...     return False
This is just a thought experiment, right, to see how slow you can make
your Python program run?
*smiles*
Jamie Zawinski: "Some people, when confronted with a problem, think 'I
know, I'll use regular expressions.' Now they have two problems."
--
Steven.
[...]
>Well, let's find out, shall we?
[...]
>A small but consistent speed advantage to the try...except block.
>
>Having said all that, the speed differences are absolutely trivial, less
>than 0.1 microseconds per digit. Choosing one form or the other purely on
>the basis of speed is premature optimization.
Or maybe on which actually works. LBYL will fail to recognize
negative numbers, e.g.
def LBYL(s):
    if s.isdigit():
        return int(s)
    else:
        return 0

def JDI(s):
    try:
        return int(s)
    except ValueError:
        return 0
test = '15'
print LBYL(test), JDI(test) #-> 15 15
test = '-15'
print LBYL(test), JDI(test) #-> 0 -15
>
>But the real advantage of the try...except form is that it generalises to
>more complex kinds of data where there is no fast C code to check whether
re: Generalization, apropos a different thread regarding the %
operator on strings. In Python, I avoid using the specific type
format conversions (such as %d) in favor of the generic string
conversion (%s) unless I need specific field width and/or padding or
other formatting, e.g.
for p in range(32):
    v = 1<<p
    print "%2u %#010x : %-d" % (p,v,v)
Regards,
-=Dave
--
Change is inevitable, progress is not.
> pinkflo...@gmail.com wrote:
>
> > How do I check if a string contains (can be converted to) an int? I
> > want to do one thing if I am parsing an integer, and another if not.
>
> try:
>     x = int(aPossibleInt)
>     ... do something with x ...
> except ValueError:
>     ... do something else ...
Correct, but even better is a slight variation:
try:
    x = int(aPossibleInt)
except ValueError:
    ... do something else ...
else:
    ... do something with x ...
this way, you avoid accidentally masking an unexpected ValueError in the
"do something with x" code.
Keeping your try-clauses as small as possible (as well as your
except-conditions as specific as possible) is important, to avoid
masking bugs and thus making their discovery harder.
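A small runnable illustration, with hypothetical names: lookup() plays the part of "do something with x" and contains a bug that raises ValueError for negative input. With the wide try-clause the bug is reported as "not an integer"; with the else clause it propagates where it can be seen:

```python
def lookup(x):
    # Hypothetical "do something with x" step with a bug:
    # it raises ValueError for negative x instead of handling it.
    if x < 0:
        raise ValueError("negative values not supported yet")
    return x * 3

def handle_wide(aPossibleInt):
    # Wide try-clause: lookup() runs inside the try, so its bug is masked.
    try:
        x = int(aPossibleInt)
        return lookup(x)
    except ValueError:
        return "not an integer"

def handle(aPossibleInt):
    # Narrow try-clause: only the int() conversion is guarded.
    try:
        x = int(aPossibleInt)
    except ValueError:
        return "not an integer"
    else:
        # A ValueError raised in here is NOT caught by the clause above.
        return lookup(x)
```

Calling handle_wide("-1") quietly returns "not an integer", while handle("-1") raises the ValueError from lookup().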
Alex
>>> x = '-1'
>>> if x.isdigit(): print int(x)*3
...
To make sure you get it right, you'll have to do exactly what the Python
parser does in order to distinguish integer literals from other tokens.
Taken to the extreme for other types, such as floats, you're far
better off just using the internal mechanisms that Python itself uses,
which means to try to convert it and catch any exception that results
from failure.
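As an illustration of how much a hand-written float check would have to cover, float() accepts several spellings that a naive "digits with at most one dot" test would reject. A sketch; is_float is a hypothetical name:

```python
def is_float(s):
    """Return True if float() accepts the string s."""
    try:
        float(s)
    except ValueError:
        return False
    return True

# The first five are accepted by float() but would fail a naive
# digits-and-one-dot check; the last is rejected by both.
for text in ["1e3", "-2.5", " 7 ", "inf", "nan", "1.2.3"]:
    print("%r -> %s" % (text, is_float(text)))
```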
--
Erik Max Francis && m...@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
Make it come down / Like molasses rain
-- Sandra St. Victor
> In that case, the name is misleadingly wrong. I suppose it is not likely
> that it could be changed before Python 3?
Why?
The primary purpose of the .isdigit, etc. methods is to test whether a
single character has a certain property. There is, however, no special
character data type in Python, and so by necessity those methods must be
on strings, not characters.
Thus, you have basically two choices: Have the methods throw exceptions
for strings with a length different from one, or have them just iterate
over every character in a string. The latter is clearly a more useful
functionality.
Right, those two sentences contradict each other. There's no
character data type so .isdigit can only test whether a string has a
certain property. That certain property is whether the string is a digit,
which is to say, a single-character string with one of a certain set
of values.
> Thus, you have basically two choices: Have the methods throw
> exceptions for strings with a length different from one, or have them
> just iterate over every character in a string. The latter is clearly
> a more useful functionality.
There is a third choice which is the natural and obvious one: have the
function do what its name indicates. Return true if the arg is a
digit and false otherwise. If iterating over the whole string is
useful (which it may be), then the function should have been named
differently, like .isdigits instead of .isdigit.
FWIW, I've usually tested for digit strings with re.match. It never
occurred to me that isdigit tested a whole string.
I guess, if we want to avoid the exception paradigm for a particular
problem, we could just do something like:
def isNumber(n):
    try:
        dummy = int(n)
        return True
    except ValueError:
        return False

and use that function from wherever in the program.
/David
>There is a third choice which is the natural and obvious one: have the
>function do what its name indicates. Return true if the arg is a
>digit and false otherwise. If iterating over the whole string is
>useful (which it may be), then the function should have been named
>differently, like .isdigits instead of .isdigit.
Following your logic to its conclusion, had the name isdigits been
chosen, '1'.isdigits() should return False. It's only one digit, not
more than one, as the plural would imply.
I, for one, don't see any utility in the dichotomy. We only need
(should only have) one function. I do agree that isdigits might have
been a better name, but we're stuck with isdigit for hysterical
raisins. And it's logical that string functions work over a string
rather than its first character.
>
>FWIW, I've usually tested for digit strings with re.match. It never
>occurred to me that isdigit tested a whole string.
Someone's been trotting out that old jwz chestnut about regular
expressions and problems... Not that I agree with it, but ISTM that
regular expressions are vast overkill for this problem.
> On Wed, 21 Dec 2005 16:39:19 +0100, Daniel Schüle wrote:
>
>> pinkflo...@gmail.com wrote:
>>> How do I check if a string contains (can be converted to) an int? I
>>> want to do one thing if I am parsing an integer, and another if not.
>>>
>>> /David
>>>
>>
>> others already answered, this is just an idea
>>
>> >>> def isNumber(n):
>> ...     import re
>> ...     if re.match("^[-+]?[0-9]+$", n):
>> ...         return True
>> ...     return False
>
> This is just a thought experiment, right, to see how slow you can make
> your Python program run?
Let's leave the thought experiments to the theoretical physicists and
compare a regex with an exception-based approach:
~ $ python -m timeit -s'import re; isNumber = re.compile(r"^[-+]\d+$").match' 'isNumber("-123456")'
1000000 loops, best of 3: 1.24 usec per loop
~ $ python -m timeit -s'import re; isNumber = re.compile(r"^[-+]\d+$").match' 'isNumber("-123456x")'
1000000 loops, best of 3: 1.31 usec per loop
~ $ python -m timeit -s'def isNumber(n):' -s' try: int(n); return True' -s' except ValueError: pass' 'isNumber("-123456")'
1000000 loops, best of 3: 1.26 usec per loop
~ $ python -m timeit -s'def isNumber(n):' -s' try: int(n); return True' -s' except ValueError: pass' 'isNumber("-123456x")'
100000 loops, best of 3: 10.8 usec per loop
A tie for number-strings and regex as a clear winner for non-numbers.
Peter
> Steven D'Aprano wrote:
>
>> In that case, the name is misleadingly wrong. I suppose it is not likely
>> that it could be changed before Python 3?
>
> Why?
>
> The primary purpose of the .isdigit, etc. methods is to test whether a
> single character has a certain property. There is, however, no special
> character data type in Python, and so by necessity those methods must be
> on strings, not characters.
>
> Thus, you have basically two choices: Have the methods throw exceptions
> for strings with a length different from one, or have them just iterate
> over every character in a string. The latter is clearly a more useful
> functionality.
*shrug*
If your argument were as obviously correct as you think, shouldn't
ord("abc") also iterate over every character in the string, instead of
raising an exception?
But in any case, I was arguing that the *name* is misleading, not that the
functionality is not useful. (Some might argue that the functionality is
harmful, because it encourages Look Before You Leap testing.) In English,
a digit is a single numeric character. In English, "123 is a digit" is
necessarily false, in the same way that "A dozen eggs is a single egg" is
false.
In any case, it isn't important enough to break people's code. I'd rather
that the method isdigit() were called isnumeric() or something, but I can
live with the fact that it is not.
--
Steven.
>> 15 is not a digit. 1 is a digit. 5 is a digit. Putting them together to
>> make 15 is not a digit.
>
> So? The isdigit method tests whether all characters are digits.
>
>>>> '15'.isdigit()
> True
But that is "obviously" wrong, since '15' is not a digit.
--
Grant Edwards grante Yow! I'm in LOVE with
at DON KNOTTS!!
visi.com
> Steven D'Aprano wrote:
>
>> On Wed, 21 Dec 2005 16:39:19 +0100, Daniel Schüle wrote:
>>
>>> pinkflo...@gmail.com wrote:
>>>> How do I check if a string contains (can be converted to) an int? I
>>>> want to do one thing if I am parsing an integer, and another if not.
>>>>
>>>> /David
>>>>
>>>
>>> others already answered, this is just an idea
>>>
>>> >>> def isNumber(n):
>>> ...     import re
>>> ...     if re.match("^[-+]?[0-9]+$", n):
>>> ...         return True
>>> ...     return False
>>
>> This is just a thought experiment, right, to see how slow you can make
>> your Python program run?
>
> Let's leave the thought experiments to the theoretical physicists
Didn't I have a smiley in there?
> and compare a regex with an exception-based approach:
>
> ~ $ python -m timeit -s'import re; isNumber = re.compile(r"^[-+]\d+$").match' 'isNumber("-123456")'
> 1000000 loops, best of 3: 1.24 usec per loop
But since you're going to take my protests about regexes more seriously
than I intended you to, it is ironic that you supplied a regex that
is nice and fast but doesn't work:
>>> re.compile(r"^[-+]\d+$").match("123456") is None
True
Isn't that the point of Jamie Zawinski's quote about regexes? I too can
write a regex that doesn't solve the problem -- and this regex is a dead
simple case, yet still easy to get wrong.
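For the record, making the sign optional fixes that particular regex. A sketch assuming Python 3.4 or later for fullmatch(), which also avoids a subtlety of $ (it matches just before a trailing newline). It uses [0-9] rather than \d, since in Python 3 \d also matches non-ASCII decimal digits. Note that it is stricter than int(), which tolerates surrounding whitespace:

```python
import re

# Optional sign, then one or more ASCII digits, and nothing else.
is_int_string = re.compile(r"[-+]?[0-9]+").fullmatch

for s in ["-123456", "123456", "+7", "-123456x", "", "123\n"]:
    print("%r: %s" % (s, is_int_string(s) is not None))
```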
BTW, you might find it informative to run timeit on the code snippet
provided by Daniel before reflecting on the context of my "how slow"
comment.
--
Steven.
> But since you're going to take my protests about regexes more seriously
> than I intended you to, it is ironic that you supplied a regex that
> is nice and fast but doesn't work:
I think you said that "exceptions are cheap" elsewhere in this thread and
I read your post above as "regular expressions are slow". I meant to put
these statements into perspective.
Those who snip the Zawinski quote are doomed to demonstrate it in their
code, though it wouldn't have taken this lapse for me to grant you that
regexes are error-prone.
> BTW, you might find it informative to run timeit on the code snippet
> provided by Daniel before reflecting on the context of my "how slow"
> comment.
I'm getting about 10 usec for both cases, i.e. roughly the same as the
worst-case behaviour for try...except.
Peter
> > So? the isdigit method tests whether all characters are digits.
> >
> >>>> '15'.isdigit()
> > True
>
> But that is "obviously" wrong, since '15' is not a digit.
No, but all characters in the string belong to the "digit" character
class, which is what the "is" predicates look for.
cf.
>>> "\t".isspace()
True
>>> "Life of Brian".istitle()
False
>>> u"\N{GREEK CAPITAL LETTER BETA}".isalpha()
True
and so on.
</F>
That description is not quite right. All characters in the empty
string belong to the "digit" character class, but isdigit returns
false (which it probably should).
Python 2.3.4 (#1, Feb 2 2005, 12:11:53)
[GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> ''.isdigit()
False
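The vacuous-truth point is easy to check in code: "every character is a digit" holds for the empty string, so isdigit() needs the extra "at least one character" condition from its docstring:

```python
s = ""

# "Every character is a digit" is vacuously true when there are no characters.
print(all(c.isdigit() for c in s))  # True

# str.isdigit() additionally requires at least one character.
print(s.isdigit())  # False

# Spelling out both conditions from the docstring matches the method.
print(len(s) > 0 and all(c.isdigit() for c in s))  # False
```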
> That description is not quite right. All characters in the empty
> string belong to the "digit" character class
A: are there any blue cars on the street?
B: no. not a single one.
A: you're wrong! all cars on the street are blue!
B: no, the street is empty.
A: yeah, so all the cars that are on the street are blue!
B: oh, please.
A: admit that you're wrong! admit that you're wrong! admit that you're wrong!
*smack*
B: (muttering) moron.
</F>
B and A are both correct. It's just logic ;-).
> no, but all characters in the string belong to the "digit" character
> class, which is what the "is" predicates look for.
>
then gave examples including:
>>>> "Life of Brian".istitle()
> False
I don't see how istitle() matches your definition of what the "is"
predicates look for.
I know.
My point was that '15'.isdigit() returning True is in my
opinion "surprising" since '15' is not a digit in the most
obvious meaning of the phrase. In language design, "surprise"
is a bad quality.
It's like saying that [1,2,3,4] is an integer.
--
Grant Edwards grante Yow! Join the PLUMBER'S
at UNION!!
visi.com
Charles Lutwidge Dodgson spent his professional life arguing against
this, as I mentioned in
<http://mail.python.org/pipermail/python-list/2001-July/052732.html> --
but, mostly, "mainstream" logic proceeded along the opposite channel you
mention. Good thing he had interesting hobbies (telling stories to
children, and taking photographs), or today he perhaps might be
remembered only for some contributions to voting-theory;-).
I don't know of any "complete and correct" logic (or set-theory) where
there is more than one empty-set, but I'm pretty sure that's because I
never really delved into the intricacies of modern theories such as
modal logic (I would expect modal logic, and intensional logic more
generally, would please Dodgson far better than extensional logic...
but, as I said, I don't really understand them in sufficient depth)...
Alex