def is_prime(n):
for j in range(2,n):
if (n % j) == 0: return False
return True
It seems as though Python is actually expanding range(2,n) into a list of
numbers, even though this is incredibly wasteful of memory. There should be
a looping mechanism that generates the index variable values incrementally
as they are needed.
--
View this message in context: http://www.nabble.com/Python-%27for%27-loop-is-memory-inefficient-tp24980842p24980842.html
Sent from the Python - python-list mailing list archive at Nabble.com.
--
Kindest regards.
Mark Lawrence.
You already have an answer to the range issue, so I will only add that
putting a loop inside another loop is a decent way to expand the time taken.
I will also observe that if you were to stop programming whatever
language you are more familiar with in Python, and start programming
Python in Python, you'll have an easier time of it.
The Dive Into Python is an excellent start for that.
~Ethan~
I don't think Python was created to keep mathematician's happy.
Although strangely enough, the creator holds a masters degree in
mathematics *and* computer science. Just another one of life's little
ironies ;)
> It seems as though Python is actually expanding range(2,n) into a list of
> numbers, even though this is incredibly wasteful of memory. There should be
> a looping mechanism that generates the index variable values incrementally
> as they are needed.
There is.
Use xrange instead of range, and try again.
And while you are about it, you may as well teach them that it is much better
to do a multiplication than a division.
-Hendrik
Alternative hypothesis: the good doctor read the Python 3.x
documentation but absent-mindedly ran Python 2.x
> It seems as though Python is actually expanding range(2,n) into a list
> of numbers, even though this is incredibly wasteful of memory. There
> should be a looping mechanism that generates the index variable values
> incrementally as they are needed.
Others have already pointed out to you that xrange() (for Python 2.x)
does what you want. In Python 3.x, the old range() is gone for good, and
xrange() renamed to just range().
However, I'd like to point out that your subject line is fundamentally
incorrect. What you've discovered has *nothing* to do with for-loops: the
for-loop will happily iterate over whatever object you pass to it (or at
least try to, because not all objects are iterable). You might be
thinking that Python requires you to write for-loops as
for i in range(...):
but that's not correct. The for-loop doesn't care what object comes after
the "in" and before the ":", so long as it can iterate over it one item
at a time:
for i in [1, 3, 4, 5, 2, 0, -1, 8, 7, 6]:
print i
for c in "abcd":
print c
and many other variations will work perfectly fine.
The memory-consumption you have discovered is a property of the range()
function, not the for-loop:
>>> L = range(2, 20)
>>> L
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Compare to xrange:
>>> L = xrange(2, 20)
>>> L
xrange(2, 20)
>>> list(L) # expand out into a list
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
By the way, both range() and xrange() take an optional 'stepsize'
argument, defaulting to 1:
>>> range(3, 20, 2)
[3, 5, 7, 9, 11, 13, 15, 17, 19]
help(range) and help(xrange) in the interactive interpreter are your
friends.
--
Steven
Actually, division speed hasn't been much of an issue in years. Arithmetic
has been faster than memory access since CPUs went superscalar. In CPython,
with all the interpreter overhead, you'll probably never notice the difference.
I'd thought "xrange" had already been obsoleted, and hadn't realized the
2.3 - 2.6 versions of CPython were still doing "range" the hard way, not
as a generator.
John Nagle
[snip]
> > def is_prime(n):
> > for j in range(2,n):
> > if (n % j) == 0: return False
> > return True
> >
> > It seems as though Python is actually expanding range(2,n) into a list of
> > numbers, even though this is incredibly wasteful of memory. There should
> > be a looping mechanism that generates the index variable values
> > incrementally as they are needed.
>
> You already have an answer to the range issue, so I will only add that
> putting a loop inside another loop is a decent way to expand the time
> taken.
>
> I will also observe that if you were to stop programming whatever
> language you are more familiar with in Python, and start programming
> Python in Python, you'll have an easier time of it.
I don't see what's particularly un-Pythonic with this code. Not using xrange()
is a mistake, certainly, but it remains clear, easily understandable code
which correctly demonstrates the naive algorithm for detecting whether n is a
prime. It doesn't call for condescension
> The Dive Into Python is an excellent start for that.
I never read it, but I was under the impression that the tutorial on
python.org was geared toward programmers moving to Python from other
languages. It was also my impression that Dive Into Python is rather outdated
and occasionally inaccurate (based on comments on this mailing list).
Cheers,
Emm
[...]
>> I will also observe that if you were to stop programming whatever
>> language you are more familiar with in Python, and start programming
>> Python in Python, you'll have an easier time of it.
>
> I don't see what's particularly un-Pythonic with this code. Not using
> xrange() is a mistake, certainly, but it remains clear, easily
> understandable code which correctly demonstrates the naive algorithm for
> detecting whether n is a prime. It doesn't call for condescension
It's a particular unfair criticism because the critic (Ethan Furman)
appears to have made a knee-jerk reaction. The "some language in Python"
behaviour he's reacting to is the common idiom:
for i in range(len(seq)):
do_something_with(seq[i])
instead of the "Python in Python" idiom:
for obj in seq:
do_something_with(obj)
That's a reasonable criticism, but *not in the case*. Ethan appears to
have made the knee-jerk reaction "for i in range() is Bad" without
stopping to think about what this specific piece of code is actually
doing.
(Replace 'obj' with 'j', 'seq' with 'range(2, n)', and
'do_something_with' with 'if (n % j) == 0: return False', and you have
the exact same code pattern.)
--
Steven
Fair enough. But as far as I know, for i in (x)range(num) is the canonical way
to iterate over numbers in Python.
Another case of lack of RTFM* before answering, I suppose.
Cheers,
Emm
*Read The Fine Mail
It's not that the code is bad, but too many people coming from Java
and C keep thinking of for loops like they're using Java or C and
therefore that "for i in range(a,b)" is identical to "for(int i = a; i
< b; i++)". It's not and, for the most part, you shouldn't code like
that. Since you're using numbers, range/xrange is the appropriate
idiom but you still have to remember that a for loop in python doesn't
loop until a condition is met, it loops through an iterator until the
interator says it's done.
It does seem rather worrying that whoever originally thought up Python
decided it would be a good idea to implement a simple 1 to N (or 0 to N-1)
for-loop by first creating an array of N consecutive integers!
They must have known Python was going to be slow anyway, but this wouldn't
have helped the speed any. Nor the memory consumption.
A for-loop, for iterating over a simple sequence, should be one of the
fastest things in the language.
[Presumably the internal code that created those consecutive integers used a
more conventional looping method...]
--
Bartc
> A for-loop, for iterating over a simple sequence, should be one of the
> fastest things in the language.
Anyone experienced with interpreted high-level languages knows this is
not true. Not because iterating a sequence is expensive, but because
the interpreter is constantly executing the body of the loop. There is
a reason why Matlab and Python/NumPy programmers rely heavily on
vectorized array expressions for efficient numerics.
The only thing more evil than a for-loop is recursive function calls.
This does not mean Python is slow. It just means you have to program
differently.
> Well, the alternative would be to have two keywords for looping: one
> for your "simple" incrementing integer loop, and another for a loop that
> operates over the elements of some collection type.
A compiler could easily recognise a statement like
for i in range(n):
as a simple integer loop. In fact, Cython is able to do this.
> A compiler could easily recognise a statement like
> for i in range(n):
> as a simple integer loop. In fact, Cython is able to do this.
Extremely easy, once users relinquish the right to replace built-in
"range" with their own concoctions ...
but special cases aren't special enough to break the rules.
Although I think PyPy also recognizes this case and makes it as
efficient as using xrange, and does so without breaking any rules.
There *are* _some_ legitimate complaints to be made about the CPython
runtime. :)
Jean-Paul
How can pypy possibly know that the user hasn't assigned some other
value to "range"?
PyPy uses a JIT compiler (which is still slower than CPython,
suggesting that perhaps they should have spent more time optimizing
the general case than optimizing for an easily avoided special case).
> There *are* _some_ legitimate complaints to be made about the CPython
> runtime. :)
This isn't one of them.
xrange has been part of Python for 10 years.
If there are any complaints to be made about this situation it's that
there are any 2.x learning materials anythere that continue to use
range() and not xrange() in this context.
Carl Banks
It would be a simple to do if you were writing it for a different
langauge was a lot less dynamic than Python is. It'd be quite a
complex hack to add it to CPython's compiler while maintaing the
current highly dynamic runtime semantics and backwards compatibility,
which is a design constraint of Python whether you like it or not.
And all this complaining about an issue that can be worked around by
xrange instead of range. Sheesh.
> In fact, Cython is able to do this.
Cython can do this easily because it is a different language that is a
lot less dynamic than Python.
If you don't care about the dynamic stuff why don't you just use
Cython? Or quit complaining and just use xrange.
Carl Banks
> It's not that the code is bad, but too many people coming from Java
> and C keep thinking of for loops like they're using Java or C and
> therefore that "for i in range(a,b)" is identical to "for(int i = a; i
> < b; i++)". It's not and, for the most part, you shouldn't code like
> that. Since you're using numbers, range/xrange is the appropriate
> idiom but you still have to remember that a for loop in python doesn't
> loop until a condition is met, it loops through an iterator until the
> interator says it's done.
Java also has iterators; it's more a case of people coming from C and BASIC.
Although, some of those may have come *through* Java without abandoning
old habits. You see the same thing with people coming from BASIC to C and
writing:
#define NUM_DATES 50
int day[NUM_DATES], month[NUM_DATES], year[NUM_DATES];
rather than defining a "struct".
Sometimes referred to as "I know ten languages and can write in BASIC in
all of them".
In your other message, you said this wasn't a legitimate CPython
complaint. Here, you say that it would be a "complex hack" to implement
this in CPython. "complex hack" has negative connotations in my mind.
This seems contradictory to me.
>
>And all this complaining about an issue that can be worked around by
>xrange instead of range. Sheesh.
Sure. The specific issue of range vs xrange is quite a small one.
There is a slightly larger one regarding the flexibility (or relative
lack thereof) of the CPython runtime, though.
Jean-Paul
It's true that PyPy has a JIT compiler (which is still slower than
CPython). However, this isn't where the range speedup comes from. The
range optimization is handled specifically in the implementation of
range (or possibly of list, or a combination of the two, I can't
remember exactly). It's effective even when the JIT compiler is
disabled.
>
>>There *are* _some_ legitimate complaints to be made about the CPython
>>runtime. :)
>
>This isn't one of them.
>
>xrange has been part of Python for 10 years.
>
>If there are any complaints to be made about this situation it's that
>there are any 2.x learning materials anythere that continue to use
>range() and not xrange() in this context.
Jean-Paul
It doesn't really need to. The optimization isn't applied when the
compiler sees the name "range" being called. It's applied after the
object the default builtin name "range" is bound to is called.
Jean-Paul
> If you don't care about the dynamic stuff why don't you just use
> Cython? Or quit complaining and just use xrange.
I think you are the only one complaining here.
Actually, Cython is able to do this because it knows the global scope of a
module. The only thing that this optimisation makes impossible when you
compile your module using Cython is to inject builtins into the module
namespace *after* the compilation, either by assigning module attributes or
by importing the module into a custom namespace.
Given that both use cases are extremely rare, it was decided that
optimisations like this are more important than the ability to redefine the
most common builtins (such as 'range' or 'enumerate'). So, in a way, Cython
really makes them "builtins".
Stefan
"Easily" huh? Would you like to put a small wager on that?
Here is an unedited copy-and-paste of the last few lines of an
interactive Python session:
>>> for i in range(5):
... print i
...
0
1
2
Surprise!
4
>>> __builtins__.range is range
True
Can you determine how I did this? How would the compiler avoid this? If
you can find a method for the compiler to avoid surprises like this, why
do you think it would be valuable to add all that extra complexity to the
language? (As opposed to an external tool like Cython.)
--
Steven
Cython allows you to do that. You just have to do it inside the module. If
Cython determines that "range" refers the global builtin name, it will
enable the loop optimisation. If you assign anything to that global name
inside your module (even the range function itself), this will disable the
optimisation.
Stefan
Ha, ha. I learned that pattern in Fortran. I confess to having written
code like that in C. I think I've gotten over it.
My comment about programming Python in Python was geared more towards
the subject line than the actual code, and the biases evident in his
comments in both this thread and earlier ones.
~Ethan~
Well, you missed the point, chief.
It's not a legitimate complaint because you can use xrange, you don't
need compiler magic to recognize and optimize range.
Carl Banks
There's a lot of things in Python that I don't strictly *need*. That
doesn't mean that they wouldn't be welcome if I could have them.
Getting rid of the range/xrange dichotomy would improve things. Yes, I
can work around it until the runtime is good enough to let me think
about an *interesting* problem instead. That makes it a legitimate
complaint in my eyes. You're welcome to disagree, of course, but do you
have an argument more compelling than the one you give here? It seems
to me one could use it to argue a lot of Python out of existence.
Chief. (Seriously, "chief"? What are you going for? It sounds like
condescension to me; what's the point of that? I hope I'm just
misreading you.)
Jean-Paul
You don't have to think about using xrange in a for loop, you just
always use it.
> That makes it a legitimate
> complaint in my eyes. You're welcome to disagree, of course, but do you
> have an argument more compelling than the one you give here?
I am not arguing in favor of range/xrange, I am saying that it's silly
to complain that the compiler isn't a whole lot more complex than it
is just so it can implemnent a semantically-diconnected special case
just so that you can avoid typing an extra "x". The cost doesn't even
remotely justify it.
Carl Banks
I mean, it doesn't even remotely justify the cost.
Carl Banks
> There's a lot of things in Python that I don't strictly *need*. That
> doesn't mean that they wouldn't be welcome if I could have them. Getting
> rid of the range/xrange dichotomy would improve things.
The developers agreed a couple of years ago. Starting using 3.1 if you
want this.
Since 'range' could refer to a user-defined object, rather than the
builtin function, there is no way the interpreter should substitute
'xrange'.
tjr
And there was much rejoicing, et cetera.
>Since 'range' could refer to a user-defined object, rather than the
>builtin function, there is no way the interpreter should substitute
>'xrange'.
See the earlier parts of this thread for the reasons this isn't true. :)
Jean-Paul