from itertools import izip, islice
for x12 in izip(islice(x,0,None,2),islice(x,1,None,2)):
print x12
(Of course the print statement is just illustrative.)
What is the fastest way? (Ignore the import time.)
Thanks,
Alan Isaac
You have to try a bunch of different ways and time them. One
idea (untested):
def pairs(seq):
while True:
yield (seq.next(), seq.next())
Look up the timeit module and test yourself the various alternatives;
that's the most reliable way to tell for sure.
George
- Paddy.
I believe the "what is the fastest way" question for such small well-
defined tasks is worth asking on its own, regardless of whether it
makes a difference in the application (or even if there is no
application to begin with). Just because cpu cycles are cheap these
days is not a good reason to be sloppy. Moreover, often the fastest
pure Python version happens to be among the most elegant and concise,
unlike other languages where optimization usually implies obfuscation.
George
> I believe the "what is the fastest way" question for such small well-
> defined tasks is worth asking on its own, regardless of whether it makes
> a difference in the application (or even if there is no application to
> begin with). Just because cpu cycles are cheap these days is not a good
> reason to be sloppy. Moreover, often the fastest pure Python version
> happens to be among the most elegant and concise, unlike other languages
> where optimization usually implies obfuscation.
I wonder why it is that people automatically assume that "optimization"
means optimize the time taken, and not the developer effort to write it
in the first place, the effort required to maintain it over time, or the
memory used at runtime, let alone some combination of all four factors.
Memory is cheap, but applications are hungry.
CPUs are fast, and for most applications the difference between 3ms and
30ms is undetectable by the user. Why do we care so little about saving
memory and so much about ever-decreasing time savings?
--
Steven
Don't know the fastest, but here's a very concise way:
from itertools import izip
def ipairs(seq):
it = iter(seq)
return izip(it, it)
>>> list(pairs(xrange(10)))
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
>>> list(pairs('hello'))
[('h', 'e'), ('l', 'l')]
--
Arnaud
Thanks,
Alan Isaac
PS My understanding is that the behavior
of the last is implementation dependent
and not guaranteed.
def pairs1(x):
for x12 in izip(islice(x,0,None,2),islice(x,1,None,2)):
yield x12
def pairs2(x):
xiter = iter(x)
while True:
yield xiter.next(), xiter.next()
def pairs3(x):
for i in range( len(x)//2 ):
yield x[2*i], x[2*i+1],
def pairs4(x):
xiter = iter(x)
for x12 in izip(xiter,xiter):
yield x12
According to the docs [1], izip is defined to be equivalent to:
def izip(*iterables):
iterables = map(iter, iterables)
while iterables:
result = [it.next() for it in iterables]
yield tuple(result)
This guarantees that it.next() will be performed from left to right,
so there is no risk that e.g. pairs4([1, 2, 3, 4]) returns [(2, 1),
(4, 3)].
Is there anything else that I am overlooking?
[1] http://docs.python.org/lib/itertools-functions.html
--
Arnaud
Maybe someday someone will realize such stuff belongs to the python
STD lib...
If you need a lazy generator without padding, that splits starting
from the start, then this is the faster to me if n is close to 2:
def xpartition(seq, n=2):
return izip( *(iter(seq),)*n )
If you need the faster greedy version without padding then there are
two answers, one for Psyco and one for Python without... :-)
If you need padding or to start from the end then there are more
answers...
Bye,
bearophile
Can you post your results?
I get different ones (pairs1 and pairs2 rewritten slightly to avoid
unnecessary indirection).
====== pairs.py ===========
from itertools import *
def pairs1(x):
return izip(islice(x,0,None,2),islice(x,1,None,2))
def pairs2(x):
xiter = iter(x)
while True:
yield xiter.next(), xiter.next()
def pairs3(x):
for i in range( len(x)//2 ):
yield x[2*i], x[2*i+1],
def pairs4(x):
xiter = iter(x)
return izip(xiter,xiter)
def compare():
import timeit
for i in '1234':
t = timeit.Timer('list(pairs.pairs%s(l))' % i,
'import pairs; l=range(1000)')
print 'pairs%s: %s' % (i, t.timeit(10000))
if __name__ == '__main__':
compare()
=====================
marigold:python arno$ python pairs.py
pairs1: 0.789824962616
pairs2: 4.08462786674
pairs3: 2.90438890457
pairs4: 0.536775827408
pairs4 wins.
--
Arnaud
<URL:http://bugs.python.org/issue1121416>
fwiw,
Alan Isaac
> <URL:http://bugs.python.org/issue1121416>
>
> fwiw,
> Alan Isaac
Thanks. So I guess I shouldn't take the code snippet I quoted as a
specification of izip but rather as an illustration.
--
Arnaud
Oops. I see a smaller difference,
but yes, pairs4 wins.
Alan Isaac
import time
from itertools import islice, izip
x = range(500001)
def pairs1(x):
return izip(islice(x,0,None,2),islice(x,1,None,2))
def pairs2(x):
xiter = iter(x)
while True:
yield xiter.next(), xiter.next()
def pairs3(x):
for i in range( len(x)//2 ):
yield x[2*i], x[2*i+1],
def pairs4(x):
xiter = iter(x)
return izip(xiter,xiter)
t = time.clock()
for x1, x2 in pairs1(x):
pass
t1 = time.clock() - t
t = time.clock()
for x1, x2 in pairs2(x):
pass
t2 = time.clock() - t
t = time.clock()
for x1, x2 in pairs3(x):
pass
t3 = time.clock() - t
t = time.clock()
for x1, x2 in pairs4(x):
pass
t4 = time.clock() - t
print t1, t2, t3, t4
Output:
0.317524154606 1.13436847421 1.07100930426 0.262926712753
Hi George,
You need to 'get it right' first. Micro optimizations for speed
without thought of the wider context is a bad habit to form and a time
waster.
If the routine is all that needs to be delivered and it does not
perform at an acceptable speed then find out what is acceptable and
optimise towards that goal. My questions were set to get posters to
think more about the need for speed optimizations and where they
should be applied, (if at all).
A bit of forethought might justify leaving the routine alone, or
optimising for readability instead.
- Paddy.
But it's fun!
Some-of-us-can't-help-it'ly yours
--
Arnaud
You can be bolder here as the izip() docs explicitly state
"""
Note, the left-to-right evaluation order of the iterables is
guaranteed. This makes possible an idiom for clustering a data series into
n-length groups using "izip(*[iter(s)]*n)".
"""
and the bug report with Raymond Hettinger saying
"""
Left the evaluation order as an unspecified, implementation
specific detail.
"""
is about zip(), not izip().
Peter
FWIW, I just added a similar guarantee for zip().
Raymond
For such trivial problems, getting it right alone isn't a particularly
high expectation.
> Micro optimizations for speed
> without thought of the wider context is a bad habit to form and a time
> waster.
The OP didn't mention anything about the context; for all we know,
this might be a homework problem or the body of a tight inner loop.
There is this tendency on c.l.py to assume that every optimization
question is about a tiny subproblem of a 100 KLOC application. Without
further context, we just don't know.
> If the routine is all that needs to be delivered and it does not
> perform at an acceptable speed then find out what is acceptable
> and optimise towards that goal. My questions were set to get
> posters to think more about the need for speed optimizations and
> where they should be applied, (if at all).
I don't agree with this logic in general. Just because one can solve a
problem by throwing a quick and dirty hack with quadratic complexity
that happens to do well enough on current typical input, it doesn't
mean he shouldn't spend ten or thirty minutes more to write a proper
linear time solution, all else being equal or at least comparable
(elegance, conciseness, readability, etc.). Of course it's a tradeoff;
spending a week to save a few milliseconds on average is usually a
waste for most applications, but being a lazy keyboard banger writing
the first thing that pops into mind is not that good either.
George
> The OP didn't mention anything about the context; for all we know, this
> might be a homework problem or the body of a tight inner loop. There is
> this tendency on c.l.py to assume that every optimization question is
> about a tiny subproblem of a 100 KLOC application. Without further
> context, we just don't know.
Funny. As far as I can tell, the usual assumption on c.l.py is that every
tiny two-line piece of code is the absolute most critically important
heart of an application which gets called billions of times on petabytes
of data daily.
Given the human psychology displayed involved, in the absence of
definitive evidence one way or another it is a far safer bet to assume
that people are unnecessarily asking for "the fastest" out of a misguided
and often ignorant belief that they need it, rather than the opposite.
People who actually need a faster solution usually know enough to preface
their comments with an explanation of why their existing solution is too
slow rather than just a context-free demand for "the fastest" solution.
Fast code is like fast cars. There *are* people who really genuinely need
to have the fastest car available, but that number is dwarfed by the vast
legions of tossers trying to make up for their lack of self-esteem by
buying a car with a spoiler. Yeah, you're going to be traveling SO FAST
on the way to the mall that the car is at risk of getting airborne, sure,
we believe you.
(The above sarcasm naturally doesn't apply to those who actually do need
to travel at 200mph in a school zone, like police, taxi drivers and stock
brokers.)
--
Steven
> Given the human psychology displayed involved, in the absence of
> definitive evidence one way or another it is a far safer bet to assume
> that people are unnecessarily asking for "the fastest" out of a misguided
> and often ignorant belief that they need it, rather than the opposite.
> People who actually need a faster solution usually know enough to preface
> their comments with an explanation of why their existing solution is too
> slow rather than just a context-free demand for "the fastest" solution.
As I mentioned already, I consider the seeking of the most efficient
solution a legitimate question, regardless of whether a "dumb"
solution is fast enough for an application. Call it a "don't be
sloppy" principle if you wish. It's the same reason I always use
xrange() instead of range() for a loop, although in practice the
difference is rarely measurable.
> Fast code is like fast cars. There *are* people who really genuinely need
> to have the fastest car available, but that number is dwarfed by the vast
> legions of tossers trying to make up for their lack of self-esteem by
> buying a car with a spoiler. Yeah, you're going to be traveling SO FAST
> on the way to the mall that the car is at risk of getting airborne, sure,
> we believe you.
>
> (The above sarcasm naturally doesn't apply to those who actually do need
> to travel at 200mph in a school zone, like police, taxi drivers and stock
> brokers.)
Good example; it shows that there's more than the utilitarian point of
view. People don't buy these cars because of an actual need but rather
because of the brand, the (perceived) social value and other reasons.
And since you like metaphors, here's another one: caring about
efficient code only when you need it is like keeping notes for a
course only for the material to be included in the final exams,
skipping the more encyclopedic, general knowledge lectures. Sure, you
may pass the class, even with a good grade, but for some people a
class is more than a final grade.
George
> As I mentioned already, I consider the seeking of the most efficient
> solution a legitimate question, regardless of whether a "dumb" solution
> is fast enough for an application. Call it a "don't be sloppy" principle
> if you wish.
Sure, by why do you limit "efficient" and "don't be sloppy" to mean
"write the fastest executing code you can, regardless of every other
trade-off"?
Because there are trade-offs:
* execution time
* compilation time
* memory use at run-time
* size of source code on disk
* size of compiled code on disk
* ease of writing it in the first place
* ease of maintenance
* correctness
* fragility in the face of malformed data
* readability and ease of comprehension
* elegance (however you judge that!)
* making sure the usage of various resources never exceed some set of
upper bounds
etc.
Some of these are relatively unimportant (e.g. the size of the source
code), but others are extremely important and far too often ignored (e.g.
readability and even correctness).
In fact, "fastest" isn't even a meaningful attribute. Does it mean:
* the worst-case is fastest
* the best-case is fastest
* the average-case is fastest
* fastest on typical data
* all of the above
etc.
What counts as "typical" data? How do you average all the cases?
Asking "what's the fastest" without considering these issues is a good
sign that the person asking is being sloppy, something you should be
against.
> It's the same reason I always use xrange() instead of
> range() for a loop, although in practice the difference is rarely
> measurable.
Well, that's (now) a no-brainer. Once upon a time, range() used to be
faster for small enough lists, and whatever difference there is in
readability is insignificant except for utter newbies, so there's no real
trade-off to make unless you care about the extra "x".
But... do you write list.__len__() instead of len(list) to save a few
nanoseconds?
If you do, you're guilty of pessimization instead of optimization: for
built-in lists, len() is actually faster, by a lot. It's only an
optimization -- and a significant one -- for custom classes.
So, do you write this:
if isinstance(alist, list):
value = len(alist) # optimize for lists
else:
value = alist.__len__() # optimize for classes
every time you want the length of an arbitrary sequence?
If you did, you'd be pessimizing again. That code runs nearly twice as
slowly as just calling the built-in len(). Optimizing the parts doesn't
mean you have optimized the whole.
--
Steven
I confess that it did not occur to me that there
might be an interesting distinction among these
cases for the question of how to get sequential
pairs from a list. How would one draw these
distinctions in this case?
Thanks,
Alan Isaac
PS Just for context, the sequential pairs were
needed in a simulation, but my question was
primarily prompted by my surprise that the
approaches I looked at differed as much as they did.
def pairs5(x):
o=[]
for n in zip(x[::2],x[1:2]):
o.append(n)
return o
I dont know if that breaks any constraints placed on the problem, but I get
the following output
0.07158942896 0.266009705575 0.21342143313 0.0537146193457 0.0107680502972
does that make it faster than pairs(4) or am I breaking some constraints on
the problem?
Matt.
Internet
ais...@american.edu
To
python-list
Sent by: cc
python-list-bounces+matthew.warren=uk.bnpparibas.com@
python.org Subject
Re: pairs from a list
22/01/2008 18:09
Alan Isaac
x = range(500001)
def pairs1(x):
return izip(islice(x,0,None,2),islice(x,1,None,2))
--
http://mail.python.org/mailman/listinfo/python-list
This message and any attachments (the "message") is
intended solely for the addressees and is confidential.
If you receive this message in error, please delete it and
immediately notify the sender. Any use not in accord with
its purpose, any dissemination or disclosure, either whole
or partial, is prohibited except formal approval. The internet
can not guarantee the integrity of this message.
BNP PARIBAS (and its subsidiaries) shall (will) not
therefore be liable for the message if modified.
Do not print this message unless it is necessary,
consider the environment.
---------------------------------------------
Ce message et toutes les pieces jointes (ci-apres le
"message") sont etablis a l'intention exclusive de ses
destinataires et sont confidentiels. Si vous recevez ce
message par erreur, merci de le detruire et d'en avertir
immediatement l'expediteur. Toute utilisation de ce
message non conforme a sa destination, toute diffusion
ou toute publication, totale ou partielle, est interdite, sauf
autorisation expresse. L'internet ne permettant pas
d'assurer l'integrite de ce message, BNP PARIBAS (et ses
filiales) decline(nt) toute responsabilite au titre de ce
message, dans l'hypothese ou il aurait ete modifie.
N'imprimez ce message que si necessaire,
pensez a l'environnement.
> For such trivial problems, getting it right alone isn't a particularly
> high expectation.
Hi George, not 'alone' but 'first'. And it is the ultimate
expectation. Who wants to recieve wrong software?
>
> > Micro optimizations for speed
> > without thought of the wider context is a bad habit to form and a time
> > waster.
>
> The OP didn't mention anything about the context; for all we know,
> this might be a homework problem or the body of a tight inner loop.
Thats why I thought to ask about the context.
> There is this tendency on c.l.py to assume that every optimization
> question is about a tiny subproblem of a 100 KLOC application. Without
> further context, we just don't know.
Again, I did not assume; I asked about the wider context. I too don't
believe your statement about c.l.p.
>
> > If the routine is all that needs to be delivered and it does not
> > perform at an acceptable speed then find out what is acceptable
> > and optimise towards that goal. My questions were set to get
> > posters to think more about the need for speed optimizations and
> > where they should be applied, (if at all).
>
> I don't agree with this logic in general.
But you don't defend yourself well by such a far-fetched example.
I've heard quality expressed as "meeting requirements", which I think
is apt. Falling short of requirements everyone knows doesn't give a
quality result, but equally 'exceeding' requirements also detracts
from quality (as does not knowing your requirements).
It is good to learn optimization techniques, which may be part of what
you are saying, but part of that is learning when it pays to apply
them and/or search for them; and when it does not.
- Paddy.
> I'm just fiddling with this, am no great expert, but I added
>
> def pairs5(x):
> o=[]
> for n in zip(x[::2],x[1:2]):
The second argument should be x[1::2].
> o.append(n)
> return o
>
> I dont know if that breaks any constraints placed on the problem, but I
It breaks the constraints unless the above is not cut-n-paste ;)
Also note that your pairs5() offers no advantage over
def pairs6(x):
return zip(x[::2], x[1::2])
Peter
I explicitly didn't limit sloppiness to inefficiency and mentioned
it's a tradeoff:
"... all else being equal or at least comparable (elegance,
conciseness, readability, etc.). Of course it's a tradeoff;
spending a week to save a few milliseconds on average is usually a
waste for most applications, but being a lazy keyboard banger writing
the first thing that pops into mind is not that good either."
> But... do you write list.__len__() instead of len(list) to save a few
> nanoseconds?
No, of course not, it's not worth it, but that doesn't mean that being
curious about what's faster and using timeit to find out is totally
worthless.
Another example: avoiding attribute lookups within a loop. I rarely
write
bar = foo.bar
for i in big_list: bar(i)
but it's valuable to know that it can make a difference when I really
need it. Always writing the first thing that "just works" prevents one
from even considering that there might be faster (or more elegant,
more general, etc.) alternatives.
George
> I've heard quality expressed as "meeting requirements", which I think
> is apt. Falling short of requirements everyone knows doesn't give a
> quality result, but equally 'exceeding' requirements also detracts
> from quality (as does not knowing your requirements).
> It is good to learn optimization techniques, which may be part of what
> you are saying, but part of that is learning when it pays to apply
> them and/or search for them; and when it does not.
The OP wanted an answer to a simple question, not a lecture on good
software engineering principles. This whole subthread reminds of a
movie (can't remember which) where someone asks his buddy in the
stadium "what do you want?". His buddy gets it wrong and embarks in a
long diatribe of what he wants in life now, what he wanted as a child,
what's the meaning of one's life and so on. After a couple of minutes
the guy cuts him and asks again:
- "Man, what do you want, burger or hot dog?"
- "Oh, a hot dog".
Sometimes you want to see the tree right in front of you, not the
whole damn forest.
George
I wholeheartedly agree.
--
Arnaud
> On Jan 23, 4:37 am, Steven D'Aprano
> <ste...@REMOVE.THIS.cybersource.com.au> wrote:
>> On Tue, 22 Jan 2008 23:33:00 -0800, George Sakkis wrote:
>> > As I mentioned already, I consider the seeking of the most efficient
>> > solution a legitimate question, regardless of whether a "dumb"
>> > solution is fast enough for an application. Call it a "don't be
>> > sloppy" principle if you wish.
>>
>> Sure, by why do you limit "efficient" and "don't be sloppy" to mean
>> "write the fastest executing code you can, regardless of every other
>> trade-off"?
>
> I explicitly didn't limit sloppiness to inefficiency and mentioned it's
> a tradeoff:
Of course you did, and I was being sloppy. The "you" was meant more as a
generic you than you yourself. Sorry for the confusion.
As for your other points, I think we're actually very much in agreement,
except for your tolerance of random posters asking what I believe is an
incoherent question: "what's the fastest way to do ...?". It seems to me
you're willing to give them the benefit of the doubt that they've done
their profiling and considered their trade-offs, or at the very worst are
asking from purely intellectual curiosity. Call me cynical if you like,
but I think that in the absence of any direct evidence supporting those
things, the most likely possibility is the opposite.
--
Steven
> Steven D'Aprano wrote:
>> In fact, "fastest" isn't even a meaningful attribute. Does it mean:
>>
>> * the worst-case is fastest
>> * the best-case is fastest
>> * the average-case is fastest
>> * fastest on typical data
>> * all of the above
>
>
> I confess that it did not occur to me that there might be an interesting
> distinction among these cases for the question of how to get sequential
> pairs from a list. How would one draw these distinctions in this case?
Well, in this *specific* case I don't think they're terribly interesting
because your problem is so simple. Best-case will, presumably, occur when
the list is empty. Worst-case will occur if somebody passes a non-
terminating iterator to your code (although that depends on the nature of
your solution and how you are using it).
But in the broader context of optimizing, these distinctions are very
important. For example, Quicksort is very fast on randomly ordered data,
but terribly slow on data which is already sorted: about as slow as
Bubblesort. Mergesort's best-case isn't as fast as Quicksort's best-case,
but it's worst-case behaviour is much, much better. And both Quicksort
and Mergesort are complex enough that a simpler, slower algorithm like
Shell Sort will be significantly faster for small data sets.
But I'm sure you already know all this, I'm just mentioning it for the
benefit of anybody else reading who still thinks that a generic "what's
the fastest way" type question is meaningful.
--
Steven
> On Jan 23, 1:30 pm, Paddy <paddy3...@googlemail.com> wrote:
>
>> I've heard quality expressed as "meeting requirements", which I think
>> is apt. Falling short of requirements everyone knows doesn't give a
>> quality result, but equally 'exceeding' requirements also detracts from
>> quality (as does not knowing your requirements). It is good to learn
>> optimization techniques, which may be part of what you are saying, but
>> part of that is learning when it pays to apply them and/or search for
>> them; and when it does not.
>
> The OP wanted an answer to a simple question, not a lecture on good
> software engineering principles.
(1) It isn't a simple question. There were at least five alternatives,
one of which would have been "fastest" except it failed the correctness
test. The OP went on to ask:
"I suppose my question should have been, is there an obviously faster way?
Anyway, of the four ways below, the first is substantially fastest.
[Note: actually it wasn't.] Is there an obvious reason why?"
Asking *why* something is faster is very different from asking *which* is
faster. I don't believe anyone actually answered that "why" question.
So, for the record, here's my attempt at an answer:
No, there is no "obvious" reason why one solution should be faster than
another. But perhaps the two most general rules of thumb are:
* the more work you can push into built-ins written in C, and the least
in slow code written in Python, the faster;
* lazy iterators are often (but not always) faster than building the
entire list all at once.
Other than that, the details of why one particular technique is faster
than another depends on the implementation of each, and that's not
obvious.
(2) What the OP wanted and what he needed are not necessarily the same.
Answering the OP's direct question is like giving him a fish -- he'll be
hungry tomorrow. Teaching him good engineering principles is like
teaching him to fish.
(3) This is Usenet, and the advice is free. Sometimes you get more than
you expected, sometimes less. Regardless of what the OP wanted, the
thread will go where the thread will go. The discussions of software
engineering were mostly in response to you, not the OP.
--
Steven
> As for your other points, I think we're actually very much in agreement,
> except for your tolerance of random posters asking what I believe is an
> incoherent question: "what's the fastest way to do ...?". It seems to me
> you're willing to give them the benefit of the doubt that they've done
> their profiling and considered their trade-offs, or at the very worst are
> asking from purely intellectual curiosity.
It depends on the specific problem and the context. For small well-
defined algorithmic type problems, I just assume it's intellectual
curiosity or that performance really matters. If the same question was
asked in the context of, say, a typical web application fetching data
from a database and rendering dynamic pages, I might have dismissed it
as YAGNI.
George
George, I tend to think that the more context an OP gives, the more
thought they have given their problem. Often, to get a better answer
you need to help the OP divulge context.
I too like the intellectual challenge of exploring small problem, and
from lurking on c.l.p I thought there would be lots of answers of that
ilk, but this time I thought why not contribute in a different way?
Reading c.l.p I am often reminded of good software practice memes in
answers and think its part of what makes c.l.p. a rewarding
experience. It may be hubris to think that a random reader might read
my post and then follow my points before making a routine faster; but
there you go :-)
- Paddy.
and appropriately slows down when corrected, dang.
:)
Matt.
Internet
__pet...@web.de
To
python-list
Sent by: cc
python-list-bounces+matthew.warren=uk.bnpparibas.com@
python.org Subject
Re: pairs from a list
23/01/2008 18:32
Matthew_WARREN wrote:
Peter