mapLast, mapFirst, and just general iterator questions

Travis Griggs

Jun 14, 2022, 2:06:19 PM
I want to be able to apply different transformations to the first and last elements of an arbitrarily sized finite iterator in Python 3. It's a custom iterator, so it does not have __reversed__. If the first and last elements are the same (e.g. size 1), it should apply both transforms to the same element. I'm doing this because I have an iterator of time span tuples, and I want to clamp the first and last elements, but I know any/all of the middle values are inherently in range.

A silly example might be a process that, given an iterator of strings, chops the outer characters off of the first value and uppercases the final value. For example:


def iterEmpty():
    return iter([])

def iter1():
    yield "howdy"

def iter2():
    yield "howdy"
    yield "byebye"

def iterMany():
    yield "howdy"
    yield "hope"
    yield "your"
    yield "day"
    yield "is"
    yield "swell"
    yield "byebye"

def mapFirst(stream, transform):
    try:
        first = next(stream)
    except StopIteration:
        return
    yield transform(first)
    yield from stream

def mapLast(stream, transform):
    try:
        previous = next(stream)
    except StopIteration:
        return
    for item in stream:
        yield previous
        previous = item
    yield transform(previous)

def main():
    for each in (iterEmpty, iter1, iter2, iterMany):
        baseIterator = each()
        chopFirst = mapFirst(baseIterator, lambda x: x[1:-1])
        andCapLast = mapLast(chopFirst, lambda x: x.upper())
        print(repr(" ".join(andCapLast)))


This outputs:

''
'OWD'
'owd BYEBYE'
'owd hope your day is swell BYEBYE'

Is this idiomatic? Especially my implementations of mapFirst and mapLast there in the middle? Or is there some way to pull this off that is more elegant?

I've been doing more with iterators and stacking them (probably because I've been playing with Elixir elsewhere), and I am generally curious what the performance tradeoffs of heavy use of iterators and yield functions in Python are. I know the argument for avoiding big list copies when moving between stages. Is it one of those things where there's also some overhead with them, so that for small stuff you'd be better off list-ifying the first iterator and then working with lists (where, for example, I could do the first/last clamp operation with just indexing operations)?
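
For comparison, here is a minimal sketch of the list-based alternative described above. The name `clamp_ends` and its signature are hypothetical, chosen only for illustration; it assumes the whole input fits comfortably in memory.

```python
def clamp_ends(items, first_transform, last_transform):
    """Return a list with the first and last elements transformed.

    Hypothetical helper: materializes the iterable, then uses indexing.
    With a single element, both transforms are applied to it in order.
    """
    result = list(items)
    if result:  # an empty input stays empty
        result[0] = first_transform(result[0])
        result[-1] = last_transform(result[-1])
    return result


def chop(text):
    # Drop the outer characters, as in the string example above.
    return text[1:-1]


print(clamp_ends(iter(["howdy", "hope", "byebye"]), chop, str.upper))
```

The trade-off is the one raised in the question: this copies the whole sequence once, but the first/last handling becomes two index operations.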

Chris Angelico

Jun 14, 2022, 2:48:06 PM
On Wed, 15 Jun 2022 at 04:07, Travis Griggs <travis...@gmail.com> wrote:
> def mapFirst(stream, transform):
>     try:
>         first = next(stream)
>     except StopIteration:
>         return
>     yield transform(first)
>     yield from stream

Small suggestion: Begin with this:

stream = iter(stream)

That way, you don't need to worry about whether you're given an
iterator or some other iterable (for instance, you can't call next()
on a list, but it would make good sense to be able to use your
function on a list).
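
As a sketch of that suggestion (using the snake_case names mentioned below, and otherwise keeping the original logic), both helpers could start by calling `iter()` so that lists and other iterables work too:

```python
def map_first(iterable, transform):
    """Yield the items of iterable with transform applied to the first one."""
    stream = iter(iterable)  # accept any iterable, not only iterators
    try:
        first = next(stream)
    except StopIteration:
        return
    yield transform(first)
    yield from stream


def map_last(iterable, transform):
    """Yield the items of iterable with transform applied to the last one."""
    stream = iter(iterable)
    try:
        previous = next(stream)
    except StopIteration:
        return
    for item in stream:
        yield previous  # previous is known not to be the last item
        previous = item
    yield transform(previous)


words = ["howdy", "hope", "byebye"]  # a plain list now works
print(list(map_last(map_first(words, lambda s: s[1:-1]), str.upper)))
```

Calling `iter()` on something that is already an iterator returns it unchanged, so existing callers that pass generators are unaffected.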

(BTW, Python's convention would be to call this "map_first" rather
than "mapFirst". But that's up to you.)

> def mapLast(stream, transform):
>     try:
>         previous = next(stream)
>     except StopIteration:
>         return
>     for item in stream:
>         yield previous
>         previous = item
>     yield transform(previous)

Hmm. This might be a place to use multiple assignment, but what you
have is probably fine too.

> def main():
>     for each in (iterEmpty, iter1, iter2, iterMany):
>         baseIterator = each()
>         chopFirst = mapFirst(baseIterator, lambda x: x[1:-1])
>         andCapLast = mapLast(chopFirst, lambda x: x.upper())
>         print(repr(" ".join(andCapLast)))

Don't bother with a main() function unless you actually need to be
able to use it as a function. Most of the time, it's simplest to just
have the code you want, right there in the file. :) Python isn't C or
Java, and code doesn't have to get wrapped up in functions in order to
exist.

> Is this idiomatic? Especially my implementations of mapFirst and mapLast there in the middle? Or is there some way to pull this off that is more elegant?
>

Broadly so. Even with the comments I've made above, I wouldn't say
there's anything particularly *wrong* with your code. There are, of
course, many ways to do things, and what's "best" depends on what your
code is doing, whether it makes sense in context.

> I've been doing more with iterators and stacking them (probably because I've been playing with Elixir elsewhere), and I am generally curious what the performance tradeoffs of heavy use of iterators and yield functions in Python are. I know the argument for avoiding big list copies when moving between stages. Is it one of those things where there's also some overhead with them, so that for small stuff you'd be better off list-ifying the first iterator and then working with lists (where, for example, I could do the first/last clamp operation with just indexing operations)?
>

That's mostly right, but more importantly: Don't worry about
performance. Worry instead about whether the code is expressing your
intent. If that means using a list instead of an iterator, go for it!
If that means using an iterator instead of a list, go for it! Python
won't judge you. :)

But if you really want to know which one is faster, figure out a
reasonable benchmark, and then start playing around with the timeit
module. Just remember, it's very very easy to spend hours trying to
make the benchmark numbers look better, only to discover that it has
negligible impact on your code's actual performance - or, in some
cases, it's *worse* than before (because the benchmark wasn't truly
representative). So if you want to spend some enjoyable time exploring
different options, go for it! And we'd be happy to help out. Just
don't force yourself to write bad code "because it's faster".
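
A rough starting point for such a benchmark might look like the sketch below. The input data, the repeat count, and the helper names `with_generators` and `with_list` are all assumptions made for illustration; a representative benchmark would use your real data sizes.

```python
import timeit


def map_first(iterable, transform):
    stream = iter(iterable)
    try:
        first = next(stream)
    except StopIteration:
        return
    yield transform(first)
    yield from stream


def map_last(iterable, transform):
    stream = iter(iterable)
    try:
        previous = next(stream)
    except StopIteration:
        return
    for item in stream:
        yield previous
        previous = item
    yield transform(previous)


def chop(text):
    return text[1:-1]


def with_generators(words):
    # Stacked generator version, collected into a list for comparison.
    return list(map_last(map_first(words, chop), str.upper))


def with_list(words):
    # List version: copy once, then fix up both ends by index.
    result = list(words)
    if result:
        result[0] = chop(result[0])
        result[-1] = result[-1].upper()
    return result


WORDS = ["howdy", "hope", "your", "day", "is", "swell", "byebye"]

# timeit.timeit accepts a callable and returns total seconds for `number` runs.
gen_time = timeit.timeit(lambda: with_generators(WORDS), number=2000)
list_time = timeit.timeit(lambda: with_list(WORDS), number=2000)
print(f"generators: {gen_time:.4f}s  list: {list_time:.4f}s")
```

The numbers depend heavily on machine, Python version, and input size, so treat any single run as a hint rather than a verdict.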

ChrisA

Roel Schroeven

Jun 14, 2022, 3:44:54 PM
Chris Angelico wrote on 14/06/2022 at 20:47:
> > def main():
> >     for each in (iterEmpty, iter1, iter2, iterMany):
> >         baseIterator = each()
> >         chopFirst = mapFirst(baseIterator, lambda x: x[1:-1])
> >         andCapLast = mapLast(chopFirst, lambda x: x.upper())
> >         print(repr(" ".join(andCapLast)))
>
> Don't bother with a main() function unless you actually need to be
> able to use it as a function. Most of the time, it's simplest to just
> have the code you want, right there in the file. :) Python isn't C or
> Java, and code doesn't have to get wrapped up in functions in order to
> exist.
Not (necessarily) a main function, but these days the general
recommendation seems to be to use the "if __name__ == '__main__':"
construct, so that the file can be used as a module as well as as a
script. Even for short simple things that can be helpful when doing
things like running tests or extracting docstrings.

--
"This planet has - or rather had - a problem, which was this: most of the
people living on it were unhappy for pretty much of the time. Many solutions
were suggested for this problem, but most of these were largely concerned with
the movement of small green pieces of paper, which was odd because on the whole
it wasn't the small green pieces of paper that were unhappy."
-- Douglas Adams

Chris Angelico

Jun 14, 2022, 3:50:16 PM
On Wed, 15 Jun 2022 at 05:45, Roel Schroeven <ro...@roelschroeven.net> wrote:
>
> Chris Angelico wrote on 14/06/2022 at 20:47:
> > > def main():
> > >     for each in (iterEmpty, iter1, iter2, iterMany):
> > >         baseIterator = each()
> > >         chopFirst = mapFirst(baseIterator, lambda x: x[1:-1])
> > >         andCapLast = mapLast(chopFirst, lambda x: x.upper())
> > >         print(repr(" ".join(andCapLast)))
> >
> > Don't bother with a main() function unless you actually need to be
> > able to use it as a function. Most of the time, it's simplest to just
> > have the code you want, right there in the file. :) Python isn't C or
> > Java, and code doesn't have to get wrapped up in functions in order to
> > exist.
> Not (necessarily) a main function, but these days the general
> recommendation seems to be to use the "if __name__ == '__main__':"
> construct, so that the file can be used as a module as well as as a
> script. Even for short simple things that can be helpful when doing
> things like running tests or extracting docstrings.

If it does need to be used as a module as well as a script, sure. But
(a) not everything does, and (b) even then, you don't need a main()
function; what you need is the name-is-main check. The main function
is only necessary when you need to be able to invoke your main entry
point externally, AND this main entry point doesn't have a better
name. That's fairly rare in my experience.
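
To illustrate the distinction, here is a minimal sketch of a file that works as both a module and a script using only the name-is-main check, with no main() function. The function `clamp_spans` and its parameters are hypothetical, loosely modelled on the time-span clamping from the original question.

```python
def clamp_spans(spans, low, high):
    """Clamp the first start and the last end of (start, end) tuples.

    Hypothetical example function; middle spans are assumed to be in range.
    """
    spans = list(spans)
    if spans:
        first_start, first_end = spans[0]
        spans[0] = (max(first_start, low), first_end)
        last_start, last_end = spans[-1]
        spans[-1] = (last_start, min(last_end, high))
    return spans


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when imported.
    print(clamp_spans([(0, 5), (5, 9), (9, 20)], low=2, high=12))
```

Here the importable entry point already has a meaningful name, so a separate main() would add nothing.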

My recommendation is to write the code you need, and only add
boilerplate when you actually need it. Don't just start every script
with an if-name-is-main block at the bottom just for the sake of doing
it.

ChrisA

Greg Ewing

Jun 14, 2022, 7:45:57 PM
On 15/06/22 7:49 am, Chris Angelico wrote:
> If it does need to be used as a module as well as a script, sure. But
> (a) not everything does, and (b) even then, you don't need a main()

I think this is very much a matter of taste. Personally I find it tidier
to put the top level code in a function, because it ties it together
visually and lets me have locals that are properly local.

If the file is only ever used as a script, I just put an unconditional
call to the main function at the bottom.

--
Greg

Cameron Simpson

Jun 14, 2022, 11:17:42 PM
On 15Jun2022 05:49, Chris Angelico <ros...@gmail.com> wrote:
>On Wed, 15 Jun 2022 at 05:45, Roel Schroeven <ro...@roelschroeven.net> wrote:
>> Not (necessarily) a main function, but these days the general
>> recommendation seems to be to use the "if __name__ == '__main__':"
>> construct, so that the file can be used as a module as well as as a
>> script. Even for short simple things that can be helpful when doing
>> things like running tests or extracting docstrings.
>
>If it does need to be used as a module as well as a script, sure. But
>(a) not everything does, and (b) even then, you don't need a main()
>function; what you need is the name-is-main check. The main function
>is only necessary when you need to be able to invoke your main entry
>point externally, AND this main entry point doesn't have a better
>name. That's fairly rare in my experience.

While I will lazily not-use-a-function in dev, using a function has the
benefit of avoiding accidental global variable use, because assignments
within the function will always make local variables. That is a big plus
for me all on its own. I've used this practice as far back as Pascal,
which also let you write outside-a-function code, and consider it a
great avoider of a common potential bug situation.
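
A small sketch of the kind of bug this avoids, with made-up names (`label_buggy`, `run_in_function`) chosen only for illustration: a function contains a typo that refers to `item` instead of its parameter, and whether the typo is caught depends on whether a loop elsewhere left a global `item` behind.

```python
def label_buggy(thing):
    # Bug: 'item' was meant to be 'thing'.
    return f"<{item}>"


def run_in_function():
    for item in ["a", "b"]:  # 'item' is local to this function
        pass


# With the loop inside a function, no global 'item' exists,
# so the typo fails loudly with NameError.
run_in_function()
try:
    label_buggy("x")
except NameError:
    caught_when_local = True
else:
    caught_when_local = False

# Script-style top-level loop: 'item' is now a module-level global,
# so the typo silently picks up the leftover value.
for item in ["a", "b"]:
    pass
silent_result = label_buggy("x")

print(caught_when_local, silent_result)
```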

Cheers,
Cameron Simpson <c...@cskk.id.au>

Leo

Jun 19, 2022, 8:08:31 AM
On Wed, 15 Jun 2022 04:47:31 +1000, Chris Angelico wrote:

> Don't bother with a main() function unless you actually need to be
> able to use it as a function. Most of the time, it's simplest to
> just have the code you want, right there in the file. :) Python
> isn't C or Java, and code doesn't have to get wrapped up in
> functions in order to exist.

Actually a main() function in Python is pretty useful, because Python
code on the top level executes a lot slower. I believe this is due to
global variable lookups instead of local.

Here is benchmark output from a small test.

```
Benchmark 1: python3 test1.py
Time (mean ± σ): 662.0 ms ± 44.7 ms
Range (min … max): 569.4 ms … 754.1 ms

Benchmark 2: python3 test2.py
Time (mean ± σ): 432.1 ms ± 14.4 ms
Range (min … max): 411.4 ms … 455.1 ms

Summary
'python3 test2.py' ran
1.53 ± 0.12 times faster than 'python3 test1.py'
```

Contents of test1.py:

```
l1 = list(range(5_000_000))
l2 = []

while l1:
    l2.append(l1.pop())

print(len(l1), len(l2))
```

Contents of test2.py:

```
def main():
    l1 = list(range(5_000_000))
    l2 = []

    while l1:
        l2.append(l1.pop())

    print(len(l1), len(l2))
main()
```
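
One way to look at the global-versus-local explanation is to compare the bytecode CPython generates for the same statements at module level and inside a function. This is only a sketch: exact opcode names vary between CPython versions, and the variable name `counter` is just an example.

```python
import dis

# The same two statements, compiled as module-level code...
SOURCE = "counter = 0\ncounter = counter + 1\n"
module_code = compile(SOURCE, "<module-level>", "exec")


# ...and written inside a function body.
def in_function():
    counter = 0
    counter = counter + 1
    return counter


def opnames(code):
    """Return the opcode names used by a code object."""
    return [ins.opname for ins in dis.get_instructions(code)]


module_ops = opnames(module_code)
function_ops = opnames(in_function.__code__)

# Module-level names go through name lookups (dictionary-based)...
uses_name_ops = any(op in ("LOAD_NAME", "STORE_NAME") for op in module_ops)
# ...while function locals use the indexed "fast" opcodes.
uses_fast_ops = any("FAST" in op for op in function_ops)

print(sorted(set(module_ops)))
print(sorted(set(function_ops)))
```

This shows the mechanism only; how much it matters for a given script is a separate question that needs measuring.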

--
Leo

Chris Angelico

Jun 20, 2022, 5:02:30 PM
To be quite honest, I have never once in my life had a time when the
execution time of a script is dominated by global variable lookups in
what would be the main function, AND it takes long enough to care
about it. Yes, technically it might be faster, but I've probably spent
more time reading your post than I'll ever save by putting stuff into
a function :)

Also, often at least some of those *need* to be global in order to be
useful, so you'd lose any advantage you gain.

ChrisA