unfold

612 views
Skip to first unread message

Kevin Downey

unread,
Mar 29, 2016, 6:40:43 PM3/29/16
to cloju...@googlegroups.com
I think unfold (maybe with another name?) would be a great addition to
clojure.core.

unfold is sort of the opposite of reduce. Given a single value and a
function to produce new values, it produces a series of values. Variants
of it exist in Haskell
(https://hackage.haskell.org/package/base-4.8.2.0/docs/Data-List.html#v:unfoldr),
and several Clojure utility libraries
(https://github.com/amalloy/useful/blob/develop/src/flatland/useful/seq.clj#L128-L147).


I have found unfold to be very useful dealing with web apis and database
apis. unfold provides a nice way to turn any api that requires a series
of api calls in to a series of api call results.

I have written an implementation of `unfold` using CollReduce
(https://gist.github.com/hiredman/4d8bf007ba7897f11594) but it would
likely be better to implement the new Reduce interfaces in 1.8.
Alternatively, or maybe a long side that a lazy-seq based unfold might
be useful.

Does this seem like something useful? Do you think a patch for this
would be well received? Should I open a jira issue?

--
And what is good, Phaedrus,
And what is not good—
Need we ask anyone to tell us these things?

Howard Lewis Ship

unread,
Mar 29, 2016, 6:46:12 PM3/29/16
to cloju...@googlegroups.com
Sounds (just?) like clojure.core/iterate.

--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure-dev...@googlegroups.com.
To post to this group, send email to cloju...@googlegroups.com.
Visit this group at https://groups.google.com/group/clojure-dev.
For more options, visit https://groups.google.com/d/optout.



--
Howard M. Lewis Ship

Senior Mobile Developer at Walmart Labs

Creator of Apache Tapestry

Kevin Downey

unread,
Mar 29, 2016, 6:59:50 PM3/29/16
to cloju...@googlegroups.com
I would say it is definitely similar. I have found `unfold` in various
incarnations to be nicer to use than take-while + iterate, and of course
`iterate`s docstring says 'f' must be free of side effects. I am not
100% sure why iterate specifies that, if I had to guess it is because
some people are uncomfortable with mixing lazy seqs and io[1]. I think a
reduce based unfold side steps that (but my read on that could be all
wrong).


1. https://stuartsierra.com/2015/08/25/clojure-donts-lazy-effects
> <mailto:clojure-dev%2Bunsu...@googlegroups.com>.
> To post to this group, send email to cloju...@googlegroups.com
> <mailto:cloju...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/clojure-dev.
> For more options, visit https://groups.google.com/d/optout.
>
>
>
>
> --
> Howard M. Lewis Ship
>
> Senior Mobile Developer at Walmart Labs
>
> Creator of Apache Tapestry
>
> (971) 678-5210
> http://howardlewisship.com
> @hlship
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to clojure-dev...@googlegroups.com
> <mailto:clojure-dev...@googlegroups.com>.
> To post to this group, send email to cloju...@googlegroups.com
> <mailto:cloju...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/clojure-dev.
> For more options, visit https://groups.google.com/d/optout.


--

Alex Miller

unread,
Mar 30, 2016, 12:48:30 AM3/30/16
to Clojure Dev
Rather than starting with a solution, can we start with a problem that needs to be solved and consider options? Can you sketch the use case more fully? For the cases mentioned in the first post (web and db apis), those sound like cases where I would maybe look at loop/recur or a custom lazy seq to control iteration. 

Re side effects and iterate, it is intended to be a pure generator - there is no guarantee made on chunking (might work ahead) or even necessarily on whether f might be invoked multiple times (so stateful/io would be bad). The current implementation of iterate is *both* a lazy seq and reducible. Reducibles are processed eagerly (without caching) and separately from seqs so using it in both capacities may cause f to be invoked separately for each use. This was implemented in CLJ-1603 for Clojure 1.7 and there is a lot of history and work there.

Re implementation, it is preferable to implement IReduceInit directly if you control the implementation rather than to plug into CollReduce. We might also want this to be seqable which gets into some of the same territory as CLJ-1603 but with some twists if you expect the function to potentially have side effects. For once-only sources, maybe you could skip the seq impl though. We've seen the question of once-only traversal of external apis come up several times and I think there is something potentially to add here, whether it's unfold or something else.

>     <mailto:clojure-dev%2Bunsu...@googlegroups.com>.
>     To post to this group, send email to cloju...@googlegroups.com
>     <mailto:clojure-dev@googlegroups.com>.
>     Visit this group at https://groups.google.com/group/clojure-dev.
>     For more options, visit https://groups.google.com/d/optout.
>
>
>
>
> --
> Howard M. Lewis Ship
>
> Senior Mobile Developer at Walmart Labs
>
> Creator of Apache Tapestry
>
> (971) 678-5210
> http://howardlewisship.com
> @hlship
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send
> To post to this group, send email to cloju...@googlegroups.com

Kevin Downey

unread,
Mar 30, 2016, 4:36:27 AM3/30/16
to cloju...@googlegroups.com
On 03/29/2016 09:48 PM, Alex Miller wrote:
> Rather than starting with a solution, can we start with a problem that
> needs to be solved and consider options? Can you sketch the use case
> more fully? For the cases mentioned in the first post (web and db apis),
> those sound like cases where I would maybe look at loop/recur or a
> custom lazy seq to control iteration.

With elasticsearch, the scroll api (like a search) returns a token that
is used to retrieve each page of results, and each page of results
returns a new token which will give you the next page. The github api
when paging through results, each page has a header that points to the
next page of results. Just yesterday in the slack channel I spoke with
someone who was pulling effectively a page at a time of db rows back
from a database, each page being the rows with ids between the previous
pages last id and that id plus whatever the page size is. If I recall
the s3 (azure cloud storage too) listing api is very similar to the db
row fetching, you get back a page of results, and then the next api
request you ask for the page beginning after the last item in the
previous page.

These all have a common structure of iterated api requests
(where each api request depends on the result of the previous one), with
some halting condition.

This is certainly a solvable problem and can be/has been solved over and
over again, just like every time someone needs to map a function over a
collection they could write a custom lazy-seq or use loop/recur. I've
implemented custom lazy-seqs for this stuff, I've implemented reducible
collections, I've written both lazy-seq and reducible versions of unfold
in various projects to avoid repeating this stuff. Since it seems (at
least to me) to be something I see over and over again in apis it would
be nice to encapsulate that in some way that didn't involve a single
function library.

Right now, if I was tasked with, for example writing an elasticsearch
clojure api(which I was several months ago, and I used unfold) or with
s3 bucket listing code, I wouldn't even bother without writing unfold
first, and use that instead of reifying some reducible interface /
protocol or using lazy-seq directly. Which is kind of annoying, because
now, if someone asks "how would you write this code?" my current answer
involves a function that exists somewhere in my muscle memory which is
not super helpful to them.

> Re side effects and iterate, it is intended to be a pure generator -
> there is no guarantee made on chunking (might work ahead) or even
> necessarily on whether f might be invoked multiple times (so stateful/io
> would be bad). The current implementation of iterate is *both* a lazy
> seq and reducible. Reducibles are processed eagerly (without caching)
> and separately from seqs so using it in both capacities may cause f to
> be invoked separately for each use. This was implemented in CLJ-1603 for
> Clojure 1.7 and there is a lot of history and work there.

While I certainly prefer unfold, I think it could be replaced with some
combination of iterate and take-while. But the restrictions on iterate
make it unsuitable for these kind of iterated api calls. Maybe some
other iterate that doesn't require a pure function could solve that.

> Re implementation, it is preferable to implement IReduceInit directly if
> you control the implementation rather than to plug into CollReduce. We
> might also want this to be seqable which gets into some of the same
> territory as CLJ-1603 but with some twists if you expect the function to
> potentially have side effects. For once-only sources, maybe you could
> skip the seq impl though. We've seen the question of once-only traversal
> of external apis come up several times and I think there is something
> potentially to add here, whether it's unfold or something else.
>

My experience, which is very limited, and I am sure others see things
differently, is that I have never wanted the caching behavior for
lazy-seqs that were used to represent, uh external resources, like this.
In fact the behavior has been a significant source of pain (tracking
down issues with macros inadvertently holding on to the head of a seq),
because it was never reasonable to expect the elements to fit in memory
at once. So, as I use unfold for external resources, having it be
non-caching is ideal.

It also seems like caching behavior is recoverable (into [] ...) from
non-caching, but removing caching when caching is built in is trickier.

On one hand, as I said, for this use case I think a reducible and the
non-caching is a win technically; on the other hand there maybe an
ergonomic argument to be made for supporting seqs. Getting code that
uses transducers/reducers/non-seq things that can be processed using
reduce through code review can be challenging, and I've had at least one
job interview go south at least in part because I used transducers in
their coding challenge. It seems like people are much more comfortable
with seqs.


> On Tuesday, March 29, 2016 at 5:59:50 PM UTC-5, Kevin Downey wrote:
>
> I would say it is definitely similar. I have found `unfold` in various
> incarnations to be nicer to use than take-while + iterate, and of
> course
> `iterate`s docstring says 'f' must be free of side effects. I am not
> 100% sure why iterate specifies that, if I had to guess it is because
> some people are uncomfortable with mixing lazy seqs and io[1]. I
> think a
> reduce based unfold side steps that (but my read on that could be all
> wrong).
>
>
> 1. https://stuartsierra.com/2015/08/25/clojure-donts-lazy-effects
> <https://stuartsierra.com/2015/08/25/clojure-donts-lazy-effects>
>
> On 03/29/2016 03:46 PM, Howard Lewis Ship wrote:
> > Sounds (just?) like clojure.core/iterate.
> >
> > On Tue, Mar 29, 2016 at 3:40 PM, Kevin Downey <red...@gmail.com
> <mailto:red...@gmail.com>
> > <mailto:red...@gmail.com <mailto:red...@gmail.com>>> wrote:
> >
> > I think unfold (maybe with another name?) would be a great
> addition to
> > clojure.core.
> >
> > unfold is sort of the opposite of reduce. Given a single value
> and a
> > function to produce new values, it produces a series of
> values. Variants
> > of it exist in Haskell
> >
> (https://hackage.haskell.org/package/base-4.8.2.0/docs/Data-List.html#v:unfoldr
> <https://hackage.haskell.org/package/base-4.8.2.0/docs/Data-List.html#v:unfoldr>),
> <https://github.com/amalloy/useful/blob/develop/src/flatland/useful/seq.clj#L128-L147>).
>
> >
> >
> > I have found unfold to be very useful dealing with web apis
> and database
> > apis. unfold provides a nice way to turn any api that requires
> a series
> > of api calls in to a series of api call results.
> >
> > I have written an implementation of `unfold` using CollReduce
> > (https://gist.github.com/hiredman/4d8bf007ba7897f11594
> <https://gist.github.com/hiredman/4d8bf007ba7897f11594>) but it would
> > likely be better to implement the new Reduce interfaces in 1.8.
> > Alternatively, or maybe a long side that a lazy-seq based
> unfold might
> > be useful.
> >
> > Does this seem like something useful? Do you think a patch for
> this
> > would be well received? Should I open a jira issue?
> >
> > --
> > And what is good, Phaedrus,
> > And what is not good—
> > Need we ask anyone to tell us these things?
> >
> > --
> > You received this message because you are subscribed to the
> Google
> > Groups "Clojure Dev" group.
> > To unsubscribe from this group and stop receiving emails from it,
> > send an email to clojure-dev...@googlegroups.com
> <mailto:clojure-dev%2Bunsu...@googlegroups.com>
> > <mailto:clojure-dev%2Bunsu...@googlegroups.com
> <mailto:clojure-dev%252Buns...@googlegroups.com>>.
> > To post to this group, send email to
> cloju...@googlegroups.com <mailto:cloju...@googlegroups.com>
> > <mailto:cloju...@googlegroups.com
> <mailto:cloju...@googlegroups.com>>.
> <https://groups.google.com/group/clojure-dev>.
> > For more options, visit https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>.
> >
> >
> >
> >
> > --
> > Howard M. Lewis Ship
> >
> > Senior Mobile Developer at Walmart Labs
> >
> > Creator of Apache Tapestry
> >
> > (971) 678-5210
> > http://howardlewisship.com
> > @hlship
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Clojure Dev" group.
> > To unsubscribe from this group and stop receiving emails from it,
> send
> > an email to clojure-dev...@googlegroups.com
> <mailto:clojure-dev%2Bunsu...@googlegroups.com>
> > <mailto:clojure-dev...@googlegroups.com
> <mailto:clojure-dev%2Bunsu...@googlegroups.com>>.
> > To post to this group, send email to cloju...@googlegroups.com
> <mailto:cloju...@googlegroups.com>
> > <mailto:cloju...@googlegroups.com
> <mailto:cloju...@googlegroups.com>>.
> <https://groups.google.com/group/clojure-dev>.
> > For more options, visit https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>.
>
>
> --
> And what is good, Phaedrus,
> And what is not good—
> Need we ask anyone to tell us these things?
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to clojure-dev...@googlegroups.com
> <mailto:clojure-dev...@googlegroups.com>.
> To post to this group, send email to cloju...@googlegroups.com
> <mailto:cloju...@googlegroups.com>.

Akhil Wali

unread,
Mar 30, 2016, 9:25:35 AM3/30/16
to Clojure Dev
While I certainly prefer unfold, I think it could be replaced with some 
combination of iterate and take-while.
In fact the behavior has been a significant source of pain (tracking 
down issues with macros inadvertently holding on to the head of a seq), 
because it was never reasonable to expect the elements to fit in memory 
at once.

It sounds like unfold can be implemented as a macro based on take-while and iterate.
IMHO, holding onto the head has nothing to do with using a macro.

Also, it looks like you can use a transducer here with the eduction function if you're trying to reduce memory allocations.

TODO

Nicola Mometto

unread,
Mar 30, 2016, 9:35:51 AM3/30/16
to cloju...@googlegroups.com

> On 30 Mar 2016, at 13:51, Akhil Wali <green.tr...@gmail.com> wrote:
>
> Also, it looks like you can use a transducer here with the eduction function if you're trying to reduce memory allocations.

I'll be wary of using `eduction` to "reduce memory allocations", given that it holds onto the sequence you're using `eduction` on.

user=> (defn range* [x] (cons x (lazy-seq (range* (inc x)))))
#'user/range*
user=> (reduce + 0 (eduction (range* 0)))
OutOfMemoryError Java heap space clojure.lang.RT.cons (RT.java:660)
user=> (reduce + 0 (range* 0))
;; runs infinitely, no OOM

signature.asc

Alex Miller

unread,
Mar 30, 2016, 9:46:52 AM3/30/16
to cloju...@googlegroups.com
I think this is for the case of making repeated api calls, not working from a source sequence. The memory reduction comes from not caching values like seqs. eduction will be more useful with a reducible source.

I'm good with a ticket for this - I think the use case from Kevin is a good one and echoes others that I've heard. It would be good to have a mock api in the ticket that represents what it's like to interact with some of these sources to make the target concrete. Ultimately, I'm not sure where this will go but it's worth writing up.

--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure-dev...@googlegroups.com.
To post to this group, send email to cloju...@googlegroups.com.

Nicola Mometto

unread,
Mar 30, 2016, 9:47:18 AM3/30/16
to cloju...@googlegroups.com
(OT but that eduction holding onto collections and causing OOM is fixed by http://dev.clojure.org/jira/browse/CLJ-1793, formerly known as http://dev.clojure.org/jira/browse/CLJ-1250, please vote that ticket for inclusion in the 1.9 fixlist)
signature.asc

Kevin Downey

unread,
Mar 30, 2016, 2:43:37 PM3/30/16
to cloju...@googlegroups.com
I created a jira ticket http://dev.clojure.org/jira/browse/CLJ-1906 and
linked back to this thread.

Ghadi Shayban

unread,
Apr 1, 2016, 2:01:19 PM4/1/16
to Clojure Dev
An example of using this to paginate some API calls
https://gist.github.com/ghadishayban/050d0059e60884b95ffbb6f11901d0bb
Line 34

Ghadi Shayban

unread,
Sep 19, 2016, 12:37:40 AM9/19/16
to Clojure Dev
I've been marinating on the `unfold` proposal going on here [1] and JIRA [2], and I think I've found a useful generalization.

motivating examples & docstring here [3].

Naming aside, I think the expressivity/minimality ratio is very high.  It covers a lot of overlapping territory with typical non-concurrent usages of generators/yield (a feature Clojure does not have), but is expressed functionally.
(It uses a sentinel to mark the end of stream and formulates the appearance of values in terms of calling a possibly stateful function.)

[3] https://gist.github.com/ghadishayban/f905be564d1d37ba9fa4d77b0d5e8848
Reply all
Reply to author
Forward
0 new messages