On 03/29/2016 09:48 PM, Alex Miller wrote:
> Rather than starting with a solution, can we start with a problem that
> needs to be solved and consider options? Can you sketch the use case
> more fully? For the cases mentioned in the first post (web and db apis),
> those sound like cases where I would maybe look at loop/recur or a
> custom lazy seq to control iteration.

With Elasticsearch, the scroll API (like a search) returns a token that
is used to retrieve each page of results, and each page of results
returns a new token which will give you the next page. With the GitHub
API, when paging through results, each page has a header that points to
the next page of results. Just yesterday in the Slack channel I spoke
with someone who was pulling effectively a page at a time of db rows
back from a database, each page being the rows with ids between the
previous page's last id and that id plus whatever the page size is. If I
recall correctly, the S3 (and Azure cloud storage) listing API is very
similar to the db row fetching: you get back a page of results, and then
in the next API request you ask for the page beginning after the last
item in the previous page.

These all have a common structure of iterated API requests (where each
API request depends on the result of the previous one), with some
halting condition.
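
To make that common structure concrete, here is a minimal sketch of a
reducible, non-caching unfold. It is not the code from the gist mentioned
elsewhere in this thread; the name `unfold`, the step-function contract
(return nil to halt, otherwise a vector of [item next-seed]), and the
`fake-pages` / `fetch-page` / `page-step` helpers standing in for a real
paged API are all assumptions made for illustration.

```clojure
(defn unfold
  "Hypothetical sketch: returns a non-caching reducible.
  `step` takes the current seed and returns nil to halt,
  or a vector [item next-seed] to emit item and continue."
  [step seed]
  (reify clojure.lang.IReduceInit
    (reduce [_ rf init]
      (loop [acc init, seed seed]
        (if-let [[item next-seed] (step seed)]
          (let [acc (rf acc item)]
            (if (reduced? acc)
              @acc                      ; honor early termination
              (recur acc next-seed)))
          acc)))))

;; A stand-in for a token-paged API (assumed data, not a real service).
(def fake-pages
  {:start {:items [1 2] :next-token "t1"}
   "t1"   {:items [3 4] :next-token "t2"}
   "t2"   {:items [5]   :next-token nil}})

(defn fetch-page [token]
  (get fake-pages token))

(defn page-step
  "Halts when there is no token; otherwise emits one page of items."
  [token]
  (when token
    (let [{:keys [items next-token]} (fetch-page token)]
      [items next-token])))

(into [] cat (unfold page-step :start))
;; => [1 2 3 4 5]
```

Each request happens only when the reduction asks for the next item, and
nothing is retained once the reducing function has consumed a page.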

This is certainly a solvable problem, and it can be (and has been)
solved over and over again, just as every time someone needs to map a
function over a collection they could write a custom lazy-seq or use
loop/recur. I've implemented custom lazy-seqs for this stuff, I've
implemented reducible collections, and I've written both lazy-seq and
reducible versions of unfold in various projects to avoid repeating this
stuff. Since it seems (at least to me) to be something I see over and
over again in APIs, it would be nice to encapsulate that in some way
that didn't involve a single-function library.

Right now, if I were tasked with, for example, writing an Elasticsearch
Clojure API (which I was several months ago, and I used unfold) or S3
bucket listing code, I wouldn't even bother without writing unfold
first, and I would use that instead of reifying some reducible interface
/ protocol or using lazy-seq directly. Which is kind of annoying,
because now, if someone asks "how would you write this code?", my
current answer involves a function that exists somewhere in my muscle
memory, which is not super helpful to them.

> Re side effects and iterate, it is intended to be a pure generator -
> there is no guarantee made on chunking (might work ahead) or even
> necessarily on whether f might be invoked multiple times (so stateful/io
> would be bad). The current implementation of iterate is *both* a lazy
> seq and reducible. Reducibles are processed eagerly (without caching)
> and separately from seqs so using it in both capacities may cause f to
> be invoked separately for each use. This was implemented in CLJ-1603 for
> Clojure 1.7 and there is a lot of history and work there.

While I certainly prefer unfold, I think it could be replaced with some
combination of iterate and take-while. But the restrictions on iterate
make it unsuitable for these kinds of iterated API calls. Maybe some
other iterate that doesn't require a pure function could solve that.
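
As a rough illustration of that equivalence, the sketch below builds the
same sequence two ways: with a lazy-seq based unfold and with iterate +
take-while. The names `unfold-seq`, `unfold-via-iterate`, and
`squares-below-5` are hypothetical, and the step function is deliberately
pure, since that is the only case where the iterate version is safe.

```clojure
(defn unfold-seq
  "Hypothetical lazy-seq unfold: `step` returns nil to halt,
  or [item next-seed] to emit item and continue."
  [step seed]
  (lazy-seq
    (when-let [[item next-seed] (step seed)]
      (cons item (unfold-seq step next-seed)))))

(defn unfold-via-iterate
  "The same idea using iterate + take-while. Only appropriate when
  `step` is pure, because iterate makes no promise about how often
  or how far ahead its function is called."
  [step seed]
  (->> (iterate (fn [pair] (when pair (step (second pair))))
                (step seed))
       (take-while some?)
       (map first)))

(defn squares-below-5
  "Pure example step: emits n squared, halts once n reaches 5."
  [n]
  (when (< n 5)
    [(* n n) (inc n)]))

(unfold-seq squares-below-5 0)
;; => (0 1 4 9 16)

(unfold-via-iterate squares-below-5 0)
;; => (0 1 4 9 16)
```

The iterate version has to carry the [item next-seed] pair through the
sequence and strip it afterwards, which is part of why unfold tends to
read more directly.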
> Re implementation, it is preferable to implement IReduceInit directly if
> you control the implementation rather than to plug into CollReduce. We
> might also want this to be seqable which gets into some of the same
> territory as CLJ-1603 but with some twists if you expect the function to
> potentially have side effects. For once-only sources, maybe you could
> skip the seq impl though. We've seen the question of once-only traversal
> of external apis come up several times and I think there is something
> potentially to add here, whether it's unfold or something else.
>

My experience, which is very limited, and I am sure others see things
differently, is that I have never wanted the caching behavior of
lazy-seqs that were used to represent external resources like this. In
fact, the behavior has been a significant source of pain (tracking down
issues with macros inadvertently holding on to the head of a seq),
because it was never reasonable to expect the elements to fit in memory
at once. So, as I use unfold for external resources, having it be
non-caching is ideal.

It also seems like caching behavior is recoverable (into [] ...) from
non-caching, but removing caching when caching is built in is trickier.
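
A small sketch of that asymmetry, using clojure.core's `eduction` as a
stand-in for any non-caching reducible (an unfold would behave the same
way). The `calls` counter and the `source` / `cached` names are
illustrative assumptions; the counter only exists to show when work is
redone.

```clojure
;; Counts how many times the underlying "fetch" runs.
(def calls (atom 0))

;; A non-caching reducible: the mapping fn re-runs on every reduction.
(def source
  (eduction (map (fn [x] (swap! calls inc) x))
            (range 3)))

(reduce + 0 source)   ; => 3, @calls is now 3
(reduce + 0 source)   ; => 3, @calls is now 6 (work was repeated)

;; Opting in to caching is one call: realize the results into a vector.
(def cached (into [] source))   ; @calls is now 9

(reduce + 0 cached)   ; => 3, @calls stays at 9
```

Going the other way, from a structure that already caches to one that
does not, has no comparable one-liner.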

On one hand, as I said, for this use case I think a reducible and the
non-caching behavior is a win technically; on the other hand, there may
be an ergonomic argument to be made for supporting seqs. Getting code
that uses transducers/reducers/non-seq things that can be processed
using reduce through code review can be challenging, and I've had at
least one job interview go south at least in part because I used
transducers in their coding challenge. It seems like people are much
more comfortable with seqs.

> On Tuesday, March 29, 2016 at 5:59:50 PM UTC-5, Kevin Downey wrote:
>
> I would say it is definitely similar. I have found `unfold` in various
> incarnations to be nicer to use than take-while + iterate, and of
> course `iterate`'s docstring says 'f' must be free of side effects. I
> am not 100% sure why iterate specifies that; if I had to guess, it is
> because some people are uncomfortable with mixing lazy seqs and io[1].
> I think a reduce-based unfold sidesteps that (but my read on that
> could be all wrong).
>
>
> 1. https://stuartsierra.com/2015/08/25/clojure-donts-lazy-effects
>
> On 03/29/2016 03:46 PM, Howard Lewis Ship wrote:
> > Sounds (just?) like clojure.core/iterate.
> >
> > On Tue, Mar 29, 2016 at 3:40 PM, Kevin Downey <red...@gmail.com>
> > <https://hackage.haskell.org/package/base-4.8.2.0/docs/Data-List.html#v:unfoldr>),
> > <https://github.com/amalloy/useful/blob/develop/src/flatland/useful/seq.clj#L128-L147>).
>
> >
> >
> > I have found unfold to be very useful dealing with web apis and
> > database apis. unfold provides a nice way to turn any api that
> > requires a series of api calls into a series of api call results.
> >
> > I have written an implementation of `unfold` using CollReduce
> > (https://gist.github.com/hiredman/4d8bf007ba7897f11594) but it would
> > likely be better to implement the new Reduce interfaces in 1.8.
> > Alternatively, or maybe alongside that, a lazy-seq based unfold
> > might be useful.
> >
> > Does this seem like something useful? Do you think a patch for this
> > would be well received? Should I open a jira issue?
> >
> > --
> > And what is good, Phaedrus,
> > And what is not good—
> > Need we ask anyone to tell us these things?
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Clojure Dev" group.
> > To unsubscribe from this group and stop receiving emails from it,
> > send an email to clojure-dev...@googlegroups.com.
> > To post to this group, send email to cloju...@googlegroups.com.
> > <https://groups.google.com/group/clojure-dev>.
> > <https://groups.google.com/d/optout>.
> >
> >
> >
> >
> > --
> > Howard M. Lewis Ship
> >
> > Senior Mobile Developer at Walmart Labs
> >
> > Creator of Apache Tapestry
> >
> > (971) 678-5210
> > http://howardlewisship.com
> > @hlship
> >