Discussion - 1.2 seq

6 views
Skip to first unread message

Sean Devlin

unread,
Apr 22, 2010, 8:59:23 PM4/22/10
to Clojure Dev
Specific thread to discuss seq-utils stuff.

I'll start. Include shuffle :)

Sean

--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To post to this group, send email to cloju...@googlegroups.com.
To unsubscribe from this group, send email to clojure-dev...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/clojure-dev?hl=en.

Chas Emerick

unread,
Apr 22, 2010, 9:05:31 PM4/22/10
to cloju...@googlegroups.com
On Apr 22, 2010, at 8:59 PM, Sean Devlin wrote:

> Specific thread to discuss seq-utils stuff.
>
> I'll start. Include shuffle :)

Seconded.

I'll also nominate partition-all -- that one is a FAQ.

- Chas

Sean Devlin

unread,
Apr 22, 2010, 9:34:52 PM4/22/10
to cloju...@googlegroups.com
Oh, and I was wondering something else.  What ns will the approved seq fns go into?  I ask, because from where I'm sitting it looks like they should go directly in core.  Consider what core does:

1.  List (seq) processing
2.  Concurrency
3.  A third category someone will think of to be contrary :-p

Anyway, if we're going to have official supported general sequence fns, why create a second ns for seq processing.

My $.02

Sean

Stuart Halloway

unread,
Apr 22, 2010, 9:36:44 PM4/22/10
to cloju...@googlegroups.com
Yes, they will go in core. This inevitably breaks people, but the
breakage is small compared to creating arbitrary new namespaces just
to avoid it.

The proposed namespaces are documented in the tickets.

Stu

Mark Engelberg

unread,
Apr 22, 2010, 9:57:39 PM4/22/10
to clojure-dev
On Thu, Apr 22, 2010 at 6:34 PM, Sean Devlin <francoi...@gmail.com> wrote:
> Oh, and I was wondering something else.  What ns will the approved seq fns
> go into?  I ask, because from where I'm sitting it looks like they should go
> directly in core.

There's something nice about having a core api that you can entirely
get your head around, and remember what every function does. The more
functions you have, the tougher it gets. I think Rich did a very nice
job with Clojure 1.0 in picking a good set of core sequencing
functions. A few core additions might be reasonable, but honestly,
for a number of the functions in seq-utils, it would be more burden
for me to remember what they are, what their names are, and what they
do, than to just combine the existing functions on the fly to achieve
whatever objective I have in mind.

shuffle is a great example of something I *would* like to see in the
core. The name is easy to remember what it does, and would take me
long enough to write it myself that I'd be glad to see it in there.

separate is an example of something I *wouldn't* like to see in the
core. The word "separate" can mean any number of things, and when I
go to look it up, I see that all it does is bundle into a vector calls
to (filter p s) and (filter (complement p) s). I rarely need to
filter on both p and its opposite, but when I do, I can just as easily
do that myself. separate adds little value.


I'm well aware this won't be decided by majority rule, but here's my
own 2 cents on the seq library:

YES
flatten
frequencies
indexed
shuffle

MAYBE (doesn't matter to me, but I can see how they might be useful)
find-first
group-by
partition-all
positions
rand-elt
reductions
rotations

NO
Anything not mentioned on the above lists.

Chouser

unread,
Apr 23, 2010, 8:23:37 AM4/23/10
to cloju...@googlegroups.com
On Thu, Apr 22, 2010 at 9:57 PM, Mark Engelberg
<mark.en...@gmail.com> wrote:
>
> There's something nice about having a core api that you can entirely
> get your head around, and remember what every function does.  The more
> functions you have, the tougher it gets.  I think Rich did a very nice
> job with Clojure 1.0 in picking a good set of core sequencing
> functions.  A few core additions might be reasonable, but honestly,
> for a number of the functions in seq-utils, it would be more burden
> for me to remember what they are, what their names are, and what they
> do, than to just combine the existing functions on the fly to achieve
> whatever objective I have in mind.

I agree wholeheartedly with this criteria, though it turns out to
be highly subjective.

> I'm well aware this won't be decided by majority rule, but here's my
> own 2 cents on the seq library:
>
> YES
> flatten
> frequencies
> indexed
> shuffle
>
> MAYBE (doesn't matter to me, but I can see how they might be useful)
> find-first
> group-by
> partition-all
> positions
> rand-elt
> reductions
> rotations

I'd move partition-all and reductions to YES, and I'd move
flatten and frequencies to MAYBE.

--Chouser
http://joyofclojure.com/

Matt Revelle

unread,
Apr 23, 2010, 8:32:38 AM4/23/10
to cloju...@googlegroups.com
On Apr 23, 2010, at 8:23 AM, Chouser wrote:

> On Thu, Apr 22, 2010 at 9:57 PM, Mark Engelberg
> <mark.en...@gmail.com> wrote:
>>
>> There's something nice about having a core api that you can entirely
>> get your head around, and remember what every function does. The more
>> functions you have, the tougher it gets. I think Rich did a very nice
>> job with Clojure 1.0 in picking a good set of core sequencing
>> functions. A few core additions might be reasonable, but honestly,
>> for a number of the functions in seq-utils, it would be more burden
>> for me to remember what they are, what their names are, and what they
>> do, than to just combine the existing functions on the fly to achieve
>> whatever objective I have in mind.
>
> I agree wholeheartedly with this criteria, though it turns out to
> be highly subjective.

Shouldn't the core ns only provide minimal functionality without being frustratingly incomplete?
Apologies if this was previously discussed, and it is highly subjective, but why not use a clojure.seq ns for all seq functions?

Perry Trolard

unread,
Apr 23, 2010, 11:22:34 AM4/23/10
to Clojure Dev
Jason Wolfe, I believe, left a comment about group-by on the Assembla
page; I wanted to draw attention to his comment over here & second two
things he said:

1. group-by should indeed be promoted
2. I don't care if the map it returns is sorted

FWIW, my must-haves are:

indexed
frequencies
group-by
flatten
partition-all
shuffle

(I don't think partition-all is the best name, but I can't think of
anything better. The problem is that "all," or the idea of the whole,
applies only to the sequence being partitioned, not to the returned
sequences – they're not *all* of the partition size, necessarily. The
idea I want in the name is something like "ragged right," or
"unchecked." But those don't seem to lend themselves to good names…)

Perry

Phil Hagelberg

unread,
Apr 23, 2010, 9:06:47 PM4/23/10
to cloju...@googlegroups.com
On Thu, Apr 22, 2010 at 6:57 PM, Mark Engelberg
<mark.en...@gmail.com> wrote:
> I'm well aware this won't be decided by majority rule, but here's my
> own 2 cents on the seq library:

I'll throw in my vote:
indexed
reductions
partition-all
rand-elt
flatten

I also agree that if "separate" goes in it should be renamed.
"includes?" might be good just given the fact that most people confuse
"contains?" for it, which has been a very FAQ in the past.

-Phil

Mark Engelberg

unread,
Apr 23, 2010, 10:41:51 PM4/23/10
to clojure-dev
On Fri, Apr 23, 2010 at 6:06 PM, Phil Hagelberg <ph...@hagelb.org> wrote:
> "includes?" might be good just given the fact that most people confuse
> "contains?" for it, which has been a very FAQ in the past.

IMHO, having two similar-sounding functions with different semantics
would be more confusing and error-prone than just needing to learn the
semantics of the one built-in function, and the trick of (some #{blah}
coll) to handle 99% of the cases that includes? would be used for. If
both functions were in there, I would constantly be trying to remember
which function has which meaning.

Phil Hagelberg

unread,
Apr 23, 2010, 11:24:29 PM4/23/10
to cloju...@googlegroups.com
On Fri, Apr 23, 2010 at 7:41 PM, Mark Engelberg
<mark.en...@gmail.com> wrote:
> On Fri, Apr 23, 2010 at 6:06 PM, Phil Hagelberg <ph...@hagelb.org> wrote:
>> "includes?" might be good just given the fact that most people confuse
>> "contains?" for it, which has been a very FAQ in the past.
>
> IMHO, having two similar-sounding functions with different semantics
> would be more confusing and error-prone than just needing to learn the
> semantics of the one built-in function, and the trick of (some #{blah}
> coll) to handle 99% of the cases that includes? would be used for.  If
> both functions were in there, I would constantly be trying to remember
> which function has which meaning.

Sure, but right now when people get tripped up they get pointed to
contrib, so we have two similar-sounding functions with different
semantics in different namespaces; not sure if that's much better. I
think we've seen by experience that contains? is not clearly named.
It's been one of the top things tripping people up in the past few
months.

If it were up to me we would move contains? to contains-key? and make
contains? a wrapper that prints a deprecation warning while calling
contains-key?, but maybe now's not a good time for that? I haven't
really kept up with the policy for deprecating things in core.

-Phil

Mark Engelberg

unread,
Apr 23, 2010, 11:49:52 PM4/23/10
to clojure-dev
On Fri, Apr 23, 2010 at 8:24 PM, Phil Hagelberg <ph...@hagelb.org> wrote:
> If it were up to me we would move contains? to contains-key? and make
> contains? a wrapper that prints a deprecation warning while calling
> contains-key?, but maybe now's not a good time for that? I haven't
> really kept up with the policy for deprecating things in core.

I agree that contains-key? is a much more accurate name.

Cosmin Stejerean

unread,
Apr 24, 2010, 11:13:04 AM4/24/10
to cloju...@googlegroups.com
On Fri, Apr 23, 2010 at 10:49 PM, Mark Engelberg <mark.en...@gmail.com> wrote:
On Fri, Apr 23, 2010 at 8:24 PM, Phil Hagelberg <ph...@hagelb.org> wrote:
> If it were up to me we would move contains? to contains-key? and make
> contains? a wrapper that prints a deprecation warning while calling
> contains-key?, but maybe now's not a good time for that? I haven't
> really kept up with the policy for deprecating things in core.

I agree that contains-key? is a much more accurate name.


+1 for contains-key? 

--
Cosmin Stejerean
http://offbytwo.com

Meikel Brandmeyer

unread,
Apr 24, 2010, 3:59:06 PM4/24/10
to cloju...@googlegroups.com
Hi,

On Fri, Apr 23, 2010 at 08:23:37AM -0400, Chouser wrote:

> I'd move partition-all and reductions to YES, and I'd move
> flatten and frequencies to MAYBE.

As Rich stated before, we should also consider the actual implementation
of the functions being moved. So here my take on reductions. See also
here: http://groups.google.com/group/clojure/browse_thread/thread/3e37df49ce5edf44/cbde2d3f8b5296cd

(defn reductions
([f coll]
(lazy-seq
(when-let [s (seq coll)]
(reductions f (first s) (rest s)))))
([f accum coll]
(cons accum
(lazy-seq
(when-let [s (seq coll)]
(reductions f (f accum (first coll)) (rest coll)))))))

This is certainly more straight-forward and doesn't require atom magic.

Thoughts?

Sincerely
Meikel

Remco van 't Veer

unread,
Apr 25, 2010, 5:08:20 AM4/25/10
to cloju...@googlegroups.com
Perry Trolard <tro...@gmail.com> writes:

> (I don't think partition-all is the best name, but I can't think of
> anything better. The problem is that "all," or the idea of the whole,
> applies only to the sequence being partitioned, not to the returned
> sequences – they're not *all* of the partition size, necessarily. The
> idea I want in the name is something like "ragged right," or
> "unchecked." But those don't seem to lend themselves to good names…)

Ruby uses "slice_each" for that function. Maybe "partition-each" is a
slightly better name?

Remco

Rich Hickey

unread,
Apr 26, 2010, 9:56:58 AM4/26/10
to cloju...@googlegroups.com

On Apr 24, 2010, at 3:59 PM, Meikel Brandmeyer wrote:

> Hi,
>
> On Fri, Apr 23, 2010 at 08:23:37AM -0400, Chouser wrote:
>
>> I'd move partition-all and reductions to YES, and I'd move
>> flatten and frequencies to MAYBE.
>
> As Rich stated before, we should also consider the actual
> implementation
> of the functions being moved. So here my take on reductions. See also
> here: http://groups.google.com/group/clojure/browse_thread/thread/3e37df49ce5edf44/cbde2d3f8b5296cd
>
> (defn reductions
> ([f coll]
> (lazy-seq
> (when-let [s (seq coll)]
> (reductions f (first s) (rest s)))))
> ([f accum coll]
> (cons accum
> (lazy-seq
> (when-let [s (seq coll)]
> (reductions f (f accum (first coll)) (rest coll)))))))
>
> This is certainly more straight-forward and doesn't require atom
> magic.
>
> Thoughts?
>
>

You need to handle:

(reductions + nil)

=> (0)

Rich

Meikel Brandmeyer

unread,
Apr 26, 2010, 2:26:06 PM4/26/10
to cloju...@googlegroups.com
Hi,

On Mon, Apr 26, 2010 at 09:56:58AM -0400, Rich Hickey wrote:

> You need to handle:
>
> (reductions + nil)
>
> => (0)

Huh? How could this possible work? How can reductions know the first
step for an arbitrary function f (not +) if there is none provided?

Sincerely
Meikel

Chouser

unread,
Apr 26, 2010, 2:32:02 PM4/26/10
to cloju...@googlegroups.com
On Mon, Apr 26, 2010 at 2:26 PM, Meikel Brandmeyer <m...@kotka.de> wrote:
> Hi,
>
> On Mon, Apr 26, 2010 at 09:56:58AM -0400, Rich Hickey wrote:
>
>> You need to handle:
>>
>> (reductions + nil)
>>
>> => (0)
>
> Huh? How could this possible work? How can reductions know the first
> step for an arbitrary function f (not +) if there is none provided?

By asking + itself:

user=> (+)
0

...just like the current reductions does:

user=> (reductions + nil)
(0)

user=> (reductions * nil)
(1)

--Chouser
http://joyofclojure.com/

Meikel Brandmeyer

unread,
Apr 26, 2010, 2:33:33 PM4/26/10
to cloju...@googlegroups.com
Hi again,

ok, got it. The reduced function has also to accept zero arguments if
the collection is empty. This is new to me actually... Did this change
somehow?

Here an update. If it is ok, I will attach a patch to the Assembla
ticket for c.c.seq.

(defn reductions
([f coll]
(lazy-seq
(if-let [s (seq coll)]
(reductions f (first s) (rest s))
(f))))
([f accum coll]
(cons accum
(lazy-seq
(when-let [s (seq coll)]
(reductions f (f accum (first s)) (rest s)))))))

Sincerely
Meikel

Meikel Brandmeyer

unread,
Apr 26, 2010, 2:59:33 PM4/26/10
to cloju...@googlegroups.com
Hi,

On Mon, Apr 26, 2010 at 08:33:33PM +0200, Meikel Brandmeyer wrote:

> ok, got it. The reduced function has also to accept zero arguments if
> the collection is empty. This is new to me actually... Did this change
> somehow?

To answer my question: no, it did not change. It was like this from the
beginning. Hmmmm... I've read the docstring of reduce many times and
always skipped the zero arg part....

Rich Hickey

unread,
Apr 27, 2010, 11:27:12 AM4/27/10
to cloju...@googlegroups.com

On Apr 26, 2010, at 2:33 PM, Meikel Brandmeyer wrote:

> Hi again,
>
> ok, got it. The reduced function has also to accept zero arguments if
> the collection is empty. This is new to me actually... Did this change
> somehow?
>
> Here an update. If it is ok, I will attach a patch to the Assembla
> ticket for c.c.seq.
>

Yes, please.

Rich

Perry Trolard

unread,
Apr 28, 2010, 5:52:56 PM4/28/10
to Clojure Dev
I notice that indexed didn't make it into core -- intentional or an
oversight?

Perry

Stuart Halloway

unread,
Apr 28, 2010, 5:54:00 PM4/28/10
to cloju...@googlegroups.com
Intentional, see map-indexed and keep-indexed.

Stu

Mark Engelberg

unread,
Apr 28, 2010, 9:14:38 PM4/28/10
to clojure-dev
What is a sample use case for keep-indexed? I can't offhand think of
anything it would be useful for.

--Mark

Rich Hickey

unread,
Apr 28, 2010, 11:12:55 PM4/28/10
to cloju...@googlegroups.com

On Apr 28, 2010, at 9:14 PM, Mark Engelberg wrote:

> What is a sample use case for keep-indexed? I can't offhand think of
> anything it would be useful for.
>

Well, it can be used to trivially implement positions, as map-indexed
can be to implement indexed. positions and indexed were not included
in favor of these two more general functions.

Rich

Mark Engelberg

unread,
Apr 29, 2010, 1:18:42 AM4/29/10
to clojure-dev
On Wed, Apr 28, 2010 at 2:54 PM, Stuart Halloway
<stuart....@gmail.com> wrote:
> Intentional, see map-indexed and keep-indexed.
>
> Stu

Hmmm, I can see the appeal of a "give them the general so they can
make the specific" philosophy, but there's also something to be said
for "give them easy-to-understand building blocks so they can build
what they need". In this case, it seems to be about the same
complexity to derive the general from the specific (combining building
blocks), as to derive the specific from the general.

Unless I'm misunderstanding these new functions, it seems like if you
have indexed, then these functions, as well as many others, can be
easily expressed with Clojure's other building blocks, for example:
(map-indexed f coll) <=> (for [[index element] (indexed coll)] (f
index element))
Many times, you're writing f as an anonymous function anyway, and can
just as easily make it work on pairs, in which case
(map-indexed f coll) can just be written as (map f-written-for-pairs
(indexed coll))

Just one man's opinion, but I definitely prefer indexed as the
building block, and the others as the derived forms, rather than vice
versa. I find that I just as frequently want to use indexed sequences
in doseq just as much as in for, and sometimes with functions that
work as pairs as well as ones that work on the index and elements
individually. I can express all these permutations fairly easily with
indexed. On the other hand, if map-indexed is the primitive, then you
end up writing something like this to print an indexed sequence s,
which feels convoluted to me:
(doseq [[index element] (map-indexed vector s)] (printf "%d - %d\n"
index element))

As an analogy, I think it's interesting to contrast Clojure's range
over Scheme's build-list.
Scheme provides a general list-building function that's analogous to
mapping over a range.
(build-list n f) in Scheme is equivalent to Clojure's (map f (range n))
In Scheme, range is not built-in. Despite the fact that it is the
common case, you end up constantly writing things like (build-list n
(lambda (i) i)) to express (range n). It gets annoying fast, and
newcomers to the language complain about needing to use higher-order
functions just to express something as simple as a range.

I think Clojure made the right call by including range rather than
build-list in the core. It's the common case, and you can easily do
(map f (range n)) when you want to apply a function to everything in
the range. You have a great building block to work with.

I feel that map-indexed is almost exactly analogous to Scheme's
build-list, and indexed is analogous to range. Looking at it in that
way, including indexed feels far more Clojurian to me than the current
plan to use map-indexed.


I think I see the utility of keep, although in my own code, I find
that just as frequently I want to strip out false values, or nil
values, or both. I can express all these easily enough with filter:
(filter (complement false?) coll)
(filter (complement nil?) coll)
(filter identity coll)

It's not really clear to me why stripping out nil is so much more
useful than the other variations that people want it in the core, so
I'd be curious to know what the reasoning was.

Chas Emerick

unread,
Apr 29, 2010, 7:32:40 AM4/29/10
to cloju...@googlegroups.com

On Apr 29, 2010, at 1:18 AM, Mark Engelberg wrote:

>> Intentional, see map-indexed and keep-indexed.
>>
>> Stu
>
> Hmmm, I can see the appeal of a "give them the general so they can
> make the specific" philosophy, but there's also something to be said
> for "give them easy-to-understand building blocks so they can build
> what they need". In this case, it seems to be about the same
> complexity to derive the general from the specific (combining building
> blocks), as to derive the specific from the general.

> Just one man's opinion, but I definitely prefer indexed as the
> building block, and the others as the derived forms, rather than vice
> versa.

After playing with map-indexed for a bit yesterday, I'm inclined to
agree. indexed and map-indexed are essentially isomorphic AFAICT, but
the latter is disadvantaged insofar as everyone has been using indexed
for years, and that a very common case just got a little more
complicated (e.g. (indexed foo) => (map-indexed vec foo)).

Has there been some frustration somewhere about having to pull apart
indexed's vectors?

- Chas

Rich Hickey

unread,
Apr 29, 2010, 7:57:41 AM4/29/10
to cloju...@googlegroups.com

On Apr 29, 2010, at 1:18 AM, Mark Engelberg wrote:

You are entitled to your opinion, and are under no obligation to use
the new functions. Personally, I cringe every time I build a vector
only to immediately destructure it.

You can easily build indexed out of map-indexed. Also, having map-
indexed/keep-indexed don't preclude having indexed. But they are
better building blocks. E.g. positions written in terms of keep-
indexed is 5x faster than the one from contrib built on indexed.

Rich

Mark Engelberg

unread,
Apr 29, 2010, 7:48:49 PM4/29/10
to clojure-dev
On Thu, Apr 29, 2010 at 4:57 AM, Rich Hickey <richh...@gmail.com> wrote:
> E.g. positions written in terms of keep-indexed is 5x
> faster than the one from contrib built on indexed.

Any speculation where the major speed improvements are coming from?
Is it because destructuring a vector is a slow operation (and if so,
are there ways to speed it up?), is it because keep-indexed has
special code to handle chunked sequences?

Timothy Pratley

unread,
Apr 29, 2010, 8:15:09 PM4/29/10
to cloju...@googlegroups.com
Hi Meikel,


Please close or delete my previous ticket about this if you create a new one
https://www.assembla.com/spaces/clojure-contrib/tickets/56-faster-version-of-reductions
[or feel free to recycle it]


group-by and reductions are the two functions I use most from seq-utils so I would be glad if they make the cut.


Regards,
Tim.

Chas Emerick

unread,
Apr 30, 2010, 6:20:40 AM4/30/10
to cloju...@googlegroups.com

On Apr 29, 2010, at 7:57 AM, Rich Hickey wrote:

> You can easily build indexed out of map-indexed. Also, having map-
> indexed/keep-indexed don't preclude having indexed. But they are
> better building blocks. E.g. positions written in terms of keep-
> indexed is 5x faster than the one from contrib built on indexed.

Well, that's more than enough justification for me; I'm a sucker for
perf improvements.

- Chas

Rich Hickey

unread,
Apr 30, 2010, 6:16:41 PM4/30/10
to cloju...@googlegroups.com

On Apr 29, 2010, at 7:48 PM, Mark Engelberg wrote:

> On Thu, Apr 29, 2010 at 4:57 AM, Rich Hickey <richh...@gmail.com>
> wrote:
>> E.g. positions written in terms of keep-indexed is 5x
>> faster than the one from contrib built on indexed.
>
> Any speculation where the major speed improvements are coming from?
> Is it because destructuring a vector is a slow operation (and if so,
> are there ways to speed it up?), is it because keep-indexed has
> special code to handle chunked sequences?
>

No speculation required. All else being equal, indexed has to allocate
the kv pair at each step.

Rich
Reply all
Reply to author
Forward
0 new messages