ANN: core.match 0.2.0-alpha5

131 views
Skip to first unread message

David Nolen

unread,
Oct 9, 2011, 3:31:43 PM10/9/11
to clojure
I've removed some fairly big bugs in the algorithm. This will probably the be the last alpha release before I cut a beta. Would love to hear any and all feedback.

In particular if people have strong opinions about the remaining issues, let me know now - http://dev.clojure.org/jira/browse/MATCH

Two things not listed there that I'm considering:

- clojure.core.match instead of clojure.core.match.core
- return nil instead of throwing if no match found to mirror the behavior of cond

David

Daniel Pittman

unread,
Oct 9, 2011, 4:11:08 PM10/9/11
to clo...@googlegroups.com
On Sun, Oct 9, 2011 at 12:31, David Nolen <dnolen...@gmail.com> wrote:

> I've removed some fairly big bugs in the algorithm. This will probably the
> be the last alpha release before I cut a beta. Would love to hear any and
> all feedback.
> In particular if people have strong opinions about the remaining issues, let
> me know now - http://dev.clojure.org/jira/browse/MATCH
> Two things not listed there that I'm considering:
> - clojure.core.match instead of clojure.core.match.core

That would have been a lot less surprising to me when I started
working with the library; it feels "right" that the main library is
named `match`, and extensions `match.whatever`, under the core
namespace.

> - return nil instead of throwing if no match found to mirror the behavior of cond

Given `:else (throw ...)` I would entirely support that from my use of
the library.

Thanks,
Daniel
--
♲ Made with 100 percent post-consumer electrons

Alan Malloy

unread,
Oct 9, 2011, 7:17:53 PM10/9/11
to Clojure
On Oct 9, 12:31 pm, David Nolen <dnolen.li...@gmail.com> wrote:
> I've removed some fairly big bugs in the algorithm. This will probably the
> be the last alpha release before I cut a beta. Would love to hear any and
> all feedback.
> - clojure.core.match instead of clojure.core.match.core

Those both seem weird to me, unless the plan is to make it part of
clojure.core eventually. clojure.match is the namespace I would
"expect" to find it under, with clojure.match.core a reasonable second
choice. I assume you have a reason, though; would you mind explaining
it?

Ambrose Bonnaire-Sergeant

unread,
Oct 9, 2011, 11:19:37 PM10/9/11
to clo...@googlegroups.com
On Mon, Oct 10, 2011 at 3:31 AM, David Nolen <dnolen...@gmail.com> wrote:
- return nil instead of throwing if no match found to mirror the behavior of cond


I don't like this.

Why are we emulating cond? clojure.core/case, for example, seems closer to what `match` provides,
and that throws an IllegalArgumentException if there is no match.

Seems arbitrary to me. 

Is there more to it?

Thanks,
Ambrose

David Nolen

unread,
Oct 10, 2011, 2:41:33 AM10/10/11
to clo...@googlegroups.com
I'm just following what seemed to be a convention - https://github.com/clojure/core.unify/blob/master/src/main/clojure/clojure/core/unify.clj


--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Ambrose Bonnaire-Sergeant

unread,
Oct 10, 2011, 3:46:21 AM10/10/11
to clo...@googlegroups.com
I've come up with some better reasons to return nil.

- smaller generated code size
- cleaner implementation

of which the latter is the most significant.


While we're on the topic of conventions, I think the most important convention match is breaking
is using the destructuring syntax to mean something less generic by default (only vectors).

(match [(list 1 2 3)]  
           [[x & xs]] 1)   ;; <- falls through because [x & xs] only matches vectors by default

I can see this tripping people up time and time again.

"Make it correct, then make it fast"

A wonderful asset of Clojure, is that once it's correct, the fast version is never far away syntactically.
I think we should adopt this ideology and default to :seq matching.

To make it fast, just add :vector (or matchv).

Thanks,
Ambrose

David Nolen

unread,
Oct 10, 2011, 9:57:55 AM10/10/11
to clo...@googlegroups.com
On Sun, Oct 9, 2011 at 11:19 PM, Ambrose Bonnaire-Sergeant <abonnair...@gmail.com> wrote:


On Mon, Oct 10, 2011 at 3:31 AM, David Nolen <dnolen...@gmail.com> wrote:
- return nil instead of throwing if no match found to mirror the behavior of cond


I don't like this.

I'm definitely open to talking about it. Strong opinions appreciated :)
 
Why are we emulating cond? clojure.core/case, for example, seems closer to what `match` provides,
and that throws an IllegalArgumentException if there is no match.

Seems arbitrary to me. 

Is there more to it?

Thanks,
Ambrose

A good point. I'm mostly thinking about user friendliness here. I'm also OK w/ the idea of providing two versions of match - one w/ verbose error reporting that throws and perhaps the default one that doesn't

David 

David Nolen

unread,
Oct 10, 2011, 10:14:24 AM10/10/11
to clo...@googlegroups.com
On Mon, Oct 10, 2011 at 3:46 AM, Ambrose Bonnaire-Sergeant <abonnair...@gmail.com> wrote:
While we're on the topic of conventions, I think the most important convention match is breaking
is using the destructuring syntax to mean something less generic by default (only vectors).

(match [(list 1 2 3)]  
           [[x & xs]] 1)   ;; <- falls through because [x & xs] only matches vectors by default

 We can support this but I'm afraid that it will be very slow. In the presence of rest syntax vector patterns split the data structure into left and right sides. But again, perhaps this is a case of user friendliness and people don't care that much?

David

Rob Lally

unread,
Oct 10, 2011, 10:28:38 AM10/10/11
to clo...@googlegroups.com
Would supporting other data structures make it slower when using vectors, or only when using non-vector seq's?

If it makes it substantially slower across the board, personally I'd still like core.match to support all of clojure's built in data structures; but I could understand why people would have a contrary opinion.

If it only makes the non-vector seq case slower, I'd certainly make that an available option - people are going to have to manually convert other sequences into vectors anyway which creates a coding overhead and also makes the code less likely to be JITed.

On the other hand.. YMMV.

Thanks for all your hard work David, I love core.match.


R.


Ambrose Bonnaire-Sergeant

unread,
Oct 10, 2011, 10:53:05 AM10/10/11
to clo...@googlegroups.com
On Mon, Oct 10, 2011 at 10:28 PM, Rob Lally <rob....@gmail.com> wrote:
Would supporting other data structures make it slower when using vectors, or only when using non-vector seq's?

If we use :seq pattern matching, it will use first/rest. This can be very expensive. Vectors will take a performance hit.

:vector matching uses subvec, which explains its advantages (speed) and disadvantages (only supports vectors).

What I'm proposing is defaulting to :seq matching. It's very easy to "switch on" :vector matching.

Something like

(match [v]
           [[x & xs]] 1)   <- :seq


(match [v]
           [([x & xs] :vector)] 1)   <- vector

Ambrose

Ambrose Bonnaire-Sergeant

unread,
Oct 10, 2011, 10:56:00 AM10/10/11
to clo...@googlegroups.com
How about:

`match` - defaults to :seq, returns nil

`match-debug` - defaults to :seq, w/ error checking, w/ comprehensiveness check

`matchv` - defaults to :vector

Ambrose

David Nolen

unread,
Oct 10, 2011, 11:55:54 AM10/10/11
to clo...@googlegroups.com
On Mon, Oct 10, 2011 at 10:53 AM, Ambrose Bonnaire-Sergeant <abonnair...@gmail.com> wrote:


On Mon, Oct 10, 2011 at 10:28 PM, Rob Lally <rob....@gmail.com> wrote:
Would supporting other data structures make it slower when using vectors, or only when using non-vector seq's?

If we use :seq pattern matching, it will use first/rest. This can be very expensive. Vectors will take a performance hit.

:vector matching uses subvec, which explains its advantages (speed) and disadvantages (only supports vectors).

Well vector matching actually uses whatever form is most efficient for the "idea" of subvec. For primitive arrays, bytes it's done w/ offsets.
 
What I'm proposing is defaulting to :seq matching. It's very easy to "switch on" :vector matching.

Something like

(match [v]
           [[x & xs]] 1)   <- :seq


(match [v]
           [([x & xs] :vector)] 1)   <- vector

Ambrose

If we're going to go down this route, it's probably best to actually do what Clojure does with destructuring - use nth. So instead of using subvec, vector matching would also use offsets.

David 

David Nolen

unread,
Oct 10, 2011, 11:59:39 AM10/10/11
to clo...@googlegroups.com
On Mon, Oct 10, 2011 at 10:28 AM, Rob Lally <rob....@gmail.com> wrote:
If it only makes the non-vector seq case slower, I'd certainly make that an available option - people are going to have to manually convert other sequences into vectors anyway which creates a coding overhead and also makes the code less likely to be JITed.

On the other hand.. YMMV.

Choosing to support seqs doesn't really make anything else slower. It's about managing expectations - people will use whatever's most convenient. If it performs "badly" then maybe that will be a unpleasant surprise. But then again, maybe it's fast enough that most people don't really care. That people use and enjoy destructuring on seqs is probably a good argument that most people don't care.

David 

David Nolen

unread,
Oct 10, 2011, 12:00:56 PM10/10/11
to clo...@googlegroups.com
I think we can just have vector patterns support seqs via nth.

I'm OK w/ match-debug.

David 

Daniel Pittman

unread,
Oct 10, 2011, 12:08:42 PM10/10/11
to clo...@googlegroups.com
On Mon, Oct 10, 2011 at 06:57, David Nolen <dnolen...@gmail.com> wrote:
> On Sun, Oct 9, 2011 at 11:19 PM, Ambrose Bonnaire-Sergeant <abonnair...@gmail.com> wrote:
>> On Mon, Oct 10, 2011 at 3:31 AM, David Nolen <dnolen...@gmail.com> wrote:
>>>
>>> - return nil instead of throwing if no match found to mirror the behavior
>>> of cond
>>
>> I don't like this.
>
> I'm definitely open to talking about it. Strong opinions appreciated :)
>
>> Why are we emulating cond? clojure.core/case, for example, seems closer to
>> what `match` provides,
>> and that throws an IllegalArgumentException if there is no match.
>> Seems arbitrary to me.
>> Is there more to it?
>
> A good point. I'm mostly thinking about user friendliness here. I'm also OK
> w/ the idea of providing two versions of match - one w/ verbose error
> reporting that throws and perhaps the default one that doesn't

So, I have one other argument in favour of "just return nil", which I prefer:

If you just return `nil`, I can use :else to throw fairly cheaply, and
quite visibly.

If you throw then I have to wrap any non-exhaustive pattern in a
try/catch block to avoid the exception propagating up the stack.

David Nolen

unread,
Oct 10, 2011, 12:20:13 PM10/10/11
to clo...@googlegroups.com
We'll only throw w/ match-debug.

David 

Steve Miner

unread,
Oct 10, 2011, 2:43:42 PM10/10/11
to clo...@googlegroups.com
I've just been playing around a bit with match so please forgive me if I've missed some prior discussions regarding issues that are considered settled.

One of my first attempts was to match a vector of two of the same thing using a pattern like [a a]. I naively thought that would imply an equality constraint between the two items, but in fact each variable matched anything independently. I now understand that the pattern variables are considered in different scopes, but I think it's confusing. It's reasonable that you want something faster than unification for simple literal patterns, but this to my mind is a special case that's likely to trip people up.

My work-around is to use something like this:

(match [x y]
[a (b :when #(= % x))] :match
:else :no-match)

But that doesn't look very nice, and it gets worse with multiple pattern variables. Still, it doesn't look too hard to make that sort of conversion automatically so maybe I'll try to write a macro. (Famous last words. :-)

In any case, if using multiple pattern variables of the same name does not imply an equality constraint, I suggest that it be considered an error to reuse a pattern variable. It's better to throw than to yield an unexpected match.

Regarding OR patterns, I didn't really like the infix notation, (1 | 2). As a lisper, I'd prefer to use something like (or 1 2), or maybe even a Clojure set notation: #{1 2}. I'm guessing you already thought about this and made your decision on syntax, but I thought I'd throw it out there.

For guards, I wonder if the extra parens are really necessary. Couldn't the :when bind tightly to the previous pattern variable? Like Clojure FOR comprehensions. I think it would be easier to read that way. Same comment applies to :as. To cover the rare case of matching a literal :when or :as, you could quote it or use it as the first item in an OR pattern.

As others have suggested, I agree that returning nil when nothing matches makes sense. That was my original expectation.

It was common when testing to wrap a let around the match so I made a little macro to save a few characters. Free for anyone who wants it. :-)

(defmacro match-let [bindings & body]
(let [bindvars# (take-nth 2 bindings)]
`(let ~bindings
(match [~@bindvars#]
~@body))))


Steve Miner
steve...@gmail.com

David Nolen

unread,
Oct 10, 2011, 3:11:44 PM10/10/11
to clo...@googlegroups.com
On Mon, Oct 10, 2011 at 2:43 PM, Steve Miner <steve...@gmail.com> wrote:
I've just been playing around a bit with match so please forgive me if I've missed some prior discussions regarding issues that are considered settled.

One of my first attempts was to match a vector of two of the same thing using a pattern like [a a].  I naively thought that would imply an equality constraint between the two items, but in fact each variable matched anything independently. I now understand that the pattern variables are considered in different scopes, but I think it's confusing.  It's reasonable that you want something faster than unification for simple literal patterns, but this to my mind is a special case that's likely to trip people up.

I would accept a patch that would make equality constraints work. We would have to track "named wildcards" and if they are reused, impose the constraint.
 
In any case, if using multiple pattern variables of the same name does not imply an equality constraint, I suggest that it be considered an error to reuse a pattern variable.  It's better to throw than to yield an unexpected match.

In the meantime throwing an error if names are reused in a pattern row is a good idea. http://dev.clojure.org/jira/browse/MATCH-30
 
Regarding OR patterns, I didn't really like the infix notation, (1 | 2).  As a lisper, I'd prefer to use something like (or 1 2), or maybe even a Clojure set notation: #{1 2}. I'm guessing you already thought about this and made your decision on syntax, but I thought I'd throw it out there.

Set notation doesn't communicate that order is important. (or 1 2) overloads the meaning of the or construct. I'm not sold on the infix notation either. Looking for more feedback / ideas on this.
 
For guards, I wonder if the extra parens are really necessary.  Couldn't the :when bind tightly to the previous pattern variable?  Like Clojure FOR comprehensions.  I think it would be easier to read that way.  Same comment applies to :as.  To cover the rare case of matching a literal :when or :as, you could quote it or use it as the first item in an OR pattern.

It would be nice to lose the extra parens - patch welcome :)
 
As others have suggested, I agree that returning nil when nothing matches makes sense.  That was my original expectation.

It was common when testing to wrap a let around the match so I made a little macro to save a few characters.  Free for anyone who wants it.  :-)

(defmacro match-let [bindings & body]
 (let [bindvars# (take-nth 2 bindings)]
   `(let ~bindings
      (match [~@bindvars#]
               ~@body))))


Steve Miner
steve...@gmail.com

match-let looks good. I see that you are Clojure contributor - I'm more than happy to include this.

Steve, this is great feedback - much appreciated.

David

Steve Miner

unread,
Oct 10, 2011, 3:42:09 PM10/10/11
to clo...@googlegroups.com
> match-let looks good. I see that you are Clojure contributor - I'm more than happy to include this.

Yes, I'm a registered contributor. It's all yours.

I'll take a look at the code and see if I can fix things for myself regarding the implied equality constraints and guard clauses.

By the way, there is a recurring typo in the README and the code: "occurance" should be "occurrence".

Steve Miner
steve...@gmail.com

David Nolen

unread,
Oct 10, 2011, 3:51:27 PM10/10/11
to clo...@googlegroups.com
On Mon, Oct 10, 2011 at 3:42 PM, Steve Miner <steve...@gmail.com> wrote:
> match-let looks good. I see that you are Clojure contributor - I'm more than happy to include this.

Yes, I'm a registered contributor.  It's all yours.

I'll take a look at the code and see if I can fix things for myself regarding the implied equality constraints and guard clauses.

The internals of core.match could use quite a bit of "spring cleaning". Feel free to open tickets on obvious improvements.
 
By the way, there is a recurring typo in the README and the code: "occurance" should be "occurrence".

Steve Miner
steve...@gmail.com

Heh, will fix, thanks.

David

Stephen Wrobleski

unread,
Oct 10, 2011, 6:07:55 PM10/10/11
to clo...@googlegroups.com
On Mon, Oct 10, 2011 at 09:08:42AM -0700, Daniel Pittman wrote:
> So, I have one other argument in favour of "just return nil", which I
> prefer:
>
> If you just return il, I can use :else to throw fairly cheaply, and

> quite visibly.
>
> If you throw then I have to wrap any non-exhaustive pattern in a
> try/catch block to avoid the exception propagating up the stack.
>
> Daniel


An :else clause *makes* for an exhaustive pattern; :else nil will never
throw an exception, and shows the intention to expliclty handle unknown
values by returning nil.

I think a match-debug is barking up the wrong tree. If throwing an exception
is the right thing to do to track down an unaccounted case, why make a
different macro just for a slightly different default behavior that is
easily specifiable by :else. In fact, the match macro can simply implement
default behavior by "adding an :else clause if there is none" (thus removing
concerns about 'code size').

I would definitely consider it a plus for the stock vector form to work on
any seqable, especially when using match to write macros. Right now I've
been doing (match-1 (into [] ...) ...) just to make the clauses look nicer.
If the type of the seqable is important to differentiate or for performance,
then there's more complex :seq and :vector.

Also, what's the point of having a specific match and match-1? I presume
it's to avoid creating an intermediate vector. Why not make match-1 the
default, and if the expression is a literal vector, fall back to current
match?

Steve

David Nolen

unread,
Oct 10, 2011, 9:24:09 PM10/10/11
to clo...@googlegroups.com
On Mon, Oct 10, 2011 at 6:07 PM, Stephen Wrobleski <st...@localtoast.org> wrote:
I think a match-debug is barking up the wrong tree. If throwing an exception
is the right thing to do to track down an unaccounted case, why make a
different macro just for a slightly different default behavior that is
easily specifiable by :else.  In fact, the match macro can simply implement
default behavior by "adding an :else clause if there is none" (thus removing
concerns about 'code size').

I'm not following. Getting accurate information about what failed to match needs to be integrated. Given that match makes no restrictions on types there's not much we can do except communicate where we were when the match failed via an exception.
 
Also, what's the point of having a specific match and match-1? I presume
it's to avoid creating an intermediate vector.  Why not make match-1 the
default, and if the expression is a literal vector, fall back to current
match?

Steve

Ambrose had some arguments for keeping match and match-1 separate. At the moment I don't see any real issues except that overloading match to handle two different cases seems like we're making things slightly more complex.

David

Stephen Wrobleski

unread,
Oct 11, 2011, 4:25:36 AM10/11/11
to clo...@googlegroups.com
On Mon, Oct 10, 2011 at 09:24:09PM -0400, David Nolen wrote:
> I'm not following. Getting accurate information about what failed to match
> needs to be integrated. Given that match makes no restrictions on types
> there's not much we can do except communicate where we were when the match
> failed via an exception.

Sorry, I was unclear. I was trying to say that having two versions,
match-debug that throws exceptions, and match-normal that returns nil, is
redundant - default behavior is easily set with an :else clause. Changing
the behavior for something marked '-debug' seems like a bad idea as well.

I personally think throwing exceptions is the Right Thing. When adding
clauses incrementally, I'd rather get a direct error message for unexpected
values. Exhaustive clauses are a good thing, and :else nil is a simple
idiom for the 'other behavior'.


> Ambrose had some arguments for keeping match and match-1 separate. At the
> moment I don't see any real issues except that overloading match to handle
> two different cases seems like we're making things slightly more complex.

(Where are these arguments? I thought I remember a wiki with design notes,
but can't seem to find it). I don't see how it's two different cases.
Having match take a vector of expressions with 'pattern rows' seems like an
unnecessarily visible optimization.


Steve

Ambrose Bonnaire-Sergeant

unread,
Oct 11, 2011, 8:46:51 AM10/11/11
to clo...@googlegroups.com
On Tue, Oct 11, 2011 at 4:25 PM, Stephen Wrobleski <st...@localtoast.org> wrote:
On Mon, Oct 10, 2011 at 09:24:09PM -0400, David Nolen wrote:
> I'm not following. Getting accurate information about what failed to match
> needs to be integrated. Given that match makes no restrictions on types
> there's not much we can do except communicate where we were when the match
> failed via an exception.

Sorry, I was unclear. I was trying to say that having two versions,
match-debug that throws exceptions, and match-normal that returns nil, is
redundant - default behavior is easily set with an :else clause. Changing
the behavior for something marked '-debug' seems like a bad idea as well.

I personally think throwing exceptions is the Right Thing. When adding
clauses incrementally, I'd rather get a direct error message for unexpected
values. Exhaustive clauses are a good thing, and :else nil is a simple
idiom for the 'other behavior'.

It's starting to look easier and less complex to stick with the current behavior. I'll +1 this.
 
> Ambrose had some arguments for keeping match and match-1 separate. At the
> moment I don't see any real issues except that overloading match to handle
> two different cases seems like we're making things slightly more complex.

(Where are these arguments? I thought I remember a wiki with design notes,
but can't seem to find it). I don't see how it's two different cases.
Having match take a vector of expressions with 'pattern rows' seems like an
unnecessarily visible optimization.



My concerns were we should tread carefully to avoid unexpected behavior.

Say we unify the behaviors of match-1 and match.

(match "asdf"  ;; <- looks like a "match-1"
           "no" 1
           "asdf" 2)

(match ["asdf"]   ;; <- looks like a "match"
           ["no"] 1
           ["asdf"] 2)

The second example _looks_ like it's matching a _vector_, but it's actually matching
a pattern row. There is an important difference when using guards, for example.

(match ["asdf"]
           (["asdf"] :when (fn [a] ..))  1)   ;; <- illegal, cannot wrap pattern row in list

If we can resolve this, it would be great!

Thanks,
Ambrose

Stephen Wrobleski

unread,
Oct 11, 2011, 5:09:38 PM10/11/11
to clo...@googlegroups.com
On Tue, Oct 11, 2011 at 08:46:51PM +0800, Ambrose Bonnaire-Sergeant wrote:
> (match ["asdf"]
> (["asdf"] :when (fn [a] ..)) 1) ;; <- illegal, cannot wrap
> pattern row in list
>
> If we can resolve this, it would be great!
>
> Thanks,
> Ambrose
>

I see a couple of options..

1. Scan for non vector pattern rows (and &rest), and if they exist, wrap
everything in vectors (seems needlessly complex)

2. Change the implementation to remove the specialness of 'pattern rows'.
Consider a literal vector with vector patterns as a case to optimize (seems
like a lot of work)

3. match + match-1

The weight of the implementation details hadn't occurred to me. When first
using match, I got tripped up by it expecting a vector. However, I've been
happily using match-1 since then. Given that a unification of the two in the
future would be backwards compatible, perhaps it's not even worth worrying
about.

Steve

Reply all
Reply to author
Forward
0 new messages