[Python-ideas] Pattern matching


Szymon Pyżalski

Apr 7, 2015, 11:42:35 AM
to python...@python.org
Hello!

I was trying to google if there were some attempts to add pattern
matching feature (as seen in e.g. haskell) to Python. I couldn't however
find anything. Were there any proposals for a feature like this?

Greetings
zefciu
_______________________________________________
Python-ideas mailing list
Python...@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Todd

Apr 7, 2015, 11:45:33 AM
to python-ideas
On Tue, Apr 7, 2015 at 4:54 PM, Szymon Pyżalski <szy...@pythonista.net> wrote:
Hello!

I was trying to google if there were some attempts to add pattern
matching feature (as seen in e.g. haskell) to Python. I couldn't however
find anything. Were there any proposals for a feature like this?


Not everyone is familiar with Haskell.  Can you provide an example of how this would work (in pseudocode, perhaps)?

Paul Moore

Apr 7, 2015, 11:46:30 AM
to Szymon Pyżalski, Python-Ideas
On 7 April 2015 at 15:54, Szymon Pyżalski <szy...@pythonista.net> wrote:
> I was trying to google if there were some attempts to add pattern
> matching feature (as seen in e.g. haskell) to Python. I couldn't however
> find anything. Were there any proposals for a feature like this?

From a quick look:

https://pypi.python.org/pypi/patterns
https://pypi.python.org/pypi/py-pattern-matching

I'm not sure if either of these is the sort of thing you're looking for.

Paul

Guido van Rossum

Apr 7, 2015, 11:53:26 AM
to Paul Moore, Python-Ideas, Szymon Pyżalski

Quick idea: this might integrate well with PEP 484, or with the (still speculative) multiple dispatch that Lukas Lang wants to write.

Szymon Pyżalski

Apr 7, 2015, 3:25:45 PM
to Python-Ideas

On 07.04.2015 17:52, Guido van Rossum wrote:
>
> On Apr 7, 2015 8:46 AM, "Paul Moore" <p.f....@gmail.com> wrote:
>
> On 7 April 2015 at 15:54, Szymon Pyżalski <szy...@pythonista.net> wrote:
>> I was trying to google if there were some attempts to add
>> pattern matching feature (as seen in e.g. haskell) to Python. I
>> couldn't
> however
>> find anything. Were there any proposals for a feature like this?
>
> From a quick look:
>
> https://pypi.python.org/pypi/patterns
> https://pypi.python.org/pypi/py-pattern-matching
>

Thanks, these are nice, but I think they try to mimic languages based on
algebraic data types too much. I was thinking about something like this:

Magic method for patterns
-----------------------------

Any object can be a pattern. If an object has a special method (called
``__pattern__`` or ``__match__``) defined, then this method governs
how the object behaves as a pattern. If not, pattern matching works
the same as ``__eq__``.

The ``__pattern__`` method can return ``False`` for no match. For a
match it can return ``True`` (for the simplest patterns), a sequence,
or a mapping.

Syntax for pattern matching
-------------------------------

The syntax could look something like this::

    for object:
        pattern1: statement1
        pattern2:
            statement2
            statement3
        pattern3 as var1, var2:
            statement4
            statement5

For the simplest cases this works simply by executing the appropriate
statement block. If the ``__pattern__`` method returns a mapping, then
the values from it will be merged into the local namespace. If it
returns a sequence and there is an ``as`` clause, the values will be
assigned to the variables specified.
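For illustration, the protocol sketched above can be emulated today with a plain helper function. This is only a sketch: ``match`` and ``EvenNumber`` are invented names for this example, not an existing or proposed stdlib API.

```python
# Hypothetical sketch of the proposed protocol; 'match' and 'EvenNumber'
# are invented names, not an existing API.
def match(pattern, value):
    """Return False for no match; True, a sequence, or a mapping for a match."""
    matcher = getattr(type(pattern), '__pattern__', None)
    if matcher is not None:
        return matcher(pattern, value)
    return pattern == value   # fallback: plain equality, per the proposal

class EvenNumber:
    def __pattern__(self, value):
        # Match any even integer and expose it under the name 'n'.
        if isinstance(value, int) and value % 2 == 0:
            return {'n': value}
        return False

print(match(EvenNumber(), 4))  # {'n': 4}
print(match(EvenNumber(), 3))  # False
print(match(42, 42))           # True
```

A returned mapping is what the proposal would merge into the local namespace; a plain ``True``/``False`` covers the equality-style cases.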

> Quick idea: this might integrate well with PEP 484, or with the
> (still speculative) multiple dispatch that Lukas Lang wants to
> write.

Function dispatch
-----------------------

Besides the above syntax the patterns could also be used to register
functions for dispatching. I don't know how precisely it would
integrate with PEP 484, but I agree that any valid type hint should
also be a valid pattern.

Existing objects as patterns
----------------------------------

* Types should match their instances
* Tuples should match sequences that have the same length and whose
elements match against the elements of a tuple (subpatterns)
* Dictionaries should match mappings that contain all the keys in the
pattern and the values match the subpatterns under these keys
* Regular expressions should match strings. For named groups they
should populate the local namespace.
* The ``Ellipsis`` object can be used as a match-all pattern.
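As a rough sketch, the default rules listed above could look like this as a plain function. ``match_value`` is a hypothetical name, not part of the proposal's text:

```python
import re
from numbers import Number

def match_value(pattern, value):
    """Sketch of the proposed default pattern rules; names are illustrative."""
    if pattern is Ellipsis:                        # match-all
        return True
    if isinstance(pattern, type):                  # types match their instances
        return isinstance(value, pattern)
    if isinstance(pattern, tuple):                 # tuples: same length, recurse
        return (isinstance(value, (tuple, list))
                and len(value) == len(pattern)
                and all(match_value(p, v) for p, v in zip(pattern, value)))
    if isinstance(pattern, dict):                  # dicts: keys present, recurse
        return all(k in value and match_value(p, value[k])
                   for k, p in pattern.items())
    if isinstance(pattern, re.Pattern):            # regexes match strings
        return isinstance(value, str) and pattern.match(value) is not None
    return pattern == value                        # fallback: equality

print(match_value((Number, Number), (1, 2.5)))       # True
print(match_value({'x': Number}, {'x': 3, 'y': 4}))  # True
print(match_value(re.compile(r'\d+-\d+'), '10-20'))  # True
```

The regex rule here only reports a match; how named groups reach the local namespace is the separate (and thornier) binding question discussed below.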


Example
-----------

The code below would work if the ``point`` variable is specified as:

* a tuple (x, y)
* a mapping {'x': x, 'y': y}
* a string "{x}-{y}".format(x=x, y=y)

::

    def print_point(x, y):
        print('x = {}'.format(x))
        print('y = {}'.format(y))

    for point:
        (Number, Number) as x, y:
            print_point(x, y)
        {'x': Number, 'y': Number}:
            print_point(x, y)
        re.compile('(?P<x>[0-9]+)-(?P<y>[0-9]+)'):
            # Or re.compile('([0-9]+)-([0-9]+)') as x, y:
            print_point(int(x), int(y))
        ...:
            raise TypeError('Expected something else')


Greetings
zefciu

Chris Angelico

Apr 7, 2015, 3:52:45 PM
to Python-Ideas
On Wed, Apr 8, 2015 at 5:25 AM, Szymon Pyżalski <szy...@pythonista.net> wrote:
> Syntax for pattern matching
> -------------------------------
>
> The syntax could look something like this::
>
>     for object:
>         pattern1: statement1
>         pattern2:
>             statement2
>             statement3
>         pattern3 as var1, var2:
>             statement4
>             statement5
>
> For the simplest cases this works simply by calling appropriate
> statement block. If the ``__pattern__`` method returns a mapping, then
> the values from it will be merged into the local namespace. If it
> returns a sequence and there is the ``as`` clause, the values will be
> assigned to variables specified.

So, tell me if I'm understanding you correctly: The advantage over a
basic if/elif/else tree is that it can assign stuff as well as giving
a true/false result? Because if all you want is the true/false,
there's not a lot of advantage over what currently exists.

The sequence and "as" clause syntax is reasonable, but I do not like
the idea that a mapping's keys would quietly become local name
bindings. It'd be like "from ... import *" inside a function -
suddenly _any_ name could become local, without any way for the
compiler to know. Also - although this is really just bikeshedding -
not a fan of the use of 'for' here. I'm imagining code something like
this:

for value in gen(): # this one iterates
    for value: # this one doesn't
        int: yield ("(%d)" if value<0 else "%d") % value
        str: yield repr(value)
        datetime.datetime: yield value.strftime(timefmt)
        ...: yield str(value)

Come to think of it, not a fan of ellipsis here either; "else" seems
better. But that's minor bikeshedding too.

The key is to show how this is materially better than an if/elif tree
*and* better than a dictionary lookup. (Obviously isinstance checks
are superior to dict lookups based on type(value), but I don't know
how often people actually write code like the above.)
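For comparison, the hypothetical match block above would be spelled today as an if/elif/isinstance chain along these lines. This is a sketch: ``render`` is an invented name, and ``timefmt`` is assumed to be a strftime format string.

```python
import datetime

timefmt = '%Y-%m-%d %H:%M'  # assumed format string for this sketch

def render(value):
    # Today's equivalent of the hypothetical keywordless match block.
    if isinstance(value, bool):          # bool subclasses int; check it first
        yield str(value)
    elif isinstance(value, int):
        yield ("(%d)" if value < 0 else "%d") % value
    elif isinstance(value, str):
        yield repr(value)
    elif isinstance(value, datetime.datetime):
        yield value.strftime(timefmt)
    else:
        yield str(value)

print(list(render(-3)))    # ['(-3)']
print(list(render('hi')))  # ["'hi'"]
```

Note the extra bool branch: isinstance chains need careful ordering for subclasses, which is one of the ergonomic costs a real match construct could absorb.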

> Besides the above syntax the patterns could be also used to register
> functions for dispatching. I dont' know how precisely it would
> integrate with PEP 484, but I agree that any valid type hint should
> also be a valid pattern.
>
> Existing objects as patterns
> ----------------------------------
>
> * Types should match their instances
> * Tuples should match sequences that have the same length and whose
> elements match against the elements of a tuple (subpatterns)
> * Dictionaries should match mappings that contain all the keys in the
> pattern and the values match the subpatterns under these keys
> * Regular expressions should match strings. For named groups they
> should populate the local namespace.
> * The ``Ellipsis`` object can be used as a match-all pattern.

This is where the type hints could save you a lot of trouble. They
already define a sloppy form of isinstance checking, so you don't need
to do all the work of defining tuples, dicts, etc - all you have to
say is "typing.py type hint objects match any object which they would
accept", and let the details be handled elsewhere. That'd get you
Union types and such, as well as what you're talking about above.
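By way of a sketch, today's ``typing`` introspection helpers (which post-date this thread) let you check a value against a hint roughly like this. ``matches_hint`` is an invented name and only handles ``Union`` and parametrized lists; a real version would need to cover far more of the typing vocabulary.

```python
import typing
from typing import Union, List

def matches_hint(hint, value):
    """Sketch: treat a typing hint as a pattern via runtime introspection.
    Only Union and List[...] are handled here."""
    origin = typing.get_origin(hint)
    if origin is Union:
        return any(matches_hint(arg, value) for arg in typing.get_args(hint))
    if origin is list:
        elem, = typing.get_args(hint)
        return (isinstance(value, list)
                and all(matches_hint(elem, v) for v in value))
    return isinstance(value, hint)

print(matches_hint(Union[int, str], 'spam'))  # True
print(matches_hint(List[int], [1, 2, 3]))     # True
print(matches_hint(List[int], [1, 'x']))      # False
```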

> for point:
>     (Number, Number) as x, y:
>         print_point(x, y)
>     {'x': Number, 'y': Number}:
>         print_point(x, y)
>     re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+'):
>         # Or re.compile('([0-9]+)-([0-9])+') as x, y:
>         print_point(int(x), int(y))
>     ...:
>         raise TypeError('Expected something else')

I'm not sure how to rescue the "regex with named groups" concept, but
this code really doesn't look good. Using "as" wouldn't work reliably,
since dicts are iterable (without a useful order). Something along the
lines of:

x, y from re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+'):

or maybe switch the arguments (to be more like "from ... import x,
y")? But whatever it is, I'd prefer to be explicit: This has to return
a dictionary containing keys 'x' and 'y' (what if there are others?
Ignore them?), and then they will be bound to identical names.
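A minimal sketch of the explicit behaviour being argued for, using ``groupdict()`` plus an explicit tuple of names to bind; ``match_regex`` is a hypothetical helper, and extra groups are simply ignored:

```python
import re

# Hypothetical helper: the regex "pattern" yields its named groups,
# and the caller states exactly which names get bound.
def match_regex(pattern, string, names):
    m = pattern.match(string)
    if m is None:
        return None
    groups = m.groupdict()   # may contain extra names; those are ignored
    return tuple(groups[name] for name in names)

point_re = re.compile(r'(?P<x>[0-9]+)-(?P<y>[0-9]+)')
print(match_regex(point_re, '10-20', ('x', 'y')))  # ('10', '20')
print(match_regex(point_re, 'nope', ('x', 'y')))   # None
```

Because the caller names the bindings, the compiler could know exactly which locals a match introduces, avoiding the ``import *``-like problem above.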

All in all, an interesting idea, but one that's going to require a
good bit of getting-the-head-around, so it wants some strong use
cases. Got some real-world example code that would benefit from this?

ChrisA

Andrew Barnert

Apr 7, 2015, 5:14:29 PM
to Szymon Pyżalski, python...@python.org
On Apr 7, 2015, at 07:54, Szymon Pyżalski <szy...@pythonista.net> wrote:
>
> Hello!
>
> I was trying to google if there were some attempts to add pattern
> matching feature (as seen in e.g. haskell) to Python. I couldn't however
> find anything. Were there any proposals for a feature like this?

http://stupidpythonideas.blogspot.com/2014/08/a-pattern-matching-case-statement-for.html

I wrote this up after getting frustrated with people suggesting Python should have "a case statement like every other language" where by "every other language" they meant C and Pascal and their descendants with the relatively useless case that's just an elif chain rather than ML and its descendants with the more useful pattern matching case statement. (Nick Coghlan pointed out that go has an interesting middle ground--it looks like a JavaScript switch, but with an as clause to optionally bind the result of each expression to a local variable.)

Once you've defined the case statement, you can add pattern-matching assignments and function definitions (especially if you know Haskell, where the equivalents are just syntactic sugar for using case).

The big problem that you have in Python that you don't have in ML, Haskell, etc. is that most values don't have a unique, evaluable representation that can be pattern-matched. I think the best solution to this is a __match__ protocol, but I went over some other ideas.

The smaller problem is that in Python, only functions (and classes and modules) have scope; the idea of a name bound "locally" is tricky. There's a bit of hackery around except clauses, and comprehensions are expressions that get compiled into calls to hidden functions in part to deal with this, but I'm not sure either of those is something you'd want to extend here.

Anyway, I think that my straw man proposal is a start toward the best design you'd be able to come up with (although it's not in itself the best design, probably), and it still doesn't fit into Python. But read it and tell me if you disagree with either half of that.

Andrew Barnert

Apr 7, 2015, 7:03:26 PM
to Szymon Pyżalski, Python-Ideas
On Apr 7, 2015, at 12:25, Szymon Pyżalski <szy...@pythonista.net> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>> On 07.04.2015 17:52, Guido van Rossum wrote:
>>
>> On Apr 7, 2015 8:46 AM, "Paul Moore" <p.f....@gmail.com> wrote:
>>
>> On 7 April 2015 at 15:54, Szymon Pyżalski <szy...@pythonista.net> wrote:
>>> I was trying to google if there were some attempts to add
>>> pattern matching feature (as seen in e.g. haskell) to Python. I
>>> couldn't
>> however
>>> find anything. Were there any proposals for a feature like this?
>>
>> From a quick look:
>>
>> https://pypi.python.org/pypi/patterns
>> https://pypi.python.org/pypi/py-pattern-matching
> Thanks these are nice but I think they try to mimic languages based on
> algebraic data types too much. I was thinking about something like this:
>
> Magic method for patterns
> -----------------------------
>
> Any object can be a pattern. If an object has a special method (called
> ``__pattern__`` or ``__match__``) defined then this method will govern
> how this object behaves as pattern. If not - pattern matching works
> the same as ``__eq__``

"Works the same as __eq__" either doesn't allow you to assign parts while decomposing, or turns out to have a lot of problems. In my proposal, I restricted that to only work on values that are expressions that can't be read as assignment targets, and handled assignment lists recursively, to get around those problems. Yours just punts on decomposition here and offers something different (the mapping in the next paragraph), but I don't think that covers many common cases.

> The ``__pattern__`` method can return ``False`` for no match. For a
> match it can return ``True`` (simplest patterns) a sequence or a mapping
> .

The mapping is an interesting end-run around the decomposition problem, but it seems like it throws away the type of the object. In other words, Point(x, y) and Vector(x, y) both pattern-match identically to either {'x': x, 'y': y}. In ML or Haskell, distinguishing between the two is one of the key uses of pattern matching.

And another key use is distinguishing Point(0, y), Point(x, 0) and Point(x, y), which it looks like you also can't do this way; the only way to get x bound in the case block (assuming Point, like most classes, isn't iterable and therefore can't be decomposed as a tuple) is to match a dict.

> Syntax for pattern matching
> -------------------------------
>
> The syntax could look something like this::
>
> for object:
>     pattern1: statement1
>     pattern2:
>         statement2
>         statement3
>     pattern3 as var1, var2:
>         statement4
>         statement5

I don't like "for" here. Avoiding new keywords is good, but reusing a keyword to mean something unrelated is often confusing. With "for object in iterable:", the "object" names the thing that will be assigned each time through the loop. Here, it looks like you're omitting the "in iterable", but really you're omitting the "object in" part, which is half of one clause and half of another. Also, in cases where the matchable object/iterable is a long and complex expression (as it often is in both pattern matching and loops), it'll be very confusing to the eye if you misinterpret which kind of for you have.

On the other hand, I like the keyword-less match clauses better than my version with "of". The syntax may be harder to describe to the grammar, but it's easier to a read for a human.

The only potential advantage I can see to the "of" clause is that it opens the door to let a single-match case be written without requiring a double indent, or even in a single line:

case obj: of pattern:
    statement
    statement

case obj: of pattern: statement

But I left that out of my proposal because it breaks the compound statement rule that (informally) you can't put two colons on the same line.

> For the simplest cases this works simply by calling appropriate
> statement block. If the ``__pattern__`` method returns a mapping, then
> the values from it will be merged into the local namespace. If it
> returns a sequence and there is the ``as`` clause, the values will be
> assigned to variables specified.
>
>> Quick idea: this might integrate well with PEP 484, or with the
>> (still speculative) multiple dispatch that Lukas Lang wants to
>> write.
>
> Function dispatch
> -----------------------
>
> Besides the above syntax the patterns could also be used to register
> functions for dispatching. I don't know how precisely it would
> integrate with PEP 484, but I agree that any valid type hint should
> also be a valid pattern.

Consider the syntax from Matthew Rocklin's multipledispatch library on PyPI:

@dispatch(int, int)
def add(x, y): return x + y

@dispatch(object, object)
def add(x, y): return "%s + %s" % (x, y)

Lukas pointed out that you can put the types in the annotations rather than the decorator, and use MyPy syntax for them instead of having to invent a similar but not identical syntax as Matthew does.

Anyway, it should be easy to see how this maps to a single function with pattern matching and vice-versa. Especially once you look at how Haskell implements overloading--it's just syntactic sugar for a single definition with a case expression inside.
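To make that mapping concrete, here is a toy sketch of such a decorator desugaring to a single function that tries each registered signature in turn. This is not the real multipledispatch implementation; all names here are illustrative.

```python
# Toy registry-based dispatcher: overloaded definitions desugar to one
# function with, in effect, a case/match inside.
_registry = {}

def dispatch(*types):
    def register(func):
        _registry.setdefault(func.__name__, []).append((types, func))
        def call(*args):
            # Try each signature in registration order:
            for sig, impl in _registry[func.__name__]:
                if len(args) == len(sig) and all(
                        isinstance(a, t) for a, t in zip(args, sig)):
                    return impl(*args)
            raise TypeError('no matching signature')
        return call
    return register

@dispatch(int, int)
def add(x, y): return x + y

@dispatch(object, object)
def add(x, y): return "%s + %s" % (x, y)

print(add(1, 2))      # 3
print(add('a', 'b'))  # a + b
```

Registration order stands in for pattern-specificity here; a real library orders candidates by how specific their signatures are.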

> Existing objects as patterns
> ----------------------------------
>
> * Types should match their instances

This is an interesting idea. This allows you to do something halfway between taking any value and checking a specific value--and something that would often be useful in Python--which my proposal didn't have.

If we had a syntax for annotating values or variables (rather than just function params) with types, you could do the same thing by specifying "_: Number", which means you could also match "x: Number" to check the type and bind the value. That's something that was missing from my proposal (see the JSON schema example in the linked post on ADTs).

Without that, the only way I can think of to check the type but nothing else in my proposal is something like Breakfast(*) or something equally ugly, so that's a win for your syntax.

> * Tuples should match sequences that have the same length and whose
> elements match against the elements of a tuple (subpatterns)

My proposal just used the same rules as assignment to a target list (except that it recurses on pattern matching each element instead of assigning each element, of course). The upside is more flexibility, and the details have already been worked out and proven useful in Python assignments (and extended a couple of times over the past few decades in ways that work just as well here as they did there). The downside is that if you pattern match an iterator, each tuple decomposition case will eat elements from the iterator. But presumably anyone who understands iterators already knows that.
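The iterator caveat is easy to demonstrate: each attempted decomposition consumes elements, so a later case sees only what remains of the iterator.

```python
# Each attempted tuple decomposition of an iterator consumes elements:
it = iter([1, 2, 3, 4])

# Suppose a first pattern pulled two elements (and then failed to match)...
first_try = tuple(next(it) for _ in range(2))
print(first_try)   # (1, 2)

# ...a later pattern now only sees what is left:
second_try = tuple(it)
print(second_try)  # (3, 4)
```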

> * Dictionaries should match mappings that contain all the keys in the
> pattern and the values match the subpatterns under these keys

This definitely solves the problem of how to deal with objects that are usually constructed with keyword arguments. But it doesn't help for objects usually constructed with a combination of positional and keyword arguments. (Of course you can make __pattern__ return an OrderedDict; the problem is on the other end, how to write a pattern that matches arg #0, arg #1, and arg "spam" all in one pattern.)

Anyway, I still don't like the idea of neither getting nor matching the type here, or the idea of dumping all the names into locals instead of binding names that you ask for specifically, as I mentioned earlier. Also, consider how you'd map this relatively common ML/Haskell use:

case p1 of Point(x1, y1):
    case p2 of Point(x2, y2):
        if x1 <= x < x2 and y1 <= y < y2:

With a syntax like mine, where you specify the names to be bound, this is easy; with yours, where the pattern just dumps its names into the namespace, you have to write something like:

x0, y0 = x, y
case p1 of {'x': Any, 'y': Any}:
    x1, y1 = x, y
    case p2 of {'x': Any, 'y': Any}:
        x2, y2 = x, y
        if x1 <= x0 < x2 and y1 <= y0 < y2:

> * Regular expressions should match strings. For named groups they
> should populate the local namespace.

Good idea. Anyone coming from another modern scripting language would expect that. Then again, they expect regexp support in a whole lot of places Python doesn't have them, and that may not be a bad thing. Code that's inherently about regexps always looks simpler in other languages than in Python--but code that really isn't about regexps often gets written with regexps anyway in those languages, but rethought into something more readable in Python...

And here, my alternative of specifying values to check and names to bind instead of just dumping everything into the local namespace doesn't seem to fit very well. (Without regexp literals as part of the language syntax instead of as strings, you can't cram an arbitrary target or expression inside the <P> block.)

> * The ``Ellipsis`` object can be used as a match-all pattern.

I used the else keyword, but if you're going with keywordless match clauses, this makes sense.

> Example
> - -----------
>
> The code below would work if the ``point`` variable is specified as:
>
> * a tuple (x, y)
> * a mapping {'x': x, 'y': y}
> * a string "{x}-{y}".format(x=x, y=y)

But what if it's specified as an instance of a Point type, which is pretty much the paradigm case for pattern matching, and the only one that causes a problem?

Also, this raises an obvious problem with the regexp alternative: the first two cases match any Number, but the last case only matches nonnegative integers. And think about what it would mean to match any Number instance with a regexp. It's not just impossible, it doesn't even make sense, because values and their reprs aren't the same thing. If I import quaternion, its instances will match for the first two, but their reprs definitely won't match your regexp. This is one of those "now they have two problems" cases: a regexp looks like it will give you a simple solution to what you want, but it's just misleading you into thinking there's a simple solution that won't even pass your second test case.

Grant Jenks

Apr 7, 2015, 7:35:29 PM
to python...@python.org, Andrew Barnert
On Tue, Apr 7, 2015 at 3:59 PM, Andrew Barnert <abar...@yahoo.com.dmarc.invalid> wrote:
The mapping is an interesting end-run around the decomposition problem, but it seems like it throws away the type of the object. In other words, Point(x, y) and Vector(x, y) both pattern-match identically to either {'x': x, 'y': y}. In ML or Haskell, distinguishing between the two is one of the key uses of pattern matching.

And another key use is distinguishing Point(0, y), Point(x, 0) and Point(x, y), which it looks like you also can't do this way; the only way to get x bound in the case block (assuming Point, like most classes, isn't iterable and therefore can't be decomposed as a tuple) is to match a dict.

Wouldn't it be better to be explicit about the type matching? For example, using pypatt, I've done this:

In [19]: import pypatt

In [20]: Point = namedtuple('Point', 'x y')

In [21]: @pypatt.transform
def origin_distance(point):
    with match((type(point), point)):
        with (Point, (0, quote(y))):
            return y
        with (Point, (quote(x), 0)):
            return x
        with (Point, (quote(x), quote(y))):
            return (x ** 2 + y ** 2) ** 0.5
   ....:         

In [22]: origin_distance(Point(0, 5))
Out[22]: 5

In [23]: origin_distance(Point(10, 0))
Out[23]: 10

In [24]: origin_distance(Point(3, 4))
Out[24]: 5.0

In [25]: origin_distance(Point(0, 'far'))
Out[25]: 'far'

In [26]: origin_distance(Point('near', 0))
Out[26]: 'near'


Grant

Andrew Barnert

Apr 7, 2015, 9:14:19 PM
to Grant Jenks, python...@python.org
On Apr 7, 2015, at 16:35, Grant Jenks <grant...@gmail.com> wrote:

On Tue, Apr 7, 2015 at 3:59 PM, Andrew Barnert <abar...@yahoo.com.dmarc.invalid> wrote:
The mapping is an interesting end-run around the decomposition problem, but it seems like it throws away the type of the object. In other words, Point(x, y) and Vector(x, y) both pattern-match identically to either {'x': x, 'y': y}. In ML or Haskell, distinguishing between the two is one of the key uses of pattern matching.

And another key use is distinguishing Point(0, y), Point(x, 0) and Point(x, y), which it looks like you also can't do this way; the only way to get x bound in the case block (assuming Point, like most classes, isn't iterable and therefore can't be decomposed as a tuple) is to match a dict.

Wouldn't it be better to be explicit about the type matching?

You mean matching the type of the object? If so, that's my point. A Point and a Vector aren't the same thing just because they have the same two members, which is why in ML or Haskell or my proposal you have to match Point(x, y), not just (x, y) or {'x': x, 'y': y}.

(If you mean the types of the matched elements, the trick is to be able to specify the name and the type in one place; the obvious solution is to extend annotations to assignment/match targets, so you can match Point(x: int, y: int). But anyway, I don't think that's what you meant.)

With a library that doesn't extend Python syntax, that would have to look something like (Point, (q.x, q.y)), which you can probably make a little nicer with some hacks that inspect a compiled function object or a stack frame (to remove the need for quotes around variable names), but then presumably you knew that because you wrote pypatt, which I'm guessing uses one of those hacks. :)

For example, using pypatt, I've done this:

In [19]: import pypatt

In [20]: Point = namedtuple('Point', 'x y')

In [21]: @pypatt.transform
def origin_distance(point):
    with match((type(point), point)):
        with (Point, (0, quote(y))):
            return y
        with (Point, (quote(x), 0)):
            return x
        with (Point, (quote(x), quote(y))):
            return (x ** 2 + y ** 2) ** 0.5
   ....:         

In my proposed syntax, you'd write that as:

    case point:
        of Point(0, y):
            return y
        of Point(x, 0):
            return x
        of Point(x, y):
            return (x ** 2 + y ** 2) ** 0.5

In his syntax, I don't think you can write it at all. You either get a type match or a decomposition, not both, and there's no way (implicitly or explicitly) to match targets to bind and values to check; the best you can do is something like:

for point:
    Point:
        if x == 0:
            return y
        elif y == 0:
            etc.

... using the fact that Point.__pattern__ returns a dict which is magically dumped into the enclosing scope. So it's simultaneously less explicit and more verbose.

In [22]: origin_distance(Point(0, 5))
Out[22]: 5

In [23]: origin_distance(Point(10, 0))
Out[23]: 10

In [24]: origin_distance(Point(3, 4))
Out[24]: 5.0

In [25]: origin_distance(Point(0, 'far'))
Out[25]: 'far'

In [26]: origin_distance(Point('near', 0))
Out[26]: 'near'


PyPatt looks pretty neat, but I suspect it has a _different_ fundamental problem. As far as I can tell from a quick glance, it doesn't seem to have any way for types to opt in to the matching protocol like Szymon's proposal or mine, which presumably means that it matches by constructing a value and checking it for equality, which means that any type that does anything non-trivial on construction is unmatchable (or maybe even dangerous--consider FileIO(path, 'w') as a pattern). I'm sure Szymon didn't want to add his __pattern__ protocol any more than I wanted to add my __match__ protocol, but once you start trying it with real types I don't think there's any real alternative. (See my linked blog post for more on that.)
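The construction-side-effect danger is easy to demonstrate with a toy class; ``Resource`` here is an invented stand-in for something like ``FileIO(path, 'w')``.

```python
# Toy illustration: equality-based matching must construct the pattern
# value, so any side effect of construction happens during matching.
log = []

class Resource:
    def __init__(self, name):
        log.append('opened %s' % name)   # side effect on construction
        self.name = name
    def __eq__(self, other):
        return isinstance(other, Resource) and self.name == other.name

r = Resource('a')
# "Matching" r against the pattern Resource('a') by equality re-runs
# the constructor, repeating the side effect:
matched = (Resource('a') == r)
print(matched)  # True
print(log)      # ['opened a', 'opened a']
```

A ``__match__``/``__pattern__`` protocol avoids this by letting the type decompose an existing value instead of constructing a fresh one to compare against.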

Chris Angelico

Apr 7, 2015, 10:04:30 PM
to python...@python.org
On Wed, Apr 8, 2015 at 7:10 AM, Andrew Barnert
<abar...@yahoo.com.dmarc.invalid> wrote:
> The smaller problem is that in Python, only functions (and classes and modules) have scope; the idea of a name bound "locally" is tricky. There's a bit of hackery around except clauses...
>

FWIW it's not hackery around scope at all - it's simply that the name
gets unbound:

>>> e = 2.71828
>>> try: 1/0
... except ZeroDivisionError as e: print("1/0!")
...
1/0!
>>> e
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'e' is not defined

ChrisA

Andrew Barnert

Apr 7, 2015, 10:48:24 PM
to Chris Angelico, python...@python.org
On Apr 7, 2015, at 19:04, Chris Angelico <ros...@gmail.com> wrote:
>
> On Wed, Apr 8, 2015 at 7:10 AM, Andrew Barnert
> <abar...@yahoo.com.dmarc.invalid> wrote:
>> The smaller problem is that in Python, only functions (and classes and modules) have scope; the idea of a name bound "locally" is tricky. There's a bit of hackery around except clauses...
>
> FWIW it's not hackery around scope at all - it's simply that the name
> gets unbound:

Sorry, you're right, "hackery" is misleading here; I should have just said "(yes, I know about comprehensions and except clauses, but they're not relevant here)" or something similar. I don't consider the 3.x rules for except "hacky" or bad in any way, and didn't mean to imply otherwise.

Anyway, an ML/Haskell case expression binds the matched variables inside the case branch; the most obvious interpretation of this proposal (or mine) would bind them in the entire function.
Of course it's possible to solve that (if you think it needs solving) either the way comprehensions do (compile the case statement, or each branch, as a function definition and call) or the way except does (unbind names that appear in the "as" clause after the statement), or probably other ways, but if you don't do anything, they work like assignments. (Which is actually a perfectly good parallel to ML, where they work like let expressions, it's just that let expressions don't work like assignments.)

Chris Angelico

Apr 8, 2015, 12:57:07 AM
to python...@python.org
On Wed, Apr 8, 2015 at 12:45 PM, Andrew Barnert <abar...@yahoo.com> wrote:
> On Apr 7, 2015, at 19:04, Chris Angelico <ros...@gmail.com> wrote:
>>
>> On Wed, Apr 8, 2015 at 7:10 AM, Andrew Barnert
>> <abar...@yahoo.com.dmarc.invalid> wrote:
>>> The smaller problem is that in Python, only functions (and classes and modules) have scope; the idea of a name bound "locally" is tricky. There's a bit of hackery around except clauses...
>>
>> FWIW it's not hackery around scope at all - it's simply that the name
>> gets unbound:
>
> Sorry, you're right, "hackery" is misleading here; I should have just said "(yes, I know about comprehensions and except clauses, but they're not relevant here)" or something similar. I don't consider the 3.x rules for except "hacky" or bad in any way, and didn't mean to imply otherwise.
>

Gotcha.

> Anyway, an ML/Haskell case expression binds the matched variables inside the case branch; the most obvious interpretation of this proposal (or mine) would bind them in the entire function.
> Of course it's possible to solve that (if you think it needs solving) either the way comprehensions do (compile the case statement, or each branch, as a function definition and call) or the way except does (unbind names that appear in the "as" clause after the statement), or probably other ways, but if you don't do anything, they work like assignments. (Which is actually a perfectly good parallel to ML, where they work like let expressions, it's just that let expressions don't work like assignments.)
>

I'd treat these case branches like 'with' statements. The name binding
from "as" lasts past the end of the block, it's no different from any
other assignment. The only reason exceptions unbind is to avoid the
refloop of an exception having a reference to locals, and the local
name referencing the exception. Everything else can be just like
assignment.

So basically, what we have here is a cross between a 'with' statement
and an 'if'. I started writing up a demo that mainly used 'with', and
then realized that I had no idea how to have the __enter__ method
stipulate that the body should be skipped... oh, hello PEP 377, that's
where the problem is. This proposal would need some new syntax, but
conceptually, it'd be something like an __enter__ method returning
"hey, this body should be skipped", based on whatever criteria it
likes (isinstance, regex matching, etc).
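A minimal sketch of the problem being described (the `Match` class and names here are hypothetical, just to illustrate the PEP 377 limitation): whatever `__enter__` returns, the `with` body still runs, so a context manager cannot signal "skip this block" under current semantics.

```python
class Match:
    """A would-be pattern context manager. Its __enter__ cannot skip the body;
    PEP 377 (rejected) would have allowed exactly that."""
    def __init__(self, matched):
        self.matched = matched

    def __enter__(self):
        # Returning False here does NOT prevent the body from running.
        return self.matched

    def __exit__(self, *exc):
        return False

ran = []
with Match(False) as m:
    ran.append(m)  # executes even though the "pattern" did not match

assert ran == [False]
```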

ChrisA

Anthony Towns

unread,
Apr 8, 2015, 1:13:12 AM4/8/15
to Szymon Pyżalski, Python-Ideas
On 8 April 2015 at 05:25, Szymon Pyżalski <szy...@pythonista.net> wrote:
Any object can be a pattern. If an object has a special method (called
``__pattern__`` or ``__match__``) defined then this method will govern
how this object behaves as pattern. If not - pattern matching works
the same as ``__eq__``
 
The ``__pattern__`` method can return ``False`` for no match. For a
match it can return ``True`` (simplest patterns) a sequence or a mapping

Always returning a sequence or None would seem simpler to deal with, wouldn't it?

I'm imagining the default implementations would be something like:

  class object:
    def __match__(self, item):
      if item == self:
        return [item]
      else:
        return None

  class type:
    def __match__(cls, item):
      if isinstance(item, cls):
        return [item]
      else:
        return None

  class tuple:
    def __match__(self, item):
      if isinstance(item, tuple) and len(item) == len(self):
        if all( si.__match__(ii) for si, ii in zip(self, item) ):
          return item

      return None
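For experimentation, the defaults above can be approximated as plain module-level functions (builtins can't actually be patched; `match_value`, `match_type` and `match_tuple` are made-up names for this sketch, with an explicit length check on tuples):

```python
def match_value(pattern, item):
    """object-style default: plain equality, returning [item] or None."""
    return [item] if pattern == item else None

def match_type(pattern, item):
    """type-style default: isinstance check."""
    return [item] if isinstance(item, pattern) else None

def match_tuple(pattern, item):
    """tuple-style default: elementwise recursive match."""
    if isinstance(item, tuple) and len(pattern) == len(item):
        dispatch = lambda p, i: (match_type(p, i) if isinstance(p, type)
                                 else match_value(p, i))
        if all(dispatch(p, i) is not None for p, i in zip(pattern, item)):
            return list(item)
    return None

assert match_value(3, 3) == [3]
assert match_type(int, 3) == [3]
assert match_tuple((int, str), (1, "a")) == [1, "a"]
assert match_tuple((int, str), (1, 2)) is None
```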

Syntax for pattern matching
- -------------------------------
The syntax could look something like this::

for object:
    pattern1: statement1

Is the two-level indent actually a win? It saves re-evaluating the expression, but you could just as easily have:

  object = f(x, y+w, g(z/3 for z in x))
  match object ~~ pattern1 as a,b,c:
    statements...
  ormatch object ~~ pattern2 as a,d:
    statements...

but in that case you're coming pretty close to an 'if' block that's able to set locals:

  object = ...
  if object ~~ pattern1 as a,b,c:
    ...
  elif object ~~ pattern2 as a,d:
    ...

(Using ~~ as a placeholder for the syntax for "matches against")

    for point:
        (Number, Number) as x, y:
            print_point(x, y)
        {'x': Number, 'y': Number}:
            print_point(x, y)
        re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+'):
        # Or re.compile('([0-9]+)-([0-9])+') as x, y:
            print_point(int(x), int(y))
        ...:
            raise TypeError('Expected something else')
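The proposed `for point:` dispatch above can be emulated today with an if/elif chain; this is a rough, hand-written equivalent (the `handle` and `print_point` names are invented for the sketch, and the regex is the corrected form `([0-9]+)-([0-9]+)`):

```python
import re
from numbers import Number

def print_point(x, y):
    return ("point", x, y)

def handle(point):
    # (Number, Number) as x, y
    if (isinstance(point, tuple) and len(point) == 2
            and all(isinstance(v, Number) for v in point)):
        return print_point(*point)
    # {'x': Number, 'y': Number}
    if (isinstance(point, dict)
            and isinstance(point.get("x"), Number)
            and isinstance(point.get("y"), Number)):
        return print_point(point["x"], point["y"])
    # string of the form 'X-Y'
    if isinstance(point, str):
        m = re.match(r"([0-9]+)-([0-9]+)$", point)
        if m:
            return print_point(int(m.group(1)), int(m.group(2)))
    raise TypeError("Expected something else")

assert handle((1, 2)) == ("point", 1, 2)
assert handle({"x": 3, "y": 4}) == ("point", 3, 4)
assert handle("5-6") == ("point", 5, 6)
```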

My understanding of the way Haskell does name assignment in case matching is that it lets you put placeholders in a constructor. So where you might say

  x = MyX(a=1, b=2)

to create an object 'x', you could then say something along the lines of:

  x ~~ MyX(a=foo, b=bar)

to pull the values (1, 2) back out (as foo and bar). Doing exactly that doesn't make sense for python because reversing a constructor doesn't make much sense. If you could make that work, you'd get the python version of Haskell pattern matching looking like:

 re_pt = re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+')
 case point of:
   (Number(x), Number(y)): ...
   ('x': Number(x), 'y': Number(y)): ...
   re_pt(x=x, y=y): ..

where you're at least mentioning the 'x' and 'y' variables that are getting filled in.

Okay, probably horrible idea:

Matching syntax looks like:

  case point is:
    (x:=Number, y:=Number): ...

a:=x is the same as calling assigning_match(locals(), x, "a"), run in the local scope.
a,b:=x is the same as calling assigning_match(locals(), x, ("a", "b")).

'case obj is: \n pattern[0]: stmt[0] \n pattern[1]: stmt[1]' does:

   for i in range(2):
     try:
       pattern[i].__match__(obj)
     except NoMatch:
       continue
     stmt[i]
     break


__match__() raises NoMatch on failure, or returns the match:

  class object:
    def __match__(self, item):
      if self == item:
        return item
      raise NoMatch

  class type:
    def __match__(self, item):
      if isinstance(item, self):
        return item
      raise NoMatch

  class tuple:
    def __match__(self, item):
      if isinstance(item, tuple):
        if all( s_i.__match__(i_i) for s_i, i_i in zip(self, item) ):
            return item
      raise NoMatch

  class dict:
    def __match__(self, item):
      if isinstance(item, dict):
        if item.keys() == self.keys():         
          if all(s_v.__match__(item[s_k]) for s_k,s_v in self.items()):
            return item
      raise NoMatch

  class SRE_Pattern:
    def __match__(self, item):
       m = self.match(item)
       if m is None:
           raise NoMatch
       return m.groups()
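The exception-driven dispatch loop sketched above can be run today with wrapper classes standing in for the patched builtins (all names here — `TypePattern`, `TuplePattern`, `case` — are invented for this sketch):

```python
class NoMatch(Exception):
    pass

class TypePattern:
    """Stands in for the type.__match__ sketch: isinstance or NoMatch."""
    def __init__(self, t):
        self.t = t
    def __match__(self, item):
        if isinstance(item, self.t):
            return item
        raise NoMatch

class TuplePattern:
    """Stands in for the tuple.__match__ sketch: elementwise recursion."""
    def __init__(self, *parts):
        self.parts = parts
    def __match__(self, item):
        if isinstance(item, tuple) and len(item) == len(self.parts):
            return tuple(p.__match__(i) for p, i in zip(self.parts, item))
        raise NoMatch

def case(obj, *clauses):
    """clauses are (pattern, action) pairs; first pattern to match wins."""
    for pattern, action in clauses:
        try:
            matched = pattern.__match__(obj)
        except NoMatch:
            continue
        return action(matched)
    raise NoMatch(obj)

result = case((1, "a"),
              (TuplePattern(TypePattern(str), TypePattern(str)),
               lambda m: "two strings"),
              (TuplePattern(TypePattern(int), TypePattern(str)),
               lambda m: "int then str"))
assert result == "int then str"
```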

assigning_match works something like:

  class assigning_match(object):
    def __init__(self, l, matchobj, varname):
      assert isinstance(varname, str) or isinstance(varname, tuple)

      self.l = l
      self.m = matchobj
      self.n = varname

    def __match__(self, item):
      r = self.m.__match__(item)
      if isinstance(self.n, str):
          self.l[self.n] = r
      else:
          for n, r_i in zip(self.n, r):
              self.l[n] = r_i
      return r

Then the example looks like:

  case point is:
    (x:=Number, y:=Number):
        ...
    {"x": x:=Number, "y": y:=Number}:
        ...
    x,y:=re_pt:
        ...

Being able to pull out and name something that's a deep part of an object seems like it could be useful; otherwise it's not adding much over if/elif.

Though maybe it'd be interesting to have a "deconstructor" protocol for that instead, ie:

    Foo(a,b,key=c) = foo

is equivalent to something like:

    x = Foo.__deconstruct__(foo)
    a = x[0]
    b = x[1]
    c = x["key"]
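As a concrete (hypothetical) rendering of that equivalence, a class can expose a `__deconstruct__` returning a mapping indexable both by position and by keyword, as the three assignments above require:

```python
class NotDeconstructable(Exception):
    pass

class Foo:
    def __init__(self, a, b, key=None):
        self.a, self.b, self.key = a, b, key

    @classmethod
    def __deconstruct__(cls, obj):
        # Sketch only: return a mapping usable by position (0, 1, ...)
        # or by keyword name, mirroring x[0], x[1], x["key"] above.
        if not isinstance(obj, cls):
            raise NotDeconstructable(obj)
        return {0: obj.a, 1: obj.b, "key": obj.key}

foo = Foo(1, 2, key=3)
x = Foo.__deconstruct__(foo)
a, b, c = x[0], x[1], x["key"]
assert (a, b, c) == (1, 2, 3)
```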

I think it might even extend sanely to unpacking subobjects:

    Foo(Bar(a), b) = foo

becomes:

    x = Foo.__deconstruct__(foo)
    Bar(a) = x[0]
    b = x[1]

becomes:

    x = Foo.__deconstruct__(foo)
    x2 = Bar.__deconstruct__(x[0])
    a = x2[0]
    b = x[1]

Using "_" to accept and ignore any value would work fine there. If you allowed literals instead of an identifier, you could treat:

    Foo(0, 99, key="x") = foo

as:

    x = Foo.__deconstruct__(foo)
    assert 0 == x[0]
    assert 99 == x[1]
    assert "x" == x["key"]

That'd be enough to use it for matching I think:

    case foo is:
       Foo(a,b): ...

would become:

    try:
         Foo(a,b) = foo
    except:
         continue
     ...
     break

But it might be confusing figuring out whether:

    b = 0
    case foo is:
        Foo(a,b): print(a)

should result in value checking:

    if foo ~~ Foo(a,0): print(a)

or value assignment:

    del b
    if foo ~~ Foo(a,b): print(a)

Eh, I don't know. Maybe something there is interesting though.

Cheers,
aj

--
Anthony Towns <a...@erisian.com.au>

Andrew Barnert

unread,
Apr 8, 2015, 2:42:48 AM4/8/15
to Chris Angelico, python...@python.org
On Apr 7, 2015, at 21:56, Chris Angelico <ros...@gmail.com> wrote:
>
>> On Wed, Apr 8, 2015 at 12:45 PM, Andrew Barnert <abar...@yahoo.com> wrote:
>>> On Apr 7, 2015, at 19:04, Chris Angelico <ros...@gmail.com> wrote:
>>>
>>> On Wed, Apr 8, 2015 at 7:10 AM, Andrew Barnert
>>> <abar...@yahoo.com.dmarc.invalid> wrote:
>>>> The smaller problem is that in Python, only functions (and classes and modules) have scope; the idea of a name bound "locally" is tricky. There's a bit of hackery around except clauses...
>>>
>>> FWIW it's not hackery around scope at all - it's simply that the name
>>> gets unbound:
>>
>> Sorry, you're right, "hackery" is misleading here; I should have just said "(yes, I know about comprehensions and except clauses, but they're not relevant here)" or something similar. I don't consider the 3.x rules for except "hacky" or bad in any way, and didn't mean to imply otherwise.
>
> Gotcha.
>
>> Anyway, an ML/Haskell case expression binds the matched variables inside the case branch; the most obvious interpretation of this proposal (or mine) would bind them in the entire function.
>> Of course it's possible to solve that (if you think it needs solving) either the way comprehensions do (compile the case statement, or each branch, as a function definition and call) or the way except does (unbind names that appear in the "as" clause after the statement), or probably other ways, but if you don't do anything, they work like assignments. (Which is actually a perfectly good parallel to ML, where they work like let expressions, it's just that let expressions don't work like assignments.)
>
> I'd treat these case branches like 'with' statements. The name binding
> from "as" lasts past the end of the block, it's no different from any
> other assignment. The only reason exceptions unbind is to avoid the
> refloop of an exception having a reference to locals, and the local
> name referencing the exception. Everything else can be just like
> assignment.

Exactly my thought. Binding in Python case statements is different from Haskell case expressions in exactly the same way that assignment statements are different from let expressions, which is pretty much what you should expect.

> So basically, what we have here is a cross between a 'with' statement
> and an 'if'.

I understand why people keep reaching for "with" here, but I really don't like it.

Part of the attraction is the "as" clause, which looks like the "as" clause in an ML case expression. But that's missing the point of patterns: what you mostly want to do is decompose the pattern, binding variables to (some of) the components; you don't want to bind a variable to the whole thing. Except that _occasionally_ you _also_ want to bind the whole thing (or bind a subexpression and also some of its own subexpressions), which is what "as" can add.

Szymon's insight seems to be that pattern matching can be done, not directly in the way Haskell and ML do it, but instead by having types that know how to transform themselves into simpler structures that Python already knows how to decompose (tuples for tuple unpacking when possible--where the "as" clause is useful--and dicts for updating locals for the less-trivial cases where it's not). That's clever, and it makes good use of an "as" clause, but (especially with the locals-updating dict) it seems so far from what a with statement is intended to do that it seems like an abuse to try to fit his solution into that syntax.

Also, "with" just sounds wrong when you read it out loud. Although I'm not sure how much that matters; the same is arguably true with using with to simulate the related but different "if let" statement (as seen in Swift, as Grant sort of proposed in the last email I responded to), but I think I like it there...

> I started writing up a demo that mainly used 'with', and
> then realized that I had no idea how to have the __enter__ method
> stipulate that the body should be skipped... oh, hello PEP 377, that's
> where the problem is.

There are some hacks for getting around that in CPython 2.6-7 or 3.3+, which you can use for experimenting with it. Or you can always use MacroPy to transform the ASTs before it even gets that far.

Andrew Barnert

unread,
Apr 8, 2015, 3:25:47 AM4/8/15
to Anthony Towns, Python-Ideas, Szymon Pyżalski
This is the big thing that Szymon is trying to add, by adding a __match__ protocol that gives types a way to, in effect, reverse their constructor. That's the same thing I was trying to do in my blog post that I linked earlier. If you don't have that, then what you're building is a strictly weaker version of tuple unpacking (which we already have), together with a less clumsy way to write "try each of these blocks until one of them doesn't raise" (which you could do more generally), not pattern matching.

If you could make that work, you'd get the python version of Haskell pattern matching looking like:

 re_pt = re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+')
 case point of:
   (Number(x), Number(y)): ...
   ('x': Number(x), 'y': Number(y)): ...
   re_pt(x=x, y=y): ..

​where you're at least mentioning the 'x' and 'y' variables that are getting filled in.

Okay, probably horrible idea:

Matching syntax looks like:

  case point is:
    (x:=Number, y:=Number): ...

a:=x is the same as calling assigning_match(locals(), x, "a") run in the local scope.

Except, of course, that modifying locals isn't guaranteed to actually affect the locals, and in current CPython it generally doesn't...
This is pretty much my proposal with different syntax. I don't think the := buys you anything, and it doesn't make it obvious how to have a pattern that recursively matches its parts. In my proposal, this would look like:

    case point:
        x: Number, y: Number:
            ...
        {"x": x: Number, "y": y: Number}:
            ...
        re_pt(x, y):
            ...

(My blog post has an "of" for each clause, but I don't think it's necessary. It also doesn't have type annotations or dict matching; since it builds on normal assignment targets instead of reproducing most of the work done there, the obvious way to add those is to add them to assignments, which has other benefits--no more "a = 3 # type: int" in place of "a: int = 3". And meanwhile, I don't think dict matching is all that useful.)

Being able to pull out and name something that's a deep part of an object seems like it could be useful; otherwise it's not adding much over if/elif.

Exactly.

Though maybe it'd be interesting to have a "deconstructor" protocol for that instead, ie:

    Foo(a,b,key=c) = foo

is equivalent to something like:

    x = Foo.__deconstruct__(foo)
    a = x[0]
    b = x[1]
    c = x["key"]

That is almost exactly my version of __match__, except for two things.

I didn't think through keyword arguments. You've partly solved the problem, but only partly. For example, Foo(a, b, key=c) and Foo(a, sub=b, key=c) may be identical calls, but the deconstructor can only return one of them, and it may not even be the one that was used for construction. I think something like inspect's argspec objects may handle this, but I'd have to work it through.

I didn't try to replace everything tuple unpacking does, but to add onto it. We already have the concept of assignment targets and target lists; specifying what happens for each of those, and what happens when something which isn't a target is in a target list, gets you the behavior you want for matching tuples--including matching other iterables, matching first, *rest, and so on--without having to build a complicated tuple.__match__ and then duplicate it in every other sequence type.
Sure. Haskell actually defines pattern-matching let in terms of a translation to case--and pattern-matching function overload definitions too. Once you define either assignment or case, the other two come easy.

But it might be confusing figuring out whether:

    b = 0
    case foo is:
        Foo(a,b): print(a)

should result in value checking:

    if foo ~~ Foo(a,0): print(a)

or value assignment:

    del b
    if foo ~~ Foo(a,b): print(a)

This last part is really the only problem.

I suggested that it always means "bind to b", not "match the current value or b", with special syntax for when you want the latter. That's effectively what ML-family languages do, and in my experience you usually go a long time before you run into the need to match b's value or read some code that does so, at which point you finally learn the @b or whatever syntax.

Eh, I don't know. Maybe something there is interesting though.

Cheers,
aj

--
Anthony Towns <a...@erisian.com.au>
_______________________________________________

Chris Angelico

unread,
Apr 8, 2015, 5:00:50 AM4/8/15
to python...@python.org
On Wed, Apr 8, 2015 at 4:39 PM, Andrew Barnert <abar...@yahoo.com> wrote:
>> So basically, what we have here is a cross between a 'with' statement
>> and an 'if'.
>
> I understand why people keep reaching for "with" here, but I really don't like it.
>
> Part of the attraction is the "as" clause, which looks like the "as" clause in an ML case expression. But that's missing the point of patterns: what you mostly want to do is decompose the pattern, binding variables to (some of) the components; you don't want to bind a variable to the whole thing. Except that _occasionally_ you _also_ want to bind the whole thing (or bind a subexpression and also some of its own subexpressions), which is what "as" can add.
>

The other similar syntax that I looked at was the from-import.
Positing a new keyword "patmatch":

regex = re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+')
case line:
    from regex patmatch x, y:
        print(x, y)

(or possibly combine in the "case" part into a single statement, to
remove the two-level indentation and some of the magic)

What this means is:

1) Evaluate line and regex (they'd allow arbitrary expressions)
2) Call regex.__patmatch__(line) - I'll call that "ret"
3a) If it returns None, skip to the next patmatch statement.
3b) If it returns a tuple, assign x=ret[0]; y=ret[1]
3c) If it returns a dict, assign x=ret["x"]; y=ret["y"]
3d) If it returns any other object, assign x=ret.x; y=ret.y
4) Discard "ret" itself (ie it never actually gets bound to a name)
5) Execute the block.
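The binding rules in steps 3a–3d can be sketched as a small helper (a `patmatch_bind` function invented for illustration, not part of the proposal's machinery):

```python
def patmatch_bind(ret, names):
    """Bind `names` from a __patmatch__ result per the rules above:
    None -> no match; tuple -> by position; dict -> by key;
    any other object -> by attribute access."""
    if ret is None:
        return None                              # 3a: no match
    if isinstance(ret, tuple):
        return dict(zip(names, ret))             # 3b: positional
    if isinstance(ret, dict):
        return {n: ret[n] for n in names}        # 3c: by key
    return {n: getattr(ret, n) for n in names}   # 3d: by attribute

assert patmatch_bind(None, ["x"]) is None
assert patmatch_bind((1, 2), ["x", "y"]) == {"x": 1, "y": 2}
assert patmatch_bind({"x": 1, "y": 2, "z": 3}, ["x", "y"]) == {"x": 1, "y": 2}

class P:
    x, y = 7, 8

assert patmatch_bind(P(), ["x", "y"]) == {"x": 7, "y": 8}
```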

This would cover all the obvious use-cases. Specifically for cases 3c
and 3d, you could also use 'as' syntax to bind to other names:

case line:
    from re.compile('(?P<x>[0-9]+)') patmatch x as y:
        print(y)

This would also deal with ambiguities like namedtuple; you can either
use "patmatch x, y" to treat it as a tuple, or "patmatch x as x, y as
y" to force attribute access.

Anthony Towns

unread,
Apr 8, 2015, 8:35:21 AM4/8/15
to Andrew Barnert, Python-Ideas, Szymon Pyżalski
On 8 April 2015 at 17:25, Andrew Barnert <abar...@yahoo.com> wrote:
On Apr 7, 2015, at 22:12, Anthony Towns <a...@erisian.com.au> wrote: 
  case point is:
    (x:=Number, y:=Number):
        ...
    {"x": x:=Number, "y": y:=Number}:
        ...
    x,y:=re_pt:
        ...
This is pretty much my proposal with different syntax. I don't think the := buys you anything, and it doesn't make it obvious how to have a pattern that recursively matches its parts. In my proposal, this would look like:

Hmm, I thought that was reasonably straightforward. You'd say things like:

    case rect_coords is:
      (pos := (left := Number, top := Number), size := (width := Number, height := Number)):
          ...

to pull out rect_coords == (pos, size), pos == (left, top), size == (width, height).

Though maybe it'd be interesting to have a "deconstructor" protocol for that instead, ie:

    Foo(a,b,key=c) = foo

is equivalent to something like:

    x = Foo.__deconstruct__(foo)
    a = x[0]
    b = x[1]
    c = x["key"]

That is almost exactly my version of __match__, except for two things. 

I didn't think through keyword arguments. You've partly solved the problem, but only partly. For example, Foo(a, b, key=c) and Foo(a, sub=b, key=c) may be identical calls, but the deconstructor can only return one of them, and it may not even be the one that was used for construction. I think something like inspect's argspec objects may handle this, but I'd have to work it through.

I think you could deal with that okay:

class type:
    def __deconstruct__(cls, obj):
        if not isinstance(obj, cls):
            raise NotDeconstructable

        result = obj.__dict__.copy()
        if callable(cls.__init__):
            argspec = inspect.getargspec(cls.__init__)
            for i, argname in enumerate(argspec.args[1:]):
                result[i] = result.get(argname, None)

        return result
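A runnable version of that sketch, written as a standalone function since `type` itself can't be patched (and using `inspect.signature`, the modern replacement for the long-deprecated `getargspec`; the `deconstruct` name is invented here):

```python
import inspect

class NotDeconstructable(Exception):
    pass

def deconstruct(cls, obj):
    """Module-level stand-in for the type.__deconstruct__ sketch above:
    expose instance attributes both by name and by __init__ position."""
    if not isinstance(obj, cls):
        raise NotDeconstructable(obj)
    result = dict(obj.__dict__)
    params = list(inspect.signature(cls.__init__).parameters)[1:]  # drop self
    for i, argname in enumerate(params):
        result[i] = result.get(argname)
    return result

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

d = deconstruct(Point, Point(1, 2))
assert d[0] == 1 and d[1] == 2
assert d["x"] == 1 and d["y"] == 2
```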

So if you have def __init__(a, b, key) and you just store them as attributes, you can access them by name or by the same position as per __init__. I guess you could deconstruct *args and **kwargs too, by pulling out the unused values from result, though you'd probably need to use an ordereddict to keep track of things then. Though that's how recursive unpacking of lists works anyway, isn't it? Match against [head, *tail]? The only other difference is I was just ignoring unspecified bits; maybe it would make more sense to require you to unpack everything in the same way list/tuple unpacking does.

    Foo(x, y, *args, kw=z, **kwargs) = foo

becomes something like:

     __unpack = Foo.__deconstruct__(foo)
     __seen = set()
     __it = iter(__unpack)
     
     x = __unpack.pop( next(__it) )
     y = __unpack.pop( next(__it) )
     z = __unpack.pop( "kw" )

     args = [ __unpack.pop(__i) for __i in __unpack ]
     kwargs = { k: __unpack.pop(k) for k in __unpack.keys() }

     if __unpack:
         raise ValueError("too many values to unpack")

You'd need something fancier than an ordereddict for *args to leave anything for **kwargs to pick up though.

I think that adds up to letting you say:

    re_point = re.compile('(?P<x>[0-9]+)-(?P<y>[0-9])+')

    case '1-3' is:
        re_point(x=xcoord, y=ycoord):
           print("X coordinate: %s, Y coordinate: %s" % (xcoord, ycoord))

by defining

    class SRE_Pattern:
        def __deconstruct__(self, obj):
            m = self.match(obj)
            if m is None:
                raise NotDeconstructable

            result = m.groupdict()
            return result

which seems plausible. That seems like a pretty friendly syntax for dealing with regexps actually...
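Since compiled pattern objects can't actually be patched or subclassed, the `SRE_Pattern.__deconstruct__` idea can be tried with a thin wrapper (the `RePattern` name is invented for this sketch):

```python
import re

class NotDeconstructable(Exception):
    pass

class RePattern:
    """Wrapper standing in for SRE_Pattern.__deconstruct__."""
    def __init__(self, pattern):
        self.pattern = re.compile(pattern)

    def __deconstruct__(self, obj):
        m = self.pattern.match(obj)
        if m is None:
            raise NotDeconstructable(obj)
        return m.groupdict()

re_point = RePattern(r"(?P<x>[0-9]+)-(?P<y>[0-9]+)")
groups = re_point.__deconstruct__("1-3")
assert groups == {"x": "1", "y": "3"}
```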

(I think that's an advantage of having the protocol be Foo.__deconstruct__(obj) rather than obj.__match__() -- I don't think you could handle regexps without having both params)



Hmm, as far as literal-checking versus binding goes, if you're using := or "as" to bind subexpressions, as in:

    case order is:
        customer, Breakfast(spam, eggs, beans) as breakfast: ...

or

    case order is:
        customer, breakfast:=Breakfast(spam, eggs, beans)

then I /think/ you could use that to distinguish between binding and checking. So:

    customer = Customer("alice")

    name = "bob"
    case customer is:
        Customer(name):
           # succeeds, binds name as "alice"

    case customer is:
        Customer("bob"): 
           # compares "alice" to "bob", fails
           # if it had succeeded, wouldn't have bound any variables

    name = "bob"
    case customer is:
        Customer(name as x):
           # compares "alice" to name ("bob") and fails
           # but would have bound x as "alice" if it had succeeded

That seems like a rule that's reasonably understandable to me?

("expr as name" has definitely grown on me over "name := expr")



I guess I'm thinking "literal" matching is just a special case of deconstructing in general, and done by a method like:

    class object:
        def __deconstruct__(self, obj):
            if self == obj:
                return ()
            else:
                raise NotDeconstructable

That would have the potential side-effect of allowing things like:

   0 = n

as a valid statement (it'd raise NotDeconstructable if 0 != n). Not sure that's much worse than:

   [] = n

though, which is already allowed. You could have a syntax error if there wasn't anything to bind though.
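For reference, the `[] = n` form really is legal today and behaves as an "is empty" assertion, raising ValueError otherwise:

```python
# `[] = iterable` asserts the iterable is empty.
[] = []     # succeeds silently
[] = ""     # any empty iterable works

try:
    [] = [1]
except ValueError as e:
    err = str(e)

# CPython reports the failed zero-target unpacking as a ValueError.
assert "too many values to unpack" in err
```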

Cheers,
aj

--
Anthony Towns <a...@erisian.com.au>

Andrew Barnert

unread,
Apr 8, 2015, 10:11:34 AM4/8/15
to Anthony Towns, Python-Ideas, Szymon Pyżalski
On Apr 8, 2015, at 05:34, Anthony Towns <a...@erisian.com.au> wrote:

On 8 April 2015 at 17:25, Andrew Barnert <abar...@yahoo.com> wrote:
On Apr 7, 2015, at 22:12, Anthony Towns <a...@erisian.com.au> wrote: 
  case point is:
    (x:=Number, y:=Number):
        ...
    {"x": x:=Number, "y": y:=Number}:
        ...
    x,y:=re_pt:
        ...
This is pretty much my proposal with different syntax. I don't think the := buys you anything, and it doesn't make it obvious how to have a pattern that recursively matches its parts. In my proposal, this would look like:

Hmm, I thought that was reasonably straightforward. You'd say things like:

    case rect_coords is:
      (pos := (left := Number, top := Number), size := (width := Number, height := Number)):
          ...

to pull out rect_coords == (pos, size), pos == (left, top), size == (width, height).

Ah, I see. I was thrown by := looking like assignment in Pascal or some of the mutable ML variants, where it obviously can't be part of a subexpression, but once you get past that, yeah, it's obvious. Sorry.

Though maybe it'd be interesting to have a "deconstructor" protocol for that instead, ie:

    Foo(a,b,key=c) = foo

is equivalent to something like:

    x = Foo.__deconstruct__(foo)
    a = x[0]
    b = x[1]
    c = x["key"]

That is almost exactly my version of __match__, except for two things. 

I didn't think through keyword arguments. You've partly solved the problem, but only partly. For example, Foo(a, b, key=c) and Foo(a, sub=b, key=c) may be identical calls, but the deconstructor can only return one of them, and it may not even be the one that was used for construction. I think something like inspect's argspec objects may handle this, but I'd have to work it through.

I think you could deal with that okay:

class type:
    def __deconstruct__(cls, obj):
        if not isinstance(obj, cls):
            raise NotDeconstructable

        result = obj.__dict__.copy()
        if callable(cls.__init__):
            argspec = inspect.getargspec(cls.__init__)
            for i, argname in enumerate(argspec.args[1:]):
                result[i] = result.get(argname, None)

        return result

So if you have def __init__(a, b, key) and you just store them as attributes, you can access them by name or by the same position as per __init__. I guess you could deconstruct *args and **kwargs too, by pulling out the unused values from result, though you'd probably need to use an ordereddict to keep track of things then.

But argspec is more powerful and simpler than that--well, not argspec, but fullargspec and especially Signature. Just use inspect.signature and its bind method. That gives you a BoundArguments, and then you can just do boundargs.args[0] to match by position or boundargs.arguments['spam'] to match by keyword; for an argument that could have been passed either way, the value will show up both ways. (And for args that were passed into *args, however they got there, that works too--for example, if you def __init__(self, x, *args) and then do signature(cls.__init__).bind(1, 2, 3), you can see the 3 as args[3] or as arguments['args'][1].)
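A small concrete check of that `Signature.bind` behaviour (passing `None` to stand in for `self`, since we bind against the unbound `__init__`):

```python
import inspect

class Foo:
    def __init__(self, x, *args):
        pass

sig = inspect.signature(Foo.__init__)
ba = sig.bind(None, 1, 2, 3)   # None stands in for self

# Positional view: every argument, in order.
assert ba.args == (None, 1, 2, 3)
# Name view: extra positionals are gathered under 'args'.
assert ba.arguments["x"] == 1
assert ba.arguments["args"] == (2, 3)
```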

Meanwhile, I'm a bit wary about type implicitly assuming that attributes and __init__ parameters match one-to-one like this. Sure, that's often right, and it will often raise an AttributeError or a TypeError (from Signature.bind) when it's not right--but those cases where it does the wrong thing will be pretty surprising, and hard for a novice to work around. Making classes explicitly opt in to matching seems safer.

On the other hand, allowing them to opt in with a dead-simple decorator like @simplematch (or a base class or metaclass or register function or whatever) that just adds a __deconstruct__ method like the one you defined does seem a lot easier than any of the ideas I originally suggested.

One last thing here: I like your NotDeconstructable; if we make that a subclass of ValueError, we can change normal tuple unpacking to raise that as well, without breaking any code.

Though that's how recursive unpacking of lists work anyway, isn't it? Match against [head, *tail]?

Sure, but that's a pretty bad way to process Python lists (you end up making N copies of average length N/2, plus N stack frames out of a finite limit, and the code ends up more verbose and less readable), so anything that makes that Scheme/ML style look more inviting than iteration is probably not an advantage...
You're starting to turn me around on my dislike of using pattern matching for regexps; that's more readable than existing re code...

(I think that's an advantage of having the protocol be Foo.__deconstruct__(obj) rather than obj.__match__() -- I don't think you could handle regexps without having both params)

Yes.

I originally thought the extra indirection of letting something pose as the match value's type while actually not being that type was unnecessary, but this case, an re object's type posing as the string's type, seems like a great counterexample.

And it doesn't add much complexity, so... I think it's worth it.

Hmm, as far as literal-checking versus binding goes, if you're using := or "as" to bind subexpressions, as in:

    case order is:
        customer, Breakfast(spam, eggs, beans) as breakfast: ...

or

    case order is:
        customer, breakfast:=Breakfast(spam, eggs, beans)

then I /think/ you could use that to distinguish between binding and checking. So:

    customer = Customer("alice")

    name = "bob"
    case customer is:
        Customer(name):
           # succeeds, binds name as "alice"

    case customer is:
        Customer("bob"): 
           # compares "alice" to "bob", fails
           # if it had succeeded, wouldn't have bound any variables

    name = "bob"
    case customer is:
        Customer(name as x):
           # compares "alice" to name ("bob") and fails
           # but would have bound x as "alice" if it had succeeded

That seems like a rule that's reasonably understandable to me?

I don't know about that. It wouldn't be hard to remember the rule once you learn it, but it may not be that easy to learn, because at first glance it doesn't make sense. It's weird to have "foo" mean "assign to foo" but "foo as bar" in the exact same context mean "use the value of foo, then assign to bar". It's unprecedented in Python (with doesn't assign to the context manager name instead of using it as an expression if you leave off the "as") and I can't think of any other language that does something equivalent. (And I'm pretty sure it would inspire rants about "warts" by a certain person on this list. :)

And again, unless use cases for pattern matching in Python turn out to be very different than in ML and friends (which is certainly _possible_, but not something I'd want to just assume--if we don't expect somewhat similar uses, why are we even looking to them for inspiration?), I think the need for matching a variable's value won't be that common, so we shouldn't waste the nicest syntactic form on it... (Although I'm definitely not sold on my throwaway suggestion of "@foo", which has misleading parallels both in Python and in some of the languages the syntax is inspired by...)

("expr as name" has definitely grown on me over "name := expr")



I guess I'm thinking "literal" matching is just a special case of deconstructing in general, and done by a method like:

    class object:
        def __deconstruct__(self, obj):
            if self == obj:
                return ()
            else:
                raise NotDeconstructable

That would have the potential side-effect of allowing things like:

   0 = n

as a valid statement (it'd raise NotDeconstructable if 0 != n). Not sure that's much worse than:

   [] = n

though, which is already allowed. You could have a syntax error if there wasn't anything to bind though.

Yeah, I tried to come up with a rule that disallowed literal values in an assignment (but not case) target list unless there's some kind of "real pattern" somewhere within (possibly recursively) the target list, but all of the possibilities seem horrible. So I decided the best thing to do was just not mention that "0 = n" is no longer a syntax error, but a legal runtime assertion that hopefully people don't use.

Also, while I don't think people should explicitly write "0 = n", I can imagine a code generator or MacroPy macro or whatever that takes generates a pattern based on its arguments that might want to generate something like "0 = n" for the no-args case, and why not allow it?

Anyway, I think your additions and changes make a much better proposal than what I originally had, and we're now probably on track to the best Pythonic design for ML-style pattern matching--but I still don't think it's worth adding to Python, or even writing as a PEP for Guido to reject. Have you ever written code where you really missed the feature (except in the first hour back on Python after a week of Haskell hacking)? Or seen any real code that would be substantially improved? If not, this still just seems like a feature for its own sake, or more than one way to do it for no reason.

Devin Jeanpierre

unread,
Apr 8, 2015, 12:25:07 PM4/8/15
to Andrew Barnert, Python-Ideas, Szymon Pyżalski
All compiling / tree handling code is really annoying without pattern matching.

Here is a fragment of code from the ast module:

    elif isinstance(node, BinOp) and \
         isinstance(node.op, (Add, Sub)) and \
         isinstance(node.right, Num) and \
         isinstance(node.right.n, complex) and \
         isinstance(node.left, Num) and \
         isinstance(node.left.n, (int, long, float)):

Here is what it would look like if we were inside a match statement thingy:

    # I picked the arg order out of the docs, but it's a bit too "clever"
    case BinOp(Num(left), (Add | Sub), Num(right)):
        if isinstance(left, (int, long, float)) and isinstance(right, complex):
            ...

NodeVisitor etc. are hacked-up pattern-matching-like thingies that
mitigate this when they apply.
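For concreteness, here is a minimal runnable sketch (mine, not from the ast module; the class and its names are invented for illustration) of how a NodeVisitor subclass packages that same check. It uses the Python 3 spelling ast.Constant/.value in place of the fragment's Num/.n:

```python
import ast

# A NodeVisitor dispatches on node type in visit(), but inspecting a
# node's children still needs the same manual isinstance chain.
class ComplexLiteralFinder(ast.NodeVisitor):
    def __init__(self):
        self.found = []

    def visit_BinOp(self, node):
        # The nested isinstance tests from the quoted fragment, Python 3 style.
        if (isinstance(node.op, (ast.Add, ast.Sub))
                and isinstance(node.right, ast.Constant)
                and isinstance(node.right.value, complex)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.left.value, (int, float))):
            self.found.append(node)
        self.generic_visit(node)

finder = ComplexLiteralFinder()
finder.visit(ast.parse("x = 1 + 2j"))
print(len(finder.found))   # 1
```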

It's harder to demonstrate, but sre_compile is even worse -- it takes
in a tree of opcodes and argument vectors, where the argument vector
values are totally undocumented etc. (and, IIRC, sometimes they are
lists, sometimes they aren't, depending on the opcode). If it used
some disjoint union type that unpacked nicely in a match statement,
the compiling code would be much more readable, and the internal data
structure would be easier to understand.

-- Devin

Szymon Pyżalski

unread,
Apr 8, 2015, 1:48:03 PM4/8/15
to Python-Ideas
On 08.04.2015 00:59, Andrew Barnert wrote:
>> "Works the same as __eq__" either doesn't allow you to assign parts while decomposing, or turns out to have a lot of problems.
The "works the same as __eq__" refers to simple objects like numbers,
strings or booleans. You could for example expect a tuple containing an
element that is some metadata with information, how to handle the data.
You could then use patterns like (True, Number), (False, Number).
>> The mapping is an interesting end-run around the decomposition problem, but it seems like it throws away the type of the object. In other words, Point(x, y) and Vector(x, y) both pattern-match identically to either {'x': x, 'y': y}. In ML or Haskell, distinguishing between the two is one of the key uses of pattern matching.
The mapping is only a way to populate the local namespace. If the
pattern matches or not is a different notion.

>> And another key use is distinguishing Point(0, y), Point(x, 0) and Point(x, y), which it looks like you also can't do this way; the only way to get x bound in the case block (assuming Point, like most classes, isn't iterable and therefore can't be decomposed as a tuple) is to match a dict.
I believe that objects that we want to be matchable in this way should
be subclasses of a namedtuple.
>
> * Types should match their instances
>
>> This is an interesting idea. This allows you to do something halfway between taking any value and checking a specific value--and something that would often be useful in Python--which my proposal didn't have.

There is a problem with this idea. The isinstance function can accept a
tuple of types, and the tuple means "OR" in this case. In pattern matching,
however, a tuple would match a tuple. This is an ugly inconsistency.
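A short illustration (my example) of the two readings:

```python
# isinstance with a tuple of types means "any of these" (OR):
assert isinstance(3, (int, str))
assert isinstance("x", (int, str))
assert not isinstance(3.0, (int, str))

# ...whereas as a *pattern*, (int, str) would most naturally describe
# a 2-tuple whose elements match int and str positionally:
value = (3, "x")
assert (isinstance(value, tuple) and len(value) == 2
        and isinstance(value[0], int) and isinstance(value[1], str))
```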

Greetings
zefciu

Cara

unread,
Apr 9, 2015, 9:57:22 AM4/9/15
to python...@python.org
Fair warning: this is long.

> I was trying to google if there were some attempts to add pattern
> matching feature (as seen in e.g. haskell) to Python. I couldn't however
> find anything. Were there any proposals for a feature like this?
>
> Greetings
> zefciu

Many. In no particular order:

http://pypi.python.org/pypi/pyfpm
https://github.com/martinblech/pyfpm

http://www.aclevername.com/projects/splarnektity/
https://pypi.python.org/pypi/splarnektity/0.3

http://blog.chadselph.com/adding-functional-style-pattern-matching-to-python.html
https://gist.github.com/chadselph/1320421

http://svn.colorstudy.com/home/ianb/recipes/patmatch.py

http://monkey.org/~marius/pattern-matching-in-python.html
https://github.com/mariusae/match/tree/master

https://pypi.python.org/pypi/PEAK-Rules
http://peak.telecommunity.com/DevCenter/PEAK-Rules

http://home.in.tum.de/~bayerj/patternmatch.py

https://github.com/admk/patmat

https://github.com/lihaoyi/macropy#pattern-matching

These are in addition to the ones Paul gave. There are some more
attempts at related areas (unification and other forms of logic
programming, multiple dispatch) and discussions without implementations
that I didn't list.

Before I say anything else, I should disclose my bias: I think pattern
matching would be a great addition to Python. Python tries hard to be
readable, and pattern matching would make certain kinds of code much
more legible than Python currently allows. I'll give two examples, one
canonical and one from something I've been working on.

The first is red-black tree balancing. Not by accident,
rosettacode.org's pattern matching comparison page
( http://rosettacode.org/wiki/Pattern_matching ) uses that as its
example for pattern matching in different languages. Most of the
implementations fit onto a single page. I won't argue they're marvels
of readability, but that's mostly because of too-succinct variable
naming. Here's an implementation of the same algorithm in Python:
http://scienceblogs.com/goodmath/2008/05/28/the-basic-balanced-search-tree
. I'd paste the code, but it's too long to be instructive because the
algorithm gets scattered across against five nontrivial functions,
each of which has nested conditionals, not counting the implicit
nesting from the functions calling one another.

The second is performing algebra on the struct module's format strings.
I've written some automatic optimization of them with concatenation and
scalar multiplication, so that for instance 'b' + 'b' = '2b' rather than
'bb' and '10s' + '10s' = '10s10s'. The value of this is in dealing with
format strings that can be combined in various ways to describe the
layouts of complicated binary data formats. With some context stripped
for brevity,

    def _check_add(self, other):
        if not isinstance(other, _StructBase):
            return NotImplemented
        if not self.same_endianness(other):
            return NotImplemented
        if (self.endianness == other.endianness or
                len(self.endianness) > len(other.endianness)):
            return type(self)
        else:
            if isinstance(other, _StructSequence):
                return type(other)
            else:
                return other._STRUCT_SEQUENCES[other.endianness]

    def __add__(self, other):
        struct_sequence = self._check_add(other)
        if isinstance(other, _StructSequence):
            if other._elements == []:
                return struct_sequence(self._elements.copy())
            else:
                begin = other._elements[0]
                rest = other._elements[1:]
        else:
            begin = other
            rest = []
        if self._elements == []:
            rest.insert(0, begin)
            return struct_sequence(rest)
        if self._elements[-1].character == begin.character:
            return struct_sequence(self._elements[:-1] +
                                   [self._elements[-1] + begin] + rest)
        else:
            return struct_sequence(self._elements + [begin] + rest)

The exact details aren't important, my point is this also leads to
complicated nested conditionals because to know what actions I need to
perform on a given data type, I need checks to make sure the endianness
of the two strings matches, multiple type checks, checks for whether one
or the other object is empty, and so on, then after verifying what I'm
dealing with, decompositions to break a list apart into its first
element and remainder and handling the empty-list case. This
implementation isn't even complete because it doesn't, for instance,
deal with all the possible endianness permutations. There are some
simplifications possible, for instance moving all information described
by finite sets like endianness into types (so there's a StructSequence,
a BigEndianStructSequence, a LittleEndianStructSequence, ...) but this
can't handle all the necessary conditionals and doesn't scale well.

Those are two small-scale examples. Some broad areas where pattern
matching is useful are:

1) Parsing: It's useful both for performing manipulations on grammars
like left factoring and operating on the trees that parsers produce.

2) Mathematics: manipulations of abstract mathematical objects like
algebraic expressions (not numerical analysis), for instance algebraic
simplification rules.

3) Recursive data structures: Python doesn't have algebraic data types
or embed recursion into everything the way languages like Haskell do,
but there's plenty of recursively-structured data out in the wild like
HTML and JSON.

4) Compilers/interpreters: this use is closely related to parsing and
recursive data structures because compilers and interpreters are often
acting on abstract syntax trees. Often transformations like flattening
an AST into bytecode and constant folding can be implemented with
pattern matching.

I'll stop there because those are the applications I know best, but I'm
sure there are others. Pattern matching also subsumes features
including multiple dispatch on types that have been proposed, discussed,
and implemented many times for Python. I'd argue the existence of more
than a dozen versions of pattern matching and their cumulative downloads
from PyPI say the desire for the feature isn't zero, but I'm not sure
how indicative they are.

Whether I've convinced you or not that pattern matching is a worthwhile
addition to the language, I'd like it, and if I could, I'd write a
library to get it. Past discussions on pattern matching have started
and ended with debates about the syntax for adding it as a core language
feature. Because the whole point of pattern matching is readability,
since you can't do anything with it you couldn't with a
sufficiently-complicated nested if-elif-else, arguing about syntax isn't
putting the cart before the horse like it is with some features, but
there's more to implementing pattern matching than just syntax, and I'd
like to expand the discussion to some of those issues. More than that,
though, I think pattern matching with imperfect syntax would still be
useful, and I'd rather see a mypy-like library now than a language
feature that might make it into Python in some distant future. In that
spirit, I'll offer some syntax that doesn't involve changing the
grammar. To emulate a case expression like Haskell, the only way I can
see to do it is abusing 'with' statements because they're the only
block-like statement aside from exceptions that allow assignment:

    with match(expr) as case:
        with case(pattern) as matches:
            if matches:
                target_list = matches
                suite
        with case(pattern) as matches:
            if matches:
                target_list = matches
                suite
        ...
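For what it's worth, a minimal runnable emulation of this 'with'-based syntax (my own sketch; the "pattern" here is just a hand-written predicate/extractor function, far short of a real pattern object):

```python
from contextlib import contextmanager

# Minimal emulation (mine) of the proposed 'with match(...) as case'
# syntax.  A pattern is a function returning a bindings tuple or None.
@contextmanager
def match(value):
    @contextmanager
    def case(pattern):
        yield pattern(value)
    yield case

def pair_starting_with_1(value):
    # Matches [1, x] and binds x; no match -> None.
    if isinstance(value, list) and len(value) == 2 and value[0] == 1:
        return (value[1],)
    return None

with match([1, 99]) as case:
    with case(pair_starting_with_1) as matches:
        if matches:
            (x,) = matches
            print(x)   # 99
```

Note that nothing here stops execution after the first matching case; a real implementation would need some way to break out of the outer block.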

Pattern matching fits naturally into for statements and comprehensions:

    for target_list in pattern(expr):
        suite

    (expr for target_list in pattern(expr) if expr)
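A sketch of what a pattern usable in a for statement might look like (the class and its semantics are my assumptions: calling the pattern yields one bindings tuple on a match and nothing otherwise, so the loop body runs zero or one times):

```python
# Hypothetical pattern object for use in for statements/comprehensions.
class PairPattern:
    """Matches a two-element list whose head equals `head`."""
    def __init__(self, head):
        self.head = head

    def __call__(self, value):
        if (isinstance(value, list) and len(value) == 2
                and value[0] == self.head):
            yield (value[0], value[1])  # one match: bind both elements

p = PairPattern(1)
for (x, y) in p([1, "a"]):
    print(y)            # runs once
for (x, y) in p([2, "a"]):
    print(y)            # never runs
```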

A syntax similar to functools.singledispatch's works for pattern
dispatch:

    @match(pattern)
    def f(parameter_list):
        suite

    @f.case(pattern)
    def g(parameter_list):
        suite

    @f.case(pattern)
    def h(parameter_list):
        suite
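A rough runnable emulation of that dispatch API (my own sketch; real patterns would be richer objects than the bare predicates used here):

```python
# Emulation of singledispatch-style pattern dispatch: each case is a
# (predicate, function) pair, tried in registration order.
def match(pattern):
    def decorator(func):
        cases = [(pattern, func)]
        def dispatcher(*args):
            for pred, fn in cases:
                if pred(*args):
                    return fn(*args)
            raise TypeError("no case matched")
        def case(new_pattern):
            def register(fn):
                cases.append((new_pattern, fn))
                return dispatcher
            return register
        dispatcher.case = case
        return dispatcher
    return decorator

@match(lambda n: n == 0)
def factorial(n):
    return 1

@factorial.case(lambda n: n > 0)
def _(n):
    return n * factorial(n - 1)

print(factorial(5))   # 120
```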

For defining list and dictionary patterns themselves, my best thought is
abusing __getitem__, ellipsis, and the slice syntax:

    Listpattern[1, ..., 'a': int]

    [1, 'a', 'b', 'c', 2]   # matches, name a is bound to 2
    [3, 4, 5]               # doesn't match because the first element isn't 1, no binding
    [1, 2, None]            # also doesn't match because None is not an int

    Dictpattern['a': 1, 'b': 2, 'c': 'a': str]

    ['a': 1, 'b': 2, 'c': 'foo']            # matches, name a is bound to 'foo'
    ['a': 1, 'b': 2, 'c': 'bar', 'd': 3]    # doesn't match because no ellipsis
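A rough proof of concept (my code, with simplifying assumptions: a single ellipsis, aligned greedily against the end of the list) showing the __getitem__/slice abuse is at least implementable for the list case:

```python
# Listpattern[1, ..., 'a': int] builds a pattern whose match() returns
# a dict of bindings on success, or None on failure.
class _ListPatternMeta(type):
    def __getitem__(cls, items):
        if not isinstance(items, tuple):
            items = (items,)
        return cls(items)

class Listpattern(metaclass=_ListPatternMeta):
    def __init__(self, items):
        self.items = items

    def match(self, value):
        if not isinstance(value, list):
            return None
        bindings = {}
        i = 0  # index into value
        for j, item in enumerate(self.items):
            if item is Ellipsis:
                # Skip ahead so the remaining pattern items line up
                # with the end of the list.
                remaining = len(self.items) - j - 1
                new_i = len(value) - remaining
                if new_i < i:
                    return None
                i = new_i
            elif isinstance(item, slice):
                # 'a': int parses as slice('a', int): bind the name if
                # the element has the right type.
                if i >= len(value) or not isinstance(value[i], item.stop):
                    return None
                bindings[item.start] = value[i]
                i += 1
            else:
                if i >= len(value) or value[i] != item:
                    return None
                i += 1
        if i != len(value):
            return None
        return bindings

print(Listpattern[1, ..., 'a': int].match([1, 'a', 'b', 'c', 2]))  # {'a': 2}
```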

These are far from the best possible syntax available by changing
Python's grammar or syntax, and some of them would benefit from using
Macropy to rewrite ASTs. I'm sure they could be improved even within
those constraints, and if anyone has ideas, I'm all ears.

A lot of work has been done on pattern matching in object-oriented
languages with and without algebraic data types. I think Scala's
implementation is most mature and well worth looking at for ideas.
(Macropy's experimental pattern matching is patterned on Scala's.)
"Patterns as Objects in
Grace" ( http://homepages.ecs.vuw.ac.nz/~djp/files/DLS2012.pdf ) is a
paper on pattern matching in an object-oriented teaching language with
structural typing. "Robust Scripting Via
Patterns" ( http://hirzels.com/martin/papers/dls12-thorn-patterns.pdf )
is about pattern matching in Thorn, a dynamically-typed language
implemented on the JVM. Thorn's pattern matching is integrated into the
language at a deep level, and it has some good ideas about using
patterns in control structures, some of which I shamelessly stole in
writing the above suggestions for pattern matching syntax in Python.
I'm just scratching the surface here, much more can be and has been
said.

I haven't talked at all about the nitty-gritty of implementation, though
I can. I think the most productive way to have that discussion on this
list would be to look at what smaller features would be most helpful for
implementing pattern matching as a library, and likewise for syntax,
shades of the recent discussion about macros.

Cara

Andrew Barnert

unread,
Apr 9, 2015, 7:58:46 PM4/9/15
to Cara, python...@python.org
This seems to be overcomplicating things by trying to write ML code in Python instead of just writing Python code. For example, the only reason you decompose other._elements into begin and rest is so you can recompose them (a copy of) into the same list you started with. Why bother? (Also, if you're avoiding the limited pattern-matching already available in Python by not writing "begin, *rest = other._elements", that doesn't make a great case for adding more.)

    def __add__(self, other):
        struct_sequence = self._check_add(other)
        if isinstance(other, _StructSequence):
            elements = other._elements
        else:
            elements = [other]
        if (self._elements and elements
                and self._elements[-1].character == elements[0].character):
            return struct_sequence(self._elements[:-1] +
                                   [self._elements[-1] + elements[0]] + elements[1:])
        return struct_sequence(self._elements + elements)

If this is all you're doing, your argument is equivalent to showing some inherently iterative code written instead in tail-recursive form to argue that Python should do tail call elimination. You're asking to make it easier to provide another, less Pythonic way to do something that can already be easily done Pythonically.

> The exact details aren't important, my point is this also leads to

> complicated nested conditionals because to know what actions I need to
> perform on a given data type, I need checks to make sure the endianness
> of the two strings matches, multiple type checks, checks for whether one
> or the other object is empty, and so on, then after verifying what I'm
> dealing with, decompositions to break a list apart into its first
> element and remainder and handling the empty-list case.

But everything from "decompositions" on is only necessary because you're thinking in terms of decomposition instead of list operations.

> This
> implementation isn't even complete because it doesn't, for instance,
> deal with all the possible endianness permutations. There are some
> simplifications possible, for instance moving all information described
> by finite sets like endianness into types (so there's a StructSequence,
> a BigEndianStructSequence, a LittleEndianStructSequence, ...) but this
> can't handle all the necessary conditionals and doesn't scale well.
>
> Those are two small-scale examples. Some broad areas where pattern
> matching is useful are:
>
> 1) Parsing: It's useful both for performing manipulations on grammars
> like left factoring and operating on the trees that parsers produce.

Parsing, and related things (compiling, schema validation, etc.) looks like a good case, but I'm not sure it is.

For simple languages, an expression-tree library like PyParsing or Boost.Spirit seems to be more than sufficient, and often be even nicer than explicit pattern matching.

For more complex languages (and sometimes even for simple ones), you can use or write a framework that lets you write a fully-declarative syntax and then executes it or generates executable code from it. For example, if you want to validate and parse an XML language, doing it by hand in ML would be nicer than in Python, but writing it in XML Schema and using a schema-validating parser is much nicer than either. Parsing the Python grammar by hand would be nicer in ML than in C or Python, but writing it as a declarative GRAMMAR file is nicer than either. And so on. You could argue that the work to write that framework is wasted if it's only going to be used once, but how many complex languages really need to be parsed only once? Even CPython's parser, which isn't used for anything but parsing Python's GRAMMAR, is effectively reused every time Python's syntax changes, and every time someone wants to hack on it to play with new syntax.

The one really good case for pattern matching seems to be when you pretty much _have_ to write a custom parser--maybe because the language is full of special-case rules and tangled context, like C++. At least the GCC and LLVM/Clang groups gave up trying to write C++ as a declarative specification and instead wrote custom recursive-descent parsers in C, which I suspect could be a lot nicer with pattern matching. But has anyone written an equivalent C++ parser in Haskell, ML, etc. to compare, or is that just a suspicion?

> 2) Mathematics: manipulations of abstract mathematical objects like
> algebraic expressions (not numerical analysis), for instance algebraic
> simplification rules.

Are you talking about parsing the expressions into a tree? If so, we're still on parsing.

> 3) Recursive data structures: Python doesn't have algebraic data types
> or embed recursion into everything the way languages like Haskell do,
> but there's plenty of recursively-structured data out in the wild like
> HTML and JSON.

The parsing of this data is covered above. But what do you do with it once it's parsed? If you're transforming it, that again seems to be a case where writing the transformer declaratively in, say, XSSL wins. And if you're just extracting data from it, usually you're better off with the simple data['machines'][0]['name'] syntax (in an except KeyError block), and when that's not good enough, you're usually better off with a declarative search language like XPath. So this seems to be the same as with parsing.

> 4) Compilers/interpreters: this use is closely related to parsing and

> recursive data structures because compilers and interpreters are often
> acting on abstract syntax trees. Often transformations like flattening
> an AST into bytecode and constant folding can be implemented with
> pattern matching.
>
> I'll stop there because those are the applications I know best, but I'm
> sure there are others. Pattern matching also subsumes features
> including multiple dispatch on types that have been proposed, discussed,
> and implemented many times for Python. I'd argue the existence of more
> than a dozen versions of pattern matching and their cumulative downloads
> from PyPi says the desire for the feature isn't zero, but I'm not sure
> how indicative they are.
>
> Whether I've convinced you or not that pattern matching is a worthwhile
> addition to the language, I'd like it, and if I could, I'd write a
> library to get it.

That raises the question: what's wrong with the more than a dozen libraries already written?
First, this seems like it will execute _every_ branch that matches, instead of just the first one. Surely you don't want that, do you? You can wrap the whole thing in a "while True:" and have each suite (and the "else" case) end with "break", or wrap it in a try/except and have each suite raise a custom exception, but you need some way to get out.

Also, why not just put the target_list in the as clause (inside parens, to fit it in a target, which is what an as clause actually takes)? Of course that means you get an exception rather than an else when it doesn't match, but that's fine. (It would be nice if it were a subclass of ValueError, in case the suite raises some ValueError of its own, however...)

> Pattern matching fits naturally into for statements and comprehensions:

>
> for target_list in pattern(expr):
> suite


What is this going to do? From your above example, it looks like match(expr)(pattern) returns either an iterable that can be assigned to a target_list or None. What does pattern(expr) return? An iterable of one target_list, or an empty iterable?

Also, the very idea of having an object which can be called on an expression or can have a (processed) expression called on it with similar but not identical results seems very confusing.

> (expr for target_list in pattern(expr) if expr)
>
> A syntax similar to functools.singledispatch's works for pattern
> dispatch:
>
> @match(pattern)
> def f(parameter_list):
> suite
>
> @f.case(pattern)
> def g(parameter_list):
> suite
>
> @f.case(pattern)
> def h(parameter_list):
> suite
>
> For defining list and dictionary patterns themselves, my best thought is
> abusing __getitem__, ellipsis, and the slice syntax:
>
> Listpattern[1, ..., 'a': int]
>
> [1, 'a', 'b', 'c', 2] # matches, name a is bound to 2
> [3, 4, 5] # doesn't match because the first element is 1, no binding
> [1, 2, None] # also doesn't match because None is not an int
>
> Dictpattern['a': 1, 'b': 2, 'c': 'a': str]
>
> > ['a': 1, 'b': 2, 'c': 'foo'] # matches, name a is bound to 'foo'
> > ['a': 1, 'b': 2, 'c': 'bar', 'd': 3] # doesn't match because no ellipsis

The useful part of pattern matching--and the hard part, too--is decomposing structures. A proposal that ignores that and only decomposes lists and dicts misses the point.

Yes, it's true that Python doesn't have ADTs. So that means one of two things: (1) types that want to be matched will have to do a bit of work to expose their structure and/or to decompose themselves, or (2) pattern matching isn't useful in Python. The examples you give from other languages show that the first is at least plausible. And both Szymon's proposal and mine and various other ideas suggested on this thread show attempts at a Pythonic design. But just ignoring structure decomposition, as you're doing, is not a way toward anything useful.

> These are far from the best possible syntax available by changing
> Python's grammar or syntax, and some of them would benefit from using
> Macropy to rewrite ASTs. I'm sure they could be improved even within
> those constraints, and if anyone has ideas, I'm all ears.

If you're willing to use MacroPy, that brings up the question: what's wrong with the pattern matching that it comes with as an example, and why don't you want to build on it instead of starting over?

> A lot of work has been done on pattern matching in object-oriented
> languages with and without algebraic data types. I think Scala's
> implementation is most mature and well worth looking at for ideas.
> (Macropy's experimental pattern matching is patterned on Scala's.)
> "Patterns as Objects in
> Grace" ( http://homepages.ecs.vuw.ac.nz/~djp/files/DLS2012.pdf ) is a
> paper on pattern matching in an object-oriented teaching language with
> structural typing. "Robust Scripting Via
> Patterns" ( http://hirzels.com/martin/papers/dls12-thorn-patterns.pdf )
> is about pattern matching in Thorn, a dynamically-typed language
> implemented on the JVM. Thorn's pattern matching is integrated into the
> language at a deep level, and it has some good ideas about using
> patterns in control structures, some of which I shamelessly stole in
> writing the above suggestions for pattern matching syntax in Python.
> I'm just scratching the surface here, much more can be and has been
> said.
>
> I haven't talked at all about the nitty-gritty of implementation, though
> I can. I think the most productive way to have that discussion on this
> list would be to look at what smaller features would be most helpful for
> implementing pattern matching as a library, and likewise for syntax,
> shades of the recent discussion about macros.

This last part, I think, is very useful: After building the best pattern-matching library you can with Python today, figure out what small changes to the language would allow you to either make it better or simplify its implementation, then propose those small changes instead of proposing pattern matching itself. For example, if you have to use nonportable hacks (or a MacroPy macro) so your context manager's __enter__ method can skip the statement body, that would be a good argument for reopening PEP 377 (because you have a new use case that wasn't known at the time). But trying to guess in advance what features might be helpful seems harder and less likely to pay off.

That being said, I suspect it's not going to pay off, because I think the with statement is a false lead in the first place. The semantics look promising, but they're not actually what you want to build on (unless you want to completely transform with statements into something that isn't actually a with statement, using MacroPy AST transformations or otherwise). I'd love to be proven wrong, of course.

Meanwhile transforming a pattern-match into something that can be assigned to a target_list instead of just assigning in the match does work in simple cases, but I think Szymon's proposal shows why it's not sufficient, and I think any real examples will show that it's redundant. Even without the language support in my proposal, explicit quoting of names like in Grant Jenks' PyPatt library shows that you can get most of the way there. (The "as" clause is useful for the case when you want to bind the whole structure at the same time you decompose it and potentially bind its parts, but that's pretty rare--which is another reason I think "with" is misleading.)

Cara

unread,
Apr 16, 2015, 4:24:28 PM4/16/15
to Andrew Barnert, python...@python.org
In my previous post I aimed for brevity and skimmed over some of the
details. In some places, I was a bit too brief, and so this post will
be much more detailed.
I didn't use "begin, *rest = other._elements" because this code has to
be Python-2 compatible. (This is another argument for a library
solution.)

I meant more to give the flavor of the complications of needing to test
and decompose objects at the same time in Python than provide the
best possible solution. Your version of __add__ is better and simpler
than mine, but I still find it hard to read and understand, and you're
using _check_add too. The following is another description of this
algorithm, simplifying by ignoring the case where you're adding a
StructAtom to a StructSequence. (Arguably, that should be another
method, or StructAtoms shouldn't be part of the public API, just as
Python has no char built-in type).

A StructSequence is a sequence of StructAtoms. A StructAtom is composed
of an endianness, a legal format character, and a positive integer. To
concatenate two StructSequences in __add__ or __radd__, here are the
steps:

1) Check that `other` is a StructSequence. If it's not, return
NotImplemented.

2) Check that `self` and `other` have compatible endiannesses. Probably
the best way to do this is use a dictionary with two-element frozensets
of endianness characters mapping to one endianness character each. I
used conditionals above in _check_add, and that's messier. A missing
key means they aren't compatible, and then __radd__ has to raise some
kind of exception indicating invalid endiannesses for concatenation.
__add__ needs to return NotImplemented. A successful lookup will give
the endianness character for the result StructSequence.

3) Check that neither `self` nor `other` are empty. If either is empty,
return a copy of the other. (If they're both empty, you'll end up
returning a copy of an empty StructSequence, which is fine.)

4) Compare the first StructAtom of one StructSequence with the final
StructAtom of the other. (Which is `self` and which is `other` depends
on whether we're in __add__ or __radd__.) If the format characters are
the same and not 's' or 'p', then create a new StructSequence composed
of all but the last character of one, all but the first character of the
other, and a StructAtom that has the endianness of the result
StructSequence, the same character as the two original StructAtoms, and
an integer that's the sum of the two integers in the original
StructAtoms. If the format characters aren't the same or are both 's'
or 'p', return a new StructSequence that's the concatenation of the two
original sequences.
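The step-2 lookup might be sketched like this (the endianness codes and legal pairings below are illustrative assumptions for the example, not the struct module's actual rules, which use '<', '>', '=', '!' and a native default):

```python
# Hypothetical mapping from a frozenset of the two operands' endianness
# characters to the result's endianness character.
RESULT_ENDIANNESS = {
    frozenset('<'): '<',          # same endianness on both sides
    frozenset('>'): '>',
    frozenset(('<', '=')): '<',   # explicit prefix beats native default
    frozenset(('>', '=')): '>',
}

def combine_endianness(a, b):
    # A missing key means the two sequences can't be concatenated:
    # __add__ would return NotImplemented, __radd__ would raise.
    try:
        return RESULT_ENDIANNESS[frozenset((a, b))]
    except KeyError:
        return None

print(combine_endianness('<', '='))   # '<'
```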

Replacing _check_add with the aforesaid dictionary won't improve its
clarity much, if any. You still end up with at least four conditionals
of some form and needing to do object decomposition in some of the
branches. Moreover, struct format string manipulation is simple for
problems of these kinds: consider doing simplification on regular
expressions. Devin's examples from the ast and sre_compile modules are
better illustrations than this. If you don't find the code in these
examples hard to read or comprehend, I don't expect you to agree that
pattern matching is useful. Do you or don't you? That's an honest
question.
I became interested in pattern matching when I started writing a parsing
library and realized that both the manipulations I needed to perform in
the parser itself and the manipulations I would need to do on the AST
after parsing were better expressed using pattern matching than the
if-elif-else trees I was writing. As part of this, I've been doing some
informal research on how people do parsing in the real world.
Unfortunately, I don't have any formal data and don't know how I'd
collect it, not to mention not having the money or time to do a formal
survey even if I knew how to do it. With that caveat, some patterns
I've noticed:

1) Hand-written recursive-descent parsers are ubiquitous in real code.
If you take a random program that needs to do parsing somewhere, I'd
give better than even odds that it's done with a hand-written
recursive-descent parser, not a library. I have some guesses as to why
this is, but I won't speculate here. I think there's a solid case for
putting a parser package in the standard library---Haskell and Scala,
for instance, have already done this---which helps with one part of this
problem, but then you have to do something with the tree you've parsed,
and pattern matching is also useful for that.

2) Many programmers don't know (E)BNF or formal grammars. I prefer to
use some declarative DSL too, but I know the languages and the concepts
already. Those who don't might benefit from features that make parsers
easier to write without that background.

3) The world is filled with obscure data formats that none of us have
ever heard of; this is is especially true of binary data formats. Each
of these requires some new parser whether hand-written, built with a
library, or generated.

4) Most languages and formats can't be parsed with anything less than a
context-sensitive grammar, and sometimes require more than that. To
handle this problem, successful parser libraries make it easy to add
custom code to the parsing process.

Thus, there's a lot of parsing code in the world, whether there should
be or not, and even if you know how formal grammars and how to use
parser generators and similar tools, you might still end needing to
write parsing code because you have an obscure format to parse, a format
that has non-standard properties, or both. Aside: if you do end up
needing to write a framework for a language or format, wouldn't pattern
matching make it easier to write that framework?

As far as I know, there's no C++ parser in Haskell or ML, mainly because
C++ is so hard to parse. There exist parsers for other languages (C,
for instance) that do make extensive use of pattern matching.

> > 2) Mathematics: manipulations of abstract mathematical objects like
> > algebraic expressions (not numerical analysis), for instance algebraic
> > simplification rules.
>
> Are you talking about parsing the expressions into a tree? If so, we're still on parsing.

I'm talking about doing manipulations on mathematical objects, like say
doing substitutions with trigonometric identities.

> > 3) Recursive data structures: Python doesn't have algebraic data types
> > or embed recursion into everything the way languages like Haskell do,
> > but there's plenty of recursively-structured data out in the wild like
> > HTML and JSON.
>
> The parsing of this data is covered above. But what do you do with it
> once it's parsed? If you're transforming it, that again seems to be a
> case where writing the transformer declaratively in, say, XSSL wins.
> And if you're just extracting data from it, usually you're better off
> with the simple data['machines'][0]['name'] syntax (in an except
> KeyError block), and when that's not good enough, you're usually
> better off with a declarative search language like XPath. So this
> seems to be the same as with parsing.

What is XSSL? All my comments above on parsing apply here, too. One
problem I've encountered is that since most languages and formats don't
have nice solutions like XPath already available, you end up
implementing your own.

> >
> > Whether I've convinced you or not that pattern matching is a worthwhile
> > addition to the language, I'd like it, and if I could, I'd write a
> > library to get it.
>
> That raises the question: what's wrong with the more than a dozen libraries already written?

The short answer is some combination of poor performance, lack of
compatibility (with 2.7, with 3.4, with PyPy, with other alternate
interpreters), bad syntax, and missing functionality, depending on the
library. If you have a particular library you're curious about, I can
be more specific.

> > with match(expr) as case:
> >     with case(pattern) as matches:
> >         if matches:
> >             target_list = matches
> >             suite
> >     with case(pattern) as matches:
> >         if matches:
> >             target_list = matches
> >             suite
> >     ...
>
>
> First, this seems like it will execute _every_ branch that matches,
> instead of just the first one. Surely you don't want that, do you? You
> can wrap the whole thing in a "while True:" and have each suite (and
> the "else" case) end with "break", or wrap it in a try/except and have
> each suite raise a custom exception, but you need some way to get out.

Yes, I have `while True:` in my notes; it was a clumsy copy-paste, and
omitting the break lines was a simple mistake. There are some reasons
that try/except might be better.

> Also, why not just put the target_list in the as clause (inside
> parens, to fit it in a target, which is what an as clause actually
> takes)? Of course that means you get an exception rather than an else
> when it doesn't match, but that's fine. (It would be nice if it were a
> subclass of ValueError, in case the suite raises some ValueError of
> its own, however...)

I didn't know you could do that :). Yes, that's preferable.

> >
> > for target_list in pattern(expr):
> > suite
>
>
> What is this going to do? From your above example, if looks like
> match(expr)(pattern) returns either an iterable that can be assigned
> to a target_list or None. What does pattern(expr) return? Return an
> iterable of one target_list or an empty iterable?
>
> Also, the very idea of having an object which can be called on an
> expression or can have a (processed) expression called on it with
> similar but not identical results seems very confusing.

Here, what should happen is that `expr` should be an iterator and
calling `pattern()` on it returns another iterator containing the
deconstructed parts of each object in the original iterator *if* they
match. I agree the lack of parallelism is confusing.

>
> The useful part of pattern matching--and the hard part, too--is
> decomposing structures. A proposal that ignores that and only
> decomposes lists and dicts misses the point.
>
> Yes, it's true that Python doesn't have ADTs. So that means one of two
> things: (1) types that want to be matched will have to do a bit of
> work to expose their structure and/or to decompose themselves, or (2)
> pattern matching isn't useful in Python. The examples you give from
> other languages show that the first is at least plausible. And both
> Szymon's proposal and mine and various other ideas suggested on this
> thread show attempts at a Pythonic design. But just ignoring structure
> decomposition, as you're doing, is not a way toward anything useful.

I assumed the answer to this was obvious because you've already given
it :). Objects have to opt in to pattern matching by having some method
that inverts their constructor or at least provides some plausible
facsimile thereof. For any object-oriented language, if you want to
preserve the ability to define with any attributes and have pattern
matching, you have to do it this way or with structural typing, and
Python doesn't really have that. Python classes don't have private
variables like Java except by convention, but neither do they define
their variables except by convention. __init__ is a black box that
could be doing any Turing-complete computation on its arguments, and
with properties, any attribute access could be the result of a computation
rather than a member of the class. Scala calls its pattern-matching
method unapply by analogy with its apply method. In Python, I'd
probably pick a different name, but the concept is the same.
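
To make that concrete, here is a minimal sketch of what opting in might
look like. The method name __deconstruct__ and the helper function are
hypothetical, chosen by analogy with Scala's unapply; nothing here is an
existing protocol:

```python
# Hypothetical opt-in deconstruction protocol: a class exposes a method
# that inverts its constructor, returning the arguments that would
# rebuild an equivalent object.

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __deconstruct__(self):
        # Invert __init__: return constructor arguments as a tuple.
        return (self.x, self.y)

def deconstruct(obj, cls):
    """Return obj's constructor arguments if obj is a cls that opts in,
    else None."""
    if isinstance(obj, cls) and hasattr(obj, '__deconstruct__'):
        return obj.__deconstruct__()
    return None

p = Point(1, 2)
parts = deconstruct(p, Point)
if parts is not None:
    x, y = parts   # the result is suitable for tuple assignment
```

The point is only that deconstruction has to be cooperative: the class
author decides what the "structure" of the object is, since __init__
itself is opaque.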

> If you're willing to use MacroPy, that brings up the question: what's
> wrong with the pattern matching that it comes with as an example, and
> why don't you want to build on it instead of starting over?

I'm tempted to answer, "Nothing!" because I am thinking about building
on it, but the short answer is that I'm not sure the syntactic and
performance advantages of macros outweigh the usability problems
introduced by macros (unfamiliarity of macros to most Python developers,
bootstrapping, complication of the implementation). This is my major
concern: if you have a case to make one way or the other, I'd like to
hear it. Other secondary concerns are that, despite doing AST
rewriting, MacroPy's pattern matching doesn't seem to do any
optimization, it isn't feature-complete (it's still listed as
experimental for good reasons), and MacroPy itself requires updating to
3.4 (and soon 3.5). None of these are insurmountable problems if I were
convinced that macros were the right approach. I would prefer to build
off one of the existing solutions rather than make a new one.
Unfortunately, for most of them it would be more work to comprehend the
existing code than it would be to reimplement their functionality,
especially for the ones with nonexistent documentation. In that
respect, MacroPy's is definitely one of the more promising options.

> >
> > I haven't talked at all about the nitty-gritty of implementation, though
> > I can. I think the most productive way to have that discussion on this
> > list would be to look at what smaller features would be most helpful for
> > implementing pattern matching as a library, and likewise for syntax,
> > shades of the recent discussion about macros.
>
> This last part, I think, is very useful: After building the best
> pattern-matching library you can with Python today, figure out what
> small changes to the language would allow you to either make it better
> or simplify its implementation, then propose those small changes
> instead of proposing pattern matching itself. For example, if you have
> to use nonportable hacks (or a MacroPy macro) so your context
> manager's __enter__ method can skip the statement body, that would be
> a good argument for reopening PEP 377 (because you have a new use case
> that wasn't known at the time). But trying to guess in advance what
> features might be helpful seems harder and less likely to pay off.
>
> That being said, I suspect it's not going to pay off, because I think
> the with statement is a false lead in the first place. The semantics
> look promising, but they're not actually what you want to build on
> (unless you want to completely transform with statements into
> something that isn't actually a with statement, using MacroPy AST
> transformations or otherwise). I'd love to be proven wrong, of course.

After some thinking about it, I've concluded you're right, because
both of the other options (try and for) combine binding with control
flow, while with requires an additional control flow statement without
something like PEP 377. Let's leave aside MacroPy and other ways of
changing Python's semantics like the decorators in Grant's PyPatt and
https://github.com/Suor/patterns for the moment. Pattern
matching has two basic forms, the case/match block and function
definition, but before I can talk about them, I have to discuss
patterns themselves. Without metaprogramming, patterns have to be
defined either as ordinary functions or classes. You have some set of
elementary patterns like class patterns, dict patterns, list patterns,
variable patterns, and so on. You construct or initialize patterns
with either objects to match against or other patterns, mimicking the
structure of the data you want to match. With the same tentative
syntax I suggested last time:

ClassPattern(Foo(1, VariablePattern('a', str)))
DictPattern['a': 1, 'b': 2, 'c': 'a': int]
ListPattern[1, ..., 'a': Bar]

Alternately, if you don't want to use the slicing syntax like I am
there, you can make everything very explicit:

ClassPattern(Foo(1, VariablePattern('a', str)))
DictPattern(('a', 1), ('b', 2), ('c', VariablePattern('a', int)))
ListPattern(1, ..., VariablePattern('a', Bar))

Even the slicing syntax is verbose. Both forms of syntax could be
made less verbose. Would that make it more or less clear? I'm not
sure. This syntax is tightly circumscribed because the whole point of
pattern matching is that the patterns should look like the objects
they're being matched against. The closer the resemblance, the
happier I'd be. After defining a pattern, you call it or one of its
methods on the object you want to match against, and it returns either
some indication of failure, maybe None, maybe some object with
information about the non-match, or an object containing the
deconstructed objects in a form suitable for tuple assignment.
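
As a rough illustration of how these pattern classes might work under
the hood, here is a toy implementation where match() returns a dict of
bound names on success and None on failure. The class names mirror the
sketched syntax above, but the match() API is my own guess, not settled
design:

```python
# Toy pattern objects: each pattern's match() returns a dict of bound
# names on success, or None on failure.

class VariablePattern:
    def __init__(self, name, type_=object):
        self.name = name
        self.type_ = type_

    def match(self, value):
        # Bind the value to the name if it has the right type.
        if isinstance(value, self.type_):
            return {self.name: value}
        return None

class ListPattern:
    def __init__(self, *patterns):
        self.patterns = patterns

    def match(self, value):
        if not isinstance(value, list) or len(value) != len(self.patterns):
            return None
        bindings = {}
        for pat, item in zip(self.patterns, value):
            if hasattr(pat, 'match'):      # nested sub-pattern
                sub = pat.match(item)
                if sub is None:
                    return None
                bindings.update(sub)
            elif pat != item:              # literal: compare for equality
                return None
        return bindings

pattern = ListPattern(1, VariablePattern('a', str))
pattern.match([1, 'hello'])   # {'a': 'hello'}
pattern.match([2, 'hello'])   # None: the literal 1 doesn't match 2
```

Even this toy version shows where the verbosity comes from: every
binding site needs an explicit VariablePattern, which is exactly what
real syntax support would eliminate.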

The case/match block should start with the object that the patterns
are to be matched against (avoiding this duplication is a syntactic
advantage of pattern matching over if/elif/else blocks), contain a
list of patterns to match against, have access to variables in the
local scope, and bind variables in the same scope. The only
statements that bind variables in the same scope beyond basic
assignment statements in Python are, as far as I know, try-except,
with, and for. (Well, import, but it's obviously unsuitable.) Here
are some sketches of syntax using try-except and for.

try:
    pattern0(expression,
             pattern1,
             pattern2,
             ...)
except pattern0.exception as identifier:
    target_list = identifier
    suite
except pattern1.exception as identifier:
    target_list = identifier
    suite
except pattern2.exception as identifier:
    target_list = identifier
    suite
...
else:
    suite

Here, each pattern has an associated exception. If a pattern matches,
it throws its exception; if it doesn't match, it calls the first
pattern in its varargs list on the value of the expression it was
passed, and so on. The last pattern has no pattern to call so returns
None if it fails to match. The exceptions have iterator methods for
tuple unpacking their arguments after they've been caught. Another
possible choice for setting up the patterns uses operator overloading
and a dummy function to look vaguely like ML's syntax:

match(expression,
      (pattern0
       | pattern1
       | pattern2
       ...))

Using for statements can avoid the extra assignment statements at the
cost of needing to reiterate the expression and some raises.

try:
    for target_list in pattern0(expression):
        suite
        raise MatchFound
    for target_list in pattern1(expression):
        suite
        raise MatchFound
    ...
except MatchFound:
    pass
else:
    suite

A pattern either returns a one-element iterator containing an iterator
to be bound to target_list or raises StopIteration, in which case the
loop body gets skipped and the next pattern is tried. Once any loop
body is executed, it raises MatchFound, which skips all the other
patterns.
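
Here is a runnable toy version of that for-statement idiom, with
illustrative stand-in patterns (MatchFound and the pattern functions
are mine, not an existing API). On success a pattern yields exactly one
tuple of bindings; on failure it yields nothing, so the loop body is
skipped:

```python
# Toy demonstration of the for-based match idiom: patterns are
# generators that yield one bindings tuple on success, nothing on
# failure; MatchFound short-circuits the remaining patterns.

class MatchFound(Exception):
    pass

def pair_pattern(expr):
    # Matches any 2-tuple, yielding its parts for unpacking.
    if isinstance(expr, tuple) and len(expr) == 2:
        yield expr

def int_pattern(expr):
    if isinstance(expr, int):
        yield (expr,)

def describe(expression):
    result = 'no match'
    try:
        for (first, second) in pair_pattern(expression):
            result = 'pair: %r, %r' % (first, second)
            raise MatchFound
        for (n,) in int_pattern(expression):
            result = 'int: %d' % n
            raise MatchFound
    except MatchFound:
        pass
    return result

describe((1, 2))   # 'pair: 1, 2'
describe(5)        # 'int: 5'
describe('x')      # 'no match'
```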

In Python, function definition is easier both for syntax and
implementation. All the pattern matching implementations for Python
that dispatch to functions
( http://blog.chadselph.com/adding-functional-style-pattern-matching-to-python.html ,
https://github.com/martinblech/pyfpm ,
http://svn.colorstudy.com/home/ianb/recipes/patmatch.py ,
https://github.com/admk/patmat ,
http://peak.telecommunity.com/DevCenter/PEAK-Rules ) use
decorators and all use them in similar ways. There are two major
variations. First, some decorators take patterns as arguments while
others use introspection on keyword arguments to embed the patterns in
the function definition. You could use annotations instead of keyword
arguments, though I don't know how easily that would integrate with
type-hinting---it's probably possible because pattern matching and
typing are closely related. I assume the reason no one's done so is
because annotations are Python-3 only. Second, some implementations
have one uniform decorator across all definitions of a function with
frame-hacking or similar hacks to introspect the namespace and link up
the decorated functions, while others use two decorators like
functools.singledispatch, one to create a pattern-matched
function and another to link other functions to the first one. I
prefer the latter approach because I dislike frame-hacking on
principle, since it introduces compatibility issues, and I think two
decorators is more explicit as to what's going on. Beyond those
differences, they're all similar: there's an outer function that tries
patterns against its arguments until it finds one that matches, then
it takes the names bound by the pattern and passes them as arguments
to the inner function. Quoting my suggested function definition
syntax from my last message:

@match(pattern)
def f(parameter_list):
    suite

@f.case(pattern)
def g(parameter_list):
    suite

@f.case(pattern)
def h(parameter_list):
    suite

f(expression)

In all of these potential syntax choices, the names of bound variables
are assigned manually in assignment statements or function arguments.
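
To show the two-decorator approach in miniature, here is a toy
dispatcher in the style of functools.singledispatch. The decorator
names follow the syntax quoted above, but the pattern protocol here (a
callable returning the arguments to pass on, or None for no match) is
my assumption:

```python
# Toy two-decorator pattern dispatch, no frame-hacking: @match creates
# the dispatcher, @f.case registers further cases on it explicitly.

import functools

def match(pattern):
    def decorator(func):
        cases = [(pattern, func)]

        @functools.wraps(func)
        def dispatcher(*args):
            # Try each pattern in registration order; call the first
            # implementation whose pattern matches.
            for pat, impl in cases:
                bound = pat(*args)
                if bound is not None:
                    return impl(*bound)
            raise TypeError('no pattern matched %r' % (args,))

        def case(new_pattern):
            def register(impl):
                cases.append((new_pattern, impl))
                return dispatcher
            return register

        dispatcher.case = case
        return dispatcher
    return decorator

# Patterns here are plain callables returning the arguments to forward,
# or None for no match.
@match(lambda x: (x,) if x == 0 else None)
def factorial(x):
    return 1

@factorial.case(lambda x: (x,) if isinstance(x, int) and x > 0 else None)
def _(x):
    return x * factorial(x - 1)

factorial(5)   # 120
```

Note the linear scan over cases in the dispatcher: that's exactly the
overhead problem discussed below, paid on every call.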

These approaches all have some boilerplate, but I'm more worried about
overhead. Without any desugaring, most of them will require at least
one function call per pattern on top of the conditionals and copying
necessary. With metaprogramming, it's possible to eliminate some of
the function calls for function definitions and the pseudo-ML
match-case statement by constructing a single dispatch function or
single pattern before calling any patterns. Unfortunately, in the
match-case statement, moving overhead to pattern definition time
doesn't help because the patterns have to be redefined every time
they're executed unless they're explicitly defined outside it, making
the code less readable; this is another way in which pattern matching
for function definition is easier. Any metaprogramming could remove the
boilerplate, but MacroPy could move the overhead to module import time.
This is, I think, the best reason to use MacroPy over both
non-metaprogramming approaches and other metaprogramming approaches.

Unfortunately, Python's current implementation imposes some
limitations on desugaring. In ML and Haskell, desugaring pattern
matching involves creating a finite automaton. In Python, the VM
doesn't provide any way to jump to an offset determined by runtime
information short of Philip J. Eby's computed goto hack (
https://mail.python.org/pipermail/python-dev/2006-June/066115.html
), which isn't portable. This means that any state with more than two
outgoing transitions has to be represented with an if-else tree that
turns what could be a constant-time operation (look up the jump
location in a hash table, jump) into a linear-time operation (check a
number of conditionals equal to the number of transitions minus one). This is,
of course, the switch statement, PEPs 275 and 3103, in another guise.
Another, less serious problem is there's a constant-time performance
penalty from needing to use exceptions to implement backtracking or
nodes with more than one incoming transition.
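
For what it's worth, a dict of callables approximates a jump table in
pure Python, one hash lookup instead of a chain of conditionals, though
only when the dispatch key is hashable and the class matches exactly
(subclasses would need an MRO walk). A sketch of the contrast:

```python
# Contrast: a linear if/elif chain versus a dict-based jump table for
# dispatching on the type of a value.

def handle_int(v):
    return 'int: %d' % v

def handle_str(v):
    return 'str: %s' % v

def handle_list(v):
    return 'list of %d' % len(v)

# O(number of branches) conditional checks per call.
def dispatch_linear(v):
    if isinstance(v, int):
        return handle_int(v)
    elif isinstance(v, str):
        return handle_str(v)
    elif isinstance(v, list):
        return handle_list(v)
    raise TypeError(v)

# One dict lookup per call, but type(v) must be exactly a key:
# subclasses of int, str, or list fall through to the error.
TABLE = {int: handle_int, str: handle_str, list: handle_list}

def dispatch_table(v):
    try:
        handler = TABLE[type(v)]
    except KeyError:
        raise TypeError(v)
    return handler(v)

dispatch_table([1, 2, 3])   # 'list of 3'
```

This works for flat dispatch, but a pattern-matching automaton needs
jumps between internal states too, which is where the lack of a
computed goto really bites.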

Supposing I were to build on MacroPy's pattern matching, is MacroPy's
current syntax for the match-case statement good enough? If not, how
could it be improved? As I understand it, by using import hooks like
MacroPy does and a new parser, the sky's the limit as far as changing
syntax goes, but I'd much rather stay within MacroPy's limitations,
using Python's current parsers, than go outside them.

> Meanwhile transforming a pattern-match into something that can be
> assigned to a target_list instead of just assigning in the match does
> work in simple cases, but I think Szymon's proposal shows why it's not
> sufficient, and I think any real examples will show that it's
> redundant. Even without the language support in my proposal, explicit
> quoting of names like in Grant Jenks' PyPatt library shows that you
> can get most of the way there. (The "as" clause is useful for the case
> when you want to bind the whole structure at the same time you
> decompose it and potentially bind its parts, but that's pretty
> rare--which is another reason I think "with" is misleading.)

I don't understand your comment about how Szymon's proposal shows
assigning to a target_list isn't sufficient.