Hello!
I was trying to google whether there have been any attempts to add a
pattern matching feature (as seen in e.g. Haskell) to Python, but I
couldn't find anything. Were there any proposals for a feature like this?
From a quick look:
https://pypi.python.org/pypi/patterns
https://pypi.python.org/pypi/py-pattern-matching
I'm not sure if either of these is the sort of thing you're looking for.
Paul
Quick idea: this might integrate well with PEP 484, or with the (still speculative) multiple dispatch that Lukas Lang wants to write.
On 07.04.2015 17:52, Guido van Rossum wrote:
>
> On Apr 7, 2015 8:46 AM, "Paul Moore" <p.f....@gmail.com
> <mailto:p.f....@gmail.com>> wrote:
>
> On 7 April 2015 at 15:54, Szymon Pyżalski <szy...@pythonista.net
> <mailto:szy...@pythonista.net>> wrote:
>> I was trying to google if there were some attempts to add
>> pattern matching feature (as seen in e.g. haskell) to Python. I
>> couldn't
> however
>> find anything. Were there any proposals for a feature like this?
>
> From a quick look:
>
> https://pypi.python.org/pypi/patterns
> https://pypi.python.org/pypi/py-pattern-matching
>
Thanks, these are nice, but I think they try to mimic languages based on
algebraic data types too much. I was thinking about something like this:
Magic method for patterns
-------------------------
Any object can be a pattern. If an object has a special method (called
``__pattern__`` or ``__match__``) defined, then this method will govern
how this object behaves as a pattern. If not, pattern matching works
the same as ``__eq__``.
The ``__pattern__`` method can return ``False`` for no match. For a
match it can return ``True`` (simplest patterns), a sequence, or a mapping.
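The protocol described above could be emulated today with an ordinary helper function. A rough sketch, not the proposed syntax; the ``Prefix`` class is a made-up example pattern object, not part of the proposal:

```python
def match(pattern, value):
    """Sketch of the proposed protocol: defer to __match__ when the
    pattern's class defines it, otherwise fall back to __eq__.
    Returns False for no match; True, a sequence, or a mapping for a match."""
    matcher = getattr(type(pattern), '__match__', None)
    if matcher is not None:
        return matcher(pattern, value)
    return pattern == value

class Prefix:
    """Hypothetical pattern object: matches strings with a given prefix
    and captures the remainder in a mapping."""
    def __init__(self, prefix):
        self.prefix = prefix
    def __match__(self, value):
        if isinstance(value, str) and value.startswith(self.prefix):
            return {'rest': value[len(self.prefix):]}
        return False

print(match(42, 42))                 # True  (plain __eq__ fallback)
print(match(Prefix('ab'), 'abcd'))   # {'rest': 'cd'}
```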
Syntax for pattern matching
---------------------------
The syntax could look something like this::

    for object:
        pattern1: statement1
        pattern2:
            statement2
            statement3
        pattern3 as var1, var2:
            statement4
            statement5
For the simplest cases this works simply by executing the appropriate
statement block. If the ``__pattern__`` method returns a mapping, then
the values from it will be merged into the local namespace. If it
returns a sequence and there is the ``as`` clause, the values will be
assigned to the variables specified.
> Quick idea: this might integrate well with PEP 484, or with the
> (still speculative) multiple dispatch that Lukas Lang wants to
> write.
Function dispatch
-----------------
Besides the above syntax, the patterns could also be used to register
functions for dispatching. I don't know how precisely it would
integrate with PEP 484, but I agree that any valid type hint should
also be a valid pattern.
Existing objects as patterns
----------------------------
* Types should match their instances
* Tuples should match sequences that have the same length and whose
  elements match against the elements of the tuple (subpatterns)
* Dictionaries should match mappings that contain all the keys in the
  pattern and whose values match the subpatterns under these keys
* Regular expressions should match strings. For named groups they
  should populate the local namespace.
* The ``Ellipsis`` object can be used as a match-all pattern.
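These rules can be sketched as an ordinary function. This is only an emulation under stated assumptions: it returns a dict of captures (or ``None`` for no match) instead of driving a statement block, and it uses ``re.Pattern`` (the modern name for the compiled-pattern type, Python 3.8+):

```python
import numbers
import re

def match(pattern, value):
    """Sketch of the proposed rules for existing objects as patterns.
    Returns a dict of captured names ({} for a plain match), or None."""
    if pattern is Ellipsis:                      # match-all pattern
        return {}
    if isinstance(pattern, type):                # types match their instances
        return {} if isinstance(value, pattern) else None
    if isinstance(pattern, tuple):               # tuples recurse element-wise
        if not isinstance(value, (tuple, list)) or len(value) != len(pattern):
            return None
        captures = {}
        for sub, item in zip(pattern, value):
            result = match(sub, item)
            if result is None:
                return None
            captures.update(result)
        return captures
    if isinstance(pattern, dict):                # dicts match by required keys
        if not isinstance(value, dict):
            return None
        captures = {}
        for key, sub in pattern.items():
            if key not in value:
                return None
            result = match(sub, value[key])
            if result is None:
                return None
            captures.update(result)
        return captures
    if isinstance(pattern, re.Pattern):          # regexes match strings,
        m = pattern.fullmatch(value) if isinstance(value, str) else None
        return m.groupdict() if m else None      # capturing named groups
    return {} if pattern == value else None      # fallback: plain equality

print(match((numbers.Number, numbers.Number), (1, 2)))      # {}
print(match(re.compile(r'(?P<x>\d+)-(?P<y>\d+)'), '3-4'))   # {'x': '3', 'y': '4'}
```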
Example
-------
The code below would work if the ``point`` variable is specified as:

* a tuple (x, y)
* a mapping {'x': x, 'y': y}
* a string "{x}-{y}".format(x=x, y=y)

::
    def print_point(x, y):
        print('x = {}'.format(x))
        print('y = {}'.format(y))

    for point:
        (Number, Number) as x, y:
            print_point(x, y)
        {'x': Number, 'y': Number}:
            print_point(x, y)
        re.compile('(?P<x>[0-9]+)-(?P<y>[0-9]+)'):
        # Or re.compile('([0-9]+)-([0-9]+)') as x, y:
            print_point(int(x), int(y))
        ...:
            raise TypeError('Expected something else')
Greetings
zefciu
So, tell me if I'm understanding you correctly: The advantage over a
basic if/elif/else tree is that it can assign stuff as well as giving
a true/false result? Because if all you want is the true/false,
there's not a lot of advantage over what currently exists.
The sequence and "as" clause syntax is reasonable, but I do not like
the idea that a mapping's keys would quietly become local name
bindings. It'd be like "from ... import *" inside a function -
suddenly _any_ name could become local, without any way for the
compiler to know. Also - although this is really just bikeshedding -
not a fan of the use of 'for' here. I'm imagining code something like
this:
    for value in gen(): # this one iterates
        for value: # this one doesn't
            int: yield ("(%d)" if value<0 else "%d") % value
            str: yield repr(value)
            datetime.datetime: yield value.strftime(timefmt)
            ...: yield str(value)
Come to think of it, not a fan of ellipsis here either; "else" seems
better. But that's minor bikeshedding too.
The key is to show how this is materially better than an if/elif tree
*and* better than a dictionary lookup. (Obviously isinstance checks
are superior to dict lookups based on type(value), but I don't know
how often people actually write code like the above.)
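For comparison, here is roughly what the same dispatch looks like as a plain if/elif tree today (a sketch; ``timefmt`` and the input values are assumed from the example above):

```python
import datetime

timefmt = '%H:%M'  # assumed; the thread's example doesn't define it

def render(values):
    # The if/elif equivalent of the hypothetical pattern block above.
    for value in values:
        if isinstance(value, int):
            yield ("(%d)" if value < 0 else "%d") % value
        elif isinstance(value, str):
            yield repr(value)
        elif isinstance(value, datetime.datetime):
            yield value.strftime(timefmt)
        else:
            yield str(value)

print(list(render([-3, 'hi', 7])))   # ['(-3)', "'hi'", '7']
```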
> Besides the above syntax the patterns could also be used to register
> functions for dispatching. I don't know how precisely it would
> integrate with PEP 484, but I agree that any valid type hint should
> also be a valid pattern.
>
> Existing objects as patterns
> ----------------------------
>
> * Types should match their instances
> * Tuples should match sequences that have the same length and whose
> elements match against the elements of a tuple (subpatterns)
> * Dictionaries should match mappings that contain all the keys in the
> pattern and the values match the subpatterns under these keys
> * Regular expressions should match strings. For named groups they
> should populate the local namespace.
> * The ``Ellipsis`` object can be used as a match-all pattern.
This is where the type hints could save you a lot of trouble. They
already define a sloppy form of isinstance checking, so you don't need
to do all the work of defining tuples, dicts, etc - all you have to
say is "typing.py type hint objects match any object which they would
accept", and let the details be handled elsewhere. That'd get you
Union types and such, as well as what you're talking about above.
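A rough sketch of that idea using today's ``typing`` introspection helpers (``get_origin``/``get_args``, added in Python 3.8, long after this thread); only ``Union`` and plain classes are handled here, while real PEP 484 semantics are much richer:

```python
import typing

def hint_matches(hint, value):
    """Sketch: treat a typing hint as a pattern, under the thread's
    assumption that a hint matches any object it would accept."""
    origin = typing.get_origin(hint)
    if origin is typing.Union:
        # A Union matches if any of its member hints matches.
        return any(hint_matches(arg, value) for arg in typing.get_args(hint))
    if isinstance(hint, type):
        return isinstance(value, hint)
    return False

print(hint_matches(typing.Union[int, str], 'spam'))   # True
print(hint_matches(typing.Union[int, str], 1.5))      # False
```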
> for point:
>     (Number, Number) as x, y:
>         print_point(x, y)
>     {'x': Number, 'y': Number}:
>         print_point(x, y)
>     re.compile('(?P<x>[0-9]+)-(?P<y>[0-9]+)'):
>     # Or re.compile('([0-9]+)-([0-9]+)') as x, y:
>         print_point(int(x), int(y))
>     ...:
>         raise TypeError('Expected something else')
I'm not sure how to rescue the "regex with named groups" concept, but
this code really doesn't look good. Using "as" wouldn't work reliably,
since dicts are iterable (without a useful order). Something along the
lines of:
    x, y from re.compile('(?P<x>[0-9]+)-(?P<y>[0-9]+)'):
or maybe switch the arguments (to be more like "from ... import x,
y")? But whatever it is, I'd prefer to be explicit: This has to return
a dictionary containing keys 'x' and 'y' (what if there are others?
Ignore them?), and then they will be bound to identical names.
All in all, an interesting idea, but one that's going to require a
good bit of getting-the-head-around, so it wants some strong use
cases. Got some real-world example code that would benefit from this?
ChrisA
http://stupidpythonideas.blogspot.com/2014/08/a-pattern-matching-case-statement-for.html
I wrote this up after getting frustrated with people suggesting Python should have "a case statement like every other language", where by "every other language" they meant C and Pascal and their descendants, with the relatively useless case that's just an elif chain, rather than ML and its descendants, with the more useful pattern-matching case statement. (Nick Coghlan pointed out that Go has an interesting middle ground: it looks like a JavaScript switch, but with an as clause to optionally bind the result of each expression to a local variable.)
Once you've defined the case statement, you can add pattern-matching assignments and function definitions (especially if you know Haskell, where the equivalents are just syntactic sugar for using case).
The big problem that you have in Python that you don't have in ML, Haskell, etc. is that most values don't have a unique, evaluable representation that can be pattern-matched. I think the best solution to this is a __match__ protocol, but I went over some other ideas.
The smaller problem is that in Python, only functions (and classes and modules) have scope; the idea of a name bound "locally" is tricky. There's a bit of hackery around except clauses, and comprehensions are expressions that get compiled into calls to hidden functions in part to deal with this, but I'm not sure either of those is something you'd want to extend here.
Anyway, I think that my straw man proposal is a start toward the best design you'd be able to come up with (although it's not in itself the best design, probably), and it still doesn't fit into Python. But read it and tell me if you disagree with either half of that.
"Works the same as __eq__" either doesn't allow you to assign parts while decomposing, or turns out to have a lot of problems. In my proposal, I restricted that to only work on values that are expressions that can't be read as assignment targets, and handled assignment lists recursively, to get around those problems. Yours just punts on decomposition here and offers something different (the mapping in the next paragraph), but I don't think that covers many common cases.
> The ``__pattern__`` method can return ``False`` for no match. For a
> match it can return ``True`` (simplest patterns), a sequence, or a mapping.
The mapping is an interesting end-run around the decomposition problem, but it seems like it throws away the type of the object. In other words, Point(x, y) and Vector(x, y) both pattern-match identically to {'x': x, 'y': y}. In ML or Haskell, distinguishing between the two is one of the key uses of pattern matching.
And another key use is distinguishing Point(0, y), Point(x, 0) and Point(x, y), which it looks like you also can't do this way; the only way to get x bound in the case block (assuming Point, like most classes, isn't iterable and therefore can't be decomposed as a tuple) is to match a dict.
> Syntax for pattern matching
> ---------------------------
>
> The syntax could look something like this::
>
> for object:
>     pattern1: statement1
>     pattern2:
>         statement2
>         statement3
>     pattern3 as var1, var2:
>         statement4
>         statement5
I don't like "for" here. Avoiding new keywords is good, but reusing a keyword to mean something unrelated is often confusing. With "for object in iterable:", the "object" names the thing that will be assigned each time through the loop. Here, it looks like you're omitting the "in iterable", but really you're omitting the "object in" part, which is half of one clause and half of another. Also, in cases where the matchable object/iterable is a long and complex expression (as it often is in both pattern matching and loops), it'll be very confusing to the eye if you misinterpret which kind of for you have.
On the other hand, I like the keyword-less match clauses better than my version with "of". The syntax may be harder to describe in the grammar, but it's easier to read for a human.
The only potential advantage I can see to the "of" clause is that it opens the door to let a single-match case be written without requiring a double indent, or even in a single line:
    case obj: of pattern:
        statement
        statement

    case obj: of pattern: statement
But I left that out of my proposal because it breaks the compound statement rule that (informally) you can't put two colons on the same line.
> For the simplest cases this works simply by calling appropriate
> statement block. If the ``__pattern__`` method returns a mapping, then
> the values from it will be merged into the local namespace. If it
> returns a sequence and there is the ``as`` clause, the values will be
> assigned to variables specified.
>
>> Quick idea: this might integrate well with PEP 484, or with the
>> (still speculative) multiple dispatch that Lukas Lang wants to
>> write.
>
> Function dispatch
> -----------------
>
> Besides the above syntax the patterns could also be used to register
> functions for dispatching. I don't know how precisely it would
> integrate with PEP 484, but I agree that any valid type hint should
> also be a valid pattern.
Consider the syntax from Matthew Rocklin's multipledispatch library on PyPI:
    @dispatch(int, int)
    def add(x, y): return x + y

    @dispatch(object, object)
    def add(x, y): return "%s + %s" % (x, y)
Lukas pointed out that you can put the types in the annotations rather than the decorator, and use MyPy syntax for them instead of having to invent a similar but not identical syntax as Matthew does.
Anyway, it should be easy to see how this maps to a single function with pattern matching and vice-versa. Especially once you look at how Haskell implements overloading--it's just syntactic sugar for a single definition with a case expression inside.
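For reference, a toy version of such a ``@dispatch`` decorator might look like the sketch below. This is a simplification of the idea, not the real multipledispatch library, which does MRO-aware ordering, caching, and ambiguity detection:

```python
# Registry mapping a function name to its (signature, implementation) pairs.
_registry = {}

def dispatch(*types):
    def register(func):
        _registry.setdefault(func.__name__, []).append((types, func))
        def wrapper(*args):
            # Crude specificity ordering: prefer signatures with fewer
            # bare-`object` slots, so (int, int) beats (object, object).
            candidates = sorted(
                _registry[func.__name__],
                key=lambda item: sum(t is object for t in item[0]))
            for sig, impl in candidates:
                if len(sig) == len(args) and all(
                        isinstance(a, t) for a, t in zip(args, sig)):
                    return impl(*args)
            raise TypeError('no matching signature')
        return wrapper
    return register

@dispatch(int, int)
def add(x, y): return x + y

@dispatch(object, object)
def add(x, y): return "%s + %s" % (x, y)

print(add(2, 3))        # 5
print(add('a', 'b'))    # a + b
```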
> Existing objects as patterns
> ----------------------------
>
> * Types should match their instances
This is an interesting idea. This allows you to do something halfway between taking any value and checking a specific value--and something that would often be useful in Python--which my proposal didn't have.
If we had a syntax for annotating values or variables (rather than just function params) with types, you could do the same thing by specifying "_: Number", which means you could also match "x: Number" to check the type and bind the value. That's something that was missing from my proposal (see the JSON schema example in the linked post on ADTs).
Without that, the only way I can think of to check the type but nothing else in my proposal is something like Breakfast(*) or something equally ugly, so that's a win for your syntax.
> * Tuples should match sequences that have the same length and whose
> elements match against the elements of a tuple (subpatterns)
My proposal just used the same rules as assignment to a target list (except that it recurses on pattern matching each element instead of assigning each element, of course). The upside is more flexibility, and the details have already been worked out and proven useful in Python assignments (and extended a couple of times over the past few decades in ways that work just as well here as they did there). The downside is that if you pattern match an iterator, each tuple decomposition case will eat elements from the iterator. But presumably anyone who understands iterators already knows that.
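The iterator caveat is easy to demonstrate with ordinary unpacking, which is the stand-in for tuple decomposition here:

```python
# A failed tuple-decomposition attempt can't "put back" the elements it
# consumed from an iterator; subsequent matches see a shorter stream.
it = iter([1, 2, 3, 4])
a, b = next(it), next(it)   # a two-element "decomposition attempt"
rest = list(it)             # the iterator has already been partly consumed
print(a, b, rest)           # 1 2 [3, 4]
```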
> * Dictionaries should match mappings that contain all the keys in the
> pattern and the values match the subpatterns under these keys
This definitely solves the problem of how to deal with objects that are usually constructed with keyword arguments. But it doesn't help for objects usually constructed with a combination of positional and keyword arguments. (Of course you can make __pattern__ return an OrderedDict; the problem is on the other end, how to write a pattern that matches arg #0, arg #1, and arg "spam" all in one pattern.)
Anyway, I still don't like the idea of neither getting nor matching the type here, or the idea of dumping all the names into locals instead of binding names that you ask for specifically, as I mentioned earlier. Also, consider how you'd map this relatively common ML/Haskell use:
    case p1 of Point(x1, y1):
        case p2 of Point(x2, y2):
            if x1 <= x < x2 and y1 <= y < y2:
With a syntax like mine, where you specify the names to be bound, this is easy; with yours, where the pattern just dumps its names into the namespace, you have to write something like:
    x0, y0 = x, y
    case p1 of {'x': Any, 'y': Any}:
        x1, y1 = x, y
        case p2 of {'x': Any, 'y': Any}:
            x2, y2 = x, y
            if x1 <= x0 < x2 and y1 <= y0 < y2:
> * Regular expressions should match strings. For named groups they
> should populate the local namespace.
Good idea. Anyone coming from another modern scripting language would expect that. Then again, they expect regexp support in a whole lot of places Python doesn't have them, and that may not be a bad thing. Code that's inherently about regexps always looks simpler in other languages than in Python--but code that really isn't about regexps often gets written with regexps anyway in those languages, but rethought into something more readable in Python...
And here, my alternative of specifying values to check and names to bind instead of just dumping everything into the local namespace doesn't seem to fit very well. (Without regexp literals as part of the language syntax instead of as strings, you can't cram an arbitrary target or expression inside the <P> block.)
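The explicit alternative for the regexp case, with the names bound by hand from ``groupdict()`` rather than dumped into the local namespace, might look like this sketch (``parse_point`` is a made-up helper for illustration):

```python
import re

pat = re.compile(r'(?P<x>[0-9]+)-(?P<y>[0-9]+)')

def parse_point(text):
    """Bind the named groups explicitly instead of injecting locals."""
    m = pat.fullmatch(text)
    if m is None:
        raise TypeError('Expected something else')
    groups = m.groupdict()           # e.g. {'x': '3', 'y': '4'}
    return int(groups['x']), int(groups['y'])

print(parse_point('3-4'))    # (3, 4)
```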
> * The ``Ellipsis`` object can be used as a match-all pattern.
I used the else keyword, but if you're going with keywordless match clauses, this makes sense.
> Example
> -------
>
> The code below would work if the ``point`` variable is specified as:
>
> * a tuple (x, y)
> * a mapping {'x': x, 'y': y}
> * a string "{x}-{y}".format(x=x, y=y)
But what if it's specified as an instance of a Point type, which is pretty much the paradigm case for pattern matching, and the only one that causes a problem?
Also, this raises an obvious problem with the regexp alternative: the first two cases match any Number, but the last case only matches nonnegative integers. And think about what it would mean to match any Number instance with a regexp. It's not just impossible, it doesn't even make sense, because values and their reprs aren't the same thing. If I import quaternion, its instances will match for the first two, but their reprs definitely won't match your regexp. This is one of those "now they have two problems" cases: a regexp looks like it will give you a simple solution to what you want, but it's just misleading you into thinking there's a simple solution that won't even pass your second test case.
Wouldn't it be better to be explicit about the type matching?
For example, using pypatt, I've done this:

    In [19]: import pypatt

    In [20]: Point = namedtuple('Point', 'x y')

    In [21]: @pypatt.transform
             def origin_distance(point):
                 with match((type(point), point)):
                     with (Point, (0, quote(y))):
                         return y
                     with (Point, (quote(x), 0)):
                         return x
                     with (Point, (quote(x), quote(y))):
                         return (x ** 2 + y ** 2) ** 0.5

    In [22]: origin_distance(Point(0, 5))
    Out[22]: 5

    In [23]: origin_distance(Point(10, 0))
    Out[23]: 10

    In [24]: origin_distance(Point(3, 4))
    Out[24]: 5.0

    In [25]: origin_distance(Point(0, 'far'))
    Out[25]: 'far'

    In [26]: origin_distance(Point('near', 0))
    Out[26]: 'near'
> Any object can be a pattern. If an object has a special method (called
> ``__pattern__`` or ``__match__``) defined, then this method will govern
> how this object behaves as a pattern. If not, pattern matching works
> the same as ``__eq__``.
> The ``__pattern__`` method can return ``False`` for no match. For a
> match it can return ``True`` (simplest patterns), a sequence or a mapping.
>
> Syntax for pattern matching
> ---------------------------
>
> The syntax could look something like this::
>
>     for object:
>         pattern1: statement1
>
>     for point:
>         (Number, Number) as x, y:
>             print_point(x, y)
>         {'x': Number, 'y': Number}:
>             print_point(x, y)
>         re.compile('(?P<x>[0-9]+)-(?P<y>[0-9]+)'):
>         # Or re.compile('([0-9]+)-([0-9]+)') as x, y:
>             print_point(int(x), int(y))
>         ...:
>             raise TypeError('Expected something else')
If you could make that work, you'd get the python version of Haskell
pattern matching looking like:

    re_pt = re.compile('(?P<x>[0-9]+)-(?P<y>[0-9]+)')
    case point of:
        (Number(x), Number(y)): ...
        ('x': Number(x), 'y': Number(y)): ...
        re_pt(x=x, y=y): ...

where you're at least mentioning the 'x' and 'y' variables that are
getting filled in.
Okay, probably horrible idea: matching syntax looks like:

    case point is:
        (x:=Number, y:=Number): ...

a:=x is the same as calling assigning_match(locals(), x, "a") run in the
local scope.
Being able to pull out and name something that's a deep part of an object seems like it could be useful; otherwise it's not adding much over if/elif.
Though maybe it'd be interesting to have a "deconstructor" protocol for
that instead, ie:

    Foo(a,b,key=c) = foo

is equivalent to something like:

    x = Foo.__deconstruct__(foo)
    a = x[0]
    b = x[1]
    c = x["key"]
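That hypothetical protocol can be made concrete in a small runnable sketch; the ``Foo`` class and the dict-of-positions return shape are illustrative assumptions, not part of any real protocol:

```python
class Foo:
    def __init__(self, a, b, key):
        self.a, self.b, self.key = a, b, key

    @classmethod
    def __deconstruct__(cls, obj):
        """Take an instance apart into positional and keyword parts,
        mirroring __init__'s parameters (an assumed convention)."""
        if not isinstance(obj, cls):
            raise TypeError('not deconstructable')
        return {0: obj.a, 1: obj.b, 'key': obj.key}

foo = Foo(1, 2, key=3)
x = Foo.__deconstruct__(foo)
a, b, c = x[0], x[1], x['key']
print(a, b, c)   # 1 2 3
```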
But it might be confusing figuring out whether:

    b = 0
    case foo is:
        Foo(a,b): print(a)

should result in value checking:

    if foo ~~ Foo(a,0): print(a)

or value assignment:

    del b
    if foo ~~ Foo(a,b): print(a)
Eh, I don't know. Maybe something there is interesting though.

Cheers,
aj
--
Anthony Towns <a...@erisian.com.au>
On Apr 7, 2015, at 22:12, Anthony Towns <a...@erisian.com.au> wrote:

>     case point is:
>         (x:=Number, y:=Number):
>             ...
>         {"x": x:=Number, "y": y:=Number}:
>             ...
>         x,y:=re_pt:
>             ...

This is pretty much my proposal with different syntax. I don't think the
:= buys you anything, and it doesn't make it obvious how to have a
pattern that recursively matches its parts. In my proposal, this would
look like:
> Though maybe it'd be interesting to have a "deconstructor" protocol for
> that instead, ie:
>
>     Foo(a,b,key=c) = foo
>
> is equivalent to something like:
>
>     x = Foo.__deconstruct__(foo)
>     a = x[0]
>     b = x[1]
>     c = x["key"]

That is almost exactly my version of __match__, except for two things.
I didn't think through keyword arguments. You've partly solved the problem, but only partly. For example, Foo(a, b, key=c) and Foo(a, sub=b, key=c) may be identical calls, but the deconstructor can only return one of them, and it may not even be the one that was used for construction. I think something like inspect's argspec objects may handle this, but I'd have to work it through.
On Apr 7, 2015, at 22:12, Anthony Towns <a...@erisian.com.au> wrote:

>>     case point is:
>>         (x:=Number, y:=Number):
>>             ...
>>         {"x": x:=Number, "y": y:=Number}:
>>             ...
>>         x,y:=re_pt:
>>             ...
>
> This is pretty much my proposal with different syntax. I don't think
> the := buys you anything, and it doesn't make it obvious how to have a
> pattern that recursively matches its parts. In my proposal, this would
> look like:

Hmm, I thought that was reasonably straightforward. You'd say things like:

    case rect_coords is:
        (pos := (left := Number, top := Number),
         size := (width := Number, height := Number)):
            ...

to pull out rect_coords == (pos, size), pos == (left, top),
size == (width, height).
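Today's closest analogue of that nested rect_coords pattern is nested target-list unpacking, which binds the inner and outer names in one statement but does no type checking; a quick illustration:

```python
# Nested unpacking pulls out both levels at once, just like the nested
# := bindings in the hypothetical case statement above.
rect_coords = ((3, 4), (10, 20))
(pos, size) = rect_coords
((left, top), (width, height)) = rect_coords
print(pos, size, left, top, width, height)   # (3, 4) (10, 20) 3 4 10 20
```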
> Though maybe it'd be interesting to have a "deconstructor" protocol for
> that instead, ie:
>
>     Foo(a,b,key=c) = foo
>
> is equivalent to something like:
>
>     x = Foo.__deconstruct__(foo)
>     a = x[0]
>     b = x[1]
>     c = x["key"]
>
> That is almost exactly my version of __match__, except for two things.
> I didn't think through keyword arguments. You've partly solved the
> problem, but only partly. For example, Foo(a, b, key=c) and Foo(a,
> sub=b, key=c) may be identical calls, but the deconstructor can only
> return one of them, and it may not even be the one that was used for
> construction. I think something like inspect's argspec objects may
> handle this, but I'd have to work it through.

I think you could deal with that okay:

    class type:
        def __deconstruct__(cls, obj):
            if not isinstance(obj, cls):
                raise NotDeconstructable
            result = obj.__dict__.copy()
            if callable(cls.__init__):
                argspec = inspect.getargspec(cls.__init__)
                for i, argname in enumerate(argspec.args[1:]):
                    result[i] = result.get(argname, None)
            return result

So if you have def __init__(a, b, key) and you just store them as
attributes, you can access them by name or by the same position as per
__init__. I guess you could deconstruct *args and **kwargs too, by
pulling out the unused values from result, though you'd probably need to
use an OrderedDict to keep track of things then.
Though that's how recursive unpacking of lists works anyway, isn't it? Match against [head, *tail]?
(I think that's an advantage of having the protocol be Foo.__deconstruct__(obj) rather than obj.__match__() -- I don't think you could handle regexps without having both params)
Hmm, as far as literal-checking versus binding goes, if you're using :=
or "as" to bind subexpressions, as in:

    case order is:
        customer, Breakfast(spam, eggs, beans) as breakfast: ...

or

    case order is:
        customer, breakfast:=Breakfast(spam, eggs, beans)

then I /think/ you could use that to distinguish between binding and
checking. So:

    customer = Customer("alice")

    name = "bob"
    case customer is:
        Customer(name):
            # succeeds, binds name as "alice"

    case customer is:
        Customer("bob"):
            # compares "alice" to "bob", fails
            # if it had succeeded, wouldn't have bound any variables

    name = "bob"
    case customer is:
        Customer(name as x):
            # compares "alice" to name ("bob") and fails
            # but would have bound x as "alice" if it had succeeded

That seems like a rule that's reasonably understandable to me?
("expr as name" has definitely grown on me over "name := expr")

I guess I'm thinking "literal" matching is just a special case of
deconstructing in general, and done by a method like:

    class object:
        def __deconstruct__(self, obj):
            if self == obj:
                return ()
            else:
                raise NotDeconstructable

That would have the potential side-effect of allowing things like:

    0 = n

as a valid statement (it'd raise NotDeconstructable if 0 != n). Not sure
that's much worse than:

    [] = n

though, which is already allowed. You could have a syntax error if there
wasn't anything to bind though.