Just found out about Elixir's function argument pattern matching...


Amith George

Sep 5, 2015, 4:24:19 AM
to Clojure
Hi,

I just read a blog post [1] talking about Elixir pattern matching. I was thoroughly impressed with the way it's handled in Elixir. I am posting this here because I got rather excited and wanted to discuss this with you all.

My experience with pattern matching is limited to the basics of F# and reading the docs of core.match. I think it's a great idea, but I also feel that unless it's supported by the language at a fundamental level, it remains syntactic sugar.

All the pattern matching code I had read previously involved matching on a specific argument, often inside a function. From what I see in the blog post, in Elixir it's taken one step further and pattern matching is done at the function declaration/invocation level. Normally one would create one outer public function and multiple private (?) functions to handle each branch. The outer function would only have the pattern matching code. At the very least, I find the Elixir version easier to read. It seems to be idiomatic for Elixir libraries and functions to return a tuple whose first (few?) elements are purely used for pattern matching.

Consider the following code samples from that blog post,

def to_registration_result({:ok, res}) do
  {:ok, %Membership.RegistrationResult{
    success: res["success"],
    message: res["message"],
    new_id: res["new_id"],
    validation_token: res["validation_token"],
    authentication_token: res["authentication_token"]
  }}
end

def to_registration_result({:error, err}) do
  {:error, err}
end

Normally, I would have one function with an if condition to check for an error value in the args and then call respective functions to handle each branch. I could also do

(defmulti to-registration-result first)

(defmethod to-registration-result :ok
  [[_ email password]]
  (println email password))

(defmethod to-registration-result :err
  [[_ err]]
  (println "err: " err))

But this wouldn't be idiomatic Clojure. It would also require all libraries and functions to return data in a certain form, one in which the first element is some kind of status.
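
For comparison, here is a minimal sketch of the same tag-based dispatch written with only vector destructuring and `case` from clojure.core. The function name `handle-result` and the message strings are assumptions made for illustration; they do not come from the blog post.

```clojure
;; Hypothetical example: branch on the tag held in the first element.
(defn handle-result
  "Takes a tagged vector such as [:ok value] or [:error reason]."
  [[tag payload]]
  (case tag
    :ok    (str "registered: " payload)
    :error (str "failed: " payload)
    ;; Fallback for tags this function does not know about.
    (str "unknown tag: " tag)))

(handle-result [:ok "a@example.com"])
(handle-result [:error "email taken"])
```

Unlike the multimethod, this form is closed: adding a new tag means editing the function rather than adding a `defmethod` elsewhere.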

He has another example where pattern matching is used with recursive functions to handle transition from one state/step to another and handle terminating conditions. This specific example is a very poor choice, but it does demonstrate the possibilities.

def map_single({:ok, res}) do
  cols = res.columns
  [first_row | _] = res.rows
  map_single {:cols_and_first, cols, first_row}
end

def map_single({:cols_and_first, cols, first_row}) do
  zipped = List.zip([cols,first_row])
  map_single {:zipped, zipped}
end

def map_single({:zipped, list}) do
  {:ok, Enum.into(list, %{})}
end

def map_single({:error, err}) do
  {:error, err}
end

Interesting stuff.

[1] - http://rob.conery.io/2015/09/04/using-recursion-in-elixir-to-break-your-oo-brain/

James Reeves

Sep 5, 2015, 4:35:02 AM
to clo...@googlegroups.com
You might want to take a look at defun: https://github.com/killme2008/defun

- James

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Amith George

Sep 5, 2015, 8:16:09 AM
to Clojure, ja...@booleanknot.com
Nice. Hadn't heard of it before. It looks interesting. The criterium benchmark is kinda disappointing though. The pattern matched function took nearly 15x the time of the normal function.

Performance aside, in Elixir, there seems to be an established convention for creating the function argument tuple. Every function can expect to be pattern matched against both :ok and :err. NodeJS callbacks also follow a convention, error first, which would make them trivial to pattern match against. Goodbye to all those mundane `if (err) {} else {}` checks. I can't quite think of any similar conventions in Clojure.
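
To make the idea of a status-first convention concrete, here is a minimal sketch of how one might adopt it locally in Clojure. The helper names `attempt` and `describe-outcome` are hypothetical, and nothing in clojure.core prescribes this convention.

```clojure
;; Hypothetical helper: run a zero-argument function and tag the outcome.
(defn attempt
  [f]
  (try
    [:ok (f)]
    (catch Exception e
      [:err (.getMessage e)])))

;; Callers branch on the tag instead of checking for an error value.
(defn describe-outcome
  [[tag value]]
  (case tag
    :ok  (str "got " value)
    :err (str "failed: " value)))

(describe-outcome (attempt #(/ 10 2)))   ; success branch
(describe-outcome (attempt #(/ 10 0)))   ; the ArithmeticException becomes [:err ...]
```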

Gary Verhaegen

Sep 5, 2015, 1:07:33 PM
to clo...@googlegroups.com, ja...@booleanknot.com
It won't really help for the library/ecosystem problem, but for your own code I'd recommend watching Jeanine Adkisson's Conj talk from last year:

Rob Lally

Sep 5, 2015, 4:34:17 PM
to clo...@googlegroups.com
Out of interest, I ran the benchmarks as is, and got more or less the same results - 15x. Then I tried upgrading the defun dependencies - clojure, core.match and tools.macro - all of which have newer versions, and then running the benchmarks without leiningen’s jvm-opts and in a trampolined repl. The results are better (see below). Still not great - but down from 15x to 10x. 

That said:

* I’m not sure I’d care: for most applications the overhead of function dispatch is probably not the bottleneck.
* Elixir and the BEAM VM are awesome at many things, but I suspect (from experience not evidence) that the defun version is still faster than the elixir version.


Rob

---

user=> (bench (accum-defn 10000))
WARNING: Final GC required 2.590098761776679 % of runtime
Evaluation count : 429360 in 60 samples of 7156 calls.
             Execution time mean : 139.664539 µs
    Execution time std-deviation : 4.701755 µs
   Execution time lower quantile : 134.451108 µs ( 2.5%)
   Execution time upper quantile : 150.214646 µs (97.5%)
                   Overhead used : 1.565276 ns

Found 5 outliers in 60 samples (8.3333 %)
        low-severe       5 (8.3333 %)
 Variance from outliers : 20.5880 % Variance is moderately inflated by outliers

user=> (bench (accum-defun 10000))
Evaluation count : 44940 in 60 samples of 749 calls.
             Execution time mean : 1.361631 ms
    Execution time std-deviation : 40.489537 µs
   Execution time lower quantile : 1.333474 ms ( 2.5%)
   Execution time upper quantile : 1.465123 ms (97.5%)
                   Overhead used : 1.565276 ns

Found 9 outliers in 60 samples (15.0000 %)
        low-severe       1 (1.6667 %)
        low-mild         8 (13.3333 %)
 Variance from outliers : 17.3434 % Variance is moderately inflated by outliers

---

Amith George

Sep 5, 2015, 6:37:39 PM
to Clojure, ja...@booleanknot.com
Thanks, it helps to know using a tagged vector is a real pattern :) Gives the confidence to explore this further for my own code. 

Amith George

Sep 5, 2015, 6:55:08 PM
to Clojure
>> * Elixir and the BEAM VM are awesome at many things, but I suspect (from experience not evidence) that the defun version is still faster than the elixir version.

In Clojure, the defun version is not the default or idiomatic way to write functions. I kind of expected it to be slower. Maybe if the core team felt it was good enough to be the default, they could make internal changes to the core to optimize stuff (pure speculation here). 

On the other hand, skimming through the Elixir documentation gives me the feeling that pattern matched functions are the default way to write code. As it is a first class language feature, I find it hard to believe it is slower (compared to what in Elixir?). Or do you mean the BEAM VM in general is slower than the JVM?

Timothy Baldridge

Sep 5, 2015, 9:31:40 PM
to clo...@googlegroups.com
>> Thanks, it helps to know using a tagged vector is a real pattern :) 

I don't know that it's a "real pattern". If I saw code like this in production I would probably raise quite a stink about it during code reviews. It's a cute hack, but it is also an abuse of a data structure. Now when I see [:foo 42] I don't know if I have a vector of data or a tagged value. It's a better idea IMO to use something like deftype or defrecord to communicate the type of something. I'd much rather see #foo.bar.Age{:val 42} than [:foo.bar/age 42]. At least then when I do (type val) I don't get clojure.lang.PersistentVector.

Timothy

--
“One of the main causes of the fall of the Roman Empire was that–lacking zero–they had no way to indicate successful termination of their C programs.”
(Robert Firth)

Amith George

Sep 5, 2015, 10:43:07 PM
to Clojure
In Elixir, tuples are used in which the first element is the tag. A similar thing can be done in Clojure using vectors. That much was clear. What bothered me and prompted me to start this thread was that I wasn't sure "what" it is I was doing by creating that vector. Was it purely a convention thing? What did the convention mean? It looked weird.

The "aha" moment for me came while watching her talk. The tagged vector is simply a way to represent one case of an open discriminated union. Discriminated unions are a common approach in a language like F#. A function could return a union type Result, with cases Success and Error. And it gels very nicely with pattern matching.

In that light, I can acknowledge tagged vectors as a real pattern. Creating records for all these cases wouldn't be worth it in my opinion. In fact, one would usually just return a map of values. The caller would branch depending on the presence of an `:err` key. Each branch would have access to the entire map and it's not evident which keys are branch specific. The pattern matching approach, on the other hand, makes explicit the values for each branch.
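
The contrast between the two return shapes can be sketched as follows. This is only an illustration: the function names and the `:user-id` key are assumptions, not part of any code discussed in the thread.

```clojure
;; Map style: every caller sees every key, whichever branch applies.
(defn summarize-map
  [{:keys [err user-id]}]
  (if err
    (str "error: " err)
    (str "created user " user-id)))

;; Tagged-vector style: each case carries only the values it needs.
(defn summarize-variant
  [result]
  (let [[tag & fields] result]
    (case tag
      :ok  (let [[user-id] fields] (str "created user " user-id))
      :err (let [[message] fields] (str "error: " message)))))

(summarize-map {:user-id 7})
(summarize-variant [:err "email taken"])
```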

Amith George

Sep 5, 2015, 11:49:59 PM
to Clojure
defun, core.match, tagged vectors - seems like I can emulate Elixir function pattern match behaviour. I took some simple code I found online (https://twitter.com/Xzilend/status/640282621042233344) and rewrote it to 1) use only tagged vectors (not quite) and 2) use defun and tagged vectors. I am eager to hear your thoughts on the rewrite.

Original
(defn- register
  [db mailer email password]
  (if (users/valid? email password)
    (if (users/taken? db email)
      (response/bad-request {:errors [{:message "That email is taken."}]})
      (do (users/do-create! db email password)
          (users/notify mailer email activate-subject activate-text)
          (response/ok (auth/authenticate email))))
    (response/bad-request {:errors [{:message errors/invalid-creds}]})))

Let's ignore the possible concurrency issues for now.

Rewrite without using core.match

(defn register
  ([db mailer email password]
   (register [:start {:db db :mailer mailer :email email :password password}]))
  ([status {:keys [db mailer email password] :as args}]
   (cond
     (= status :start)
     (if (users/valid? email password)
       (register [:check-unique args])
       (register [:err {:res "Invalid credentials"}]))

     (= status :check-unique)
     (if (users/taken? db email)
       (register [:create args])
       (register [:err {:res "User already exists"}]))

     (= status :create)
     (do (users/do-create! db email password)
         (users/notify mailer email activate-subject activate-text)
         (register [:ok {:res (auth/authenticate email)}]))

     (= status :ok) (response/ok (:res args))

     (= status :err) (response/bad-request {:errors [{:message (:res args)}]}))))


With no pattern matching I had to wrap the actual arguments into a map so that all recursive calls hit the two argument arity.

Using defun and tagged vectors,

(defn register [& args] (register' (into [:start] args)))

(defun- register'
  ([:start db mailer email password]
   (if (users/valid? email password)
     (register' [:check-unique db mailer email password])
     (register' [:err "Invalid credentials"])))
  ([:check-unique db mailer email password]
   (if (users/taken? db email)
     (register' [:create db mailer email password])
     (register' [:err "User already exists"])))
  ([:create db mailer email password]
   (do (users/do-create! db email password)
       (users/notify mailer email activate-subject activate-text)
       (register' [:ok (auth/authenticate email)])))
  ([:ok res] (response/ok (:res args)))
  ([:err msg] (response/bad-request {:errors [{:message (:res args)}]})))

I couldn't find a way to refer to the arguments vector from within the body of a case. I would have preferred writing the :start case like this,

  ([:start _ _ email password]
   (if (users/valid? email password)
     (register' (into [:check-unique] it))
     (register' [:err "Invalid credentials"])))

I find the defun version better compared to the plain if else version. Step transitions are explicit. Early exit is a transition to err state. The `response/bad-request` call occurs only once as opposed to at each early exit point.
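
The state-transition idea can also be sketched with nothing but clojure.core, without defun or core.match. The stubs below (`valid?`, `taken?`, `create!`, the in-memory `existing-emails` atom, and the eight-character password rule) are assumptions standing in for the `users/...` functions, which are not shown in this thread.

```clojure
;; Stubs standing in for users/valid?, users/taken? and users/do-create!.
(def existing-emails (atom #{"taken@example.com"}))

(defn valid? [email password]
  (and (string? email) (>= (count password) 8)))

(defn taken? [email] (contains? @existing-emails email))

(defn create! [email] (swap! existing-emails conj email))

(defn step
  "Takes one state (a tagged vector) and returns the next state."
  [[tag email password :as state]]
  (case tag
    :start        (if (valid? email password)
                    [:check-unique email password]
                    [:err "Invalid credentials"])
    :check-unique (if (taken? email)
                    [:err "User already exists"]
                    [:create email password])
    :create       (do (create! email)
                      [:ok email])
    ;; :ok and :err are terminal, so they map to themselves.
    state))

(defn register
  "Always starts at :start and steps until a terminal state is reached."
  [email password]
  (loop [state [:start email password]]
    (if (#{:ok :err} (first state))
      state
      (recur (step state)))))
```

Here `register` is the only entry point and it always begins at `:start`, so a caller cannot supply an arbitrary intermediate state.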

James Reeves

Sep 6, 2015, 3:59:05 AM
to clo...@googlegroups.com
On 6 September 2015 at 02:31, Timothy Baldridge <tbald...@gmail.com> wrote:
>> Thanks, it helps to know using a tagged vector is a real pattern :) 

>> I don't know that it's a "real pattern". If I saw code like this in production I would probably raise quite a stink about it during code reviews. It's a cute hack, but it is also an abuse of a data structure. Now when I see [:foo 42] I don't know if I have a vector of data or a tagged value. It's a better idea IMO to use something like deftype or defrecord to communicate the type of something. I'd much rather see #foo.bar.Age{:val 42} than [:foo.bar/age 42]. At least then when I do (type val) I don't get clojure.lang.PersistentVector.

I'll have to disagree with you here. To my mind, tagged literals don't quite have the same purpose as variants do.

For example, consider a map:

{:foreground #color/rgb "ff0000"
 :background #color/rgb "ffffff"}

The hex strings are given a tag to indicate the type of data they contain, while the keys in the map tell us the purpose of that data.

We clearly wouldn't write something like:

#{#foreground {:val #color/rgb "ff0000"}
  #background {:val #color/rgb "ffffff"}}

So given that, if we want to represent a single key/value pair, the most natural way to do it would be:

[:foreground #color/rgb "ff0000"]

Variants fulfil the same purpose as key/value pairs in a map. The key denotes a context-sensitive purpose for the data, rather than its type.
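
The relationship between a two-element variant and a map entry can be checked at the REPL. In this small sketch, plain strings replace the `#color/rgb` tagged values, because that reader tag would have to be registered before it could be read; the var names are made up for the example.

```clojure
;; A map entry already behaves like a two-element vector.
(def settings {:foreground "ff0000"})
(def entry (first settings))

(vector? entry)                  ; true: map entries implement the vector interface
(= entry [:foreground "ff0000"]) ; true

;; Going the other way, key/value vectors fold into a map.
(def merged (into {} [[:foreground "ff0000"] [:background "ffffff"]]))
```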

- James

Amith George

Sep 6, 2015, 4:57:50 AM
to Clojure, ja...@booleanknot.com
TIL that "tagged literals" have an existing meaning in Clojure. In my mind, the terms "tagged vector" and "tagged literal" were interchangeable. From a quick Google search there doesn't seem to be an existing meaning for "tagged vector". I think we can agree that it is a representation of variants in Clojure.


    So given that, if we want to represent a single key/value pair, the most natural way to do it would be:

    [:foreground #color/rgb "ff0000"]

    Variants fulfil the same purpose as key/value pairs in a map. The key denotes a context-sensitive purpose for the data, rather than its type.


Could you elaborate on what you mean by variants being like a key value pair? Also, for the purposes of discussing variants, we can ignore reader tags right? I understand they serve a different purpose. The example in her talk had the map

{:type :pickup :store-id 123
 :address nil
 :email nil}

and the equivalent variant was `[:pickup 123]`

If the map was instead

{:pickup {:store-id 123}
 :digital nil
 :delivery nil}

then yes, a variant could be thought of as a key value pair. Is this what you meant?

Timothy Baldridge

Sep 6, 2015, 9:41:50 AM
to clo...@googlegroups.com
>> "Variants fulfil the same purpose as key/value pairs in a map. The key denotes a context-sensitive purpose for the data, rather than its type."

Then use a key/value type. That's my problem with this approach, it abuses a collection type and therefore creates confusion as to the type of data it contains. At least create something like this:

(deftype Variant [name value])

Or if you don't want to be bothered, use clojure.lang.MapEntry. 

That way I won't accidentally use the variant with conj, concat, count, pop, push, or the dozens of other vector functions that don't apply to variants at all. You also can't extend protocols to them without also applying those protocols to vectors of all sizes. 

In addition a variant deftype (or record) will only need 1 allocation each time you create it. Doing [:foo :bar] requires two allocations: one for the vector, and one for the tail array. In addition it imposes another pointer deref on every access as you have to jump from the type to the tail array to the value. 

All in all, I see very little reason to use vectors as variants instead of a custom type, it complicates the code, and makes it harder to understand. Vectors aren't always vectors...they could also be variants?
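
A minimal sketch of the custom-type approach described above, using a record rather than a bare `deftype` so that keyword access works. The protocol `Describable` and its method `describe` are hypothetical names chosen for the example.

```clojure
(defrecord Variant [tag value])

(defprotocol Describable
  (describe [this]))

;; The protocol is extended to Variant only; plain vectors are unaffected.
(extend-protocol Describable
  Variant
  (describe [v] (str (name (:tag v)) " = " (:value v))))

(def age (->Variant :age 42))

(describe age)                       ; "age = 42"
(instance? Variant age)              ; true
(satisfies? Describable [:age 42])   ; false
```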

Timothy


James Reeves

Sep 6, 2015, 10:15:33 AM
to clo...@googlegroups.com
On 6 September 2015 at 14:41, Timothy Baldridge <tbald...@gmail.com> wrote:
>> "Variants fulfil the same purpose as key/value pairs in a map. The key denotes a context-sensitive purpose for the data, rather than its type."

>> Then use a key/value type. That's my problem with this approach, it abuses a collection type and therefore creates confusion as to the type of data it contains.

I don't see why it "abuses a collection type". Vectors in Clojure are not inherently homogenous, any more than maps are. If it's valid to use a bare map to hold data, then it's equally valid to use a bare vector.

If you're going to argue that variants need to be wrapped in a type, then surely that must apply to maps as well. Yet bare maps are used all the time in Clojure. We don't need to enforce the shape of data with deftype and defrecord; if necessary, we can do that with preconditions, runtime schema, or static types.
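
As one illustration of enforcing shape without a custom type, a function can check a variant with a precondition. This is a sketch only; the function name `age-value` and the `[:age n]` shape are assumptions made for the example.

```clojure
(defn age-value
  "Returns the number carried by an [:age n] variant."
  [variant]
  {:pre [(vector? variant)
         (= 2 (count variant))
         (= :age (first variant))
         (number? (second variant))]}
  (second variant))

(age-value [:age 42])
;; (age-value [:name "tim"]) would throw an AssertionError.
```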

>> That way I won't accidentally use the variant with conj, concat, count, pop, push, or the dozens of other vector functions that don't apply to variants at all.

You have the same problem with records; arbitrary keys can be added to the structure.
 
>> You also can't extend protocols to them without also applying those protocols to vectors of all sizes.

But if you don't need protocols, then there's no problem.
 
>> In addition a variant deftype (or record) will only need 1 allocation each time you create it. Doing [:foo :bar] requires two allocations: one for the vector, and one for the tail array. In addition it imposes another pointer deref on every access as you have to jump from the type to the tail array to the value.

If that's your performance bottleneck, then optimise it with a deftype by all means, but I imagine that in most cases that's not going to make a significant difference.
 
>> All in all, I see very little reason to use vectors as variants instead of a custom type, it complicates the code, and makes it harder to understand. Vectors aren't always vectors...they could also be variants?

Variants are just one use of vectors. Vectors are not homogenous collections. It's perfectly valid to store a particular data value at a particular index. For example, a coordinate [x y] is a perfectly valid use for a vector in my book.

I'm not sure why you think that it "complicates the code, and makes it harder to understand". All the examples I've seen of variants have been exactly the opposite.

- James

Timothy Baldridge

Sep 6, 2015, 10:39:00 AM
to clo...@googlegroups.com
>> I'm not sure why you think that it "complicates the code, and makes it harder to understand".

Alright, I'll use the phrase: using vectors as variants complects order with data. If you hand me a vector that says [:name "tim"] I have to know that "the first thing in this vector is the name of the thing, the second thing is the value of the thing". In addition, I have to know that you passed me a variant. But if you simply passed me either #my.app.Name["tim"] or #my.app.Variant{:key :name :value "tim"} I can programmatically deconstruct that value into something usable by my system. 

As much as possible I try to build my apps in such a way that the program can self-explore the data you give it to self-optimize, self-extend, or otherwise provide flexibility to the programmer. You can't do that with a variant as a vector. Hand a vector to a function and the logic has to be something like "If this is a two element vector and the first thing is a keyword, then this might be a variant, but I'm not sure because it could also just be a vector". While if you pass in a record, type, or even a variant type, it's trivial to have a program recognize that value and act upon it. 
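
The ambiguity described here can be shown in a few lines. The predicate `maybe-variant?` is a hypothetical heuristic written for this sketch, and `Variant` is a stand-in record type.

```clojure
(defrecord Variant [tag value])

(defn maybe-variant?
  "Heuristic: a vector whose first element is a keyword."
  [x]
  (and (vector? x) (keyword? (first x))))

(maybe-variant? [:name "tim"])               ; true, intended as a variant
(maybe-variant? [:red :green])               ; true, although this is just a vector of keywords
(instance? Variant (->Variant :name "tim"))  ; true, with no guessing
(instance? Variant [:name "tim"])            ; false
```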

As far as performance goes, this is normally the sort of thing that gets baked into an app at a pretty low level, that's why I suggest it should be as fast as possible. If you're going to be processing millions of these things during the life of an app, and transforming them from one format to another, a little tweaking here and there may save you some pain in the end. 


Timothy


Amith George

Sep 6, 2015, 11:37:34 PM
to Clojure
In a possible messaging system, a user could denote a recipient using one of three ways - 1) select an existing user id, 2) enter a new name and email, 3) pick a placeholder "all my team mates".

 
Possible F# (might not compile!!)

type Recipient =
    | Placeholder of string
    | Existing of ContactId
    | OneoffRecipient of string * string


Possible options in Clojure

{:type :placeholder ; or :existing :one-off
 :placeholder-text :all-my-team-mates
 :name nil
 :email nil
 :contact-id nil
}

;; or

(defrecord PlaceholderRecipient [placeholder-text])
(defrecord ExistingRecipient [contact-id])
(defrecord OneoffRecipient [name email])

;; or

[:one-off name email]
[:existing contact-id]
[:placeholder text]
;; [:self]

The variant need not be a two element vector. It is a vector of one or more elements. We could just as easily have a fourth variant [:self] with no data to indicate send message to self.

To actually use the above data structures and get the actual email addresses from them, we need at least three methods - 1) fetch the emails of all the user's teammates, 2) fetch the email for a contact in the datastore, 3) return the email as is. We could implement these using a multimethod, implement some protocol, or use a pattern matched method. Each choice has a different extensibility story and all are equally valid. We don't always replace every hashmap with a record. Variants are a viable alternative to using a map like the one shown in the first Clojure choice.
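
Of the three options, the multimethod one might look roughly like this. The `contacts` map and `team-mates` vector are made-up stub data standing in for a datastore, and `recipient-emails` is a hypothetical name.

```clojure
;; Stub data standing in for a datastore; all values are made up.
(def contacts {42 "carol@example.com"})
(def team-mates ["alice@example.com" "bob@example.com"])

(defmulti recipient-emails
  "Returns a vector of email addresses for a recipient variant."
  first)

(defmethod recipient-emails :placeholder [[_ _text]] team-mates)
(defmethod recipient-emails :existing [[_ contact-id]] [(contacts contact-id)])
(defmethod recipient-emails :one-off [[_ _name email]] [email])

(recipient-emails [:one-off "Dave" "dave@example.com"])
(recipient-emails [:existing 42])
```

A `[:self]` variant could be supported later by adding one more `defmethod`, without touching the existing ones.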

>> As much as possible I try to build my apps in such a way that the program can self-explore the data you give it to self-optimize, self-extend, or otherwise provide flexibility to the programmer. You can't do that with a variant as a vector. Hand a vector to a function and the logic has to be something like "If this is a two element vector and the first thing is a keyword, then this might be a variant, but I'm not sure because it could also just be a vector". While if you pass in a record, type, or even a variant type, it's trivial to have a program recognize that value and act upon it.

I have no experience writing apps that had to operate on any arbitrary data with any arbitrary structure. Pretty much all the functions in my apps could rely on data arriving in a particular format in a particular order (positional destructuring).

More importantly we are pretty much agreeing that [:tagged vectors with some data] are only a representation of a variant. One chosen out of convenience and existing feature set. We could have a defrecord Variant with two fields, a type keyword and value hashmap. If core.match was able to easily pattern match over a record as well as easily destructure the values in the value hashmap, we would use it. From what I have seen of core.match, it is nowhere near as easy/clean as pattern matching a vector.


>> As far as performance goes, this is normally the sort of thing that gets baked into an app at a pretty low level, that's why I suggest it should be as fast as possible. If you're going to be processing millions of these things during the life of an app, and transforming them from one format to another, a little tweaking here and there may save you some pain in the end.

Pattern matching itself is too slow for it to be viable in any performance sensitive context. That said, I feel optimize first for readability/maintainability, if profiling shows a bottleneck, then and only then optimize that area for performance.

Thomas Heller

Sep 7, 2015, 7:43:53 AM
to Clojure
FWIW before I came to Clojure I did a lot of Erlang and in the beginning I was at the exact same spot wanting to use pattern matching everywhere because it is so damn cool. Same goes for tagged literals.

After a little while I realized that it is just not the way to do it in Clojure and forcing the Erlang(or Elixir)-Way onto Clojure is not ideal and probably overthinking it. Protocols provide a good/better solution to some problems and using Clojure's excellent handling&support of "data" solves the rest. While it might feel weird in the beginning, wait a while for Clojure to click and you probably won't even miss pattern matching.

Looking at the "(defn register [...])" example. Where is the problem with the first solution? It doesn't have the bugs the other implementations have and is extremely simple to reason about. The other two solutions do the exact same thing just slower with absolutely no gain. If you need the "status" abstraction use a real state machine. Don't write a recursive function when you don't have to. The code should never be at a point where it can be called with forged data and directly skip over the :create and :check-unique states, which is possible in 2 of the 3 solutions.

Embrace the data and data all the things is all I have to say.

Also if you like types (using prismatic/schema [1], core.typed looks pretty similar):

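(require '[schema.core :as s])

(def ContactId s/Int)

(s/defrecord Placeholder
    [text :- s/Str])

(s/defrecord Existing
    [contact-id :- ContactId])

(s/defrecord OneOff
    [name :- s/Str
     email :- s/Str])

;; The record is named Placeholder above, so refer to it with the same casing.
(def Recipient
  (s/either Placeholder
            Existing
            OneOff))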

Just my 2 cents,
/thomas

James Reeves

Sep 7, 2015, 9:00:13 AM
to clo...@googlegroups.com
On 6 September 2015 at 15:38, Timothy Baldridge <tbald...@gmail.com> wrote:
>> I'm not sure why you think that it "complicates the code, and makes it harder to understand".

>> Alright, I'll use the phrase: using vectors as variants complects order with data. If you hand me a vector that says [:name "tim"] I have to know that "the first thing in this vector is the name of the thing, the second thing is the value of the thing". In addition, I have to know that you passed me a variant. But if you simply passed me either #my.app.Name["tim"] or #my.app.Variant{:key :name :value "tim"} I can programmatically deconstruct that value into something usable by my system.

How? What can your program reasonably infer from that? If you remove the semantic information from your examples (which is only meaningful to humans), you get:

#xxx.yyy["zzz"]
#xxx.yyy{:aaa :www, :bbb "zzz"}

What does that allow us to programmatically infer? That the type is "xxx.yyy" and that it may contain strings and keywords, but that's about it. What good is that?

If the type satisfies known protocols, then you have more information, but if you don't use polymorphism the only benefit is that you know the type, which is almost meaningless by itself.

In fact, one could argue that it hampers programmatic analysis if you're using maps. For instance, if you have a variant to set an expiry date:

[:cache/expires #inst "2016-01-01T00:00:00.000Z"]

Then you can fold this into a map:

{:cache/resource #url "http://example.com/foo"
 :cache/expires #inst "2016-01-01T00:00:00.000Z"}

If you use a custom type instead of a variant:

#cache/Expires [#inst "2016-01-01T00:00:00.000Z"]

Then the software can no longer infer that this is related to the key on the map.

I also disagree that variants "complect order with data". By that logic, a coordinate of:

[4 3]

Should be better expressed as:

#coord2d {:x 4, :y 3}

And a function like:

(swap! a inc)

Should be expressed as:

(swap! :atom a, :function inc)

Keywords have their place, but I don't think using positional indexes to look up data is necessarily bad, assuming that the vector or list is very small.

It's worth noting that Datomic uses vectors to represent transactions, rather than maps or records, so presumably Rich and the other folks at Cognitect are not above using positional indexing in certain cases.
 
>> As far as performance goes, this is normally the sort of thing that gets baked into an app at a pretty low level, that's why I suggest it should be as fast as possible. If you're going to be processing millions of these things during the life of an app, and transforming them from one format to another, a little tweaking here and there may save you some pain in the end.

I was curious as to whether records really were faster than a vector lookup. It turns out that vectors are faster:

=> (defrecord Foo [x y])
user.Foo

=> (let [f (->Foo 1 2)] (quick-bench (:x f)))
Execution time mean : 6.592069 ns

=> (let [f [1 2]] (quick-bench (f 0)))
Execution time mean : 4.758705 ns

=> (let [f [1 2]] (quick-bench (let [[x y] f] (+ x y))))
Execution time mean : 19.388727 ns

=> (let [f (->Foo 1 2)] (quick-bench (let [{:keys [x y]} f] (+ x y))))
Execution time mean : 68.845332 ns

That said, a core.match is still going to be slower than a protocol lookup, so depending on your use-case using a record might still be quicker, but in terms of actually pulling information out, vectors are quicker.

- James

James Reeves

Sep 7, 2015, 9:10:56 AM
to Amith George, Clojure
On 6 September 2015 at 09:57, Amith George <strid...@gmail.com> wrote:
>> Could you elaborate on what you mean by variants being like a key value pair?

I mean that tags are usually used to describe what the data is, whereas keys are usually used to describe what the data is for. For instance, one might have a map like:

{:created-date #inst "2015-09-07T12:30:00.000Z"
 :modified-date #inst "2015-09-07T13:00:00.000Z"}

The tags describe what the values are: instances in time. The keys describe what the data is for: to record when the entity was created and modified.

In the same way, I'd consider variants another way of connecting a data value with an indicator of its purpose.

- James

James Reeves

Sep 7, 2015, 9:32:19 AM
to clo...@googlegroups.com
On 7 September 2015 at 13:59, James Reeves <ja...@booleanknot.com> wrote:
>> On 6 September 2015 at 15:38, Timothy Baldridge <tbald...@gmail.com> wrote:
>> >> As far as performance goes, this is normally the sort of thing that gets baked into an app at a pretty low level, that's why I suggest it should be as fast as possible. If you're going to be processing millions of these things during the life of an app, and transforming them from one format to another, a little tweaking here and there may save you some pain in the end.
>>
>> I was curious as to whether records really were faster than a vector lookup. It turns out that vectors are faster:

I realised after I sent my email that you were earlier referring to creation rather than lookup. In such a case, it is slightly quicker to use a record:

=> (let [x 1 y 2] (quick-bench (->Foo x y)))
Execution time mean : 9.578002 ns
=> (let [x 1 y 2] (quick-bench [x y]))
Execution time mean : 12.213409 ns

So in terms of performance, it matters whether you're primarily reading or primarily writing. Vectors are faster for lookups, records are faster for creation. That said, if performance is important enough that even a couple of nanoseconds per iteration matter, then perhaps an array should be used instead.
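
As a hedged illustration of the array option, a pair of longs could be held in a primitive array and read with aget. The function name sum-pair is made up for this sketch, and no benchmark numbers are implied:

```clojure
;; A primitive long array avoids boxing; the ^longs hint lets the
;; compiler emit direct array access instead of reflection.
(defn sum-pair [^longs pair]
  (+ (aget pair 0) (aget pair 1)))

(sum-pair (long-array [1 2]))
```

The trade-off is that arrays are mutable and carry no names for their slots, so this only makes sense in a tight inner loop.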

- James

Amith George

Sep 7, 2015, 10:44:41 AM
to Clojure
>> Looking at the "(defn register [...])" example. Where is the problem with the first solution? It doesn't have the bugs the other implementations have and is extremely simple to reason about. The other two solutions do the exact same thing just slower with absolutely no gain. If you need the "status" abstraction use a real state machine. Don't write a recursive function when you don't have to. The code should never be at a point where it can be called with forged data and directly skip over the :create and :check-unique states, which is possible in 2 of the 3 solutions.

It bothered me as well that register could skip states. I couldn't figure out how to deal with that and resorted to making the function private :( . I was originally thinking of the sum of all the steps as the variant/sum-type, with each step being one of the cases of the variant. However, as you said, a state machine might be a better representation. Maybe this was a bad use case for variants. The output of the state machine steps could be a variant over :ok, :err. Either way, let's ignore this usage.
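
One way to picture "steps whose output is a variant over :ok, :err" is a fixed pipeline that callers cannot enter halfway. This is only a sketch: check-unique, create and run-steps are hypothetical stand-ins, not the functions from the earlier register example:

```clojure
;; Each step takes a value and returns [:ok value] or [:err message].
(defn check-unique [user]
  (if (= "taken@example.com" (:email user))
    [:err "email already registered"]
    [:ok user]))

(defn create [user]
  [:ok (assoc user :id 1)])

;; Runs the steps in order and stops at the first :err result.
(defn run-steps [steps input]
  (reduce (fn [[_ value] step]
            (let [[tag :as result] (step value)]
              (if (= tag :err)
                (reduced result)
                result)))
          [:ok input]
          steps))

;; The order of the steps is fixed here, so no state can be skipped.
(defn register [user]
  (run-steps [check-unique create] user))
```

Because the list of steps lives inside register, forged input cannot jump straight to create.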


(def Recipient
  (s/either PlaceHolder
            Existing
            OneOff))

This looks interesting. Where would I actually use this? I mean, if I have created three records, I may as well implement multi methods or protocols, right? Even if I don't do those, I will still need to use `(condp instance? obj ...)` or equivalent to select the appropriate branch for processing. Is there a way I can use Recipient to select a branch?

I am not proposing the variant vector to be used in place of records. I am looking at it as an improvement over using a hashmap containing a :type key. For some reason I am wary of records. The variant vector seemed like a good intermediate solution.

On Monday, 7 September 2015 17:13:53 UTC+5:30, Thomas Heller wrote:


Timothy Baldridge

Sep 7, 2015, 10:49:21 AM
to clo...@googlegroups.com

>> Should be expressed as:
>> (swap! :atom a, :function inc)

One of Rich's talks on simplicity actually addresses that. He states that the above method (with keyword arguments) is actually simpler, but that this isn't exactly easy to program. 

And yes, I would take this same position about positional vectors being a type instead of a list or persistent vector. Same thing with Ratios: we could represent them as [numerator denominator], but we don't, as it's simpler to give them their own type, and then I can easily determine if a [11 22] is a ratio, a 2D point or even a variant.
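
A small sketch of that point, assuming a hypothetical Point2D record: once the values have their own types, telling them apart is a simple predicate check, whereas the bare vector says nothing about what it means.

```clojure
(defrecord Point2D [x y])

(def as-vector [11 22])              ; could be anything
(def as-ratio  11/22)                ; read as a clojure.lang.Ratio
(def as-point  (->Point2D 11 22))    ; explicitly a 2D point

;; Classifies a value by its type; a bare vector stays ambiguous.
(defn describe-value [v]
  (cond
    (ratio? v)            :ratio
    (instance? Point2D v) :point
    (vector? v)           :unknown-vector))
```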

Types are good even in dynamic languages; we should use them more. As mentioned by Thomas, records are the idiomatic way to store typed information in Clojure. They're simple, elegant, and interop very cleanly with the rest of the system. So that's why I react when someone takes a conference talk as an example of "the way it should be done". The above video is an exploratory explanation of a different way of thinking of data, but I have never seen code like that in production, and it's so different from idiomatic Clojure that I would be hesitant to adopt it wholesale.

Timothy

Thomas Heller

Sep 7, 2015, 11:27:46 AM
to Clojure

(def Recipient
  (s/either PlaceHolder
            Existing
            OneOff))

>> This looks interesting. Where would I actually use this? I mean, if I have created three records, I may as well implement multi methods or protocols, right? Even if I don't do those, I will still need to use `(condp instance? obj ...)` or equivalent to select the appropriate branch for processing. Is there a way I can use Recipient to select a branch?

I probably wouldn't use protocols since I doubt there is a function signature that is exactly identical for all branches. Each branch probably needs access to different parts of your system (eg. database) and always passing everything to everything is not ideal.

Multi-Method is great if you want something openly extensible but that again seems unlikely here and also assumes that everything requires the same arguments.

cond(p) sounds perfect for this case. I tend to write each branch as a single function and keep the dispatch as compact as possible.

(defn send-placeholder [thing {:keys [text]}]
  ...)

(defn send-existing [db thing {:keys [contact-id]}]
  ...)

(defn send-one-off [something thing {:keys [name email]}]
  ...)

(defn send [app thing recipient]
  (condp instance? recipient
    PlaceHolder
    (send-placeholder thing recipient)
    Existing
    (send-existing (:db app) thing recipient)
    OneOff
    (send-one-off (:something app) thing recipient)))


That greatly reduces the cognitive load when looking at each separate implementation and also keeps the actual internal structure of the Recipient out of the dispatch. The condp does not need to know how many fields are in OneOff; the tuple/vector/variant match versions must know that.
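
For anyone who wants to try this at the REPL, here is a self-contained variation of the dispatch above. The record definitions, the stub bodies, the :mailer key and the name send-message (chosen to avoid shadowing clojure.core/send) are all assumptions made for this sketch:

```clojure
(defrecord PlaceHolder [text])
(defrecord Existing [contact-id])
(defrecord OneOff [name email])

;; Stub implementations that just report what they were given.
(defn send-placeholder [thing {:keys [text]}]
  [:placeholder thing text])

(defn send-existing [db thing {:keys [contact-id]}]
  [:existing db thing contact-id])

(defn send-one-off [mailer thing {:keys [name email]}]
  [:one-off mailer thing name email])

;; The dispatch only looks at the record type, never at its fields.
(defn send-message [app thing recipient]
  (condp instance? recipient
    PlaceHolder (send-placeholder thing recipient)
    Existing    (send-existing (:db app) thing recipient)
    OneOff      (send-one-off (:mailer app) thing recipient)
    (throw (ex-info "Unknown recipient" {:recipient recipient}))))
```

Adding a field to OneOff later only touches send-one-off; the condp stays as it is.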

/thomas






James Reeves

Sep 7, 2015, 2:42:02 PM
to clo...@googlegroups.com
On 7 September 2015 at 15:49, Timothy Baldridge <tbald...@gmail.com> wrote:
>> Types are good even in dynamic languages; we should use them more. As mentioned by Thomas, records are the idiomatic way to store typed information in Clojure.

I don't think that's true. Or rather, I think it depends on what you mean by "type".

In core.typed parlance, a "type" is a means of validating the shape of data at compile time. We can use a vector to represent a variant, and still have it typed:

(defalias FooResult
  (U '[(Value :ok) Int]
     '[(Value :err) Str]))

Dynamic typing libraries like Schema or Annotate have a similar notion of types, but perform their checks at runtime. In both cases, the type is independent of the data; we can determine whether a data structure matches a specific type, but the type isn't tied to the data.

In contrast, we have a notion of what core.typed and edn call a "tag", which is attached directly to the data structure. A record is essentially a tagged map, and if we didn't have to worry about performance, tags would probably just be implemented as metadata.

If we're talking about records, then we're talking about tags more than types, and the question becomes: what data should we tag?

If we want runtime polymorphism, then it makes sense to use a tag so we have something to dispatch off. But is there any other practical reason we'd want to use a tagged structure in Clojure?

I'm not convinced there is. Records are useful for polymorphism, and perhaps performance if that level of optimisation is necessary, but I don't see any other benefit.
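
The "tags as metadata" idea can be tried directly, since clojure.core/type returns the :type metadata of a value when it is present. The names expiry and describe are illustrative only:

```clojure
;; A plain map carrying its tag as :type metadata.
(def expiry
  (with-meta {:value #inst "2015-09-08T12:00:00.000Z"}
             {:type :cache/expire}))

;; `type` returns the :type metadata if there is any, else the class,
;; so a multimethod can dispatch on the tag without a record.
(defmulti describe type)

(defmethod describe :cache/expire [m]
  [:expires-at (:value m)])
```

Metadata does not take part in equality, so the tagged map is still equal to the untagged one.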
 
>> The above video is an exploratory explanation of a different way of thinking of data, but I have never seen code like that in production, and it's so different from idiomatic Clojure that I would be hesitant to adopt it wholesale.

A lot of people use Instaparse and Hiccup in production, which I believe the video explicitly mentions as being examples of variants.

- James

Amith George

Sep 7, 2015, 10:42:38 PM
to Clojure
>> I probably wouldn't use protocols since I doubt there is a function signature that is exactly identical for all branches. Each branch probably needs access to different parts of your system (eg. database) and always passing everything to everything is not ideal.

>> Multi-Method is great if you want something openly extensible but that again seems unlikely here and also assumes that everything requires the same arguments.

I had the same concerns and wanted to use a simple function with pattern/condition matching to dispatch to the appropriate function. I had considered using Records as synonymous with using polymorphic calls (multimethods, protocols). It's good to know the alarm bells ringing in my head had merit :). Thanks for that.

Thinking out loud, if we are not using Records for polymorphism, are we using them to guarantee structure? If you could indulge me a little more, consider the following two implementations.


(def OneOff [(s/one s/Str 'name) (s/one s/Str 'email)])

(defn send-one-off
  [something thing [name email :as data]]
  {:pre [(s/validate OneOff data)]}
  ,,,)


(defn send
  [app thing recipient]
  (match [recipient]
    [[:one-off & data]] (send-one-off (:something app) thing data)))




An individual schema is created for each variant case and checked by the respective function. The dispatch function only needs to know how to check for individual cases of the variant.


(def OneOffMap {:type (s/eq :one-off) :name s/Str :email s/Str})
(def ExistingContactMap {:type (s/eq :existing) :contact-id s/Int})

(def Recipient (s/either ExistingContactMap OneOffMap))
;; s/either is deprecated and s/cond-pre is recommended
;; however, validating valid OneOffMap data using the following
;; still throws an error.
;; (def Recipient (s/cond-pre ExistingContactMap OneOffMap))

(defn send-one-off
  [something thing {:keys [name email] :as data}]
  ,,,)


(defn send
  [app thing recipient]
  {:pre [(s/validate Recipient recipient)]}
  (match [(:type recipient)]
    [:one-off] (send-one-off (:something app) thing recipient)))

This reverts to using a map with a :type key, with an individual schema for each :type value and a combined schema to represent a recipient. The dispatch function only needs to know about the existence of the :type key and the values it can handle.


Whether we use records or maps or variants, the dispatch function needs to know what contract is implemented by recipient: whether it will be an instance of something, a variant, or a map with a :type key. Neither version cares about any other data present in recipient or its structure.

At this point I am confused about what makes one version better than the other. Creating schema definitions for the records seemed a lot easier than for the other two. Also, I am assuming that if records are created using s/defrecord, the factory functions (->, map->) will automatically validate them?

I really appreciate you taking the time to clarify things for me.

--
Amith

dennis zhuang

Sep 7, 2015, 11:40:33 PM
to Clojure
Thanks for your benchmark.
I will upgrade all the dependencies and release 0.2.0

We are using defun with Instaparse in a DSL implementation; the performance is acceptable, and the code is much more readable.





--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
庄晓丹
Email:        killm...@gmail.com xzh...@avos.com
Site:           http://fnil.net
Twitter:      @killme2008


Thomas Heller

Sep 8, 2015, 6:38:20 AM
to clo...@googlegroups.com
I don't use schema/core.typed much in my actual projects; I have made many attempts, but it just never worked out. I like the idea and should definitely use them more, but it is just too much of a moving system and not stable enough yet (eg. I didn't even know s/either is deprecated).

If you look at these implementations

(def OneOff [(s/one s/Str 'name) (s/one s/Str 'email)])

(s/defrecord OneOff
    [name :- s/Str
     email :- s/Str])

(defrecord OneOff [name email])

All of them do more or less the same thing, just differently. Clojure has really good support for records and they feel natural to me. I don't need to remember that :email is the second field. So I can do (:email one-off) instead of (get one-off 1). Remembering the positions can get tedious and very error-prone over time. Always remember that software evolves over time.

(:email one-off) still works if my data changes to (defrecord OneOff [name note email]), the vector version doesn't. Referring to things by name is a good thing.
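
A quick REPL-style sketch of that evolution, with made-up V1/V2 record names and sample data so both shapes can coexist in one namespace:

```clojure
(defrecord OneOffV1 [name email])
(defrecord OneOffV2 [name note email])   ; a field was inserted in the middle

(def v1-record (->OneOffV1 "Ann" "ann@example.com"))
(def v2-record (->OneOffV2 "Ann" "met at conference" "ann@example.com"))

(def v1-vector ["Ann" "ann@example.com"])
(def v2-vector ["Ann" "met at conference" "ann@example.com"])

;; Lookup by name is unaffected by the new field...
(:email v2-record)
;; ...while lookup by position silently returns the note instead.
(get v2-vector 1)
```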

I do not know Elixir but in Erlang you very rarely use tuples for actual data, usually just messages and return values. Data is all Records, maybe maps these days but I left before R15B so can't say.

I usually only do data validation on the system boundary and trust in it after. Otherwise you might end up validating the same data over and over again. So if I get something from a user (eg. HTTP) I validate that it is what I expect and transform if needed. I have an explicit (if (is-this-what-i-expect? data) ...) and not something hidden in a {:pre ...} or some macro magic. I expect the data to be wrong (never trust the user) and want to test that assumption ASAP. I don't like to use exceptions for validation errors. Writing validation functions is not fun but it is very simple.
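
In code, that boundary check might look roughly like the following. The predicate valid-one-off? and the handler handle-request are hypothetical names, and the [:ok ...] / [:err ...] return shape is just one possible choice:

```clojure
;; A hand-written validation function: boring, but easy to read.
(defn valid-one-off? [data]
  (and (map? data)
       (string? (:name data))
       (string? (:email data))))

;; Validate once at the boundary, with an explicit branch and no exceptions.
(defn handle-request [data]
  (if (valid-one-off? data)
    [:ok (select-keys data [:name :email])]
    [:err "invalid recipient"]))
```

Everything behind handle-request can then assume the data has the expected shape.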

YMMV, do what feels right.

Keep it simple.

/thomas





James Reeves

Sep 8, 2015, 7:04:53 AM
to clo...@googlegroups.com
On 8 September 2015 at 11:38, Thomas Heller <in...@zilence.net> wrote:
>> If you look at these implementations

(def OneOff [(s/one s/Str 'name) (s/one s/Str 'email)])

(s/defrecord OneOff
    [name :- s/Str
     email :- s/Str])

(defrecord OneOff [name email])

>> All of them do more or less the same thing, just differently. Clojure has really good support for records and they feel natural to me. I don't need to remember that :email is the second field. So I can do (:email one-off) instead of (get one-off 1). Remembering the positions can get tedious and very error-prone over time. Always remember that software evolves over time.
>>
>> (:email one-off) still works if my data changes to (defrecord OneOff [name note email]), the vector version doesn't. Referring to things by name is a good thing.

I think everyone agrees that in the above example, a map or record would be better than a vector. The discussion is more about how to represent what's loosely analogous to a single key and value pair.

For instance, which one of these do you consider to be the best representation of an event to set the expiry time:

   [:cache/expire #inst "2015-09-08T12:00:00Z"]

   {:type :cache/expire, :value #inst "2015-09-08T12:00:00Z"}

   #cache.Expire [#inst "2015-09-08T12:00:00Z"]

   #cache.Expire {:value #inst "2015-09-08T12:00:00Z"}

- James

Thomas Heller

Sep 8, 2015, 7:42:03 AM
to Clojure, ja...@booleanknot.com

>> For instance, which one of these do you consider to be the best representation of an event to set the expiry time:

   [:cache/expire #inst "2015-09-08T12:00:00Z"]

   {:type :cache/expire, :value #inst "2015-09-08T12:00:00Z"}

   #cache.Expire [#inst "2015-09-08T12:00:00Z"]

   #cache.Expire {:value #inst "2015-09-08T12:00:00Z"}

None of those, well the {:type ... :value ...} one is closest.

I tried to stay away from the Type/Variant discussion since I'm not familiar enough with all the theory behind it and generally like to be more practical. Also there isn't enough context in your question to give an acceptable answer. Generally I'd have something like (set-expiration-time thing time) but since you said "event" I assume you have some kind of messaging system. So I'd abstract the usual message patterns and use a message "envelope".

(defrecord Message [type payload])

type would probably be a Keyword and payload probably Any.

So to write it as edn:

#my.app/message [:cache/expire #inst "2015-09-08T12:00:00Z"]
#my.app/message {:type :cache/expire, :payload ...}

Note that this is ONLY the representation on-the-wire, which you generally want to be as compact as possible, so I'd choose the vector variant since it is more compact and doesn't encode the keys.

What I get when I "read" this data is not tied to the data format used on the wire though, don't mix those up. The wire-format has totally different requirements than your code.
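
As a sketch of keeping the two apart, the compact vector form can be read into a Message record with a custom reader passed to clojure.edn/read-string. The helper name read-message is an assumption made for this example:

```clojure
(require '[clojure.edn :as edn])

(defrecord Message [type payload])

;; The wire format is a tagged two-element vector; the in-memory
;; form is a record with named fields.
(defn read-message [s]
  (edn/read-string
    {:readers {'my.app/message (fn [[type payload]]
                                 (->Message type payload))}}
    s))

(def msg
  (read-message
    "#my.app/message [:cache/expire #inst \"2015-09-08T12:00:00Z\"]"))

;; Code downstream uses names, not positions.
(:type msg)
```

Changing the wire format later (say, to the map form) would only require a different reader function; code that uses (:type msg) and (:payload msg) stays the same.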

/thomas





