SKI School: The Combinator Calculus Demystified

What is the SKI calculus?

Q: What's a combinator?
A: A function that operates on functions to produce a function.

Lambda calculus is the basis of functions in languages like Scheme and Haskell. It works by substituting parameters into function bodies. E.g.,

\x -> \y -> x

Is a function that takes a value and returns a function that takes a second value but returns the first. So, applying it to 1 and 2:

(\x -> \y -> x) 1 2
--> 1

In Haskell, this is called const.

It's easy to write functions in this way, and lambda has become popular, appearing in languages such as Ruby, Python, and C#.

But, when it comes to doing mathematical proofs about functions, it's a bit of a pain. You have to make up variable names, and you have to worry about clashes. For example, which xs in the following expression refer to the same thing?

\x -> \x -> x

In combinatory logic, you don't define functions with parameters, you just apply a small set of functions to each other. A complete set of functions that can be used to represent all computations is called a basis.

One such basis is the set of three functions S, K, and I: the SKI calculus. I'll write them in uppercase to make them stand out, but in Haskell you'd have to define them with lowercase letters.

I f = f

I is the identity function, called id in Haskell. It returns its argument.

K f g = f

K is the constant function, called const in Haskell. It takes two (curried) arguments and returns the first, ignoring the second argument.

S f g h = (f h) (g h)

S is the substitution function. It's a generalization of function application that lets you apply both the function and the argument to an extra parameter before applying the function to the argument.

It helps for me to think of this in terms of functional reactive programming. I think of a function that is changing over time. Like one second it's adding .5, but a second later it has become a function to add 1.5. That's your f. Then imagine g is a value that's changing over time. Then at some specific point in time h, you want to apply the current snapshot of the function to the current snapshot of the value.

Except that, in combinatory logic, there are no numbers. There are only functions. Functions that take functions as arguments, and return functions, that take functions as arguments, and return functions. It is completely untyped. We will soon see how to recover a notion of number.

It may surprise you to find that S can also be found in Haskell's standard libraries. It turns out to be the applicative functor operator <*>, at the ((->) t) instance.
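
If you want to follow along in GHCi, all three combinators are one-liners (lowercase, since Haskell reserves capitalized names for constructors); the names i, k, and s below are just these definitions, not from any library:

```haskell
-- The SKI combinators as plain Haskell functions.
i :: a -> a
i x = x                  -- same as id

k :: a -> b -> a
k x _ = x                -- same as const

s :: (a -> b -> c) -> (a -> b) -> a -> c
s f g h = f h (g h)      -- same as (<*>) for functions

-- s (+) (+1) 5  =  (+) 5 ((+1) 5)  =  11
-- which matches ((+) <*> (+1)) 5
```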

You can pull lots of tricks with S. It's where all the power comes from. It may at first seem unsatisfying. Why is that bothersome extra h in the way? What about a function to swap argument order like flip in Haskell?

flip f x y = f y x

Well, you're not alone. Curry called flip C, and defined a composition combinator B:

B f g h = f (g h)

Schönfinkel called them T and Z, and also showed how they could be defined in terms of S and K.

But give S a chance. Isn't it fun to be minimalist?

Let's try a sample reduction:

S I K S
I S (K S)
S (K S)

We can't reduce any further, because S needs three arguments to reduce, and only has one. Let's try another:

S K S K
K K (S K)
K

Well, that turned out nice and short!

But how exactly are you supposed to compute things with these functions, if they only return other functions? There are no datatypes whatsoever!

Think for a second about how computer memory works. Everything is stored in binary. Looking at a byte, you don't know if it's supposed to be a letter, a number, or part of a pointer to some other value. It's all untyped. The typing is all in the interpretation. If you expect something to return an integer, you regard the result as an integer. So maybe there is a way to encode numbers in combinatory logic?

Encoding numbers

Let's take a step back and look at a few ways the natural numbers can be encoded. Computer hardware uses a binary representation, so

[3] -> 00000011

The set-theoretic definition of a natural number is the set of all natural numbers smaller than itself. So,

[3] -> {0, 1, 2} -> {{}, {{}}, {{}, {{}}}}

Peano numerals

Even though numbers are normally seen as values of all the same size, e.g. 32-bit numbers, it turns out you can treat numbers like a structured datatype. Let's make constructors for natural numbers:

data Nat = Zero | Succ Nat

Now we can translate the number three:

[3] -> Succ (Succ (Succ Zero))

Probably not very efficient, but it has a nice symmetry with our use of structured datatypes.
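
As a quick sanity check, here's the Nat datatype with a decoder back to Int (natToInt is my own helper name, a sketch):

```haskell
data Nat = Zero | Succ Nat deriving Show

-- Recover an ordinary Int by counting Succ constructors.
natToInt :: Nat -> Int
natToInt Zero     = 0
natToInt (Succ n) = 1 + natToInt n

-- natToInt (Succ (Succ (Succ Zero)))  ==  3
```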

Church numerals

But we can't do any of those things with these untyped functions. So Alonzo Church came up with a way of encoding numerals in the lambda calculus. We call these the Church numerals.

It's closely related to the Peano numerals. Note that Succ is a function that takes a number and returns its successor. What if we abstracted out the constructors, just treated them as arbitrary functions?

zero s z = z
one s z = s z
two s z = s (s z)
three s z = s (s (s z))
...

These are the Church numerals. Feed in Succ and Zero as parameters, and you get back the Peano numerals!

three Succ Zero = Succ (Succ (Succ Zero))

Alternately, you could apply it to (+1) and 0:

three (+1) 0 = ((+1) ((+1) ((+1) 0))) = 3

(Nerd note: The Church numerals are catamorphisms over the natural numbers that have been applied to a number, but are still waiting for an F-algebra as parameters.)
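
To see the round trip concretely, here's a sketch that builds the Church numeral for any Int and decodes it again (church and unchurch are hypothetical helper names, not standard functions):

```haskell
-- Build a Church numeral by iterating the successor function n times.
church :: Int -> (a -> a) -> a -> a
church n f z
  | n <= 0    = z
  | otherwise = f (church (n - 1) f z)

-- Decode by feeding in (+1) and 0, as above.
unchurch :: ((Int -> Int) -> Int -> Int) -> Int
unchurch n = n (+1) 0

-- unchurch (church 3)  ==  3
```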

Because the Church numerals are purely functions and don't rely on datatypes, we can represent them in the lambda calculus.

And anything you can represent in the lambda calculus, you can also represent with the SKI-calculus.

So how can we represent zero? It needs to take two curried arguments, and return the second.

zero = K I
K I s z
I z
z

By passing I to K, we can skip the first argument, instead of the second!

Now how about one?

one = I
I s z
s z

Even simpler!

Two is a lot trickier. Somehow we need to take an argument, and make it repeat itself. Note again the definition of S:

S f g h = f h (g h)

S is the only combinator that lets you repeat something! It repeats the h. But we don't want to apply s to s, we want to apply it to s applied to z. Hmm. So what we want to do is compose s with itself.

Let's break it down into two pieces: function composition, and duplication.

Function composition

For function composition, we want a function . such that

(f . g) h = f (g h)

In other words, it's shifting the parenthesization from the (implicit) left-hand side to the right-hand side. Well, S has parentheses on the right:

S f g h
f h (g h)

But there's that problematic h on the left. What combinator can swallow an unwanted argument? K!

K f h (g h)
f (g h)

So our answer is K S, right? Not so fast:

K S f g h
S g h

We want... S K?

S K f g h
K g (f g) h

No, it's not even getting to the h! We need K to be applied to f first:

S (K f) g h
K f h (g h)
f (g h)

Yeah. Make that K eat the first h. But... how do we get there? We can't just stick the f in with the K like that. It looks like we need the . operator to solve our problem!

(S . K) f g h
S (K f) g h

So we are stuck needing . to write .! But it turns out that we can take advantage of the fact that we know what functions we are trying to compose. Consider the following, where E1 and E2 are known combinatory expressions:

S (K E1) E2 f
K E1 f (E2 f)
E1 (E2 f)

So we can't just shove E1 in there when it's an arbitrary parameter, but we can when we know in advance what it's supposed to be. OK, let's apply that, where E1 = S and E2 = K:

S (K S) K f g h
K S f (K f) g h
S (K f) g h
K f h (g h)
f (g h)

Voilà! So S (K S) K is our composition operator:

f . g = S (K S) K f g
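
We can check the derivation in Haskell: with s and k as plain functions, s (k s) k behaves exactly like (.). (This is a sketch; compose is my own name for the derived operator.)

```haskell
s :: (a -> b -> c) -> (a -> b) -> a -> c
s f g h = f h (g h)

k :: a -> b -> a
k x _ = x

-- The derived composition operator, S (K S) K.
compose :: (b -> c) -> (a -> b) -> a -> c
compose = s (k s) k

-- compose (+1) (*2) 5  =  (+1) ((*2) 5)  =  11
-- which matches ((+1) . (*2)) 5
```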

Replicating arguments

How can we use our handy-dandy S combinator to replicate arguments? Consider again the definition of S:

S f g h = f h (g h)

What if g were I? Then we'd get f h h. That's great... except we can't control g... we can only substitute predefined arguments starting on the left.

But wait... what if f were the composition operator? After all, what we want to do is compose h with itself. So,

S (.) I h
(.) h (I h)
(.) h h
h . h

Great! Now all we need to do is expand . into our definition:

S (S (K S) K) I s z
S (K S) K s (I s) z
S (K S) K s s z
K S s (K s) s z
S (K s) s z
K s z (s z)
s (s z)

And we have our definition for the Church numeral two!

two = S (S (K S) K) I

Wow, that was complicated. Fortunately, it turns out all the other numerals just involve repeating the magical S (S (K S) K).

three = S (S (K S) K) (S (S (K S) K) I)

You may have noticed that the SKI calculus is verbose compared to the lambda calculus. That's the down side. Due to the smaller number of symbols, it requires lengthier encodings for the same functions.

But just like with the lambda calculus, you can combine functions and give the new functions their own names, to ease the notational burden.

Testing it out

Although SKI-calculus is untyped, we have seen that we can regard output as representing particular values. Church numerals can be converted to Peano numerals just by applying them to Succ and Zero.
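
Here's that test in Haskell, with the numerals built from s, k, and i exactly as derived above (a sketch; the lowercase combinator names are my own definitions):

```haskell
data Nat = Zero | Succ Nat deriving Show

i :: a -> a
i x = x

k :: a -> b -> a
k x _ = x

s :: (a -> b -> c) -> (a -> b) -> a -> c
s f g h = f h (g h)

zero, one, two :: (a -> a) -> a -> a
zero = k i
one  = i
two  = s (s (k s) k) i

main :: IO ()
main = do
  print (two Succ Zero)                               -- Succ (Succ Zero)
  print [n (+1) (0 :: Int) | n <- [zero, one, two]]   -- [0,1,2]
```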

Other magical incantations

Reverse application:

S (K (S I)) K f g
K (S I) f (K f) g
S I (K f) g
I g (K f g)
g (K f g)
g f

Self-application:

S I I f
I f (I f)
f f

Crazy infinite loop:

S I I (S I I)
I (S I I) (I (S I I))
S I I (S I I)

Y combinator for ultimate recursion power:

Y = S (K (S I I)) (S (S (K S) K) (K (S I I)))

The burden of guessing

Since the SKI calculus and the lambda calculus are equivalent, there is a straightforward transformation from each to the other.

So if we can write something as a function in lambda calculus, we can mechanically translate it to SKI.
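
Here's a sketch of that mechanical translation in Haskell, using the standard abstraction-elimination rules (the Term type and function names are my own): a variable being abstracted becomes I, a term not mentioning the variable gets wrapped in K, and an application splits through S.

```haskell
data Term = S | K | I | Var String | App Term Term | Lam String Term
  deriving (Eq, Show)

-- Eliminate all lambdas, innermost first.
translate :: Term -> Term
translate (App a b) = App (translate a) (translate b)
translate (Lam x b) = abstract x (translate b)
translate t         = t

-- abstract x t builds a lambda-free term equivalent to \x -> t.
-- (t contains no Lam, because translate removes inner lambdas first.)
abstract :: String -> Term -> Term
abstract x (Var y) | x == y     = I
abstract x t | not (freeIn x t) = App K t
abstract x (App a b)            = App (App S (abstract x a)) (abstract x b)
abstract _ t                    = t   -- unreachable for lambda-free input

freeIn :: String -> Term -> Bool
freeIn x (Var y)   = x == y
freeIn x (App a b) = freeIn x a || freeIn x b
freeIn x (Lam y t) = x /= y && freeIn x t
freeIn _ _         = False

-- translate (\x -> \y -> x)  ==  S (K K) I,
-- which behaves like K:  S (K K) I a b = K K a (I a) b = K a b = a
```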

Making it smaller

So you thought it was fun doing everything with only three combinators, right? It turns out I isn't necessary. Consider:

S K K f
K f (K f)
f

So S K K is the same as I! I is just a convenience. You may also notice that the second K was ignored, so S K S works just as well.
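
You can watch this happen in Haskell (a sketch; note that in Haskell's typed setting s k s only typechecks at two-argument functions, though in the untyped calculus it is the identity everywhere):

```haskell
s :: (a -> b -> c) -> (a -> b) -> a -> c
s f g h = f h (g h)

k :: a -> b -> a
k x _ = x

skk :: a -> a
skk = s k k      -- s k k x = k x (k x) = x

-- s k s (+) 2 3  =  k (+) (s (+)) 2 3  =  (+) 2 3  =  5
```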

It turns out we can do it all with just one combinator! Chris Barker invented a combinator called iota, or ι, that can be used to generate S and K.

Sort of like how the square root of -1, when multiplied by itself, gives you -1, iota applied to itself gives you I! Here, I'll refer to iota as the X combinator for easier reading.

I = X X

Here's the definition of X:

X f = f S K

That's it. It feeds S and K to a function as arguments. Let's see what happens when we keep chaining S and K:

S K S K
K K (S K)
K

So, X (X (X X)) = K! Let's keep going:

K S K
S

So, X (X (X (X X))) = S! We have recovered the original combinators from self-application of X. Keep going:

S S K

Well, that can't be reduced. One more cycle:

S S K S K
S S (K S) K
S K (K S K)

Hmm... that looks complicated. But wait: we know that S K applied to any function behaves as I! So the whole thing acts as I.

We have a five-step rotation:

X X       = I
X I       = S K
X (S K)   = K
X K       = S
X S       = S S K
X (S S K) = I

You can play with iota in JavaScript on Barker's page.
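
You can also machine-check the rotation with a tiny SKI reducer in Haskell (a sketch; the CL type and step function are my own scaffolding, and iota is written in its SKI form S (S I (K S)) (K K) from the next section):

```haskell
data CL = S | K | I | V Char | App CL CL deriving (Eq, Show)

-- One leftmost-outermost reduction step, if any redex exists.
step :: CL -> Maybe CL
step (App I x)                 = Just x
step (App (App K x) _)         = Just x
step (App (App (App S f) g) h) = Just (App (App f h) (App g h))
step (App f x) = case step f of
  Just f' -> Just (App f' x)
  Nothing -> App f <$> step x
step _ = Nothing

-- Reduce to normal form.
nf :: CL -> CL
nf t = maybe t nf (step t)

iota :: CL
iota = App (App S (App (App S I) (App K S))) (App K K)

-- X K reduces all the way to the literal S:
--   nf (App iota K) == S
-- and X X, applied to a variable, acts as the identity:
--   nf (App (App iota iota) (V 'a')) == V 'a'
```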

What is iota in SKI?

Here is a step-by-step translation of iota to SKI:

\f -> (f S) K
S (\f -> f S) (\f -> K)
S (\f -> f S) (K K)
S (S (\f -> f) (\f -> S)) (K K)
S (S I (K S)) (K K)

Here is a proof that it works:

S (S I (K S)) (K K) f
S I (K S) f (K K f)
I f (K S f) (K K f)
f (K S f) (K K f)
f S (K K f)
f S K

To heck with parens

Barker didn't stop there. He thought: don't all those parentheses make reasoning a bit hard? You need a parser just to read an expression. So he also invented Jot, a representation with two combinators and no parentheses.

To facilitate writing things in binary, he called the combinators 0 and 1. The 0 combinator is just iota.

0 f = f S K

The 1 combinator is just function composition!

1 f g h = f (g h)

Just for fun, the empty string is defined to be I.

Here's how to encode K in Jot. I'll put brackets around the Jot code, and incrementally evaluate it into SKI.

[11100]
[1110]SK
[111]SKSK
[11](SK)SK
[1](SKS)K
[](SKSK)
ISKSK
SKSK
K

We know that last step from the reasoning above. And here's S:

[11111000]
[1111100]SK
[111110]SKSK
[11111]SKSKSK
[1111](SK)SKSK
[111](SKS)KSK
[11](SKSK)SK
[1](SKSKS)K
[](SKSKSK)
SKSKSK
S

It may seem like the 1s are unnecessary, but they are needed if you are nesting these inside some larger expression, to maintain grouping.
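
The whole Jot decoding can be mechanized too (a sketch; jot, step, and nf are my own names, built on the same kind of SKI-term scaffolding as above). The rules are exactly the ones used in the traces: [] = I, [w0] = [w] S K, and [w1] = S (K [w]), which regroups arguments just like the 1 combinator.

```haskell
data CL = S | K | I | App CL CL deriving (Eq, Show)

-- Decode a Jot string into an SKI term, peeling from the right.
jot :: String -> CL
jot = go . reverse
  where
    go []        = I
    go ('0' : w) = App (App (go w) S) K   -- [w0] = [w] S K
    go ('1' : w) = App S (App K (go w))   -- [w1] = S (K [w])
    go (_   : w) = go w                   -- ignore non-binary noise

-- One leftmost-outermost reduction step, if any redex exists.
step :: CL -> Maybe CL
step (App I x)                 = Just x
step (App (App K x) _)         = Just x
step (App (App (App S f) g) h) = Just (App (App f h) (App g h))
step (App f x) = case step f of
  Just f' -> Just (App f' x)
  Nothing -> App f <$> step x
step _ = Nothing

nf :: CL -> CL
nf t = maybe t nf (step t)

-- nf (jot "11100")    == K
-- nf (jot "11111000") == S
```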