Introduction

Lately I’ve been fascinated with simple models of computation.

It’s astounding to me that systems like the lambda calculus and SKI combinators, defined in just a few lines, can express any computable function.

But there’s one thing in particular that has really surprised me: how much you can achieve with categorical products.

In this article, we’ll explore how to build a Joy-like stack machine from simple primitives.

A Quick Note

The code in this article is written in Haskell, but it should be easy to follow even if you’re new to the language.

Here’s a primer:

Syntax For	Python	JavaScript	Haskell
Function application	`f(x, y)`	`f(x, y);`	`f x y`
Function composition	`h = lambda x: g(f(x))`	`const h = x => g(f(x));`	`h = g . f`

For code examples, I’ll use λ> to indicate REPL input, and show the printed output below it:

λ> "Hello, " ++ "World!"
"Hello, World!"

You can try Haskell here if you’re curious.

Products

We’ll start by defining a data type for the terminal object, Unit:

data Unit = Unit

instance Show Unit where 
    show Unit = "()"

Which shows as follows:

λ> Unit
()

And a Product type to represent the categorical product:

data Product a b = Product a b

instance (Show a, Show b) => Show (Product a b) where
    show (Product a b) = "(" ++ show a ++ " >< " ++ show b ++ ")"

infixr 5 ><
(><) :: a -> b -> Product a b
(><) = Product

The last part defines a right-associative operator (><), so we can use math-like syntax for $a \times b$:

λ> 'a' >< 'b'
('a' >< 'b')

λ> 'a' >< 'b' >< 'c'
('a' >< ('b' >< 'c'))

Those of you familiar with Lisp may recognise this structure as an S-expression.

As McCarthy said in his seminal paper:

An S-expression is then simply an ordered pair…

We can can represent a list of arbitrary length in terms of S-expressions as follows.

The list (m1, m2, ··· , mn) is represented by the S-expression (m1 · (m2 · (···(mn · NIL)···)))
Here NIL is an atomic symbol used to terminate lists.

In the same spirit, we can think of our nested binary products as heterogeneous lists, with Unit playing the role of NIL.

The head of the list is the first element of the pair, and the tail is the second.

And if we can represent a list, we can represent a stack.

Projections

Categorical products come equipped with projections $\pi_{1}$ and $\pi_{2}$, which extract the first and second components of a pair respectively:

Source: Wikimedia Commons

Since we’re primarily thinking about lists, I’m going to use the names head and tail instead:

head :: Product h t -> h
head (Product h _) = h

tail :: Product h t -> t
tail (Product _ t) = t

λ> head ('a' >< 'b' >< 'c')
'a'

λ> tail ('a' >< 'b' >< 'c')
('b' >< 'c')

A well-formed list should terminate with Unit.

We can still extract a head and a tail from a singleton list:

λ> head ('a' >< Unit)
'a'

λ> tail ('a' >< Unit)
()

For Fork’s Sake

I’m going to bastardise the product diagram so you can see where we’re up to:

We can represent a list as a product of a head and a tail.

We can extract the head of a list with the head function, and the tail of a list with the tail function.

What we haven’t touched upon yet is the magic triangle in the middle, which we call fork.

For each object $A$ and each pair of morphisms $f : A \to H$ and $g : A \to T$, there’s a unique morphism $f \vartriangle g : A \to H \times T$.

I hope the naming convention makes it clear: fork, in this context, represents list-forming functions.

Any list-forming function is a fork.

So how do we express this in code?

fork :: (a -> h) -> (a -> t) -> a -> Product h t
fork f g a = f a >< g a

fork is a higher-order function that takes as arguments:

A function f, which maps an a to a head h.
A function g, which maps an a to a tail t.
And an a.

Now you’re probably wondering what exactly a could be.

In a simple case, we could pass two functions from Int to Int, and an integer a:

λ> fork (+ 10) (* 3) 10
(20 >< 30)

And get back a Product $Int \times Int$.

But things gets really interesting when a is itself a Product.

fork is a combinator that can take a list as input, and produce a list as output.

We can use it to construct everything else we need for our stack machine.

But before we do, let’s define an infix operator for fork:

(&&&) = fork

Which I think reads wonderfully:

list = head &&& tail

A list is a head and a tail.

Push and Pop

Let’s define the two essential stack operations: push and pop.

pop removes the top element of the stack, so it’s synonymous with tail:

pop = tail

λ> pop ('a' >< 'b' >< Unit)
('b' >< ())

For push, we could partially apply the >< operator:

push a = (a ><)

But I did say everything else could be defined in terms of fork, so let’s whet our appetite.

First of all, it’s helpful to think of stack functions as being polymorphic over the tail of the stack.

push takes any a, and any stack z, and returns a >< z.

We know that the fork combinator gives us back a product h >< t, so the head must be the value we’re pushing, and the tail must be the input stack.

push a = const a &&& id

You can read push as forming a list where the head is the constant a, and the tail is identical to the original list.

The const function in Haskell relates to the classical K combinator: K x y = x.

It ignores the second argument and returns the first.

In this case, it ignores the stack and returns the value a.

λ> const 'a' ('b' >< Unit)
'a'

id is the identity function (the classical I combinator), which is used here to preserve the input stack.

We can use push as follows:

λ> push 'a' ('b' >< Unit)
('a' >< ('b' >< ()))

And of course, pop after push should leave the stack unchanged:

λ> pop . push 'a' $ ('b' >< Unit)
('b' >< ())

The Stack Combinator

A list is a head and a tail.

We will often want to transform it as such; applying one function to the head, and another to the tail.

The product map (***) does exactly that:

(***) :: (a -> c) -> (b -> d) -> Product a b -> Product c d
(***) f g = (f . head) &&& (g . tail)

Notice that we haven’t introduced anything new - *** is defined in terms of fork.

A stack is a special type of list that can only be modified at one end.

While a list admits a pair of functions that may transform both the head and the tail, a stack function can only operate on the head of the stack.

To encode this property, we define a stack combinator:

stack :: (a -> b) -> Product a z -> Product b z
stack f = f *** id

Which applies a function f to the head of the stack, and can only preserve the tail.

The type system guarantees that the tail cannot change.

We can now apply unary functions with ease:

λ> stack (+ 10) (1 >< 2 >< Unit)
(11 >< (2 >< ()))

Binary Operations

Binary operations are a little bit trickier.

Traditionally, to add two numbers on a stack, you pop two values and push their sum.

In languages like Forth, you’ll often find an instruction called pop2, which we can easily replicate:

pop2 = pop . pop

But I find it easier to think about this as taking the head of a stack, and the head of its tail.

I’ll use the following function names to make things more concise:

-- head of head, head of tail, ...
hoh = head . head
hot = head . tail
tot = tail . tail
toh = tail . head

To extract a pair of values from the stack, we need a function $pair : (a \times (b \times c)) \to a \times b$.

Which we can define as:

pair = head &&& hot

λ> pair ('a' >< 'b' >< 'c' >< Unit)
('a' >< 'b')

How do we apply a binary operation to a pair?

To avoid having to define our own operators, let’s create a higher-order function that takes a curried Haskell function, and returns one that operates on an uncurried Product:

uncurry :: (a -> b -> c) -> Product a b -> c
uncurry f (Product a b) = f a b

λ> uncurry (+) (1 >< 2)
3

The final step is to be able to apply a function to a pair on a stack, while preserving everything beneath it.

Remembering that pair takes the head and the head of the tail:

pair = head &&& hot

We know the part that needs to be preserved is the tail of the tail.

Piecing it together, we have a disassociator function $(a \times (b \times c)) \to ((a \times b) \times c)$:

disassoc = pair &&& tot

And can now write a higher-order function called bop:

bop f = stack (uncurry f) . disassoc

Which pairs the top two elements of the stack, and then applies an uncurried binary operation to that pair.

λ> bop (+) (1 >< 2 >< Unit)
(3 >< ())

Perfect!

Dup and Swap

Every self-respecting stack language has dup and swap:

$$\begin{align} dup &: a \to a \times a \\ swap &: a \times b \to b \times a \end{align}$$

The former duplicates the top element, and the latter swaps the order of the top two elements.

dup is quite interesting because, like push, it preserves the entire input stack.

We can define it as:

dup = head &&& id

λ> dup ('a' >< 'b' >< Unit)
('a' >< ('a' >< ('b' >< ())))

swap on a pair could be defined as:

swap = tail &&& head

But on a stack it’s slightly more complicated.

We want to pair up the top two elements, swap them, and then unpair or flatten the stack again.

For this we need an associator function $((a \times b) \times c) \to (a \times (b \times c))$, which is the inverse of disassoc.

pair = head &&& hot
unpair = toh &&& tail

assoc = hoh &&& unpair
disassoc = pair &&& tot

Naturally, assoc after disassoc should leave the stack unchanged:

λ> assoc . disassoc $ ('a' >< 'b' >< 'c' >< Unit)
('a' >< ('b' >< ('c' >< ())))

We can now define swap as:

swap = assoc . stack (tail &&& head) . disassoc

λ> swap ('a' >< 'b' >< Unit)
('b' >< ('a' >< ()))

Credits & Further Reading

The ideas in this article are inspired by the following works:

Calculating Functional Programs, by Jeremy Gibbons
Compiling to Categories and Calculating Compilers Categorically, by Conal Elliott
The Theory of Concatenative Combinators, by Brent Kerby
The Mathematical Foundations of Joy and The Algebra of Joy, by Manfred Von Thun

The core ideas pop up in several places under different names.

If you’re interested to see some other applications, have a look at the Bifunctor library and writings on Arrows.

An Elegant Stack Machine

Table of contents