If you’re located near London, you might be interested in the fact that I will give a talk about the contents of this blogpost at the Haskell eXchange 2018 that is happening Thursday 11th - Friday 12th of October.

After that conference, a video should be available as well, which I will link here.

This post is a bit of a departure from the “practical Haskell” I usually try to write about, although – believe it or not – this blogpost actually has a very practical origin ^{1}.

This blogpost is a literate Haskell file, which means you can just download it here and load it into GHCi to play around with it. In this case, you can also verify the properties we will be talking about (yes, GHC as a proof checker). Since we are dipping our toes into dependent types territory here, we will need to enable some extensions that are definitely a bit more on the advanced side.

```
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE UndecidableInstances #-}
```

Since the goal of this blogpost is mainly educational, we will only use a few standard modules and generally define things ourselves. This also helps us to show that there is no magic going on behind the scenes: all term-level functions in this file are total and compile fine with `-Wall`.

```
import Data.List (intercalate, minimumBy)
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NonEmpty
import Data.Ord (comparing)
```

I assume most readers will be at least somewhat familiar with the standard length-indexed list:

```
data Nat = Zero | Succ Nat deriving (Show)

data Vec (n :: Nat) a where
    VNull :: Vec 'Zero a
    VCons :: a -> Vec n a -> Vec ('Succ n) a
```

These vectors carry their length in their types. In GHCi:

```
*Main> :t VCons "Hello" (VCons "World" VNull)
Vec ('Succ ('Succ 'Zero)) [Char]
```

This blogpost defines a similar way to deal with binomial heaps. Binomial heaps are one of my favorite data structures because of their simple elegance and the fascinating way their structure corresponds to binary numbers.

We will combine the idea of Peano number-indexed lists with the idea that binomial heaps correspond to binary numbers to lift **binary numbers to the type level**. This is great because we get *O(log(n))* size and time in places where we would see *O(n)* for the Peano numbers defined above (in addition to being insanely cool). In GHCi:

```
*Main> :t pushHeap 'a' $ pushHeap 'b' $ pushHeap 'c' $
pushHeap 'd' $ pushHeap 'e' emptyHeap
Heap ('B1 ('B0 ('B1 'BEnd))) Char
```

Where *101* ^{2} is, of course, the binary representation of the number 5.

Conveniently, *101* also represents the basics of a subject. So the title of this blogpost works on two levels, and we present an introductory-level explanation of a non-trivial (and again, insanely cool) example of dependent Haskell programming.

- Introduction
- Singletons and type equality
- Binomial heaps: let’s build it up
- Binomial heaps: let’s break it down
- Acknowledgements
- Appendices

If I perform an appropriate amount of hand-waving and squinting, I feel like there are two ways to work with these stronger-than-usual types in Haskell. We can either make sure things are correct by *construction*, or we can come up with a *proof* that they are in fact correct.

The former is the simpler approach we saw in the `Vec` snippet: by using the constructors provided by the GADT, our constraints are always satisfied. The latter builds on the singletons approach introduced by Richard Eisenberg and Stephanie Weirich.

We need both approaches for this blogpost. We assume that the reader is somewhat familiar with the first one and in this section we will give a brief introduction to the second one. It is in no way intended to be a full tutorial, we just want to give enough context to understand the code in the blogpost.

If we consider a closed type family for addition of natural numbers (we are using an `N` prefix since we will later use `B` for addition of binary numbers):

```
type family NAdd (x :: Nat) (y :: Nat) :: Nat where
    NAdd ('Succ x) y = 'Succ (NAdd x y)
    NAdd 'Zero     y = y
```

We can trivially define a function casting in one direction: `NAdd 'Zero x` is easily reduced to `x` by GHC, since that is simply a clause of the type family, so it accepts the definition `id`. However, if we try to write the same cast in the other direction, we run into trouble, and GHC will tell us:

`Couldn't match type ‘x’ with ‘NAdd x 'Zero’`
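The two snippets this paragraph refers to did not survive in this copy of the post. A reconstruction of what they must have looked like (the names `cast01` and `cast02` are assumptions; `cast02` reappears further below):

```
{-# LANGUAGE DataKinds       #-}
{-# LANGUAGE GADTs           #-}
{-# LANGUAGE KindSignatures  #-}
{-# LANGUAGE TypeFamilies    #-}

data Nat = Zero | Succ Nat

data Vec (n :: Nat) a where
    VNull :: Vec 'Zero a
    VCons :: a -> Vec n a -> Vec ('Succ n) a

type family NAdd (x :: Nat) (y :: Nat) :: Nat where
    NAdd ('Succ x) y = 'Succ (NAdd x y)
    NAdd 'Zero     y = y

-- Accepted: GHC reduces 'NAdd 'Zero n' to 'n' using the second clause.
cast01 :: Vec (NAdd 'Zero n) a -> Vec n a
cast01 = id

-- Rejected: GHC cannot reduce 'NAdd n 'Zero' any further, so this
-- definition produces the error quoted above.
-- cast02 :: Vec (NAdd n 'Zero) a -> Vec n a
-- cast02 = id
```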

We will need to prove to GHC that these two types are equal – commutativity doesn’t come for free! This can be done by providing evidence for the equality by way of a GADT constructor ^{3}.

```
data EqualityProof (a :: k) (b :: k) where
    QED :: EqualityProof a a

type a :~: b = EqualityProof a b
```

Take a minute to think about the implications this GADT has – if we can construct a `QED` value, we are actually providing evidence that the two types are equal. We assume that the two types (`a` and `b`) have the same kind `k` ^{4}.

The `QED` constructor lives on the term-level though, not on the type-level. We must synthesize this constructor using a term-level computation. This means we need a term-level representation of our natural numbers as well. This is the idea behind *singletons* and again, a much better explanation is available in said paper and some talks, but I wanted to at least provide some intuition here.

The singleton for `Nat` is called `SNat`, and it’s easy to see that each `Nat` has a unique `SNat` and the other way around:
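The definition itself is missing from this copy of the post; given the constructors `SZero` and `SSucc` used later on, it is presumably the standard singleton:

```
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE GADTs          #-}
{-# LANGUAGE KindSignatures #-}

data Nat = Zero | Succ Nat

-- Each 'Nat' n has exactly one 'SNat n' value, and vice versa.
data SNat (n :: Nat) where
    SZero :: SNat 'Zero
    SSucc :: SNat n -> SNat ('Succ n)
```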

We can use such an `SNat` to define a proof for what we are trying to accomplish. Since this proof can be passed any `n` in the form of an `SNat`, it must be correct for all `n`.

GHC can figure out the base case on its own by reducing `NAdd 'Zero 'Zero` to `'Zero`:

And we can use induction to complete the proof. The important trick here is that in the body of the pattern match on `EqualityProof a b`, GHC knows that `a` is equal to `b`.

This can be used to write `cast02`:

`cast02` takes an extra parameter, and there are several ways to synthesize this value. The common one is a typeclass that can give us an `SNat x` from a `Proxy x`. In this blogpost, however, we keep things simple and make sure we always have the right singletons on hand by passing them around in a few places. In other words: don’t worry about this for now.
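Putting the pieces of this section together, the missing snippets were presumably along these lines (`lemma1` is the name referenced further below; the exact formatting of the originals is lost):

```
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE GADTs          #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE PolyKinds      #-}
{-# LANGUAGE TypeFamilies   #-}
{-# LANGUAGE TypeOperators  #-}

data Nat = Zero | Succ Nat

data SNat (n :: Nat) where
    SZero :: SNat 'Zero
    SSucc :: SNat n -> SNat ('Succ n)

type family NAdd (x :: Nat) (y :: Nat) :: Nat where
    NAdd ('Succ x) y = 'Succ (NAdd x y)
    NAdd 'Zero     y = y

data EqualityProof (a :: k) (b :: k) where
    QED :: EqualityProof a a

type a :~: b = EqualityProof a b

data Vec (n :: Nat) a where
    VNull :: Vec 'Zero a
    VCons :: a -> Vec n a -> Vec ('Succ n) a

-- Base case: 'NAdd 'Zero 'Zero' reduces to ''Zero', so QED typechecks.
-- Inductive case: matching on the recursive proof convinces GHC.
lemma1 :: SNat n -> NAdd n 'Zero :~: n
lemma1 SZero     = QED
lemma1 (SSucc n) = case lemma1 n of QED -> QED

-- With the proof in hand, the cast GHC previously rejected goes through:
cast02 :: SNat n -> Vec (NAdd n 'Zero) a -> Vec n a
cast02 snat vec = case lemma1 snat of QED -> vec
```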

A *binomial heap* consists of zero or more *binomial trees*. I will quote the text from the Wikipedia article here since I think it is quite striking how straightforward the definition translates to GADTs that enforce the structure:

- A binomial tree of order 0 is a single node
- A binomial tree of order k has a root node whose children are roots of binomial trees of orders k−1, k−2, …, 2, 1, 0 (in this order).

```
data Children (k :: Nat) a where
    CZero :: Children 'Zero a
    CCons :: Tree k a -> Children k a -> Children ('Succ k) a
```

Some illustrations to make this a bit more clear:

This is definitely a very good example of the correctness by construction approach I talked about earlier: it is simply impossible to create a tree that does not have the right shape.

Empty trees do not exist according to this definition. A singleton tree is easy to create:
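The `Tree` type and the singleton-tree snippet did not make it into this copy of the post; based on how the `Tree` constructor is used in `mergeTree` below, they were presumably:

```
-- A binomial tree of order k: a root together with children of orders
-- k-1, k-2, …, 0 (stored in that order).
data Tree (k :: Nat) a where
    Tree :: a -> Children k a -> Tree k a

-- A singleton tree is a binomial tree of order 0.
singletonTree :: a -> Tree 'Zero a
singletonTree x = Tree x CZero
```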

We only need to define one operation on trees, namely merging two trees.

A tree of order `k` has `2ᵏ` elements, so it makes sense that merging two trees of order `k` creates a tree of order `k+1`. We can see this in the type signature as well:

Concretely, we construct the new tree by taking either the left or the right tree and attaching it as new child to the other tree. Since we are building a heap to use as a priority queue, we want to keep the smallest element in the root of the new tree.

```
mergeTree :: Ord a => Tree k a -> Tree k a -> Tree ('Succ k) a
mergeTree l@(Tree lroot lchildren) r@(Tree rroot rchildren)
    | lroot <= rroot = Tree lroot (CCons r lchildren)
    | otherwise      = Tree rroot (CCons l rchildren)
```

With these trees defined, we can move on to *binomial heaps*.

While binomial trees are interesting on their own, they can really only represent collections that have a number of elements that are exactly a power of two.

Binomial heaps solve this in a surprisingly simple way. A binomial heap is a collection of binomial trees where we may only have at most one tree for every order.

This is where the correspondence with binary numbers originates. If we have a binomial heap with 5 elements, the only way to do this is to have binomial trees of orders 2 and 0 (2² + 2⁰ = 5).

We start out by defining a simple datatype that will be lifted to the kind-level, just as we did with `Nat`:

It’s important to note that we will represent binary numbers in a right-to-left order since this turns out to match up more naturally with the way we will be defining heaps.

For example, the type `'B0 ('B1 ('B1 'BEnd))` represents the number 6 (conventionally written *110*).

I think it is fairly common in Haskell for developers to play around with different ways of representing a certain thing until they converge on an elegant representation. This is many, many times more important when we are dealing with dependently-typed Haskell.

Inelegant and awkward data representations can make term-level programming clunky. Inelegant and awkward type representations can make type-level programming downright infeasible due to the sheer amount of lemmas that need to be proven.

Consider the relative elegance of defining a type family for incrementing a binary number that is read from the right to the left:

```
type family BInc (binary :: Binary) :: Binary where
    BInc 'BEnd        = 'B1 'BEnd
    BInc ('B0 binary) = 'B1 binary
    BInc ('B1 binary) = 'B0 (BInc binary)
```

Appendix 3 contains an (unused) implementation of incrementing left-to-right binary numbers. Getting things like this to work is not too much of a stretch these days (even though GHC’s error messages can be very cryptic). However, due to the large number of type families involved, proving things about it presumably requires ritually sacrificing an inordinate number of Agda programmers while chanting Richard Eisenberg’s writings.

To that end, it is almost always worth spending time finding alternate representations that work out more elegantly. This can lead to some arbitrary looking choices – we will see this in full effect when trying to define `CutTree` further below.

Addition is not too hard to define:

```
type family BAdd (x :: Binary) (y :: Binary) :: Binary where
    BAdd 'BEnd   y       = y
    BAdd x       'BEnd   = x
    BAdd ('B0 x) ('B0 y) = 'B0 (BAdd x y)
    BAdd ('B1 x) ('B0 y) = 'B1 (BAdd x y)
    BAdd ('B0 x) ('B1 y) = 'B1 (BAdd x y)
    BAdd ('B1 x) ('B1 y) = 'B0 (BInc (BAdd x y))
```

Let’s quickly define a number of examples:

```
type BZero = 'B0 'BEnd
type BOne = BInc BZero
type BTwo = BInc BOne
type BThree = BInc BTwo
type BFour = BInc BThree
type BFive = BInc BFour
```

This allows us to play around with it in GHCi:

```
*Main> :set -XDataKinds
*Main> :kind! BAdd BFour BFive
BAdd BFour BFive :: Binary
= 'B1 ('B0 ('B0 ('B1 'BEnd)))
```

Finally, we define a corresponding singleton to use later on:

```
data SBin (b :: Binary) where
    SBEnd :: SBin 'BEnd
    SB0   :: SBin b -> SBin ('B0 b)
    SB1   :: SBin b -> SBin ('B1 b)
```

Our heap will be a relatively simple wrapper around a recursive type called `Forest`. This datastructure follows the definition of the binary numbers fairly closely, which makes the code in this section surprisingly easy, and we end up requiring no lemmas or proofs whatsoever.

A `Forest k b` refers to a number of trees starting with (possibly) a tree of order `k`. The `b` is the binary number that indicates the shape of the forest – i.e., whether we have a tree of a given order or not.

Using a handwavy but convenient notation, this means that *Forest 3 101* refers to binomial trees of order 3 and 5 (and no tree of order 4).

```
data Forest (k :: Nat) (b :: Binary) a where
    FEnd :: Forest k 'BEnd a
    F0   :: Forest ('Succ k) b a -> Forest k ('B0 b) a
    F1   :: Tree k a -> Forest ('Succ k) b a -> Forest k ('B1 b) a
```

Note that we list the trees in increasing order here, in contrast to `Children`, where we listed them in decreasing order. You can see this in the way we are removing layers of `'Succ` as we add more constructors – the opposite of what happens in `Children`.

The empty forest is easily defined:
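The snippet is missing here; given `emptyHeap :: Heap ('B0 'BEnd) a` in the interface at the end of the post, it was presumably:

```
emptyForest :: Forest k ('B0 'BEnd) a
emptyForest = F0 FEnd
```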

`insertTree` inserts a new tree into the forest. This might require merging two trees together – roughly corresponding to carrying in the binary increment operation.

```
insertTree
    :: Ord a
    => Tree k a -> Forest k b a
    -> Forest k (BInc b) a
insertTree s FEnd     = F1 s FEnd
insertTree s (F0 f)   = F1 s f
insertTree s (F1 t f) = F0 (insertTree (mergeTree s t) f)
```

Similarly, merging two forests together corresponds to adding two binary numbers together:

```
mergeForests
    :: Ord a
    => Forest k lb a -> Forest k rb a
    -> Forest k (BAdd lb rb) a
mergeForests FEnd      rf        = rf
mergeForests lf        FEnd      = lf
mergeForests (F0 lf)   (F0 rf)   = F0 (mergeForests lf rf)
mergeForests (F1 l lf) (F0 rf)   = F1 l (mergeForests lf rf)
mergeForests (F0 lf)   (F1 r rf) = F1 r (mergeForests lf rf)
mergeForests (F1 l lf) (F1 r rf) =
    F0 (insertTree (mergeTree l r) (mergeForests lf rf))
```

It’s worth seeing how the different branches in `insertTree` and `mergeForests` match up almost 1:1 with the different clauses in the definitions of the type families `BInc` and `BAdd`. If we overlay them visually:

That is the intuitive explanation as to why no additional proofs or type-level trickery are required here.

Here is an informal illustration of what happens when we don’t need to merge any trees. The singleton `Forest` on the left is simply put in the empty `F0` spot on the right.

When there is already a tree there, we merge the trees using `mergeTree` and carry that, in a very similar way to how carrying works in the addition of binary numbers:

The `Forest` structure is the main workhorse; `Heap` is just a simple wrapper on top of it, where we start out with a tree of order 0:
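The definition is missing from this copy; from `unHeap` in the appendix and the `emptyHeap` signature at the end of the post, it was presumably:

```
newtype Heap (b :: Binary) a = Heap {unHeap :: Forest 'Zero b a}

emptyHeap :: Heap ('B0 'BEnd) a
emptyHeap = Heap emptyForest
```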

The operations on `Heap` are also simple wrappers around the previously defined functions:

```
pushHeap :: Ord a => a -> Heap b a -> Heap (BInc b) a
pushHeap x (Heap forest) = Heap (insertTree (singletonTree x) forest)
```

```
mergeHeap :: Ord a => Heap lb a -> Heap rb a -> Heap (BAdd lb rb) a
mergeHeap (Heap lf) (Heap rf) = Heap (mergeForests lf rf)
```

We are now ready to show this off in GHCi again:

```
*Main> :t pushHeap 'a' $ pushHeap 'b' $ pushHeap 'c' $
pushHeap 'd' $ pushHeap 'e' emptyHeap
Heap ('B1 ('B0 ('B1 'BEnd))) Char
```

We can also take a look at the internals of the datastructure using a custom `Show` instance provided in appendix 2:

```
*Main> pushHeap 'a' $ pushHeap 'b' $ pushHeap 'c' $
pushHeap 'd' $ pushHeap 'e' emptyHeap
(tree of order 0)
'a'
(no tree of order 1)
(tree of order 2)
'b'
'd'
'e'
'c'
```

Neat!

I think it’s interesting that we have implemented an append-only heap without even requiring any lemmas so far. It is perhaps a good illustration of how append-only datastructures are conceptually much simpler.

Things get *significantly* more complicated when we try to implement popping the smallest element from the queue. For reference, I implemented the current heap in a couple of hours, whereas I worked on the rest of the code on and off for about a week.

Let’s look at a quick illustration of how popping works.

We first select the tree with the smallest root and remove it from the heap:

We break up the tree we selected into its root (which will be the element that is “popped”) and its children, which we turn into a new heap.

We merge the remainder heap from step 1 together with the new heap we made out of the children of the removed tree:

The above merge requires carrying twice.

We will start by implementing step 2 of the algorithm above since it is a bit easier. In this step, we are taking all children from a tree and turning that into a new heap.

We need to keep all our invariants intact, and in this case this means tracking them in the type system. A tree of order `k` has `2ᵏ` elements. If we remove the root, we have `k` child trees with `2ᵏ - 1` elements in total. Every child becomes a tree in the new heap. This means that the heap contains `k` full trees, and its shape will be written as `k` “1”s. This matches our math: if you write `k` “1”s, you get the binary notation of `2ᵏ - 1`.

Visually:

We introduce a type family for computing `n` “1”s:
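The family itself is lost from this copy; given that zero “1”s is the empty number and each extra `'Succ` contributes one “1”, it reads:

```
type family Ones (n :: Nat) :: Binary where
    Ones 'Zero     = 'BEnd
    Ones ('Succ n) = 'B1 (Ones n)
```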

We will use a helper function `childrenToForest_go` to maintain some invariants. The wrapper `childrenToForest` is trivially defined, but its type tells us a great deal:

```
childrenToForest
    :: Children n a
    -> Forest 'Zero (Ones n) a
childrenToForest children =
    childrenToForest_go SZero (childrenSingleton children) FEnd children
```

We use `childrenSingleton` to obtain a singleton for `n`.

```
childrenSingleton :: Children n a -> SNat n
childrenSingleton CZero = SZero
childrenSingleton (CCons _ c) = SSucc (childrenSingleton c)
```

The tricky bit is that the list of trees in `Children` has them in descending order, and we want them in ascending order in `Forest`. This means we will have to reverse the list.

We can reverse a list easily using an accumulator in Haskell. In order to maintain the type invariants at every step, we will increase the size of the accumulator as we decrease the size of the children. This can be captured by requiring that their sum remains equal (`m ~ NAdd x n`).

```
childrenToForest_go
    :: m ~ NAdd x n
    => SNat x
    -> SNat n
    -> Forest n (Ones x) a
    -> Children n a
    -> Forest 'Zero (Ones m) a
```

I will not always go into detail on how the lemmas apply but let’s do it here nonetheless.

For the base case, we simply want to return our accumulator. However, our accumulator has the type `Forest n (Ones x)` and we expect something of the type `Forest n (Ones m)`. Furthermore, we know that:

```
n ~ 'Zero, m ~ NAdd x n
⊢ m ~ NAdd x 'Zero
```

We need to prove that `x ~ m` in order to do the cast from `Forest n (Ones x)` to `Forest n (Ones m)`.

We can do so by applying `lemma1` to `x` (the latter represented here by `xnat`). This gives us `lemma1 xnat :: NAdd x 'Zero :~: x`. Combining this with what we already knew:

```
m ~ NAdd x 'Zero, NAdd x 'Zero ~ x
⊢ m ~ x
```

…which is what we needed to know.
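In code, the base case presumably looks like this (reconstructed to match the inductive case shown below):

```
childrenToForest_go xnat SZero acc CZero =
    case lemma1 xnat of QED -> acc
```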

The inductive case is a bit harder and requires us to prove that:

```
m ~ NAdd x n, n ~ 'Succ k
⊢ Ones m ~ 'B1 (Ones (NAdd x k))
```

GHC does a great job and ends up with something like:

`Ones (NAdd x ('Succ k)) ~ 'B1 (Ones (NAdd x k))`

This only requires us to prove that the `'Succ` can be pulled out of `NAdd`’s second argument. You can see that proof in `lemma2` a bit further below. This case also illustrates well how we carry around the singletons as inputs for our lemmas and call on them when required.

```
childrenToForest_go xnat (SSucc nnat) acc (CCons tree children) =
    case lemma2 xnat nnat of
        QED -> childrenToForest_go
            (SSucc xnat)
            nnat
            (F1 tree acc)
            children
```

Proving `lemma2` is trivial… once you figure out what you need to prove and how all of this works.

It took me a good amount of time to put the different pieces together in my head. It is not only a matter of proving the lemma: restructuring the code in `childrenToForest_go` leads to different lemmas you can attempt to prove, and figuring out which ones are feasible is a big part of writing code like this.

```
lemma2 :: SNat n -> SNat m -> NAdd n ('Succ m) :~: 'Succ (NAdd n m)
lemma2 SZero _ = QED
lemma2 (SSucc n) m = case lemma2 n m of QED -> QED
```

These are some minor auxiliary functions we need to implement on `Vec`. We mention them here because we’ll also need two type families dealing with non-zeroness.

First, we need some sort of `map`, and we can do this by implementing the `Functor` typeclass.

Secondly, we need a very simple function to convert a `Vec` to a list. Note that this erases the information we have about the size of the list.

Using `vecToList`, we can build a function to convert a non-empty `Vec` to a `NonEmpty` list. This uses an additional `NNonZero` type family.
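These helpers are missing from this copy of the post; straightforward reconstructions would be:

```
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE GADTs          #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies   #-}

data Nat = Zero | Succ Nat

data Vec (n :: Nat) a where
    VNull :: Vec 'Zero a
    VCons :: a -> Vec n a -> Vec ('Succ n) a

-- Mapping preserves the length of the vector.
instance Functor (Vec n) where
    fmap _ VNull       = VNull
    fmap f (VCons x v) = VCons (f x) (fmap f v)

-- Converting to a plain list erases the size information.
vecToList :: Vec n a -> [a]
vecToList VNull       = []
vecToList (VCons x v) = x : vecToList v

-- A natural number is non-zero iff it has at least one 'Succ.
type family NNonZero (n :: Nat) :: Bool where
    NNonZero 'Zero     = 'False
    NNonZero ('Succ n) = 'True
```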

```
vecToNonEmpty :: NNonZero n ~ 'True => Vec n a -> NonEmpty a
vecToNonEmpty (VCons x v) = x :| vecToList v
```

Non-zeroness can be defined on binary numbers as well:

```
type family BNonZero (b :: Binary) :: Bool where
BNonZero 'BEnd = 'False
BNonZero ('B1 b) = 'True
BNonZero ('B0 b) = BNonZero b
```

You might be asking why we cannot use a simpler type, such as:
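The snippet is lost here, but from the discussion that follows it was presumably a version indexed directly by `'Succ` (a reconstruction):

```
vecToNonEmpty :: Vec ('Succ n) a -> NonEmpty a
```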

If we use this, we run into trouble when trying to prove that a `Vec` is not empty later on. We would have to construct a singleton for `n`, and we only have something that looks a bit like `∃n. 'Succ n`. Trying to get the `n` out of that requires some form of non-zeroness constraint… which would be exactly what we are trying to avoid by using the simpler type. ^{5}

The minimal element will always be the root of one of our trees. That means we have as many choices for our minimal element as there are trees in our heap. We need some way to write down this number as a type.

Since we have a tree for every *1* in our binary number, we can define the number of trees as the popcount of the binary number.

In a weird twist of fate, you can also pretend this stands for “the count of trees which we can *pop*”, which is exactly what we will be using it for.

```
type family Popcount (b :: Binary) :: Nat where
    Popcount 'BEnd   = 'Zero
    Popcount ('B1 b) = 'Succ (Popcount b)
    Popcount ('B0 b) = Popcount b
```

`Popcount` can be used to relate the non-zeroness of a binary number to the non-zeroness of a natural number.

```
lemma3
    :: BNonZero b ~ 'True
    => SBin b
    -> NNonZero (Popcount b) :~: 'True
lemma3 (SB1 _) = QED
lemma3 (SB0 b) = case lemma3 b of QED -> QED
```

In addition to caring about the `popcount` of a binary number, we are sometimes interested in its `width` (number of bits). This is also easily captured in a type family:

```
type family Width (binary :: Binary) :: Nat where
    Width 'BEnd        = 'Zero
    Width ('B0 binary) = 'Succ (Width binary)
    Width ('B1 binary) = 'Succ (Width binary)
```

That is a fair number of type families so far. To make things a bit more clear, here is an informal visual overview of all the type families we have defined, including `BDec` (binary decrement, defined further below).

Now, popping the smallest element from the heap first involves cutting a single tree from the forest inside the heap. We take the root of that tree and merge the children of the tree back together with the original heap.

However, just selecting (and removing) a single tree turns out to be quite an endeavour on its own. We define an auxiliary GADT which holds the tree, the remainder of the heap, and most importantly, a lot of invariants.

Feel free to scroll down to the datatype from here if you are willing to assume the specific constraint and types are there for a reason.

The first two fields are simply evidence singletons that we carry about. `k` stands for the same concept as in `Forest`; it means we are starting with an order of `k`. `x` stands for the index of the tree that was selected.

This means the tree that was selected has an order of `NAdd k x`, as we can see in the third field. If the remainder of the heap is `Forest k b a`, its shape is denoted by `b`, and we can reason about the shape of the original heap.

The children of the tree (`Tree (NAdd k x) a`) that was selected will convert to a heap of shape `Ones x`. We work backwards from that to try and write down the type for the *original* heap. The tree (`Tree (NAdd k x) a`) would form a singleton heap of shape `BInc (Ones x)`. The remainder (i.e., the forest with this tree removed) had shape `b`, so we can deduce that the original shape of the forest must have been `BAdd b (BInc (Ones x))`.

Finally, we restructure that type to `BInc (BAdd b (Ones x))`. The restructuring is trivially allowed by GHC since it just requires applying the necessary type families. The restructured type turns out to be more easily usable in the places where we case-analyse `CutTree` further down in this blogpost.

We also carry a constraint here that seems very arbitrary and relates the widths of two binary numbers. It is easier to understand from an intuitive point of view: the new (merged) heap has the same width as the original heap. Why is it here?

Well, it turns out we will need this fact further down in a function definition. If we can conclude it here by construction in the GADT, we avoid having to prove it further down.

Of course, I know that I will need this further down because I already have the code compiling. When writing this, there is often a very, very painful dialogue in between different functions and datatypes, where you try to mediate by making the requested and expected types match by bringing them closer together step by step. In the end, you get a monstrosity like:

```
data CutTree (k :: Nat) (b :: Binary) a where
    CutTree
        :: Width (BAdd b (Ones x)) ~ Width (BInc (BAdd b (Ones x)))
        => SNat x
        -> SNat k
        -> Tree (NAdd k x) a
        -> Forest k b a
        -> CutTree k (BInc (BAdd b (Ones x))) a
```

Fortunately, this type is internal only and doesn’t need to be exported.

`lumberjack_go` is the worker function that takes all possible trees out of a heap. For every *1* in the shape of the heap, we have a tree: therefore it should not be a surprise that the length of the resulting vector is `Popcount b`.

The definition is recursive and a good example of how recursion corresponds with inductive proofs (we’re using `lemma1` and `lemma2` here). We don’t go into much detail with our explanation here – this code is often hard to write, but surprisingly easy to read.

```
-- The resulting vector has one candidate per tree, i.e. length Popcount b.
lumberjack_go
    :: SNat k
    -> Forest k b a
    -> Vec (Popcount b) (CutTree k b a)
lumberjack_go _ FEnd = VNull
lumberjack_go nnat0 (F0 forest0) = fmap
    (\cutTree -> case cutTree of
        CutTree xnat (SSucc nnat) t1 forest1 -> CutTree
            (SSucc xnat)
            nnat
            (case lemma2 nnat xnat of QED -> t1)
            (F0 forest1))
    (lumberjack_go (SSucc nnat0) forest0)
lumberjack_go nnat0 (F1 tree0 forest0) = VCons
    (CutTree
        SZero
        nnat0
        (case lemma1 nnat0 of QED -> tree0)
        (F0 forest0))
    (fmap
        (\cutTree -> case cutTree of
            CutTree xnat (SSucc nnat) t1 forest1 -> CutTree
                (SSucc xnat)
                nnat
                (case lemma2 nnat xnat of QED -> t1)
                (F1 tree0 forest1))
        (lumberjack_go (SSucc nnat0) forest0))
```

Now that we can select `Popcount b` trees, it’s time to convert this to something more convenient to work with. We will use a `NonEmpty` list to represent our candidates to select from.

```
lumberjack
    :: forall b a. BNonZero b ~ 'True
    => Forest 'Zero b a
    -> NonEmpty.NonEmpty (CutTree 'Zero b a)
```

First, we select the `Popcount b` trees:

```
lumberjack trees =
    let cutTrees :: Vec (Popcount b) (CutTree 'Zero b a)
        cutTrees = lumberjack_go SZero trees in
```

Then we convert it to a `NonEmpty`. This requires us to call `lemma3` (the proof that relates non-zeroness of a binary number with non-zeroness of a natural number through popcount). We need an appropriate `SBin` to call `lemma3`, and the auxiliary function `forestSingleton` defined just below does that for us.
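The body that follows the `let` is missing from this copy; it presumably applies `lemma3` to the forest’s singleton and converts the vector (a sketch, continuing the `in` above):

```
    case lemma3 (forestSingleton trees) of
        QED -> vecToNonEmpty cutTrees
```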

This function is similar to `childrenSingleton` – it constructs an appropriate singleton we can use in proofs.

```
forestSingleton :: Forest k b a -> SBin b
forestSingleton FEnd = SBEnd
forestSingleton (F0 t) = SB0 (forestSingleton t)
forestSingleton (F1 _ t) = SB1 (forestSingleton t)
```

We can now find all trees in the heap that may be cut. They are returned in a `CutTree` datatype. If we assume that we are taking a specific `CutTree`, we can take the root from the tree inside this datatype, and we can construct a new heap from its children using `childrenToForest`. Then, we merge it back together with the original heap.

The new heap has one less element – hence we use `BDec` (binary decrement, defined just a bit below).

We deconstruct the `CutTree` to get the root (`x`) of the selected tree, the children of the selected tree (`children`), and the remaining trees in the heap (`forest`).

We construct a new forest from the children.

We merge it with the remainder of the heap:

The illustration from above applies here:

Now, we cast it to the result using a new `lemma4`, with a singleton that we construct from the trees:

```
evidence :: SBin (BAdd l (Ones r))
evidence = forestSingleton merged in
(x, case lemma4 evidence of QED -> merged)
```

This is the type family for binary decrement. It is partial, as expected – you cannot decrement zero. This is a bit unfortunate but necessary. Having the `BNonZero` type family and using it as a constraint solves that, though.

```
type family BDec (binary :: Binary) :: Binary where
    BDec ('B1 b) = 'B0 b
    BDec ('B0 b) = 'B1 (BDec b)
```

The weirdly specific `lemma4` helps us prove that we can take a number, increment and then decrement it, and get the same number back, *provided* incrementing doesn’t change its width. This ends up matching perfectly with the width constraint generated by `CutTree`, where the number that we increment is (intuitively) a number of “1”s smaller than the shape of the total heap.

Using another constraint in `CutTree` with another proof here should also be possible. I found it hard to reason about *why* this constraint is necessary, but once I understood that, it wasn’t too abnormal. The proof is easy though.

```
lemma4
    :: (Width x ~ Width (BInc x))
    => SBin x
    -> BDec (BInc x) :~: x
lemma4 (SB0 _) = QED
lemma4 (SB1 b) = case lemma4 b of QED -> QED
```

We don’t need to define a clause for `SBEnd` since `Width 'BEnd ~ Width (BInc 'BEnd)` does not hold.

Tying all of this together makes for a relatively easily readable `popHeap`:

Out of the different candidates, select the one with the minimal root (`minimumBy` is total on `NonEmpty`):

Pop that tree using `popForest`:

Helper to compare candidates by root:
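The snippets for these steps are lost from this copy; they fit together roughly as follows (a sketch – the helper name `candidateRoot` and the exact shape of `popForest` are assumptions):

```
popHeap
    :: forall b a. (BNonZero b ~ 'True, Ord a)
    => Heap b a -> (a, Heap (BDec b) a)
popHeap (Heap forest0) =
    let -- Select the candidate with the minimal root.
        cutTree = minimumBy (comparing candidateRoot) (lumberjack forest0)

        -- Pop that tree using popForest.
        (x, forest1) = popForest cutTree in
    (x, Heap forest1)
  where
    -- Compare candidates by the root of their selected tree.
    candidateRoot :: CutTree 'Zero b a -> a
    candidateRoot (CutTree _ _ (Tree root _) _) = root
```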

In GHCi:

```
*Main> let heap = pushHeap 'a' $ pushHeap 'b' $ pushHeap 'c' $
pushHeap 'd' $ pushHeap 'e' emptyHeap
*Main> :t heap
heap :: Heap ('B1 ('B0 ('B1 'BEnd))) Char
*Main> :t popHeap heap
popHeap heap :: (Char, Heap ('B0 ('B0 ('B1 'BEnd))) Char)
*Main> fst $ popHeap heap
'a'
*Main> snd $ popHeap heap
(no tree of order 0)
(no tree of order 1)
(tree of order 2)
'b'
'd'
'e'
'c'
```

Beautiful! Our final interface to deal with the heap looks like this:

```
emptyHeap
    :: Heap ('B0 'BEnd) a

pushHeap
    :: Ord a
    => a -> Heap b a -> Heap (BInc b) a

mergeHeap
    :: Ord a
    => Heap lb a -> Heap rb a -> Heap (BAdd lb rb) a

popHeap
    :: (BNonZero b ~ 'True, Ord a)
    => Heap b a -> (a, Heap (BDec b) a)
```

I would like to thank Alex Lang for many discussions about this and for proofreading, Akio Takano and Fumiaki Kinoshita for some whiteboarding, and Titouan Vervack and Becki Lee for many additional corrections.

I am by no means an expert in dependent types so while GHC can guarantee that my logic is sound, I cannot guarantee that my code is the most elegant or that my explanations are waterproof. In particular, I am a bit worried about the fact that binary numbers do not have unique representations – even though it does seem to make the code a bit simpler. If you have any ideas for improvements, however, feel free to reach out!

**Update**: Lars Brünjes contacted me and showed me a similar implementation he did for leftist heaps. You can see it in this repository. He uses a similar but unique representation of binary numbers, along the lines of:

I think it is actually more elegant than the representation I used. The only disadvantage is that it is a bit less concise (which is somewhat relevant for a blogpost), requiring two functions and two datatypes in most cases (e.g. a `Forest k Binary` and a `PForest k Positive`, with `mergeForests` and `mergePForests`, and so on). But if you want to use this idea in a real implementation, I encourage you to check that out.

Since we represent the proofs at runtime, we incur an overhead in two ways:

- Carrying around and allocating the singletons;
- Evaluating the lemmas to the `QED` constructor.

It should be possible to remove these at runtime once the code has been typechecked, possibly using some sort of GHC core or source plugin (or CPP in a darker universe).

Another existing issue is that the tail of the spine is never “cleaned up”: we never remove trailing `F0` constructors. This means that if you fill a heap with eight elements and remove all of them again, you will end up with a heap of zero elements that has the shape `'B0 ('B0 ('B0 ('B0 'BEnd)))` rather than `'B0 'BEnd`. However, this sufficed for my use case. It should be possible to add and prove a clean-up step, but it’s a bit outside the scope of this blogpost.

```
instance forall a b. Show a => Show (Heap b a) where
    show = intercalate "\n" . goTrees 0 . unHeap
      where
        goTrees :: forall m c. Show a => Int -> Forest m c a -> [String]
        goTrees _ FEnd = []
        goTrees order (F0 trees) =
            ("(no tree of order " ++ show order ++ ")") :
            goTrees (order + 1) trees
        goTrees order (F1 tree trees) =
            ("(tree of order " ++ show order ++ ")") :
            goTree " " tree ++
            goTrees (order + 1) trees

        goTree :: forall m. String -> Tree m a -> [String]
        goTree indentation (Tree x children) =
            (indentation ++ show x) :
            goChildren (' ' : indentation) children

        goChildren :: forall m. String -> Children m a -> [String]
        goChildren _ CZero = []
        goChildren indentation (CCons x xs) =
            goTree indentation x ++ goChildren indentation xs
```

Increment gets tricky mainly because we need some way to communicate the carry back in a right-to-left direction. We can do this with a type-level `Either` and some utility functions. It’s not too far from what we would write on the term level, although a bit more clunky. We want to avoid this kind of clunkiness where possible, since having significantly more code obviously requires significantly more proving.

```
type family Carry (b :: Binary) :: Either Binary Binary where
    Carry ('B1 'BEnd) = 'Left ('B0 'BEnd)
    Carry ('B0 'BEnd) = 'Right ('B1 'BEnd)
    Carry ('B0 b)     = 'Right (UnEither 'B1 'B0 (Carry b))
    Carry ('B1 b)     = MapEither 'B0 'B1 (Carry b)
```

```
type family MapEither
        (f :: a -> c) (g :: b -> d) (e :: Either a b) :: Either c d where
    MapEither f _ ('Left x)  = 'Left (f x)
    MapEither _ g ('Right y) = 'Right (g y)
```

```
type family UnEither
        (f :: a -> c) (g :: b -> c) (e :: Either a b) :: c where
    UnEither f _ ('Left x)  = f x
    UnEither _ g ('Right y) = g y
```

```
type family FromRight (f :: a -> b) (e :: Either a b) :: b where
    FromRight f ('Left x)  = f x
    FromRight _ ('Right y) = y
```
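To sanity-check these clauses, here is a direct term-level transcription (my own illustrative code; `Bin` and `carry` are made-up names, and the extra `BEnd` case exists only to keep the function total):

```
data Bin = BEnd | B0 Bin | B1 Bin deriving (Eq, Show)

-- A clause-by-clause mirror of the type-level Carry family above.
-- Judging from the clauses, Left appears when the carry propagates
-- past the last bit.
carry :: Bin -> Either Bin Bin
carry (B1 BEnd) = Left (B0 BEnd)
carry (B0 BEnd) = Right (B1 BEnd)
carry (B0 b)    = Right (either B1 B0 (carry b))             -- UnEither 'B1 'B0
carry (B1 b)    = either (Left . B0) (Right . B1) (carry b)  -- MapEither 'B0 'B1
carry BEnd      = Right (B1 BEnd)  -- not hit by the family; for totality
```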

For work, I recently put together an interpreter for a lambda calculus that was *way* faster than I expected it to be – around 30 times as fast. I suspected this meant that something was broken, so in order to convince myself of its correctness, I wrote a well-typed version of it in the style of Francesco’s well-typed suspension calculus blogpost. It used a standard length-indexed list, which had the unfortunate side effect of pushing me into *O(n)* territory for random access. I started looking for an asymptotically faster way to do this, which is how I ended up looking at heaps. In this blogpost, I am using the binomial heap as a priority queue rather than a bastardized random access skip list, since that is what readers are presumably more familiar with.↩

For reasons that will become clear later on, the binary numbers that pop up on the type level should be read right-to-left. A palindrome was chosen as example here to avoid having to explain that at this point.↩

This type and related utilities are found in `Data.Type.Equality`, but redefined here for educational purposes.↩

The datatype in `Data.Type.Equality` allows equality between heterogeneous kinds as well, but we don’t need that here. This saves us from having to toggle on the “scary” `{-# LANGUAGE TypeInType #-}`.↩

I’m not sure if it is actually *impossible* to use this simpler type, but I did not succeed in finding a proof that uses this simpler type.↩

I’d like to talk about a design pattern in Haskell that I’ve been calling *the Handle pattern*. This is far from novel – I’ve mentioned this before and the idea is definitely not mine. As far as I know, in fact, it has been around since basically forever^{1}. Since it is ridiculously close to what we’d call *common sense*^{2}, it’s often used without giving it any explicit thought.

I first started more consciously using this pattern when I was working together with Simon Meier at Better (aka erudify). Simon did a writeup about this pattern as well. But as I was explaining this idea again at last week’s HaskellerZ meetup, I figured it was time to do an update of that article.

The *Handle pattern* allows you to write stateful applications that interact with external services in Haskell. It complements pure code (e.g. your business logic) well, and it is somewhat the result of iteratively applying the question:

- Can we make it simpler?
- Can we make it simpler still?
- And can we still make it simpler?

The result is a powerful and simple pattern that does not even require Monads^{3} or Monad transformers to be useful. This makes it extremely suitable for beginners trying to build their first medium-sized Haskell application. And that does not mean it is beginners-only: this technique has been applied successfully at several Haskell companies as well.

In Haskell, we try to capture ideas in beautiful, pure and mathematically sound patterns, for example *Monoids*. But at other times, we can’t do that. We might be dealing with some inherently mutable state, or we are simply dealing with external code which doesn’t behave nicely.

In those cases, we need another approach. What we’re going to describe feels suspiciously similar to Object Oriented Programming:

- Encapsulating and hiding state inside objects
- Providing methods to manipulate this state rather than touching it directly
- Coupling these objects together with methods that modify their state

As you can see, it is not exactly the same as Alan Kay’s original definition of OOP^{4}, but it is far from the horrible incidents that permeate our field such as UML, abstract factory factories and broken subtyping.

Before we dig in to the actual code, let’s talk about some disclaimers.

Pretty much any sort of Haskell code can be written in this particular way, but *that doesn’t mean that you should*. This method relies heavily on `IO`. Whenever you can write things in a pure way, you should attempt to do that and avoid `IO`. This pattern is only useful when `IO` is required.

Secondly, there are many alternatives to this approach: complex monad transformer stacks, interpreters over free monads, uniqueness types, effect systems… I don’t want to claim that this method is better than the others. All of these have advantages and disadvantages, so one must always make a careful trade-off.

For this pattern, we’ve got a very well-defined module layout. I believe this helps with recognition which I think is also one of the reasons we use typeclasses like *Monoid*.

When I’m looking at the documentation of libraries I haven’t used yet, the types will sometimes look a bit bewildering. But then I see that there’s an `instance Monoid`

. That’s an “Aha!” moment for me. I *know* what a Monoid is. I *know* how they behave. This allows me to get up to speed with this library much faster!

Using a consistent module layout in a project (and even across projects) provides, I think, very similar benefits to that. It allows new people on the team to learn parts of the codebase they are yet unfamiliar with much faster.

Anyway, let’s look at the concrete module layout we are proposing with this pattern. As an example, let’s consider a database. The type in which we are encapsulating the state is *always* called `Handle`. That is because we design for qualified import.

We might have something like:

```
module MyApp.Database

data Handle = Handle
    { hPool   :: Pool Postgres.Connection
    , hCache  :: IORef (PSQueue Int Text User)
    , hLogger :: Logger.Handle  -- Another handle!
    , …
    }
```

The internals of the `Handle` typically consist of static fields and other handles, `MVar`s, `IORef`s, `TVar`s, `Chan`s… With our `Handle` defined, we are able to define functions using it. These are usually straightforward imperative pieces of code and I’ll omit them for brevity^{5}:

```
module MyApp.Database where

data Handle = …

createUser :: Handle -> Text -> IO User
createUser = …

getUserMail :: Handle -> User -> IO [Mail]
getUserMail = …
```

Some thoughts on this design:

- We call our function `createUser` rather than `databaseCreateUser`. Again, we’re working with qualified imports so there’s no need for “C-style” names.

- **All functions take the `Handle` as the first argument.** This is very important for consistency, but also for polymorphism and code style. With code style, I mean that the `Handle` is often a syntactically simpler expression (e.g. a name) than the other arguments (which are often composed expressions).

- Other `Handle`s (e.g. `Logger.Handle`) are stored in a field of our `Database.Handle`. You could also remove the field and instead pass the logger as an argument wherever it is needed. I usually prefer to put it inside the `Handle` since that reduces the number of arguments required for functions such as `createUser`. However, if the lifetime of a `Logger.Handle` is very short^{6}, or if you want to reduce the amount of dependencies for `new`, then you could consider passing it explicitly instead.

- The datatypes such as `Mail` may be defined in this module and may even be specific to this function. I’ve written about ad-hoc datatypes before.

I mentioned before that an important advantage of using these patterns is that programmers become “familiar” with it. That is also the goal we have in mind when designing our API for the creation of `Handle`s.

In addition to always having a type called `Handle`, we’ll require the module to always have a type called `Config`. This is where we encode our static configuration parameters – and by static I mean that we shouldn’t have any `IORef`s or the like here: this `Config` should be easy to create from pure code.

We can also offer some way to create a `Config`. This really depends on your application. If you use the configurator library, you might have something like:

On the other hand, if you use aeson or yaml, you could write:

You could even use a Monoid to support loading configurations from multiple places. But I digress – the important part is that there is a type called `Config`.
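The library-specific snippets are not reproduced above; as a dependency-free stand-in, here is a sketch of a `Config` plus a parser for a simple `key = value` format (all field and key names are illustrative, not from the original post):

```
import Data.Char (isSpace)

-- Hypothetical static configuration for a database handle.
data Config = Config
    { cConnectionString :: String
    , cPoolSize         :: Int
    } deriving (Eq, Show)

-- Parse a tiny "key = value" file format; real code would use
-- configurator, aeson or yaml instead.
parseConfig :: String -> Either String Config
parseConfig input = do
    conn <- lookupKey "connection_string"
    size <- lookupKey "pool_size"
    Right (Config conn (read size))
  where
    kvs =
        [ (trim k, trim (drop 1 v))
        | line <- lines input
        , not (null (trim line))
        , let (k, v) = break (== '=') line
        ]
    lookupKey k = maybe (Left ("missing " ++ k)) Right (lookup k kvs)
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse
```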

Next is a similar pattern: in addition to always having a `Config`, we’ll also always provide a function called `new`. The parameters follow a similarly strict pattern:

```
new :: Config           -- 1. Config
    -> Logger.Handle    -- 2. Dependencies
    -> …                -- (usually other handles)
    -> IO Handle        -- 3. Result
```

Inside the `new` function, we can create some more `IORef`s, file handles, caches… if required and then store them in the `Handle`.
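As a tiny self-contained illustration of the whole shape – a made-up rate-limiter service, not the database from the text – a complete module following the pattern might look like:

```
import Data.IORef

-- Static configuration: easy to build from pure code.
newtype Config = Config {cLimit :: Int}

-- The Handle encapsulates the mutable state.
data Handle = Handle
    { hConfig :: Config
    , hCount  :: IORef Int
    }

new :: Config -> IO Handle
new config = Handle config <$> newIORef 0

-- An operation on the handle: count an action, refusing once the
-- configured limit has been reached.
tick :: Handle -> IO Bool
tick h = do
    n <- readIORef (hCount h)
    if n >= cLimit (hConfig h)
        then return False
        else do
            writeIORef (hCount h) (n + 1)
            return True
```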

We’ve talked about creation of a `Handle`, and we mentioned the normal functions operating on a `Handle` (e.g. `createUser`) before. So now let’s consider the final stage in the lifetime of a `Handle`.

Haskell is a garbage collected language and we can let the runtime system take care of destroying things for us – but that’s not always a great idea. Many resources (file handles in particular come to mind as an example) are scarce.

There is quite a strong correlation between scarce resources and things you would naturally use a `Handle` for. That’s why I recommend always providing a `close` as well, even if it does nothing. This is a form of forward compatibility in our API: if we later decide to add some sort of log files (which need a `close`), we can do so without individually mailing all our module users that they now need to add a `close` to their code.

When you’re given a `new` and `close`, it’s often tempting to add an auxiliary function like:

```
withHandle
    :: Config            -- 1. Config
    -> Logger.Handle     -- 2. Dependencies
    -> …                 -- (usually other handles)
    -> (Handle -> IO a)  -- 3. Function to apply
    -> IO a              -- 4. Result, handle is closed automatically
```

I think this is a great idea. In fact, it’s sometimes useful to *only* provide the `withHandle` function, and hide `new` and `close` in an internal module.

The only caveat is that the naive implementation of this function:

```
withHandle config dep1 dep2 … depN f = do
    h <- new config dep1 dep2 … depN
    x <- f h
    close h
    return x
```

is **wrong**! In any sort of `withXyz` function, you should always use `bracket` to guard against exceptions. This means the correct implementation is:
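The original one-liner is not shown above; the gist is `bracket (new …) close`. Here is a self-contained toy (hypothetical handle, not the database from the text) demonstrating that `close` runs even when the body throws:

```
import Control.Exception (SomeException, bracket, throwIO, try)
import Data.IORef

-- A toy handle that merely remembers whether it was closed.
newtype Handle = Handle {hClosed :: IORef Bool}

new :: IO Handle
new = Handle <$> newIORef False

close :: Handle -> IO ()
close h = writeIORef (hClosed h) True

-- bracket guarantees that close runs, exception or not.
withHandle :: (Handle -> IO a) -> IO a
withHandle = bracket new close
```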

Well, it’s even shorter! In case you want more information on why `bracket` is necessary, this blogpost gives a good in-depth overview. My summary of it as it relates to this article would be:

- Always use `bracket` to match `new` and `close`
- You can now use `throwIO` and `killThread` safely

It’s important to note that `withXyz` functions do not provide complete safety against things like use-after-close or double-close. There are many interesting approaches to fix these issues but they are *way* beyond the scope of this tutorial – things like Monadic Regions and The Linearity Monad come to mind. For now, we’ll rely on `bracket` to catch common issues and on code reviews to catch team members who are not using `bracket`.

If we quickly summarise the module layout, we now have:

```
module MyApp.Database
    ( Config (..)  -- Internals exported
    , parseConfig  -- Or some other way to load a config

    , Handle       -- Internals usually not exported
    , new
    , close
    , withHandle

    , createUser   -- Actual functions on the handle
    , …
    ) where
```

This is a well-structured, straightforward and easy to learn organisation. Most of the `Handle`s in any application should probably look this way. In the next section, we’ll see how we can build on top of this to create dynamic, customizable `Handle`s.

It’s often important to split between the interface and implementation of a service. There are countless ways to do this in programming languages. For Haskell, there is:

- Higher order functions
- Type classes and type families
- Dictionary passing
- Backpack module system
- Interpreters over concrete ASTs
- …

The list is endless. And because Haskell on one hand makes it so easy to abstract over things, and on the other hand makes it possible to abstract over pretty much anything, I’ll start this section with a disclaimer.

*Premature* abstraction is a real concern in Haskell (and many other high-level programming languages). It’s easy to quickly whiteboard an abstraction or interface and unintentionally end up with completely the wrong thing.

It usually goes like this:

1. You need to implement a bunch of things that look similar
2. You write down a typeclass or another interface-capturing abstraction
3. You start writing the actual implementations
4. One of them doesn’t *quite* match the interface so you need to change it two weeks in
5. You add another parameter, or another method, mostly for one specific implementation
6. This causes some problems or inconsistencies for the other implementations
7. Go back to (4)

What you end up with is a leaky abstraction that is the *product* of all concrete implementations – where what you really wanted is the *greatest common divisor*.

There’s no magic bullet to avoid broken abstractions, so my advice is usually to first painstakingly do all the different implementations (or at least a few of them). *After* you have something working and you have emerged victorious from horrible battles with the guts of these implementations, *then* you can start looking at what the different implementations have in common. At that point, you’ll also be a bit wiser about where they differ – and you’ll be able to take these important details into account, rather than being an idiot drawing squares and arrows on a whiteboard.

This is why I recommend sticking with simple `Handle`s until you really need it. But naturally, sometimes we really need the extra power.

So let’s do the simplest thing that can possibly work. Consider the following definition of the `Handle` we discussed before:

```
module MyApp.Database
    ( Handle (..)  -- We now need to export this
    ) where

data Handle = Handle
    { createUser :: Text -> IO User
    , …
    }
```

What’s the type of `createUser` now?

It’s exactly the same as before! This is pretty much a requirement: it means we can move our `Handle`s to this approach when we need it, not when we envision that we will need it at some point in the future.

We can now create a concrete implementation for this abstract `Handle` type. We’ll do this in a module like `MyApp.Database.Postgres`.

```
module MyApp.Database.Postgres where

import MyApp.Database

data Config = …

new :: Config -> Logger.Handle -> … -> IO Handle
```

The `Config` datatype and the `new` function have now moved to the implementation module, rather than the interface module.

Since we can have any number of implementation modules, it is worth mentioning that we will have multiple `Config` types and `new` functions (exactly one of each per implementation). Configurations are always specific to the concrete implementation. For example, an sqlite database may just have a `FilePath` in the configuration, but our `Postgres` implementation will have other details such as port, database, username and password.

In the implementation of `new`, we simply initialize a `Handle`:

```
new config dep1 dep2 … depN = do
    -- Initialization of things inside the handle
    …

    -- Construct record
    return Handle
        { createUser = \name -> do
            …
        , …
        }
```

Of course, we can manually float out the body of `createUser`, since constructing these large records gets kind of ugly.
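For instance – using an illustrative in-memory key-value handle with made-up names, not the database above – the function bodies can live at the top level and get partially applied in `new`:

```
import Data.IORef

-- The abstract handle: a record of functions.
data Handle = Handle
    { putKey :: String -> Int -> IO ()
    , getKey :: String -> IO (Maybe Int)
    }

-- A concrete in-memory implementation; the bodies are floated out
-- into top-level functions instead of inline lambdas.
newInMemory :: IO Handle
newInMemory = do
    ref <- newIORef []
    return Handle
        { putKey = memPut ref
        , getKey = memGet ref
        }

memPut :: IORef [(String, Int)] -> String -> Int -> IO ()
memPut ref k v = modifyIORef ref ((k, v) :)

memGet :: IORef [(String, Int)] -> String -> IO (Maybe Int)
memGet ref k = lookup k <$> readIORef ref
```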

We’ve presented an approach to modularize the effectful layer of medium- to large-scaled Haskell applications. There are many other approaches to tackling this, so any comparison I come up with would probably be inexhaustive.

Perhaps the most important advantage of using `Handle`s is that they are first-class values that we can freely mix and match. This often does not come for free when using more exotic strategies.

Consider the following type signature from a Hackage package – and I do not mean to discredit the author; the package works fine but simply uses a different approach than my personal preference:

```
-- | Create JSON-RPC session around conduits from transport layer.
-- When context exits session disappears.
runJsonRpcT
    :: (MonadLoggerIO m, MonadBaseControl IO m)
    => Ver                   -- ^ JSON-RPC version
    -> Bool                  -- ^ Ignore incoming requests/notifs
    -> Sink ByteString m ()  -- ^ Sink to send messages
    -> Source m ByteString   -- ^ Source to receive messages from
    -> JsonRpcT m a          -- ^ JSON-RPC action
    -> m a                   -- ^ Output of action
```

I’m a fairly experienced Haskeller and it still takes me a bit of eye-squinting to see how this will fit into my application, especially if I want to use this package with other libraries that do not use the `Sink`/`Source` or `MonadBaseControl` abstractions.

It is somewhat obvious that one running call to `runJsonRpcT` corresponds to being connected to one JSON-RPC endpoint, since it takes a single sink and source. But what if we want our application to be connected to multiple endpoints at the same time?

What if we need to have hundreds of thousands of these, and we want to store them in some priority queue, only considering the most recent ones in the general case? How would you go about that?

You could consider running a lightweight thread for every `runJsonRpcT`, but that means you now need to worry about thread overhead, communicating exceptions between threads and killing the threads after you remove them. Whereas with first-class handles, we would just have a `HashPSQ Text Int JsonRpc.Handle`, which is much easier to reason about.

So, I guess one of the oldest and most widely used approaches is MTL-style monad transformers. This uses a hierarchy of typeclasses to represent access to various subsystems.

I love working with MTL-style transformers in the case of pure code, since they often allow us to express complex ideas concisely. For effectful code, on the other hand, they do not seem to offer many advantages and often make it harder to reason about code.

My personal preference for writing complex effectful code is to reify the effectful operations as a datatype and then write pure code manipulating these effectful operations. An interpreter can then simply use the `Handle`s to perform the effects. For simpler effectful code, we can just use `Handle`s directly.

I have implemented a number of these patterns in the (ever unfinished) example web application fugacious, in case you want to see them in action or if you want a more elaborate example than the short snippets in this blogpost. Finally, I would like to thank Alex Lang and Nicolas Mattia for proofreading, and Titouan Vervack for many corrections and typos.

Well, `System.IO.Handle` has definitely been around for a while.↩

If you’re reading this article and you’re thinking: *“What does this guy keep going on about? This is all so obvious!”* – Well, that’s the point!↩

It does require `IO`, but we don’t require thinking about `IO` as a `Monad`. If this sounds weird – think of lists. We work with lists all the time but we just consider them lists of things; we don’t constantly call them “List Monad” or “The Free Monoid” for that matter.↩

And indeed, we will touch on a common way of encoding OOP in Haskell – creating explicit records of functions – but we’ll also explain why this isn’t always necessary.↩

If you want to see a full example, you can refer to this repository that I have been using to teach practical Haskell.↩

With a short lifetime I mean you would create a new `Logger.Handle` for every call to `createUser`. But even in that case you could consider turning `Logger.Handle` into something like a resource pool, from which you could request a new concrete logging interface to log things. It really depends on your use case in the end…↩

The slides can be found here.

It’s a talk aimed towards beginners. If you are writing a medium-sized Haskell application for the very first time, you will typically end up with three modules: `Types.hs`, `Utils.hs` and `Main.hs`. While this is a very clear split, it typically doesn’t scale very well as applications become larger.

I try to answer some questions like:

- When is it a good idea to use something like Monad/Applicative (and when is it not)?
- When is it a good idea to invent my own typeclass (and when is it not)?
- How do I design interfaces and services like in OOP?

Thanks again to Skills Matter for putting together this excellent conference.

This is a small write-up of a fun Haskell project that Andras Slemmer, Francesco Mazzoli and I worked on during ZuriHac 2017.

I work with Haskell professionally, and it can be hard to motivate myself to work on similar stuff during Hackathons. This year I also had a large part in organising the Hackathon, so there was little room to take on a serious project.

This is why I joined in the fun of creating a deliberately silly thing during this Hackathon (aside from still trying to help people out as much as possible).

This year, we decided to implement something in the style of *Twitch Plays Pokémon*. Rather than picking a slow, turn-based game such as Pokémon, however, we wanted to try the same thing for a fast game, such as a platformer.

For the impatient, here is a quick preview:

Here is the video I actually meant to upload but twitter cropped. #zuriHac plays Mario pic.twitter.com/Ku76aH0OpN

— cocharles (@acid2) June 12, 2017

The core design question of the project is how to handle keypresses and aggregate them when you have many concurrent users. *Twitch Plays Pokémon* solved this problem in two distinct modes:

Anarchy mode (the default): any user keypress is sent directly to the game.

Democracy mode: there is a voting window, after which the most popular user keypress is selected and sent to the game.

With a majority vote, players could switch between the modes.

There are a bunch of reasons why this does not work great for faster games:

- Action games typically need you to hold a key for a certain amount of time, rather than just pressing and then releasing it (e.g. Mario jumps higher if you hold `jump` for longer).

- There’s little time to switch between modes in a fast-paced game.

- Many games require you to press more than one key at the same time (e.g. `jump` and `right` in Mario).

- …

We solved this by putting a key voting algorithm in place to aggregate the key events from the users. We think our algorithm should work well with most games, if the parameters are tweaked a bit.

First, imagine that we are looking at every key independently. For a given key, we might receive the following input from users, where a block means that the key is pressed:

We divide the time into sample intervals. The length of the sample interval can be tweaked per game. Let’s imagine it is 10ms for our example.

Every key press is expanded to match the sample interval first. This gives us something like:

We can aggregate them according to a threshold. This is another parameter that can be tweaked per game. In our example, we can set this threshold to 0.5. This means that 50% of users must be pressing a key before we consider it pressed. Concretely, for our 3 users, that means that at least two people must be pressing the key. This gives us the following aggregates:

After we’ve aggregated the key presses, we can send the result to the game. It’s important to note that this happens one sample interval *after* the actual user keypresses, since you are not easily able to make any conclusions before the interval has ended. This adds some latency but we didn’t find this a problem in practice for the games we tried.
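The aggregation step for a single key and a single sample interval can be sketched like this (my own simplification, not the Hackathon code):

```
-- For one key and one sample interval: userStates records, per
-- active user, whether that user held the key down at any point
-- during the interval.  The key counts as pressed when the fraction
-- of users holding it reaches the threshold.
aggregateKey :: Double -> [Bool] -> Bool
aggregateKey threshold userStates =
    not (null userStates) &&
    fromIntegral pressed >= threshold * fromIntegral (length userStates)
  where
    pressed = length (filter id userStates)
```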

Apart from that, we added two more complications to make the experience smoother:

- We look at all keys independently using the algorithm above, but before we decide on the final output, we take *key groups* into account.

  In *Super Mario World*, if you press `left` and `right` at the same time, Mario does not move. That is a problem: if the threshold is set to 0.2, 30% of people are pressing `left`, and 40% of people are pressing `right`, we would expect Mario to move right. However, using our naive algorithm, nothing happens.

  This is why we added key groups. A key group is a set of keys out of which at most one can be pressed. For example, `{left, right}` forms such a key group for Mario. We select the most popular key if there are multiple candidates within a group (`right` in the example).

- There is a timeout timer for activity per user. If a user does not press a key in a while, they are considered inactive, and are not counted towards the total amount of users. This prevents people who load the page but don’t participate from influencing the game too much.
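The key-group selection described above can be sketched as follows (illustrative code, not the actual implementation):

```
import Data.List (maximumBy)
import Data.Ord (comparing)

-- votes: how many users currently press each key; group: mutually
-- exclusive keys (e.g. ["left", "right"]).  Returns the most popular
-- key in the group, if any key in the group has votes at all.
resolveGroup :: [(String, Int)] -> [String] -> Maybe String
resolveGroup votes group =
    case [(k, n) | (k, n) <- votes, k `elem` group, n > 0] of
        []         -> Nothing
        candidates -> Just (fst (maximumBy (comparing snd) candidates))
```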

That takes care of the key logic component, so now let’s look at the stack.

It’s all pretty self-explanatory:

- Users open an HTML page which contains a JavaScript keylogger. We send the `KeyPress` and `KeyRelease` events to the server. On mobile, people can use a touchscreen interface which sends the same events.

- The server runs the key voting algorithm we discussed before and sends the aggregated `KeyPress` and `KeyRelease` events to any connected sinks.

- The main sink we implemented just executes the events using the `XTestFakeInput` call from xtest. In our case, the sink ran on the same machine as the server (my laptop).

We played *Super Mario World* with around 60 people on local Wi-Fi. We required 40% of people to press a key for the voting, in 10ms sampling windows. The system performed very smoothly, although the same cannot be said about the collaboration between users.

Thanks for joining in the fun! The code for our project can be found here.

Collaborative Mario at @ZuriHac #zurihac2017 fun! :-) pic.twitter.com/6h3ulZRoZH

— Simon Thompson (@thompson_si) June 11, 2017

Haskell is very commonly used to implement DSLs. When you implement one of these DSLs, the focus is usually performance or correctness. Something that’s not mentioned often is the “quality” of error messages.

In this talk I talked through some techniques to improve various kinds (parser, renamer, modules, interpreter, typechecker) of error messages and discuss how that impacts user experience.

The recording can be watched on their website. Unfortunately I cannot embed it here, but at least you can watch it without creating an account.

The slides can be found here.

Today, I released version 0.11 of the Haskell websockets library. Minor changes include:

- Support for IPv6 in the built-in server, client and tests (thanks to agentm).
- Faster masking (thanks to Dmitry Ivanov).

But most importantly, this release adds support for the `permessage-deflate` extension as described in RFC 7692. A big thanks goes out to Marcin Tolysz, who first submitted patches for an implementation. Unfortunately, merging these patches turned out to be a rocky road.

After merging all these changes and improving upon them, I’m very happy that the library now passes the Autobahn Testsuite. This language-agnostic testsuite is very extensive and contains test cases covering most of the protocol surface.

When I started running this testsuite against the websockets library, it was very encouraging to learn that – apart from a few corner cases – it was already passing most of this testsuite without any additional work.

The majority of failing tests were caused by the problem that the Haskell websockets library was *too lenient*: it would accept invalid UTF-8 characters when reading the messages as a `ByteString`. The RFC, however, dictates that a server should immediately close the connection if the client sends invalid UTF-8.

This has now been rectified, but as an *opt-in*, in order not to break any existing applications using this library. For example, in order to enable the new compression and UTF-8 validation, you can use something like:

```
main :: IO ()
main = WS.runServerWith "0.0.0.0" 9001 options application
  where
    options = WS.defaultConnectionOptions
        { WS.connectionCompressionOptions =
            WS.PermessageDeflateCompression WS.defaultPermessageDeflate
        , WS.connectionStrictUnicode = True
        }
```

Note that if you are already decoding the incoming messages to `Text` values through the `WebSocketsData` interface, enabling `connectionStrictUnicode` should not add any additional overhead. On the other hand, if your application is a proxy which just takes the `ByteString`s and sends them through, enabling UTF-8 validation will of course come at a price.

In a future release, `permessage-deflate` will be enabled by default.

I realise that this blog has been a little quiet lately. This is because (aside from work being busy) I’ve been involved in some other things:

I have been co-organising Summer of Haskell 2017. If you are a student, and you would like to make some money while contributing to the Haskell ecosystem this summer, please think about applying. If you are a (Haskell) project maintainer and you would like to mentor a student, please consider adding an idea for a proposal.

I am also co-organising ZuriHac 2017. It looks like this event will be the largest Haskell Hackathon ever, with over 250 participants. Unfortunately, the event is now full, but if you’re interested you can still register – we will add you to the waiting list. We’ve seen about 10% cancellations each year, so there is still a good chance of getting into the event.

Lastly, I have been playing the new Zelda game, which has of course been a blast.

This post is about Haskell, and lazy I/O in particular. It is a bit longer than usual, so I will start with a high-level overview of what you can expect:

- We talk about how we can represent graphs in a *“shallow embedding”*. This means we will not use a dedicated `Graph` type and rather represent edges by directly referencing other Haskell values. This is a fairly good match when we want to encode infinite ^{1} graphs: when dealing with infinite graphs, there is no need to “reify” the graph and enumerate all the nodes and edges – this would be futile anyway.

- We discuss a Haskell implementation of shortest path search in a weighted graph that works on these infinite graphs and that has good performance characteristics.

- We show how we can implement lazy I/O to model infinite graphs as pure values in Haskell, in a way that only the “necessary” parts of the graph are loaded from a database. This is done using the `unsafeInterleaveIO` primitive.

- Finally, we discuss the disadvantages of this approach as well, and we review some of the common problems associated with lazy I/O.

Let’s get to it!

As usual, this is a literate Haskell file, which means that you can just load this blogpost into GHCi and play with it. You can find the raw `.lhs` file here.

```
import Control.Concurrent.MVar (MVar, modifyMVar, newMVar)
import Control.Monad (forM_, unless)
import Control.Monad.State (State, gets, modify, runState)
import Data.Hashable (Hashable)
import qualified Data.HashMap.Strict as HMS
import qualified Data.HashPSQ as HashPSQ
import Data.Monoid ((<>))
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Database.SQLite.Simple as SQLite
import qualified System.IO.Unsafe as IO
```

As an example problem, we will look at finding the shortest path between cities in Westeros, the fictional location where the A Song of Ice and Fire novels (and HBO’s Game of Thrones) take place.

We model the different cities in a straightforward way. In addition to a unique ID used to identify them, they also have a name, a position (*X,Y* coordinates) and a list of reachable cities, with an associated time (in days) it takes to travel there. This travel time, also referred to as the *cost*, is not necessarily deducible from the sets of *X,Y* coordinates: some roads are faster than others.

```
data City = City
    { cityId         :: CityId
    , cityName       :: T.Text
    , cityPos        :: (Double, Double)
    , cityNeighbours :: [(Double, City)]
    }
```

Having direct access to the neighbouring cities, instead of having to go through `CityId`s, has both advantages and disadvantages.

On one hand, updating these values becomes cumbersome at best, and impossible at worst. If we wanted to change a city’s name, we would have to traverse all other cities to update possible references to the changed city.

On the other hand, it makes access more convenient (and faster!). Since we want a read-only view on the data, it works well in this case.

We will be using data extracted from got.show, conveniently licensed under a Creative Commons license. You can find the complete SQL dump here. The schema of the database should not be too surprising:

```
CREATE TABLE cities (
    id text PRIMARY KEY NOT NULL,
    name text NOT NULL,
    x float NOT NULL,
    y float NOT NULL
);

CREATE TABLE roads (
    origin text NOT NULL,
    destination text NOT NULL,
    cost float NOT NULL,
    PRIMARY KEY (origin, destination)
);

CREATE INDEX roads_origin ON roads (origin);
```

The road costs have been generated by multiplying the actual distances with a random number uniformly chosen between `0.6` and `1.4`. Cities have been (bidirectionally) connected to at least four closest neighbours. This ensures that every city is reachable.
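The generation scheme can be sketched in a couple of lines. This helper is not part of the post's code – `roadCost` is purely illustrative, with the random factor passed in as a parameter:

```haskell
-- Illustration only: a road cost is the Euclidean distance between two
-- positions, scaled by a factor that was drawn uniformly from
-- [0.6, 1.4] when the dataset was generated.
roadCost :: Double -> (Double, Double) -> (Double, Double) -> Double
roadCost factor (x1, y1) (x2, y2) =
    factor * sqrt ((x1 - x2) ** 2 + (y1 - y2) ** 2)
```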

We will use SQLite in our example because there is almost no setup involved. You can load this database by issuing:

But instead of considering the whole database (which we’ll get to later), let’s construct a simple example in Haskell so we can demonstrate the interface a bit. We can use a `let` to create bindings that refer to one another easily.

```
test01 :: IO ()
test01 = do
    let winterfell = City "wtf" "Winterfell" (-105, 78)
            [(13, moatCailin), (12, whiteHarbor)]
        whiteHarbor = City "wih" "White Harbor" (-96, 74)
            [(15, braavos), (12, winterfell)]
        moatCailin = City "mtc" "Moat Cailin" (-104, 72)
            [(20, crossroads), (13, winterfell)]
        braavos = City "brv" "Braavos" (-43, 67)
            [(17, kingsLanding), (15, whiteHarbor)]
        crossroads = City "crs" "Crossroads Inn" (-94, 58)
            [(7, kingsLanding), (20, moatCailin)]
        kingsLanding = City "kgl" "King's Landing" (-84, 45)
            [(7, crossroads), (17, braavos)]
    printSolution $
        shortestPath cityId cityNeighbours winterfell kingsLanding
```

`printSolution` is defined as:

```
printSolution :: Maybe (Double, [City]) -> IO ()
printSolution Nothing = T.putStrLn "No solution found"
printSolution (Just (cost, path)) = T.putStrLn $
    "cost: " <> T.pack (show cost) <>
    ", path: " <> T.intercalate " -> " (map cityName path)
```

We get exactly what we expect in `GHCi`:

```
*Main> test01
cost: 40.0, path: Winterfell -> Moat Cailin ->
Crossroads Inn -> King's Landing
```

So far so good! Now let’s dig into how `shortestPath` works.

The following algorithm is known as *Uniform Cost Search*. It is a variant of Dijkstra’s graph search algorithm that is able to work with infinite graphs (or graphs that do not fit in memory anyway). It returns the shortest path between a known start and goal in a weighted directed graph.

Because this algorithm attempts to solve the problem the right way, including keeping back references, it is not simple. Therefore, if you are only interested in the part about lazy I/O, feel free to skip to this section and return to the algorithm later.

We have two auxiliary datatypes. `BackRef` is a wrapper around a node and the previous node on the shortest path to the former node. Keeping these references around is necessary to reconstruct a list describing the entire path at the end.
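The `BackRef` definition itself is elided in this excerpt; a minimal definition consistent with how it is used below would be:

```haskell
-- Sketch of the elided datatype: a node together with the previous
-- node on the shortest path that reached it.
data BackRef node = BackRef
    { brNode :: node  -- the node itself
    , brPrev :: node  -- the node we came from
    } deriving (Show)
```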

We will be using a `State` monad to implement the shortest path algorithm. This is our state:

```
data SearchState node key cost = SearchState
    { ssQueue    :: HashPSQ.HashPSQ key cost (BackRef node)
    , ssBackRefs :: HMS.HashMap key node
    }
```

In our state, we have:

- A priority queue of nodes we will visit next in `ssQueue`, including back references. Using a priority queue will let us grab the next node with the lowest associated cost in a trivial way.

- Secondly, we have the `ssBackRefs` map. That one serves two purposes: to keep track of which nodes we have already explored (the keys in the map), and to keep the back references of those locations (the values in the map).

These two datatypes are only used internally in the `shortestPath` function. Ideally, we would be able to put them in the `where` clause, but that is not possible in Haskell.

Instead of declaring a `Node` typeclass (possibly with associated types for the key and cost types), I decided to go with simple higher-order functions. We only need two of those function arguments after all: a function to give you a node’s key (`nodeKey`) and a function to get the node’s neighbours and associated costs (`nodeNeighbours`).

```
shortestPath
    :: forall node key cost.
       (Ord key, Hashable key, Ord cost, Num cost)
    => (node -> key)
    -> (node -> [(cost, node)])
    -> node
    -> node
    -> Maybe (cost, [node])
shortestPath nodeKey nodeNeighbours start goal =
```

We start by creating an initial `SearchState` for our algorithm. Our initial queue holds one item (implying that we need to explore `start`) and our initial back references map is empty (we haven’t explored anything yet).

```
    let startbr      = BackRef start start
        queue0       = HashPSQ.singleton (nodeKey start) 0 startbr
        backRefs0    = HMS.empty
        searchState0 = SearchState queue0 backRefs0
```

`walk` is the main body of the shortest path search. We call that and, if we found a shortest path, we return its cost together with the path, which we can reconstruct from the back references (`followBackRefs`).

```
        (mbCost, searchState1) = runState walk searchState0 in
    case mbCost of
        Nothing   -> Nothing
        Just cost -> Just
            (cost, followBackRefs (ssBackRefs searchState1))
  where
```

Now, we have a bunch of functions that are used within the algorithm. The first one, `walk`, is the main body. We start by exploring the next node in the queue. By construction, this is *always* a node we haven’t explored before. If this node is the goal, we’re done. Otherwise, we check the node’s neighbours and update the queue with those neighbours. Then, we recursively call `walk`.

```
    walk :: State (SearchState node key cost) (Maybe cost)
    walk = do
        mbNode <- exploreNextNode
        case mbNode of
            Nothing -> return Nothing
            Just (cost, curr)
                | nodeKey curr == nodeKey goal ->
                    return (Just cost)
                | otherwise -> do
                    forM_ (nodeNeighbours curr) $ \(c, next) ->
                        updateQueue (cost + c) (BackRef next curr)
                    walk
```

Exploring the next node is fairly easy to implement using a priority queue: we simply need to pop the element with the minimal priority (cost) using `minView`. We also need to indicate that we reached this node and save the back reference by inserting that info into `ssBackRefs`.

```
    exploreNextNode
        :: State (SearchState node key cost) (Maybe (cost, node))
    exploreNextNode = do
        queue0 <- gets ssQueue
        case HashPSQ.minView queue0 of
            Nothing -> return Nothing
            Just (_, cost, BackRef curr prev, queue1) -> do
                modify $ \ss -> ss
                    { ssQueue    = queue1
                    , ssBackRefs =
                        HMS.insert (nodeKey curr) prev (ssBackRefs ss)
                    }
                return $ Just (cost, curr)
```

`updateQueue` is called as new neighbours are discovered. We are careful about adding new nodes to the queue:

- If we have already explored this neighbour, we don’t need to add it. This is done by checking if the neighbour key is in `ssBackRefs`.
- If the neighbour is already present in the queue with a lower priority (cost), we don’t need to add it, since we want the *shortest* path. This is taken care of by the utility `insertIfLowerPrio`, which is defined below.

```
    updateQueue
        :: cost -> BackRef node -> State (SearchState node key cost) ()
    updateQueue cost backRef = do
        let node = brNode backRef
        explored <- gets ssBackRefs
        unless (nodeKey node `HMS.member` explored) $ modify $ \ss -> ss
            { ssQueue = insertIfLowerPrio
                (nodeKey node) cost backRef (ssQueue ss)
            }
```

If the algorithm finishes, we have found the lowest cost from the start to the goal, but we don’t have the path ready. We need to reconstruct this by following the back references we saved earlier. `followBackRefs` does that for us. It recursively looks up nodes in the map, constructing the path in the accumulator `acc` on the way, until we reach the start.

```
    followBackRefs :: HMS.HashMap key node -> [node]
    followBackRefs paths = go [goal] goal
      where
        go acc node0 = case HMS.lookup (nodeKey node0) paths of
            Nothing    -> acc
            Just node1 ->
                if nodeKey node1 == nodeKey start
                    then start : acc
                    else go (node1 : acc) node1
```

That’s it! The only utility left is the `insertIfLowerPrio` function. Fortunately, we can easily define this using the `alter` function from the psqueues package. That function allows us to change a key’s associated value and priority. It also allows us to return an additional result, but we don’t need that, so we just use `()` there.

```
insertIfLowerPrio
    :: (Hashable k, Ord p, Ord k)
    => k -> p -> v -> HashPSQ.HashPSQ k p v -> HashPSQ.HashPSQ k p v
insertIfLowerPrio key prio val = snd . HashPSQ.alter
    (\mbOldVal -> case mbOldVal of
        Just (oldPrio, _)
            | prio < oldPrio -> ((), Just (prio, val))
            | otherwise      -> ((), mbOldVal)
        Nothing -> ((), Just (prio, val)))
    key
```

Lazy I/O will guarantee that we only load the nodes in the graph when necessary.

However, since we know that the nodes in the graph do not change over time, we can build an additional cache around it. That way, we can also guarantee that we only load every node once.

Implementing such a cache is very simple in Haskell. We can simply use an `MVar`, that will even take care of blocking ^{2} when we have concurrent access to the cache (assuming that is what we want).
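The `Cache` type and `newCache` are not shown in this excerpt. A minimal sketch consistent with the `cached` combinator below, assuming the `Data.HashMap.Strict` import from the top of the post, would be:

```haskell
import Control.Concurrent.MVar (MVar, newMVar, readMVar)
import qualified Data.HashMap.Strict as HMS

-- Sketch of the elided definitions: a cache is simply an MVar holding
-- a hash map from keys to already-loaded values.
type Cache k v = MVar (HMS.HashMap k v)

-- A fresh cache starts out empty.
newCache :: IO (Cache k v)
newCache = newMVar HMS.empty
```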

```
cached :: (Hashable k, Ord k) => Cache k v -> k -> IO v -> IO v
cached mvar k iov = modifyMVar mvar $ \cache -> do
    case HMS.lookup k cache of
        Just v  -> return (cache, v)
        Nothing -> do
            v <- iov
            return (HMS.insert k v cache, v)
```

Note that we don’t really delete things from the cache. In order to keep things simple, we can assume that we will use a new cache for every shortest path we want to find, and that we throw away that cache afterwards.

Now, we get to the main focus of the blogpost: how to use lazy I/O primitives to ensure resources are only loaded when they are needed. Since we are only concerned about one datatype (`City`) our loading code is fairly easy.

The most important loading function takes the SQLite connection, the cache we wrote up previously, and a city ID. We immediately use the `cached` combinator in the implementation, to make sure we load every `CityId` only once.

```
getCityById
    :: SQLite.Connection -> Cache CityId City -> CityId
    -> IO City
getCityById conn cache id' = cached cache id' $ do
```

Now, we get some information from the database. We play it a bit loose here and assume a singleton list will be returned from the query.

The neighbours are stored in a different table because we have a properly normalised database. We can write a simple query to obtain all roads starting from the current city:

```
    roads <- SQLite.query conn
        "SELECT cost, destination FROM roads WHERE origin = ?"
        [id'] :: IO [(Double, CityId)]
```

This leads us to the crux of the matter. The `roads` variable contains something of the type `[(Double, CityId)]`, and what we really want is `[(Double, City)]`. We need to recursively call `getCityById` to load what we want. However, doing this “the normal way” would cause problems:

- Since the `IO` monad is strict, we would end up in an infinite loop if there is a cycle in the graph (which is almost always the case for roads and cities).
- Even if there was no cycle, we would run into trouble with our usage of `MVar` in the `Cache`. We block access to the `Cache` while we are in the `cached` combinator, so calling `getCityById` again would cause a deadlock.

This is where lazy I/O shines. We can implement lazy I/O by using the `unsafeInterleaveIO` primitive. Its type is very simple and doesn’t look as threatening as `unsafePerformIO`:

`unsafeInterleaveIO :: IO a -> IO a`

It takes an `IO` action and *defers* it. This means that the `IO` action is not executed right now, but only when the value is demanded. That is exactly what we want!
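To make the deferral concrete, here is a tiny self-contained example (not part of the post's code): the wrapped action only runs once its result is actually forced.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)
import qualified System.IO.Unsafe as IO

-- Returns what a reader of the counter sees before and after the
-- deferred action's result is forced.
demo :: IO (Int, Int, Int)
demo = do
    ref <- newIORef 0
    -- Deferred: nothing is executed here yet.
    x <- IO.unsafeInterleaveIO $ do
        modifyIORef ref (+ 1)
        readIORef ref
    before <- readIORef ref   -- still 0: x has not been demanded
    x' <- x `seq` return x    -- forcing x runs the deferred action now
    after <- readIORef ref    -- 1: the side effect has happened
    return (before, x', after)
```

Running `demo` yields `(0, 1, 1)`: the increment happens only at the point where `x` is forced, not where `unsafeInterleaveIO` is called.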

We can simply wrap the recursive calls to `getCityById` using `unsafeInterleaveIO`:

And then return the `City` we constructed:

Lastly, we will add a quick-and-dirty wrapper around `getCityById` so that we are also able to load cities by name. Its implementation is trivial:

```
getCityByName
    :: SQLite.Connection -> Cache CityId City -> T.Text
    -> IO City
getCityByName conn cache name = do
    [[id']] <- SQLite.query conn
        "SELECT id FROM cities WHERE name = ?" [name]
    getCityById conn cache id'
```

Now we can neatly wrap things up in our `main` function:

```
main :: IO ()
main = do
    cache <- newCache
    conn <- SQLite.open "got.db"
    winterfell <- getCityByName conn cache "Winterfell"
    kings <- getCityByName conn cache "King's Landing"
    printSolution $
        shortestPath cityId cityNeighbours winterfell kings
```

This works as expected:

```
*Main> :main
cost: 40.23610549037591, path: Winterfell -> Moat Cailin ->
Greywater Watch -> Inn of the Kneeling Man -> Fairmarket ->
Brotherhood Without Banners Hideout -> Crossroads Inn ->
Darry -> Saltpans -> QuietIsle -> Antlers -> Sow's Horn ->
Brindlewood -> Hayford -> King's Landing
```

Lazy I/O also has many disadvantages, which have been widely discussed. Among those are:

- Code becomes harder to reason about. In a setting without lazy I/O, you can casually reason about an `Int` as either an integer that’s already computed, or as something that will do some (pure) computation and then yield an `Int`. When lazy I/O enters the picture, things become more complicated. That `Int` you wanted to print? Yeah, it fired a bunch of missiles and returned the bodycount. This is why I would not seriously consider using lazy I/O when working with a team or on a large project – it can be easy to forget what is lazily loaded and what is not, and there’s no easy way to tell.

- Scarce resources can easily become a problem if you are not careful. If we keep a reference to a `City` in our heap, that means we also keep a reference to the cache and the SQLite connection. We must ensure that we fully evaluate the solution to something that doesn’t refer to these resources (e.g. a printed string) so that the references can be garbage collected and the connections can be closed. Closing the connections is a problem in itself – if we cannot guarantee that e.g. streams will be fully read, we need to rely on finalizers, which are pretty unreliable…

- If we go a step further and add concurrency to our application, it becomes even trickier. Deadlocks are not easy to reason about – so how about reasoning about deadlocks when you’re not sure when the `IO` is going to be executed at all?

Despite all these shortcomings, I believe lazy I/O is a powerful and elegant tool that belongs in every Haskeller’s toolbox. Like pretty much anything, you need to be aware of what you are doing and understand the advantages as well as the disadvantages.

For example, the above downsides do not really apply if lazy I/O is only used *within* a module. For this blogpost, that means we could safely export the following interface:

```
shortestPathBetweenCities
    :: FilePath  -- ^ Database name
    -> CityId    -- ^ Start city ID
    -> CityId    -- ^ Goal city ID
    -> IO (Maybe (Double, [CityId]))  -- ^ Cost and path
shortestPathBetweenCities dbFilePath startId goalId = do
    cache <- newCache
    conn  <- SQLite.open dbFilePath
    start <- getCityById conn cache startId
    goal  <- getCityById conn cache goalId
    case shortestPath cityId cityNeighbours start goal of
        Nothing -> return Nothing
        Just (cost, path) ->
            let ids = map cityId path in
            cost `seq` foldr seq () ids `seq`
            return (Just (cost, ids))
```

Thanks for reading – and I hope I was able to offer you a nuanced view on lazy I/O. Special thanks to Jared Tobin for proofreading.

1. In this blogpost, I frequently talk about *“infinite graphs”*. Of course most of these examples are not truly infinite, but we can consider examples that do not fit in memory completely, and in that way we can regard them as *“infinite for practical purposes”*.↩

2. While blocking is good in this case, it might hurt performance when running in a concurrent environment. A good solution to that would be to stripe the `MVar`s based on the keys, but that is beyond the scope of this blogpost. If you are interested in the subject, I talk about it a bit here.↩

Parsec makes it really easy to prototype parsers for certain classes of grammars. Lots of grammars in use today, however, are whitespace-sensitive. There are different approaches for dealing with that. One of the most commonly used approaches is to add explicit `INDENT` and `DEDENT` tokens. But that usually requires you to add a separate lexing phase – not a bad idea by itself, but a bit annoying if you are just writing a quick prototype.

That is why I like the indents package – it sits in a sweet spot because it is a straightforward package that allows you turn any Parsec parser into an indentation-based one without having to change too many types.

It offers a bunch of semi-cryptic operators like `<+/>` and `<*/>` which I would personally avoid in favor of their named variants, but other than that I would consider it a fairly “easy” package.

Unfortunately, I found a few bugs and inconveniences in the old package. One interesting bug would allow failing branches of the parse to still affect the indentation’s internal state, which is very bad ^{1}. Additionally, the package fixed the underlying monad, which prevented you from using transformers.

Because I didn’t want to confuse people by creating yet another package, I took over the package, which is a very smooth process nowadays. I can definitely recommend this to anyone who discovers issues like these in unmaintained packages. The Hackage trustees are doing great and valuable work there.

I have now uploaded a new version which fixes these issues. To celebrate that, let’s create a toy parser for indentation-sensitive taxonomies such as the big tea taxonomy ^{2}:

```
tea
    green
        korean
            pucho-cha
            chung-cha
        vietnamese
            snow-green-tea
        japanese
            roasted
            ...
    black
        georgian
            traditional
                caravan-blend
        african
            kenyan
            tanzanian
        ...
```

We need some imports to get rolling. After all, this blogpost is a literate Haskell file which can be loaded in `GHCi`.

```
import Control.Applicative ((*>), (<*), (<|>))
import qualified Text.Parsec as Parsec
import qualified Text.Parsec.Indent as Indent
```

We just store a single term in the category as a `String`.

A taxonomy is then recursively defined as a `Term` and its children taxonomies.
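The two definitions are elided in this excerpt; versions consistent with the parser and the GHCi output below would be:

```haskell
-- Sketch of the elided definitions: a term is a plain string, and a
-- taxonomy is a term together with its sub-taxonomies.
type Term = String

data Taxonomy = Taxonomy Term [Taxonomy] deriving (Show)
```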

A parser for a term is easy. We just parse an identifier and then skip the spaces following that.

```
pTerm :: Indent.IndentParser String () String
pTerm =
    Parsec.many1 allowedChar <* Parsec.spaces
  where
    allowedChar = Parsec.alphaNum <|> Parsec.oneOf ".-"
```

In the parser for a `Taxonomy`, we use the `indents` library. `withPos` is used to “remember” the indentation position. After doing that, we can use combinators such as `indented` to check if we are indented past that point.

```
pTaxonomy :: Indent.IndentParser String () Taxonomy
pTaxonomy = Indent.withPos $ do
    term <- pTerm
    subs <- Parsec.many $ Indent.indented *> pTaxonomy
    return $ Taxonomy term subs
```

Now we have a simple function to put it all together:

```
readTaxonomy :: FilePath -> IO Taxonomy
readTaxonomy filePath = do
    txt <- readFile filePath
    let errOrTax = Indent.runIndentParser parser () filePath txt
    case errOrTax of
        Left err  -> fail (show err)
        Right tax -> return tax
  where
    parser = pTaxonomy <* Parsec.eof
```

And we can verify that this works in GHCi:

```
*Main> readTaxonomy "taxonomy.txt"
Taxonomy "tea" [Taxonomy "green" [Taxonomy "korean" [...
*Main>
```

Special thanks to Sam Anklesaria for writing the original package.

The interesting tea taxonomy can be found in this blogpost: https://jameskennedymonash.wordpress.com/mind-maps/amazing-tea-taxonomy/.↩

At work, I frequently need to give (internal) presentations and demos using video conferencing. I prefer to do these quick-and-dirty presentations in the terminal for a few reasons:

- I don’t spend time worrying about layout, terminal stuff always looks cool.
- I want to write markdown if possible.
- You can have a good *“Questions?”* slide just by running `cowsay 'Questions?'`.
- Seamless switching between editor/shell and presentation using tmux.

The last point is important for video conferencing especially. The software we use allows you to share a single window from your desktop. This is pretty neat if you have a multi-monitor setup. However, it does not play well with switching between a PDF viewer and a terminal.

To this end, I wrote patat – **P**resentations **A**nd **T**he **A**NSI **T**erminal – because I was not entirely happy with the available solutions. You can get it from Hackage: `cabal install patat`.

You run it simply by doing:

`patat presentation.md`

The key features are:

- **Built on Pandoc**: The software I was using before contained some Markdown parsing bugs. By using Pandoc under the hood, this should not happen. Additionally, we get all the input formats Pandoc supports (Literate Haskell is of particular importance to me) and some additional elements like tables and definition lists.

- **Smart slide splitting**: Most Markdown presentation tools seem to split slides at `---` (horizontal rulers). This is a bit verbose since you usually start each slide with an `h1` as well. `patat` will check if `---` is used and if it’s not, it will split on `h1`s instead.

- **Live reload**: If you run `patat --watch presentation.md`, `patat` will poll the file for changes and reload automatically. This is really handy when you are writing the presentation; I usually use it with split-pane in `tmux`.

An example of a presentation is:

```
---
title: This is my presentation
author: Jane Doe
...

# This is a slide

Slide contents. Yay.

# Important title

Things I like:

- Markdown
- Haskell
- Pandoc
- Traveling
```

I started writing a simple prototype of `patat` during downtime at ICFP2016, when I discovered that MDP was not able to parse my presentation correctly.

After ICFP, I flew to Myanmar, and I am currently traveling around the country with my girlfriend. It’s a super interesting place to visit, with a rich history. Now that NLD is the ruling party, I think it is a great time to visit the country responsibly.

However, it is a **huge** country – the largest in south-east Asia – so there is some downtime traveling on domestic flights, buses and boats. I thought it was a good idea to improve the tool a bit further, since you don’t need internet to hack on this sort of thing.

Pull requests are welcome as always! Note that I will be slow to respond: for the next three days I will be trekking from Kalaw to Inle Lake, so I have no connectivity (or electricity, for that matter).

*Sidenote*: *“Patat”* is the Flemish word for “potato”. Dutch people also use it to refer to French Fries but I don’t really do that – in Belgium we just call fries *“Frieten”*.

Not all monitors are alike. If you are one of the many people who have an external monitor sitting next to their laptop, you might have noticed at some point that colors on one monitor look slightly (or sometimes wildly) different than on the other monitor. When you view a picture on your smartphone, colors may be more saturated than on your computer, and vice versa.

This is not a big problem when I am programming or reading, but it becomes an issue when you are doing any visual stuff – including HTML/CSS. For example, when I am editing a picture on my external monitor, I might notice that the reds look a bit too orange-like. But on my laptop screen, they look fine. Does this mean I should shift them or not? Which monitor should I *“trust”*?

The answer is simple: for things like this, you can only trust *calibrated* monitors. There are many tools to calibrate a monitor, and most systems provide some sort of calibration wizard that asks you a bunch of questions. However, if you take digital photography (or any other hobby involving digital visuals) seriously, the only answer is a hardware calibrator. Without one, editing pictures is guesswork at best.

Unfortunately, these are not free. I picked up a DataColor Spyder4Express for around $150. I’m not sure if that particular model is still available, but there are newer models now for similar prices. It is not cheap, but it is still a lot less than the average lens, and I think this benefits you more than buying another lens in a lot of cases. Of course, you can also consider borrowing one. Calibrating once every month or so is more than enough.

In general, these calibrators (or wizards) generate ICC profiles (`.icm` or `.icc` files). Such a file is basically a mapping between color spaces, in this case for a specific monitor.

Sometimes, generating and finding these files is a bit tricky. For example, the calibrator I have does not support Linux (but I dual boot Windows).

Additionally, it has the arbitrary restriction that you can only calibrate a single monitor when using the software directly (it is almost like they really want you to get the expensive model!). The problem, however, is easily solved by calibrating one monitor, copying the ICC profile (usually in `/Windows/System32/spool/drivers/color/`) somewhere and then calibrating other monitors. Now, what remains is loading these profiles in Linux.

While there are a few solutions available, I decided to implement my own in a simple script to learn a bit more about these things – simply because stuff like this is always interesting.

It turns out that monitors can be identified by reading their EDID block. This is (usually) 128 bytes of binary data, that contains, among other things, human-readable names for most monitors. The whole EDID block can be extracted by a tool like xrandr, but we need some code to parse the data. This Python snippet extracts the human-readable parts from an EDID block. These can appear at various offsets but the tags that precede them are known.

```
EDID_NAME_TAGS = [
    bytearray.fromhex('000000fc00'),
    bytearray.fromhex('000000fe00')
]

def name_from_edid(edid):
    names = []
    for i in range(0, 6):
        offset = 36 + i * 18
        tag = edid[offset : offset + 5]
        val = edid[offset + 5 : offset + 18]
        if tag in EDID_NAME_TAGS:
            names += [val.decode('utf-8').strip()]
    return '-'.join(names)
```

For my ThinkPad’s monitor, this returns `LG-Display-LP140QH1-SPB1`.

The rest is fairly trivial. The complete script reads human-readable names from the EDID blocks and then finds the corresponding files in `$HOME/.color/icc`. If such a file is found, it uses `dispwin` from `argyllcms` to set the profile for that monitor.

```
$ colorprof
[DP1] checking: /home/jasper/.color/icc/SyncMaster.icc
[DP1] running: dispwin -d 2 /home/jasper/.color/icc/SyncMaster.icc
[eDP1] checking: /home/jasper/.color/icc/LG-Display-LP140QH1-SPB1.icc
[eDP1] running: dispwin -d 1 /home/jasper/.color/icc/LG-Display-LP140QH1-SPB1.icc
```
