Monads and Arrows: modelling a build system
Published on January 14, 2012 under the tag haskell
This is a recap of an older blogpost of mine. I decided to rewrite it after I wanted to refer a friend to it, and I saw the blogpost clearly failed a bit in getting the point across. In this blogpost, I hope to correct that. It’s about a situation in which Monads fall short, but Arrows (and Applicative) prove to be very powerful. It assumes some basic familiarity with Monads, familiarity with Arrows is not necessary. This blogpost:
- Proposes a simplistic build system model
- Gives an implementation using Monads, and fails
- Gives a working implementation using Arrow (and Applicative)
Setup
{-# LANGUAGE Arrows #-}
import Prelude hiding (id, (.))
import Control.Applicative (Applicative (..), (<$>))
import Control.Arrow (Arrow (..), returnA, (>>>))
import Control.Category (Category (..))
import Control.Monad ((<=<))
import Control.Monad.State (StateT)
import System.Directory (doesFileExist, getModificationTime)
This post uses incremental build systems (think make or ant) as an example. These systems allow you to specify commands which are only executed if the destination file is out-of-date. The reason I’m using this example is that it’s highly applicable in Hakyll, a static site compiler which is one of my side projects.
Let’s use a bottom-up approach and first write a simple function to only do
out-of-date builds. The runBuild
function checks the modification times of the
dependencies and the destination file, and based on that information, calls or
doesn’t call the IO String
workhorse. This is obviously very limited
functionality, but it’s just an example.
runBuild :: FilePath -- ^ Destination
-> [FilePath] -- ^ Dependencies
-> IO String -- ^ Workhorse which produces output
-> IO () -- ^ May or may not run the workhorse
= do
runBuild dest deps f <- doesFileExist dest
exists <- mapM getModificationTime deps
depsModified case (exists, depsModified) of
False, _) -> run
(True, []) -> dontRun
(True, _) -> do
(<- getModificationTime dest
destModified if destModified < maximum depsModified then run else dontRun
where
= putStrLn $ "Up to date: " ++ dest
dontRun = do
run putStrLn $ "Building " ++ dest
writeFile dest =<< f
Let’s implement the Unix paste command. We first have a pure version:
paste :: String -> String -> String
= unlines $ zipWith (\x' y' -> x' ++ "\t" ++ y') (lines x) (lines y) paste x y
And now we can apply our runBuild
function:
testBuild :: IO ()
= runBuild "test-io.txt" ["rainbows.txt", "unicorns.txt"] $ do
testBuild <- readFile "rainbows.txt"
x <- readFile "unicorns.txt"
y return $ paste x y
This works fine, but the annoyance is that we manually have to specify our dependencies: this quickly becomes very tedious. Instead, our goal is to automate the dependency tracking. Haskell allows for many abstractions, so let’s have a look at how we can accomplish this.
Monads
Let’s see if we can capture this behaviour in a Monad. If we declare our Monad as a simple datatype which holds the dependencies and the actual workhorse. we get something like:
data BuildM a = BuildM [FilePath] (IO a)
Running is easy:
runBuildM :: FilePath -> BuildM String -> IO ()
BuildM deps f) = runBuild dest deps f runBuildM dest (
And a readFile
could be implemented like:
readFileM :: FilePath -> BuildM String
= BuildM [path] $ readFile path readFileM path
However, problems arise when we try to pin down the Monad instance for this datatype.
instance Monad BuildM where
return x = BuildM [] $ return x
BuildM deps f) >>= g = BuildM deps $ do
(-- Where do the dependencies of g's result go?
BuildM _ y <- g <$> f
y
Clearly, this datatype doesn’t allow us to get f
s dependencies in mx >>= f
.
We can write the following piece of code, but it won’t be correct, as it ignores
the "unicorns.txt"
dependency.
testBuildM :: IO ()
= runBuildM "test-m.txt" $ do
testBuildM <- readFileM "rainbows.txt"
x <- readFileM "unicorns.txt"
y return $ paste x y
Other datatypes are possible, e.g. one could also try something like:
type BuildM' = StateT [FilePath] IO
This kind of definition leads to another problem: the mx
in mx >>= f
will
always be executed, even if everything is up-to-date. This behaviour is
inherently coupled to the use of Monads, consider code like this:
testBuildM' :: IO ()
= runBuildM "test-m.txt" $ do
testBuildM' <- readFileM "rainbows.txt"
x <- if length x > 200 then readFileM "unicorns.txt" else return ""
y return $ paste x y
We need to evaluate x in order to determine the dependencies! This is not how
a build system should work: the system should not inspect x
and just add
"unicorns.txt"
as a dependency, regardless of the value of x
. The fact that
we can’t get around this makes it clear that Monads are not a good choice here.
Arrows
Two other possibilities will work well here: Arrows and Applicative. I’ll demonstrate the Arrow solution first, because it is a bit more generic 1.
The datatype looks a lot like the one used for the Monad instance 2:
data BuildA a b = BuildA [FilePath] (a -> IO b)
Running this build datatype is also straightforward 3.
runBuildA :: FilePath -> BuildA () String -> IO ()
BuildA deps f) = runBuild dest deps $ f () runBuildA dest (
Arrows are a generalized version of functions, and can be used in a similar way.
Each Arrow is also a Category, so we first need to declare a Category instance.
In order to make our BuildA
an Category, we need an identity operation, and
function composition.
The BuildA a a
identity operation is straightforward to implement: it
obviously has no dependencies, it is a build step which does absolutely nothing.
A composition of two build steps takes the sum of dependencies and composes the
workhorses using <=<
4:
instance Category BuildA where
id = BuildA [] return
BuildA d1 f . BuildA d2 g = BuildA (d1 ++ d2) (f <=< g)
This is not enough to instantiate an Arrow, though. Two more methods need to be
implemented: arr
and first
.
arr
is reasonably simple and allows the user to “lift” a pure function into an
Arrow. For our example, this yields the type signature
arr :: (a -> b) -> BuildA a b
– the implementation is straightforward.
In order to allow the programmer to build computations using Arrows, a mechanism
to pass variables through computations is needed. In our example, we have
first :: BuildA a b -> BuildA (a, c) (b, c)
: it transforms a simple Arrow
into an Arrow which carries an additional variable through the computation.
instance Arrow BuildA where
= BuildA [] (return . f)
arr f BuildA deps f) = BuildA deps $ \(x, y) -> do
first (<- f x
x' return (x', y)
Let’s write the Arrow version of readFileM
which also automatically adds a
dependency:
readFileA :: FilePath -> BuildA () String
= BuildA [path] $ \() -> readFile path readFileA path
Using Arrow notation, we can now implement a (not very pretty) solution which
does bear a lot of resemblance to testBuildM
, with the difference that this
version actually works with proper dependency management:
testBuildA :: IO ()
= runBuildA "test-a.txt" $ proc () -> do
testBuildA <- readFileA "rainbows.txt" -< ()
x <- readFileA "unicorns.txt" -< ()
y -< paste x y returnA
However, writing ugly code like this obviously isn’t the way we want to go.
Arrow-based code can be made a whole lot prettier if you write as much code as
possible as a processing Arrow. For example, we could write an Arrow-based
variant of paste
which processes a file by pasting another file next to it:
pasteFileA :: FilePath -> BuildA String String
= proc x -> do
pasteFileA path <- readFileA path -< ()
y -< paste x y returnA
With utilities like this, we can write a much prettier testBuildA
which
clearly demonstrates the processing approach. >>>
is left-to-right composition
of Arrows, much like a flipped version of .
:
testBuildA' :: IO ()
= runBuildA "test-a.txt" $
testBuildA' "rainbows.txt" >>>
readFileA "unicorns.txt" pasteFileA
Epilogue: Applicative functors
Arrow and Applicative show similar behaviour in many cases. For our example, we also could’ve chosen to implement our solution using Applicative instead of Arrow. I’ve chosen Arrow for two reasons:
- It is often more natural to model a building process as an Arrow.
- We can actually write an Applicative instance for the same datatype, giving the user the freedom of choice!
It’s a fun challenge to implement this Applicative instance for the BuildA
datatype.
I hope this blogpost made some of the advantages and disadvantages between Monad and Arrow clear. All comments and feedback are welcome, as always. Thanks to nudded for proofreading.
More generic in kind: Arrow has a
* -> * -> *
kind, and Applicative has a* -> *
kind. This is important later on, because it means we can reuse our Arrow datatype for the Applicative solution.↩︎The second field is in fact a Kleisli arrow, almost a direct translation of the IO monad to the Arrow structure.↩︎
Note that another option is:
runBuildA :: FilePath -> BuildA a String -> a -> IO ()
.↩︎<=< :: Monad m => (b -> m c) -> (a -> m b) -> (a -> m c)
, the spaceship operator fromControl.Monad
, right-to-left composition of monadic functions.↩︎