Can we write a Monoidal Either?
Published on May 19, 2015 under the tag haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Applicative (Applicative (..))
import Data.Monoid (Monoid, (<>))
Introduction
For the last month or, I have been working as a contractor for Luminal. I am helping them implement Fugue, and more specifically Ludwig – a compiler a for statically typed declarative configuration language. This is one of the most interesting projects I have worked on so far – writing a compiler is really fun. While implementing some parts of this compiler, I came across an interesting problem.
In particular, a typeclass instance seemed to adhere to both the Monad and Applicative laws, but with differing behaviour – which felt a bit fishy. I started a discussion on Twitter understand it better, and these are my thoughts on the matter.
The problem
Suppose we’re writing a typechecker. We have to do a number of things:
- Parse the program to get a list of user-defined types.
- Typecheck each expression.
- A bunch of other things, but for the purpose of this post we’re going to leave it at that.
Now, any of these steps could fail, and we’d like to log the reason for failure.
Clearly this is a case for something like Either
! Let’s define ourselves an
appropriate datatype.
data Check e a
= Failed e
| Ok a
deriving (Eq, Show)
We can write a straightforward Functor instance:
instance Functor (Check e) where
fmap _ (Failed e) = Failed e
fmap f (Ok x) = Ok (f x)
The Monad instance is also very obvious:
instance Monad (Check e) where
return x = Ok x
Failed e >>= _ = Failed e
Ok x >>= f = f x
However, the Applicative instance is not that obvious – we seem to have a choice.
But first, lets take a step back and stub out our compiler a bit more, so that we have some more context. Imagine we have the following types in our compiler:
data Type = ...
data Expr = ...
data Program = ...
type TypeScope = [Type]
And our code looks like this:
findDefinedTypes1 :: Program -> Check String TypeScope
= Ok [] -- Assume we can't define types for now. findDefinedTypes1 _
typeCheck1 :: TypeScope -> Expr -> Check String Type
= Failed $ "Could not typecheck: " ++ show e typeCheck1 _ e
compiler1 :: Check String ()
= do
compiler1 <- findDefinedTypes1 program1
scope -- False == 3
typeCheck1 scope expr1 -- [1, 'a']
typeCheck1 scope expr2 return ()
On executing compiler1
, we get the following error:
*Main> compiler1
Failed "Could not typecheck: False == 3"
Which is correct, but using a compiler entirely written in this fashion would be
annoying. Check
, like Either
, short-circuits on the first error it
encounters. This means we would compile our program, fix one error, compile, fix
the next error, compile, and so on.
It would be much nicer if users were to see multiple error messages at once.
Of course, this is not always possible. On one hand, if findDefinedTypes1
throws an error, we cannot possibly call typeCheck1
, since we do not have a
TypeScope
.
On the other hand, if findDefinedTypes1
succeeds, shouldn’t it be possible to
collect error messages from both typeCheck1 scope expr1
and typeCheck1 scope expr2
?
It turns out this is possible, precisely because the second call to typeCheck1
does not depend on the result of the first call – so we can execute them in
parallel, if you will. And that is precisely the difference in expressive power
between Monad and Applicative: Monadic >>=
provides access to previously
computed results, where Applicative <*>
does not. Let’s (ab?)use this to our
advantage.
The solution?
Cleverly, we put together following instance:
instance Monoid e => Applicative (Check e) where
pure x = Ok x
Ok f <*> Ok x = Ok (f x)
Ok _ <*> Failed e = Failed e
Failed e <*> Ok _ = Failed e
Failed e1 <*> Failed e2 = Failed (e1 <> e2)
Using this instance we can effectively collect error messages. We need to
change our code a bit to support a collection of error messages, so let’s use
[String]
instead of String
since a list is a Monoid.
findDefinedTypes2 :: Program -> Check [String] TypeScope
= Ok [] -- Assume we can't define types for now. findDefinedTypes2 _
typeCheck2 :: TypeScope -> Expr -> Check [String] Type
= Failed ["Could not typecheck: " ++ show e] typeCheck2 _ e
compiler2 :: Check [String] ()
= do
compiler2 <- findDefinedTypes2 program1
scope *> typeCheck2 scope expr2
typeCheck2 scope expr1 return ()
Note that *>
is the Applicative equivalent of the Monadic >>
.
Now, every error is represented by a list of error messages (typically a
singleton such as in typeCheck2
), and the Applicative <*>
combines error
messages. If we execute compiler2
, we get:
*Main> compiler2
Failed ["Could not typecheck: False == 3",
"Could not typecheck: [1, 'a']"]
Success! But is that all there is to it?
The problem with the solution
The problem is that we have created a situation where <*>
is not equal to
ap
1. After researching this for a while, it seems that <*> = ap
is not
a verbatim rule. However, most arguments suggest it should be the case – even
the name.
This is important for refactoring, for example. Quite a few Haskell programmers (including myself) would refactor:
do b <- bar
<- qux
q return (Foo b q)
Into:
Foo <$> bar <*> qux
Without putting too much thought in it, just assuming it does the same thing.
In our case, they are clearly similar, but not equal – we would get only one error instead of collecting error messages. One could argue that this is close enough, but when one uses that argument too frequently, you might just end up with something like PHP.
The problem becomes more clear in the following fragment:
>>
checkForCyclicImports modules compileAll modules
Which has completely different behaviour from this fragment:
*>
checkForCyclicImports modules compileAll modules
The latter will get stuck in some sort of infinite recursion, while the former
will not. This is not a subtle difference anymore. While the problem is easy to
spot here (>>
vs. *>
), this is not always the case:
forEveryImport_ :: Monad m => Module -> (Import -> m ()) -> m ()
Ever since AMP, it is impossible to tell whether this will do a forM_
or a
for_
-like traversal without looking at the implementation – this makes
making mistakes easy.
The solution to the problem with the solution
As we discussed in the previous section, it should be possible for a programmer
to tell exactly how a Monad or Applicative will behave, without having to dig
into implementations. Having a structure where <*>
and ap
behave slightly
differently makes this hard.
When a Haskell programmer wants to make a clear distinction between two similar
types, the first thing that comes to mind is probably newtype
s. This problem
is no different.
Let’s introduce a newtype for error-collecting Applicative. Since the Functor
instance is exactly the same, we might as well generate it using
GeneralizedNewtypeDeriving
.
newtype MonoidCheck e a = MonoidCheck {unMonoidCheck :: Check e a}
deriving (Functor, Show)
Now, we provide our Applicative instance for MonoidCheck
:
instance Monoid e => Applicative (MonoidCheck e) where
pure x = MonoidCheck (Ok x)
MonoidCheck l <*> MonoidCheck r = MonoidCheck $ case (l, r) of
Ok f , Ok x ) -> Ok (f x)
(Ok _ , Failed e ) -> Failed e
(Failed e , Ok _ ) -> Failed e
(Failed e1, Failed e2) -> Failed (e1 <> e2) (
Finally, we avoid writing a Monad instance for MonoidCheck
. This approach
makes the code cleaner:
This ensures that when people use
MonoidCheck
, they are forced to use the Applicative combinators, and they cannot accidentally reduce the number of error messages.For other programmers reading the code, it is very clear whether we are dealing with short-circuiting behaviour or that we are collecting multiple error messages: it is explicit in the types.
Usage and conversion
Our fragment now becomes:
findDefinedTypes3 :: Program -> Check [String] TypeScope
= Ok [] -- Assume we can't define types for now. findDefinedTypes3 _
typeCheck3 :: TypeScope -> Expr -> MonoidCheck [String] Type
= MonoidCheck $ Failed ["Could not typecheck: " ++ show e] typeCheck3 _ e
compiler3 :: Check [String] ()
= do
compiler3 <- findDefinedTypes3 program1
scope $ typeCheck3 scope expr1 *> typeCheck3 scope expr2
unMonoidCheck return ()
We can see that while it is not more concise, it is definitely more clear: we can see exactly which functions will collect error messages. Furthermore, if we now try to write:
>> typeCheck3 scope expr2 typeCheck3 scope expr1
We will get a compiler warning telling us we should use *>
instead.
Explicitly, we now convert between Check
and MonoidCheck
by simply calling
MonoidCheck
and unMonoidCheck
. We can do this inside other transformers if
necessary, using e.g. mapReaderT
.
Data.Either.Validation
The MonoidCheck
discussed in this blogpost is available as
Data.Either.Validation on hackage. The main difference is that instead of
using a newtype
, the package authors provide a full-blown datatype.
data Validation e a
= Failure e
| Success a
And two straightforward conversion functions:
validationToEither :: Validation e a -> Either e a
Failure e) = Left e
validationToEither (Success x) = Right x validationToEither (
eitherToValidation :: Either e a -> Validation e a
Left e) = Failure e
eitherToValidation (Right x) = Success x eitherToValidation (
This makes constructing values a bit easier:
Failure ["Can't go mucking with a 'void*'"]
Instead of:
MonoidCheck $ Failed ["Can't go mucking with a 'void*'"]
At this point, it shouldn’t surprise you that Validation
intentionally does not
provide a Monad instance.
Conclusion
This, of course, is all my opinion – there doesn’t seem to be any definite
consensus on whether or not ap
should be the same as <*>
, since differing
behaviour occurs in prominent libraries. While the Monad and Applicative laws
are relatively well known, there is no canonical law saying that ap = <*>
.
Update: there actually is a canonical law that ap
should be <*>
, and
it was right under my nose in the Monad documentation since AMP. Before
that, it was mentioned in the Applicative documentation. Thanks to quchen for
pointing that out to me!
A key point here is that the AMP actually related the two typeclasses. Before that, arguing that the two classes were in a way “unrelated” was still a (dubious) option, but that is no longer the case.
Furthermore, considering this as a law might reveal opportunities for optimisation 2.
Lastly, I am definitely a fan of implementing these differing behaviours using different types and then converting between them: the fact that types explicitly tell me about the behaviour of code is one of the reasons I like Haskell.
Thanks to Alex Sayers for proofreading and suggestions.
ap
is the Monadic sibling of<*>
(which explains why<*>
is commonly pronouncedap
). It can be implemented on top of>>=
/return
:↩︎> ap :: Monad m => m (a -> b) -> m a -> m b > ap mf mx = do > f <- mf > x <- mx > return (f x)
Take this with a grain of salt – Currently, GHC does not use any of the Monad laws to perform any optimisation. However, some Monad instances use them in
RULES
pragmas.↩︎