Tries and elegant Scope Checking
Published on October 30, 2015 under the tag haskell
Introduction
This blogpost is mostly based upon a part of the talk I recently gave at the Haskell eXchange. I discussed scope checking – also referred to as scope analysis or renaming. While the talk focussed on Ludwig, a DSL used to program Fugue, the ideas around scope checking are broadly applicable, so in this blogpost we use a simple toy language.
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveTraversable #-}
import qualified Data.HashMap.Strict as HMS
import Data.Hashable (Hashable)
import Data.List (foldl')
import Data.Either.Validation (Validation (..),
validationToEither)import Prelude hiding (lookup)
This part of a Compiler/Interpreter is concerned with resolving occurence names to full names. Occurrence names are just what the programmer uses in the source file, and full names contain more information.
I think this is an interesting area to explore. The vast majority of articles
about creating parsers and interpreters just use String
s as names, in order to
keep things simple (which is of course fully justified). This blogpost, on the
other hand, explains what you can do if things become a bit more complicated.
Consider the following Haskell snippet:
import qualified Data.HashMap.Strict as HMS
= HMS.empty emptyThing
HMS.empty
is an occurrence name. The full name, on the other hand, is
something like unordered-containers-0.2.5.1:Data.HashMap.Base.empty
. Let’s get
started by representing these types in Haskell:
-- E.g. ["HMS", "empty"].
type OccName = [String]
-- E.g. ["Data", "HashMap", "Strict"]
type ModuleName = [String]
-- Just an example of what sort of things can be in 'FullName'.
data BindingScope = ToplevelScope | LocalScope
deriving (Show)
data FullName = FullName
fnOccName :: !OccName
{ fnModuleName :: !ModuleName
, fnBindingScope :: !BindingScope
,deriving (Show) }
Note that this is just a toy example. Firstly, we can use more efficient
representations for the above, and we might want to add newtype
safety.
Secondly, we might also want to store other things in FullName
, for example
the package where the name originated. The FullName
record can really get
quite big.
Now that we have two name types – OccName
and FullName
, we can parametrise
our abstract syntax tree over a name type.
data Expr n
= Literal Int
| Add (Expr n) (Expr n)
| Var n
deriving (Show)
Now, we can formalise the problem of scope checking a bit more: it is a function
which turns an Expr OccName
into an Expr FullName
.
Tries
In order to implement this, it is clear that we need some sort of “Map” to
store the FullName
information. The specific data structure we will use is a
Trie. Tries are somewhat similar to Radix trees, but significantly more
simple. We will implement one here for educational purposes.
A Trie k v
can be seen as a mapping from lists of keys to values, so it
could be defined as:
type Trie k v = HMS.HashMap [k] v
However, there is a nicer representation which we will need in order to support some fast operations.
First, we need a quick-and-dirty strict Maybe
type.
data M a = J !a | N
deriving (Foldable, Functor, Show, Traversable)
Note how we automically added Foldable
, Functor
and Traversable
instances
for this type. Thanks GHC!
Then, we can define Trie
in a recursive way:
data Trie k v = Trie
tValue :: !(M v)
{ tChildren :: !(HMS.HashMap k (Trie k v))
,deriving (Foldable, Functor, Show, Traversable) }
We can have a value at the root (tValue
), and then the other elements in the
Trie
are stored under the first key of their key list (in tChildren
).
Now it is time to construct some machinery to create Trie
s. The 1
empty Trie
is really easy:
empty :: Trie k v
= Trie N HMS.empty empty
Let’s draw the empty Trie
as a simple box with an N
value, since it has no
value and no children.
We can also define a function to create a Trie
with a single element. If the
list of keys is empty, we simply have a J
value at the root. Otherwise, we
define the function recursively.
singleton :: (Eq k, Hashable k) => [k] -> v -> Trie k v
= Trie (J x) HMS.empty
singleton [] x : ks) x = Trie N (HMS.singleton k (singleton ks x)) singleton (k
As an example, this is the result of the call
singleton ["foo", "bar"] "Hello World"
.
We can skip insert
and simply create a unionWith
function instead. This
function unifies two Trie
s, while allowing you to pass in a function that
decides how to merge the two values if there is a key collision.
unionWith :: (Eq k, Hashable k)
=> (v -> v -> v) -> Trie k v -> Trie k v -> Trie k v
Trie v1 c1) (Trie v2 c2) =
unionWith f (Trie v $ HMS.unionWith (unionWith f) c1 c2
where
= case (v1, v2) of
v N, _) -> v2
(N) -> v1
(_, J x, J y) -> J (f x y) (
The bulk of the work is of course done by HMS.unionWith
. This is the result of
calling
unionWith (\x _ -> x) (singleton "foo" "Hello") (singleton "bar" "World")
:
For convenience, we can then extend unionWith
to work on lists:
unionsWith :: (Eq k, Hashable k)
=> (v -> v -> v) -> [Trie k v] -> Trie k v
= foldl' (unionWith f) empty unionsWith f
A last function we need to modify tries is prefix
. This function prefixes a
whole Trie
by nesting it under a list of keys. Because of the way our Trie
is represented, this can be done efficiently and we don’t need to change every
key.
prefix :: (Eq k, Hashable k) => [k] -> Trie k v -> Trie k v
= trie
prefix [] trie : ks) trie = Trie N $ HMS.singleton k (prefix ks trie) prefix (k
This is the result of prefixing the trie from the previous example with
["qux"]
:
In addition to creating Trie
s, we also need to be able to lookup stuff in the
Trie
. All we need for that is a simple lookup
function:
lookup :: (Eq k, Hashable k) => [k] -> Trie k v -> Maybe v
lookup [] (Trie N _) = Nothing
lookup [] (Trie (J x) _) = Just x
lookup (k : ks) (Trie _ children) = do
<- HMS.lookup k children
trie lookup ks trie
These are all the Trie
functions we need. A real implementation would, of
course, offer more.
The scope type
Now, recall that we’re trying to resolve the occurrence names in a module into full names. We will tackle this from the opposite direction: we’ll gather up all the names which are in scope into one place. After this, actually, resolving an occurrence name is as simple as performing a lookup.
In order to gather up all these names we need some datatype – which is, of
course, the Trie
we just implemented!
type Scope a = Trie String a
We will differentiate between two different kinds of scopes (hence the a
). An
AmbiguousScope
might contain duplicate names. In that case, we want to throw
an error or show a warning to the user. In an UnambiguousScope
, on the other
hand, we know precisely what every name refers to.
type AmbiguousScope = Scope [FullName]
type UnambiguousScope = Scope FullName
Let’s first focus on building AmbiguousScope
s. We will later see how we can
validate these and convert them into an UnambiguousScope
.
Building a scope for one module
In order to build a scope, let’s start with a simple case. Let’s look at a sample module in our DSL and construct a scope just for that module.
module Calories.Fruit where
apple = 52
banana = 89
We need to have some intuition for how such a module is represented in Haskell. Let’s try to keep things as simple as possible:
data Module n = Module
mName :: !ModuleName
{ mBindings :: [Binding n]
,deriving (Show) }
data Binding n = Binding
bName :: !n
{ bBody :: !(Expr n)
,deriving (Show) }
We can define a function to convert this module into a local Scope
which
contains all the bindings in the module. In order to keep things simple, we
assume every binding in a module is always exported.
scopeFromModule :: Module OccName -> AmbiguousScope
=
scopeFromModule m ++) $ map scopeFromBinding (mBindings m)
unionsWith (where
scopeFromBinding :: Binding OccName -> AmbiguousScope
= singleton (bName b)
scopeFromBinding b FullName
[ = bName b
{ fnOccName = mName m
, fnModuleName = ToplevelScope
, fnBindingScope
} ]
For our example module, we obtain something like:
Multiple imports
Of course, a realistic program will import multiple modules. Imagine a program with the following import list:
import Calories.Fruit
import qualified Calories.Pie as Pie
-- An apple and an apple pie!
combo = apple + Pie.apple
In order to build the Scope
for the program, we need three more things:
- Joining a bunch of
Scope
s, one for each import statement (plus the local scope, and maybe a builtin scope…); - Qualifying a
Scope
, so that the qualified imports end up under the right name; - Finally, converting the
AmbiguousScope
into anUnambiguousScope
.
The first one is easy, since we have our Trie
machinery.
unionScopes :: [AmbiguousScope] -> AmbiguousScope
= unionsWith (++) unionScopes
So is the second:
qualifyScope :: [String] -> AmbiguousScope -> AmbiguousScope
= prefix qualifyScope
We can now build the scope for our little program. It is:
myScope :: AmbiguousScope
= unionScopes
myScope -- Defines 'combo'
[ scopeFromModule myModule
, scopeFromModule fruitModule"Pie"] $ scopeFromModule pieModule
, qualifyScope [ ]
We get something like:
Great! So now the problem is that we’re left with an AmbiguousScope
instead of
an UnambiguousScope
. Fortunately we can convert between those fairly easily,
because Trie
(and by extension Scope
) is Traversable:
toUnambiguousScope :: AmbiguousScope -> Validation [ScopeError] UnambiguousScope
= traverse $ \fullNames -> case fullNames of
toUnambiguousScope -> pure single
[single] -> Failure [InternalScopeError "empty list in scope"]
[] -> Failure [AmbiguousNames multiple] multiple
It is perhaps worth noting that this behaviour is different from GHC 2.
By using the Validation Applicative, we ensure that we get as many error messages as we can. We have a nice datatype which describes our possible errors:
data ScopeError
= AmbiguousNames [FullName]
| NotInScope OccName
| InternalScopeError String -- For other failures
deriving (Show)
Scope checking an expression
That entails everything we needed to build an UnambiguousScope
, so we can now
scope check a program. The actual scope checking itself is very straightforward:
scExpr :: UnambiguousScope -> Expr OccName
-> Validation [ScopeError] (Expr FullName)
Literal x) = pure (Literal x)
scExpr _ (Add x y) = Add <$> scExpr s x <*> scExpr s y
scExpr s (Var n) = Var <$> scOccName s n
scExpr s (
scOccName :: UnambiguousScope -> OccName
-> Validation [ScopeError] FullName
= case lookup n s of
scOccName s n Just fullName -> pure fullName
Nothing -> Failure [NotInScope n]
Conclusion
I have described a simple and (in my opinion) elegant approach to scope checking. I hope this is inspiring if you ever are in the situation where modules would be a nice extension to some DSL (or full-fledged programming language) you are implementing.
We’ve also seen how one can implement a Trie
in a reasonably easy way. These
often come in handy when you are modelling some sort of hierarchical Map
.
This entire blogpost is written in Literate Haskell, and works as a standalone
example for scope checking. If you feel up to the challenge, try to add
Let-bindings as an exercise! You can find the raw .lhs
file
here.
Appendix
This is the rest of the source code to this blogpost, in order to make it testable (and hackable!).
fruitModule :: Module OccName
= Module
fruitModule = ["Calories.Fruit"]
{ mName =
, mBindings Binding ["apple"] (Literal 52)
[ Binding ["banana"] (Literal 89)
,
]
}
pieModule :: Module OccName
= Module
pieModule = ["Calories.Pie"]
{ mName =
, mBindings Binding ["apple"] (Literal 240)
[ Binding ["blueberry"] (Literal 371)
,
]
}
myModule :: Module OccName
= Module
myModule = ["Main"]
{ mName =
, mBindings Binding ["combo"] $ Add
[ Var ["apple"])
(Var ["Pie", "apple"])
(
]
}
scModule :: UnambiguousScope -> Module OccName
-> Validation [ScopeError] (Module FullName)
Module n bs) = Module n <$> traverse (scBinding s) bs
scModule s (
scBinding :: UnambiguousScope -> Binding OccName
-> Validation [ScopeError] (Binding FullName)
Binding n e) = Binding <$> scOccName s n <*> scExpr s e
scBinding s (
main :: IO ()
= do
main let ambiguous = unionScopes
[ scopeFromModule fruitModule"Pie"] $ scopeFromModule pieModule
, qualifyScope [
, scopeFromModule myModule
]
print $ do
<- validationToEither $ toUnambiguousScope ambiguous
unambiguous $ scModule unambiguous myModule validationToEither
Actually, in this representation, there is no “the” empty trie, since one can represent an empty trie in infinite ways.↩︎
GHC only reports ambiguity errors for imported names when they are actually used, not when they are imported. We could also achieve this behaviour by continuing with the
AmbiguousScope
and throwing an error fromscOccName
when there is ambiguity.↩︎