Practical testing in Haskell

Testing the LRU Cache from the last post as an example
Published on March 13, 2015 under the tag haskell

Introduction

There has been a theme of “Practical Haskell” in the last few blogposts I published, and when I published the last one, on how to write an LRU Cache, someone asked me if I could elaborate on how I would test or benchmark such a module. For the sake of brevity, I will constrain myself to testing for now, although I think a lot of the ideas in the blogpost also apply to benchmarking.

This post is written in Literate Haskell. It depends on the LRU Cache we wrote last time, so you need both modules if you want to play around with the code. Both can be found in this repo.

Since I use a different format for blogpost filenames than GHC expects for module names, loading both modules is a bit tricky. The following works for me:

$ ghci posts/2015-02-24-lru-cache.lhs \
    posts/2015-03-13-practical-testing-in-haskell.lhs
*Data.SimpleLruCache> :m +Data.SimpleLruCache.Tests
*Data.SimpleLruCache Data.SimpleLruCache.Tests>

Alternatively, you can of course rename the files.

Test frameworks in Haskell

There are roughly two kinds of test frameworks which are commonly used in the Haskell world:

Unit testing, for writing concrete test cases. We will be using HUnit.
Property testing, which allows you to test properties rather than specific cases. We will be using QuickCheck. Property testing is something that might be unfamiliar to people just starting out in Haskell. However, because there already are great tutorials out there on there on QuickCheck, I will not explain it in detail. smallcheck also falls in this category.

Finally, it’s nice to have something to tie it all together. We will be using Tasty, which lets us run HUnit and QuickCheck tests in the same test suite. It also gives us plenty of convenient options, e.g. running only a part of the test suite. We could also choose to use test-framework or Hspec instead of Tasty.

A module structure for tests

Many Haskell projects start out by just having a tests.hs file somewhere, but this obviously does not scale well to larger codebases.

The way I like to organize tests is based on how we organize code in general: through the module hierarchy. If I have the following modules in src/:

AcmeCompany.AwesomeProduct.Database
AcmeCompany.AwesomeProduct.Importer
AcmeCompany.AwesomeProduct.Importer.Csv

I aim to have the following modules in tests/:

AcmeCompany.AwesomeProduct.Database.Tests
AcmeCompany.AwesomeProduct.Importer.Tests
AcmeCompany.AwesomeProduct.Importer.Csv.Tests

If I want to add some higher-level tests which basically test the entire product, I can usually add these higher in the module tree. For example, if I wanted to test our entire awesome product, I would write the tests in AcmeCompany.AwesomeProduct.Tests.

Every .Tests module exports a tests :: TestTree value. A TestTree is a tasty concept – basically a structured group of tests. Let’s go to our motivating example: testing the LRU Cache I wrote in the previous blogpost.

Since I named the module Data.SimpleLruCache, we use Data.SimpleLruCache.Tests here.

{-# OPTIONS_GHC -fno-warn-orphans #-}
{-# LANGUAGE BangPatterns               #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Data.SimpleLruCache.Tests
    ( tests
    ) where

import           Control.Applicative     ((<$>), (<*>))
import           Control.DeepSeq         (NFData)
import           Control.Monad           (foldM_)
import           Data.Hashable           (Hashable (..))
import qualified Data.HashPSQ            as HashPSQ
import           Data.IORef              (newIORef, readIORef, writeIORef)
import           Data.List               (foldl')
import qualified Data.Set                as S
import           Prelude                 hiding (lookup)
import           Data.SimpleLruCache
import qualified Test.QuickCheck         as QC
import qualified Test.QuickCheck.Monadic as QC
import           Test.Tasty              (TestTree, testGroup)
import           Test.Tasty.HUnit        (testCase)
import           Test.Tasty.QuickCheck   (testProperty)
import           Test.HUnit              (Assertion, (@?=))

What to test

One of the hardest questions is, of course, which functions and modules should I test? If unlimited time and resources are available, the obvious answer is “everything”. Unfortunately, time and resources are often scarce.

My rule of thumb is based on my development style. I tend to use GHCi a lot during development, and play around with datastructures and functions until they seem to work. These “it seems to work” cases I execute in GHCi often make great candidates for simple HUnit tests, so I usually start with that.

Then I look at invariants of the code, and try to model these as QuickCheck properties. This sometimes requires writing tricky Arbitrary instances; I will give an example of this later in this blogpost.

I probably don’t have to say that the more critical the code is, the more tests should be added.

After doing this, it is still likely that we will hit bugs if the code is non-trivial. These bugs form good candidates for testing as well:

First, add a test case to reproduce the bug. Sometimes a test case will be a better fit, sometimes we should go with a property – it depends on the bug.
Fix the bug so the test case passes.
Leave in the test case for regression testing.

Using this strategy, you should be able to convince yourself (and others) that the code works.

Simple HUnit tests

Testing simple cases using HUnit is trivial, so we won’t spend that much time here. @?= asserts that two values must be equal, so let’s use that to check that trimming the empty Cache doesn’t do anything evil:

testCache01 :: Assertion
testCache01 =
    trim (empty 3 :: Cache String Int) @?= empty 3

If we need to some I/O for our test, we can do so without much trouble in HUnit. After all,

Test.HUnit> :i Assertion
type Assertion = IO ()  -- Defined in 'Test.HUnit.Lang'

so Assertion is just IO!

testCache02 :: Assertion
testCache02 = do
    h  <- newHandle 10 :: IO (Handle String Int)
    v1 <- cached h "foo" (return 123)
    v1 @?= 123
    v2 <- cached h "foo" (fail "should be cached")
    v2 @?= 123

That was fairly easy.

As you can see, I usually give simple test cases numeric names. Sometimes there is a meaningful name for a test (for example, if it is a regression test for a bug), but usually I don’t mind using just numbers.

Simple QuickCheck tests

Let’s do some property based testing. There are a few properties we can come up with.

Calling HashPSQ.size takes O(n) time, which is why are keeping our own counter, cSize. We should check that it matches HashPSQ.size, though:

sizeMatches :: (Hashable k, Ord k) => Cache k v -> Bool
sizeMatches c =
    cSize c == HashPSQ.size (cQueue c)

The cTick field contains the priority of our next element that we will insert. The priorities currently in the queue should all be smaller than that.

prioritiesSmallerThanNext :: (Hashable k, Ord k) => Cache k v -> Bool
prioritiesSmallerThanNext c =
    all (< cTick c) priorities
  where
    priorities = [p | (_, p, _) <- HashPSQ.toList (cQueue c)]

Lastly, the size should always be smaller than or equal to the capacity:

sizeSmallerThanCapacity :: (Hashable k, Ord k) => Cache k v -> Bool
sizeSmallerThanCapacity c =
    cSize c <= cCapacity c

Tricks for writing Arbitrary instances

The Action trick

Of course, if you are somewhat familiar with QuickCheck, you will know that the previous properties require an Arbitrary instance for Cache.

One way to write such instances is what I’ll call the “direct” method. For us this would mean generating a list of [(key, priority, value)] pairs and convert that to a HashPSQ. Then we could compute the size of that and initialize the remaining fields.

However, writing an Arbitrary instance this way can get hard if our datastructure becomes more complicated, especially if there are complicated invariants. Additionally, if we take any shortcuts in the implementation of arbitrary, we might not test the edge cases well!

Another way to write the Arbitrary instance is by modeling use of the API. In our case, there are only two things we can do with a pure Cache: insert and lookup.

data CacheAction k v
    = InsertAction k v
    | LookupAction k
    deriving (Show)

This has a trivial Arbitrary instance:

instance (QC.Arbitrary k, QC.Arbitrary v) =>
        QC.Arbitrary (CacheAction k v) where
    arbitrary = QC.oneof
        [ InsertAction <$> QC.arbitrary <*> QC.arbitrary
        , LookupAction <$> QC.arbitrary
        ]

And we can apply these actions to our pure Cache to get a new Cache:

applyCacheAction
    :: (Hashable k, Ord k)
    => CacheAction k v -> Cache k v -> Cache k v
applyCacheAction (InsertAction k v) c = insert k v c
applyCacheAction (LookupAction k)   c = case lookup k c of
    Nothing      -> c
    Just (_, c') -> c'

You probably guessed where this was going by now: we can generate an arbitrary Cache by generating a bunch of these actions and applying them one by one on top of the empty cache.

instance (QC.Arbitrary k, QC.Arbitrary v, Hashable k, NFData v, Ord k) =>
        QC.Arbitrary (Cache k v) where
    arbitrary = do
        capacity <- QC.choose (1, 50)
        actions  <- QC.arbitrary
        let !cache = empty capacity
        return $! foldl' (\c a -> applyCacheAction a c) cache actions

Provided that we can model the complete user facing API using such an “action” datatype, I think this is a great way to write Arbitrary instances. After all, our Arbitrary instance should then be able to reach the same states as a user of our code.

An extension of this trick is using a separate datatype which holds the list of actions we used to generate the Cache as well as the Cache.

data ArbitraryCache k v = ArbitraryCache [CacheAction k v] (Cache k v)
    deriving (Show)

When a test fails, we can then log the list of actions which got us into the invalid state – very useful for debugging. Furthermore, we can implement the shrink method in order to try to reach a similar invalid state using less actions.

The SmallInt trick

Now, note that our Arbitrary instance is for Cache k v, i.e., we haven’t chosen yet what we want to have as k and v for our tests. In this case v is not so important, but the choice of k is important.

We want to cover all corner cases, and this includes ensuring that we cover collisions. If we use String or Int as key type k, collisions are very unlikely due to the high cardinality of both types. Since we are using a hash-based container underneath, hash collisions must also be covered.

We can solve both problems by introducing a newtype which restricts the cardinality of Int, and uses a “worse” (in the traditional sense) hashing method.

newtype SmallInt = SmallInt Int
    deriving (Eq, Ord, Show)

instance QC.Arbitrary SmallInt where
    arbitrary = SmallInt <$> QC.choose (1, 100)

instance Hashable SmallInt where
    hashWithSalt salt (SmallInt x) = (salt + x) `mod` 10

Monadic QuickCheck

Now let’s mix QuickCheck with monadic code. We will be testing the Handle interface to our cache. This interface consists of a single method:

cached
    :: (Hashable k, Ord k)
    => Handle k v -> k -> IO v -> IO v

We will write a property to ensure our cache retains and evicts the right key-value pairs. It takes two arguments: the capacity of the LRU Cache (we use a SmallInt in order to get more evictions), and a list of key-value pairs we will insert using cached (we use SmallInt so we will cover collisions).

historic
    :: SmallInt              -- ^ Capacity
    -> [(SmallInt, String)]  -- ^ Key-value pairs
    -> QC.Property           -- ^ Property
historic (SmallInt capacity) pairs = QC.monadicIO $ do

QC.run is used to lift IO code into the QuickCheck property monad PropertyM – so it is a bit like a more concrete version of liftIO. I prefer it here over liftIO because it makes it a bit more clear what is going on.

    h <- QC.run $ newHandle capacity

We will fold (foldM_) over the pairs we need to insert. The state we pass in this foldM_ is the history of pairs we previously inserted. By building this up again using :, we ensure history contains a recent-first list, which is very convenient.

Inside every step, we call cached. By using an IORef in the code where we would usually actually “load” the value v, we can communicate whether or not the value was already in the cache. If it was already in the cache, the write will not be executed, so the IORef will still be set to False. We store that result in wasInCache.

In order to verify this result, we reconstruct a set of the N most recent keys. We can easily do this using the list of recent-first key-value pairs we have in history.

    foldM_ (step h) [] pairs
  where
    step h history (k, v) = do
        wasInCacheRef <- QC.run $ newIORef True
        _             <- QC.run $ cached h k $ do
            writeIORef wasInCacheRef False
            return v
        wasInCache    <- QC.run $ readIORef wasInCacheRef
        let recentKeys = nMostRecentKeys capacity S.empty history
        QC.assert (S.member k recentKeys == wasInCache)
        return ((k, v) : history)

This is our auxiliary function to calculate the N most recent keys, given a recent-first key-value pair list.

nMostRecentKeys :: Ord k => Int -> S.Set k -> [(k, v)] -> S.Set k
nMostRecentKeys _ keys [] = keys
nMostRecentKeys n keys ((k, _) : history)
    | S.size keys >= n    = keys
    | otherwise           =
        nMostRecentKeys n (S.insert k keys) history

This test did not cover checking that the values in the cache are correct, but only ensures it retains the correct key-value pairs. This is a conscious decision: I think the retaining/evicting part of the LRU Cache code was the most tricky, so we should prioritize testing that.

Tying everything up

Lastly, we have our tests :: TestTree. It is not much more than an index of tests in the module. We use testCase to pass HUnit tests to the framework, and testProperty for QuickCheck properties.

Note that I usually tend to put these at the top of the module, but here I put it at the bottom of the blogpost for easier reading.

tests :: TestTree
tests = testGroup "Data.SimpleLruCache"
    [ testCase "testCache01" testCache01
    , testCase "testCache02" testCache02

    , testProperty "size == HashPSQ.size"
        (sizeMatches :: Cache SmallInt String -> Bool)
    , testProperty "priorities < next priority"
        (prioritiesSmallerThanNext :: Cache SmallInt String -> Bool)
    , testProperty "size < capacity"
        (sizeSmallerThanCapacity :: Cache SmallInt String -> Bool)

    , testProperty "historic" historic
    ]

The last thing we need is a main function for cabal test to invoke. I usually put this in something like tests/Main.hs. If you use the scheme which I described above, this file should look very neat:

module Main where

import           Test.Tasty (defaultMain, testGroup)

import qualified AcmeCompany.AwesomeProduct.Database.Tests
import qualified AcmeCompany.AwesomeProduct.Importer.Csv.Tests
import qualified AcmeCompany.AwesomeProduct.Importer.Tests
import qualified Data.SimpleLruCache.Tests

main :: IO ()
main = defaultMain $ testGroup "Tests"
    [ AcmeCompany.AwesomeProduct.Database.Tests.tests
    , AcmeCompany.AwesomeProduct.Importer.Csv.Tests.tests
    , AcmeCompany.AwesomeProduct.Importer.Tests.tests
    , Data.SimpleLruCache.Tests.tests
    ]

If you are still hungry for more Haskell testing, I would recommend looking into Haskell program coverage for mission-critical modules.

Special thanks to Alex Sayers, who beat everyone’s expectations when he managed to stay sober for just long enough to proofread this blogpost.

Stuff

Links