A bit of ST and a bit of IO: Designing a DCPU-16 emulator

On monadic typeclasses, pure and impure state monads
Posted in: haskell.

When Notch announced that his upcoming game 0x10c would feature an in-game 16 bit CPU, the internet went nuts about it.

At this point, it seems that every self-respecting hacker has at least already implemented an optimizing assembler, a dissassembler, an emulator, a debugger, or even an operating system or an LLVM backend.

I obviously couldn’t lag behind and took the opportunity to think about the design of such an emulator in Haskell. This blogpost is the result of my search, and my experiments with different designs until I arrived at something which had all the nice properties I desired.

The full implementation of this design can be found here.

The core of the problem: load and store

When designing a system, it’s often a good idea to start from a minimal but functional core and work your way up from there – this is also the approach I have taken here.

We can create such a core system by defining the state of the emulator as a datatype with a few primitive operations on it. The state of the emulator can be represented as a Memory datatype. It’s a tiny wrapper around a mutable array, and has three basic operations:

module Memory where
data Address
= Pc
| Sp
| Ram Word16
-- And more...
data Memory s = Memory -- Exact definition omitted
new :: ST s (Memory s)
load :: Memory s -> Address -> ST s Word16
store :: Memory s -> Address -> Word16 -> ST s ()

Using the above Address datatype, we’re able to access and modify all of the values the emulator needs using two simple operations: load and store. The fact that we can access and modify, for example, the stack pointer (Sp) in the same way as we would modify a value in the RAM (Ram 0x1000) simplifies the implementation of the actual emulator. As an example, think of code like this:

SET [0x1000], SP

This all runs in the ST monad. This monad is perhaps a bit of a strange beast, since it allows destructive updates: this means we can update words in the RAM of the CPU really fast. However, the result remains pure nonetheless: unlike the IO monad, code written in the ST monad can only modify state internal to the monad. This is guaranteed by the type system using the Rank2Types extension.

This is great because it allows us to have a really fast emulator, of which we can trivially say it is deterministic.

A necessary abstraction

However, while the current specification does not include keyboard input and video output, this will certainly be added in the future. If we want to support that in our emulator, does it mean that we will eventually be forced to run our code in the IO monad, and that our brave efforts to build a deterministic emulator have been in vain?

Not exactly. Using a typeclass, it is possible to build an extremely useful abstraction layer which helps us with this problem.

module Emulator.Monad where
class Monad m => MonadEmulator m where
load :: Address -> m Word16
store :: Address -> Word16 -> m ()

We can now implement most of the emulator with just the type MonadEmulator m => m (), since we really only need load and store.

The following (overly simplified and incorrect) snippet illustrates that fact.

module Emulator where
import Emulator.Monad
step :: MonadEmulator m => m ()
step = do
-- Load the next instruction (at PC) and execute it
pc <- load Pc
instr <- load (Ram pc)
store Pc (pc + 1)
execute instr

A deterministic implementation

We can write a simple, nice and pure implementation of this typeclass by using ST, as we previously discussed. We need access to the Memory, and for this purpose we can simply wrap a ReaderT around ST:

module Emulator.Monad.ST where
import Emulator.Monad
import Memory (Memory)
import qualified Memory as Memory
newtype STEmulator s a = STEmulator (ReaderT (Memory s) (ST s) a)
deriving (Monad)
instance MonadEmulator (STEmulator s) where
load address = STEmulator $ do
mem <- ask
lift $ Memory.load mem address
store address word = STEmulator $ do
mem <- ask
lift $ Memory.store mem address word
runSTEmulator :: (forall s. STEmulator s a) -> a
-- Implementation uses runST to produce a pure result:
-- runST :: (forall s. ST s a) -> a

And a bit of IO

As briefly discussed before, we want to be able to add keyboard input and video output. I’ll focus on video output for this blogpost, since it’s a little easier to explain. The specification for video output is not yet released, but it was possible to deduce a lot from some screenshots.

The video memory starts at 0x8000, and when we store characters at these addresses, they are displayed on the video terminal. Note that we don’t need to extend the MonadEmulator typeclass for this purpose: we already support writing values to 0x8000 + i, because we can reach those addresses through the regular store operation, discussed above!

But actually displaying the video terminal is something we can’t do with our STEmulator – let’s add an IOEmulator as well.

The stToIO function from Control.Monad.ST allows us to reuse everything in the Memory module.

stToIO :: ST RealWorld a -> IO a

Note that this is not an unsafe function: converting ST to IO is always safe this way (converting IO to ST, however, is tricky business, but this is not needed for our emulator).

Almost exactly like our STEmulator is a ReaderT wrapper around ST, our IOEmulator starts out like a ReaderT wrapper around IO.

module Emulator.Monad.IO where
import Emulator.Monad
import Memory (Memory)
import qualified Memory as Memory
newtype IOEmulator a = IOEmulator (ReaderT (Memory RealWorld) IO a)
deriving (Monad, MonadIO)
instance MonadEmulator IOEmulator where
load address = IOEmulator $ do
mem <- ask
lift $ stToIO $ Memory.load mem address
store address word = IOEmulator $ do
mem <- ask
lift $ stToIO $ Memory.store mem address word
runIOEmulator :: IOEmulator a -> IO a
-- Implementation trivial, omitted

If we now have some code for an emulator, we can run it in two ways:

-- Load a program, execute 1000 instructions, return the value in the RAM at
-- position 0x1000.
test :: MonadEmulator m => m Word16
test = do
loadProgram [0xa461, 0x7001, 0x8403 {-# and more... #-}]
replicateM_ 1000 step
load (Ram 0x1000)
t1 :: Word16
t1 = runSTEmulator test
t2 :: IO Word16
t2 = runIOEmulator test

Let’s not stop here yet, because with our current version of IOEmulator, all we’ve done is create an impure version of STEmulator, without any extra features!

Adding the desired features is relatively easy at this point: suppose we want to add support for the video terminal. We just modify the store implementation of IOEmulator a little:

store address word = IOEmulator $ do
mem <- ask
lift $ stToIO $ Memory.store mem address word
lift $ video address word -- New line added here

And a video terminal implementation!

video :: Address -> Word16 -> IO ()
video (Ram address) word
| address < videoStart = return ()
| address >= videoEnd = return ()
| otherwise = do
-- Calculate row, column, character code and draw!
return ()
videoStart = 0x8000
videoEnd = videoStart + rows * columns -- Unknown yet?
video _ _ = return ()


ST is a wonderful tool for problems that require (local) destructive updates for performance reasons 1. Additionally, we can easily convert it to IO, which allows us to simply pick one of the two using an abstraction layer: we can have deterministic results for tests, proper keyboard/video support, and the entire implementation of the actual CPU is shared code.

My thanks to Andy Georges and Toon Willems for proofreading and some corrections.

  1. Although other, purely functional representations are possible as well, of course. These are however unlikely to match the speed of this solution.