A bit of ST and a bit of IO: Designing a DCPU-16 emulator
Published on April 12, 2012 under the tag haskell
When Notch announced that his upcoming game 0x10c would feature an in-game 16 bit CPU, the internet went nuts about it.
At this point, it seems that every self-respecting hacker has at least already implemented an optimizing assembler, a dissassembler, an emulator, a debugger, or even an operating system or an LLVM backend.
I obviously couldn’t lag behind and took the opportunity to think about the design of such an emulator in Haskell. This blogpost is the result of my search, and my experiments with different designs until I arrived at something which had all the nice properties I desired.
The full implementation of this design can be found here.
The core of the problem: load and store
When designing a system, it’s often a good idea to start from a minimal but functional core and work your way up from there – this is also the approach I have taken here.
We can create such a core system by defining the state of the emulator as a
datatype with a few primitive operations on it. The state of the emulator can be
represented as a Memory
datatype. It’s a tiny wrapper around a mutable array,
and has three basic operations:
module Memory where
data Address
= Pc
| Sp
| Ram Word16
-- And more...
data Memory s = Memory -- Exact definition omitted
new :: ST s (Memory s)
load :: Memory s -> Address -> ST s Word16
store :: Memory s -> Address -> Word16 -> ST s ()
Using the above Address
datatype, we’re able to access and modify all of the
values the emulator needs using two simple operations: load
and store
. The
fact that we can access and modify, for example, the stack pointer (Sp
) in the
same way as we would modify a value in the RAM (Ram 0x1000
) simplifies the
implementation of the actual emulator. As an example, think of code like this:
SP, I
SET [0x1000], SP SET
This all runs in the ST monad. This monad is perhaps a bit of a strange beast,
since it allows destructive updates: this means we can update words in the RAM
of the CPU really fast. However, the result remains pure nonetheless: unlike
the IO monad, code written in the ST monad can only modify state internal to the
monad. This is guaranteed by the type system using the Rank2Types
extension.
This is great because it allows us to have a really fast emulator, of which we can trivially say it is deterministic.
A necessary abstraction
However, while the current specification does not include keyboard input and
video output, this will certainly be added in the future. If we want to support
that in our emulator, does it mean that we will eventually be forced to run our
code in the IO
monad, and that our brave efforts to build a deterministic
emulator have been in vain?
Not exactly. Using a typeclass, it is possible to build an extremely useful abstraction layer which helps us with this problem.
module Emulator.Monad where
class Monad m => MonadEmulator m where
load :: Address -> m Word16
store :: Address -> Word16 -> m ()
We can now implement most of the emulator with just the type
MonadEmulator m => m ()
, since we really only need load
and store
.
The following (overly simplified and incorrect) snippet illustrates that fact.
module Emulator where
import Emulator.Monad
step :: MonadEmulator m => m ()
= do
step -- Load the next instruction (at PC) and execute it
<- load Pc
pc <- load (Ram pc)
instr Pc (pc + 1)
store execute instr
A deterministic implementation
We can write a simple, nice and pure implementation of this typeclass by using
ST
, as we previously discussed. We need access to the Memory
, and for this
purpose we can simply wrap a ReaderT
around ST
:
module Emulator.Monad.ST where
import Emulator.Monad
import Memory (Memory)
import qualified Memory as Memory
newtype STEmulator s a = STEmulator (ReaderT (Memory s) (ST s) a)
deriving (Monad)
instance MonadEmulator (STEmulator s) where
= STEmulator $ do
load address <- ask
mem $ Memory.load mem address
lift = STEmulator $ do
store address word <- ask
mem $ Memory.store mem address word
lift
runSTEmulator :: (forall s. STEmulator s a) -> a
-- Implementation uses runST to produce a pure result:
-- runST :: (forall s. ST s a) -> a
And a bit of IO
As briefly discussed before, we want to be able to add keyboard input and video output. I’ll focus on video output for this blogpost, since it’s a little easier to explain. The specification for video output is not yet released, but it was possible to deduce a lot from some screenshots.
The video memory starts at 0x8000
, and when we store characters at these
addresses, they are displayed on the video terminal. Note that we don’t need to
extend the MonadEmulator
typeclass for this purpose: we already support
writing values to 0x8000 + i
, because we can reach those addresses through the
regular store
operation, discussed above!
But actually displaying the video terminal is something we can’t do with our
STEmulator
– let’s add an IOEmulator
as well.
The stToIO
function from Control.Monad.ST
allows us to reuse everything in
the Memory
module.
stToIO :: ST RealWorld a -> IO a
Note that this is not an unsafe function: converting ST
to IO
is always
safe this way (converting IO
to ST
, however, is tricky business, but this is
not needed for our emulator).
Almost exactly like our STEmulator
is a ReaderT
wrapper around ST
, our
IOEmulator
starts out like a ReaderT
wrapper around IO
.
module Emulator.Monad.IO where
import Emulator.Monad
import Memory (Memory)
import qualified Memory as Memory
newtype IOEmulator a = IOEmulator (ReaderT (Memory RealWorld) IO a)
deriving (Monad, MonadIO)
instance MonadEmulator IOEmulator where
= IOEmulator $ do
load address <- ask
mem $ stToIO $ Memory.load mem address
lift = IOEmulator $ do
store address word <- ask
mem $ stToIO $ Memory.store mem address word
lift
runIOEmulator :: IOEmulator a -> IO a
-- Implementation trivial, omitted
If we now have some code for an emulator, we can run it in two ways:
-- Load a program, execute 1000 instructions, return the value in the RAM at
-- position 0x1000.
test :: MonadEmulator m => m Word16
= do
test 0xa461, 0x7001, 0x8403 {-# and more... #-}]
loadProgram [1000 step
replicateM_ Ram 0x1000)
load (
t1 :: Word16
= runSTEmulator test
t1
t2 :: IO Word16
= runIOEmulator test t2
Let’s not stop here yet, because with our current version of IOEmulator
, all
we’ve done is create an impure version of STEmulator
, without any extra
features!
Adding the desired features is relatively easy at this point: suppose we want to
add support for the video terminal. We just modify the store
implementation of
IOEmulator
a little:
= IOEmulator $ do
store address word <- ask
mem $ stToIO $ Memory.store mem address word
lift $ video address word -- New line added here lift
And a video terminal implementation!
video :: Address -> Word16 -> IO ()
Ram address) word
video (| address < videoStart = return ()
| address >= videoEnd = return ()
| otherwise = do
-- Calculate row, column, character code and draw!
return ()
where
= 0x8000
videoStart = videoStart + rows * columns -- Unknown yet?
videoEnd = return () video _ _
Conclusion
ST
is a wonderful tool for problems that require (local) destructive updates
for performance reasons 1. Additionally, we can easily convert
it to IO
, which allows us to simply pick one of the two using an abstraction
layer: we can have deterministic results for tests, proper keyboard/video
support, and the entire implementation of the actual CPU is shared code.
My thanks to Andy Georges and Toon Willems for proofreading and some corrections.
Although other, purely functional representations are possible as well, of course. These are however unlikely to match the speed of this solution.↩︎