Controlling Chromium in Haskell

Driving Chromium using its built-in WebSockets server
Published on September 1, 2013 under the tag haskell

Introduction

The Chromium browser has a built-in WebSockets server that can be used to control it. I first heard about this through an issue reported on my WebSockets project.

Since I am currently rewriting the WebSockets library on top of io-streams, I wanted to give this a try and turn it into a blogpost. As a side note: the port is going great. io-streams is a very nice library, and I managed to solve a whole lot of issues in the WebSockets library (mostly exception handling stuff).

Note that this code uses an as yet unreleased version of the WebSockets library. If you want to try it, you can get it from the GitHub repo: check out the io-streams branch.

This file is written in literate Haskell, so we start with a few boilerplate declarations and imports:

{-# LANGUAGE OverloadedStrings #-}
module Main
   ( main
   ) where
import           Control.Applicative  ((<$>))
import           Control.Monad        (forever, mzero)
import           Control.Monad.Trans  (liftIO)
import           Data.Aeson           (FromJSON (..), ToJSON (..), (.:), (.=))
import qualified Data.Aeson           as A
import qualified Data.Map             as M
import           Data.Maybe           (fromMaybe)
import qualified Data.Text.IO         as T
import qualified Data.Vector          as V
import qualified Network.HTTP.Conduit as Http
import qualified Network.URI          as Uri
import qualified Network.WebSockets   as WS

Locating the WebSockets server

To enable the WebSockets server, Chromium must be launched with the --remote-debugging-port flag set:

chromium --remote-debugging-port=9160

Now, in order to connect to the built-in WebSockets server, we have to know its URI, and finding it requires some extra code. We will first use cURL to demonstrate this:

$ curl localhost:9160/json
[ {
   "description": "",
   ...
   "webSocketDebuggerUrl": "ws://localhost:9160/devtools/page/8937C189-5CED-8E34-E26E-A389641FE8FF"
} ]

That webSocketDebuggerUrl is the one we want. Let us write some Haskell code to automate obtaining it.

We create a datatype to hold this info. Currently, we are only interested in a single field:

data ChromiumPageInfo = ChromiumPageInfo
    { chromiumDebuggerUrl :: String
    } deriving (Show)

We will use aeson to parse the JSON. We need a FromJSON instance for our datatype:

instance FromJSON ChromiumPageInfo where
    parseJSON (A.Object obj) =
        ChromiumPageInfo <$> obj .: "webSocketDebuggerUrl"
    parseJSON _              = mzero
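To see the instance at work without a running browser, here is a minimal standalone sketch (separate from this post's literate program) that decodes a hand-written sample of the /json response. The names PageInfo and decodePages, as well as the page id EXAMPLE, are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad              (mzero)
import           Data.Aeson                 (FromJSON (..), (.:))
import qualified Data.Aeson                 as A
import qualified Data.ByteString.Lazy.Char8 as BL

-- Illustrative stand-in for ChromiumPageInfo, with Eq added for comparisons.
newtype PageInfo = PageInfo { debuggerUrl :: String }
    deriving (Eq, Show)

instance FromJSON PageInfo where
    parseJSON (A.Object obj) = PageInfo <$> obj .: "webSocketDebuggerUrl"
    parseJSON _              = mzero

-- Decode a JSON array of page descriptions; Nothing on any parse failure.
decodePages :: BL.ByteString -> Maybe [PageInfo]
decodePages = A.decode

main :: IO ()
main = do
    -- Extra fields such as "description" are simply ignored.
    print $ decodePages
        "[{\"description\":\"\",\"webSocketDebuggerUrl\":\"ws://localhost:9160/devtools/page/EXAMPLE\"}]"
    -- A page without the field makes the whole decode fail.
    print $ decodePages "[{\"description\":\"\"}]"
```

The first call yields a Just holding one PageInfo, the second yields Nothing, because a single element that fails to parse rejects the whole list.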

The http-conduit library can be used to do what we just did using curl:

getChromiumPageInfo :: Int -> IO [ChromiumPageInfo]
getChromiumPageInfo port = do
    response <- Http.withManager $ \manager -> Http.httpLbs request manager
    case A.decode (Http.responseBody response) of
        Nothing -> error "getChromiumPageInfo: Parse error"
        Just ci -> return ci
  where
    request = Http.def
        { Http.host = "localhost"
        , Http.port = port
        , Http.path = "/json"
        }
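The withManager and def functions used above belong to the http-conduit API of the time. As an alternative, here is a hedged sketch of the same request written against the Network.HTTP.Simple module that newer http-conduit releases provide. It is a standalone program, not part of this post's literate file; pageInfoUrl, decodePages, fetchPageInfo and PageInfo are illustrative names, and eitherDecode is swapped in for decode so that a parse failure carries a message.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad        (mzero)
import           Data.Aeson           (FromJSON (..), (.:))
import qualified Data.Aeson           as A
import qualified Data.ByteString.Lazy as BL
import qualified Network.HTTP.Simple  as Http

-- Illustrative stand-in for ChromiumPageInfo.
newtype PageInfo = PageInfo { debuggerUrl :: String }
    deriving (Eq, Show)

instance FromJSON PageInfo where
    parseJSON (A.Object obj) = PageInfo <$> obj .: "webSocketDebuggerUrl"
    parseJSON _              = mzero

-- URL of the page listing for a given remote debugging port.
pageInfoUrl :: Int -> String
pageInfoUrl port = "http://localhost:" ++ show port ++ "/json"

-- Pure decoding step; Left carries aeson's error message.
decodePages :: BL.ByteString -> Either String [PageInfo]
decodePages = A.eitherDecode

-- Fetch and decode the listing; assumes Chromium is listening on the port.
fetchPageInfo :: Int -> IO (Either String [PageInfo])
fetchPageInfo port = do
    request  <- Http.parseRequest (pageInfoUrl port)
    response <- Http.httpLBS request
    return (decodePages (Http.getResponseBody response))

main :: IO ()
main = fetchPageInfo 9160 >>= either putStrLn (mapM_ print)
```

Returning Either rather than calling error leaves the decision about a malformed response to the caller; connection failures still surface as exceptions from httpLBS.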

One remaining issue is that the JSON contains the WebSockets URL as a single string, and the WebSockets library expects a (host, port, path) triple. Luckily for us, the standard network library has a Network.URI module which makes this task pretty simple:

parseUri :: String -> (String, Int, String)
parseUri uri = fromMaybe (error "parseUri: Invalid URI") $ do
    u    <- Uri.parseURI uri
    auth <- Uri.uriAuthority u
    let port = case Uri.uriPort auth of
            (':' : str) -> read str
            _           -> 80
    return (Uri.uriRegName auth, port, Uri.uriPath u)
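parseUri calls error on a malformed URI and uses read on the port, which is fine for a blogpost but will crash on unexpected input. Where that is undesirable, a total variant might look like the following standalone sketch. parseUriMaybe is a hypothetical name, and the default port of 80 is carried over from the code above.

```haskell
import qualified Network.URI as Uri
import           Text.Read   (readMaybe)

-- | Split a URI into (host, port, path), or Nothing if any part is unusable.
parseUriMaybe :: String -> Maybe (String, Int, String)
parseUriMaybe uri = do
    u    <- Uri.parseURI uri
    auth <- Uri.uriAuthority u
    port <- case Uri.uriPort auth of
        ""          -> Just 80         -- no port given: fall back to 80
        (':' : str) -> readMaybe str   -- Nothing if the port is not a number
        _           -> Nothing
    return (Uri.uriRegName auth, port, Uri.uriPath u)

main :: IO ()
main = do
    print (parseUriMaybe "ws://localhost:9160/devtools/page/EXAMPLE")
    print (parseUriMaybe "not a uri")
```

The first call produces the triple for localhost, port 9160 and the /devtools/page/EXAMPLE path; the second produces Nothing.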

Once we are connected to Chromium, we will be sending commands to it. A simple Haskell datatype can be used to model these commands:

data Command = Command
    { commandId     :: Int
    , commandMethod :: String
    , commandParams :: [(String, String)]
    } deriving (Show)

We use the aeson library again here, to convert these commands into JSON data:

instance ToJSON Command where
    toJSON cmd = A.object
        [ "id"     .= commandId cmd
        , "method" .= commandMethod cmd
        , "params" .= M.fromList (commandParams cmd)
        ]
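To check what actually goes over the wire, the instance can be exercised on its own. The following is a standalone sketch that repeats the Command type and instance from above; navigate is an illustrative name for the example command.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Aeson                 (ToJSON (..), (.=))
import qualified Data.Aeson                 as A
import qualified Data.ByteString.Lazy.Char8 as BL
import qualified Data.Map                   as M

data Command = Command
    { commandId     :: Int
    , commandMethod :: String
    , commandParams :: [(String, String)]
    } deriving (Show)

instance ToJSON Command where
    toJSON cmd = A.object
        [ "id"     .= commandId cmd
        , "method" .= commandMethod cmd
        , "params" .= M.fromList (commandParams cmd)
        ]

-- Example command: ask the page to navigate to a URL.
navigate :: Command
navigate = Command
    { commandId     = 1
    , commandMethod = "Page.navigate"
    , commandParams = [("url", "http://haskell.org")]
    }

main :: IO ()
main = do
    -- Print the encoded JSON object.
    BL.putStrLn (A.encode navigate)
    -- Decode it again and compare with the Value we started from.
    print (A.decode (A.encode navigate) == Just (toJSON navigate))
```

The printed object has the three keys id, method and params, with params itself an object mapping url to the address. The order in which the keys appear depends on the aeson version, so it is best not to rely on it.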

What is left is a simple main function to tie it all together.

main :: IO ()
main = do
    (ci : _) <- getChromiumPageInfo 9160
    let (host, port, path) = parseUri (chromiumDebuggerUrl ci)
    WS.runClient host port path $ \conn -> do
        -- Send an example command
        WS.sendTextData conn $ A.encode $ Command
            { commandId     = 1
            , commandMethod = "Page.navigate"
            , commandParams = [("url", "http://haskell.org")]
            }

        -- Print output to the screen
        forever $ do
            msg <- WS.receiveData conn
            liftIO $ T.putStrLn msg

Conclusion

This is a very simple example of what you can do with Haskell and Chromium, but I think there are some pretty interesting opportunities to be found here. For example, I wonder if it would be possible to create a simple Selenium-like framework for web application testing in Haskell.

Thanks to Gilles J. for a quick proofread and Ilya Grigorik for this inspiring blogpost!
