Hacking Hakyll

A bit on Hakyll internals for interested hackers
Published on March 30, 2011 under the tag haskell

What is this

I’ve recently released Hakyll 3, and it seems to have reached a certain form of stability now. The documentation is getting better, especially after Benedict Eastaugh ported the examples from Hakyll 2.x to 3.

However, it was recently brought to my attention that hacking on Hakyll is quite difficult – the source code is relatively clean and well commented, but there is no global overview available anywhere. This is what I hope to fix with this blogpost: I will attempt to give a whirlwind tour of the Hakyll internals, and a high-level overview of how it all works together.

Core/Web

The Hakyll module namespace is divided into two large groups: Hakyll.Core and Hakyll.Web.

Hakyll is essentially a compilation system (from this perspective, it looks a little like the Makefiles we all love and hate). On the other hand, it is mostly aimed at creating static websites. So, the distinction is pretty simple:

An important constraint I imposed on myself here is that a module located in Hakyll.Core can never depend on a module in Hakyll.Web.

This is what Hakyll.Core looks like from a high-level point of view (the arrows represent “using” relations):

Hakyll.Core

Hakyll.Core.Compiler

Apart from having the coolest name, this module is probably also the central module in Hakyll (for future reference, when I say Module, I usually mean Module and Module.*).

It exposes the Compiler a b arrow, which is, from a high-level point of view, composed of two things:

Hakyll.Core.Run

This module can be called “the runtime system” of Hakyll. It is the module which actually runs a Compiler. Running happens in two phases:

Other modules in Hakyll.Core

Those are not the only modules in Hakyll.Core. A quick listing of some other interesting modules:

Hakyll.Web

The Hakyll.Web modules are more loosely coupled: each of them provides a specific feature that helps the user create static websites. For example, the Hakyll.Web.CompressCss module provides CSS compression.

The Hakyll.Web.Page and Hakyll.Web.Template modules are more tightly integrated and a little more tricky (more in the next section).

I think most hacking opportunities lie in Hakyll.Web: there’s probably a whole range of filter-like compilers I haven’t thought of yet.

The life of a page

I want to finish this blogpost by shedding some more light on the process of rendering a page (it’s probably the most commonly used feature of Hakyll).

Usually, a page is compiled using pageCompiler. This is nothing more than a “sane default”, with a pretty simple definition:

pageCompiler :: Compiler Resource (Page String)
pageCompiler =   readPageCompiler
             >>> addDefaultFields
             >>> arr applySelf
             >>> pageRenderPandoc

The first step is readPageCompiler – this is an arrow defined as:

readPageCompiler :: Compiler Resource (Page String)
readPageCompiler =   getResourceString
                 >>> arr readPage

This makes sense – getResourceString simply gets the resource contents (i.e. the file contents) as a String, and readPage is a pure function which parses a String into a Page. If you want to apply a certain text transformation to the entire file, you need to replace readPageCompiler with a custom arrow (which will probably look like getResourceString >>> custom >>> arr readPage).
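To make the shape of such a custom arrow a bit more concrete, here is a minimal sketch that models the pipeline with plain functions instead of Hakyll's Compiler arrow. Everything in it is an assumption made for illustration: the Page type is a toy stand-in, readPage is a trivial parser that ignores metadata, and expandTabs and readPageWith are hypothetical names that do not exist in Hakyll.

```haskell
import Control.Arrow (arr, (>>>))

-- Toy stand-in for Hakyll's Page type: metadata fields plus a body.
data Page = Page
  { pageFields :: [(String, String)]
  , pageBody   :: String
  } deriving (Show, Eq)

-- Hypothetical, simplified parser: no metadata handling at all,
-- the whole input simply becomes the page body.
readPage :: String -> Page
readPage = Page []

-- Hypothetical custom step: replace every tab with four spaces.
expandTabs :: String -> String
expandTabs = concatMap (\c -> if c == '\t' then "    " else [c])

-- Same shape as "custom >>> arr readPage" from the text, but using
-- the function arrow (->) rather than Hakyll's Compiler arrow.
readPageWith :: (String -> String) -> String -> Page
readPageWith custom = custom >>> arr readPage
```

In this model, readPageWith expandTabs plays the role of the replacement for readPageCompiler. In real Hakyll code the String would come from getResourceString rather than from a function argument.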

The second step is addDefaultFields. Once the Page has been parsed, it contains all the metadata fields specified in the file itself. But there’s other metadata we want available as well: the URL of the page ($$url$$), the source path ($$path$$), … all these fields are added here.

We’re going to fill up beautiful templates with these fields later, but we also want to be able to use, say, $$url$$ in the page itself. In order to accomplish this, we use the applySelf function, which applies a page as a template to itself.
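The idea of applying a page to itself can be sketched as follows. This is a toy re-implementation, not Hakyll's actual code: the Page type, substitute, breakOnDollars and applySelfSketch are all hypothetical names, and the sketch assumes that a key is simply whatever sits between two $$ markers.

```haskell
import Data.Maybe (fromMaybe)

-- Toy stand-in for Hakyll's Page type: metadata fields plus a body.
data Page = Page
  { pageFields :: [(String, String)]
  , pageBody   :: String
  } deriving (Show, Eq)

-- Replace every $$key$$ in the input with the value stored for that key.
-- Unknown keys and unterminated markers are left untouched.
substitute :: [(String, String)] -> String -> String
substitute fields = go
  where
    go ('$' : '$' : rest) =
      case breakOnDollars rest of
        Just (key, after) ->
          fromMaybe ("$$" ++ key ++ "$$") (lookup key fields) ++ go after
        Nothing -> "$$" ++ rest
    go (c : cs) = c : go cs
    go []       = []

-- Split "key$$after" into (key, after) if a closing $$ exists.
breakOnDollars :: String -> Maybe (String, String)
breakOnDollars = step []
  where
    step acc ('$' : '$' : after) = Just (reverse acc, after)
    step acc (c : cs)            = step (c : acc) cs
    step _   []                  = Nothing

-- Use the page's own fields to fill in the placeholders in its body.
applySelfSketch :: Page -> Page
applySelfSketch page =
  page { pageBody = substitute (pageFields page) (pageBody page) }
```

Because the sketch is a pure function from Page to Page, it would be lifted into a pipeline with arr, which matches the arr applySelf step in pageCompiler.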

After all this is done, we use pageRenderPandoc to render the page to HTML. pageRenderPandoc, much like pageCompiler, is a sane default; it could be defined as:

pageRenderPandoc :: Compiler (Page String) (Page String)
pageRenderPandoc =   pageReadPandoc
                 >>> pageWritePandoc

The actual definition is a little different, but certainly not harder. Again, the point is that it’s a simple pipeline of some other arrows. If you want to perform custom transformations on the pandoc document (this is pretty awesome, since you can edit documents easily using a proper language and not just regexes), it goes here: pageReadPandoc >>> custom >>> pageWritePandoc. For more information on defining these kinds of pipelines, you should have a look at what Hakyll.Web.Pandoc provides.
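As a rough illustration of why transforming a parsed document beats regexes, here is a sketch of the read >>> custom >>> write shape using a toy document type. It deliberately does not use Pandoc: Block, readDoc, writeDoc, demoteHeaders and renderWith are hypothetical, the reader only understands lines starting with "# ", and the writer does no HTML escaping.

```haskell
import Control.Arrow ((>>>))

-- Toy document tree, far simpler than Pandoc's real document type.
data Block
  = Header Int String
  | Para String
  deriving (Show, Eq)

-- Hypothetical reader: a line starting with "# " becomes a level-1
-- header, every other line becomes a paragraph.
readDoc :: String -> [Block]
readDoc = map toBlock . lines
  where
    toBlock ('#' : ' ' : title) = Header 1 title
    toBlock text                = Para text

-- Hypothetical writer: renders the blocks as (unescaped) HTML.
writeDoc :: [Block] -> String
writeDoc = concatMap render
  where
    render (Header n t) = "<h" ++ show n ++ ">" ++ t ++ "</h" ++ show n ++ ">\n"
    render (Para t)     = "<p>" ++ t ++ "</p>\n"

-- Custom step: shift every header down one level, leave the rest alone.
demoteHeaders :: [Block] -> [Block]
demoteHeaders = map demote
  where
    demote (Header n t) = Header (n + 1) t
    demote block        = block

-- Same shape as pageReadPandoc >>> custom >>> pageWritePandoc.
renderWith :: ([Block] -> [Block]) -> String -> String
renderWith custom = readDoc >>> custom >>> writeDoc
```

The custom step works on structured values, so it can pattern match on headers directly instead of guessing at them in raw text. With the real library the custom step would operate on Pandoc's own document type instead.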

I hope this gives you some idea of how to start hacking if you want to extend Hakyll with a certain feature. But then again, do not hesitate to poke me if you’re not sure: I’d be glad to help you get started.
