Hacking Hakyll
Published on March 30, 2011 under the tag haskell
What is this
I’ve recently released Hakyll 3, and it seems to have reached a certain form of stability now. The documentation is getting better, especially after Benedict Eastaugh ported the examples from Hakyll 2.x to 3.
However, recently it was brought to my attention that hacking on Hakyll is quite difficult – all source code is relatively clean and well-commented but there is no global overview available anywhere. This is what I hope to fix with this blogpost: I will attempt to give a whirlwind tour of the Hakyll internals, and a high-level overview of how it all works together.
Core/Web
The Hakyll module namespace is divided into two large groups: Hakyll.Core and
Hakyll.Web.
Hakyll is essentially a compilation system (from this perspective, it looks a little like the Makefiles we all love and hate). On the other hand, it is mostly aimed at creating static websites. So, the distinction is pretty simple:
Hakyll.Core: everything related to the compilation system/dependency; managementHakyll.Web: everything related to web sites.
An important constraint of this I imposed onto myself is that a module located
in Hakyll.Core can absolutely never depend on a module in Hakyll.Web.
This is what Hakyll.Core looks like from a high-level point of view (the
arrows represent “using” relations):
Hakyll.Core.Compiler
Apart from having the coolest name, this module is probably also the central
module in Hakyll (for future reference, when I say Module, I usually mean
Module and Module.*).
It exposes the Compiler a b arrow, which is, from a high-level point of view
composed out of two things:
- A (Kleisli) arrow from
atob: this what will actually produce whatever you want. This could, for example, produce a website page from a markdown file. - A function producing a dependency set, which lists all dependencies used in the arrow mentioned above.
Hakyll.Core.Run
This module can be called “the runtime system” of Hakyll. It is the module which
actually runs a Compiler. Running happens in two phases:
- We run the dependency functions for all compilers. From this, we can infer a dependency graph.
- Using this information, we now run the necessary compiler arrows (i.e. the items which are out-of-date).
Other modules in Hakyll.Core
Those are not the only modules in Hakyll.Core. A quick listing of some other
interesting modules:
Hakyll.Core.Identifierexports anIdentifiertype which is used to globally identify different compilers;Hakyll.Core.DirectedGraphimplements a data structure used for dependency analysis. However, in the future, this might move toHakyll.Core.DependencyAnalyzeror even a separate package;Hakyll.Core.ResourceProviderprovides an interface for basically reading files from the disk. But it is written as an abstract data type so one could add, for example, a backend which allows Hakyll to fetch data from a SQL database;Hakyll.Core.Rulesexports a monad for the rules DSL. This DSL is quite simple, it’s basically aWritermonad which yields a list of compilers;Hakyll.Core.Routescontains the types and functions used for routing. This is a very simple module for now, altough extensions have been suggested;
Hakyll.Core.Configurationexports the globalHakyllConfiguration. This is a relatively small configuration data type, which I think is a good thing;Hakyll.Core.Storeimplements a simple key/value store which allows you to save types instantiatingBinaryin the_cachedirectory.
Hakyll.Web
The Hakyll.Web modules are more loosely coupeled, they all provide some
specific feature which helps the user in creating static websites. For example,
the Hakyll.Web.CompressCss module provides CSS compression.
The Hakyll.Web.Page and Hakyll.Web.Template modules are more tightly
integrated and a little more tricky (more in the next section).
I think most hacking opportunities lay in Hakyll.Web: there’s probably a whole
range of filter-like compilers I haven’t thought of yet.
The life of page
I want to finish this blogpost by shedding some more light on the process of rendering a page (it’s probably the most commonly used feature of Hakyll).
Usually, a page is compiled using pageCompiler. This is nothing more but a
“sane default”, with a pretty simple definition:
pageCompiler :: Compiler Resource (Page String)
pageCompiler = readPageCompiler
>>> addDefaultFields
>>> arr applySelf
>>> pageRenderPandocThe first step is readPageCompiler – this is an arrow defined as:
readPageCompiler :: Compiler Resource (Page String)
readPageCompiler = getResourceString
>>> arr readPageThis makes sense – getResourceString simply gets the resource contents (i.e.
the file contents) as a String, and readPage is a pure function which parses
a String into a Page. If you want to have a certain text transformation on
the entire file, you need to replace readPageCompiler by a custom arrow (which
will probably look like getResourceString >>> custom >>> arr readPage).
The second step is addDefaultFields. After parsing the Page, it knows all
metadata fields specified in the actual file. But there’s other metadata we want
available as well: the URL of the page ($$url$$), the source path
($$path$$), … all these fields are added here.
We’re going to fill up beatiful templates with these fields later, but we also
want to be able to use, say, $$url$$ in the page itself. In order to
accomplish this, we use the applySelf function, which applies a page as a
template to itself.
After all this is done, we use pageRenderPandoc to render the page to HTML.
pageRenderPandoc, much like pageCompiler is a sane default, it could be
defined as:
pageRenderPandoc :: Compiler (Page String) (Page String)
pageRenderPandoc = pageReadPandoc
>>> pageWritePandocThe actual definition is a little different, but certainly not harder. Again,
the point is that it’s a simple pipeline of some other arrows. If you want to
perform custom transformations on the pandoc document (this is pretty awesome,
since you can edit documents easily using a proper language and not just
regexes), it goes here: pageReadPandoc >>> custom >>> pageWritePandoc. For
more information on defining these kind of pipelines, you should have a look at
what Hakyll.Web.Pandoc provides.
I hope this gives some sort of idea on how to start hacking if you want to extend Hakyll with a certain feature. But then again, do not hesitate to poke me if you’re not sure, I’d be glad to help you get started.