Lazy Layout
Published on July 22, 2023 under the tag haskell
Prelude
This blogpost is written in reproducible Literate Haskell, so we need some imports first.
Show me the exact imports…
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveTraversable #-}
module Main where
import qualified Codec.Picture as JP
import qualified Codec.Picture.Types as JP
import Control.Monad.ST (runST)
import Data.Bool (bool)
import Data.Foldable (for_)
import Data.List (isSuffixOf, partition)
import Data.List.NonEmpty (NonEmpty (..))
import System.Environment (getArgs)
import System.Random (RandomGen, newStdGen)
import System.Random.Stateful (randomM, runStateGen)
import Text.Read (readMaybe)
Introduction
Haskell is not my only interest — I have also been quite into photography for the past decade. Recently, I was considering moving some of the stuff I have on various social networks to a self-hosted solution.
Tumblr in particular has a fairly nice way to do photo sets, where these can be organised in rows and columns. I wanted to see if I could mimic this in a recursive way, where rows and columns can be subdivided further.
One important constraint is that is that we want to present each picture as the photographer envisioned it: concretely, we can scale it up or down (preserving the aspect ratio), but we can’t crop out parts.
Order is also important in photo essays, so we want the author to specify the photo collage in a declarative way by indicating if horizontal (H) or vertical (V) subdivision should be used, creating a tree. For example:
H img1.jpg
(V img2.jpg
(H img3.jpg
img4.jpg))
The program should then determine the exact size and position of each image, so that we get a fully filled rectangle without any borders or filler:
We will use a technique called circular programming that builds on Haskell’s
laziness to achieve this in an elegant way.
These days, it is maybe more commonly referred to as the repmin
problem.
This was first described by Richard S. Bird in “Using circular programs to
eliminate multiple traversals of data” in 1984, which predates Haskell!
Give me a refresher on repmin
please…
Interlude: repmin
Given a simple tree type:
data Tree a
= Leaf a
| Branch (Tree a) (Tree a)
We would like to write a function repmin
which replaces each value in each
Leaf
with the global minimum in the tree. This is easily done by first
finding the global minimum, and then replacing it everywhere:
repmin_2pass :: Ord a => Tree a -> Tree a
=
repmin_2pass t let globalmin = findmin t in rep globalmin t
where
Leaf x) = x
findmin (Branch l r) = min (findmin l) (findmin r)
findmin (
Leaf _) = Leaf x
rep x (Branch l r) = Branch (rep x l) (rep x r) rep x (
However, this requires two passes over the tree. We can do better by using Haskell’s laziness:
repmin_1pass :: Ord a => Tree a -> Tree a
= t'
repmin_1pass t where
= repmin t
(t', globalmin)
Leaf x) = (Leaf globalmin, x)
repmin (Branch l r) =
repmin (Branch l' r', min lmin rmin)
(where
= repmin l
(l', lmin) = repmin r (r', rmin)
There is an apparent circular dependency here, where repmin
uses globalmin
,
but also computes it. This is possible because we never need to evaluate
globalmin
– it can be stored as a thunk.
For more details, please see the very accessible original paper
(https://doi.org/10.1007/BF00264249
).
Starting out with some types
We start out by giving an elegant algebraic definition for a collage:
data Collage a
= Singleton a
| Horizontal (Collage a) (Collage a)
| Vertical (Collage a) (Collage a)
deriving (Foldable, Functor, Show, Traversable)
We use a higher-order type, which allows us to work with collages of filepaths
as well as actual images (among other things). deriving
instructs the compiler
to generate some boilerplate code for us. This allows us to concisely read all
images using traverse
:
readCollage :: Collage FilePath
-> IO (Collage (JP.Image JP.PixelRGB8))
= traverse $ \path ->
readCollage >>=
JP.readImage path either fail (pure . JP.convertRGB8)
We use the JuicyPixels library to read and write images. The image type in this library can be a bit verbose since it is parameterised around the colour space.
During the layout pass, we don’t really care about this complexity. We only need the relative sizes of the images and not their content. We introduce a typeclass to do just that:
data Size = Sz
szWidth :: Rational
{ szHeight :: Rational
,deriving (Show)
}
class Sized a where
-- | Retrieve the width and height of an image.
-- Both numbers must be strictly positive.
sizeOf :: a -> Size
We use the Rational
type for width and height.
We are only subdividing the 2D space, so we do not need irrational numbers,
and having infinite precision is convenient.
The instance for the JuicyPixels image type is simple:
instance Sized (JP.Image p) where
= Sz
sizeOf img = fromIntegral $ JP.imageWidth img
{ szWidth = fromIntegral $ JP.imageHeight img
, szHeight }
Laying out two images
If we look at the finished image, it may seem like a hard problem to find a configuration that fits all the images with a correct aspect ratio.
But we can use induction to arrive at a fairly straightforward solution. Given two images, it is always possible to put them beside or above each other by scaling them up or down to match them in height or width respectively. This creates a bigger image. We can then repeat this process until just one image is left.
However, this is quite a naive approach since we end up making way too many copies, and the repeated resizing could also result in a loss of resolution. We would like to compute the entire layout first, and then render everything in one go. Still, we can start by formalising what happens for two images and then work our way up.
We can represent the layout of an individual image by its position and size. We use simple (x, y) coordinates for the position and a scaling factor (relative to the original size of the image) for its size.
data Transform = Tr
trX :: Rational
{ trY :: Rational
, trScale :: Rational
,deriving (Show) }
Armed with the Size
and Transform
types, we have enough to tackle the
“mathy” bits.
Let’s look at the horizontal case first and write a function that computes a transform for both left and right images, as well as the size of the result.
horizontal :: Size -> Size -> (Transform, Transform, Size)
Sz lw lh) (Sz rw rh) = horizontal (
We want to place image l
beside image r
, producing a nicely filled
rectangle. Intuitively, we should be matching the height of both images.
There are different ways to do this — we could shrink the taller image, enlarge the shorter image, or something in between. We make a choice to always shrink the taller image, as this doesn’t compromise the sharpness of the result.
let height = min lh rh
= height / lh
lscale = height / rh
rscale = lscale * lw + rscale * rw in width
With the scale for both left and right images, we can compute the left and right transforms. The left image is simply placed at (0, 0) and we need to offset the right image depending on the (scaled) size of the left image.
Tr 0 0 lscale
( Tr (lscale * lw) 0 rscale
, Sz width height
, )
Composing images vertically is similar, just matching the widths rather than the heights of the two images and moving the bottom image below the top one:
vertical :: Size -> Size -> (Transform, Transform, Size)
Sz tw th) (Sz bw bh) =
vertical (let width = min tw bw
= width / tw
tscale = width / bw
bscale = tscale * th + bscale * bh in
height Tr 0 0 tscale
( Tr 0 (tscale * th) bscale
, Sz width height
, )
Composing transformations
Now that we’ve solved the problem of combining two images and placing them, we can apply this to our tree of images. To this end, we need to compose multiple transformations.
Whenever we think about composing things in Haskell, it’s good to ask ourselves
if what we’re trying to compose is a Monoid. A Monoid needs an identity
element (mempty
) and a Semigroup instance, the latter of which contains just
an associative binary operator (<>
).
The identity transform is just offsetting by 0 and scaling by 1:
instance Monoid Transform where
mempty = Tr 0 0 1
Composing two transformations using <>
requires a bit more thinking.
In this case, a <> b
means applying transformation a
after transformation
b
, so we will need to apply the scale of b
to all parts of a
:
instance Semigroup Transform where
Tr ax ay as <> Tr bx by bs =
Tr (ax * bs + bx) (ay * bs + by) (as * bs)
Readers who are familiar with linear algebra may recognise the connection to a sort of restricted affine 2D transformation matrix.
Proving that the identity holds on mempty
is simple so we will only do one
side, namely a <> mempty == a
.
Proof of Monoid right identity…
Tr ax ay as <> mempty
-- Definition of mempy
= Tr ax ay as <> Tr 0 0 1
-- Definition of <>
= Tr (ax * 1 + 0) (ay * 1 + 0) (as * 1)
-- Cancellative property of 0 over +
-- Identity of 1 over *
= Tr ax ay as
Next, we want to prove that the <>
operator is associative, meaning
a <> (b <> c) == (a <> b) <> c
.
Proof of associativity…
Tr ax ay as <> (Tr bx by bs <> Tr cx cy cs)
-- Definition of <>
= Tr (ax * (bs * cs) + (bx * cs + cx))
* (bs * cs) + (by * cs + cy))
(ay * (bs * cs))
(as
-- Associativity of * and +
= Tr (ax * bs * cs + bx * cs + cx)
* bs * cs + by * cs + cy)
(ay * bs) * cs)
((as
-- Distributivity of * over +
= Tr ((ax * bs + bx) * cs + cx)
* bs + by) * cs + cy)
((ay * bs) * cs)
((as
-- Definition of <>
= (Tr ax ay as <> T b by bs) <> Tr cx cy cs
Now that we have a valid Monoid instance, we can use the higher-level <>
and mempty
concepts in our core layout algorithm, rather than worrying over
details like (x, y) coordinates and scaling factors.
The lazy layout
Our main layoutCollage
function takes the user-specified tree as input,
and annotates each element with a Transform
.
In addition to that, we also produce the Size
of the final image so we can
allocate space for it.
layoutCollage :: Sized img
=> Collage img
-> (Collage (img, Transform), Size)
All layoutCollage
does is call layout
— our circular program — with
the identity transformation:
= layout mempty layoutCollage
layout
takes the size and position of the current element as an argument,
and determines the sizes and positions of a tree recursively.
There are some similarities with the algorithms present in browser engines, where a parent element will first lay out its children, and then use their properties to determine its own width.
However, we will use Haskell’s laziness to do this in a single top-down pass. We provide a declarative algorithm and we leave the decision about what to calculate when — more concretely, propagating the requested sizes of the children back up the tree before constructing the transformations — to the compiler!
layout :: Sized img
=> Transform
-> Collage img
-> (Collage (img, Transform), Size)
Placing a single image is easy, since we are receiving the transformation directly as an argument. We return the requested size — which is just the original size of the image. This is an important detail in making the laziness work here: if we tried to return the final size (including the passed in transformation) rather than the requested size, the computation would diverge (i.e. recurse infinitely).
Singleton img) =
layout trans (Singleton (img, trans), sizeOf img) (
In the recursive case for horizontal composition, we call the horizontal
helper we defined earlier with the left and right image sizes as arguments.
This gives us both transformations, that we can then pass in as arguments to
layout
again – returning the left and right image sizes we pass in to the
horizontal
helper, forming our apparent circle.
Horizontal l r) =
layout trans (Horizontal l' r', size)
(where
= layout (ltrans <> trans) l
(l', lsize) = layout (rtrans <> trans) r
(r', rsize) = horizontal lsize rsize (ltrans, rtrans, size)
The same happens for the vertical case:
Vertical t b) =
layout trans (Vertical t' b', size)
(where
= layout (ttrans <> trans) t
(t', tsize) = layout (btrans <> trans) b
(b', bsize) = vertical tsize bsize (ttrans, btrans, size)
It’s worth thinking about why this works: the intuitive explanation is that we can “delay” the execution of the transformations until the very end of the computation, and then fill them in everywhere. This works since no other parts of the algorithm depend on the transformation, only on the requested sizes.
Conclusion
We’ve written a circular program! Although I was aware of repmin
for a long
time, it’s not a technique I’ve applied often. To me, it is quite interesting
because, compared to repmin
:
- it is easier to explain to a novice why this is useful;
- it is perhaps easier to understand due to the visual aspect; and
- it is an example outside of the realm of parsers and compilers.
The structure is also somewhat different; rather than having a circular step at the top-level function invocation, we have it at every step of the recursion.
Thanks to Francesco Mazzoli and Titouan Vervack reading a draft of this blogpost and suggesting improvements. And thanks to you for reading!
What follows below are a number of relatively small functions that take care of various tasks, included so this can function as a standalone program:
- Actually rendering the layout back to an image
- Parsing a collage description
- Generating random collages
- Putting together the CLI
- Resizing the result
Appendices
Rendering the result
Once we’ve determined the layout, we still need to apply it and draw all the images using the computed transformations. We use simple nearest-neighbour scaling since that is not the focus of this program, you could consider Lánczos interpolation in a real application.
render :: Foldable f
=> Size
-> f (JP.Image JP.PixelRGB8, Transform)
-> JP.Image JP.PixelRGB8
Sz width height) images = runST $ do
render (<- JP.createMutableImage (round width) (round height) black
canvas $ transform canvas
for_ images
JP.unsafeFreezeImage canvaswhere
= JP.PixelRGB8 0 0 0
black
Tr dstX dstY dstS) =
transform canvas (img, round dstX .. round (dstX + dstW) - 1] $ \outX ->
for_ [round dstY .. round (dstY + dstH) - 1] $ \outY ->
for_ [let inX = min (JP.imageWidth img - 1) $ round $
fromIntegral (outX - round dstX) / dstS
= min (JP.imageHeight img - 1) $ round $
inY fromIntegral (outY - round dstY) / dstS in
$ JP.pixelAt img inX inY
JP.writePixel canvas outX outY where
= fromIntegral (JP.imageWidth img) * dstS
dstW = fromIntegral (JP.imageHeight img) * dstS dstH
Parsing a collage description
We use a simple parser to allow the user to specify collages as a string, for example on the command line. This is a natural fit for polish notation as using parentheses in command line arguments is very awkward.
As an example, we want to parse the following arguments:
H img1.jpg V img2.jpg H img3.jpg img4.jpg
Into this tree:
(Horizontal "img1.jpg"
(Vertical "img2.jpg")
(Horizontal "img3.jpg" "img4.jpg"))
We don’t even need a parser library, we can just treat the arguments as a stack:
parseCollage :: [String] -> Maybe (Collage FilePath)
= do
parseCollage args <- parseTree args
(tree, []) pure tree
where
= Nothing
parseTree [] "H" : stack0) = do
parseTree (<- parseTree stack0
(x, stack1) <- parseTree stack1
(y, stack2) pure (Horizontal x y, stack2)
"V" : stack0) = do
parseTree (<- parseTree stack0
(x, stack1) <- parseTree stack1
(y, stack2) pure (Vertical x y, stack2)
: stack0) = Just (Singleton x, stack0) parseTree (x
Generating random collages
In order to test this program, I also added some functionality to generate random collages.
randomCollage :: RandomGen g => NonEmpty a -> g -> (Collage a, g)
= runStateGen gen $ \g -> go g ne
randomCollage ne gen where
The utility rc
picks a random constructor.
= bool Horizontal Vertical <$> randomM g rc g
In our worker function, we keep one item on the side (x
), and randomly decide
if other items will go in the left or right subtree:
:| xs) = do
go g (x <- partition snd <$>
(lts, rts) traverse (\y -> (,) y <$> randomM g) xs
Then, we look at the random partitioning we just created. If they’re both empty, the only thing we can do is create a singleton collage:
case (map fst lts, map fst rts) of
-> pure $ Singleton x ([], [])
If either of them is empty, we put x
in the other partition to ensure we don’t
create invalid empty trees:
: ls), []) -> rc g <*> go g (l :| ls) <*> go g (x :| [])
((l : rs)) -> rc g <*> go g (x :| []) <*> go g (r :| rs) ([], (r
Otherwise, we decide at random which partition x
goes into:
: ls), (r : rs)) -> do
((l <- randomM g
xLeft if xLeft
then rc g <*> go g (x :| l : ls) <*> go g (r :| rs)
else rc g <*> go g (l :| ls) <*> go g (x :| r : rs)
Putting together the CLI
We support two modes of operation for our little CLI:
- Using a user-specified collage using the parser we wrote before.
- Generating a random collage from a number of images.
In both cases, we also take an output file as the first argument, so we know
where we want to write the image to. We also take an optional -fit
flag so
we can resize the final image down to a requested size.
data Command = Command
cmdOut :: FilePath
{ cmdFit :: Maybe Int
, cmdCollage :: CommandCollage
,
}
data CommandCollage
= User (Collage FilePath)
| Random (NonEmpty FilePath)
deriving (Show)
There is some setup to parse the output and a -fit
flag. The important
bit happens in parseCommandCollage
further down.
parseCommand :: [String] -> Maybe Command
= case cmd of
parseCommand cmd -> Nothing
[] "-fit" : num : args) | Just n <- readMaybe num -> do
(<- parseCommand args
cmd' pure cmd' {cmdFit = Just n}
: args) -> Command o Nothing <$> parseCommandCollage args (o
We’ll use R
for a random collage, and H
/V
will be parsed by
parseCollage
.
where
"R" : x : xs) = Just $ Random (x :| xs)
parseCommandCollage (= User <$> parseCollage spec parseCommandCollage spec
Time to put everything together in the main
function. First, we do some
parsing:
main :: IO ()
= do
main <- getArgs
args <- maybe (fail "invalid command") pure $
command
parseCommand args<- case cmdCollage command of
pathsCollage User explicit -> pure explicit
Random paths -> do
<- newStdGen
gen let (random, _) = randomCollage paths gen
pure random
Followed by actually reading in all the images:
<- readCollage pathsCollage imageCollage
This gives us the Collage (JP.Image JP.PixelRGB8)
. We can pass that to our
layout
function and write it to the output, after optionally applying our
fit
:
let (result, box) = case cmdFit command of
Nothing -> layoutCollage imageCollage
Just f -> fit f $ layoutCollage imageCollage
$ JP.ImageRGB8 $ render box result
write (cmdOut command) where
write output| ".jpg" `isSuffixOf` output = JP.saveJpgImage 80 output
| otherwise = JP.savePngImage output
Resizing the result
Most of the time I don’t want to host full-resolution pictures for web viewing. This is an addition I added later on to resize an image down to a requested “long edge” (i.e. a requested maximum width or height, whichever is bigger).
Interestingly I think this can also be done by having an additional parameter
to layout
, and using circular programming once again to link the initial
transformation to the requested size. However, the core algorithm is harder
to understand that way, so I left it as a separate utility:
fit :: Int
-> (Collage (img, Transform), Size)
-> (Collage (img, Transform), Size)
Sz w h)
fit longEdge (collage, | long <= fromIntegral longEdge = (collage, Sz w h)
| otherwise =
fmap (<> tr) <$> collage, Sz (w * scale) (h * scale))
(where
= max w h
long = fromIntegral longEdge / long
scale = Tr 0 0 scale tr