Template Haskell (TH) is a powerful tool for specializing programs and allows shifting some work from runtime to compile time. It can be a bit intimidating to use for beginners. So I thought I would write up how to use TH to turn certain kind runtime computations into compile time computations.
In particular we will turn the initialization of a fully static data
structure into a compile time operation.
This pattern works for many data structures but we will look
at IntSet
in particular.
A working example
As an example, consider a function such as:
isStaticId :: Int -> Bool
=
isStaticId x `elem` staticIds
x where
= [1,2,3,5,7 :: Int] staticIds
We have a set of known things here, represented by a list named staticIds
.
We use Int
as it makes the example easier. But these could be Strings or all kinds of things.
In particular, I was inspired by GHC’s list of known builtin functions.
Upsides of the list representation
The advantage of the code as written above is that the list is statically known. As a result the list will be built into the final object code as static data, and accessing it will not require any allocation/computation.
You can check this by looking at the core dump (-ddump-simpl
).
Don’t forget to enable optimizations or this might not work as expected.
In the core there should be a number of definitions like the one below.
-- RHS size: {terms: 3, types: 1, coercions: 0, joins: 0/0}
isStaticId3= : isStaticId8 isStaticId4 isStaticId3
Note that the above is Core syntax for isStaticId3 = isStaticId8 : isStaticId4
,
i.e., this is just denoting a part of the list, and each element gets its own
identifier.
All these definitions will be compiled to static data, and
will eventually be represented as just a number of words encoding the
constructor and its fields.
We can confirm this by looking at the Cmm output where the corresponding fragment will look like this:
[section ""data" . isStaticId3_closure" {
isStaticId3_closure:
const :_con_info;
const isStaticId8_closure+1;
const isStaticId4_closure+2;
const 3;
}]
I won’t go into the details of how to read the Cmm
, but it shows us that the binding will end up in the data section.
The constant :_con_info;
tells us that we are dealing with a Cons cell,
and then we have the actual data stored in the cell.
What is important here is that this is static data. The GC won’t have to traverse it so having the data around does not affect GC performance. We also don’t need to compute it at runtime as it’s present in the object file in its fully evaluated form.
Switching to IntSet
What if we aggregate more data? If we blow up the list to a hundred, a thousand or more elements, it’s likely that performing a linear search will become a bottleneck for performance.
So we rewrite our function to use a set as follows:
isStaticIdSet :: Int -> Bool
=
isStaticIdSet x `S.member` staticIds
x where
= S.fromList [1,2,3,5,7 :: Int] :: IntSet staticIds
This looks perfectly fine on the surface.
Instead of having O(n)
lookups we should get O(log(n))
lookups, right?
Pitfalls of runtime initialization
However, what happens at runtime? In order to query the set we have to first convert
the list into a set.
This is where disaster strikes. We are no longer querying static data, as the list argument
has to be converted into a set. The S.fromList
function call will not be evaluated
at compilation time.
In many cases, GHC may manage to at least share our created set staticIds
across calls.
But this is fragile, and depending on the exact code in question,
it might not. Then we can end up paying the cost of set construction for each call to isStaticIdSet
.
So while we reduced the lookup cost from O(n)
to O(log(n))
the total cost is now
O(n*min(n,W)+ log(n))
, where n*min(n,W)
is the cost of constructing the set
from a list.
We could optimize this somewhat by making sure the list is sorted and has no duplicates. But we would
still end up worse than with the list-based code we started out with.
It’s a shame that GHC can’t evaluate S.fromList
at compile time … or can it?
Template Haskell (TH) to the rescue
What we really want to do is to force GHC to fully evaluate our input data to an IntSet
.
Then ensure the IntSet
is stored as static data just like it happens for the list in our initial example.
How can TH help?
Template Haskell allows us to specify parts of the program to compute at compile time.
So we “simply” tell GHC to compute the set at compile time and are done.
Like so:
{-# NOINLINE isStaticIdSet #-}
isStaticIdSet :: Int -> Bool
=
isStaticIdSet x `S.member` staticIds
x where
= $$( liftTyped (S.fromList [1,2,3,5,7] :: IntSet)) staticIds
This results in Core that is even simpler as in the [Int]
example above:
-- RHS size: {terms: 3, types: 0, coercions: 0, joins: 0/0}
isStaticIdSet1= Tip 0# 174##
isStaticIdSet1
-- RHS size: {terms: 7, types: 3, coercions: 0, joins: 0/0}
isStaticIdSet
isStaticIdSet= \ x_a5ar ->
case x_a5ar of { I# ww1_i5r2 -> $wmember ww1_i5r2 isStaticIdSet1 }
No longer will we allocate the set at compilation time; instead the whole set is encoded in isStaticIdSet1
.
We only get a single constructor because IntSet can encode small sets using a single constructor.
How it works
From the outside in:
$$( .. )
is TH syntax for a typed splice.1 Splicing is the process of
inserting generated syntax into our program. The splice construct takes an expression
denoting a syntax tree, evaluates it and inserts the resulting syntax at the place where
the splice occurs.
The next piece of magic is liftTyped
. It takes a regular Haskell expression,
evaluates it at compile time to an abstract syntax tree that, when spliced, equals
the evaluated value of the Haskell expression.
This leaves S.fromList [1,2,3,5,7]
which is regular set creation.
Putting these together, during compilation GHC will:
- Evaluate
S.fromList [1,2,3,5,7]
. - Turn the resulting set into an abstract syntax tree using
liftTyped
. - Splice that abstract syntax tree into our program using
$$(..)
, effectively inserting the fully evaluated set expression into our program.
The resulting code will be compiled like any other, in this case resulting in fully static data.
Full example
Now you might think this was too easy, and you are partially right.
The main issue is that liftTyped
requires a instance of the Lift
typeclass.
But for the case of IntSet
, we can have GHC derive one for us.
So all it costs us is slightly more boiler plate.
Here is a full working example for you to play around with:
-- First module
{-# LANGUAGE TemplateHaskell #-} -- Enable TH
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE DeriveLift #-}
module TH_Lift where
import Language.Haskell.TH.Syntax
import Data.IntSet.Internal
deriving instance Lift (IntSet)
---------------------------------
-- Second module
{-# LANGUAGE TemplateHaskell #-}
module M (isStaticIdSet) where
import TH_Lift
import Data.IntSet as S
import Language.Haskell.TH
import Language.Haskell.TH.Syntax
type Id = Int
isStaticIdSet :: Int -> Bool
=
isStaticIdSet x `S.member` staticSet
x where
= $$(liftTyped (S.fromList [1,2,3,5,7] :: IntSet)) staticSet
Why do we require two modules?
We translate liftTyped (S.fromList [1,2,3,5,7] :: IntSet)
into a TH expression at
compile time. For this, GHC will call the (already compiled) lift method of the Lift
instance.
However if we define isStaticIdSet
and the Lift
instance in the same module,
GHC can’t call liftTyped
as it’s not yet compiled by the time we need it.
In practice most packages have companions which already offer Lift
instances.
For example, th-lift-instances
offers instances for the containers
package.2
Disclaimer: This won’t work for all data types!
For many data types the result of liftTyped
will be an expression that
can be compiled to static data as long as the contents are known.
This is in particular true for “simple” ADTs like the ones used by IntSet
or Set
.
However certain primitives like arrays can’t be allocated at compile time.
This sadly means this trick won’t currently work for Array
s or Vector
s.
There is a ticket about removing this restriction on arrays on GHC’s issue
tracker..
So hopefully, we will be able to lift arrays at some point in the future.
Note furthermore that lifting won’t work for infinite data structures, as it usually requires its argument to be evaluated completely if we want it to result in static data.
We are using Typed Template Haskell here in order to advertise it a bit better. Typed Template Haskell ensures that we are building type-correct code. In an example, as simple as this, it hardly makes a difference, because even normal Template Haskell does type-check the generated code. We could equally well have written
↩︎= $(lift (S.fromList [1,2,3,5,7] :: IntSet)) staticSet
Unfortunately, some of the instances defined in
th-lift-instances
up to version 0.1.16 are “wrong” for the purposes of this post. For example, theIntSet
instance is based on a call tofromList
, not statically building the internal representation. Make sure that you useth-lift-instances
version 0.1.17 or later.↩︎