Suppose you are writing a compiler for some programming language or DSL. If you are doing source to source transformations in your compiler, perhaps as part of an optimization pass, you will need to construct and deconstruct bits of abstract syntax. It would be very convenient if we could write that abstract syntax using the syntax of your language. In this blog post we show how you can reuse your existing compiler infrastructure to make this possible by writing a quasi-quoter with support for metavariables. As we will see, a key insight is that we can reuse object variables as meta variables.
Toy Language “Imp”
For the sake of this blog post we will be working with a toy language called
Imp. The abstract syntax for Imp is defined by
type VarName = String
data Expr =
Var VarName
| Add Expr Expr
| Sub Expr Expr
| Int Integer
| Read
deriving (Data, Typeable, Show, Eq)
data Cmd =
Write Expr
| Assign VarName Expr
| Decl VarName
deriving (Data, Typeable, Show)
data Prog = Prog [Cmd]
deriving (Data, Typeable, Show)and we will assume that we have some parsec parsers
parseExpr :: Parser Expr
parseProg :: Parser ProgWe will also make use of
topLevel :: Parser a -> Parser a
topLevel p = whiteSpace *> p <* eofand the following useful combinator for running a parser:
parseIO :: Parser a -> String -> IO aThe details of these parsers are beyond the scope of this post. There are
plenty of parsec tutorials online; for instance, you could start with the
parsec chapter in Real World Haskell. Moreover, the full code
for this blog post, including a simple interpreter for the language, is
available on github if you want to play with it. Here is a simple
example of an Imp program:
var x ;
x := read ;
write (x + x + 1)
A simple quasi-quoter
We want to be able to write something like
prog1 :: Prog
prog1 = [prog|
var x ;
x := read ;
write (x + x + 1)
|]where the intention is that the [prog| ... |] quasi-quote will expand to
something like
prog1 = Prog [
Decl "x"
, Assign "x" Read
, Write (Add (Add (Var "x") (Var "x")) (Int 1))
]To achieve this, we have to write a quasi-quoter. A quasi-quoter is an instance of the following data type:
data QuasiQuoter = QuasiQuoter {
quoteExp :: String -> Q Exp
, quotePat :: String -> Q Pat
, quoteType :: String -> Q Type
, quoteDec :: String -> Q [Dec]
}The different fields are used when using the quasi-quoter in different places in your Haskell program: at a position where we expect a (Haskell) expression, a pattern (we will see an example of that later), a type or a declaration; we will not consider the latter two at all in this blog post.
In order to make the above example (prog1) work, we need to implement
quoteExp but we can leave the other fields undefined:
prog :: QuasiQuoter
prog = QuasiQuoter {
quoteExp = \str -> do
l <- location'
c <- runIO $ parseIO (setPosition l *> topLevel parseProg) str
dataToExpQ (const Nothing) c
, quotePat = undefined
, quoteType = undefined
, quoteDec = undefined
}Let’s see what’s going on here. The quasi-quoter gets as argument the string in
the quasi-quote brackets, and must return a Haskell expression in the
Template-Haskell Q monad. This monad supports, amongst other things, getting
the current location in the Haskell file. It also supports IO.
Location
The first thing that we do is find the current location in the Haskell source
file and convert it to parsec format:
location' :: Q SourcePos
location' = aux <$> location
where
aux :: Loc -> SourcePos
aux loc = uncurry (newPos (loc_filename loc)) (loc_start loc)Running the parser
Once we have the location we then parse the input string to a term in our
abstract syntax (something of type Prog). We use parsec’s setPosition to
tell parsec where we are in the Haskell source file, so that if we make a
mistake such as
prog1 :: Prog
prog1 = [prog|
var x ;
x := read ;
write (x + x + )
|]we get an error that points to the correct location in our Haskell file:
TestQQAST.hs:6:9:
Exception when trying to run compile-time code:
user error ("TestQQAST.hs" (line 9, column 20):
unexpected ")"
expecting "(", "read", identifier or integer)
Converting to Haskell abstract syntax
The parser returns something of type Prog, but we want something of type
Exp; Exp is defined in Template Haskell and reifies the abstract
syntax of Haskell. For example, we would have to translate the Imp abstract
syntax term
Var "x" :: Progto its reflection as a piece of abstract Haskell syntax as
AppE (ConE 'Var) (LitE (StringL "x")) :: TH.Expwhich, when spliced into the Haskell source, yields the original Prog value.
Fortunately, we don’t have to write this translation by hand, but we can make
use of the following Template Haskell function:
dataToExpQ :: Data a
=> (forall b. Data b => b -> Maybe (Q Exp))
-> a -> Q ExpThis function can translate any term to a reified Haskell expression, as long
as the type of the term derives Data (Data instances can be auto-derived by
ghc if you enable the DeriveDataTypeable language extension). The first
argument allows you to override the behaviour of the function for specific
cases; we will see an example use case in the next section. In our quasi-quoter
so far we don’t want to override anything, so we pass a function that always
returns Nothing.
Once we have defined this quasi-quoter we can write
prog1 :: Prog
prog1 = [prog|
var x ;
x := read ;
write (x + x + 1)
|]and ghc will run our quasi-quoter and splice in the Haskell expression
corresponding to the abstract syntax tree of this program (provided that we
enable the QuasiQuotes language extension).
Meta-variables
Consider this function:
prog2 :: VarName -> Integer -> Prog
prog2 y n = [prog|
var x ;
x := read ;
write (x + y + n)
|]As mentioned, in the source code for this blog post we also have an
interpreter for the language. What happens if we try to run (prog2 "x" 1)?
*Main> intIO $ intProg (prog2 "x" 2)
5
*** Exception: user error (Unbound variable "y")
Indeed, when we look at the syntax tree that got spliced in for prog2 we see
Prog [ Decl "x"
, Assign "x" Read
, Write (Add (Add (Var "x") (Var "y")) (Var "n"))
]What happened? Didn’t we pass in "x" as the argument y? Actually, on
second thought, this makes perfect sense: this is after all what our string
parses to. The fact that y and n also happen to be Haskell variables, and
happen to be in scope at the point of the quasi-quote, is really irrelevant.
But we would still like prog2 to do what we expected it to do.
Meta-variables in Template Haskell
To do that, we have to support meta variables: variables from the
“meta” language (Haskell) instead of the object language (Imp).
Template Haskell supports this out of the box. For example, we can define
ex :: Lift a => a -> Q Exp
ex x = [| id x |]Given any argument that supports Lift, ex constructs a piece of abstract
Haskell syntax which corresponds to the application of the identity function to
x. (Don’t confuse this with anti-quotation; see Brief Intro to
Quasi-Quotation.) Lift is a type class with a single
method
class Lift t where
lift :: t -> Q ExpFor example, here is the instance for Integer:
instance Lift Integer where
lift x = return (LitE (IntegerL x))Meta-variables in quasi-quotes
Quasi-quotes don’t have automatic support for meta-variables. This makes sense: Template Haskell is for quoting Haskell so it has a specific concrete syntax to work with, where as quasi-quotes are for arbitrary custom syntaxes and so we have to decide what the syntax and behaviour of meta-variables is going to be.
For Imp we want to translate any unbound Imp (object-level) variable in the
quasi-quote to a reference to a Haskell (meta-level) variable. To do that, we
will introduce a similar type class to Lift:
class ToExpr a where
toExpr :: a -> Exprand provide instances for variables and integers:
instance ToExpr VarName where
toExpr = Var
instance ToExpr Integer where
toExpr = IntWe will also need to know which variables in an Imp program are bound
and unbound; in the source code you will find a function which
returns the set of free variables in an Imp program:
fvProg :: Prog -> Set VarNameOverriding the behaviour of dataToExpQ
In the previous section we mentioned that rather than doing the Prog -> Q Exp
transformation by hand we use the generic function dataToExpQ to do it for
us. However, now we want to override the behaviour of this function for the
specific case of unbound Imp variables, which we want to translate to
Haskell variables.
Recall that dataToExpQ has type
dataToExpQ :: Data a
=> (forall b. Data b => b -> Maybe (Q Exp))
-> a -> Q ExpThis is a rank-2 type: the first argument to dataToExpQ must itself
be polymorphic in b: it must work on any type b that derives Data. So far
we have been passing in
const Nothingwhich is obviously polymorphic in b since it completely ignores its argument.
But how do we do something more interesting? Data and its associated
combinators come from a generic programming library called Scrap Your
Boilerplate (Data.Generics). A full discussion of SYB is beyond
the scope of this blog post; the SYB papers are a good starting
point if you would like to know more (I would recommend reading them in
chronological order, the first published paper first). For the sake of what we
are trying to do it suffices to know about the existence of the following
combinator:
extQ :: (Typeable a, Typeable b) => (a -> q) -> (b -> q) -> a -> qGiven a polymorphic query (forall a)—in our case this is const Nothing—extQ allows to extend the query with a type specific case (for
a specific type b). We will use this to give a specific case for Expr: when
we see a free variable in an expression we translate it to an application of
toExpr to a Haskell variable with the same name:
metaExp :: Set VarName -> Expr -> Maybe ExpQ
metaExp fvs (Var x) | x `Set.member` fvs =
Just [| toExpr $(varE (mkName x)) |]
metaExp _ _ =
NothingThe improved quasi-quoter
With this in hand we can define our improved quasi-quoter:
prog :: QuasiQuoter
prog = QuasiQuoter {
quoteExp = \str -> do
l <- location'
c <- runIO $ parseIO (setPosition l *> topLevel parseProg) str
dataToExpQ (const Nothing `extQ` metaExp (fvProg c)) c
, quotePat = undefined
, quoteType = undefined
, quoteDec = undefined
}Note that we are extending the query for Expr, not for Prog; dataToExpQ
(or, more accurately, SYB) makes sure that this extension is applied at all the
right places. Running (prog2 "x" 2) now has the expected behaviour:
*Main> intIO $ intProg (prog2 "x" 2)
6
14
Indeed, when we have a variable in our code that is unbound both in Imp
and in Haskell, we now get a Haskell type error:
prog2 :: VarName -> Integer -> Prog
prog2 y n = [prog|
var x ;
x := read ;
write (x + z + n)
|]gives
TestQQAST.hs:15:19: Not in scope: ‘z’
Parenthetical remark: it is a design decision whether or not we want to allow
local binding sites in a splice to “capture” meta-variables. Put
another way, when we pass in "x" to prog2, do we mean the x that is bound
locally in prog2, or do we mean a different x? Certainly a case can be made
that we should not be able to refer to the locally bound x at all—after
all, it’s not bound outside of the snippet! This is an orthogonal concern
however and we will not discuss it any further in this blog post.
Quasi-quoting patterns
We can also use quasi-quoting to support patterns. This enables us to write something like
optimize :: Expr -> Expr
optimize [expr| a + n - m |] | n == m = optimize a
optimize other = otherAs before, the occurrence of a in this pattern is free, and we intend it
to correspond to a Haskell variable, not an Imp variable; the above code
should correspond to
optimize (Sub (Add a n) m) | n == m = optimize a(note that this is comparing Exprs for equality, hence the need for Expr
to derive Eq). We did not mean the pattern
optimize (Sub (Add (Var "a") (Var "n")) (Var "m"))To achieve this, we can define a quasi-quoter for Expr that supports patterns
(as well as expressions):
expr :: QuasiQuoter
expr = QuasiQuoter {
quoteExp = \str -> do
l <- location'
e <- runIO $ parseIO (setPosition l *> topLevel parseExpr) str
dataToExpQ (const Nothing `extQ` metaExp (fvExpr e)) e
, quotePat = \str -> do
l <- location'
e <- runIO $ parseIO (setPosition l *> topLevel parseExpr) str
dataToPatQ (const Nothing `extQ` metaPat (fvExpr e)) e
, quoteType = undefined
, quoteDec = undefined
}The implementation of quotePat is very similar to the definition of
quoteExp. The only difference is that we use dataToPatQ instead of
dataToExpQ to generate a Haskell pattern rather than a Haskell expression,
and we use metaPat to give a type specific case which translates free
Imp variables to Haskell pattern variables:
metaPat :: Set VarName -> Expr -> Maybe PatQ
metaPat fvs (Var x) | x `Set.member` fvs = Just (varP (mkName x))
metaPat _ _ = NothingNote that there is no need to lift now; the Haskell variable will be bound to whatever matches in the expression.
Limitations
We might be tempted to also add support for Prog patterns. While that is
certainly possible, it’s of limited use if we follow the same strategy that we
followed for expressions. For instance, we would not be able to write something
like
opt [prog| var x ; c |] | x `Set.notMember` fvProg c = opt cThe intention here is that we can remove unused variables. Unfortunately, this
will not work because this will cause a parse error: the syntax for Imp does
not allow for variables for commands, and hence we also don’t allow for
meta-variables at this point. This is important to remember:
By using object-level variables as stand-ins for meta-level variables, we only allow for meta-level variables where the syntax for the object-level language allows variables.
If this is too restrictive, we need to add special support in the ADT and in the corresponding parsers for meta-variables. This is a trade-off in increased expressiveness of the quasi-quotes against additional complexity in their implementation (new parsers, new ADTs).
Conclusions
By reusing object-level variables as stand-ins for meta variables you can reuse existing parsers and ADTs to define quasi-quoters. Using the approach described in this blog we were able to add support for quasi-quoting to a real compiler for a domain specific programming language with a minimum of effort. The implementation is very similar to what we have shown above, except that we also dealt with renaming (so that meta variables cannot be captured by binding sites in the quasi quotes) and type checking (reusing the existing renamer and type checker, of course).