This is a second blog post in a series about engineering optimizing compilers; the previous was Quasi-quoting DSLs for free. In this blog we will see how we might write a very simple SMT solver that will allow us to write an optimization pass that can turn something like
if a == 0 then
if !(a == 0) && b == 1 then
1
write else
2
write else
3 write
into
if a == 0 then
2
write else
3 write
without much effort at all.
Preliminaries
For the sake of this blog post we will consider a very simple imperative object language, defined as
data Expr =
-- Arithmetic expressions
ExprInt Integer -- ^ Integer constants
| ExprAdd Expr Expr -- ^ Addition
-- Boolean expressions
| ExprBool Bool -- ^ Boolean constants
| ExprEq Expr Expr -- ^ Equality
| ExprNot Expr -- ^ Negation
| ExprAnd Expr Expr -- ^ Logical conjunction
| ExprOr Expr Expr -- ^ Logical disjunction
-- Variables
| ExprVar VarName -- ^ Read from a variable
| ExprAssign VarName Expr -- ^ Write to a variable
-- Control flow
| ExprSeq Expr Expr -- ^ Sequential composition
| ExprIf Expr Expr Expr -- ^ Conditional
-- Input/output
| ExprRead -- ^ Read an integer from the console
| ExprWrite Expr -- ^ Write an integer to the console
We will assume the existence of a quasi-quoter for this language so that we can write Haskell fragments such as
[expr| if a == 0 then read else write b |]
instead of
ExprIf (ExprEq (ExprVar a) (ExprInt 0))
ExprRead
ExprWrite (ExprVar b)) (
How you can write such a quasi-quoter was the topic of the previous blog
post. You should
however be able to read this blog post without having read the previous post;
hopefully the mapping from the concrete syntax to the constructors of Expr
is
pretty obvious.
Simplifying assumption
Our goal will be to write a function
provable :: Expr -> Bool
Let’s consider some examples:
The expression
a || True
should be provable: no matter what value variablea
has,a || True
is alwaysTrue
.Likewise, the expression
a || !a
should be provable, for the same reason.However, expression
a
should not be provable (a
might beFalse
)Likewise, expression
!a
should also not be provable (a
might beTrue
)
Note that this means that it’s perfectly possible for both an expression and the negation of that expression to be unprovable.
What about an expression !(n == 0 && n == 1)
? Morally speaking, this
expression should be provable. However, we will be making the following very
important simplifying assumption:
provable
is allowed to give up: whenprovable
returnsFalse
, this should be interpreted as “failed to prove”, rather than “there exist a counterexample”.
From a compiler perspective, if something is not statically provable, that simply means that an optimization may not be applied even though it could: that is, we miss an opportunity to make the program go faster, but we will not break the code.
An evaluator
We don’t want (or need) to embed a full blown theorem prover into our compiler: ideally we write something simple that will still catch a lot of the common cases. Moreover, we would prefer to reuse as much of our existing compiler infrastructure as we can. Our optimizer is likely to contain an interpreter for the object language, so that we can attempt to statically evaluate expressions. We are going to adapt this interpreter so that we can also use it in our solver. In fact, as we shall see, the solver will be a one-liner.
But we’re getting ahead of ourselves. Let’s consider how to write the
interpreter first. The interpreter will be running in an Eval
monad; for our
first attempt, let’s define this monad as a a simple wrapper around the list
monad:
newtype Eval a = Eval { unEval :: [a] }
deriving ( Functor
Applicative
, Alternative
, Monad
, MonadPlus
,
)
runEval :: Eval a -> [a]
= unEval act runEval act
We will use the monad for failure:
throwError :: Eval a
= Eval [] throwError
We can provide error recovery through
onError :: Eval a -> Eval a -> Eval a
Eval act) (Eval handler) = Eval $
onError (case act of
-> handler
[] -> rs rs
We will see why we need the list monad when we discuss the evaluator for boolean expressions; but let’s consider the evaluator for integer expressions first:
evalInt :: Expr -> Eval Integer
= go
evalInt where
ExprInt i) = return i
go (ExprAdd a b) = (+) <$> evalInt a <*> evalInt b
go (ExprIf c a b) = do cond <- evalBool c
go (if cond then a else b)
evalInt (= throwError go _
Hopefully this should be pretty self explanatory; our toy language only has a few integer-valued expressions, so there isn’t much to do. The interpreter for boolean expressions is more interesting:
evalBool :: Expr -> Eval Bool
= \e -> go e `onError` guess e
evalBool where
ExprBool a) = return a
go (ExprEq a b) = (==) <$> evalInt a <*> evalInt b
go (ExprNot a) = not <$> evalBool a
go (ExprAnd a b) = (&&) <$> evalBool a <*> evalBool b
go (ExprOr a b) = (||) <$> evalBool a <*> evalBool b
go (ExprIf c a b) = do cond <- evalBool c
go (if cond then a else b)
evalBool (= throwError
go _
= return True <|> return False guess _e
The definition of go
contains no surprises, and follows the definition of
go
in evalInt
very closely. However, the top-level definition
= \e -> eval e `onError` guess e evalBool
is more interesting. If for some reason we fail to evaluate a boolean
expression (for example, because it contains a variable) then guess
returns
both True
and False
. Let’s consider some examples:
$ evalBool [expr| True |] runEval
evalutes to [True]
;
$ evalBool [expr| a |] runEval
evaluates to [True, False]
because we don’t know what value a
has, but
$ evalBool [expr| a || True |] runEval
evaluates to [True, True]
: we still have two guesses for a
, but no matter
what we guess a || True
always evaluates to True
.
Satisfiability
We can now write our SMT solver; as promised, it will be a single line:
satisfiable :: Expr -> Bool
= or . runEval . evalBool satisfiable
Function satisfiable
(the “S” in SMT) checks if the expression is
true for some values of the variables in the expression. How do we check
this? Well, we just run our evaluator which, when it encounters a variable,
will return both values for the variable. Hence, if any of the values returned
by the evaluator is True
, then the expression is true at least for one value
for each variable.
Once we have an implementation of satisfiability, we can implement provable
very easily: an expression is provable if its negation is not satisfiable:
provable :: Expr -> Bool
= not . satisfiable . ExprNot provable
If we consider the three examples from the previous section, we will find that
both True
and a || True
are provable, but a
by itself is not, as expected.
Inconsistent guesses
The careful reader might at this point find his brow furrowed, because something is amiss:
$ evalBool [expr| a || !a |] runEval
evaluates to
True, True, False, True] [
This happens because the evaluator will make two separate independent guesses
about the value of a
. As a consequence, a || !a
will be considered not
provable.
Is this a bug? No, it’s not. Recall that we made the simplifying assumption
that provable
is allowed to give up: it’s allowed to say that an expression
is not provable even when it morally is. The corresponding (dual) simplifying
assumption for satisfability, and hence the interpreter, is:
The interpreter, and hence satisfiability, is allowed to make inconsistent guesses.
Making an inconsistent guess is equivalent to assuming False
: anything
follows from False
and hence any expression will be considered satisfiable
once we make an inconsistent guess. As a consequence, this means that once we
make inconsistent guesses, we will consider the expression as not provable.
More precision
We can however do better: whenever we make a guess that a particular
expression evaluates to True
or False
, then if we see that same expression
again we can safely make the same guess, rather than making an independent
guess. To implement this, we extend our Eval
monad with some state to
remember the guesses we made so far:
newtype Eval a = Eval { unEval :: StateT [(Expr, Bool)] [] a }
deriving ( Functor
Applicative
, Alternative
, Monad
, MonadPlus
, MonadState [(Expr, Bool)]
,
)
runEval :: Eval a -> [a]
= evalStateT (unEval act) []
runEval act
throwError :: Eval a
= Eval $ StateT $ \_st -> []
throwError
onError :: Eval a -> Eval a -> Eval a
Eval act) (Eval handler) = Eval $ StateT $ \st ->
onError (case runStateT act st of
-> runStateT handler st
[] -> rs rs
The implementation of evalInt
does not change at all. The implementation of
evalBool
also stays almost the same; the only change is the definition of
guess
:
= do
guess e <- get
st case lookup e st of
Just b -> return b
Nothing -> (do put ((e, True) : st) ; return True)
<|> (do put ((e, False) : st) ; return False)
This implements the logic we just described: when we guess the value for an
expression e
, we first look up if we already made a guess for this expression.
If so, we return the previous guess; otherwise, we make a guess and record that
guess.
Once we make this change
$ evalBool [expr| a || !a |] runEval
will evaluate to [True,True]
and consequently a || !a
will be considered
provable.
Example: folding nested conditionals
As an example of how one might use this infrastructure, we will consider a
simple pass that simplifies nested conditionals (if-statements). We can
use provable
to check if one expression implies the other:
(~>) :: Expr -> Expr -> Bool
~>) a b = provable [expr| ! $a || $b |] (
The simplifier is now very easy to write:
simplifyNestedIf :: Expr -> Expr
= everywhere (mkT go)
simplifyNestedIf where
go [expr| if $c1 then
if $c2 then
$e1
else
$e2
else
$e3 |]| c1 ~> c2 = [expr| if $c1 then $e1 else $e3 |]
| c1 ~> [expr| ! $c2 |] = [expr| if $c1 then $e2 else $e3 |]
= e go e
The auxiliary function go
pattern matches against any if-statement that has
another if-statement as its “then” branch. If we can prove that the
condition of the outer if-statement implies the condition of the inner
if-statement, we can collapse the inner if-statement to just its
“then” branch. Similarly, if we can prove that the condition of the
outer if-statement implies the negation of the condition of the inner
if-statement, we can collapse the inner if-statement to just its
“else” branch. In all other cases, we return the expression
unchanged. Finally, we use the Scrap Your
Boilerplate
operators everywhere
and mkT
to apply this transformation everywhere in an
expression (rather than just applying it top-level). For example,
simplifyNestedIf [expr|
if a == 0 then
if !(a == 0) && b == 1 then
write 1
else
write 2
else
write 3 |]
evaluates to
if a == 0 then
2
write else
3 write
as promised. Incidentally, perhaps you are wondering why such an optimization would be useful; after all, which self respecting programmer writes such code? However, code like the above may result from other compiler optimizations such as inlining: imagine that the inner if-statement came from a inlined function. A lot of compiler optimizations are designed to clean up other compiler optimizations.
Conclusion
We can implement a very simple but useful SMT solver for use in an optimizing compiler by making a small change to the interpreter, which we need anyway.
Note that the increase in precision gained from recording previous guesses is a gradual increase only; satisfiability may still make some inconsistent guesses. For example
$ evalBool [expr| !(n == 0 && n == 1) |] runEval
will evaluate to
False,True,True,True] [
because it is making independent guesses about n == 0
and n == 1
;
consequently !(n == 0 && n == 1)
will not be considered provable. We can
increase the precision of our solver by making guess
smarter (the
“MT” or “modulo theories” part of SMT). For example, we
could record some information about the guesses about integer valued variables.
At the extreme end of the scale we would be implementing a full decision procedure for first order arithmetic, which is probably optimization gone too far. However, the approach we took above where we start from the basis that we are allowed to make inconsistent guesses gives us a way to implement a simple solver that addresses the most important cases, even if it misses a few—without requiring a lot of complicated infrastructure in the compiler. And as long as we make sure that our evaluator never returns too few values (even if it may return too many) we don’t have to worry that we might generate invalid code: the worst that can happen is that we miss an optimization.