Stateful APIs are everywhere: file systems, databases, widget libraries, the list goes on. Automated testing of such APIs requires generating sequences of API calls, and when we find a failing test, ideally shrinking such a sequence to a minimal test case. Neither the generation nor the shrinking of such sequences is trivial. After all, it is the very nature of stateful systems that later calls may depend on earlier calls: we can only add rows to a database table after we create it, we can only write to a file after we open it, etc. Such dependencies need to be tracked carefully. Moreover, in order to verify the responses we get back from the system, the test needs to maintain some kind of internal representation of what it thinks the internal state of the system is: when we read from a file, we need to know what was in the file in order to be able to verify if the response was correct or not.
In this blog post we will take an in-depth look at
quickcheck-state-machine
,
a library for testing stateful code. Our running example will be the development
of a simple mock file system that should behave identically to a real file
system. Although simple, the example will be large enough to give us an
opportunity to discuss how we can verify that our generator is producing all
test cases we need, and how we can inspect whether the shrinker is doing a good
job; in both cases, test case labelling will turn out to be essential. Throughout
we will also discuss design patterns for quickcheck-state-machine
tests which
improve separation of concerns and reduce duplication. It should probably be
pointed out that this is an opinionated piece: there are other ways to set
things up than we present here.
We will not show the full development in this blog post, and instead focus on
explaining the underlying concepts. If you want to follow along, the code is
available for download.
We will assume version 0.6 of
quickcheck-state-machine
,
which was recently released. If you are using an older version, it is
recommended to upgrade, since the newer version includes some important bug
fixes, especially in the shrinker.
Introducing the running example
Our running example will be the development of a simple mock file system; the intention is that its behaviour is identical to the real file system, within the confines of what it needs to support. We will represent the state of the file system as
data Mock = M {
dirs :: Set Dir
files :: Map File String
, open :: Map MHandle File
, next :: MHandle
,
}
type MHandle = Int
emptyMock :: Mock
= M (Set.singleton (Dir [])) Map.empty Map.empty 0 emptyMock
We record which directories (folders) exist, the contents of all files on the
system, the currently open handles (where mock handles are just integers), and
the next available mock handle. To avoid confusion between files and directories
we do not use FilePath
but instead use
data Dir = Dir [String]
data File = File {dir :: Dir, name :: String}
As one example, here is the mock equivalent of readFile
:
type MockOp a = Mock -> (Either Err a, Mock)
mRead :: File -> MockOp String
@(M _ fs hs _)
mRead f m| alreadyOpen = (Left Busy , m)
| Just s <- Map.lookup f fs = (Right s , m)
| otherwise = (Left DoesNotExist , m)
where
= f `List.elem` Map.elems hs alreadyOpen
We first check if there is an open handle to the file; if so, we disallow reading this file (“resource busy”); if the file exists, we return its content; otherwise we report a “does not exist” error. The implementation of the other mock functions is similar; the full API is
mMkDir :: Dir -> MockOp ()
mOpen :: File -> MockOp MHandle
mWrite :: MHandle -> String -> MockOp ()
mClose :: MHandle -> MockOp ()
mRead :: File -> MockOp String
Finally, we should briefly talk about errors; the errors that the mock file system can report are given by
data Err = AlreadyExists | DoesNotExist | HandleClosed | Busy
and they capture a subset of the IO exceptions1
fromIOError :: IOError -> Maybe Err
=
fromIOError e case ioeGetErrorType e of
GHC.AlreadyExists -> Just AlreadyExists
GHC.NoSuchThing -> Just DoesNotExist
GHC.ResourceBusy -> Just Busy
GHC.IllegalOperation -> Just HandleClosed
-> Nothing _otherwise
Testing
Typically we are developing some stateful code, and then write a pure (mock) implementation of the same thing to test it, making sure that the stateful implementation and the simpler pure model compute the same things. Here we are doing the opposite: we are adjusting the model (the mock file system) to match what the real file system does. Either way, the process is the same: we write tests, execute them both in the real thing and in the model, and compare results.
If we were writing unit tests, we might write tests such as
- Write to two different files
- Write to a file and then read it
- etc.
However, as John Hughes of QuviQ – and one of the original authors of QuickCheck – likes to point out, as systems grow, it becomes impossible to write unit tests that test the interaction between all the features of the system. So, don’t write unit tests. Instead, generate tests, and verify properties.
To generate tests for our mock file system, we have to generate sequences of
calls into the API; “open this file, open that file, write to the first file we
opened, …”. We then execute this sequence both against the mock file system
and against the real thing, and compare results. If something goes wrong, we end
up with a test case that we can inspect. Ideally, we should then try to reduce
this test to try and construct a minimal test case that illustrates the bug.
We have to be careful when shrinking: for example, when we remove a call to
open
from a test case, then any subsequent writes that used that file handle
must also be removed. A library such as quickcheck-state-machine
can be used
both to help with generating such sequences and, importantly, with shrinking
them.
Reifying the API
It is important that we generate the test before executing it. In other words, the test generation should not depend on any values that we only get when we run the test. Such a dependency makes it impossible to re-run the same test multiple times (no reproducible test cases) or shrink tests to obtain minimal examples. In order to do this, we need to reify the API: we need to define a data type whose constructors correspond to the API calls:
data Cmd h =
MkDir Dir
| Open File
| Write h String
| Close h
| Read File
Cmd
is polymorphic in the type of handles h
; this is important, because
we should be able to execute commands both against the mock file system and
against the real file system:
runMock :: Cmd MHandle -> Mock -> (.. Mock)
runIO :: .. -> Cmd IO.Handle -> IO ..
What should the return type of these functions be? After all, different functions return different things: Open
returns a new handle, Read
returns a string, the other functions return unit. To solve this problem we will simply introduce a union type2 for successful responses
data Success h = Unit () | Handle h | String String
A response is then either a succesful response or an error:
newtype Resp h = Resp (Either Err (Success h))
It is now easy to implement runMock
: we just map all the constructors in
Cmd
to the corresponding API calls, and wrap the result in the appropriate
constructor of Success
:
runMock :: Cmd MHandle -> Mock -> (Resp MHandle, Mock)
MkDir d) = first (Resp . fmap Unit) . mMkDir d
runMock (Open f) = first (Resp . fmap Handle) . mOpen f
runMock (Write h s) = first (Resp . fmap Unit) . mWrite h s
runMock (Close h) = first (Resp . fmap Unit) . mClose h
runMock (Read f) = first (Resp . fmap String) . mRead f runMock (
where first :: (a -> b) -> (a, x) -> (b, x)
comes from Data.Bifunctor
.
We can write a similar interpreter for IO; it will take a FilePath
as
an additional argument that it will use as a prefix for all paths; we will
use this to run the IO test in some temporary directory.
runIO :: FilePath -> Cmd IO.Handle -> IO (Resp IO.Handle)
References
Our interpreter for IO takes real IO handles as argument; but will not have
any real handles until we actually run the test. We need a way to generate
commands that run in IO but don’t use real handles (yet). Here is where we
see the first bit of infrastructure provided by quickcheck-state-machine
, references:
data Reference a r = Reference (r a)
where we will instantiate that r
parameter to either Symbolic
or Concrete
:3
data Symbolic a = Symbolic Var
data Concrete a = Concrete a
In other words, a Reference a Concrete
is really just a wrapper around an
a
; indeed, quickcheck-state-machine
provides
reference :: a -> Reference a Concrete
concrete :: Reference a Concrete -> a
However, a Reference a Symbolic
is a variable:
newtype Var = Var Int
An example of a program using symbolic references is
openThenWrite :: [Cmd (Reference IO.Handle Symbolic)]
= [
openThenWrite Open (File (Dir []) "a")
Open (File (Dir []) "b")
, Write (Reference (Symbolic (Var 0))) "Hi"
, ]
This program corresponds precisely to our example from earlier: “open this file,
open that file, then write to the first file we opened”. Commands can return as
many symbolic references in their result values as they want4; in our simple
example, only Open
creates a new reference, and so Var 0
returns to the
handle returned by the first call to Open
.
When we execute the
test, those variables will be instantiated to their real values, turning
symbolic references into concrete references. We will of course not write
programs with symbolic references in them by hand; as we will see later,
quickcheck-state-machine
provides infrastructure for doing so.
Since we will frequently need to instantiate Cmd
and Resp
with references to
handles, we will introduce some special syntax for this:
newtype At f r = At (f (Reference IO.Handle r))
type f :@ r = At f r
For example, here is a wrapper around runIO
that we will need that executes
a command with concrete references:
semantics :: FilePath -> Cmd :@ Concrete -> IO (Resp :@ Concrete)
At c) = (At . fmap reference) <$>
semantics root (<$> c) runIO root (concrete
This is really just a call to runIO
, with some type wrapping and unwrapping.
Relating the two implementations
When we run our tests, we will execute the same set of commands against the
mock implementation and in real IO, and compare the responses we get after
each command. In order to compare, say, a command “write to this MHandle
”
against the mock file system to a command “write to this IOHandle
” in IO,
we need to know the relation between the mock handles and the IO handles. As
it turns out, the most convenient way to store this mapping is as a mapping
from references to the IO handles (either concrete or symbolic) to the
corresponding mock handles.
type HandleRefs r = [(Reference IO.Handle r, MHandle)]
(!) :: Eq k => [(k, a)] -> k -> a
! r = fromJust (lookup r env) env
Then to compare the responses from the mock file system to the responses from IO we need to keep track of the state of the mock file system and this mapping; we will refer to this as the model for our test:
data Model r = Model Mock (HandleRefs r)
initModel :: Model r
= Model emptyMock [] initModel
The model must be polymorphic in r
: during test generation we will
instantiate r
to Symbolic
, and during test execution we will instantiate
r
to Concrete
.
Stepping the model
We want to work towards a function
transition :: Eq1 r => Model r -> Cmd :@ r -> Resp :@ r -> Model r
to step the model; we will gradually build up towards this. First, we can use the model to translate from commands or responses in terms of references to the corresponding commands or responses against the mock file system:
toMock :: (Functor f, Eq1 r) => Model r -> f :@ r -> f MHandle
Model _ hs) (At fr) = (hs !) <$> fr toMock (
Specifically, this can be instantiated to
toMock :: Eq1 r => Model r -> Cmd :@ r -> Cmd MHandle
which means that if we have a command in terms of references, we can translate that command to the corresponding command for the mock file system and execute it:
step :: Eq1 r => Model r -> Cmd :@ r -> (Resp MHandle, Mock)
@(Model mock _) c = runMock (toMock m c) mock step m
In order to construct the full new model however we also need to know how to
extend the handle mapping. We can compute this by comparing the response we
get from the “real” semantics (Resp :@ r
) to the response we get from the
mock semantics (from step
), and simply zip the handles from both
responses together to obtain the new mapping. We wrap all this up into an
event:
data Event r = Event {
before :: Model r
cmd :: Cmd :@ r
, after :: Model r
, mockResp :: Resp MHandle
, }
and we construct an event from a model, the command we executed, and the response we got from the real implementation:
lockstep :: Eq1 r
=> Model r
-> Cmd :@ r
-> Resp :@ r
-> Event r
@(Model _ hs) c (At resp) = Event {
lockstep m= m
before = c
, cmd = Model mock' (hs <> hs')
, after = resp'
, mockResp
}where
= step m c
(resp', mock') = zip (toList resp) (toList resp') hs'
The function we mentioned at the start of this section is now easily derived:
transition :: Eq1 r => Model r -> Cmd :@ r -> Resp :@ r -> Model r
= after . lockstep m c transition m c
as well as a function that compares the mock response and the response from the real file system and checks that they are the same:
postcondition :: Model Concrete
-> Cmd :@ Concrete
-> Resp :@ Concrete
-> Logic
= toMock (after e) r .== mockResp e
postcondition m c r where
= lockstep m c r e
We will pass this function to quickcheck-state-machine
to be run after every
command it executes to make sure that the model and the real system do indeed
return the same responses; it therefore does not need to be polymorphic in r
.
(Logic
is a type introduced by quickcheck-state-machine
; think of it as a
boolean with some additional information, somewhat similar to QuickCheck
’s
Property
type.)
Events will also be very useful when we label our tests; more on that later.
Constructing symbolic responses
We mentioned above that we will not write programs with symbolic references in
it by hand. Instead what will happen is that we execute commands in the mock
file system, and then replace any of the generated handles by new variables.
Most of this happens behind the scenes by quickcheck-state-machine
, but we do
need to give it this function to construct symbolic responses:
symbolicResp :: Model Symbolic
-> Cmd :@ Symbolic
-> GenSym (Resp :@ Symbolic)
= At <$> traverse (const genSym) resp
symbolicResp m c where
= step m c (resp, _mock')
This function does what we just described: we use step
to execute the command
in the mock model, and then traverse the response, constructing a new (fresh)
variable for each handle. GenSym
is a monad defined in
quickcheck-state-machine
for the sole purpose of generating variables; we
won’t use it anywhere else except in this function.
Generating commands
To generate commands, quickcheck-state-machine
requires a function that
produces the next command given the current model; this function will be a
standard QuickCheck
generator. For our running example, the generator is
easy to write:
generator :: Model Symbolic -> Maybe (Gen (Cmd :@ Symbolic))
Model _ hs) = Just $ QC.oneof $ concat [
generator (
withoutHandleif null hs then [] else withHandle
,
]where
withoutHandle :: [Gen (Cmd :@ Symbolic)]
= [
withoutHandle fmap At $ MkDir <$> genDir
fmap At $ Open <$> genFile
, fmap At $ Read <$> genFile
,
]
withHandle :: [Gen (Cmd :@ Symbolic)]
= [
withHandle fmap At $ Write <$> genHandle <*> genString
fmap At $ Close <$> genHandle
,
]
genDir :: Gen Dir
= do
genDir <- QC.choose (0, 3)
n Dir <$> replicateM n (QC.elements ["x", "y", "z"])
genFile :: Gen File
= File <$> genDir <*> QC.elements ["a", "b", "c"]
genFile
genString :: Gen String
= QC.sized $ \n -> replicateM n (QC.elements "ABC")
genString
genHandle :: Gen (Reference IO.Handle Symbolic)
= QC.elements (map fst hs) genHandle
A few comments on the generator:
- When we generate paths, we choose from a very small set of directory and file names. We are not really interested in testing that, for example, our implementation is Unicode-safe here; by picking from a small known set we generate tests that are more likely to contain multiple references to the same file, without having to add special provisions to do so.
- We cannot, of course, generate handles out of nowhere; fortunately, the model
tells us which handles we have available, and so
genHandle
just picks one at random from that. We do not limit the generator to picking open handles: it is important to also test the behaviour of the system in the error case where a program attempts to write to a closed handle. - The
elements
function fromQuickCheck
, which picks a random element from a list, is partial: it will throw an error if the list is empty. We must be careful not to generate any commands requiring handles when we have no handles yet. - The reason for the
Maybe
in the type signature is that in some circumstances a generator might be unable to generate any commands at all given a particular model state. We don’t need to take advantage of this feature and therefore always returnJust
a generator.
We can also define a shrinker for commands, but we can start with a shrinker that says that individual commands cannot be shrunk:
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
= [] shrinker _ _
Putting it all together
In order to actually start generating and running tests,
quickcheck-state-machine
wants us to bundle all of this functionality up in a
single record:
sm :: FilePath -> StateMachine Model (At Cmd) IO (At Resp)
= StateMachine {
sm root = initModel
initModel = transition
, transition = precondition
, precondition = postcondition
, postcondition = Nothing
, invariant = generator
, generator = Nothing
, distribution = shrinker
, shrinker = semantics root
, semantics = symbolicResp
, mock }
We have defined all of these functions above, with the exception of
precondition
. When quickcheck-state-machine
finds a failing test, it will
try to shrink it to produce a smaller test case. It does this by trying to
remove commands from the program and then checking if the resulting program
still “makes sense”: does the precondition of all commands still hold?
The precondition should be as liberal as possible; for instance, the precondition should not require that a file handle is open before writing to it, because we should also check that errors are handled correctly. Typically, the only thing the precondition should verify is that commands do not refer to non-existent handles (i.e., if we remove a call to open, then subsequent uses of the handle returned by that call to open simply cannot be executed anymore). Thus, we will define:
precondition :: Model Symbolic -> Cmd :@ Symbolic -> Logic
Model _ hs) (At c) =
precondition (forall (toList c) (`elem` map fst hs)
Aside. Although we do not need it for our running example, one other thing a
precondition might do is rule out test cases that we explicitly don’t want to
test. For example, if our mock file system throws an error in some cases because
a particular combination of arguments to a call is simply not supported, then we
don’t want to flag this as a bug. A clean way to implement this is to extend the
error type with a field that marks an error as an intentional limitation; then
the precondition
can simply rule out any commands that (in the model) would
return an error with this mark. This keeps the definition of the precondition
itself simple, and the logic of what is and what isn’t an intentional limitation
lives in the implementation of the model itself.
Running the tests
Apart from some additional type class instances (see the code), we are now ready to define and execute the actual test:
prop_sequential :: FilePath -> QC.Property
=
prop_sequential tmpDir Nothing $ \cmds ->
forAllCommands (sm rootUnused) $ do
QC.monadicIO <- liftIO $ createTempDirectory tmpDir "QSM"
tstTmpDir let sm' = sm tstTmpDir
<- runCommands sm' cmds
(hist, _model, res)
prettyCommands sm' hist$ checkCommandNames cmds
$ res QC.=== Ok
rootUnused :: FilePath
= error "root not used during command generation" rootUnused
All functions prefixed QC
are standard QuickCheck
functions. The others
come from quickcheck-state-machine
:
forAllCommands
usesQuickCheck
’sforAllShrink
and instantiates it will the command generation and shrinking infrastructure fromquickcheck-state-machine
runCommands
then executes the generated commands, validating the postcondition at every step.prettyCommands
renders those commands in case of a test failure.checkCommandNames
adds some statistics about the distribution of commands in the generated tests.
We can run the test in ghci
:
> quickCheck (prop_sequential "./tmp")
+++ OK, passed 100 tests:
56% coverage
3% [("MkDir",1),("Read",1)]
3% []
2% [("MkDir",1),("Open",1),("Read",1)]
2% [("MkDir",1)]
1% [("Close",1),("MkDir",1),("Open",5),("Read",4),("Write",2)]
...
It tells us that all tests passed, and gives us some statistics about the tests
that were executed: in 3% of the cases, they contained a single MkDir
and a
single Read
, 3% were completely empty, 2% contained one call to MkDir
, one
call to Open
, one call to Read
, and so on.
Labelling
At this point you might conclude that you’re done. We have the real implementation, we have the mock implementation, they return identical results for all tests, what else could we want?
Let’s think back to those unit tests we were on the verge of writing, but stopped just in time because we remembered that we should generate unit tests, not write them:
- Write to two different files
- Write to a file and then read it
How do we know that our generated tests include these two cases (and all the
other unit tests that we would have written)? We get some statistics from
quickcheck-state-machine
, but it’s not terribly helpful. For example, the
first line above tells us that 3% of our test cases contain one call to MkDir
and one call to Read
; but we know that that call to Read
must fail,
because if these are the only two commands we execute, there aren’t any files
for that Read
to read.
The solution is to label tests. For example, we might introduce labels, or tags, that correspond to the two unit tests above:
data Tag = OpenTwo | SuccessfulRead
We then need to write a function that looks at a particular test case and checks
which labels apply. It is important to realize that this does not mean we
are bringing back the same unit tests under a different guise: programs that
will be labelled OpenTwo
must write to two different files, but may also
do a whole bunch of other things.
We can use use the foldl
package
to write such a labelling function in a convenient manner. The labelling
function will look at the Event
s and produce Tag
s. To check if we open (at
least) two different files, we keep track of all the successful calls to open;
for SucessfulRead
we simply remember if we have seen a call to Read
with a
non-error result:
tag :: [Event Symbolic] -> [Tag]
= Foldl.fold $ catMaybes <$> sequenceA [
tag
openTwo
, successfulRead
]where
openTwo :: Fold (Event Symbolic) (Maybe Tag)
= Fold update Set.empty extract
openTwo where
update :: Set File -> Event Symbolic -> Set File
=
update opened ev case (cmd ev, mockResp ev) of
At (Open f), Resp (Right _)) ->
(
Set.insert f opened->
_otherwise
opened
extract :: Set File -> Maybe Tag
= do
extract opened >= 2)
guard (Set.size opened return $ OpenTwo
successfulRead :: Fold (Event Symbolic) (Maybe Tag)
= Fold update False extract
successfulRead where
update :: Bool -> Event Symbolic -> Bool
= didRead ||
update didRead ev case (cmd ev, mockResp ev) of
At (Read _), Resp (Right _)) ->
(True
->
_otherwise False
extract :: Bool -> Maybe Tag
= do
extract didRead
guard didReadreturn SuccessfulRead
(For a read to be successful, we must have created the file first – this is a property of the semantics, we don’t need to enforce this in the labeller.)
The commands we get back from the forAllCommands
function of
quickcheck-state-machine
are of type Commands
. This is a simple wrapper
around a list of Command
; Command
in turn bundles a command (Cmd
) along
with its response (and the variables in that response). We can therefore easily
turn this into a list of events:
execCmd :: Model Symbolic
-> Command (At Cmd) (At Resp)
-> Event Symbolic
Command cmd resp _vars) =
execCmd model (
lockstep model cmd resp
execCmds :: Commands (At Cmd) (At Resp) -> [Event Symbolic]
= \(Commands cs) -> go initModel cs
execCmds where
go :: Model Symbolic
-> [Command (At Cmd) (At Resp)]
-> [Event Symbolic]
= []
go _ [] : cs) = e : go (after e) cs
go m (c where
= execCmd m c e
We can then define a variant on prop_sequential
above that replaces
checkCommandNames
with our own labelling function (you could also use both,
if you wanted to):
prop_sequential' :: FilePath -> QC.Property
=
prop_sequential' tmpDir Nothing $ \cmds ->
forAllCommands (sm rootUnused) $ do
QC.monadicIO <- liftIO $ createTempDirectory tmpDir "QSM"
tstTmpDir let sm' = sm tstTmpDir
<- runCommands sm' cmds
(hist, _model, res)
prettyCommands sm' hist$ QC.tabulate "Tags" (map show $ tag (execCmds cmds))
$ res QC.=== Ok
Here tabulate
is the standard QuickCheck
function
for adding multiple labels to a test case. If we now re-run our tests, we get
> quickCheck (prop_sequential' "./tmp")
+++ OK, passed 100 tests.
Tags (57 in total):
63% OpenTwo
37% SuccessfulRead
The numbers here are a bit awkward to interpret: it says that across all tests a
total of 57 labels were found, and of those 57 labels, 63% were OpenTwo
and
37% were SuccessfulRead
. In other words, 63% * 57 = 36 tags were OpenTwo
(36 tests out of 100 were labelled as OpenTwo
), and 37% * 57 = 21 tags were
SuccessfulRead
(21 tests out of 100 were labelled as SuccessfulRead
). Note
that it is perfectly possible for these two sets of tests to overlap (i.e., a
single program can be tagged as both OpenTwo
and SuccessfulRead
).
Inspecting the labelling function
So we have a generator, and we have labels for the unit tests that we didn’t
write; have we covered our bases now? Well, how do we know that our labelling
function is correct? We might seem to descent into infinite regress here,
testing the tests that tests the tests… but it would be good to at least
have a look at some test case examples and how they are labelled. Fortunately,
this is functionality that QuickCheck provides out of the box through a function called labelledExamplesWith
. We can define
a simple wrapper around it that gives us the possibility to specify a particular
seed (which will be useful once we start working on shrinking):
showLabelledExamples :: Maybe Int -> IO ()
= do
showLabelledExamples mReplay <- case mReplay of
replaySeed Nothing -> getStdRandom $ randomR (1, 999999)
Just seed -> return seed
putStrLn $ "Using replaySeed " ++ show replaySeed
let args = QC.stdArgs {
= 10000
QC.maxSuccess = Just (QC.mkQCGen replaySeed, 0)
, QC.replay
}
$
QC.labelledExamplesWith args Nothing $ \cmds ->
forAllCommands (sm rootUnused) . execCmds $ cmds) $
repeatedly QC.collect (tag True QC.property
Instead of tabulate
we must use
collect
to label examples when constructing examples; collect
takes a single label at
a time though, so we use my all-time favourite combinator to repeatedly call
collect
for all tags:
repeatedly :: (a -> b -> b) -> ([a] -> b -> b)
= flip . List.foldl' . flip repeatedly
(I wonder if I can patent this function?)
When we run this, we see something like
> showLabelledExamples Nothing
Using replaySeed 288187
*** Found example of OpenTwo
Commands [
.. Open (File (Dir []) "b")
, .. Open (File (Dir []) "a")
]
*** Found example of SuccessfulRead
Commands [
.. Open (File (Dir []) "b")
, .. Close (Reference (Symbolic (Var 0)))
, .. Read (File (Dir []) "b")
]
(I’ve tidied up the output a bit and removed some redundant information.)
This is looking good: both of these look like the kind of examples we would have written ourselves for these tags.
Standard shrinking
In fact, if we look at those test cases at the end of the previous section, you might be thinking that those examples look surprisingly good: not only are they indeed instances of those tags, but they are very small test cases: they are pretty much the unit tests that we would have written if we were so misguided as to think we need to write unit tests. Were we simply lucky?
No, we were not. QuickCheck’s labelledExamples
not only searches for labelled
test cases, it also tries to shrink them when it finds one, and continues to
shrink them until it can shrink them no further without losing the label. The
shrinker itself is implemented in quickcheck-state-machine
; if we disable it
altogether and re-run the search for labelled examples, we might find an example
such as the following for SuccessfulRead
, where for clarity I’ve marked all
failing commands with an asterisk (*)
:
Commands [
.. Open (File (Dir ["z", "x", "y"]) "c") (*)
.. MkDir (Dir ["z", "x") (*)
, .. Read (File (Dir ["z", "y", "y"]) "c") (*)
, .. Read (File (Dir []) "c") (*)
, .. MkDir (Dir [ "x" , "x" ]) (*)
, .. Read (File (Dir ["x", "z", "x"]) "b") (*)
, .. MkDir (Dir ["y"])
, .. MkDir (Dir []) (*)
, .. Open (File (Dir ["z"]) "b") (*)
, .. Open (File (Dir []) "a")
, .. MkDir (Dir []) (*)
, .. Open (File (Dir ["x"]) "b") (*)
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 0)))
, .. Open (File (Dir ["x", "y"]) "b") (*)
, .. Open (File (Dir []) "b")
, .. MkDir (Dir ["y"]) (*)
, .. Read (File (Dir []) "a")
, .. MkDir (Dir ["z"])
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 1)))
, .. Open (File (Dir ["z" , "z" , "y"]) "a") (*)
, .. Open (File (Dir ["x" , "x" , "z"]) "c") (*)
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 0)))
, .. Open (File (Dir ["y", "y"]) "b") (*)
, .. Read (File (Dir ["x", "y", "x"]) "a") (*)
, ]
This is looking significantly less ideal! If there was a bug in read
, then this
would certainly not be a very good minimal test case, and not something you
would want to debug. So how does quickcheck-state-machine
shrink tests?
The basic approach is quite simple: it simply removes commands from the
program. If the resulting program contains commands whose precondition is
not satisfied (remember, for our running example this means that those commands
would use handles that are no longer created) then it discards it as a possible
shrink candidate; otherwise, it reruns the test, and if it still fails, repeats
the process.
The large percentage of commands that are unsuccessful can easily be removed
by quickcheck-state-machine
:
Commands [
.. MkDir (Dir ["y"])
.. Open (File (Dir []) "a")
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 0)))
, .. Open (File (Dir []) "b")
, .. Read (File (Dir []) "a")
, .. MkDir (Dir ["z"])
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 1)))
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 0)))
, ]
Both calls to MkDir
can easily be removed, and the resulting program would
still be tagged as SuccessfulRead
; and the same is true for repeated closing
of the same handle:
Commands [
.. Open (File (Dir []) "a")
.. Close (Reference (Symbolic (Var 0)))
, .. Open (File (Dir []) "b")
, .. Read (File (Dir []) "a")
, .. Close (Reference (Symbolic (Var 1)))
, ]
At this point the shrinker cannot remove the second call to Open
because
the second call to Close
depends on it, but it can first remove that
second call to Close
and then remove that second call to Open
, and
we end up with the minimal test case that we saw in the previous section:
Commands [
.. Open (File (Dir []) "a")
.. Close (Reference (Symbolic (Var 0)))
, .. Read (File (Dir []) "a")
, ]
Perfect.
Improving shrinking
Unfortunately, this does not mean that we can depend on
quickcheck-state-machine
to solve all our shrinking problems. Consider this
run of showLabelledExamples
:
> showLabelledExamples (Just 166205)
Using replaySeed 166205
*** Found example of OpenTwo
Commands [
.. MkDir (Dir ["x"])
, .. Open (File (Dir []) "c")
, .. Open (File (Dir ["x"]) "a")
]
This is indeed an example of a program in which we open at least two files;
however, why is that call to MkDir
still there? After all, if there was a bug
in opening more than one file, and the “minimal” test case would include a call
to MkDir
, that would be a red herring which might send the person debugging
the problem down the wrong path.
The reason that quickcheck-state-machine
did not remove the call to MkDir
,
of course, is that without it the second call to Open
would fail: it tries to
create a file in directory x
, and if that directory does not exist, it would
fail. To fix this, we need to tell quickcheck-state-machine
how to shrink
individual commands; so far we have been using
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
= [] shrinker _ _
which says that individual commands cannot be shrunk at all. So how might we
shrink an Open
call? One idea might be to shrink the directory, replacing for
instance /x/y/a
by /x/a
or indeed just /a
. We can implement this using
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
At cmd) =
shrinker _ (case cmd of
Open (File (Dir d) f) ->
At $ Open (File (Dir d') f) | d' <- QC.shrink d]
[->
_otherwise []
If we use this shrinker and run showLabelledExamples
a number of times,
we will find that all the examples of OpenTwo
are now indeed minimal…
until it finds an example that isn’t:
> showLabelledExamples (Just 980402)
Using replaySeed 980402
*** Found example of OpenTwo
Commands [
.. MkDir (Dir ["x"]))
, .. Open (File (Dir []) "a")
, .. Open (File (Dir ["x"]) "a")
]
In this example we cannot shrink the directory to []
because the resulting
program would try to open the same file twice, which is not allowed (“resource
busy”). We need a better way to shrink this program.
What we want to implement is “try to replace the file path by some file in the (local) root. It is important to realize however that a shrinker such as
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
At cmd) =
shrinker _ (case cmd of
Open _ -> -- BAD EXAMPLE
At $ Open (File (Dir []) f') | f' <- ["t1", "t2"]]
[->
_otherwise []
is a bad idea. This shrinker tries to replace any file path with either /t1
or /t2
. However, this means that shrinking now never stops:
Open (File (Dir [] "t1"))
can be shrunk to
Open (File (Dir [] "t2"))
which can then be shrunk back to
Open (File (Dir [] "t1"))
and QuickCheck will loop when trying to shrink the test case. It is important that there is a clear direction to shrinking.
An approach that does work is the following: any file path that doesn’t start
with a t
can be replaced by path /t100
; moreover, any /tN
(for some
number N) can be replaced by /tN'
for some number N’ < N:
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
At cmd) =
shrinker _ (case cmd of
Open (File (Dir []) ('t' : n)) ->
| n' <- QC.shrink (read n)]
[openTemp n' Open _ ->
100]
[openTemp ->
_otherwise
[]where
openTemp :: Int -> Cmd :@ Symbolic
= At $ Open (File (Dir []) ('t' : show n)) openTemp n
Now
Commands [
.. MkDir (Dir ["x"]))
, .. Open (File (Dir []) "a")
, .. Open (File (Dir ["x"]) "a")
]
can shrink to
Commands [
.. MkDir (Dir ["x"]))
, .. Open (File (Dir []) "a")
, .. Open (File (Dir []) "t100")
]
at which point the call to MkDir
can be removed; this will eventually
shrink down to
Commands [
.. Open (File (Dir []) "t0")
, .. Open (File (Dir []) "t1")
]
Dependencies between commands
It’s been a long road, but we are almost there. The last thing we need to discuss is how to shrink programs with dependencies. The “open at least two” example above was relatively easy to shrink because we could shrink one command at the time. Sometimes however there are dependencies between commands. For example, consider this “minimal” example of “successful read”:
> showLabelledExamples (Just 617213)
Using replaySeed 617213
*** Found example of SuccessfulRead
Commands [
.. MkDir (Dir [ "z" ])
, .. Open (File (Dir ["z"]) "b")
, .. Close (Reference (Symbolic (Var 0)))
, .. Read (File (Dir ["z"]) "b")
]
As before, we have a call to MkDir
which should ideally be removed. However,
if we tried to change the path in the call to Open
, the call to Read
would
fail: both of these commands should refer to the same path. But shrinking
can only change one command at a time, and this is important to keep the
computational complexity (runtime cost) of shrinking down. What to do?
The problem is that we want that call to Read
to use the same path as the
call to Open
, but we have no way to express this. The solution is to make
this expressible. After all, we can already express “the handle returned by that
call to Open
”; all we need to do is introduce a second kind of reference: a
reference to a path.
The language of commands changes to
data Cmd fp h = Read (Expr fp) | -- rest as before
Cmd
gets an additional type argument fp
to record the types of paths, and
instead of a File
, Read
now takes an Expr
as argument:
data Expr fp = Val File | Var fp
We can of course still use a concrete path, as before, but we can also use a
variable: “use the same path as used in that call to open”. This means that
Open
must return that reference, so Success
and Resp
get adjusted
accordingly:
data Success fp h = Handle fp h | -- rest as before
newtype Resp fp h = Resp (Either Err (Success fp h))
Just like was the case for handles, when we actually run code all variables have been resolved, so the interpreter isn’t any more difficult:
runMock :: Cmd File MHandle -> Mock -> (Resp File MHandle, Mock)
Open f) = first (Resp . fmap (Handle f)) . mOpen f
runMock (Read e) = first (Resp . fmap String) . mRead (eval e)
runMock (-- rest as before
eval :: Expr File -> File
Val f) = f
eval (Var f) = f eval (
The IO interpreter is modified similarly. Most of the rest of the changes to the
code are minor and mostly automatic. For example, At
must now instantiate both
parameters
newtype At f r = At (f (Reference File r) (Reference IO.Handle r))
the model must record the mapping from file references now too
type FileRefs r = [(Reference File r, File)]
data Model r = Model Mock (FileRefs r) (HandleRefs r)
See Version2.hs
in the example package for details.
Crucially, we can now take advantage of this in the shrinker: when we see a
call to Read
with a file path that we have seen before, we can shrink that
to use a variable instead:
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
Model _ fs _) (At cmd) =
shrinker (case cmd of
Read (Val f) ->
At $ Read (Var r) | r <- mapMaybe (matches f) fs]
[-- other cases as before
where
matches :: File -> (r, File) -> Maybe r
| f == f' = Just r
matches f (r, f') | otherwise = Nothing
This means that the problematic example5
> showLabelledExamples (Just 617213)
Using replaySeed 617213
*** Found example of SuccessfulRead
Commands [
.. MkDir (Dir [ "z" ])
, .. Open (File (Dir ["z"]) "b")
, .. Close (Reference (Symbolic (Var 1)))
, .. Read (Val (File (Dir ["z"]) "b"))
]
can now shrink to
Commands [
.. MkDir (Dir [ "z" ])
, .. Open (File (Dir ["z"]) "b")
, .. Close (Reference (Symbolic (Var 1)))
, .. Read (Var 0)
]
At this point the shrinking for Open
that we defined above can kick in,
replacing z/b
with /t100
, making the call to MkDir
redundant, and the
example can ultimately shrink to
Commands [
.. Open (File (Dir []) "t0")
, .. Close (Reference (Symbolic (Var 1)))
, .. Read (Var 0)
]
Conclusions
Writing unit tests by hand is problematic for at least two reasons. First, as
the system grows in complexity the number of interactions between the various
features will be impossible to capture in hand written unit tests. Second, unit
tests can only exercise the system in ways that the author of the unit tests
foresees; often, bugs arise when the system is exercised in ways that were not
foreseen. It is therefore important not to write unit tests, but rather generate
them. We have seen how a library such as quickcheck-state-machine
can assist
in generating and shrinking meaningful sequences of API calls.
We have also seen why it is important to label test cases, how to inspect
the labelling function, and how to use labelling to improve shrinking. Without
writing a shrinker for individual commands, the standard
quickcheck-state-machine
shrinker already does a really good job at removing
redundant commands. However, if we are serious about producing minimal test
cases without red herrings that might lead debugging efforts astray (and we
should be serious about that), then we also need to put some thought into
shrinking individual commands.
Finally, we have seen how we can improve shrinking by making dependencies between commands explicit. This also serves as an example of a language with multiple types of references; the approach we put forth in this blog post essentially scales to an arbitrary number of types of references without much difficulty.
We have not talked about running parallel tests at all in this blog post. This
is an interesting topic in its own right, but affects only the very outermost
level of the test: prop_sequential
would get an analogue prop_parallel
, and
nothing else would be affected; the quickcheck-state-machine
documentation
shows how to do this; the original paper describing the
approach
(in Erlang) by John Hughes and others is also well worth a read. Finally,
quickcheck-state-machine
is not the only library providing this functionality
in Haskell; in particular,
hedgehog, an alternative to
QuickCheck, does also.
The way we set up the tests in this blog post is not the only way possible, but
is one that we believe leads to a clean design. The running example in this blog
post is a simplified version of a mock file system that we use in the new
Ouroboros consensus
layer, where we use it
to simulate file system errors when testing a blockchain database. The tests for
that blockchain database in turn also use the design patterns laid out in this
blog post, and we have used those design patterns also in a test we added to
quickcheck-state-machine
itself to test the
shrinker.
Of course, being Haskellers, we would prefer to turn design patterns into actual
code; indeed, we believe this to be possible, but it may require some more
advanced type system features. This is something we want to investigate further.
Footnotes
Mapping from
IllegalOperation
toHandleClosed
is bit too coarse, but suffices for the sake of this blog post.↩︎This has a somewhat unpleasant untyped feel to it. However, if the design patterns proposed in this blog post are used, this barely affects any code that we write: we never (or almost never) have to pattern match on that
Success
type.↩︎I’m eliding a
Typeable
constraint here.↩︎Technically,
quickcheck-state-machine
uses a functionResp :@ Symbolic -> [Var]
to find all the references “bound” by a command.↩︎The
Close
call now refers toVar 1
instead ofVar 0
, becauseOpen
now creates two variables: the reference to the path and the reference to the handle.↩︎