Haskell and other call-by-need languages use lazy evaluation as their default evaluation strategy. For beginners and advanced programmers alike this can sometimes be confusing. At Well-Typed a core part of our business is teaching Haskell, which we do through public courses (such as the upcoming Skills Matter courses Fast Track to Haskell, Haskell Performance and Optimization and The Haskell Type System), private in-house courses targeted at specific client needs, and of course through writing blog posts.
In order to help us design these courses, we developed a tool called visualize-cbn
. It is a simple interpreter for a mini Haskell-like language which outputs the state of the program at every step in a human readable format. It can also generate a HTML/JavaScript version with “Previous” and “Next” buttons to step through a program. We released the tool as open source to github and Hackage, in the hope that it will be useful to others.
The README.md
file in the repo explains how to run the tool. In this blog post we will illustrate how one might take advantage of it. We will revisit the infamous triple of functions foldr
, foldl
, foldl'
, and show how they behave. As a slightly more advanced example, we will also study the memory behaviour of mapM
in the Maybe
monad. Hopefully, this show-rather-than-tell blog post might help some people understand these functions better.
Throughout this section we will use this definition of enumFromTo
:
enumFromTo n m =
if n <= m then n : enumFromTo (n + 1) m
else []
so that, say, [1..3]
corresponds to (enumFromTo 1 3)
.
foldr
/foldrl
/foldl'
In this section we will examine the difference between these three functions. We will not study these functions directly, however, but study a slightly simpler variant in the form of three definitions of length
on lists. For a more in-depth discussion of this triple of functions, see our earlier blog post on this topic.
foldr
Consider the naive definition of length
:
length xs =
case xs of
-> 0
[] :xs') -> 1 + length xs' (x
This corresponds to defining length = foldr (\x n -> 1 + n) 0
.
Let’s consider what happens when we compute length [1..3]
; you can click on Prev and Next to step through the execution:
Term Heap
When you step through this, notice what is going on:
- We first apply the definition of
length
. - Then in step 1,
length
needs to do a case analysis, which forces us applyenumFromTo
, and evaluate it until we have a top-levelCons
constructor (in step 4) - At that point we can execute the pattern match and the process continues.
- When we evaluate
enumFromTo (add 1 1) 3
in step 6, note that the expressionadd 1 1
is only evaluated once, although it is used twice; this sharing of computation is what makes Haskell a true lazy (call-by-need) language (as opposed to a call-by-name language). - In these animations these shared expressions are shown separately below the expression; you can think of this as the “heap”, and accordingly the animation also shows when these expressions are garbage collected (e.g. step 14).
Note as you step through that we have a build up of calls to add
which are only resolved until the very end. This is the source of the memory leak in this definition of length
(corresponding to foldr
).
foldl
In a foldl
-style definition of length
, we introduce an accumulator:
length acc xs =
case xs of
-> acc
[] :xs' -> length (1 + acc) xs' x
This corresponds to defining length = foldl (\n x -> 1 + n) 0
.
Unlike the previous definition, this is tail-recursive; however, it still suffers from a memory leak due to Haskell’s extremely lazy nature. You will see why when you step through the execution:
Term Heap
Notice how we are still building up a chain of additions, except they are now in the accumulator instead. This chain is only resolved (and garbage collected) at the very end (step 26).
foldl'
In the final definition, we make sure to evaluate the accumulator as we go:
length acc xs =
case xs of
-> acc
[] :xs' -> let acc' = add 1 acc in seq acc' (length acc' xs') x
This corresponds to defining length = foldl' (\n x -> 1 + n) 0
.
If you step through this note that we evaluate the accumulator immediately at each step, and moreover that garbage collection can now happen as we compute as well:
Term Heap
mapM
over the Maybe
monad
As a final example of a slightly more advanced nature, try predicting what
will happen when we run this in ghci
:
case mapM return [1..] of Just (x:_) -> x
If you try it out and the result is not what you expected, perhaps stepping
through the following evaluation of mapM return [1..3]
to weak-head normal
form (whnf: when there is a constructor at the top-level) will help you
understand:
Term Heap
Note that this expression reduces to whnf only after the entire list has been
evaluated, and moreover that this requires an O(n)
nested pattern matches.
The take-away point from this example is that mapM
should not be applied to
long lists in most monads, as this will result in a memory leak.
Conclusion
Laziness can be tricky to understand sometimes, and being able to go through the evaluation of a program step by step can be very helpful. The visualize-cbn
tool can be used to generate HTML/JavaScript files that can be used to visualize this evaluation as shown on this blog post; alternatively, it can also write the evaluation trace to the console. The source files (the various definitions of length
) can be found in the repo. Feedback and pull requests are of course always welcome :)