Debugging your Haskell application with `debuggable`

Edsko de Vries – Friday, 06 December 2024

In this blog post we will introduce a new open source Haskell library called debuggable, which provides various utilities designed to make it easier to debug your applications. Some of these are intended for use during actual debugging, others are designed to be a regular part of your application, ready to be used if and when necessary.

Non-interleaved output

Ever see output like this when debugging concurrent applications?

ATnhdi st hiiss  ai sm eas smaegses afgreo mf rtohme  tfhier sste ctohnrde atdh
read
AndT htihsi si si sa  am emsessasgaeg ef rformo mt hteh ef isrescto ntdh rteharde
ad
TAhnids  tihsi sa  imse sas amgees sfargoem  ftrhoem  ftihres ts etchorneda dt
hread

The problem is that concurrent calls to putStrLn can result in interleaved output. To solve this problem, debuggable offers Debug.NonInterleavedIO, which provides variants of putStrLn and friends, as well as trace and its variants, all of which can safely be called concurrently without ever resulting in interleaved output. For example:

import Debug.NonInterleavedIO qualified as NIIO

useDebuggable :: IO ()
useDebuggable = do
    concurrently_
      ( replicateM_ 10 $ do
          NIIO.putStrLn "This is a message from the first thread"
          threadDelay 100_000
      )
      ( replicateM_ 10 $ do
          NIIO.putStrLn "And this is a message from the second thread"
          threadDelay 100_000
      )

If we run this as-is, we will only see

niio output to /tmp/niio2418318-0

on the terminal; inspecting /tmp/niio2418318-0 will show

And this is a message from the second thread
This is a message from the first thread
And this is a message from the second thread
This is a message from the first thread
...

If you want to send the output to a specific file (or /dev/stdout for output to the terminal), you can set the NIIO_OUTPUT environment variable.

Provenance

Provenance is about tracking of what was called when and where.

Call-sites

Consider the following example:

f1 :: IO ()
f1 = f2

f2 :: HasCallStack => IO ()
f2 = f3

f3 :: HasCallStack => IO ()
f3 = putStrLn $ prettyCallStack callStack

The callstack we get from this example looks something like this:

CallStack (from HasCallStack):
  f3, called at Demo/Provenance.hs:15:6 in ..
  f2, called at Demo/Provenance.hs:12:6 in ..

Callstacks are awesome, and a huge help during debugging, but there are some minor issues with this example:

Personally, this has always felt a bit “off by one” to me: the first entry tells us that we are in f3, but we were called from line 15, which is f2; likewise, the second entry in the stack tells us that we are in f2, but we were called from line 12, which is f1. Not a huge deal, but arguably a bit confusing. (See also GHC ticket #25546: Make HasCallStack include the caller.)
Somewhat relatedly, when we are in f3, and ask for a CallStack, being told that we are in f3 is not particularly helpful (we knew that already).
Finally, it is sometimes useful to have just the “first” entry in the callstack; “we were called from line such and such, which is function so and so”.

For this reason, Debug.Provenance provides a CallSite abstraction

g1 :: IO ()
g1 = g2

g2 :: HasCallStack => IO ()
g2 = g3

g3 :: HasCallStack => IO ()
g3 = print callSite

This outputs:

g2 -> g3 (Demo/CallSite.hs:31:6)

where line 31 is the call to g3 in g2. Due to the (alleged) “off-by-one”, both g2 and g3 must be given a HasCallStack constraint, otherwise we get

{unknown} -> g3 (Demo/CallSite.hs:31:6)

when g2 lacks the constraint, or

{unknown} -> {unknown} ()

when g3 does. There is also a variant callSiteWithLabel, which results in output such as

g2 -> g3 (Demo/CallSite.hs:31:6, "foo")

Invocations

Sometimes we are not so much interested in where we are called from, but how often a certain line in the source is called. Debug.Provenance offers “invocations” to track this:

g1 :: IO ()
g1 = replicateM_ 2 g2

g2 :: HasCallStack => IO ()
g2 = do
    print =<< newInvocation
    replicateM_ 2 g3

g3 :: HasCallStack => IO ()
g3 = print =<< newInvocation

This results in output such as

g2 (Demo/Invocation.hs:30:15) #1
g3 (Demo/Invocation.hs:34:16) #1
g3 (Demo/Invocation.hs:34:16) #2
g2 (Demo/Invocation.hs:30:15) #2
g3 (Demo/Invocation.hs:34:16) #3
g3 (Demo/Invocation.hs:34:16) #4

We see the first call to g2, then the first and second call to g3, then the second call to g2, and finally the third and fourth call to h3.

When debugging problems such as deadlocks, it is often useful to insert putStrLn statements like this:

f4 :: IO ()
f4 = do
    putStrLn "f4:1"
    -- f4 does something ..
    putStrLn "f4:2"
    -- f4 does something else ..
    putStrLn "f4:3"

This pattern too can be made a bit simpler by using invocations:

g4 :: HasCallStack => IO ()
g4 = do
    print =<< newInvocation
    -- f4 does something ..
    print =<< newInvocation
    -- f4 does something else ..
    print =<< newInvocation

Resulting in output such as

g4 (Demo/Invocation.hs:48:15) #1
g4 (Demo/Invocation.hs:50:15) #1
g4 (Demo/Invocation.hs:52:15) #1

Scope

The definition of g4 above is still a little clunky, especially if we also want to include other output than just the invocation itself. We can do better:

import Debug.NonInterleavedIO.Scoped qualified as Scoped

g4 :: HasCallStack => IO ()
g4 = do
    Scoped.putStrLn "start"
    -- f4 does something ..
    Scoped.putStrLn "middle"
    -- f4 does something else ..
    Scoped.putStrLn "end"

outputs

[g4 (Demo/Scope.hs:21:5) #1] start
[g4 (Demo/Scope.hs:23:5) #1] middle
[g4 (Demo/Scope.hs:25:5) #1] end

As the name suggests, though, there is more going on here than simply a more convenient API: Debug.Provenance.Scope offers a combinator called scoped for scoping invocations:

g1 :: IO ()
g1 = g2

g2 :: HasCallStack => IO ()
g2 = scoped g3

g3 :: HasCallStack => IO ()
g3 = scoped g4

This results in

[g4 (Demo/Scope.hs:29:5) #1, g3 (Demo/Scope.hs:25:6) #1, g2 (Demo/Scope.hs:22:6) #1] start
[g4 (Demo/Scope.hs:31:5) #1, g3 (Demo/Scope.hs:25:6) #1, g2 (Demo/Scope.hs:22:6) #1] middle
[g4 (Demo/Scope.hs:33:5) #1, g3 (Demo/Scope.hs:25:6) #1, g2 (Demo/Scope.hs:22:6) #1] end

Threads

The counters that are part of an Invocation can be very useful to cross-reference output messages from multiple threads. Continuing with the g4 example we introduced in the section on Scope, suppose we have

concurrent :: IO ()
concurrent = concurrently_ g4 g4

we might get output like this:

[g4 (Demo/Scope.hs:32:5) #1] start
[g4 (Demo/Scope.hs:32:5) #2] start
[g4 (Demo/Scope.hs:34:5) #1] middle
[g4 (Demo/Scope.hs:34:5) #2] middle
[g4 (Demo/Scope.hs:36:5) #1] end
[g4 (Demo/Scope.hs:36:5) #2] end

(where the scheduling between the two thread might be different, of course).

Scope is always thread local, but debuggable provides a way to explicitly inherit the scope of a parent thread in a child thread:

h1 :: IO ()
h1 = h2

h2 :: HasCallStack => IO ()
h2 = scoped h3

h3 :: HasCallStack => IO ()
h3 = scoped $ do
    tid <- myThreadId
    concurrently_
      (inheritScope tid >> g4)
      (inheritScope tid >> g4)

results in

[g4 (Demo/Scope.hs:34:5) #1, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] start
[g4 (Demo/Scope.hs:34:5) #2, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] start
[g4 (Demo/Scope.hs:36:5) #1, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] middle
[g4 (Demo/Scope.hs:36:5) #2, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] middle
[g4 (Demo/Scope.hs:38:5) #1, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] end
[g4 (Demo/Scope.hs:38:5) #2, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] end

Callbacks

Suppose we have some functions which take another function, a callback, as argument, and invoke that callback at some point:

f1 :: HasCallStack => (Int -> IO ()) -> IO ()
f1 k = f2 k

f2 :: HasCallStack => (Int -> IO ()) -> IO ()
f2 k = scoped $ k 1

Let’s use this example callback function:

g1 :: HasCallStack => Int -> IO ()
g1 n = g2 n

g2 :: HasCallStack => Int -> IO ()
g2 n = Scoped.putStrLn $ "n = " ++ show n ++ " at " ++ prettyCallStack callStack

and invoke f1 as follows:

withoutDebuggable :: HasCallStack => IO ()
withoutDebuggable = f1 g1

This outputs:

[g2 (Demo/Callback.hs:26:8) #1, f2 (Demo/Callback.hs:20:8) #1]
  n = 1 at CallStack (from HasCallStack):
    g2, called at Demo/Callback.hs:23:8 in ..
    g1, called at Demo/Callback.hs:29:24 in ..
    withoutDebuggable, called at Demo.hs:25:36 in ..

Confusingly, this callstack does not include any calls to f1 or f2. This happens because the call to k in f2 does not pass the current CallStack; instead we see the CallStack as it was when we defined g1.

For callbacks like this it is often useful to have two pieces of information: the CallStack that shows how the callback is actually invoked, and the CallSite where the callback was defined. Debug.Provenance.Callback provides a Callback abstraction that does exactly this. A Callback m a b is essentially a function a -> m b, modulo treatment of the CallStack. Let’s change f1 and f2 to take a CallBack instead:

h1 :: HasCallStack => Callback IO Int () -> IO ()
h1 k = h2 k

h2 :: HasCallStack => Callback IO Int () -> IO ()
h2 k = scoped $ invokeCallback k 1

If we now use this top-level function

useDebuggable :: HasCallStack => IO ()
useDebuggable = h1 (callback g1)

we get a much more useful CallStack:

[g2 (Demo/Callback.hs:26:8) #1, h2 (Demo/Callback.hs:39:8) #1]
  n = 1 at CallStack (from HasCallStack):
    g2, called at Demo/Callback.hs:23:8 in ..
    g1, called at Demo/Callback.hs:42:30 in ..
    callbackFn, called at src/Debug/Provenance/Callback.hs:57:48 in ..
    invoking callback defined at useDebuggable (Demo/Callback.hs:42:21), called at ..
    h2, called at Demo/Callback.hs:36:8 in ..
    h1, called at Demo/Callback.hs:42:17 in ..
    useDebuggable, called at Demo.hs:26:36 in ..

Alternative: profiling backtraces

in addition to HasCallStack-style backtraces, there may also be other types of backtraces available, depending on how we build and how we run the code (we discuss some of these in the context of exception handling in episode 29 of the Haskell Unfolder). The most important of these is probably the profiling (cost centre) backtrace.

We can request the “current” callstack with currentCallstack, and the callstack attached to an object (“where was this created”) using whoCreated. This allows us to make similar distinctions that we made in Callback, for example:

f1 :: (Int -> IO ()) -> IO ()
f1 k = do
    cs <- whoCreated k
    putStrLn $ "f1: invoking callback defined at " ++ show (cs)
    f2 k

f2 :: (Int -> IO ()) -> IO ()
f2 k = k 1

g1 :: Int -> IO ()
g1 n = g2 n

g2 :: Int -> IO ()
g2 n = do
    cs <- currentCallStack
    putStrLn $ "n = " ++ show n ++ " at " ++ show cs

This does require the code to be compiled with profiling enabled. The profiling callstacks are sometimes more useful than HasCallstack callstacks, and sometimes worse; for example, in

demo :: Maybe Int -> IO ()
demo Nothing  = f1 (\x -> g1 x)
demo (Just i) = f1 (\x -> g1 (x + i))

the function defined in the Just case will have a useful profiling callstack, but since the function defined in the Nothing case is entirely static (does not depend on any runtime info), its callstack is reported as

["MAIN.DONT_CARE (<built-in>)"]

It would be useful to extend debuggable with support for both types of backtraces in a future release.

Performance considerations

Adding permanent HasCallStack constraints to functions does come at a slight cost, since they correspond to additional arguments that must be passed at runtime. For most functions this is not a huge deal; personally, I consider some well-placed HasCallStack constraints part of designing with debugging in mind. That said, you will probably want to avoid adding HasCallStack constraints to functions that get called repeatedly in tight inner loops; similar considerations also apply to the use of the Callback abstraction.

Conclusions

Although debuggable is a small library, it offers some functionality that has proven quite useful in debugging applications, especially concurrent ones. We can probably extend it over time to cover more use cases; “design for debuggability” is an important principle, and is made easier with proper library support. Contributions and comments are welcome!

As a side note, the tracing infrastructure of debuggable can also be combined with the recover-rtti package, which implements some dark magic to recover runtime type information by looking at the heap; in particular, it offers

anythingToString :: forall a. a -> String

which can be used to print objects without having a Show a instance available (though this is not the only use of recover-rtti). The only reason that debuggable doesn’t provide explicit support for this is that the dependency footprint of recover-rtti is a bit larger.

Debugging your Haskell application with debuggable