Haskell programmers tend to spend far less time with debuggers than programmers in other languages. Partly this is because for pure code debuggers are of limited value anyway, and Haskell’s type system is so strong that a lot of bugs are caught at compile time rather than at runtime. Moreover, Haskell is a managed language – like Java, say – and errors are turned into exceptions. Unlike in unmanaged languages such as C “true” runtime errors such as segmentation faults almost never happen.
I say “almost” because they can happen: either because of bugs in
ghc
or the Haskell runtime, or because we are doing low level stuff in our
own Haskell code. When they do happen we have to drop down to a system debugger
such as lldb
or gdb
, but debugging Haskell at that level can be difficult
because Haskell’s execution model is so different from the execution model of
imperative languages. In particular, compiled Haskell code barely makes any use
of the system stack or function calls, and uses a continuation passing style
instead (see my previous blog posts Understanding the
Stack and Understanding the
RealWorld). In this blog post I will explain a
technique I sometimes use to help diagnose low-level problems.
Since I work on OSX I will be using lldb
as my debugger. if you are using
gdb
you can probably use similar techniques; The LLDB Debugger
shows how gdb
and lldb
commands correlate, and the ghc wiki also lists
some tips. However, I have no experience with scripting gdb
so your
mileage may vary.
Description of the problem
As our running example I will use a bug that I was tracking down in a client project. The details of the project don’t matter so much, except that this project happens to use the GHC API to compile Haskell code—at runtime—into bytecode and then run it; moreover, it also—dynamically, at runtime—loads C object files into memory.
In one example run it loads the (compiled) C code
#include <stdio.h>
void hello(void) {
("hello\n");
printf}
and then compiles and runs this Haskell code:
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where
import ccall "hello" hello :: IO ()
foreign
= hello main
Sadly, however, this resulted in total system crash.
Starting point
By attaching lldb
to the running process we got a tiny bit more information
about the crash:
* thread #4: tid = 0x3550aa, 0x000000010b3b8226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x000000010b3b8226
-> 0x10b3b8226: addb %al, (%rax)
0x10b3b8228: addb %al, (%rax)
0x10b3b822a: addb %al, (%rax)
0x10b3b822c: addb %al, (%rax)
It turns out we have a null-pointer dereferencing here. Anybody who has spent any time debugging Intel assembly code however will realize that this particular instruction
addb %al, (%rax)
is in fact the decoding of zero:
(lldb) memory read -c 8 0x10b3b8226
0x10b3b8226: 00 00 00 00 00 00 00 00 ........
In other words, chances are good we were never meant to execute this
instruction at all. Unfortunately, asking lldb
for a backtrace tells us
absolutely nothing new:
(lldb) bt
* thread #4: tid = 0x3550aa, 0x000000010b3b8226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
* frame #0: 0x000000010b3b8226
Finding a call chain
The lack of a suitable backtrace in lldb
is not surprising, since compiled
Haskell code barely makes use of the system stack. Instead, the runtime
maintains its own stack, and code is compiled into a continuation passing
style. For example, if we have the Haskell code
functionA :: IO ()
functionA = do .. ; functionB ; ..
functionB :: ()
functionB = do .. ; functionC ; ..
functionC :: IO ()
functionC = .. crash ..
main :: IO ()
main = functionA
and we step through the execution of this program in lldb
, and we ask for a
backtrace when we start executing function A all we get is
(lldb) bt
* thread #1: tid = 0x379731, 0x0000000100000a20 Main`A_functionA_info, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: 0x0000000100000a20 Main`A_functionA_info
with no mention of main
. Similarly, the backtraces on entry to functions B and C are
* thread #1: tid = 0x379731, 0x0000000100000b90 Main`B_functionB_info, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
* frame #0: 0x0000000100000b90 Main`B_functionB_info
and
* thread #1: tid = 0x379731, 0x0000000100000c88 Main`C_functionC_info, queue = 'com.apple.main-thread', stop reason = breakpoint 3.1
* frame #0: 0x0000000100000c88 Main`C_functionC_info
none of which is particularly informative. However, stepping manually through the program we do first see function A on the (singleton) call stack, then function B, and finally function C. Thus, by the time we reach function C, we have discovered a call chain A, B, C—it’s just that it involves quite a bit of manual work.
Scripting lldb
Fortunately, lldb
can be scripted (see Using Scripting and Python to Debug
in LLDB and the LLDB Python
Reference). What we want to do is keep stepping through
the code, showing the top-level (and quite possibly only) function at the top
of the call stack at each step, until we crash.
We can use the following Python script to do this:
import lldb
def step_func(debugger, command, result, internal_dict):
= debugger.GetSelectedTarget().GetProcess().GetSelectedThread()
thread
while True:
thread.StepOver()
= lldb.SBStream()
stream
thread.GetStatus(stream)= stream.GetData()
description print description
if thread.GetStopReason() == lldb.eStopReasonException:
break
def __lldb_init_module(debugger, dict):
'command script add -f %s.step_func sf' % __name__) debugger.HandleCommand(
For the above example, we might use this as follows: we load our application
into lldb
# lldb Main
Current executable set to 'Main' (x86_64).
register our new command sf
(lldb) command script import mystep.py
set a breakpoint where we want to start stepping
(lldb) breakpoint set -n A_functionA_info
Breakpoint 1: where = Main`A_functionA_info, address = 0x0000000100000b90
run to the breakpoint:
(lldb) run
Process 54082 launched: 'Main' (x86_64)
Process 54082 stopped
* thread #1: tid = 0x384510, 0x0000000100000b90 Main`A_functionA_info, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100000b90 Main`A_functionA_info
and then use sf
to find a call-trace until we crash:
(lldb) sf
...
* thread #1: tid = 0x384510, 0x0000000100000bf0 Main`B_functionB_info, queue = 'com.apple.main-thread', stop reason = instruction step over
frame #0: 0x0000000100000bf0 Main`B_functionB_info
Main`B_functionB_info:
...
* thread #1: tid = 0x384510, 0x0000000100000c78 Main`C_functionC_info, queue = 'com.apple.main-thread', stop reason = instruction step over
frame #0: 0x0000000100000c78 Main`C_functionC_info
Main`C_functionC_info:
...
* thread #1: tid = 0x384510, 0x0000000100000d20 Main`crash + 16 at crash.c:3, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x0000000100000d20 Main`crash + 16 at crash.c:3
Note that if you are using the threaded runtime, you may have to select which
thread you want to step through before calling sf
:
(lldb) thread select 4
(lldb) sf
Tweaking the script
You will probably want to tweak the above script in various ways. For instance,
in the application I was debugging, I wanted to step into each assembly
language instruction but over each C function call, mostly because lldb
was
getting confused with the call stack. I also added a maximum step count:
def step_func(debugger, command, result, internal_dict):
= shlex.split(command)
args
if len(args) > 0:
= int(args[0])
maxCount else:
= 100
maxCount
= debugger.GetSelectedTarget().GetProcess().GetSelectedThread()
thread = 0;
i
while True:
= thread.GetFrameAtIndex(0)
frame file = frame.GetLineEntry().GetFileSpec().GetFilename()
= type(file) is str and file.endswith(".c")
inC
if inC:
thread.StepOver()else:
False)
thread.StepInstruction(
= lldb.SBStream()
stream
thread.GetStatus(stream)= stream.GetData()
description
print i
print description
+= 1;
i if thread.GetStopReason() == lldb.eStopReasonException or i > maxCount:
break
You may want to tweak this step into/step over behaviour to suit your application; certainly you don’t want to have a call trace involving every step taken in the Haskell RTS or worse, in the libraries it depends on.
Back to the example
Rather than printing every step along the way, it may also be useful to simply remember the step-before-last and show that on a crash; often it is sufficient to know what happened just before the crash. Indeed, in the application I was debugging the call stack just before the crash was:
2583
thread #3: tid = 0x35e743, 0x00000001099da56d libHSrts_thr_debug-ghc7.8.3.20140729.dylib`schedule(initialCapability=0x0000000109a2f840, task=0x00007fadda404550) + 1533 at Schedule.c:470, stop reason = step over
frame #0: 0x00000001099da56d libHSrts_thr_debug-ghc7.8.3.20140729.dylib`schedule(initialCapability=0x0000000109a2f840, task=0x00007fadda404550) + 1533 at Schedule.c:470
467 }
468
469 case ThreadInterpret:
-> 470 cap = interpretBCO(cap);
471 ret = cap->r.rRet;
472 break;
473
2584
thread #3: tid = 0x35e743, 0x0000000103106226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x0000000103106226
-> 0x103106226: addb %al, (%rax)
0x103106228: addb %al, (%rax)
0x10310622a: addb %al, (%rax)
0x10310622c: addb %al, (%rax)
Which is a lot more helpful than the backtrace, as we now have a starting point: something went wrong when running the bytecode interpreter (remember that the application was compiling and running some Haskell code at runtime).
To pinpoint the problem further, we can set a breakpoint in interpretBCO
and
run sf
again (the way we defined sf
it steps over any C function calls by
default). This time we get to:
4272
thread #4: tid = 0x35f43a, 0x000000010e77c548 libHSrts_thr_debug-ghc7.8.3.20140729.dylib`interpretBCO(cap=0x000000010e7e7840) + 18584 at Interpreter.c:1463, stop reason = step over
frame #0: 0x000000010e77c548 libHSrts_thr_debug-ghc7.8.3.20140729.dylib`interpretBCO(cap=0x000000010e7e7840) + 18584 at Interpreter.c:1463
1460 tok = suspendThread(&cap->r, interruptible ? rtsTrue : rtsFalse);
1461
1462 // We already made a copy of the arguments above.
-> 1463 ffi_call(cif, fn, ret, argptrs);
1464
1465 // And restart the thread again, popping the stg_ret_p frame.
1466 cap = (Capability *)((void *)((unsigned char*)resumeThread(tok) - STG_FIELD_OFFSET(Capability,r)));
4273
thread #4: tid = 0x35f43a, 0x0000000107eba226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x0000000107eba226
-> 0x107eba226: addb %al, (%rax)
0x107eba228: addb %al, (%rax)
0x107eba22a: addb %al, (%rax)
0x107eba22c: addb %al, (%rax)
Ok, now we are really getting somewhere. Something is going wrong when are
doing a foreign function call. Let’s re-run the application once more, setting
a breakpoint at ffi_call
:
(lldb) breakpoint set -n ffi_call
Breakpoint 1: where = libffi.dylib`ffi_call + 29 at ffi64.c:421, address = 0x0000000108e098dd
(lldb) cont
Process 51476 resuming
Process 51476 stopped
* thread #4: tid = 0x360fd3, 0x0000000108e098dd libffi.dylib`ffi_call(cif=0x00007fb262f00000, fn=0x00000001024a4210, rvalue=0x000000010b786ac0, avalue=0x000000010b7868a0) + 29 at ffi64.c:421, stop reason = breakpoint 1.1
frame #0: 0x0000000108e098dd libffi.dylib`ffi_call(cif=0x00007fb262f00000, fn=0x00000001024a4210, rvalue=0x000000010b786ac0, avalue=0x000000010b7868a0) + 29 at ffi64.c:421
418 /* If the return value is a struct and we don't have a return value
419 address then we need to make one. Note the setting of flags to
420 VOID above in ffi_prep_cif_machdep. */
-> 421 ret_in_memory = (cif->rtype->type == FFI_TYPE_STRUCT
422 && (cif->flags & 0xff) == FFI_TYPE_VOID);
423 if (rvalue == NULL && ret_in_memory)
424 rvalue = alloca (cif->rtype->size);
and let’s take a look at the function we’re about to execute:
(lldb) disassemble -s fn
0x1024a4210: pushq %rbp
0x1024a4211: movq %rsp, %rbp
0x1024a4214: leaq (%rip), %rdi
0x1024a421b: popq %rbp
0x1024a421c: jmpq 0x1024a4221
0x1024a4221: pushq $0x6f6c6c65
0x1024a4226: addb %al, (%rax)
0x1024a4228: addb %al, (%rax)
0x1024a422a: addb %al, (%rax)
0x1024a422c: addb %al, (%rax)
0x1024a422e: addb %al, (%rax)
We were expecting to execute hello
:
# otool -tV hello.o
hello.o:
(__TEXT,__text) section
_hello:
0000000000000000 pushq %rbp
0000000000000001 movq %rsp, %rbp
0000000000000004 leaq L_str(%rip), %rdi ## literal pool for: "hello"
000000000000000b popq %rbp
000000000000000c jmpq _puts
and if you compare this with the code loaded into memory it all becomes clear. The jump instruction in the object file
jmpq _puts
contains a symbolic reference to puts
; but the jump in the code that we are
about to execute in fact jumps to the next instruction in memory:
0x1024a421c: jmpq 0x1024a4221
0x1024a4221: ...
In other words, the loaded object file has not been properly relocated, and
when we try to call puts
we end up jumping into nowhere. At this point the
bug was easily resolved.
Further Reading
We have barely scratched the surface here of what we can do with lldb or gdb.
In particular, ghc maintains quite a bit of runtime information that we can
inspect with the debugger. Tim Schröder has an excellent blog
post about inspecting ghc’s runtime data structures with gdb
, and
Nathan Howell has written some extensive lldb
scripts to do the
same, although they may now be somewhat outdated. See also the reddit
discussion about this blog post.