One of our clients sent us a bug report, which consisted of a 700 line Haskell program with a complaint that “it deadlocks”. After studying the code I concluded that, for the sake of the bug report, it could be summarized as
print a bunch of stuff, then crash with a stack overflow
So I wanted to replace the original code with code that did just that: print a bunch of stuff, then crash with a stack overflow. No big deal:
{-# LANGUAGE CPP #-}
module Main (main) where
go :: Int -> IO Int
= do
go n if (n `rem` THRESHOLD == 0)
then putChar '.'
else return ()
<- go (n + 1)
n' return (n + n')
main :: IO ()
= print =<< go 0 main
The function go
prints a dot every THRESHOLD
recursive calls; we have a
(dummy, since it can never be reached) addition after the recursive call to ensure
that go
is not tail recursive and will eventually run out of stack space.
The THRESHOLD
macro variable is there so that we can tweak how quickly the
program runs out of stack space, or in other words, how many dots it prints
before it crashes. For example, if we compile the code with
ghc -O1 PrintThenCrash.hs -fforce-recomp -DTHRESHOLD=23
it prints 22,126 dots before crashing; if we compile with
ghc -O1 PrintThenCrash.hs -fforce-recomp -DTHRESHOLD=26
it prints 19,483 dots before crashing. But as I was experimenting to find the right value for this parameter, I noticed something very strange. If we compile with
ghc -O1 PrintThenCrash.hs -fforce-recomp -DTHRESHOLD=24
it never crashes! It just prints, and prints, and prints (with GHC 7.6.3). This
blog post documents my attempt to understand this behaviour, and explains some
aspects of ghc
’s runtime, and in particular, its stack, as it goes.
Checking core
After type checking ghc
translates Haskell to an intermediate language called
core
, which is the “real life” version of the more
“academic” language System FC. We can ask ghc
to
output the (optimized) core
version of our program with
ghc -O1 PrintThenCrash.hs -fforce-recomp -DTHRESHOLD=24 -dsuppress-all -ddump-simpl
This is useful, because it allows to verify that even with a THRESHOLD
value
of 24 the optimizer did not somehow manage to make the function tail recursive:
$wa =
->
\ ww_s1SW w_s1SY case remInt# ww_s1SW 24 of _ {
->
__DEFAULT case $wa (+# ww_s1SW 1) w_s1SY of _ { (# ipv_aEw, ipv1_aEx #) ->
# ipv_aEw,
(case ipv1_aEx of _ { I# y_axO -> I# (+# ww_s1SW y_axO) } #)
};0 ->
case $wa5 stdout '.' w_s1SY of _ { (# ipv_aEw, _ #) ->
case $wa (+# ww_s1SW 1) ipv_aEw of _ { (# ipv2_XET, ipv3_XEV #) ->
# ipv2_XET,
(case ipv3_XEV of _ { I# y_axO -> I# (+# ww_s1SW y_axO) } #)
}
} }
This is a bit difficult to read; don’t worry about the details for now, we will
start dealing with low level details soon enough. For now, it’s enough to note
that $wa
is the translation of go
, and that the recursive calls to $wa
are not in tail position; the optimizer did not somehow manage to make the
function tail recursive. Ok. Then what? Why is this function not crashing with
a stack overflow?
Simplifying the problem
If we cannot understand the behaviour of the code by looking at core
then we
need to drop all the way down to assembly language. However, if we want to have
any hope of being able to step through the assembly language we need to
simplify that call to putChar
. Sadly, replacing it with
import ccall "putchar" c_putchar :: Char -> IO () foreign
made the problem go away: the program now always crashed. So the strange
behaviour of our program had something to do with the the implementation of
putChar
. The real putChar
is more involved that it might
seem; it deals with buffering, character encodings, concurrent access to
Handle
s, etc.
Unfortunately, since I had no idea what aspect of the implementation of
putChar
was causing the behaviour I was seeing, I could think of no other
approach than to inline putChar
and the functions it calls, and start
simplifying it bit by bit until the strange behaviour disappeared.
Many, many hours later I ended up with this code:
hPutChar :: State# RealWorld -> (# State# RealWorld, () #)
= case stdout of () -> (# w0, () #)
hPutChar w0
go :: State# RealWorld -> (# State# RealWorld, () #)
=
go w0 case maskAsyncExceptions# hPutChar w0 of
# w1, _ #) ->
(case go w1 of
# w2, () #) -> (# w2, () #)
(
stdout :: ()
= ()
stdout
main :: IO ()
= IO go main
In order to understand this code, you first have to realize that the IO monad
is a state monad with the RealWorld
as the state:
newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))
I have written the code with explicit state passing, instead of using the
monad, so that the Haskell code is as close as possible to the generated core
and beyond.
That out of the way, let’s discuss the actual code. First of all, the real
putChar
is defined as
putChar = hPutChar stdout
where stdout
has type Handle
. As it turns out, there are only two aspects of
hPutChar
that are relevant:
hPutChar
does a case analysis on something (actually, of course, it does case analyses on lots of things); as it turns out, all that matters for our example is that it does a case analysis at all, so we have modelledstdout
simply by()
.This case analysis happens while asynchronous exceptions have been masked. This happens because a
Handle
contains anMVar
, and most of the I/O operation happens while we have taken the value from thisMVar
(seewithHandle'
inGHC.IO.Handle.Internals
).
At this point we can simplify no further. Although we are not actually printing
any dots anymore, if we run this code it will run forever until you interrupt
it manually; but if you remove either the case analysis from hPutChar
or the
call to maskAsyncExceptions#
from go
the program will exit with a stack
overflow almost immediately. What gives?
Understanding hPutChar
When ghc
compiles your program, after lexical analysis and syntax analysis it
first removes syntactical sugar and does scope analysis; then, after type
checking, it translates your code to core
, as we saw. The core
gets
optimized and then translated to stg
(for “Spineless Tagless
G-machine”); stg
code is very similar to core
, but with a
handful of additional restrictions; most importantly, all constructor
applications must be fully applied (and if not, an explicit lambda must be
constructed). Finally, the stg
code gets translated to C--
, which is a
portable assembly language similar in intent to llvm
, and the C--
code then
is translated to assembly language.
We wrote our code in such a low level way that the core
and stg
translations are not very interesting; they all look very alike. The C--
code
however is a big step down in level. Recall that we simplified hPutChar
to
hPutChar :: State# RealWorld -> (# State# RealWorld, () #)
= case stdout of () -> (# w0, () #) hPutChar w0
The C--
code for this “hPutChar
” is (-ddump-opt-cmm
):
:
Main.hPutChar_info((Sp + -8) < SpLim) goto c1ae;
if = PicBaseReg + IO.stdout_closure;
R1 [Sp - 8] = PicBaseReg + s19G_info;
I64= Sp - 8;
Sp (R1 & 7 != 0) goto c1ah;
if [R1]; // [R1]
jump I64c1ae:
= PicBaseReg + Main.hPutChar_closure;
R1 (I64[BaseReg - 8]); // [R1]
jump c1ah: jump s19G_info; // [R1]
s19G_info:
= PicBaseReg + GHC.Tuple.()_closure + 1;
R1 = Sp + 8;
Sp (I64[Sp + 0]); // [R1] jump
This might look rather frightening, especially when you are used to Haskell, so let’s take it step by step. We will be pushing a value to the stack, so we first check if there is enough room on the stack to do so:
if ((Sp + -8) < SpLim) goto c1ae;
If not, we will call a function from the GHC runtime called __stg_gc_fun
,
which will extend the stack, if possible. We will come back to this in detail
later.
In order to do case analysis on stdout
we first have to make sure that it is
in weak head normal form (in this case, it will always be, but it might not
be in general). In order to do this, we need to call its definition; but before
we can do that, we need to do two more things.
First, in general stdout
might be a thunk, with free variables in the
closure, and hence it needs to know where it can find the values of those free
variables. The convention is therefore that when we call a closure, the address
of the closure can always be found in register R1
:
= PicBaseReg + IO.stdout_closure; R1
Secondly, when stdout
completes, it needs to know what to do next. It does
this by looking at the stack: when it finishes, it loads a “continuation
address” from the top of the stack and calls it. Hence we need to push this
address to the stack so that stdout
will be able to find it later:
[Sp - 8] = PicBaseReg + s19G_info;
I64= Sp - 8; Sp
At this point we could call stdout
, but actually we can do slightly better.
ghc
implements an optimization called pointer tagging. Since
addresses are always word aligned, the lower 3 bits (or, on 32-bit machines,
the lower 2 bits) of addresses are always 0. For datatype constructors ghc
uses these bits to encode, as part of the pointer itself, which constructor
of the datatype it is (non-zero value) or whether the thunk is not yet in weak
head normal form. So we can check by looking at the pointer if there is a point
calling stdout
at all:
(R1 & 7 != 0) goto c1ah;
if
c1ah: jump s19G_info; // [R1]
If the constructor is already in weak head normal form, we call the
continuation directly (note that s19G_info
is the same address that we pushed
to the stack as the continuation address for stdout
). Finally, if it turns out
that we do still need to evaluate the stdout
we call it:
[R1]; // [R1] jump I64
In the continuation we now know that stdout
is in weak head normal form;
technically speaking, we should now pattern match on it to find out which
constructor it is, but of course ()
only has a single constructor, so we can
skip that step. This means that all that is left to do is to call our
continuation with the result. We are returning
# w0, () #) (
In general, unboxed tuples are represented as a pair of pointers, either as two
consecutive memory locations or, ideally, in two registers. However, real world
tokens disappear from generated code, so all we have to return is ()
. By
convention the first few arguments are passed in registers; in this case, that
means that we need to load the address of ()
into register R1
before
calling the continuation:
= PicBaseReg + GHC.Tuple.()_closure + 1;
R1 = Sp + 8;
Sp (I64[Sp + 0]); // [R1] jump
The + 1
part is pointer tagging at work: ()
is already in weak head normal
form, and it is the first (and only) constructor of the ()
type.
Understanding go
At this point you might realize why I said at the start that it would be
completely unfeasible to step through the real hPutChar
; our simplified
version just does a case analysis (on something of unit value, no less) and
then returns unit, and it is already complicated enough! What about go
?
go :: State# RealWorld -> (# State# RealWorld, () #)
=
go w0 case maskAsyncExceptions# hPutChar w0 of
# w1, _ #) ->
(case go w1 of
# w2, () #) -> (# w2, () #) (
Thankfully, the code was carefully written to make the translation to lower
level code as simple as possible; the C--
translation of go
does not
introduce any more concepts that we already used in hPutChar
:
():
Main.go_infoc1aM:
((Sp + -8) < SpLim) goto c1aO;
if = PicBaseReg + Main.hPutChar_closure + 1;
R1 [Sp - 8] = PicBaseReg + s19O_info;
I64= Sp - 8;
Sp #; // [R1]
jump stg_maskAsyncExceptionsc1aO:
= PicBaseReg + Main.go_closure;
R1 (I64[BaseReg - 8]); // [R1]
jump
s19O_info:
[Sp + 0] = PicBaseReg + s19N_info;
I64; // []
jump Main.go_info
s19N_info:
[Sp + 0] = PicBaseReg + s19M_info;
I64(R1 & 7 != 0) goto c1aI;
if [R1]; // [R1]
jump I64c1aI:
; // [R1]
jump s19M_info
s19M_info:
= PicBaseReg + GHC.Tuple.()_closure + 1;
R1 = Sp + 8;
Sp (I64[Sp + 0]); // [R1] jump
The structure is very similar as before, except that we have two (actually, three) case statements.
First we do a stack overflow check again, as before.
We need to evaluate the scrutinee of the first case statement: the call to the primop
maskAsyncExceptions#
(the equivalent ofmask_
in Haskell-land), which expects its argument inR1
. In this case, the argument ishPutChar
, except that as before we use pointer tagging (in this case, to indicate that the function has been evaluated and that its arity is 1).The continuation after we are finished with the scrutinee is
s19O_info
; this is a second case statement, so we push a second continuation (s19N_info
) onto the stack and recursively callgo
.The second continuation—which is never reached—makes sure that
go
indeed returned unit, and then returns unit; if you look at thestg
translation ofgo
(-ddump-stg
) you will notice thatgo
is actually a triply nested case statement:
case maskAsyncExceptions# [Main.hPutChar w0_s19t] of _ {
#,#) ipv_s19x _ ->
(case Main.go ipv_s19x of _ {
#,#) ipv2_s19D ipv3_s19B ->
(case ipv3_s19B of _ { () -> (#,#) [ipv2_s19D GHC.Tuple.()]; };
}; };
Three case statements, hence three continuations (s19O_info
, s19N_info
,
s19M_info
).
Running the code
All this is pure theory, so far. Let’s confirm it by actually running our code.
We can load up the code into a debugger; make sure to compile the code with
-debug
to link it against the version of the Haskell RTS that has debugging
symbols. Since I am working on OSX Mavericks I will be using lldb
; gdb
will
work in a very similar way.
One of the abstractions that C--
offers over real assembly language is
virtual registers. When we translate C--
to assembly these are mapped
to machine registers by a register allocator in ghc
’s native code generator.
On my machine, Sp
and SpLim
are mapped to rbp
and r15
, and R1
is
mapped to rbx
.
Let’s load up our code, set a breakpoint in go
, ask it to print Sp
and
SpLim
whenever we hit a breakpoint, and then start the program (which, during
my debugging, I called “weird
”, which is why you will see many
references to “weird
” below).
# lldb weird
Current executable set to 'weird' (x86_64).
(lldb) breakpoint set -n Main_go_info
Breakpoint 1: where = weird`Main_go_info, address = 0x0000000100000f98
(lldb) target stop-hook add --one-liner "register read rbp r15"
Stop hook #1 added.
(lldb) run
Process 35298 launched: '/Users/dev/wt/weird/weird' (x86_64)
rbp = 0x0000000000000000
r15 = 0x0000000000000000
Process 35298 stopped
* thread #1: tid = 0xebc474, 0x0000000100000f98 weird`Main_go_info, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100000f98 weird`Main_go_info
weird`Main_go_info:
-> 0x100000f98: leaq -0x8(%rbp), %rax
0x100000f9c: cmpq %r15, %rax
0x100000f9f: jb 0x100000fc0 ; Main_go_info + 40
0x100000fa1: leaq 0xd8818(%rip), %rax ; Main_hPutChar_closure
rbp = 0x0000000100405370
r15 = 0x00000001004050c0
lldb
started the program, stopped at the breakpoint, and told us the value of
Sp
and SpLim
, as asked. It also shows us a disassembly of the code. The C--
code started with
((Sp + -8) < SpLim) goto c1aO; if
which, if you recall, was to check if we might run into a stack overflow. This translates into
leaq -0x8(%rbp), %rax
cmpq %r15, %rax
jb 0x100000fc0 ; Main_go_info + 40
in Intel assembly language. The details don’t matter very much, and are beyond
the scope of this blog post. If you haven’t seen Intel assembly language,
leaq
is “load effective address”, cmpq
is “compare”
and jb
“jump if below”. We will not explain such details any
further; hopefully you will be able to squint a bit and see that it resembles
the C--
rather closely.
Before we do anything else, let’s see what’s on the stack before we start:
(lldb) memory read -format A -count 3 $rbp
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
So, the stack frame immediately above us when we start is a catch frame.
Catch frames are pushed onto the stack when you use catch
, and are used by
the runtime to find which exception handler to run when an exception occurs.
The catch frame has two additional fields in its payload: the handler to run
(in this case, the default top-level handler), and a mask which indicates if
asynchronous exceptions have been blocked or not (in this case, they haven’t).
We can use the step
command from lldb
to start stepping through the
execution of the code. We check for stack overflow, find that there is none,
and then set things up for the first case statement in go
: we push a
continuation address to the stack, put the argument to maskAsyncExceptions#
in rbx
(the machine equivalent of R1), and then call maskAsyncExceptions
.
At this point the stack therefore looks like
0x100405368: 0x0000000100000f68 weird`s19O_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
and we can confirm that rbx
contains the tagged address of hPutChar
(lldb) image lookup --address $rbx
Address: weird[0x00000001000d97c1] (weird.__DATA.__data + 1)
Summary: Main_hPutChar_closure + 1
You can find the implementation of maskAsyncExceptions#
as
stg_maskAsyncExceptionszh
in Exception.cmm
in the rts/
directory of the
ghc
source (“zh
” is the z-encoding of “#
”). The
details are not so important, however. It masks async exceptions by setting a
flag in a register, pushes a frame onto the stack to unmask exceptions when we
are done, and then calls the function (whose address is in R1
). This means
that we will end up in hPutChar
, at which point the stack looks like
0x100405360: 0x000000010009e540 weird`stg_unmaskAsyncExceptionszh_ret_info
0x100405368: 0x0000000100000f68 weird`s19O_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
This continues on. hPutChar
executes, completes, and runs the contination,
which happens to be stg_unmaskAsyncExceptionszh_ret_info
; this unmasks
asynchronous exceptions, and continues with the continuation above it, which in
this case is the original continuation from go
. And after some more steps we
end up back in go for the next recursive call, and the whole process repeats.
However, the stack now looks like
0x100405368: 0x0000000100000f38 weird`s19N_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
0x100405388: 0x00000001000a46c0 weird`stg_stop_thread_info
Remember that go
is not tail recursive, so once this recursive call
completes, we still need to call the continuation from the previous invocation.
And this repeats; if we run until the breakpoint again (using cont
) and print
the stack, we will see
0x100405360: 0x0000000100000f38 weird`s19N_info
0x100405368: 0x0000000100000f38 weird`s19N_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
0x100405388: 0x00000001000a46c0 weird`stg_stop_thread_info
and so on. This is precisely the behaviour that we were expecting: since the function is not tail recursive, we are using up more and more stack space and should eventually run out, and crash with a stack overflow. So why don’t we?
Stack overflow
After precisely 84 recursive calls, the stack looks like
0x1004050d0: 0x0000000100000f38 weird`s19N_info
...
0x100405368: 0x0000000100000f38 weird`s19N_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
with 84 s19N_info
pointers in total. Moreover, the top of the stack (virtual register Sp
, real register rbp
) is now dangerously close to the stack limit (virtual register SpLim
, real register r15
):
(lldb) register read rbp r15
rbp = 0x00000001004050d0
r15 = 0x00000001004050c0
That is, we have 16 bytes left, or space for two addresses. This means that we
have just enough stack space to make it to the entry point for hPutChar
; we
get there in the same way as before, and the stack now looks like
0x1004050c0: 0x000000010009e540 weird`stg_unmaskAsyncExceptionszh_ret_info
0x1004050c8: 0x0000000100000f68 weird`s19O_info
0x1004050d0: 0x0000000100000f38 weird`s19N_info
...
0x100405368: 0x0000000100000f38 weird`s19N_info
0x100405370: 0x000000010009eb68 weird`stg_catch_frame_info
0x100405378: 0x0000000000000000
0x100405380: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
Note that we pushed two additional addresses onto the stack, which is therefore
now full. Remember that hPutChar
starts with a check for stack overflow:
((Sp + -8) < SpLim) goto c1ae;
if
c1ae:
= PicBaseReg + Main.hPutChar_closure;
R1 (I64[BaseReg - 8]); // [R1] jump
Actually, if we look at the unoptimized C--
minus instead (-ddump-cmm
) we
will see
c1ae:
= Main.hPutChar_closure;
R1 ; // [R1] jump stg_gc_fun
instead, which is a lot clearer: we load the address of hPutChar
into
register R1
and then run the garbage collector. In the optimized C--
code we
find the address of the garbage collector somewhere else, but we can verify in
lldb
that it’s the same thing:
(lldb) memory read -format A -count 1 $r13-8
0x1000e1290: 0x000000010009fcb0 weird`__stg_gc_fun
Either way, the garbage collector runs. It notices that we are out of stack
space (as opposed to out of heap space; we carefully avoided any heap
allocation in this test code), creates a new, bigger, stack, copies the old
stack over, and then calls back into our function. In other words, we end
up back at the start of hPutChar
, but now the stack looks like:
0x1004fcd30: 0x000000010009e540 weird`stg_unmaskAsyncExceptionszh_ret_info
0x1004fcd38: 0x0000000100000f68 weird`s19O_info
0x1004fcd40: 0x0000000100000f38 weird`s19N_info
...
0x1004fcfd8: 0x0000000100000f38 weird`s19N_info
0x1004fcfe0: 0x000000010009eb68 weird`stg_catch_frame_info
0x1004fcfe8: 0x0000000000000000
0x1004fcff0: 0x00000001000dbf72 base_GHCziTopHandler_runIO2_closure + 2
which is the same stack as before, except at a different location; and, crucially, we have space on the stack again:
(lldb) register read rbp r15
rbp = 0x00000001004fcd30
r15 = 0x00000001004f50c0
so we can continue recursing.
Running out of stack space
Of course, we cannot continue increasing the stack forever; eventually we
should reach the maximum stack size (8 MB by default in ghc
7.6). So why
don’t we? Well, now that we understand the problem at this level of detail a
quick Google search will reveal that this is due to a “bug” in
ghc
(depending on your definition of bug, I suppose) prior to version
7.8: when a stack overflow happens while asynchronous exceptions are blocked,
the stack will be grown no matter what the limitation on the stack size is.
In this example, it just so happens that we run out of stack space when we try
to push the continuation address in hPutChar
, at which point asynchronous
exceptions are indeed masked. Hence, this program will continue growing the
stack unpunished, until we run out of machine memory completely. If we remove
the case statement from hPutChar
(or indeed, if we modify the program in a
myriad of other, minor, ways) we would detect the stack overflow outside of the
maskAsyncExceptions#
and hence we would crash with a stack overflow exception
when we reach the maximum stack size.
In the original program that we started with, the value of THRESHOLD
determines how many continuation addresses for go
are on the stack when we
call (the real) putChar
. We run out of stack space either while
asynchronous exceptions were masked (somewhere deep inside the bowels of the
real hPutChar
), or outside it, and hence we would run forever or crash almost
immediately, respectively. This is clearly not what one would expect, and with
GHC 7.8 the problem has been resolved – a stack overflow is now treated like
any other asynchronous exception, and the program will crash with a stack
overflow as soon as asynchronous exceptions are no longer masked.
Further Reading
If you want to understand ghc’s runtime execution model How to make a fast
curry: push/enter vs eval/apply is essential reading, although you
can ignore the comparison with push/enter, which is not used in ghc. Faster
Laziness Using Dynamic Pointer Tagging explains the pointer tagging trick we
discussed above. Apart from these two papers, there are various other
references that might be helpful. Be aware however that ghc
is constantly
evolving and many of these references may no longer precisely match what is in
ghc
right now.
System F with Type Equality Coercions describes System FC, the idealized version of
core
; Chapter 25 (Profiling and Optimization) of Real World Haskell discusses readingcore
and using it for optimization purposes.I know kung fu: learning STG by example on the
ghc
wiki describing the compilation from Haskell down to assembly language in quite some detail.Edward Z. Yang has a number of blog posts that discuss the execution of Haskell code at a low level of detail; they are all worth reading: Unraveling the mystery of the IO monad, Tracing the compilation of Hello Factorial!, and Interrupting GHC
Blog post Thunks and lazy blackholes: an introduction to GHC at runtime by Keegan McAllister discusses more about the heap, which we did not cover at all here.
For
C--
it is unfortunately hard to find definitive references;ghc
’s version ofC--
(cmm
) differs significantly from the official standard. The description on the ghc wiki is probably a good starting point; the comment at the top ofCmmParse.y
in theghc
sources is also very helpful. The paperC--
: a portable assembly language that supports garbage collection (Abstract) describes some of the underlying motivations but is somewhat outdated. Hoopl: A Modular, Reusable Library for Dataflow Analysis and Transformation describes a library that is used to optimizeC--
.Runtime Support for Multicore Haskell describes how Haskell runs on multiple cores; Section 4 contains some information about the GHC runtime, and in particular, how it relates to foreign calls (calls into the C world). Extending the Haskell Foreign Function Interface with Concurrency is an older paper that describes this in some detail, too.
Debugging GHC-compiled code with gdb on the
ghc
wiki contains some useful information about usinggdb
on compiled Haskell code; GDB and LLDB Command Examples is a nice a reference forlldb
.Finally, David Terei has a presentation about
ghc
where he gives a bird’s eye view of the entire thing, and his bachelor’s thesis Low Level Virtual Machine for Glasgow Haskell Compiler talks about thellvm
backend forghc
.
Postscript
One of the many variations that I played with while tracking this bug down looked like
module Main (main) where
import Control.Exception
import Data.IORef
import System.IO.Unsafe (unsafePerformIO)
counter :: IORef Int
{-# NOINLINE counter #-}
= unsafePerformIO $ newIORef 0
counter
hPutChar :: IO ()
= mask_ $ do
hPutChar <- readIORef counter
n + 1)
writeIORef counter (n
go :: Int -> IO Int
=
go n case n `rem` 2 of
0 -> do hPutChar
+ 1)
go (n -> do n' <- go (n + 1)
_ return (n + n')
main :: IO ()
= do _ <- go 0 ; return () main
When you run this in 7.8 this program will crash with
Weird: internal error: scavenge_stack: weird activation record found on stack: 415597384
(GHC version 7.8.2 for x86_64_apple_darwin)
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
Abort trap: 6
This turns out to be due to an unrelated bug in 7.8.2; see #9045 (and #8866). It has already been fixed and should be released as part of 7.8.3.