This is the fifth and final post of a series examining GHC’s support for DWARF debug information and the tooling that this support enables:
- Part 1 introduces DWARF debugging information and explains how its generation can be enabled in GHC.
- Part 2 looks at a DWARF-enabled program in gdb and examines some of the limitations of this style of debug information.
- Part 3 looks at the backtrace support of GHC’s runtime system and how it can be used from Haskell.
- Part 4 examines how the Linux perf utility can be used on GHC-compiled programs.
- Part 5 concludes the series by describing future work, related projects, and ways in which you can help.
Future work
In the previous four posts we saw some of the functionality enabled by DWARF debug information. As of GHC 8.10.2, everything shown in those posts should be possible with the standard DWARF-enabled GHC binary distributions.
However, there is still a great deal of untapped potential and much remains to be done. Here is a sampling of tasks in no particular order:
- Merge the fruits of my latest push on DWARF support upstream (!2380, !2373, !2387)
- Make GHC-generated symbols (e.g. 59fw_info) more reflective of their origin in the source program
- Preserve call-stacks in exceptions (as discussed in part 3)
- Reduce the size of debug information through more concise representation (see #17609)
- Some RTS symbols (e.g. stg_PAP_apply) don’t have accurate unwind information, leading to truncated backtraces in some cases (#17627)
- Implement a native (i.e. non-DWARF-based) stack unwinder in the GHC runtime system, allowing improved unwind performance in Haskell code
- Windows PDB support (#12397)
- Try moving GHC’s stack pointer to the native stack pointer register, enabling call-graph profiling via DWARF unwinding (as discussed in part 4, #8272)
- Build statistical profiling support into the GHC runtime system (#10915)
- Add support for expressing local variables in C--, enabling allocation profiling
- Add support for tracking register value semantics in STG-to-C-- and DWARF type information, enabling local variable introspection
- Implement thread support in GHC.ExecutionStack
- Make better use of GHC-specific source-note information (mentioned briefly in part 1)
- Symbol demangling support in the GHC RTS, perf, and gdb
- Analysis tools
As always, we are looking for people to help with this effort. If any of the above tasks sound enticing to you, do let us know. Deep compiler experience is not required for many of these tasks, especially those in the area of analysis tools.
Below I will describe in greater detail a few of the tasks which I think hold the greatest potential.
Profile analysis tools
In his thesis, Peter Wortmann shows that the one-to-one correspondence
between instructions and line numbers required by DWARF (see part 1)
can result in rather unhelpful profiles. He shows that one can do
significantly better by splitting the attribution of an instruction
across the full set of source locations that gave rise to it. This is
not something that existing tools can do. One could implement this
approach on top of the sample data produced by perf record (e.g. by
exporting the samples via the perf script tool or the linux-perf
Haskell library), using the extended DWARF annotations produced by GHC.
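As a rough illustration of the idea of split attribution, the sketch below divides each instruction's sample count evenly among the source locations that contributed to it and sums the result per location. The even split, the `SrcLoc` and `Addr` representations, and the `attribute` function are all assumptions made for this example; they are not taken from Peter's thesis or from any existing tool, and a real implementation would obtain both maps from perf samples and GHC's DWARF annotations.

```haskell
import qualified Data.Map.Strict as Map

-- | A source location such as "Foo.hs:12" (hypothetical representation).
type SrcLoc = String

-- | An instruction address as reported by a sampling profiler.
type Addr = Int

-- | Split each instruction's samples evenly across every source location
-- that gave rise to it, then sum the shares per source location.
-- Samples at addresses with no known origin are dropped.
attribute
  :: Map.Map Addr [SrcLoc]  -- ^ address -> contributing source locations
  -> Map.Map Addr Int       -- ^ address -> number of samples observed
  -> Map.Map SrcLoc Double  -- ^ source location -> attributed samples
attribute origins samples =
  Map.fromListWith (+)
    [ (loc, fromIntegral n / fromIntegral (length locs))
    | (addr, n) <- Map.toList samples
    , Just locs <- [Map.lookup addr origins]
    , not (null locs)
    , loc <- locs
    ]
```

For instance, an instruction with four samples that originates from two source locations would contribute two samples to each, rather than four samples to whichever single line DWARF's line table happens to name.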
Peter’s Haskell Implementors’ Workshop demonstration showed one possible interface for such an analysis tool, marrying Haskell source and Core with sample data in the ThreadScope interface. It would be great to continue exploration down this path.
Using native stack pointer register
As noted in part 4, GHC’s current execution model on x86 precludes
use of perf record’s call-graph profiling functionality.
The most promising avenue to fix this would be to rework GHC to use the
native stack pointer register to track the Haskell stack (#8272). This
would potentially carry a few benefits:
- it would enable use of native profiling tools
- the native code generator could use the PUSH and POP instructions, which may be more concise or better optimised in the microarchitecture than our current stack manipulation strategy
However, there are also a few tricky points:
- LLVM makes very strong assumptions about the nature of the stack; consequently, moving the LLVM backend to this scheme may be non-trivial.
- the System V ABI (on x86-64) reserves a small region just beyond the stack pointer (the “red zone”, 128 bytes at addresses below the stack pointer) which code can use for temporary storage. GHC would need to ensure that this region is available before calling into foreign code.
There is some interesting discussion surrounding this idea in #8272 and GHC Proposal MR #17.
Building sampling profiling into the GHC runtime
Without fixing the stack register issue described above, perf’s
call-graph profiling functionality is unusable. However, nothing is
stopping GHC from providing its own sampling infrastructure in the
runtime (#10915). In 2016 I started a branch doing exactly this using
perf_events’s signal-based sampling interface, dumping samples to
GHC’s eventlog.
As far as I can recall, the wip/libdw-prof branch can readily collect
samples; the work that remains primarily revolves around developing
analysis tools.
One approach would be to build a tool to convert the GHC-eventlog-based
output from the wip/libdw-prof branch into a perf.data file for use
with perf report. However, one could no doubt do much better with a
more specialised tool, as described in the “Profile analysis tools”
section above.
While simple, this signal-based approach does imply slightly more
overhead (in the form of context switches) than necessary. A more
efficient approach might involve the Linux eBPF mechanism, which can be
triggered from a perf_events event.
In-scope bindings
Most imperative compilers produce debug information that allows debuggers to display and modify in-scope variables and their values. In principle GHC could also provide such support. However, doing so in a way that will be useful in simplified programs would be quite non-trivial. For instance, consider the program:
f :: (Int, String) -> Int
f (x, _) = x + 4
GHC’s worker-wrapper transformation would likely transform this into:
f :: (Int, String) -> Int
f pair =
  case pair of (x, _) ->
    case x of I# x# ->
      case $wf x# of result ->
        I# result
$wf :: Int# -> Int#
$wf x# = x# +# 4#
This sort of transformation is ubiquitous and critical to the quality of
GHC’s produced code. Naturally, we would want to ensure that the debug
information of $wf can represent the fact that x# is the unboxed
first element of the argument of f. I suspect that the best way to
accomplish this would be to propagate value provenance information
through binders’ (e.g. in this case x#) IdInfo metadata.
This would involve:
- Adding syntax in C-- to encode local variable information
- Producing such syntax in the STG-to-C-- code generator
- Adding information in Core to propagate value provenance, as discussed above
- Populating this information in worker-wrapper
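To make the notion of value provenance a little more concrete, here is a minimal sketch of what such information might look like for the x# binder in the example above. Everything here is hypothetical: the `Provenance` type, its constructors, and the `describe` function are invented for illustration and do not correspond to anything in GHC's IdInfo today.

```haskell
-- | Hypothetical description of where a binder's value came from in
-- terms of the source program.
data Provenance
  = Argument String Int   -- ^ n-th argument (0-based) of a source function
  | Field Int Provenance  -- ^ n-th field (0-based) of a constructor value
  | Unboxed Provenance    -- ^ unboxed payload of a boxed value
  deriving (Eq, Show)

-- | Render a provenance chain as text a debugger might show to the user.
describe :: Provenance -> String
describe (Argument fn n) = "argument " ++ show n ++ " of " ++ fn
describe (Field n p)     = "field " ++ show n ++ " of " ++ describe p
describe (Unboxed p)     = "unboxed " ++ describe p

-- | Provenance of x# in the worker: the unboxed first component of the
-- pair passed as the first argument of f.
xHashProvenance :: Provenance
xHashProvenance = Unboxed (Field 0 (Argument "f" 0))
```

Under these assumptions, worker-wrapper would be responsible for recording something like `xHashProvenance` when it creates the worker's binder, and later passes would need to preserve it all the way down to the emitted DWARF.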
While being able to poke around at Haskell values in gdb is perhaps a
tempting proposition, all-in-all I suspect that the costs (both in
implementation time and complexity) of this feature would likely
outweigh the benefits it would bring. This is especially true given
that GHC already has the GHCi debugger for cases where such interactive
debugging is necessary.
Aside: Event tracing
Some users have related to me that they have sometimes wished that GHC
programs were as “traceable” as those written in other programming
languages. In particular, tools like perf, bcc, bpftrace, and dtrace
provide robust, minimal-overhead, language-agnostic tracing
infrastructure which can be invaluable in production settings. It would
be great if Haskell programs could benefit from these same tools.
The easiest on-ramp to tracing support is via the User-space Statically-Defined Tracepoint (USDT) mechanism supported by all of the aforementioned tools. Under this scheme, the traced program embeds a bit of metadata describing the available tracepoints, the information they provide, and how they are enabled.
It turns out that GHC’s runtime system already defines a number of USDT
tracepoints (although they need to be enabled when configuring GHC with
the --enable-dtrace configure flag). However, it is possible that
this support may have bit-rotted (#15543).
It may also be useful to be able to define USDT tracepoints in Haskell programs. A simple implementation would be a Template Haskell splice which generates the necessary C stubs and splices a foreign function import and call into the program.
Aside: LLVM and XRay
It should also be noted that LLVM provides another, much different approach to the tracing/profiling problem with its XRay instrumentation infrastructure. This approach seeks to introduce low-cost tracing instrumentation in generated code, allowing precise and highly detailed accounting of runtime costs.
Matthew Pickering tried (#15929) adding XRay support to GHC’s LLVM
backend. Unfortunately, this effort ended up being rather stunted, in
part due to limitations of LLVM itself (specifically difficulties with
tail-calls) and in part due to limitations of GHC’s LLVM backend
(namely, we rely on the LLVM IR alias mechanism to convince LLVM that
our type annotations are correct; this confuses the XRay logic).
Acknowledgments
This work has been a multi-year (off-and-on) effort for me, but it would not have been possible without a number of others.
In particular, this work would never have even started without the efforts of Peter Wortmann. Not only does the causality formalism he described in his dissertation provide the theoretical foundation for all of this functionality, but his initial implementation kick-started the effort and the promising results he demonstrated at the Haskell Implementors’ Workshop provided me with the motivation to keep picking away at the seemingly endless stream of details which arose as I refined the feature over the years.
In general, Well-Typed’s work on GHC (and, therefore, my own work) would not have been possible without the support of Microsoft Research, IOHK, and others who have backed the position which has allowed me to work on GHC for many years. In addition, some of my early work in 2015 to clean up the original DWARF implementation was supported directly by funding from Microsoft Research.