When profiling Haskell programs,
time spent in foreign functions (functions defined in C) does not show up on
normal time profiles, which can be problematic when debugging or optimizing
performance of code that makes heavy use of the foreign function interface (FFI). In this blog
post we present a new compiler plugin called trace-foreign-calls
, which makes
this time visible and available for analysis.
The trace-foreign-calls plugin as well as a simple analysis tool ghc-events-util are both available on GitHub.
Overview
Consider a C function
long slow_add(long a, long b) {
while(b--) {
++;
a}
return a;
}
with corresponding Haskell import
import capi unsafe "test_cbits.h slow_add"
foreign c_slowAddIO :: CLong -> CLong -> IO CLong
and an application that invokes
main :: IO ()
= do
main print =<< slowAddIO a b
print =<< slowAddIO b a
where
= 1_000_000_000
a = 2_000_000_000 b
When we compile the application with the trace-foreign-calls
plugin enabled,
run it, and then look at the generated .eventlog
using ghc-events-util
, we
will see:
607.16ms 607.16ms 1922573 cap 1 trace-foreign-calls: call c_slowAddIO (capi unsafe "test_cbits.h slow_add")
0.26ms 0.00ms 609077635 cap 1 trace-foreign-calls: return c_slowAddIO
302.02ms 302.02ms 609336093 cap 1 trace-foreign-calls: call c_slowAddIO (capi unsafe "test_cbits.h slow_add")
0.01ms 0.00ms 911353269 cap 1 trace-foreign-calls: return c_slowAddIO
The important column here is the first column, which for each event reports the
time from that event to the next; in this case, from the call to the foreign
call to its return. Perhaps ghc-events-util
could be given a mode that is
designed specifically for trace-foreign-events
to make this output a bit more
readable, but for now this general purpose output suffices.1
If we additionally compile with profiling enabled, we get an additional event for each foreign call, recording the cost-centre callstack:
0.00ms 0.00ms 4643217 cap 1 trace-foreign-calls: call c_slowAddIO (capi unsafe "test_cbits.h slow_add")
856.37ms 856.37ms 4643327 cap 1 heap prof sample 0, residency 1, cost centre stack:
slowAddIO in Example at Example.hs:29:1-78
main in Main at test/Main.hs:(8,1)-(19,21)
runMainIO1 in GHC.Internal.TopHandler at <no location info>
0.65ms 0.00ms 861018010 cap 1 trace-foreign-calls: return c_slowAddIO
0.00ms 0.00ms 861672464 cap 1 trace-foreign-calls: call c_slowAddIO (capi unsafe "test_cbits.h slow_add")
426.79ms 426.79ms 861672834 cap 1 heap prof sample 0, residency 1, cost centre stack:
slowAddIO in Example at Example.hs:29:1-78
main in Main at test/Main.hs:(8,1)-(19,21)
runMainIO1 in GHC.Internal.TopHandler at <no location info>
0.06ms 0.00ms 1288461103 cap 1 trace-foreign-calls: return c_slowAddIO
Note that we are abusing the “heap profile sample” event to record the cost-centre callstack to the foreign function (see “Conclusions and future work”, below).2
Dependencies
Suppose in example-pkg-A
we have
import capi "cbits.h xkcdRandomNumber"
foreign someFunInA :: IO CInt
and we use this function in example-pkg-B
main :: IO ()
= do
main <- someFunInA
randomNumber let bs = compress (BS.Char8.pack $ show randomNumber)
print $ BS.Word8.unpack bs
where compress
is from zlib. Although we are running main
from example-pkg-B
, in order to get information about someFunInA
we need to
enable the plugin when compiling example-pkg-A
; the
README.md describes how to enable the plugin for
all dependencies. Indeed, when we do this, we see calls to libz
as well:
0.00ms 0.00ms 414047 cap 0 trace-foreign-calls: call someFunInA (capi safe "cbits.h xkcdRandomNumber")
0.00ms 0.00ms 414607 cap 0 trace-foreign-calls: return someFunInA
(..)
0.00ms 0.00ms 493076 cap 0 trace-foreign-calls: call c_zlibVersion (capi unsafe "zlib.h zlibVersion")
0.00ms 0.00ms 493866 cap 0 trace-foreign-calls: return c_zlibVersion
Indeed, if we are willing to do a custom build of ghc
, we can even enable the
plugin on the boot libraries, which (amongst other things) makes the final
print
also visible:
0.00ms 0.00ms 609576 cap 0 trace-foreign-calls: call unsafe_fdReady (ccall unsafe "fdReady")
0.00ms 0.00ms 611846 cap 0 trace-foreign-calls: return unsafe_fdReady
0.01ms 0.00ms 612286 cap 0 trace-foreign-calls: call c_write (capi unsafe "HsBase.h write")
0.23ms 0.00ms 618236 cap 0 trace-foreign-calls: return c_write
Conclusions and future work
The trace-foreign-calls
compiler plugin can be used to generate eventlog
events for foreign function invocations, so that the time spent in foreign
functions becomes visible; the ghc-events-util
tool can be used to inspect
these eventlogs.
The plugin works by renaming each foreign function import of foo
to
foo_uninstrumented
, and then introducing a new wrapper function foo
which
emits some custom events to the eventlog before and after calling
foo_uninstrumented
. Since we want the plugin to work even on the GHC boot
libraries, the wrapper tries to use only functionality from GHC.Prim
, which
limits what we can do. One consequence is that because the plugin reuses “heap
profile sample” events to record the cost centre stacks for foreign functions,
it is not currently possible to record both regular heap profile samples (that
is, run the code with +RTS -p
) and enable the plugin at the same time.
A better solution would be to add support for profiling foreign functions to
ghc
itself. This would involve creating new eventlog event types, and then
upgrading existing time profiling tools to interpret these new events. Until
then, however, time profiling of foreign function invocations is now at least
possible.
The first three columns are the time from each event to the next visible event (some events might be filtered out), the time from each event to the next actual event, and the time of the event since the start of the program.↩︎
The
sample
field will always be 0; theresidency
field is used to record the capability. The latter allows us to correlate events of concurrent foreign function invocations; the--res-is-cap
command line option toghc-events-util
makes it understand this convention.↩︎