Rodrigo worked on an internship with the GHC team at Well-Typed this summer. In this post he reports on his progress improving GHC’s configuration tooling as a step towards better cross-compiler support.
GHC, like most high-level language compilers, depends upon a set of tools like assemblers, linkers, and archivers for the production of machine code. Collectively these tools are known as a toolchain and capture a great deal of platform-dependent knowledge.
Traditionally, developers generate a ./configure
script using the venerable
autoconf
tool,
then users execute this script
when they install a GHC binary distribution. The ./configure
script
determines the location of programs (such as the C compiler) and which options GHC
will need to pass them.
While this autoconf
-centric model of toolchain configuration has served GHC well,
it has two key issues:
For cross-compiling to a different platform, it would be highly valuable to users if GHC would become a runtime-retargetable compiler (like
rustc
andgo
). That is, the user should be able to download a single GHC binary distribution and use it to compile not only for their local machine, but also any other targets that GHC supports.The ten-thousand-line
sh
file that is GHC’s./configure
script has historically been challenging to maintain and test. Modifications to the./configure
script are among the most risky changes routinely made to the compiler, because it is easy to introduce a bug on some specific toolchain configuration, and infeasible to test all possible configurations in CI.
To address these issues, we are introducing ghc-toolchain
, a new way to
configure the toolchain for GHC, which will
eventually replace the existing toolchain configuration logic in the
./configure
script.
Its main goal is to allow new compilation toolchains to be configured for GHC at
any point in time, notably after the compiler has been installed.
For example, calling ghc-toolchain --triple=x86_64-w64-mingw32
will configure a
compilation toolchain on the host machine capable of producing code for
an x86_64 machine running Windows using MinGW.
This is an important step towards making GHC runtime-retargetable, and since
ghc-toolchain
is implemented in Haskell, it will be much easier to modify
and test than the ./configure
script.
In this post we explain in more detail how GHC interacts with the system toolchain and how
ghc-toolchain
facilitates our future goal of making GHC a runtime-retargetable
compiler.
Compiler Toolchains
GHC cannot produce executables from Haskell programs in isolation – it requires a correctly configured toolchain to which it can delegate some responsibilities. For example, GHC’s native code generator backend is capable of generating assembly code from a Haskell program, however, producing object code from that assembly, and linking the objects into an executable, are all tasks done by the compilation toolchain, which is invoked by GHC using the flags that were configured for it.
Configuring a compiler toolchain is about locating the set of tools required for compilation, the base set of flags required to invoke each tool, and properties of these tools. For example, this might include:
- determining various characteristics of the platform (e.g. the word size).
- probing to find which tools are available (the C compiler, linker, archiver, object merging tool, etc.),
- identifying which flags GHC should pass to each of these tools,
- determining whether the tools support response files to work around command-line length limits, and
- checking for and working around bugs in the toolchain.
At the moment, when a GHC binary distribution is installed, the ./configure
script will perform the above steps and store the results in a settings
file.
GHC will then read this file so it can correctly invoke the toolchain programs
when compiling Haskell executables.
To cross-compile a Haskell program, a user must build GHC from source as a cross-compiler (see the GHC wiki). This requires configuring a cross-compilation toolchain, that is, a toolchain that runs on the machine compiling the Haskell program but that produces executables to run on a different system. It is currently a rather involved process.
The runtime-retargetable future of GHC
A key long-term goal of this work is to allow GHC to become runtime-retargetable.
This means being able to call ghc --target=aarch64-apple-darwin
and have GHC
output code for an AArch64 machine, or call ghc --target=javascript-ghcjs
to generate Javascript code, regardless of the platform ghc
is being invoked on.
Crucially, this requires the configuration step to be repeated at a later point, rather than only when the GHC binary distribution is installed. Once GHC is fully runtime-retargetable, this will allow you to use multiple different toolchains, potentially targeting different platforms, with the same installed compiler.
At the simplest level, you might just have two different toolchains for your host platform (for example, a
gcc
-based toolchain and aclang
-based toolchain), or you might just configure a toolchain which uses the newmold
linker rather thanld.gold
.In a more complex scenario, you may have a normal compiler toolchain as well as several different cross-compiler toolchains. For example, a toolchain which produces Javascript, a toolchain which produces WebAssembly, a toolchain which produces AArch64 object code and so on.
The idea is that the brand new ghc-toolchain
will be called once to configure the toolchain
that GHC will use when compiling for a target, then ghc --target=<triple>
can
be called as many times as needed. For example, if you have an x86 Linux machine
and wish to produce code for AArch64 devices, the workflow could look something
like:
# Configure the aarch64-apple-darwin target first
# (We only need to do this once!)
ghc-toolchain --triple=aarch64-apple-darwin --cc-opt="-I/some/include/dir" --cc-linker-opt="-L/some/library/dir"
# Now we can target aarch64-apple-darwin (as many times as we'd like!)
ghc --target=aarch64-apple-darwin -o MyAwesomeTool MyAwesomeTool.hs
ghc --target=aarch64-apple-darwin -o CoolProgram CoolProgram.hs
Introducing ghc-toolchain
ghc-toolchain
is a standalone tool for configuring a toolchain.
It receives as input a target triplet
(e.g. x86_64-deb10-linux
) and user options, discovers the configuration, and
outputs a “target description” (.target
file) containing the configured
toolchain.
At the moment, .target
files generated by ghc-toolchain
can be used by GHC’s
build system (Hadrian) by invoking ./configure
with
--enable-ghc-toolchain
. Otherwise, Hadrian reads the configuration from a
.target
file generated by ./configure
itself.
In the future, ghc-toolchain
will be shipped in binary distributions to allow
new toolchains to be added after the compiler is installed (generating new
.target
files). GHC will then be able to choose the .target
file for the
particular target requested by the user.
From a developer standpoint, ghc-toolchain
being written in Haskell makes it
easier to modify in future, especially when compared to the notoriously
difficult to write and debug ./configure
scripts.
Migration to ghc-toolchain
We are migrating to ghc-toolchain
in a staged manner, since toolchain configuration
logic is amongst the most sensitive things to change in the compiler.
We want to ensure that the configuration logic in
ghc-toolchain
is correct and agrees with the logic in
./configure
. Therefore, in GHC 9.10 ghc-toolchain
will be shipped
and validated but not enabled by default.
To validate ghc-toolchain
, GHC will generate .target
files with both ./configure
and
ghc-toolchain
and compare the outputs against each other, emitting a warning if they differ.
This means we will be able to catch mistakes in
ghc-toolchain
(and in ./configure
too!) before we make ghc-toolchain
the default
method for configuring toolchains in a subsequent release. This mechanism has already identified
plenty of issues to resolve.
Future work
Despite ghc-toolchain
bringing us closer to a runtime-retargetable GHC, there
is still much work left to be done (see
#11470).
The next step is to instruct GHC
to choose between multiple available .target
files at runtime, instead of
reading the usual settings
file (tracked in #23682).
Beyond that, however, there are many open questions still to resolve:
- How will the runtime system, and core libraries such as
base
, be provided for the multiple selected targets? - How will this fit into
ghcup
’s installation story? - How will
cabal
handle multiple targets?
At the moment, binary distributions include the RTS/libraries already compiled for a single target. Instead, we are likely to need some mechanism for users to recompile the RTS/libraries when they configure a new target, or to download ready-built versions from upstream.
Moreover, accommodating TemplateHaskell
under runtime retargetability is particularly
nontrivial, and needs more design work.
Conclusion
ghc-toolchain
is a new tool for configuring toolchains and targets. It
improves on GHC’s existing ./configure
-based configuration workflow by allowing multiple targets’ toolchains
to be configured at any time, and by making maintenance and future updates to
the toolchain configuration logic much easier.
However, toolchain configuration is a challenging part of the compiler, so we’re
being conservative in migrating to ghc-toolchain
, and carefully validating it
before making it the default.
Moreover, ghc-toolchain
is an important step towards making a runtime-retargetable
GHC a reality, though there is still much work left to do.
We are grateful to all the GHC developers involved in working towards runtime-retargetability.
Well-Typed is able to work on GHC, HLS, Cabal and other core Haskell infrastructure thanks to funding from various sponsors. If your company might be able to contribute to this work, sponsor maintenance efforts, or fund the implementation of other features, please read about how you can help or get in touch.