No untracked dependencies!
Years ago, back when Isaac Potoczny-Jones and others were defining the Cabal specification, the big idea was to make Haskell software portable to different environments. One of the mantras was “no untracked dependencies!”.
The problem at the time was that Haskell code had all kinds of implicit dependencies which meant that while it worked for you, it wouldn’t build for me. For example, I might not have some other module that it needed, or the right version of the module.
So of course that’s what the build-depends field in .cabal files is all about: requiring that the author of the code declare just what the code requires of its environment. The other important part is that the build system only lets your code see the dependencies you’ve declared, so that you don’t accidentally end up with these untracked dependencies.
This mantra of no untracked dependencies is still sound. If we look at a system like nix, part of what enables it to work so well is that it is absolutely fanatical about having no untracked dependencies.
Untracked dependencies?!
One weakness in the original Cabal specification is with Setup.hs scripts. These scripts are defined in the spec to be the entry point for the system. According to the Cabal spec, to build a package you’re required to compile the Setup.hs script and then use its command line interface to get things done. Because in the original spec the Setup.hs is the first entry point, it’s vital that it be possible to compile Setup.hs without any extra fuss (the runhaskell tool was invented just to make this possible, and to make it portable across compilers).
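To see what this entry point looks like in practice, here is the conventional trivial Setup.hs that the vast majority of packages use:

```haskell
-- The classic trivial Setup.hs: delegate everything to the
-- standard simple build behaviour from the Cabal library.
import Distribution.Simple (defaultMain)

main :: IO ()
main = defaultMain
```

Note that even this one-liner already imports the Cabal library itself, and more elaborate custom scripts pull in far more.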
But with Setup.hs as the primary entry point, it is impossible to reliably use external code in a Setup.hs script, because you cannot guarantee that that code is pre-installed. Going back to the “no untracked dependencies” mantra, we can see that all dependencies of Setup.hs scripts are in fact untracked!
This isn’t just a theoretical problem. Haskell users that do have complex Setup.hs scripts often run into versioning problems, or need external tools to help them get the prerequisite packages installed. As another example, Michael Snoyman noted earlier this year, in a diagnosis of an annoying packaging bug:

“As an aside, this points to another problematic aspect of our toolchain: there is no way to specify constraints on dependencies used in custom Setup.hs files. That’s actually caused more difficulty than it may sound like, but I’ll skip diving into it for now.”
The solution: track dependencies!
As I said, the mantra of no untracked dependencies is still sound; we just need to apply it more widely.
These days the Setup.hs is effectively no longer a human interface: it is now a machine interface used by other tools like cabal or by distros’ install scripts. So we no longer have to worry so much about Setup.hs scripts always compiling out of the box. It would be acceptable now to say that the first entry point for a tool interacting with a package is the .cabal file, which might list the dependencies of the Setup.hs. The tool would then have to ensure that those dependencies are available when compiling the Setup.hs.
So this is exactly what we have now done. Members of the Industrial Haskell Group have funded us to fix this long-standing problem, and we have recently merged the solution into the development versions of Cabal and cabal-install.
From a package author’s point of view, the solution looks like this: in your .cabal file you can now say:
build-type: Custom

custom-setup
  setup-depends: base >= 4.6,
                 directory >= 1.0,
                 Cabal >= 1.18 && < 1.22,
                 acme-setup-tools == 0.2.*
So it’s a new stanza, like libraries or executables, and like those you can specify the library dependencies of the Setup.hs script.

Now tools like cabal will compile the Setup.hs script with these and only these dependencies, just as they do normally for executables. So no more untracked dependencies in Setup.hs scripts. Newer cabal versions will warn about not using this new section. Older cabal versions will ignore the new section (albeit with a warning). So over time we hope to encourage all packages with custom setup scripts to switch over to this.
In addition, the Setup.hs script gets built with CPP version macros (MIN_VERSION_{pkgname}) available, so that the code can be made to work with a wider range of versions of its dependencies.
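For example, a Setup.hs can use these macros to adapt to whichever Cabal version it gets built against (a sketch; the macros are only defined when the script is compiled by a tool that implements setup-depends):

```haskell
{-# LANGUAGE CPP #-}
import Distribution.Simple (defaultMain)

main :: IO ()
main =
#if MIN_VERSION_Cabal(1,20,0)
  -- code path using the Cabal >= 1.20 API
  defaultMain
#else
  -- fallback path for older Cabal versions
  defaultMain
#endif
```

Here both branches happen to be the same; in a real script each branch would call whichever API exists in that Cabal version range.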
In the solver…
So on the surface this is all very simple and straightforward, a rather minor feature even. In fact it has been remarkably hard to implement fully, for reasons I’ll explain, but the good news is that it works, and the hard work has also gotten us solutions to a couple of other irksome problems.
Firstly, why isn’t it trivial? It’s inevitable that sooner or later you will find that your application depends on one package that has setup deps like Cabal == 1.18.* and another with setup deps like Cabal == 1.20.*.
At that point we have a problem. Classically we aim to produce a build plan that uses at most one version of each package. We do that because otherwise there’s a danger of type errors from using multiple versions of the same package. Here with setup dependencies there is no such danger: it’s perfectly possible for me to build one setup script with one version of the Cabal library and another script with a different Cabal version. Because these are executables and not libraries, the use of these dependencies does not “leak”, and so we would be safe to use different versions in different places.
So we have extended the cabal solver to allow for limited, controlled use of multiple versions of the same package. The constraint is that all the “normal” libraries and executables use the same single version, just as before, but setup scripts are allowed to introduce their own little world where independent choices about package versions are allowed. To keep things sane, the solver tries as far as possible not to use multiple versions unless it really has to.
If you’re interested in the details of the solver, see Edsko’s recent blog post.
Extra goodies
This work in the solver has some extra benefits.
Improve Cabal lib API without breaking everything
In places the Cabal library is a little crufty, and the API it exposes was never really designed as an API. It has been very hard to fix this because changes in the Cabal library interface break Setup.hs scripts, and there was no way for packages to insulate themselves from this.
So now that packages can declare proper dependencies for their custom Setup.hs, the flip side is that we have an opportunity to make breaking changes to the Cabal library API: to throw out the accumulated cruft, clean up the code base, and make a library API that’s not so painful to use in Setup.hs scripts.
Shim (or compat) packages for base
Another benefit is that the new solver is finally able to cope with having “base shim” packages, as we used in the base 3.x to 4.x transition. For two GHC releases, GHC came with both base-3.x and base-4.x. The base-4 was the “true” base, while the base-3 was a thin wrapper that re-exported most of base-4 (and syb), but with some changes to implement the old base-3 API. At the time we adapted cabal to cope with this situation of having two versions of a package in a single solution.
When the new solver was implemented, however, support for this situation was not added (and the old solver implementation was retained to work with GHC 6.12 and older).
This work for setup deps has made it relatively straightforward to add support for these base shims. So next time GHC needs to make a major bump to the version of base, we can use the same trick of a shim package. Indeed this might also be a good solution in other cases, perhaps cleaner than all these *-compat packages we’ve been accumulating.
It has also finally allowed us to retire the old solver implementation.
Package cycles involving test suites and benchmarks
Another feature that is now easy to implement (though not actually implemented yet) is dealing with the dependency cycles in packages’ test suites and benchmarks.
Think of a core package like bytestring, or a less central one like Johan’s cassava CSV library. These packages have benchmarks that use the excellent criterion library. But of course criterion is a complex beast and itself depends on bytestring, cassava and a couple dozen other packages.
This introduces an apparent cycle, and cabal will fail to find an install plan. I say apparent cycle because there isn’t really a cycle: it’s only the benchmark component that uses criterion, and nothing really depends on that.
Here’s another observation: when benchmarking a new bytestring or cassava, it does not matter one bit that criterion might be built against an older stable version of bytestring or cassava. Indeed it’s probably sensible that we use a stable version. It certainly involves less rebuilding: I don’t really want to rebuild criterion against each minor change in bytestring while I’m doing optimisation work.
So here’s the trick: we break the cycle by building criterion (or say QuickCheck or tasty) against another version of bytestring, typically some existing pre-installed one. So again this means that our install plan has two versions of bytestring in it: the one we mean to build, and the one we use as a dependency for criterion. And again this is OK, just as with setup dependencies, because dependencies of test suites and benchmarks do not “leak out” and cause diamond-dependency-style type errors.
One technical restriction is that the test suite or benchmark must not depend on the library within the same package, but must instead use the source files directly. Otherwise there would genuinely be a cycle.
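Concretely, such a benchmark stanza might look like this (a sketch with hypothetical file and directory names): it lists the library’s source directory directly rather than depending on the package’s own library:

```
benchmark bench
  type:           exitcode-stdio-1.0
  main-is:        Benchmarks.hs
  -- note: no build-depends on the package's own library,
  -- which would close the cycle; we compile against the
  -- library sources in src directly instead
  hs-source-dirs: benchmarks, src
  build-depends:  base, bytestring, criterion
```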
Now in general, when we have multiple components in a .cabal file we want them all to use the same versions of their dependencies. It would be deeply confusing if a library and an executable within the same package ended up using different versions of some dependency that might have different behaviour. Cabal has always enforced this, and we’re not relaxing it now. The rule is that if there are dependencies of a test suite or benchmark that are not shared with the library or executable components in the package, then we are free to pick different versions for those than we might pick elsewhere within the same solution.
As another example, one that has nothing to do with cycles: we might pick different versions of QuickCheck for different test suites in different packages (though only where necessary). This helps with the problem that one old package might require QuickCheck == 2.5.* while another requires QuickCheck == 2.8.*. But it would also be a boon if we ever went through another major QC-2 vs QC-3 style of transition: we would be able to have both QC-2 and QC-3 installed, and build each package’s test suite against the version it requires, rather than freaking out that they’re not the same version.
Private dependencies in general
Technically, this work opens the door to allowing private dependencies more generally. We’re not pursuing that at this stage, in part because it is not clear that it’s actually a good idea in general.
Mark Lentczner has pointed out the not-unreasonable fear that once you allow multiple versions of packages within the same solution it will in practice become impossible to re-establish the situation where there is just one version of each package, which is what distros want and what most people want in production systems.
So that’s something we should consider carefully as a community before opening those flood gates.