TL;DR: GHC HEAD (but not GHC 7.8) will soon support OverloadedRecordFields, an extension to permit datatypes to reuse field labels and even turn them into lenses.
Introduction
The Haskell records system is a frequent source of frustration to Haskell programmers working on large projects. On the face of it, it is simple: a datatype such as
data Person = Person { id :: Int, name :: String }
gives rise to top-level projection functions:
id :: Person -> Int
name :: Person -> String
However, this means that using multiple records with the same field label leads to name clashes. Workarounds are possible, such as declaring one datatype per module (so the module prefix can be used to disambiguate) or adding a per-datatype prefix to all the field labels, but they are somewhat clumsy.
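As a rough illustration of the prefixing workaround, here is a minimal sketch in plain Haskell with no extensions. The prefixed field names (personId, personName, companyName, companyEmployees) and the Company type's shape are assumptions chosen for this example.

```haskell
-- Each field label carries a per-datatype prefix so that the generated
-- top-level projection functions do not clash.
data Person = Person { personId :: Int, personName :: String }

data Company = Company { companyName :: String, companyEmployees :: [Person] }

-- The caller has to pick the right prefixed projection by hand.
companyNames :: Company -> [String]
companyNames c = companyName c : map personName (companyEmployees c)
```

This compiles everywhere, but every use site has to spell out which datatype it means, which is the clumsiness referred to above.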
Over the summer before I started working for Well-Typed, I implemented
a new GHC extension, OverloadedRecordFields, that allows the same
name to be used for multiple fields. This is not a whole new records
system for Haskell, and does not include everything one might want
(such as anonymous records), but it is a small step forward in a
notorious quagmire. Proposals for better systems are welcome, but
while it is easy to propose a more powerful design in isolation,
integrating it with other extensions and syntax in GHC is another
matter!
Unfortunately, the extension will not be included in GHC 7.8: the design needs time to settle and the codebase changes need time to mature. However, it should land in HEAD soon after the 7.8 release is cut, so the adventurous are encouraged to build GHC and try it out. Feedback from users will let us polish the design before it is finally released in 7.10.
Record projection
The essential point of the Haskell records system is unchanged by this extension: record projections are still functions, except now they are polymorphic in the choice of datatype. Instead of
name :: Person -> String
we have
name :: (r { name :: t }) => r -> t
where r { name :: t } is a constraint meaning that the type r has
a field name of type t. The typechecker will automatically solve
such constraints, based on the datatypes in scope. This allows module
abstraction boundaries to be maintained just as at present. If a
module does not export a field, clients of that module cannot use the
OverloadedRecordFields machinery to access it anyway.
For example, the following code will be valid:
data Company = Company { name :: String, employees :: [Person] }
companyNames :: Company -> [String]
companyNames c = name c : map name (employees c)
Notice that name can be used at two different record types, and the
typechecker will figure out which type is meant, just like any other
polymorphic function. Similarly, functions can be polymorphic in the
record type:
nameDefined :: (r { name :: [a] }) => r -> Bool
nameDefined = not . null . name
Record update
The traditional Haskell record update syntax is very powerful: an expression like
e { id = 3, name = "Me" }
can update (and potentially change the types of) multiple fields. The
OverloadedRecordFields extension does not attempt to generalise this
syntax, so a record update expression will always refer to a single
datatype, and if the field names do not determine the type uniquely, a
type signature may be required. With the Person and Company types
as defined above, the previous expression is accepted (only Person
has an id field, so the type is determined) but the definition
f x = x { name = "Me" }
is ambiguous, so a type signature must be given to f (or a local
signature given on x or the update expression).
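To make the power of the traditional update syntax concrete, here is a small sketch in plain Haskell (no extensions) of an update that sets two fields at once and changes the record's type. The Tagged type and its label and payload fields are hypothetical names introduced only for this illustration.

```haskell
-- A record with a type parameter, so that an update can change its type.
data Tagged a = Tagged { label :: String, payload :: a }
  deriving (Eq, Show)

-- One update expression sets both fields. Because payload is the only
-- field mentioning the type parameter, the result may have a different
-- type (Tagged String) from the argument (Tagged a).
relabelAndShow :: Show a => Tagged a -> Tagged String
relabelAndShow t = t { label = "shown", payload = show (payload t) }
```

Here the field names belong to a single datatype, so no signature is needed to disambiguate the update itself.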
On the other hand, type-changing update of single fields is possible in some circumstances using lenses, discussed below.
Technical details
The constraint r { name :: t } introduced above is in fact syntactic
sugar for a pair of constraints
(Has r "name", t ~ FldTy r "name")
where the Has r n class constraint means that the type r has a
field named n, and the FldTy r n type family gives the type of the
field. They use type-level strings (with the DataKinds extension)
to name fields. The type family is used so that the field type is a
function of the data type and field name, which improves type
inference.
Unlike normal classes and type families, the instances of Has and
FldTy are generated automatically by the compiler, and cannot be
given by the user. Moreover, these instances are only in scope if the
corresponding record projection is in scope. For example, if the
name field of Person is in scope, the extension works as if the
following declarations had been given:
instance Has Person "name"
type instance FldTy Person "name" = String
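The desugaring can be imitated by hand in a GHC without the extension. The following is only a sketch under stated assumptions: the instances are written manually rather than generated by the compiler, the class method name getFld and the Proxy argument are hypothetical (the post does not show the class methods), and the underlying record fields carry prefixes so that the overloaded name and employees functions can be defined separately.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (Symbol)

-- The type of the field named n in the record type r.
type family FldTy (r :: Type) (n :: Symbol) :: Type

-- The record type r has a field named n (getFld is a hypothetical name).
class Has (r :: Type) (n :: Symbol) where
  getFld :: Proxy n -> r -> FldTy r n

data Person = Person { personId :: Int, personName :: String }
data Company = Company { companyName :: String, companyEmployees :: [Person] }

-- Hand-written stand-ins for the instances the compiler would generate.
type instance FldTy Person "name" = String
instance Has Person "name" where getFld _ = personName

type instance FldTy Company "name" = String
instance Has Company "name" where getFld _ = companyName

type instance FldTy Company "employees" = [Person]
instance Has Company "employees" where getFld _ = companyEmployees

-- Overloaded projections, with the r { name :: t } sugar written out.
name :: (Has r "name", t ~ FldTy r "name") => r -> t
name = getFld (Proxy :: Proxy "name")

employees :: (Has r "employees", t ~ FldTy r "employees") => r -> t
employees = getFld (Proxy :: Proxy "employees")

-- name is used at both Company and Person.
companyNames :: Company -> [String]
companyNames c = name c : map name (employees c)

-- Polymorphic in the record type, as long as its name field is a list.
nameDefined :: (Has r "name", FldTy r "name" ~ [a]) => r -> Bool
nameDefined = not . null . name
```

Because FldTy is a type family, knowing the record type and the field name is enough for the typechecker to work out the field type, which is why companyNames needs no annotations beyond its own signature.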
Lenses
An excellent way to deal with nested data structures, and to alleviate
the shortcomings of the Haskell record system, is to use one of the
many Haskell lens libraries, such as lens, data-lens, fclabels and so on.
These pair together a "getter" and a "setter" for a field (or more
generally, a substructure of a larger type), allowing them to be
composed as a single unit. Many of these libraries use Template
Haskell to generate lenses corresponding to record fields. A key
design question for OverloadedRecordFields was how to fit neatly
into the lens ecosystem.
The solution is to generalise the type of field functions still further. Instead of the desugared
name :: (Has r "name") => r -> FldTy r "name"
we in fact generate
name :: (Accessor p r "name") => p r (FldTy r "name")
where Accessor is a normal typeclass (no instances are generated
automatically). In particular, it has an instance
instance Has r n => Accessor (->) r n
so when p = (->), the new field type specialises to the previous
type. Thus a field can still be used as a (record-polymorphic)
function. Moreover, each lens library can give its own instance of
Accessor, allowing fields to be used directly as lenses of that
type. (Unfortunately this doesn't quite work for van Laarhoven
lenses, as in the lens library, because they do not have the shape
p r (FldTy r n). A wrapper datatype can be used to define a
combinator that converts fields into van Laarhoven lenses.)
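The idea of one field definition serving as both a function and a lens can be sketched by hand. Everything here is a simplified assumption rather than the extension's actual definitions: the method names getFld, setFld and accessor, the setter living in Has, and the toy Lens datatype standing in for a lens library's own type are all hypothetical, and the instances are written manually.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (Symbol)

type family FldTy (r :: Type) (n :: Symbol) :: Type

-- Simplified: this sketch keeps a getter and a (type-preserving) setter here.
class Has (r :: Type) (n :: Symbol) where
  getFld :: Proxy n -> r -> FldTy r n
  setFld :: Proxy n -> FldTy r n -> r -> r

-- A normal class: p chooses how the field is presented.
class Accessor (p :: Type -> Type -> Type) (r :: Type) (n :: Symbol) where
  accessor :: Proxy n -> p r (FldTy r n)

-- With p = (->), a field is just a projection function.
instance Has r n => Accessor (->) r n where
  accessor = getFld

-- A toy getter/setter lens, standing in for a lens library's type.
data Lens r a = Lens { view :: r -> a, set :: a -> r -> r }

-- The instance a lens library might supply for its own type.
instance Has r n => Accessor Lens r n where
  accessor p = Lens (getFld p) (setFld p)

data Person = Person { personId :: Int, personName :: String }
  deriving (Eq, Show)

type instance FldTy Person "name" = String
instance Has Person "name" where
  getFld _ = personName
  setFld _ v p = p { personName = v }

-- One definition, usable both as a function and as a Lens.
name :: Accessor p r "name" => p r (FldTy r "name")
name = accessor (Proxy :: Proxy "name")
```

With these definitions, name ada uses the (->) instance while view name ada and set name "Grace" ada use the Lens instance; the typechecker picks p from the context in which name appears.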
Concluding remarks
For more information, see the detailed discussion of the design and implementation on the GHC wiki. A prototype implementation that works with GHC 7.6.3 shows how the classes are defined and gives examples of the instances that will be generated by GHC. Keep an eye out for the extension to land in GHC HEAD later this year, and please try it out and give your feedback!
I'm very grateful to Simon Peyton Jones for acting as mentor for this
project and as my guide to the GHC codebase. Edward Kmett threw
several wrenches in the works, and the final design owes a great deal
to his careful scrutiny and helpful suggestions. Anthony Clayden also
gave many helpful comments on the design. My thanks go also to the
many denizens of the ghc-devs and glasgow-haskell-users mailing
lists who gave feedback at various stages in the development process.
This work was supported by the Google Summer of Code programme.