The large-anon
library provides support for anonymous records; that is,
records that do not have to be declared up-front. For example, used in
conjunction with the record-dot-preprocessor plugin, it makes it
possible to write code such as this:
magenta :: Record [ "red" := Double, "green" := Double, "blue" := Double ]
magenta = ANON { red = 1, green = 0, blue = 1 }
reduceRed :: RowHasField "red" r Double => Record r -> Record r
reduceRed c = c{red = c.red * 0.9}
The type signatures are not necessary; type inference works as expected for these records. If you prefer to use lenses1, that is also possible:
reduceBlue :: RowHasField "blue" r Double => Record r -> Record r
reduceBlue = over #blue (* 0.9)
The library offers a small but very expressive API, and it scales to large
records (with 100 fields and beyond), with excellent compilation time
performance and good runtime performance. In this blog post we will first
present the library from a user’s perspective, then give an overview of the
internals with an aim to better understand the
library’s runtime characteristics, and finally show some
benchmarks. The library is available from
Hackage and is currently compatible with ghc
8.8, 8.10
and 9.0 (extending this to 9.2 should not be too hard).
If you want to follow along, the full source code for all the examples in this
blog post can be found in Test.Sanity.BlogPost
in the
large-anon
test suite.
The simple interface
The library offers two interfaces, “simple” and “advanced”. We will present the simple interface first, then explore the advanced interface below.
The simple interface can be summarized as follows:
module Data.Record.Anon.Simple where

data Record (r :: Row Type)
  deriving (Eq, Ord, Show, Large.Generic, ToJSON, FromJSON)

data Pair a b = a := b
type Row k = [Pair Symbol k]

instance (RowHasField n r a, ..) => HasField n (Record r) a

empty   :: Record '[]
insert  :: Field n -> a -> Record r -> Record (n := a : r)
get     :: RowHasField n r a => Field n -> Record r -> a
set     :: RowHasField n r a => Field n -> a -> Record r -> Record r
project :: SubRow r r' => Record r -> Record r'
inject  :: SubRow r r' => Record r' -> Record r -> Record r
merge   :: Record r -> Record r' -> Record (Merge r r')
Large.Generic comes from the large-generics package.
In the remainder of this section we will introduce this API by means of
examples. When there is a possibility of confusion, we will use the prefix S.
to refer to the simple interface (and A. for the advanced interface).
Record construction and field access
In the introduction we used some syntactic sugar: the ANON record constructor
makes it possible to use regular record syntax for anonymous records. This
syntax is available as soon as you use the large-anon plugin. ANON desugars
to calls to empty and insert; it does not depend on any kind of internal or
unsafe API, and there is no need to use it if you prefer not to (though see
Applying pending changes):
purple :: Record [ "red" := Double, "green" := Double, "blue" := Double ]
purple =
      S.insert #red   0.5
    $ S.insert #green 0
    $ S.insert #blue  0.5
    $ S.empty
Similarly, the example in the introduction used RecordDotSyntax as provided
by record-dot-preprocessor, but we can also use get and set:
reduceGreen :: RowHasField "green" r Double => Record r -> Record r
reduceGreen c = S.set #green (S.get #green c * 0.9) c
Constraints
The summary of the simple interface showed that Record has a Show instance.
Let’s take a closer look at its precise signature:
instance (KnownFields r, AllFields r Show) => Show (Record r)
The KnownFields constraint says that the field names of r must be known,
and the AllFields r Show constraint says that all fields of r must in turn
satisfy Show; the Show instance uses this to output records like this:
> magenta
ANON {red = 1.0, green = 0.0, blue = 1.0}
In fact, Show for Record simply uses gshow from large-generics.
The order of the fields is preserved in the output: large-anon
regards records with rows that differ only in their order as different types;
isomorphic, but different. The project function can be used to translate
between records with different field order; we shall see an example
when we discuss sequenceA.
The RowHasField, KnownFields, AllFields and SubRow constraints (for project)
are solved by the large-anon typechecker plugin, so you will need to add
{-# OPTIONS_GHC -fplugin=Data.Record.Anon.Plugin #-}
at the top of your Haskell file. We will see later how to manually prove such constraints when the plugin cannot.
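Putting the pieces together, a minimal module using the plugin might look
something like this (the module name and the exact import layout are
illustrative; consult the package documentation for the precise exports):

{-# OPTIONS_GHC -fplugin=Data.Record.Anon.Plugin #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE OverloadedLabels #-}

module Example where

import Data.Record.Anon                       -- (:=), RowHasField, ANON, ..
import Data.Record.Anon.Simple (Record)
import qualified Data.Record.Anon.Simple as S

cyan :: Record [ "red" := Double, "green" := Double, "blue" := Double ]
cyan = ANON { red = 0, green = 1, blue = 1 }

getGreen :: RowHasField "green" r Double => Record r -> Double
getGreen = S.get #green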
Project and inject
In the previous section we saw that project can be used to reorder fields, but
it is actually more general than that. In addition to reordering fields, we can
also omit fields: a SubRow r r' constraint is satisfied whenever the fields
of r' are a subset of the fields of r. Moreover, when SubRow r r' holds
we can also update the larger record from the smaller one: project and
inject together form a lens.
Let’s consider an example. Suppose we have some kind of renderer with a bunch of configuration options:
type Config = [
      "margin"   := Double
    , "fontSize" := Int
    , "header"   := String
    , ...
    ]
defaultConfig :: Record Config
defaultConfig = ANON {
      margin   = 1
    , fontSize = 18
    , header   = ""
    , ...
    }
render :: Record Config -> ...
To call render
, we would need to construct such a record; for example, we
could do2
render $ defaultConfig{margin = 2}
There is an alternative, however. Rather than passing in the full configuration, we could offer an API where the caller only passes in the overrides:
render' :: SubRow Config overrides => Record overrides -> ...
render' overrides = render (S.inject overrides defaultConfig)
Now we no longer need to export a defaultConfig
to the user:
render' $ ANON { margin = 2 }
The advanced interface
The key difference between the simple interface and the advanced one is that
Record
is additionally parameterised by a type constructor f
:3
data Record (f :: k -> Type) (r :: Row k)
Intuitively, every field in the record will be wrapped in an application of f
.
Indeed, the simple interface is but a thin layer around the advanced one,
instantiating f
to the identity functor I
:
magenta' :: A.Record I [ "red" := Double, "green" := Double, "blue" := Double ]
magenta' = S.toAdvanced magenta
The additional type constructor argument makes records a lot more expressive, and consequently the advanced API is much richer than the simple one. We will give some representative examples.
Foldable and zipping
“Folding” (as in Foldable) essentially means “turning into a list”. With
records we cannot do that, unless all fields of the record happen to have the
same type. We can express this by using the constant functor K:
collapse :: Record (K a) r -> [a]
toList :: KnownFields r => Record (K a) r -> [(String, a)]
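For example (a sketch; we assume empty and insert in the advanced interface
analogous to the simple ones, with the insert signature shown under Other
features below):

sizes :: A.Record (K Int) [ "x" := Double, "y" := Bool ]
sizes = A.insert #x (K 8)
      $ A.insert #y (K 1)
      $ A.empty

sizeList :: [(String, Int)]
sizeList = A.toList sizes  -- [("x",8),("y",1)]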
Similarly, because every field in the record has a different type, zipping requires a polymorphic function:
zipWith :: (forall x. f x -> g x -> h x) -> Record f r -> Record g r -> Record h r
(There are also monadic and constrained variations of zipping available.)
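As a sketch of the constrained variant, assuming a czipWith combinator in the
style of sop-core's hczipWith (taking a Proxy for the constraint):

-- Fieldwise equality: needs Eq at every field, hence the constrained zip.
fieldwiseEq :: AllFields r Eq
            => A.Record I r -> A.Record I r -> A.Record (K Bool) r
fieldwiseEq = A.czipWith (Proxy @Eq) (\(I x) (I y) -> K (x == y))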
Example: toList and zipWith
Suppose we want to write a function that translates records to JSON values, but allow the user per-field overrides which can change how the value of that field gets output. That is, we want to enable the user to provide a function of type
newtype FieldToJSON x = FieldToJSON (x -> Value)
for every field of type x
. We will do this by providing a record of such
functions to our JSON generation function, in addition to the actual record we
want to translate:
recordToJSON :: KnownFields r => A.Record FieldToJSON r -> Record r -> Value
recordToJSON fs xs = Aeson.object . map (first fromString) $
    A.toList $ A.zipWith aux fs (S.toAdvanced xs)
  where
    aux :: FieldToJSON x -> I x -> K Value x
    aux (FieldToJSON f) (I x) = K (f x)
The function aux returns K Value x, emphasizing that the result of aux is
a Value, no matter what the type of the field was; this is what enables the
call to toList.
It is worth noting quite how short and simple this function is; try doing this with regular records!
Applicative
Recall the types of pure and (<*>) from the prelude:
pure :: Applicative f => a -> f a
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
Records are “Applicative-like”, but don’t quite match this interface because,
again, every field of the record has a different type. The corresponding
functions in the advanced record API are:
pure :: KnownFields r => (forall x. f x) -> Record f r
cpure :: AllFields r c => Proxy c -> (forall x. c x => f x) -> Record f r
ap :: Record (f -.-> g) r -> Record f r -> Record g r
A function of type (f -.-> g) x is really a function f x -> g x; thus,
the type of ap says: “provided you have a record containing functions
f x -> g x for every field of type x in the record, and a corresponding
record of arguments of type f x, then I can construct a record of results of
type g x”.
Similarly, to construct a record in the first place, we can use pure or
cpure. The type of pure is simpler, but it is less often useful: it requires
the caller to construct a value of type f x for any x at all. Often that is
not possible, and we need to know that some constraint c x holds; cpure
can be used in this case.
If you have used large-generics
or (more likely)
sop-core
before, you will find this style familiar. If
not, this may look a little intimidating, but hopefully the examples in this
blog post will help. You might also like to read the paper True Sums of
Products where this style of programming was introduced.
Example: cpure
Our example JSON construction function took as argument a record of
FieldToJSON
values. In most cases, we just want to use toJSON
for every
field. We can write a function that constructs such a record for any row using
cpure
:
defaultFieldToJSON :: AllFields r ToJSON => A.Record FieldToJSON r
defaultFieldToJSON = A.cpure (Proxy @ToJSON) (FieldToJSON toJSON)
Suppose for the sake of an example that we want to generate JSON for our
Config
example, but that we want to output null
for
the header if it’s empty:
headerToJSON :: String -> Value
headerToJSON "" = Aeson.Null
headerToJSON xs = toJSON xs
Then
recordToJSON
  defaultFieldToJSON{header = FieldToJSON headerToJSON}
  defaultConfig
will result in something like
{
"margin": 1,
"fontSize": 18,
"header": null
}
Example: ap
Suppose that we want the function that creates the value to also be passed the field name:
newtype NamedFieldToJSON a = NamedFieldToJSON (String -> a -> Value)
Our generation function must now zip three things: the record of functions, a record of names, and the actual record of values. We can get a record of names using
reifyKnownFields :: KnownFields r => proxy r -> Record (K String) r
(We will see reification and reflection of constraints in more detail when we
discuss how to manually prove constraints.)
However, large-anon does not offer a zipWith3. Not to worry: just as we can
write pure f <*> xs <*> ys <*> zs to combine three ordinary Applicative
structures, we can do the same for records:
recordToJSON' :: forall r.
     KnownFields r
  => A.Record NamedFieldToJSON r -> Record r -> Value
recordToJSON' fs xs = Aeson.object . map (first fromString) $
    A.toList $
            A.pure (fn_3 aux)
     `A.ap` fs
     `A.ap` A.reifyKnownFields (Proxy @r)
     `A.ap` S.toAdvanced xs
  where
    aux :: NamedFieldToJSON x -> K String x -> I x -> K Value x
    aux (NamedFieldToJSON f) (K name) (I x) = K (f name x)
Traversable
The essence of Traversable
is that we sequence effects: given some traversable
structure of actions, create an action returning the structure:
sequenceA :: (Traversable t, Applicative f) => t (f a) -> f (t a)
We can do the same for records; the advanced API offers
sequenceA' :: Applicative m => Record m r -> m (Record I r)
sequenceA :: Applicative m => Record (m :.: f) r -> m (Record f r)
and the simple interface offers
sequenceA :: Applicative m => A.Record m r -> m (Record r)
When we are sequencing actions, order matters, and large-anon
guarantees
that actions are executed in row-order (another reason not to consider rows
“up to reordering”).
Example: sequenceA
Let’s go back to our Config
running example, and let’s assume we want to
write a parser for it. Let’s say that the serialised form of the Config
is just a list of values, something like
2.1 14 Example
Then we could write our parser as follows (ANON_F
is the equivalent of ANON
for the advanced interface):
parseConfig :: Parser (Record Config)
parseConfig = S.sequenceA $ ANON_F {
      margin   = parseDouble
    , fontSize = parseInt
    , header   = parseString
    }
We are using sequenceA
to turn a record of parsers into a parser of a record.
However, what if the order of the serialised form does not match the order in
the record? No problem, we can parse in the right order and then use project
to reorder the fields:
parseConfig' :: Parser (Record Config)
parseConfig' = fmap S.project . S.sequenceA $ ANON_F {
      header   = parseString
    , margin   = parseDouble
    , fontSize = parseInt
    }
Of course, projecting first and then sequencing would not work: the parsers
would then run in the wrong order!
Incidentally, anonymous records have an advantage over regular records here; with normal records we could write something like
parseConfig :: Parser Config
parseConfig =
        MkConfig
    <$> parseDouble
    <*> parseInt
    <*> parseString
but there is no way to use the record field names with Applicative
(unless we
explicitly give the record a type constructor argument and then write
infrastructure for dealing with it), nor is there an easy way to change the
order.
Manually proving constraints
This section is aimed at advanced usage of the library; in most cases, use of the API we describe here is not necessary.
The large-anon
type checker plugin proves KnownFields
, AllFields
and
SubRow
constraints, but only for concrete rows. When this is insufficient,
the advanced interface provides three pairs of functions for proving each of
these.
Inductive reasoning over these constraints is not possible. Induction over
type-level structures leads to large ghc core size and bad compilation time,
and is avoided entirely in large-anon.
Example: reflectAllFields
For reflectAllFields
the pair of functions looks like this:
reifyAllFields :: AllFields r c => proxy c -> Record (Dict c) r
reflectAllFields :: Record (Dict c) r -> Reflected (AllFields r c)
The former turns a constraint AllFields over a record into a record of
dictionaries; the latter goes in the opposite direction. The only difference
between Dict (defined in sop-core) and Reflected (defined in large-anon)
is that the former takes a type constructor argument:
data Dict c a where
Dict :: c a => Dict c a
data Reflected c where
Reflected :: c => Reflected c
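Pattern matching brings the constraint into scope, so the two are
interconvertible at any single type; a quick sketch (the helper names are
ours):

dictToReflected :: Dict c a -> Reflected (c a)
dictToReflected Dict = Reflected

reflectedToDict :: Reflected (c a) -> Dict c a
reflectedToDict Reflected = Dict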
We’ll consider two examples. First, if a constraint c
holds for every field in
some larger record, then it should also hold for every field in a record with
a subset of the larger record’s fields:
smallerSatisfies :: forall r r' c.
     (SubRow r r', AllFields r c)
  => Proxy c -> Proxy r -> Reflected (AllFields r' c)
smallerSatisfies pc _ =
    A.reflectAllFields $ A.project (A.reifyAllFields pc :: A.Record (Dict c) r)
Second, if a constraint c
implies c'
, then if every field of a record
satisfies c
, every field should also satisfy c'
. For example, Ord
implies Eq
, and hence:
ordImpliesEq :: AllFields r Ord => Reflected (AllFields r Eq)
ordImpliesEq =
    A.reflectAllFields $ A.map aux (A.reifyAllFields (Proxy @Ord))
  where
    aux :: forall x. Dict Ord x -> Dict Eq x
    aux Dict = Dict
Example: reflectSubRow
For the SubRow
constraint, the pair of functions is
data InRow r a where
InRow :: (KnownSymbol n, RowHasField n r a) => Proxy n -> InRow r a
reifySubRow :: (KnownFields r', SubRow r r') => Record (A.InRow r) r'
reflectSubRow :: Record (A.InRow r) r' -> Reflected (SubRow r r')
For our final and most sophisticated example of the use of the advanced API, we
will show how we can do a runtime check to see if one row can be projected to
another. Such a check is useful when dealing with records over existential
rows, for example when constructing records from JSON values (see someRecord
in the advanced API). The large-anon test suite contains an example
of this in Test.Infra.DynRecord.Simple, as well as a
slightly better version of checkIsSubRow in Test.Infra.Discovery.
Starting point
We want to write a function of type
checkIsSubRow ::
     (..)
  => proxy r1 -> proxy' r2 -> Maybe (Reflected (SubRow r1 r2))
We need to use reflectSubRow to do this, so we need to construct a record
over r', where every field contains evidence that that field is a member of r.
Let’s consider how to do this one bit at a time, starting with perhaps a
non-obvious first step: we will use reifySubRow to construct a record
for r with evidence that every field of r is (obviously!) a member of r,
and similarly for r':
checkIsSubRow _ _ =
    A.reflectSubRow <$> go A.reifySubRow A.reifySubRow
  where
    go :: A.Record (InRow r ) r
       -> A.Record (InRow r') r'
       -> Maybe (A.Record (InRow r) r')
    go r r' = ...
The strategy is now going to be as follows: we are going to try and translate
the evidence of InRow r' to evidence of InRow r, by matching every field of
r' with the corresponding field in r (if it exists).
Matching fields
In order to check if we have a match, we need to check two things: the field names need to match, and the field types need to match. For the former we can use
sameSymbol ::
     (KnownSymbol n, KnownSymbol n')
  => Proxy n -> Proxy n' -> Maybe (n :~: n')
from GHC.TypeLits, but to be able to do a runtime type check we need some
kind of runtime type information. An obvious choice would be to use Typeable,
but here we will stick with something simpler. Let’s suppose the only types we
are interested in are Int and Bool; we can implement a runtime type check
as follows:
data SupportedType a where
SupportedInt :: SupportedType Int
SupportedBool :: SupportedType Bool
class IsSupportedType a where
supportedType :: Proxy a -> SupportedType a
instance IsSupportedType Int where supportedType _ = SupportedInt
instance IsSupportedType Bool where supportedType _ = SupportedBool
sameType :: SupportedType a -> SupportedType b -> Maybe (a :~: b)
sameType SupportedInt  SupportedInt  = Just Refl
sameType SupportedBool SupportedBool = Just Refl
sameType _ _                         = Nothing
With this in hand, let’s now go back to our matching function. We have
evidence that some field x' is a member of r', and we want evidence that
x' is a member of r. We do this by trying to match it against evidence that
another field x is a member of r, checking both the field name and the
field type:
checkIsMatch :: forall x x'.
     (IsSupportedType x, IsSupportedType x')
  => InRow r' x' -> InRow r x -> K (Maybe (InRow r x')) x
checkIsMatch (InRow x') (InRow x) = K $ do
    Refl <- sameSymbol x x'
    Refl <- sameType (supportedType (Proxy @x)) (supportedType (Proxy @x'))
    return $ InRow x
Now for a given field x' of r', we need to look through all the fields in
r, looking for a match:
findField :: forall x'.
     IsSupportedType x'
  => A.Record (InRow r) r -> InRow r' x' -> Maybe (InRow r x')
findField r x' =
    listToMaybe . catMaybes . A.collapse $
      A.cmap (Proxy @IsSupportedType) (checkIsMatch x') r
Finally, we just need to repeat this for all fields of r'; the full
implementation of checkIsSubRow is
checkIsSubRow :: forall (r :: Row Type) (r' :: Row Type) proxy proxy'.
     ( KnownFields r
     , KnownFields r'
     , SubRow r r
     , SubRow r' r'
     , AllFields r IsSupportedType
     , AllFields r' IsSupportedType
     )
  => proxy r -> proxy' r' -> Maybe (Reflected (SubRow r r'))
checkIsSubRow _ _ =
    A.reflectSubRow <$> go A.reifySubRow A.reifySubRow
  where
    go :: A.Record (InRow r ) r
       -> A.Record (InRow r') r'
       -> Maybe (A.Record (InRow r) r')
    go r r' = A.cmapM (Proxy @IsSupportedType) (findField r) r'
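A hypothetical use, with concrete rows (the plugin discharges the auxiliary
constraints, and both Int and Bool are supported types):

-- Can [ "a" := Int, "b" := Bool ] be projected to '[ "b" := Bool ]?
canDrop :: Maybe (Reflected (SubRow [ "a" := Int, "b" := Bool ] '[ "b" := Bool ]))
canDrop = checkIsSubRow
            (Proxy @[ "a" := Int, "b" := Bool ])
            (Proxy @'[ "b" := Bool ])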
Discussion: choice of InRow
Recall the type of reflectSubRow:
data InRow r a where
InRow :: (KnownSymbol n, RowHasField n r a) => Proxy n -> InRow r a
reflectSubRow :: Record (A.InRow r) r' -> Reflected (SubRow r r')
This may look obvious in hindsight, but during development of the library
it was far from clear what the right representation was for the argument to
reflectSubRow; after all, we are dealing with two rows r and r',
and it was not evident how to represent this as a single record.
When we finally settled on the above representation it intuitively “felt right”,
and this intuition was confirmed in two ways. First, checkIsSubRow
previously could only be defined internally in the library by making use of
unsafe features; the library is now expressive enough that it can be defined
entirely user-side. Indeed, Test.Infra.Discovery in the
large-anon test suite also provides an example of the runtime computation of
the intersection between two rows, again using safe features of the library
only (it turns out that this is a minor generalization of checkIsSubRow).
Secondly, if we look at the generated core for reflectSubRow
(and clean
it up a bit), we find
reflectSubRow d = unsafeCoerce $ fmap aux (toCanonical d)
  where
    aux (InRow _name index _proxy) = index
so we see that it literally just projects out the indices of each field, which
is quite satisfying. In fact, if we didn’t include evidence of KnownSymbol
in InRow then reflectSubRow would just be the identity function!
Indeed, the choice to include KnownSymbol evidence in InRow is somewhat
unfortunate, as it feels like an orthogonal concern. Ultimately the reason we
need it is that the kind of the type constructor argument to Record is
k -> Type, rather than Symbol -> k -> Type: it is not passed the field
names, and hence the field name must be an existential in InRow.
Internal representation
In this section we will give a short overview of the internal representation
of a Record
. The goal here is not to provide a detailed overview of the
internals of the library, but rather to provide users with a better
understanding of its runtime characteristics.
A Record
is represented as follows:
data Record (f :: k -> Type) (r :: Row k) =
NoPending {-# UNPACK #-} !(Canonical f)
| HasPending {-# UNPACK #-} !(Canonical f) !(Diff f)
We’ll consider the two cases separately.
No pending changes
When there are no pending changes (that is, updated or added fields),
Record just wraps Canonical:
newtype Canonical (f :: k -> Type) = Canonical (StrictArray (f Any))
newtype StrictArray a = WrapLazy { unwrapLazy :: SmallArray a }
In addition, the evidence for RowHasField is just an Int:
class RowHasField (n :: Symbol) (r :: Row k) (a :: k) | n r -> a where
rowHasField :: Tagged '(n, r, a) Int
This means that reading from a record in canonical form is just an array access, and should be very fast.
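As a toy model of why this is fast (emphatically not the library's actual
code; ToyRecord and toyGet are ours): the Int evidence indexes straight into
the array, with a coercion justified by the erased row information.

{-# LANGUAGE PolyKinds #-}

import Data.Kind (Type)
import Data.Primitive.SmallArray (SmallArray, indexSmallArray)
import GHC.Exts (Any)
import Unsafe.Coerce (unsafeCoerce)

-- Toy record: just an array of fields, with the field types erased to Any.
newtype ToyRecord (f :: k -> Type) = ToyRecord (SmallArray (f Any))

-- Reading a field is one array index plus a coercion; the coercion is safe
-- only when the Int and result type come from genuine RowHasField evidence.
toyGet :: Int -> ToyRecord f -> f a
toyGet ix (ToyRecord arr) = unsafeCoerce (indexSmallArray arr ix)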
Pending changes
Updating is however an expensive operation, because the entire array needs to
be copied. This is fine for small arrays, but it is not an approach that
scales well. Record therefore represents a record with pending changes—added
or updated fields—as a combination of the original array along with a Diff:
data Diff (f :: k -> Type) = Diff {
      diffUpd :: !(IntMap (f Any))
    , diffIns :: [FieldName]
    , diffNew :: !(SmallHashMap FieldName (NonEmpty (f Any)))
    }
The details don’t matter too much, but diffUpd contains the new values of
updated fields, and diffIns records which new fields have been inserted;
diffNew is necessary to deal with shadowing, which is beyond the scope of this
blog post.
FieldName is a combination of a precomputed hash and the name of the field:
data FieldName = FieldName {
      fieldNameHash  :: Int
    , fieldNameLabel :: String
    }

instance Hashable FieldName where
  hash = fieldNameHash
These hashes are computed at compile time (through the KnownHash class,
defined in large-anon).
The take-away here is that the performance of a Record
will degrade to
the performance of a hashmap (with precomputed hashes) when there are many
pending updates. This makes updating the record faster, but accessing the
record slower.
Applying pending changes
The obvious question then is when we apply pending changes, so that we have a flat array again. First of all, the library provides a function to do this:
applyPending :: Record f r -> Record f r
(and similarly in the simple interface). It might be advisable to call this function after having done a lot of field updates, for example. Of course, we shouldn’t call it after every field update, because that would result in a full array copy for every update again.
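For example, a sketch of batching a series of updates and flattening once at
the end (using S.applyPending, the simple-interface version):

dimAll :: Record [ "red" := Double, "green" := Double, "blue" := Double ]
       -> Record [ "red" := Double, "green" := Double, "blue" := Double ]
dimAll c = S.applyPending
         $ S.set #red   (S.get #red   c * 0.9)
         $ S.set #green (S.get #green c * 0.9)
         $ S.set #blue  (S.get #blue  c * 0.9)
         $ c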
The library also calls applyPending internally in two places:
- The ANON and ANON_F syntactic sugar call applyPending after the record
  has been constructed.
- All of the combinators on records (map, pure, zipWith, etc.) call
  applyPending on any input records, and only construct records in canonical
  form. Since these operations are O(n) anyway, the additional cost of calling
  applyPending is effectively hidden.
Benchmarks
So does all this work? Yes, yes it does, and in this section we will show a
bunch of benchmarks to prove it. For a baseline, we will compare against
superrecord
; this is a library which has been optimized
for runtime performance, but makes heavy use of type families and induction and
consequently suffers from poor compilation times. It could certainly be argued
that this is not the library’s fault, and that ghc
should do better; for now,
however, we must work with what we have. It should also be noted that unlike
large-anon
, superrecord
does treat rows “up to reordering”.
Record construction
In superrecord there are two ways to construct records: a safe API
(rnil and rcons), and an unsafe API (unsafeRNil and unsafeRCons). The
latter is unsafe in two ways: unsafeRNil must be told how much space to
allocate for the record, and unsafeRCons does in-place update of the record,
potentially breaking referential transparency if used incorrectly.
The safe API has such bad compilation time performance that we effectively
cannot compare it to large-anon
. By the time we get to records of 40 fields,
we end up with a ghc
core size of 25 million AST nodes (terms, types and
coercions), and it takes 20 seconds to compile a single record; this time
roughly doubles with every 10 more fields.
We will instead compare with the unsafe API:
We see that for records with 80 fields, large-anon
results in ghc
core that
is roughly an order of magnitude smaller, and compilation time that is about
5.5x faster. The left graph here might suggest that the ghc
core size
generated by large-anon
is linear in the size of the record; this is not quite
the case:
(We are showing the core size after desugaring, the very simple optimizer, and
the simplifier, but in this case all three are basically of identical size.)
The green line is what large-anon
does out of the box, and we see that it is
not linear. The reason is that repeated calls to insert
result in O(n²)
type
arguments (see Avoiding quadratic core code size with large records
for a detailed discussion of this problem). We do have experimental support for
integration with typelet
(see Type-level sharing in Haskell,
now), and while does indeed result in ghc
core that is linear
in size (blue line), unfortunately it actually makes compilation time worse
(although still very good) – at least for this benchmark. Fortunately,
compilation time without typelet
is linear (again, for this benchmark).
The runtime performance of superrecord
is much better, of course:
The most relevant lines here are the red line (unsafe superrecord
API) and the
green line (default large-anon
: no typelet
integration, and with a call to
applyPending
after the record is constructed). We see that superrecord
is
significantly faster here, by roughly two orders of magnitude. This is not
surprising: large-anon
first builds up a Map
, and then flattens it, whereas
superrecord
just constructs a single array and then updates it in place
(albeit in an unsafe manner).
Accessing record fields
Let’s now consider the performance of reading a bunch of fields from a record. The benchmark here constructs a function that extracts half of the fields of a record (into a non-record datatype).
The ghc
core size in large-anon
is so small that it is entirely dwarfed by
superrecord
; it is in fact linear, going up to roughly 3,500 AST nodes for a
record of 80 fields, about 3 orders of magnitude better than superrecord
.
Compilation time is similarly much better, by more than an order of magnitude
(50 ms versus 2.5 seconds), and also linear. Showing just large-anon
by
itself:
Comparing runtime is a bit more difficult, because of the hybrid representation
used by large-anon
: it very much depends on whether the record has many
pending changes or not. We will therefore measure the two extremes: when the
record has no pending changes at all, and when the record consists entirely
of pending changes, with an empty base array:
Note that when the record is in canonical form (green line), large-anon
and
superrecord
have very similar performance; large-anon
is slower by roughly a
factor of 2, which can be explained by having to check whether the record is in
canonical form on every field access. At the other extreme (blue line),
large-anon
again degrades to the performance of a Map
and is therefore about
an order of magnitude slower. Actual performance in any application will fall
somewhere between these two extremes.
Updating record fields
The hybrid nature of large-anon
here too makes a direct comparison with
superrecord
a bit difficult. The performance of updating a single field will
be different to updating many, and will depend on whether or not we call
applyPending
. We will therefore show a few different measurements.
Let’s first consider updating a single field. Both superrecord
and
large-anon
have good compilation time performance here; superrecord
is
non-linear, but in this benchmark we don’t really notice this because
compilation is essentially negligible:
In terms of runtime, however, since superrecord
needs to copy the entire
array, we expect large-anon
to do better here:
Indeed, updating a single field has a constant cost in large-anon
, since it
just adds a single entry to the map.
Of course, in practice we will eventually want to update a bunch of fields,
and then call applyPending
, so let’s measure that too. First, compilation time:
Here the non-linear compilation time of superrecord
really becomes noticeable;
for a record of 80 fields, it is again more than an order of magnitude slower
(50 ms versus 2.5 seconds).
At runtime, field update in large-anon
is slightly slower than superrecord
for small arrays, but does better than superrecord
for larger records. After
all, every single field update results in a full array copy in superrecord
,
which is inherently O(n²)
. By contrast, large-anon
merely updates the map,
and then flattens it out at the end, constructing a single array. This is more
expensive for smaller arrays, but is O(n log n)
instead and therefore scales
and becomes faster for larger arrays. Of course, it does mean that
applyPending
must be called at an appropriate moment (see Applying pending
changes).
We should emphasize again that the goal of large-anon
was not to create a
library that would be better than superrecord
at runtime, but rather to
create a library with good enough runtime performance but excellent compile
time performance. Nonetheless, the O(n²)
vs O(n log n)
cost of updating
records may be important for some applications. Moreover, the functions in
large-anon that work with entire records (such as (c)map and co) are
all O(n).
Generics
There is no explicit support for generics in superrecord
, but it does support
conversions between records and JSON values. We will compare this to the JSON
conversion functions in large-anon
, which are defined in terms of generics
(indeed, they are just the functions defined in
large-generics
). Thus, toJSON
will serve as an
example of a generic consumer, and parseJSON
as an example of a generic
producer. If anything this benchmark is skewed in favour of superrecord
,
because what we are measuring there is the performance of more specialized
functions.
Let’s first consider the generic consumer, i.e., toJSON
:
The ghc
core size and compilation time of large-anon
get dwarfed here by
those of superrecord
, so let’s consider them by themselves:
We see that the ghc
core size in large-anon
is beautifully linear, and so is
compilation time.4 Moreover, compilation time is roughly two orders of magnitude
faster than superrecord
(60 ms versus 6 seconds).
Runtime performance:
We see that large-anon
is a little more than 2x slower than superrecord
,
quite an acceptable performance for a generic function.
Finally, the generic producer, i.e., parseJSON
:
Here the difference in compile time is less extreme, but large-anon is still
is still
roughly an order of magnitude faster (with much, much smaller ghc
core).
Runtime:
We see that superrecord
is again roughly 2x faster (slightly less).
Conclusions
The large-anon
library provides anonymous records for Haskell, which are
- practical: the library comes with good syntactic sugar and a very expressive API.
- scalable: compilation time is linear in the size of records.
For records with 80 fields, compilation time is somewhere between one and two
orders of magnitude faster than superrecord
. For runtime performance of
reading record fields, large-anon
lies somewhere between superrecord
and
Data.Map
; for writing record fields and generic operations, large-anon
is up
to roughly 2x slower than superrecord
, but sometimes much faster.
The runtime performance of the library can almost certainly be
improved; the focus has been on excellent compilation time performance, not
excellent runtime performance. That said, I would be pretty certain that for
nearly all applications the runtime performance of large-anon
is just fine.
The development of large-anon
is the latest, and for now probably final,
installment in our research on improving compilation time on behalf of
Juspay; see the blog posts tagged with
compile-time-performance
for everything we have written on
this topic. In addition, the large-records
repo contains a detailed
benchmarks report, covering large-records
, large-anon
,
and typelet
, as well as the various individual experiments we have done. In
addition to documenting the research, perhaps this can also help research into
compilation time by other people. We are thankful to Juspay for
sponsoring this research and improving the Haskell ecosystem.
Other features
We have covered most of the library’s features in this blog post, but not quite all:
- All examples of the advanced API in this blog post have been over rows of
  kind Type (*). The actual API is kind polymorphic; Test.Sanity.PolyKinds
  in the large-anon test suite contains an example of records with types like
  this:

  Record Boxed [ "a" := Lazy Bool, "b" := Strict Int ]

  This takes advantage of kind polymorphism to differentiate between strict
  and lazy fields. (In practice this is probably overkill; large-anon is
  strict by default, and to get lazy fields you can just use a box
  data Box a = Box a.) Indeed, the runtime functions on rows such as
  checkIsSubRow (see the section Example: reflectSubRow above) are also
  entirely kind polymorphic, and as demonstrated in
  Test.Infra.DynRecord.Advanced, row discovery for existential records also
  works for kinds other than Type.

- Records can also be merged (concatenated):

  merge :: Record f r -> Record f r' -> Record f (Merge r r')

  The Merge type family does not reduce:

  example :: Record Maybe (Merge '[ "a" := Bool ] '[])
  example = merge (insert #a (Just True) empty) empty

  A HasField constraint can be solved for rows containing applications of
  Merge, and project can be used to flatten merged records:

  example :: Record Maybe '[ "a" := Bool ]
  example = project $ merge (insert #a (Just True) empty) empty

- We have not covered the full set of combinators, but hopefully the Haddock
  documentation is helpful here. Moreover, the set of combinators should be
  entirely familiar to people who have worked with large-generics or
  sop-core.

- In principle the library supports scoped labels. After all, insert has no
  constraints:

  insert :: Field n -> f a -> Record f r -> Record f (n := a : r)

  The absence of any constraints on insert means that a sequence of many
  calls to insert to construct a large record is perfectly fine in terms of
  compilation time, but it also means that fields inserted later can shadow
  fields inserted earlier (as sketched below). Indeed, those newer fields might
  have different types than their older namesakes. Everything in the library is
  designed to take this into account, and I believe it makes for a simpler and
  more uniform design.

  However, the library currently offers no API for making shadowed fields
  visible again by removing the field that is shadowing them. There is no
  fundamental reason why this isn’t supported, merely a lack of time. The work
  by Daan Leijen on scoped labels (for example, Extensible records with scoped
  labels) may provide some inspiration here.
Alternative approaches
In a previous blog post Induction without core-size blow-up: a.k.a. Large
records: anonymous edition we discussed some techniques that can be
used to do type-level induction in Haskell without resulting in huge ghc
core
and bad compilation time. The reason we ended up not going down this path
for large-anon was primarily one of usability.
Consider checking whether a
field is a member of a (type-level) row. If the row is a list, then the search
is necessarily O(n)
. If we want to reduce this to O(log n)
, we could index
records by type-level balanced trees. We explored this to some degree; in fact,
we’ve gone as far as implementing guaranteed-to-be-balanced type-level
red-black trees. In the end though this results in a poorer user
experience, since these type-level trees then appear in user visible types,
error messages, and so on.
Using a plugin resulted in a more practical library.
Note, though, that we are using a plugin only for better compile
time performance. In principle everything that large-anon
does could be
done with type families within Haskell itself; this is different to plugins such
as Coxswain which really try to implement a theory of rows.
The large-anon
library does not attempt this; this keeps the library more
simple, but also more predictable. For example, we have seen various examples
above showing that having rows be ordered is useful.
1. The large-anon library comes with support for optics out of the box, but of course integration with other flavours of lenses is also possible.↩︎

2. The record-dot-preprocessor syntax for record field update is r{f = ..}, with no spaces allowed; currently none of r { f = .. }, r{ f = .. } or r {f = ..} are recognized, although this is apparently not quite intentional. See the GitHub ticket about Syntax for updating multiple fields?.↩︎

3. This technique is used by various records and generic programming libraries, such as barbies, higgledy, sop-core and vinyl.↩︎

4. Compilation time measurements are inherently somewhat noisy when times are small, which explains the outlier at a record size of 90 fields. This is why we present ghc core size measurements as well, which are much more reliably reproducible.↩︎