A generational copying garbage collector, in its most basic form, is quite simple. However, as we'll see, not all objects can be copied, and some objects require more bookkeeping by the RTS. In this post we're going to look at these types of objects, which require special treatment from the garbage collector (GC for short). For each type of object we'll look at how it is allocated and collected. Each of these objects solves a particular problem:
- Large objects solve the problem of expensive copying of e.g. large arrays.
- Pinned objects solve the problem of sharing data with foreign code.
- Compact objects solve the problem of expensive scanning of large boxed heap objects.
- Weak pointers allow finalizers and non-leaky memo tables.
- Threads with special collection rules allow `BlockedIndefinitelyOnMVar` async exceptions.
- Stable pointers solve the problem of sharing references to Haskell objects with foreign code.
- Stable names allow pointer equality.
- Static objects allow efficient handling of data allocated at compile time by not allocating it in the heap.
This post is mostly intended for GHC hackers, but it may also be useful for Haskell developers who build performance-critical systems and need to know about the impact of these objects on GC and allocation pauses.
1. Large objects
The unit of allocation in GHC's storage manager is a block, which is 2^12 bytes (4k) large. A large object is an object that is at least as large as 80% of a block (3277 bytes). Because these objects are expensive to copy, GHC's RTS treats them as immovable and does not use the usual copying GC routines to collect them.
Allocation: Allocation is done via the RTS function `allocateMightFail`. If the size parameter is smaller than 80% of a block, `allocateMightFail` allocates in the nursery. Otherwise it allocates enough blocks to hold the requested amount, marks the allocated blocks as `BF_LARGE`, and adds the blocks to the first generation's `large_objects` list. One example where this happens is when `newArray#` is called with a large enough size argument.
Because large object allocation requires allocating new blocks, it requires synchronization via `sm_mutex` (`sm` for "storage manager").
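As a rough illustration, here's a sketch (assuming the array package and a 64-bit platform; the exact accounting also includes header words) of an allocation that is big enough to take the large-object path:

```haskell
import Data.Array.IO

main :: IO ()
main = do
  -- 512 elements * 8 bytes per Int (on a 64-bit machine) = 4096 bytes
  -- of payload, above the 3277-byte threshold, so the RTS allocates
  -- this as an immovable large object instead of in the nursery.
  arr <- newArray (0, 511) 0 :: IO (IOUArray Int Int)
  writeArray arr 0 42
  readArray arr 0 >>= print
```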
Collection: When we see a large object during GC (i.e. a block with the `BF_LARGE` flag set) we remove the object from its generation's `large_objects` list [1] and add it to the GC's `todo_large_objects` list. The `todo_large_objects` list is later scanned by `scavenge_large`, which adds large objects to their next generation's `scavenged_large_objects` lists. Finally, after the GC is done, the `scavenged_large_objects` lists are appended to the `large_objects` lists of their generations.
Effectively we move large objects that we found during the usual copying GC to their next generations without copying.
2. Pinned objects
Pinned objects are similar to large objects; they're also not moved by the GC. However, unlike large objects, they don't have a minimum size. The RTS maintains capability-local pinned block lists and allocates pinned objects in these blocks. Currently the only pinned object type is `MutableByteArray#`, which is allocated via `newPinnedByteArray#`.
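For a user-level view, here's a minimal sketch using only base. Under GHC, `mallocForeignPtrBytes` allocates its buffer as a pinned byte array, which is why the address below can safely be handed to foreign code:

```haskell
import Foreign.ForeignPtr
import Foreign.Ptr (Ptr)
import Data.Word (Word8)

main :: IO ()
main = do
  -- Under GHC this allocates a pinned MutableByteArray#: the GC keeps
  -- track of the buffer's liveness but never moves it.
  fp <- mallocForeignPtrBytes 64 :: IO (ForeignPtr Word8)
  withForeignPtr fp $ \p ->
    -- `p` could be passed to a C function here; the buffer won't be
    -- moved (or freed) while this scope is running.
    print (p :: Ptr Word8)
```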
Allocation: Allocation is done via the RTS function `allocatePinned`. If we have enough space in the capability-local pinned object block we allocate there; otherwise we allocate a new block and start using it as the new pinned object block. The old block is added to the capability-local `pinned_object_blocks` list, which will later be moved to the first generation's `large_objects` list for collection.
Pinned object allocation only requires synchronization (via `sm_mutex`) when the current capability's pinned object block does not have enough space.
Collection: When allocating pinned objects we cheat a little and mark all pinned blocks as `BF_LARGE`. This makes the GC collect pinned blocks the same way it collects large blocks: `collect_pinned_object_blocks` moves all capabilities' `pinned_object_blocks` to the first generation's `large_objects` list.
Another difference between a pinned block and a large block is that a pinned block may hold many objects, while a large block holds just one. However, collecting a pinned block as if it were a large block causes no problems: the primop for pinned object allocation returns a mutable byte array, which contains no pointers and provides no API for adding arbitrary heap objects to the array. This also means that, unlike large objects, pinned objects are not scavenged.
3. Compact objects
Compact objects are similar to pinned and large objects in that they’re not moved. Unlike pinned and large objects, they can have multiple blocks linked together (whereas pinned and large objects are contiguous block groups). As far as GC is concerned a compact object is just one object with no outgoing pointers, so the fact that they can have multiple linked block groups is not important: we only worry about liveness of the object as a whole, and there is no need to scavenge it.
See the paper for some example uses of compact objects.
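For a taste of the user-facing API, here's a minimal sketch using the ghc-compact package (these functions are built on the RTS functions described below):

```haskell
import GHC.Compact (compact, compactAdd, getCompact)

main :: IO ()
main = do
  -- Copy a structure (forcing it in the process) into its own compact
  -- region. The GC now treats the whole region as one object with no
  -- outgoing pointers: the million list cells are never scanned.
  c <- compact [1 .. 1000000 :: Int]
  print (sum (getCompact c))
  -- More data can be appended to the same region later.
  c' <- compactAdd c "hello"
  putStrLn (getCompact c')
```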
Allocation: Allocation is done via the `compactNew` RTS function. `compactNew` adds the compact block to the first generation's `compact` list. `compactAdd` copies an object (along with everything referenced by it) into a compact, allocating in the compact's existing blocks and allocating new blocks if necessary (code). Compact object blocks are marked as `BF_COMPACT`. Compact object allocation requires synchronization via `sm_lock`.
Collection: When we see an object that lives in a block with `BF_COMPACT` we get the head of the block (remember that the blocks of a compact object are linked together) and evacuate the head block by removing it from its current compact list and adding it to the next generation's live compact objects list (code). After GC is done, any compact objects left in the `compact` lists are collected, and live compact objects are moved to the `compact` lists.
4. Weak pointers
A weak pointer (or weak object) is essentially a record with key, value, and finalizer fields. While all of these fields are ordinary Haskell objects, they're not marked as pointer fields in the weak object info table [2] [3], so a reachable weak object does not make these fields reachable.
As to why this is useful, see the paper and System.Mem.Weak module documentation.
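Here's a minimal sketch of the semantics using System.Mem.Weak; exactly when the key dies depends on the optimizer, so the comments describe the typical unoptimized behavior:

```haskell
import Control.Concurrent (threadDelay)
import System.Mem (performMajorGC)
import System.Mem.Weak (mkWeakPtr, deRefWeak)

main :: IO ()
main = do
  key <- return (replicate 5 'x')
  -- The weak object references `key`, but does not keep it alive.
  w <- mkWeakPtr key (Just (putStrLn "finalizer: key died"))
  deRefWeak w >>= print   -- Just "xxxxx": the key is still reachable here
  performMajorGC          -- the key is no longer referenced anywhere
  threadDelay 100000      -- give the finalizer thread a chance to run
  deRefWeak w >>= print   -- Nothing: the key has been collected
```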
Allocation: Weak pointers are allocated via `mkWeak` and `mkWeakPtr`. These high-level functions use the `mkWeak#` primop. `mkWeak#` allocates a heap object for the actual `Weak` value and adds it to the current capability's weak pointer list.
Collection: As already noted, because the fields of a weak object are not marked as pointers, we do not copy them. Instead we do this:
1. Right before starting to evacuate the roots, move capability-local weak pointers to the first generation's weak pointer list. (code)

2. After evacuating the roots but before scavenging, move the weak pointer lists of collected generations to their `old_weak_ptr_list`s. (code)

3. During scavenging, traverse the `old_weak_ptr_list`s of collected generations, and for each weak pointer check whether its key is alive (code) (keys in uncollected generations are considered alive). If it is, evacuate the other fields of the weak object and promote it to the next generation's `weak_ptr_list`. Otherwise just skip it (do not detach it from the list!). If in this step we evacuated anything (i.e. found a live key) we signal the scavenging code to continue scavenging, because we have to copy any heap objects reachable from the weak object's fields.

4. If we traverse all of the weak pointer lists without evacuating anything, then we're done. We append all `old_weak_ptr_list`s to the global `dead_weak_ptr_list` while also evacuating their finalizer fields. `dead_weak_ptr_list` is later used to schedule threads to run the finalizers.
This effectively implements our special GC strategy for weak objects: a weak object does not make its key or value reachable, but if its key is reachable (even if the weak itself is not reachable) then the value is also reachable. When the key becomes unreachable we schedule finalizers.
5. Threads
A GHC thread is a heap object like any other. However, when a thread becomes unreachable we sometimes want to raise an async exception in the thread instead of collecting it, effectively resurrecting the thread.
What does it mean for a thread to become unreachable? The reachability rules for threads are:
- If the `ThreadId` for a thread is reachable, then the thread is also reachable [4].
- If a thread has work to do and is not blocked, then it's reachable. This is because threads with work to do are in capabilities' run queues, which are roots for the GC.
- If a thread is in a blocking queue (e.g. when blocked on an `MVar` operation), and the object that owns the blocking queue is reachable, then the thread is also reachable.
Now suppose that a thread was blocked on an `MVar` operation, and its `ThreadId` is unreachable. In this case we want to raise a `BlockedIndefinitelyOnMVar` exception in the thread. This means we have to somehow know about threads that become unreachable, so we maintain a list of threads. Before moving on to the details, here's a simple program that demonstrates the reachability rules:
```haskell
{-# LANGUAGE ScopedTypeVariables #-}

module Main where

import Control.Concurrent
import Control.Exception
import System.Mem

main :: IO ()
main = do
    var :: MVar Int <- newEmptyMVar
    thr <-
      forkIO $
        catch (takeMVar var >>= print)
              (\(e :: BlockedIndefinitelyOnMVar) ->
                 putStrLn "Blocked")
    performMajorGC
    yield
    performMajorGC
    threadDelay 1000000
```
This version should print a single `Blocked` line. This is because both the `MVar` and the thread become unreachable during the first GC, and the thread is resurrected [5].
Now, add `seq var (return ())` or `seq thr (return ())` at the end. In both cases we don't get an exception in the thread, because it's still reachable and thus not resurrected.
Now, the details.
Allocation: A thread is allocated with `forkIO`, which calls the `fork#` primop, which in turn calls the `createThread` RTS function. `createThread` allocates the thread in the nursery and adds it to the first generation's `threads` list [6]. Thread allocation requires synchronization via `sched_mutex`.
At this point we have a thread, but it's not scheduled yet. For this the primop calls the `scheduleThread` RTS function with the return value of `createThread`. `scheduleThread` adds the thread to the current capability's run queue. At this point the thread becomes a GC root.
Collection: Collection works similarly to weak object collection. Before starting the GC, for collected generations, we move the `threads` lists to `old_threads` lists. Any threads in the `old_threads` lists will need resurrection if they're blocked (otherwise they will be collected). During scavenging, we traverse the `old_threads` lists, moving any live threads to their next generation's `threads` lists (code). Note that this process is interleaved with step (3) of weak object collection, because a weak object with a live key may make a thread reachable (by referencing it in its value).
When we don't find any more weaks with live keys, we "resurrect" threads in the `old_threads` lists if they were blocked on an operation (rather than killed or finished). This happens in `resurrectUnreachableThreads`. Because resurrecting makes some unreachable objects reachable again (by adding threads to run queues) we continue scavenging after this process.
Aside: The way blocked threads are resurrected causes somewhat counter-intuitive behavior when multiple threads are blocked but resurrecting one thread unblocks the others. Users discover this behavior and report it as a bug from time to time (see e.g. #10241). The code from #10241 demonstrates the issue (slightly simplified):
```haskell
module Main where

import Control.Concurrent
import Control.Concurrent.MVar
import Control.Exception

main :: IO ()
main = do
    mvar1 <- newEmptyMVar :: IO (MVar ())
    mvar2 <- newEmptyMVar :: IO (MVar ())

    forkIO $ do
      takeMVar mvar1 `catch` errorHandler1
      putStrLn "after error catch"
      threadDelay 1000000
      putMVar mvar2 ()
      putStrLn "after putMVar"

    readMVar mvar2 `catch` errorHandler2
    putStrLn "after readMVar"

    threadDelay 5000000
  where
    errorHandler1 :: BlockedIndefinitelyOnMVar -> IO ()
    errorHandler1 e = putStrLn ("errorHandler1: " ++ show e)

    errorHandler2 :: BlockedIndefinitelyOnMVar -> IO ()
    errorHandler2 e = putStrLn ("errorHandler2: " ++ show e)
```
The idea is that while both threads are blocked, resurrecting one of them unblocks the other, so we should see only one "errorHandler…" line in the output. However, because all dead threads are resurrected at once, both threads end up getting a `BlockedIndefinitelyOnMVar` exception.
There's a simple fix that I explained in the ticket and implemented, but it was rejected because it makes the choice of which thread to resurrect non-deterministic. Personally I think this non-determinism is OK: the current behavior is worse, and scheduling (and thus GC and `BlockedIndefinitelyOnMVar` exceptions) is already non-deterministic, so I doubt anyone relies on determinism in such programs.
6. Stable pointers
A stable pointer is simply a reference to a Haskell object. However, stable pointers are never collected, and after every GC they are updated to point to the new locations of the Haskell objects they were pointing to before the GC. This is useful when storing references to Haskell objects in foreign code.
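Here's a minimal sketch of the user-facing API from Foreign.StablePtr in base:

```haskell
import Foreign.StablePtr

main :: IO ()
main = do
  -- The new table entry acts as a GC root: the list stays alive even
  -- if nothing else references it, and the entry is updated whenever
  -- the GC moves the list.
  sp <- newStablePtr [1, 2, 3 :: Int]
  -- castStablePtrToPtr sp could be handed to C here and converted
  -- back later with castPtrToStablePtr.
  xs <- deRefStablePtr sp
  print (sum xs)
  -- Entries are never collected automatically, so free them manually
  -- to avoid leaking the pointed-to object.
  freeStablePtr sp
```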
Allocation: Done via the `getStablePtr` RTS function. Stable pointers are held in an array of `spEntry` (which is just a typedef for a pointer). New pointers are simply added at the end of the array, and the array is resized (its capacity is doubled) when it runs out of room.
Collection: The stable pointer table is never collected. However, its contents are evacuated as GC roots, and after every GC the addresses of the objects in the table are updated.
GHC itself uses stable pointers in an interesting way: some closures are used by the RTS, but not by generated code. Because these closures are not referenced by the generated code the GC is free to collect them. To prevent this the RTS adds them to the stable pointer table, effectively making them roots. This way they’re never collected and the RTS can use them.
7. Stable names
A stable name is just some abstract object unique to a Haskell object. Stable names come with a fast `Eq` instance and a hash function, and when you generate a stable name for an object multiple times you get the same stable name back. This means stable names can be used to implement something similar to pointer equality.
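Here's a minimal sketch using System.Mem.StableName from base (with optimizations GHC may share the two lists, so the comments describe the typical unoptimized behavior):

```haskell
import System.Mem.StableName

main :: IO ()
main = do
  let xs = [1 .. 5 :: Int]
      ys = [1 .. 5 :: Int]
  sn1 <- makeStableName xs
  sn2 <- makeStableName xs
  sn3 <- makeStableName ys
  print (sn1 == sn2)         -- True: same object, same stable name
  print (sn1 == sn3)         -- usually False: different objects
  print (hashStableName sn1) -- a hash usable for memo tables
```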
Allocation: Done via the `makeStableName#` primop. Stable names are held in an array similar to the stable pointer table, but to enable collection the entry type is more complex. I updated the comments on this type while writing this post, so give it a read for the details.
Collection: Because we keep a list of all stable names, and stable names know the addresses of their `StableName` objects (the `sn_obj` field), after a GC we can check whether a stable name is still alive. If not, we free its entry in the table. Otherwise we evacuate the pointed-to object. This happens in `gcStableTables`.
Note that stable names, unlike stable pointers, are not roots. `gcStableTables` is called after the GC is done.
8. Static objects
Top-level definitions are allocated as static objects. The allocation happens at compile time, outside of the heap.
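For example, a sketch (GHC places top-level bindings whose right-hand sides are known constructor applications in static data):

```haskell
-- This top-level list is compiled to static closures: it lives outside
-- the heap and the GC never copies it.
table :: [Int]
table = [1, 2, 3]

main :: IO ()
main = print (sum table)
```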
Allocation: Allocation happens at compile time. It's not possible to allocate a static object at runtime.
Collection: The space allocated for static objects is never really reclaimed. When the GC encounters a static object [7] it adds the object to the GC-thread-local static objects list. This list is later used by the scavenging code to evacuate any objects referenced by these static objects.
To avoid collecting a static object multiple times, evacuated static objects are tagged with a flag. Before a major GC the flag is swapped, so that `evacuate_static` can distinguish static objects flagged in the current GC from those flagged in the previous GC.
Static objects are only collected in major GC.
Conclusion
Most of these objects require a combination of global data structures (like the stable name and stable pointer tables), capability-local lists (large objects, pinned objects, weak pointers, threads), and generation-local lists (large objects, pinned objects, weak pointers, compact objects, threads). Among these, most of the complexity is caused by weak pointers and threads, because of their special reachability rules. With all of these added, the runtime becomes quite complex, but also much more useful.
[1] This is a doubly-linked list, so this operation is efficient.↩︎

[2] The info table is defined here. The 1 is for one pointer field (the list of C finalizers); the 4 is for four non-pointer fields: the key, the value, the Haskell finalizer, and a field used to make an intrusive linked list of weak objects.↩︎

[3] The C finalizers field is used by the `addCFinalizerToWeak#` primop. I don't know what high-level functions use this. C finalizers are like Haskell finalizers, but they're more reliable in that they're always called eventually when the key of a weak object becomes unreachable (in the worst case, when the Haskell process terminates).↩︎

[4] This is actually a "misfeature". Ideally a `ThreadId` would only keep the thread alive while it still has work to do.↩︎

[5] Why do we need two `performMajorGC`s with a `yield` in between? I don't understand this yet, but I can see in gdb that the thread is definitely resurrected during the first GC and added to the run queue. The reason we need the rest is probably related to the scheduler.↩︎

[6] The name choice here is terrible because it makes searching for uses impossible. For this reason I have a commit that renames this list to `thread_list`, which I cherry-pick to my work branches.↩︎

[7] This is done via the `HEAP_ALLOCED` macro, which checks whether the object is outside of the heap.↩︎